A walk through one orchestration run in set-core — what a spec turns into between “I run the planner” and “the change is merged into main”. Author: setcode.dev.
Bottom line
An orchestration run is five named stages: decompose, dispatch, verify, merge, archive. Every transition is a typed JSON event in a journal you can replay. The planner runs as one LLM call by default; very large specs route to a three-phase pipeline. Each change runs in its own git worktree. Gates produce structured pass/fail outputs that the verify-retry loop and the merge queue both consume. The merge queue is serial on purpose.
The shape of the run is what makes parallel agents survivable. The shape is also what this article describes — not a particular model, not a particular project type.
TL;DR
- One spec → planner → N change records. For typical specs the planner is one LLM call (decompose_brief); very large specs route to a three-phase pipeline (decompose_brief → decompose_domain per domain → decompose_merge).
- Each change gets its own git worktree (git worktree add …). N agents can run at the same time, bounded by max_parallel (default 3).
- The universal gate stack for a feature change is build, test, scope_check, test_files, review, spec_verify, rules. Project profiles register extras (for the web profile: e2e, lint, design-fidelity, i18n_check, required-components). Most gates are exit-code based; review and spec_verify are graded by a reviewer LLM.
- Merges are serial. The queue integrates fresh main into the change branch and runs the integration gate stack before each fast-forward.
- Every state transition is a JSONL event (DISPATCH, LLM_CALL, VERIFY_GATE, STATE_CHANGE, CHANGE_INTEGRATION_FAILED, …). The event log is the source of truth for what happened.
Figure (overview of a run): for typical specs the planner is one LLM call (decompose_brief); very large specs route to a three-phase pipeline (1 + N + 1 calls, drawn as dashed sub-boxes). Dispatch fans the changes out into N git worktrees, each with its own agent and verify gate stack. Verified changes converge into a serial merge queue with an integration gate against fresh main. Archive syncs the change’s delta specs into the project’s main OpenSpec tree.
The five stages
spec.md ─► decompose ─► dispatch ─► verify ─► merge ─► archive
           (planner)   (worktree)  (gates)   (queue)  (openspec)
Each stage has one job and a clear handoff. The interesting choice is that the handoff is always state plus event, not a function return value. A change moves from one stage to the next by changing its status in orchestration-state.json and emitting a typed event into the JSONL log. Anything watching the run — the dashboard, the supervisor, a re-run after a crash — reads the same two surfaces.
This sounds like an implementation detail, but it is the core constraint: the run can be paused, resumed, replayed, and inspected from outside, because nothing important lives only inside a Python function call.
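A minimal sketch of the state-plus-event handoff, assuming a flat JSON state file keyed by change name and one JSON object appended to the log per event. The function name transition and the field names (changes, status, ts, type, change, from, to) are hypothetical; set-core's actual schema lives in lib/set_orch/state.py and lib/set_orch/events.py.

```python
import json
import os
import tempfile
import time
from pathlib import Path


def transition(state_path: Path, events_path: Path, name: str, new_status: str) -> dict:
    """Move one change to new_status and record the move as an event.

    Hypothetical state schema: {"changes": {name: {"status": ...}}}.
    """
    state = json.loads(state_path.read_text())
    old_status = state["changes"][name]["status"]
    state["changes"][name]["status"] = new_status

    # Write the new state to a temp file, then atomically replace the old one,
    # so a crash never leaves a half-written state file behind.
    fd, tmp = tempfile.mkstemp(dir=str(state_path.parent))
    with os.fdopen(fd, "w") as f:
        json.dump(state, f, indent=2)
    os.replace(tmp, state_path)

    # Append one JSON object per line: the event log only ever grows.
    event = {"ts": time.time(), "type": "STATE_CHANGE", "change": name,
             "from": old_status, "to": new_status}
    with events_path.open("a") as f:
        f.write(json.dumps(event) + "\n")
    return event
```

The temp-file-then-os.replace write is one common way to keep a state file crash-safe; whether set-core writes its state exactly this way is not stated in this article.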
Stage 1 — Decompose: one call, or three
The planner is in lib/set_orch/planner.py. It reads a spec (plus the project’s existing OpenSpec specs, conventions file, and any in-flight changes) and produces a structured plan: an ordered list of changes, with dependencies and roles.
For a typical spec the planner does this in one LLM call (decompose_brief). The brief alone is the plan: domain priorities, resource ownership, cross-cutting work, a phasing strategy, and the leaf change list. That output is what the rest of the run consumes.
For specs whose estimated input would blow past a token threshold, the planner switches to a three-phase pipeline:
- decompose_brief (Phase 1): same as above, one call. Output: the JSON brief.
- decompose_domain (Phase 2). Input: one domain’s summary, its requirements, the brief from Phase 1, and the test plan. Output: that domain’s list of changes. One call per domain.
- decompose_merge (Phase 3). Input: all domain plans concatenated, plus the brief and the dependency map. Output: a single unified plan with a topological order, deduplicated cross-cutting work, and final change names. One call.
Total LLM calls in the three-phase mode: 1 + N + 1, where N is the number of domains. The routing decision is in _resolve_planner_strategy: serial and parallel are the two fixed endpoints, and auto (the default) chooses between them based on a token estimate compared against planner.single_call_max_input_tokens.
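A sketch of the auto routing decision and the resulting call count. The four-characters-per-token estimate, the function names and the "single_call" / "three_phase" labels are assumptions for illustration; _resolve_planner_strategy is the real routing point, and its heuristic is not reproduced here.

```python
def estimate_tokens(*texts: str) -> int:
    # Crude assumption: roughly four characters per token for English prose.
    return sum(len(t) for t in texts) // 4


def route_planner(spec: str, context: str, max_input_tokens: int) -> str:
    """Return a hypothetical mode label for the planner run."""
    if estimate_tokens(spec, context) <= max_input_tokens:
        return "single_call"
    return "three_phase"


def planner_call_count(mode: str, num_domains: int) -> int:
    # Single call: decompose_brief only.
    # Three-phase: 1 brief + one decompose_domain per domain + 1 merge.
    if mode == "single_call":
        return 1
    return 1 + num_domains + 1
```

With five domains, for example, the three-phase mode costs 1 + 5 + 1 = 7 planner calls against one for the single-call mode.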
Why two modes. A single call is cheaper, more cache-friendly, and avoids the orchestration overhead of the multi-phase fan-out. It works as long as the spec fits comfortably into one prompt with all its context. Past that size, the model starts to struggle with global consistency — picking a phasing strategy late, dropping cross-cutting concerns, returning a poorly ordered plan. The three-phase pipeline is a deliberate trade: pay for more LLM round-trips to keep the per-call inputs small enough that the model can reliably commit to the structure.
The output of the planner — whichever mode — lands in orchestration-plan.json with a list of named changes. The directory under openspec/changes/<name>/ is materialized later, by the agent itself, during dispatch.
Stage 2 — Dispatch: a worktree per change
Once the plan exists, the dispatcher (lib/set_orch/dispatcher.py) walks it. For each change whose dependencies are met:
- Lock the change record. Inside an atomic state lock, the dispatcher flips the change’s status from pending to dispatched. Without the lock, two dispatcher polls could pick up the same change.
- Create a worktree. git worktree add <path> <branch> gives the change a separate working tree on a separate branch. Same .git directory, different files on disk.
- Launch an agent inside it. The dispatcher starts a set-loop (Ralph) process in the worktree: a retry loop wrapping the Claude CLI. The change scope, the OpenSpec roadmap item, and (for projects that have one) a design context are passed in as the iteration’s inputs.
- Emit DISPATCH and AGENT_SESSION_DECISION. The journal records the change scope and the session decision (fresh start vs. re-attach to an existing session ID).
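The lock-then-flip step can be sketched with an advisory file lock. This assumes a POSIX system (fcntl.flock), a separate lock file next to the state file, and a hypothetical state schema; claim_change is an illustrative name, and set-core's own locking code may differ.

```python
import fcntl
import json
from pathlib import Path


def claim_change(state_path: Path, name: str) -> bool:
    """Flip name from pending to dispatched; return False if it was already claimed."""
    lock_path = state_path.with_suffix(".lock")
    with lock_path.open("w") as lock:
        # Exclusive lock: a second dispatcher poll blocks here until we release.
        fcntl.flock(lock, fcntl.LOCK_EX)
        try:
            state = json.loads(state_path.read_text())
            change = state["changes"][name]
            if change["status"] != "pending":
                return False  # an earlier poll already dispatched it
            change["status"] = "dispatched"
            state_path.write_text(json.dumps(state))
            return True
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)
```

Checking the status inside the lock, not before taking it, is what makes the second claim fail cleanly instead of dispatching the change twice.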
Worktrees are the isolation primitive. Ten agents writing to ten worktrees do not collide on the filesystem and do not touch each other’s branches. The .git directory is shared, but git itself is the synchronisation point — a git fetch from one worktree is visible in all of them. When an agent finishes, its worktree gets torn down (or kept around for re-dispatch on a verify retry, which is cheaper than starting from scratch).
Parallelism is bounded by one knob: max_parallel. The dispatcher counts in-flight changes and stops launching new ones when the cap is hit. Topological order is enforced separately — a change does not get dispatched until everything in depends_on is in a terminal state.
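A sketch of the selection rule under the two constraints just described: at most max_parallel changes in flight, and no dispatch until every depends_on entry has finished. The status groups IN_FLIGHT and TERMINAL, the Change dataclass and ready_to_dispatch are hypothetical; set-core's real status set is larger.

```python
from dataclasses import dataclass, field

# Hypothetical status groups. This sketch treats only successful terminal
# states as unblocking dependants.
IN_FLIGHT = {"dispatched", "verifying", "merge-queued"}
TERMINAL = {"merged", "done"}


@dataclass
class Change:
    name: str
    status: str = "pending"
    depends_on: list[str] = field(default_factory=list)


def ready_to_dispatch(changes: list[Change], max_parallel: int = 3) -> list[Change]:
    """Pending changes whose deps are terminal, capped by free parallel slots."""
    by_name = {c.name: c for c in changes}
    free_slots = max_parallel - sum(c.status in IN_FLIGHT for c in changes)
    ready = [
        c for c in changes  # list order stands in for the plan's topological order
        if c.status == "pending"
        and all(by_name[d].status in TERMINAL for d in c.depends_on)
    ]
    return ready[:max(free_slots, 0)]
```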
There is no “pool of workers.” There are N agents, N worktrees, N branches. When an agent crashes, only its branch is in a weird state.
Stage 3 — Verify: gates produce pass/fail, not opinions
After the agent reports done, the change goes into verify (lib/set_orch/verifier.py, lib/set_orch/gate_profiles.py). A gate stack runs against the change’s worktree.
The universal stack, by change type, is in gate_profiles.py:UNIVERSAL_DEFAULTS. For a feature change it is:
| Gate | Mode | What it checks |
|---|---|---|
| build | run | The project builds (compile / bundler exit code). |
| test | run | Unit tests pass. |
| scope_check | run | The change branch contains implementation code, i.e. the diff against merge-base is more than just OpenSpec proposal / task artifacts. |
| test_files | run | The change ships test files commensurate with the requirements it covers. |
| review | run | A reviewer LLM reads the diff against the spec and signs off, with a structured rubric. |
| spec_verify | run | The implementation actually satisfies each requirement in the change’s delta spec. |
| rules | warn | Project rules (the .claude/rules/ set) are respected. |
Each gate’s mode is one of run (blocking on failure), warn / soft (non-blocking), or skip. Profile plugins register additional gates. The web project profile, for example, adds e2e (Playwright suite), lint (linter), design-fidelity (Playwright + pixel diff against a v0.app reference), i18n_check, and required-components (presence of expected shadcn primitives). Profiles for other project types (voice agents, mobile, backend) register their own.
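A sketch of how the three modes could combine into one verdict, with a profile layering extra gates over the universal defaults. FEATURE_DEFAULTS mirrors the feature-change table in this section; the modes in WEB_EXTRAS are placeholders, and gate_stack and verdict are hypothetical helpers, not gate_profiles.py itself.

```python
# Universal defaults for a feature change, as in the table in this section.
FEATURE_DEFAULTS = {
    "build": "run", "test": "run", "scope_check": "run", "test_files": "run",
    "review": "run", "spec_verify": "run", "rules": "warn",
}
# Hypothetical web-profile extras; the modes are placeholders, configured per profile.
WEB_EXTRAS = {"e2e": "run", "lint": "warn", "design-fidelity": "run",
              "i18n_check": "warn", "required-components": "run"}


def gate_stack(defaults: dict[str, str], extras: dict[str, str]) -> dict[str, str]:
    # Profile entries extend (or override) the universal defaults.
    return {**defaults, **extras}


def verdict(stack: dict[str, str], results: dict[str, bool]) -> tuple[bool, list[str], list[str]]:
    """Return (passed, blocking_failures, warnings). Only "run" gates block."""
    blocking, warnings = [], []
    for gate, mode in stack.items():
        if mode == "skip":
            continue
        if results.get(gate, False):  # a gate that never reported counts as failed
            continue
        if mode == "run":
            blocking.append(gate)
        else:  # warn / soft: recorded, not blocking
            warnings.append(gate)
    return not blocking, blocking, warnings
```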
Most gates are deterministic — they call out to a shell command and read the exit code. Two are LLM-graded: review and spec_verify. The distinction matters: a deterministic gate’s “pass” means a process produced the expected exit. An LLM-graded gate’s “pass” means a separate model said yes against a structured prompt. Both produce the same pass/fail/warn output that the next stage can act on.
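A deterministic gate reduces to running a command and reading its exit code. A minimal sketch, assuming a hypothetical GateResult shape that keeps a short tail of the output as failure detail; the field names and the tail length are illustrative choices, not set-core's.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class GateResult:
    name: str
    result: str  # "pass" or "fail"
    detail: str  # tail of the output, usable as retry context on failure


def run_exit_code_gate(name: str, cmd: list[str], cwd: str, tail_lines: int = 40) -> GateResult:
    # Run the gate command in the change's worktree and capture its output.
    proc = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    output = (proc.stdout + proc.stderr).splitlines()
    detail = "\n".join(output[-tail_lines:])
    # Exit code 0 is the only thing that counts as a pass.
    return GateResult(name, "pass" if proc.returncode == 0 else "fail", detail)
```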
When a gate fails, the verifier emits a VERIFY_GATE event with the failure detail and builds a retry_context payload — the structured artefact the agent needs to fix the failure: a stack trace, a failing requirement ID, a missing test file, a pixel diff. The agent re-runs with that context. Retries are bounded by a per-change retry budget; when the budget is exhausted, the change transitions to one of the terminal failed states (failed:retry_budget_exhausted and friends) and stops blocking the queue.
The point of structured gate output is that the agent does not have to introspect why it’s stuck. The gate output is the failure description, in machine-readable form. The next iteration’s prompt has the retry_context baked in.
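The verify-retry loop can be sketched as follows. run_agent and run_gates are hypothetical callables standing in for the agent iteration and the gate stack; the retry_context dictionary shape and the "verified" status are assumptions, while failed:retry_budget_exhausted is the terminal status this section names.

```python
from typing import Callable, Optional

# Hypothetical gate-run shape: list of (gate_name, passed, detail).
GateRun = list[tuple[str, bool, str]]


def verify_with_retries(
    run_agent: Callable[[Optional[dict]], None],
    run_gates: Callable[[], GateRun],
    retry_budget: int,
) -> tuple[str, int]:
    """Return (final_status, retries_used)."""
    retry_context = None
    for attempt in range(retry_budget + 1):
        run_agent(retry_context)  # the first pass gets no retry context
        failures = [(n, d) for n, ok, d in run_gates() if not ok]
        if not failures:
            return "verified", attempt
        # Structured failure detail becomes the next iteration's input.
        retry_context = {"failed_gates": [{"gate": n, "detail": d} for n, d in failures]}
    return "failed:retry_budget_exhausted", retry_budget
```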
Stage 4 — Merge: serial, with an integration gate
Verified changes go into the merge queue. The merger (lib/set_orch/merger.py, execute_merge_queue) drains it one change at a time. Serially. On purpose.
Per change:
1. Dependency check. Every entry in depends_on must be in a terminal merged state. If not, the change is parked as dep-blocked and the queue moves on.
2. Integrate fresh main. The merger updates the change’s branch with the current main. This is where two merges that were independently green can collide. If the integration fails (a build break, or a test regression caused by interaction with a sibling change merged a minute ago), the change is marked integration-failed and goes back to verify with the integration’s diagnostic as retry_context.
3. Integration gate stack. The merged-state branch runs build, test, and (where the profile registers it) e2e. The web profile’s integration e2e step has an optional smoke sub-phase that runs first against inherited sibling specs, a fast fail signal if a previous merge already broke the suite. If integration passes, the merger continues; if not, same path as #2.
4. Fast-forward into main. git merge --ff-only. No merge commits. By the time the merge runs, the branch already contains the latest main, so a fast-forward is always possible, and the absence of merge commits keeps the history of main linear.
5. Emit STATE_CHANGE with status merged.
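A sketch of the serial drain loop, with the git and gate steps passed in as hypothetical callables so the control flow stays visible. In a real repository the fast-forward would be git merge --ff-only; here ff_merge is an injected stand-in. The status names (dep-blocked, integration-failed, merged) follow the list in this section; drain_merge_queue itself is illustrative, not execute_merge_queue.

```python
from typing import Callable


def drain_merge_queue(
    queue: list[str],
    depends_on: dict[str, list[str]],
    status: dict[str, str],
    integrate_main: Callable[[str], bool],     # update branch with fresh main
    integration_gates: Callable[[str], bool],  # build, test, e2e on merged state
    ff_merge: Callable[[str], None],           # stand-in for git merge --ff-only
) -> list[str]:
    """Process changes strictly one at a time; return the names merged, in order."""
    merged = []
    for name in queue:
        if any(status.get(d) != "merged" for d in depends_on.get(name, [])):
            status[name] = "dep-blocked"
            continue
        if not integrate_main(name) or not integration_gates(name):
            status[name] = "integration-failed"  # back to verify with diagnostics
            continue
        ff_merge(name)
        status[name] = "merged"
        merged.append(name)
    return merged
```

Because b is processed only after a has been fast-forwarded, b's dependency check and integration gate both see a on main, which is the property the serial queue exists to provide.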
Why serial. Two parallel merges that both pass their integration gate against the same main can still produce a broken main after both land. Serialising the queue means each merge sees the previous one’s actual result on main before its integration gate runs. The cost is throughput at the merge step. The benefit is that main is, by construction, a state every gate has signed off on against the actual neighbours.
The work that takes time — the agent’s implementation, the verify stack — has already happened in parallel. The merge step is what the queue serialises, and that step is fast.
Stage 5 — Archive: the spec is updated, not just the code
A merged change is not done until OpenSpec is updated.
archive_change (in merger.py) does three things, the first two delegated to the openspec archive <name> CLI:
- Move openspec/changes/<name>/ to a timestamped archive/ subdirectory.
- Sync the change’s delta specs into the project’s main openspec/specs/ tree, so the next planner run sees the new state of the world.
- Stage and commit the spec update.
Until the archive step runs, a freshly merged change’s delta exists in openspec/changes/, and the next planner run would see it as still in flight. After archive, the change is part of the spec base, and the planner treats its requirements as known.
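The effect on the planner's view can be sketched as a directory listing: anything still under openspec/changes/ outside archive/ counts as in flight. Deriving in-flight changes purely from that directory is a simplifying assumption for illustration, and in_flight_changes is a hypothetical helper.

```python
from pathlib import Path


def in_flight_changes(openspec_root: Path) -> list[str]:
    """Change directories not yet archived (assumed layout: openspec/changes/<name>/)."""
    changes_dir = openspec_root / "changes"
    if not changes_dir.is_dir():
        return []
    return sorted(
        p.name for p in changes_dir.iterdir()
        if p.is_dir() and p.name != "archive"
    )
```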
This is the loop closure. The next spec the planner reads is the one this change just helped write.
What the journal actually looks like
The event log is one file: orchestration-events.jsonl, one JSON object per line. Per-change journals live at <state-dir>/journals/<change_name>.jsonl and contain the same events filtered to one change.
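Deriving per-change journals from the global log is a filter over JSON lines. A sketch, assuming each event carries the change name in a "change" field; that field name and split_journals are hypothetical, and events without a change (heartbeats, for instance) are skipped.

```python
import json
from collections import defaultdict
from pathlib import Path


def split_journals(events_path: Path, journals_dir: Path) -> dict[str, int]:
    """Write <journals_dir>/<change>.jsonl per change; return event counts."""
    per_change: dict[str, list[str]] = defaultdict(list)
    with events_path.open() as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            event = json.loads(line)
            name = event.get("change")  # hypothetical field name
            if name:
                per_change[name].append(line)
    journals_dir.mkdir(parents=True, exist_ok=True)
    for name, lines in per_change.items():
        (journals_dir / f"{name}.jsonl").write_text("\n".join(lines) + "\n")
    return {name: len(lines) for name, lines in per_change.items()}
```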
A working subset of the event types you will see:
- DISPATCH: a change went into a worktree.
- AGENT_SESSION_DECISION: fresh session, or re-attach to an existing session ID.
- LLM_CALL: every model call, with role (decompose_brief, agent, review, spec_verify, …), input/output token counts, cache reads, and the resolved model ID.
- VERIFY_GATE: one gate ran, with name and result.
- GATE_SET_EXPANDED: the gate registry added gates based on the change’s content (e.g. it touched UI code, so design-fidelity was attached).
- STATE_CHANGE: a change moved from one status to another.
- CHANGE_INTEGRATION_FAILED: the merge queue’s integration gate failed for a change.
- MONITOR_HEARTBEAT / WATCHDOG_HEARTBEAT: supervisor liveness pings.
A stripped-down trace for one change, in event order:
STATE_CHANGE name=add-blog-list status: pending → dispatched
DISPATCH data={scope: ...}
AGENT_SESSION_DECISION decision=fresh
LLM_CALL role=agent tokens_out=14820
LLM_CALL role=agent tokens_out=22104
GATE_SET_EXPANDED added=[e2e, lint, design-fidelity]
VERIFY_GATE name=build result=pass
VERIFY_GATE name=test result=pass
VERIFY_GATE name=design-fidelity result=fail retry_context=token-mismatch
LLM_CALL role=agent tokens_out=8740 verify_retry=1
VERIFY_GATE name=design-fidelity result=pass
VERIFY_GATE name=review result=pass
VERIFY_GATE name=spec_verify result=pass
STATE_CHANGE status: dispatched → merged
STATE_CHANGE status: merged → done
Each line is timestamped. The whole trace is replayable, diffable, and survives crashes — because nothing about it lives only in process memory.
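Replaying the trace means folding events back into state. A minimal sketch that recovers each change's latest status, its highest verify-retry number and its total agent output tokens, using events shaped like the trace in this section with hypothetical field names (type, change, to, tokens_out, verify_retry). For that trace it yields status done, one retry, and 14820 + 22104 + 8740 = 45664 output tokens.

```python
def replay(events: list[dict]) -> dict[str, dict]:
    """Fold an event stream into per-change status, retries and output tokens."""
    state: dict[str, dict] = {}
    for ev in events:
        name = ev.get("change")
        if name is None:
            continue  # run-level events such as heartbeats
        entry = state.setdefault(name, {"status": "pending", "retries": 0, "tokens_out": 0})
        if ev["type"] == "STATE_CHANGE":
            entry["status"] = ev["to"]
        elif ev["type"] == "LLM_CALL":
            entry["tokens_out"] += ev.get("tokens_out", 0)
            entry["retries"] = max(entry["retries"], ev.get("verify_retry", 0))
    return state
```

Running the same fold over a prefix of the log gives the state at that point in the run.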
What this shape buys you
Three concrete things, in order of importance:
Crash recovery is cheap. State lives in orchestration-state.json. Per-change progress lives in the per-change journal. Worktrees live on disk. If the orchestrator process dies, restarting it picks up exactly where it stopped — no in-flight change forgets which step it was on. The supervisor process can also tell, from the same state, which agents it needs to put back.
Parallelism rests on isolation, not on coordination. Ten worktrees on ten branches do not need to coordinate writes; they need to merge in order. The hard part (making sure two parallel changes do not produce a broken main) is concentrated in one place, the merge queue’s integration gate, instead of being smeared across the whole pipeline.
Failure has a shape, not a story. Every gate produces a structured pass/fail with the artefacts the agent needs to act on: a stack trace, a failing requirement ID, a missing test file, a pixel diff. Those artefacts get handed back as retry_context on the next iteration. The agent does not narrate its own failures — the gate does, in machine-readable form, and the verify-retry loop is what turns that into the next prompt.
What this article deliberately does not specify
- The exact threshold between the single-call and three-phase planner modes. The two modes are stable; the threshold (planner.single_call_max_input_tokens) and the auto-routing heuristic are tunable, project-specific, and being adjusted from real production runs.
- Specific anomaly rules in the supervisor. The supervisor’s anomaly-detection rules are still being hardened from real production runs. The supervisor’s role (watch, restart, replay) is stable; the rule set is not.
- Project-type-specific gate sets. Web, voice-agent, and other project profiles register their own gates and configure modes differently. The article describes the universal stack and the idea of profile extension, not the per-profile detail.
- Exact retry budgets and timeouts. These are knobs, configured per project.
Reproducibility
Every claim in this article maps to a file in github.com/tatargabor/set-core:
- Decompose: lib/set_orch/planner.py (_resolve_planner_strategy, _phase1_planning_brief, _decompose_single_domain, _phase3_merge_plans).
- Dispatch: lib/set_orch/dispatcher.py (dispatch_change, dispatch_ready_changes).
- Gate stack: lib/set_orch/gate_profiles.py (UNIVERSAL_DEFAULTS), lib/set_orch/verifier.py.
- Merge queue: lib/set_orch/merger.py (execute_merge_queue, archive_change).
- Event log and journals: lib/set_orch/events.py (EventBus), lib/set_orch/state.py.
- Model resolver (13 roles, 5-tier chain): lib/set_orch/model_config.py.
The shape described here is what the code does today. The knobs are documented separately in the project’s README and OpenSpec specs.