Claude Code Workflows · Explainer 01

The mental model, and when a workflow earns its cost

You already hand-roll agent loops against the raw API. A workflow is that instinct, lifted into the harness — and the whole skill of using it well is one judgment plus one default.

Anchor to what you know

You've written this loop in Python: call the model, read the tool call, run the tool, feed the result back, repeat. You've paid per token and felt the cost. Hold that picture. Everything below is a relabelling of it — the only new ideas are where the determinism lives and that each step gets a fresh context window.

1 · What a workflow actually is

A Claude Code workflow is a deterministic JavaScript orchestrator that spawns non-deterministic subagents. You write a plain JS script; the harness runs it in the background. Two layers, and keeping them straight is the entire mental model:

Deterministic

The script (your code)

Loops, ifs, fan-out, counting, dedup, budget checks. Ordinary JS. Runs the same way every time. This is where you put control flow you don't want an LLM to improvise.

Non-deterministic

The agent() calls

Each one spawns a fresh subagent — its own context window, its own tools — does a task, and returns its result to your script. This is the LLM call in your hand-rolled loop.

So the mapping from your Python loop:

Your hand-rolled loop → A Claude Code workflow
Your for / while orchestration code → The workflow script (deterministic JS)
One messages.create() API call + its tool loop → One agent(prompt) call (a whole subagent, not one turn)
Same conversation, growing context → Fresh, isolated context per agent(); calls don't see each other
You parse the model's text yourself → agent(prompt, {schema}) returns a validated object
Your token meter ticking up → A shared budget across the whole run + the main loop
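This is a toy sketch of the middle rows, and everything in it is hypothetical. `fakeModel` plays one messages.create() round trip. `makeAgent` and `runTool` are illustrative names, not how the harness implements agent(). It is synchronous for brevity; the real agent() is awaited. The point is that each call builds its own message list, runs its own tool loop, and hands back only the final answer.

```javascript
// Hypothetical stand-in for one model round trip (think messages.create()).
// It requests one tool call, then answers once a tool result is in context.
function fakeModel(messages) {
  const sawToolResult = messages.some(m => m.role === "tool");
  if (!sawToolResult) {
    return { type: "tool_call", tool: "read", input: "src/auth.ts" };
  }
  return { type: "text", text: `done after ${messages.length} messages` };
}

// Hypothetical tool runner.
function runTool(name, input) {
  return `${name} result for ${input}`;
}

// Sketch of what ONE agent(prompt) call stands for: a whole tool loop over a
// message list that is created fresh for this call and discarded afterwards.
function makeAgent(model, maxTurns = 10) {
  return function agent(prompt) {
    const messages = [{ role: "user", content: prompt }]; // fresh context
    for (let turn = 0; turn < maxTurns; turn++) {
      const reply = model(messages);
      if (reply.type === "text") return { text: reply.text, contextSize: messages.length };
      messages.push({ role: "assistant", content: reply });
      messages.push({ role: "tool", content: runTool(reply.tool, reply.input) });
    }
    throw new Error("agent exceeded maxTurns");
  };
}

const agent = makeAgent(fakeModel);
const first = agent("Summarize the login flow.");
const second = agent("Summarize the logout flow.");
// The second call starts from its own prompt only; it never sees the first call's messages.
console.log(first.contextSize, second.contextSize); // → 3 3
```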

The crucial difference from a single subagent (one Agent call) is the deterministic glue. A lone subagent decides its own steps. A workflow lets you dictate the structure (fan out exactly 8 readers, verify every finding with 3 skeptics, loop until two dry rounds), and the model only fills in the leaf tasks. That's the orchestrator-workers pattern from Anthropic's agent taxonomy, made concrete; see Anthropic's "Building Effective Agents".

Check yourself: in a workflow, which layer decides "spawn 5 reviewers then verify each finding twice" — the script or the subagents?
The script. That's deterministic control flow you wrote in JS, so it happens identically every run. The subagents only decide what they find inside their own task — never the shape of the orchestration. If you ever catch yourself wishing "the agent will probably remember to verify," stop: put the verification in the script as another agent() call. Structure you care about belongs in the deterministic layer.
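Here is that answer as a runnable sketch. `agent` is a hypothetical async stub so the script runs outside the harness, and `Promise.all` stands in for the harness's fan-out. The prompts, the `confirmed` field, and the rule that a finding survives only if both skeptics confirm it are all assumptions for illustration. What matters is that the reviewer count and the two verifications per finding are plain JS, so no subagent can skip them.

```javascript
// Hypothetical stub: in a real workflow, agent() comes from the harness.
// Reviewer #i reports one finding; skeptics reject anything from reviewer 0.
async function agent(prompt) {
  const review = prompt.match(/^Review area #(\d+)$/);
  if (review) return { findings: [{ title: `issue-${review[1]}` }] };
  return { confirmed: !prompt.endsWith("issue-0") };
}

async function reviewThenVerify() {
  // The script fixes the shape: exactly 5 reviewers...
  const perReviewer = await Promise.all([0, 1, 2, 3, 4].map(async i => {
    const review = await agent(`Review area #${i}`);
    // ...and exactly 2 independent skeptics per finding, started as soon as
    // this reviewer is done (no waiting on the other reviewers).
    return Promise.all(review.findings.map(async f => {
      const votes = await Promise.all([
        agent(`Skeptic A, verify: ${f.title}`),
        agent(`Skeptic B, verify: ${f.title}`),
      ]);
      return { title: f.title, kept: votes.every(v => v.confirmed) };
    }));
  }));
  return perReviewer.flat().filter(f => f.kept).map(f => f.title);
}

reviewThenVerify().then(kept => console.log(kept));
```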

2 · The one judgment: does a workflow earn its cost?

Using workflows well is mostly knowing when not to. A workflow can spawn dozens of subagents and burn a lot of tokens — so it has to buy something a single agent can't. There are three reasons it does, and if none apply, don't reach for it:

  1. Comprehensiveness (breadth): the work decomposes into many independent pieces you want covered in parallel (review N files, search M ways, migrate K call-sites). One context can't hold it all, or would work through it serially and slowly.
  2. Confidence: you want independent perspectives before committing, such as N candidate designs scored by judges, or every finding attacked by skeptics. This is redundancy you deliberately want, not waste.
  3. Scale beyond one context: a migration or audit too big for a single context window, where fresh isolated contexts per item are the point.

The cost gate, stated plainly

Workflows are an explicit opt-in tool. In normal Claude Code use, one is launched only when you ask for it: the keyword ultracode, an explicit "use a workflow / fan out agents" request, or a skill that calls it. That gate exists because the cost is real. So your private test before authoring one is blunt:

"Would a single capable agent do this about as well? Then don't fan out."

A refactor, a bugfix, a one-file edit, a quick lookup — solo. Breadth, independent verification, or scale — workflow.

Try this — rapid triage

For each task, decide solo agent or workflow before reading the verdicts. Give your reason in one word (breadth / confidence / scale / none).

  1. "Rename getUser to fetchUser across the repo."
  2. "Audit this 400-file service for auth bugs, and I want each finding double-checked."
  3. "What does the publish.sh script deploy?"
  4. "Give me three competing designs for the caching layer and tell me which wins."
Verdicts

1 · Solo — none. A scripted rename or one agent handles it; fanning out buys nothing. (A purist could pipeline per-file edits in worktrees, but for a rename that's ceremony.)

2 · Workflow — scale and confidence. 400 files exceeds one context, and "double-check each finding" is the adversarial-verify pattern: find → verify, every finding attacked independently.

3 · Solo — none. One Read answers it. Reaching for a workflow here is the classic over-reach.

4 · Workflow — confidence. The judge-panel pattern: generate N independent attempts, score with parallel judges, synthesize the winner. Three rivalrous designs beat one design iterated.
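The judge-panel pattern as a runnable sketch, with a hypothetical stub `agent` and made-up scores. Each design is generated and then scored by its own judges without waiting for the other designs. Picking the winner needs every score at once, so only that step waits for all of them. Averaging the judges' scores is one simple aggregation choice, assumed here.

```javascript
// Hypothetical stub for agent(). Designers name their variant; judge j gives
// design d the made-up score SCORES[d][j]; the synthesizer builds on the winner.
const SCORES = {
  "design-0": [6, 7, 5],
  "design-1": [8, 9, 7],
  "design-2": [7, 6, 8],
};

async function agent(prompt) {
  let m;
  if ((m = prompt.match(/variant (\d+)$/))) return { name: `design-${m[1]}` };
  if ((m = prompt.match(/^Judge (\d+): score (design-\d+)$/))) {
    return { score: SCORES[m[2]][Number(m[1])] };
  }
  if ((m = prompt.match(/from (design-\d+)$/))) return { plan: `final plan built on ${m[1]}` };
  throw new Error(`unexpected prompt: ${prompt}`);
}

async function judgePanel(designs, judges) {
  // Generate each design, then score it with its own panel of judges.
  const scored = await Promise.all(
    Array.from({ length: designs }, async (_, i) => {
      const d = await agent(`Design the caching layer, variant ${i}`);
      const votes = await Promise.all(
        Array.from({ length: judges }, (_, j) => agent(`Judge ${j}: score ${d.name}`))
      );
      return { name: d.name, mean: votes.reduce((s, v) => s + v.score, 0) / votes.length };
    })
  );
  // Choosing the winner compares designs against each other, so it needs all scores.
  const winner = scored.reduce((best, d) => (d.mean > best.mean ? d : best));
  const final = await agent(`Synthesize a final plan from ${winner.name}`);
  return { winner: winner.name, plan: final.plan };
}

judgePanel(3, 3).then(r => console.log(r.winner, "|", r.plan));
```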

3 · The three primitives

Almost everything is built from three primitives. Learn these and you can read most workflow scripts.

agent(prompt, opts?) — spawn one subagent

The atom. Returns the subagent's final text as a string — or, with a schema, a validated object. Fresh context every time.

const summary = await agent("Read src/auth.ts and summarize the login flow.")
// with structure:
const bugs = await agent("Find auth bugs in src/auth.ts.", { schema: BUGS_SCHEMA })
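Roughly what "a validated object" buys you, as a sketch: the reply is checked against the schema, and a mismatch fails loudly instead of reaching your script as half-parsed text. Both names here are assumptions. `BUGS_SCHEMA` is a made-up definition in a JSON-Schema-like shape, and `validate` is a hand-rolled checker for just that subset. The harness's accepted schema format, and what it does with a bad reply, may differ.

```javascript
// Hypothetical BUGS_SCHEMA in a JSON-Schema-like shape (an assumption;
// the harness's accepted schema format may differ).
const BUGS_SCHEMA = {
  type: "object",
  required: ["bugs"],
  properties: {
    bugs: {
      type: "array",
      items: {
        type: "object",
        required: ["file", "line", "title"],
        properties: { file: { type: "string" }, line: { type: "number" }, title: { type: "string" } },
      },
    },
  },
};

// Minimal validator for the subset used above: type, required, properties, items.
function validate(value, schema, path = "$") {
  const actual = Array.isArray(value) ? "array" : value === null ? "null" : typeof value;
  if (schema.type && schema.type !== actual) {
    throw new Error(`${path}: expected ${schema.type}, got ${actual}`);
  }
  if (actual === "object") {
    for (const key of schema.required || []) {
      if (!(key in value)) throw new Error(`${path}: missing "${key}"`);
    }
    for (const [key, sub] of Object.entries(schema.properties || {})) {
      if (key in value) validate(value[key], sub, `${path}.${key}`);
    }
  }
  if (actual === "array" && schema.items) {
    value.forEach((item, i) => validate(item, schema.items, `${path}[${i}]`));
  }
  return value;
}

// A well-formed reply passes through as a plain object...
const good = validate(
  JSON.parse('{"bugs":[{"file":"src/auth.ts","line":42,"title":"token not checked"}]}'),
  BUGS_SCHEMA
);
// ...while a malformed one is rejected with a precise path.
let error = null;
try {
  validate({ bugs: [{ file: "src/auth.ts", title: "no line number" }] }, BUGS_SCHEMA);
} catch (e) {
  error = e.message;
}
console.log(good.bugs.length, error);
```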

parallel(thunks) — fan out, then wait for all (a barrier)

Runs tasks concurrently and blocks until every one finishes before returning the array. Use it only when the next step genuinely needs all results at once.

const reviews = await parallel(
  DIMENSIONS.map(d => () => agent(d.prompt, { schema: FINDINGS }))
)
// nothing past this line runs until the SLOWEST review returns
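To see why this is a barrier, here is a minimal sketch of parallel()'s described behavior. It runs the thunks concurrently, waits for every one, and turns a thrown error into null (the footgun in section 5). It is not the harness's code. The `cap` parameter is a hypothetical stand-in for whatever concurrency limit the harness applies.

```javascript
// Sketch of parallel(): at most `cap` thunks in flight, wait for ALL of them
// (the barrier), and turn a thrown error into null instead of rejecting.
async function parallel(thunks, cap = 4) {
  const results = new Array(thunks.length).fill(null);
  let next = 0;
  async function worker() {
    while (next < thunks.length) {
      const i = next++;
      try {
        results[i] = await thunks[i]();
      } catch {
        results[i] = null; // failure resolves to null; the call never rejects
      }
    }
  }
  const workers = Array.from({ length: Math.min(cap, thunks.length) }, () => worker());
  await Promise.all(workers); // returns only after the slowest thunk finishes
  return results;
}

const sleep = (ms, value) => new Promise(resolve => setTimeout(() => resolve(value), ms));
const order = [];
parallel([
  () => sleep(30, "slow").then(v => (order.push(v), v)),
  () => sleep(5, "fast").then(v => (order.push(v), v)),
  () => { throw new Error("subagent failed"); },
]).then(results => console.log(order, results));
// "fast" finishes first, but nothing is returned until "slow" is done.
```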

pipeline(items, stage1, stage2, …) — each item flows through all stages, no barrier

Each item runs through every stage independently. Item A can be in stage 3 while item B is still in stage 1. Wall-clock ≈ the slowest single chain, not the sum of slowest-per-stage.

const results = await pipeline(
  DIMENSIONS,
  d      => agent(d.prompt, { schema: FINDINGS, phase: 'Review' }),
  review => parallel(review.findings.map(f => () =>
             agent(`Verify: ${f.title}`, { schema: VERDICT, phase: 'Verify' })))
)
// dimension "bugs" verifies while dimension "perf" is still being reviewed
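A matching sketch of pipeline(), under the same caveats: it is illustrative, not the harness's implementation, and has no concurrency cap. Each item gets its own promise chain through the stages. As in the example above, each stage receives the previous stage's output. A stage that throws ends that item with null, which is an assumption about how failures propagate.

```javascript
// Sketch of pipeline(items, ...stages): each item advances through all stages
// on its own chain, so no item ever waits for another item between stages.
function pipeline(items, ...stages) {
  return Promise.all(items.map(async item => {
    try {
      let value = item;
      for (const stage of stages) value = await stage(value);
      return value;
    } catch {
      return null; // assumed: a failing stage resolves that item to null
    }
  }));
}

const sleep = (ms, value) => new Promise(resolve => setTimeout(() => resolve(value), ms));
const log = [];
pipeline(
  [{ name: "bugs", ms: 5 }, { name: "perf", ms: 40 }],
  d => sleep(d.ms, d.name).then(name => (log.push(`reviewed ${name}`), name)),
  name => sleep(5, name).then(n => (log.push(`verified ${n}`), n))
).then(results => console.log(log, results));
// "verified bugs" is logged before "reviewed perf": no barrier between stages.
```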

4 · The default that separates good from wasteful: pipeline over barrier

Default to pipeline(). Reach for a parallel() barrier only when stage N truly needs every result from stage N-1 at once.

This is the single highest-leverage habit. A barrier makes your fast workers sit idle waiting for the slowest one. If five finders run and the slowest takes 3× as long as the fastest, the fastest finder sits idle for two-thirds of the stage's wall-clock time. A pipeline lets each finding move on the instant it's ready.
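The arithmetic behind that, as a toy model. It assumes unlimited concurrency and fixed per-item durations, and the numbers are made up. Under a barrier, each stage costs as much as its slowest item. Under a pipeline, the run ends when the slowest end-to-end chain does.

```javascript
// durations[i][s] = time item i spends in stage s (made-up numbers).
function barrierWallClock(durations) {
  let total = 0;
  for (let s = 0; s < durations[0].length; s++) {
    total += Math.max(...durations.map(d => d[s])); // each stage waits for its slowest item
  }
  return total;
}

function pipelineWallClock(durations) {
  // each item moves on as soon as it is ready; the run ends with the slowest chain
  return Math.max(...durations.map(d => d.reduce((a, b) => a + b, 0)));
}

// Five finders (stage 1, slowest = 3x fastest), each followed by a verify stage.
const durations = [
  [10, 30],
  [15, 25],
  [20, 20],
  [25, 15],
  [30, 10],
];
const stage1 = durations.map(d => d[0]);
const idleShare = (Math.max(...stage1) - Math.min(...stage1)) / Math.max(...stage1);

console.log(barrierWallClock(durations), pipelineWallClock(durations)); // → 60 40
// idleShare = (30 - 10) / 30: the fastest finder idles two-thirds of stage 1.
```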

A barrier is justified only when stage N needs cross-item context from all of stage N-1:

Barrier OK

Genuine cross-item need

  • Dedup/merge across the full result set before expensive downstream work
  • Early-exit on the total ("0 bugs found → skip verification entirely")
  • Stage N compares one finding against the others
Not a barrier

These feel like reasons but aren't

  • "I need to flatten/map/filter first" — do it inside a pipeline stage
  • "The stages are conceptually separate" — separate ≠ synchronized
  • "It's cleaner code" — barrier latency is a real cost

The smell test

If you wrote this:

const a = await parallel(...)
const b = transform(a)              // flatten / map / filter — no cross-item dependency
const c = await parallel(b.map(...))

…that middle transform doesn't need the barrier. Rewrite as a pipeline with the transform inside a stage. When in doubt: pipeline.
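The rewrite in runnable form. `parallel`, `pipeline`, and `agent` are one-line hypothetical stand-ins with no null-on-failure and no concurrency cap, and `REVIEWERS` is made up. Both versions return the same verifications. In the second, each reviewer's findings head into verification as soon as that reviewer finishes.

```javascript
// Minimal hypothetical stand-ins so the sketch runs outside the harness.
const parallel = thunks => Promise.all(thunks.map(t => t()));
const pipeline = (items, ...stages) =>
  Promise.all(items.map(item => stages.reduce((p, stage) => p.then(stage), Promise.resolve(item))));
const agent = async prompt => ({ prompt, findings: [`${prompt}: finding`] });

const REVIEWERS = ["security", "perf"];

// Before: barrier, flatten, barrier. The flatten has no cross-item dependency.
async function withBarrier() {
  const a = await parallel(REVIEWERS.map(r => () => agent(r)));
  const b = a.flatMap(x => x.findings);
  return parallel(b.map(f => () => agent(`verify ${f}`)));
}

// After: the flatten lives inside a stage, so nothing waits on other reviewers.
async function asPipeline() {
  const perReviewer = await pipeline(
    REVIEWERS,
    r => agent(r),
    review => parallel(review.findings.map(f => () => agent(`verify ${f}`)))
  );
  return perReviewer.flat();
}

Promise.all([withBarrier(), asPipeline()]).then(([x, y]) => console.log(x.length, y.length));
```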

Check yourself: you fan out 6 reviewers, then want to dedupe their findings against each other before verifying. Pipeline or barrier?
Barrier (parallel). Dedup-across-the-full-set is the textbook cross-item dependency — you literally can't dedupe finding #1 until you've seen all six reviewers' output. This is one of the few cases that earns the wait. Contrast: if you just verified each finding on its own, that's a pipeline — no reviewer's findings depend on another's.
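A sketch of that shape, using a hypothetical stub `agent` and made-up findings. It has one barrier to gather all six reviews, then the cross-item dedup plus the early exit when nothing was found, then independent per-finding verification. Normalizing titles with trim/lowercase is a deliberately naive stand-in for dedup; real findings may need fuzzier matching.

```javascript
// Hypothetical stub: reviewer i reports two of three made-up issues, so the
// same issue shows up from several reviewers.
const ISSUES = ["sql injection in login", "missing csrf token", "weak session ttl"];
async function agent(prompt) {
  const m = prompt.match(/^Review #(\d+)$/);
  if (m) {
    const i = Number(m[1]);
    return { findings: [ISSUES[i % 3], ISSUES[(i + 1) % 3]].map(title => ({ title })) };
  }
  return { verdict: "confirmed", prompt };
}

async function reviewDedupeVerify(reviewerCount) {
  // Barrier: dedup cannot start until every reviewer has reported.
  const reviews = await Promise.all(
    Array.from({ length: reviewerCount }, (_, i) => agent(`Review #${i}`))
  );

  // Cross-item step: keep one finding per normalized title.
  const unique = new Map();
  for (const f of reviews.flatMap(r => r.findings)) {
    const key = f.title.trim().toLowerCase();
    if (!unique.has(key)) unique.set(key, f);
  }
  if (unique.size === 0) return []; // early exit on the total: skip verification

  // After dedup, each finding is verified on its own.
  return Promise.all([...unique.values()].map(f => agent(`Verify: ${f.title}`)));
}

reviewDedupeVerify(6).then(v => console.log(v.length)); // 12 raw findings collapse to 3
```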
Check yourself: why is parallel called "a barrier" but pipeline isn't?
A barrier is a synchronization point: every concurrent task must reach it before anyone proceeds. parallel() awaits all thunks before returning, so the slowest one gates the rest. pipeline() has no such point between stages — each item advances on its own clock. Same total concurrency cap; completely different wall-clock when item durations vary.

5 · Two footguns to bank now

Failures resolve to null

A thunk that throws inside parallel/pipeline becomes null in the results — the call never rejects. Always .filter(Boolean) before using results, or a single failed subagent silently corrupts your downstream logic.
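A small illustration with hand-written results, as if the second of three subagents had failed. Without the filter, the null inflates counts and crashes property access. After `.filter(Boolean)`, downstream code sees only real results.

```javascript
// Results shaped like parallel()/pipeline() output when the second subagent
// failed (hand-written here for illustration).
const results = [
  { findings: ["a", "b"] },
  null,
  { findings: ["c"] },
];

// Unfiltered: the count silently includes the failure...
const countUnfiltered = results.length; // 3, though only 2 reviews succeeded
// ...and property access on the null throws.
let crash = false;
try {
  results.flatMap(r => r.findings);
} catch (e) {
  crash = e instanceof TypeError;
}

// Filtered: only real results flow downstream.
const ok = results.filter(Boolean);
const findings = ok.flatMap(r => r.findings);
console.log(countUnfiltered, crash, ok.length, findings.join(","));
```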

It's JS, not TS — and no clocks

Type annotations, interfaces, generics fail to parse. And Date.now(), Math.random(), argless new Date() all throw (they'd break run-resume). Vary by index for "randomness"; pass timestamps in via args.
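A sketch of both workarounds. `ANGLES` and the shape of `args` are hypothetical; how a real workflow receives its arguments is an assumption here, and the timestamp is an example value. Because nothing reads a clock or a random source, rebuilding the prompts on resume yields exactly the same strings.

```javascript
// Deterministic "variety": choose each reviewer's angle by index, not Math.random().
const ANGLES = ["security", "performance", "error handling"];

// `args` is a hypothetical input object; the timestamp arrives from outside
// instead of coming from Date.now() or an argless new Date().
function buildPrompts(args) {
  return Array.from({ length: args.reviewers }, (_, i) => {
    const angle = ANGLES[i % ANGLES.length];
    return `Review ${args.target} for ${angle} issues (run started ${args.startedAt}).`;
  });
}

const args = { target: "src/auth.ts", reviewers: 4, startedAt: "2024-05-01T12:00:00Z" };
const first = buildPrompts(args);
const resumed = buildPrompts(args); // a resumed run rebuilds identical prompts
console.log(first[3]); // → Review src/auth.ts for security issues (run started 2024-05-01T12:00:00Z).
```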


Where this lands

You now have the load-bearing model:

  • Deterministic script · non-deterministic agents
  • Each agent() = fresh context
  • Earn the cost: breadth / confidence / scale
  • agent · parallel (barrier) · pipeline (no barrier)
  • Pipeline by default
Before next session

Think of one real task from your own work that passes the cost gate (breadth, confidence, or scale). Don't write the script — just name the task and which of the three reasons it qualifies under. Next session we'll turn it into a real pipeline with structured schema output, and meet the patterns that make workflows trustworthy: adversarial-verify and loop-until-dry.

Sources