Swarms: Parallel Content Without Chaos
Fanning out many AI workers at once sounds like the obvious path to speed — until you hit the rate-limit cliff, watch a batch of jobs come back empty, and realize one large job quietly hung. Here is the disciplined pattern that actually works.
Why parallel generation is tempting (and dangerous)
Once you have an orchestrator routing work to individual workers — as we covered in Part 4 — the natural next thought is: why run them one at a time? If one worker can produce a page in thirty seconds, ten workers should produce ten pages in thirty seconds. In principle, yes. In practice, you are sharing a single global rate budget with every job you launch, and the moment you saturate it, everything stalls simultaneously.
We learned this the hard way during a bulk content run. The operator queued several large generation jobs at once rather than one controlled job at a time. The rate limit — shared across all of them — was exhausted within minutes. Most workers came back with empty or truncated output. One job stopped mid-batch and never resumed on its own. The work was not lost — the job kept a journal, and every completed item was recoverable from it — but the time cost of diagnosing and rerunning was significant. The standing rule changed that day: one controlled job at a time, batched small internally.
Hard lesson: Firing several big parallel jobs at once does not run faster when they share a global rate budget. It trips the limit and returns empty results across the board. The fix is not "less parallelism" — it is parallelism inside one controlled job, not many uncontrolled jobs running concurrently.
The architecture of a safe swarm
A swarm is not "many jobs at once." A swarm is one coordinated job that fans out internally across many workers, under a single controller that tracks the global rate budget and throttles when needed. Here is what that looks like in practice.
One job, many workers inside it
The orchestrator holds a queue of items to generate — say, a set of educational pages on related topics. It dispatches workers from that queue in small batches. Each worker takes one item, generates the output, saves it to disk, and signals completion. The orchestrator advances the queue only as fast as the rate budget permits.
The key structural rules:
- Workers are briefed individually, with clear scope and a concrete output format. "Write a 400-word article on X, return it as a file named Y.html, flag any claim you are uncertain about."
- Workers save output to disk before they do anything else. If the job is interrupted, nothing is lost: the files on disk, together with the journal described below, show exactly what is done and what is still pending.
- Workers do not deploy. Workers do not commit. Workers write a draft and return. The orchestrator is the one that moves things forward after review.
- The orchestrator tracks how many concurrent workers are active and enforces a ceiling. That ceiling is set conservatively, not optimistically.
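The rules above can be sketched as a single job that fans out internally under a fixed ceiling. This is a minimal illustration, not the implementation behind our runs: `generate_draft`, `run_swarm`, `MAX_WORKERS`, and the staging path are hypothetical names, and the model call is replaced by a placeholder.

```python
import asyncio
from pathlib import Path

MAX_WORKERS = 4  # assumed ceiling; pick a conservative value for your own rate budget


async def generate_draft(item_id: str) -> str:
    """Hypothetical stand-in for the real model call."""
    await asyncio.sleep(0.01)
    return f"<html><body>Draft for {item_id}</body></html>"


async def worker(item_id: str, staging: Path, gate: asyncio.Semaphore, stats: dict) -> Path:
    async with gate:  # waits here while the ceiling is already reached
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        try:
            draft = await generate_draft(item_id)
            out = staging / f"{item_id}.html"
            out.write_text(draft, encoding="utf-8")  # save to disk before anything else
            return out  # the worker returns a draft path; it never deploys or commits
        finally:
            stats["active"] -= 1


async def run_swarm(item_ids: list[str], staging: Path, ceiling: int = MAX_WORKERS) -> dict:
    """One controlled job: every worker shares the same semaphore."""
    staging.mkdir(parents=True, exist_ok=True)
    gate = asyncio.Semaphore(ceiling)
    stats = {"active": 0, "peak": 0}
    paths = await asyncio.gather(*(worker(i, staging, gate, stats) for i in item_ids))
    return {"paths": paths, "peak": stats["peak"]}
```

Calling `asyncio.run(run_swarm(ids, Path("staging")))` runs the whole queue while never letting more than `ceiling` workers wait on a response at once. The `peak` counter is there only so the ceiling can be checked after the run.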
The journal pattern
Every swarm job maintains a simple log of its state. Before generating item N, the controller records "pending:N". After saving the output, it records "done:N". If the job is interrupted, a restart skips everything already marked done and picks up at the first pending item. This is the pattern that made our hung job recoverable — nothing was truly lost, it just needed to be restarted from the checkpoint.
# Example journal structure (plain text or JSON lines)
{"id": "page-001", "status": "done", "file": "output/page-001.html"}
{"id": "page-002", "status": "done", "file": "output/page-002.html"}
{"id": "page-003", "status": "pending", "file": null}
{"id": "page-004", "status": "pending", "file": null}
A restart reads this file, skips the done entries, and resumes from page-003. Fifteen lines of code. Massive peace of mind.
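A restart of this kind might look like the sketch below. It assumes a JSON Lines journal with the same `id`, `status`, and `file` fields as the example; the function names (`load_done`, `record`, `run_job`) and the `generate` callable are hypothetical. Unlike the snapshot shown above, this version appends one line per state change, so an item can appear twice (first pending, then done).

```python
import json
from pathlib import Path
from typing import Callable, Iterable, Optional


def load_done(journal: Path) -> set:
    """Return the ids already marked done; a missing journal means a fresh run."""
    done = set()
    if journal.exists():
        for line in journal.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # tolerate blank lines and comment headers
            entry = json.loads(line)
            if entry["status"] == "done":
                done.add(entry["id"])
    return done


def record(journal: Path, item_id: str, status: str, file: Optional[str]) -> None:
    """Append one state change to the journal."""
    with journal.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps({"id": item_id, "status": status, "file": file}) + "\n")


def run_job(items: Iterable[str], journal: Path, out_dir: Path,
            generate: Callable[[str], str]) -> None:
    """Generate every item not yet marked done, journaling before and after."""
    out_dir.mkdir(parents=True, exist_ok=True)
    done = load_done(journal)
    for item_id in items:
        if item_id in done:
            continue  # finished in an earlier run
        record(journal, item_id, "pending", None)
        path = out_dir / f"{item_id}.html"
        path.write_text(generate(item_id), encoding="utf-8")
        record(journal, item_id, "done", str(path))
```

Because "done" is written only after the file is on disk, an interruption between the two journal writes leaves the item pending, and the next run simply generates it again.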
Design principle: Build for interruption from day one. Assume your job will be cut short. If the output of every completed step is already on disk and the journal records it as done, a restart is indistinguishable from a normal run.
The quality gate is not optional
Here is where many operators make the critical mistake: they treat generation and publication as a single step. They are not. They must be two separate steps with a skeptical reviewer in between.
In our own bulk runs, roughly one in five pages produced by an AI worker contained an invented fact: a wrong date, a subtly incorrect definition, a made-up attribution. The content looked plausible. It read fluently. It would have passed a casual glance. The gate caught those pages; without it, they would have shipped.
Hard lesson: Fabrication is the default failure mode of bulk AI generation — not an edge case. "It's low-stakes content, just ship it" is exactly the thinking that leads to publishing wrong information at scale. A gate is not optional; it is the entire point.
The generate-save-review-fix-ship pattern has five distinct phases:
- Generate. Workers produce drafts. Drafts are saved to a staging directory, never to production.
- Save. Every draft is written to disk immediately, with a filename that matches its journal entry. Generation and saving are atomic — you never have an in-memory draft that is not also on disk.
- Review. A separate, skeptical review pass reads every draft. This reviewer does not know what the generator "intended" — it only sees the output. It flags anything that looks fabricated, inconsistent, or below quality. This can be a second AI pass with an explicit "find what is wrong" prompt, a human spot-check, or both. The important thing is that the reviewer is structurally separate from the generator and is actively looking for problems.
- Fix. Only flagged items are touched. Items that pass review are not re-generated — unnecessary re-generation introduces new risk. The fix step is surgical.
- Ship. Only what has passed review moves from staging to production. The orchestrator does the move, not the individual workers.
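The review and ship phases can be wired together roughly as follows. This is a sketch under assumptions: `review_and_ship` is a hypothetical name, and `reviewer` is any callable that takes the draft text and returns a list of flags (an empty list means nothing was flagged). It could wrap a second AI pass, a human checklist, or both.

```python
import shutil
from pathlib import Path
from typing import Callable, Dict, List


def review_and_ship(staging: Path, production: Path,
                    reviewer: Callable[[str], List[str]]) -> Dict[str, List[str]]:
    """Move drafts with no flags to production; return the flags for the rest."""
    production.mkdir(parents=True, exist_ok=True)
    flagged: Dict[str, List[str]] = {}
    for draft in sorted(staging.glob("*.html")):
        flags = reviewer(draft.read_text(encoding="utf-8"))
        if flags:
            # Flagged drafts stay in staging, untouched, for the fix step.
            flagged[draft.name] = flags
        else:
            # Only the orchestrator moves files, and only after a clean review.
            shutil.move(str(draft), str(production / draft.name))
    return flagged
```

The function deliberately has no way to ship a flagged draft. After the fix step edits a flagged file in staging, running the same function again reviews it afresh before anything moves.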
What the reviewer prompt looks like
The tone of the review prompt matters more than you might expect. A prompt that says "check this for quality" tends to rubber-stamp. A prompt that says "find everything that could be wrong" surfaces problems. We phrase it roughly like this:
You are a skeptical editor reviewing a draft page before publication.
Your job is to find problems, not to approve. Flag:
- Any factual claim you cannot verify from the brief provided
- Any date, name, or statistic that seems invented
- Any section that is vague where the brief asked for specifics
- Any quality issue that would embarrass a professional publisher
Return a list of flags. If you find nothing to flag, say so explicitly.
Do NOT rewrite the content. Only identify problems.
The "do not rewrite" instruction is important. A reviewer that also rewrites becomes a second generator, which means you now have two unreviewed outputs. Keep the roles separate.
Respecting the rate budget
The global rate limit is not a problem you solve once. It is a constraint you manage continuously. A few practical rules:
- Set your worker concurrency ceiling conservatively. If you think you can safely run eight workers at once, start at four. You can always increase it; recovering from a stalled batch costs more than the time you saved by rushing.
- Add a small delay between dispatching workers in a batch. Even a two-second gap between launching each worker spreads the load enough to avoid hitting the per-minute ceiling in a burst.
- If you get a rate-limit error, stop the entire job immediately. Do not retry with the same concurrency. Log the checkpoint, wait, and resume at a lower ceiling. Fighting a rate limit by retrying harder is counterproductive.
- Monitor active worker count, not just queued items. The relevant pressure on the rate budget is how many workers are currently waiting for a response, not how many items are left in the queue.
Practical rule: One controlled job, batched small internally, with a concurrency ceiling set conservatively. Many uncontrolled jobs at once is not a swarm — it is a traffic jam.
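As a rough illustration of these rules, the sketch below staggers launches within a batch and, on a rate-limit error, stops dispatching, waits, and resumes at a lower ceiling. Several things here are assumptions: `RateLimitError` is a hypothetical exception (real client libraries raise their own types), halving the ceiling is one possible back-off policy, item ids are assumed unique and hashable, and calls already in flight in the current batch are allowed to finish rather than being cancelled.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple


class RateLimitError(Exception):
    """Hypothetical error type standing in for a provider's rate-limit exception."""


async def run_batches(items: List[str], call: Callable[[str], Awaitable[str]],
                      ceiling: int = 4, stagger: float = 2.0,
                      cooldown: float = 60.0) -> Tuple[Dict[str, str], int]:
    """Process items in batches of at most `ceiling`; back off on rate limits."""
    pending = list(items)
    results: Dict[str, str] = {}
    while pending:
        batch = pending[:ceiling]
        tasks = []
        for index, item in enumerate(batch):
            if index:
                await asyncio.sleep(stagger)  # spread launches to avoid a burst
            tasks.append(asyncio.create_task(call(item)))
        outcomes = await asyncio.gather(*tasks, return_exceptions=True)
        limited = False
        for item, outcome in zip(batch, outcomes):
            if isinstance(outcome, RateLimitError):
                limited = True  # item stays pending for a later batch
            elif isinstance(outcome, BaseException):
                raise outcome  # unrelated failures are not retried here
            else:
                results[item] = outcome
        pending = [item for item in pending if item not in results]
        if limited:
            ceiling = max(1, ceiling // 2)  # never resume at the same concurrency
            await asyncio.sleep(cooldown)
    return results, ceiling
```

The function returns the ceiling it ended on, which is a reasonable starting value for the next run. In a real job, the point where `limited` is detected is also where the checkpoint would be logged.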
What makes a good swarm candidate
Not every content task benefits from a swarm. The pattern works best when:
- The items are structurally similar — the same template, the same brief structure, just different subject matter.
- Each item is genuinely independent — no item needs to know what another item said.
- The volume is large enough that sequential processing would take significantly longer.
- You have a quality gate ready before you start. Never launch a swarm without knowing exactly how you will review the output.
Tasks that are a poor fit: anything that requires creative coherence across items (a multi-chapter narrative), anything where each item depends on the output of another, anything where you do not yet know what "good" looks like well enough to review it.
The compounding value
The reason to build this infrastructure carefully is not the first batch. It is the tenth batch. Once the journal pattern, the staging directory, the review prompt, and the ship step are in place as reusable components, each new content initiative costs a fraction of the first. The templates are already written. The rate ceiling is already calibrated. The reviewer prompt is already tested. You brief the orchestrator on what to generate, and the machine does the rest — up to and including the quality gate.
That is what makes parallel generation genuinely valuable: not speed in isolation, but speed combined with discipline, so the output that reaches your readers is consistently trustworthy.
Download: Swarm + QA Recipe
A tool-agnostic checklist and template covering the generate-save-review-fix-ship pattern, the journal structure, a starter review prompt, and the rate-budget rules — ready to adapt to your own stack.