Planner-Worker Separation for Long-Running Agents

Problem

Running multiple AI agents in parallel for complex, multi-week projects creates significant coordination challenges:

Flat structures lead to conflicts, duplicated work, and agents stepping on each other
Dynamic coordination through shared files with locking becomes a bottleneck: most agents spend their time waiting on locks rather than working
Equal-status agents become risk-averse: they avoid difficult tasks and make only small, safe changes instead of tackling end-to-end implementation
No agent takes ownership of hard problems or overall project direction

Solution

Separate agent roles into a hierarchical planner-worker structure:

Planners: Continuously explore the codebase and create tasks. They can spawn sub-planners for specific areas, making planning itself parallel and recursive.
Workers: Pick up tasks and focus entirely on completing them. They don't coordinate with other workers or worry about the big picture. They grind on their assigned task until done, then push changes.
Judge: At the end of each cycle, decides whether the goal has been achieved or another cycle should run.

This creates an iterative cycle where each iteration starts fresh, combating drift and tunnel vision.
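
The plan, work, judge cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation behind any real system: `ProjectState`, `plan`, `work`, `judge`, and `run_cycles` are hypothetical names, the agents are plain functions rather than model calls, and a list of pending areas stands in for the codebase.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Task:
    area: str
    description: str


@dataclass
class ProjectState:
    # Hypothetical stand-in for a codebase: areas that still need work.
    pending_areas: list[str]
    completed: list[str] = field(default_factory=list)


def plan(state: ProjectState, batch_size: int) -> list[Task]:
    """Planner: read the current state and emit up to batch_size tasks."""
    return [Task(a, f"implement {a}") for a in state.pending_areas[:batch_size]]


def work(task: Task) -> str:
    """Worker: finish one task in isolation and report which area it covered."""
    return task.area  # a real worker would edit code and push changes here


def judge(state: ProjectState) -> bool:
    """Judge: the goal counts as achieved once nothing is pending."""
    return not state.pending_areas


def run_cycles(state: ProjectState, batch_size: int = 2, max_cycles: int = 10):
    """Run plan -> work -> judge cycles; return (cycles_used, goal_achieved)."""
    for cycle in range(1, max_cycles + 1):
        tasks = plan(state, batch_size)  # replanned from state, no carried context
        with ThreadPoolExecutor(max_workers=batch_size) as pool:
            finished = list(pool.map(work, tasks))  # workers never talk to each other
        state.completed.extend(finished)
        state.pending_areas = [a for a in state.pending_areas if a not in finished]
        if judge(state):
            return cycle, True
    return max_cycles, False


if __name__ == "__main__":
    state = ProjectState(["parser", "layout", "network", "css", "js"])
    print(run_cycles(state))  # (3, True)
```

The planner is called again at the top of every cycle and sees only the current state, which is the sketch's version of a fresh start. The `max_cycles` cap is an added safeguard in this sketch so the loop ends even if the judge never reports success.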

How to use it

Use cases for planner-worker separation:

Massive codebases: Projects that would take human teams months (1M+ lines of code, 1000+ files)
Ambitious goals: Building complex systems from scratch (web browser, Windows emulator, Excel clone)
Large-scale migrations and implementations: In-place framework migrations (Solid to React) and large language-server implementations (Java LSP)
Performance optimization: Complete rewrites in different languages for speed (C++ to Rust)

Implementation considerations:

Model choice per role: Different models excel at different roles. Choose the model that plans best for planners, even when a coding-focused model is the better fit for workers.
Fresh starts: Each cycle should start fresh to combat drift and tunnel vision from long-running contexts.
Parallel planning: Planners can spawn sub-planners, making the planning process itself parallel and recursive.
Worker isolation: Workers should be task-focused and not worry about coordination with other workers.

Prompting is critical: Getting agents to coordinate well, avoid pathological behaviors, and maintain focus over long periods requires extensive experimentation with prompts.
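
Two of the considerations above, per-role model choice and recursive sub-planners, can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the `RoleConfig` and `Area` types, the `plan_area` function, the prompts, and the model identifiers are placeholders, not real model names or a real orchestration API.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RoleConfig:
    model: str   # placeholder identifier, not a real model name
    prompt: str  # role-specific system prompt


# Hypothetical per-role configuration: planners and workers may use different models.
ROLES = {
    "planner": RoleConfig("planning-model-placeholder", "Explore the code and create tasks."),
    "worker": RoleConfig("coding-model-placeholder", "Complete the assigned task, then push."),
    "judge": RoleConfig("planning-model-placeholder", "Decide whether the goal is achieved."),
}


@dataclass
class Area:
    name: str
    files: list[str] = field(default_factory=list)
    children: list["Area"] = field(default_factory=list)


def plan_area(area: Area) -> list[str]:
    """Plan one area, delegating each child area to its own sub-planner."""
    tasks = [f"{area.name}: {path}" for path in area.files]
    if area.children:
        # One short-lived pool per planner, so sub-planners run in parallel
        # and nested calls never wait on a shared, exhausted pool.
        with ThreadPoolExecutor(max_workers=len(area.children)) as pool:
            for sub_tasks in pool.map(plan_area, area.children):
                tasks.extend(sub_tasks)
    return tasks


if __name__ == "__main__":
    tree = Area("browser", ["main.rs"], [
        Area("css", ["parser.rs", "cascade.rs"]),
        Area("net", ["http.rs"], [Area("tls", ["handshake.rs"])]),
    ])
    for task in plan_area(tree):
        print(task)
```

Because each planner only merges the task lists returned by its children, the recursion needs no shared state or locking between planners. In a real system each `plan_area` call would presumably be an agent run configured from `ROLES["planner"]`.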

Trade-offs

Pros:

Scalability: Hundreds of agents can work concurrently on a single codebase for weeks
Clear ownership: Planners own the big picture; workers own task completion
Parallel planning: Planning itself scales through sub-planner spawning
Reduced coordination overhead: Workers don't need to coordinate with each other
Combats tunnel vision: Iterative cycles with fresh starts prevent drift

Cons:

System complexity: Requires orchestration infrastructure for role separation and task distribution
Prompt engineering difficulty: Coordination behavior requires extensive prompt experimentation
Cost: Running hundreds of concurrent agents for weeks is expensive
Not perfectly efficient: A significant share of tokens is wasted, though the approach proved far more effective than expected
Still evolving: Planners should wake up when their tasks complete so they can plan the next step, and agents sometimes run for too long
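
One possible shape for the planner wake-up behavior mentioned in the last item is a planner that blocks on a completion queue and plans follow-up work as soon as any worker reports back. The sketch below is speculative and rests on assumptions: `planner_loop`, `worker`, and the `follow_ups` mapping are hypothetical, and the `timeout` is just one simple way to notice a worker that runs too long.

```python
import queue
import threading


def worker(task: str, completions: queue.Queue) -> None:
    """Hypothetical worker: a real one would edit code, this one only reports."""
    completions.put(task)


def planner_loop(initial: list[str], follow_ups: dict[str, list[str]],
                 timeout: float = 5.0) -> list[str]:
    """Wake on each completed task and immediately dispatch its follow-ups.

    Returns tasks in completion order. Raises queue.Empty if no worker
    reports within `timeout` seconds while work is still in flight.
    """
    completions: queue.Queue = queue.Queue()
    in_flight = 0
    order: list[str] = []

    def dispatch(task: str) -> None:
        nonlocal in_flight
        in_flight += 1
        threading.Thread(target=worker, args=(task, completions)).start()

    for task in initial:
        dispatch(task)
    while in_flight:
        finished = completions.get(timeout=timeout)  # planner sleeps until woken
        in_flight -= 1
        order.append(finished)
        for next_task in follow_ups.get(finished, []):
            dispatch(next_task)  # plan the next step right away
    return order


if __name__ == "__main__":
    print(planner_loop(["a"], {"a": ["b", "c"], "b": ["d"]}))
```

The planner does no polling here: it is idle inside `completions.get` until a worker finishes, and the timeout turns an overlong worker into an explicit error instead of a silent stall.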

References

Scaling long-running autonomous coding - Cursor blog post on running hundreds of concurrent agents for weeks at a time
Browser source code on GitHub - 1M+ lines of agent-generated code
Feudal Networks (FuN) - ICML 2017 paper introducing manager-worker separation in hierarchical RL (Vezhnevets et al.)
The Options Framework - Seminal work on temporal abstraction creating planning-execution hierarchy (Sutton et al., 1999)
HIRO: Hierarchical RL with Off-Policy Correction - NeurIPS 2018 paper on high-level planners and low-level workers (Nachum et al.)