Problem
Agents often execute tool calls sequentially even when they could run in parallel:
- Unnecessary latency: Sequential calls add up when tool execution time dominates inference time
- Inefficient exploration: Agent waits for one result before deciding the next action
- Poor tool utilization: Multiple independent information needs handled one-by-one
- Suboptimal learned behavior: Base models may not naturally parallelize without training signal
Example bottleneck:
Sequential (slow):
1. search("Intel financial data") → 2s
2. read_file("2023_report.pdf") → 1.5s
3. search("return metrics") → 2s
4. read_file("returns_table.csv") → 1.5s
Total: 7 seconds
Cognition observed this with Devin: the baseline model would make 8-10 sequential tool calls during file planning, adding significant latency even though many of those calls were independent and could have run in parallel.
Solution
Use Agent RFT to teach the model to parallelize independent tool calls, dramatically reducing latency when tool execution time dominates inference time.
How Models Learn Parallelization:
During RL exploration, the agent discovers that:
- Multiple tool calls can be made simultaneously
- When tool results arrive together, the next reasoning step has more context
- Parallel patterns receive similar rewards in less time (implicit efficiency reward)
- The model naturally converges toward parallel execution patterns
Tool Classification for Safe Parallelization:
Agents learn to distinguish between:
- Read-only tools: Safe to parallelize (search, read_file, list)
- State-modifying tools: Require serialization (write_file, delete, state-changing APIs)
This classification prevents race conditions while maximizing parallelism for safe operations.
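On the infrastructure side, this split can also be enforced with a simple allowlist, independent of what the model learns. A minimal sketch (the tool names mirror the examples above; READ_ONLY_TOOLS and split_batch are illustrative helpers, not a standard API):

READ_ONLY_TOOLS = {"search", "read_file", "list"}

def split_batch(tool_calls: list[dict]) -> tuple[list[dict], list[dict]]:
    """Partition one turn's tool calls: parallelize reads, serialize writes."""
    parallel = [c for c in tool_calls if c["name"] in READ_ONLY_TOOLS]
    serial = [c for c in tool_calls if c["name"] not in READ_ONLY_TOOLS]
    return parallel, serial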
Natural Emergence through RL:
Rather than being explicitly programmed, parallelization emerges from:
- Exploration: Agent tries different tool call patterns
- Reward shaping: Faster completions may get slight bonuses (optional)
- Efficiency pressure: A light penalty on token usage encourages compact trajectories (see the sketch after this list)
- Pattern reinforcement: Successful parallel patterns get reinforced
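The token-usage penalty can be a one-line adjustment on top of any grader score. A sketch with an illustrative weight (0.001 is not a recommendation; tune it so the penalty stays small relative to correctness):

def apply_token_penalty(score: float, completion_tokens: int,
                        weight: float = 0.001) -> float:
    # Subtract a small per-token cost so equally correct rollouts that
    # finish with fewer tokens (e.g. via parallel batches) score higher
    return max(0.0, score - weight * completion_tokens)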
Typical Learned Pattern:
Parallel (fast):
Batch 1 (parallel):
- search("Intel financial data")
- read_file("2023_report.pdf")
- search("return metrics")
- list("/financial_reports")
→ All complete in ~2s (dominated by slowest)
Batch 2 (parallel, based on Batch 1 results):
- read_file("returns_table.csv")
- read_file("competitor_data.csv")
→ Complete in ~1.5s
Total: ~3.5s (50% faster)
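On the serving side, realizing this speedup only requires that the client loop dispatch all tool calls from one model turn concurrently. A minimal asyncio sketch (execute_tool is a hypothetical stand-in for a call to your tool server):

import asyncio

async def execute_tool(name: str, params: dict) -> str:
    await asyncio.sleep(1.0)  # simulate an I/O-bound tool call
    return f"{name} result"

async def run_tool_batch(tool_calls):
    # Run every tool call from one model turn concurrently;
    # wall-clock time is bounded by the slowest call, not the sum
    return await asyncio.gather(
        *(execute_tool(name, params) for name, params in tool_calls)
    )

# Four independent calls finish in ~1s instead of ~4s
batch = [
    ("search", {"q": "Intel financial data"}),
    ("read_file", {"path": "2023_report.pdf"}),
    ("search", {"q": "return metrics"}),
    ("list", {"path": "/financial_reports"}),
]
results = asyncio.run(run_tool_batch(batch))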
How to use it
Prerequisites:
Your infrastructure must support parallel tool execution:
# Tool server must handle concurrent requests.
# Modal-style sketch; base_image and _async_execute are placeholders
# for your own image and tool-dispatch logic.
import modal

app = modal.App("parallel-tool-server")
base_image = modal.Image.debian_slim()

@app.cls(
    image=base_image,
    concurrency_limit=10,        # up to 10 containers at once
    allow_concurrent_inputs=10,  # takes an int: concurrent requests per container
)
class ParallelToolExecutor:
    @modal.method()
    async def execute_tool(self, rollout_id: str, tool: str, params: dict):
        # Async handler so I/O-bound tool calls can overlap
        result = await self._async_execute(tool, params)
        return result
Training Setup:
No special configuration needed - parallelization emerges naturally:
# Standard Agent RFT setup -- no parallelization-specific configuration.
# Shape follows the OpenAI reinforcement fine-tuning API; tool endpoint
# configuration follows the Agent RFT docs for your account.
from openai import OpenAI

client = OpenAI()

job = client.fine_tuning.jobs.create(
    training_file="file-abc123",
    model="o4-mini-2025-04-16",  # RFT targets reasoning models, not gpt-4o
    method={
        "type": "reinforcement",
        "reinforcement": {
            "grader": grader,  # grader config dict (see below)
            "hyperparameters": {
                "n_epochs": 3,
                "batch_size": 16,
                "compute_multiplier": 1,
            },
        },
    },
)
# No special "parallelization" flag needed!
# The model discovers this pattern during exploration.
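Once submitted, the job can be polled like any other fine-tuning job:

status = client.fine_tuning.jobs.retrieve(job.id)
print(status.status)  # e.g. "running", "succeeded"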
Optional: Explicit Latency Rewards
You can encourage parallelization through reward shaping:
class LatencyAwareGrader:
def grade(self, question, answer, tool_trace, ground_truth):
        # Standard correctness score; check_correctness is your
        # domain-specific check (exact match, model grader, etc.)
        correctness = self.check_correctness(answer, ground_truth)
# Bonus for efficiency
num_sequential_rounds = self.count_sequential_rounds(tool_trace)
if num_sequential_rounds <= 3:
efficiency_bonus = 0.1
elif num_sequential_rounds <= 5:
efficiency_bonus = 0.05
else:
efficiency_bonus = 0.0
return {
"score": correctness + efficiency_bonus,
"subscores": {
"correctness": correctness,
"efficiency": efficiency_bonus
}
}
def count_sequential_rounds(self, tool_trace):
"""
Count how many back-and-forth rounds with tools
Parallel calls in same round = 1 round
"""
rounds = 0
current_round_calls = set()
for call in tool_trace:
if call['type'] == 'tool_call':
current_round_calls.add(call['id'])
elif call['type'] == 'tool_response':
if call['call_id'] in current_round_calls:
current_round_calls.remove(call['call_id'])
if not current_round_calls:
rounds += 1
return rounds
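For example, on a hypothetical trace using the same event keys as above, two calls issued together count as a single round:

trace = [
    {"type": "tool_call", "id": "a"},
    {"type": "tool_call", "id": "b"},  # issued in the same turn as "a"
    {"type": "tool_response", "call_id": "a"},
    {"type": "tool_response", "call_id": "b"},
    {"type": "tool_call", "id": "c"},
    {"type": "tool_response", "call_id": "c"},
]
assert LatencyAwareGrader().count_sequential_rounds(trace) == 2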
Monitoring Parallelization:
Track during training to see if model learns parallel patterns:
def analyze_parallelization(tool_trace):
"""
Analyze how many parallel calls the agent made
"""
parallel_batches = []
current_batch = []
    for event in tool_trace:
        if event['type'] == 'tool_call':
            current_batch.append(event)
        elif event['type'] == 'assistant_message':
            # A new reasoning turn closes the previous tool batch
            if current_batch:
                parallel_batches.append(len(current_batch))
                current_batch = []
    if current_batch:  # flush a trailing batch with no message after it
        parallel_batches.append(len(current_batch))
    return {
        'num_batches': len(parallel_batches),
        'calls_per_batch': parallel_batches,
        'max_parallelism': max(parallel_batches) if parallel_batches else 0,
        'total_calls': sum(parallel_batches)
    }
# Example output showing learned parallelization:
# Baseline: {'num_batches': 8, 'calls_per_batch': [1,1,1,1,1,1,1,1], 'max_parallelism': 1}
# Fine-tuned: {'num_batches': 2, 'calls_per_batch': [4,2], 'max_parallelism': 4}
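Aggregating this over an eval set makes the shift easy to track across checkpoints (eval_traces is assumed to be a list of tool traces):

stats = [analyze_parallelization(t) for t in eval_traces]
avg_max_parallelism = sum(s['max_parallelism'] for s in stats) / len(stats)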
Trade-offs
Pros:
- Dramatic latency reduction: 40-50% is common when tool execution time dominates end-to-end latency
- No manual coding: Parallelization emerges from training, not programming
- Adaptive: Model learns optimal parallelization for your specific tools and tasks
- Scales naturally: Works across different numbers of tools and complexity levels
- Better UX: Faster agent responses improve user experience
Cons:
- Infrastructure requirements: Tool servers must handle concurrent requests
- Resource usage: More simultaneous tool calls = higher peak resource usage
- Doesn't always emerge: Requires enough variance in training data
- May need reward shaping: Explicit latency bonuses can help if parallelization doesn't emerge naturally
- Debugging complexity: Parallel execution makes traces harder to follow
References
- OpenAI Build Hour: Agent RFT - Cognition Case Study (November 2025)
- Parallel Tool Execution Pattern
- Related patterns: Agent Reinforcement Fine-Tuning, Tool Use Incentivization via Reward Shaping