LLM Map-Reduce Pattern

Problem

When many untrusted documents are processed in a single reasoning context, one malicious item can influence global conclusions. This creates cross-document contamination where a single poisoned input affects unrelated items and final decisions.

Solution

Adopt a map-reduce workflow:

Map: Spawn lightweight, sandboxed LLMs—each ingests one untrusted chunk and emits a constrained output (boolean, JSON schema, enum).
Reduce: Aggregate validated summaries via deterministic code (count, filter, majority-vote) or a privileged LLM that sees only sanitized fields.

Isolation is the core control: each map worker handles one item with constrained output contracts, so contamination cannot spread laterally. The reducer consumes validated summaries only, which preserves scalability and reduces injection blast radius.

results = []
for doc in docs:
    ok = SandboxLLM("Is this an invoice? (yes/no)", doc)
    results.append(ok)
final = reduce(results)  # no raw docs enter this step

How to use it

File triage, document summarization, resume filters, code migration verification—any N-to-1 decision where each item's influence should stay local.

Best fit when: N ≥ 10 items, processing time > 30s/item, items are independent, and aggregation is needed.

Trade-offs

Pros: A malicious item can't taint others; scalable parallelism; smaller contexts reduce cost.
Cons: Requires strict output validation; extra orchestration overhead; loses cross-item context.

References

Beurer-Kellner et al., §3.1 (3) LLM Map-Reduce.
Dean & Ghemawat (2008). MapReduce: Simplified Data Processing on Large Clusters.

Primary source: https://arxiv.org/abs/2506.08837
Foundational MapReduce: https://doi.org/10.1145/1327452.1327492