Problem
In long agent sessions, raw user text and tool outputs often remain in-context long after they are needed. If those tokens include adversarial instructions, they can silently bias later reasoning steps, even when the current step is unrelated. This creates delayed prompt-injection risk and unnecessary context bloat.
Solution
Purge or redact untrusted segments once they've served their purpose:
- After transforming input into a safe intermediate (query, structured object), strip the original prompt from context.
- Subsequent reasoning sees only trusted data, eliminating latent injections.
- A strong variant also removes intermediate LLM outputs that may have been tainted.
Treat context as a staged pipeline: ingest untrusted text, transform it, then aggressively discard the original tainted material. Keep only signed-off structured artifacts that downstream steps are allowed to consume.
sql = LLM("to SQL", user_prompt)
remove(user_prompt) # tainted tokens gone
rows = db.query(sql)
answer = LLM("summarize rows", rows)
How to use it
Customer-service chat, medical Q&A, database query generation, any multi-turn flow where initial text shouldn't steer later steps.
Trade-offs
- Pros: Simple; no extra models needed; helps prevent context window anxiety by reducing overall context usage; provides compliance benefits (HIPAA/GDPR data minimization).
- Cons: Later turns lose conversational nuance; may hurt UX; overly aggressive minimization can remove useful context; risks broken referential coherence when earlier turns are referenced ("the function I mentioned before").
Example
References
- Beurer-Kellner et al., §3.1 (6) Context-Minimization.
- Building Companies with Claude Code - Emphasizes discrete phase separation and distilled handoffs to prevent context contamination.
- OpenAI, Unrolling the Codex Agent Loop - Documents context auto-compaction in production.