01
Problem
Static system prompts become stale or overly brittle as an agent encounters new tasks and edge cases. Manually editing them is slow and error-prone.
02
Solution
Let the agent rewrite its own system prompt after each interaction:
- Reflect on the latest dialogue or episode.
- Draft improvements to the instructions (add heuristics, refine tool advice, retire bad rules).
- Validate the draft (internal sanity-check or external gate).
- Replace the old system prompt with the revised version; persist in version control.
- Use the new prompt on the next episode, closing the self-improvement loop.
# pseudo-code
dialogue = run_episode()
delta = LLM("Reflect on dialogue and propose prompt edits", dialogue)
if passes_guardrails(delta):
    system_prompt = apply_edits(system_prompt, delta)  # replace, don't just append
    save(system_prompt)
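The "persist in version control" step can be approximated with a simple numbered store; `save_version` and `rollback` are hypothetical helpers (a production setup would more likely commit each revision to git):

```python
from pathlib import Path

def save_version(prompt: str, store: Path) -> int:
    """Persist the revised prompt as a new numbered version; return its id."""
    store.mkdir(parents=True, exist_ok=True)
    version = len(list(store.glob("v*.txt")))  # next sequential version number
    (store / f"v{version:04d}.txt").write_text(prompt)
    return version

def rollback(store: Path, version: int) -> str:
    """Recover an earlier prompt when a revision misbehaves."""
    return (store / f"v{version:04d}.txt").read_text()
```

Keeping every revision addressable by version id is what makes the rollback requirement in the next section cheap to satisfy.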
03
How to use it
- Best for low-risk domains with high-volume, well-defined workflows (e.g., formatting, style)
- Requires strong guardrails: structural validation, intent preservation checks, change magnitude limits
- Include version control integration and rollback capability
- Consider dual-agent architecture (executor + critic) for safer delta generation
- Avoid in safety-critical or high-regulation domains without human approval gates
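A minimal sketch of such a guardrail gate, combining structural validation with a change magnitude limit. The required section headers and the threshold are hypothetical, and this variant compares the old and revised prompts rather than inspecting the raw delta:

```python
import difflib

MAX_CHANGE_RATIO = 0.25  # hypothetical cap on how much one revision may alter

def passes_guardrails(old_prompt: str, new_prompt: str) -> bool:
    # Structural validation: the revision must be non-empty and must keep
    # any core section that the old prompt already had.
    if not new_prompt.strip():
        return False
    for required in ("## Role", "## Tools"):  # hypothetical required sections
        if required in old_prompt and required not in new_prompt:
            return False
    # Change magnitude limit: reject revisions that rewrite too much at once,
    # which guards against drift and oscillation.
    similarity = difflib.SequenceMatcher(None, old_prompt, new_prompt).ratio()
    return (1 - similarity) <= MAX_CHANGE_RATIO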
04
Trade-offs
Pros: Rapid adaptation; data-driven improvements; no training infrastructure required.
Cons: Risk of drift or jailbreak; prompt bloat; oscillation and instability.
05
References
- Goodman, Meta-Prompt: A Simple Self-Improving Language Agent. (noahgoodman.substack.com)
- Shinn et al., Reflexion: Language Agents with Verbal Reinforcement Learning. arXiv:2303.11366 (2023)
- Madaan et al., Self-Refine: Iterative Refinement with Self-Feedback. arXiv:2303.17651 (2023)
- Khattab et al., DSPy: Declarative Self-Improving Language Programs. (github.com/stanfordnlp/dspy)