Problem
Stateless request handling causes agents to repeatedly rediscover decisions, constraints, and prior failures. Over multi-session workflows this leads to redundant work, inconsistent behavior, and shallow planning because each turn lacks durable historical context.
Solution
Add a vector-backed episodic memory store:
- After every episode, write a short "memory blob" (event, outcome, rationale) to the DB.
- On each new task, embed the prompt, retrieve the top-k most similar memories, and inject them into the context as hints.
- Apply TTL or decay scoring to prune stale memories.
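The three steps above can be sketched in a few lines. This is a minimal in-memory illustration, not a production design: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, `EpisodicStore` stands in for a vector database, and the half-life and TTL defaults are arbitrary assumptions.

```python
import math
import re
import time
from collections import Counter
from dataclasses import dataclass
from typing import Optional


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: sparse bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Memory:
    text: str
    vector: Counter
    created_at: float  # Unix timestamp in seconds


class EpisodicStore:
    def __init__(self, half_life_s: float = 7 * 86400, ttl_s: float = 30 * 86400):
        self.half_life_s = half_life_s  # age at which a memory's score halves (assumed value)
        self.ttl_s = ttl_s              # age past which a memory is dropped (assumed value)
        self.memories: list = []

    def write(self, event: str, outcome: str, rationale: str,
              now: Optional[float] = None) -> None:
        """Store one short memory blob at the end of an episode."""
        now = time.time() if now is None else now
        text = f"event: {event} | outcome: {outcome} | rationale: {rationale}"
        self.memories.append(Memory(text, toy_embed(text), now))

    def prune(self, now: Optional[float] = None) -> int:
        """Drop memories older than the TTL; return how many were removed."""
        now = time.time() if now is None else now
        before = len(self.memories)
        self.memories = [m for m in self.memories if now - m.created_at <= self.ttl_s]
        return before - len(self.memories)

    def retrieve(self, prompt: str, k: int = 3, now: Optional[float] = None) -> list:
        """Return up to k memory texts ranked by similarity times exponential age decay."""
        now = time.time() if now is None else now
        query = toy_embed(prompt)
        scored = []
        for m in self.memories:
            age = max(0.0, now - m.created_at)
            decay = 0.5 ** (age / self.half_life_s)
            scored.append((cosine(query, m.vector) * decay, m.text))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        # Memories with no overlap at all score 0 and are never injected.
        return [text for score, text in scored[:k] if score > 0]
```

Multiplying similarity by a decay factor lets older memories fade gradually, while `prune` enforces a hard cutoff; a real deployment might use only one of the two.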
Design memory writes as structured records (decision, evidence, outcome, confidence) rather than raw transcripts. Structured memory reduces repetitive outputs and improves reasoning (ParamMem 2026). At retrieval time, filter by task scope and recency so that injected memories improve reasoning quality instead of adding retrieval noise. As supporting evidence, episodic memory with self-reflection reached 91% pass@1 on HumanEval versus 80% for the GPT-4 baseline (Reflexion, NeurIPS 2023).
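A structured record and a scope-and-recency filter might look like the sketch below. The field names follow the list above; `MemoryRecord`, `in_scope`, `as_hint`, the `task` and `repo` scope keys, and the `min_confidence` threshold are illustrative assumptions rather than part of any cited system.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MemoryRecord:
    decision: str
    evidence: str
    outcome: str
    confidence: float   # 0.0 to 1.0, assigned by whoever writes the memory
    task: str           # scope key (assumed)
    repo: str           # scope key (assumed)
    created_at: float   # Unix timestamp in seconds


def in_scope(records, task: str, repo: str, max_age_s: float,
             min_confidence: float = 0.5, now: Optional[float] = None) -> list:
    """Keep records matching the task scope, recent enough, and confident enough."""
    now = time.time() if now is None else now
    keep = [
        r for r in records
        if r.task == task
        and r.repo == repo
        and now - r.created_at <= max_age_s
        and r.confidence >= min_confidence
    ]
    # Newest first, so the freshest decisions are injected first.
    return sorted(keep, key=lambda r: r.created_at, reverse=True)


def as_hint(record: MemoryRecord) -> str:
    """Render one record as a compact hint line for the prompt context."""
    return (
        f"Previously decided: {record.decision} "
        f"(evidence: {record.evidence}; outcome: {record.outcome}; "
        f"confidence: {record.confidence:.2f})"
    )
```

Running the metadata filter before similarity search keeps out-of-scope memories from competing for the top-k slots at all.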
How to use it
- Use this in multi-session coding agents, support copilots, and long-running research workflows.
- Start with a small top-k and strict metadata filters (task, repo, owner, timestamp).
- Add memory quality review jobs to remove low-value or contradictory memories.
- Track whether retrieved memories improved outcomes versus baseline.
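One simple way to track outcomes against a baseline is to log, for each run, whether memories were injected and whether the task succeeded, then compare success rates. The sketch below assumes a hypothetical `RunResult` log entry and a boolean notion of success; real evaluations would likely need larger samples and a significance test before drawing conclusions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunResult:
    used_memory: bool  # True if retrieved memories were injected for this run
    success: bool      # task-level pass/fail signal (assumed to be available)


def success_rate(runs) -> Optional[float]:
    """Fraction of successful runs, or None when there are no runs."""
    return sum(r.success for r in runs) / len(runs) if runs else None


def memory_uplift(runs) -> dict:
    """Compare success rates of memory-assisted runs against baseline runs."""
    with_memory = [r for r in runs if r.used_memory]
    baseline = [r for r in runs if not r.used_memory]
    rate_memory = success_rate(with_memory)
    rate_baseline = success_rate(baseline)
    uplift = (
        None if rate_memory is None or rate_baseline is None
        else rate_memory - rate_baseline
    )
    return {"with_memory": rate_memory, "baseline": rate_baseline, "uplift": uplift}
```

A negative or near-zero uplift is a signal to tighten the filters or lower top-k before adding more memories.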
Trade-offs
Pros: richer continuity, fewer repeated mistakes.
Cons: retrieval noise if memories aren't curated; storage cost.
References
- Reflexion (Shinn et al., NeurIPS 2023): https://arxiv.org/abs/2303.11366
- ParamMem (Yao et al., 2026): https://arxiv.org/abs/2602.23320v1
- MemGPT (Packer et al., UC Berkeley 2023): https://arxiv.org/abs/2310.08560
- Cursor "10x-MCP" persistent memory layer
- Windsurf Memories docs
- Primary source: https://forum.cursor.com/t/agentic-memory-management-for-cursor/78021