Persistent Memory Looks Like a Real Win for Coding Agents

May 2, 2026

One of the most common complaints I see about coding agents is orientation cost. People report sessions where 75% of the token budget gets spent just figuring out where the agent is, what the project does, and how to move forward. That sounds awful, but it also sounds believable if the agent has to keep re-reading code and rebuilding context from scratch.

I wanted to see how that looked in my own usage, so I analyzed Claude Code transcripts and attributed token spend to memory, search, retrieval, and storage behavior. Over my most recent 14-day window, the orientation-ish overhead came out to 40.3% of total tokens. Shared memory itself was only 9.0%. The other 31.3% was broader search and retrieval activity like reading files, grepping, globbing, and web lookups.
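To make the attribution concrete, here is a minimal sketch of the bucketing step. It is not the script behind the numbers above. It assumes each transcript has already been flattened into records with a `tool` name and a `tokens` count, which is a hypothetical shape. The `mcp__memory__` prefix and the exact set of search tools are also assumptions and would need to match your own setup.

```python
from collections import Counter

# Assumed tool names for the "search and retrieval" bucket; adjust as needed.
SEARCH_TOOLS = {"Read", "Grep", "Glob", "WebSearch", "WebFetch"}
# Assumed naming prefix for tools exposed by a memory server (hypothetical).
MEMORY_PREFIX = "mcp__memory__"


def bucket_for(tool):
    """Map a tool name (or None for plain model turns) to a spend bucket."""
    if tool is None:
        return "other"
    if tool.startswith(MEMORY_PREFIX):
        return "memory"
    if tool in SEARCH_TOOLS:
        return "search"
    return "other"


def orientation_shares(events):
    """Return memory, search, and combined orientation spend as percentages."""
    totals = Counter()
    for event in events:
        totals[bucket_for(event.get("tool"))] += event["tokens"]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {"memory": 0.0, "search": 0.0, "orientation": 0.0}
    memory = 100.0 * totals["memory"] / grand_total
    search = 100.0 * totals["search"] / grand_total
    return {"memory": memory, "search": search, "orientation": memory + search}
```

The important design choice is that memory and search are separate buckets that sum to the orientation figure, so a window can be described both by its total overhead and by how that overhead is split.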

Then I ran the same analysis on the full month of February, which is useful because I was using Anthropic models then and I am using OpenAI models now. February came out meaningfully worse on orientation cost: 49.3% total orientation-ish overhead. Shared memory was only 1.3%, while broader search and retrieval was 48.0%.

That is not a clean lab experiment. The model changed. The projects changed. The task mix changed. I did not disable the memory server and run a painful multi-day A/B test. But the directional signal is pretty strong. In February the agent spent much more of its budget scanning and searching. In the recent window, a larger share of the orientation burden appears to be handled by memory lookup instead of fresh project analysis.
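The shift between the two windows can be worked out directly from the percentages quoted in this post. The sketch below only does arithmetic on those published figures. The function and key names are illustrative, and nothing here controls for the model, project, or task-mix changes mentioned above.

```python
# Percentages of total tokens, taken from the two windows described in this post.
FEBRUARY = {"memory": 1.3, "search": 48.0}
RECENT = {"memory": 9.0, "search": 31.3}


def summarize(window):
    """Combine memory and search into total orientation overhead."""
    orientation = window["memory"] + window["search"]
    return {
        "orientation": orientation,
        # How much of the orientation overhead is memory lookup, in percent.
        "memory_share_of_orientation": 100.0 * window["memory"] / orientation,
    }


def compare(before, after):
    """Report changes in percentage points between two windows."""
    b, a = summarize(before), summarize(after)
    return {
        "orientation_change_pp": a["orientation"] - b["orientation"],
        "search_change_pp": after["search"] - before["search"],
        "memory_share_before": b["memory_share_of_orientation"],
        "memory_share_after": a["memory_share_of_orientation"],
    }


if __name__ == "__main__":
    for name, value in compare(FEBRUARY, RECENT).items():
        print(f"{name}: {value:.1f}")
```

On these figures, total orientation overhead is 9.0 percentage points lower in the recent window, search and retrieval is 16.7 points lower, and memory goes from roughly 2.6% of orientation spend to roughly 22.3%.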

So my takeaway is pretty simple: persistent memory looks like it is doing real work. It does not appear to be a major token sink, and it very likely helps the agent orient faster by recalling prior context instead of rediscovering everything from scratch. If you are building coding-agent workflows and you are not investing in memory, I think you are probably paying for the same orientation over and over again.