
Claude, please stop trying to memorize random crap

Why This Matters

This article highlights that, contrary to common assumptions, providing agents with access to session transcripts does not improve performance in software engineering tasks and may even hinder it. This challenges the prevailing approach of leveraging session-backed memory tools, prompting a reevaluation of their effectiveness and implementation in the tech industry. Understanding these insights can help organizations optimize their AI strategies and resource allocation.

Key Takeaways

31 likes may not seem like a lot, but that's actually everyone on substack notes

We have found zero performance benefit on SWE tasks when agents have search access to their previous transcript sessions, provided they have access to other forms of context. We also have not found much benefit in trying to automatically trawl through session transcripts to improve agent context, unless there is a human in the loop.

This was pretty surprising.

Intuitively it feels like there's a lot of valuable information in a transcript between an agent and an engineer. Maybe it would have information about why the code exists, about user intent. Or it might have the other approaches that a user tried and discarded. At the least, it would have some amount of additional context that the agent could use to augment its understanding. I believed this so strongly that my company built an entire product around this concept. I used to tell folks that "session transcripts were the new oil," that they were more valuable than the code itself.

Other people have clearly had similar thoughts, which is why there are so many different tools for session-backed memory, including (of course) Claude Code itself.

I think the most common architecture is to do something like:

Store all transcripts across an organization in a DB

Put a vector search, an Elasticsearch index, or a SQL search layer in front of it. Ambitious teams will use all three. Maybe graphs will be involved.

Make this available to the agent through an MCP server, or by exposing a CLI with skills.
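To make that architecture concrete, here is a minimal sketch of the three steps, assuming a single in-memory SQLite table and plain substring matching as the "search layer". The table layout and the names make_store, add_turn, and search_transcripts are hypothetical, invented for illustration; they are not taken from any of the tools mentioned above, and a real setup would put the search function behind an MCP server or a CLI rather than call it directly.

```python
import sqlite3


def make_store() -> sqlite3.Connection:
    """Step 1: a store for transcripts (hypothetical schema, in-memory for the sketch)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transcripts ("
        "id INTEGER PRIMARY KEY, session_id TEXT, role TEXT, content TEXT)"
    )
    return conn


def add_turn(conn: sqlite3.Connection, session_id: str, role: str, content: str) -> None:
    """Append one turn of a session transcript."""
    conn.execute(
        "INSERT INTO transcripts (session_id, role, content) VALUES (?, ?, ?)",
        (session_id, role, content),
    )
    conn.commit()


def search_transcripts(conn: sqlite3.Connection, query: str, limit: int = 5) -> list[dict]:
    """Step 2: the search layer. Here it is a simple SQL LIKE match, newest first.

    Note: '%' and '_' inside `query` act as wildcards; a real tool would escape them.
    """
    rows = conn.execute(
        "SELECT session_id, role, content FROM transcripts "
        "WHERE content LIKE ? ORDER BY id DESC LIMIT ?",
        (f"%{query}%", limit),
    ).fetchall()
    # Step 3: return plain dicts, the kind of payload an agent-facing tool would hand back.
    return [{"session_id": s, "role": r, "content": c} for s, r, c in rows]
```

An agent-facing tool would wrap search_transcripts and return its results as text. Swapping the LIKE query for vector or full-text search changes only the body of that one function.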

For us, this additional work doesn't seem to make a bit of difference. If anything, based on many months of testing with and without session search access, it may make the models worse.
