Claude, please stop trying to memorize random crap

31 likes may not seem like a lot, but that's actually everyone on Substack Notes

We have found zero performance benefit on software engineering (SWE) tasks from giving agents search access to their previous session transcripts, provided they have access to other forms of context. We also have not found much benefit in automatically trawling through session transcripts to improve agent context, unless there is a human in the loop.

This was pretty surprising.

Intuitively, it feels like there's a lot of valuable information in a transcript between an agent and an engineer. Maybe it records why the code exists, or what the user intended. Or it might capture the other approaches that a user tried and discarded. At the very least, it would have some additional context that the agent could use to augment its understanding. I believed this so strongly that my company built an entire product around this concept. I used to tell folks that "session transcripts are the new oil," that they were more valuable than the code itself.

Other people have clearly had similar thoughts, which is why there are so many different tools for session-backed memory, including (of course) Claude Code itself.

I think the most common architecture is to do something like:

1. Store all transcripts across an organization in a DB.

2. Put a vector search, an Elasticsearch, or a SQL search layer in front of it. Ambitious teams will use all three. Maybe graphs will be involved.

3. Make this available to the agent through an MCP server, or by exposing a CLI with skills.
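To make the architecture above concrete, here is a minimal sketch of the first two pieces: a transcript store and a search layer in front of it. Everything here is an assumption for illustration, not our actual implementation: the table layout, the function names (`open_store`, `add_message`, `search_transcripts`), and the use of a plain SQL `LIKE` query standing in for whatever vector or full-text search a real system would use. The third piece, exposing `search_transcripts` to the agent as a tool, is left out.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a hypothetical transcript store backed by SQLite."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transcripts ("
        "id INTEGER PRIMARY KEY, session_id TEXT, role TEXT, content TEXT)"
    )
    return conn


def add_message(conn, session_id, role, content):
    """Append one transcript message to the store."""
    conn.execute(
        "INSERT INTO transcripts (session_id, role, content) VALUES (?, ?, ?)",
        (session_id, role, content),
    )
    conn.commit()


def search_transcripts(conn, query, limit=5):
    """Naive substring search; a stand-in for a vector or full-text layer."""
    # Escape LIKE wildcards so the query is matched literally.
    escaped = query.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        "SELECT session_id, role, content FROM transcripts "
        "WHERE content LIKE ? ESCAPE '\\' ORDER BY id DESC LIMIT ?",
        (f"%{escaped}%", limit),
    ).fetchall()
    return [{"session_id": s, "role": r, "content": c} for s, r, c in rows]
```

In this sketch, the agent-facing tool would simply call `search_transcripts(conn, "retry")` and paste the returned messages into context. That last step is the part whose value we are questioning.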

For us, this additional work doesn't seem to make a bit of difference. If anything, based on many months of testing with and without session search access, it may make the models worse.
