We built a persistent agent memory layer on Elasticsearch with 0.89 recall

Building agent memory on Elasticsearch

Three indices, hybrid recall with a reranker, supersession, decay, and document-level security (DLS). The architecture and the numbers behind a persistent memory layer for agents.

Sarah's smart bulbs are only showing white. Her smart-home assistant suggests resetting the hub. She did that in March, and again last week; neither reset fixed anything. The agent doesn't know that, and it doesn't know about the dog chewing through her sensor cables either. The history that mattered (what worked, what didn't, and who Sarah is) ended with each session.

The standard workaround is to stuff prior context into the context window. That breaks down on cost, on latency, and on the well-documented "lost in the middle" effect, where models tend to overlook facts placed far from the prompt's edges. A 1M-token context window is a scratchpad. It is not a memory system.

The context window is short-term memory: the active reasoning space for a single inference. What is missing is long-term memory: a persistent store that survives session end, scales to years of interaction, and lets you retrieve facts by content, by time, and by user.

This post is about the architecture of a real agent memory system built on Elasticsearch. It is structured around three memory categories from cognitive science, and it combines one hybrid recall query (RRF followed by a cross-encoder reranker), supersession for contradictions, and per-user isolation through DLS. On a QA-style eval over 168 questions, recall@10 (R@10) averages 0.89 with zero cross-tenant leaks.

The full implementation is on GitHub; this post is about why it is shaped the way it is.
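
To make two of the terms above concrete, here is a minimal sketch of reciprocal rank fusion (RRF) and of the recall@k metric, written in plain Python rather than as an Elasticsearch query. The function names, the document ids, and the sample rankings are illustrative assumptions, not taken from the implementation; the constant k=60 is the commonly used default for RRF.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document ids with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. Hypothetical helper, for illustration only.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def recall_at_k(retrieved, relevant, k=10):
    """Fraction of the relevant ids that appear in the top k retrieved ids."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


# Made-up rankings from a lexical retriever and a vector retriever.
lexical = ["m3", "m1", "m7"]
semantic = ["m1", "m9", "m3"]

fused = rrf_fuse([lexical, semantic])
print(fused)
print(recall_at_k(fused, {"m1", "m7"}, k=2))
```

Documents that both retrievers rank highly rise to the top without any score normalization, which is why RRF is a common way to combine lexical and vector results before a reranker sees them.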

What an agent memory store has to do

An agent memory store has to serve very different queries. A user asks "What fix did we try last time?", which is a temporal query with an exact-match constraint, or "Why are my smart bulbs only showing white?", which needs personal memory blended with a shared catalog. Memory itself doesn't behave uniformly: events the user lived, stable facts about them, and step-by-step playbooks all have different write rates and aging rules, so the store has to recognize the type and treat each accordingly. In any multi-user deployment, each user's memory has to stay invisible to every other user. Fresh events accumulate fast enough that they have to be consolidated into the durable kinds, or the index turns into a haystack. When a user contradicts a recalled fact, the old version has to be superseded rather than deleted, so the audit trail stays intact. Older facts shouldn't outrank fresh ones, and facts the user touches often shouldn't sink. Finally, the whole memory layer should be reachable by any MCP-speaking client, not tied to one agent runtime.
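
Two of these requirements, supersession and decay, can be sketched in a few lines. The snippet below is an assumption-laden illustration, not the post's implementation: the `Memory` fields, the 30-day half-life, and the logarithmic access boost are all hypothetical choices made to show the shape of the idea.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Memory:
    # Hypothetical record layout; field names are illustrative only.
    id: str
    user_id: str
    text: str
    created_at: datetime
    access_count: int = 0
    superseded_by: Optional[str] = None  # set instead of deleting the record


def supersede(old: Memory, new: Memory) -> None:
    """Mark `old` as replaced by `new`; the old record is kept for audit."""
    if old.user_id != new.user_id:
        raise ValueError("cannot supersede a memory owned by another user")
    old.superseded_by = new.id


def recall_weight(m: Memory, now: datetime, half_life_days: float = 30.0) -> float:
    """Weight that halves every `half_life_days` and grows with access count."""
    if m.superseded_by is not None:
        return 0.0  # superseded facts stay stored but are not recalled
    age_days = max((now - m.created_at).total_seconds() / 86400.0, 0.0)
    decay = 0.5 ** (age_days / half_life_days)
    boost = 1.0 + math.log1p(m.access_count)  # assumed form of the boost
    return decay * boost


now = datetime(2025, 6, 1, tzinfo=timezone.utc)
old = Memory("m1", "sarah", "Resetting the hub fixes the bulbs", now - timedelta(days=60))
new = Memory("m2", "sarah", "Resetting the hub did not fix the bulbs", now)
supersede(old, new)

stale = Memory("m3", "sarah", "Bulbs are on the living-room hub", now - timedelta(days=30))
touched = Memory("m4", "sarah", "Dog chewed the sensor cables", now - timedelta(days=30), access_count=3)

print(recall_weight(old, now), recall_weight(new, now))
print(recall_weight(stale, now), recall_weight(touched, now))
```

In this sketch the superseded record drops out of recall while remaining in storage, an untouched 30-day-old fact carries half the weight of a fresh one, and a frequently accessed fact of the same age keeps a higher weight.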
