Memora: A Harmonic Memory Representation Balancing Abstraction and Specificity

At a glance

Today’s AI agents don’t remember past interactions. They must repeatedly be fed relevant information or retrieve it from external sources, which becomes less efficient as they handle longer and more complex tasks. To scale agent capabilities, we need a more efficient way to retain and access information over time.

Memora is a scalable memory system that dramatically increases agent productivity on long-horizon tasks by decoupling what is stored (rich memory content) from how it’s retrieved (lightweight abstractions and cue anchors), balancing abstraction and specificity.

Memora sets a new state of the art on LoCoMo and LongMemEval, outperforming Mem0, RAG, and full-context inference while using up to 98% fewer context tokens.

The Memora paper is published at ICML 2026. Memora code is available at https://github.com/microsoft/Memora.

Imagine a workplace AI assistant helping you run a multi-month project. Over weeks of conversations, you share constraints, agree on milestones, revise deadlines, and surface dozens of stakeholder preferences. When you later ask it to draft an update for a colleague, it should recall not just the latest decision but the journey that got you there: what was tried, what was ruled out, who weighed in.

Today’s AI agents struggle with this. Modern large language models (LLMs) are powerful reasoners, but they are effectively stateless: every session starts from zero, every long conversation forces the model to re-read its entire history, and every new piece of information is either stored as raw text (fragmented and noisy) or compressed into a vague summary (with precise details lost). As AI assistants and autonomous agents move into long-horizon deployments, such as copilots that track a project for many months or research agents that build up domain expertise through sustained use, the absence of a principled memory system has become the critical bottleneck.

A growing line of work has begun to fill this gap. Systems like Mem0 extract atomic facts from conversations; retrieval-augmented generation (RAG) approaches index raw text fragments for later recall; and graph-based memory systems such as Zep and GraphRAG impose structure through entity relations. Each represents real progress, yet each runs into the same wall: existing designs force a tradeoff between specificity (preserving fine-grained detail) and abstraction (organizing memory efficiently as it grows). Memora is built to give agents both.

What is Memora

Memora is an agentic memory framework designed for long-horizon AI agents. Its central insight is to decouple what is stored from how it is retrieved. Memory content can remain rich and expressive, such as a project timeline or a multi-turn discussion about constraints, while a separate, lightweight structural layer handles indexing and retrieval. The result is a memory system that scales: it consolidates related information into stable units, surfaces fine-grained details when they matter, and lets the agent navigate its own history without re-reading everything. On standard long-conversation benchmarks, Memora sets new state-of-the-art performance while using up to 98% fewer tokens than would be consumed by dumping the full history into context.
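To make the decoupling idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Memora's actual implementation or API: the names `MemoryEntry`, `DecoupledMemory`, `add`, and `retrieve` are hypothetical, and simple keyword overlap stands in for whatever retrieval mechanism the real system uses. The only point it shows is that matching runs against the lightweight index (abstraction plus cue anchors), while the rich content is stored untouched and returned as the payload.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set


def _tokenize(text: str) -> Set[str]:
    """Lowercase a string and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class MemoryEntry:
    """One consolidated memory unit (hypothetical structure, not Memora's schema)."""
    abstraction: str                 # short label used for indexing
    content: str                     # rich content; never used for matching
    cue_anchors: List[str] = field(default_factory=list)  # fine-grained hooks

    def index_terms(self) -> Set[str]:
        """Tokens from the abstraction and cue anchors only, never the content."""
        terms = _tokenize(self.abstraction)
        for anchor in self.cue_anchors:
            terms |= _tokenize(anchor)
        return terms


class DecoupledMemory:
    """Toy store: retrieval consults the index layer, then returns rich content."""

    def __init__(self) -> None:
        self._entries: List[MemoryEntry] = []

    def add(self, abstraction: str, content: str, cue_anchors: List[str]) -> MemoryEntry:
        entry = MemoryEntry(abstraction, content, list(cue_anchors))
        self._entries.append(entry)
        return entry

    def retrieve(self, query: str, top_k: int = 1) -> List[MemoryEntry]:
        """Rank entries by keyword overlap between the query and the index terms."""
        query_terms = _tokenize(query)
        scored = []
        for entry in self._entries:
            score = len(query_terms & entry.index_terms())
            if score > 0:
                scored.append((score, entry))
        # Stable sort: ties keep insertion order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [entry for _, entry in scored[:top_k]]


memory = DecoupledMemory()
memory.add(
    abstraction="Project launch timeline",
    content="Launch moved from March to May after the vendor audit slipped; "
            "QA asked for two extra weeks.",
    cue_anchors=["vendor audit", "deadline"],
)
memory.add(
    abstraction="Stakeholder preferences for reports",
    content="Dana prefers weekly summaries with risks listed first.",
    cue_anchors=["Dana", "weekly summary"],
)

for hit in memory.retrieve("What happened with the vendor audit?"):
    print(hit.abstraction, "->", hit.content)
```

In this toy version, a query is matched through the cue anchor "vendor audit" and the full timeline entry comes back intact. A query that matches only words inside the stored content finds nothing, which is the decoupling at work: what is stored can be as rich as needed without affecting how it is found.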

Why this is hard: the abstraction–specificity tension

Existing memory systems fall into two extremes. Content-fragmentation systems, such as RAG and Mem0, embed extracted facts or text fragments directly. This preserves detail but produces brittle, isolated entries that lose narrative coherence. Coarse-abstraction systems compress experience into compact summaries. They are efficient, but summarization strips away the constraints, edge cases, and numeric details that make memory useful in the first place. Graph-based systems add structure on top of content, yet still rely on the content itself for retrieval and typically require rigid ontologies that don’t generalize across domains. None of these resolves the underlying tension between abstraction (which keeps memory efficient) and specificity (which gives memory utility).
