mnemo
Local-first AI memory layer for any LLM. Persistent knowledge graph, entity extraction, semantic retrieval — no cloud required.
What is mnemo?
Most LLMs forget everything the moment a conversation ends. mnemo fixes that.
mnemo is a sidecar service that processes every conversation you feed it: it extracts named entities and relationships using an LLM, builds a persistent knowledge graph in SQLite, and automatically assembles relevant context for future prompts in under 50ms. It works with Ollama (fully local, free), OpenAI, Anthropic, or any OpenAI-compatible API. It ships as a single static binary and needs no cloud service unless you choose a hosted LLM provider.
How it works
your app
   │
   ▼
POST /ingest ──► entity extraction (LLM) ──► knowledge graph (SQLite + petgraph)
                                                    │
POST /retrieve ◄── scoring + ranking ◄── graph traversal + full-text search
   │
   ▼
context_prompt ──► inject into your LLM prompt
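From your app's side, the flow in the diagram comes down to two HTTP calls per conversation turn. The Python sketch below uses only the standard library and is illustrative, not an official client. It assumes mnemo listens on http://localhost:8080, that /retrieve takes a JSON body with a `query` field and returns JSON containing `context_prompt`, and that /ingest takes a `text` field. The address, both request field names, and the helper functions are hypothetical; only the endpoint paths and `context_prompt` come from this README.

```python
import json
import urllib.request

# Hypothetical address; use whatever host and port your mnemo instance listens on.
MNEMO_URL = "http://localhost:8080"


def post_json(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a mnemo endpoint."""
    return urllib.request.Request(
        MNEMO_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request) -> bytes:
    """Send the request and return the raw response body."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read()


def build_system_prompt(base: str, context_prompt: str) -> str:
    """Append mnemo's context_prompt to your own system prompt, if there is one."""
    return f"{base}\n\n{context_prompt}" if context_prompt else base


def prepare_turn(user_message: str) -> str:
    # Before calling your LLM: fetch memory relevant to this message.
    # The "query" request field and the JSON response shape are assumptions.
    result = json.loads(send(post_json("/retrieve", {"query": user_message})))
    return build_system_prompt("You are a helpful assistant.", result.get("context_prompt", ""))


def remember_turn(user_message: str, reply: str) -> None:
    # After the LLM answers: hand the finished turn to mnemo for entity extraction.
    # The "text" request field is an assumption.
    send(post_json("/ingest", {"text": f"User: {user_message}\nAssistant: {reply}"}))
```

Retrieving before the LLM call and ingesting after it means each turn's injected context comes only from earlier turns, never from the message being answered.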
You POST raw text to /ingest (a conversation turn, a document, a note). mnemo sends it to your configured LLM, which extracts entities (people, tools, places, concepts) and the relationships between them. Entities are deduplicated by name and type, aliases are merged, and everything is written to SQLite. The in-memory petgraph is updated atomically.

On POST /retrieve, mnemo runs a 6-stage pipeline:
1. full-text chunk search
2. entity name search
3. graph expansion (BFS over the knowledge graph)
4. relation filter
5. score + rank
6. assemble a context_prompt string

You inject context_prompt into your LLM's system prompt. Done.
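To make the ingest and retrieve steps concrete, here is a toy, in-memory Python model of them. It is a sketch under stated assumptions, not mnemo's implementation (which uses SQLite and petgraph): the LLM extraction step is stubbed out by passing already-extracted entities, and the `extracted` dict shape, `ToyGraph`, and every method name are hypothetical. Full-text search is naive token overlap rather than a real index, the relation filter is folded into the BFS, and the score is simply 1 / (1 + hops from a seed entity).

```python
import re
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    type: str
    aliases: set = field(default_factory=set)


def tokens(text):
    """Lowercased word tokens, used for the naive full-text search."""
    return set(re.findall(r"\w+", text.lower()))


class ToyGraph:
    """Hypothetical in-memory stand-in for mnemo's SQLite + petgraph store."""

    def __init__(self):
        self.entities = {}  # (lowercased name, type) -> Entity
        self.edges = {}     # entity key -> list of (relation, neighbour key)
        self.chunks = []    # raw ingested text

    def upsert_entity(self, name, etype, aliases=()):
        # Deduplicate on name + type; merge new aliases into the existing node.
        key = (name.lower(), etype)
        ent = self.entities.setdefault(key, Entity(name, etype))
        ent.aliases.update(aliases)
        self.edges.setdefault(key, [])
        return key

    def ingest(self, text, extracted):
        # `extracted` stands in for the LLM's output; its shape is hypothetical.
        self.chunks.append(text)
        keys = {e["name"]: self.upsert_entity(e["name"], e["type"], e.get("aliases", ()))
                for e in extracted["entities"]}
        for src, rel, dst in extracted["relations"]:
            # Store both directions so the BFS can walk edges either way.
            self.edges[keys[src]].append((rel, keys[dst]))
            self.edges[keys[dst]].append((rel, keys[src]))

    def retrieve(self, query, max_hops=2, allowed_relations=None, top_k=5):
        q, q_tokens = query.lower(), tokens(query)
        # 1. Full-text chunk search: any shared token counts as a hit.
        hits = [c for c in self.chunks if tokens(c) & q_tokens]
        # 2. Entity name search: names or aliases that appear in the query.
        seeds = [k for k, e in self.entities.items()
                 if e.name.lower() in q or any(a.lower() in q for a in e.aliases)]
        # 3 + 4. BFS graph expansion, skipping relations outside the filter.
        hops = {k: 0 for k in seeds}
        frontier = deque(seeds)
        while frontier:
            node = frontier.popleft()
            if hops[node] >= max_hops:
                continue
            for rel, nbr in self.edges[node]:
                if allowed_relations is not None and rel not in allowed_relations:
                    continue
                if nbr not in hops:
                    hops[nbr] = hops[node] + 1
                    frontier.append(nbr)
        # 5. Score (1 / (1 + hops), so seeds score highest) and rank.
        ranked = sorted(hops, key=lambda k: (-1.0 / (1 + hops[k]), k))[:top_k]
        # 6. Assemble the context_prompt string.
        lines = [f"- {self.entities[k].name} ({self.entities[k].type})" for k in ranked]
        lines += [f"- note: {c}" for c in hits[:top_k]]
        return ("Relevant memory:\n" + "\n".join(lines)) if lines else ""
```

For example, after ingesting "Alice uses Rust at Acme." with the relations Alice uses Rust and Alice works_at Acme, the query "what is rust-lang?" seeds on Rust through its alias, reaches Alice at one hop and Acme at two, and returns those three entities in that order followed by the matching chunk. Ingesting "alice likes hiking." afterwards reuses the existing Alice node instead of creating a second one.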
Quickstart
Path A — Docker + Ollama (fully free, recommended)