# 🦙 LlamaFarm

**Build powerful AI locally, extend anywhere.**

LlamaFarm is an open-source framework for building retrieval-augmented and agentic AI applications. It ships with opinionated defaults (Ollama for local models, Chroma for vector storage) while staying 100% extendable: swap in vLLM, remote OpenAI-compatible hosts, new parsers, or custom stores without rewriting your app.
- **Local-first developer experience** with a single CLI (`lf`) that manages projects, datasets, and chat sessions.
- **Production-ready architecture** that mirrors server endpoints and enforces schema-based configuration.
- **Composable RAG pipelines** you can tailor through YAML, not bespoke code.
- **Extendable everything**: runtimes, embedders, databases, extractors, and CLI tooling.
📺 **Video demo (90 seconds):** https://youtu.be/W7MHGyN0MdQ

## 🚀 Quickstart (TL;DR)
**Prerequisites:**

- Docker
- Ollama (local runtime; additional options coming soon)
1. **Install the CLI**

   macOS / Linux:

   ```bash
   curl -fsSL https://raw.githubusercontent.com/llama-farm/llamafarm/main/install.sh | bash
   ```

   Windows (via winget):

   ```powershell
   winget install LlamaFarm.CLI
   ```

2. **Adjust the Ollama context window**

   Open the Ollama app, go to Settings → Advanced, and set the context window to match production (e.g., 100K tokens). Larger context windows improve RAG answers when long documents are ingested.

3. **Create and run a project**

   ```bash
   lf init my-project   # Generates llamafarm.yaml using the server template
   lf start             # Spins up Docker services & opens the dev chat UI
   ```

4. **Start an interactive project chat or send a one-off message**

   ```bash
   # Interactive project chat (auto-detects namespace/project from llamafarm.yaml)
   lf chat

   # One-off message
   lf chat "Hello, LlamaFarm!"
   ```
Need the full walkthrough with dataset ingestion and troubleshooting tips? Jump to the Quickstart guide.
Prefer building from source? Clone the repo and follow the steps in Development & Testing.
Run services manually (without Docker auto-start):
```bash
git clone https://github.com/llama-farm/llamafarm.git
cd llamafarm

# Install Nx globally and bootstrap the workspace
npm install -g nx
nx init --useDotNxInstallation --interactive=false

# Option 1: start both server and RAG worker with one command
nx dev

# Option 2: start services in separate terminals
# Terminal 1
nx start rag

# Terminal 2
nx start server
```
Open another terminal to run `lf` commands (installed or built from source). This is equivalent to what `lf start` orchestrates automatically.
## 🌟 Why LlamaFarm
- **Own your stack**: Run small local models today and swap to hosted vLLM, Together, or custom APIs tomorrow by changing `llamafarm.yaml`.
- **Battle-tested RAG**: Configure parsers, extractors, embedding strategies, and databases without touching orchestration code.
- **Config over code**: Every project is defined by YAML schemas that are validated at runtime and easy to version control.
- **Friendly CLI**: `lf` handles project bootstrapping, dataset lifecycle, RAG queries, and non-interactive chats.
- **Built to extend**: Add a new provider or vector store by registering a backend and regenerating schema types.
## 🔧 Core CLI Workflows
| Task | Command | Notes |
| --- | --- | --- |
| Initialize a project | `lf init my-project` | Creates `llamafarm.yaml` from the server template. |
| Start dev stack + chat TUI | `lf start` | Spins up server and RAG worker; monitors Ollama/vLLM. |
| Interactive project chat | `lf chat` | Opens TUI using the project from `llamafarm.yaml`. |
| Send single prompt | `lf chat "Explain retrieval augmented generation"` | Uses RAG by default; add `--no-rag` for pure LLM. |
| Preview REST call | `lf chat --curl "What models are configured?"` | Prints a sanitized curl command. |
| Create dataset | `lf datasets create -s pdf_ingest -b main_db research-notes` | Validates strategy/database against project config. |
| Upload files | `lf datasets upload research-notes ./docs/*.pdf` | Supports globs and directories. |
| Process dataset | `lf datasets process research-notes` | Streams heartbeat dots during long processing. |
| Semantic query | `lf rag query --database main_db "What did the 2024 FDA letters require?"` | Use `--filter`, `--include-metadata`, etc. |
See the CLI reference for full command details and troubleshooting advice.
## 🌐 REST API

LlamaFarm provides a comprehensive REST API (compatible with OpenAI's format) for integrating with your applications. The API runs at `http://localhost:8000`.
### Key Endpoints

#### Chat Completions (OpenAI-compatible)

```bash
curl -X POST http://localhost:8000/v1/projects/{namespace}/{project}/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What are the FDA requirements?"}
    ],
    "stream": false,
    "rag_enabled": true,
    "database": "main_db"
  }'
```
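The same call works from application code with any HTTP client. A minimal Python sketch using `requests`; the endpoint path and payload fields mirror the curl example above, and the `NAMESPACE`/`PROJECT` placeholders stand in for your own values:

```python
import requests

# Values from your llamafarm.yaml (see "Finding Your Namespace and Project" below)
NAMESPACE, PROJECT = "my-org", "my-project"
URL = f"http://localhost:8000/v1/projects/{NAMESPACE}/{PROJECT}/chat/completions"

payload = {
    "messages": [
        {"role": "user", "content": "What are the FDA requirements?"}
    ],
    "stream": False,
    "rag_enabled": True,   # set to False for a pure LLM answer (CLI equivalent: --no-rag)
    "database": "main_db",
}

resp = requests.post(URL, json=payload, timeout=120)
resp.raise_for_status()
print(resp.json())  # OpenAI-style chat completion payload
```

Because the endpoint follows OpenAI's chat-completions format, pointing an OpenAI SDK client's `base_url` at `http://localhost:8000/v1/projects/{namespace}/{project}` may also work, though the extra fields (`rag_enabled`, `database`) would need to be passed as additional body parameters.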
#### RAG Query

```bash
curl -X POST http://localhost:8000/v1/projects/{namespace}/{project}/rag/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "clinical trial requirements",
    "database": "main_db",
    "top_k": 5
  }'
```
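This endpoint is easy to wrap as a small helper. A sketch in Python; the request fields (`query`, `database`, `top_k`) come from the curl call above, and the function returns the parsed JSON as-is since the exact response schema is documented in the API Reference:

```python
import requests

def rag_query(query: str, database: str = "main_db", top_k: int = 5,
              namespace: str = "my-org", project: str = "my-project") -> dict:
    """Run a semantic search against a project database via the REST API."""
    url = f"http://localhost:8000/v1/projects/{namespace}/{project}/rag/query"
    resp = requests.post(url, json={"query": query, "database": database, "top_k": top_k})
    resp.raise_for_status()
    return resp.json()

print(rag_query("clinical trial requirements"))
```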
#### Dataset Management

```bash
# Upload a file
curl -X POST http://localhost:8000/v1/projects/{namespace}/{project}/datasets/{dataset}/data \
  -F "file=@document.pdf"

# Process the dataset
curl -X POST http://localhost:8000/v1/projects/{namespace}/{project}/datasets/{dataset}/process
```
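The dataset endpoints translate directly to Python as well. A sketch using `requests`; the multipart field name `file` and the endpoint paths come from the curl examples, while `document.pdf` and the base URL are placeholders to adjust for your project:

```python
import requests

BASE = "http://localhost:8000/v1/projects/my-org/my-project"  # your namespace/project
DATASET = "research-notes"

# Upload a file (multipart form field "file", as in the curl example)
with open("document.pdf", "rb") as f:
    up = requests.post(f"{BASE}/datasets/{DATASET}/data", files={"file": f})
up.raise_for_status()

# Kick off processing; this can take a while for large documents
proc = requests.post(f"{BASE}/datasets/{DATASET}/process")
proc.raise_for_status()
print(proc.json())
```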
### Finding Your Namespace and Project

Check your `llamafarm.yaml`:

```yaml
name: my-project     # Your project name
namespace: my-org    # Your namespace
```

Or inspect the file system: `~/.llamafarm/projects/{namespace}/{project}/`
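If you are scripting against the API, you can read those two fields straight from the project file instead of hard-coding them. A small sketch, assuming PyYAML is installed and `llamafarm.yaml` is in the current directory:

```python
import yaml

with open("llamafarm.yaml") as f:
    cfg = yaml.safe_load(f)

# "name" and "namespace" are the two fields shown above
namespace, project = cfg["namespace"], cfg["name"]
base_url = f"http://localhost:8000/v1/projects/{namespace}/{project}"
print(base_url)  # e.g. http://localhost:8000/v1/projects/my-org/my-project
```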
See the complete API Reference for all endpoints, request/response formats, Python/TypeScript clients, and examples.
## 🗂️ Configuration Snapshot

`llamafarm.yaml` is the source of truth for each project. The schema enforces required fields and documents every extension point.
```yaml
version: v1
name: fda-assistant
namespace: default

runtime:
  provider: openai          # "openai" for any OpenAI-compatible host, "ollama" for local Ollama
  model: qwen2.5:7b
  base_url: http://localhost:8000/v1   # Point to vLLM, Together, etc.
  api_key: sk-local-placeholder
  instructor_mode: tools    # Optional: json, md_json, tools, etc.

prompts:
  - role: system
    content: >-
      You are an FDA specialist. Answer using short paragraphs and cite
      document titles when available.

rag:
  databases:
    - name: main_db
      type: ChromaStore
      default_embedding_strategy: default_embeddings
      default_retrieval_strategy: filtered_search
  embedding_strategies:
    - name: default_embeddings
      type: OllamaEmbedder
      config:
        model: nomic-embed-text:latest
  retrieval_strategies:
    - name: filtered_search
      type: MetadataFilteredStrategy
      config:
        top_k: 5
  data_processing_strategies:
    - name: pdf_ingest
      parsers:
        - type: PDFParser_LlamaIndex
          config:
            chunk_size: 1500
            chunk_overlap: 200
      extractors:
        - type: HeadingExtractor
        - type: ContentStatisticsExtractor

datasets:
  - name: research-notes
    data_processing_strategy: pdf_ingest
    database: main_db
```
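The server validates this schema at runtime, but a lightweight pre-flight check in your own tooling can catch broken references earlier. An illustrative sketch (not the server's actual validator) that checks a few of the cross-references visible in the snapshot above:

```python
import yaml

REQUIRED_TOP_LEVEL = ["version", "name", "namespace", "runtime"]

with open("llamafarm.yaml") as f:
    cfg = yaml.safe_load(f)

missing = [key for key in REQUIRED_TOP_LEVEL if key not in cfg]
if missing:
    raise SystemExit(f"llamafarm.yaml is missing top-level fields: {missing}")

# Each dataset should reference a processing strategy and database defined under rag:
strategies = {s["name"] for s in cfg.get("rag", {}).get("data_processing_strategies", [])}
databases = {d["name"] for d in cfg.get("rag", {}).get("databases", [])}
for ds in cfg.get("datasets", []):
    assert ds["data_processing_strategy"] in strategies, f"unknown strategy in dataset {ds['name']}"
    assert ds["database"] in databases, f"unknown database in dataset {ds['name']}"

print("llamafarm.yaml looks structurally sound")
```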
Configuration reference: Configuration Guide • Extending LlamaFarm

## 🧩 Extensibility Highlights
- **Swap runtimes** by pointing to any OpenAI-compatible endpoint (vLLM, Mistral, Anyscale). Update `runtime.provider`, `base_url`, and `api_key`; regenerate schema types if you add a new provider enum.
- **Bring your own vector store** by implementing a store backend, adding it to `rag/schema.yaml`, and updating the server service registry (a hypothetical sketch follows this list).
- **Add parsers/extractors** to support new file formats or metadata pipelines. Register implementations and extend the schema definitions.
- **Extend the CLI** with new Cobra commands under `cli/cmd`; the docs include guidance on adding dataset utilities or project tooling.
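To make the vector-store extension point concrete, here is a purely hypothetical Python sketch of what a backend might look like. The class shape, method names, and registration step are illustrative assumptions, not LlamaFarm's actual interface; see the Extending guide for the real contract:

```python
# Hypothetical illustration only: names and signatures are assumptions,
# not LlamaFarm's real store interface.
from dataclasses import dataclass, field

@dataclass
class MyVectorStore:
    """Toy in-memory store standing in for a real backend (e.g., pgvector, Qdrant)."""
    collection: str
    _docs: list = field(default_factory=list)

    def add(self, ids, embeddings, metadatas, documents):
        self._docs.extend(zip(ids, embeddings, metadatas, documents))

    def query(self, embedding, top_k=5):
        # A real backend would use ANN search; this fakes it with cosine similarity.
        def score(item):
            vec = item[1]
            dot = sum(a * b for a, b in zip(embedding, vec))
            norm = (sum(a * a for a in embedding) * sum(b * b for b in vec)) ** 0.5
            return dot / norm if norm else 0.0
        return sorted(self._docs, key=score, reverse=True)[:top_k]
```

Per the docs, the remaining steps are to add the new store type to `rag/schema.yaml` and register it with the server service registry, after which it becomes usable as a `type` under `rag.databases` in `llamafarm.yaml`.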
Check the Extending guide for step-by-step instructions.
## 📚 Examples

| Example | What it Shows | Location |
| --- | --- | --- |
| FDA Letters Assistant | Multi-document PDF ingestion, RAG queries, reference-style prompts | `examples/fda_rag/` & Docs |
| Raleigh UDO Planning Helper | Large ordinance ingestion, long-running processing tips, geospatial queries | `examples/gov_rag/` & Docs |
Run the `lf datasets` and `lf rag query` commands from each example folder to reproduce the flows demonstrated in the docs.
## 🧪 Development & Testing

```bash
# Python server + RAG tests
cd server
uv sync
uv run --group test python -m pytest

# CLI tests
cd ../cli
go test ./...

# RAG tooling smoke tests
cd ../rag
uv sync
uv run python cli.py test

# Docs build (ensures navigation/link integrity)
cd ..
nx build docs
```
Linting: `uv run ruff check --fix .` (Python); `go fmt ./...` and `go vet ./...` (Go).
## 🤝 Community & Support

- **Discord**: chat with the team, share feedback, find collaborators.
- **GitHub Issues**: bug reports and feature requests.
- **Discussions**: ideas, RFCs, roadmap proposals.
- **Contributing Guide**: code style, testing expectations, doc updates, schema regeneration steps.
Want to add a new provider, parser, or example? Start a discussion or open a draft PR; we love extensions!

## 📄 License & Acknowledgments
Licensed under the Apache 2.0 License.
Built by the LlamaFarm community and inspired by the broader open-source AI ecosystem. See CREDITS for detailed acknowledgments.
**Build locally. Deploy anywhere. Own your AI.**