
Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks

Why This Matters

Forge adds guardrails and context management on top of self-hosted LLMs, raising an 8B model's accuracy on agentic tasks from 53% to 99%. For developers and organizations running local models, this makes multi-step tool-calling workflows more dependable. It can be adopted as a full workflow runner, as middleware inside an existing orchestration loop, or as a drop-in proxy.

forge

A reliability layer for self-hosted LLM tool-calling. Forge lifts an 8B local model to the top of its class on multi-step agentic workflows through guardrails (rescue parsing, retry nudges, step enforcement) and context management (VRAM-aware budgets, tiered compaction). The current top self-hosted config (Ministral-3 8B Instruct Q8 on llama-server) scores 86.5% across forge's 26-scenario eval suite — and 76% on the hardest tier.
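To make "tiered compaction" concrete, here is a minimal sketch of one way such a scheme could work. It is not forge's implementation: the Message type, the 4-characters-per-token estimate, the 80-character truncation, and the compact function are all hypothetical names and values chosen for illustration. The token budget is simply passed in; how forge derives a budget from available VRAM is not described in this excerpt.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str      # "system", "user", "assistant", or "tool"
    content: str


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token.
    return max(1, len(text) // 4)


def total_tokens(messages: list[Message]) -> int:
    return sum(estimate_tokens(m.content) for m in messages)


def compact(messages: list[Message], budget: int, keep_recent: int = 4) -> list[Message]:
    """Shrink the history in tiers, stopping as soon as it fits the budget."""
    if total_tokens(messages) <= budget:
        return messages

    # Messages at or after this index count as "recent" and are left alone.
    cutoff = max(0, len(messages) - keep_recent)

    # Tier 1: truncate old tool outputs, which tend to be the bulkiest entries.
    tier1 = [
        Message(m.role, m.content[:80] + " [truncated]")
        if i < cutoff and m.role == "tool" and len(m.content) > 80
        else m
        for i, m in enumerate(messages)
    ]
    if total_tokens(tier1) <= budget:
        return tier1

    # Tier 2: drop old non-system messages entirely.
    # This sketch stops here, so the result may still exceed a very small budget.
    return [m for i, m in enumerate(tier1) if m.role == "system" or i >= cutoff]
```

The point of the tiers is that the cheapest loss comes first: old tool output is trimmed before any conversational turn is dropped, and the system prompt and the most recent messages survive every tier.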

Three ways to use it:

WorkflowRunner: Define tools, pick a backend, and run structured agent loops. Forge manages the full lifecycle: system prompts, tool execution, context compaction, and guardrails. SlotWorker adds priority-queued access to a shared inference slot with auto-preemption, for multi-agent architectures where specialist workflows share a GPU slot. Best when you're building on forge directly.

Guardrails middleware: Use forge's reliability stack (composable middleware) inside your own orchestration loop. You control the loop; forge validates responses, rescues malformed tool calls, and enforces required steps.

Proxy server: Drop-in OpenAI-compatible proxy (python -m forge.proxy) that sits between any client (opencode, Continue, aider, etc.) and a local model server. It applies guardrails transparently, so the client thinks it's talking to a smarter model.

Supports Ollama, llama-server (llama.cpp), Llamafile, and Anthropic as backends.

Requirements

Python 3.12+

A running LLM backend (see below)
