Researchers at Katanemo Labs have introduced Arch-Router, a new routing model and framework designed to intelligently map user queries to the most suitable large language model (LLM).
For enterprises building products that rely on multiple LLMs, Arch-Router aims to solve a key challenge: how to direct queries to the best model for the job without relying on rigid logic or costly retraining every time something changes.
The challenges of LLM routing
As the number of LLMs grows, developers are moving from single-model setups to multi-model systems that use the unique strengths of each model for specific tasks (e.g., code generation, text summarization, or image editing).
LLM routing has emerged as a key technique for building and deploying these systems, acting as a traffic controller that directs each user query to the most appropriate model.
Existing routing methods generally fall into two categories: “task-based routing,” where queries are routed based on predefined tasks, and “performance-based routing,” which seeks an optimal balance between cost and performance.
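As a rough illustration, a task-based router can be as simple as a classifier that maps each query to a predefined task label, which in turn points to a model. The task labels, model names, and keyword matching in this sketch are hypothetical stand-ins for illustration only, not Arch-Router's actual design:

```python
# Minimal sketch of task-based LLM routing (illustrative only).
# Task labels, model names, and the keyword classifier are hypothetical
# placeholders, not Arch-Router's implementation.

ROUTE_TABLE = {
    "code_generation": "code-specialist-model",
    "summarization": "fast-general-model",
    "image_editing": "multimodal-model",
    "default": "general-purpose-model",
}

def classify_task(query: str) -> str:
    """Crude keyword-based intent classifier (stand-in for a learned router)."""
    q = query.lower()
    if any(kw in q for kw in ("write a function", "debug", "refactor")):
        return "code_generation"
    if any(kw in q for kw in ("summarize", "tl;dr", "key points")):
        return "summarization"
    if any(kw in q for kw in ("crop", "remove background", "edit this image")):
        return "image_editing"
    return "default"

def route(query: str) -> str:
    """Return the name of the model that should handle this query."""
    return ROUTE_TABLE[classify_task(query)]

print(route("Summarize this meeting transcript"))   # -> fast-general-model
print(route("Write a function to parse CSV files"))  # -> code-specialist-model
```

The brittleness is visible even in this toy version: any query whose intent falls outside the predefined labels, or shifts mid-conversation, lands on the default route.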
However, task-based routing struggles with unclear or shifting user intentions, particularly in multi-turn conversations. Performance-based routing, meanwhile, rigidly prioritizes benchmark scores, often neglecting real-world user preferences, and adapts poorly to new models unless it undergoes costly fine-tuning.
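To make the second category concrete, a performance-based router typically scores each candidate model on a quality-versus-cost tradeoff and picks the winner. The benchmark scores, per-token costs, and model names below are invented for illustration:

```python
# Minimal sketch of performance-based routing (illustrative only).
# Benchmark scores and per-token costs are made-up numbers.

MODELS = {
    "large-flagship-model": {"benchmark_score": 0.92, "cost_per_1k_tokens": 0.030},
    "mid-tier-model":       {"benchmark_score": 0.85, "cost_per_1k_tokens": 0.004},
    "small-cheap-model":    {"benchmark_score": 0.71, "cost_per_1k_tokens": 0.0005},
}

def route_by_tradeoff(cost_weight: float) -> str:
    """Pick the model maximizing benchmark score minus a cost penalty.

    Note the limitation the Katanemo researchers point to: the decision
    is driven entirely by benchmark numbers and cost, with no input from
    subjective user preferences.
    """
    def utility(stats: dict) -> float:
        return stats["benchmark_score"] - cost_weight * stats["cost_per_1k_tokens"]
    return max(MODELS, key=lambda name: utility(MODELS[name]))

print(route_by_tradeoff(cost_weight=1.0))   # -> large-flagship-model
print(route_by_tradeoff(cost_weight=20.0))  # -> mid-tier-model (cost now dominates)
```

Adding a new model to such a router means re-benchmarking it, and often re-tuning the tradeoff itself, which is where the retraining cost comes from.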
More fundamentally, as the Katanemo Labs researchers note in their paper, “existing routing approaches have limitations in real-world use. They typically optimize for benchmark performance while neglecting human preferences driven by subjective evaluation criteria.”
The researchers highlight the need for routing systems that “align with subjective human preferences, offer more transparency, and remain easily adaptable as models and use cases evolve.”