Optimizing Tool Selection for LLM Workflows with Differentiable Programming

Modern agentic architectures rely heavily on chaining LLM calls. A typical pattern looks like:

Use an LLM to decide which tool to invoke

Call the tool (e.g. search, calculator, API)

Use another LLM call to interpret the result and generate a final response

This structure is easy to reason about, simple to prototype, and generalizes well.

But it scales poorly.

Each LLM call incurs latency, cost, and token overhead. More subtly, it compounds context: every step includes not only the original query, but intermediate outputs and scratchpad logic from earlier prompts. The growing context raises inference cost and can degrade the quality of the model's output.
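To make the compounding concrete, here is a small back-of-the-envelope sketch. It assumes the simplest case, where every call re-reads the original query plus all earlier intermediate outputs. The function name and the token counts are hypothetical, chosen only for illustration.

```python
def chained_prompt_tokens(query_tokens, step_output_tokens):
    """Prompt size of each LLM call in a chain where every call re-reads
    the query plus all earlier intermediate outputs."""
    sizes = []
    context = query_tokens
    for out in step_output_tokens:
        sizes.append(context)  # tokens sent as the prompt for this call
        context += out         # this call's output joins the next prompt
    return sizes


# Hypothetical numbers: a 200-token query, three calls emitting 300 tokens each.
sizes = chained_prompt_tokens(200, [300, 300, 300])
print(sizes)       # [200, 500, 800]
print(sum(sizes))  # 1500 prompt tokens in total
```

Under these assumptions the third call reads four times as many prompt tokens as the first, even though the user's query never changed.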

The consequence is that most agent stacks are paying GPT-4 to do what amounts to classical control flow — tool selection — with no reuse, no abstraction, and no efficiency gains at scale.

The Alternative: Differentiable Routing

Instead of using an LLM to route between tools, we can model the decision as a trainable function. A differentiable controller learns tool selection from data, typically via reinforcement learning or supervised fine-tuning, and runs entirely outside the LLM.
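As a rough illustration of the supervised case, the sketch below trains a tiny softmax classifier over bag-of-words features to pick a tool. It is a minimal stand-in under stated assumptions, not a specific implementation from this article: the tool names, the training queries, and the helpers featurize, train, and route are all hypothetical, and a real controller would likely use learned embeddings and an autodiff framework rather than hand-written gradients.

```python
import math
import random

TOOLS = ["search", "calculator"]  # hypothetical tool set

# Hypothetical labelled data: (query, index of the correct tool in TOOLS).
TRAIN = [
    ("who won the world cup", 0),
    ("latest news about rust", 0),
    ("what is the capital of france", 0),
    ("what is 12 times 7", 1),
    ("compute 45 plus 19", 1),
    ("divide 144 by 12", 1),
]

VOCAB = sorted({w for q, _ in TRAIN for w in q.split()})
INDEX = {w: i for i, w in enumerate(VOCAB)}


def featurize(query):
    """Bag-of-words counts over the training vocabulary."""
    x = [0.0] * len(VOCAB)
    for w in query.lower().split():
        if w in INDEX:
            x[INDEX[w]] += 1.0
    return x


def logits(x, W, b):
    return [sum(wi * xi for wi, xi in zip(W[k], x)) + b[k]
            for k in range(len(TOOLS))]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def train(epochs=200, lr=0.5, seed=0):
    """Fit W and b by SGD on the cross-entropy loss."""
    rng = random.Random(seed)
    W = [[rng.uniform(-0.01, 0.01) for _ in VOCAB] for _ in TOOLS]
    b = [0.0] * len(TOOLS)
    for _ in range(epochs):
        for query, label in TRAIN:
            x = featurize(query)
            p = softmax(logits(x, W, b))
            for k in range(len(TOOLS)):
                # Gradient of cross-entropy w.r.t. logit k: p_k - 1[k == label]
                g = p[k] - (1.0 if k == label else 0.0)
                b[k] -= lr * g
                for i, xi in enumerate(x):
                    if xi:
                        W[k][i] -= lr * g * xi
    return W, b


def route(query, W, b):
    """Pick the highest-scoring tool; no LLM call is involved."""
    z = logits(featurize(query), W, b)
    return TOOLS[max(range(len(TOOLS)), key=lambda k: z[k])]


W, b = train()
print(route("compute 45 plus 19", W, b))
```

Once trained, routing is a single matrix-vector product, so the decision costs microseconds and no tokens. The LLM is then only needed for the final step of interpreting the tool's result.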

The benefits are architectural:

Local execution — avoids external API calls
