Optimizing Tool Selection for LLM Workflows with Differentiable Programming
Modern agentic architectures rely heavily on chaining LLM calls. A typical pattern looks like:

1. Use an LLM to decide which tool to invoke.
2. Call the tool (e.g. search, calculator, API).
3. Use another LLM call to interpret the result and generate a final response.

This structure is easy to reason about, simple to prototype, and generalizes well. But it scales poorly. Each LLM call incurs latency, cost, and token overhead. More subtly, it compounds context: every step includes not only the original query but all the intermediate output accumulated before it.
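A minimal sketch of the pattern makes the overhead concrete. Here `complete()` is a hypothetical stand-in for any chat-completion API (it returns canned replies so the sketch runs offline), and the tool registry is a toy:

```python
import json

def complete(prompt: str) -> str:
    # Hypothetical LLM call; swap in your provider's SDK. Canned replies
    # keep the example runnable without network access.
    if "Pick a tool" in prompt:
        return json.dumps({"tool": "calculator", "input": "17 * 24"})
    return "17 * 24 = 408."

# Toy tool registry; a real agent would route to search, APIs, etc.
TOOLS = {
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
    "search": lambda q: f"(stub) top result for {q!r}",
}

def answer(query: str) -> str:
    # Step 1: one LLM call just to pick a tool and its argument.
    choice = json.loads(complete(
        f"Pick a tool from {list(TOOLS)} for this query and reply as "
        f'JSON {{"tool": ..., "input": ...}}.\nQuery: {query}'
    ))
    # Step 2: invoke the chosen tool (no LLM involved).
    result = TOOLS[choice["tool"]](choice["input"])
    # Step 3: a second LLM call interprets the result. The prompt now
    # carries the original query *and* the tool output: context compounds.
    return complete(f"Query: {query}\nTool result: {result}\nFinal answer:")

print(answer("What is 17 times 24?"))
```

Even in this stripped-down form, a single user query costs two round trips to the model, and the second prompt is strictly larger than the first. Add more steps and the prompt grows with every hop.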