
New framework simplifies the complex landscape of agentic AI


With the ecosystem of agentic tools and frameworks exploding in size, navigating the many options for building AI systems is becoming increasingly difficult, leaving developers confused and paralyzed when choosing the right tools and models for their applications.

In a new study, researchers from multiple institutions present a comprehensive framework to untangle this complex web. They categorize agentic frameworks based on their area of focus and tradeoffs, providing a practical guide for choosing the right tools and strategies for a given application.

For enterprise teams, this reframes agentic AI from a model-selection problem into an architectural decision: where to spend training budget, how much modularity to preserve, and what tradeoffs to accept between cost, flexibility, and risk.

Agent vs. tool adaptation

The researchers divide the landscape into two primary dimensions: agent adaptation and tool adaptation.

Agent adaptation involves modifying the foundation model that underlies the agentic system, updating the agent's internal parameters or policies through methods like fine-tuning or reinforcement learning to better align with specific tasks.

Tool adaptation, on the other hand, shifts the focus to the environment surrounding the agent. Instead of retraining the large, expensive foundation model, developers optimize the external tools, such as search retrievers, memory modules, or sub-agents, while the main agent remains "frozen" (unchanged). This approach allows the system to evolve without the massive computational cost of retraining the core model.

The study further breaks these down into four distinct strategies:

A1: Tool execution signaled: In this strategy, the agent learns by doing. It is optimized using verifiable feedback directly from a tool's execution, such as a compiler running a script or a database returning query results. This teaches the agent the "mechanics" of using a tool correctly.

A prime example is DeepSeek-R1, where the model was trained through reinforcement learning with verifiable rewards to generate code that executes successfully in a sandbox. The feedback signal is binary and objective (did the code run, or did it crash?). This method builds strong low-level competence in stable, verifiable domains like coding or SQL.

A2: Agent output signaled: Here, the agent is optimized based on the quality of its final answer, regardless of the intermediate steps and the number of tool calls it makes. This teaches the agent how to orchestrate various tools to reach a correct conclusion.

An example is Search-R1, an agent that performs multi-step retrieval to answer questions. The model receives a reward only if the final answer is correct, implicitly forcing it to learn better search and reasoning strategies to maximize that reward. A2 is ideal for system-level orchestration, enabling agents to handle complex workflows.
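To make the difference between these two training signals concrete, here is a minimal, hypothetical sketch; it is not drawn from any of the cited systems, and the sandbox call and string match are simply stand-ins for where the reward comes from. In A1 the reward is read directly off the tool's execution, while in A2 only the final answer is scored, however many tool calls the agent made along the way.

```python
# Hypothetical sketch of the two agent-adaptation reward signals.
# None of this is from the paper or the cited systems; the sandbox and the
# string comparison are placeholders for where the training signal originates.

import subprocess
import sys


def a1_execution_reward(generated_code: str) -> float:
    """A1 (tool execution signaled): the reward comes from the tool itself.

    Here the 'tool' is a Python sandbox, and the signal is binary and
    objective: did the generated code run, or did it crash?
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", generated_code],
            capture_output=True,
            timeout=5,
        )
    except subprocess.TimeoutExpired:
        return 0.0
    return 1.0 if result.returncode == 0 else 0.0


def a2_outcome_reward(final_answer: str, gold_answer: str) -> float:
    """A2 (agent output signaled): only the final answer is scored.

    Intermediate steps -- how many searches or tool calls were made -- are
    never rewarded directly, so the agent must learn good orchestration
    implicitly to maximize this end-of-trajectory signal.
    """
    return 1.0 if final_answer.strip().lower() == gold_answer.strip().lower() else 0.0


print(a1_execution_reward("print(sum(range(10)))"))  # 1.0: the code executes
print(a1_execution_reward("print(undefined_var)"))   # 0.0: the code crashes
print(a2_outcome_reward("Paris", "paris"))           # 1.0: the answer matches
```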
T1: Agent-agnostic: In this category, tools are trained independently on broad data and then "plugged in" to a frozen agent. Think of the classic dense retrievers used in RAG systems: a standard retriever model is trained on generic search data, and a powerful frozen LLM can use it to find information, even though the retriever wasn't designed specifically for that LLM.

T2: Agent-supervised: This strategy involves training tools specifically to serve a frozen agent. The supervision signal comes from the agent's own output, creating a symbiotic relationship where the tool learns to provide exactly what the agent needs.

For example, the s3 framework trains a small "searcher" model to retrieve documents. The small model is rewarded based on whether a frozen "reasoner" (a large LLM) can answer the question correctly using those documents. The tool effectively adapts to fill the specific knowledge gaps of the main agent.
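The sketch below shows schematically what such an agent-supervised loop looks like. It is not the s3 code: the corpus, the toy scoring rule, and the stubbed-out frozen reasoner are placeholders, meant only to show that the reward flows from the frozen agent's answer back to the small, trainable tool, while the agent itself is never updated.

```python
# Schematic sketch of T2-style, agent-supervised tool training, in the spirit
# of s3 as described in the article. The searcher, the frozen reasoner, and
# the update rule are all toy placeholders -- this is not the s3 implementation.

from dataclasses import dataclass
import random


@dataclass
class Example:
    question: str
    gold_answer: str


def frozen_reasoner(question: str, documents: list[str]) -> str:
    """Stands in for a large, frozen LLM. It is never updated."""
    # Placeholder: pretend the reasoner answers correctly only when the
    # retrieved documents actually mention the answer.
    return "paris" if any("paris" in d for d in documents) else "unknown"


class SmallSearcher:
    """Stands in for the lightweight, trainable retrieval model."""

    def __init__(self, corpus: list[str]):
        self.corpus = corpus
        self.scores = {doc: 0.0 for doc in corpus}  # toy 'parameters'

    def retrieve(self, question: str, k: int = 2) -> list[str]:
        # Explore occasionally; otherwise return the highest-scored documents.
        if random.random() < 0.2:
            return random.sample(self.corpus, k)
        return sorted(self.corpus, key=self.scores.get, reverse=True)[:k]

    def update(self, documents: list[str], reward: float) -> None:
        # Toy credit assignment: nudge up documents that led to a correct
        # answer from the frozen reasoner. A real system would instead take
        # a policy-gradient step on the searcher's parameters.
        for doc in documents:
            self.scores[doc] += reward


corpus = [
    "the capital of france is paris",
    "bananas are rich in potassium",
    "the moon orbits the earth",
]
example = Example("What is the capital of France?", "paris")
searcher = SmallSearcher(corpus)

for step in range(50):
    docs = searcher.retrieve(example.question)
    answer = frozen_reasoner(example.question, docs)        # agent stays frozen
    reward = 1.0 if answer == example.gold_answer else 0.0  # agent-supervised signal
    searcher.update(docs, reward)                           # only the tool learns

print(searcher.retrieve(example.question, k=1))  # usually the document about Paris
```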
Complex AI systems might use a combination of these adaptation paradigms. For example, a deep research system might employ T1-style retrieval tools (pre-trained dense retrievers), T2-style adaptive search agents (trained via frozen-LLM feedback), and A1-style reasoning agents (fine-tuned with execution feedback) within a broader orchestrated system.

The hidden costs and tradeoffs

For enterprise decision-makers, choosing between these strategies often comes down to three factors: cost, generalization, and modularity.

Cost vs. flexibility: Agent adaptation (A1/A2) offers maximum flexibility because you are rewiring the agent's brain. However, the costs are steep. For instance, Search-R1 (an A2 system) required training on 170,000 examples to internalize search capabilities, which demands massive compute and specialized datasets. On the other hand, the resulting models can be much more efficient at inference time because they are much smaller than generalist models. In contrast, tool adaptation (T1/T2) is far more efficient. The s3 system (T2) trained a lightweight searcher using only 2,400 examples (roughly 70 times less data than Search-R1) while achieving comparable performance. By optimizing the ecosystem rather than the agent, enterprises can achieve high performance at a lower cost. However, this comes with an inference-time overhead, since s3 must coordinate with a larger model.

Generalization: A1 and A2 methods risk "overfitting," where an agent becomes so specialized in one task that it loses general capabilities. The study found that while Search-R1 excelled at its training tasks, it struggled with specialized medical QA, achieving only 71.8% accuracy. This is not a problem when your agent is designed to perform a very specific set of tasks. Conversely, the s3 system (T2), which paired a general-purpose frozen agent with a trained tool, generalized better, achieving 76.6% accuracy on the same medical tasks. The frozen agent retained its broad world knowledge, while the tool handled the specific retrieval mechanics. However, T1/T2 systems rely on the knowledge of the frozen agent: if the underlying model can't handle the task, they will be useless.

Modularity: T1/T2 strategies enable "hot-swapping." You can upgrade a memory module or a searcher without touching the core reasoning engine. For example, Memento optimizes a memory module to retrieve past cases; if requirements change, you update the module, not the planner. A1 and A2 systems, by contrast, are monolithic. Teaching an agent a new skill (like coding) via fine-tuning can cause "catastrophic forgetting," where it degrades on previously learned skills (like math) because its internal weights are overwritten.

A strategic framework for enterprise adoption

Based on the study, developers should view these strategies as a progressive ladder, moving from low-risk, modular solutions to high-resource customization.

Start with T1 (agent-agnostic tools): Equip a frozen, powerful model (like Gemini or Claude) with off-the-shelf tools such as a dense retriever or an MCP connector. This requires zero training and is perfect for prototyping and general applications. It is the low-hanging fruit that can take you very far for most tasks.

Move to T2 (agent-supervised tools): If the agent struggles to use generic tools, don't retrain the main model. Instead, train a small, specialized sub-agent (like a searcher or memory manager) to filter and format data exactly the way the main agent needs it. This is highly data-efficient and suitable for proprietary enterprise data and for applications that are high-volume and cost-sensitive.

Use A1 (tool execution signaled) for specialization: If the agent fundamentally fails at technical tasks (e.g., writing non-functional code or making wrong API calls), you must rewire its understanding of the tool's "mechanics." A1 is best for creating specialists in verifiable domains like SQL, Python, or your proprietary tools. For example, you can optimize a small model for your specific toolset and then use it as a T1 plugin for a generalist model.

Reserve A2 (agent output signaled) as the "nuclear option": Only train a monolithic agent end-to-end if you need it to internalize complex strategy and self-correction. This is resource-intensive and rarely necessary for standard enterprise applications. In reality, you rarely need to train your own model.

As the AI landscape matures, the focus is shifting from building one giant, perfect model to constructing a smart ecosystem of specialized tools around a stable core. For most enterprises, the most effective path to agentic AI isn't building a bigger brain but giving the brain better tools.
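To ground the "hot-swapping" modularity point made above, here is a minimal, hypothetical sketch of what better tools around a frozen core can look like in code. The class names and toy retrievers are illustrative, not from the paper: the idea is simply that because tools share a small interface, a T1-style generic retriever can be swapped for a T2-style tuned one without touching the agent.

```python
# Hypothetical sketch of tool modularity: the agent stays frozen behind a
# small tool interface, and retrievers can be swapped around it.

from typing import Protocol


class Retriever(Protocol):
    def retrieve(self, query: str) -> list[str]: ...


class KeywordRetriever:
    """A generic, off-the-shelf style retriever (T1)."""

    def __init__(self, corpus: list[str]):
        self.corpus = corpus

    def retrieve(self, query: str) -> list[str]:
        terms = query.lower().split()
        return [d for d in self.corpus if any(t in d.lower() for t in terms)]


class TunedRetriever(KeywordRetriever):
    """A drop-in replacement tuned against the agent's feedback (T2)."""

    def retrieve(self, query: str) -> list[str]:
        # Imagine a learned re-ranking step here; the interface is unchanged.
        return super().retrieve(query)[:1]


class FrozenAgent:
    """The core reasoning model; it never changes when tools are swapped."""

    def __init__(self, retriever: Retriever):
        self.retriever = retriever

    def answer(self, question: str) -> str:
        context = self.retriever.retrieve(question)
        return f"Answer based on: {context}"  # placeholder for an LLM call


corpus = ["Paris is the capital of France.", "The Seine flows through Paris."]
agent = FrozenAgent(KeywordRetriever(corpus))
print(agent.answer("capital of France"))

# Upgrade the tool, not the agent: swap in the tuned retriever and keep the
# same frozen core.
agent.retriever = TunedRetriever(corpus)
print(agent.answer("capital of France"))
```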