A traditional RAG system often looks like the search we’ve built for years: an LLM calls a search API, with query understanding and reranking functionality under the hood, all optimized for user engagement.
This thick-daddy search API does quite a bit, as shown below:
With small modifications (chunking, a focus on vector retrieval), RAG implementations look familiar.
Agents, however, come with the ability to reason. And like human users, they might try a query, see the results, adjust their query, and zero in on the best search terms to find what they want.
Like human users, the agent can build a “mental model” of how the search tool works. If there’s a clear, predictable correspondence between stimulus and output, the agent will figure it out. However, a large, complex system might not be as transparent.
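To make that loop concrete, here’s a minimal sketch of the try-inspect-refine cycle. `ask_llm` is a hypothetical stand-in for whatever LLM client you use, and `search_products` is the dumb search tool defined later in this post; neither is meant as a real API.

```python
# A sketch of the agent's query-refinement loop. `ask_llm` is a
# hypothetical LLM call, not a real API; search_products is defined below.
def refine_and_search(intent: str, max_attempts: int = 3) -> list:
    query = intent
    results = []
    for _ in range(max_attempts):
        results = search_products(query)
        verdict = ask_llm(
            f"The user wants: {intent}\n"
            f"Query tried: {query}\nResults: {results}\n"
            "Reply GOOD if these satisfy the user, otherwise propose a better query."
        )
        if verdict.strip() == "GOOD":
            break
        query = verdict  # the agent zeroes in on better search terms
    return results
```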
Because agents rely on this kind of mental model, I’d argue the traditional, thick search APIs are counterproductive for agents. They may be too complex for an agent to reason about effectively.
So lately, I’ve tried a rather stupid search backend. More like this image:
I tried this out in the furniture search examples I use in my Cheat at Search course.
Instead of a big, complex search system, I tell the agent about this dumb thing it can use to search. It then applies its own intelligence to use the tool correctly.
```python
from typing import Dict, List


def search_products(query: str, top_k: int = 5) -> List[Dict]:
    """Search for furniture products.

    This is direct keyword search: no synonyms, only BM25 scoring on
    product name and description, and basic snowball tokenization of
    query and document.

    Args:
        query: The search query string.
        top_k: The number of top results to return.

    Returns:
        A list of dictionaries containing product information.
    """
    ...
```
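If you’re curious what could sit behind that signature, here’s one possible sketch, not the course’s exact code. It assumes a small in-memory `products` catalog with `name` and `description` fields, the `rank_bm25` package for scoring, and NLTK’s snowball stemmer for tokenization.

```python
# One possible body for search_products -- a sketch, not the course code.
# Assumes a small in-memory catalog plus rank_bm25 and nltk installed.
import re
from typing import Dict, List

from nltk.stem.snowball import SnowballStemmer
from rank_bm25 import BM25Okapi

stemmer = SnowballStemmer("english")


def tokenize(text: str) -> List[str]:
    # Lowercase, split on non-alphanumeric runs, snowball-stem each token.
    return [stemmer.stem(tok) for tok in re.findall(r"[a-z0-9]+", text.lower())]


# Hypothetical catalog; in practice this would be the furniture dataset.
products: List[Dict] = [
    {"name": "Mid-century walnut coffee table",
     "description": "Solid walnut, tapered legs."},
    {"name": "Velvet loveseat",
     "description": "Two-seat sofa in emerald velvet."},
]

bm25 = BM25Okapi([tokenize(p["name"] + " " + p["description"]) for p in products])


def search_products(query: str, top_k: int = 5) -> List[Dict]:
    scores = bm25.get_scores(tokenize(query))
    ranked = sorted(range(len(products)), key=lambda i: scores[i], reverse=True)
    return [products[i] for i in ranked[:top_k]]
```

Note how little there is for the agent to model: tokenize, stem, BM25, done. Every quirk in the results traces back to one of those three steps.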