As we all learn more about Context Engineering for LLMs (see Anthropic’s post for an excellent primer), a few important constraints have become clear:

- Conversations should be append-only to maximize prompt cacheability.
- Models are typically more responsive to “fresh” context near the end of the window.
- Models tend to perform worse when overwhelmed with large amounts of context.
With this in mind, a key tension comes into focus: the model needs access to all valuable context, BUT ONLY when that context is relevant to the task at hand.
Context engineering is effectively the practice of finding ways to manage this tension. Popular solutions include:
- Retrieval Augmented Generation (RAG), which attempts to dynamically discover and load context relevant to the current query.
- Subagents, which encapsulate specialized instructions and tools to avoid polluting the main thread.
- get_* Tools, which allow the model to proactively request information that it deems relevant via tool calls (see the sketch after this list).
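To make the get_* pattern concrete, here’s a minimal sketch of what such a tool might look like, using an Anthropic-style tool definition. The tool name get_document and the toy DOCS store are hypothetical stand-ins for a real knowledge base.

```python
# A hypothetical get_* tool: the model sees only the short description up
# front, and pulls in the full document text only when it decides it's needed.
GET_DOCUMENT_TOOL = {
    "name": "get_document",
    "description": "Fetch the full text of an internal document by its ID.",
    "input_schema": {
        "type": "object",
        "properties": {
            "doc_id": {
                "type": "string",
                "description": "ID of the document to load, e.g. 'auth-guide'.",
            }
        },
        "required": ["doc_id"],
    },
}

# Toy in-memory store standing in for a real document source.
DOCS = {
    "auth-guide": "How to configure authentication for the API...",
    "rate-limits": "Request quotas, rate limits, and backoff strategies...",
}

def get_document(doc_id: str) -> str:
    """Handle a get_document tool call; errors come back as text the model can act on."""
    return DOCS.get(doc_id, f"No document found with id {doc_id!r}.")
```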
There’s one technique that I feel is woefully underutilized by agents today: the humble hyperlink.
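As a rough sketch of what hyperlink-driven context loading could look like, here’s a single fetch tool that returns a page’s text with its links rewritten as markdown, so the model can spot a relevant link and request it in a follow-up tool call. The function name fetch_page is hypothetical, and the implementation assumes the requests and beautifulsoup4 packages.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

def fetch_page(url: str) -> str:
    """Fetch a URL and return its text with hyperlinks preserved as markdown."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    # Rewrite each anchor as [text](absolute-url) so the links survive the
    # HTML-to-text conversion and the model can follow any of them next turn.
    for a in soup.find_all("a", href=True):
        a.replace_with(f"[{a.get_text(strip=True)}]({urljoin(url, a['href'])})")
    return soup.get_text()
```

Each page then carries its own pointers to related context, and the model loads exactly the documents it chooses to follow, one link at a time.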
The obligatory human analogy
If you, a human, need to learn something without an LLM (let’s say something about an open source library), you will probably follow a trajectory that looks something like the following:
1. Do a Google search for the topic you need to understand