The problem nobody talks about at demo scale
Here's a scenario that'll feel familiar if you've wired up MCP servers for anything beyond a demo.
You connect GitHub, Slack, and Sentry. Three services, maybe 40 tools total. Before your agent has read a single user message, 55,000 tokens of tool definitions are sitting in the context window. That's over a quarter of Claude's 200k limit. Gone.
It gets worse. Each MCP tool costs 550–1,400 tokens for its name, description, JSON schema, field descriptions, enums, and system instructions. Connect a real API surface, say a SaaS platform with 50+ endpoints, and you're looking at 50,000+ tokens just to describe what the agent could do, with almost nothing left for what it should do.
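To make that arithmetic concrete, here's a rough back-of-envelope estimator. The per-tool token range comes from the figures above; the function name and the 1,000-token midpoint are my own illustrative choices, not measurements:

```python
def context_spent_on_tools(num_tools, tokens_per_tool=1_000, context_window=200_000):
    """Estimate how much of the context window tool definitions consume.

    tokens_per_tool defaults to a rough midpoint of the 550-1,400 token
    range observed per MCP tool (name, description, JSON schema, enums).
    """
    spent = num_tools * tokens_per_tool
    return spent, spent / context_window

# 50 tools at ~1,000 tokens each eats a quarter of a 200k window
spent, frac = context_spent_on_tools(50)
print(f"{spent:,} tokens ({frac:.0%} of context)")  # → 50,000 tokens (25% of context)
```

The point isn't precision; it's that the cost scales linearly with tool count before the agent does anything at all.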
One team reported three MCP servers consuming 143,000 of 200,000 tokens. That's 72% of the context window burned on tool definitions. The agent had 57,000 tokens left for the actual conversation, retrieved documents, reasoning, and response. Good luck building anything useful in that space.
This isn't a theoretical concern. David Zhang (@dzhng), building Duet, described ripping out their MCP integrations entirely, even after getting OAuth and dynamic client registration working. The tradeoff was impossible:
Load everything up front → lose working memory for reasoning and history
Limit integrations → agent can only talk to a few services
Build dynamic tool loading → add latency and middleware complexity
He called it a "trilemma." That feels about right.
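The third horn, dynamic tool loading, is roughly this pattern: keep only one-line tool stubs in the prompt and fetch a tool's full schema on demand. A minimal sketch of the idea, with all names illustrative rather than taken from any real MCP SDK:

```python
class LazyToolRegistry:
    """Holds one-line tool stubs; loads full JSON schemas only when needed."""

    def __init__(self, loader):
        self._loader = loader   # callable: tool name -> full definition
        self._stubs = {}        # name -> short summary (cheap, stays in context)
        self._loaded = {}       # name -> full schema (expensive, cached)

    def register(self, name, summary):
        self._stubs[name] = summary

    def list_stubs(self):
        # This is all the prompt sees: a handful of tokens per tool.
        return [f"{name}: {summary}" for name, summary in self._stubs.items()]

    def get_definition(self, name):
        # Full schema fetched lazily -- the extra round trip is the
        # latency and middleware cost the trilemma refers to.
        if name not in self._loaded:
            self._loaded[name] = self._loader(name)
        return self._loaded[name]
```

The context savings are real, but every first use of a tool now pays a fetch, and something has to sit between the agent and its servers to mediate that, which is exactly the complexity being traded for.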