
MCP Server Is Eating Your Context Window. There's a Simpler Way

Why This Matters

This article highlights a critical challenge in scaling MCP (Model Context Protocol) servers for AI agents: the context-window limits of large language models like Claude. Tool definitions from connected integrations consume a significant portion of the context window before any conversation begins, limiting the agent's capacity for meaningful reasoning and impacting both the efficiency and effectiveness of AI applications in the industry.

Key Takeaways

The problem nobody talks about at demo scale

Here's a scenario that'll feel familiar if you've wired up MCP servers for anything beyond a demo.

You connect GitHub, Slack, and Sentry. Three services, maybe 40 tools total. Before your agent has read a single user message, 55,000 tokens of tool definitions are sitting in the context window. That's over a quarter of Claude's 200k limit. Gone.

It gets worse. Each MCP tool costs 550–1,400 tokens for its name, description, JSON schema, field descriptions, enums, and system instructions. Connect a real API surface, say a SaaS platform with 50+ endpoints, and you're looking at 50,000+ tokens just to describe what the agent could do, with almost nothing left for what it should do.
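To get a feel for where those tokens go, here's a rough back-of-envelope sketch. It uses the common ~4-characters-per-token heuristic, and the tool definition shown is a hypothetical example in the shape of an MCP tool schema, not taken from any real server:

```python
import json

def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 chars/token rule of thumb."""
    return len(text) // 4

# Hypothetical MCP-style tool definition: the name, description, and
# JSON schema all ride along in the context window before the agent
# has read a single user message.
tool_definition = {
    "name": "github_create_issue",
    "description": (
        "Create a new issue in a GitHub repository. Requires write access. "
        "Returns the issue number and URL on success."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "owner": {"type": "string", "description": "Repository owner"},
            "repo": {"type": "string", "description": "Repository name"},
            "title": {"type": "string", "description": "Issue title"},
            "body": {"type": "string", "description": "Issue body in Markdown"},
            "labels": {
                "type": "array",
                "items": {"type": "string"},
                "description": "Labels to apply to the issue",
            },
        },
        "required": ["owner", "repo", "title"],
    },
}

per_tool = estimate_tokens(json.dumps(tool_definition))
print(f"~{per_tool} tokens for one modest tool")
print(f"~{per_tool * 40} tokens for 40 such tools")
```

Even this deliberately small example lands in the low hundreds of tokens; real tool definitions with longer descriptions, enums, and nested schemas are what push the per-tool cost toward the 550–1,400 range the article cites.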

One team reported three MCP servers consuming 143,000 of 200,000 tokens. That's 72% of the context window burned on tool definitions. The agent had 57,000 tokens left for the actual conversation, retrieved documents, reasoning, and response. Good luck building anything useful in that space.

This isn't a theoretical concern. David Zhang (@dzhng), building Duet, described ripping out their MCP integrations entirely, even after getting OAuth and dynamic client registration working. The tradeoff was impossible:

Load everything up front → lose working memory for reasoning and history

Limit integrations → agent can only talk to a few services

Build dynamic tool loading → add latency and middleware complexity

He called it a "trilemma." That feels about right.
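The third leg of that trilemma is worth sketching to see where the latency and middleware complexity come from. A minimal version of dynamic tool loading keeps the full catalog out of the prompt and injects only the tools relevant to the current request. Everything here is illustrative: the tool names are made up, and the keyword-overlap scoring stands in for the embedding-based retrieval a production system would more likely use:

```python
# Minimal sketch of dynamic tool loading: keep the full tool catalog
# out of the context window and select a few tools per request.
# Tool names and the keyword heuristic are hypothetical; a real system
# would likely use embedding similarity and cache the selection.

CATALOG = {
    "github_create_issue": "Create a new issue in a GitHub repository",
    "github_list_prs": "List open pull requests in a repository",
    "slack_post_message": "Post a message to a Slack channel",
    "sentry_get_errors": "Fetch recent error events from Sentry",
}

def select_tools(user_message: str, max_tools: int = 2) -> list[str]:
    """Score each tool by word overlap with the request; keep the top few."""
    words = set(user_message.lower().split())
    scored = []
    for name, description in CATALOG.items():
        overlap = len(words & set(description.lower().split()))
        if overlap:
            scored.append((overlap, name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:max_tools]]

# Only the selected tools' definitions get loaded into the context,
# at the cost of an extra selection step (latency) before each turn.
print(select_tools("post a message to the team slack channel"))
```

The selection step is exactly the middleware the trilemma warns about: it runs before every turn, it can pick the wrong tools, and it adds a moving part between the agent and its integrations, which is why teams like Zhang's weigh it against simply connecting fewer services.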
