
Apideck CLI – An AI-agent interface with much lower context consumption than MCP

Why This Matters

The article highlights the significant token consumption challenges faced by MCP-based AI agents when integrating multiple tools and services, which limits their reasoning capacity and scalability. Apideck's CLI offers a more efficient alternative, drastically reducing context usage and enabling more practical, scalable AI integrations for the industry and consumers alike.

Key Takeaways

The problem nobody talks about at demo scale

Here's a scenario that'll feel familiar if you've wired up MCP servers for anything beyond a demo.

You connect GitHub, Slack, and Sentry. Three services, maybe 40 tools total. Before your agent has read a single user message, 55,000 tokens of tool definitions are sitting in the context window. That's over a quarter of Claude's 200k limit. Gone.

It gets worse. Each MCP tool costs 550–1,400 tokens for its name, description, JSON schema, field descriptions, enums, and system instructions. Connect a real API surface, say a SaaS platform with 50+ endpoints, and you're looking at 50,000+ tokens just to describe what the agent could do, with almost nothing left for what it should do.
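The overhead math above is easy to reproduce. A minimal sketch, using the article's per-tool cost range (550–1,400 tokens) and the 200k context window it cites; the tool counts are illustrative:

```python
# Back-of-the-envelope estimate of context consumed by MCP tool
# definitions alone. Per-tool token costs (550-1,400) and the 200k
# context window are the figures cited in the article.
TOKENS_PER_TOOL_LOW, TOKENS_PER_TOOL_HIGH = 550, 1_400
CONTEXT_WINDOW = 200_000

def definition_overhead(num_tools: int) -> tuple[int, int]:
    """Return (low, high) token estimates for num_tools tool definitions."""
    return num_tools * TOKENS_PER_TOOL_LOW, num_tools * TOKENS_PER_TOOL_HIGH

for tools in (40, 50):
    low, high = definition_overhead(tools)
    print(f"{tools} tools: {low:,}-{high:,} tokens "
          f"({low / CONTEXT_WINDOW:.0%}-{high / CONTEXT_WINDOW:.0%} of context)")
```

The 55,000 tokens for 40 tools mentioned above sits near the top of the 22,000–56,000 range this estimate gives, i.e. tools skewing toward the heavy end of the per-tool cost.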

One team reported three MCP servers consuming 143,000 of 200,000 tokens. That's 72% of the context window burned on tool definitions. The agent had 57,000 tokens left for the actual conversation, retrieved documents, reasoning, and response. Good luck building anything useful in that space.
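Checking the reported figures directly (both numbers are from the article; the "72%" is their rounding of 71.5%):

```python
# Sanity check of the reported case: three MCP servers consuming
# 143,000 tokens of a 200,000-token context window.
CONTEXT_WINDOW = 200_000
DEFINITION_TOKENS = 143_000

percent_used = 100 * DEFINITION_TOKENS / CONTEXT_WINDOW  # 71.5, the article's "72%"
remaining = CONTEXT_WINDOW - DEFINITION_TOKENS           # 57,000 tokens left over
print(f"{percent_used:.1f}% used, {remaining:,} tokens remaining")
```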

This isn't a theoretical concern. David Zhang (@dzhng), building Duet, described ripping out their MCP integrations entirely, even after getting OAuth and dynamic client registration working. The tradeoff was impossible:

Load everything up front → lose working memory for reasoning and history

Limit integrations → agent can only talk to a few services

Build dynamic tool loading → add latency and middleware complexity

He called it a "trilemma." That feels about right.
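The third arm of the trilemma can be made concrete. This is a hypothetical sketch of deferred tool loading, not part of MCP or any real library: the agent keeps only a lightweight index in context and a loader fetches the full schema on first use. All class and function names here are illustrative.

```python
# Hypothetical sketch of "dynamic tool loading": the agent's context
# holds only cheap stubs; the expensive JSON schema is fetched lazily.
from dataclasses import dataclass
from typing import Callable

@dataclass
class ToolStub:
    name: str
    one_line_summary: str  # a handful of tokens instead of 550-1,400

@dataclass
class ToolDefinition:
    name: str
    description: str
    json_schema: dict  # the expensive part, loaded on demand

class LazyToolRegistry:
    def __init__(self, stubs: list[ToolStub],
                 fetch_definition: Callable[[str], ToolDefinition]):
        self._stubs = {s.name: s for s in stubs}
        self._fetch = fetch_definition  # a network call in practice
        self._cache: dict[str, ToolDefinition] = {}

    def index(self) -> list[ToolStub]:
        """Cheap listing the agent keeps in context at all times."""
        return list(self._stubs.values())

    def resolve(self, name: str) -> ToolDefinition:
        """Load the full schema only when the agent decides to call the tool."""
        if name not in self._cache:
            self._cache[name] = self._fetch(name)  # the added round trip
        return self._cache[name]
```

The upfront context cost drops to a few tokens per tool, but every first use of a tool pays an extra round trip through the loader — exactly the "latency and middleware complexity" named above.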
