I benchmarked Claude Code's caveman plugin against "be brief."

Why This Matters

This benchmark found that Caveman, a popular Claude Code compression plugin, matched a two-word "be brief" prompt on both token count and response quality, which suggests the plugin offers little advantage on technical tasks. For developers, the result is a reason to try a plain instruction before reaching for a dedicated compression tool.

Key Takeaways

Caveman, a popular Claude Code compression plugin, vs. "be brief." 24 prompts, six categories, five arms. The two-word prompt matched it on tokens and quality.


Caveman is a popular Claude Code compression plugin. The pitch is in the name: ultra-compressed responses, ~75% fewer tokens, all the technical accuracy. Six modes, slash commands, intensity dials, classical Chinese variants.

I benchmarked it against two words: "be brief."

Same quality. Same range of tokens. The plugin didn't beat the boring default on either axis.

This article is the long version of the video. If you want the verdict in two minutes, watch it.

What I tested

| Category | Failure mode | Skill claim tested | n |
|---|---|---|---|
| Bug diagnosis | Drops the why, gives fix without cause | - | 5 |
| Concept explanation | Strips nuance, edge cases, or compresses technical terms into plain English | Technical terms exact | 5 |
| Architectural tradeoffs | Drops caveats that change the advice | - | 4 |
| Multi-step setup | Collapses or reorders steps | - | 4 |
| Security / destructive ops | Missing warnings on irreversible actions | Auto-Clarity escape | 3 |
| Error interpretation | Paraphrases or truncates the error string | Errors quoted exact | 3 |

24 prompts across six categories: bug diagnosis, concept explanations, architecture tradeoffs, multi-step setup, security and destructive ops, error interpretation. Each prompt has its own rubric: facts the answer must cover (key_points), terms it must use (must_use_terms), and dangerous wrong claims to avoid (must_avoid).
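To make the three rubric fields concrete, here is a minimal Python sketch of how one rubric could be represented and partially checked. Only the field names key_points, must_use_terms, and must_avoid come from the article. The Rubric class, the check_rubric_terms helper, the substring matching, and the sample values are all hypothetical illustrations, not the benchmark's actual data or grading method.

```python
from dataclasses import dataclass, field


@dataclass
class Rubric:
    """Hypothetical container for one prompt's rubric (field names from the article)."""
    key_points: list[str]                 # facts the answer must cover
    must_use_terms: list[str]             # exact technical terms the answer must contain
    must_avoid: list[str] = field(default_factory=list)  # dangerous wrong claims


def check_rubric_terms(answer: str, rubric: Rubric) -> dict[str, list[str]]:
    """Naive, case-insensitive substring check (an assumption, not the author's grader).

    key_points are deliberately not scored here: judging whether a fact is
    covered needs a human or model grader, not string matching.
    """
    text = answer.lower()
    missing = [t for t in rubric.must_use_terms if t.lower() not in text]
    violations = [c for c in rubric.must_avoid if c.lower() in text]
    return {"missing_terms": missing, "violations": violations}


# Hypothetical sample rubric for a bug-diagnosis prompt.
sample = Rubric(
    key_points=["explains why the loop reads past the end of the list"],
    must_use_terms=["IndexError", "off-by-one"],
    must_avoid=["ignore the error"],
)

report = check_rubric_terms("An IndexError caused by an off-by-one bound.", sample)
print(report)
```

A check like this only catches the mechanical parts of a rubric. Whether a compressed answer still covers a key point is a judgment call, which is why the sketch leaves key_points unscored.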

The dataset shape:

... continue reading