I benchmarked Claude Code's caveman plugin against "be brief."

Caveman, a popular Claude Code compression plugin, vs. "be brief." 24 prompts, six categories, five arms. The two-word prompt matched it on tokens and quality.

Repo

Caveman is a popular Claude Code compression plugin. The pitch is in the name: ultra-compressed responses, ~75% fewer tokens, all the technical accuracy. Six modes, slash commands, intensity dials, classical Chinese variants.

I benchmarked it against two words: "be brief."

Same quality. Same range of tokens. The plugin didn't beat the boring default on either axis.

This article is the long version of the video. If you want the verdict in two minutes, watch it.

What I tested

Category Failure mode Skill claim tested n Bug diagnosis Drops the why, gives fix without cause — 5 Concept explanation Strips nuance, edge cases, or compresses technical terms into plain English Technical terms exact 5 Architectural tradeoffs Drops caveats that change the advice — 4 Multi-step setup Collapses or reorders steps — 4 Security / destructive ops Missing warnings on irreversible actions Auto-Clarity escape 3 Error interpretation Paraphrases or truncates the error string Errors quoted exact 3

24 prompts across six categories: bug diagnosis, concept explanations, architecture tradeoffs, multi-step setup, security and destructive ops, error interpretation. Each prompt has a per-prompt rubric. Facts the answer must cover ( key_points ), terms it must use ( must_use_terms ), and dangerous wrong claims to avoid ( must_avoid ).

The dataset shape:

... continue reading