{"id":131,"date":"2026-04-21T06:47:52","date_gmt":"2026-04-21T06:47:52","guid":{"rendered":"https:\/\/abrarqasim.com\/blog\/?p=131"},"modified":"2026-04-21T06:47:53","modified_gmt":"2026-04-21T06:47:53","slug":"i-stopped-claude-from-rambling-and-cut-my-output-tokens-by-75","status":"publish","type":"post","link":"https:\/\/abrarqasim.com\/blog\/i-stopped-claude-from-rambling-and-cut-my-output-tokens-by-75\/","title":{"rendered":"I Stopped Claude From Rambling and Cut My Output Tokens by 75%"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">I used to watch Claude Code stream paragraphs of <em>&#8220;Let me think about this carefully&#8230;&#8221;<\/em> before every answer. Polite. Thorough. Also expensive.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Then I found Caveman, a one-line Claude Code plugin that just makes it stop.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The problem nobody talks about<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Output tokens cost more than input tokens. They also take longer to stream, and they bury the actual answer under filler. On a busy coding day, most of what Claude writes isn&#8217;t code. It&#8217;s hedging. <em>&#8220;I think we should probably consider&#8230;&#8221;<\/em> in front of every real sentence.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What Caveman does<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Caveman is a Claude Code skill that also works with Codex, Gemini, Cursor, Windsurf, Cline, and Copilot. It makes the agent drop articles, filler, and pleasantries while leaving code, paths, and commands alone. The tagline is perfect:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\">why use many token when few token do trick.<\/p>\n<\/blockquote>\n\n\n\n<p class=\"wp-block-paragraph\">Same technical answer, fraction of the words. The author&#8217;s benchmark across 10 real coding tasks: 1,214 tokens down to 294 on average, roughly a 75% drop. 
Some tasks hit 87%.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Install it in one line<\/h2>\n\n\n\n<pre class=\"wp-block-code\"><code>claude plugin marketplace add JuliusBrussee\/caveman\nclaude plugin install caveman@caveman\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">That&#8217;s it. A SessionStart hook auto-activates the skill, and a <code>[CAVEMAN]<\/code> badge shows up in the statusline so you know it&#8217;s live.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Three intensities<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>\/caveman lite<\/code> drops filler but keeps grammar<\/li>\n\n\n\n<li><code>\/caveman full<\/code> is the default, dropping articles and hedging<\/li>\n\n\n\n<li><code>\/caveman ultra<\/code> goes fully telegraphic, heavily abbreviated<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">Say <em>&#8220;stop caveman&#8221;<\/em> or <code>\/caveman off<\/code> to go back to normal.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Before and after<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Normal Claude:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\">&#8220;I think the issue here is that your React component is re-rendering unnecessarily because the dependency array includes an object reference that changes on every render. You should consider memoizing it with <code>useMemo<\/code>.&#8221;<\/p>\n<\/blockquote>\n\n\n\n<p class=\"wp-block-paragraph\">Caveman Claude:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\">&#8220;Object in deps changes every render. Wrap in <code>useMemo<\/code>. Fix re-render.&#8221;<\/p>\n<\/blockquote>\n\n\n\n<p class=\"wp-block-paragraph\">Same fix. 
Seven seconds of reading instead of twenty.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Bonus features worth knowing<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Three extras punch above their weight.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><code>\/caveman-commit<\/code> writes Conventional Commit subjects capped at 50 chars. <code>\/caveman-review<\/code> spits out PR comments like <code>L42: \ud83d\udd34 bug: user null. Add guard.<\/code>. And <code>\/caveman:compress CLAUDE.md<\/code> rewrites your context file in caveman-speak for roughly 46% input-token savings. It keeps a <code>CLAUDE.original.md<\/code> backup, so nothing is lost.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">One honest caveat<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Caveman trims output. It does not change how the model thinks. The author is upfront: <em>&#8220;Caveman no make brain smaller. Caveman make mouth smaller.&#8221;<\/em> If your bottleneck is model capability, this won&#8217;t help you. And on tricky debugging, ultra mode can feel too terse when you actually need to follow the reasoning. Worth knowing when to flip it off.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Takeaway<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Most prompt-engineering tricks are fiddly. This is a one-line install that saves tokens and reads faster. Install it, run <code>\/caveman<\/code>, and see how much you actually missed the filler.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Repo: <a href=\"https:\/\/github.com\/JuliusBrussee\/caveman\" target=\"_blank\" rel=\"noopener\">github.com\/JuliusBrussee\/caveman<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>I used to watch Claude Code stream paragraphs of &#8220;Let me think about this carefully&#8230;&#8221; before every answer. Polite. Thorough. Also expensive. 
Then I found&hellip;<\/p>\n","protected":false},"author":2,"featured_media":132,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"rank_math_title":"Caveman Plugin: Cut Claude Code Output Tokens by 75% %sep% %sitename%","rank_math_description":"Caveman is a one-line Claude Code plugin that slashes output tokens up to 75% without losing code quality. See install, commands, and real benchmarks.","rank_math_focus_keyword":"","rank_math_canonical_url":"","rank_math_robots":"","footnotes":""},"categories":[4],"tags":[114,109,108,110,116,115,112,113,111],"class_list":["post-131","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai","tag-ai-coding-tools","tag-caveman-plugin","tag-claude-code","tag-claude-plugin","tag-claude-tips","tag-developer-productivity-2","tag-llm-cost-reduction","tag-prompt-engineering-2","tag-token-optimization"],"_links":{"self":[{"href":"https:\/\/abrarqasim.com\/blog\/wp-json\/wp\/v2\/posts\/131","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/abrarqasim.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/abrarqasim.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/abrarqasim.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/abrarqasim.com\/blog\/wp-json\/wp\/v2\/comments?post=131"}],"version-history":[{"count":1,"href":"https:\/\/abrarqasim.com\/blog\/wp-json\/wp\/v2\/posts\/131\/revisions"}],"predecessor-version":[{"id":133,"href":"https:\/\/abrarqasim.com\/blog\/wp-json\/wp\/v2\/posts\/131\/revisions\/133"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/abrarqasim.com\/blog\/wp-json\/wp\/v2\/media\/132"}],"wp:attachment":[{"href":"https:\/\/abrarqasim.com\/blog\/wp-json\/wp\/v2\/media?parent=131"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/abrarqasim.com\/blog\/wp-json\/wp\/v2\/categories?post=131"},{
"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/abrarqasim.com\/blog\/wp-json\/wp\/v2\/tags?post=131"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}