
I benchmarked Claude Code's caveman plugin against "be brief."

https://www.maxtaylor.me/articles/i-benchmarked-caveman-against-two-words
35•max-t-dev•3h ago

Comments

max-t-dev•3h ago
Author here. Caveman is a popular Claude Code plugin that compresses Claude's responses via a custom skill with intensity modes. I wanted to know whether it actually beats the simplest possible alternative: prepending "be brief." to prompts.

24 prompts, 5 arms, judged by a separate Claude against per-prompt rubrics covering required facts, required terms, and dangerous wrong claims to avoid. 120 scored responses, 100% key-point coverage across every arm, zero must_avoid triggers. Headline: "be brief." matched caveman on tokens (419 vs 401-449) and quality (0.985 vs 0.970-0.976).

Caveman has real value beyond compression: consistent output structure, intensity modes, the Auto-Clarity safety escape. But the compression itself isn't the differentiator I expected.

The harness is open source and strategy-agnostic if anyone wants to add an arm: https://github.com/max-taylor/cc-compression-bench

Happy to answer questions about methodology, the per-category variance findings, or the bits I cut from the writeup.
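For anyone thinking of adding an arm, the scoring loop described above has roughly this shape. A minimal sketch, not the actual harness code: the names (`run_arm`, `judge`) and the rubric fields are illustrative assumptions, see the linked repo for the real thing.

```python
from statistics import mean

ARMS = ["control", "be_brief", "caveman_low", "caveman_med", "caveman_high"]

def run_arm(arm: str, prompt: str) -> str:
    # Placeholder: the real harness runs the prompt through Claude Code
    # with the arm's compression strategy applied (e.g. a "be brief." prefix).
    return f"[{arm}] response to: {prompt}"

def judge(response: str, rubric: dict) -> dict:
    # Placeholder: the real harness asks a separate Claude to grade the
    # response against the per-prompt rubric and return a structured verdict.
    return {"key_points": 1.0, "must_avoid_triggered": False}

# 24 (prompt, rubric) pairs in the real benchmark; one toy entry here.
prompts = [
    ("explain TCP slow start", {
        "required_facts": ["congestion window roughly doubles per RTT"],
        "required_terms": ["cwnd"],
        "must_avoid": ["confusing slow start with the receive window"],
    }),
]

scores, tokens = {a: [] for a in ARMS}, {a: [] for a in ARMS}
for prompt, rubric in prompts:      # 24 prompts x 5 arms = 120 responses
    for arm in ARMS:
        response = run_arm(arm, prompt)
        verdict = judge(response, rubric)
        scores[arm].append(verdict["key_points"])
        tokens[arm].append(len(response.split()))  # crude token proxy

for arm in ARMS:
    print(f"{arm}: quality={mean(scores[arm]):.3f} tokens={mean(tokens[arm]):.1f}")
```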
dataviz1000•59m ago
> there was 1 run per prompt per arm

My understanding is that there was only 1 run per configuration?

If that is correct, then because of run-to-run variability it really doesn't say much. It will take several trials per prompt per arm before the results look like they're stabilizing on a plot. That's prohibitively expensive, so I've been running the same prompt with the same model 5 times to get a visual understanding of performance.

Someone did the same with lambda calculus yesterday. I wanted to make the point about how much run-to-run variability and cost difference you see with the same prompt and the same model across only 5 trials. I classified each of the thinking steps using Opus 4.6 (costs ~$4 in tokens per run just for that) and plotted them with custom flame graphs. [0]

When the run-to-run variability is between 8,163 and 17,334 tokens, none of these tests mean that much.

[0] https://adamsohn.com/lambda-variance/
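To make the stability argument concrete, here is a minimal sketch using hypothetical per-run token counts in the 8k-17k range quoted above (illustrative numbers, not data from the linked experiment):

```python
from statistics import mean, stdev

# Five hypothetical runs of the same prompt on the same model.
runs = [8_163, 12_940, 9_870, 17_334, 11_200]

m, s = mean(runs), stdev(runs)
sem = s / len(runs) ** 0.5            # standard error of the mean
print(f"mean={m:.0f}  stdev={s:.0f}  95% CI ~ +/-{1.96 * sem:.0f} tokens")

# Two arms whose mean token counts differ by less than the CI width are
# indistinguishable at n=5, let alone at n=1 per prompt per arm.
```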

ricardobeat•56m ago
Thanks for sharing this, really interesting results.

Slightly off-topic: it's quite apparent that you've used Claude as an editor for the blog post. Every sentence has been sanded smooth — the rough edges filed off, the voice flattened, the rhythm set to metronome. It doesn't read like writing anymore. It reads like content. Neat little triplets. Tidy paragraphs. A structure so polished it could pass a rubric, but couldn't hold a conversation. /s

In my opinion that is unnecessary and detracts from a great, simple piece. I miss human writing.

max-t-dev•50m ago
Yeah, definitely a good point. Claude assisted with editing and tidying up the content, with the caveat that it can flatten the voice. I agree the humanity behind writing is disappearing, and perhaps that's something I should consider in more detail next time. Thanks for the comment.
SwellJoe•27m ago
Also extremely verbose, in standard LLM slop style. Should have told Claude to "be brief" when telling it to write this post.
adamsmark•4m ago
Write caveman summary too. Fast read.
lofaszvanitt•1h ago
Caveman is useless for me. We are in the year 2026; computers are here to serve me and bring me comfort. Caveman is a caveman, it speaks like an idiot. I don't want to interact with an idiot. It's irritating and, as the article suggests, an overhyped turd.

It is the same idiocy that permeates EVs. You buy an expensive car to get from A to B and, at the same time, to offer you comfort. When I have to think about whether or not to use the seat heating, I'm out of my comfort zone. So no, fuck caveman, and I don't fucking care about the burned tokens.

Be brief. It's easy, no setup needed, not another mindless mumbo-jumbo extension with its 325 dependencies.

eulgro•59m ago
I enabled it and I had to read carefully to check if it was really active... turns out I never read the words that caveman omits, so to me it makes zero difference.
max-t-dev•47m ago
Yeah, makes sense. The appeal is more about cutting output tokens for cost than the downstream reading experience. But the benchmark suggests it doesn't offer much benefit over "be brief.".
loloquwowndueo•55m ago
> I don't want to interact with an idiot.

Then why are you using AI?

Not a big difference between an articulate idiot and a succinct one.

lofaszvanitt•48m ago
Have to test its limits... to cut through the BS. Otherwise you'd have to read whitepapers...
adamsmark•8m ago
But you can turn off brain. Try make self idiot. Save brain energy for important. Smarty speaks in idiot. When smarty speak like that is consistent. Idiot understand fast.

It would have been hilarious if the author spoke like a caveman in his video or had a section in that article where he explained his conclusions like a caveman.

rideontime•6m ago
Was this actually easier to write than just writing what comes naturally?
adamsmark•2m ago
Heck no.
kingstnap•8m ago
Of the things you could complain about in modern cars as being too complicated, you chose turning on seat heating???

Like you push the seat heating button if your seat feels cold. What is there to think about?

ramesh31•38m ago
Caveman sounds clever if you have no idea how LLM reasoning works. Talking through a problem out loud, in depth, is a critical part of how things like Claude Code even get to a result. Those aren't "wasted tokens", they're an integral part of how the LLM reaches a conclusion and completes its chain of reasoning.
max-t-dev•8m ago
Caveman doesn't compress the reasoning, only the output. The model still does its full reasoning before generating the response; caveman just affects how the final response is formatted.
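A minimal sketch of that distinction, assuming an Anthropic-style response whose content is a list of blocks with separate "thinking" and "text" types (the field names here are assumptions, check the API docs): a formatting skill like caveman shapes only the text blocks, while the thinking blocks are produced before them and are untouched.

```python
def split_reasoning(content_blocks: list[dict]) -> tuple[str, str]:
    """Return (reasoning, visible_output) from a response's content blocks.
    Only the visible output is affected by an output-formatting skill."""
    reasoning = "".join(b["thinking"] for b in content_blocks if b["type"] == "thinking")
    visible = "".join(b["text"] for b in content_blocks if b["type"] == "text")
    return reasoning, visible

# Toy example with hypothetical block contents:
blocks = [
    {"type": "thinking", "thinking": "Compare arms, check rubric coverage, ..."},
    {"type": "text", "text": "Be brief wins. Caveman same tokens."},
]
reasoning, visible = split_reasoning(blocks)
print(len(reasoning.split()), "reasoning words;", len(visible.split()), "visible words")
```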

Zed 1.0

https://zed.dev/blog/zed-1-0
1490•salkahfi•10h ago•486 comments

Copy Fail – CVE-2026-31431

https://copy.fail/
527•unsnap_biceps•6h ago•250 comments

> Be Alexandra Elbakyan

https://nitter.space/MushtaqBilalPhD/status/2049057344013881523#m
69•DanielleMolloy•2h ago•6 comments

OpenTrafficMap

https://opentrafficmap.org/
138•moooo99•5h ago•32 comments

HERMES.md in commit messages causes requests to route to extra usage billing

https://github.com/anthropics/claude-code/issues/53262
971•homebrewer•6h ago•408 comments

Cursor Camp

https://neal.fun/cursor-camp/
569•bpierre•9h ago•101 comments

FastCGI: 30 years old and still the better protocol for reverse proxies

https://www.agwa.name/blog/post/fastcgi_is_the_better_protocol_for_reverse_proxies
239•agwa•8h ago•60 comments

DRAM Crunch: Lessons for System Design

https://www.eetimes.com/what-the-dram-crunch-teaches-us-about-system-design/
28•giuliomagnifico•1d ago•1 comment

Why I still reach for Lisp and Scheme instead of Haskell

https://jointhefreeworld.org/blog/articles/lisps/why-i-still-reach-for-scheme-instead-of-haskell/...
158•jjba23•16h ago•50 comments

Vera: a programming language designed for machines to write

https://github.com/aallan/vera
36•unignorant•3h ago•17 comments

Laws of UX

https://lawsofux.com/
174•bobbiechen•7h ago•30 comments

Gooseworks (YC W23) Is Hiring a Founding Growth Engineer

https://www.ycombinator.com/companies/gooseworks/jobs/ztgY6bD-founding-growth-engineer
1•shivsak•3h ago

An open-source stethoscope that costs between $2.5 and $5 to produce

https://github.com/GliaX/Stethoscope
186•0x54MUR41•10h ago•75 comments

Ramp's Sheets AI Exfiltrates Financials

https://www.promptarmor.com/resources/ramps-sheets-ai-exfiltrates-financials
101•takira•7h ago•34 comments

I benchmarked Claude Code's caveman plugin against "be brief."

https://www.maxtaylor.me/articles/i-benchmarked-caveman-against-two-words
36•max-t-dev•3h ago•17 comments

What can we gain by losing infinity?

https://www.quantamagazine.org/what-can-we-gain-by-losing-infinity-20260429/
9•Tomte•9h ago•3 comments

Kyoto cherry blossoms now bloom earlier than at any point in 1,200 years

https://jivx.com/kyoto-bloom
224•momentmaker•5h ago•61 comments

Postgres's lateral joins allow for quite the good eDSL

https://bensimms.moe/postgres-lateral-makes-quite-a-good-dsl/
50•nitros•2d ago•6 comments

We need a federation of forges

https://blog.tangled.org/federation/
515•icy•10h ago•326 comments

Online age verification is the hill to die on

https://x.com/GlennMeder/status/2049088498163216560
729•Cider9986•9h ago•451 comments

How to Build the Future: Demis Hassabis [video]

https://www.youtube.com/watch?v=JNyuX1zoOgU
81•sandslash•10h ago•40 comments

The Lingua Franca of LaTeX

https://increment.com/open-source/the-lingua-franca-of-latex/
8•ripe•1d ago•1 comment

Ghostty is leaving GitHub

https://mitchellh.com/writing/ghostty-leaving-github
3340•WadeGrimridge•1d ago•992 comments

Virtualisation on Apple Silicon Macs is different

https://eclecticlight.co/2026/04/29/virtualisation-on-apple-silicon-macs-is-different/
68•zdw•8h ago•17 comments

Maryland becomes first state to ban surveillance pricing in grocery stores

https://www.theguardian.com/technology/2026/apr/29/maryland-grocery-stores-ban-surveillance-pricing
242•01-_-•8h ago•169 comments

GitHub – DOS 1.0: Transcription of Tim Paterson's DOS Printouts

https://github.com/DOS-History/Paterson-Listings
122•s2l•13h ago•6 comments

At Protocol: Building the Social Internet

https://atproto.com/
61•resiros•8h ago•34 comments

Mistral Medium 3.5

https://mistral.ai/news/vibe-remote-agents-mistral-medium-3-5
423•meetpateltech•9h ago•196 comments

Soft launch of open-source code platform for government

https://www.nldigitalgovernment.nl/news/soft-launch-for-government-open-source-code-platform/
523•e12e•15h ago•119 comments

Letting AI play my game – building an agentic test harness to help play-testing

https://blog.jeffschomay.com/letting-ai-play-my-game
118•jschomay•12h ago•27 comments