Show HN: I nerfed our coding agents on purpose

20•noahfradin•2h ago

TL;DR: I trained a classifier that routes each request to the least expensive model and reasoning depth that can complete it. Coupling that with additional automated token-efficiency techniques has yielded up to 3x usage for the same spend. For anyone interested in trying it themselves: https://nerfguard.com

Several teammates and I recently switched from Claude Code to Codex. We still bounce between the tools, but Codex’s speed and steerability, coupled with its performance gains, were hard to ignore. One downside was that per-token pricing kicked in much sooner. This is happening across the board, but we felt it more acutely in Codex. We’re a startup full of people who work around the clock and are obsessed with building, so naturally our daily bill alone was striking.

Luckily, we’re going after a big mission, and speed matters significantly more than marginal token spend at the edges. Still, it struck us as ludicrous: our product has the side effect of decreasing token spend and speeding up agentic workflows by many orders of magnitude, yet we were using top-tier models for all kinds of internal coding tasks without any of those optimizations. The most glaring waste was that we were seemingly running the max-intelligence model at max reasoning for every task, even when the task clearly didn’t require it. As a company that spends a lot of time on cached intelligence, we could also easily see plenty of other low-hanging fruit.

So, on a recent weekend, I quickly built a tool to optimize our usage. At its core is a very fast classifier that maps each request to the lowest intelligence level the task requires, with some nice token optimizations layered on top. The result is roughly the same quality at several times lower token spend. Even more exciting for us, the properly bin-packed intelligence and reasoning levels meant our speed also went up considerably. The gain wasn’t negligible.
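To make the routing idea concrete, here is a minimal sketch. It is not Nerfguard's actual classifier: the tier names, keyword lists, and length threshold are all hypothetical placeholders, and a simple keyword heuristic stands in for the trained model described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    model: str       # placeholder model name, not a real model ID
    reasoning: str   # placeholder reasoning-depth label


# Hypothetical tiers, ordered from cheapest to most expensive.
TIERS = [
    Tier("small-model", "low"),
    Tier("mid-model", "medium"),
    Tier("large-model", "high"),
]

# Toy keyword lists standing in for a trained classifier (assumption).
HARD_HINTS = ("architecture", "race condition", "refactor", "design", "concurrency")
MEDIUM_HINTS = ("implement", "debug", "test", "fix")


def classify(request: str) -> int:
    """Return the index of the cheapest tier assumed able to handle the request."""
    text = request.lower()
    if any(hint in text for hint in HARD_HINTS):
        return 2
    if any(hint in text for hint in MEDIUM_HINTS) or len(text) > 400:
        return 1
    return 0


def route(request: str) -> Tier:
    """Pick the model and reasoning depth for a request."""
    return TIERS[classify(request)]


if __name__ == "__main__":
    for prompt in ("rename this variable",
                   "fix the failing test",
                   "refactor the auth module architecture"):
        print(prompt, "->", route(prompt))
```

A real router would presumably replace `classify` with a learned model and fall back to a higher tier when the cheap one fails. The sketch only shows the shape of the decision: classify first, then send the request to the cheapest adequate tier.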

We’ve observed up to 3x savings, plus hours per person per day that we would otherwise have spent waiting on tool turns and coding agent responses.

For us, that means improved engineering velocity and significantly higher usage for the same spend. It also means more usage before getting throttled.
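As a back-of-the-envelope illustration of why routing translates into more usage for the same spend, the sketch below compares a fixed budget under two request mixes. Every number in it is invented for illustration: the per-request costs, the budget, and the share of requests sent to each tier are assumptions, not measurements from the post.

```python
def requests_per_budget(budget: float,
                        mix: dict[str, float],
                        cost: dict[str, float]) -> float:
    """Number of requests a budget covers, given the share of requests per tier."""
    average_cost = sum(share * cost[tier] for tier, share in mix.items())
    return budget / average_cost


# Hypothetical relative cost per request for each tier (assumption).
COST = {"small": 1.0, "mid": 3.0, "large": 9.0}

# Baseline: every request goes to the most expensive tier.
baseline_mix = {"large": 1.0}

# Routed: most requests are assumed to be simple enough for cheaper tiers.
routed_mix = {"small": 0.6, "mid": 0.3, "large": 0.1}

if __name__ == "__main__":
    budget = 90.0  # arbitrary units
    baseline = requests_per_budget(budget, baseline_mix, COST)
    routed = requests_per_budget(budget, routed_mix, COST)
    print(f"baseline: {baseline:.1f} requests, routed: {routed:.1f} requests")
    print(f"usage multiplier: {routed / baseline:.2f}x")
```

The multiplier depends entirely on how much of the workload can be pushed down to cheaper tiers, which is why actual results will vary from team to team.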

As I told friends about this, they also wanted to start using it to maximize the usage they could get out of their coding agent plans. Engineers across many of the most cutting-edge AI companies now use this tool to optimize their token utilization, not just to save money but to maximize output. Turns out that the best way to avoid getting nerfed by Claude is to intentionally nerf yourself selectively. We decided to release it for the rest of the builder community as well. You can now turn on Nerfguard for yourself and start getting more usage today.

Comments

andrewlau624•2h ago

compelling. i've seen context compression and caching tools before, but combining spend optimization with model routing and throughput gains is a smart angle.

snookie139•2h ago

Nice! Always thought something like this should exist. Will definitely try it out!

FLFSandy•2h ago

Wow, we are really struggling with our token costs. I'll def be sharing it with our team!

woodedpisces•2h ago

how much do your tokens actually cost? for me, it's no more than a few thousand so I don't really see the need for this.

gnabgib•1h ago

What's a few thousand kilos of gold, between friends?

kburman•2h ago

All new accounts, created within a few minutes of each other. Nothing to see here.

jonappleseed22•1h ago

