frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

Tell HN: I cut Claude API costs from $70/month to pennies

27•ok_orco•7h ago
The first time I pulled usage costs after running Chatter.Plus - a tool I'm building that aggregates community feedback from Discord/GitHub/forums - for a day hours, I saw $2.30. Did the math. $70/month. $840/year. For one instance. Felt sick.

I'd done napkin math beforehand, so I knew it was probably a bug, but still. Turns out it was only partially a bug. The rest was me needing to rethink how I built this thing. Spent the next couple days ripping it apart. Making tweaks, testing with live data, checking results, trying again. What I found was I was sending API requests too often and not optimizing what I was sending and receiving.

Here's what moved the needle, roughly big to small (besides that bug that was costin me a buck a day alone):

- Dropped Claude Sonnet entirely - tested both models on the same data, Haiku actually performed better at a third of the cost

- Started batching everything - hourly calls were a money fire

- Filter before the AI - "lol" and "thanks" are a lot of online chatter. I was paying AI to tell me that's not feedback. That said, I still process agreements like "+1" and "me too."

- Shorter outputs - "H/M/L" instead of "high/medium/low", 40-char title recommendation

- Strip code snippets before processing - just reiterating the issue and bloating the call

End of the week: pennies a day. Same quality.

I'm not building a VC-backed app that can run at a loss for years. I'm unemployed, trying to build something that might also pay rent. The math has to work from day one.

The upside: these savings let me 3x my pricing tier limits and add intermittent quality checks. Headroom I wouldn't have had otherwise.

Happy to answer questions.

Comments

arthurcolle•7h ago
Can you discuss a bit more of the architecture?
ok_orco•7h ago
Pretty straightforward. Sources dump into a queue throughout the day, regex filters the obvious junk ("lol", "thanks", bot messages never hit the LLM), then everything gets batched overnight through Anthropic's Batch API for classification. Feedback gets clustered against existing pain points or creates new ones.

Most of the cost savings came from not sending stuff to the LLM that didn't need to go there, plus the batch API is half the price of real-time calls.

dezgeg•1h ago
Are you also adding the proper prompt cache control attributes? I think Anthropic API still doesn't do it automatically
gandalfar•1h ago
Consider using z.ai as model provider to further lower your costs.
tehlike•52m ago
This is what i was going to suggest too.
DANmode•48m ago
Do they or any other providers offer any improvements on the often-chronicled variability of quality/effort from the major two services e.g. during peak hours?
viraptor•37m ago
Or minimax - m2.1 release didn't make a big splash in the news, but it's really capable.
LTL_FTC•57m ago
It sounds like you don’t need immediate llm responses and can batch process your data nightly? Have you considered running a local llm? May not need to pay for api calls. Today’s local models are quite good. I started off with cpu and even that was fine for my pipelines.
44za12•56m ago
This is the way. I actually mapped out the decision tree for this exact process and more here:

https://github.com/NehmeAILabs/llm-sanity-checks

joshribakoff•41m ago
Have you looked into https://maartengr.github.io/BERTopic/index.html ?
DeathArrow•13m ago
You also can try to use cheaper models like GLM, Deepseek, Qwen,at least partially.

The browser is the sandbox

https://simonwillison.net/2026/Jan/25/the-browser-is-the-sandbox/
65•enos_feedler•2h ago•39 comments

First, make me care

https://gwern.net/blog/2026/make-me-care
545•andsoitis•12h ago•160 comments

Scientists identify brain waves that define the limits of 'you'

https://www.sciencealert.com/scientists-identify-brain-waves-that-define-the-limits-of-you
157•mikhael•7h ago•31 comments

Iran's internet blackout may become permanent, with access for elites only

https://restofworld.org/2026/iran-blackout-tiered-internet/
184•siev•3h ago•88 comments

Things I've learned in my 10 years as an engineering manager

https://www.jampa.dev/p/lessons-learned-after-10-years-as
17•jampa•4d ago•0 comments

A macOS app that blurs your screen when you slouch

https://github.com/tldev/posturr
567•dnw•16h ago•183 comments

Ask HN: DDD was a great debugger – what would a modern equivalent look like?

17•manux81•9h ago•15 comments

A static site generator written in POSIX shell

https://aashvik.com/posts/shell-ssg/
18•todsacerdoti•5d ago•3 comments

Video Games as Art

https://gwern.net/video-game-art
42•andsoitis•5h ago•20 comments

Case study: Creative math – How AI fakes proofs

https://tomaszmachnik.pl/case-study-math-en.html
79•musculus•9h ago•50 comments

You can just port things to Cloudflare Workers

https://sigh.dev/posts/you-can-just-port-things-to-cloudflare-workers/
19•STRiDEX•5h ago•15 comments

Compiling models to megakernels

https://blog.luminal.com/p/compiling-models-to-megakernels
13•jafioti•1d ago•2 comments

The Science of Fermentation [audio]

https://www.bbc.co.uk/programmes/m002pqg6
40•fallinditch•2d ago•9 comments

Building a Real-Time HN Display for $15

https://medium.com/@lee.harding/building-a-real-time-hn-display-for-15-3ea1772051ff
27•kylegalbraith•3d ago•6 comments

Environmentalists worry Google behind bid to control Oregon town's water

https://www.sfgate.com/national-parks/article/mount-hood-water-google-21307223.php
75•voxadam•4h ago•11 comments

The future of software engineering is SRE

https://swizec.com/blog/the-future-of-software-engineering-is-sre/
92•Swizec•9h ago•44 comments

Delta single handle ball faucets (1963)

https://archive.org/details/DeltaSingleHandleBallFaucets
48•userbinator•4d ago•28 comments

Using PostgreSQL as a Dead Letter Queue for Event-Driven Systems

https://www.diljitpr.net/blog-post-postgresql-dlq
200•tanelpoder•16h ago•61 comments

I was right about ATProto key management

https://notes.nora.codes/atproto-again/
126•todsacerdoti•12h ago•89 comments

Clawdbot - open source personal AI assistant

https://github.com/clawdbot/clawdbot
201•KuzeyAbi•7h ago•138 comments

LED lighting undermines visual performance unless supplemented by wider spectra

https://www.nature.com/articles/s41598-026-35389-6
72•bookofjoe•10h ago•40 comments

Web-based image editor modeled after Deluxe Paint

https://github.com/steffest/DPaint-js
212•bananaboy•19h ago•19 comments

Guix for Development

https://dthompson.us/posts/guix-for-development.html
73•clircle•5d ago•24 comments

Spanish track was fractured before high-speed train disaster, report finds

https://www.bbc.com/news/articles/c1m77dmxlvlo
189•Rygian•12h ago•157 comments

Show HN: An interactive map of US lighthouses and navigational aids

https://www.lighthouses.app/
64•idd2•13h ago•19 comments

Bitwise conversion of doubles using only FP multiplication and addition (2020)

https://dougallj.wordpress.com/2020/05/10/bitwise-conversion-of-doubles-using-only-floating-point...
36•vitaut•17h ago•3 comments

Show HN: NukeCast – If it happened today, where would the fallout go

https://nukecast.com/
9•todd_tracerlab•4h ago•1 comments

ICE using Palantir tool that feeds on Medicaid data

https://www.eff.org/deeplinks/2026/01/report-ice-using-palantir-tool-feeds-medicaid-data
1146•JKCalhoun•14h ago•681 comments

Oneplus phone update introduces hardware anti-rollback

https://consumerrights.wiki/w/Oneplus_phone_update_introduces_hardware_anti-rollback
398•validatori•11h ago•238 comments

Turbopack: Building faster by building less

https://nextjs.org/blog/turbopack-incremental-computation
37•feross•5d ago•17 comments