Lunch with the FT: Tarek Mansour

https://www.ft.com/content/a4cebf4c-c26c-48bb-82c8-5701d8256282
1•hhs•2m ago•0 comments

Old Mexico and her lost provinces (1883)

https://www.gutenberg.org/cache/epub/77881/pg77881-images.html
1•petethomas•5m ago•0 comments

'AI' is a dick move, redux

https://www.baldurbjarnason.com/notes/2026/note-on-debating-llm-fans/
2•cratermoon•7m ago•0 comments

The source code was the moat. But not anymore

https://philipotoole.com/the-source-code-was-the-moat-no-longer/
1•otoolep•7m ago•0 comments

Does anyone else feel like their inbox has become their job?

1•cfata•7m ago•0 comments

An AI model that can read and diagnose a brain MRI in seconds

https://www.michiganmedicine.org/health-lab/ai-model-can-read-and-diagnose-brain-mri-seconds
1•hhs•10m ago•0 comments

Dev with 5 years of experience switched to Rails, what should I be careful about?

1•vampiregrey•13m ago•0 comments

AlphaFace: High Fidelity and Real-Time Face Swapper Robust to Facial Pose

https://arxiv.org/abs/2601.16429
1•PaulHoule•14m ago•0 comments

Scientists discover “levitating” time crystals that you can hold in your hand

https://www.nyu.edu/about/news-publications/news/2026/february/scientists-discover--levitating--t...
1•hhs•16m ago•0 comments

Rammstein – Deutschland (C64 Cover, Real SID, 8-bit – 2019) [video]

https://www.youtube.com/watch?v=3VReIuv1GFo
1•erickhill•16m ago•0 comments

Tell HN: Yet Another Round of Zendesk Spam

1•Philpax•16m ago•0 comments

Postgres Message Queue (PGMQ)

https://github.com/pgmq/pgmq
1•Lwrless•20m ago•0 comments

Show HN: Django-rclone: Database and media backups for Django, powered by rclone

https://github.com/kjnez/django-rclone
1•cui•23m ago•1 comments

NY lawmakers proposed statewide data center moratorium

https://www.niagara-gazette.com/news/local_news/ny-lawmakers-proposed-statewide-data-center-morat...
1•geox•24m ago•0 comments

OpenClaw AI chatbots are running amok – these scientists are listening in

https://www.nature.com/articles/d41586-026-00370-w
2•EA-3167•24m ago•0 comments

Show HN: AI agent forgets user preferences every session. This fixes it

https://www.pref0.com/
6•fliellerjulian•27m ago•0 comments

Introduce the Vouch/Denouncement Contribution Model

https://github.com/ghostty-org/ghostty/pull/10559
2•DustinEchoes•29m ago•0 comments

Show HN: SSHcode – Always-On Claude Code/OpenCode over Tailscale and Hetzner

https://github.com/sultanvaliyev/sshcode
1•sultanvaliyev•29m ago•0 comments

Microsoft appointed a quality czar. He has no direct reports and no budget

https://jpcaparas.medium.com/microsoft-appointed-a-quality-czar-he-has-no-direct-reports-and-no-b...
2•RickJWagner•30m ago•0 comments

Multi-agent coordination on Claude Code: 8 production pain points and patterns

https://gist.github.com/sigalovskinick/6cc1cef061f76b7edd198e0ebc863397
1•nikolasi•31m ago•0 comments

Washington Post CEO Will Lewis Steps Down After Stormy Tenure

https://www.nytimes.com/2026/02/07/technology/washington-post-will-lewis.html
13•jbegley•32m ago•2 comments

DevXT – Building the Future with AI That Acts

https://devxt.com
2•superpecmuscles•32m ago•4 comments

A Minimal OpenClaw Built with the OpenCode SDK

https://github.com/CefBoud/MonClaw
1•cefboud•33m ago•0 comments

The silent death of Good Code

https://amit.prasad.me/blog/rip-good-code
3•amitprasad•33m ago•0 comments

The Internal Negotiation You Have When Your Heart Rate Gets Uncomfortable

https://www.vo2maxpro.com/blog/internal-negotiation-heart-rate
1•GoodluckH•35m ago•0 comments

Show HN: Glance – Fast CSV inspection for the terminal (SIMD-accelerated)

https://github.com/AveryClapp/glance
2•AveryClapp•36m ago•0 comments

Busy for the Next Fifty to Sixty Bud

https://pestlemortar.substack.com/p/busy-for-the-next-fifty-to-sixty-had-all-my-money-in-bitcoin-...
1•mithradiumn•36m ago•0 comments

Imperative

https://pestlemortar.substack.com/p/imperative
1•mithradiumn•37m ago•0 comments

Show HN: I decomposed 87 tasks to find where AI agents structurally collapse

https://github.com/XxCotHGxX/Instruction_Entropy
2•XxCotHGxX•41m ago•1 comments

I went back to Linux and it was a mistake

https://www.theverge.com/report/875077/linux-was-a-mistake
4•timpera•42m ago•2 comments

Tell HN: I cut Claude API costs from $70/month to pennies

40•ok_orco•1w ago
The first time I pulled usage costs after running Chatter.Plus - a tool I'm building that aggregates community feedback from Discord/GitHub/forums - for a day, I saw $2.30. Did the math. $70/month. $840/year. For one instance. Felt sick.

I'd done napkin math beforehand, so I knew it was probably a bug, but still. Turns out it was only partially a bug. The rest was me needing to rethink how I built this thing. Spent the next couple of days ripping it apart. Making tweaks, testing with live data, checking results, trying again. What I found: I was sending API requests too often, and I wasn't optimizing what I sent or what I got back.

Here's what moved the needle, roughly big to small (besides that bug that was costing me a buck a day alone); there's a rough sketch of the filtering and output tricks after the list:

- Dropped Claude Sonnet entirely - tested both models on the same data, and Haiku actually performed better at a third of the cost

- Started batching everything - hourly calls were a money fire

- Filter before the AI - a lot of online chatter is just "lol" and "thanks". I was paying the AI to tell me that's not feedback. That said, I still process agreements like "+1" and "me too."

- Shorter outputs - "H/M/L" instead of "high/medium/low", 40-char title recommendation

- Strip code snippets before processing - they just restate the issue and bloat the call
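
A minimal sketch of the filter-and-compact-output combination (the patterns and prompt here are illustrative, not the actual Chatter.Plus code):

    import re

    # Cheap pre-filter: don't pay the LLM to classify obvious non-feedback.
    NOISE = re.compile(r"^\s*(lol|lmao|thanks|ty|nice)\s*[.!]*\s*$", re.IGNORECASE)
    # Agreement markers still carry signal (they weight existing pain points).
    AGREEMENT = re.compile(r"^\s*(\+1|me too|same here)\s*[.!]*\s*$", re.IGNORECASE)
    CODE_BLOCK = re.compile(r"```.*?```", re.DOTALL)

    def preprocess(message: str) -> str | None:
        """Return trimmed text for the LLM, or None if it should be skipped."""
        if AGREEMENT.match(message):
            return message  # "+1"-style agreement is still processed
        if NOISE.match(message):
            return None  # junk never hits the API
        text = CODE_BLOCK.sub("[code omitted]", message)  # snippets bloat the call
        return text[:2000]  # crude cap on input size

    # Compact output contract: single letters instead of words.
    SYSTEM_PROMPT = (
        "Classify the message as product feedback. Reply with severity "
        "H, M, or L and a title of at most 40 characters."
    )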

End of the week: pennies a day. Same quality.

I'm not building a VC-backed app that can run at a loss for years. I'm unemployed, trying to build something that might also pay rent. The math has to work from day one.

The upside: these savings let me 3x my pricing tier limits and add intermittent quality checks. Headroom I wouldn't have had otherwise.

Happy to answer questions.

Comments

arthurcolle•1w ago
Can you discuss a bit more of the architecture?
ok_orco•1w ago
Pretty straightforward. Sources dump into a queue throughout the day, regex filters the obvious junk ("lol", "thanks", bot messages never hit the LLM), then everything gets batched overnight through Anthropic's Batch API for classification. Feedback gets clustered against existing pain points or creates new ones.

Most of the cost savings came from not sending stuff to the LLM that didn't need to go there, plus the batch API is half the price of real-time calls.
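
For anyone who hasn't used it, the nightly step looks roughly like this with the Anthropic Python SDK (the model name, prompt, and queued_messages list are placeholders):

    import anthropic

    client = anthropic.Anthropic()
    queued_messages = ["App crashes when I export", "+1"]  # stand-in for the drained queue

    # One batch request per queued item; batch jobs are billed at half price.
    batch = client.messages.batches.create(
        requests=[
            {
                "custom_id": f"feedback-{i}",
                "params": {
                    "model": "claude-3-5-haiku-latest",  # placeholder model
                    "max_tokens": 64,  # short H/M/L-style outputs
                    "messages": [{"role": "user", "content": text}],
                },
            }
            for i, text in enumerate(queued_messages)
        ]
    )

    # Batches are async; in reality you'd poll until the status flips to "ended".
    status = client.messages.batches.retrieve(batch.id)
    if status.processing_status == "ended":
        for entry in client.messages.batches.results(batch.id):
            print(entry.custom_id, entry.result)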

dezgeg•1w ago
Are you also adding the proper prompt cache control attributes? I think the Anthropic API still doesn't do it automatically.
ok_orco•1w ago
No, I need to look into this!
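
For reference, cache control is set per content block; a minimal sketch, assuming the long classification rubric is the stable prefix worth caching (it only pays off above the model's minimum cacheable prompt length):

    import anthropic

    client = anthropic.Anthropic()

    LONG_RUBRIC = "...full classification rubric and few-shot examples..."  # stable prefix

    response = client.messages.create(
        model="claude-3-5-haiku-latest",  # placeholder model
        max_tokens=64,
        system=[
            {
                "type": "text",
                "text": LONG_RUBRIC,
                # Marks this block as a reusable prefix; subsequent calls read
                # it from cache at a fraction of the normal input-token price.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        messages=[{"role": "user", "content": "App crashes when I export"}],
    )
    print(response.content[0].text)
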
gandalfar•1w ago
Consider using z.ai as model provider to further lower your costs.
tehlike•1w ago
This is what I was going to suggest too.
DANmode•1w ago
Do they or any other providers offer any improvement on the often-chronicled variability of quality/effort from the two major services, e.g. during peak hours?
viraptor•1w ago
Or MiniMax - the M2.1 release didn't make a big splash in the news, but it's really capable.
ok_orco•1w ago
Will take a look!
andai•6d ago
Do you mean with the coding plan?

I haven't tested it extensively, but I found that when I used Claude Code with it, it was reasonably fast (actual Claude was way faster). When I tried to use the API itself manually, though, it was super slow.

My guess is they're filtering the traffic and prioritizing certain types. On my own script, I ran into a rate limit after 7 requests!

LTL_FTC•1w ago
It sounds like you don't need immediate LLM responses and can batch process your data nightly? Have you considered running a local LLM? You may not need to pay for API calls at all. Today's local models are quite good. I started off with CPU and even that was fine for my pipelines.
queenkjuul•1w ago
Agreed, I'm pretty amazed at what I'm able to do locally just with an AMD 6700XT and 32GB of RAM. It's slow, but if you've got all night...
kreetx•1w ago
Though I haven't done any extensive testing, I personally could easily get by with current local models. The only reason I don't is that the hosted ones all have free tiers.
ydu1a2fovb•1w ago
Can you suggest any good LLMs for CPU?
R_D_Olivaw•1w ago
Following.
LTL_FTC•1w ago
I started off using gpt-oss-120b on CPU. It uses about 60-65GB of memory, but my workstation has 128GB of RAM. If I had less RAM, I would start off with the gpt-oss-20b model and go from there. Look for MoE models as they are more efficient to run.
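
If you go this route, an OpenAI-compatible local server lets you keep existing client code; e.g. with Ollama (model tag and port are the usual defaults, so adjust for your setup):

    # Assumes `ollama pull gpt-oss:20b` has been run and the server is running.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
        api_key="ollama",  # any non-empty string works locally
    )

    resp = client.chat.completions.create(
        model="gpt-oss:20b",  # the smaller MoE mentioned above
        messages=[{"role": "user", "content": "Classify: app crashes on export"}],
        max_tokens=64,
    )
    print(resp.choices[0].message.content)
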
Aerbil313•1w ago
Hey Olivaw, saw a comment of yours asking about planners. Wanted to reply but it’s expired. Check out bullet journalling.
R_D_Olivaw•4d ago
Thanks for the reply!

Bullet journaling is neat, but I'm far too whacky with my notes to stick to that kind of structure.

I have various other structures I implement, but they're just hodgepodges of things.

ok_orco•1w ago
I haven't thought about that, but really want to dig in more now. Any places you recommend starting?
LTL_FTC•1w ago
I started off using gpt-oss-120b on cpu. It uses about 60-65gb of memory or so but my workstation has 128gb of ram. If I had less ram, I would start off with the gpt-oss-20b model and go from there. Look for MoE models as they are more efficient to run.

My old threadripper pro was seeing about 15tps, which was quite acceptable for the background tasks I was running.

44za12•1w ago
This is the way. I actually mapped out the decision tree for this exact process and more here:

https://github.com/NehmeAILabs/llm-sanity-checks

homeonthemtn•1w ago
That's interesting. Is there any kind of mapping to these respective models somewhere?
44za12•1w ago
Yes, I included a 'Model Selection Cheat Sheet' in the README (scroll down a bit).

I map them by task type:

Tiny (<3B): Gemma 3 1B (could try 4B as well), Phi-4-mini - good for classification.

Small (8B-17B): Qwen 3 8B, Llama 4 Scout - good for RAG/extraction.

Frontier: GPT-5, Llama 4 Maverick, GLM, Kimi.

Is that what you meant?

hyuuu•1w ago
At the risk of being obvious: do you have a tiny LLM gating this decision, classifying each task and directing it to the appropriate solution?
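
(For illustration, such a gate could be a tiny local model whose label picks the tier; the gate model, labels, and tiers below are invented, not from the repo:)

    from openai import OpenAI

    gate = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

    def route(task: str) -> str:
        """Ask a tiny model to label the task, then map the label to a model tier."""
        label = gate.chat.completions.create(
            model="gemma3:1b",  # hypothetical tiny gate model
            max_tokens=4,
            messages=[{
                "role": "user",
                "content": "Answer CLASSIFY, EXTRACT, or REASON only: " + task,
            }],
        ).choices[0].message.content.strip().upper()
        # Unrecognized labels fall through to the most capable tier.
        return {"CLASSIFY": "tiny", "EXTRACT": "small"}.get(label, "frontier")
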
andai•6d ago
>Before you reach for a frontier model, ask yourself: does this actually need a trillion-parameter model?

>Most tasks don't. This repo helps you figure out which ones.

About a year ago I was testing Gemini 2.5 Pro and Gemini 2.5 Flash for agentic coding. I found they could both do the same task, but Gemini Pro was way slower and more expensive.

This blew my mind because I'd previously been obsessed with "best/smartest model", and suddenly realized what I actually wanted was "fastest/dumbest/cheapest model that can handle my task!"

joshribakoff•1w ago
Have you looked into https://maartengr.github.io/BERTopic/index.html ?
DeathArrow•1w ago
You can also try cheaper models like GLM, DeepSeek, or Qwen, at least partially.
deepsummer•1w ago
As much as I like the Claude models, they are expensive. I wouldn't use them to process large volumes of data. Gemini 2.5 Flash-Lite is $0.10 per million tokens. Grok 4.1 Fast is really good and only $0.20. They will work just as well for most simple tasks.
toxic72•1w ago
Consider this for additional cost savings if local doesn't interest you - https://docs.cloud.google.com/vertex-ai/generative-ai/docs/m...