We decreased our LLM costs with Opus

https://www.mendral.com/blog/frontier-model-lower-costs
48•shad42•2h ago

Comments

wxw•1h ago
> We switched to the "triager" pattern: a Haiku agent with a very specific and narrow job. Is this issue already tracked or not? If it is, stop right there. If not, escalate to Opus.

> 4 out of 5 failures never reach Opus. A triager match costs around 25x less than a full investigation.

The title feels misleading. Why clickbait on that when you can just be genuine about the architecture?
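
A rough back-of-the-envelope check of the quoted numbers, as a sketch rather than the article's own accounting. It assumes every failure pays for the triage call, and that escalated failures additionally pay for one full investigation (normalized to a cost of 1.0). The function name and parameters are made up for illustration.

```python
def expected_cost_ratio(match_rate: float, triage_cost_ratio: float) -> float:
    """Expected per-failure cost relative to always running the full investigation.

    Assumption: triage runs on every failure; only unmatched failures
    escalate and pay for the full investigation (cost 1.0).
    """
    escalation_rate = 1.0 - match_rate
    return triage_cost_ratio + escalation_rate * 1.0


# "4 out of 5 failures never reach Opus", "around 25x less" per triager match.
ratio = expected_cost_ratio(match_rate=4 / 5, triage_cost_ratio=1 / 25)
print(f"{ratio:.2f}")  # 0.24
```

Under those assumptions the pipeline costs roughly a quarter of running the expensive model on every failure, which is a routing saving rather than a property of the expensive model itself.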

idorosen•1h ago
The title does not match the article title: “We Upgraded to a Frontier Model and Our Costs Went Down”.
stingraycharles•43m ago
It’s still misleading, though.
cadamsdotcom•1h ago
I have rewritten the article to be slightly shorter:

“Let a cheap agent decide if the expensive one is needed.”

a_t48•32m ago
Sounds like L1 vs L2 support :)
whalesalad•1h ago
Looking at the diagram, is this seriously a case of replacing basic functional concepts like "write to clickhouse" or "have we seen this before" with a model? could those be actual function calls in some language?

just seems wasteful all around. having an agent in the critical path when a regular expression (or similar) could do just seems odd. yeah haiku is cheap but re.match() is cheaper.
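
A minimal sketch of what a non-LLM "have we seen this before" check could look like, assuming failures differ only in volatile details such as numbers and addresses. The normalization patterns, the `KnownFailures` class, and the issue IDs are hypothetical; failures that are worded differently would still need something fuzzier.

```python
import hashlib
import re
from typing import Optional

# Hypothetical patterns for volatile parts of a failure message.
# Hex addresses are replaced before plain digits so "0xdeadbeef" stays one token.
_VOLATILE = [
    (re.compile(r"0x[0-9a-fA-F]+"), "<hex>"),
    (re.compile(r"\d+"), "<n>"),
]


def fingerprint(message: str) -> str:
    """Normalize a failure message and hash it into a stable fingerprint."""
    normalized = message.strip().lower()
    for pattern, placeholder in _VOLATILE:
        normalized = pattern.sub(placeholder, normalized)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class KnownFailures:
    """In-memory map from failure fingerprint to a tracked issue ID."""

    def __init__(self) -> None:
        self._seen: dict[str, str] = {}

    def track(self, message: str, issue_id: str) -> None:
        self._seen[fingerprint(message)] = issue_id

    def lookup(self, message: str) -> Optional[str]:
        """Return the tracked issue ID, or None if this failure is new."""
        return self._seen.get(fingerprint(message))


known = KnownFailures()
known.track("Timeout after 30s on worker 12", "ISSUE-1")
print(known.lookup("timeout after 45s on worker 7"))  # ISSUE-1
print(known.lookup("Segfault at 0xdeadbeef"))         # None
```

Only the lookups that return `None` would need to go to a model at all.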

saltyoldman•41m ago
I do a similar thing with a "planner agent" that uses the cheapest model (I think it's openai-gpt-5.2-mini or something, at like 20 cents per 1M tokens). It more or less emits a plan name and a task list, and each task in the list has a recommended model. It's not perfect, but many of our tasks are accomplished with lighter-weight models. When doing code generation or fixing we upgrade to a more expensive model; planning and decisions are done more cheaply. Keep in mind the tasks are relatively constrained, so planning with a cheap agent makes sense here. An open-ended agent would likely use a more expensive call for planning.
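
For illustration, the plan shape described above might look roughly like this. It is a sketch with made-up names (`Task`, `Plan`, `run_plan`, and the model labels are all hypothetical), and the model call is a stub so the example runs without any API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Task:
    description: str
    recommended_model: str  # chosen by the planner, per task


@dataclass
class Plan:
    name: str
    tasks: list[Task] = field(default_factory=list)


def run_plan(plan: Plan, call_model: Callable[[str, str], str]) -> list[str]:
    """Run each task on the model the planner recommended for it."""
    return [call_model(t.recommended_model, t.description) for t in plan.tasks]


def fake_call_model(model: str, prompt: str) -> str:
    # Stand-in for a real API call; it only echoes its inputs.
    return f"[{model}] {prompt}"


plan = Plan(
    name="fix-flaky-test",
    tasks=[
        Task("summarize the failure log", "cheap-model"),
        Task("write the code fix", "expensive-model"),
    ],
)
outputs = run_plan(plan, fake_call_model)
print(outputs)
```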
neya•25m ago
The whole clickbait article can be summarized in one line:

    Let a cheap agent decide if the expensive one is needed
syntaxing•18m ago
Is RAG dead? I would be very surprised if a small local SOTA embedding model like llama-embed-nemotron-8b didn't outperform the Haiku layer for this application. Should be pretty cheap and easy to prove out. With a 32K context size, you can literally one-shot the whole ticket.
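
A toy sketch of the retrieval idea: embed the incoming ticket, compare it against tracked issues, and only escalate when nothing is close enough. The `embed` function here is a bag-of-words stand-in, not a real embedding model, and the 0.6 threshold and issue texts are arbitrary assumptions; a real setup would swap in an actual embedding model and tune the threshold.

```python
import math
from collections import Counter
from typing import Optional


def embed(text: str) -> Counter:
    """Stand-in embedding: word counts. A real system would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def find_tracked_issue(
    ticket: str, tracked: dict[str, str], threshold: float = 0.6
) -> Optional[str]:
    """Return the most similar tracked issue ID, or None if below the threshold."""
    query = embed(ticket)
    best_id, best_score = None, 0.0
    for issue_id, text in tracked.items():
        score = cosine(query, embed(text))
        if score > best_score:
            best_id, best_score = issue_id, score
    return best_id if best_score >= threshold else None


tracked = {
    "ISSUE-1": "database connection timeout in integration tests",
    "ISSUE-2": "linker error missing symbol in release build",
}
print(find_tracked_issue("integration tests fail with database connection timeout", tracked))
print(find_tracked_issue("out of memory during docker build", tracked))
```

The first ticket matches ISSUE-1; the second falls below the threshold and would be the one to escalate.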
preommr•2m ago
Yeah, but RAG takes effort. At the very least you need some kind of system to organize the documents and do the retrieval.

My theory is that the AI frenzy has reached new levels of insane, where it's literally just throw anything and everything at the model, and just burn tokens to let the AI figure everything out. Why bother paying the upfront cost for a RAG, when the models/agents are constantly evolving, so just slap in a markdown file telling it to check a folder, and call it a day.

Like in the design world, people are doing minor tweaks like changing the spacing by typing in prompts instead of just changing a number in an input field. We are legitimately approaching just using LLMs instead of calculators, or memes like that endpoint that calls an LLM to generate the code to do some business logic, rather than directly coding the logic.

2001zhaozhao•5m ago
> We switched to the "triager" pattern: a Haiku agent with a very specific and narrow job. Is this issue already tracked or not? If it is, stop right there. If not, escalate to Opus.

I'm planning to self-host qwen3.6 27b basically for this purpose

Ghostty is leaving GitHub

https://mitchellh.com/writing/ghostty-leaving-github
1856•WadeGrimridge•7h ago•577 comments

Before GitHub

https://lucumr.pocoo.org/2026/4/28/before-github/
291•mlex•6h ago•87 comments

How ChatGPT serves ads

https://www.buchodi.com/how-chatgpt-serves-ads-heres-the-full-attribution-loop/
186•lmbbuchodi•3h ago•127 comments

We decreased our LLM costs with Opus

https://www.mendral.com/blog/frontier-model-lower-costs
48•shad42•2h ago•11 comments

Regression: malware reminder on every read still causes subagent refusals

https://github.com/anthropics/claude-code/issues/49363
156•thomashobohm•3h ago•50 comments

OpenAI models coming to Amazon Bedrock: Interview with OpenAI and AWS CEOs

https://stratechery.com/2026/an-interview-with-openai-ceo-sam-altman-and-aws-ceo-matt-garman-abou...
194•translocator•7h ago•75 comments

Show HN: Auto-Architecture: Karpathy's Loop, Pointed at a CPU

https://github.com/FeSens/auto-arch-tournament/blob/main/docs/auto-arch-tournament-blog-post.md
16•fesens•10h ago•0 comments

Behavioral timescale synaptic plasticity rewires the brain after an experience

https://www.quantamagazine.org/a-new-type-of-neuroplasticity-rewires-the-brain-after-a-single-exp...
71•ibobev•1d ago•0 comments

I won a championship that doesn't exist

https://ron.stoner.com/How_I_Won_a_Championship_That_Doesnt_Exist/
90•SEJeff•6h ago•61 comments

Intel Arc Pro B70 Review

https://www.pugetsystems.com/labs/articles/intel-arc-pro-b70-review/
122•zdw•4d ago•69 comments

GitHub RCE Vulnerability: CVE-2026-3854 Breakdown

https://www.wiz.io/blog/github-rce-vulnerability-cve-2026-3854
264•bo0tzz•11h ago•63 comments

Claude for Creative Work

https://www.anthropic.com/news/claude-for-creative-work
71•elsewhen•3h ago•43 comments

Who owns the code Claude Code wrote?

https://legallayer.substack.com/p/who-owns-the-claude-code-wrote
279•senaevren•15h ago•307 comments

Your phone is about to stop being yours

https://keepandroidopen.org/en/
1048•doener•11h ago•499 comments

Nonlinearity Affects a Pendulum

https://www.johndcook.com/blog/2026/04/24/nonlinear-pendulum/
8•ibobev•1d ago•1 comments

Warp is now open-source

https://www.warp.dev/blog/warp-is-now-open-source
181•meetpateltech•11h ago•56 comments

Apple CMF (Color-Matching Functions) 2026

https://www.lttlabs.com/articles/2026/04/11/apple-studio-display-xdr-display-testing-results
10•HeyMeco•3h ago•0 comments

Show HN: Drive any macOS app in the background without stealing the cursor

https://github.com/trycua/cua
63•frabonacci•11h ago•23 comments

CJIT: C, Just in Time

https://dyne.org/cjit/
90•smartmic•8h ago•26 comments

Localsend: An open-source cross-platform alternative to AirDrop

https://github.com/localsend/localsend
752•bilsbie•15h ago•235 comments

VibeVoice: Open-source frontier voice AI

https://github.com/microsoft/VibeVoice
331•tosh•15h ago•168 comments

I have officially retired from Emacs

https://nullprogram.com/blog/2026/04/26/
186•Fudgel•3d ago•124 comments

Talkie: a 13B vintage language model from 1930

https://talkie-lm.com/introducing-talkie
648•jekude•1d ago•262 comments

UAE to leave OPEC

https://www.ft.com/content/8c354f2d-3e66-47f1-aad4-9b4aa30e386d
357•bazzmt•14h ago•491 comments

An update on GitHub availability

https://github.blog/news-insights/company-news/an-update-on-github-availability/
326•salkahfi•17h ago•214 comments

A playable DOOM MCP app

https://chrisnager.com/blog/doom-runs-in-chatgpt-and-claude/
80•chrisnager•8h ago•29 comments

Infisical (YC W23) Is Hiring Full Stack Software Engineers (Remote)

https://jobs.ashbyhq.com/infisical/782b9da8-20e1-48b2-919e-6c5430c58628
1•vmatsiiako•10h ago

Patch applies fake diffs from commit messages

https://samizdat.dev/phantom-patch/
87•reconquestio•2d ago•26 comments

Show HN: Live Sun and Moon Dashboard with NASA Footage

https://www.lumara-space.app/
165•beeswaxpat•13h ago•58 comments

Waymo in Portland

https://waymo.com/blog/shorts/waymo-in-portland/
259•xnx•9h ago•425 comments