
Making LLMs Cheaper and Better via Performance-Efficiency Optimized Routing

https://arxiv.org/abs/2508.12631
67•omarsar•3h ago

Comments

hodgehog11•3h ago
Wow, that was fast.

I've thought for a while that ensembling approaches would become the next stage of LLM development after CoT, since it provides yet another effective, independent axis for scaling laws. Great to see that perspective is taking off. The open weight community has an opportunity to take these ideas and run with them better than OpenAI has.

bachittle•3h ago
I’m fascinated by this new paradigm. We’ve more or less perfected Mixture-of-Experts inside a single model, where routing happens between subnetworks. What GPT-5 auto (and this paper) are doing is a step further: “LLM routing” across multiple distinct models. It’s still rough right now, but it feels inevitable that this will get much better over time.
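The contrast above (MoE routing between subnetworks vs. routing between whole models) can be sketched minimally. This is an illustrative toy, not the paper's method: real routers use learned classifiers or embeddings, while this one uses a keyword heuristic, and the model names are made up.

```python
def route(query: str) -> str:
    """Pick a model tier from a cheap heuristic over the query.

    Real LLM routers use learned classifiers or embedding similarity;
    this keyword heuristic only illustrates the control flow of
    routing between distinct models rather than within one model.
    """
    hard_markers = ("prove", "derive", "optimize", "debug")
    if any(m in query.lower() for m in hard_markers):
        return "large-reasoning-model"   # expensive, slow, accurate
    return "small-chat-model"            # cheap, fast, usually enough

print(route("Write me a fitness plan"))     # small-chat-model
print(route("Prove this loop terminates"))  # large-reasoning-model
```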
NitpickLawyer•2h ago
> It’s still rough right now, but it feels inevitable that this will get much better over time.

Yeah, the signals they get will improve things over time. You can do a lot of heavy lifting with embedding models nowadays, gather "satisfaction" signals from chats, and adjust your router based on those. It will be weird at first, and some people will complain, but at the end of the day you don't need IMO-gold levels of reasoning to write a fitness plan that the user most likely won't even follow :)

Signal gathering is likely the driver of most of the subsidised model offerings we see today.
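Adjusting a router from satisfaction signals, as described above, is essentially a bandit problem. Here is a minimal epsilon-greedy sketch; the class, model names, and update rule are all illustrative assumptions, not anything from the paper or a real product.

```python
import random

class FeedbackRouter:
    """Toy epsilon-greedy router updated from thumbs-up/down feedback."""

    def __init__(self, models, eps=0.1):
        self.models = list(models)
        self.eps = eps
        self.score = {m: 0.0 for m in self.models}  # running mean satisfaction
        self.count = {m: 0 for m in self.models}

    def pick(self):
        if random.random() < self.eps:                   # explore occasionally
            return random.choice(self.models)
        return max(self.models, key=self.score.get)      # exploit best so far

    def feedback(self, model, satisfied: bool):
        # incremental update of the mean satisfaction for this model
        self.count[model] += 1
        self.score[model] += (float(satisfied) - self.score[model]) / self.count[model]

router = FeedbackRouter(["cheap-4b", "big-70b"])
router.feedback("cheap-4b", True)
router.feedback("big-70b", False)
```

Over many chats the router drifts toward whichever model keeps users satisfied per unit cost, which is one plausible reading of why subsidised offerings are worth it to providers.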

phi-go•1h ago
Does this have a compute benefit or could one use different specialized LLM architectures / models for the subnetworks?
CuriouslyC•1h ago
I mean, agentic workflows have been a thing for a while now; this is just agentic chat.
datadrivenangel•2h ago
Paper and repo do not mention routing latency, which I think is a concern.

Also the paper has some pie chart crimes on page 6.

NitpickLawyer•2h ago
Just from a brief look at the repo, they seem to be doing semantic embeddings with Qwen3-Embedding-8B, which should reach prompt-processing speeds in the high thousands of tokens/s on recent hardware. With a sufficiently large dataset from running it for a while, you could probably fine-tune a smaller model as well (4B and 0.6B variants are available from the same family).
biggestfan•2h ago
Between these kinds of optimizations, improved data center efficiency, and smaller models becoming more capable, I wonder how long it will be before someone manages to make a profitable AI business. Maybe when the race to train better models slows down and they don't need to constantly upgrade capacity.
Justsignedup•2h ago
Reminds me of the early days of cloud computing. It was very pricey, but once the tools caught up in 5 or so years, it went from "omg cloud is so expensive" to "omg cloud is only expensive when it's worth building your own data center"
darth_avocado•2h ago
AGI will not be a single model. It will be an ensemble of models that interact with each other. Just like different parts of your brain.
mgreg•2h ago
Link to repo for those interested: https://github.com/ZhangYiqun018/AvengersPro
whistle650•2h ago
It seems they use 70% of the benchmark query-answer pairs to cluster and determine which models work best for each cluster (by sending all queries to all models and looking at responses vs ground truth answers). Then they route the remaining 30% "test" set queries according to those prior determinations. It doesn't seem surprising that this approach would give you Pareto efficiency on those benchmarks.
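The scheme described above (cluster the held-in 70% of queries in embedding space, record the best-scoring model per cluster, then route held-out queries to the nearest cluster's winner) can be sketched as a toy with 2-D stand-in embeddings. The centroids, model names, and cluster-to-model assignments are fabricated for illustration.

```python
import numpy as np

# Toy: two cluster centroids in a stand-in 2-D "embedding" space.
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])

# Per-cluster best model, determined offline by scoring all models
# against ground-truth answers on the held-in 70% of query-answer pairs.
best_model = {0: "math-tuned-model", 1: "code-tuned-model"}

def route(embedding: np.ndarray) -> str:
    """Assign the query to its nearest centroid; use that cluster's winner."""
    dists = np.linalg.norm(centroids - embedding, axis=1)
    return best_model[int(dists.argmin())]

print(route(np.array([0.3, -0.2])))  # math-tuned-model
print(route(np.array([4.8, 5.1])))   # code-tuned-model
```

Since the cluster winners were chosen on data drawn from the same benchmarks as the test queries, strong Pareto numbers on those benchmarks are indeed what you would expect.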
visarga•46m ago
It's ok if you can update the router over time, the more data you have the better.
cubefox•1h ago
Based on my experience, the GPT-5 router either isn't very smart or is deliberately configured to be very stingy. It basically never uses the reasoning model by itself, even if that means it hallucinates nonsense.
srekhi•1h ago
Isn't this what NotDiamond (founded 2 years ago!) has been working to solve for? Maybe someone from their team will chime in (cc @t5-notdiamond)
visarga•44m ago
Essentially, instead of modifying the prompt itself, the system intelligently directs the prompt to the LLM that is best suited to handle it based on its learned performance and efficiency characteristics for similar types of queries. It's externally optimizing people's prompts.
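One way to picture "performance and efficiency characteristics" driving the choice is a scalarized score per cluster: accuracy minus a cost penalty. This is a hedged sketch of the general idea, not the paper's actual objective; the numbers and the lambda weight are invented.

```python
def pick_model(cluster_stats, lam=0.5):
    """Pick the model maximizing accuracy - lam * relative_cost.

    cluster_stats maps model name -> (accuracy, relative_cost),
    both measured on queries similar to the incoming one.
    """
    return max(cluster_stats,
               key=lambda m: cluster_stats[m][0] - lam * cluster_stats[m][1])

stats = {
    "small-4b": (0.70, 0.1),   # 70% accurate, cheap
    "large-70b": (0.85, 0.8),  # 85% accurate, expensive
}
print(pick_model(stats, lam=0.5))  # small-4b  (0.65 beats 0.45)
print(pick_model(stats, lam=0.1))  # large-70b (0.77 beats 0.69)
```

Sweeping lambda traces out the performance-efficiency frontier: high lambda favors cheap models, low lambda favors accurate ones.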

Leaving Gmail for Mailbox.org

https://giuliomagnifico.blog/post/2025-08-18-leaving-gmail/
38•giuliomagnifico•37m ago•35 comments

Waymo granted permit to begin testing in New York City

https://www.cnbc.com/2025/08/22/waymo-permit-new-york-city-nyc-rides.html
180•achristmascarl•1h ago•88 comments

FFmpeg 8.0

https://ffmpeg.org/index.html#pr8.0
376•gyan•2h ago•119 comments

Sprinkling Self-Doubt on ChatGPT

https://justin.searls.co/posts/sprinkling-self-doubt-on-chatgpt/
10•ingve•33m ago•0 comments

Launch HN: BlankBio (YC S25) - Making RNA Programmable

11•antichronology•1h ago•3 comments

io_uring, kTLS and Rust for zero syscall HTTPS server

https://blog.habets.se/2025/04/io-uring-ktls-and-rust-for-zero-syscall-https-server.html
402•guntars•14h ago•113 comments

Show HN: Clyp – Clipboard Manager for Linux

https://github.com/murat-cileli/clyp
19•timeoperator•2h ago•18 comments

LabPlot: Free, open source and cross-platform Data Visualization and Analysis

https://labplot.org/
131•turrini•9h ago•21 comments

Does MHz Still Matter?

https://www.ubicloud.com/blog/does-mhz-still-matter
32•furkansahin•3h ago•10 comments

DeepSeek-v3.1

https://api-docs.deepseek.com/news/news250821
697•wertyk•23h ago•232 comments

Show HN: Pinch – macOS voice translation for real-time conversations

https://www.startpinch.com/
11•christiansafka•2d ago•7 comments

The US Department of Agriculture Bans Support for Renewables

https://insideclimatenews.org/news/19082025/usda-bans-farm-renewables-support/
20•mooreds•23m ago•5 comments

Vibe Debugging: Enterprises' Up and Coming Nightmare

https://marketsaintefficient.substack.com/p/vibe-debugging-enterprises-up-and
55•someoneloser•2h ago•40 comments

What about using rel="share-url" to expose sharing intents?

https://shkspr.mobi/blog/2025/08/what-about-using-relshare-url-to-expose-sharing-intents/
60•edent•6h ago•26 comments

Launch HN: Inconvo (YC S23) – AI agents for customer-facing analytics

24•ogham•5h ago•14 comments

The issue of anti-cheat on Linux

https://tulach.cc/the-issue-of-anti-cheat-on-linux/
18•todsacerdoti•17h ago•3 comments

XSLT removal will break multiple government and regulatory sites

https://github.com/whatwg/html/issues/11582
50•colejohnson66•40m ago•20 comments

A Guide to Gen AI / LLM Vibecoding for Expert Programmers

https://www.stochasticlifestyle.com/a-guide-to-gen-ai-llm-vibecoding-for-expert-programmers/
60•ChrisRackauckas•3h ago•50 comments

Everything is correlated (2014–23)

https://gwern.net/everything
220•gmays•16h ago•100 comments

Control shopping cart wheels with your phone (2021)

https://www.begaydocrime.com/
244•mystraline•17h ago•113 comments

Build Log: Macintosh Classic

https://www.jeffgeerling.com/blog/2025/build-log-macintosh-classic
16•speckx•4h ago•2 comments

VHS-C: When a lazy idea stumbles towards perfection [video]

https://www.youtube.com/watch?v=HFYWHeBhYbM
144•surprisetalk•4d ago•83 comments

The Minecraft Code (2024) [video]

https://www.youtube.com/watch?v=nz2LeXwJOyI
38•zichy•11h ago•52 comments

It’s not wrong that "🤦🏼‍♂️".length == 7 (2019)

https://hsivonen.fi/string-length/
103•program•12h ago•141 comments

Code formatting comes to uv experimentally

https://pydevtools.com/blog/uv-format-code-formatting-comes-to-uv-experimentally/
328•tanelpoder•21h ago•229 comments

Closing the Nix Gap: From Environments to Packaged Applications for Rust

https://devenv.sh/blog/2025/08/22/closing-the-nix-gap-from-environments-to-packaged-applications-for-rust/
5•domenkozar•2h ago•0 comments

4chan will refuse to pay daily online safety fines, lawyer tells BBC

https://www.bbc.co.uk/news/articles/cq68j5g2nr1o
261•donpott•8h ago•275 comments

All managers make mistakes; good managers acknowledge and repair

https://terriblesoftware.org/2025/08/22/the-management-skill-nobody-talks-about/
231•matheusml•5h ago•107 comments

Go is still not good

https://blog.habets.se/2025/07/Go-is-still-not-good.html
432•ustad•8h ago•560 comments