frontpage.

France's homegrown open source online office suite

https://github.com/suitenumerique
169•nar001•2h ago•90 comments

Start all of your commands with a comma (2009)

https://rhodesmill.org/brandon/2009/commands-with-comma/
365•theblazehen•2d ago•126 comments

Hoot: Scheme on WebAssembly

https://www.spritely.institute/hoot/
60•AlexeyBrin•3h ago•12 comments

Reinforcement Learning from Human Feedback

https://arxiv.org/abs/2504.12501
37•onurkanbkrc•2h ago•2 comments

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

https://openciv3.org/
744•klaussilveira•17h ago•232 comments

Coding agents have replaced every framework I used

https://blog.alaindichiappari.dev/p/software-engineering-is-back
101•alainrk•2h ago•96 comments

The Waymo World Model

https://waymo.com/blog/2026/02/the-waymo-world-model-a-new-frontier-for-autonomous-driving-simula...
996•xnx•23h ago•568 comments

Vocal Guide – belt sing without killing yourself

https://jesperordrup.github.io/vocal-guide/
130•jesperordrup•8h ago•55 comments

Stories from 25 Years of Software Development

https://susam.net/twenty-five-years-of-computing.html
4•vinhnx•59m ago•0 comments

Unseen Footage of Atari Battlezone Arcade Cabinet Production

https://arcadeblogger.com/2026/02/02/unseen-footage-of-atari-battlezone-cabinet-production/
87•videotopia•4d ago•19 comments

Ga68, a GNU Algol 68 Compiler

https://fosdem.org/2026/schedule/event/PEXRTN-ga68-intro/
30•matt_d•4d ago•6 comments

Making geo joins faster with H3 indexes

https://floedb.ai/blog/how-we-made-geo-joins-400-faster-with-h3-indexes
146•matheusalmeida•2d ago•39 comments

A Fresh Look at IBM 3270 Information Display System

https://www.rs-online.com/designspark/a-fresh-look-at-ibm-3270-information-display-system
6•rbanffy•3d ago•0 comments

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

https://github.com/valdanylchuk/breezydemo
251•isitcontent•18h ago•27 comments

Show HN: Kappal – CLI to Run Docker Compose YML on Kubernetes for Local Dev

https://github.com/sandys/kappal
9•sandGorgon•2d ago•2 comments

Monty: A minimal, secure Python interpreter written in Rust for use by AI

https://github.com/pydantic/monty
264•dmpetrov•18h ago•143 comments

Hackers (1995) Animated Experience

https://hackers-1995.vercel.app/
527•todsacerdoti•1d ago•255 comments

Sheldon Brown's Bicycle Technical Info

https://www.sheldonbrown.com/
406•ostacke•1d ago•105 comments

Show HN: I spent 4 years building a UI design tool with only the features I use

https://vecti.com
351•vecti•20h ago•158 comments

Cross-Region MSK Replication: K2K vs. MirrorMaker2

https://medium.com/lensesio/cross-region-msk-replication-a-comprehensive-performance-comparison-o...
6•andmarios•4d ago•1 comment

Show HN: If you lose your memory, how to regain access to your computer?

https://eljojo.github.io/rememory/
321•eljojo•20h ago•197 comments

What Is Ruliology?

https://writings.stephenwolfram.com/2026/01/what-is-ruliology/
54•helloplanets•4d ago•52 comments

Microsoft open-sources LiteBox, a security-focused library OS

https://github.com/microsoft/litebox
365•aktau•1d ago•190 comments

An Update on Heroku

https://www.heroku.com/blog/an-update-on-heroku/
446•lstoll•1d ago•295 comments

Reputation Scores for GitHub Accounts

https://shkspr.mobi/blog/2026/02/reputation-scores-for-github-accounts/
4•edent•2h ago•0 comments

How to effectively write quality code with AI

https://heidenstedt.org/posts/2026/how-to-effectively-write-quality-code-with-ai/
290•i5heu•20h ago•246 comments

Dark Alley Mathematics

https://blog.szczepan.org/blog/three-points/
103•quibono•4d ago•29 comments

Female Asian Elephant Calf Born at the Smithsonian National Zoo

https://www.si.edu/newsdesk/releases/female-asian-elephant-calf-born-smithsonians-national-zoo-an...
49•gmays•13h ago•22 comments

Was Benoit Mandelbrot a hedgehog or a fox?

https://arxiv.org/abs/2602.01122
27•bikenaga•3d ago•15 comments

I spent 5 years in DevOps – Solutions engineering gave me what I was missing

https://infisical.com/blog/devops-to-solutions-engineering
164•vmatsiiako•22h ago•75 comments

Making LLMs Cheaper and Better via Performance-Efficiency Optimized Routing

https://arxiv.org/abs/2508.12631
130•omarsar•5mo ago

Comments

hodgehog11•5mo ago
Wow, that was fast.

I've thought for a while that ensembling approaches would become the next stage of LLM development after CoT, since it provides yet another effective, independent axis for scaling laws. Great to see that perspective is taking off. The open weight community has an opportunity to take these ideas and run with them better than OpenAI has.

61j3t•5mo ago
Yet context hell comes with that.
bachittle•5mo ago
I’m fascinated by this new paradigm. We’ve more or less perfected Mixture-of-Experts inside a single model, where routing happens between subnetworks. What GPT-5 auto (and this paper) are doing is a step further: “LLM routing” across multiple distinct models. It’s still rough right now, but it feels inevitable that this will get much better over time.
NitpickLawyer•5mo ago
> It’s still rough right now, but it feels inevitable that this will get much better over time.

Yeah, the signals they get will improve things over time. You can do a lot of heavy lifting with embedding models nowadays, get "satisfaction" signals from chats, and adjust your router based on those. It will be weird at first, and some people will complain, but at the end of the day you don't need IMO-gold levels of thinking to write a fitness plan that the user most likely won't even follow :)

Signal gathering is likely the driver of most of the subsidised model offerings we see today.
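
A rough sketch of what "adjusting the router from satisfaction signals" could look like (the cluster IDs, score table, and feedback loop below are purely hypothetical, not anything from the paper):

    # Hypothetical sketch: nudge per-cluster, per-model quality scores from
    # explicit user feedback (thumbs up / down), then route to the best model.
    from collections import defaultdict

    LEARNING_RATE = 0.05

    # quality[cluster_id][model_name] -> running estimate in [0, 1]
    quality = defaultdict(lambda: defaultdict(lambda: 0.5))

    def record_feedback(cluster_id: int, model: str, satisfied: bool) -> None:
        """Exponential moving average toward 1.0 (satisfied) or 0.0 (not)."""
        target = 1.0 if satisfied else 0.0
        q = quality[cluster_id][model]
        quality[cluster_id][model] = q + LEARNING_RATE * (target - q)

    def pick_model(cluster_id: int, candidates: list[str]) -> str:
        """Route to the model with the best running score for this cluster."""
        return max(candidates, key=lambda m: quality[cluster_id][m])

    # A cheap model keeps satisfying users for "fitness plan"-style queries,
    # so the router keeps sending that cluster to it.
    for _ in range(20):
        record_feedback(cluster_id=7, model="small-model", satisfied=True)
    record_feedback(cluster_id=7, model="large-model", satisfied=False)
    print(pick_model(7, ["small-model", "large-model"]))  # -> small-model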

phi-go•5mo ago
Does this have a compute benefit or could one use different specialized LLM architectures / models for the subnetworks?
CuriouslyC•5mo ago
I mean, agentic workflows have been a thing for a while now; this is just agentic chat.
nico•5mo ago
I wish this could be exploited even further, where a big model could be built from a network of many small, specialized models.

And then maybe you could just customize and optimize your own model for local use, almost like mixing and matching different modules. It would be nice to have a model that only knows and does what you need it to.

mrbald•5mo ago
A Team-as-a-Service? It would be interesting to be able to create a Python script acting like a team of sales, project management, and engineering working together, with telemetry and a KPI dashboard on top. If not to deliver anything useful, then as a learning tool for project management frameworks.
akavi•5mo ago
I'd actually bet against this. The "bitter lesson" suggests doing things end-to-end in-model will (eventually, with sufficient data) outcompete building things outside of models.

My understanding is that GPT-5 already does this by varying the quantity of CoT performed (in addition to the kind of super-model-level routing described in the post), and I strongly suspect it's only going to get more sophisticated.

imtringued•5mo ago
The bitter lesson type of strategy would be to implement heterogeneous experts inside an MoE architecture so that the model automatically chooses the number of active parameters by routing to experts with more parameters.

This approach is much more efficient than the one in this HN submission's paper, because request-based routing requires you to recalculate the KV cache from scratch as you switch from model to model.
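
Not what this paper does, but a toy sketch of that "heterogeneous experts" idea: a single MoE layer whose experts share input/output dimensions but differ in hidden width, so top-1 routing implicitly decides how many parameters are active per token (all sizes here are made up):

    # Toy sketch (not from the paper): one MoE layer with experts of different
    # widths, so the learned router effectively chooses the active parameter count.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class HeterogeneousMoE(nn.Module):
        def __init__(self, d_model=256, expert_widths=(128, 512, 2048)):
            super().__init__()
            # Experts share input/output dims but differ in hidden width (cost).
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, w), nn.GELU(), nn.Linear(w, d_model))
                for w in expert_widths
            )
            self.router = nn.Linear(d_model, len(expert_widths))

        def forward(self, x):  # x: (batch, seq, d_model)
            probs = F.softmax(self.router(x), dim=-1)   # (batch, seq, n_experts)
            top_p, top_idx = probs.max(dim=-1)          # top-1 routing per token
            out = torch.zeros_like(x)
            for i, expert in enumerate(self.experts):
                mask = top_idx == i                     # tokens sent to expert i
                if mask.any():
                    # Scale by router prob so gradients also reach the router.
                    out[mask] = expert(x[mask]) * top_p[mask].unsqueeze(-1)
            return out

    layer = HeterogeneousMoE()
    tokens = torch.randn(2, 16, 256)
    print(layer(tokens).shape)  # torch.Size([2, 16, 256])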

krackers•5mo ago
MatFormer seems like a better approach to this type of scaling, though.
datadrivenangel•5mo ago
Paper and repo do not mention routing latency, which I think is a concern.

Also the paper has some pie chart crimes on page 6.

NitpickLawyer•5mo ago
Just from a brief look at the repo, they seem to be doing semantic embeddings with Qwen3-Embedding-8B, which should manage prompt-processing speeds in the high thousands of tokens/s on recent hardware. With a sufficiently large dataset, after using it for a while you could probably fine-tune a smaller model as well (4B and 0.6B are available from the same family).
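
For what it's worth, the smaller siblings mentioned above can be loaded like any other sentence-transformers model; a minimal, assumption-laden sketch of the embed-and-route step (exact loading details may differ by library version, and nearest_cluster() is just illustrative):

    # Minimal sketch: embed queries with the 0.6B sibling mentioned above, then
    # route by cosine similarity to precomputed, unit-normalized cluster centroids.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    embedder = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")

    def nearest_cluster(query: str, centroids: np.ndarray) -> int:
        """centroids: (n_clusters, dim) array of unit-normalized centroid vectors."""
        q = embedder.encode([query], normalize_embeddings=True)[0]
        return int(np.argmax(centroids @ q))  # cosine similarity on unit vectors
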
biggestfan•5mo ago
Between these kinds of optimizations, improved data center efficiency, and smaller models becoming more capable, I wonder how long it will be before someone manages to make a profitable AI business. Maybe when the race to train better models slows down and they don't need to constantly upgrade capacity.
Justsignedup•5mo ago
Reminds me of the early days of cloud computing. It was very pricey, but once the tools caught up in 5 or so years, it went from "omg cloud is so expensive" to "omg cloud is only expensive when it's worth building your own data center".
darth_avocado•5mo ago
AGI will not be a single model. It will be an ensemble of models that interact with each other. Just like different parts of your brain.
mgreg•5mo ago
Link to repo for those interested: https://github.com/ZhangYiqun018/AvengersPro
whistle650•5mo ago
It seems they use 70% of the benchmark query-answer pairs to cluster and determine which models work best for each cluster (by sending all queries to all models and looking at responses vs ground truth answers). Then they route the remaining 30% "test" set queries according to those prior determinations. It doesn't seem surprising that this approach would give you Pareto efficiency on those benchmarks.
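
As a rough illustration of that setup (not the authors' code; the embed() helper, the cost table, and the alpha weighting are all placeholder assumptions):

    # Rough illustration of the cluster-then-route recipe described above.
    # embed(), COSTS, and alpha are placeholder assumptions, not the paper's values.
    import numpy as np
    from sklearn.cluster import KMeans

    COSTS = {"big-model": 1.0, "small-model": 0.1}  # relative cost per query (made up)

    def fit_router(train_queries, train_correct, embed, n_clusters=64, alpha=0.7):
        """train_correct[model] is a bool array: did `model` answer query i correctly?"""
        X = np.array([embed(q) for q in train_queries])
        km = KMeans(n_clusters=n_clusters, n_init="auto").fit(X)
        best = {}
        for c in range(n_clusters):
            idx = km.labels_ == c
            # Performance-efficiency score: trade accuracy off against cost.
            scores = {m: alpha * train_correct[m][idx].mean() - (1 - alpha) * COSTS[m]
                      for m in train_correct}
            best[c] = max(scores, key=scores.get)
        return km, best

    def route(query, km, best, embed):
        """Send an unseen ("test 30%") query to whichever model won its nearest cluster."""
        c = int(km.predict(np.array([embed(query)]))[0])
        return best[c]

Tuning alpha toward 1 favors accuracy and toward 0 favors cheap models; presumably some knob like this is what traces out the performance-efficiency trade-off curve on those benchmarks.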
visarga•5mo ago
It's OK if you can update the router over time; the more data you have, the better.
cubefox•5mo ago
Based on my experience, the GPT-5 router either isn't very smart or is deliberately configured to be very stingy. It basically never uses the reasoning model by itself, even if that means it hallucinates nonsense.
patrickhogan1•5mo ago
Same experience as you.
srekhi•5mo ago
Isn't this what NotDiamond (founded 2 years ago!) has been working to solve for? Maybe someone from their team will chime in (cc @t5-notdiamond)
manveerc•5mo ago
Yeah that’s what my understanding is too about NotDiamomd. There are a bunch of similar products out there.
visarga•5mo ago
Essentially, instead of modifying the prompt itself, the system intelligently directs the prompt to the LLM that is best suited to handle it based on its learned performance and efficiency characteristics for similar types of queries. It's externally optimizing people's prompts.
ares623•5mo ago
How does it learn in the first place?
hobofan•5mo ago
That's almost the simplest kind of router imaginable, isn't it? Just embed the query and route to the model that has performed best on similar queries in the past?

I'm sure this has been documented/tried before, and it almost certainly doesn't work well in practice. The typical counter-example would be a simple-sounding query that actually requires complex reasoning: because the query is close in embedding space to other simple-sounding queries, it would be sent to a "dumber" model for efficiency.

I guess in their benchmarks this works out because, from what it sounds like, they do per-dataset clustering, so the embedding clusters may actually be able to capture "complexity levels". However, if you were to mix all datasets into one (similar to how you would encounter queries in most real-world use cases) and cluster against that, this approach would surely break down.

PeterStuer•5mo ago
I would prefer this to be optional.
retinaros•5mo ago
Why do we always come up with new words for basic ideas? Test-time compute, test-time router, test-time sleep, test-time slop. It's a router; let's call it a router.

In the end, most of these principles are not part of the LLM but part of the API design in front of the LLM. I understand the goal is to abstract this fact away to sell more magic.