frontpage.

How Much Money Is Trump Profiting from the Presidency? [video]

https://www.youtube.com/watch?v=cpOE3aAt8XU
1•dzonga•46s ago•0 comments

Tesla on X: "Master Plan Part IV" / X

https://twitter.com/Tesla/status/1962591324022153607
2•bilsbie•1m ago•0 comments

Chaotic Food: The Attraction of the Standard American Diet

https://www.exfatloss.com/p/chaotic-food-the-strange-attraction
1•Michelangelo11•7m ago•0 comments

Expert Warns of Worldwide Threat to Human Dignity

https://scitechdaily.com/ai-is-not-intelligent-at-all-expert-warns-of-worldwide-threat-to-human-d...
2•geox•8m ago•0 comments

Scripting Is More Fun with Nushell

https://julianhofer.eu/blog/2025/nushell/
1•JNRowe•9m ago•0 comments

When the Man Tried to Sell Minimalism to the Counterculture

https://www.newyorker.com/culture/listening-booth/when-the-man-tried-to-sell-minimalism-to-the-co...
1•mitchbob•11m ago•1 comment

Moonwalking together: Tracing Redditors' digital memory work on Michael Jackson

https://journals.sagepub.com/doi/full/10.1177/13548565211003878
1•handfuloflight•14m ago•0 comments

XSLT Debate Leads to Bigger Questions of Web Governance

https://thenewstack.io/xslt-debate-leads-to-bigger-questions-of-web-governance/
1•spankalee•17m ago•0 comments

Reports of Gmail security issue are inaccurate

https://blog.google/products/workspace/gmail-security-protections/
1•pentagrama•22m ago•0 comments

There Is No Trolley Problem

https://www.the-reframe.com/there-is-no-trolley-problem/
1•me-vs-cat•22m ago•0 comments

Show HN: A Modern Alternative to the Excel Fuzzy Lookup Add-In

https://www.getflookup.com/blog/advanced-data-cleaning/
1•ogora•23m ago•0 comments

The difference between time and attention

https://world.hey.com/jason/the-difference-between-time-and-attention-bdd955eb
1•amrrs•23m ago•0 comments

Attention is a smoothed cubic spline

https://arxiv.org/abs/2408.09624
1•amai•23m ago•0 comments

What's cooking on Sourcehut? Q3 2025

https://sourcehut.org/blog/2025-09-01-whats-cooking-q3-2025/
1•zorrn•25m ago•0 comments

Patrick Winston: How to Speak (2018) [video]

https://www.youtube.com/watch?v=Unzc731iCUY
12•tosh•30m ago•0 comments

The Talk Show: 'Ersatz PopSocket'

https://daringfireball.net/thetalkshow/2025/08/31/ep-430
1•Bogdanp•31m ago•0 comments

I Was Wrong About Data Center Water Consumption

https://www.construction-physics.com/p/i-was-wrong-about-data-center-water
3•dskrvk•32m ago•0 comments

New study reveals the country with the slowest ageing rates and the fastest

https://www.sciencefocus.com/news/ageing-countries
1•bookofjoe•33m ago•0 comments

The German local government showing Microsoft the red card

https://www.raconteur.net/technology/schleswig-holstein-open-source
2•worik•36m ago•0 comments

Captivate your audience with live polls, Q&A and feedback

https://engagetime.live/
1•estruyf•36m ago•0 comments

SparseLoCo: Communication-Efficient LLM Training

https://arxiv.org/abs/2508.15706
2•synapz_org•40m ago•1 comment

Anguilla: The Caribbean island making millions from the AI boom

https://www.bbc.co.uk/news/articles/cn5xdp427veo
1•FromTheArchives•42m ago•1 comment

Why do browsers throttle JavaScript timers?

https://nolanlawson.com/2025/08/31/why-do-browsers-throttle-javascript-timers/
2•josephscott•42m ago•0 comments

The future of excess mortality after Covid-19 (2024)

https://www.swissre.com/institute/research/topics-and-risk-dialogues/health-and-longevity/covid-1...
3•johntfella•42m ago•0 comments

The FTC Warns Big Tech Companies Not to Apply the Digital Services Act

https://www.wired.com/story/big-tech-companies-in-the-us-have-been-told-not-to-apply-the-digital-...
4•nradov•43m ago•0 comments

Show HN: Gemini CLI Proxy

https://github.com/ubaltaci/gemini-cli-proxy
1•ubaltaci•43m ago•0 comments

New attack reshapes rules of Bitcoin mining

https://techxplore.com/news/2025-09-reshapes-bitcoin.html
2•lif•45m ago•0 comments

First 6G Chip

https://techxplore.com/news/2025-09-scientists-world-6g-chip-capable.html
1•lif•46m ago•0 comments

"The Lottery," by Shirley Jackson (1948)

https://www.newyorker.com/magazine/1948/06/26/the-lottery
1•js2•49m ago•1 comment

Adaptive LLM routing under budget constraints

https://arxiv.org/abs/2508.21141
126•tdchaitanya•3h ago

Comments

fny•2h ago
Is there a reason human preference data is even needed? Don't LLMs already have a strong enough notion of question complexity to build a dataset for routing?
delichon•2h ago
> a strong enough notion of question complexity

A.k.a. wisdom. No, LLMs don't have that. Neither do I; I usually have to step into the rabbit holes in order to detect them.

fny•1h ago
"Do you think you need to do high/medium/low amount of thinking to answer X?" seems well within an LLMs wheelhouse if the goal is to build an optimized routing engine.
nutjob2•28m ago
How do you think an LLM could come by that information? Do you think LLM vendors are logging performance and feeding it back into the model, or is there some other mechanism?
jibal•1h ago
LLMs don't have notions ... they are pattern matchers against a vast database of human text.
mhh__•1h ago
Please do a SELECT * from this database
ashirviskas•6m ago
What was the name of the rocket that brought the first humans into space?
andrewflnr•2h ago
Is this really the frontier of LLM research? I guess we really aren't getting AGI any time soon, then. It makes me a little less worried about the future, honestly.

Edit: I never actually expected AGI from LLMs. That was snark. I just think it's notable that the fundamental gains in LLM performance seem to have dried up.

srekhi•2h ago
I'm not following this either. You'd think this would have been frontier work back in 2023.
kenjackson•2h ago
First, I don't think we will ever get to AGI. Not because we won't still see huge advances, but because AGI is a moving, ambiguous target that we won't get consensus on.

But why does this paper impact your thinking on it? It's about budget, and about recognizing that different LLMs have different cost structures. It's not really an attempt to improve LLM performance in absolute terms.

_heimdall•1h ago
So you don't expect AGI to be possible, ever? Or is your concern mainly with the wildly different definitions people use for it, and that we'll continue moving goalposts rather than agreeing we got there?
nutjob2•42m ago
There's no concrete evidence that AGI is possible, mostly because it has no concrete definition.

It's mostly hand waving, hype and credulity, and unproven claims of scalability right now.

You can't move the goal posts because they don't exist.

ashirviskas•4m ago
Well, if a human is GI, we just need to make it Artificial. Easy.
guluarte•1h ago
I'm starting to think that there will not be an 'AGI' moment; we will simply build smarter machines over time until we realize there is 'AGI'. It would be like video calls: in the '90s everybody wanted them, now everybody hates them, lmao.
nutjob2•35m ago
Or we'll realize that human intelligence and machine intelligence are apples and oranges.
jibal•1h ago
LLMs are not on the road to AGI, but there are plenty of dangers associated with them nonetheless.
nicce•1h ago
Just two days ago, Gemini 2.5 Pro recommended tax evasion to me based on nonexistent laws and court decisions. The model was so charming and convincing that even after I pointed out all the logical flaws and said it was plain wrong, I started to doubt myself, because it is so good at pleasing, arguing, and using words.

And most people would have accepted the recommendation, because the model sold it as a less common tactic while sounding very logical.

roywiggins•1h ago
> even after I brought all the logic flaws and said that this is plain wrong

Once you've started to argue with an LLM you're already barking up the wrong tree. Maybe you're right, maybe not, but there's no point in arguing it out with an LLM.

nutjob2•37m ago
Or you could understand the tool you are using and be skeptical of any of its output.

So many people just want to believe, instead of accepting the reality that LLMs are quite unreliable.

Personally, it's usually fairly obvious to me when LLMs are bullshitting, probably because I have lots of experience detecting it in humans.

andrewflnr•1h ago
Agreed, broadly. I never really thought they were, but seeing people work on stuff like this instead of even trying to improve the architecture really makes it obvious.
yahoozoo•1h ago
That, and LLMs seem to be plateauing. Earlier this year, it seemed like the big companies were releasing noticeable improvements every other week. People would joke that a few weeks is “an eternity” in AI… so what time span are we looking at now?
andrewflnr•1h ago
That's just the thing. There don't seem to have been any breakthroughs in model performance or architecture, so it seems like we're back to picking up marginal reductions in cost to make any progress.
muldvarp•31m ago
There have been very large improvements in code generation in the last six months. A few weeks without improvement is not necessarily a plateau.
ACCount37•5m ago
Wait until it ramps up so much that people will say "it's a plateau, for real this time" when they go 3 days without a +10% capability jump.
CamperBob2•9m ago
GPT-5 is no joke. They did a terrible job exposing it to the world, but when it's on its game it's damned impressive. The early problems with reasoning-level choices seem to be mostly resolved, from what I've been seeing.
yieldcrv•1h ago
Just because it's on arXiv doesn't mean anything.

arXiv is essentially a blog in an academic format, popular amongst Asian and South Asian academic communities.

Currently you can launder reputation with it, just like “white papers” in the crypto world enabled raising capital for some time.

This ability will diminish as more people catch on.

ctoth•1h ago
Is a random paper from Fujitsu Research claiming to be the frontier of anything?
andrewflnr•1h ago
Not just this paper, but model-routing shenanigans also seem to have been a big part of GPT-5, which certainly claims to be frontier work.
pbd•2h ago
GPT-4 at $24.70 per million tokens vs. Mixtral at $0.24 - that's a 100x cost difference! Even if routing gets it wrong 20% of the time, the economics still work. But the real question is how you measure 'performance' - user satisfaction doesn't always correlate with technical metrics.
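
(A back-of-the-envelope check of that claim, using the prices from the comment; the 80/20 traffic split is a hypothetical:)

    # Expected per-million-token cost if a router sends 80% of traffic to the
    # cheap model and 20% (the hard or misrouted cases) to GPT-4.
    # Prices are from the comment above; the split is a hypothetical.
    gpt4_price = 24.70    # $ per million tokens
    mixtral_price = 0.24  # $ per million tokens

    routed = 0.8 * mixtral_price + 0.2 * gpt4_price
    print(f"routed: ${routed:.2f}/M vs GPT-4: ${gpt4_price:.2f}/M")  # $5.13 vs $24.70
    print(f"savings: {1 - routed / gpt4_price:.0%}")                 # ~79% cheaper
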
Keyframe•1h ago
number of complaints / million tokens?
pqtyw•1h ago
> GPT-4 at $24.7 per million tokens

While technically true, why would you want to use it when OpenAI itself provides a bunch of models that are many times cheaper and better?

KTibow•58m ago
RouterBench is from March 2024.
FINDarkside•1h ago
It's trivial to get a better score than GPT-4 at 1% of the cost by using my proprietary routing algorithm, which routes all requests to Gemini 2.5 Flash. It's called GASP (Gemini Always, Save Pennies).
nutjob2•32m ago
Does anyone working in an individual capacity actually end up paying for Gemini (Flash or Pro)? Or does Google boil you like a frog and you end up subscribing?
aspect8445•17m ago
I've used Gemini in a lot of personal projects. At this point I've probably made tens of thousands of requests, sometimes exceeding 1k per week. So far, I haven't had to pay a dime!
simpaticoder•35m ago
PPT (price per token) is insufficient to compute cost. You also need to know the average tokens per interaction (TPI); the two multiply to give a cost estimate. A 0.01x PPT is wiped out by a 100x TPI.
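
(The arithmetic, with made-up numbers:)

    # cost per interaction = PPT x TPI. Illustrative numbers only:
    # a model with 0.01x the price but 100x the verbosity costs the same.
    ppt_big, tpi_big = 24.70e-6, 1_000        # $/token, tokens/interaction
    ppt_cheap, tpi_cheap = 0.247e-6, 100_000  # 0.01x price, 100x tokens

    print(ppt_big * tpi_big)      # 0.0247 $/interaction
    print(ppt_cheap * tpi_cheap)  # 0.0247 $/interaction -- identical
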
mkoubaa•34m ago
> How you measure 'performance'

I heard the best way is through valuations

QuadmasterXLII•2h ago
The framing in the headline is interesting. As far as I recall, spending 4x more compute on a model to improve performance by 7% is the move that has worked over and over again up to this point. 101% of GPT-4 performance (potentially at any cost) is what I would expect an improved routing algorithm to achieve.
dang•1h ago
(The submitted title was "93% of GPT-4 performance at 1/4 cost: LLM routing with weak bandit feedback")
spoaceman7777•1h ago
Incredible that they are using contextual bandits, and named it: Preference-prior Informed Linucb fOr adaptive rouTing (PILOT)

Rather than the much more obvious: Preference-prior Informed Linucb For Adaptive Routing (PILFAR)