
LLMs encode how difficult problems are

https://arxiv.org/abs/2510.18147
174•stansApprentice•3mo ago

Comments

jiito•3mo ago
I haven't read this particular paper in-depth, but it reminds me of another one I saw that used a similar approach to find if the model encodes its own certainty of answering correctly. https://arxiv.org/abs/2509.10625
kazinator•3mo ago
It's all very clear when you mentally replace "LLM" with "text completion driven by compressed training data".

E.g.

[Text completion driven by compressed training data] exhibit[s] a puzzling inconsistency: [it] solves complex problems yet frequently fail[s] on seemingly simpler ones.

Some problems are better represented by a locus of texts in the training data, allowing more plausible talk to be generated. When the problem is not well represented, it does not help that the problem is simple.

If you train it on nothing but Scientology documents, and then ask about the Buddhist perspective on a situation, you will probably get some nonsense about body thetans, even if the situation is simple.

th0ma5•3mo ago
Thank you for posting this. I'm struck by how much of this work studies a behavior in isolation from other assumptions, and then describes the individual capability as a new solution or discovered ability that would supposedly work alongside all of those other assumptions. If the goal is to make accurate and reliable models by understanding these techniques, that makes almost all LLM research feel like whack-a-mole. Instead, it's more like seeing faces in cars and buildings: artifacts of patterns, pattern groupings, and the recognition of patterns. Building houses on sand, etc.
lukev•3mo ago
Well, that's what an LLM is. The problem is if one's mental model is built on "AI" instead of "LLM."

The fact that LLMs can abstract concepts and do any amount of out-of-sample reasoning is impressive and interesting, but the null hypothesis for an LLM being "impressive" in any regard is that the data required to answer the question is present in its training set.

XenophileJKO•3mo ago
This is true, but also misleading. We are learning that the models achieve compression by distilling higher-level concepts and deriving generalized, human-like abilities; see, for example, the recent introspection paper from Anthropic.
layoric•3mo ago
I have a hard time conceptualizing lossy text compression, but I've recently started to think of the "reasoning"/output as just a byproduct of lossy compression, with the weights tending towards an average of the information "around" the main topic of the prompt. What I've found easier is thinking about it like lossy image compression: generating more output tokens via "reasoning" is like subdividing nearby pixels and filling in the gaps with values the model has seen there before. Taking the analogy a bit too far, you can also think of the vocabulary as the pixel bit depth.
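To push the image analogy a little further, here's a toy sketch (pure illustration, nothing to do with real model internals) of subdividing samples and filling the gaps with in-between values:

  # Toy illustration of the analogy: "subdivide nearby pixels and fill the
  # gaps with plausible in-between values". Plain 1-d linear interpolation;
  # every inserted sample is a guess derived from what's already there.
  def upsample(pixels):
      out = []
      for a, b in zip(pixels, pixels[1:]):
          out.append(a)
          out.append((a + b) / 2)  # invented "in-between" value
      out.append(pixels[-1])
      return out

  print(upsample([10, 20, 60]))  # [10, 15.0, 20, 40.0, 60]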

I definitely agree that replacing "AI" or "LLMs" with "X driven by compressed training data" makes things a lot clearer, and it's a useful shortcut.

suprjami•3mo ago
You're right about "reasoning". It's just trying to steer the conversation in a more relevant direction in vector space, hopefully to generate more relevant output tokens. I find it easier to conceptualize this in three dimensions. 3blue1brown has a good video series which covers the overall concept of LLM vectors in machine learning: https://youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_...

To give a concrete example, say we're generating the next token from the word "queen". Is this the monarch, the bee, the playing card, the drag entertainer? By adding more relevant tokens (honey, worker, hive, beeswax) we steer the token generation to the place in the "word cloud" where our next token is more likely to exist.
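As a toy sketch of that steering (the 2-d vectors below are invented for illustration, not taken from any real model), averaging in bee-related tokens pulls the context away from the "monarch" sense:

  # Toy sketch: extra context tokens steer an ambiguous word toward one sense.
  # Hypothetical axes: dim 0 ~ "royalty", dim 1 ~ "insects". All numbers invented.
  import math

  def cosine(a, b):
      dot = sum(x * y for x, y in zip(a, b))
      return dot / (math.hypot(*a) * math.hypot(*b))

  emb = {
      "queen":   (0.7, 0.7),   # ambiguous: near both senses
      "monarch": (1.0, 0.1),
      "hive":    (0.1, 1.0),
      "honey":   (0.2, 0.9),
  }

  def context(tokens):  # crude stand-in for attention: just average the embeddings
      return tuple(sum(v) / len(v) for v in zip(*(emb[t] for t in tokens)))

  print(cosine(context(["queen"]), emb["monarch"]))                   # ~0.77
  print(cosine(context(["queen", "hive", "honey"]), emb["monarch"]))  # ~0.45
  print(cosine(context(["queen", "hive", "honey"]), emb["hive"]))     # ~0.96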

I don't see LLMs as "lossy compression" of text. To me that implies retrieval, and Transformers are a prediction device, not a retrieval device. If one needs retrieval then use a database.

Terr_•3mo ago
> You're right about "reasoning". It's just trying to steer the conversation in a more relevant direction in vector space, hopefully to generate more relevant output tokens.

I like to frame it as a theater script cycling through the LLM. The "reasoning" difference is just changing the style so that each character has film-noir monologues. The underlying process hasn't really changed, and the monologue text isn't fundamentally different from dialogue or stage direction... but more data still means more guidance for each improv cycle.

> say we're generating the next token from the word "queen". Is this the monarch, the bee, the playing card, the drag entertainer?

I'd like to point out that this scheme can result in things that look better to humans in the end... even when the "clarifying" choice is entirely arbitrary and irrational.

In other words, we should be alert to the difference between "explaining what you were thinking" versus "picking a firm direction so future improv makes nicer rationalizations."

esafak•3mo ago
It makes sense if you think of the LLM as building a data-aware model that compresses the noisy data by parsimony (the principle that the simplest explanation that fits is best). Typical text compression algorithms are not data-aware and not robust to noise.

In lossy compression the compression itself is the goal. In prediction, compression is the road that leads to parsimonious models.
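A toy flavor of that, with invented numbers: a parsimonious model plus its small residuals is a much shorter description of noisy data than the raw values themselves.

  # Toy "compression by parsimony" (all numbers made up): noisy linear data
  # is described far more briefly by one slope plus residuals than verbatim.
  noise = [0.1, -0.2, 0.0, 0.1, -0.1, 0.2, 0.0, -0.1, 0.1, 0.0]
  data = [(x, 3 * x + e) for x, e in zip(range(10), noise)]

  slope = sum(x * y for x, y in data) / sum(x * x for x, _ in data)  # least squares through origin
  raw = sum(abs(y) for _, y in data)                # crude "cost" of storing raw values
  resid = sum(abs(y - slope * x) for x, y in data)  # "cost" of residuals given the model

  print(f"slope ~ {slope:.2f}")                       # ~3.00
  print(f"raw {raw:.1f} vs residuals {resid:.1f}")    # ~135.1 vs ~1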

astrange•3mo ago
It is not a useful shortcut because you don't know what the training data is, nothing requires it to be an "average" of anything, and post-training arbitrarily re-weights all of its existing distributions anyway.
cruffle_duffle•3mo ago
The way I visualize it is imagining clipping the high frequency details of concepts and facts. These things operate on a different plane of abstraction than simple strings of characters or tokens. They operate on ideas and concepts. To compress, you take out all the deep details and leave only the broad strokes.
kazinator•3mo ago
One day people will say "we used to think the devil is in the details, but now we know it is in their removal".
onraglanroad•3mo ago
> Text completion driven by compressed training data...solves complex problems

Sure it does. Obviously. All we ever needed was some text completion.

Thanks for your valuable insight.

ToValueFunfetti•3mo ago
Why shouldn't you expect a problem's simplicity to correlate strongly with how well it is represented in training data? Every angle I can think of tilts in that direction. Simpler problems are easier to remember and thus repeat, they come up more often, and they require less space/time/effort to record (which also means they are less likely to contain errors).
N_Lens•3mo ago
This is a popular take on HN yet incomplete in its assessment of LLMs and their capabilities.
keeganpoppen•3mo ago
oh man i am pretty tired of the “it’s just autocomplete” armchair warriors… it is an accurate metaphor in only the most pedantic of ways, and has zero explanatory power whatsoever as far as intuition building goes. and i don’t even understand the impulse. “reality is easy, it’s just quantum autocomplete!”
msla•3mo ago
> It's all very clear when you mentally replace "LLM" with "text completion driven by compressed training data".

So you replace a more useful term with a less useful one?

Is that due to political reasons?

msla•3mo ago
> It's all very clear when you mentally replace "LLM" with "text completion driven by compressed training data".

This isn't what LLMs are, of course, but what some political groups insist they are so they can strengthen copyright law by pointing to LLMs as "theft". It's all very pro-Disney, of course.

WhyOhWhyQ•3mo ago
Probably irrelevant, but something funny about Claude Code is that it will routinely say something like "10 week task, very complex", and then one-shot it in 2 minutes. I put off having it create a feature for a while because it kept telling me the feature was way too complicated. None of the open source versions I tried were working, but I finally decided to have it build the feature anyway, and it ended up doing better than the open source projects. So there's something off about how well Claude estimates the difficulty of things for itself, and I wonder whether that makes it perform worse by not attempting things it would do well at.
danielbln•3mo ago
In terms of the time estimates: I've added to my global rules to never give time estimates for tasks, as they're useless and inaccurate.
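For anyone curious, such a rule might look something like the following in the global memory file Claude Code reads (typically ~/.claude/CLAUDE.md); the wording here is illustrative, not a verified prompt:

  # ~/.claude/CLAUDE.md
  - Never give time estimates for tasks.
  - Don't rate difficulty, predict "impact", or forecast performance results.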
bavell•3mo ago
I did the same a few weeks back, along with difficulty estimates, "impact" analysis, and expected performance results: all of it hallucinated garbage not worth wasting tokens on.
cruffle_duffle•3mo ago
Same. I dunno how they got trained to spontaneously provide those estimates either. Like they must have read some weird training data related to the phrase “how difficult is this” or something.
jives•3mo ago
I wonder if it's trying to predict what kind of estimate a human engineer would provide.
EGreg•3mo ago
Considering it’s trained on predicting the next word in stuff humans estimated before AI, wouldn’t that make sense?
kridsdale1•3mo ago
A HUGE amount of the workday artifacts engineers have been forced to produce since the start of the internet era is project-estimation documents for our managers. The training corpus on this stuff is immense and has now all been ingested into these models. The model is doing no thinking at all when it gives you an estimate; it's matching correlated strings that the humans of the past had to write down.

Fun fact, all those human-sourced estimates were hallucinations too.

abdullahkhalids•3mo ago
It would be very surprising if the AI training corpus includes a lot of project estimation documentation, since most of those are confidential and not publicly available.
andai•3mo ago
I think there are two aspects to this.

Firstly, Claude's self-concept is based on humanity's collective self-concept. (Well, the statistical average of all the self-concepts on the internet.)

So it doesn't have a clear understanding of what LLMs' strengths and weaknesses are, and by extension its own. (Neither do we, from what I've gathered. At least, not in a way that's well represented in web scrapes ;)

Secondly, as a programmer I have noticed a similar pattern... stuff that people say is easy turns out to be a pain in the ass, and stuff that they say is impossible turns out to be trivial. (They didn't even try, they just repeated what other people told them was hard, who also didn't try it...)

barren_suricata•3mo ago
Not sure how related this is, but I've noticed it tends to start sentences with inflated optimism. I think the idea is that if it opens with "Aha, I see it now! The problem is...", whatever comes next has a higher chance of being a correct solution than if it hadn't used an overtly positive prefix, even if that leads to a lot of annoying behavior.
AlecSchueler•3mo ago
I've always been taught to slightly overestimate how long something will take so that it reflects better on the team when it's delivered ahead of schedule. There's bound to be a bunch of similar advice and patterns in the training data.
bartwe•3mo ago
Sounds a lot like Kolmogorov complexity.
baxtr•3mo ago
Kolmogorov complexity is the length of the shortest computer program that can produce a specific object as output. It formalizes the idea that simple objects have short descriptions, while complex (random) objects are incompressible.
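In symbols, relative to a fixed universal machine U, it's the length of the shortest program that prints x:

  K_U(x) = \min \{\, |p| \;:\; U(p) = x \,\}

and x is called c-incompressible when K_U(x) \ge |x| - c, i.e. no program meaningfully shorter than x itself produces it.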
dorgo•3mo ago
The complex objects are conceptually similar to prime numbers.
amelius•3mo ago
Compression is a great IQ test, but it's still limited to a small domain.
inavida•3mo ago
My interpretation of the abstract is that humans are pretty good at judging how difficult a problem is while LLMs aren't as reliable, that problem difficulty correlates with activations during inference, and finally that an accurate human judgment of problem difficulty, given as input, leads to better problem solving.

If so, this is a nice training signal for my own neural net, since my view of LLMs is that they are essentially analogy-making machines, and that reasoning is essentially a chain of analogies that ends in a result that aligns somewhat with reality. Or that I'm as crazy as most people seem to think I am.

penguinPhilosop•3mo ago
Umm... isn't the point of analogies to find similarity between things, whereas the point of reasoning is to find causality between them?
inavida•3mo ago
Not sure. I tend to think the "why" of things is always emergent, then applied to analogies.

Honestly, I had no idea what to make of the abstract at first, so I questioned duck.ai's GPT-5 mini to try to understand it in my own words, and according to mini, the first paragraph aligns pretty well with the abstract.

The second paragraph is my own opinion, but according to mini, aligns with at least a subset of cognitive theory in the context of problem solving.

I highly recommend asking an LLM to explore this interesting question you've asked. They're all extremely useful for testing assumptions, and the next time I can't sleep I'll probably do so myself.

Personally I haven't had any luck getting an LLM to solve even simple problems, but I suspect I don't know yet how to ask, and it's possible that the people who are building them are still working it out themselves.

amazingman•3mo ago
> Personally I haven't had any luck getting an LLM to solve even simple problems

How are you defining "problem"?

inavida•3mo ago
I had in mind the datasets of Easy2Hard-Bench that the study tested against: math competitions, math word problems, programming, chess puzzles, science QA, and commonsense reasoning.

The last problem like this that I asked an LLM to solve was to find the tax and base price of items on an invoice, given the total price and tax rates. I couldn't make sense of the answer, but asking the LLM questions made me realize that I had framed the problem badly, and more so that I didn't know how to ask. (Though the process also triggered a surprising ability of my own to dredge up and actually apply basic algebra.) I'm sure it's that I'm still learning what and how to ask.
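For the record, that invoice problem reduces to one rearrangement of total = base * (1 + rate); a minimal sketch with made-up numbers:

  # Given a tax-inclusive total and the tax rate:
  # total = base * (1 + rate)  =>  base = total / (1 + rate)
  def split_price(total, rate):
      base = total / (1 + rate)
      return round(base, 2), round(total - base, 2)

  print(split_price(107.00, 0.07))  # (100.0, 7.0): base price, tax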
