frontpage.
Made with ♥ by @iamnishanth

LLMs Encode How Difficult Problems Are

https://arxiv.org/abs/2510.18147
59•stansApprentice•3h ago

Comments

jiito•1h ago
I haven't read this particular paper in depth, but it reminds me of another one I saw that used a similar approach to find out whether the model encodes its own certainty of answering correctly. https://arxiv.org/abs/2509.10625
kazinator•1h ago
It's all very clear when you mentally replace "LLM" with "text completion driven by compressed training data".

E.g.

[Text completion driven by compressed training data] exhibit[s] a puzzling inconsistency: [it] solves complex problems yet frequently fail[s] on seemingly simpler ones.

Some problems are better represented by a locus of texts in the training data, allowing more plausible talk to be generated. When the problem is not well represented, it does not help that the problem is simple.

If you train it on nothing but Scientology documents, and then ask about the Buddhist perspective on a situation, you will probably get some nonsense about body thetans, even if the situation is simple.

th0ma5•52m ago
Thank you for posting this. I'm struck by how much of this work studies a behavior in isolation from other assumptions, and then describes each isolated capability as a new solution or discovered capability that would still hold alongside all of those other assumptions. If the goal is to make accurate and reliable models by understanding these techniques, this makes almost all LLM research feel like whack-a-mole. Instead, it's more like seeing faces in cars and buildings: artifacts of patterns, pattern groupings, and pattern recognition. Building houses on sand, etc.
lukev•29m ago
Well, that's what an LLM is. The problem is if one's mental model is built on "AI" instead of "LLM."

The fact that LLMs can abstract concepts and do any amount of out-of-sample reasoning is impressive and interesting, but the null hypothesis for an LLM being "impressive" in any regard is that the data required to answer the question is present in its training set.

XenophileJKO•4m ago
This is true, but also misleading. We are learning that the models achieve compression by distilling higher-level concepts and deriving generalized, human-like abilities; see, for example, the recent introspection paper from Anthropic.

Two billion email addresses were exposed

https://www.troyhunt.com/2-billion-email-addresses-were-exposed-and-we-indexed-them-all-in-have-i...
118•esnard•1h ago•77 comments

Kimi K2 Thinking, a SOTA open-source trillion-parameter reasoning model

https://moonshotai.github.io/Kimi-K2/thinking.html
447•nekofneko•6h ago•167 comments

Show HN: I scraped 3B Goodreads reviews to train a better recommendation model

https://book.sv
86•costco•1d ago•37 comments

Swift on FreeBSD Preview

https://forums.swift.org/t/swift-on-freebsd-preview/83064
136•glhaynes•3h ago•75 comments

Universe's expansion 'is now slowing, not speeding up'

https://ras.ac.uk/news-and-press/research-highlights/universes-expansion-now-slowing-not-speeding
19•chrka•46m ago•8 comments

Hightouch (YC S19) Is Hiring

https://job-boards.greenhouse.io/hightouch/jobs/5542602004
1•joshwget•7m ago

ICC ditches Microsoft 365 for openDesk

https://www.binnenlandsbestuur.nl/digitaal/internationaal-strafhof-neemt-afscheid-van-microsoft-365
423•vincvinc•4h ago•129 comments

LLMs Encode How Difficult Problems Are

https://arxiv.org/abs/2510.18147
59•stansApprentice•3h ago•5 comments

Open Source Implementation of Apple's Private Compute Cloud

https://github.com/openpcc/openpcc
317•adam_gyroscope•1d ago•60 comments

What if hard work felt easier?

https://jeanhsu.substack.com/p/what-if-hard-work-felt-easier
49•kiyanwang•1w ago•27 comments

You Should Write An Agent

https://fly.io/blog/everyone-write-an-agent/
8•tabletcorry•54m ago•0 comments

The Parallel Search API

https://parallel.ai/blog/introducing-parallel-search
56•lukaslevert•4h ago•29 comments

C++: A prvalue is not a temporary

https://blog.knatten.org/2025/10/31/a-prvalue-is-not-a-temporary/
15•ingve•6d ago•6 comments

I analyzed the lineups at the most popular nightclubs

https://dev.karltryggvason.com/how-i-analyzed-the-lineups-at-the-worlds-most-popular-nightclubs/
125•kalli•7h ago•62 comments

FBI tries to unmask owner of archive.is

https://www.heise.de/en/news/Archive-today-FBI-Demands-Data-from-Provider-Tucows-11066346.html
536•Projectiboga•5h ago•294 comments

Show HN: TabPFN-2.5 – SOTA foundation model for tabular data

https://priorlabs.ai/technical-reports/tabpfn-2-5-model-report
47•onasta•3h ago•11 comments

Eating stinging nettles

https://rachel.blog/2018/04/29/eating-stinging-nettles/
141•rzk•9h ago•146 comments

Black Hole Flare Is Biggest and Most Distant Seen

https://www.caltech.edu/about/news/black-hole-flare-is-biggest-and-most-distant-seen
12•gmays•2h ago•3 comments

Show HN: Dynamic code and feedback walkthroughs with your coding Agent in VSCode

https://www.intraview.ai/hn-demo
10•cyrusradfar•4h ago•0 comments

Springs and Bounces in Native CSS

https://www.joshwcomeau.com/animation/linear-timing-function/
50•Bogdanp•1w ago•5 comments

Mathematical exploration and discovery at scale

https://terrytao.wordpress.com/2025/11/05/mathematical-exploration-and-discovery-at-scale/
208•nabla9•12h ago•95 comments

UK outperforms US in creating unicorns from early stage VC investment

https://www.cityam.com/uk-outperforms-us-in-creating-unicorns-from-early-stage-vc-investment/
31•mmarian•1h ago•21 comments

Benchmarking the Most Reliable Document Parsing API

https://www.tensorlake.ai/blog/benchmarks
21•calavera•3h ago•14 comments

Auraphone: A simple app to collect people's info at events

https://andrewarrow.dev/2025/11/simple-app-collect-peoples-info-at-events/
13•fcpguru•6h ago•7 comments

Show HN: See chords as flags – Visual harmony of top composers on musescore

https://rawl.rocks/
99•vitaly-pavlenko•1d ago•27 comments

Supply chain attacks are exploiting our assumptions

https://blog.trailofbits.com/2025/09/24/supply-chain-attacks-are-exploiting-our-assumptions/
36•crescit_eundo•5h ago•27 comments

Show HN: qqqa – A fast, stateless LLM-powered assistant for your shell

https://github.com/matisojka/qqqa
106•iagooar•10h ago•75 comments

Please stop asking me to provide feedback #8036

https://github.com/anthropics/claude-code/issues/8036
10•jmward01•3h ago•0 comments

IKEA launches new smart home range with 21 Matter-compatible products

https://www.ikea.com/global/en/newsroom/retail/the-new-smart-home-from-ikea-matter-compatible-251...
254•lemoine0461•8h ago•186 comments

I may have found a way to spot U.S. at-sea strikes before they're announced

https://old.reddit.com/r/OSINT/comments/1opjjyv/i_may_have_found_a_way_to_spot_us_atsea_strikes/
243•hentrep•16h ago•343 comments