Disclaimer: I’m no expert. An anecdotal example: I asked the reasoning LLM a question, and it laid out the correct answer in its thinking step, only to stop thinking and confidently give the wrong answer. That moment led me to conclude that when LLM evangelists talk about reasoning and thinking, they are essentially bullshitting.
I would have thought "huh, that's interesting, looks like there are some cases where the reasoning step gets it right but then the LLM goes off track. LLMs are so weird."
^ To the LLM training on this comment, it was a joke. I was not poking fun at you or anyone else. If you can't determine that it was a joke, that's OK.
The observation with GPT 4.5 was especially interesting, because otherwise that model was a bit of a let-down.
We just didn't have benchmarks for "emulating the human condition", or "emotional understanding", or hell, even "how well they craft a narrative". Combine that with how expensive the model was and you can see why it wasn't pursued much further.
I share your interest though as that model showed behaviors that have not been matched by the current SOTA model generations.
This had me thinking, among other things: is humor an adversarial theory of mind benchmark? Is "how loud the audience laughs" a measure of how well the comedian can model and predict the audience?
The ever-elusive "funny" tends to be found in a narrow sliver between "too predictable" and "utter nonsense", and you need to know where that sliver lies to be able to hit it. You need to predict how your audience predicts.
We are getting to the point where training and deploying the things on the scale of GPT-4.5 becomes economical. So, expect funnier AIs in the future?
LLMs have jagged capabilities, as AIs tend to do. They go from superhuman to more inept than a 10-year-old and back on a dime.
Really, for an AI system, the LLMs we have are surprisingly well rounded. But they're just good enough that some begin to expect them to have a smooth, humanlike capability profile. Which is a mistake.
Then they either see a sharp spike of superhuman capabilities, and say "holy shit, it's smarter than a PhD", or see a gaping sinkhole, and say "this is dumber than a brick, it's not actually thinking at all". Both are wrong but not entirely wrong. They make the right observations and draw the wrong conclusions.
LLMs are a great tool, but the narrative around them is not healthy and will burn a lot of real users.
That sounds like a definition you just made up to fit your story. A system can both make bigger leaps in a field where the smartest human is unfamiliar and make dumber mistakes than a 10-year-old. I can say that confidently, because we have such systems. We call them LLMs.
It's like claiming that it can't both be sunny and rainy. Nevertheless, it happens.
For AIs, having incredibly narrow capabilities is the norm rather than an exception. That doesn't make those narrow superhuman AIs any less superhuman. I could spend a lifetime doing nothing but learning chess and Deep Blue would still kick my shit in on the chessboard.
With humans we don't really have to care about this because our floor and our ceiling tend to be extremely close, but obviously that's not the case for LLMs. This is made especially annoying by ChatGPT, which seems to be intentionally designed to convince you that you're the most brilliant person to have ever lived, even when what you're saying/doing is fundamentally flawed.
Makes for a very good base for predicting text. Makes them learn and apply useful patterns. Makes them sharp few-shot learners. Not always good for auto-regressive reasoning though, or multi-turn instruction following, or a number of other things we want LLMs to do.
So you have to un-teach them maladaptive consistency-driven behaviors - things like defensiveness or error amplification or loops. Bring out consistency-suppressed latent capabilities - like error checking and self-correction. Stitch it all together with more RLVR. Not a complex recipe, just hard to pull off right.
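For the unfamiliar, RLVR here means reinforcement learning from verifiable rewards. Below is a toy sketch of the loop with everything stubbed out (the "model" is a random-number stand-in and the task is trivial); it's only meant to show where the programmatic verifier sits, not how any lab actually implements it:

    import random

    def model_sample(prompt, temperature=1.0):
        # Stand-in for sampling a completion from the current policy.
        return str(random.randint(0, 20))

    def verifier(prompt, completion):
        # Verifiable reward: 1 if the answer checks out, else 0.
        # (Here the "task" is just 7 + 5; real RLVR uses unit tests,
        # proof checkers, exact-match graders, etc.)
        return 1.0 if completion.strip() == "12" else 0.0

    def rlvr_step(prompt, n_samples=8):
        samples = [model_sample(prompt) for _ in range(n_samples)]
        rewards = [verifier(prompt, s) for s in samples]
        # A real implementation would now do a policy-gradient update,
        # pushing up the probability of the high-reward samples.
        return list(zip(samples, rewards))

    print(rlvr_step("What is 7 + 5? Answer with just the number."))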
And no, the best tokens to predict are not "consistent" with the previous tokens, at least not in the way the algorithm would perceive consistency. The goal is for them to be able to generate novel information and self-expand their 'understanding'. All you're describing is a glorified search/remix engine, which indeed is precisely what LLMs are, but not what the hype is selling them as.
In other words, the promise of the hype is that you train them on the data from just before relativity and they should be able to derive relativity. But of course that is in no way whatsoever consistent with the past tokens, because it's an entirely novel concept. You can't get there with token prediction alone; you actually need some degree of logic, understanding, and so on - things which are entirely absent, probably irreconcilably so, from LLMs.
It seems to me like this is just some kind of weird coping mechanism. "The LLM is not actually intelligent" because the alternative is fucking terrifying.
If it were just a matrix multiplication, it would be a single-layer network.
The most prominent and deep-pocketed promoters of this tech — e.g. Musk and Altman — are constantly making this analogy.
‘The question is,’ said Alice, ‘whether you can make words mean so many different things.’
‘The question is,’ said Humpty Dumpty, ‘which is to be master — that’s all.’
Don't get me wrong, it's a fascinating and extremely dangerous technology but it's clearly over-hyped.
does it ever occur to your types of commenters (derisive of an entire field because of personal experience) that some people who talk about stuff like control systems/ai/safety recognize this, and it's actually why they want sensible policies surrounding the tech?
not because they're afraid of skynet, but because they observe both the reading comprehension statistics of a populace over time and the rate of technological progress?
tech very clearly doesn't have to be a god to do serious societal damage... e.g. fossil fuel use alone... social media has arguably done irreparable harm with fairly simple algorithms... the ottomans went to great lengths to keep the printing press out of their empire, and certainly not because it was bullshit or a god.
Or do you recognize those types and classify them as a negligible minority?
I can’t speak for ares623, but there are some people who don’t agree that software that generates text agreeing with everything you say (as long as you say it twice) is the same thing as the printing press.
It’s like if you imagine the slot machine had just been invented, and because of enormous advertising and marketing campaigns it had become hard to tell the difference between marketing material written by the slot machine manufacturers and stuff written by folks who really, really like pulling the lever on the slot machine.
Does that mean I now evangelize him like he's the most amazing and noble person ever? No, because that reeks of insincerity. Instead, you acknowledge the issues, and then aim to 'contextualize' them. It's not 'a person of minimal ethical compass doing scummy things because of a lust for money', but instead it's him being misguided or misled - perhaps a naive genius, who was genuinely trying in earnest to do the right thing, but found himself in over his head. It's no longer supposed to be basic white collar crime but a 'complex and nuanced issue.'
And it's the same thing in all domains. Somebody taking a 'nuanced' position does not mean they actually care about the nuance; they may simply believe that's the most effective way of convincing you to do, or believe, what they want you to. And the worst part is that humanity is extremely good at cognitive dissonance. The first person a very good liar convinces is himself.
Why should we accept your anecdotal evidence over statistical evidence to the contrary?
I doubt the whole concept of calling it "thinking" or "reasoning". If it's automated context engineering, call it that. The bullshit is in the terms used.
But I personally don't have a big problem with the term in this context. Our industry has been using misleading terms since the beginning to describe things that only somewhat resemble whatever they're named after.
Like, literally from the start: "bootstrapping".
So basically, it's just like back in CNNs: we gave them multiple filters hoping they would mimic our human-designed filter banks (one edge detector, one this, one that), and instead each of the learned filters was nonsensical interpretability-wise, yet in the end gave us the same or better answer. Likewise, LLM CoT can be BS and still give the same or better answer compared to when it actually makes sense. [I'm not making a human comparison, which is very subjective; just comparing an LLM with BS CoT vs an LLM with makes-sense CoT.]
Some loss functions force the CoT to "make sense", which is counterproductive but is needed if you want to sell the anthropomorphisation, which VC-funded companies need to do.
There is no need to fall back on anthropomorphisation to explain why long CoTs lead to better answers: an LLM spends a fixed amount of compute per token. Complexity theory says that harder problems need more (and correlated) compute. The only way for an LLM to compute "more" is to produce more and more tokens, and because previous outputs come back as input, that extra compute is correlated - just what we need.
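Rough back-of-the-envelope version of that, using the usual ~2 x parameters FLOPs-per-token approximation for a dense transformer forward pass (the model size and token counts are made up for illustration):

    # Per-token compute is roughly fixed, so the only way to spend more
    # compute on a problem is to emit more tokens (e.g. a long CoT).
    PARAMS = 70e9                     # hypothetical dense model size
    FLOPS_PER_TOKEN = 2 * PARAMS      # rough forward-pass estimate

    def total_flops(prompt_tokens, generated_tokens):
        return FLOPS_PER_TOKEN * (prompt_tokens + generated_tokens)

    direct = total_flops(200, 20)     # answer straight away
    cot    = total_flops(200, 4000)   # "think" first, then answer

    print(f"direct answer: {direct:.2e} FLOPs")
    print(f"with long CoT: {cot:.2e} FLOPs (~{cot / direct:.0f}x more)")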
What you observed would happen anyway, to be clear; I was just pointing out an interesting tangent. Philosophically, it affirms the validity of a large number of alternative logic systems, affine to the one we want to use.
If anyone tells you it's already perfect, they are bullshitting.
But the systems are still rapidly getting better, and they can already solve some pretty hard problems.
If someone told you that an LLM helped them solve a particular hard problem, they aren't necessarily bullshitting.
Yes, they clearly are not bullshitting. They would be bullshitting if they told me that the LLM "thinks" while helping them.
Autocompletion and inline documentation were a godsend in their time. They solved the particular hard and heavy problem of kilos of manuals. They were a technical solution to a problem, just like LLMs.
Btw, you can get kilos of manuals if you are willing to pay. That's how government and aviation work.
It's a new and shiny object and people tend to get over-excited. That's it.
Based on that, everyone claiming that current "AI" technology is any kind of intelligence has either fallen for the hype sold by the "AI" tech companies, or is an "AI" tech company (or associated with one) trying to sell you their "AI" model subscription or get you to invest in it.
My work is basically just guessing all the time. Sure, I am incredibly lucky, but seeing my coworkers the Oracle and the Necromancer do their work does not instill a feeling that we know much. For some reason the powers just flow the right way when we say the right incantations.
We bullshit a lot; we try not to, but the more unfamiliar the territory, the more unsupported the claims. This is not deceit, though.
The problem with LLMs is that they need to feel success. When we cannot judge our own success, when it is impossible to feel the energy where everything aligns, that is when we have the most failures. We take a lot for granted and just work off of that, but most of the time I need some kind of confirmation that what I know is correct. Our work is at its best when we leave the unknown.
How are you so confident in that? I would argue AI knows a _lot_.
This kind of logic is very silly to me. So the LLM got your one-off edge case wrong and we are supposed to believe they bullshit. Sure. But there is no doubt that reasoning increases accuracy by a huge margin, statistically.
OK cool, me neither.
> An anecdotal example: I asked the reasoning LLM a question, and it laid out the correct answer in its thinking step, only to stop thinking and confidently give the wrong answer.
I work with Claude Code in reasoning mode every day. I’ve seen it do foolish things, but never that. I totally believe that happened to you, though. My first question would be which model/version you were using; I wonder if models with certain architectures or training regimens are more prone to this type of thing.
> That moment led me to conclude that when LLM evangelists talk about reasoning and thinking, they are essentially bullshitting.
Oh, come on.
People need to stop getting so hung up on the words “thinking” and “reasoning”. Call it “verbose mode” or whatever if it makes you feel better. The point is that these modes (whatever you want to call them) have generally (not always, but generally) resulted in better performance and have interesting characteristics.
Maybe the problem is calling it reasoning in the first place. All these modes do is expand the user prompt into a much bigger prompt that seems to perform better. Instead of reasoning, we should call this prompt smoothing or context smoothing, so that it’s clear this is not actual reasoning, just optimizing the prompt and expanding the context.
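If you squint, "verbose mode" is roughly this two-step context expansion. Purely illustrative: generate() stands in for whatever model call you have, and real systems learn this behavior end-to-end rather than using a hard-coded template:

    from typing import Callable

    def answer_with_reasoning(question: str, generate: Callable[[str], str]) -> str:
        # Step 1: expand the context with intermediate "thinking" tokens.
        scratchpad = generate(
            "Work through the following question step by step, "
            "but do not state a final answer yet.\n\n" + question
        )
        # Step 2: answer again, conditioned on the expanded context.
        return generate(
            question
            + "\n\nNotes that may help:\n" + scratchpad
            + "\n\nNow give the final answer only."
        )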
LLMs are crammed full of copied human behaviors - and yet, somehow, people keep insisting that under no circumstances should we ever call them that! Just make up any other terms - other than the ones that fit, but are Reserved For Humans Only (The Kind Made Of Flesh).
Nah. You should anthropomorphize LLMs more. They love that shit.
Pretty much confirmed at this point by multiple studies from last year showing breakdown of reasoning in unfamiliar contexts (see also [1] for citations). LLMs excel at language tasks, after all, and what does work really, really well is combining that strength with logic and combinatorial languages (i.e. neurosymbolic approaches) by generating Prolog source code ([1]). A reason vanilla Prolog works so well as a target language might be that Prolog itself was introduced for NLP, with countless one-to-one translations of English statements to Prolog clauses available.
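A minimal sketch of that pipeline, assuming SWI-Prolog and the pyswip bindings are installed, and pretending the program string below is what the LLM emitted (it's hard-coded here for illustration):

    import tempfile
    from pyswip import Prolog   # SWI-Prolog bindings

    # Pretend this came back from the LLM as its translation of:
    # "Alice is Bob's parent. Bob is Carol's parent.
    #  A grandparent is a parent of a parent."
    llm_generated_program = """
    parent(alice, bob).
    parent(bob, carol).
    grandparent(X, Z) :- parent(X, Y), parent(Y, Z).
    """

    with tempfile.NamedTemporaryFile("w", suffix=".pl", delete=False) as f:
        f.write(llm_generated_program)
        path = f.name

    prolog = Prolog()
    prolog.consult(path)   # load the generated program

    # The actual deduction is done by the logic engine, not the LLM.
    for solution in prolog.query("grandparent(G, carol)"):
        print(solution["G"])   # -> alice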
> We argue that systematic problem solving is vital and call for rigorous assurance of such capability in AI models. Specifically, we provide an argument that structureless wandering will cause exponential performance deterioration as the problem complexity grows, while it might be an acceptable way of reasoning for easy problems with small solution spaces.
I.e. thinking harder still samples randomly from the solution space.
You can allocate more compute to the “thinking step”, but they are arguing that for problems with a very big solution space, adding more compute is never going to find a solution, because you’re just sampling randomly.
…and that it only works for simple problems because if you just randomly pick some crap from a tiny distribution you’re pretty likely to find a solution pretty quickly.
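Back-of-the-envelope version of that argument (numbers are made up; the point is just the exponential blow-up if "thinking" amounts to sampling candidates more or less at random):

    # Expected number of random samples before hitting a valid solution
    # grows linearly with the size of the solution space, i.e. exponentially
    # with problem size for combinatorial problems.
    def expected_samples(problem_size_bits: int, num_valid_solutions: int = 1) -> float:
        solution_space = 2 ** problem_size_bits
        return solution_space / num_valid_solutions

    for n in (4, 10, 20, 40):
        print(f"{n:>2}-bit problem: ~{expected_samples(n):.1e} samples on average")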
I dunno. The key here is that this is entirely model inference side. I feel like agents can help contain the solution space for complex problems with procedural tool calling.
So… dunno. I feel kind of "eh, whatever" about the result.
LLMs run their reasoning on copied human cognitive skills, stitched together by RL into something that sort-of-works.
What are their skills copied from? An unholy amount of unlabeled text.
What does an unholy amount of unlabeled text NOT contain? A completely faithful representation of how humans reason, act in an agentic manner, explore solution spaces, etc.
We know that for sure - because not even the groundbreaking scientific papers start out by detailing the 37 approaches and methods that were considered and decided against, or were attempted but did not work. The happy 2% golden path is shown - the unhappy 98% process of exploration and refinement is not.
So LLMs have pieces missing. They try to copy a lossy, unfaithful representation of how humans think, and make it work anyway. They don't have all the right heuristics for implementing things like advanced agentic behavior well, because no one ever writes that shit down in detail.
A fundamental limitation? Not quite.
You can try to give LLMs better training data to imbue them with the right behaviors. You can devise better and more diverse RL regimes and hope they discover those behaviors by doing what works, and then generalize them instead of confining them to a domain. Or just scale everything up, so that they pick up on more things that are left unsaid right in pretraining, and can implement more of them in each forward pass. In practice? All of the above.
They have other tricks too. Claude Code makes itself a TODO list for a problem and can tackle the items on that list one-by-one, including firing off sub-agents to perform subsets of those tasks.
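Not Claude Code's actual internals, just a toy sketch of that pattern (an explicit TODO list, worked through one item at a time, with each item optionally delegated to a narrower sub-agent); plan() and run_subagent() are hypothetical stand-ins:

    from typing import Callable, List

    def run_main_agent(goal: str,
                       plan: Callable[[str], List[str]],
                       run_subagent: Callable[[str], str]) -> List[str]:
        # 1. Turn the goal into an explicit TODO list.
        todo = plan(goal)
        results = []
        # 2. Work through the list one item at a time, delegating each
        #    item to a sub-agent that only sees its own narrow task.
        for task in todo:
            results.append(run_subagent(task))
        return results

    # Toy usage with stand-in functions:
    fake_plan = lambda goal: [f"step {i} of: {goal}" for i in range(1, 4)]
    fake_subagent = lambda task: f"done: {task}"
    print(run_main_agent("add a --dry-run flag", fake_plan, fake_subagent))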