Scaling Latent Reasoning via Looped Language Models

84•remexre•1mo ago

Comments

kelseyfrog•1mo ago

If you squint your eyes it's a fixed iteration ODE solver. I'd love to see a generalization on this and the Universal Transformer metioned re-envisioned as flow-matching/optimal transport models.

kevmo314•1mo ago

How would flow matching work? In language we have inputs and outputs but it's not clear what the intermediate points are since it's a discrete space.

Etheryte•1mo ago

One of the core ideas behind LLMs is that language is not a discrete space, but instead a multidimensional vector field where you can easily interpolate as needed. It's one of the reasons LLMs readily make up words that don't exist when translating text for example.

kevmo314•1mo ago

Not the input and output though, which is the important part for flow matching modeling. Unless you're proposing flow matching over the latent space?

cfcf14•1mo ago

This makes me think it would be nice to see some kinda child of modern transformer architecture and neural ODEs. There was such interesting work a few years ago on how neural ode/pdes could be seen as a sort of continuous limit of layer depth. Maybe models could learn cool stuff if the embeddings were somehow dynamical model solutions or something.

the8472•1mo ago

Does the training process ensure that all the intermediate steps remain interepretable, even on larger models? Not that we end up with some alien gibberish in all but the final step.

oofbey•1mo ago

Training doesn’t encourage the intermediate steps to be interpretable. But they are still in the same token vocabulary space, so you could decode them. But they’ll probably be wrong.

the8472•1mo ago

token vocabulary space is a hull around human communication (emoji, mathematical symbols, unicode scripts, ...), inside that there's lots of unused representation space that an AI could use to represent internal state. So this seems to be bad idea from an safety/oversight perspective.

https://openai.com/index/chain-of-thought-monitoring/

oofbey•1mo ago

What is a bad idea? Allowing reasoning to happen in continuous space instead of discrete token space? This paper can be seen as a variant of the Coconut models (continuous chain of thought). Continuous reasoning is certainly more efficient when it works. Lack of interpret ability makes certain safety systems harder to enforce. Is that your point?

the8472•1mo ago

Yes. Coconut has the same issue. See also: a joint statement by researchers from several labs about CoT monitorability: https://arxiv.org/abs/2507.11473

oofbey•1mo ago

Interesting. Thanks for the reference!

It's hard to know which way this will go. Discrete/text reasoning has many advantages. Safety as you note. Interpretability, which is closely related. Interoperability - e.g. the fact that you can switch models mid-discussion in Cursor and the new model understands the previous model's CoT just fine, or the ability to use reasoning traces from a larger model to train a smaller model to reason.

Continuous latent reasoning is a big hassle, but wins on efficiency, and in some situations I'm sure people will decide that benefit is worth the hassle. Because efficiency is fighting physics, which is hard to argue with on small devices with batteries. So my guess is that we'll see some of each approach in the future - with most cloud stuff being discrete, and a few highly-tuned edge applications being continuous.

Safety is a multi-faceted problem. I think it's easy to over-index on it because the impacts can be so huge. But there are so many different ways to approach the problem, and we must not rely on any one of them. It's like cyber-security - you need to use defense in depth. And sometimes it makes sense to sacrifice one kind of protection in order to get some convenience. e.g. if you decide to use continuous reasoning, that probably means you need to write a custom classifier to detect mal-intent rather than relying on an off-the-shelf LLM to analyze the reasoning trace. So I wouldn't ever take a position like "nobody should ever use continuous reasoning because it's too dangerous" - it just means that kind of safety protection needs to be applied differently.

lukebechtel•1mo ago

so it's:

output = layers(layers(layers(layers(input))))

instead of the classical:

output = layer4(layer3(layer2(layer1(input))))

oofbey•1mo ago

Yeah if layers() is a shortcut for layer4(layer3(layer2(layer1(input)))). But sometimes it’s only

output = layers(input)

output = layers(layers(input))

Depends on how difficult the token is.

remexre•1mo ago

Or more like,

    x = tokenize(input)
    i = 0
    do {
      finish, x = layers(x)
    } while(!finish && i++ < t_max);
    output = lm_head(x)

oofbey•1mo ago

That’s closer still. But even closer would be:

    x = tokenize(input)
    i = 0
    finish = 0
    do {
      p, x = layers(x)
      finish += p
    } while(finish < 0.95 && i++ < t_max);
    output = lm_head(x)

Except the accumulation of the stop probabilities isn’t linear like that - it’s more like a weighted coin model.

Do you have a mathematically attractive face?

Code only says what it does

The success of 'natural language programming'

The Scriptovision Super Micro Script video titler is almost a home computer

Discovering the "original" iPhone from 1995 [video]

Psychometric Comparability of LLM-Based Digital Twins

SidePop – track revenue, costs, and overall business health in one place

The Other Markov's Inequality

The Cascading Effects of Repackaged APIs [pdf]

Lightweight and extensible compatibility layer between dataframe libraries

Haskell for all: Beyond agentic coding

Dorsey's Block cutting up to 10% of staff

Show HN: Freenet Lives – Real-Time Decentralized Apps at Scale [video]

In the AI age, 'slow and steady' doesn't win

Administration won't let student deported to Honduras return

How were the NIST ECDSA curve parameters generated? (2023)

AI, networks and Mechanical Turks (2025)

Goto Considered Awesome [video]

Show HN: I Built a Free AI LinkedIn Carousel Generator

Implementing Auto Tiling with Just 5 Tiles

Open Challange (Get all Universities involved

Apple Tried to Tamper Proof AirTag 2 Speakers – I Broke It [video]

Show HN: Isolating AI-generated code from human code | Vibe as a Code

Show HN: More beautiful and usable Hacker News

Toledo Derailment Rescue [video]

War Department Cuts Ties with Harvard University

Show HN: LocalGPT – A local-first AI assistant in Rust with persistent memory

A Bid-Based NFT Advertising Grid

AI readability score for your documentation

NASA Study: Non-Biologic Processes Don't Explain Mars Organics

Do you have a mathematically attractive face?

Code only says what it does

The success of 'natural language programming'

The Scriptovision Super Micro Script video titler is almost a home computer

Discovering the "original" iPhone from 1995 [video]

Psychometric Comparability of LLM-Based Digital Twins

SidePop – track revenue, costs, and overall business health in one place

The Other Markov's Inequality

The Cascading Effects of Repackaged APIs [pdf]

Lightweight and extensible compatibility layer between dataframe libraries

Haskell for all: Beyond agentic coding

Dorsey's Block cutting up to 10% of staff

Show HN: Freenet Lives – Real-Time Decentralized Apps at Scale [video]

In the AI age, 'slow and steady' doesn't win

Administration won't let student deported to Honduras return

How were the NIST ECDSA curve parameters generated? (2023)

AI, networks and Mechanical Turks (2025)

Goto Considered Awesome [video]

Show HN: I Built a Free AI LinkedIn Carousel Generator

Implementing Auto Tiling with Just 5 Tiles

Open Challange (Get all Universities involved

Apple Tried to Tamper Proof AirTag 2 Speakers – I Broke It [video]

Show HN: Isolating AI-generated code from human code | Vibe as a Code

Show HN: More beautiful and usable Hacker News

Toledo Derailment Rescue [video]

War Department Cuts Ties with Harvard University

Show HN: LocalGPT – A local-first AI assistant in Rust with persistent memory

A Bid-Based NFT Advertising Grid

AI readability score for your documentation

NASA Study: Non-Biologic Processes Don't Explain Mars Organics

Scaling Latent Reasoning via Looped Language Models

Comments