Is One Layer Enough? A Single Transformer Layer Matches Full-Parameter RL Train

33•tcp_handshaker•2h ago

Comments

usernametaken29•1h ago

If you think about it for a while, you’ll realise transformers are autoencoders on steroids. A small input space is expanded onto a big manifold and contracted again. Now, suppose you want to impose a function to regulate the output of an autoencoder. It’s actually pretty obvious that you need exactly one layer to do so… f(manifold).

earthnail•1h ago

It took me a moment to understand what you mean by "autoencoders on steroids", but I believe you mean they are autoencoders with an inverse bottleneck: an intermediate representation that isn't smaller than the input space, but much larger. Is my understanding of your comment correct?
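
To make the "inverse bottleneck" reading concrete, here is a minimal sketch, assuming it refers to the usual transformer feed-forward block that widens each token vector and then projects it back. The 4x width, the ReLU, and every name below are illustrative assumptions, not anything stated in the thread or the paper.

```python
import random


def linear(x, w):
    """Multiply vector x (length n_in) by matrix w (n_in rows, n_out columns)."""
    return [sum(xi * row[j] for xi, row in zip(x, w)) for j in range(len(w[0]))]


def relu(v):
    """Elementwise max(0, a)."""
    return [max(0.0, a) for a in v]


def init(n_in, n_out, rng):
    """Small random weight matrix with n_in rows and n_out columns."""
    return [[rng.gauss(0.0, 0.1) for _ in range(n_out)] for _ in range(n_in)]


def feed_forward(x, w_up, w_down):
    """Expand to the wide hidden space, then contract back to the input width."""
    hidden = relu(linear(x, w_up))   # expand: d_model -> d_hidden
    out = linear(hidden, w_down)     # contract: d_hidden -> d_model
    return hidden, out


rng = random.Random(0)
d_model, d_hidden = 8, 32            # assumed 4x expansion, purely illustrative
w_up = init(d_model, d_hidden, rng)
w_down = init(d_hidden, d_model, rng)
x = [rng.gauss(0.0, 1.0) for _ in range(d_model)]

hidden, y = feed_forward(x, w_up, w_down)
print(len(x), len(hidden), len(y))   # 8 32 8
```

The intermediate vector is four times wider than the input, which is the opposite of the narrow code layer in a textbook autoencoder.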

usernametaken29•57m ago

Kind of. Autoencoders don’t need to have an embedding that’s smaller than the input. Their only requirement is that they compress information and thus create reconstruction loss. Typically, however, they are not trained this way because they don’t converge. Transformers do the same thing, but they can squeeze many more bits of information through one pass because of the way they are designed. This holds true even for decoder-only networks, because they’re still doing the same thing.

soraki_soladead•1h ago

I might be misunderstanding your point, but this conflates the distinguishing features of each. You mention expansion, but autoencoders canonically compress their inputs. Autoencoders have an explicit encoder and decoder. Most transformers we interact with these days (LLMs) are decoder-only. The manifold isn't typically something the model is applied to directly. We apply the function/model to the latent representations. Those are what live on the manifold.

usernametaken29•36m ago

Now that’s interesting. What exactly distinguishes latent representations from the manifold? IMHO, those are the same, and you’re constructing a piecewise function of the manifold itself. Decoders also produce manifolds in much the same way, with the distinction being that the encoder isn’t learned but static after initialisation. So fundamentally it is still DOING the same operation.

soraki_soladead•20m ago

The latent representations of the data are like points on a surface. That surface is the manifold. We don't typically have the full manifold and can only sample points from it by embedding data into it.

Worth noting a different manifold "exists" after each transformation (e.g. layer). You only sample from the same manifold when you apply the same transformation(s).

getnormality•25m ago

What you're suggesting seems to go implausibly far beyond what the paper says.

RL post-training alters the parameters of the transformer, while your f(manifold) idea seems to suggest that a new layer on top would suffice, no need to alter the transformer itself at all.

It would be extremely handy if that were so, but I'm guessing it isn't, or it would be the prevailing approach.
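
The two readings can be told apart by which parameters receive updates. Below is a rough sketch, assuming the title means updating one existing layer while freezing the rest. The toy parameter table, the names, and the sizes are all hypothetical and are not taken from the paper.

```python
def trainable_single_layer(params, layer_index):
    """Reading A: keep the architecture and unfreeze one existing layer."""
    prefix = f"layers.{layer_index}."
    return {name for name in params if name.startswith(prefix)}


def trainable_new_head(params, head_name="extra_head.weight", head_size=16):
    """Reading B: freeze every original parameter and train a new one on top."""
    extended = dict(params)
    extended[head_name] = head_size  # hypothetical new parameter block
    return extended, {head_name}


# Toy model: parameter name -> number of scalars (made-up sizes).
params = {f"layers.{i}.{part}": 16 for i in range(4) for part in ("attn", "mlp")}
params["embed"] = 32

single = trainable_single_layer(params, 2)
extended, head_only = trainable_new_head(params)

print(sorted(single))                # ['layers.2.attn', 'layers.2.mlp']
print(single <= set(params))         # True: reading A changes existing weights
print(head_only.isdisjoint(params))  # True: reading B leaves the original untouched
```

Under reading A the transformer's own weights change, just fewer of them. Under reading B none of them do, which is the stronger claim being questioned here.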

