Deficient executive control in transformer attention

https://academic.oup.com/pnasnexus/article/5/6/pgag149/8698838

16•derbOac•1h ago

Comments

ivanvoid•1h ago

this is a nice study but i don’t think it’s actually a good argument

quotemstr•1h ago

The first thing I do when I see a paper that claims transformers fundamentally can't do X or Y is to look at the models under test:

> To evaluate generalizability, we conducted tests of GPT-5 (41), Claude Opus 4.1 (42), and Gemini 2.5 Pro (43) from 2025 September

The problem with empirical negative results on LLMs is that they can't rule out that the alleged deficiencies disappear with increased scale and the right fine-tuning. It's like saying my dog has trouble with subject-verb agreement, so meat brains are "fundamentally limited in their capacity for grammar".

I can accept that current LLMs (even the latest generation) might exhibit cognitive gaps similar to those we see in humans with deficient executive function, but I can't accept these gaps as evidence of fundamental limits of the transformer architecture. LLMs are universal function approximators. Executive function is a function. Yes, yes, it's well known that transformers have a circuit complexity limit set by layer count and whatever. The limit disappears once you allow for autoregression. Nobody cares about the limits of AI inside a single forward pass.

I have high confidence that with the right sort of training, executive function gaps in LLMs can be addressed. I'm not convinced that the problem is the architecture per se.

fc417fc802•29m ago

> they lack an explicit architecture for the executive control of attention found in humans

Deceptive terminology strikes again! The "attention" mechanism in transformers appears (to my understanding at least) to have about as much to do with human attention as the "neurons" in a multi-layer perceptron have to do with biological neurons.

That said, the core premise of building in something that mimics executive function is an intriguing one (which I assume has been explored before but it's not something I'm familiar with).
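For readers unfamiliar with the mechanism being contrasted with human attention: transformer "attention" is usually the scaled dot-product formulation, in which each query scores every key, the scores are softmax-normalized, and the output is a weighted average of the value vectors. The sketch below is a minimal single-head version in pure Python, assuming no masking, no learned projection matrices, and no multi-head split; it illustrates the general technique, not the implementation in any model mentioned in the paper.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Single-head scaled dot-product attention, no masking.

    queries: list of n query vectors (each of length d)
    keys:    list of m key vectors (each of length d)
    values:  list of m value vectors (each of length d_v)
    Returns n output vectors, each a softmax-weighted average of the values.
    """
    d = len(keys[0])
    d_v = len(values[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Blend the value vectors according to the attention weights.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d_v)]
        outputs.append(out)
    return outputs

if __name__ == "__main__":
    q = [[1.0, 0.0]]
    k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[10.0, 0.0], [0.0, 10.0]]
    print(attention(q, k, v))
```

Nothing in this computation resembles a separate process that decides what to focus on; the weighting falls directly out of vector similarity, which is roughly the gap the commenter is pointing at when comparing it to human attention.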
