How does misalignment scale with model intelligence and task complexity?

https://alignment.anthropic.com/2026/hot-mess-of-ai/
75•salkahfi•2h ago

Comments

CuriouslyC•1h ago
This is a good line: "It found that smarter entities are subjectively judged to behave less coherently"

I think this is twofold:

1. Advanced intelligence requires the ability to traverse between domain valleys in the cognitive manifold. Whether via temperature or some fancy tunneling technique, that traversal is going to be higher-error (less coherent) in the valleys of the manifold than naive gradient-following to a local minimum (see the toy sketch after this list).

2. It's hard to "punch up" when evaluating intelligence. When someone is a certain amount smarter than you, distinguishing their plausible bullshit from their deep insights is really, really hard.
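
A toy illustration of the temperature idea (a minimal simulated-annealing sketch on a 1-D double well; the "cognitive manifold" framing is the commenter's metaphor, and the function here is invented):

```python
import math, random

random.seed(0)

def f(x):
    # Double well: two "domain valleys" separated by a ridge at x = 0
    return (x * x - 1) ** 2

x, temp = -1.0, 2.0  # start deep in the left valley, high temperature
for _ in range(5000):
    cand = x + random.gauss(0, 0.1)
    # Metropolis rule: always accept downhill moves; accept uphill moves
    # with probability exp(-delta / temp). High temperature is what lets
    # the walker cross the ridge between valleys, at the cost of noisier
    # (less coherent) trajectories while it does.
    if f(cand) < f(x) or random.random() < math.exp((f(x) - f(cand)) / temp):
        x = cand
    temp = max(0.01, temp * 0.999)  # cool down

print(f"settled near x = {x:+.2f}")  # may end in either valley
```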

xanderlewis•1h ago
What do 'domain valleys' and 'tunneling' mean in this context?
esyir•1h ago
Not the OP, but my interpretation here is that if you model replies as points in a vector space, assuming points from a given domain cluster close to each other, a reply that spans two domains needs to "tunnel" between those two clusters.
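
A toy sketch of that picture (the 2-D "embeddings" and domain labels are invented for illustration): a reply spanning two domains sits in the sparse gap between clusters, far from both centroids.

```python
import numpy as np

# Invented 2-D "embeddings" for replies in two domains
physics = np.array([[0.9, 0.1], [0.8, 0.2], [0.95, 0.05]])
poetry = np.array([[0.1, 0.9], [0.2, 0.8], [0.05, 0.95]])

def dist_to_centroid(reply, cluster):
    # Distance from a reply's embedding to a domain's centroid
    return np.linalg.norm(reply - cluster.mean(axis=0))

cross_domain = np.array([0.5, 0.5])  # a reply spanning both domains

# The cross-domain reply lands in the low-density gap between clusters,
# far from both centroids: the "valley" it has to tunnel through.
print(dist_to_centroid(cross_domain, physics))  # ~0.54
print(dist_to_centroid(cross_domain, poetry))   # ~0.54
```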
energy123•1h ago
Incoherence is not error.

You can have vanishingly small error and incoherence at its max: error is roughly bias, incoherence roughly variance.

Small error would be evidence of perfect alignment (zero bias); only low incoherence on top of that would indicate very low variance.
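
A minimal numpy sketch of that bias/variance reading (the mapping of error to bias and incoherence to variance is this comment's framing, and the numbers are invented):

```python
import numpy as np

rng = np.random.default_rng(0)
target = 1.0  # the goal the agent is "aligned" to

# Agent A: biased but steady (coherent, consistently wrong)
a = rng.normal(loc=1.3, scale=0.01, size=10_000)
# Agent B: unbiased but erratic (aligned on average, incoherent)
b = rng.normal(loc=1.0, scale=2.0, size=10_000)

for name, x in [("coherent but biased", a), ("aligned but incoherent", b)]:
    print(f"{name}: bias={x.mean() - target:+.3f}, variance={x.var():.3f}")
```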

p-e-w•26m ago
> When someone is a certain amount smarter than you, distinguishing their plausible bullshit from their deep insights is really, really hard.

Insights are “deep” not on their own merit, but because they reveal something profound about reality. Such a revelation is either testable or not. If it’s testable, distinguishing it from bullshit is relatively easy, and if it’s not testable even in principle, a good heuristic is to put it in the bullshit category by default.

skydhash•18m ago
The issue is the revelation: it's always individual at some level. And don't forget our senses are crude. The best approach is to store "insights" as information until we collect enough data to test them (hopefully without a lot of bias). But that can be more than a lifetime's work, so sometimes you have to take some insights at face value based on heuristics (parents, teachers, elders, authority, ...).
cyanydeez•1h ago
Oh, the irony of thinking this refers to the investors and shell companies.
gopalv•1h ago
> Making models larger improves overall accuracy but doesn't reliably reduce incoherence on hard problems.

Coherence requires two opposing forces to hold it in one dimension, and at least three in higher-dimensional quality spaces.

My team wrote up a paper titled "If You Want Coherence, Orchestrate a Team of Rivals"[1] because we kept finding that raising the reasoning threshold resulted in less coherence: more experimentation before hitting a dead end and turning around.

So we got better results using Haiku (failing over to Sonnet) rather than Opus, and using a higher-reasoning model to decompose tasks rather than perform each of them.

Once a plan is made, the cheaper models do better because they don't second-guess their approaches: they fail or they succeed. They are not as tenacious as the higher-cost models.

We can escalate to a higher authority and get out of that mess faster if we fail hard and early.

Knowing exactly how a failure happened seems to be less useful to the higher-reasoning model than to the action-biased models.

Splitting the tactical and strategic sides of the problem seems to work, much as generals don't carry rifles in a war (a rough sketch of this split follows below).

[1] - https://arxiv.org/abs/2601.14351
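
What that split might look like in the abstract (a sketch under stated assumptions: the model names, the call_model interface, and the escalation rule are illustrative, not the paper's actual harness):

```python
# Hypothetical planner/executor split. Model names, the call_model
# interface, and the escalation policy are illustrative only.
def run(task, call_model):
    # High-reasoning model decomposes the task but never executes.
    steps = call_model("opus", f"Decompose into steps: {task}")
    results = []
    for step in steps:
        # Cheap, action-biased model first; escalate once on failure.
        for model in ("haiku", "sonnet"):
            out = call_model(model, step)
            if out is not None:  # it fails or it succeeds, no grinding
                results.append(out)
                break
        else:
            # Fail hard and early: hand the mess back up rather than
            # letting a tenacious high-cost model chew on it.
            raise RuntimeError(f"escalate to planner: {step!r} failed")
    return results
```

The design choice mirrors the comment: the planner never executes, the executors never plan, and failure surfaces immediately instead of being absorbed by retries.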

tsunamifury•1h ago
I don't know why it seems so hard for these guys to understand: you scorecard every step of a new strategy by how much it closes the distance to the goal, and if you have multiple generated forward options with no good weight, you spawn new agents down multiple paths. Then you score all the terminal branches and prune. (A sketch of this score-and-prune loop follows below.)

LLMs aren’t constrained to linear logic like your average human.
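
A minimal sketch of that score-and-prune loop (beam-search style; the expand and score functions are placeholders for agent rollouts and a distance-to-goal scorecard):

```python
import heapq

def expand(state):
    # Placeholder for spawning agents on multiple forward options
    return [state + [option] for option in range(3)]

def score(state):
    # Placeholder scorecard: how much this branch closes
    # the distance to the goal (here, a toy objective)
    return sum(state) - len(state)

def search(root, depth=3, beam=4):
    frontier = [root]
    for _ in range(depth):
        children = [c for s in frontier for c in expand(s)]
        # Score all branches, keep the best `beam`, prune the rest
        frontier = heapq.nlargest(beam, children, key=score)
    return max(frontier, key=score)

print(search([]))  # best terminal branch, here [2, 2, 2]
```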

throwpoaster•1h ago
Yudkowsky btfo.
IgorPartola•1h ago
For some reason the article reads to me like “AI is not evil, it just has accidents when it loses coherence.” Sounds a lot like liability shifting.
smy20011•1h ago
I think it's not that the AI is working on "misaligned" goals; the user never specifies the goal clearly enough for the AI system to work from.

However, I think producing a detailed enough specification requires the same or an even larger amount of work than writing the code. We write rough specifications and clarify them during the process of coding. There is a minimum of effort required to produce these specifications, and AI will not help you speed that up.

crabmusket•1h ago
That makes me wonder about the "higher and higher-level language" escalator. When you're writing in assembly, is it more work to write the code than the spec? And is the reverse true if you can code your system up in Ruby? If so, does that imply anything about the "spec-driven" workflow people are using with AIs? Are we right on the cusp, where writing natural-language specs and writing high-level code are comparably productive?
charcircuit•54m ago
If you are on the same wavelength as someone, you don't need to produce a full spec. You can trust that the other person has the same vision as you and will pick reasonable ways to implement things. This is one reason why personalized AI agents are important.
skydhash•26m ago
Programming languages can be a thinking tool for a lot of tasks, much like other notations such as sheet music and map drawing. A condensed, somewhat formal way of describing ideas increases communication speed. It may lack nuance, but in some cases nuance is harmful.

The nice thing about code compared to other notation is that it's useful on its own. You describe an algorithm and the machine can then solve the problem ad infinitum. It's one step instead of two: writing a spec and having an LLM translate it, then having to verify the output and alter it.

Assembly and high-level languages are equivalent in terms of semantics. The latter help in managing complexity by reducing harmful possibilities (manual memory management, off-by-one errors) and providing common patterns (iterators/collections, structs and other data structures, ...) so that whole categories of problems are easily solved. There's no higher model of computation unlocked, just a faster level of productivity from following proven patterns (a small illustration follows below).
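
A small illustration of that point, with Python standing in for the high-level side (the snippet is mine, not the commenter's):

```python
data = [3, 1, 4, 1, 5]

# Index-managed loop: the bounds are the programmer's problem, and
# range(len(data) - 1) would silently drop the last element.
total = 0
for i in range(len(data)):
    total += data[i]

# Iterator form: the off-by-one category of bug cannot be expressed.
assert total == sum(data)
```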

Spec-driven workflows are a mirage, because even the best specs leave a lot of details unspecified. Those details are crucial, as most of programming is making the computer not do the various things it can do.

crabmusket•14m ago
> most of programming is making the computer not do the various things it can do

This is a very stimulating way of putting it!

jmtulloss•1h ago
The comments so far seem focused on taking cheap shots, but as somebody working on using AI to help people with hard, long-term tasks, I find it a valuable piece of writing.

- It's short and to the point

- It's actionable in the short term (make sure the tasks per session aren't too difficult) and useful for researchers in the long term

- It's informative about how these models work, written by some of the best in the business

- It gives us a specific vector to look at, clearly defined ("coherence", or, more fun, "hot mess")

kernc•9m ago
Other actionable insights are:

- Fold amendments back into the initial prompt.

- Evaluate a prompt multiple times and ensemble the results (a minimal sketch follows).
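
A minimal sketch of the ensemble point, assuming a hypothetical call_model function that returns a short answer string; disagreement across runs doubles as an incoherence signal:

```python
from collections import Counter

def ensemble(call_model, prompt, n=5):
    # Evaluate the same prompt n times and majority-vote the answers.
    # call_model is a hypothetical function returning a short string.
    answers = [call_model(prompt) for _ in range(n)]
    best, votes = Counter(answers).most_common(1)[0]
    return best, votes / n  # answer plus agreement rate

# e.g. ensemble(my_model, "Is 91 prime? Answer yes or no.")
# A low agreement rate is itself a warning sign of incoherence.
```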

nayroclade•1h ago
The models they tested are already way behind the current state-of-the-art. Would be interesting to see if their results hold up when repeated with the latest frontier models.


The Codex App

https://openai.com/index/introducing-the-codex-app/
527•meetpateltech•8h ago•357 comments

Anki ownership transferred to AnkiHub

https://forums.ankiweb.net/t/ankis-growing-up/68610
233•trms•5h ago•63 comments

GitHub experiences various partial outages/degradations

https://www.githubstatus.com?todayis=2026-02-02
150•bhouston•5h ago•38 comments

xAI joins SpaceX

https://www.spacex.com/updates#xai-joins-spacex
476•g-mork•4h ago•1067 comments

The Connection Machine CM-1 "Feynman" T-shirt

https://tamikothiel.com/cm/cm-tshirt.html
31•tosh•3d ago•11 comments

Julia

https://borretti.me/fiction/julia
42•ashergill•3h ago•4 comments

Ask HN: Who is hiring? (February 2026)

240•whoishiring•10h ago•301 comments

Hacking Moltbook

https://www.wiz.io/blog/exposed-moltbook-database-reveals-millions-of-api-keys
247•galnagli•10h ago•155 comments

Court orders restart of all US offshore wind power construction

https://arstechnica.com/science/2026/02/court-orders-restart-of-all-us-offshore-wind-construction/
216•ck2•3h ago•96 comments

Joedb, the Journal-Only Embedded Database

https://www.joedb.org/index.html
40•mci•3d ago•6 comments

Firefox Getting New Controls to Turn Off AI Features

https://www.macrumors.com/2026/02/02/firefox-ai-toggle/
76•stalfosknight•2h ago•29 comments

Carnegie Mellon University Computer Club FTP Server

http://128.237.157.9/pub/
12•1vuio0pswjnm7•4d ago•2 comments

4x faster network file sync with rclone (vs rsync) (2025)

https://www.jeffgeerling.com/blog/2025/4x-faster-network-file-sync-rclone-vs-rsync/
263•indigodaddy•3d ago•134 comments

Advancing AI Benchmarking with Game Arena

https://blog.google/innovation-and-ai/models-and-research/google-deepmind/kaggle-game-arena-updates/
110•salkahfi•8h ago•47 comments

Training a trillion parameter model to be funny

https://jokegen.sdan.io/blog
17•sdan•6d ago•11 comments

Nano-vLLM: How a vLLM-style inference engine works

https://neutree.ai/blog/nano-vllm-part-1
217•yz-yu•13h ago•24 comments

The largest number representable in 64 bits

https://tromp.github.io/blog/2026/01/28/largest-number-revised
82•tromp•8h ago•58 comments

Todd C. Miller – Sudo maintainer for over 30 years

https://www.millert.dev/
301•wodniok•9h ago•177 comments

Zig Libc

https://ziglang.org/devlog/2026/#2026-01-31
156•ingve•9h ago•58 comments

Geologists may have solved mystery of Green River's 'uphill' route

https://phys.org/news/2026-01-geologists-mystery-green-river-uphill.html
146•defrost•13h ago•37 comments

Ask HN: Who wants to be hired? (February 2026)

100•whoishiring•10h ago•242 comments

Pretty soon, heat pumps will be able to store and distribute heat as needed

https://www.sintef.no/en/latest-news/2026/pretty-soon-heat-pumps-will-be-able-to-store-and-distri...
146•PaulHoule•1d ago•127 comments

Why software stocks are getting pummelled

https://www.economist.com/business/2026/02/01/why-software-stocks-are-getting-pummelled
145•petethomas•21h ago•204 comments

GitHub discusses giving maintainers control to disable PRs

https://github.com/orgs/community/discussions/185387
19•aofeisheng•2h ago•4 comments

Show HN: Adboost – A browser extension that adds ads to every webpage

https://github.com/surprisetalk/AdBoost
98•surprisetalk•13h ago•109 comments

IsoCoaster – Theme Park Builder

https://iso-coaster.com/
100•duck•3d ago•25 comments

UK government launches fuel forecourt price API

https://www.gov.uk/guidance/access-the-latest-fuel-prices-and-forecourt-data-via-api-or-email
90•Technolithic•13h ago•107 comments

Banning lead in gas worked. The proof is in our hair

https://attheu.utah.edu/health-medicine/banning-lead-in-gas-worked-the-proof-is-in-our-hair/
9•geox•39m ago•1 comment

Nvidia shares are down after report that its OpenAI investment stalled

https://www.cnbc.com/2026/02/02/nvidia-stock-price-openai-funding.html
109•greatgib•6h ago•47 comments