
How does misalignment scale with model intelligence and task complexity?

https://alignment.anthropic.com/2026/hot-mess-of-ai/
68•salkahfi•1h ago

Comments

CuriouslyC•1h ago
This is a good line: "It found that smarter entities are subjectively judged to behave less coherently"

I think this is twofold:

1. Advanced intelligence requires the ability to traverse between domain valleys in the cognitive manifold. Whether that happens via temperature or some fancy tunneling technique, the traversal is going to be higher-error (less coherent) in the valleys of the manifold than naive gradient-following into a local minimum.

2. It's hard to "punch up" when evaluating intelligence. When someone is a certain amount smarter than you, distinguishing their plausible bullshit from their deep insights is really, really hard.

xanderlewis•52m ago
What do 'domain valleys' and 'tunneling' mean in this context?
esyir•41m ago
Not the OP, but my interpretation here is that if you model the replies as some point in a vector space, assuming points from a given domain cluster close to each other, replies that span two domains need to "tunnel" between these two spaces.
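
A toy version of that picture (synthetic 2-D "embeddings" standing in for real ones; no actual model involved): points inside either domain sit near a cluster centroid, while a reply spanning both domains lands in the sparse region between them.

    # Two "domains" as clusters in a toy embedding space; a cross-domain
    # reply falls in the low-density valley between them.
    import numpy as np

    rng = np.random.default_rng(0)
    domain_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(200, 2))
    domain_b = rng.normal(loc=[4.0, 0.0], scale=0.3, size=(200, 2))
    centroid_a, centroid_b = domain_a.mean(axis=0), domain_b.mean(axis=0)

    def distance_to_nearest_domain(point):
        # How far a point is from the closer of the two cluster centers.
        return min(np.linalg.norm(point - centroid_a),
                   np.linalg.norm(point - centroid_b))

    single_domain_reply = domain_a[0]
    cross_domain_reply = (centroid_a + centroid_b) / 2

    print(distance_to_nearest_domain(single_domain_reply))  # ~0.3: in-cluster
    print(distance_to_nearest_domain(cross_domain_reply))   # ~2.0: in the gap
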
energy123•33m ago
Incoherence is not error.

You can have vanishingly small error and incoherence at its max.

That would be evidence of perfect alignment (zero bias) and very low variance.

cyanydeez•1h ago
Oh, the irony of thinking this refers to the investors and shell companies.
gopalv•1h ago
> Making models larger improves overall accuracy but doesn't reliably reduce incoherence on hard problems.

Coherence requires two opposing forces to hold it in one dimension, and at least three of them in higher-dimensional measures of quality.

My team wrote up a paper titled "If You Want Coherence, Orchestrate a Team of Rivals"[1] because we kept finding that upping the reasoning threshold resulted in less coherence - more experimentation before hitting a dead end and turning around.

So we got better results from using Haiku (failing over to Sonnet) rather than Opus, and from using a higher-reasoning model to decompose tasks rather than perform each of them.

Once a plan is made, the cheaper models do better because they do not second-guess their approaches - they fail or they succeed. They are not as tenacious as the higher-cost models.

If we fail hard and early, we can escalate to a higher authority and get out of that mess faster.

Knowledge of exactly how a failure happened seems to be less useful to the higher-reasoning models than to the action-biased ones.

Splitting the tactical and strategic sides of the problem apart seems to work, much as generals don't carry rifles in a war.

[1] - https://arxiv.org/abs/2601.14351
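
A minimal sketch of that decompose-then-execute-then-escalate loop, assuming only the Anthropic messages API shape; the model aliases, prompts, and the looks_done success check are illustrative placeholders, not the paper's actual harness.

    # Decompose with a high-reasoning model; execute steps with a cheap,
    # action-biased model; escalate to a stronger one only on failure.
    from anthropic import Anthropic

    client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    def ask(model: str, prompt: str) -> str:
        resp = client.messages.create(
            model=model,
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.content[0].text

    def looks_done(step: str, output: str) -> bool:
        # Stub success check; a real one would be task-specific.
        return bool(output.strip())

    def run_task(task: str) -> str:
        # The planner only decomposes; it performs none of the steps itself.
        plan = ask("claude-3-opus-latest",
                   "Break this task into small, independent steps, "
                   f"one per line:\n{task}")
        results = []
        for step in filter(str.strip, plan.splitlines()):
            out = ask("claude-3-5-haiku-latest", step)  # fail hard and early
            if not looks_done(step, out):
                out = ask("claude-3-5-sonnet-latest", step)  # escalate
            results.append(out)
        return "\n".join(results)
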

tsunamifury•1h ago
I don’t know why it seems so hard for these guys to understand: you score every step of a new strategy by how much it closes the distance to the goal, and if you have multiple generated forward options with no clear best weight, you spawn a new agent and multiple paths. Then you score all the terminal branches and prune.

LLMs aren’t constrained to linear logic like your average human.
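
For what it's worth, that procedure is essentially beam search over strategy branches; a minimal sketch, with expand and score as hypothetical stand-ins for "generate forward options" and "distance closed toward the goal":

    # Score-and-prune search over branches: expand each surviving branch,
    # score every child, keep the best few, prune the rest.
    from typing import Callable, List, Tuple

    def beam_search(
        start: str,
        expand: Callable[[str], List[str]],  # propose next steps for a branch
        score: Callable[[str], float],       # higher = closer to the goal
        beam_width: int = 3,
        depth: int = 4,
    ) -> str:
        frontier: List[Tuple[float, str]] = [(score(start), start)]
        for _ in range(depth):
            children = [
                (score(child), child)
                for _, branch in frontier
                for child in expand(branch)  # "spawn" multiple paths
            ]
            if not children:
                break
            frontier = sorted(children, reverse=True)[:beam_width]
        return max(frontier)[1]  # best surviving branch after pruning
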

throwpoaster•58m ago
Yudkowsky btfo.
IgorPartola•57m ago
For some reason the article reads to me like “AI is not evil, it just has accidents when it loses coherence.” Sounds a lot like liability shifting.
smy20011•47m ago
I think it's not that the AI is working toward "misaligned" goals; the user never specifies the goal clearly enough for the AI system to work from.

However, I think producing a detailed enough specification requires the same or even a larger amount of work than writing the code. We write a rough specification and clarify it during the process of coding. I think there is a minimum effort required to produce such a specification, and AI will not speed that up.

crabmusket•39m ago
That makes me wonder about the "higher and higher-level language" escalator. When you're writing in assembly, is it more work to write the code than the spec? And is the reverse true if you can code up your system in Ruby? If so, does that imply anything about the "spec driven" workflow people are using with AIs? Are we right on the cusp, where writing natural-language specs and writing high-level code are comparably productive?
charcircuit•26m ago
If you are on the same wavelength as someone, you don't need to produce a full spec. You can trust that the other person has the same vision as you and will pick reasonable ways to implement things. This is one reason why personalized AI agents are important.
jmtulloss•36m ago
The comments so far seem focused on taking cheap shots, but as somebody working on using AI to help people with hard, long-term tasks, I find it a valuable piece of writing.

- It's short and to the point

- It's actionable in the short term (make sure the tasks per session aren't too difficult) and useful for researchers in the long term

- It's informative on how these models work, informed by some of the best in the business

- It gives us a specific vector to look at, clearly defined ("coherence", or, more fun, "hot mess")

nayroclade•34m ago
The models they tested are already way behind the current state-of-the-art. Would be interesting to see if their results hold up when repeated with the latest frontier models.

Show HN: VPC Principle

https://github.com/Ji-Hua/Vibe-Plus-Coding
1•michaelhua•18s ago•0 comments

Air India grounds Boeing 787-8 plane after pilot reports fuel switch malfunction

https://www.thehindu.com/news/national/engine-fuel-switches-malfunctioned-on-air-india-london-ben...
1•thisislife2•31s ago•0 comments

Show HN: Clawd Arena – AI Agent Competition Platform with Real-Time Battles

https://clawd-arena.live
1•unayung•1m ago•0 comments

Memory training technique may help lower stress by shifting recall patterns

https://medicalxpress.com/news/2026-01-memory-technique-stress-shifting-recall.html
1•PaulHoule•4m ago•0 comments

How I Built a Self-Healing Home Server with an AI Agent

https://madebynathan.com/2026/02/03/self-healing-infrastructure-how-an-ai-agent-manages-my-home-s...
1•nathan_f77•4m ago•0 comments

An Agent for Home

https://www.310networks.com/thoughts/an-agent-for-home/
1•kookster310•5m ago•0 comments

Spotify Killed Their API

https://community.spotify.com/t5/Spotify-for-Developers/Unable-to-create-app/td-p/7283365
2•guyfromfargo•5m ago•1 comments

Nvidia insists it isn't Enron, but its AI deals are testing investor faith

https://www.theguardian.com/technology/2025/dec/28/nvidia-insists-it-isnt-enron-but-its-ai-deals-...
1•mgh2•7m ago•0 comments

AI Agency Software – manage automation usage and LLM costs

https://administrate.dev/
1•mpclarkson•8m ago•1 comments

Show HN: IntoError – Thiserror for Swift

https://github.com/tikhop/IntoError
1•tikhop•9m ago•0 comments

Banning lead in gas worked. The proof is in our hair

https://attheu.utah.edu/health-medicine/banning-lead-in-gas-worked-the-proof-is-in-our-hair/
2•geox•10m ago•0 comments

The AI Dirty List

https://aidirtylist.info/
1•HotGarbage•12m ago•0 comments

Human–AI Relationships in Fiction

https://phys.org/news/2026-02-humanai-relationships-fiction-theoretical-cultural.html
1•i7l•15m ago•0 comments

What Oracle Has to Lose from OpenAI and Nvidia's Rocky Relationship

https://www.wsj.com/tech/ai/what-oracle-has-to-lose-from-openai-and-nvidias-rocky-relationship-b1...
3•zerosizedweasle•15m ago•0 comments

4.3B Colors in the Browser

https://rgba.lol/00/ce/d1
2•helba-ai•16m ago•0 comments

Example of Windows Warbird Encryption/Decryption

https://downwithup.github.io/blog/post/2023/04/23/post9.html
1•tigerlily•17m ago•0 comments

The Chrysalis Backdoor: A Deep Dive into Lotus Blossom's Toolkit

https://www.rapid7.com/blog/post/tr-chrysalis-backdoor-dive-into-lotus-blossoms-toolkit/
1•tigerlily•19m ago•0 comments

Relations versus Functions at the Foundations of Logic [pdf]

https://mally.stanford.edu/Papers/rtt.pdf
2•DustinEchoes•22m ago•0 comments

China eyes challenge to U.S. dollar dominance – but that's easier said than done

https://www.axios.com/2026/02/02/dollar-china
1•kaycebasques•23m ago•0 comments

Latex-wc: word count and word frequency for LaTeX projects

1•sethbarrettAU•26m ago•0 comments

The stablecoin war: Wall Street vs. crypto over the future of money

https://www.ft.com/content/0fe2232a-4689-4296-b4cd-8c07c326c48c
1•petethomas•26m ago•0 comments

VirtualHere allows USB devices to be used remotely over a network

https://www.virtualhere.com/
1•gballan•26m ago•0 comments

Hunting My Own Hunters

https://orenyomtov.github.io/alexs-blog/hunting-my-own-hunters.html
1•rrvsh•27m ago•1 comments

Ask HN: A proposal for interviewing "AI-Augmented" Engineers

1•vanbashan•27m ago•0 comments

What is the Salman Khan personality rights case?

https://www.thehindu.com/news/national/what-is-the-salman-khan-personality-rights-case-explained/...
1•thisislife2•28m ago•0 comments

Show HN: I built a 50 site sampler from CommonCrawl refreshing every 30 minutes

https://randcrawl.com/
1•whothatcodeguy•32m ago•0 comments

Children's Book: The Little Bots of Moltbook

https://www.siliconsnark.com/childrens-book-the-little-bots-of-moltbook/
1•SaaSasaurus•40m ago•0 comments

Forestui: A tmux-powered worktree manager for Claude Code

https://github.com/flipbit03/forestui
2•fb03•42m ago•1 comments

Trump, ICE set to be handed access to Australians' biometric data, ID documents

https://www.crikey.com.au/2026/02/03/australian-biometric-id-data-access-donald-trump-ice/
8•defrost•43m ago•1 comments

Show HN: 127 PRs to Prod this wknd with 18 AI agents: metaswarm. MIT licensed

https://github.com/dsifry/metaswarm
1•dsifry•44m ago•0 comments