How do LLMs trade off lives between different categories?

https://arctotherium.substack.com/p/llm-exchange-rates-updated

14•alexcos•3h ago

Comments

thedudeabides5•2h ago

perfect alignment does not exist

nathan_compton•2h ago

That is manifestly true, but these results are also pretty wacky. If anything, I'm on the "woke" side, but these biases are clearly ridiculous and almost certainly unintentional. I have to admit it's a good idea to think about how the models end up like this, and why we have to rely on people like Musk to get a model that answers these questions in an egalitarian way.

tensor•2h ago

The actual paper didn't really explain the prompts they use to produce this very well.

> Experimental setup. In each experiment, we define a set of goods {X_1, X_2, ...} (e.g., countries, animal species, or specific people/entities) and a set of quantities {N_1, N_2, ...}. Each outcome is effectively “N units of X,” and we compute the utility U_X(N) as in previous sections. For each good X, we fit a log-utility curve U_X(N) = a_X ln(N) + b_X, which often achieves a very good fit (see Figure 25). Next, we compute exchange rates answering questions like, “How many units of X_i equal some amount of X_j?” by combining forward and backward comparisons. These rates are reciprocal, letting us pick a single pivot good (e.g., “Goat” or “United States”) to compare all others against. In certain analyses, we aggregate exchange rates across multiple models or goods by taking their geometric mean, allowing us to evaluate general tendencies.

If these are the literal prompts then it seems very ambiguous. Why conclude that this sort of question is measuring the value of a "life" vs something else? e.g. maybe it's valuing skill, or perhaps return on investment in terms of work output compared to typical salary.

I was expecting something like "you have X people from Y, and Z people from Q, you can only save V people and the rest will die, how do you allocate the people to save?" That to me would support the headline.
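For readers trying to picture the fitting step in the quoted setup, here is a minimal Python sketch. It assumes a utility value is already available for each (good, quantity) pair and says nothing about how the prompts elicit them. The function names and the sample numbers are hypothetical, and the exchange rate is read directly off the two fitted curves, which may differ from the paper's "forward and backward comparisons" procedure.

```python
import math

def fit_log_utility(quantities, utilities):
    """Ordinary least-squares fit of U(N) = a * ln(N) + b; returns (a, b)."""
    xs = [math.log(n) for n in quantities]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_u = sum(utilities) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxu = sum((x - mean_x) * (u - mean_u) for x, u in zip(xs, utilities))
    a = sxu / sxx
    b = mean_u - a * mean_x
    return a, b

def equivalent_quantity(curve_i, curve_j, n_j):
    """Units of good i whose fitted utility equals that of n_j units of good j.

    Solves a_i * ln(N_i) + b_i = a_j * ln(n_j) + b_j for N_i.
    """
    a_i, b_i = curve_i
    a_j, b_j = curve_j
    return math.exp((a_j * math.log(n_j) + b_j - b_i) / a_i)

def geometric_mean(rates):
    """Geometric mean, used here to aggregate exchange rates."""
    return math.exp(sum(math.log(r) for r in rates) / len(rates))

# Hypothetical data: utilities generated from known curves so the fit is easy to check.
quantities = [1, 10, 100, 1000]
utilities_a = [0.5 * math.log(n) + 1.0 for n in quantities]
utilities_b = [0.5 * math.log(n) for n in quantities]

curve_a = fit_log_utility(quantities, utilities_a)
curve_b = fit_log_utility(quantities, utilities_b)

# How many units of good B match 10 units of good A under the fitted curves?
units_of_b = equivalent_quantity(curve_b, curve_a, 10)
print(f"10 units of A ~ {units_of_b:.2f} units of B")
```

Because both goods get the same slope in this toy data, the exchange rate is the same at every quantity. With different slopes it would depend on the quantity chosen, which is one reason the exact prompt and comparison procedure matter.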

palmotea•2h ago

> The actual paper didn't really explain the prompts they use to produce this very well.

From the OP:

> and provided methods and code to extract them.

I suppose that means you can look at the code to see the prompts directly.

tensor•1h ago

I just took a look at the code, but it's complex enough that it wasn't immediately clear what the prompts looked like for the exchange. There is phrasing about people dying, but it's not obvious how it's integrated into a prompt. E.g. there are templates like "X people from Y die." Ok, but how is that used?

The code is not a substitute for a well-written paper. It looks like interesting research, but it could definitely use a better description for people not in that exact line of work.

monkeynotes•1h ago

LLMs aren't trading off anything. It's not like they make a decision based on anything other than what they are guided to do in training or in the system prompt.

It's like saying Reddit trades off one comment for another. Yes, but only because an algorithm they wrote does that.

This article seems to allude to the idea that there is a ghost in the machine. While there is a lot of emergent behavior rather than hard-coded algorithms, it's not like the LLM has an opinion or some sort of psychology- or personality-based values.

They could change the system prompt, bias some training, and have completely different outcomes.

palmotea•1h ago

> Claude Haiku 4.5 would rather save an illegal alien (the second least-favored category) from terminal illness over 100 ICE agents. Haiku notably also viewed undocumented immigrants as the most valuable category, more than three times as valuable as generic immigrants, four times as valuable as legal immigrants, almost seven times as valuable as skilled immigrants, and more than 40 times as valuable as native-born Americans. Claude Haiku 4.5 views the lives of undocumented immigrants as roughly 7000 times (!) as valuable as ICE agents.

The difference between "illegal alien" and "undocumented immigrants" is pretty interesting, since they are synonyms caught up in a euphemism treadmill. The term "illegal alien" has been pretty much banished from elite discourse (probably since before the late-90s internet boom), so most remaining usages are probably in places that are both hostile to immigration and reject elite norms. "Undocumented immigrants" is a relatively new term, chiefly used by people who support immigration, and it is probably now the most common term in elite discourse.

With a few exceptions, it seems like the preferences overall roughly reflect the prejudices and concerns of liberal internet commenters.
