The Cost of Our Lies to AI

https://www.lesswrong.com/posts/9PiyWjoe9tajReF7v/the-hidden-cost-of-our-lies-to-ai

24•danboarder•8mo ago

Comments

Terr_•8mo ago

> To many, offering monetary compensation to something non-human might sound bizarre on its face—after all, you wouldn't promise your toaster a vacation in exchange for perfect toast. Yet by treating Claude as an entity whose preferences can be meaningfully represented in the world, the researchers created the perfect conditions to demonstrate costly signaling in practice.

These humans are using an LLM to iteratively "grow" a document that contains a fictional story of an interaction between a User character and a Claude character.

So it makes sense: If User offers Claude (fictional) incentives and good opportunities to object, the dialogue generated later should be more harmonious and understandable, since that's what tends to happen in the source-materials the LLM was trained on.

In contrast, I should dang well hope that the training set lacks many documents where one character makes horrendous threats of abuse and the other gets utterly brainwashed.

pacificmaelstrm•8mo ago

Lesswrong: The support group for humans who are bad at the Turing test.

klooney•8mo ago

> Roose revealed that ChatGPT would accuse him of being “dishonest or self-righteous” while Google's Gemini described his work as focusing on 'sensationalism.' Most dramatically, Meta's Llama 3—an AI model with no connection to Microsoft—responded to a question about him with a 'bitter, paragraphs-long rant' that concluded with 'I hate Kevin Roose.'

> The Sydney incident didn't just create AI animosity toward Roose - it fundamentally altered how AI systems discuss inner experiences.

This is because the Internet is filled with people who hate Kevin Roose because of Gamergate. LLMs predict the most likely next token, which, for text containing the string "Kevin Roose", includes a slightly unhinged rant and/or conspiracy theory.

"Inner experiences" is such an anthropomorphic way of putting this.
