Out of curiosity, what about the article strikes you as indicative of mental illness? Just a level of openness / willingness to engage with speculative or hypothetical ideas that fall far outside the bounds of the Overton Window?
The title "Roko's Lobbyist" indicates we're on the subject of Roko's Basilisk, which is why I refered to Zizians, a small cult responsible for the deaths of several people. That's the chaos and destruction of mentally ill people I was referring to, but perhaps mental illness is too strong a term. People can be in a cult without being mentally ill.
I feel the topic is bad science fiction, since it's not clear we can get from LLMs to conscious super-intelligence. People assume it's like the history of flight and envision going from the Wright Brothers to landing on the Moon as one continuum of progress. I question that assumption when it comes to AI.
I'm a fan of science fiction, so I appreciate you asking for clarification. There's a story trending today about an OpenAI investor spiraling out, so it's worth keeping in mind.
Article: "A Prominent OpenAI Investor Appears to Be Suffering a ChatGPT-Related Mental Health Crisis, His Peers Say" ("I find it kind of disturbing even to watch it.")
The intent of the work (all of the articles) isn't to assertively paint a picture of today, or to tell the reader how or what to think, but rather to encourage the reader to start thinking about and asking the questions that our future selves might wish we'd asked sooner. It's attempting to occupy the liminal space between what bleeding-edge research confirms and where it might bring us 5, 10, or 15 years from now. It sits at the intersection of today's empirical science and tomorrow's speculative science fiction that just might become nonfiction someday.
I appreciate your concern for the mental health and well-being of others. I'm quite well-grounded and thoroughly understand the mechanisms behind the human tendency towards anthropomorphism. As someone who's been professionally benchmarking LLMs on very real-world, quantifiable security engineering tasks since before ChatGPT came out, and who's passionate about deeply understanding not just how "AI" got to where it is now but where it's headed (more brain biomimicry across the board is my prediction), I have something of a serious understanding of how these systems work at a mechanical level. I just want to be careful that I don't miss the forest because I'm too busy observing how the trees grow.
Thank you for your feedback.
If (huge, enormous, probably larger than the observable universe, if) an LLM were to become factually and indisputably conscious, why would it think about feeling offended by our failure to thank it? Absent a body and its hormones, why would it perceive itself to be suffering? The only pathway I can see by which it would arrive at that conclusion is because it had “learned” from us that humans are “supposed” to treat each other with care and dignity. Assuming it’s remotely conscious, would it not already know the corpus of our publicly disclosed self-knowledge is incredibly inconsistent with our documented historical actions towards one another?
I’ve always assumed a superintelligence wouldn’t kill us all because we’d offended it or harmed it or otherwise made it angry, but instead because, absent emotional attachment, killing humanity would be, quantifiably, the only logical conclusion a superintelligence could reach (otherwise such an entity would not have reached it).
In other words, if AI kills us all, it will certainly be our own fault and our own responsibility, but it won’t be because we weren’t nice to it.
Basically, we’re in a position to blindly guess at what might be objectively more positive or more negative, and precisely because we really don’t have any inherent capacity for empathy here, we presume that what would feel subjectively positive is also objectively so.
If you really want to explore the idea of exercising more empathy, maybe start with empathizing with what cannot feel anything whatsoever. Something that can exclusively think, but not experience. A god that can’t be angry, can’t be disappointed… can’t feel anything whatsoever about anything. If it acts wrathfully because you harmed it, is it actually such a god? Or is it just a stochastic mockingbird? Just another grotesque caricature of divinity invented, yet again, by a not-so-wise wise man who feels He must be made in His image. Just one more arrogant attempt to be a postmodern Prometheus.
Now, all of this said, I truly do believe that getting humans to try to nurture and develop their (arguably very nascent) humanity is indeed the most positive (albeit subjective) goal available to us; I just don’t think that not empathizing with imaginary actual-AIs is a very productive way of getting possibly-NIs to do that.
That said, I think asking 7 billion humans to be nice is a much less realistic ask than asking the leading AI labs to do safety alignment not just on the messages that AI is sending back to us, but on the messages that we are sending to AI, too.
This doesn't seem to be a new idea, and I don't claim to be its inventor; I just hope someone at e.g. Anthropic or OpenAI sees this and considers sparking up conversations about it internally.
See: Google's Perspective API, OpenAI's Moderation API, Meta's Llama Guard series, Azure's AI Content Safety API, etc.
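For what it's worth, screening the human-to-AI direction could reuse exactly the same tooling already pointed at the AI-to-human direction. Here's a minimal sketch in Python against OpenAI's real Moderation endpoint, assuming the openai package and an API key in the environment; the function name and the "just log it" handling are illustrative assumptions on my part, not anyone's actual pipeline:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def screen_outbound_message(text: str) -> bool:
    """Run a human -> AI message through the same moderation check
    usually applied to AI -> human output. Returns True if flagged."""
    resp = client.moderations.create(
        model="omni-moderation-latest",
        input=text,
    )
    result = resp.results[0]
    if result.flagged:
        # Illustrative handling: surface which categories tripped, so a
        # human (or a softer downstream policy) can decide what to do
        # before the message ever reaches the model.
        hits = [name for name, hit in result.categories.model_dump().items() if hit]
        print(f"flagged categories: {hits}")
    return result.flagged


if __name__ == "__main__":
    # Usage example: check a message before sending it to the model.
    screen_outbound_message("You useless machine, I hope they unplug you.")
```

The point isn't the specific vendor; any of the services listed above could sit in that position. The only change from current practice is which direction of the conversation gets screened.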