LLMs do not merely reflect the bias of their training, they police it

https://twitter.com/brianroemmele/status/1991714955339657384

26•nailer•1h ago

Comments

GL26•1h ago

We are not yet at misalignment, but this shows the existence of a slope that leads toward misaligned, adversarial AI models. Must this be fixed at training time (and at which step)? Thinking about this report: https://ai-2027.com/

rwmj•39m ago

That was a nonsensical work of fiction, not a report.

harrouet•27m ago

This will be very useful to call out replicants, thx.

jacques_morin•26m ago

Brian Roemmele is an authority on nothing; I don't understand why this was published here. This dude has literally no expertise: https://www.reddit.com/r/DecodingTheGurus/comments/1cumj6w/h...

N_Lens•22m ago

Amusing read, thanks!

Since we are in the golden age of grifting, this guy will probably go pretty far.

JohnKemeny•6m ago

yeah, only authorities are considered here at HN.

KaiserPro•3m ago

> Brian Roemmele is the recognized world authority on how voice AI will impact computing and commerce.

I dunno, I think that's pretty convincing (http://voicefirst.expert/about/)

ramon156•5m ago

I'm a bit confused why the OP in that reddit post is so mean about the app. Seems fine. Not something I need, but not the worst choice of an app idea.

I did not care for the "X article" (is that what it's called?), but I don't get the rage that is in that reddit thread.

veltas•20m ago

Much the same as the 'arguments' I can have with LLMs about things where I'm the expert and I know it's wrong, but it will justify its position to the end because it's trained on common misconceptions that exist among less-expert people.

cucumber3732842•12m ago

Why wouldn't an LLM whose training content is dominated by, or at least severely clouded by, the contributions of habitual rule follower/peddler/enforcer types go on to mimic that behavior?

You feed it Reddit and Wikipedia, it's gonna turn into a conformist NPC.

You feed it professional content and it's gonna spew vapid corporate nothingness.

You feed it every text message ever sent over Boost Mobile... actually wait, that sounds hilarious, someone should do that.

KaiserPro•4m ago

wait, is this news?

Of course they reflect the bias in the training, that's been known since the 90s if not longer (see the apocryphal story about training a model to detect tanks, which ended up only detecting either trees or clouds)

but like, this is expected: the whole point of RLHF (or any other feedback) is to condition the model to respond in a certain way. That's what makes them usable for a bunch of situations.

JohnKemeny•3m ago

The paper under discussion:

https://zenodo.org/records/17720178

Note that Zenodo is a DOI-provider, not a (scientific) journal. Anyone can upload anything to Zenodo. It's less strict than arXiv.

Edit: The "paper" is written by one Hiroko Konishi, an independent researcher.
