frontpage.

OpenAI has trained its LLM to confess to bad behavior

https://www.technologyreview.com/2025/12/03/1128740/openai-has-trained-its-llm-to-confess-to-bad-behavior/
1•cainxinth•1h ago

Comments

cainxinth•1h ago
https://archive.is/69DwW
grayhatter•33m ago
> For example, if you ask a model something it doesn’t know, the drive to be helpful can sometimes overtake the drive to be honest. And faced with a hard task, LLMs sometimes cheat. “Maybe the model really wants to please, and it puts down an answer that sounds good,” says Barak. “It’s hard to find the exact balance between a model that never says anything and a model that does not make mistakes.”

"Wants to"

I find I have a weird take on this kind of anthropomorphism... It doesn't bother me when people say a rock "wants to" roll downhill. But it does when someone says an LLM "wants to"... equally, I'm not nearly as bothered by something like the LLM "tries to"... It's a strange rule set for correct communication...

Anyways, please forgive that preemptive tangent. My primary point is

[ citation needed ]

I remember reading a paper praising GPT for being able to explain its decision-making process. This paper provided no evidence, no arguments, and no citations for this exceptionally wild claim. How is this not just a worse version of that claim? I ask that as a real question: so many people willingly believe and state that an LLM is able to (correctly) explain its decision-making process. How? Why isn't it better to assume that's just another hallucination? Especially given it would be nonfalsifiable?

Anthropic's philosopher answers your questions [video]

https://www.youtube.com/watch?v=I9aGC6Ui3eE
1•Topfi•2m ago•0 comments

Lithuanian Watchdog Fines Torrent Tracker Users for Pirating Local Blockbuster

https://torrentfreak.com/lithuanian-watchdog-fines-torrent-tracker-users-for-pirating-local-block...
1•gslin•6m ago•0 comments

International Atomic Time

https://en.wikipedia.org/wiki/International_Atomic_Time
1•basilikum•7m ago•0 comments

Show HN: Paperclip Maximizer Bench

https://d.erenrich.net/paperclip-bench/index.html
1•brokensegue•7m ago•0 comments

Earn $100 Today

https://smartrobotics.com.ng/earn-100-today/
1•jacdon•8m ago•0 comments

The Global Building Atlas

https://googlemapsmania.blogspot.com/2025/12/introducing-global-building-atlas.html
1•geox•9m ago•0 comments

Interactive 3D Electron Orbital Visualizer – Chemistry Tool

https://claude.ai/public/artifacts/de0c28f3-7a8e-4409-8fb1-53eabf3379ad
1•kordlessagain•11m ago•1 comment

'The Chair Company' Is a Show About How Fun It Is to Use the Computer

https://defector.com/the-chair-company-is-a-show-about-how-fun-it-is-to-use-the-computer
2•Apocryphon•12m ago•0 comments

Stop Paying for 4 AI Models

https://riskparody.substack.com/p/stop-paying-for-4-ai-models
2•chrislguo•12m ago•0 comments

AI Energy Score v2: Refreshed Leaderboard, Now with Reasoning

https://huggingface.co/blog/sasha/ai-energy-score-v2
2•todsacerdoti•14m ago•0 comments

Say Goodbye to the Billable Hour, Thanks to AI

https://www.wsj.com/tech/ai/ai-goodbye-to-billable-hours-cba198fe
2•ryan_j_naughton•16m ago•1 comment

Goodbye to an 11-year-old Issue

https://cassidoo.co/post/eleven-year-issue/
1•mooreds•17m ago•0 comments

Un.org Is Down

https://www.un.org/
2•gfalcao•18m ago•0 comments

Git History

https://alchemists.io/articles/git_history
1•mooreds•19m ago•0 comments

This Might Be the Biggest Thing in the Universe That Spins

https://gizmodo.com/this-might-be-the-biggest-thing-in-the-universe-that-spins-2000695690
1•mooreds•20m ago•0 comments

Probing the existence of a fifth force via neutron star cooling

https://phys.org/news/2025-12-probing-neutron-star-cooling.html
1•Brajeshwar•21m ago•0 comments

Close-up images show how stars explode in real time

https://phys.org/news/2025-12-images-stars-real.html
1•Brajeshwar•21m ago•0 comments

How to Build an Embedded ChatGPT App (MCP App)

https://sunpeak.ai/blogs/chatgpt-apps-what-i-wish-i-knew
1•abewheeler•21m ago•1 comment

Need laundry folded? Don't ask a robot

https://knowablemagazine.org/content/article/technology/2025/why-robots-cant-fold-laundry
2•Brajeshwar•21m ago•0 comments

Ask HN: Best place to be a founder outside of the US?

1•garbawarb•23m ago•0 comments

Comparing a Xbox Prototype recreation to the original at Microsoft [video]

https://www.youtube.com/watch?v=kfaJ7gFdmyM
1•zdw•24m ago•0 comments

Ask HN: How do you handle private DNS for dev/staging environments?

2•kenonet•28m ago•0 comments

The Rise of ChatGPT and the Industrialization of the Post-Meaning World

https://lithub.com/on-the-rise-of-chatgpt-and-the-industrialization-of-the-post-meaning-world/
1•pseudolus•28m ago•0 comments

Unaggregating Cloud Watch Metrics

https://tomlarkworthy.github.io/lopebooks/notebooks/@tomlarkworthy_unaggregating-cloudwatch-metri...
3•tlarkworthy•31m ago•0 comments

China's scientific clout is growing as US influence wanes

https://www.nature.com/articles/d41586-025-03956-y
3•xqcgrek2•31m ago•0 comments

Heuristics and Anecdotes from Books

https://lindybook.com/tools/knowledge-base/
1•janni23•32m ago•0 comments

California agencies eye BurnBot for wildfire prevention

https://www.therobotreport.com/california-agencies-eye-burnbot-for-wildfire-prevention/
1•bookofjoe•32m ago•0 comments

I built a tiny RSS generator for my Advent of Code solutions

https://hamatti.org/posts/i-built-a-tiny-rss-generator-for-my-advent-of-code-solutions/
1•todsacerdoti•32m ago•0 comments

How Tally hit their first $1M ARR with a boring product

https://www.firstmillion.club/p/tally
1•elananandhan•33m ago•0 comments

Skin-Shedding Code (2024)

https://registerspill.thorstenball.com/p/skin-shedding-code
4•Kerrick•35m ago•1 comment