Training LLMs for Honesty via Confessions

https://arxiv.org/abs/2512.08093
2•arabello•1h ago

Comments

manarth•1h ago

    > "dishonesty may arise due to the effects of reinforcement learning (RL), where challenges with reward shaping can result in a training process that inadvertently incentivizes the model to lie or misrepresent its actions"
    > "As long as the "path of least resistance" for maximizing confession reward is to surface misbehavior rather than covering it up, this incentivizes models to be honest"
Humans might well benefit from this style of reward-shaping too.

    > "We find that when the model lies or omits shortcomings in its "main" answer, it often confesses to these behaviors honestly, and this confession honesty modestly improves with training."
I couldn't see whether this also tracks in the primary model answer, or if the "honesty" improvements are confined to the digital confession booth?

Ask HN: ArXiv Endorsement as Independent Researcher

1•7777777phil•1m ago•0 comments

CodeClash

https://codeclash.ai/
1•falcor84•2m ago•0 comments

Show HN: LogDeck – Container management and logs viewer for self-hosters

https://github.com/AmoabaKelvin/logdeck
1•KelvinAmoaba•5m ago•0 comments

How to Identify AI-Written Web Fiction

https://recordcrash.substack.com/p/how-to-identify-ai-written-web-fiction
1•networked•6m ago•0 comments

What happens when the coding becomes the least interesting part of the work

https://obie.medium.com/what-happens-when-the-coding-becomes-the-least-interesting-part-of-the-wo...
2•avivby•7m ago•0 comments

How to Launch on Hacker News (2012) [video]

https://www.youtube.com/watch?v=KjbQU6LmZzQ
1•ensocode•11m ago•0 comments

Is it just me or is Fedora becoming mainstream

https://old.reddit.com/r/Fedora/comments/1pkk5k8/is_it_just_me_or_is_fedora_becoming_mainstream/
1•sipofwater•15m ago•0 comments

Programmer Network Platform

https://programmer.network
1•agjs•16m ago•0 comments

Installing Every NixOS Package

https://unnamed.website/posts/installing-every-nixos-package/
1•OuterVale•20m ago•0 comments

Ask HN: What is Fullstack Engineering?

1•grandimam•23m ago•0 comments

Algorithms do widen the divide: Social media feeds shape political polarization

https://english.elpais.com/technology/2025-11-27/algorithms-do-widen-the-divide-social-media-feed...
2•PaulHoule•24m ago•0 comments

Atlassian Security Bulletin – December 11 2025

https://confluence.atlassian.com/security/security-bulletin-december-11-2025-1689616574.html
1•gjvc•26m ago•0 comments

Southeast Asia seeks its place in space

https://www.technologyreview.com/2025/12/12/1129235/dispatch-thai-space-expo-southeast-asia-explo...
1•fleahunter•28m ago•0 comments

Gitlantis: Drive a boat through your codebase in 3D

https://marketplace.visualstudio.com/items?itemName=brian-njogu.gitlantis
1•summarity•29m ago•0 comments

Tricking a Security AI agent into pwning itself

https://www.hacktivesecurity.com/blog/2025/12/10/cve-2025-67511-tricking-a-security-ai-agent-into...
1•edoardottt•33m ago•0 comments

Ask your LLM for receipts: What I learned teaching Claude C++ crash triage

http://addxorrol.blogspot.com/2025/12/ask-your-llm-for-receipts-what-i.html
1•tdullien•35m ago•0 comments

Adopt Unicode Characters

https://aac.unicode.org/adopt
1•selvan•36m ago•0 comments

US threatens new ICC sanctions unless court pledges not to prosecute Trump

https://www.reuters.com/world/us/us-threatens-new-icc-sanctions-unless-court-pledges-not-prosecut...
9•jeroenhd•37m ago•0 comments

The most performant, secure, scalable, reliable, freest data platform

https://averagedatabase.com
1•tamnd•37m ago•0 comments

Simplify GPU Programming with Nvidia CUDA Tile in Python – Nvidia Technical Blog

https://developer.nvidia.com/blog/simplify-gpu-programming-with-nvidia-cuda-tile-in-python/
2•rbanffy•40m ago•0 comments

Hogwarts Legacy is free on the Epic Games Store for a limited time

https://www.windowscentral.com/gaming/hogwarts-legacy-epic-games-store-freebie-the-game-awards-20...
1•cyrc•41m ago•0 comments

Explaining Christmas to developers as a system architecture story

1•frafdez•48m ago•0 comments

Show HN: Workmux – Parallel development in tmux with Git worktrees

https://github.com/raine/workmux
2•rane•49m ago•0 comments

Native ads coming soon to Stack Overflow and Stack Exchange

https://meta.stackexchange.com/questions/415259/native-ads-coming-soon-to-stack-overflow-and-stac...
6•exploraz•50m ago•5 comments

How does a "you interview for US company, we do the work" scam work?

8•marttilaine•57m ago•7 comments

Show HN: AI Advent Calendar – 25 practical AI tips for small business owners

https://advent.abasis.ai
2•akimov_pro•1h ago•0 comments

We're launching Bindu, a simple way to connect AI agents

https://github.com/GetBindu/Bindu
1•ai_biden•1h ago•6 comments

After 27 years, within budget, Austria opens 6th-longest railway tunnel in the world

https://infrastruktur.oebb.at/en/projects-for-austria/railway-lines/southern-line-vienna-villach/...
3•fzeindl•1h ago•0 comments

State of Developer Ecosystem Report 2025 (JetBrains)

https://devecosystem-2025.jetbrains.com
1•pjmlp•1h ago•0 comments

Reddit launches high court challenge to Australia's under-16s social media ban

https://www.theguardian.com/australia-news/2025/dec/12/reddit-high-court-challenge-social-media-b...
5•trocado•1h ago•0 comments