news newest ask show jobs

Made with ♥ by @iamnishanth

Open Source @Github

Open in hackernews

Alignment is not free: How model upgrades can silence your confidence signals

https://www.variance.co/post/alignment-is-not-free-how-a-model-silenced-our-confidence-signals

31•karinemellata•4h ago

Comments

behnamoh•1h ago

there's evidence that alignment also significantly reduces model creativity: https://arxiv.org/abs/2406.05587

it’s it similar to humans. when restricted in terms of what they can or cannot say, they become more conservative and cannot really express all sorts of ideas.

Alex_001•11m ago

That paper is a great pointer — the creativity vs. alignment trade-off feels a lot like the "risk-aversion" effect in humans under censorship or heavy supervision. It makes me wonder: as we push models to be more aligned, are we inherently narrowing their output distribution to safer, more average responses?

And if so, where’s the balance? Could we someday see dual-mode models — one for safety-critical tasks, and another more "raw" mode for creative or exploratory use, gated by context or user trust levels?

Centigonal•10m ago

Very interesting! The one thing I don't understand is how the author made the jump from "we lost the confidence signal in the move to 4.1-mini" and "this is because of the alignment/steerability improvements."

Previous OpenAI models were instruct-tuned or otherwise aligned, and the author even mentions that model distillation might be destroying the entropy signal. How did they pinpoint alignment as the cause?

What Is Rust's Turbofish?

https://techblog.tonsser.com/posts/what-is-rusts-turbofish

1•auraham•51s ago•0 comments

ChatGPT is leaking users data (files)

https://acassis.wordpress.com/2025/05/06/chatgpt-is-leaking-users-data-files/

1•zdw•54s ago•0 comments

A Thematic Analysis of How Near-Death Experiences Affect Employees' Work Lives

https://www.ingentaconnect.com/content/jmsr/rmsr20/pre-prints/content-jmsr_ft0088

1•harambae•1m ago•0 comments

A collection of new, original CJK fonts from Typotheque

https://www.typotheque.com/blog/collection-of-new-original-cjk-fonts

1•gslin•1m ago•0 comments

The State of SSL Stacks

https://www.haproxy.com/blog/state-of-ssl-stacks

1•zdw•2m ago•0 comments

90s Cable Simulator – Recreating Retro Cable TV with a Raspberry Pi [video]

https://www.youtube.com/watch?v=CDW1wokbRiQ

1•lurk2•15m ago•0 comments

Study suggests we don't just hear music, but 'become it'

https://www.sciencedaily.com/releases/2025/05/250506170920.htm

2•OutOfHere•16m ago•0 comments

Valve Proton 10.0-1d (beta)

https://github.com/ValveSoftware/Proton/releases/tag/proton-10.0-1d

1•neustradamus•17m ago•0 comments

How climate change is raising your electricity bill

https://www.theclimatebrink.com/p/how-climate-change-is-raising-your

2•adrianN•21m ago•1 comments

What Can a 500MB LLM Do? You'll Be Surprised [video]

https://www.youtube.com/watch?v=IwrtbDl9XX0

1•blacksoil•23m ago•0 comments

I made 4000 agent calls in Cursor last month. Each model has a personality

3•mike210•25m ago•1 comments

Motiff is Figma with AI [video]

https://www.youtube.com/watch?v=qLwwWtGBx_A

1•taro666•30m ago•0 comments

Hugo Administrators Resign in Wake of ChatGPT Controversy

https://gizmodo.com/worldcon-2025-chatgpt-controversy-hugos-2000598351

3•doctoboggan•34m ago•0 comments

Pegasus spyware creator ordered to pay WhatsApp $168M for 2019 hack

https://www.ft.com/content/be26c503-b4e0-4ba5-a5ca-e9e75c351c46

1•KnuthIsGod•35m ago•1 comments

How to Ask Questions the Smart Way

http://catb.org/~esr/faqs/smart-questions.html

2•OuterVale•38m ago•0 comments

"Vibe Coding" by Emergent Garden [video]

https://www.youtube.com/watch?v=1OxBv9Q7Uxo

1•todsacerdoti•40m ago•0 comments

Tulsi Gabbard Reused the Same Weak Password on Multiple Accounts for Years

https://www.wired.com/story/tulsi-gabbard-dni-weak-password/

3•JohnTHaller•46m ago•1 comments

X402 - HTTP based payments from Coinbase

https://github.com/coinbase/x402

1•kentf•48m ago•0 comments

Usenix Announces the Discontinuation of ATC Conference

https://www.usenix.org/blog/usenix-atc-announcement

5•septicmk•48m ago•0 comments

Show HN: Tired of checking 10 sites for AI info? I built a one-stop feed

https://infobuzz.ai/

1•Johnnyang66•52m ago•0 comments

A tiny super fast RDBMS supports replication

https://github.com/crossdb-org/crossdb

2•jcwang•54m ago•0 comments

Corpspeak: Infinite Corporate BS Generator

https://lurkertech.com/corpspeak/

4•mesarvagya•1h ago•0 comments

Reddit will tighten verification to keep out human-like AI bots

https://techcrunch.com/2025/05/06/reddit-will-tighten-verification-to-keep-out-human-like-ai-bots/

4•badmonster•1h ago•3 comments

GPT-2 attention weights, visualized

https://amanvir.com/gpt-2-attention

1•venusgirdle•1h ago•0 comments

What's New in Grafana v12.0

https://grafana.com/docs/grafana/latest/whatsnew/whats-new-in-v12-0/

1•tanelpoder•1h ago•0 comments

Rahul Goel (NordSpace) – Building Canada's Sovereign Space Launch Capability [video]

https://www.youtube.com/watch?v=hCYFBbZJ_aY

1•Olshansky•1h ago•0 comments

News Literacy Project

https://newslit.org/

1•mooreds•1h ago•1 comments

$360k, ultraluxury EV Cadillac Celestiq

https://www.theverge.com/gm-general-motors/661569/cadillac-celestiq-first-drive-ev-bespoke-luxury-price

1•andrewstetsenko•1h ago•0 comments

We created another Kafka client for Node.js

https://blog.platformatic.dev/why-we-created-another-kafka-client-for-nodejs

1•andyfleming•1h ago•0 comments

Don't Guess

https://www.jakeworth.com/posts/dont-guess/

1•mooreds•1h ago•0 comments