Natural Language Autoencoders: Turning Claude's Thoughts into Text

https://www.anthropic.com/research/natural-language-autoencoders
62•instagraham•1h ago

Comments

tjohnell•1h ago
It will inevitably learn how to think in a way that translates to one (moral) meaning and back but has an ulterior meaning underneath.
rotcev•50m ago
This is exactly what I first thought. “The user appears to be attempting to decode my previous thought process, …” The question is whether the model will be able to internalize this in a way that is undetectable to the aforementioned technique.
astrange•24m ago
That shouldn't happen as long as the autoencoder isn't used as an RL reward. It will happen (due to Goodhart's law) if it is.

Of course, if you use it to make any decision, that can still happen eventually.

visarga•1h ago
Beautiful idea: an autoencoder must represent everything without hiding anything if it is to recover the original data closely. So it trains a model to verbalize embeddings well. This reveals what we want to know about the model (such as when it thinks it is being tested, or other hidden thoughts).
firemelt•1h ago
Finally, something interesting. But this only makes me think that the final judgement is still in human hands: deciding whether Claude's inner thoughts are correct or not.

I mean, who knows if those are really Claude's thoughts, or if Claude just thinks those are his thoughts because humans want that?
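
To make the reconstruction argument in visarga's comment concrete, here is a toy sketch in plain Python. It is not Anthropic's method: the vocabulary, the function names (`verbalize`, `reconstruct`, `reconstruction_error`) and the three-number "activation" are all hypothetical, chosen only to show why a description that hides something is penalized by the round trip back to the original vector.

```python
# Toy illustration only. Every name and number here is an assumption made
# for the example; none of it comes from the NLA paper or its released code.

LEVELS = ["very low", "low", "neutral", "high", "very high"]  # tiny "vocabulary"
VALUES = [-1.0, -0.5, 0.0, 0.5, 1.0]                          # value each word stands for


def verbalize(activation, hide=()):
    """Encoder: describe each coordinate with the nearest word.

    Coordinates whose index is in `hide` are reported as "unknown",
    which models a description that withholds information.
    """
    words = []
    for i, x in enumerate(activation):
        if i in hide:
            words.append("unknown")
        else:
            nearest = min(range(len(VALUES)), key=lambda k: abs(VALUES[k] - x))
            words.append(LEVELS[nearest])
    return ", ".join(words)


def reconstruct(text):
    """Decoder: map each word back to a number; unrecognized words become 0.0."""
    lookup = dict(zip(LEVELS, VALUES))
    return [lookup.get(word, 0.0) for word in text.split(", ")]


def reconstruction_error(original, recovered):
    """Mean squared error between the original and the recovered vector."""
    return sum((a - b) ** 2 for a, b in zip(original, recovered)) / len(original)


activation = [0.9, -0.4, 0.1]
honest = verbalize(activation)             # describes every coordinate
evasive = verbalize(activation, hide={0})  # withholds the first coordinate

print(honest, reconstruction_error(activation, reconstruct(honest)))
print(evasive, reconstruction_error(activation, reconstruct(evasive)))
```

The evasive description reconstructs worse than the honest one, which is the pressure the comment describes. It says nothing about firemelt's worry: a low reconstruction error shows the text carries the information, not that a human reads it the way the decoder does.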

Tossrock•54m ago
Anthropic Research going from strength to strength in interpretability. Publicly releasing the code so other labs can benefit from it is also a great move - very values aligned, and improves the overall AI safety ecosystem.
zozbot234•53m ago
Anthropic has released open weight models for translating the activations of existing models, viz. Qwen 2.5 (7B), Gemma 3 (12B, 27B) and Llama 3.3 (70B) into natural language text. https://github.com/kitft/natural_language_autoencoders https://huggingface.co/collections/kitft/nla-models This is huge news and it's great to see Anthropic finally engage with the Hugging Face and open weights community!
NitpickLawyer•50m ago
> We also release an interactive frontend for exploring NLAs on several open models through a collaboration with Neuronpedia.

Whatever they did on Llama didn't work: nothing makes sense in their example where they ask the model to lie about 1+1. Either the model is too old or whatever they used isn't working, but the autoencoder's output is nothing like their examples with Claude. Gemma is similarly bad.

fredericoluz•24m ago
same. i'm trying to trigger the 'mom is in the next room' russian thing but the model thinks the sentence is from american reddit.
fredericoluz•21m ago
it seems that the examples they showed off with haiku work. i'd guess llama is just too bad

The map that keeps Burning Man honest

https://www.not-ship.com/burning-man-moop/
428•speckx•5h ago•192 comments

Agents need control flow, not more prompts

https://bsuh.bearblog.dev/agents-need-control-flow/
143•bsuh•3h ago•71 comments

Natural Language Autoencoders: Turning Claude's Thoughts into Text

https://www.anthropic.com/research/natural-language-autoencoders
64•instagraham•1h ago•12 comments

AlphaEvolve: Gemini-powered coding agent scaling impact across fields

https://deepmind.google/blog/alphaevolve-impact/
193•berlianta•4h ago•75 comments

DeepSeek 4 Flash local inference engine for Metal

https://github.com/antirez/ds4
155•tamnd•4h ago•50 comments

AI Slop Is Killing Online Communities

https://rmoff.net/2026/05/06/ai-slop-is-killing-online-communities/
102•thm•1h ago•66 comments

Child marriages plunged when girls stayed in school in Nigeria

https://www.nature.com/articles/d41586-026-00720-8
270•surprisetalk•6h ago•191 comments

Chrome removes claim of On-device AI not sending data to Google Servers

https://old.reddit.com/r/chrome/comments/1t5qayz/chrome_removes_claim_of_ondevice_al_not_sending/
261•newsoftheday•3h ago•91 comments

I want to live like Costco people

https://tastecooking.com/i-want-to-live-like-costco-people/
98•speckx•4h ago•229 comments

Principles for agent-native CLIs

https://twitter.com/trevin/status/2051316002730991795
22•blumpy22•2h ago•4 comments

PySimpleGUI 6

https://github.com/PySimpleGUI/PySimpleGUI
61•geophph•2d ago•22 comments

OpenBSD Stories: The closest thing to cute kittens (OpenBSD/zaurus)

http://miod.online.fr/software/openbsd/stories/zaurus1.html
44•zdw•1d ago•5 comments

The Self-Cancelling Subscription

https://predr.ag/blog/the-self-cancelling-subscription/
114•surprisetalk•5h ago•49 comments

Dirtyfrag: Universal Linux LPE

https://www.openwall.com/lists/oss-security/2026/05/07/8
6•flipped•28m ago•0 comments

RaTeX: KaTeX-compatible LaTeX rendering engine in pure Rust

https://ratex.lites.dev/
132•atilimcetin•3d ago•81 comments

SQLite Is a Library of Congress Recommended Storage Format

https://sqlite.org/locrsf.html
570•whatisabcdefgh•21h ago•173 comments

Motherboard sales 'collapse' amid unprecedented shortages fueled by AI

https://www.tomshardware.com/pc-components/motherboards/motherboard-sales-collapse-by-more-than-2...
167•speckx•4h ago•190 comments

MPEG-2 Transport Stream Packaging for Media over QUIC Transport

https://www.ietf.org/archive/id/draft-gregoire-moq-msfts-00.html
42•mondainx•5h ago•12 comments

Colored Shadow Penumbra

https://chosker.github.io/blog/colored-shadow-penumbra
3•ibobev•45m ago•0 comments

Printing Blogs

https://fi-le.net/print/
23•fi-le•1d ago•5 comments

Show HN: Stage CLI – an easier way of reading your AI generated changes locally

https://github.com/ReviewStage/stage-cli
20•cpan22•4h ago•14 comments

GovernGPT (YC W24) Is Hiring Engineers to Build Thinking Systems in Montreal

https://www.ycombinator.com/companies/governgpt/jobs/hRyltS0-backend-engineer-thinking-systems
1•owalerys•7h ago

Nobody Reviews Compiler Output

https://skiplabs.io/blog/codegen_as_compiler
4•rzk•2d ago•0 comments

OurCar: What I learned making an app for my family

https://mendelgreenberg.com/posts/ourcar/
78•chabad360•1d ago•54 comments

Show HN: TRUST – Coding Rust like it's 1989

https://github.com/wojtczyk/trust
85•wojtczyk•13h ago•57 comments

Boris Cherny: TI-83 Plus Basic Programming Tutorial (2004)

https://www.ticalc.org/programming/columns/83plus-bas/cherny/
161•suoken•2d ago•69 comments

Brazil's Pix Payment System Faces Pressure from Visa and Mastercard

https://www.elciudadano.com/en/brazils-pix-payment-system-faces-pressure-from-visa-and-mastercard...
54•wslh•2h ago•36 comments

How Cloudflare responded to the “Copy Fail” Linux vulnerability

https://blog.cloudflare.com/copy-fail-linux-vulnerability-mitigation/
66•mobeigi•6h ago•55 comments

ZAYA1-8B matches DeepSeek-R1 on math with less than 1B active parameters

https://firethering.com/zaya1-8b-open-source-math-coding-model/
66•steveharing1•10h ago•49 comments

ProgramBench: Can language models rebuild programs from scratch?

https://arxiv.org/abs/2605.03546
122•jonbaer•16h ago•69 comments