Interpretability: Understanding how AI models think – Anthropic [video]

https://www.youtube.com/watch?v=fGKNUvivvnc
2•Topfi•5mo ago

Comments

Topfi•5mo ago
A very informative, frank and comprehensive discussion of the current state of LLM interpretability. The part concerning faithfulness, and whether we can "trust" the way a model appears to "think" through specific problems, is especially well explained, particularly regarding how models arrive at an output when prompted to verify a result.

I very much appreciated the honesty about what is not yet fully understood regarding how LLMs arrive at a specific output, and about their attempts to make this more verifiable. That makes sense, considering that among the frontier LLM labs, Anthropic appears to expend some of the most (public) effort on in-depth understanding rather than on chasing performance goals.

I found this part especially well put, and liked how they emphasized that even when terms such as "thinking" are used in the context of LLMs, this should not be misconstrued to mean that what they are describing maps onto the term as we know it from our human, lived experience:

> I think for me the “do models think in the sense that they do some integration and processing and sequential stuff that can lead to surprising places”? Clearly yes, it'd be kind of crazy from interacting with them a lot for there not to be something going on. We can sort of start to see how it's happening. Then the “like humans” bit is interesting because I think some of that is trying to ask “what can I expect from these” because if it's sort of like me being good at this would that make it good at that? But if it's different from me then I don't really know what to look for. And so really we're just looking to understanding, where do we need to be extremely suspicious or are starting from scratch in understanding this and where can we sort of just reason from our own, very rich experience of thinking? And there I feel a little bit trapped because as a human, I project my own image constantly onto everything like they warned us in the Bible where I'm just like this piece of silicon, it's just like me made in my image where to some extent it's been trained to simulate dialogue between people. So, it's going to be very person-like in its affect. And so some “humanness” will get into it simply from the training, but then it's like using very different equipment that has different limitations. And so, the way it does that might be pretty different.

> To Emmanuel's point, I think we're in this tricky spot answering questions like this because we don't really have the right language for talking about what language models do. It's like we're doing biology, but before people figured out cells or before people figured out DNA. I think we're starting to fill in that understanding. As Emmanuel said, there are these cases now where we can really just go read our paper. You'll know how the model added these two numbers. And then if you want to call it human-like, if you want to call it thinking, or if you want to not, then it's up to you. But the real answer is just find the right language and the right abstractions for talking about the models. But in the meantime, currently we've only 20% succeeded at that scientific project. To fill in the other 80%, we sort of have to borrow analogies from other fields. And there's this question of which analogies are the most apt? Should we be thinking of the models like computer programs? Should we be thinking of them like little people? And it seems to be like in some ways that thinking of them like little people is kind of useful. It's like if I say mean things to the model, it talks back at me.

I would hope this discussion among top-level experts finally puts to rest a common delusion I've encountered, both online and offline (among industry members, lecturers, students and, of course, regular people): the assumption that some people fully understand how LLMs work at every level, which, unfortunately, no one currently does. Any answer beyond "we do not have enough information yet, and more research is very much needed" is sadly far too optimistic. I'm not holding my breath though, even less so for social media comments.

Even worse, of course, is the argument that "LLMs must work like (human) brains, and by proxy be conscious, because some output is similar to what humans might produce", which is akin to "This artifact looks like a modern thing (if you ignore a significant number of details that don't serve your interpretation), therefore we had hyperdiffusion/ancient aliens/power plant pyramids/ancient planes/spaceships"...

On another note, there are few things nerdier, in the traditional sense of the term, than a VC-backed, multi-billion-dollar company still relying on a Brother HL-L2400DW for its modest printing needs.
