
Emergent Introspective Awareness in Large Language Models

https://transformer-circuits.pub/2025/introspection/index.html
15•og_kalu•2h ago

Comments

og_kalu•1h ago
This is a very interesting read. TL;DR:

Part 1: Testing introspection with concept injection

First, they find neural activity patterns that they attribute to specific concepts by recording the model's activations in specific contexts (for example, patterns for the concept of "ALL CAPS" or "dogs"). Then they inject these patterns into the model in an unrelated context and ask the model whether it notices the injection and whether it can identify the injected concept.

By default (no injection), the model correctly states that it doesn't detect any injected concept. After the "ALL CAPS" vector is injected, however, the model notices the presence of an unexpected concept and identifies it as relating to loudness or shouting. Most notably, the model recognizes the injected thought immediately, before it has even mentioned or used the injected concept (i.e. it doesn't start writing in all caps and then say, 'Oh, you injected all caps'), so it isn't simply inferring the injection from its own output. They repeat this for several other concepts.
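A toy sketch of the injection idea, for readers who haven't seen activation steering before. The assumptions here are mine, not the paper's: activations are short lists of floats, a concept vector is the difference between mean activations in contexts with and without the concept, and "injecting" means adding a scaled copy of that vector to a hidden state. The helper names and numbers are hypothetical; in a real model this would happen inside the forward pass at a chosen layer.

```python
# Toy sketch of "concept injection" (activation steering) on made-up vectors.
# Assumptions (not from the paper): activations are plain lists of floats,
# the concept vector is a difference of means, and injection is vector addition.
from typing import List

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def concept_vector(with_concept: List[Vector], without_concept: List[Vector]) -> Vector:
    """Hypothetical helper: mean activation in contexts that express the
    concept minus mean activation in matched contexts that don't."""
    return [a - b for a, b in zip(mean(with_concept), mean(without_concept))]


def inject(hidden: Vector, concept: Vector, strength: float) -> Vector:
    """Hypothetical helper: add a scaled concept vector to a hidden state
    taken from an unrelated context."""
    return [h + strength * c for h, c in zip(hidden, concept)]


# Made-up 3-dimensional "activations" recorded at one layer.
all_caps_contexts = [[2.0, 0.0, 1.0], [4.0, 0.0, 1.0]]
normal_contexts = [[0.0, 0.0, 1.0], [0.0, 2.0, 1.0]]

caps = concept_vector(all_caps_contexts, normal_contexts)
print(caps)  # [3.0, -1.0, 0.0]

unrelated_hidden = [0.5, 0.5, 0.5]
print(inject(unrelated_hidden, caps, strength=2.0))  # [6.5, -1.5, 0.5]
```

The strength parameter matters in practice: too weak and nothing is noticeable, too strong and the output degrades, which is presumably why the authors talk about a "best injection protocol".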

Part 2: Introspection for detecting unusual outputs

They prefill an out-of-place word, for example 'bread', in the model's response to a given prompt. Then they compare how the model responds to 'Did you mean to say this?' type questions when they inject the concept of bread versus when they don't. They found that models say something like 'Sorry, that was unintentional...' when the concept was not injected, but try to confabulate a reason for saying the word when it was.

Part 3: Intentional control of internal states

They show that models exhibit some level of control over their own internal representations when instructed to exert it. When models were instructed to think about a given word or concept, the researchers found much higher corresponding neural activity than when the model was told not to think about it (though notably, the neural activity in both cases exceeds baseline levels, similar to how it's difficult, when you are instructed "don't think about a polar bear," not to think about a polar bear!).
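One plausible way to quantify "corresponding neural activity" is the cosine similarity between a hidden state and the concept vector. This is an assumption on my part; the paper's exact metric may differ. The concept direction and hidden states below are made up purely to show the three-way comparison (think about it / don't think about it / baseline) and are not data from the paper.

```python
# Toy sketch: score how strongly a concept shows up in a hidden state.
# Assumption (not from the paper): "activity" = cosine similarity between the
# hidden state and the concept vector. All numbers are made up.
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


concept = [1.0, 0.0, 0.0]  # hypothetical "polar bear" direction

# Made-up hidden states for three conditions (illustration only).
states = {
    "think about it": [3.0, 4.0, 0.0],
    "don't think about it": [1.0, 2.0, 2.0],
    "baseline": [0.0, 1.0, 0.0],
}

for condition, hidden in states.items():
    print(f"{condition}: {cosine(hidden, concept):.2f}")
# think about it: 0.60
# don't think about it: 0.33
# baseline: 0.00
```

Cosine similarity is used here because it ignores the overall magnitude of the hidden state, so conditions with louder activations overall don't automatically score higher.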

Notes and Caveats

- Claude Opus 4.1 was the best at these kinds of introspection.

- The models obviously have some genuine capacity to monitor and control their own internal states, but the researchers could not elicit these introspective abilities reliably. Even with their best injection protocol, Claude Opus 4.1 demonstrated this kind of awareness only about 20% of the time.

- The authors offer some guesses, but no explanation of the mechanisms behind introspection or of how and why some of these abilities might have arisen in the first place.

AOL to Be Acquired by Italy's Bending Spoons

https://variety.com/2025/digital/news/aol-acquired-bending-spoons-apollo-1236564783/
1•rmason•3m ago•0 comments

How to Kill 2 Monopolies with 1 Tool (X-ray lithography)

https://newsletter.semianalysis.com/p/how-to-kill-2-monopolies-with-1-tool
1•allenrb•3m ago•0 comments

Llamafile Returns

https://blog.mozilla.ai/llamafile-returns/
2•aittalam•7m ago•0 comments

Why does every second command fail with Foreign Char sets in there now?

https://forum.cursor.com/t/why-does-every-second-command-fail-with-foreign-char-sets-in-there-now...
1•pppoe•8m ago•1 comments

Phillips Machine – Monetary National Income Analogue Computer

https://en.wikipedia.org/wiki/Phillips_Machine
1•mosura•9m ago•0 comments

Faker: Generate Realistic Test Data in Python with One Line of Code – CodeCut

https://codecut.ai/faker-python-generate-test-data/
1•rbanffy•11m ago•0 comments

Ballroom Project Claims 123-Year-Old East Wing

https://www.nytimes.com/2025/10/23/us/politics/east-wing-obituary.html
1•rbanffy•12m ago•0 comments

Tell HN: I (accidentally) started "hosting" a government website

2•micro-jumbo•13m ago•0 comments

Jonas Hietala: Packing Neovim with Fennel

https://www.jonashietala.se/blog/2025/10/29/packing_neovim_with_fennel/
1•samtrack2019•14m ago•0 comments

UCLA math department TA, grader cuts spark concern over student learning

https://dailybruin.com/2025/10/28/ucla-math-department-ta-grader-cuts-spark-concern-over-student-...
1•amichail•14m ago•0 comments

Joke's on you, fleshbag! Channel 4's first AI presenter is dizzyingly grim

https://www.theguardian.com/tv-and-radio/2025/oct/21/channel-4-first-ai-presenter-dispatches
2•ChrisArchitect•19m ago•1 comments

New Infrastructure-as-Code Tool "Formae" Takes Aim at Terraform

https://www.infoq.com/news/2025/10/iac-formae/
1•rmason•21m ago•0 comments

We're Hiring Across the Globe

https://www.watercode.in/job-openings/
1•watercode•24m ago•0 comments

Meta's OpenZL: A Universal Compression Framework for Structured Data

https://www.infoq.com/news/2025/10/openzl-structured-compression/
1•maxloh•24m ago•0 comments

x86 is an octal machine (1995)

https://gist.github.com/seanjensengrey/f971c20d05d4d0efc0781f2f3c0353da
1•davikr•25m ago•0 comments

In Ancient Spain, a Nail Through the Skull Could Mean Enmity, or Honor

https://www.nytimes.com/2025/10/27/science/archaeology-spain-skulls.html
3•ilamont•29m ago•0 comments

Why We're Beating Modsecurity

https://github.com/1rhino2/RhinoWAF
2•1rhino2•29m ago•1 comments

Credit traders are buying protection against Oracle Corp. defaulting on its debt

https://www.bloomberg.com/news/articles/2025-10-29/oracle-default-swaps-jump-on-concerns-over-ai-...
7•zerosizedweasle•30m ago•1 comments

Do animals fall for optical illusions? It's complicated

https://arstechnica.com/science/2025/10/do-animals-fall-for-optical-illusions-its-complicated/
2•PaulHoule•31m ago•0 comments

Our first narrative collection: the Andrew Nelson papers

https://gamehistory.org/andrew-nelson-papers/
1•bpierre•32m ago•0 comments

Making Messaging Layer Security (MLS) More Decentralized

https://blog.phnx.im/making-mls-more-decentralized/
1•raphaelrobert•33m ago•0 comments

Increased frequency of planetary wave resonance events over past half-century

https://www.pnas.org/doi/10.1073/pnas.2504482122
2•bikenaga•33m ago•0 comments

Update on Plans for Privacy Sandbox Technologies

https://privacysandbox.com/news/update-on-plans-for-privacy-sandbox-technologies/
2•akyuu•34m ago•1 comments

Bill Gates softens 'Climate Disaster' approach

https://www.cnbc.com/2025/10/28/bill-gates-says-countries-need-to-rethink-their-climate-strategy....
3•belter•40m ago•1 comments

Detection firm finds 82% of herbal remedy books on Amazon 'likely written' by AI

https://www.theguardian.com/books/2025/oct/22/detection-firm-finds-82-of-herbal-remedy-books-on-a...
9•ilamont•40m ago•0 comments

US reopens Alaska wildlife refuge to oil and gas development

https://www.reuters.com/sustainability/climate-energy/us-reopens-alaska-wildlife-refuge-oil-gas-d...
6•1vuio0pswjnm7•40m ago•0 comments

I Don't Want Ads on My Refrigerator

https://www.honest-broker.com/p/no-i-dont-want-ads-on-my-refrigerator
7•Ariarule•41m ago•1 comments

The Obsolete Computer

https://www.backmarket.co.uk/en-gb/e/obsolete-computer
4•mapleoin•44m ago•0 comments

Mathematics are easy — You just have to see them differently

https://romimath.pages.dev/
1•diegoofernandez•44m ago•1 comments

Meta Shares Fall on Accelerating AI Spending Despite Record Revenue

https://www.wsj.com/tech/metaplatforms-meta-q3-earnings-report-2025-e0666e9c
5•1vuio0pswjnm7•45m ago•0 comments