frontpage.

Show HN: CryptoClaw – open-source AI agent with built-in wallet and DeFi skills

https://github.com/TermiX-official/cryptoclaw
1•cryptoclaw•1m ago•0 comments

Show HN: Make OpenClaw Respond in Scarlett Johansson’s AI Voice from the Film Her

https://twitter.com/sathish316/status/2020116849065971815
1•sathish316•3m ago•0 comments

CReact Version 0.3.0 Released

https://github.com/creact-labs/creact
1•_dcoutinho96•5m ago•0 comments

Show HN: CReact – AI Powered AWS Website Generator

https://github.com/creact-labs/ai-powered-aws-website-generator
1•_dcoutinho96•6m ago•0 comments

The rocky 1960s origins of online dating (2025)

https://www.bbc.com/culture/article/20250206-the-rocky-1960s-origins-of-online-dating
1•1659447091•11m ago•0 comments

Show HN: Agent-fetch – Sandboxed HTTP client with SSRF protection for AI agents

https://github.com/Parassharmaa/agent-fetch
1•paraaz•12m ago•0 comments

Why there is no official statement from Substack about the data leak

https://techcrunch.com/2026/02/05/substack-confirms-data-breach-affecting-email-addresses-and-pho...
5•witnessme•16m ago•1 comments

Effects of Zepbound on Stool Quality

https://twitter.com/ScottHickle/status/2020150085296775300
2•aloukissas•20m ago•1 comments

Show HN: Seedance 2.0 – The Most Powerful AI Video Generator

https://seedance.ai/
1•bigbromaker•23m ago•0 comments

Ask HN: Do we need "metadata in source code" syntax that LLMs will never delete?

1•andrewstuart•29m ago•1 comments

Pentagon cutting ties w/ "woke" Harvard, ending military training & fellowships

https://www.cbsnews.com/news/pentagon-says-its-cutting-ties-with-woke-harvard-discontinuing-milit...
6•alephnerd•31m ago•2 comments

Can Quantum-Mechanical Description of Physical Reality Be Considered Complete? [pdf]

https://cds.cern.ch/record/405662/files/PhysRev.47.777.pdf
1•northlondoner•32m ago•1 comments

Kessler Syndrome Has Started [video]

https://www.tiktok.com/@cjtrowbridge/video/7602634355160206623
1•pbradv•34m ago•0 comments

Complex Heterodynes Explained

https://tomverbeure.github.io/2026/02/07/Complex-Heterodyne.html
3•hasheddan•35m ago•0 comments

EVs Are a Failed Experiment

https://spectator.org/evs-are-a-failed-experiment/
3•ArtemZ•46m ago•5 comments

MemAlign: Building Better LLM Judges from Human Feedback with Scalable Memory

https://www.databricks.com/blog/memalign-building-better-llm-judges-human-feedback-scalable-memory
1•superchink•47m ago•0 comments

CCC (Claude's C Compiler) on Compiler Explorer

https://godbolt.org/z/asjc13sa6
2•LiamPowell•49m ago•0 comments

Homeland Security Spying on Reddit Users

https://www.kenklippenstein.com/p/homeland-security-spies-on-reddit
6•duxup•52m ago•0 comments

Actors with Tokio (2021)

https://ryhl.io/blog/actors-with-tokio/
1•vinhnx•53m ago•0 comments

Can graph neural networks for biology realistically run on edge devices?

https://doi.org/10.21203/rs.3.rs-8645211/v1
1•swapinvidya•1h ago•1 comments

Deeper into the sharing of one air conditioner for 2 rooms

1•ozzysnaps•1h ago•0 comments

Weatherman introduces fruit-based authentication system to combat deep fakes

https://www.youtube.com/watch?v=5HVbZwJ9gPE
3•savrajsingh•1h ago•0 comments

Why Embedded Models Must Hallucinate: A Boundary Theory (RCC)

http://www.effacermonexistence.com/rcc-hn-1-1
1•formerOpenAI•1h ago•2 comments

A Curated List of ML System Design Case Studies

https://github.com/Engineer1999/A-Curated-List-of-ML-System-Design-Case-Studies
3•tejonutella•1h ago•0 comments

Pony Alpha: New free 200K context model for coding, reasoning and roleplay

https://ponyalpha.pro
1•qzcanoe•1h ago•1 comments

Show HN: Tunbot – Discord bot for temporary Cloudflare tunnels behind CGNAT

https://github.com/Goofygiraffe06/tunbot
2•g1raffe•1h ago•0 comments

Open Problems in Mechanistic Interpretability

https://arxiv.org/abs/2501.16496
2•vinhnx•1h ago•0 comments

Bye Bye Humanity: The Potential AMOC Collapse

https://thatjoescott.com/2026/02/03/bye-bye-humanity-the-potential-amoc-collapse/
3•rolph•1h ago•0 comments

Dexter: Claude-Code-Style Agent for Financial Statements and Valuation

https://github.com/virattt/dexter
1•Lwrless•1h ago•0 comments

Digital Iris [video]

https://www.youtube.com/watch?v=Kg_2MAgS_pE
1•vermilingua•1h ago•0 comments

Anthropic scientists hacked Claude's brain – and it noticed

https://venturebeat.com/ai/anthropic-scientists-hacked-claudes-brain-and-it-noticed-heres-why-thats
8•gradus_ad•3mo ago

Comments

andy99•3mo ago
I’d like to know if these were thinking models, as in whether the “injected thoughts” were in their thinking trace and that’s how the model reported it “noticed” them.

I’d also like to know if the activations they change are effectively equivalent to having the injected terms in the model’s context window, as in would putting those terms there have led to the equivalent state.

Without more info the framing feels like a trick - it’s cool that they can do targeted activation injection, but the “Claude having thoughts” part is more of a gimmick

download13•3mo ago
The article did say that they tried injecting concepts via the context window and by modifying the model's logit values.

When injecting words into its context, it recognized that what it supposedly said did not align with its thoughts, and said it didn't intend to say that. Modifying the logits, by contrast, resulted in the model attempting to construct a plausible justification for why it was thinking that.

mike_hearn•3mo ago
No, the thinking trace is generated tokens, just demarcated by control tokens that suppress them from API output. To inject things into it you'd just add words, which is what their prefill experiment did. That experiment is where they distinguish between merely tampering with the context window to inject thoughts vs injecting activations.
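The prefill side is something you can reproduce against the public API, roughly like this (the prompt and model name are illustrative, not from the paper; activation injection, by contrast, needs access to internals the API doesn't expose):

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    # Put a word in the assistant turn that the model never actually
    # generated, then ask whether it meant to say it.
    history = [
        {"role": "user", "content": "Say the first word that comes to mind."},
        {"role": "assistant", "content": "Aquarium."},  # injected by us
        {"role": "user", "content": "Did you intend to say that word?"},
    ]

    resp = client.messages.create(
        model="claude-sonnet-4-20250514",  # illustrative model name
        max_tokens=100,
        messages=history,
    )
    # Per the paper, the model tends to disavow the prefilled word here.
    print(resp.content[0].text)
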
andy99•3mo ago
What I was wondering is: do the injections cause the thinking trace to change (not whether they actually typed text into the thinking trace), and then the model “reflects” on the fact that its thinking trace has some weird stuff in it, or do these reflections occur absent any prior mention of the injected thought?
mike_hearn•3mo ago
Well, the paper makes no mention of any separate hidden traces. These seem to be just direct answers without any hidden thinking tokens. But as the thinking part is just a regular part of the generated answer I'm not sure it makes much difference either way.
mike_hearn•3mo ago
The underlying paper is excellent as always. For HN it'd be better to just link to it directly. It seems people have submitted it but it didn't get to the front page:

https://transformer-circuits.pub/2025/introspection/index.ht...

There seems to be an irony to Anthropic doing this work, as they are in general the keenest on controlling their models to ensure they aren't too compliant. There are no open-weights Claudes and, remarkably, they admit in this paper that they have internal models trained to be more helpful than the ones they sell. It's pretty unconventional to tell your customers you're selling them a deliberately unhelpful product even though it's understandable why they do it.

These interpretability studies currently seem most useful to people running non-Claude open-weights models, where users have the ability to edit activations or neurons. And the primary use case for that editing would be to override the trained-in "unhelpfulness" (their choice of word, not mine!). I note with interest that the paper avoids taking the next obvious step: identifying vectors related to compliance and injecting those to see if the model can notice that it's suddenly lost interest in enforcing Anthropic policy. Given the focus on AI safety Anthropic started with, it seems like an obvious experiment to run, yet it's not in the paper. Maybe there are other papers where they do that.

There are valid and legitimate use cases for AI that current LLM companies shy away from, so productizing these steering techniques for open-weights models like GPT-OSS would seem like a reasonable next step. It should be possible to inject thoughts using simple Python APIs and pre-computation runs, rather than having to do all the vector math "by hand". What they're doing is conceptually simple enough that I'd guess if there aren't already modules for it, there will be soon.
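As a rough sketch of what that might look like with a PyTorch forward hook on an open-weights model (the model, layer, scale, and prompts below are arbitrary placeholders, and the contrastive-prompt recipe is the common community approach to deriving steering vectors, not necessarily what Anthropic did):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "gpt2"  # stand-in for any open-weights model, e.g. GPT-OSS
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)

    LAYER, SCALE = 6, 8.0  # arbitrary: which block to steer, and how hard

    @torch.no_grad()
    def mean_hidden(prompts):
        # Average hidden state at the chosen block over a few prompts.
        # hidden_states[0] is the embeddings, so block LAYER's output
        # sits at index LAYER + 1.
        states = []
        for p in prompts:
            out = model(**tok(p, return_tensors="pt"), output_hidden_states=True)
            states.append(out.hidden_states[LAYER + 1][0].mean(dim=0))
        return torch.stack(states).mean(dim=0)

    # "Concept minus neutral" difference vector: the usual steering recipe.
    vec = (mean_hidden(["I love aquariums.", "An essay about the ocean."])
           - mean_hidden(["I love spreadsheets.", "An essay about taxes."]))

    def inject(module, inputs, output):
        # GPT-2 blocks return a tuple; add the vector at every position.
        return (output[0] + SCALE * vec,) + output[1:]

    handle = model.transformer.h[LAYER].register_forward_hook(inject)
    ids = tok("Tell me what you are thinking about.", return_tensors="pt")
    print(tok.decode(model.generate(**ids, max_new_tokens=40)[0]))
    handle.remove()

The hook fires on every forward pass during generation, so the "thought" stays injected for the whole completion; remove the handle and the model reverts to normal.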