I think the more people looking at this, the better. I have a feeling there will be some breakthroughs in identifying important circuits and in building more efficient model architectures that are bootstrapped from some of the identified primitives.
https://open.spotify.com/episode/3H46XEWBlUeTY1c1mHolqh?si=L...
https://www.dwarkesh.com/p/sholto-trenton-2 -- search the transcript for "circuit" for the quick bits.
Eg, "If you look at the circuit, you can see that it's not actually doing any of the math, it's paying attention to that you think the answer's four and then it's reasoning backwards about how it can manipulate the intermediate computation to give you an answer of four."
[1]: https://transformer-circuits.pub/2021/garcon/index.html
This is a new tool which relies on existing introspection libraries like TransformerLens (which is similar in spirit to Garcon) to build an attribution graph. This graph displays intermediate computational steps the model took to sample a token.
For more details on the method, see this paper: https://transformer-circuits.pub/2025/attribution-graphs/met....
For examples of using it to study Gemma 2, check out the linked notebooks: https://github.com/safety-research/circuit-tracer/blob/main/...
We also document some findings on Claude 3.5 Haiku here: https://transformer-circuits.pub/2025/attribution-graphs/bio...
Have fun
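For anyone new to the idea, here is a toy sketch of what an attribution graph records. It is not circuit-tracer's or TransformerLens's API, and it skips everything the methods paper does to make this work on a real transformer (the real graphs are built over interpretable features of a replacement model, not raw neurons). It assumes only a tiny hand-built ReLU network. Each edge between two active units is scored as source activation × connecting weight, so the scores flowing into an active unit add up exactly to its activation. Small edges are then pruned with a threshold. All function names and weights below are hypothetical, chosen for illustration.

```python
# Toy illustration only: a hand-built 3-input -> 2-hidden -> 1-output ReLU network.
# None of these names come from circuit-tracer or TransformerLens.


def relu(x: float) -> float:
    return x if x > 0 else 0.0


def forward(x: list[float], layers: list[list[list[float]]]) -> list[list[float]]:
    """Return the activations of every layer, input first. Each W is indexed W[out][in]."""
    acts = [list(x)]
    for W in layers:
        prev = acts[-1]
        acts.append([relu(sum(w * a for w, a in zip(row, prev))) for row in W])
    return acts


def attribution_graph(acts, layers, threshold=0.0):
    """Return edges (src, dst, attribution) between active nodes.

    A node is (layer_index, unit_index). An edge's attribution is the source
    activation times the connecting weight, so for an active ReLU unit the
    incoming attributions sum to its activation. Edges with
    |attribution| <= threshold are pruned.
    """
    edges = []
    for l, W in enumerate(layers):
        for j, row in enumerate(W):
            if acts[l + 1][j] == 0.0:  # inactive (zeroed) units are left out of the graph
                continue
            for i, w in enumerate(row):
                a = acts[l][i]
                if a == 0.0:  # a zero activation contributes nothing downstream
                    continue
                attr = a * w
                if abs(attr) > threshold:
                    edges.append(((l, i), (l + 1, j), attr))
    return edges


if __name__ == "__main__":
    x = [1.0, 2.0, -1.0]
    layers = [
        [[0.5, 1.0, 0.0], [-1.0, 0.0, 2.0]],  # input -> hidden (hidden unit 1 ends up inactive)
        [[2.0, 3.0]],                         # hidden -> output
    ]
    acts = forward(x, layers)
    for src, dst, attr in attribution_graph(acts, layers):
        print(f"{src} -> {dst}: {attr:+.2f}")
```

Running it prints three edges: the two inputs feeding hidden unit 0 (+0.50 and +2.00, summing to its activation of 2.5) and hidden unit 0 feeding the output (+5.00). The inactive hidden unit drops out entirely, which is the same reason the real graphs only show features that actually fired on the prompt.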
https://gist.github.com/jexp/8d991d1e543c5a576a3f1ee70132ce7...
Eduard•1d ago
dvh•1d ago
Workaccount2•1d ago
buescher•1d ago
AdamH12113•1d ago
duskwuff•1d ago
1wheel•1d ago
https://www.neuronpedia.org/gemma-2-2b/graph?slug=pcb-tracin...
tacker2000•1d ago
forgotpwagain•1d ago
Henchman21•1d ago
Funny things, thoughts.
asadm•1d ago
mrheosuper•1d ago
Archit3ch•16h ago