Show HN: Symbolic Circuit Distillation: prove program to LLM circuit equivalence

https://github.com/neelsomani/symbolic-circuit-distillation

16•nsomani•1mo ago

Hi HN, I've been exploring various applications of formal methods to ML/interpretability and I've been hoping to get more eyes on the approach.

I have been working on a small interpretability project I call Symbolic Circuit Distillation. The goal is to take a tiny neuron-level circuit (like the ones in OpenAI's "Sparse Circuits" work) and automatically recover a concise Python program that implements the same algorithm, along with a bounded formal proof that the two are equivalent on a finite token domain.

Roughly, the pipeline is:

1. Start from a pruned circuit graph for a specific behavior (e.g. quote closing or bracket depth) extracted from a transformer. 2. Treat the circuit as an executable function and train a tiny ReLU network ("surrogate") that exactly matches the circuit on all inputs in a bounded domain (typically sequences of length 5–10 over a small token alphabet). 3. Search over a constrained DSL of common transformer motifs (counters, toggles, threshold detectors, small state machines) to synthesize candidate Python programs. 4. Use SMT-based bounded equivalence checking to either: - Prove that a candidate program and the surrogate agree on all inputs in the domain, or - Produce a counterexample input that rules the program out.

If the solver finds a proof, you get a small, human-readable Python function plus a machine-checkable guarantee that it matches the original circuit on that bounded domain.

Why I built this

Mechanistic interpretability has gotten pretty good at extracting "small crisp circuits" from large models, but turning those graphs into clean, human-readable algorithms is still very manual. My goal here is to automate that last step: go from "here is a sparse circuit" to "here is a verified algorithm that explains what it does", without hand-holding.

What works today

- Tasks: quote closing and bracket-depth detection from the OpenAI circuit_sparsity repo. - Exact surrogate fitting on a finite token domain. - DSL templates for simple counters, toggles, and small state machines. - SMT-based bounded equivalence between: sparse circuit -> ReLU surrogate -> Python program in the DSL.

Limitations and open questions

- The guarantees are bounded: equivalence is only proven on a finite token domain (short sequences and a small vocabulary). - Currently focused on very small circuits. Scaling to larger circuits and longer contexts is open engineering and research work. - The DSL is hand-designed around a few motifs. I am not yet learning the DSL itself or doing anything very clever in the search.

What I would love feedback on

- Are the problem framing and guarantees interesting to people working on mechanistic interpretability or formal methods? - Suggestions for next benchmarks: which circuits or behaviors would you want to see distilled next? - Feedback on the DSL design, search strategy, and SMT setup.

Happy to answer questions about implementation details, the SMT encoding, integration with OpenAI's Sparse Circuits repo, or anything else.

Comments

aappleby•1mo ago

No examples in the readme?

nsomani•1mo ago

There are two examples provided - quote matching and bracket closing.

Show HN: I saw this cool navigation reveal, so I made a simple HTML+CSS version

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

Show HN: Kappal – CLI to Run Docker Compose YML on Kubernetes for Local Dev

Show HN: If you lose your memory, how to regain access to your computer?

Show HN: I built a <400ms latency voice agent that runs on a 4gb vram GTX 1650"

Show HN: I spent 4 years building a UI design tool with only the features I use

Show HN: Stacky – certain block game clone

Show HN: A toy compiler I built in high school (runs in browser)

Show HN: Smooth CLI – Token-efficient browser for AI agents

Show HN: R3forth, a ColorForth-inspired language with a tiny VM

Show HN: Nginx-defender – realtime abuse blocking for Nginx

Show HN: BioTradingArena – Benchmark for LLMs to predict biotech stock movements

Show HN: Artifact Keeper – Open-Source Artifactory/Nexus Alternative in Rust

Show HN: Slack CLI for Agents

Show HN: ARM64 Android Dev Kit

Show HN: MCP App to play backgammon with your LLM

Show HN: Gigacode – Use OpenCode's UI with Claude Code/Codex/Amp

Show HN: I'm 75, building an OSS Virtual Protest Protocol for digital activism

Show HN: I built Divvy to split restaurant bills from a photo

Show HN: Which chef knife steels are good? Data from 540 Reddit tread

Show HN: Micropolis/SimCity Clone in Emacs Lisp

Show HN: I Hacked My Family's Meal Planning with an App

Show HN: Slop News – HN front page now, but it's all slop

Show HN: I built a free UCP checker – see if AI agents can find your store

Show HN: Daily-updated database of malicious browser extensions

Show HN: XAPIs.dev – Twitter API Alternative at 90% Lower Cost

Show HN: Falcon's Eye (isometric NetHack) running in the browser via WebAssembly

Show HN: Horizons – OSS agent execution engine

Show HN: Compile-Time Vibe Coding

Show HN: Local task classifier and dispatcher on RTX 3080