frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

Show HN: Symbolic Circuit Distillation: prove program to LLM circuit equivalence

https://github.com/neelsomani/symbolic-circuit-distillation
16•nsomani•1d ago
Hi HN, I've been exploring various applications of formal methods to ML/interpretability and I've been hoping to get more eyes on the approach.

I have been working on a small interpretability project I call Symbolic Circuit Distillation. The goal is to take a tiny neuron-level circuit (like the ones in OpenAI's "Sparse Circuits" work) and automatically recover a concise Python program that implements the same algorithm, along with a bounded formal proof that the two are equivalent on a finite token domain.

Roughly, the pipeline is:

1. Start from a pruned circuit graph for a specific behavior (e.g. quote closing or bracket depth) extracted from a transformer. 2. Treat the circuit as an executable function and train a tiny ReLU network ("surrogate") that exactly matches the circuit on all inputs in a bounded domain (typically sequences of length 5–10 over a small token alphabet). 3. Search over a constrained DSL of common transformer motifs (counters, toggles, threshold detectors, small state machines) to synthesize candidate Python programs. 4. Use SMT-based bounded equivalence checking to either: - Prove that a candidate program and the surrogate agree on all inputs in the domain, or - Produce a counterexample input that rules the program out.

If the solver finds a proof, you get a small, human-readable Python function plus a machine-checkable guarantee that it matches the original circuit on that bounded domain.

Why I built this

Mechanistic interpretability has gotten pretty good at extracting "small crisp circuits" from large models, but turning those graphs into clean, human-readable algorithms is still very manual. My goal here is to automate that last step: go from "here is a sparse circuit" to "here is a verified algorithm that explains what it does", without hand-holding.

What works today

- Tasks: quote closing and bracket-depth detection from the OpenAI circuit_sparsity repo. - Exact surrogate fitting on a finite token domain. - DSL templates for simple counters, toggles, and small state machines. - SMT-based bounded equivalence between: sparse circuit -> ReLU surrogate -> Python program in the DSL.

Limitations and open questions

- The guarantees are bounded: equivalence is only proven on a finite token domain (short sequences and a small vocabulary). - Currently focused on very small circuits. Scaling to larger circuits and longer contexts is open engineering and research work. - The DSL is hand-designed around a few motifs. I am not yet learning the DSL itself or doing anything very clever in the search.

What I would love feedback on

- Are the problem framing and guarantees interesting to people working on mechanistic interpretability or formal methods? - Suggestions for next benchmarks: which circuits or behaviors would you want to see distilled next? - Feedback on the DSL design, search strategy, and SMT setup.

Happy to answer questions about implementation details, the SMT encoding, integration with OpenAI's Sparse Circuits repo, or anything else.

Comments

aappleby•1d ago
No examples in the readme?
nsomani•23h ago
There are two examples provided - quote matching and bracket closing.

Show HN: I visualized the entire history of Citi Bike in the browser

https://bikemap.nyc/
49•freemanjiang•7h ago•20 comments

Show HN: I built a "Do not disturb" Device for my home office

https://apoorv.page/blogs/over-engineered-dnd
71•quacky_batak•4d ago•35 comments

Show HN: An LLM response cache that's aware of dynamic data

https://blog.butter.dev/on-automatic-template-induction-for-response-caching
12•raymondtana•5h ago•1 comments

Show HN: Free and local browser tool for designing gear models for 3D printing

https://gears.dmtrkovalenko.dev
38•neogoose•17h ago•10 comments

Show HN: A to Z – A word game I built from a childhood road trip memory

https://a26z.fun/
9•jackhulbert•6h ago•6 comments

Show HN: SMTP Tunnel – A SOCKS5 proxy disguised as email traffic to bypass DPI

https://github.com/x011/smtp-tunnel-proxy
126•lobito25•1d ago•43 comments

Show HN: KeelTest – AI-driven VS Code unit test generator with bug discovery

https://keelcode.dev/keeltest
26•bulba4aur•12h ago•10 comments

Show HN: Kerns – A Continuous Research Workspace

https://www.kerns.ai/
4•kanodiaayush•3h ago•0 comments

Show HN: Comet MCP – Give Claude Code a browser that can click

https://github.com/hanzili/comet-mcp
25•hanzili•4d ago•26 comments

Show HN: VaultSandbox – Test your real MailGun/SES/etc. integration

https://vaultsandbox.com/
52•vaultsandbox•1d ago•9 comments

Show HN: Flatagents: State machine orchestration with stateless LLM agents

https://github.com/memgrafter/flatagents
2•beacon294•4h ago•1 comments

Show HN: Tylax – A bidirectional LaTeX to Typst converter in Rust

https://github.com/scipenai/tylax
18•democat•10h ago•2 comments

Show HN: Mantic.sh – A structural code search engine for AI agents

https://github.com/marcoaapfortes/Mantic.sh
74•marcoaapfortes•1d ago•34 comments

Show HN: Tool for Testing MCP Servers

https://www.mcp-workbench.ai/
2•opiniateddev•5h ago•0 comments

Show HN: 48-digit prime numbers every git commit

https://textonly.github.io/git-prime/
66•keepamovin•6d ago•52 comments

Show HN: bikemap.nyc – visualization of the entire history of Citi Bike

https://github.com/freeman-jiang/bikemap.nyc
6•freemanjiang•5h ago•3 comments

Show HN: Game Boy Release Timelines

https://gameboyessentials.com/timelines
2•philistine•5h ago•2 comments

Show HN: Grammar of Graphics CLI tool made in Rust

https://github.com/williamcotton/gramgraph
2•williamcotton•5h ago•0 comments

Show HN: Tailsnitch – A security auditor for Tailscale

https://github.com/Adversis/tailsnitch
271•thesubtlety•2d ago•28 comments

Show HN: Prism.Tools – Free and privacy-focused developer utilities

https://blgardner.github.io/prism.tools/
365•BLGardner•1d ago•99 comments

Show HN: Make audio loops online

https://makeloops.online/
66•bilalba•2d ago•22 comments

Show HN: Stash – Sync Markdown Files with Apple Notes via CLI

https://github.com/shakedlokits/stash
70•shuka•1d ago•21 comments

Show HN: DoNotNotify – Log and intelligently block notifications on Android

https://donotnotify.com/
339•awaaz•2d ago•163 comments

Show HN: I built a simple "Gemini" watermark remover extension, "Peel Banana"

https://chromewebstore.google.com/detail/peel-banana/cngdhnfjakplnhplnmlgjalmfcochdgj
2•davepoon•2h ago•0 comments

Show HN: Seapie – a Python debugger where breakpoints drop into a REPL

https://github.com/hirsimaki-markus/seapie
7•markushirsimaki•8h ago•1 comments

Show HN: llmgame.ai – The Wikipedia Game but with LLMs

https://www.llmgame.ai
24•jmcallister•1d ago•22 comments

Show HN: Jax-JS, array library in JavaScript targeting WebGPU

https://ss.ekzhang.com/p/jax-js-an-ml-library-for-the-web
80•ekzhang•1d ago•22 comments

Show HN: Foundertrace – chain of YC startups founded by its employees

https://foundertrace.com/
38•loondri•4d ago•13 comments

Show HN: GPU Cuckoo Filter – faster queries than Blocked Bloom, with deletion

https://github.com/tdortman/cuckoo-filter
31•tdortman•1d ago•4 comments

Show HN: DDL to Data – Generate realistic test data from SQL schemas

52•tbrannan•1d ago•29 comments