I can see this being tolerant of simple renames, but it seems unlikely the hash will survive any real refactoring of the code.
You are right that AST hashing is brittle. That is why I wrote an engine that virtually executes the induction variables to determine that a `range` loop, a C-style `for` loop, and a raw `goto` loop all perform the same mathematical operation (iterate from 0 to N).
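Concretely, the three shapes look something like this (a simplified sketch, not the exact proof code; the `range`-over-int form needs Go 1.22+):

```go
package main

import "fmt"

// Three surface forms of "iterate 0 to N". The SCEV engine reduces
// all three to the same abstract loop, so they fingerprint identically.
func sumRange(n int) int {
	total := 0
	for i := range n { // range-over-int (Go 1.22+)
		total += i
	}
	return total
}

func sumCStyle(n int) int {
	total := 0
	for i := 0; i < n; i++ { // classic C-style loop
		total += i
	}
	return total
}

func sumGoto(n int) int {
	total, i := 0, 0
loop:
	if i < n { // raw goto loop
		total += i
		i++
		goto loop
	}
	return total
}

func main() {
	fmt.Println(sumRange(10), sumCStyle(10), sumGoto(10)) // 45 45 45
}
```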
I just pushed a proof to the repo that runs exactly those three scenarios; all three produce the same SHA-256 fingerprint.
It also handles control-flow normalization, so `if a > b { return 1 } else { return 0 }` fingerprints identically to `if a <= b { return 0 } else { return 1 }` (inverted condition + swapped branches).
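Written out as standalone functions, that pair looks like this (illustrative):

```go
package example

// pickA and pickB differ only by an inverted condition and swapped
// branches; the normalizer reduces both to the same canonical CFG.
func pickA(a, b int) int {
	if a > b {
		return 1
	}
	return 0
}

func pickB(a, b int) int {
	if a <= b { // condition inverted...
		return 0 // ...branches swapped
	}
	return 1
}
```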
It won't catch O(n) vs O(log n) algorithm changes, but it catches the "syntactic sugar" refactors that make up 90% of code churn.
You can view the proof code here:
https://github.com/BlackVectorOps/semantic_firewall/blob/mai...
Or run it yourself:
`go run examples/proof.go`
I built this because I've become paranoid about "safe" refactors in the wake of supply chain attacks like the xz backdoor.
We spend a lot of time reviewing code for syntax, but we lack good tools for verifying that a large refactor (e.g., renaming variables, changing loop styles) preserves the exact business logic. Standard SHA-256 hashes break if you change a single whitespace character or variable name, which makes them useless for verifying semantic equivalence.
I built Semantic Firewall (sfw) to solve this. It is an open-source tool that fingerprints Go code based on its behavior, not its bytes.
How it works:
1. SSA Conversion: It loads the Go source into Static Single Assignment form using golang.org/x/tools/go/ssa (a minimal loading sketch follows this list).
2. Canonicalization: It renames registers (v0, v1) deterministically and normalizes control-flow graphs. This ensures that `if a { x } else { y }` fingerprints the same even when the branches are swapped and the condition inverted.
3. Scalar Evolution (SCEV): This was the hardest part. I implemented an SCEV engine that mathematically solves loop trip counts. This means a `for range` loop and a `for i++` loop that iterate N times produce the exact same fingerprint.
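For reference, here is a minimal sketch of step 1 (standard x/tools plumbing, not sfw's exact code):

```go
package main

import (
	"fmt"
	"log"

	"golang.org/x/tools/go/packages"
	"golang.org/x/tools/go/ssa"
	"golang.org/x/tools/go/ssa/ssautil"
)

func main() {
	// Load the target package with full syntax and type information.
	cfg := &packages.Config{Mode: packages.LoadAllSyntax}
	pkgs, err := packages.Load(cfg, ".")
	if err != nil {
		log.Fatal(err)
	}

	// Build SSA form for every function in the loaded packages.
	prog, _ := ssautil.AllPackages(pkgs, ssa.SanityCheckFunctions)
	prog.Build()

	// A fingerprinting pass would walk each function from here,
	// canonicalize registers and block order, then hash the result.
	for fn := range ssautil.AllFunctions(prog) {
		fmt.Println(fn.Name())
	}
}
```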
Here is a quick example of what it catches:
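(A representative pair, simplified for the post; the function names are illustrative.)

```go
package example

// Both functions walk buf with the same trip count and body,
// so sfw assigns them the same fingerprint.
func sumIndexed(buf []byte) int {
	total := 0
	for i := 0; i < len(buf); i++ { // C-style index loop
		total += int(buf[i])
	}
	return total
}

func sumRanged(buf []byte) int {
	total := 0
	for _, b := range buf { // range loop over the same slice
		total += int(b)
	}
	return total
}
```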
These two produce identical hashes. If you change the logic (e.g. `i < len(buf)-1`), the hash diverges immediately.

It's written in Go and available as a CLI or GitHub Action. I'd love to hear your thoughts on the approach or edge cases I might have missed in the normalization phase.
Repo: https://github.com/BlackVectorOps/semantic_firewall