
SectorC: A C Compiler in 512 bytes

https://xorvoid.com/sectorc.html
84•valyala•4h ago•16 comments

Brookhaven Lab's RHIC concludes 25-year run with final collisions

https://www.hpcwire.com/off-the-wire/brookhaven-labs-rhic-concludes-25-year-run-with-final-collis...
23•gnufx•2h ago•14 comments

The F Word

http://muratbuffalo.blogspot.com/2026/02/friction.html
35•zdw•3d ago•4 comments

Software factories and the agentic moment

https://factory.strongdm.ai/
89•mellosouls•6h ago•166 comments

Speed up responses with fast mode

https://code.claude.com/docs/en/fast-mode
47•surprisetalk•3h ago•52 comments

I write games in C (yes, C)

https://jonathanwhiting.com/writing/blog/games_in_c/
130•valyala•3h ago•99 comments

Hoot: Scheme on WebAssembly

https://www.spritely.institute/hoot/
143•AlexeyBrin•9h ago•26 comments

Stories from 25 Years of Software Development

https://susam.net/twenty-five-years-of-computing.html
95•vinhnx•7h ago•13 comments

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

https://openciv3.org/
850•klaussilveira•23h ago•256 comments

First Proof

https://arxiv.org/abs/2602.05192
66•samasblack•6h ago•51 comments

The Waymo World Model

https://waymo.com/blog/2026/02/the-waymo-world-model-a-new-frontier-for-autonomous-driving-simula...
1090•xnx•1d ago•618 comments

Al Lowe on model trains, funny deaths and working with Disney

https://spillhistorie.no/2026/02/06/interview-with-sierra-veteran-al-lowe/
63•thelok•5h ago•9 comments

Vocal Guide – belt sing without killing yourself

https://jesperordrup.github.io/vocal-guide/
231•jesperordrup•14h ago•80 comments

Start all of your commands with a comma (2009)

https://rhodesmill.org/brandon/2009/commands-with-comma/
516•theblazehen•3d ago•191 comments

Reinforcement Learning from Human Feedback

https://rlhfbook.com/
93•onurkanbkrc•8h ago•5 comments

Selection Rather Than Prediction

https://voratiq.com/blog/selection-rather-than-prediction/
13•languid-photic•3d ago•4 comments

We mourn our craft

https://nolanlawson.com/2026/02/07/we-mourn-our-craft/
332•ColinWright•3h ago•395 comments

Show HN: A luma dependent chroma compression algorithm (image compression)

https://www.bitsnbites.eu/a-spatial-domain-variable-block-size-luma-dependent-chroma-compression-...
3•mbitsnbites•3d ago•0 comments

Coding agents have replaced every framework I used

https://blog.alaindichiappari.dev/p/software-engineering-is-back
253•alainrk•8h ago•412 comments

The AI boom is causing shortages everywhere else

https://www.washingtonpost.com/technology/2026/02/07/ai-spending-economy-shortages/
182•1vuio0pswjnm7•10h ago•251 comments

France's homegrown open source online office suite

https://github.com/suitenumerique
610•nar001•8h ago•269 comments

72M Points of Interest

https://tech.marksblogg.com/overture-places-pois.html
35•marklit•5d ago•6 comments

Show HN: I saw this cool navigation reveal, so I made a simple HTML+CSS version

https://github.com/Momciloo/fun-with-clip-path
27•momciloo•3h ago•5 comments

A Fresh Look at IBM 3270 Information Display System

https://www.rs-online.com/designspark/a-fresh-look-at-ibm-3270-information-display-system
47•rbanffy•4d ago•9 comments

Unseen Footage of Atari Battlezone Arcade Cabinet Production

https://arcadeblogger.com/2026/02/02/unseen-footage-of-atari-battlezone-cabinet-production/
124•videotopia•4d ago•39 comments

Where did all the starships go?

https://www.datawrapper.de/blog/science-fiction-decline
96•speckx•4d ago•106 comments

History and Timeline of the Proco Rat Pedal (2021)

https://web.archive.org/web/20211030011207/https://thejhsshow.com/articles/history-and-timeline-o...
20•brudgers•5d ago•5 comments

Learning from context is harder than we thought

https://hy.tencent.com/research/100025?langVersion=en
211•limoce•4d ago•117 comments

Show HN: Kappal – CLI to Run Docker Compose YML on Kubernetes for Local Dev

https://github.com/sandys/kappal
32•sandGorgon•2d ago•15 comments

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

https://github.com/valdanylchuk/breezydemo
287•isitcontent•1d ago•38 comments

Show HN: TheAuditor v2.0 – A “Flight Computer” for AI Coding Agents

https://github.com/TheAuditorTool/Auditor
40•ThailandJohn•1mo ago
I’m a former Systems Architect (Cisco/VMware) turned builder in Thailand. TheAuditor v2.0 is a complete architectural rewrite (800+ commits) of the prototype I posted three months ago.

The "a-ha" moment for me didn't come from a success; it came from a massive failure. I was trying to use AI for a complex schema refactor (a foundational change from "Products" to "ProductsVariants"), and due to its scope, it failed spectacularly. I realized two things:

* Context Collapse: The AI couldn't keep enough files in its context window to understand the full scope of the refactor, so it started hallucinating and "fixing" superficial issues. If I kept pressing it, it would literally panic and invent problems "so it could fix them," sending the session into a death spiral. That's the villain origin story of this tool. :D

* Stale Knowledge: It kept trying to implement Node 16 patterns in a Node 22 project, or defaulting to obsolete libraries (glob v7 instead of v11), because its training data was stale.

I realized that AI agents are phenomenal at outputting working code, but they have zero understanding of it. They optimize for "making it run at any cost," often by introducing security holes or technical debt just to bypass an error. The funny paradox is that when "cornered/forced" into using cutting-edge versions, syntax, and best practices, the AI has zero trouble executing; it's just so hilariously unaware of its surroundings that it will do anything else unless explicitly babysat.

I built v2 to be the "Sanity Check" that solves a lot of these issues, and it aims to keep solving more of the same kind. Instead of letting the AI guess, TheAuditor indexes the entire codebase into a local SQLite graph database. This gives the AI a queryable map of reality, letting it verify dependencies and imports without loading every file into context.
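To make the "queryable map" idea concrete, here is a minimal sketch of a code graph in SQLite. The table names, symbols, and the recursive query are hypothetical illustrations, not TheAuditor's actual schema:

```python
import sqlite3

# Hypothetical schema (NOT TheAuditor's real tables): one table of code
# symbols, one table of edges between them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE symbols (id INTEGER PRIMARY KEY, name TEXT, file TEXT);
    CREATE TABLE edges (src INTEGER, dst INTEGER, kind TEXT);
""")
conn.executemany("INSERT INTO symbols VALUES (?, ?, ?)", [
    (1, "get_product", "products.py"),
    (2, "get_variant", "variants.py"),
    (3, "render_page", "views.py"),
])
conn.executemany("INSERT INTO edges VALUES (?, ?, ?)", [
    (3, 1, "calls"),   # render_page -> get_product
    (1, 2, "calls"),   # get_product -> get_variant
])

# Recursive CTE: everything reachable from render_page, answered from
# the database without loading a single source file into context.
rows = conn.execute("""
    WITH RECURSIVE reach(id) AS (
        SELECT id FROM symbols WHERE name = 'render_page'
        UNION
        SELECT e.dst FROM edges e JOIN reach r ON e.src = r.id
    )
    SELECT s.name FROM reach JOIN symbols s USING (id)
""").fetchall()
print(sorted(r[0] for r in rows))
# ['get_product', 'get_variant', 'render_page']
```

A single indexed query like this replaces the "read ten files and hope they fit in the context window" workflow.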

A/B demo: https://www.youtube.com/watch?v=512uqMaZlTg

As the video shows, instead of reading 10+ full files (or grepping to compensate for hallucinations), the agent can now run "aud explain" and get ~500 lines of deterministic, facts-only information. It gets just what it needs, versus reading 10+ files, trying to keep them in context, finding what it was looking for, and trying to remember why it was looking in the first place.

I also learned that regex/string/heuristic approaches don't scale and are painfully slow (hours versus minutes). I tried regex-based rules and parsers, but they kept failing silently on complex files and hit constant limitations (the worst offender: having to re-read every file for each set of rules). I scrapped that approach and built a "Triple-Entry Fidelity" system. The tool now acts like a ledger: the parser emits a manifest, the DB emits a receipt, and if they don't match, the system crashes intentionally.
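The ledger idea can be sketched in a few lines. The function name and counters here are hypothetical stand-ins, not TheAuditor's API; the point is only the manifest-versus-receipt comparison that turns silent data loss into a hard crash:

```python
# Hypothetical sketch of a manifest/receipt fidelity check: the parser
# declares what it extracted, storage declares what actually landed,
# and any mismatch raises instead of being silently swallowed.
def store_symbols(parsed: list[str], db: list[str]) -> dict:
    manifest = {"symbols": len(parsed)}   # what the parser saw
    for sym in parsed:
        db.append(sym)                    # persist each symbol
    receipt = {"symbols": len(db)}        # what the store now holds
    if manifest != receipt:
        raise RuntimeError(f"fidelity mismatch: {manifest} != {receipt}")
    return receipt

db: list[str] = []
print(store_symbols(["get_product", "get_variant"], db))  # {'symbols': 2}
```

Crashing on mismatch is the whole design: a pipeline that keeps running after dropping data would feed the AI a map that no longer matches the territory.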

It's no longer just a scanner; it's a guardrail. In my daily workflow, I don't let the AI write a line of code until it (my choice happens to be CC/Codex) has run a pre-investigation for whatever problem statement I'm facing at the moment. This ensures it's anchored in facts, not inference or assumptions, or worse, hallucinations.

With that said, my tool isn't perfect. To support all this, I had to build a pseudo-compiler for Python/JS/TS, which means writing extractors for every framework and every syntax, everything, really. Sometimes I don't get it right, and sometimes I simply haven't had time to build out support for everything.

So my recommendation is to integrate the tool WITH your AI agent of choice rather than treating it as a tool for you, the human. I use it as a "confirm or deny": the AI runs the tool, verifies against the source code, and presents a pre-implementation audit. Based on that audit, I will create an "aud planning."

Some of the major milestones in v2.0:

* Hybrid Taint: I extended the Oracle Labs IFDS research to track data flow across microservice boundaries (e.g., React fetch → Express middleware → Controller).

* Triple-Entry Fidelity: This works across every layer (Indexer -> Extractor -> Parser -> Storage). Every step has fidelity checks working in unison. If there is silent data loss anywhere in the pipeline, the tool crashes intentionally.

* Graph DB: Moved from file-based parsing to a SQLite Graph Database to handle complex relationships that regex missed.

* Scope: Added support for Rust, Go, Bash, AWS CDK, and Terraform (v1 was Python/JS only).

* Agent Capabilities: Added Planning and Refactor engines, allowing AI agents to not just scan code but safely plan and execute architectural changes.
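The cross-boundary taint tracking in the first milestone can be illustrated with a toy worklist propagation. The nodes and the synthetic "cross-service" edge below are invented for illustration; the real IFDS-based analysis is far more involved:

```python
# Toy taint propagation over a flow graph. The edge from the React
# fetch body to Express's req.body is the "cross-service" stitch that
# lets taint follow user input across the frontend/backend boundary.
edges = {
    "react:form_input":       ["react:fetch_body"],
    "react:fetch_body":       ["express:req.body"],       # cross-service edge
    "express:req.body":       ["express:middleware_arg"],
    "express:middleware_arg": ["controller:sql_query"],   # the sink
}

def propagate(source: str) -> set[str]:
    """Return every node reachable from a taint source."""
    tainted, worklist = {source}, [source]
    while worklist:
        node = worklist.pop()
        for nxt in edges.get(node, []):
            if nxt not in tainted:
                tainted.add(nxt)
                worklist.append(nxt)
    return tainted

tainted = propagate("react:form_input")
print("controller:sql_query" in tainted)  # True: user input reaches the sink
```

Within a single service this is textbook reachability; the novelty the post describes is constructing those cross-boundary edges at all.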

Comments

digdugdirk•1mo ago
Cool! I've been playing with the same code -> graph concept for LLM work. Why did you decide to go for a pseudo-compiler with a ton of custom rules rather than try to interact with the AST itself?
ThailandJohn•1mo ago
Hi! Limitations of tree-sitter: it's insanely fast and easy to use, but it stops at syntax/nodes only. The TypeScript compiler provides semantics, with full type checking and cross-module resolution. It's a small nightmare because I have to write every extractor and parser for it (why I call it a "pseudo-compiler"), but it's a necessity for full call-chain provenance across callee/caller, frameworks, and validations, which is a hard requirement for the taint analysis to work. If you want to dig into the code: the top layer is ast_parser.py, which routes to a few places, but taking JS/TS as an example, look at data_flow.ts / javascript.py, which show the AST/extraction/analysis layers that capture it all and make sense of it in the database. :)
ozozozd•1mo ago
Great idea!

Did you consider using treesitter instead of the pseudo compiler?

ThailandJohn•1mo ago
Hey! Yes I did. I started with tree-sitter, tbh, and for Go, Rust, Bash, and HCL I still use it. In my naive beginnings I really had no idea how complex things "were supposed to be," so I was never deterred and kept building piece by piece, and very quickly (because I wanted "everything"). I hit hard limitations with tree-sitter, not only for taint resolution but for what I could check and do overall.

It starts with symbols: you get the basic starter kit, but it quickly becomes "this proves it exists" rather than "this is what it does." That meant taint couldn't work properly, because you want to track assignments, function-call arguments, etc., to see how the data actually flows. Same thing with the rules engine: without tracking object literals, XSS detection becomes very shallow, with tons of false positives, because tree-sitter won't tell you about property assignments or method calls.

And it keeps going like that for every aspect and thing I wanted to know and track. So, all in all, moving away from tree-sitter and taking on the "mountain" allowed me (after losing weeks of sanity, lol) to incrementally build out virtually anything I wanted to extract or check. It does sadly leave some money on the table for other languages; take Rust as an example: because it still uses tree-sitter, the taint engine there has no cross-module resolution or type checking. So that's why. :)
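The assignment-tracking point above can be shown in miniature with Python's own ast module (used here only as an illustration; the project's JS/TS path uses the TypeScript compiler). The snippet and its function names are invented: a syntax tree readily tells you which calls exist and what was assigned where, but deciding whether the *sanitized* value is the one reaching the sink requires the data-flow layer on top:

```python
import ast

# Collect "name = call(...)" assignments from a tiny (hypothetical)
# source snippet. Syntax alone proves sanitize() was called; it cannot
# tell us that render() receives the UNsanitized value.
src = """
raw = request_args_get("q")
clean = sanitize(raw)
render(raw)
"""
assigns = {}
for node in ast.walk(ast.parse(src)):
    if isinstance(node, ast.Assign) and isinstance(node.value, ast.Call):
        assigns[node.targets[0].id] = ast.unparse(node.value)
print(assigns)  # {'raw': "request_args_get('q')", 'clean': 'sanitize(raw)'}
```

Everything past this point (which name flows into which argument, through which frameworks) is exactly the semantic layer the commenter says tree-sitter couldn't provide.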

esafak•1mo ago
Lots of formal methods and verification submissions this week!
jbellis•1mo ago
Love to see people leveraging static analysis for AI agents. Similar to what we're doing in Brokk but we're more tightly coupled to our own harness. (https://brokk.ai/) Would love to compare notes; if you're interested, hmu at [username]@brokk.ai.

Quick comparison: Auditor does framework-specific stuff that Brokk does not, but Brokk is significantly faster (~1M loc per minute).

ThailandJohn•1mo ago
Would be really cool to compare notes :D Sent it from a "non-tech" company email so it doesn't get filtered, lol.

My speed really depends on language and what needs indexing. On pure Python projects I get around 220k loc/min, but for deeper data flow in Node apps (TypeScript compiler overhead + framework extraction) it's roughly 50k loc/min.

Curious what your stack is and what depth you're extracting to reach 1M/min - those are seriously impressive numbers! :D

butterisgood•1mo ago
Looks neat. Can't use it due to the license.
ThailandJohn•1mo ago
Why not?
dehugger•1mo ago
My understanding is that many organizations have an absolute ban on using anything AGPL, because it affects any other code that touches it and is considered too high a risk.
ThailandJohn•1mo ago
Not exactly how it works, but I do understand the concern. The other option was to "just give it away" by not having the AGPL and getting sherlocked... Now? You have to ask permission first; commercial licensing is a thing.
ThailandJohn•1mo ago
Happy to inform that I've just created my first pip package to make it a bit easier to install :D https://pypi.org/project/theauditor/