Why I built this: I needed a way to verify AI-generated code was production-safe. Existing tools either required cloud uploads (privacy concern) or produced output too large for AI context windows. TheAuditor solves both problems - it runs completely offline and chunks findings into 65KB segments that fit in Claude/GPT-4 context limits.
What I discovered: Testing on real projects, TheAuditor consistently finds 50-200+ vulnerabilities in AI-generated code. The patterns are remarkably consistent:
- SQL queries using f-strings instead of parameterization
- Hardcoded secrets (JWT_SECRET = "secret" appears in nearly every project)
- Missing authentication on critical endpoints
- Rate limiting using in-memory storage that resets on restart
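To make the first pattern concrete, here's a minimal illustration (the table, values, and attack string are made up for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE users (id INTEGER, name TEXT)")
cur.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])

user_id = "1 OR 1=1"  # attacker-controlled value

# The pattern that shows up constantly: user input interpolated straight into SQL.
print(cur.execute(f"SELECT * FROM users WHERE id = {user_id}").fetchall())    # both rows leak

# The parameterized version the audit expects instead: the driver treats the
# input as a literal value, not as SQL.
print(cur.execute("SELECT * FROM users WHERE id = ?", (user_id,)).fetchall()) # []
```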
Technical approach: TheAuditor runs 14 analysis phases in parallel, including taint analysis (tracking data from user input to dangerous sinks), pattern matching against 100+ security rules, and orchestrating industry tools (ESLint, Ruff, MyPy, Bandit). Everything outputs to structured JSON optimized for LLM consumption.
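To give a flavor of what one pattern rule looks like, here's a heavily simplified sketch (not the actual implementation) of a single check: flag an f-string passed straight to an execute()-style call, and emit a finding record like the ones in the JSON reports.

```python
import ast

SQL_SINKS = {"execute", "executemany"}  # illustrative sink names

def find_fstring_sql(source: str, filename: str = "<input>") -> list[dict]:
    """Flag f-strings passed directly to a SQL execute-style call."""
    findings = []
    for node in ast.walk(ast.parse(source, filename)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr in SQL_SINKS
            and node.args
            and isinstance(node.args[0], ast.JoinedStr)  # f-string literal
        ):
            findings.append({
                "rule": "sql-fstring",
                "file": filename,
                "line": node.lineno,
                "severity": "critical",
            })
    return findings

print(find_fstring_sql('cur.execute(f"SELECT * FROM t WHERE id = {uid}")'))
```

The real tool does far more than this (full taint tracking from sources to sinks, 100+ rules, cross-tool correlation), but this is the general shape of one rule and one finding record.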
Interesting obstacle: When scanning files with vulnerabilities, antivirus software often quarantines our reports because they contain "malicious" SQL injection patterns - even though we're just documenting them. Had to implement pattern defanging to reduce false positives.
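For anyone curious, defanging is essentially just breaking up the substrings scanners match on while keeping the report readable. A rough sketch of the idea (not the exact substitutions the tool uses):

```python
# Break up the substrings AV engines match on, without losing readability.
DEFANG_MAP = {
    "UNION SELECT": "UNION[_]SELECT",
    "OR 1=1": "OR[_]1=1",
    "<script>": "<scr[.]ipt>",
}

def defang(snippet: str) -> str:
    for needle, safe in DEFANG_MAP.items():
        snippet = snippet.replace(needle, safe)
    return snippet

print(defang("payload: ' UNION SELECT password FROM users --"))
```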
Current usage: Run aud full in any Python/JS/TS project. It generates a complete security audit in .pf/readthis/. The AI can then read these reports and fix its own vulnerabilities. I've seen projects go from 185 critical issues to zero in 3-4 iterations.
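If you want to script the read-back yourself instead of pointing the assistant at the folder, the reports are just files in .pf/readthis/, each sized to fit a context window. A minimal sketch (the file naming beyond the directory is an assumption here, and the "pass to assistant" step is whatever client you use):

```python
from pathlib import Path

# Where `aud full` writes its chunked reports.
REPORT_DIR = Path(".pf/readthis")

def load_report_chunks() -> list[str]:
    """Read each report chunk; each is sized to fit an LLM context window."""
    return [
        path.read_text(encoding="utf-8", errors="replace")
        for path in sorted(REPORT_DIR.glob("*"))
        if path.is_file()
    ]

for i, chunk in enumerate(load_report_chunks(), start=1):
    print(f"--- chunk {i}: {len(chunk):,} characters ---")
    # pass `chunk` to whichever AI assistant is doing the fixing
```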
The tool is particularly useful if you're using AI assistants for production code but worry about security. It provides the "ground truth" that AI needs to self-correct.
Would appreciate feedback on:
- Additional vulnerability patterns common in AI-generated code
- Better ways to handle the antivirus false-positive issue
- Integration ideas for different AI coding workflows
Thanks for taking a look! /TheAuditorTool
quibono•2h ago
That's a strange ask in the Python ecosystem - what's the reason for this?
Also, what's the benefit of ESLint/Ruff/MyPy being utilised by this audit tool? I'm not sure I understand the benefit of having an LLM in between you and Ruff, for example.
ffsm8•2h ago
It's breathtaking how much of an enabler it already is, but curating a good dependency tree and staying within the scope of the outlined work are not things LLMs are currently good at.
TheAuditorTool•2h ago
The ESLint/Ruff/MyPy integration isn't about putting an LLM between you and linters. It's about aggregation and correlation. Example:
- Ruff says "unused import"
- MyPy says "type mismatch"
- TheAuditor correlates: "You removed the import but forgot to update 3 type hints that depended on it"
The LLM reads the aggregated report to understand the full picture across all tools, not just individual warnings.
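A toy illustration of what "correlation" means here, with made-up finding records (the real report schema differs):

```python
from collections import defaultdict

# Made-up finding records, roughly the shape individual tools report things in.
findings = [
    {"tool": "ruff", "file": "api/users.py", "symbol": "UserSchema",
     "msg": "F401 'UserSchema' imported but unused"},
    {"tool": "mypy", "file": "api/users.py", "symbol": "UserSchema",
     "msg": "Incompatible type: expected 'UserSchema', got 'dict'"},
]

# Correlation step: group findings that touch the same file and symbol, so the
# report tells one story ("this import and the type hints that used it are out
# of sync") instead of two unrelated warnings from two different tools.
grouped = defaultdict(list)
for f in findings:
    grouped[(f["file"], f["symbol"])].append(f)

for (path, symbol), items in grouped.items():
    if len({f["tool"] for f in items}) > 1:
        print(f"{path} :: {symbol} flagged by multiple tools:")
        for f in items:
            print(f"  [{f['tool']}] {f['msg']}")
```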
@ffsm8: You're absolutely right - I can't code and the dependency tree is probably a mess! That's exactly WHY I built this. When you're using AI to write code and can't verify if it's correct, you need something that reports the ground truth.
The irony isn't lost on me: I used Claude to build a tool that audits code written by Claude. It's enablement all the way down! But that's also the proof it works - if someone who can't code can use AI + TheAuditor to build TheAuditor itself, the development loop is validated.
The architectural decisions might be weird, but they're born from necessity, not incompetence. Happy to explain any specific weirdness!