Average compliance score: 2.2 out of 6 articles

- 97% of files fail Article 9 (Risk Management)
- 89% fail Article 12 (Record-Keeping)
- 84% fail Article 14 (Human Oversight)
- Only 23 out of 5,754 files (0.4%) pass all 6 checks

Best-scoring repo: AutoGPT at 2.9/6. Worst: CrewAI examples at 1.4/6.
What the scanner checks (per article):
- Art. 9: risk classification, access control, risk audit
- Art. 10: input validation, PII handling, data schemas, provenance
- Art. 11: logging, documentation, type hints
- Art. 12: structured logging, audit trail, timestamps, log integrity
- Art. 14: human review, override mechanism, notifications
- Art. 15: input sanitization, error handling, testing, rate limiting
An article "passes" if at least one sub-check is detected. This is deliberately generous; real compliance requires substantially more.

Caveats I'll save you the trouble of pointing out:
- This is static analysis. It can't verify runtime behavior.
- File-level scanning misses cross-file compliance patterns.
- The pass threshold is intentionally lenient (1-of-N sub-checks).
- This checks technical requirements, not legal compliance. It's a linter, not a lawyer.
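To make the 1-of-N pass rule concrete, here is a minimal sketch of how a pattern-based check like this could work. The article names come from the list above, but the sub-check names and regexes are illustrative assumptions, not the scanner's actual rules:

```python
import re

# Illustrative sub-checks per article; the real scanner's patterns differ.
ARTICLE_CHECKS = {
    "Art. 12": {
        "structured_logging": re.compile(r"\bimport (logging|structlog)\b"),
        "timestamps": re.compile(r"\b(datetime\.now|time\.time|utcnow)\b"),
        "audit_trail": re.compile(r"audit", re.IGNORECASE),
    },
    "Art. 14": {
        "human_review": re.compile(r"human[_ ]?(review|in[_ ]the[_ ]loop)", re.IGNORECASE),
        "override": re.compile(r"\boverride\b", re.IGNORECASE),
    },
}

def scan_file(source: str) -> dict:
    """An article 'passes' if at least one of its sub-checks matches."""
    return {
        article: any(p.search(source) for p in checks.values())
        for article, checks in ARTICLE_CHECKS.items()
    }

code = "import logging\nlog = logging.getLogger('audit')\n"
print(scan_file(code))  # Art. 12 passes, Art. 14 fails
```

This is also why the lenient threshold matters: one incidental `logging` import is enough to flip an article to "pass", so the real-world failure rates above are, if anything, understated.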
The EU AI Act enforcement deadline is August 2026. The full report, raw data (JSON), and the scanning scripts are all in the repo.
GitHub: https://github.com/air-blackbox/air-blackbox-mcp
Full report: https://github.com/air-blackbox/air-blackbox-mcp/blob/main/b...
Install: pip install air-blackbox-mcp
Demo: https://huggingface.co/spaces/airblackbox/air-blackbox-scann...
Happy to answer questions about the methodology, the scanner internals, or what we're building next (fine-tuned local LLM for deeper analysis — your code never leaves your machine).
airblackbox•2h ago
Article 11 (Technical Documentation) is the easy win: 98% of files pass because Python developers already write docstrings and type hints. The rest of the articles require intentional infrastructure that almost nobody adds. The gap isn't capability, it's awareness.

LiteLLM's auth module scored 6/6. It already has access control, structured logging, timestamps, and error handling. It wasn't built for EU AI Act compliance; it just happens to have good engineering practices. Most agent code doesn't.

"Example" and "quickstart" code sets the pattern. When OpenAI and CrewAI ship examples with zero compliance patterns, every project built from those examples inherits the gap. The ecosystem needs compliance baked into the templates, not bolted on after.
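As a rough illustration of why Article 11 passes so often: even a naive check for docstrings and type hints (a hypothetical sketch, not the scanner's implementation) lights up on perfectly ordinary Python:

```python
import ast

def has_docs_and_types(source: str) -> bool:
    """Naive Art. 11-style check: any function with a docstring or annotations."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            if ast.get_docstring(node) or node.returns or any(
                arg.annotation for arg in node.args.args
            ):
                return True
    return False

sample = '''
def greet(name: str) -> str:
    """Return a greeting."""
    return f"Hello, {name}"
'''
print(has_docs_and_types(sample))  # True: docstring and type hints present
```

Any codebase with normal documentation habits clears a bar like this, which is exactly why Article 11 is the outlier in the results.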
What I'm working on next: a fine-tuned Llama 3.2 1B model that runs locally and does deeper semantic analysis beyond regex pattern matching. The goal is "your code never leaves your machine," because if you're worried about compliance, shipping your source code to a cloud API defeats the purpose.

The scanner, the benchmark data, and the full 5,754-file report are all Apache 2.0. Rip it apart, tell me what's wrong, submit PRs.