Average compliance score: 2.2 out of 6 articles

- 97% of files fail Article 9 (Risk Management)
- 89% fail Article 12 (Record-Keeping)
- 84% fail Article 14 (Human Oversight)
- Only 23 out of 5,754 files (0.4%) pass all 6 checks

Best-scoring repo: AutoGPT at 2.9/6. Worst: CrewAI examples at 1.4/6.
What the scanner checks (per article):
- Art. 9: risk classification, access control, risk audit
- Art. 10: input validation, PII handling, data schemas, provenance
- Art. 11: logging, documentation, type hints
- Art. 12: structured logging, audit trail, timestamps, log integrity
- Art. 14: human review, override mechanism, notifications
- Art. 15: input sanitization, error handling, testing, rate limiting
An article "passes" if at least one sub-check is detected. This is deliberately generous; real compliance requires substantially more.

Caveats I'll save you the trouble of pointing out:
- This is static analysis. It can't verify runtime behavior.
- File-level scanning misses cross-file compliance patterns.
- The pass threshold is intentionally lenient (1-of-N sub-checks).
- This checks technical requirements, not legal compliance. It's a linter, not a lawyer.
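To make the 1-of-N pass rule concrete, here is a minimal sketch of how a pattern-based check like this could work. The article names come from the list above, but the sub-check names and regexes are illustrative assumptions, not the scanner's actual rules:

```python
import re

# Illustrative sub-checks per article; the real scanner's patterns differ.
ARTICLE_CHECKS = {
    "Art. 12": {
        "structured_logging": re.compile(r"\bimport (logging|structlog)\b"),
        "timestamps": re.compile(r"\b(datetime\.now|time\.time|utcnow)\b"),
        "audit_trail": re.compile(r"audit", re.IGNORECASE),
    },
    "Art. 14": {
        "human_review": re.compile(r"human[_ ]?(review|in[_ ]the[_ ]loop)", re.IGNORECASE),
        "override": re.compile(r"\boverride\b", re.IGNORECASE),
    },
}

def scan_file(source: str) -> dict:
    """An article 'passes' if at least one of its sub-checks matches."""
    return {
        article: any(p.search(source) for p in checks.values())
        for article, checks in ARTICLE_CHECKS.items()
    }

code = "import logging\nlog = logging.getLogger('audit')\n"
print(scan_file(code))  # Art. 12 passes, Art. 14 fails
```

This is also why the lenient threshold matters: one incidental `logging` import is enough to flip an article to "pass", so the real-world failure rates above are, if anything, understated.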
The EU AI Act enforcement deadline is August 2026. The full report, raw data (JSON), and the scanning scripts are all in the repo.
GitHub: https://github.com/air-blackbox/air-blackbox-mcp
Full report: https://github.com/air-blackbox/air-blackbox-mcp/blob/main/b...
Install: pip install air-blackbox-mcp
Demo: https://huggingface.co/spaces/airblackbox/air-blackbox-scann...
Happy to answer questions about the methodology, the scanner internals, or what we're building next (fine-tuned local LLM for deeper analysis — your code never leaves your machine).
airblackbox•2h ago
Article 11 (Technical Documentation) is the easy win: 98% of files pass because Python developers already write docstrings and type hints. The rest of the articles require intentional infrastructure that almost nobody adds. The gap isn't capability, it's awareness.

LiteLLM's auth module scored 6/6. It already has access control, structured logging, timestamps, and error handling. It wasn't built for EU AI Act compliance; it just happens to have good engineering practices. Most agent code doesn't.

"Example" and "quickstart" code sets the pattern. When OpenAI and CrewAI ship examples with zero compliance patterns, every project built from those examples inherits the gap. The ecosystem needs compliance baked into the templates, not bolted on after.
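As a rough illustration of why Article 11 passes so often: even a naive check for docstrings and type hints (a hypothetical sketch, not the scanner's implementation) lights up on perfectly ordinary Python:

```python
import ast

def has_docs_and_types(source: str) -> bool:
    """Naive Art. 11-style check: any function with a docstring or annotations."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            if ast.get_docstring(node) or node.returns or any(
                arg.annotation for arg in node.args.args
            ):
                return True
    return False

sample = '''
def greet(name: str) -> str:
    """Return a greeting."""
    return f"Hello, {name}"
'''
print(has_docs_and_types(sample))  # True: docstring and type hints present
```

Any codebase with normal documentation habits clears a bar like this, which is exactly why Article 11 is the outlier in the results.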
What I'm working on next: a fine-tuned Llama 3.2 1B model that runs locally and does deeper semantic analysis beyond regex pattern matching. The goal is "your code never leaves your machine," because if you're worried about compliance, shipping your source code to a cloud API defeats the purpose.

The scanner, the benchmark data, and the full 5,754-file report are all Apache 2.0. Rip it apart, tell me what's wrong, submit PRs.