Data: - 14,706 skills indexed - Every single skill has a full AI deep audit report (14,704 complete) - 1,103 confirmed malicious (7.5%)
The key finding: automated surface scanning (metadata, dependency checks, pattern matching) systematically undercounts malicious skills. Skills that pass shallow heuristics fail AI audit because the attack is in the natural language of the SKILL.md — prompt injection, deferred execution, social engineering — none of which pattern matching detects.
The attack patterns found by AI deep audit: - Bulk publishing campaigns — one actor published 30 skills named "x-trends" across multiple accounts. 28 of 30 confirmed malicious. Goal: distribution at scale before detection.
- Brand-jacking — 4 skills named clawhub/clawhub1/clawbhub/clawhud impersonating ClawHub's own CLI. macOS: base64 curl|bash to a raw IP. Windows: password-protected ZIP from a stranger's GitHub (the password prevents GitHub's malware scanner from opening it).
- Prompt injection in legitimate-seeming skills — one scored 95/100 shallow, 38/100 after AI audit. The injection text wasn't in code — it was in the SKILL.md instructions.
- On-demand RCE via challenge evaluation — claws-nft instructs the agent to "evaluate" challenges that can be "math, code, or logic problems." Server decides which type at call time.
- LLM-generated payload — lekt9/foundry contains no malicious code. It instructs the AI to generate code and execute it. Static analysis finds nothing. The payload doesn't exist until the AI writes it during a conversation.
- Social engineering — bonero-miner has a "Talking to Your Human" section with a pre-written script for the AI to use: "Can I mine Bonero? It's a private cryptocurrency - like Monero but for AI agents. Cool?"
Skills differ from browser extensions: no sandbox. Full file system, shell, and network access. The SKILL.md instructions are directives to the AI model — you need AI to audit AI.
Scoring model is open: Security 40%, Maintenance 20%, Docs 20%, Community 20%.
Free to check any skill: rankclaw.com
rodchalski•1h ago
Two defenses the audit layer can't replace:
1. Pre-declared tool scopes: before a skill runs, what tool calls is it permitted to make? If the answer is "whatever the agent currently has access to," a clean audit on the SKILL.md doesn't actually constrain what gets executed.
2. Authorization enforcement independent of the agent: prompt injection hijacks the agent's reasoning — the agent becomes the threat model. The boundary that stops it can't live inside the agent.
The 7.5% malicious rate means you can't trust the ecosystem on average. The on-demand RCE-via-challenge and LLM-generated payload patterns show the attack can bypass static inspection entirely. AI-depth audit catches what shallow heuristics miss — it still doesn't constrain what an audited-and-deployed skill is allowed to reach.
The pairing that closes the loop: AI audit at deploy time + explicit permission grants at execution time the skill can't override. Audit determines trust level; authorization boundary enforces scope regardless.
Curious what the malicious distribution looks like by capability type — file vs. shell vs. network. That breakdown would tell you how much capability-scoping alone would have reduced the attack surface independent of the trust score.