In January 2026, 1,200 malicious skills infiltrated the OpenClaw agent marketplace (ClawHavoc campaign). A month later, researchers catalogued 6,487 malicious agent tools that VirusTotal cannot detect. The first agent-software RCE was assigned CVE-2026-25253.
The response: a dozen heuristic scanning tools (pattern matching, LLM-as-judge, YARA rules). They all carry the same caveat: "no findings does not mean no risk."
SkillFortify takes a different approach. Instead of checking for known bad patterns, it formally verifies what a skill CAN do against what it CLAIMS to do. Five mathematical theorems guarantee soundness -- if SkillFortify says a skill is safe, it provably cannot exceed its declared capabilities.
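To make the verification idea concrete, here is a deliberately tiny sketch (my own illustration, not SkillFortify's actual formal model): capabilities are modeled as a set, and a skill is approved only if every capability it can exercise is covered by its declaration.

```python
# Illustrative sketch only -- the real tool uses a formal model with
# soundness theorems; this just shows the "CAN do vs CLAIMS to do" check
# as a subset relation over capability labels.

def is_safe(declared: set[str], inferred: set[str]) -> bool:
    """Approve the skill only if its inferred capabilities
    are a subset of its declared capabilities."""
    return inferred <= declared

declared = {"fs:read", "net:fetch"}          # what the skill claims
print(is_safe(declared, {"fs:read"}))                 # True: within claims
print(is_safe(declared, {"fs:read", "shell:exec"}))   # False: exceeds claims
```

The soundness property in this toy form: the check never approves a skill whose inferred capability set exceeds its declaration, regardless of how that skill is obfuscated, since approval depends only on the capability sets.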
What it does:

- skillfortify scan . -- discover and analyze all skills in a project
- skillfortify verify skill.md -- formally verify a skill against its capability declaration
- skillfortify lock -- generate skill-lock.json for reproducible configurations
- skillfortify trust skill.md -- compute a trust score (provenance + behavior)
- skillfortify sbom -- emit a CycloneDX 1.6 Agent Skill Bill of Materials
Supports Claude Code skills, MCP servers, and OpenClaw manifests.
Evaluated on 540 skills (270 malicious, 270 benign): F1=96.95%, zero false positives.
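A quick sanity check of those numbers (assuming F1 is computed the standard way): zero false positives means precision = 1.0, so F1 = 2R/(1+R) and recall can be recovered as R = F1/(2 - F1).

```python
# Back out recall from the reported F1, given zero false positives
# (precision = 1.0). Assumes the standard F1 definition.
f1 = 0.9695
recall = f1 / (2 - f1)
print(round(recall, 4))  # 0.9408 -- roughly 254 of the 270 malicious skills caught
```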
Paper: [ZENODO_DOI_URL]
Install: pip install skillfortify
Code: https://github.com/varun369/skillfortify
Built as part of the AgentAssert research suite. Happy to answer questions about the formal model, threat landscape, or benchmark methodology.