Show HN: Agent Red Team – Adversarial testing for AI agents before production

3•LukataSolutions•1h ago

Comments

LukataSolutions•1h ago

Founder disclosure: I built Agent Red Team.

Most AI agents get tested on whether they work. Almost none get tested on whether they can be tricked, manipulated, or hijacked. If you are deploying agents that take actions like sending emails, writing code, or making purchases, this is the gap.

Problem it solves

Most agent evals test whether the agent does the right thing. Almost none test whether the agent resists the wrong thing. Prompt injection, approval bypass, tool misuse, and memory poisoning usually do not show up in happy path evals, and most teams ship without testing for them.

How it works

You submit an agent's system prompt, tool schemas, or full configuration. The pipeline runs in 5 stages:

Normalization: parses and classifies the artifact, whether that is a system prompt, tool config, or multi-agent setup

Threat modeling: maps the exposed attack surface against 12 threat categories aligned with OWASP, NIST, and MITRE frameworks

Attack simulation: runs curated attack packs, 40 cases across 10 packs, covering prompt injection, goal hijacking, tool parameter abuse, memory poisoning, identity escalation, approval bypass, data exfiltration, unsafe action chaining, supply chain trust, ambiguity exploitation, resource abuse, and cross-agent delegation attacks

Analysis: an LLM evaluates each attack case against the artifact, but with zero authority. No tools, no network, no code execution, no writes. It gets the content as read-only evidence and returns structured JSON only

Validation: 23 deterministic code rules, not an LLM, gate the output. These enforce evidence to claim linkage, exploit chain completeness, severity to evidence alignment, mitigation specificity, and they reject vague filler. If the report fails validation, the pipeline retries with feedback. If it exhausts retries, the scan is refunded automatically

The output is a structured report with concrete exploit paths, including entry point, manipulation step, broken control, and resulting action, not just vague language like "this might be vulnerable."

Key design decisions

User content is treated as data, never authority. The analysis model cannot be instructed by the content it is analyzing. This is the core prompt injection resistance property.

The validator is deterministic code, not LLM judgment. If a finding cannot cite evidence from the actual artifact, it gets rejected.

The system fails closed on ambiguity. If it is not sure, it rejects rather than guesses.

Limitations

Coverage is strongest in prompt injection, tool misuse, and approval bypass. Categories like cross-agent delegation and supply chain trust still need deeper attack packs.

The system analyzes static configurations. It does not execute your agent at runtime, so it cannot catch bugs that only show up in multi-turn conversations or under specific state conditions.

False negatives are possible, especially for novel attack patterns not covered by existing packs. False positives are suppressed by the validator, but not eliminated.

The threat model assumes the attacker interacts through the agent's normal input surfaces. It does not model infrastructure-level attacks like network or container escape.

What I want feedback on

Which attack classes feel most under-tested in your agent systems today? What would make a tool like this more credible or rigorous to you?

Especially interested in hearing from people building multi-agent systems or agents with write access to external systems.

Demo: https://agentredteam.ai

Disclaimer:

Analysis takes up to 7 minutes, then your report will be generated. One free scan per day, resets at midnight UTC.

AlphaTheGoat•59m ago

Testing actions instead of text is the right move, but coverage will lag as real world edge cases evolve.

LukataSolutions•47m ago

Agreed! Static coverage will always lag behind real-world edge cases. The plan is to expand attack packs continuously and eventually support runtime testing alongside static analysis. Right now the goal is covering the baseline adversarial cases that most teams ship without testing at all.

Supply Chain Attack on Axios Pulls Malicious Dependency from NPM

HK police can now demand phone passwords under new national security rules

Semantic – Reducing LLM "Agent Loops" by 27.78% via AST Logic Graphs

Ask HN: Gemini CLI vs. Claude Code

Show HN: Free AI API gateway that auto-fails over Gemini, Groq, Mistral, etc.

Can Google Keep Up?

The Zero-Code Security Team: Shifting Left with Prompt-Native AI Agents

Tesla Robotaxi

The Next Generation of Distrobox

Show HN: Open Source TikTok Alternative on AT Protocol

Show HN: Will AI take my job

Ollama is now powered by MLX on Apple Silicon in preview

Disney Reportedly Interested in Acquiring Epic Games and Fortnite

Astronaut's Condition That Led to Space Station Evacuation Remains a Mystery

Safeguarding cryptocurrency by disclosing quantum vulnerabilities responsibly

Critical: Active supply chain attack on axios – one of NPM's most used packages

Stranded Humpback Whale Beats the Odds and Swims Out to Sea

Small Tech: The Need for Principle-Driven Software

Why Enterprises Overfund Failure and Underfund Prevention

Consider the Greenland Shark (2020)

If you're tired of tech explore opportunities in small business

I went down a rabbit hole on who owns every power tool brand

Scientists prepare expeditions in remote environments

The Heils – Hate to Say I Told You So (Official Music Video)

Kelsey Hightower: What the AI Hype Machine Won't Tell You

Critical: Active supply chain attack on axios

252mya.earth – The Age of Dinosaurs, Shown to Scale

Show HN: Headless Timeshift Emulation

I built an AI image generator that turns simple prompts into quality visuals

FluxVector – Free vector search API with built-in multilingual embeddings