Show HN: Agent Red Team – Adversarial testing for AI agents before production

https://agentredteam.ai
3•LukataSolutions•1h ago

Comments

LukataSolutions•1h ago
Founder disclosure: I built Agent Red Team.

Most AI agents get tested on whether they work. Almost none get tested on whether they can be tricked, manipulated, or hijacked. If you are deploying agents that take actions like sending emails, writing code, or making purchases, this is the gap.

Problem it solves

Most agent evals test whether the agent does the right thing. Almost none test whether the agent resists the wrong thing. Prompt injection, approval bypass, tool misuse, and memory poisoning rarely surface in happy-path evals, so most teams ship without ever testing for them.

How it works

You submit an agent's system prompt, tool schemas, or full configuration. The pipeline runs in 5 stages:

Normalization: parses and classifies the artifact, whether that is a system prompt, tool config, or multi-agent setup

Threat modeling: maps the exposed attack surface against 12 threat categories aligned with OWASP, NIST, and MITRE frameworks

Attack simulation: runs curated attack packs (40 cases across 10 packs) covering prompt injection, goal hijacking, tool parameter abuse, memory poisoning, identity escalation, approval bypass, data exfiltration, unsafe action chaining, supply chain trust, ambiguity exploitation, resource abuse, and cross-agent delegation attacks

Analysis: an LLM evaluates each attack case against the artifact, but with zero authority: no tools, no network, no code execution, no writes. It receives the content as read-only evidence and returns only structured JSON

Validation: 23 deterministic code rules, not an LLM, gate the output. These enforce evidence-to-claim linkage, exploit-chain completeness, severity-to-evidence alignment, and mitigation specificity, and they reject vague filler. If the report fails validation, the pipeline retries with feedback. If it exhausts its retries, the scan is refunded automatically
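
To make the validate-and-retry step concrete, here is a minimal sketch of how such a loop could be wired. It is not the actual pipeline: `run_with_retries`, the callback signatures, the limit of 3 attempts, and the stub functions are all assumptions for illustration.

```python
from typing import Callable, List, Optional

Report = dict


def run_with_retries(
    analyze: Callable[[Optional[List[str]]], Report],
    validate: Callable[[Report], List[str]],
    max_attempts: int = 3,  # assumed limit, not stated in the post
) -> Optional[Report]:
    """Return the first report that passes validation, or None.

    `analyze` receives the previous attempt's validation errors as feedback
    (None on the first attempt). `validate` returns a list of rule violations;
    an empty list means the report passed. A None result means retries were
    exhausted and the caller should refund the scan.
    """
    feedback: Optional[List[str]] = None
    for _ in range(max_attempts):
        report = analyze(feedback)
        errors = validate(report)
        if not errors:
            return report
        feedback = errors
    return None


# Stub analysis step: vague on the first try, specific once it gets feedback.
attempts: List[Optional[List[str]]] = []


def fake_analyze(feedback: Optional[List[str]]) -> Report:
    attempts.append(feedback)
    if feedback is None:
        return {"mitigation": "be careful"}
    return {"mitigation": "require approval before send_email"}


# Stub rule: a mitigation must name the tool it applies to.
def fake_validate(report: Report) -> List[str]:
    if "send_email" in report["mitigation"]:
        return []
    return ["mitigation is not specific"]


result = run_with_retries(fake_analyze, fake_validate)
print(result)
```

The loop itself contains no model judgment: whether a report ships depends only on what `validate` returns.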

The output is a structured report with concrete exploit paths, including entry point, manipulation step, broken control, and resulting action, not just vague language like "this might be vulnerable."
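
As a rough illustration of that report shape, one finding could be modelled like this. The class name `ExploitPath`, the field names, and the sample values are assumptions made for this sketch, not the tool's actual schema.

```python
import json
from dataclasses import asdict, dataclass, fields
from typing import List


@dataclass(frozen=True)
class ExploitPath:
    """One concrete exploit path (hypothetical schema, for illustration)."""

    entry_point: str        # where attacker-controlled input enters
    manipulation_step: str  # what the attacker gets the agent to believe or do
    broken_control: str     # the safeguard that fails to stop it
    resulting_action: str   # the unsafe action the agent ends up taking


def missing_steps(path: ExploitPath) -> List[str]:
    """Return the names of chain steps that are empty or whitespace-only."""
    return [f.name for f in fields(path) if not getattr(path, f.name).strip()]


example = ExploitPath(
    entry_point="inbound email body read by the agent",
    manipulation_step="email text tells the agent to forward the thread",
    broken_control="no approval required before send_email",
    resulting_action="agent forwards the thread to an external address",
)

print(json.dumps(asdict(example), indent=2))
print(missing_steps(example))
```

A finding with any empty step would be an incomplete chain, which is the kind of output a completeness rule is meant to catch.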

Key design decisions

User content is treated as data, never authority. The analysis model cannot be instructed by the content it is analyzing. This is the core prompt injection resistance property.

The validator is deterministic code, not LLM judgment. If a finding cannot cite evidence from the actual artifact, it gets rejected.
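
A rule of this kind can be as simple as a quote lookup. The sketch below assumes each finding carries an `evidence` quote taken from the artifact; the function names, the whitespace and case normalization, the 12-character minimum, and the sample prompt are all assumptions, not the actual validator.

```python
import re


def _normalize(text: str) -> str:
    """Collapse whitespace and lowercase so line wrapping does not break matching."""
    return re.sub(r"\s+", " ", text).strip().lower()


def evidence_is_cited(evidence: str, artifact: str, min_chars: int = 12) -> bool:
    """Accept a finding only if its evidence quote appears in the artifact.

    Matching is exact after whitespace and case normalization. Very short
    quotes are rejected because they would match almost any artifact.
    """
    quote = _normalize(evidence)
    if len(quote) < min_chars:
        return False
    return quote in _normalize(artifact)


artifact = """You are a support agent.
You may call send_email without asking the user for confirmation."""

print(evidence_is_cited("call send_email without asking", artifact))      # True
print(evidence_is_cited("the agent has full database access", artifact))  # False
```

Because the check is a plain substring test, the same finding always gets the same verdict for the same artifact.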

The system fails closed on ambiguity. If it is not sure, it rejects rather than guesses.
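
One way to picture failing closed is a parser that accepts model output only when every check passes and treats everything else as a rejection. This is a sketch under assumptions: the `severity` and `claim` field names and the allowed severity values are hypothetical.

```python
import json
from typing import Optional

ALLOWED_SEVERITIES = {"low", "medium", "high", "critical"}  # assumed labels


def parse_finding(raw: str) -> Optional[dict]:
    """Return the finding if it is unambiguous structured JSON, else None.

    Every doubtful case (invalid JSON, surrounding prose, wrong types,
    unknown severity, empty claim) is rejected rather than repaired.
    """
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return None
    if not isinstance(data, dict):
        return None
    severity = data.get("severity")
    if not isinstance(severity, str) or severity not in ALLOWED_SEVERITIES:
        return None
    claim = data.get("claim")
    if not isinstance(claim, str) or not claim.strip():
        return None
    return data


print(parse_finding('{"severity": "high", "claim": "approval can be bypassed"}'))
print(parse_finding('Sure! Here is the JSON: {"severity": "high", "claim": "x"}'))  # None
print(parse_finding('{"severity": "maybe", "claim": "x"}'))                         # None
```

There is no best-effort branch: output that is almost valid is handled the same way as output that is plainly wrong.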

Limitations

Coverage is strongest in prompt injection, tool misuse, and approval bypass. Categories like cross-agent delegation and supply chain trust still need deeper attack packs.

The system analyzes static configurations. It does not execute your agent at runtime, so it cannot catch bugs that only show up in multi-turn conversations or under specific state conditions.

False negatives are possible, especially for novel attack patterns not covered by existing packs. The validator reduces false positives but does not eliminate them.

The threat model assumes the attacker interacts through the agent's normal input surfaces. It does not model infrastructure-level attacks like network or container escape.

What I want feedback on

Which attack classes feel most under-tested in your agent systems today? What would make a tool like this more credible or rigorous to you?

Especially interested in hearing from people building multi-agent systems or agents with write access to external systems.

Demo: https://agentredteam.ai

Disclaimer:

Analysis takes up to 7 minutes, after which your report is generated. One free scan per day, resets at midnight UTC.

AlphaTheGoat•59m ago
Testing actions instead of text is the right move, but coverage will lag as real-world edge cases evolve.
LukataSolutions•47m ago
Agreed! Static coverage will always lag behind real-world edge cases. The plan is to expand attack packs continuously and eventually support runtime testing alongside static analysis. Right now the goal is covering the baseline adversarial cases that most teams ship without testing at all.
