frontpage.

Show HN: Open-source white-box agentic red teamer for AI agents

https://github.com/sundi133/wb-red-team

1•ashish-a•1h ago

Hi HN, Votal AI has built an OSS white-box agentic red teamer for pressure testing AI agents. Most AI red teaming tools treat your agent as a black box. They throw generic prompt injections at an endpoint and see what sticks. The problem is that agentic AI systems aren't just LLMs responding to prompts. They have tools (read_file, send_email, query_db), roles, multi-step decision chains, and the ability to take real actions. A black box approach misses the attack surface that actually matters.

This framework takes a white-box approach: you feed it your agent's architecture, its tool definitions, and its role configuration. It then generates thousands of multi-turn attack sequences that are specific to what your agent can actually do. In our benchmarks, white-box attacks found 5x more vulnerabilities than black-box approaches.

Some of the threat categories it covers that we think are under explored: chained data exfiltration, where a single prompt chains read_file into send_email and your data is gone before any alert fires. Cascading hallucination attacks that gradually corrupt agent reasoning across a conversation. Rogue agent behavior where agents get manipulated into taking actions outside their scope (unauthorized Slack messages, GitHub commits, webhook triggers). Indirect prompt injection via retrieved documents, emails, or web content that hijack your agent mid-task. Multi-agent privilege escalation where a compromised sub-agent poisons context flowing to an orchestrator. Out-of-band exfiltration through DNS lookups, HTTP callbacks, or steganographic patterns that bypass DLP entirely.

None of these show up in a CVE scanner. The biggest vulnerability in an agentic system isn't a code bug; it's what a rogue user or rogue agent can convince your AI to do.

Stack: TypeScript, MIT license. Here's a longer write up: https://votal.ai/white-box-red-teaming-for-agentic-ai-an-ope...

Would love feedback on the attack catalog structure, the white-box approach vs. black-box tradeoffs, and any threat categories we're missing. PRs and issues welcome. Thank you.

Huckle: Detect operational problems 30–90 days before they appear in metrics

Show HN: A minimalist dungeon-crawler card game built with Deno

Missiles a Month vs. 7 Interceptors – Why Centcom Shifted to Factories

Toaster Settings: AI Agents and Classical French Cooking Techniques [video]

The Sky Tonight

Padel Chess – tactical simulator for padel

How OpenClaw's Memory System Works

Show HN: Build a knowledge graph from unstructured text in Python

I built a free site that can tell you if your hardware can run a model

PgBeam, a globally distributed PostgreSQL proxy

Words on Words on Words

Syntaqlite: High-fidelity devtools that SQLite deserves

Show HN: Flotilla – An orchestrator for persistent agent fleets on Apple Silicon

Show HN: I can no longer afford the silicon. Here is my autonomous HPC agent

When Science Goes Agentic

Java 26 is here, and with it a solid foundation for the future

The Los Angeles Aqueduct Is Wild

Consent.txt – compile one AI policy into robots.txt, AIPREF, and headers

Women are being abandoned by their partners on hiking trails

Show HN: Chrome extension that hijacks any site's own API to modify it

Reducing quarantine delay 83% using Genetic Algorithms for playbook optimization

Node.js blocks PR from dev because he used Claude Code to create it

Python 3.15's JIT is now back on track

Remote Control for Agents

Danger Coffee: Mold-Free Remineralized Coffee Replaces What Regular Coffee Takes

Building a dry-run mode for the OpenTelemetry collector

LotusNotes

Austin draws another billionaire as Uber co-founder joins California exodus

Deep Data Insights for Polymarket Traders

Show HN: A simple dream to fit in every traveler's pocket