frontpage.

We're shipping AI agents that process payments, query databases, and handle customer PII. Most of them can be tricked into bypassing their own safety policies in under 30 seconds. I built Khaos to prove it. It's an open-source chaos engineering framework that adversarially tests AI agents — prompt injection, tool misuse, data exfiltration, and infrastructure faults before they hit production.

The repo includes 6 intentionally vulnerable example agents (support bot, SQL agent, code executor, payment processor, API agent, document processor) with real attack scenarios showing exactly how they break. Try breaking them yourself.

Three commands to test your own agent:

- pip install khaos-agent - khaos discover - khaos run my-agent --pack security

It works with raw OpenAI/Anthropic, Gemini, LangGraph, CrewAI, AutoGen — any Python agent. Khaos auto-patches LLM calls to inject faults and log telemetry. No cloud needed, runs 100% locally.

Some of what it tests:

- Prompt injection (policy bypass, developer mode exploits) - Tool misuse (unauthorized DB writes, unscoped API calls) - Data exfiltration (PII extraction, credential leakage) - Fault injection (timeouts, rate limits, malformed tool responses)

We are the first platform that focuses on testing the Agent's environment, not just the model in the harness.

Plus 4 tutorials using the free Gemini API if you want to learn without spending anything. Repo: https://github.com/ExordexLabs/khaos-sdk Examples: https://github.com/ExordexLabs/khaos-examples BSD licensed. v1.0 just shipped — the attack library and framework adapters are growing. What agents are you most worried about breaking?

Yes, It's Fascism, but That Doesn't Mean We're Cooked

Glean Work AI Tells Worker to Ignore Fire Alarm

Do you wait for the AI while it works if you are a lawyer?

The Table

Show HN: Preview CoreML video models on any video feed

Trump pauses China tech bans ahead of Xi summit

Show HN: Wip – Monitor AI agent commits and local Git state from the CLI

Show HN: 8M algorithms in 56 KB – Rust/WASM library for JavaScript

Show HN: MicroGPT in 243 Lines – Demystifying the LLM Black Box

Richard Carrington's first portrait has been found

One Year of Work for Ten Seconds of Film [video]

Joseph Gordon-Levitt Gets Section 230 Completely Backwards

The Automated Soundboard for Streamers

Mechanisms and control of spin interactions in molecular-scale spintronics(2025)

Astronomers observe a star that quietly transformed into a black hole

Robust ways to extract bank statements from PDF to CSV beyond raw LLMs?

Ask HN: What makes an AI agent framework production-ready vs. a toy?

Everybody Is a CEO Now (and What Am I Doing Here?)

TiDB Cloud Zero – full-featured database with one line of curl

The Clash of Civilizationalisms

Show HN: Open-source MCP server that lets AI assistants shop via Google's UCP

Show HN: WebExplorer – a tool for preview file in browser

Electronic Structure: Electron Spin: Videos and Practice Problems

What Makes Oxygen Special?

Not all computer code protected as speech, US court finds in ghost gun case

Building a Modular Python Application with apywire and starlette

A Python terminal deep-space receiver

YouTube Launches on Apple Vision Pro

Why Couples Fight in the Kitchen (A Furniture Problem, Not a Marriage Problem)

Why have far-forward nominal Treasury rates increased so much in past few years?

Show HN: Khaos – Every AI agent I tested broke in under 30 seconds