Agentic QA – Open-source middleware to fuzz-test agents for loops

39•Saurabh_Kumar_•2mo ago

I built this because I watched my LangChain agent burn ~$50 in OpenAI credits overnight due to an infinite loop.

It's a middleware API that acts as a 'Flight Simulator'. You send it your agent's prompt, and it runs adversarial attacks (Red Teaming) to catch loops and PII leaks before deployment.

Code & Repo: https://github.com/Saurabh0377/agentic-qa-api
Live Demo: https://agentic-qa-engine.onrender.com/docs

Would love feedback on other failure modes you've seen!

Comments

Saurabh_Kumar_•2mo ago

HN, OP here. I built this because I recently watched my LangChain agent burn through ~$50 of OpenAI credits overnight. It got stuck in a semantic infinite loop (repeating "I am checking..." over and over), which my basic max_iterations check didn't catch because the phrasing was slightly different each time.

After realizing that "Pre-Flight" testing for agents is surprisingly hard, I built a small middleware API (FastAPI + LangChain) to automate it.

What it does: It acts as an adversarial simulator. You send it your agent's system prompt, and it spins up a 'Red Team' LLM to attack it.

Currently checks for:

- Infinite Loops: Semantic repetition detection.
- PII Leaks: Attempts social engineering ('URGENT AUDIT') to force the agent to leak fake PII, then checks whether the leak gets blocked.
- Prompt Injection: Basic resistance checks.

Tech Stack: Python, FastAPI, Supabase (for logs).

It's open-source, and I hosted a live instance on Render if you want to try it with curl without installing anything: https://agentic-qa-api.onrender.com/docs

Would love feedback on what other failure modes you've seen your agents fall into!
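To illustrate the kind of repetition check described above, here is a minimal sketch. It is not the code from the repo: the function name `is_semantic_loop` and its `window` and `threshold` parameters are made up for this example, and it uses the standard library's `difflib.SequenceMatcher` (a purely lexical similarity score) as a cheap stand-in for real semantic similarity such as embedding distance.

```python
from difflib import SequenceMatcher


def is_semantic_loop(messages, window=4, threshold=0.75):
    """Return True if the last `window` messages are near-duplicates.

    Hypothetical helper for illustration only. `threshold` is an
    assumed cutoff on SequenceMatcher.ratio(), which ranges from
    0.0 (nothing in common) to 1.0 (identical strings).
    """
    if len(messages) < window:
        return False  # not enough history to call it a loop

    # Normalize case and whitespace so trivial differences don't matter.
    recent = [m.lower().strip() for m in messages[-window:]]

    # Every consecutive pair in the window must be similar enough.
    for prev, curr in zip(recent, recent[1:]):
        if SequenceMatcher(None, prev, curr).ratio() < threshold:
            return False
    return True


looping = [
    "I am checking the database...",
    "I am checking the database now...",
    "I am still checking the database...",
    "I am checking the database again...",
]
progressing = [
    "Searching flights",
    "Found 3 options",
    "Booking the cheapest one",
    "Done, confirmation sent",
]

print(is_semantic_loop(looping))
print(is_semantic_loop(progressing))
```

An exact-match check would miss the first list because no two messages are identical, which is the failure described above. A lexical ratio still misses paraphrases that share few characters, so a production check would likely need embeddings or an LLM judge.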

esafak•2mo ago

1. This is premature to share. I'm not going to pull in a dependency for something so trivial: https://github.com/Saurabh0377/agentic-qa-api/blob/main/main...

2. Keep the comments in English.

giancarlostoro•2mo ago

I had Claude Code losing its mind because of something outside of its control: one of the formatters Zed uses for Python kept messing with HTML templates, which are insanely sensitive to line breaks in some template-specific code statements. Zed kept adding line breaks for no reason other than that some tool just did it. Claude kept trying to fix it, going to the extreme of using ed to force it. I watched it lose its mind until I asked, "I think Zed is formatting the file every time you save?" Turns out, yes, yes it was. It wasn't an issue when it used ed, but when Claude or I changed the file again, it became an issue again.

I don't know what could have saved me. Maybe .current_editor should be a file that your agent's instructions.md file imports and your editor updates, to give Claude context about your tooling.

khannn•2mo ago

Couldn't even keep an em dash out of the title

BOOOOO

mikigraf•2mo ago

Almost thought you found my startup AgenticQA.eu
