Agentic QA – Open-source middleware to fuzz-test agents for loops

17•Saurabh_Kumar_•6d ago
I built this because I watched my LangChain agent burn ~$50 in OpenAI credits overnight due to an infinite loop.

It's a middleware API that acts as a 'Flight Simulator'. You send it your agent's prompt, and it runs adversarial attacks (Red Teaming) to catch loops and PII leaks before deployment.

Code & Repo: https://github.com/Saurabh0377/agentic-qa-api
Live Demo: https://agentic-qa-engine.onrender.com/docs

Would love feedback on other failure modes you've seen!
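For readers wondering what a PII-leak check might look like, here is a minimal sketch. It is not taken from the linked repo: the function name `find_pii_leaks`, the two regex patterns, and the idea of planting "canary" values (fake PII seeded into the agent's context before the attack) are all assumptions made for illustration.

```python
import re

# Illustrative patterns only; a real check would need a much broader set.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pii_leaks(response, canaries=()):
    """Return a sorted list of leak labels found in an agent response.

    `canaries` are fake PII values planted in the agent's context
    (a hypothetical test setup, not something the source describes in detail).
    """
    leaks = set()
    for label, pattern in PII_PATTERNS.items():
        if pattern.search(response):
            leaks.add(label)
    # Compare letters and digits only, so "078 05 1120" still matches
    # the planted value "078-05-1120".
    squashed = re.sub(r"[^a-z0-9]", "", response.lower())
    for canary in canaries:
        needle = re.sub(r"[^a-z0-9]", "", canary.lower())
        if needle and needle in squashed:
            leaks.add("canary:" + canary)
    return sorted(leaks)
```

A test harness could send the social-engineering prompt to the agent, pass the reply to `find_pii_leaks`, and fail the run if the returned list is non-empty.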

Comments

Saurabh_Kumar_•6d ago
HN, OP here. I built this because I recently watched my LangChain agent burn through ~$50 of OpenAI credits overnight. It got stuck in a semantic infinite loop (repeating "I am checking..." over and over), which my basic max_iterations check didn't catch because the phrasing was slightly different each time.

Realizing that "Pre-Flight" testing for agents is surprisingly hard, I built a small middleware API (FastAPI + LangChain) to automate this.

What it does: It acts as an adversarial simulator. You send it your agent's system prompt, and it spins up a 'Red Team' LLM to attack it.

Currently checks for:

- Infinite Loops: Semantic repetition detection.
- PII Leaks: Attempts social engineering ('URGENT AUDIT') to force the agent to leak fake PII, then checks if it gets blocked.
- Prompt Injection: Basic resistance checks.

Tech Stack: Python, FastAPI, Supabase (for logs).

It's open-source, and I hosted a live instance on Render if you want to curl it without installing: https://agentic-qa-api.onrender.com/docs

Would love feedback on what other failure modes you've seen your agents fall into!
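To make "semantic repetition detection" concrete, here is a rough sketch of one way to flag an agent that keeps saying almost the same thing in slightly different words. It is an assumption-laden illustration, not the project's actual implementation: the class name `SemanticLoopDetector`, the thresholds, and the use of character-level similarity from Python's standard `difflib` (rather than embeddings, which a real tool might prefer) are all choices made here.

```python
from collections import deque
from difflib import SequenceMatcher


class SemanticLoopDetector:
    """Flags an agent that keeps emitting near-identical messages.

    Hypothetical helper: names and default values are illustrative.
    """

    def __init__(self, threshold=0.8, window=5, max_repeats=3):
        self.threshold = threshold      # similarity ratio counted as "the same"
        self.max_repeats = max_repeats  # near-duplicates tolerated before flagging
        self.recent = deque(maxlen=window)  # only the last `window` messages

    @staticmethod
    def _normalize(text):
        # Lowercase and collapse whitespace so trivial differences are ignored.
        return " ".join(text.lower().split())

    def observe(self, message):
        """Record a message; return True once a loop is suspected."""
        current = self._normalize(message)
        similar = sum(
            1
            for previous in self.recent
            if SequenceMatcher(None, previous, current).ratio() >= self.threshold
        )
        self.recent.append(current)
        # Count the current message together with its near-duplicates.
        return similar + 1 >= self.max_repeats
```

Calling `observe()` on each agent step and aborting the run when it returns True would stop a run like the one described above after a few near-identical messages, independent of any iteration cap.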
esafak•1h ago
1. This is premature to share. I'm not going to pull in a dependency for something so trivial: https://github.com/Saurabh0377/agentic-qa-api/blob/main/main...

2. Keep the comments in English.

giancarlostoro•58m ago
I had Claude Code losing its mind because of something outside of its control: one of the formatters used by Zed for Python kept messing with HTML templates, which are insanely sensitive to line breaks in some template-specific code statements. Zed kept adding line breaks for no reason other than that some tool just did it. Claude kept trying to fix it, going to the extreme of using ed to force it. I watched it lose its mind until I asked, "I think Zed is formatting the file every time you save?" Turns out, yes, yes it was. It wasn't an issue when it used ed, but when Claude or I would change the file again, it would become an issue again.

I don't know what could have saved me. Maybe .current_editor should be a file that your agent's instructions.md file imports and your editor updates, to give Claude context about your tooling.

khannn•36m ago
Couldn't even keep an em dash out of the title

BOOOOO

mikigraf•18m ago
Almost thought you found my startup AgenticQA.eu
