frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Discuss – Do AI agents deserve all the hype they are getting?

4•MicroWagie•4h ago•1 comments

Ask HN: Anyone Using a Mac Studio for Local AI/LLM?

48•UmYeahNo•1d ago•30 comments

LLMs are powerful, but enterprises are deterministic by nature

3•prateekdalal•8h ago•6 comments

Ask HN: Non AI-obsessed tech forums

29•nanocat•19h ago•26 comments

Ask HN: Ideas for small ways to make the world a better place

18•jlmcgraw•21h ago•21 comments

Ask HN: 10 months since the Llama-4 release: what happened to Meta AI?

44•Invictus0•1d ago•11 comments

Ask HN: Who wants to be hired? (February 2026)

139•whoishiring•5d ago•520 comments

Ask HN: Who is hiring? (February 2026)

313•whoishiring•5d ago•514 comments

Ask HN: Non-profit, volunteers run org needs CRM. Is Odoo Community a good sol.?

2•netfortius•16h ago•1 comments

AI Regex Scientist: A self-improving regex solver

7•PranoyP•23h ago•1 comments

Tell HN: Another round of Zendesk email spam

104•Philpax•2d ago•54 comments

Ask HN: Is Connecting via SSH Risky?

19•atrevbot•2d ago•37 comments

Ask HN: Has your whole engineering team gone big into AI coding? How's it going?

18•jchung•2d ago•13 comments

Ask HN: Why LLM providers sell access instead of consulting services?

5•pera•1d ago•13 comments

Ask HN: How does ChatGPT decide which websites to recommend?

5•nworley•1d ago•11 comments

Ask HN: What is the most complicated Algorithm you came up with yourself?

3•meffmadd•1d ago•7 comments

Ask HN: Is it just me or are most businesses insane?

8•justenough•1d ago•7 comments

Ask HN: Mem0 stores memories, but doesn't learn user patterns

9•fliellerjulian•2d ago•6 comments

Ask HN: Is there anyone here who still uses slide rules?

123•blenderob•4d ago•122 comments

Kernighan on Programming

170•chrisjj•5d ago•61 comments

Ask HN: Anyone Seeing YT ads related to chats on ChatGPT?

2•guhsnamih•1d ago•4 comments

Ask HN: Does global decoupling from the USA signal comeback of the desktop app?

5•wewewedxfgdf•1d ago•3 comments

Ask HN: Any International Job Boards for International Workers?

2•15charslong•18h ago•2 comments

We built a serverless GPU inference platform with predictable latency

5•QubridAI•2d ago•1 comments

Ask HN: Does a good "read it later" app exist?

8•buchanae•3d ago•18 comments

Ask HN: Have you been fired because of AI?

17•s-stude•4d ago•15 comments

Ask HN: Anyone have a "sovereign" solution for phone calls?

12•kldg•4d ago•1 comments

Ask HN: Cheap laptop for Linux without GUI (for writing)

15•locusofself•3d ago•16 comments

Ask HN: How Did You Validate?

4•haute_cuisine•2d ago•6 comments

Ask HN: OpenClaw users, what is your token spend?

14•8cvor6j844qw_d6•4d ago•6 comments
Open in hackernews

Agentic QA – Open-source middleware to fuzz-test agents for loops

39•Saurabh_Kumar_•2mo ago
I built this because I watched my LangChain agent burn ~$50 in OpenAI credits overnight due to an infinite loop.

It's a middleware API that acts as a 'Flight Simulator'. You send it your agent's prompt, and it runs adversarial attacks (Red Teaming) to catch loops and PII leaks before deployment.

Code & Repo: https://github.com/Saurabh0377/agentic-qa-api Live Demo: https://agentic-qa-engine.onrender.com/docs

Would love feedback on other failure modes you've seen!

Comments

Saurabh_Kumar_•2mo ago
HN, OP here. I built this because I recently watched my LangChain agent burn through ~$50 of OpenAI credits overnight. It got stuck in a semantic infinite loop (repeating "I am checking..." over and over) which my basic max_iterations check didn't catch because the phrasing was slightly different each time. Realizing that "Pre-Flight" testing for agents is surprisingly hard, I built a small middleware API (FastAPI + LangChain) to automate this. What it does: It acts as an adversarial simulator. You send it your agent's system prompt, and it spins up a 'Red Team' LLM to attack it. Currently checks for: Infinite Loops: Semantic repetition detection. PII Leaks: Attempts social engineering ('URGENT AUDIT') to force the agent to leak fake PII, then checks if it gets blocked. Prompt Injection: Basic resistance checks. Tech Stack: Python, FastAPI, Supabase (for logs). It's open-source and I hosted a live instance on Render if you want to try curl it without installing: https://agentic-qa-api.onrender.com/docs Would love feedback on what other failure modes you've seen your agents fall into!
esafak•1mo ago
1. This is premature to share. I'm not going to pull in a dependency for something so trivial: https://github.com/Saurabh0377/agentic-qa-api/blob/main/main...

2. Keep the comments in English.

giancarlostoro•1mo ago
I had Claude Code losing its mind because of something outside of its control, one of the formatters used by Zed for Python kept messing with HTML templates, which are insanely sensitive to line breaks in some template specific code statements. Zed kept adding line breaks without reason other than some tool just did it. Claude kept trying to fix it, going to the extreme of using ed to force it, I watched it lose its mind till I asked "I think Zed is formatting the file every time you save?" turns out, yes, yes it was. It wasn't an issue when it used ed, but when Claude or I would change the file again, it would become an issue again.

I don't know what could have saved me, maybe .current_editor should be a file that your agents instructions.md file imports, and your editor updates it, to give Claude context about your tooling.

khannn•1mo ago
Couldn't even keep an em dash out of the title

BOOOOO

mikigraf•1mo ago
Almost thought you found my startup AgenticQA.eu