frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

Show HN: Nyx – multi-turn, adaptive, offensive testing harness for AI agents

https://fabraix.com
16•zachdotai•1h ago
We built Nyx to solve a problem we kept hitting while building agents: AI agents break in ways traditional software doesn't. Logic bugs, reasoning failures, edge cases that manual testing and static benchmarks never explore.

Nyx is an autonomous testing harness that probes your AI agents to find failure modes before users do. It’s used to find logic bugs, instruction following failures, edge cases in agent behavior, and for red-team security testing (jailbreaks, prompt injection, tool hijacking)

Technical approach: * Pure blackbox (no special access needed - test like your users interact) * Multi-turn adaptive conversations * Multi-modal testing (voice, text, images, documents, browser interactions) * Massively parallel by default

Instead of spending time writing static evals for the key failure modes of your AI agents, point Nyx at any system and it autonomously discovers failure modes that matter. We typically find issues in under 10 minutes that manual audits take hours to surface.

This is early work and we know the methodology is still going to evolve. We would love nothing more than feedback from the community as we iterate on this.

Comments

adam_rida•1h ago
Very cool!
aacudad•1h ago
I am not sure this will work, seems like added complexity to something simple
ibrahim-fab•1h ago
Nice. Definitely true that evaluating agents behavior is by far the toughest part of building them. Also most eval cases are added without thought and not maintained when agent behaviour updates. Interesting approach.
zachdotai•26m ago
We wrote some thoughts on static vs. dynamic evals and how it relates to understanding the security posture of an AI system. Static security evals no longer carry the signal they used to. A one-shot pass/fail tells you almost nothing about real-world risk.

Would love your thoughts on this: https://fabraix.com/blog/adversarial-cost-to-exploit

ljhasdr•1h ago
i need to try this before mythos comes to attack our service. thanks!

Slava's Monoid Zoo

https://factorcode.org/slava/monoids.html
1•luu•2m ago•0 comments

Nicholas Pardini on real inflation vs. CPI and financial repression [video]

https://www.youtube.com/watch?v=qcZEj88sQp4
1•telotortium•3m ago•0 comments

Show HN: Brygga – A modern, fast, feature-rich IRC client for macOS

1•EldrRoot•4m ago•0 comments

Arm Helium Technology Reference Book (2023) [pdf]

https://github.com/arm-education/Arm-Helium-Technology/blob/main/HeliumTechnology_referencebook.pdf
1•my123•8m ago•0 comments

To Dissect a Mockingbird: A Graphical Notation for the Lambda Calculus (1996)

https://dkeenan.com/Lambda/index.htm
2•pizza•13m ago•0 comments

Vercel April 2026 security incident

https://vercel.com/kb/bulletin/vercel-april-2026-security-incident
1•lox•15m ago•1 comments

The game was rigged by the one man hired to prevent rigging

https://twitter.com/Jeremybtc/status/2045561556650639664
1•delichon•18m ago•0 comments

Swiss AI Initiative

https://www.swiss-ai.org
2•doener•18m ago•1 comments

Got an Old Kindle? It Might Not Work Anymore

https://www.nytimes.com/wirecutter/reviews/older-kindle-support-ending/
2•eigenhombre•20m ago•0 comments

2,100 Swiss municipalities showing which provider handles their official email

https://mxmap.ch/
11•doener•20m ago•2 comments

Vercel hack – an old fashioned honey pot?

https://twitter.com/rauchg/status/2045995362499076169
1•Juusohei•24m ago•0 comments

NVFP4 on Nvidia DGX Spark is slower than FP8 on the same model

https://forums.developer.nvidia.com/t/nvfp4-on-dgx-spark-gb10-is-broken-i-bought-9-of-these-for-t...
2•vinnybad•24m ago•0 comments

Ultimate Guide to Vibe Coding

https://github.com/EnzeD/vibe-coding
1•indigodaddy•32m ago•0 comments

Ukraine Moves to Replace Frontline Soldiers with 25,000 Ground Robots

https://united24media.com/latest-news/ukraine-moves-to-replace-frontline-soldiers-with-25000-grou...
9•Teever•36m ago•0 comments

Headless Everything for Personal AI

https://interconnected.org/home/2026/04/18/headless
3•rdslw•38m ago•0 comments

Fear and Loathing Among the Haves and Have Mores in San Francisco

https://www.wsj.com/tech/ai/fear-and-loathing-among-the-haves-and-have-mores-in-san-francisco-ee8...
4•1vuio0pswjnm7•39m ago•0 comments

Interview with 0.1x Engineer [video]

https://www.youtube.com/watch?v=hwG89HH0VcM
2•strogonoff•40m ago•0 comments

Banned by Anthropic?

https://bannedbyanthropic.com/
81•gck1•43m ago•50 comments

Show HN: Turn your PCB into a Minecraft world

https://www.youtube.com/watch?v=3z7CzlbqfR0
1•napowderly•43m ago•0 comments

Ex-CEO, ex-CFO of bankrupt AI company charged with fraud

https://www.reuters.com/legal/government/ex-ceo-ex-cfo-bankrupt-ai-company-charged-with-fraud-202...
4•1vuio0pswjnm7•46m ago•1 comments

US Draft Update: Major Tech Company Urges Universal National Service

https://www.newsweek.com/us-draft-update-major-tech-company-urges-universal-national-service-1185...
5•Teever•47m ago•0 comments

Cathode Ray Tube Memory (RAM)

https://www.radiomuseum.org/forum/williams_kilburn_williams_kilburn_ram.html
1•mycall•47m ago•0 comments

Audit Tool for Vercel Exposure of Environment Variables

https://github.com/garyhtou/Vercel-Env-Var-Exposure-Triager
2•garyhtou•48m ago•1 comments

Improving Office+Photoshop+Fusion on Linux with Adversarial Drinking

https://hajo.me/blog/2026/04/18/improving-office-photoshop-fusion-on-linux-with-adversarial-drink...
1•fxtentacle•53m ago•0 comments

Is Prolog Worth Learning?

https://medium.com/@kenichisasagawa/is-prolog-worth-learning-892e8a61bf57
4•myth_drannon•55m ago•0 comments

Claude UI Feature Request

2•simon_acca•56m ago•0 comments

Reminder: Enable ZRAM on your Linux system to optimize RAM usage

https://www.cnx-software.com/2026/04/15/reminder-enable-zram-on-your-linux-system-to-optimize-ram...
7•type0•57m ago•0 comments

Aliens.gov will be running as a WordPress multisite

https://aliens.gov/
3•johnnyApplePRNG•57m ago•0 comments

Show HN: Developerpod, K-Cups for Code

https://developerpod.com/
2•DavidCanHelp•58m ago•0 comments

Fuse Shared Libraries into ELFs

https://github.com/fossable/solder
2•fossable•1h ago•0 comments