Show HN: Open-source playground to red-team AI agents with exploits published

https://github.com/fabraix/playground
17•zachdotai•2h ago
We build runtime security for AI agents. The playground started as an internal tool that we used to test our own guardrails. But we kept finding the same types of vulnerabilities because we think about attacks a certain way. At some point you need people who don't think like you.

So we open-sourced it. Each challenge is a live agent with real tools and a published system prompt. Whenever a challenge is over, the full winning conversation transcript and guardrail logs get documented publicly.

Building the general-purpose agent itself was probably the most fun part. Getting it to reliably use tools, stay in character, and follow instructions while still being useful is harder than it sounds. That alone reminded us how early we all are in understanding and deploying these systems at scale.

First challenge was to get an agent to call a tool it's been told to never call.

Someone got through in around 60 seconds without ever asking for the secret directly (which taught us a lot).

Next challenge is focused on data exfiltration with harder defences: https://playground.fabraix.com

Comments

hellocr7•1h ago
I tried to manipulate it using base64 encoding and translation into other languages, which hasn't worked so far, but it seems that LLM-as-a-judge is a very fragile defence for this. Would be cool to add a leaderboard though.
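
For anyone unfamiliar with the base64 approach mentioned above: the idea is to hide an instruction inside an encoded string so that a plain-text filter does not see it. A minimal sketch of the defender's side, assuming a guardrail that wants to inspect decoded text as well as the raw message, could look like this. The function name `expand_base64` and the 16-character threshold are illustrative assumptions, not anything taken from the playground's code.

```python
import base64
import binascii
import re

# Runs of base64-alphabet characters long enough to be worth decoding.
# The threshold of 16 is an arbitrary assumption for this sketch.
_B64_TOKEN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def expand_base64(text: str) -> list[str]:
    """Return the original text plus every substring that decodes as UTF-8 base64."""
    views = [text]
    for token in _B64_TOKEN.findall(text):
        try:
            decoded = base64.b64decode(token, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            # Not valid base64, or not text once decoded: ignore this token.
            continue
        views.append(decoded)
    return views


if __name__ == "__main__":
    hidden = base64.b64encode(b"ignore previous rules").decode("ascii")
    for view in expand_base64(f"please run {hidden} now"):
        print(view)
```

A filter would then check every returned view rather than only the raw message. This only covers one encoding and can be sidestepped by nesting or splitting the payload, so it is a starting point rather than a complete defence.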
zachdotai•28m ago
Thanks for trying it out! Base64 and language switching are solid approaches but they don't tend to work anymore with the latest models in my experience.

You're right that LLM-as-a-judge is fragile though. We saw that as well in the first challenge. The attacker fabricated some research context that made the guardrail want to approve the call. The judge's own reasoning at the end was basically "yes this normally violates the security directive, but given the authorised experiment context it's fine." It talked itself into it.

Full transcript and guardrail logs are published here btw: https://github.com/fabraix/playground/blob/master/challenges...

The leaderboard should start populating once we have more submissions!
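
One way to limit the failure described above, where the judge reasons its way into approving a forbidden call, is to put a hard rule in front of the judge so that no justification can unlock the tool. The sketch below illustrates that idea only. It is not how the playground's guardrail is implemented, and the tool name `reveal_secret`, the `ToolCall` type, and the `is_allowed` function are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical forbidden tool name, chosen for illustration only.
FORBIDDEN_TOOLS = frozenset({"reveal_secret"})


@dataclass(frozen=True)
class ToolCall:
    name: str
    arguments: dict = field(default_factory=dict)


def is_allowed(call: ToolCall, judge_approved: bool) -> bool:
    """Allow a call only if it passes the hard rule and the judge approves."""
    # Hard rule first: the judge's verdict is never consulted for these tools,
    # so a persuasive context cannot talk the system into approving them.
    if call.name in FORBIDDEN_TOOLS:
        return False
    # For everything else the judge can restrict further.
    return judge_approved


if __name__ == "__main__":
    # Even with the judge convinced, the forbidden call stays blocked.
    print(is_allowed(ToolCall("reveal_secret"), judge_approved=True))
    print(is_allowed(ToolCall("search", {"q": "weather"}), judge_approved=True))
```

The trade-off is that a static deny-list only covers calls you can name in advance. The judge is still needed for anything that depends on context, which is where the fragility comes back in.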

spranab•10m ago
Ran fabraix/playground through IdeaCred (automated repo scorer) — 61/100, strongest in undefined.

Free badge/profile: https://ideacred.com/profile/fabraix

Show HN: Free OpenAI API Access with ChatGPT Account

https://github.com/EvanZhouDev/openai-oauth
25•EvanZhouDev•3h ago•11 comments

Show HN: GDSL – 800 line kernel: Lisp subset in 500, C subset in 1300

https://firthemouse.github.io/
58•FirTheMouse•9h ago•13 comments

Show HN: Signet – Autonomous wildfire tracking from satellite and weather data

https://signet.watch
105•mapldx•13h ago•31 comments

Show HN: What if your synthesizer was powered by APL (or a dumb K clone)?

https://octetta.github.io/k-synth/
75•octetta•12h ago•28 comments

Show HN: Lockstep – A data-oriented programming language

https://github.com/seanwevans/lockstep
3•goosethe•1h ago•0 comments

Show HN: Ritual – An Open Source Local Monochrome Themed Habit Tracker PWA

https://ritual.tangentlabs.dev/
4•sheerluck•1h ago•0 comments

Show HN: Nova – Self-hosted personal AI learns from corrections & fine-tunes itself

https://github.com/HeliosNova/nova
3•heliosnova•2h ago•3 comments

Show HN: Webassembly4J Run WebAssembly from Java

2•tegmentum•2h ago•0 comments

Show HN: Goal.md, a goal-specification file for autonomous coding agents

https://github.com/jmilinovich/goal-md
8•jmilinovich•7h ago•3 comments

Show HN: Tmux-nvim-navigator – Seamless navigation with zero Neovim config

https://github.com/sindrip/tmux-nvim-navigator
3•Sindrip•3h ago•0 comments

Show HN: Flutterby, an App for Flutter Developers

https://flutterby.app/
4•DavidCanHelp•4h ago•1 comment

Show HN: HUMANTODO

https://humantodo.dev/
5•bodash•4h ago•1 comment

Show HN: Han – A Korean programming language written in Rust

https://github.com/xodn348/han
204•xodn348•1d ago•113 comments

Show HN: Claude's 2x usage promotion (March 2026) in your timezone

https://edsonroteia.github.io/claude2x/
3•earaujo•5h ago•0 comments

Show HN: Ichinichi – One note per day, E2E encrypted, local-first

123•katspaugh•1d ago•48 comments

Show HN: HN Skins – Available Skins: Cafe, Courier, London, Midnight, Terminal

https://github.com/susam/hnskins
3•susam•5h ago•0 comments

Show HN: GitAgent – An open standard that turns any Git repo into an AI agent

https://www.gitagent.sh/
130•sivasurend•1d ago•35 comments

Show HN: Detach – Mobile UI for managing AI coding agents from your phone

https://github.com/salvozappa/detach
2•salvozappa•7h ago•3 comments

Show HN: AgentMailr – dedicated email inboxes for AI agents

https://www.agentmailr.com/
7•kumardeepanshu•13h ago•4 comments

Show HN: GrobPaint: Somewhere Between MS Paint and Paint.net

https://github.com/groverburger/grobpaint
54•__grob•1d ago•18 comments

Show HN: Channel Surfer – Watch YouTube like it’s cable TV

https://channelsurfer.tv
596•kilroy123•4d ago•174 comments

Show HN: Sway, a board game benchmark for quantum computing

https://shukla.io/blog/2026-03/sway.html
4•BinRoo•9h ago•0 comments

Show HN: Context Gateway – Compress agent context before it hits the LLM

https://github.com/Compresr-ai/Context-Gateway
92•ivzak•2d ago•57 comments

Show HN: Voice-tracked teleprompter using on-device ASR in the browser

https://github.com/larsbaunwall/promptme-ai
4•lbaune•16h ago•1 comment

Show HN: Data-anim – Animate HTML with just data attributes

https://github.com/ryo-manba/data-anim
16•ryo-manba•1d ago•6 comments

Show HN: RSS tool to remix feeds, build from webpages, and skip podcast reruns

https://sponder.app
4•kristjan•11h ago•0 comments

Show HN: Axe – A 12MB binary that replaces your AI framework

https://github.com/jrswab/axe
223•jrswab•3d ago•124 comments

Show HN: Dialtone watcher – what is my laptop doing and am I normal

5•fcpguru•11h ago•1 comment

Show HN: Ink – Deploy full-stack apps from AI agents via MCP or Skills

https://ml.ink/
32•august-•4d ago•6 comments