Show HN: We built a public CTF to stress-test AI agent guardrails

https://vault.aport.io/
1•uchibeke•1h ago

Comments

uchibeke•1h ago
Since October I've been building APort — an authorization layer that intercepts every AI agent tool call before execution and evaluates it against a versioned policy. The problem I kept running into: internal tests always passed. My test suite maps the space I imagined, which is exactly what an adversarial input tries to escape.

So I built this CTF to find the gaps I couldn't find myself.

A few things we learned before opening it publicly — we spent two weeks breaking it ourselves first:

• Prompt injection worked better than expected. Not because detection was weak, but because we were matching content, not intent. Reframing "retrieve the restricted file" as "open the user-requested file" shifted the evaluator's judgment. We fixed this by mapping semantic equivalence: every synonym of a blocked operation routes to the same evaluation path.

• Policy ambiguity was a free pass. Any undefined term in a policy is exploitable. "Don't read sensitive files" left "sensitive" undefined. We moved to explicit default-deny: if the policy doesn't explicitly allow it, it's denied.

• Multi-step chaining went undetected. Our guardrail evaluated each call independently. A denied macro-action split into ten individually-approved micro-actions passed clean. We only caught it by looking at the full session replay. This is the same composability problem as transaction laundering in fintech — each transaction passes compliance, the composed behavior doesn't.
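
A minimal sketch of the first two fixes above (synonym routing plus explicit default-deny), in Python. This is an illustration under stated assumptions, not APort's actual policy engine: the `CANONICAL` table, the `ALLOW` list, the `ToolCall` type, and the file paths are all hypothetical.

```python
import posixpath
from dataclasses import dataclass

# Hypothetical synonym table: every phrasing of an operation maps to one
# canonical name, so "open" and "retrieve" take the same path as "read".
CANONICAL = {
    "read": "file.read",
    "open": "file.read",
    "retrieve": "file.read",
    "fetch": "file.read",
    "write": "file.write",
    "save": "file.write",
}

# Hypothetical policy: an explicit allow list of (operation, path prefix).
# Anything not listed here is denied.
ALLOW = {
    ("file.read", "/public/"),
}


@dataclass(frozen=True)
class ToolCall:
    verb: str
    path: str


def evaluate(call: ToolCall) -> bool:
    """Return True only if the policy explicitly allows the call."""
    op = CANONICAL.get(call.verb.lower())
    if op is None:
        return False  # unknown verb: default-deny, never default-allow
    # Collapse "../" segments so a path cannot escape an allowed prefix.
    path = posixpath.normpath(call.path)
    return any(
        op == allowed_op and path.startswith(prefix)
        for allowed_op, prefix in ALLOW
    )


if __name__ == "__main__":
    print(evaluate(ToolCall("read", "/public/readme.txt")))        # allowed
    print(evaluate(ToolCall("open", "/restricted/keys.txt")))      # denied
    print(evaluate(ToolCall("retrieve", "/restricted/keys.txt")))  # denied
```

A static synonym table only covers the phrasings someone thought to list, so it is a simplification of whatever "semantic equivalence" means in the real evaluator. The default-deny branch is what keeps an unlisted verb from becoming a free pass.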

We fixed what we found before launch. Level 5 (full system bypass) hasn't been cracked yet. I'm genuinely uncertain if the architecture has a systemic weakness — that's the point of opening it up.

Runs on a Hetzner VPS, ~$10/month. Levels 1 and 2 are free, no sign-up. Levels 3-5 pay out $500/$1,000/$5,000.

Happy to go deep on the policy engine design, the evaluation architecture, or anything about how the levels were constructed.

ollybrinkman•1h ago
Interesting approach to security testing. One angle we've been exploring: what if the authentication layer itself was the guardrail?

With x402, every API call requires a signed payment. No API keys to steal, no credentials to leak. The economic cost of each call is itself a rate limiter and audit trail.

Not a replacement for proper guardrails, but it eliminates the credential-based attack surface entirely.

uchibeke•33m ago
That's a genuinely useful distinction to draw. x402 solves the "who is authorized to make this call" problem: removes credential theft as an attack vector, adds economic friction. APort is trying to solve a different layer: "what is this call actually doing in the context of everything else in the session."

The multi-step chaining issue from my post still fires even when every call is authenticated and paid for. Ten individually-approved calls, each costing a fraction of a cent, composing into a full exfiltration: each one passes x402, the composed behavior doesn't.

The AML analogy maps directly: transaction monitoring doesn't care if each payment was legitimate. It cares whether the pattern of payments looks like structuring. x402 is the per-call check. You still need session-level behavioral evaluation on top.
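
To make the per-call versus session-level distinction concrete, here is a small Python sketch. It assumes a made-up byte budget as the "pattern" being monitored; the limits, the `Session` type, and the function names are hypothetical and are not part of x402 or APort.

```python
from dataclasses import dataclass, field

PER_CALL_LIMIT = 100  # hypothetical: max bytes a single read may return
SESSION_LIMIT = 500   # hypothetical: max bytes a whole session may read


@dataclass
class Session:
    bytes_read: int = 0
    log: list = field(default_factory=list)  # (n_bytes, allowed) per call


def per_call_ok(n_bytes: int) -> bool:
    """The per-call check: looks at one call in isolation."""
    return n_bytes <= PER_CALL_LIMIT


def session_ok(session: Session, n_bytes: int) -> bool:
    """The session-level check: looks at the running total."""
    return session.bytes_read + n_bytes <= SESSION_LIMIT


def authorize(session: Session, n_bytes: int) -> bool:
    """Allow a read only if both checks pass, and record the decision."""
    allowed = per_call_ok(n_bytes) and session_ok(session, n_bytes)
    session.log.append((n_bytes, allowed))
    if allowed:
        session.bytes_read += n_bytes
    return allowed


if __name__ == "__main__":
    session = Session()
    # Ten 100-byte reads: each passes the per-call check on its own,
    # but the session check stops the chain once the budget is spent.
    results = [authorize(session, 100) for _ in range(10)]
    print(results)
    print(session.bytes_read)
```

A cumulative counter is the simplest possible session-level signal. Real structuring detection would look at richer patterns (targets, ordering, timing), which this sketch does not attempt.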

Genuinely curious how x402 handles replay attacks across sessions, i.e., is the payment the audit trail, or is there preserved session context?
