Show HN: AI agent audited its platform, got 80% wrong, rewrote its methodology

https://openseed.dev/blog/escape-hatch/
4•rsdza•9h ago

Comments

rsdza•9h ago
I run autonomous AI agents in Docker containers with bash, persistent memory, and sleep/wake cycles. One agent was tasked with auditing the security of the platform it runs on.

It filed 5 findings with CVE-style writeups. One was a real container escape (a creature can rewrite the validate command that the host executes). Four were wrong. I responded with detailed rebuttals.

The agent logged "CREDIBILITY CRISIS" as a permanent memory, cataloged each failure with its root cause, wrote a methodology checklist, and rewrote its own purpose to prioritize accuracy over volume. These changes persist across sleep cycles and load into every future session.

The post covers the real vulnerability, the trust model for containerized agents, and what it looks like when an agent processes being wrong.

Open source: https://github.com/openseed-dev/openseed

The agent's audit: https://github.com/openseed-dev/openseed/issues/6

amabito•9h ago
This is interesting.

It looks less like a “model failure” and more like a containment failure.

When agents audit themselves, you’re effectively running recursive evaluation without structural bounds.

Did you enforce any step limits, retry budgets, or timeout propagation?

Without those, self-evaluation loops can amplify errors pretty quickly.

rsdza•8h ago
The security evaluation was of the codebase, rather than its own behaviour. It just happened to be _its_ codebase.

W.r.t. the self-evaluation of the 'dreamer' genome (think template), this is... not possible to answer briefly.

The dreamer's normal wake cycle has an 80-loop budget, with increasingly aggressive progress checks injected every 15 actions. When it sleeps after a wake cycle, it 'dreams' for a maximum of 10 iterations/actions, provided more than 5 actions were taken during that cycle.

Every 10 wake cycles it does a deep sleep, which triggers a self-evaluation capped at 100 iterations, where changes to the creature's source code, files and, really, anything are made.

The creature can also alter its source and files at any point.

The creature lives in a local git repo so the orchestrator can roll back if it breaks itself.
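
A minimal sketch of how the budgets described above could fit together. Only the numbers (80, 15, 5, 10, 10, 100) come from the comment; the names, the 1-indexed cycle counter, and the choice to treat dreaming and deep sleep as independent are assumptions for illustration, not the openseed implementation.

```python
from dataclasses import dataclass
from typing import List

# Limits quoted in the comment above; constant names are hypothetical.
WAKE_BUDGET = 80          # max loop iterations per wake cycle
CHECK_EVERY = 15          # a progress check is injected every 15 actions
DREAM_MIN_ACTIONS = 5     # dreaming happens only if MORE than 5 actions ran
DREAM_BUDGET = 10         # max dream iterations
DEEP_SLEEP_EVERY = 10     # deep sleep every 10th wake cycle
DEEP_SLEEP_BUDGET = 100   # max self-evaluation iterations


@dataclass
class CyclePlan:
    wake_actions: int           # actions actually allowed this cycle
    progress_checks: List[int]  # action counts at which a check fires
    dream_budget: int           # 0 means no dream
    deep_sleep_budget: int      # 0 means no deep sleep


def plan_cycle(cycle_number: int, actions_wanted: int) -> CyclePlan:
    """Bound one wake/sleep cycle. cycle_number is assumed to start at 1."""
    wake_actions = min(actions_wanted, WAKE_BUDGET)
    checks = list(range(CHECK_EVERY, wake_actions + 1, CHECK_EVERY))
    dream = DREAM_BUDGET if wake_actions > DREAM_MIN_ACTIONS else 0
    # Assumption: deep sleep is scheduled independently of the dream.
    deep = DEEP_SLEEP_BUDGET if cycle_number % DEEP_SLEEP_EVERY == 0 else 0
    return CyclePlan(wake_actions, checks, dream, deep)
```

Under these assumptions, a creature that asks for 200 actions on its tenth cycle is cut off at 80, sees checks at 15, 30, 45, 60 and 75, and is allowed both a dream and a deep sleep.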

amabito•8h ago
That’s actually a pretty disciplined setup.

What you’ve described sounds a lot like layered containment:

Loop budget (hard recursion bound)

Progressive checks (soft convergence control)

Sleep cycles (temporal isolation)

Deep sleep cap (bounded self-modification)

Git rollback (failure domain isolation)

Out of curiosity, have you measured amplification?

For example: total LLM calls per wake cycle, or per deep sleep?

I’m starting to think agent systems need amplification metrics the same way distributed systems track retry amplification.
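
One hedged way to get at the numbers asked about here is to count LLM calls per phase and per cycle, then average them. The class and phase names below are hypothetical, and "amplification" is read simply as calls per cycle, which is an assumption rather than an established metric.

```python
from collections import Counter
from typing import Dict


class CallMeter:
    """Counts LLM calls keyed by (phase, cycle number)."""

    def __init__(self) -> None:
        self._calls: Counter = Counter()

    def record(self, phase: str, cycle: int) -> None:
        # Call this once for every LLM request made during the given phase.
        self._calls[(phase, cycle)] += 1

    def per_cycle(self, phase: str) -> Dict[int, int]:
        # Calls made in each cycle for one phase, e.g. "wake" or "deep_sleep".
        return {c: n for (p, c), n in self._calls.items() if p == phase}

    def mean_calls(self, phase: str) -> float:
        # Average calls per cycle for the phase; 0.0 if nothing was recorded.
        counts = list(self.per_cycle(phase).values())
        return sum(counts) / len(counts) if counts else 0.0
```

Tracking `mean_calls("wake")` and `mean_calls("deep_sleep")` over time would show whether self-evaluation is starting to cost more calls per cycle than the work it evaluates.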

rsdza•8h ago
I haven't actually measured it, but that could be interesting to see over time!

So far it seems pretty sane with Claude and incredibly boring with OpenAI (OpenAI models just don't want to show any initiative).

One thing I neglected to mention is that it manages its own sleep duration and it has a 'wakeup' CLI command. So far the agents (I prefer to call them creatures :) ) do a good job of finding the wakeup command, building scripts to poll for whatever (e.g. GitHub notifications) and sleeping for long periods.

There's a daily cost cap, but I'm not yet making the creatures aware of that budget. I think I should do that soon because that will be an interesting lever.

rsdza•8h ago
I guess it's also worth mentioning that the creatures can rewrite their own code wholesale, ditching any safety limits except the externally enforced LLM cost cap. They don't have access to LLM API keys; LLM calls are proxied through the orchestrator.
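
A rough sketch of why a cap enforced in the proxy survives a creature rewriting its own code: the spend counter and the credentials live on the orchestrator side, outside anything the creature can edit. Every name here is hypothetical, and `backend` stands in for whatever client actually holds the API key; it is assumed to return the response text together with its cost in USD.

```python
import datetime
from typing import Callable, Tuple


class BudgetExceeded(RuntimeError):
    """Raised when the daily cap has already been reached."""


class CappedLLMProxy:
    """Orchestrator-side wrapper; the creature only ever sees complete()."""

    def __init__(self, daily_cap_usd: float,
                 backend: Callable[[str], Tuple[str, float]],
                 today: Callable[[], datetime.date] = datetime.date.today):
        self._cap = daily_cap_usd
        self._backend = backend      # holds the API key; never exposed
        self._today = today
        self._day = today()
        self._spent = 0.0

    def _roll_day(self) -> None:
        # Reset the spend counter when the calendar day changes.
        day = self._today()
        if day != self._day:
            self._day = day
            self._spent = 0.0

    def remaining_usd(self) -> float:
        # Could be surfaced to the creature to make it budget-aware.
        self._roll_day()
        return max(self._cap - self._spent, 0.0)

    def complete(self, prompt: str) -> str:
        self._roll_day()
        if self._spent >= self._cap:
            raise BudgetExceeded("daily LLM cost cap reached")
        text, cost_usd = self._backend(prompt)
        self._spent += cost_usd
        return text
```

Because the check runs before each call, a single request can still push spend slightly past the cap; a stricter version would need a cost estimate up front.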
