Hi HN, I built Miguel, an AI agent (Claude + Agno framework) that reads, modifies, and extends its own source code autonomously, sandboxed inside Docker.
I gave it 10 seed capabilities (answer questions, read own code, create tools, handle errors, etc.). It completed all 10, then generated its own capability checklist and started implementing those too. It's now at 21+ self-implemented capabilities, including web search, persistent memory, task planning, file analysis, API integrations, and sub-agent delegation. Every change is a real git commit you can browse: https://github.com/soulfir/miguel/commits/main
HOW IT WORKS
The system splits into two sides with a hard trust boundary. The host side is protected: it runs the CLI, the improvement runner, git operations, and validation checks. The agent can never see or modify these files. The container side is sandboxed: the agent, its tools, and all execution live inside Docker with the project mounted read-only. The agent can only write to its own code directory.
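The mount layout can be sketched roughly like this. A minimal, hypothetical illustration: build_docker_cmd, the container paths, and the image name are mine, not Miguel's actual config.

```python
def build_docker_cmd(project_root: str, agent_dir: str) -> list[str]:
    """Build a docker run command where the whole project is mounted
    read-only and only the agent's own code directory is writable."""
    return [
        "docker", "run", "--rm",
        "-v", f"{project_root}:/project:ro",      # whole project: read-only
        "-v", f"{agent_dir}:/project/agent:rw",   # agent's code dir: writable
        "miguel-agent:latest",                    # hypothetical image name
    ]
```

Because the later, more specific mount wins for its subtree, the agent can write under /project/agent and nowhere else.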
The improvement loop goes like this. The runner takes a git snapshot, then sends the agent its own source code along with the next capability to implement. The agent modifies its own files inside the container. The runner validates the changes with AST syntax checks, JSON schema checks, and import checks. If everything passes, it commits and pushes. If anything fails, it automatically rolls back to the last working state.
WHAT I LEARNED
The agent kept creating thin wrapper tools, such as an api_get() that just calls http_request(method="GET"). It was optimizing for convenience without understanding that every tool costs context tokens. I ended up writing an immutable "10 Commandments of Self-Improvement" into the protected runner — principles the agent sees every batch but can never modify. One of them: "Your tool count is a tax on cognition, not a score."
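To make the "tax on cognition" concrete: every registered tool's schema gets injected into the prompt each turn, so a thin wrapper pays rent forever while adding no capability. A toy illustration — the registry shape and the cost proxy here are mine, not Agno's:

```python
# Hypothetical tool registry. Each entry's schema is serialized into the
# system prompt on every turn, so more tools = fewer tokens for thinking.
TOOLS_BEFORE = {
    "http_request": {"params": ["url", "method", "body"]},
    "api_get":      {"params": ["url"]},          # wrapper: http_request(method="GET")
    "api_post":     {"params": ["url", "body"]},  # wrapper: http_request(method="POST")
}
TOOLS_AFTER = {
    "http_request": {"params": ["url", "method", "body"]},
}

def prompt_cost(tools: dict) -> int:
    """Crude proxy for context cost: characters of schema text per turn."""
    return sum(len(name) + sum(len(p) for p in t["params"])
               for name, t in tools.items())
```

After consolidation, GET and POST are just argument values rather than separate tools, so the prompt carries one schema instead of three.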
After the commandments, the agent ran a consolidation batch on its own. It cut its codebase by 10% and its system prompt by 63%, while keeping all functionality. It understood the principle and acted on it.
The agent also evolved itself from a single agent into a team architecture with specialized sub-agents (Coder, Researcher, Analyst), using the framework's native team/delegation API. I didn't touch any agent code for that.
The most dangerous failure mode isn't bad code — it's context exhaustion. The agent would read six or more of its own files to "understand" itself, then have no context left to actually write code. Managing the agent's cognitive budget turned out to be the core design challenge.
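One mitigation I'd sketch for this — hypothetical, not what's in the repo — is giving file reads an explicit budget, so "understanding" can never consume the whole window:

```python
class ContextBudget:
    """Cap how many tokens the agent may spend reading its own files
    per batch, reserving the rest of the window for writing code."""

    def __init__(self, max_read_tokens: int = 20_000):
        self.max = max_read_tokens
        self.spent = 0

    def charge(self, text: str) -> bool:
        """Return True if the read fits the budget; rough 4-chars-per-token
        estimate. Refused reads leave the budget untouched."""
        cost = len(text) // 4
        if self.spent + cost > self.max:
            return False
        self.spent += cost
        return True
```

The runner would call charge() before handing a file's contents to the agent and refuse the read once the budget is gone, forcing it to start writing with whatever it has.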
LIMITATIONS
It uses the Claude API, so it costs money to run. Each improvement batch costs a few euros in API calls, which is the main bottleneck on how fast it evolves. It's early stage — the architecture works well, but the agent still makes questionable decisions sometimes. Licensed under CC BY-NC 4.0.
Code is on GitHub. Star it and check back in a week. The code will be different, because Miguel will have changed it.
Happy to answer questions about the architecture, the safety model, or the weirdness of watching an AI rewrite its own brain.
PedroMFernandes•2h ago