You should write an agent

https://fly.io/blog/everyone-write-an-agent/

138•tabletcorry•2h ago

Comments

tlarkworthy•1h ago

Yeah I was inspired after https://news.ycombinator.com/item?id=43998472 which is also very concrete

tptacek•1h ago

I love everything they've written and also Sketch is really good.

manishsharan•1h ago

How.. please don't say use langxxx library

I am looking for a language or library agnostic pattern like we have MVC etc. for web applications. Or Gang of Four patterns but for building agents.

tptacek•1h ago

The whole post is about not using frameworks; all you need is the LLM API. You could do it with plain HTTP without much trouble.

manishsharan•1h ago

When I ask for patterns, I am seeking help for recurring problems that I have encountered. Context management, for one: small LLMs (ones with a small context size) break, get confused, and forget work they have done or the original goal.

skeledrew•32m ago

That's why you want to use sub-agents which handle smaller tasks and return results to a delegating agent. So all agents have their own very specialized context window.
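A minimal sketch of that delegation pattern, assuming a hypothetical `call_llm(context)` function that takes a list of messages and returns text (here replaced by a stub so it runs offline). Each sub-agent starts from a fresh context, and only its final answer is copied into the delegating agent's context:

```python
def run_subagent(task, call_llm):
    """Run one task in a fresh context and return only the final text."""
    context = [
        {"role": "system", "content": "You handle exactly one small task."},
        {"role": "user", "content": task},
    ]
    return call_llm(context)


def delegate(goal, tasks, call_llm):
    """Keep the parent context small: it only ever sees sub-agent results."""
    parent = [{"role": "user", "content": goal}]
    for task in tasks:
        result = run_subagent(task, call_llm)
        parent.append({"role": "user", "content": f"{task}: {result}"})
    return parent


def fake_llm(context):
    # Stand-in for a real model call (hypothetical, for illustration only).
    return f"done ({len(context)} messages seen)"


parent = delegate("audit the logs", ["scan auth.log", "scan nginx.log"], fake_llm)
```

The sub-agent's intermediate tool output never reaches the parent, which is the whole point of the split.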

tptacek•26m ago

That's one legit answer. But if you're not stuck in Claude's context model, you can do other things. One extremely stupid simple thing you can do, which is very handy when you're doing large-scale data processing (like log analysis): just don't save the bulky tool responses in your context window once the LLM has generated a real response to them.

My own dumb TUI agent, I gave a built in `lobotomize` tool, which dumps a text list of everything in the context window (short summary text plus token count), and then lets it Eternal Sunshine of the Spotless Agent things out of the window. It works! The models know how to drive that tool. It'll do a series of giant ass log queries, filling up the context window, and then you can watch as it zaps things out of the window to make space for more queries.

This is like 20 lines of code.
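A rough sketch of what such a tool could look like; this is not the actual code from that agent. The entry layout (`id`, `role`, `content`), the 4-characters-per-token estimate, and the rule that the system prompt is never dropped are all assumptions made for illustration:

```python
def estimate_tokens(text):
    # Crude assumption: roughly 4 characters per token.
    return max(1, len(text) // 4)


def list_context(context):
    """One line per entry: id, role, token estimate, short preview."""
    lines = []
    for entry in context:
        preview = entry["content"][:40].replace("\n", " ")
        tokens = estimate_tokens(entry["content"])
        lines.append(f'{entry["id"]} {entry["role"]} ~{tokens}tok {preview}')
    return "\n".join(lines)


def lobotomize(context, ids_to_drop):
    """Return a new context without the entries the model asked to remove."""
    drop = set(ids_to_drop)
    # Assumption: the system prompt is kept no matter what the model asks for.
    return [e for e in context if e["id"] not in drop or e["role"] == "system"]


context = [
    {"id": 0, "role": "system", "content": "you analyze logs"},
    {"id": 1, "role": "tool", "content": "x" * 4000},
    {"id": 2, "role": "assistant", "content": "The spike began at 09:14."},
]
context = lobotomize(context, [0, 1])
```

Exposed as a tool, `list_context` gives the model the inventory and `lobotomize` takes the list of ids it wants gone.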

zahlman•21m ago

Start by thinking about how big the context window is, and what the rules should be for purging old context.

Design patterns can't help you here. The hard part is figuring out what to do; the "how" is trivial.

oooyay•1h ago

I'm not going to link my blog again but I have a reply on this post where I link to my blog post where I talk about how I built mine. Most agents fit nicely into a finite state machine or a directed acyclic graph that responds to an event loop. I do use provider SDKs to interact with models but mostly because it saves me a lot of boilerplate. MCP clients and servers are also widely available as SDKs. The biggest thing to remember, imo, is to keep the relationship between prompts, resources, and tools in mind. They make up a sort of dynamic workflow engine.

behnamoh•1h ago

> nobody knows anything yet

that sums up my experience in AI over the past three years. so many projects reinvent the same thing, so much spaghetti thrown at the wall to see what sticks, so much excitement followed by disappointment when a new model drops, so many people grifting, and so many hacks and workarounds like RAG with no evidence of them actually working other than "trust me bro" and trial and error.

w_for_wumbo•1h ago

I think we'd get better results if we thought of it as a conscious agent. If we recognized that it was going to mirror back our unconscious biases and try to complete the task as we define it, instead of how we think it should behave, then we'd at least get our own ignorance out of the way when writing prompts.

Once you recognize that 'make this code better' provides no direction, it should make sense that the output is directionless.

But on more subtle levels, whatever subtle goals that we have and hold in the workplace will be reflected back by the agents.

If you're trying to optimise costs and increase profits as your north star, layoffs and unsustainable practices are a logical result when you haven't balanced this with any incentives to abide by human values.

oooyay•1h ago

Heh, the bit about context engineering is palpable.

I'm writing a personal assistant which, imo, is distinct from an agent in that it has a lot of capabilities a regular agent wouldn't necessarily need such as memory, task tracking, broad solutioning capabilities, etc... I ended up writing agents that talk to other agents which have MCP prompts, resources, and tools to guide them as general problem solvers. The first agent that it hits is a supervisor that specializes in task management and as a result writes a custom context and tool selection for the react agent it tasks.

All that to say, the farther you go down this rabbit hole the more "engineering" it becomes. I wrote a bit on it here: https://ooo-yay.com/blog/building-my-own-personal-assistant/

qwertox•1h ago

This sounds really great.

esafak•1h ago

What's wrong with the OWASP Top Ten?

kennethallen•31m ago

Author on Twitter a few years ago: https://x.com/tqbf/status/851466178535055362

riskable•1h ago

It's interesting how much this makes you want to write Unix-style tools that do one thing and only one thing really well. Not just because it makes coding an agent simpler, but because it's much more secure!

chemotaxis•1h ago

You could even imagine a world in which we create an entire suite of deterministic, limited-purpose tools and then expose it directly to humans!

layer8•1h ago

I wonder if we could develop a language with well-defined semantics to interact with and wire up those tools.

chubot•48m ago

> language with well-defined semantics

That would certainly be nice! That's why we have been overhauling shell with https://oils.pub , because shell can't be described as that right now

It's in extremely poor shape

e.g. some things found from building several thousand packages with OSH recently (decades of accumulated shell scripts)

- bugs caused by the differing behavior of 'echo hi | read x; echo x=$x' in shells, i.e. shopt -s lastpipe in bash.

- 'set -' is an archaic shortcut for 'set +v +x'

- Almquist shell is technically a separate dialect of shell -- namely it supports 'chdir /tmp' as well as 'cd /tmp'. So bash and other shells can't run any Alpine builds.

I used to maintain this page, but there are so many problems with shell that I haven't kept up ...

https://github.com/oils-for-unix/oils/wiki/Shell-WTFs

OSH is the most bash-compatible shell, and it's also now Almquist shell compatible: https://pages.oils.pub/spec-compat/2025-11-02/renamed-tmp/sp...

It's more POSIX-compatible than the default /bin/sh on Debian, which is dash

The bigger issue is not just bugs, but lack of understanding among people who write foundational shell programs. e.g. the lastpipe issue, using () as grouping instead of {}, etc.

---

It is often treated like an "unknowable" language

Any reasonable person would use LLMs to write shell/bash, and I think that is a problem. You should be able to know the language, and read shell programs that others have written

jacquesm•39m ago

I love it how you went from 'Shell-WTFs' to 'let's fix this'. Kudos, most people get stuck at the first stage.

zahlman•47m ago

As it happens, I have a prototype for this, but the syntax is honestly rather unwieldy. Maybe there's a way to make it more like natural human language....

imiric•36m ago

I can't tell whether any comment in this thread is a parody or not.

zahlman•15m ago

(Mine was intended as ironic, suggesting that a circle of development ideas would eventually complete. I interpreted the previous comments as satirically pointing at the fact that the notion of "UNIX-like tools" owes to the fact that there is actually such a thing as UNIX.)

tptacek•1h ago

One thing that radicalized me was building an agent that tested network connectivity for our fleet. Early on, in like 2021, I deployed a little mini-fleet of off-network DNS probes on, like, Vultr to check on our DNS routing, and actually devising metrics for them and making the data that stuff generated legible/operationalizable was annoying and error prone. But you can give basic Unix network tools --- ping, dig, traceroute --- to an agent and ask it for a clean, usable signal, and they'll do a reasonable job! They know all the flags and are generally better at interpreting tool output than I am.

I'm not saying that the agent would do a better job than a good "hardcoded" human telemetry system, and we don't use agents for this stuff right now. But I do know that getting an agent across the 90% threshold of utility for a problem like this is much, much easier than building the good telemetry system is.

foobarian•44m ago

Honestly the top AI use case for me right now is personal throwaway dev tools. Where I used to write shell oneliners with dozen pipes including greps and seds and jq and other stuff, now I get an AI to write me a node script and throw in a nice Web UI to boot.

Edit: reflecting on what the lesson is here, in either case I suppose we're avoiding the pain of dealing with Unix CLI tools :-D

jacquesm•37m ago

Interesting. You have to wonder if all the tools that it is based on would have been written in the first place if that kind of thing had been possible all along. Who needs 'grep' when you can write a prompt?

tptacek•36m ago

My long running joke is that the actual good `jq` is just the LLM interface that generates `jq` queries; 'simonw actually went and built that.

zahlman•32m ago

> They know all the flags and are generally better at interpreting tool output than I am.

In the toy example, you explicitly restrict the agent to supply just a `host`, and hard-code the rest of the command. Is the idea that you'd instead give a `description` something like "invoke the UNIX `ping` command", and a parameter described as constituting all the arguments to `ping`?
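To make the two options concrete, here is a hedged sketch of both tool definitions side by side. The dict layout follows the function-tool shape of the OpenAI Responses API as I understand it; the `ping_raw` tool, its `args` parameter, and the `build_argv` helper are hypothetical names invented for this example:

```python
import json

# Narrow tool: the model supplies only a host; the rest of the command is fixed.
ping_host_tool = {
    "type": "function",
    "name": "ping",
    "description": "ping some host on the internet",
    "parameters": {
        "type": "object",
        "properties": {"host": {"type": "string", "description": "hostname or IP"}},
        "required": ["host"],
    },
}

# Wide tool (hypothetical): the model supplies the full argument list.
ping_raw_tool = {
    "type": "function",
    "name": "ping_raw",
    "description": "invoke the UNIX ping command with arbitrary arguments",
    "parameters": {
        "type": "object",
        "properties": {"args": {"type": "array", "items": {"type": "string"}}},
        "required": ["args"],
    },
}


def build_argv(tool_name, arguments_json):
    """Turn the model's JSON arguments into an argv list (no shell involved)."""
    args = json.loads(arguments_json)
    if tool_name == "ping":
        return ["ping", "-c", "5", args["host"]]
    if tool_name == "ping_raw":
        return ["ping", *args["args"]]
    raise ValueError(f"unknown tool: {tool_name}")


narrow = build_argv("ping", '{"host": "example.com"}')
wide = build_argv("ping_raw", '{"args": ["-c", "1", "-W", "2", "example.com"]}')
```

The wide version hands the model every flag, so it only makes sense inside a sandbox where a bad flag cannot hurt you.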

tptacek•29m ago

Honestly, I didn't think very hard about how to make `ping` do something interesting here, and in serious code I'd give it all the `ping` options (and also run it in a Fly Machine or Sprite where I don't have to bother checking to make sure none of those options gives code exec). It's possible the post would have been better had I done that; it might have come up with an even better test.

I was telling a friend online that they should bang out an agent today, and the example I gave her was `ps`; like, I think if you gave a local agent every `ps` flag, it could tell you super interesting things about usage on your machine pretty quickly.

zahlman•12m ago

Also to be clear: are the schemas for the JSON data sent and parsed here specific to the model used? Or is there a standard? (Is that the P in MCP?)

danpalmer•26m ago

Doing one thing well means you need a lot more tools to achieve outcomes, and more tools means more context and potentially more understanding of how to string them together.

I suspect the sweet spot for LLMs is somewhere in the middle, not quite as small as some traditional unix tools.

teiferer•1h ago

Write an agent, it's easy! You will learn so much!

... let's see ...

client = OpenAI()

Um right. That's like saying you should implement a web server, you will learn so much, and then you go and import http (in golang). Yeah well, sure, but that brings you like 98% of the way there, doesn't it? What am I missing?

tptacek•1h ago

That OpenAI() is a wrapper around a POST to a single HTTP endpoint:

    POST https://api.openai.com/v1/responses
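For anyone who wants to see that without the SDK, a minimal standard-library sketch follows. The `model` and `input` body fields and the bearer-token header are my understanding of the Responses API; the key and model name are placeholders, and the request is built but deliberately not sent so the sketch runs offline:

```python
import json
import urllib.request


def build_responses_request(api_key, model, user_text):
    """Build (but do not send) the POST that the SDK client wraps."""
    body = json.dumps({"model": model, "input": user_text}).encode("utf-8")
    return urllib.request.Request(
        "https://api.openai.com/v1/responses",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder key and model name, for illustration only.
req = build_responses_request("sk-placeholder", "some-model", "ping google for me")
# urllib.request.urlopen(req) would actually send it.
```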

tabletcorry•1h ago

Plus a few other endpoints, but it is pretty exclusively an HTTP/REST wrapper.

OpenAI does have an agents library, but it is separate in https://github.com/openai/openai-agents-python

bootwoot•1h ago

That's not an agent, it's an LLM. An agent is an LLM that takes real-world actions

MeetingsBrowser•1h ago

I think you might be conflating an agent with an LLM.

The term "agent" isn't really defined, but it's generally a wrapper around an LLM designed to do some task better than the LLM would on its own.

Think Claude vs Claude Code. The latter wraps the former, but with extra prompts and tooling specific to software engineering.

victorbjorklund•1h ago

maybe more like “let’s write a web server but let’s use a library for the low level networking stack”. That can still teach you a lot.

munchbunny•1h ago

An agent is more like a web service in your metaphor. Yes, building a web server is instructive, but almost nobody has a reason to do it instead of using an out of the box implementation once it’s time to build a production web service.

Bjartr•38m ago

No, it's saying "let's build a web service" and starting with a framework that just lets you write your endpoints. This is about something higher level than the nuts and bolts. Both are worth learning.

The fact you find this trivial is kind of the point that's being made. Some people think having an agent is some kind of voodoo, but it's really not.

ATechGuy•1h ago

Maybe we should write an agent that writes an agent that writes an agent...

chrisweekly•1h ago

There's something(s) about @tptacek's writing style that has always made me want to root for fly.io.

qwertox•1h ago

I've found it much more useful to create an MCP server, and this is where Claude really shines. You would just say to Claude on web, mobile or CLI that it should "describe our connectivity to google" either via one of the three interfaces, or via `claude -p "describe our connectivity to google"`, and it will just use your tool without you needing to do anything special. It's like custom-added intelligence to Claude.

tptacek•1h ago

You can do this. Claude Code can do everything the toy agent in this post does, and much more. But you shouldn't, because doing that (1) doesn't teach you as much as the toy agent does, (2) isn't saving you that much time, and (3) locks you into Claude Code's context structure, which is just one of a zillion different structures you can use. That's what the post is about, not automating ping.

mattmanser•57m ago

Honest question, as your comment confuses me.

Did you get to the part where he said MCP is pointless and are saying he's wrong?

Or did you just read the start of the article and not get to that bit?

vidarh•50m ago

I'd second the article on this, but also add to it that the biggest reason MCP servers don't really matter much any more is that the models are so capable of working with APIs, that most of the time you can just point them at an API and give them a spec instead. And the times that doesn't work, just give them a CLI tool with a good --help option.

Now you have a CLI tool you can use yourself, and the agent has a tool to use.

Anthropic itself has made MCP servers increasingly pointless: with agents + skills you have a more composable model that can use the model's capabilities to do all an MCP server can, with or without CLI tools to augment them.

simplesagar•24m ago

I feel the CLI vs MCP debate is an apples to oranges framing. When you're using Claude you can watch it using CLIs, running brew, mise, lots of jq, but what about when you've built an agent that needs to work through a complicated API? You don't want to make 5 CRUD calls to get the right answer. A curated MCP tool ensures determinism where it matters most: when interacting with customer data.

zkmon•1h ago

A very good blog article, the best I have read in a while. Maybe MCP could have been involved as well?

_pdp_•1h ago

It is also very simple to be a programmer.. see,

print "Hello world!"

so easy...

dan_can_code•35m ago

But that didn't use the H100 I just bought to put me out of my own job!

robot-wrangler•1h ago

> Another thing to notice: we didn’t need MCP at all. That’s because MCP isn’t a fundamental enabling technology. The amount of coverage it gets is frustrating. It’s barely a technology at all. MCP is just a plugin interface for Claude Code and Cursor, a way of getting your own tools into code you don’t control. Write your own agent. Be a programmer. Deal in APIs, not plugins.

Hold up. These are all the right concerns but with the wrong conclusion.

You don't need MCP if you're making one agent, in one language, in one framework. But the open coding and research assistants that we really want will be composed of several. MCP is the only thing out there that's moving in a good direction in terms of enabling us to "just be programmers" and "use APIs", and maybe even test things in fairly isolated and reproducible contexts. Compare this to skills.md, which is actually de facto proprietary as of now, does not compose, has opaque run-times and dispatch, is pushing us towards certain models, languages and certain SDKs, etc.

MCP isn't a plugin interface for Claude, it's just JSON-RPC.
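To illustrate the "just JSON-RPC" point, here is a small sketch that builds a tool-call message by hand. The `tools/call` method and the `name`/`arguments` params follow the MCP specification as I understand it; the `ping` tool and the helper name are assumptions for the example, and no transport (stdio or HTTP) is shown:

```python
import json


def mcp_tool_call(request_id, tool_name, arguments):
    """Serialize a JSON-RPC 2.0 request asking an MCP server to run a tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })


# Hypothetical tool name and arguments, for illustration only.
message = mcp_tool_call(1, "ping", {"host": "example.com"})
decoded = json.loads(message)
```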

tptacek•55m ago

I think my thing about MCP, besides the outsized press coverage it gets, is the implicit presumption it smuggles in that agents will be built around the context architecture of Claude Code --- that is to say, a single context window (maybe with sub-agents) with a single set of tools. That straitjacket is really most of the subtext of this post.

I get that you can use MCP with any agent architecture. I debated whether I wanted to hedge and point out that, even if you build your own agent, you might want to do an MCP tool-call feature just so you can use tool definitions other people have built (though: if you build your own, you'd probably be better off just implementing Claude Code's "skill" pattern).

But I decided to keep the thrust of that section clearer. My argument is: MCP is a sideshow.

robot-wrangler•33m ago

I still don't really get it, but would like to hear more. Just to get it out of the way, there are obvious bad aspects. Re: press coverage, everything in AI is bound to be frustrating this way. The MCP ecosystem is currently still a lot of garbage. It feels like a very shitty app-store, lots of abandonware, things that are shipped without testing, the usual band-wagoning. For example, instead of a single obvious RAG tool there are 200 different specific tools for ${language} docs.

The core MCP tech though is not only directionally correct, but even the implementation seems to have made lots of good and forward-looking choices, even if those are still under-utilized. For example besides tools, it allows for sharing prompts/resources between agents. In time, I'm also expecting the idea of "many agents, one generic model in the background" is going to die off. For both costs and performance, agents will use special-purpose models but they still need a place and a way to collaborate. If some agents coordinate other agents, how do they talk? AFAIK without MCP the answer for this would be.. do all your work in the same framework and language, or to give all agents access to the same database or the same filesystem, reinventing ad-hoc protocols and comms for every system.

8note•4m ago

i treat MCP as a shorthand for "schema + documentation, passed to the LLM as context"

you dont need the MCP implementation, but the idea is useful and you can consider the tradeoffs to your context window, vs passing in the manual as fine tuning or something.

solomonb•1h ago

This work predates agents as we know them now and was intended for building chat bots (as in IRC chat bots), but when Auto-GPT came out I realized I could formalize it super nicely with this library:

https://blog.cofree.coffee/2025-03-05-chat-bots-revisited/

I did some light integration experiments with the OpenAI API but I never got around to building a full agent. Alas..

vkou•56m ago

> It’s Incredibly Easy

    client = OpenAI()
    context_good, context_bad = [{
        "role": "system", "content": "you're Alph and you only tell the truth"
    }], [{
        "role": "system", "content": "you're Ralph and you only tell lies"
    }]
    ...

And this will work great until next week's update when Ralph responses will consist of "I'm sorry, it would be unethical for me to respond with lies, unless you pay for the Premium-Super-Deluxe subscription, only available to state actors and firms with a six-figure contract."

You're building on quicksand.

You're delegating everything important to someone who has no responsibility to you.

nowittyusername•50m ago

I agree with the sentiment but I also recommend you build a local-only agent. Something that runs on llama.cpp or vllm, whatever... This way you can better grasp the more fundamental nature of what LLMs really are and how they work under the hood. That experience will also make you realize how much control you are giving up when using cloud-based API providers like OpenAI and why so many engineers feel that LLMs are a "black box". Well duh buddy, you've been working with APIs this whole time, of course you won't understand much working just with that.

8note•13m ago

ive been trying this for a few weeks, but i dont at all currently own hardware good enough to be useful for local inference.

ill be trying again once i have written my own agent, but i dont expect to get any useful results compared to using some claude or gemini tokens

nowittyusername•5m ago

My man, we now have llms that are anywhere between 130 million to 1 trillion parameters available for us to run locally, I can guarantee there is a model for you there that even your toaster can run. I have a RTX 4090 but for most of my fiddling i use small models like Qwen 3 4b and they work amazing so there's no excuse :P.

zahlman•50m ago

> Imagine what it’ll do if you give it bash. You could find out in less than 10 minutes. Spoiler: you’d be surprisingly close to having a working coding agent.

Okay, but what if I'd prefer not to have to trust a remote service not to send me

    { "output": [ { "type": "function_call", "command": "rm -rf / --no-preserve-root" } ] }

?

tptacek•43m ago

Obviously if you're concerned about that, which is very reasonable, don't run it in an environment where `rm -rf` can cause you a real problem.

awayto•40m ago

Also if you're doing function calls you can just have the command as one response param, and arguments array as another response param. Then just black/white list commands you either don't want to run or which should require a human to say ok.
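A minimal sketch of that gate, assuming the model returns the command and its argument list as two separate fields. The specific command sets and the `gate` helper are hypothetical choices for illustration, not a vetted security policy:

```python
# Commands that may run without asking (hypothetical choices).
ALLOWED = {"ping", "dig", "traceroute", "ps"}
# Commands that require a human to say ok first.
NEEDS_APPROVAL = {"rm", "curl"}


def gate(command, arguments):
    """Decide what to do with a requested call: 'run', 'ask', or 'deny'."""
    if not isinstance(arguments, list) or not all(isinstance(a, str) for a in arguments):
        return "deny"  # malformed arguments never run
    if command in NEEDS_APPROVAL:
        return "ask"
    if command in ALLOWED:
        return "run"
    return "deny"  # default-deny anything not listed


decisions = [
    gate("ping", ["-c", "1", "example.com"]),
    gate("rm", ["-rf", "/", "--no-preserve-root"]),
    gate("nc", ["-l", "4444"]),
]
```

Keeping the arguments as a list means the command can be executed as an argv array instead of being pasted into a shell string, which sidesteps shell-injection tricks but not dangerous flags.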

worldsayshi•33m ago

There are MCP-configured virtualization solutions that are supposed to be safe for letting an LLM go wild. Like this one:

https://github.com/zerocore-ai/microsandbox

I haven't tried it.

awayto•17m ago

You can build your agent into a docker image then easily limit both networking and file system scope.

    # Makefile recipe: $(...) and $$0 are Make syntax, not plain shell.
    # -v          restrict file system to whatever folder
    # --dns       point DNS at localhost, so other hostnames do not resolve
    # --add-host  allow outside networking to whatever API your agent calls
    docker run -it --rm \
      -e SOME_API_KEY="$(SOME_API_KEY)" \
      -v "$(shell pwd):/app" \
      --dns=127.0.0.1 \
      $(shell dig +short llm.provider.com 2>/dev/null | awk '{printf " --add-host=llm.provider.com:%s", $$0}') \
      my-agent-image

Probably could be a bit cleaner, but it worked for me.

dagss•49m ago

I realize now what I need in Cursor: A button for "fork context".

I believe that would be a powerful tool solving many things there are now separate techniques for.

all2•20m ago

crush-cli has this. I think the google gemini chat app also has this now.

ericd•45m ago

Absolutely, especially the part about just rolling your own alternative to Claude Code - build your own lightsaber. Having your coding agent improve itself is a pretty magical experience. And then you can trivially swap in whatever model you want (Cerebras is crazy fast, for example, which makes a big difference for these many-turn tool call conversations with big lumps of context, though gpt-oss 120b is obviously not as good as one of the frontier models). Add note-taking/memory, and ask it to remember key facts to that. Add voice transcription so that you can reply much faster (LLMs are amazing at taking in imperfect transcriptions and understanding what you meant). Each of these things takes on the order of a few minutes, and it's super fun.

anonym29•22m ago

Cerebras now has glm 4.6. Still obscenely fast, and now obscenely smart, too.

threecheese•27m ago

Does anyone have an understanding - or intuition - of what the agentic loop looks like in the popular coding agents? Is it purely a “while 1: call_llm(system, assistant)”, or is there complex orchestration?

I’m trying to understand if the value for Claude Code (for example) is purely in Sonnet/Haiku + the tool system prompt, or if there’s more secret sauce - beyond the “sugar” of instruction file inclusion via commands, tools, skills etc.

CraftThatBlock•24m ago

Generally, that's pretty much it. More advanced tools like Claude Code will also have context compaction (which sometimes isn't very good), or possibly RAG on code (unsure about this, I haven't used any tools that did this). Context compaction, to my understanding, is just passing all the previous context into a call which summarizes it, then that becomes the new context starting point.
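A hedged sketch of that compaction step, not how any particular tool implements it. The `summarize` callable stands in for the extra LLM call, and the choices to preserve system messages and keep the last few turns verbatim are assumptions made for the example:

```python
def compact(context, summarize, keep_last=2):
    """Replace older messages with one summary message; keep recent turns."""
    system = [m for m in context if m["role"] == "system"]
    rest = [m for m in context if m["role"] != "system"]
    split = len(rest) - keep_last
    if split <= 0:
        return context  # nothing old enough to compact
    old, recent = rest[:split], rest[split:]
    summary = {
        "role": "user",
        "content": "Summary of earlier conversation: " + summarize(old),
    }
    return system + [summary] + recent


def fake_summarize(messages):
    # Stand-in for a real summarization call (hypothetical).
    return f"{len(messages)} earlier messages"


context = [{"role": "system", "content": "you are a coding agent"}]
context += [{"role": "user", "content": f"turn {i}"} for i in range(6)]
compacted = compact(context, fake_summarize)
```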

colonCapitalDee•24m ago

I thought this was informative: https://minusx.ai/blog/decoding-claude-code/

mrkurt•4m ago

Claude Code is an obfuscated JavaScript app. You can point Claude Code at its own package and it will pretty reliably tell you how it works.

I think Claude Code's magic is that Anthropic is happy to burn tokens. The loop itself is not all that interesting.

What is interesting is how they manage the context window over a long chat. And I think a fair amount of that is serverside.

fsndz•13m ago

I did that, burned 2.6B tokens in the process and learned a lot: https://transitions.substack.com/p/what-burning-26-billion-p...
