I also like the way they distinguish between "agents" and "workflows", and describe a bunch of useful workflow patterns.
I published some notes on that article when it first came out: https://simonwillison.net/2024/Dec/20/building-effective-age...
A more recent article from Anthropic is https://www.anthropic.com/engineering/built-multi-agent-rese... - "How we built our multi-agent research system". I found this one fascinating, I wrote up a bunch of notes on it here: https://simonwillison.net/2025/Jun/14/multi-agent-research-s...
And then, when you actually do need agents, don’t overcomplicate it!
This post also introduced the concept of an Augmented LLM (an LLM hooked up to tools, memory, and data), which is a useful abstraction for evolving LLM use beyond fancy autocomplete.
“An augmented LLM running in a loop” is the best definition of an agent I’ve heard so far.
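Roughly, that loop looks like this (a minimal sketch only; call_llm here is a hypothetical placeholder for whichever vendor SDK you use, not a real API):

    def run_agent(task: str, tools: dict) -> str:
        # An augmented LLM: model + tool definitions + conversation memory, in a loop.
        messages = [{"role": "user", "content": task}]
        while True:
            reply = call_llm(messages, tools)    # hypothetical: send history + tool schemas
            messages.append(reply.as_message())  # keep the model's turn in memory
            if not reply.tool_calls:             # no tool requested: the agent is done
                return reply.text
            for call in reply.tool_calls:        # otherwise run each requested tool
                result = tools[call.name](**call.arguments)
                messages.append({"role": "tool", "content": str(result)})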
I use Cloudflare's Durable Objects (disclaimer: I'm biased, I work on MCP + Agent things @ Cloudflare). However, I figure building agents probably maps similarly well onto any actor-style framework.
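For anyone who hasn't used an actor framework: the mapping is basically one long-lived object per agent, with its own state and a mailbox that serializes incoming messages. A toy Python sketch of that shape (this is not the Durable Objects API, just an illustration):

    import asyncio

    class AgentActor:
        # One actor per agent: private state plus a mailbox processed one message at a time.
        def __init__(self):
            self.state = {"history": []}
            self.mailbox: asyncio.Queue = asyncio.Queue()

        async def run(self):
            while True:
                message = await self.mailbox.get()   # serialized access to this agent's state
                self.state["history"].append(message)
                await self.handle(message)           # e.g. one step of an LLM loop

        async def handle(self, message):
            print("agent received:", message)        # stand-in for real agent logic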
A decentralized thing would be more for individuals who want more control and transparency. A decentralized public ledger would make it possible to verify that your agent, the agents it interacts with, and the contents of their interactions have not been altered or compromised in any way, whereas a corporate-owned framework could not provide the same level of assurance.
But technically, there's no advantage I can think of for using a public distributed ledger to manage interactions. Agent tasks are pretty ephemeral, so unlike digital currency, there's not really a need to maintain a complete historical log of every action forever. And as far as providing tools for dealing with race conditions, blockchain would be about the least efficient way of creating a mutex imaginable. So technically, just like with non-AI apps, centralized architecture is always going to be a lot more efficient.
Anthropic are leaning more into multi-agent setups where the parent agent might delegate to one or more sub-agents which might run in parallel. They use that trick for Claude Code - I have some notes on reverse-engineering that here https://simonwillison.net/2025/Jun/2/claude-trace/ - and expand on that in their write-up of how Claude Research works: https://simonwillison.net/2025/Jun/14/multi-agent-research-s...
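The orchestration side of that pattern is pretty small. A hedged sketch (plan_subtasks, research_subtask, and synthesize are hypothetical placeholders; research_subtask would itself be an agent loop):

    import asyncio

    async def run_parent_agent(question: str) -> str:
        subtasks = await plan_subtasks(question)   # parent agent decides how to split the work
        results = await asyncio.gather(            # sub-agents run in parallel
            *(research_subtask(t) for t in subtasks)
        )
        return await synthesize(question, results) # parent merges the findings into one answer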
It's still _very_ early in figuring out good patterns for LLM tool-use - the models only got really great at using tools in about the past 6 months, so there's plenty to be discovered about how best to orchestrate them.
See for example the container-use MCP, which combines both: https://github.com/dagger/container-use
That’s for parallelizing coding work… I’m not sure about other kinds of work. I still see people using workflow builder tools like n8n, Zapier, and maybe CrewAI.
It split them up the way they would be split up in real life, but in real life there is an assumption that people working on tasks are going to communicate with each other. The way it generates tasks resulted in a HUGE loss of context (my plan was hella detailed).
I was willing to spend a few more hours trying to make it work rather than doing the work myself. I opened another chat and split it up into multiple sequential tasks, with a detailed prompt for each task (why, what, how, validation, a reminder to update documentation, etc.).
Anyway, an orchestrator might work on some super simple tasks, much smaller than those articles would have you believe.
Hugging Face's smolagents library makes the LLM generate Python code in which tools are just normal Python functions. If you want parallel tool calls, just prompt the LLM to do so; it should take care of synchronizing everything. Of course there is the whole issue around executing LLM-generated code, but we have a few solutions for that.
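Something like this, if I remember the smolagents docs right (class names may differ between versions, and the tool here is just a stub):

    from smolagents import CodeAgent, HfApiModel, tool

    @tool
    def get_weather(city: str) -> str:
        """Return a short weather report for a city.

        Args:
            city: Name of the city to look up.
        """
        return f"It is sunny in {city}."  # stub; a real tool would call an API

    # The agent writes Python that calls the tools as plain functions,
    # so "look up both cities in parallel" is just a prompt away.
    agent = CodeAgent(tools=[get_weather], model=HfApiModel())
    agent.run("Compare the weather in Paris and Berlin.")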
And then eventually, with enough sample inputs, create simple functions that can recognize which tools should be used to process a given type of input? And only fall back to an LLM agent if the input is novel?
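Something like this, maybe (everything here is hypothetical; weather_tool, billing_tool, and run_agent are stand-ins):

    import re

    def route(user_input: str) -> str:
        # Cheap pattern checks first, LLM agent only for novel inputs.
        if re.search(r"\bweather\b", user_input, re.I):
            return weather_tool(user_input)   # direct tool call, no model involved
        if re.search(r"\b(invoice|receipt)\b", user_input, re.I):
            return billing_tool(user_input)
        return run_agent(user_input)          # fallback: let the LLM agent figure it out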
> We suggest that developers start by using LLM APIs directly
Best advice of the whole article by far.
It's insane that people use whole frameworks to send what is essentially an array of strings to a web service.
We've removed LangChain and LangGraph from our project at work because they are literally worthless: they just add complexity and make you write MORE code than if you didn't use them, because you have to deal with all their boilerplate.
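For comparison, the direct version really is just a few lines. With the Anthropic Python SDK it looks roughly like this (the model name is illustrative; the OpenAI SDK is equally small):

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    response = client.messages.create(
        model="claude-sonnet-4-20250514",   # illustrative model name
        max_tokens=1024,
        messages=[{"role": "user", "content": "Summarize this thread in one sentence."}],
    )
    print(response.content[0].text)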
It's somehow incredibly reassuring that the "do one thing and do it well" maxim has held up over decades. Composability ftw.
Why can’t they just fork swarms of themselves, work 24/7 in parallel, check their work, and keep advancing?
https://news.ycombinator.com/item?id=42470541
Building Effective "Agents", 763 points, 124 comments