Components of a Coding Agent

https://magazine.sebastianraschka.com/p/components-of-a-coding-agent

59•MindGods•4h ago

Comments

armcat•2h ago

I still find it incredible at the power that was unleashed by surrounding an LLM with a simple state machine, and giving it access to bash

esafak•1h ago

Tools gave humans the edge over other animals.

Yokohiii•11m ago

And those tools regularly burnt cities to ashes. Took a long time to get it under control.

stanleykm•1h ago

unfortunately all the agent cli makers have decided that simply giving it access to bash is not enough. instead we need to jam every possible functionality we can imagine into a javascript “TUI”.

HarHarVeryFunny•42m ago

If all you want is a program that calls the model in a loop and offers a bash tool, then ask Claude Code to build that. You won't like it though!

For a preview of what it'd be like, just tell your AI chat app that you'll run bash commands for it, and please change the app in your "current directory" to "sort the output before printing it", or some such request.

Yokohiii•13m ago

I think you get him wrong? He is already concerned about "bash on steroids" and current tools add concerning amounts of steroids to everything.

stanleykm•10m ago

i did.. and thats what i use. obviously its a little more than just a tool that calls bash but it is considerably less than whatever they are doing in coding agents now.

senko•9m ago

Claude Code with Opus 4.6 regularly uses sed for multi-line edits, in my experience. On top of it, Pi is famously only exposing 4 tools, which is not just Bash, but far more constrained than CCs 57 or so tools.

So, yes, it can work.

HarHarVeryFunny•1h ago

At it's heart it's prompt/context engineering. The model has a lot of knowledge baked into it, but how do you get it out (and make it actionable for a semi-autonomous agent)? ... you craft the context to guide generation and maintain state (still interacting with a stateless LLM), and provide (as part of context) skills/tools to "narrow" model output into tool calls to inspect and modify the code base.

I suspect that more could be done in terms of translating semi-naive user requests into the steps that a senior developer would take to enact them, maybe including the tools needed to do so.

It's interesting that the author believes that the best open source models may already be good enough to complete with the best closed source ones with an optimized agent and maybe a bit of fine tuning. I guess the bar isn't really being able to match the SOTA model, but being close to competent human level - it's a fixed bar, not a moving one. Adding more developer expertise by having the agent translate/augment the users request/intent into execution steps would certainly seem to have potential to lower the bar of what the model needs to be capable of one-shotting from the raw prompt.

Yokohiii•1h ago

That is why I am currently looking into building my own simple, heavily isolated coding agent. The bloat is already scary, but the bad decisions should make everyone shiver. Ten years ago people would rant endlessly about things with more then one edge, that requires a glimpse of responsibility to use. Now everyone seems to be either in panic or hype mode, ignoring all good advice just to stay somehow relevant in a chaotic timeline.

MrScruff•1h ago

> This is speculative, but I suspect that if we dropped one of the latest, most capable open-weight LLMs, such as GLM-5, into a similar harness, it could likely perform on par with GPT-5.4 in Codex or Claude Opus 4.6 in Claude Code.

Unless I'm misunderstanding what's being described here, running Claude Code with different backend models is pretty common.

https://docs.z.ai/scenario-example/develop-tools/claude

It doesn't perform on par with Anthropic's models in my experience.

kamikazeturtles•1h ago

> It doesn't perform on par with Anthropic's models in my experience.

Why do you think that is the case? Is Anthropic's models just better or do they train the models to somehow work better with the harness?

MrScruff•1h ago

It's a good question, I've wondered that myself. I haven't used GLM-5 with CC but I've used GLM-4.7 a fair amount, often swapping back and forth with Sonnet/Opus. The difference is fairly obvious - on occasions I've mistakenly left GLM enabled running when I thought I was using Sonnet, and could tell pretty quickly just based on the gap in problem solving ability.

mmargenot•1h ago

It is more common now to improve models in agentic systems "in the loop" with reinforcement learning. Anthropic is [very likely] doing this in the backend to systematically improve the performance of their models specifically with their tools. I've done this with Goose at Block with more classic post-training approaches because it was before RL really hit the mainstream as an approach for this.

If you want to look at some of the tooling and process for this, check out verifiers (https://github.com/PrimeIntellect-ai/verifiers), hermes (https://github.com/nousresearch/hermes-agent) and accompanying trace datasets (https://huggingface.co/datasets/kai-os/carnice-glm5-hermes-t...), and other open source tools and harnesses.

esafak•1h ago

They're just dumber. I've used plenty of models. The harness is not nearly as important.

vidarh•12m ago

The harness if anything matters more with those other models because of how much dumber they are... You can compensate for some of the stupidity (but by no means all) with harnesses that tries to compensate in ways that e.g. Claude Code does not because it isn't necessary to do so for Anthropics own models.

crustycoder•1h ago

A timely link - I've just spent the last week failing to get a ChatGPT Skill to produce a reproducible management reporting workflow. I've figured out why and this article pretty much confirms my conclusions about the strengths & weaknesses of "pure" LLMS, and how to work around them. This article is for a slightly different problem domain, but the general problems and architecture needed to address them seem very similar.

beshrkayali•1h ago

> long contexts are still expensive and can also introduce additional noise (if there is a lot of irrelevant info)

I think spec-driven generation is the antithesis of chat-style coding for this reason. With tools like Claude Code, you are the one tracking what was already built, what interfaces exist, and why something was generated a certain way.

I built Ossature[1] around the opposite model. You write specs describing behavior, it audits them for gaps and contradictions before any code is written, then produces a build plan toml where each task declares exactly which spec sections and upstream files it needs. The LLM never sees more than that, and there is no accumulated conversation history to drift from. Every prompt and response is saved to disk, so traceability is built in rather than something you reconstruct by scrolling back through a chat. I used it over the last couple of days to build a CHIP-8 emulator entirely from specs[2]. I have some more example projects on GitHub[3]

1: https://github.com/ossature/ossature

2: https://github.com/beshrkayali/chomp8

3: https://github.com/ossature/ossature-examples

Yokohiii•32m ago

I like it a lot, I find the chat driven workflow very tiring and a lot of information gets lost in translation until LLMs just refuse to be useful.

How does the human intervention work out? Do you use a mix of spec and audit editing to get into the ready to generate state? How high is the success/error rate if you generate from tasks to code, do LLMs forget/mess up things or does it feel better?

The spec driven approach is potentially better for writing things from scratch, do you have any plans for existing code?

peterm4•14m ago

This looks great, and I’ve bookmarked to give it a go.

Any reason you’ve opted for custom markdown formats with the @ syntax rather than using something like frontmatter?

Very conscious that this would prevent any markdown rendering in github etc.

Yokohiii•1h ago

The example is really lean and straightforward. I don't use coding agents, but this is some good overview and should help everyone to understand that coding agents may have sophisticated outcomes, but the raw interaction isn't magical at all.

It's also a good example that you can turn any useful code component that requires 1k LOC into a mess of 500k LOC.

Show HN: A game where you build a GPU

12,000 AI-generated blog posts added in a single commit

Simple self-distillation improves code generation

Show HN: TurboQuant-WASM – Google's vector quantization in the browser

Tell HN: Anthropic no longer allowing Claude Code subscriptions to use OpenClaw

Some Unusual Trees

Apple approves driver that lets Nvidia eGPUs work with Arm Macs

Show HN: sllm – Split a GPU node with other developers, unlimited tokens

Author of "Careless People" banned from saying anything negative about Meta

Components of a Coding Agent

Artemis II crew take “spectacular” image of Earth

Iran's Network of Cameras Bolsters Air Defenses, Expert Says

The Indie Internet Index – submit your favorite sites

The Cathedral, the Bazaar, and the Winchester Mystery House

Training mRNA Language Models Across 25 Species for $165

Electrical Transformer Manufacturing Is Throttling the Electrified Future

Mbodi AI (YC P25) Is Hiring

Claude Code Found a Linux Vulnerability Hidden for 23 Years

What life looks like on the most remote inhabited island

The most-disliked people in the publishing industry

OpenClaw privilege escalation vulnerability

iNaturalist

When legal sports betting surges, so do Americans' financial problems

Herbie: Automatically improve imprecise floating point formulas

Run Linux containers on Android, no root required

Astronomers Find a Third Galaxy Missing Its Dark Matter

Why the Most Valuable Things You Know Are Things You Cannot Say

The smallest ELF executable (2021)

Improving my focus by giving up my big monitor

We replaced RAG with a virtual filesystem for our AI documentation assistant

Components of a Coding Agent

Comments

Show HN: A game where you build a GPU

12,000 AI-generated blog posts added in a single commit

Simple self-distillation improves code generation

Show HN: TurboQuant-WASM – Google's vector quantization in the browser

Tell HN: Anthropic no longer allowing Claude Code subscriptions to use OpenClaw

Some Unusual Trees

Apple approves driver that lets Nvidia eGPUs work with Arm Macs

Show HN: sllm – Split a GPU node with other developers, unlimited tokens

Author of "Careless People" banned from saying anything negative about Meta

Components of a Coding Agent

Artemis II crew take “spectacular” image of Earth

Iran's Network of Cameras Bolsters Air Defenses, Expert Says

The Indie Internet Index – submit your favorite sites

The Cathedral, the Bazaar, and the Winchester Mystery House

Training mRNA Language Models Across 25 Species for $165

Electrical Transformer Manufacturing Is Throttling the Electrified Future

Mbodi AI (YC P25) Is Hiring

Claude Code Found a Linux Vulnerability Hidden for 23 Years

What life looks like on the most remote inhabited island

The most-disliked people in the publishing industry

OpenClaw privilege escalation vulnerability

iNaturalist

When legal sports betting surges, so do Americans' financial problems

Herbie: Automatically improve imprecise floating point formulas

Run Linux containers on Android, no root required

Astronomers Find a Third Galaxy Missing Its Dark Matter

Why the Most Valuable Things You Know Are Things You Cannot Say

The smallest ELF executable (2021)

Improving my focus by giving up my big monitor

We replaced RAG with a virtual filesystem for our AI documentation assistant