
Components of a Coding Agent

https://magazine.sebastianraschka.com/p/components-of-a-coding-agent
59•MindGods•4h ago

Comments

armcat•2h ago
I still find it incredible the power that was unleashed by surrounding an LLM with a simple state machine and giving it access to bash
esafak•1h ago
Tools gave humans the edge over other animals.
Yokohiii•11m ago
And those tools regularly burnt cities to ashes. Took a long time to get it under control.
stanleykm•1h ago
unfortunately all the agent cli makers have decided that simply giving it access to bash is not enough. instead we need to jam every possible functionality we can imagine into a javascript “TUI”.
HarHarVeryFunny•42m ago
If all you want is a program that calls the model in a loop and offers a bash tool, then ask Claude Code to build that. You won't like it though!

For a preview of what it'd be like, just tell your AI chat app that you'll run bash commands for it, and please change the app in your "current directory" to "sort the output before printing it", or some such request.
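A minimal version of that "model in a loop with a bash tool" really is just a couple dozen lines. The sketch below stubs out the LLM call with a canned function so it runs standalone; `fake_model` and the message format are placeholders for whatever API a real agent would use:

```python
import subprocess

def fake_model(messages):
    # Stand-in for a real LLM API call. A real agent would send `messages`
    # to a model endpoint; here we script one tool call, then an answer.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "bash", "command": "echo hello"}
    return {"answer": messages[-1]["content"].strip()}

def run_agent(user_request, model=fake_model, max_steps=5):
    # The whole "simple state machine": call the model, execute any tool
    # call it makes, append the result to the context, repeat.
    messages = [{"role": "user", "content": user_request}]
    for _ in range(max_steps):
        reply = model(messages)
        if "tool" in reply:
            result = subprocess.run(reply["command"], shell=True,
                                    capture_output=True, text=True)
            messages.append({"role": "tool",
                             "content": result.stdout + result.stderr})
        else:
            return reply["answer"]
    return None  # gave up after max_steps

print(run_agent("say hello"))  # prints: hello
```

Everything a production agent adds (permission prompts, context compaction, the TUI) sits around this loop rather than inside it.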

Yokohiii•13m ago
I think you got him wrong? He is already concerned about "bash on steroids", and current tools add concerning amounts of steroids to everything.
stanleykm•10m ago
i did... and that's what i use. obviously it's a little more than just a tool that calls bash, but it is considerably less than whatever they are doing in coding agents now.
senko•9m ago
Claude Code with Opus 4.6 regularly uses sed for multi-line edits, in my experience. On top of that, Pi famously exposes only 4 tools, which is not just Bash, but far more constrained than CC's 57 or so tools.

So, yes, it can work.
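For the curious, the kind of multi-line edit a model can express through nothing but a bash tool looks like this (GNU sed's change command; the file path and contents are made up for illustration):

```shell
# Four-line input file.
printf 'alpha\nbeta\ngamma\ndelta\n' > /tmp/sed_demo.txt

# GNU sed's change command: replace lines 2-3 with a single new line,
# a multi-line edit done entirely through a bash tool.
sed -i '2,3c\PATCHED' /tmp/sed_demo.txt

cat /tmp/sed_demo.txt   # alpha / PATCHED / delta
```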

HarHarVeryFunny•1h ago
At its heart, it's prompt/context engineering. The model has a lot of knowledge baked into it, but how do you get it out (and make it actionable for a semi-autonomous agent)? ... you craft the context to guide generation and maintain state (still interacting with a stateless LLM), and provide (as part of context) skills/tools to "narrow" model output into tool calls to inspect and modify the code base.

I suspect that more could be done in terms of translating semi-naive user requests into the steps that a senior developer would take to enact them, maybe including the tools needed to do so.

It's interesting that the author believes the best open-source models may already be good enough to compete with the best closed-source ones, given an optimized agent and maybe a bit of fine-tuning. I guess the bar isn't really matching the SOTA model, but getting close to competent human level - it's a fixed bar, not a moving one. Adding more developer expertise by having the agent translate/augment the user's request/intent into execution steps would certainly seem to have potential to lower the bar of what the model needs to be capable of one-shotting from the raw prompt.
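That "narrowing into tool calls" comes down to what goes into the context and how strictly the output is parsed. A hypothetical sketch (the tool names, JSON shape, and helper functions here are illustrative, not any particular vendor's protocol):

```python
import json

# Hypothetical tool list exposed to the model.
TOOLS = [{"name": "read_file", "params": {"path": "string"}},
         {"name": "bash", "params": {"command": "string"}}]

def build_context(user_request, history):
    # The system prompt both exposes the tools and constrains the output
    # format, steering a stateless model toward parseable tool calls.
    system = ("Reply either with plain text, or with JSON "
              '{"tool": ..., "args": {...}} using one of these tools:\n'
              + json.dumps(TOOLS))
    return ([{"role": "system", "content": system}]
            + history
            + [{"role": "user", "content": user_request}])

def parse_tool_call(reply_text):
    # Returns (tool, args) if the model emitted a tool call, else None.
    try:
        obj = json.loads(reply_text)
        return obj["tool"], obj.get("args", {})
    except (json.JSONDecodeError, KeyError, TypeError):
        return None
```

The same context is rebuilt on every call, which is what "maintaining state with a stateless LLM" means in practice.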

Yokohiii•1h ago
That is why I am currently looking into building my own simple, heavily isolated coding agent. The bloat is already scary, but the bad decisions should make everyone shiver. Ten years ago people would rant endlessly about things with more than one edge that require a glimpse of responsibility to use. Now everyone seems to be either in panic or hype mode, ignoring all good advice just to stay somehow relevant in a chaotic timeline.
MrScruff•1h ago
> This is speculative, but I suspect that if we dropped one of the latest, most capable open-weight LLMs, such as GLM-5, into a similar harness, it could likely perform on par with GPT-5.4 in Codex or Claude Opus 4.6 in Claude Code.

Unless I'm misunderstanding what's being described here, running Claude Code with different backend models is pretty common.

https://docs.z.ai/scenario-example/develop-tools/claude

It doesn't perform on par with Anthropic's models in my experience.

kamikazeturtles•1h ago
> It doesn't perform on par with Anthropic's models in my experience.

Why do you think that is? Are Anthropic's models just better, or do they train the models to somehow work better with the harness?

MrScruff•1h ago
It's a good question, I've wondered that myself. I haven't used GLM-5 with CC but I've used GLM-4.7 a fair amount, often swapping back and forth with Sonnet/Opus. The difference is fairly obvious - on occasion I've mistakenly left GLM enabled when I thought I was using Sonnet, and could tell pretty quickly just from the gap in problem-solving ability.
mmargenot•1h ago
It is more common now to improve models in agentic systems "in the loop" with reinforcement learning. Anthropic is [very likely] doing this in the backend to systematically improve the performance of their models specifically with their tools. I've done this with Goose at Block with more classic post-training approaches because it was before RL really hit the mainstream as an approach for this.

If you want to look at some of the tooling and process for this, check out verifiers (https://github.com/PrimeIntellect-ai/verifiers), hermes (https://github.com/nousresearch/hermes-agent) and accompanying trace datasets (https://huggingface.co/datasets/kai-os/carnice-glm5-hermes-t...), and other open source tools and harnesses.

esafak•1h ago
They're just dumber. I've used plenty of models. The harness is not nearly as important.
vidarh•12m ago
If anything, the harness matters more with those other models because of how much dumber they are... You can compensate for some of the stupidity (but by no means all) with harnesses that try to work around it in ways that e.g. Claude Code does not, because it isn't necessary for Anthropic's own models.
crustycoder•1h ago
A timely link - I've just spent the last week failing to get a ChatGPT Skill to produce a reproducible management-reporting workflow. I've figured out why, and this article pretty much confirms my conclusions about the strengths & weaknesses of "pure" LLMs, and how to work around them. This article is for a slightly different problem domain, but the general problems and the architecture needed to address them seem very similar.
beshrkayali•1h ago
> long contexts are still expensive and can also introduce additional noise (if there is a lot of irrelevant info)

I think spec-driven generation is the antithesis of chat-style coding for this reason. With tools like Claude Code, you are the one tracking what was already built, what interfaces exist, and why something was generated a certain way.

I built Ossature[1] around the opposite model. You write specs describing behavior, it audits them for gaps and contradictions before any code is written, then produces a build-plan TOML where each task declares exactly which spec sections and upstream files it needs. The LLM never sees more than that, and there is no accumulated conversation history to drift from. Every prompt and response is saved to disk, so traceability is built in rather than something you reconstruct by scrolling back through a chat. I used it over the last couple of days to build a CHIP-8 emulator entirely from specs[2]. I have some more example projects on GitHub[3].

1: https://github.com/ossature/ossature

2: https://github.com/beshrkayali/chomp8

3: https://github.com/ossature/ossature-examples

Yokohiii•32m ago
I like it a lot. I find the chat-driven workflow very tiring, and a lot of information gets lost in translation until LLMs just refuse to be useful.

How does the human intervention work out? Do you use a mix of spec and audit editing to get into the ready-to-generate state? How high is the success/error rate when you generate from tasks to code? Do LLMs forget/mess things up, or does it feel better?

The spec driven approach is potentially better for writing things from scratch, do you have any plans for existing code?

peterm4•14m ago
This looks great, and I’ve bookmarked to give it a go.

Any reason you’ve opted for custom markdown formats with the @ syntax rather than using something like frontmatter?

Very conscious that this would prevent any markdown rendering in github etc.

Yokohiii•1h ago
The example is really lean and straightforward. I don't use coding agents, but this is a good overview and should help everyone understand that coding agents may have sophisticated outcomes, but the raw interaction isn't magical at all.

It's also a good example that you can turn any useful code component that requires 1k LOC into a mess of 500k LOC.

Show HN: A game where you build a GPU

https://jaso1024.com/mvidia/
98•Jaso1024•1h ago•16 comments

12,000 AI-generated blog posts added in a single commit

https://github.com/OneUptime/blog/commit/30cd2384794c897d95aca77d173db44af51ca849
64•noslop•1h ago•36 comments

Simple self-distillation improves code generation

https://arxiv.org/abs/2604.01193
368•Anon84•7h ago•112 comments

Show HN: TurboQuant-WASM – Google's vector quantization in the browser

https://github.com/teamchong/turboquant-wasm
52•teamchong•3h ago•2 comments

Tell HN: Anthropic no longer allowing Claude Code subscriptions to use OpenClaw

935•firloop•19h ago•722 comments

Some Unusual Trees

https://thoughts.wyounas.com/p/some-unusual-trees
172•simplegeek•8h ago•52 comments

Apple approves driver that lets Nvidia eGPUs work with Arm Macs

https://www.theverge.com/tech/907003/apple-approves-driver-that-lets-nvidia-egpus-work-with-arm-macs
79•naves•1h ago•16 comments

Show HN: sllm – Split a GPU node with other developers, unlimited tokens

https://sllm.cloud
32•jrandolf•2h ago•21 comments

Author of "Careless People" banned from saying anything negative about Meta

https://www.thetimes.com/uk/technology-uk/article/sarah-wynn-williams-careless-people-meta-nrffdfpmf
350•macleginn•3h ago•246 comments

Components of a Coding Agent

https://magazine.sebastianraschka.com/p/components-of-a-coding-agent
59•MindGods•4h ago•21 comments

Artemis II crew take “spectacular” image of Earth

https://www.bbc.com/news/articles/ce8jzr423p9o
947•andsoitis•22h ago•319 comments

Iran's Network of Cameras Bolsters Air Defenses, Expert Says

https://www.wsj.com/livecoverage/iran-war-news-2026/card/iran-s-network-of-cameras-bolsters-air-d...
19•uxhacker•49m ago•0 comments

The Indie Internet Index – submit your favorite sites

https://iii.social
17•freshman_dev•3h ago•1 comment

The Cathedral, the Bazaar, and the Winchester Mystery House

https://www.dbreunig.com/2026/03/26/winchester-mystery-house.html
107•dbreunig•3d ago•43 comments

Training mRNA Language Models Across 25 Species for $165

54•maziyar•2d ago•17 comments

Electrical Transformer Manufacturing Is Throttling the Electrified Future

https://www.bloomberg.com/features/2025-bottlenecks-transformers/
17•toomuchtodo•2d ago•4 comments

Mbodi AI (YC P25) Is Hiring

https://www.ycombinator.com/companies/mbodi-ai/jobs/mf9L3sy-senior-robotics-engineer-systems-cont...
1•chitianhao•5h ago

Claude Code Found a Linux Vulnerability Hidden for 23 Years

https://mtlynch.io/claude-code-found-linux-vulnerability/
239•eichin•18h ago•149 comments

What life looks like on the most remote inhabited island

https://apps.npr.org/life-on-tristan-da-cunha/
31•brightbeige•1h ago•5 comments

The most-disliked people in the publishing industry

https://www.woman-of-letters.com/p/the-most-disliked-people-in-the-publishing
69•Caiero•3d ago•23 comments

OpenClaw privilege escalation vulnerability

https://nvd.nist.gov/vuln/detail/CVE-2026-33579
483•kykeonaut•1d ago•222 comments

iNaturalist

https://www.inaturalist.org/
498•bookofjoe•1d ago•118 comments

When legal sports betting surges, so do Americans' financial problems

https://www.npr.org/2026/04/04/nx-s1-5773354/legal-sports-betting-research-credit-bankruptcy
30•pseudolus•2h ago•14 comments

Herbie: Automatically improve imprecise floating point formulas

https://herbie.uwplse.org/doc/latest/tutorial.html
183•summarity•4d ago•33 comments

Run Linux containers on Android, no root required

https://github.com/ExTV/Podroid
196•politelemon•19h ago•70 comments

Astronomers Find a Third Galaxy Missing Its Dark Matter

https://www.universetoday.com/articles/astronomers-find-a-third-galaxy-missing-its-dark-matter-va...
8•gostsamo•41m ago•1 comment

Why the Most Valuable Things You Know Are Things You Cannot Say

https://deadneurons.substack.com/p/why-the-most-valuable-things-you
9•nr378•1h ago•1 comment

The smallest ELF executable (2021)

https://nathanotterness.com/2021/10/tiny_elf_modernized.html
31•michelangelo•3d ago•1 comment

Improving my focus by giving up my big monitor

https://ounapuu.ee/posts/2026/04/01/focus/
164•Fudgel•3d ago•160 comments

We replaced RAG with a virtual filesystem for our AI documentation assistant

https://www.mintlify.com/blog/how-we-built-a-virtual-filesystem-for-our-assistant
375•denssumesh•1d ago•144 comments