frontpage.

Denmark Wants to Dump Microsoft Software for Linux, LibreOffice

https://www.pcmag.com/news/denmark-wants-to-dump-microsoft-software-for-linux-libreoffice
1•janandonly•10m ago•1 comments

Formo – data platform for onchain apps

https://formo.so/
1•leventhan•10m ago•0 comments

Your iPhone is about to get uglier

https://www.vox.com/technology/416501/apple-wwdc-announcements-iphone-mac-google-ai-glass
2•lr0•13m ago•0 comments

Gif320: A GIF viewer for DEC VT320 terminals

https://github.com/jmason/gif320
2•fanf2•16m ago•0 comments

If the moon were only 1 pixel: A tediously accurate solar system model

https://joshworth.com/dev/pixelspace/pixelspace_solarsystem.html
1•sdoering•18m ago•1 comments

Agent or Workflow? (Interactive Quiz)

https://agents-vs-workflows.streamlit.app/
2•htahir111•20m ago•1 comments

Iranian Crown Prince in Exile – Interview with Reza Pahlavi [video]

https://www.youtube.com/watch?v=VwWQ3hnJLZQ
1•thomassmith65•24m ago•0 comments

Media Cheat Sheet – Up-to-date social media image and video sizes

https://mediacheatsheet.com/
4•ronaldsvilcins•27m ago•0 comments

Embedded Configurable Operating System (eCos)

https://ecos.sourceware.org/
1•vkaku•28m ago•1 comments

eKilo – A super lightweight Vim alternative.

https://github.com/antonio-foti/ekilo
2•antoniofoti•30m ago•1 comments

Show HN: Open-source AI generated short form content

https://vidgen.smrth.dev
1•smrth•37m ago•1 comments

The Hoodcat Chronicles

https://Static.XtremeOwnage.com/blog/2025/the-hoodcat-chronicles/
1•calcifer•38m ago•0 comments

VS Code 1.101

https://code.visualstudio.com/updates/v1_101
1•tosh•41m ago•0 comments

Ask HN: Stripe alternative for marketplace type business

2•zigcBenx•49m ago•0 comments

ArisInfra Solutions IPO 2025 – Price Band, GMP, Allotment and How to Apply

1•deepmistry•51m ago•1 comments

Bubblewrap: Low-level unprivileged sandboxing tool

https://github.com/containers/bubblewrap
1•tosh•54m ago•0 comments

Show HN: Radicle Desktop, a graphical user interface for Radicle

https://desktop.radicle.xyz/
5•rudolfs•54m ago•0 comments

Apple's Spin on the Personalized Siri Apple Intelligence Reset

https://daringfireball.net/2025/06/apples_spin_on_the_personalized_siri_apple_intelligence_reset
1•srathi•57m ago•0 comments

Scientists rush to stop mirror microbes that could threaten life on Earth

https://www.ft.com/content/3f948a86-90df-426d-a602-9421ee2e43a6
1•KabuseCha•59m ago•2 comments

Quality Standards

https://www.june.kim/quality-standards
3•fside•59m ago•0 comments

How to pronounce schedule in British English

https://youglish.com/pronounce/schedule/english/uk
1•sundarurfriend•1h ago•0 comments

If I Ran Mastodon

https://werd.io/if-i-ran-mastodon/
1•furkansahin•1h ago•0 comments

Opportunities for Digital Health in 2025

https://www.ideo.com/journal/six-opportunities-for-digital-health-in-2025
1•baranguneysel•1h ago•0 comments

Barbie-maker Mattel partners with OpenAI to make AI child's play

https://www.ft.com/content/7a9d0c47-5294-46b7-9d06-181654129d43
1•austinallegro•1h ago•1 comments

Claude Code: Settings

https://docs.anthropic.com/en/docs/claude-code/settings
1•tosh•1h ago•0 comments

Google to reduce Pixel 6A charging performance after fire reports

https://www.notebookcheck.net/Google-to-nerf-Pixel-6a-battery-performance-after-reports-of-phones-catching-fire.1035241.0.html
3•abawany•1h ago•0 comments

Chinese AI Companies Dodge USA Chip Curbs with Flying Suitcases of Hard Drives

https://www.wsj.com/tech/china-ai-chip-curb-suitcases-7c47dab1
1•Ozarkian•1h ago•0 comments

Tesla sues ex-Optimus engineer alleging theft of robotic trade secrets

https://fortune.com/2025/06/12/tesla-sues-ex-optimus-engineer-robot-trade-secrets/
2•01-_-•1h ago•0 comments

Thousands of Koreans were banned from Instagram this week. I was one of them

https://koreajoongangdaily.joins.com/news/2025-06-11/business/tech/Thousands-of-Koreans-got-banned-from-Instagram-this-week-I-was-one-of-them/2327850
4•01-_-•1h ago•0 comments

Xian: A Native Python Blockchain

https://xian.org/
2•crosschainer•1h ago•0 comments

Agentic Coding Recommendations

https://lucumr.pocoo.org/2025/6/12/agentic-coding/
259•rednafi•23h ago

Comments

benob•23h ago
> This is not an advertisement for Claude Code. It's just the agent I use at the moment. What else is there? Alternatives that are similar in their user experience are OpenCode, goose, Codex and many others. There is also Devin and Cursor's background agents, but they work a bit differently in that they run in the cloud.

What do you recommend to get a Claude-Code-like experience in the open-source + local LLM ecosystem?

the_mitsuhiko•23h ago
> What do you recommend to get a Claude-Code-like experience in the open-source + local LLM ecosystem?

There is nothing at the moment that I would recommend. However, I'm quite convinced that we will see this soon. First, I quite like where SST's OpenCode is going; the upcoming UX looks really good. Second, having that in place will make it quite easy to drop in local models once they get better. The real issue is that there just aren't enough good models for tool usage yet. Sonnet is so shockingly good because it was trained for excellent tool usage. Even Gemini does not come close yet.

This is all just a question of time though.

hucker•22h ago
Have you tried aider, and if so, how is it lacking compared to Claude Code in your opinion?
the_mitsuhiko•22h ago
I only tried Aider with hosted models, and it's too expensive compared to Claude Code, so I never gave it a proper try.
aitchnyu•22h ago
How is it more expensive?
the_mitsuhiko•22h ago
You pay for Aider with per-token pricing. Claude Code comes with a flat rate that gives you deep discounts.
BeetleB•16h ago
It really depends on the type of coding you plan to do, and how much.

The amusing thing is that people normally recommend Aider to save money. With Aider, you can control the size of the context window and selectively add/drop files from it. I typically aim for under 25K tokens at a time. With Gemini, that's about 3 cents per prompt (and often much less when I have only, say, 10 tokens). So I'd need to do well over 3000 coding prompts a month to get to $100. I simply don't use it that much.
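
As a sanity check on that math (the 25K tokens and ~3 cents per prompt are the figures above; the per-million rate is just backed out from them, not a quoted Gemini price):

    # Back-of-envelope check on the figures above (illustrative only; the
    # per-million rate is derived from the comment's numbers).
    tokens_per_prompt = 25_000
    cost_per_prompt = 0.03                                    # dollars
    implied_rate = cost_per_prompt / tokens_per_prompt * 1e6  # $1.20 per 1M tokens
    prompts_for_100 = 100 / cost_per_prompt                   # ~3,333 prompts/month
    print(implied_rate, prompts_for_100)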

Also, at work, I have Copilot, and one can use Aider with that. So I only pay for my personal coding at home.

Getting back to the original question: Aider probably lags Claude Code significantly at this point. It's a fantastic tool and I still use it, primarily because it is editor agnostic. But some of the other tools out there do a lot more with agents.

To give you an idea - my combined AI use - including for non-code purposes - is well under $20/mo. Under $10 for most months. I simply don't have that much time to do coding in my free time - even with an AI doing it!

CuriouslyC•22h ago
Aider is worth some tinkering for slightly different reasons than Claude Code.

I find agents do a lot of derpy shit on hard problems, but when you've got fairly straightforward things to build, it's nice to just spin them up, let them rip, and walk away.

Aider feels more like pair programming with an agent. It can sort of be spun up and let rip, but mostly it tries to keep a tighter feedback loop with the user and stay more user-directed, which is really powerful when working on challenging things. For stuff like codebase refactors, documentation passes, etc., that tight loop feels like overkill though.

yroc92•23h ago
I’m also interested to hear ideas for this.
CuriouslyC•22h ago
Aider is almost there; in fact, it's intentionally "not" there. You can set it up to do things like run tests/static analysis automatically and fix errors, and work with it to get a to-do list set up so the entire project is spec'd out, then just keep prompting it with "continue...". It has a hard-coded reflection limit of 3 iterations right now, but that can also be hacked to whatever you want. The only thing missing for full agentic behavior is built-in self-prompting behavior.
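
For what it's worth, Aider also exposes a scripting interface, so the "keep prompting with continue" pattern can be automated today; a rough sketch, assuming the documented Coder.create API (model name, file list and the iteration cap are placeholders):

    # Rough sketch of the "keep prompting with continue" pattern, assuming
    # aider's documented scripting API; model, files and cap are placeholders.
    from aider.coders import Coder
    from aider.models import Model

    coder = Coder.create(main_model=Model("gpt-4o"), fnames=["app.py"])
    coder.run("Work through the to-do list in TODO.md, one item at a time.")
    for _ in range(10):  # crude stand-in for built-in self-prompting
        coder.run("continue...")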
diggan•17h ago
> The only thing missing for full agentic behavior is built-in self-prompting behavior.

Correct me if I'm wrong, but Aider still doesn't do proper tool calling? Last time I tried it, it did things the "old school" way: parsing unix shell commands out of the output text and running them once the response finished streaming, instead of the tool call/response stuff we have today.
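
For contrast, the structured tool-calling shape described here looks roughly like this against an OpenAI-style chat API (a sketch only; the run_shell tool and its handler are invented for illustration, not anything Aider ships):

    # Sketch of structured tool calling vs. parsing commands out of prose.
    import json
    import subprocess
    from openai import OpenAI

    client = OpenAI()
    tools = [{
        "type": "function",
        "function": {
            "name": "run_shell",  # hypothetical tool, for illustration only
            "description": "Run a shell command and return its output",
            "parameters": {
                "type": "object",
                "properties": {"command": {"type": "string"}},
                "required": ["command"],
            },
        },
    }]
    messages = [{"role": "user", "content": "Run the test suite."}]

    resp = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
    msg = resp.choices[0].message
    if msg.tool_calls:
        messages.append(msg)  # the assistant turn that requested the calls
        for call in msg.tool_calls:
            args = json.loads(call.function.arguments)
            result = subprocess.run(args["command"], shell=True,
                                    capture_output=True, text=True)
            messages.append({"role": "tool", "tool_call_id": call.id,
                             "content": result.stdout + result.stderr})
        # ...send `messages` back to the model; repeat until no more tool calls.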

sandinmyjoints•15h ago
I think this is still the case. There are some open issues around this. I am surprised they have not moved forward more. I find Aider hugely useful, but would like the opportunity to try out MCP with it.
CuriouslyC•15h ago
Yeah, tool/mcp integration isn't great with Aider out of the box.
Karrot_Kream•15h ago
There's an open PR for MCP integration (actually 2 PRs but one has more community consensus around it) with a Priority label on it but it hasn't been merged yet. Hopefully soon.
saint_yossarian•22h ago
The Neovim plugin CodeCompanion is currently moving in a more agentic direction; it already supports an auto-submit loop with built-in tools and MCP integration.

Yes, it's not a standalone CLI tool, but IMHO I'd rather have a full editor available at all times, especially one that's so hackable and lightweight.

mickeyp•22h ago
Shameless plug: my upcoming app, perhaps?

A single-file download, fuss-free and install-less, that runs on Mac, Windows and Linux (+ Docker, of course). It can run any model that talks to OpenAI (which is nearly all of them), so it'll work with the big guys' models and of course other ones, like those you run privately or on localhost.

Unlike Claude Code, which is very good, this one runs in your browser, with a local app server doing the heavy lifting. A console app could be written to use this self-same server too, of course (but that's not priority #1), and you get a lot of nice benefits for free from the browser.

One other advantage, vis-a-vis Armin's blog post, is that this one can "peek" into terminals that you _explicitly_ start through the service.

It's presently in closed alpha, but I want to open it up to more people. If you're interested, ping me by email -- see my profile.

elpocko•22h ago
>run any model that talks to OpenAI (which is nearly all of them)

What does that mean? I've never seen any locally run model talk to OpenAI, how and why would they? Do you mean running an inference server that provides an OpenAI-compatible API?

mickeyp•22h ago
Sorry, to clarify: OpenAI has a specification for their API endpoints that most vendors are compatible with or have adopted wholesale.

So, if your model inference server understands the REST API spec that OpenAI created way back, you can use a huge range of libraries that in theory only "work" with OpenAI.
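
In practice that means pointing the official client at a different base URL; a minimal sketch (the port and model name are placeholders for whatever inference server you run locally, e.g. llama.cpp, vLLM or Ollama):

    # Point the official OpenAI client at any OpenAI-compatible server.
    # Base URL and model name are placeholders for your local setup.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused-locally")
    resp = client.chat.completions.create(
        model="my-local-model",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(resp.choices[0].message.content)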

diggan•17h ago
> OpenAI has a specification for their API endpoints that most vendors are compatible with or have adopted wholesale

Worth clarifying that what the ecosystem/vendors have adopted is the "ChatCompletion" endpoint, which most models are served under. But newer models (like codex) are only available under the Responses API, which the ecosystem/vendors haven't adopted as widely, AFAIK.

gk1•20h ago
I see a new alternative (or attempt at one) come out every few days, so it shouldn't be long before we have "the one" alternative.

https://www.app.build/ was just launched by the Neon -- err, Databricks -- team and looks promising.

fvdessen•23h ago
I find it excellent news that all the techniques that make agentic coding more efficient also make human coding more efficient. There was a worry that code would become big mud balls that only AIs understand, but it looks like the opposite: clear code is important for AI productivity, so it now matters even more, because the productivity difference is immediately and objectively measurable. Before AIs, whether code was well factored was largely a matter of opinion. Now you can say: look how much better Claude works on codebase A vs. codebase B, and present your case with numbers.
v5v3•23h ago
"There was a worry that code would become big mud balls that only AI understand, but it looks like the opposite."

For now...

soulofmischief•22h ago
As long as interfaces are well defined, comprehensive tests are written, memory is safely handled and time complexity is analyzable, who cares what the rest of the code looks like.

I understand programming for the sake of programming, chasing purity and really digging into the creative aspects of coding. But I get that same kick out of writing perfect interfaces, knowing that the messier the code underneath is, the more my beautiful interface gets to shine. But transformers are offering us a way to build faster, to create more, and to take on bigger complexity while learning deeply about new domains as we go. I think that if we lean into that, we might enter a software golden age where the potential for creativity and impact can reach a whole new level.

iDont17•18h ago
Yeh, exactly. Code doesn’t matter. Correct and stable electrical states matter.

Energy-based models, and machines that bootstrap from models and organize their state around a prompt, are on their way. The analog hole for coders is closing.

Most software out there is layers of made-up tools for managing and deploying other software. We'll save a lot of cycles pruning it all down to generic patterns.

In 5-10 more years it's all hardware again, and we'll no longer need to program a computer like it's 1970.

bluefirebrand•16h ago
> As long as interfaces are well defined, comprehensive tests are written, memory is safely handled and time complexity is analyzable, who cares what the rest of the code looks like

The thing is, code that does all of the things you listed is good-looking code almost by definition.

If AI were anywhere near capable of producing this quality, it would be thrilling, wouldn't it?

But it's not. The consensus seems pretty universal that AI code is junior-to-intermediate quality at best, the majority of the time.

That generally isn't code that satisfies the quality criteria you listed.

Leynos•16h ago
Do you give the agent a style guide?

Do you perform (or have another agent perform) code reviews on the agent's code?

Do you discuss architecture and approach with the agent beforehand and compile that discussion into a design and development plan?

If you don't do these things, then you're just setting yourself up for failure.

bluefirebrand•16h ago
Ah yes the tried and true "you're holding it wrong"

By the time I did all the stuff you're suggesting, I could just build the damn thing myself

Leynos•14h ago
Would you skip these guardrails when working with a "Junior to Intermediate" developer?
bluefirebrand•13h ago
No, but the goal of mentoring a Junior or Intermediate developer is that they eventually learn this stuff on their own. The value of helping another human grow is worth the tradeoff

AI is a tool, not a human. I'm not about to invest in it the way I would a Junior developer. If the tool doesn't do the job, it's not a good tool. If a tool requires the same level of investment that a human does it's also not a good tool

jacobr1•10h ago
These factors are now getting baked into the tools. Primarily via prompt engineering, and what might be called "agentic design," which is how much complexity you put into a single pass vs some hierarchical layering of agents and tools with distinct jobs.

I've been deobfuscating Claude Code to watch their prompts evolve as I use it, and you can see the difference in how and when it chooses to re-analyze a codebase, or how it will explicitly break up work into steps. A lot of the implicit knowledge of software engineering is being added _outside_ of the LLM training.

soulofmischief•9h ago
But you are holding it wrong, and experience is the difference in being able to recognize when oft-repeated advice should actually be seriously considered, vs dismissing entire swaths of engineers who are holding it right.

Do you know how tiring it gets to constantly engage with people who complain about agentic workflows without actually having the experience or knowledge to properly evaluate them? These people already have intentionally closed their minds on the subject, but still love to debate it loudly and frequently even though they have little intention of actually considering others' arguments or advice. What could be an educational moment turns into ideological warfare reminiscent of the text editor or operating system wars.

It's beginning to get infuriating, because of the unbridled arrogance the naysayers typically bring.

skydhash•15h ago
That’s just the hallmark of good software engineering. Or do you think everyone else was cowboy coding things with Vim before?
Leynos•14h ago
Seemingly so going by the sibling comment to yours.
ath3nd•10h ago
> As long as interfaces are well defined, comprehensive tests are written, memory is safely handled and time complexity is analyzable, who cares what the rest of the code looks like.

Software engineers and anyone who'd get hired to fix the mess that the LLM created. Also, ironically, other LLMs would probably work better on...not messy code.

> But transformers are offering us a way to build faster, to create more, and to take on bigger complexity

Wow. You sound as if building faster or creating more is synonymous with quality or utility. Or as if LLMs allow us to take on a bigger level of complexity (this is where they notoriously crumble).

> we might enter a software golden age where the potential for creativity

I haven't heard of a single (good) software engineer whose creativity was stifled by their inability to code something. Is an LLM generating a whole book in Hemingway style considered creative, or a poem? Or a program/app?

diggan•21h ago
> There was a worry that code would become big mud balls

That's always been a worry with programming (see basically all Rich Hickey talks), and is still a problem since people prefer "moving fast today" instead of "not having 10 tons of technical debt tomorrow"

LLMs make it even easier for people to spend the entire day producing boilerplate without stopping for a second to rethink why they are producing so much boilerplate. If the pain goes away, why fix it?

jnwatson•17h ago
Literally less than an hour ago, I reviewed a bunch of LLM-generated boilerplate. I then told the agent to show me a plan to refactor it. I suggested some tweaks, and then it implemented the plan and then tested that it didn't break anything.

It isn't much different than dealing with an extremely precocious junior engineer.

Given how easy it is to refactor now, it certainly makes economic sense to delay it.

diggan•17h ago
But I'm guessing you're doing those refactors because you know they're ultimately worth it, because you have experience programming since before LLMs?

Like I know boilerplate and messy code sucks because I've had to work with it, without LLMs, and I know how much it sucks. I think you do too, but I think we know that, because we had to fight with it in the past.

danielbln•16h ago
Right now devs around the world are pushing ungodly amounts of reinforcement learning data into the big AI labs. There is no reason to believe these models won't handle this stuff themselves and our priors will become a useless relic.
jacobr1•10h ago
LLMs start to bog down with it at a certain point too. For a couple of my side projects, I decided to let things rip and not worry about code structure at all. After a certain point, some of the changes I wanted to make just started either failing or racking up large bills. It would try to make a change, run tests, realize it broke something somewhere else, try to fix that, cause another issue. Undo the original thing, fix the new issue, maybe try to refactor, partially, fail, revert that, decide to make the tests pass by removing the tests! And then keep some broken version of the fix. With a few similar cycles of that on repeat as well.

Deliberately telling it how to rethink the structure, refactor first, then separate out components fixed things.

If LLMs stayed at the current level, I would expect LLM-aided coders to learn how to analyze and address situations like this. However, I do expect models to get better at A) avoiding these kinds of situations with better design up front or reflection when making changes, and B) identifying more systematic patterns and reasoning about the right way to structure things. Basically, ambiently detecting "code smells."

You can already see improvements both from newer models and from prompt engineering coming from the agentic tools.

dimal•12h ago
The type of person that would do that would have done the same thing without LLMs. LLMs don’t change anything except now they can just create their big ball of mud faster.

The pain of shitty code doesn't go away. They can ship their crappy MVP faster, but the technical debt doesn't magically disappear.

This is an awesome opportunity for those people to start learning how to do software design instead of just “programming”. People that don’t are going to be left behind.

physicles•16h ago
I was struck by this too. Good error messages, fast tools, stable ecosystems, simple code without magic, straight SQL… it’s what I always want. Maybe agents will be what raises the bar for dev experience, simply because they work so quickly that every slowdown matters.
petesergeant•22h ago
Randomly, my advice: don't sleep on this.

Three or four weeks ago I was posting about how LLMs were useful for one-off questions but I wouldn't trust them on my codebase. Then I spent my week's holiday messing around with them on some personal projects. I am now a fairly committed Roo user. There are lots of problems, but there is incredible value here.

Try it and see if you're still a hold-out.

phito•22h ago
I will definitely sleep on agents. Normal LLM use, fine, but I am not giving up reasoning.
bananapub•21h ago
this is kind of a weird position to take. you're the captain, you're the person reviewing the code the LLM (agent or not) generates, you're the one asking for the code you want, you're in charge of deciding how much effort to put in to things, and especially which things are most worth your effort.

all this agent stuff sounded stupid to me until I tried it out in the last few weeks, and personally, it's been great - I give a not-that-detailed explanation for what I want, point it at the existing code and get back a patch to review once I'm done making my coffee. sometimes it's fine to just apply, sometimes I don't like a variable name or whatever, sometimes it doesn't fit in with the other stuff so I get it to try again, sometimes (<< 10% of the time) it's crap. the experience is pretty much like being a senior dev with a bunch of very eager juniors who read very fast.

anyway, obviously do whatever you want, but deriding something you've not looked into isn't a hugely thoughtful process for adapting to a changing world.

phito•21h ago
If I have to review all the code it's writing, I'd rather write it myself (maybe with the help of an LLM).

> anyway, obviously do whatever you want, but deriding something you've not looked into isn't a hugely thoughtful process for adapting to a changing world.

I have tried it. Not sure I want to be part of such a world, unfortunately.

> the experience is pretty much like being a senior dev with a bunch of very eager juniors who read very fast.

I... don't want that. Juniors just slow me down because I have to check what they did and fix their mistakes.

(this is in the context of professional software development, not making scripts, tinkering etc)

BeetleB•15h ago
> I... don't want that. Juniors just slow me down because I have to check what they did and fix their mistakes.

> (this is in the context of professional software development, not making scripts, tinkering etc)

I understand the sentiment. A few months ago they wanted us to move fast and dumped 4 new people with very little real-world coding experience on us (originally 2 developers). Not fun, and very stressful.

However, keep in mind that in many workplaces, handling junior devs poorly means one of two things:

1. If you have some abstruse domain expertise, and it's OK that only 1-2 people work on it, you'll be relegated to doing that. Sadly, most workplaces don't have such tasks.

2. You'll be fantastic in your output. Your managers will like you. But they will not promote you. At some point, they expect you to become a leverage multiplier: if you can get others to code really well, the overall team's productivity will exceed that of any superstar (and no, I don't believe 10x programmers exist in most workplaces).

simonw•19h ago
What's your definition of "agents" there?
BeetleB•15h ago
> Normal LLM use, fine, but I am not giving up reasoning.

Ouch! Reminds me of:

- I'm never going to use cell phones. I care about voice quality (me decades ago)

- I'm never going to use VoIP. I care about voice quality (everyone but me 2 decades ago).

- I'm never going to use a calculator. I am not going to give up on reasoning.

- I'm never going to let my kids play with <random other ethnicity>. I care about good manners.

https://en.wikipedia.org/wiki/False_dilemma

phito•2h ago
Sure, keep slowly offloading more and more of your brain to technology. Until you won't be needed anymore.
vultour•21h ago
I spent a good part of yesterday attempting to use ChatGPT to help me choose an appropriate API gateway. Over and over it suggested things that literally do not exist, and the only reason I could tell was that I had spent a good amount of time in the actual documentation. This has been my experience roughly 80% of the time when trying to use an LLM. I would like to know the magical prompt engineering technique that makes it stop confidently hallucinating about literally everything.
yunwal•20h ago
Did you try giving it the docs to read?
petesergeant•20h ago
Sure, this was exactly how I felt three weeks ago, and I could have written that comment myself. The agentic approach, where it works out that it made something up by looking at the errors the type-checker generates, is what makes the difference.
spacechild1•20h ago
I'm having a very good experience with ChatGPT at the moment. I'm mostly using it for little tasks where I don't remember the exact library functions. Examples:

"C++ question: how do I get the unqualified local system time and turn into an ISO time string?"

"Python question: how do I serialize a C struct over a TCP socket with asyncio?"

"JS question: how do I dynamically show/hide an HTML element?" (I obviously don't write a lot of JS :-D)

ChatGPT gave me the correct answers on the first try. I have been a sceptic, but I'm now totally sold on AI assisted coding, at least as a replacement for Google and StackOverflow. For me there is no point anymore in wading through all the blog spam and SEO crap just to find a piece of information. Stack Overflow is still occasionally useful, but the writing is on the wall...

EDIT: Important caveat: stay critical! I have been playing around asking ChatGPT more complex questions where I actually know the correct answer, or where I can immediately spot mistakes. It sometimes gives me answers that would look correct to a non-expert, but are hilariously wrong.
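
For the asyncio question above, a correct first-try answer would look something like this (a minimal sketch; the record layout, int32 id plus float64 value, is invented for illustration):

    # One plausible shape of the answer: struct.pack for the wire format,
    # asyncio streams for the socket. The record layout is invented.
    import asyncio
    import struct

    FMT = "!id"  # network byte order: int32, float64

    async def send_record(host: str, port: int, record_id: int, value: float) -> None:
        _reader, writer = await asyncio.open_connection(host, port)
        writer.write(struct.pack(FMT, record_id, value))
        await writer.drain()
        writer.close()
        await writer.wait_closed()

    asyncio.run(send_record("127.0.0.1", 9000, 1, 3.14))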

vultour•18h ago
The problem with this approach is that you might lose important context which is present in the documentation but doesn’t surface through the LLM. As an example, I just asked GPT-4o how to access the Nth character in a string in Go. Predictably, it answered str[n]. This is a wildly dangerous suggestion because it works correctly for ASCII but not for other UTF-8 characters. Sure, if you know about this and prompt it further, it tells you about this limitation, but that's not what 99% of people will do.
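
The pitfall is byte-oriented indexing; Python happens to make the contrast easy to show, since its str indexes code points while its bytes type behaves like Go's string indexing:

    # Go's s[n] indexes bytes; Python's bytes type reproduces the pitfall,
    # while Python's str (code-point indexing) shows the safe behavior.
    s = "héllo"
    print(s[1])             # 'é' -- code-point indexing, safe
    b = s.encode("utf-8")
    print(b[1])             # 195 -- first byte of the two-byte encoding of 'é'
    print(b[1:2].decode("utf-8", errors="replace"))  # '�' -- not a character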
spacechild1•12h ago
> The problem with this approach is that you might lose important context which is present in the documentation but doesn’t surface through the LLM.

Oh, I'm definitely aware of that! I mostly do this with things I have already done, but can't remember all the details. If the LLM shows me something new, I check the official documentation. I'm not into vibe coding :) I still want to understand every line of code I write.

simonw•19h ago
Which model did you use?

I find using o3 or o4-mini and prompting "use your search tool" works great for having it perform research tasks like this.

I don't trust GPT-4o to run searches.

BeetleB•15h ago
> I spent a good part of yesterday attempting to use ChatGPT to help me choose an appropriate API gateway.

If you mean the ChatGPT interface, I suspect you're headed in the wrong direction.

Try Aider, with API interface. You can use whatever model you like (as you're paying per token). See my other comment:

https://news.ycombinator.com/item?id=44259900

I mirror the GP's sentiment. My initial attempts using a chat like interface were poor. Then some months ago, due to many HN comments, I decided to give Aider a try. I had put my kid to bed and it was 10:45pm. My goal was "Let me just figure out how to install Aider and play with it for a few minutes - I'll do the real coding tomorrow." 15 minutes later, not only had I installed it, my script was done. There was one bug I had to fix myself. It was production quality code, too.

I was hooked. Even though I was done, I decided to add logging, command line arguments, etc. An hour later, it was a production grade script, with a very nice interface and excellent logging.

Oh, and this was a one-off script. I'll run it once and never again. Now all my one-off scripts have excellent logging, because it's almost free.

There was no going back. For small scripts that I've always wanted to write, AI is the way to go. That script had literally been in my head for years. It was not a challenging task; it had just always been low on my priority list. How many ideas do you have in your head that you'll never get around to for lack of time? Well, now you can do 5x more of those than you would have without AI.
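
As a concrete picture of the "logging, command line arguments, etc." upgrade described above, this is the kind of scaffolding that's nearly free to ask for (a generic sketch, not the actual script):

    # Generic "production grade script" scaffolding: CLI args plus logging.
    import argparse
    import logging

    def main() -> None:
        parser = argparse.ArgumentParser(description="One-off script.")
        parser.add_argument("input", help="file to process")
        parser.add_argument("-v", "--verbose", action="store_true")
        args = parser.parse_args()

        logging.basicConfig(
            level=logging.DEBUG if args.verbose else logging.INFO,
            format="%(asctime)s %(levelname)s %(message)s",
        )
        logging.info("processing %s", args.input)

    if __name__ == "__main__":
        main()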

geoka9•7h ago
Just wanted to add to your post with my anecdote.

I was at the "script epiphany" stage a few months ago, and I got cool Bash scripts (with far more bells and whistles than I would normally implement) just by iterating with Claude via its web interface.

Right now I'm at the "Gemini (with Aider) is pretty good for knock-offs of the already existing functionality" stage (in a Go/HTMX codebase).

I'm yet to get to the "wow, that thing can add brand new functionality using code I'm happy with just by clever context management and prompting" stage; but I'm definitely looking forward to it.

Leynos•15h ago
Did you use search grounding? O3 or o4-mini-high with search grounding (which will usually come on by default with questions like this) are usually the best option.
EdwardDiego•22h ago
"Write the simplest code you can, so the dumb AI can understand it" isn't the massive sell I was expecting.

I wonder how that interacts with his previous post?

https://lucumr.pocoo.org/2025/2/20/ugly-code/

horsawlarway•20h ago
Honestly, I find this approach to be useful pretty much anytime you're working with other people as well.

There are absolutely times to be extremely focused and clever with your code, but they should be rare and tightly tied to your business value.

Most code should be "blindingly obvious" whenever possible.

The limit on developers isn't "characters I can type per minute" it's "concepts I can hold in my head."

The more of those there are... The slower you will move.

Don't create more interfaces over the existing ones, don't abstract early, feel free to duplicate and copy liberally, glue stuff together obviously (even if it's more code, or feels ugly), declare the relevant stuff locally, stick with simple patterns in the docs, don't be clever.

You will write better code. Code shouldn't be pretty, it should be obvious. It should feel boring, because the hard part should be making the product not the codebase.

yuri91•22h ago
So using agents forces (or at least nudges) you to use Go and Tailwind, because they are simple enough (and abundant in the training data) for the AI to use correctly.

Does this mean that eventually in a world where we all use this stuff, no new language/framework/library will ever be able to emerge?

Competing with the existing alternatives will be too hard. You won't even be able to ask real humans for help on platforms like StackOverflow because they will be dead soon.

energy123•22h ago
With maturing synthetic data pipelines, can't they just take one base LLM, fine-tune it for 20 different niches, and let the user access a niche with a string parameter in the API call? Even if a new version of a language was released only yesterday, they could quickly generate enough synthetic training data to bake the new syntax into that niche and roll it out.
pelagicAustral•22h ago
My best results have been with Ruby/Rails and either vanilla Bootstrap or something like Tabler UI. Tailwind seems to be fine as well, but I'm still not a fan of the verbosity.

With a stable enough boilerplate you can come up with outstanding results in a few hours. Truly production-ready stuff for small apps.

cpursley•21h ago
How are you getting results when Ruby has no type system? That seems to be where half the value of LLM coding agents is (dumping in type errors and having the agent solve them).
diggan•21h ago
Bunch of unit, functional and E2E tests, just like before LLMs :) Haven't tried with Ruby specifically but works well with JavaScript and other dynamic languages so should work fine with Ruby too.
owebmaster•21h ago
I wonder if people who love TypeScript never wrote tests and that is why they are so fascinated with types for dynamic languages. I guess they have never been really productive.
BoiledCabbage•16h ago
Or even better, what if you could automate writing half or more of your unit tests, and ensure they run not just out of band, but on every build?

And better yet, rather than having them off in some faraway location, annotate the code itself so the tests get updated with the code.

That's pretty impressive, and someone would have to be short-sighted to prefer the false productivity of constantly implementing by hand what a computer can automatically do for them.

Not to mention how much better it is if you work on actual large-scale systems with true cross-team dependencies, not trivial codebases that get thrown away every few years, where it almost doesn't matter how you write them.

e40•22h ago
Speaking of which, anyone had success using these tools for coding Common Lisp?
mark_h•22h ago
Joe Marshall had a couple of posts about... No: https://funcall.blogspot.com/2025/05/vibe-coding-common-lisp...
telotortium•16h ago
Vibe coding Common Lisp could probably work well with additional tool support. Even a good documentation lookup and search tool, exposed in an AGENTS.md file, could go a long way toward fixing the problem Joe ran into of the model generating bogus symbols. If you provide a small MCP server or other tool to introspect a running image containing your application, it could be even better.

LLMs can handle the syntax of basically any language, but library knowledge improves significantly with a larger corpus of public code than Common Lisp tends to have available.
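
To make that concrete, the documentation-lookup tool being suggested could start as small as this (an entirely hypothetical sketch: the docs table and function name are invented, and a real version would load the HyperSpec or your project's docstrings and sit behind MCP):

    # Hypothetical documentation-lookup tool an agent could call; the docs
    # table and names are invented for illustration.
    DOCS = {
        "mapcar": "Function MAPCAR: apply a function elementwise across lists.",
        "reduce": "Function REDUCE: combine list elements with a binary function.",
    }

    def lookup_symbol(name: str) -> str:
        return DOCS.get(name.lower(), f"No entry for {name!r}; do not invent one.")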

fhd2•21h ago
Agents no, LLMs yes. Not for generating code per se, but for answering questions. Common Lisp doesn't seem to have a strong influx of n00bs like me, and even though there's pretty excellent documentation, I find it sometimes hard to know what I'm looking for. LLMs definitely helped me a few times by answering my n00b questions I would have otherwise had to ask online.
diggan•21h ago
Not CL specifically, but it works well with Clojure, and (imo) Lisps fit better than non-Lisp languages once you give the LLM direct access to the REPL
Lapel2742•22h ago
> Does this mean that eventually in a world where we all use this stuff, no new language/framework/library will ever be able to emerge?

If you truly believe in the potential of agentic AI, then the logical conclusion is that programming languages will become the assembly languages of the 21st century. This may or may not become the unfortunate reality.

fhd2•21h ago
I'd bet money that in less than six months, there'll be some buzz around a "programming language for agents".

Whether that's going to make sense, I have some doubts, but as you say: For an LLM optimist, it's the logical conclusion. Code wouldn't need to be optimised for humans to read or modify, but for models, and natural language is a bit of an unnecessary layer in that vision.

Personally I'm not an LLM optimist, so I think the popular stack will remain focused on humans. Perhaps tilting a bit more towards readability and less towards typing efficiency, but many existing programming languages, tools and frameworks already optimise for that.

Tomte•22h ago
I’m wondering whether we may see programming languages that are either unreadable to humans or at least designed towards use by LLMs.
energy123•21h ago
Yes, and an efficient tokenizer designed only for that language. As the ratio of synthetic data to human data grows this will become more plausible.
temp0826•18h ago
LLM as a frontend to LLVM IR maybe.
PeterStuer•22h ago
A traditional digital stack's lifecycle is:

1. The previous gen has become bloated and complex, because it widened its scope to cover every possible niche scenario and got infiltrated by 'expert' language and framework specialists who went on an architecture binge.

2. As a result a new stack is born, much simpler and back-to-basics than the poorly aged incumbent. It doesn't cover every niche, but it does a few newly popular things really easily and well, and rises on the coattails of this new thing as the default environment for it.

3. Over time the new stack ages just as poorly as the old stack, for all the same reasons. So the cycle repeats.

I do not see this changing with AI-assisted coding, as context enrichment is getting better, allowing a full stack specification in post-training.

bluefirebrand•16h ago
> It doesn't cover every niche, but it does a few newly popular things really easily and well, and rises on the coattails of this new thing as the default environment for it

How will it ever rise on the coattails of anything if it isn't in the AI training data so no one is ever incentivized to use it to begin with?

jacobr1•10h ago
AI-legible documentation. If you optimize for a "1-pager" doc you can add to the context of an LLM, and that is all it needs to know to use your package or framework ... people will use it if it has some kind of non-technical advantage. deepwiki.com is sort of an attempt to automate doing something like this.
uncircle•22h ago
> Does this mean that eventually in a world where we all use this stuff, no new language/framework/library will ever be able to emerge?

That's a very good question.

Rephrased: as good training data will diminish exponentially with the Internet being inundated by LLM regurgitations, will "AI savvy" coders prefer old, boring languages and tech because there's more low-radiation training data from the pre-LLM era?

The most popular language/framework combination of the early 2020s is JavaScript/React. It'll be the new COBOL, but you won't need an expensive consultant to maintain it in the 2100s, because LLMs can do it for you.

Corollary: to escape the AI craze, let's keep inventing new languages. Lisps with pervasive macro usage and custom DSLs will be safe until there are actual AGIs that can macroexpand better than you.

NitpickLawyer•21h ago
> Rephrased: as good training data will diminish exponentially with the Internet being inundated by LLM regurgitations

I don't think the premise is accurate in this specific case.

First, if anything, training data for newer libs can only increase. Presumably code reaches GitHub in an "at least it compiles" state. So you have lots of people fighting the AIs and pushing code that at least compiles. You can then filter for the newer libs and train on that.

Second, pre-training is already mostly solved. The pudding now seems to be in post-training. And for coding, a lot of post-training is done with RL and other unsupervised techniques. You get enough signal from generate -> check loops to do that reliably.

The idea that "we're running out of data" is way overblown IMO, especially considering the advances of the last ~6mo-1y. Keep in mind that the better your "generation" pipeline becomes, the better later models will be. And the current "agentic" loop-based systems are getting pretty darn good.
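
A "generate -> check" loop in the sense used above is just this shape (a sketch; the checker command, working directory and iteration cap are illustrative):

    # Skeleton of a generate -> check loop: the checker's exit code and
    # diagnostics are the steering signal.
    import subprocess

    def check(workdir: str) -> tuple[bool, str]:
        proc = subprocess.run(["cargo", "check"], cwd=workdir,
                              capture_output=True, text=True)
        return proc.returncode == 0, proc.stderr

    def generate_and_check(generate, workdir: str, max_iters: int = 5) -> bool:
        feedback = ""
        for _ in range(max_iters):
            generate(feedback)       # model (re)writes code given the diagnostics
            ok, feedback = check(workdir)
            if ok:
                return True          # success signal for training or for stopping
        return False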

bluefirebrand•16h ago
> First, if anything, training data for newer libs can only increase.

How?

Presumably, in the "every coder is using AI assistants" future, there will be an incredible amount of friction in getting people to adopt languages that AI assistants don't know anything about

So how does the training data for a new language get made, if no programmers are using the language, because the AI tools that all programmers rely on aren't trained on the language?

The snake eating its own tail

NitpickLawyer•15h ago
You can code today with new libs, you just need to tell the model what to use. Things like context7 work, or downloading docs, llms.txt or any other thing that will pop up in the future. The idea that LLMs can only generate what they were trained on is like 3 years old. They can do pretty neat things with stuff in context today.
bluefirebrand•15h ago
The context would have to be massive in order to ingest an entire new programming language and associated design patterns, best practices and such wouldn't it?

I'm not an expert here by any means but I'm not seeing how this makes much sense versus just using languages that the LLM is already trained on

Leynos•15h ago
Synthetic training data presumably.
sampo•22h ago
> no new language/framework/library will ever be able to emerge?

Here is a YouTube video that makes the same argument: React is / will be the last JavaScript framework, because it is the dominant one right now. Even if people publish new frameworks, LLM coding assistants will not be able to assist coding using the new frameworks, so the new frameworks will not find users or popularity.

And even for React, it will be difficult to add new features, because LLMs only assist in writing code that uses the features they know about, which are the old, established ways to write React.

https://www.youtube.com/watch?v=P1FLEnKZTAE

diggan•21h ago
> LLM coding assistants will not be able to assist coding using the new frameworks

Why not? When my coding agent discovers that they used the wrong API or used the right API wrong, it digs up the dependency source on disk (works at least with Rust and with JavaScript) and looks up the new details.

I also have it use my own private libraries the same way, and those are not in any training data guaranteed.

I guess if whatever platform/software you use doesn't have tool calling you're kind of right, but you're also missing something that's commonplace today.

6bb32646d83d•15h ago
My theory is that it will not be the case.

New frameworks can be created, but they will be different from before:

- AI-friendly syntax, AI-friendly error handling

- Before being released, we will have to spend hundreds of millions of tokens on agents reading the framework and writing documentation and working example code with it, basically creating the dataset that other AIs can reference when using the new framework.

- Create a way to have that documentation/example code easily available for AI agents (via MCP or new paradigm)

koonsolo•22h ago
If AI really takes over coding, programming languages will be handled the same way we currently handle assembly code.

Right now languages are the interface between human and computer. If LLMs take over, their ideal programming language will probably be less verbose than what we are currently using. Maybe keywords could become 1 token long, etc. Just some quick thoughts here :D.

chuckadams•22h ago
> So using agents forces (or at least nudges) you to use Go and Tailwind

Not even close, and the article betrays the author's biases more than anything else. The fact that their Claude Code (with Sonnet) setup has issues with the `cargo test` CLI, for instance, is hardly a categorical issue with AIs or cargo, let alone Rust in general. Junie can't seem to use its built-in test runner tool on PHP tests either; that doesn't mean AI has a problem with PHP. I just wrote a `bin/test-php` script for it to use instead, and it figures out it has to use that (telling it so in the guidelines helps, but it still keeps trying its built-in tool first).

As for SO, my AI assistant doesn't close my questions as duplicates. I appreciate what SO is trying to do in terms of curation, but the approach to it has driven people away in droves.

rolisz•21h ago
I tried Junie in PyCharm and it had big problems with running tests or even using the virtual environment set up in PyCharm for that project.

You'd expect more from the company that is developing both the IDE and the AI agent...

chuckadams•18h ago
JB's product strategy is baffling. The AI Assistant is way more featureful, but it's a lousy agent. Junie is pretty much only good as an agent, but it's hardwired to one model and doesn't support MCP, though it does have a whole lot of internal tools ... which it can't seem to use reliably. They really need to work on having just one good AI product that does it all.

I really liked Augment, except for its piggish UI. Then they revealed the price tag, and back to Junie I went.

dist-epoch•21h ago
As an example, XML is suddenly cool again, because LLMs love it.
furyofantares•17h ago
> Does this mean that eventually in a world where we all use this stuff, no new language/framework/library will ever be able to emerge?

I highly doubt it. These things excel at translation.

Even without training data, if you have an idiosyncratic-but-straightforward API or framework, they pick it up no problem just by looking at the codebase. I know this from experience with my own idiosyncratic C# framework, which no training data has ever seen and which the LLM is excellent at writing code against.

I think something like Rust lifetimes would have a harder time getting off the ground in a world where everyone expects LLM coding to work off the bat. But something like Go would have an easy time.

Even with the Rust example though, maybe the developers of something that new would have to take LLMs into consideration, in design choices, tooling choices, or documentation choices, and it would be fine.

bluehatbrit•17h ago
Just yesterday I gave Claude (via Zed) a project brief and a fresh Elixir Phoenix project. It had 0 problems. It did opt for Tailwind for the CSS, but Phoenix already sets it up when using `mix phx.new`, so that's probably why.

I don't buy that it pushes you into using Go at all. If anything I'd say they push you towards Python a lot of the time when asking it random questions with no additional context.

The elixir community is probably only a fraction of the size of Go or Python, but I've never had any issues with getting it to use it.

rramon•22h ago
Elixir looks like a good choice as well; folks have recorded a session building a Phoenix web app with Claude Code, and it went quite well for them: https://youtu.be/V2b6QCPgFTk
agos•20h ago
In the same vein, the closing keynote at this year's ElixirConf EU featured agents building web apps: https://www.youtube.com/watch?v=ojL_VHc4gLk
reedf1•22h ago
I have seen multiple articles pushing Go as the agentic language of choice; does anyone else feel like this is quite forced? I have tried agentic coding in several languages and I didn't have a particularly good or productive experience with Go.
decide1000•22h ago
I don't agree with the author on the Go thing. I've created agents that work 24/7 on GH issues for me, in Rust, Python and PHP. I use Claude (api). The result overall is very good. When I wake up there is always a fresh PR waiting for me.

I don't like the word "agent" because it is not a blind LLM or a small, fast script. It is a complex workflow with many checks and much prompting before a single line of code is written. That's also the key to AI-powered development: context.

the_mitsuhiko•22h ago
> I've created agents that work 24/7 on GH issues for me, in Rust, Python and PHP. I use Claude (api). The result overall is very good.

It's quite possible this is a case of holding it wrong, but I think the basic evaluation I did, which led me to conclude that Go works particularly well, isn't too bad. I just get results I feel good about quicker than with Rust and Python. FWIW I also had really good results with PHP, on the level of Go; it's just overall a stack that doesn't cater too well to my problem.

agos•20h ago
it's especially forced if you are making frontend or mobile apps
haiku2077•6h ago
Or ML apps, since Go doesn't have native GPU or CUDA support. You can do it via CGO but it's far messier than pure Go.
sensanaty•9h ago
All of this is forced, yes. There's probably a trillion dollars riding on this all not imploding, so we're getting it shoved everywhere, all the time, incessantly.

My own experience with "Agents" (and no, I am not a luddite) has been nothing short of comical in how terrible it's been. We try it every single day at our company. We've tried all the advice, all the common wisdom. They have never, not once, produced anything of any value. All the public showcases of these brilliant "agents" have likewise been spectacular failures [1]. Yet despite all this, I keep seeing these types of posts, and pretty much always it's from someone with a vested interest of some kind when you dig deep enough. As for the managerial types pushing it, look deep enough and it's always because the board or investors or some other parasite has a vested interest.

I know one thing is for certain, what AI will give us is more and more fucking advertisements shoved into every facet of our lives, except now it sorta talks like a human!

[1] https://news.ycombinator.com/item?id=44050152

ceving•22h ago
When I read "Avoid inheritance" in a text about Go, I can't help but get the impression that the text also comes from Claude.
the_mitsuhiko•22h ago
This text is not about Go; it's about agentic coding. I have used and am using this approach across different languages. On this project (which is a Go backend) I still have TypeScript in the frontend, and I have some Python-based tasks too. The rules apply universally.
Keats•22h ago
I've been trying Claude Code with Sonnet 4.0 for a week or so now for Rust code, but it feels really underwhelming (and expensive, since it's via Bedrock right now). Every time it does something, it misses half, despite spending a lot of time planning at the beginning of the session. What am I missing?
exfalso•22h ago
Exact same experience. I have no clue what other people are doing. I was hunting for use cases where it could be used and it kept not working. I don't get it.
energy123•22h ago
Is it only Rust that you've had this experience with or is it a general thing?
exfalso•21h ago
Also tried it with Python. The autocomplete there was ok-ish (although to me the "wait for it -> review suggested code" cycle is a bit too slow), but getting it to code even standalone, well-defined functions was a clear failure. I spent more time trying to fix prompts than it would have taken to write the functions in the first place.
andyferris•21h ago
I had been trying with Rust, but after this article I think I might change tack and attempt a project in Go...
Keats•21h ago
I'm not sure if it's Rust related. It manages to write the Rust code just fine, it's just that it doesn't seem to

- think of everything that is needed for a feature (fixable via planning at the beginning)

- actually follow that plan correctly

I just tried it with a slightly big refactor to see if some changes would improve performance. I had it write the full plan and baseline benchmarks to disk, then let it go in yolo mode. When it was done, it had only implemented something like half of the planning phases and was saying the results looked good despite all the benchmarks having regressed.

bitwize•20h ago
I've encountered human Rust programmers who exhibit this behavior. I think Rust may be falling victim to its own propaganda, as developers come to believe that the borrow checker will fix their logic errors, not just flag memory management that's at risk of buffer overflow or UAF.
furyofantares•17h ago
I get a ton of value out of Claude Code but you just listed a lot of things I've found LLMs/agents not very good at.

- Rust

- Big refactors

- Performance improvements

- Yolo-mode, especially if you aren't skilled yet at prompting and knowing which things the LLM will do well and which will need supervision

bananapub•21h ago
it shouldn't be expensive - you can pay for Pro ($20/month) or Max ($100 or $200/month) to get what would cost >> $1000/month in API costs.
andyferris•21h ago
Can you use Claude Code with Pro? I was trying to figure this out and I thought you couldn't (unless you enter an API key and pay for tokens).
bananapub•21h ago
https://news.ycombinator.com/item?id=44179604
andyferris•20h ago
Thanks
rolisz•21h ago
Yes, since last week or so.
Keats•21h ago
Yep i know but I have free AWS credits sooo
dimitri-vs•21h ago
Same. I have a very efficient workflow with Cursor Edit/Agent mode where it pretty much one-shots every change or feature I ask for. Working inside a CLI is painful. Are people just letting Claude Code churn for 10-15 minutes and then reviewing the diff? Are people even reviewing the code?
danielbln•19h ago
This sort of asynchronous flow will become more and more mainstream. chatgpt.com/codex, Google's Jules and to a degree Claude Code (even though that's local) are all following that pattern: phrase a goal, send it off to the agent, review the diff and request changes, rinse and repeat until ready for PR review.

For me this only works for fairly tightly scoped tasks that aren't super complex, but it does work. And I think the days of staring down the IDE will be coming to a close for all but the most complex coding tasks in the future.

apwell23•18h ago
> Are people even reviewing the code?

No, because it's boring. That's why we don't have airplane pilots just watch a machine that's fully on autopilot.

hsuduebc2•22h ago
Can someone recommend a source for vibe coding, e.g. how to prompt it properly and what tools to use? Does anyone have experience with anything other than small projects from scratch?
exfalso•21h ago
My reading of the status quo is that people who use it for toy or greenfield projects written from scratch are having a blast, until the project reaches a certain complexity in size and function, when it starts to break down.

People working on existing projects in turn are scratching their heads because it's just not quite working or providing much of a productivity boost. I belong to this camp.

coffeefirst•20h ago
I had the same question because all my experience with this contradicts the hype.

I watched Ronacher's demo from yesterday, https://www.youtube.com/watch?v=sQYXZCUvpIc, and this is it, a well-regarded engineer working on a serious open source project. There's no wizard behind the curtain, it's the thing I've been asking the promoters for.

And you should make your own judgment, but I'm just not impressed.

It seems to me the machine takes longer, creates a plan that "is shit," and then has to be fixed by a person who has a perfect understanding of the problem.

I'm loving LLMs as research tools, pulling details out of bad documentation, fixing my types and dumb SQL syntax errors, and searching my own codebase in natural language.

But if I have to do all the reasoning myself no matter what, setting a robot free to make linguistically probable changes really feels like a net negative.

Verdex•17h ago
Thanks for the link.

Given the hype and repercussions of success or failure of what LLMs can hypothetically do, I feel like the only way forward for reasonable understanding of the situation is for people to post live streams of what they're raving about.

Or at the very least source links with version control history.

simonw•19h ago
I wrote this a few months ago. The advice still holds but it only has a short section about coding agents so it's less relevant today than it was when I wrote it: https://simonwillison.net/2025/Mar/11/using-llms-for-code/
darkxanthos•21h ago
I stumbled into Agentic Coding in VS Code Nightlies with Copilot using Claude Sonnet 4, and I've been silly productive. Even when half my day is meetings, you wouldn't be able to tell from my git history.

My thinking now is removed from the gory details and is a step or two up. How can I validate the changes are working? Can I understand this code? How should it be structured so I can better understand it? Is there more we can add to the AI conventions markdown in the repo to guide the Agent to make fewer mistaken assumptions?

Last night I had a file with 38 mypy errors. I turned it over to the agent and went and had a conversation with my wife for 15 minutes. I came back, it summarized the changes it made and why, I debated one of the changes with it but ultimately decided it was right.

Mypy passed. Good to go.

I'm currently trying to get my team to really understand the power here. There are a lot of skeptics, and the AI still isn't perfect; people who are against the AI era will latch onto that as validation, but that's exactly the opposite of the correct reaction. It's really validation, because, as a friend of mine says:

"Today is the worst day you will have with this technology for the rest of your life."

GardenLetter27•20h ago
I trust it more with Rust than Python tbh, because with Python you need to make sure it runs every code path as the static analysis isn't as good as clippy + rust-analyzer.
diggan•20h ago
I agree; I've had more luck with various models writing Rust than Python, but only when they have tools available so that one way or another they can run `cargo check` and see the nice errors. Otherwise it's pretty equal between the two.

I think Rust's excellent error messages also help humans as much as they do LLMs, but some of the weaker models get misdirected by the "helpful" tips, like an error message suggesting "Why don't you try .clone() here?" when the actual way to address the issue was something else.

redman25•18h ago
That's true; typed languages seem to handle the slop better. One thing I've noticed specifically with Rust, though, is that agents tend to overcomplicate things. They start digging into the gnarlier bits of the language much sooner than they probably need to.
km144•20h ago
> Is there more we can add to the AI conventions markdown in the repo to guide the Agent to make fewer mistaken assumptions?

Forgive my ignorance, but is this just a file you're adding to the context of every agent turn, or is it a formal convention in the VS Code Copilot agent? And I'm curious whether there are resources you used to determine the structure of that document, or if it was just refined over time based on mistakes the AI was repeating.

jnwatson•17h ago
I just finished writing one. It is essentially the onboarding doc for your project.

It is the same stuff you'd tell a new developer on your team: here are the design docs, here are the tools and the code, this is how you build and test, and here are the parts you might get hung up on.

In hindsight, it is the doc I should have already written.
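
Purely as an illustration, here is the shape such a file might take (a hypothetical project; Claude Code's `/init` scaffolds a CLAUDE.md along these lines):

    # CLAUDE.md (hypothetical example)

    ## Build and test
    - Build: make build
    - Test: make test (run before every commit)

    ## Layout
    - cmd/       CLI entrypoints
    - internal/  core logic; start reading at internal/server
    - docs/      design docs referenced below

    ## Gotchas
    - Files under gen/ are generated; never edit them by hand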

apwell23•18h ago
> you wouldn't be able to tell from my git history.

I can easily tell from git history which commits were heavily AI-generated.

ajdidbdbsgs•18h ago
> Last night I had a file with 38 mypy errors

Fixing type checker errors should be one of the least time-consuming things you do. Was this previously consuming a lot of your time?

A lot of the AI discourse would be more effective if we could all see the actual work one another is doing with it (similar to the Cloudflare post).

diggan•17h ago
> AI discourse would be more effective if we could all see the actual work one another is doing with it

Yes, this is a frequent problem both here and everywhere else. The discussions need to include things like the exact model version, inference parameters, the system prompt, the user prompt, what code you gave it, what exactly it replied, and many more details; currently almost every comment is "Well, I used Sonnet last week and it worked great" with no specifics. Not to mention discussions around local models missing basic stuff like what quantization (if any) and what hardware you're running it on. People just write "Wow, fast model" or something like that, and call it a day.

Although I understand why: every comment would be huge if everyone always added sufficient context. I don't know the solution to this, but it does frustrate me.

square_usual•17h ago
There are many examples of exactly what you're asking for, such as Kenton Varda's Cloudflare OAuth provider [1] and Simon Willison's tools [2]. I see new blog posts like these, with detailed explanations of what the authors did, pretty frequently; Steve Klabnik's recent post [3] is less detailed but contains a lot of very concrete facts. There are even more posts from prominent devs like antirez about other things they're doing with AI, such as rubber-ducking [4], if you're curious how some of the people who say "I used Sonnet last week and it was great" actually work, because not everyone uses it to write code. I personally don't, because I care a lot about code style.

[1]: https://github.com/cloudflare/workers-oauth-provider/

[2]: https://tools.simonwillison.net/

[3]: https://steveklabnik.com/writing/a-tale-of-two-claudes/

[4]: https://antirez.com/news/153

diggan•16h ago
Maybe I should have been more specific: I was talking specifically about discussions and comments on forums like HN and r/localllama, not claiming that people's blog posts aren't detailed enough.
BeetleB•16h ago
> The discussions need to include things like exact model version, inference parameters, what system prompt you used, what user prompt, what code you gave it, what exactly it replied and so much more details, as currently almost every comment is "Well, I used Sonnet last week and it worked great" without any details...Not to mention discussions around local models missing basic stuff like what quantization (if any) and what hardware you're running it on.

While I agree with "more details", the amount of detail you're asking for is ... ridiculous. This is an HN comment, not a detailed study.

diggan•14h ago
> the amount of detail you're asking for is

I'm not asking for anything, nor providing anything as "a solution", just stating a problem. The second paragraph in my comment is quite literally about that.

SparkyMcUnicorn•16h ago
I feel like that would get tiresome to write, read, and sort through. I don't like everyone's workflow, but if I notice someone making a claim that indicates they might be doing something better than me, then I'm interested.

Maybe keeping your HN profile/gist/repo/webpage up to date would be better.

dimal•12h ago
I don't know about fixing Python types, but fixing TypeScript types can be very time-consuming. A LOT of programming work is like this: not solving anything interesting or difficult, just time-consuming drudgery.

These tools have turned out to be great at this stuff. I don’t think I’ve turned over any interesting problems to an LLM and had it go well, but by using them to take care of drudgery, I have a lot more time to think about the interesting problems.

I would suggest that instead of asking people to post their work, try it out on whatever bullshit tasks you’ve been avoiding. And I specifically mean “tasks”. Stuff where the problem has already been solved a thousand times before.

andnand•17h ago
What's your workflow? I've been playing with Claude Code for personal use, usually on new projects for experimentation. We have Copilot licenses through work, so I've been playing around with VS Code agent mode for the last week, usually using Sonnet 3.5, Sonnet 3.7, or o4-mini, in a large Go project. It's been abysmal at everything other than tests. I've been trying to figure out if I'm just using the tooling wrong, but I feel like I've tried all the current "best practices": managing context, switching models for planning and coding, rules, better prompting. Nothing's worked so far.
polskibus•16h ago
My experiments with Copilot and Claude Desktop via MCP on the same codebase suggest that Copilot trims the context much more aggressively than Desktop. With the same model, the outputs are just less informed.
SparkyMcUnicorn•15h ago
Switch to using Sonnet 4 (it's available in VS Code Insiders, for me at least). I'm not 100% sure, but a GitHub org admin and/or you might need to enable this model in the GitHub web interface.

Write good base instructions for your agent[0][1] and keep them up to date. Have your agent help you write and critique them.

Start tasks by planning with your agent (e.g. "do not write any code."), and have your agent propose 2-3 ways to implement what you want. Jumping straight into something with a big prompt is hit or miss, especially with increased task complexity. Planning also gives your agent a chance to read and understand the context/files/code involved.

Apologies if I'm giving you info you're already aware of.

[0] https://code.visualstudio.com/docs/copilot/copilot-customiza...

[1] Claude Code `/init`

andnand•15h ago
This is exactly what I was looking for, thanks! I'm trying to give these tools a fair shot before I judge them. I've had success with detailed prompts and letting the agent jump straight in when working on small/new projects. I'll give more planning prompts a shot.

Do you change models between planning and implementation? I've seen that recommended but it's been hard to judge if that's made a difference.

SparkyMcUnicorn•12h ago
Glad I could help!

Sometimes I do planning in stronger models like Gemini 2.5 Pro (I started giving o3 a shot at this the past couple of days) with all the relevant files in context, but oftentimes I default to Sonnet 4 for everything.

A common pattern is to have the agent write plans down into markdown files (which you can also iterate on) when you get beyond a certain task size. This helps with more complex tasks. For large plans, use individual implementation-phase-specific markdown files.

Maybe these projects can provide some assistance and/or inspiration:

- https://www.task-master.dev/

- https://github.com/Helmi/claude-simone

8note•14h ago
Make sure it writes a requirements doc and a design doc for the change it's going to make, and review those. Also, ask it to ask you questions where there's ambiguity, and to record your responses.

When it has a work plan, track the plan as a checklist that it fills out as it works.

You can also start your conversations by asking it to summarize the code base.

namaria•30m ago
I really don't get it. I've tested some agents, and they can generate boilerplate. It looks quite impressive if you watch the logs; it actually seems like an autonomous, intelligent agent.

But I can run commands on my local Linux box that generate boilerplate in seconds. Why do I need to subscribe to access GPU farms for that? Then the agent gets stuck on some simple bug and goes back and forth saying "yes, I figured it out and solved it now" while flipping between two broken states.

The rabid prose, the Fly.io post deriding detractors... to me it seems like the same hype as usual. Lots of words about it; the first few steps look super impressive, then it gets stuck banging against a wall. If almost all that is said is prognostication and preaching, and we haven't seen teams and organizations racing ahead on top of this new engine of growth... maybe it can't actually carry loads outside of the demo track?

It can be useful. Does it merit $100 billion outlays and datacenter-cum-nuclear-power-plant projects? I hardly think so.

namaria•36m ago
> "Today is the worst day you will have with this technology for the rest of your life."

Why do we trust corporations to keep making things better all of a sudden?

The most jarring effect of this hype cycle is that everyone appears to defer to some imaginary set of corporate entities.

jedisct1•21h ago
Pretty much my experience as well, although I would highly recommend Roo Code + Claude (via the API) to build entire projects, and Claude for "batch" tasks or finalization.

AI models are trained on data that can be one or two years old, and they're trained on what they saw the most. So language changes, breaking API changes, dependencies that no longer work, name changes, etc. are going to get them super confused.

Go indeed works well because of its standard library that avoids the need for many dependencies, and its stability.

I found PHP to actually be the best target language for coding agents. For the same reasons, and also for the ton of documentation and example code available. That doesn't prevent agents from automatically using some modern PHP features, applying static analysis tools, etc.

For frontend stuff, agents will almost always pick React + Tailwind because that's what they saw the most. But Tailwind 4 is very different from Tailwind 3, and that gets them super confused.

tonnydourado•21h ago
Gotta say, 100/200 bucks monthly feels prohibitively expensive for even trying out something, particularly something as unproven as code-writing AI, even more particularly when other personal experiences with AI have been at the very least underwhelming, and extra particularly when the whole endeavor is so wrapped up in ethical concerns.
dukeyukey•21h ago
You can use Claude Code either pay-as-you-go with an API key, or subscribe to the $20 Pro subscription.
jononor•17h ago
One month at 20 USD seems like it should be plenty to try it out on a small project or two and decide whether it's worth trying the 100-bucks-a-month tier. Or one can just wait a couple of months as people report their learnings.
BeetleB•16h ago
Try Aider with API usage. Learn how to control context size (/clear, /add, /drop). Limit context to 25K tokens. Use whatever model you want (Sonnet 4 or Gemini 2.5 Pro).

For simple scripts, it often costs me under $1 to build. I'm working on a bigger tool these days, and I've done lots of prompts, a decent amount of code, over 100 tests, and my running total is right now under $6.

I'd suggest learning the basics of using AI to code with Aider, and then considering whether you want to try Claude Code (which is likely more powerful, but also more expensive unless you use it all the time).
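
As a concrete illustration, a hypothetical Aider session using its documented in-chat commands (file names made up):

    $ aider --model sonnet main.py    # start with one file in context
    > /add utils.py                   # pull a second file in
    > /tokens                         # report how big the context is
    > /drop utils.py                  # remove it when no longer needed
    > /clear                          # reset chat history between tasks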

Karrot_Kream•15h ago
Yeah, I've been using Aider mostly, and yesterday I started using Codex, which is very similar to Claude Code. Aider is more manual and requires more guiding, but it's also an order of magnitude cheaper.

The monkey brain part of me that really doesn't trust an LLM and trusts my decades of hard-won programming experience also prefers using Aider because the usage flow generally goes:

1. Iterate with Aider on a plan

2. Tell Aider to write code

3. Review the code

4. Continue hacking myself until I want to delegate something to an LLM again.

5. Head back to Step 1.

Codex automates this flow significantly but it's also a lot more expensive. Just the little bits of guiding I offer an LLM through Aider can make the whole process a lot cheaper.

It's unclear to me whether the full agentic Claude Code/Codex-style approach or Aider's more carefully guided approach will win in the marketplace of ideas, but to a pretty experienced engineer, Aider seems to be the sweet spot between cost, impact, and authorial input.

BeetleB•14h ago
Yes, those are my concerns as well about the more powerful tools (which I admit I haven't tried).

Even with Aider, I feel it goes too fast, and I sometimes actively slow it down (by giving it only very incremental changes rather than getting it to do a larger chunk). I think I'd be totally lost with a more powerful agentic tool.

gk1•21h ago
Nice to see container use mentioned (https://github.com/dagger/container-use). I work with the team that made it (a lot of ex-Docker folks, including the creator of Docker).

Running agents in parallel will be a big deal as soon as we learn (or the agents learn) how to reliably work with just one.

Even before then, if you're trying to get work done while the agent is doing its own thing or you find yourself watching over the agent's "shoulder" out of fear it'll change something you didn't ask it to change, then it's useful to run it in a containerized dev environment.

Container use is definitely early but moving quickly, and has probably improved even since this post was published. We're currently focused on stability, reducing git confusion, better human<>agent interaction, and environment control.

swah•20h ago
Meta: this hits differently because the author of this post created an awesome, popular Python web framework some 15 years ago. I miss those times dearly (using Python for web stuff).
jpadamspdx•20h ago
https://github.com/dagger/container-use (cu) is improving daily. Happy to help get it working if you're hitting anything (we're all in the dagger.io Discord). Last night I tried it with Amazon Q Developer CLI chat (with claude-3.7-sonnet), which I hadn't touched before (will PR a how-to to the README today). MCP integration just worked for me. I figured out where to put the agent rules for Q and how to restrict it to just the tools from cu. I kicked off three instances of Q to modify my Flask app project with the same prompt in parallel (without stepping on the local source) and got three variants to review in short order. I merged the one I liked into the repo and tossed the rest.
bgwalter•18h ago
Well, the author's previous blog posts were shorter (e.g., https://lucumr.pocoo.org/2022/1/30/unsafe-rust/) and more succinct.

I've no idea what he is saying here. It is all about vaguely defined processes and tools, and people increasingly adopt an LLM writing style.

the_mitsuhiko•17h ago
> and people increasingly adopt an LLM writing style.

If you are insinuating that this is written by an LLM: it is not.

bgwalter•17h ago
No, I didn't try to claim that. I seem to see the influence in many people's writing and verbosity, though. It could be as simple as a counter-reaction: if an LLM is allowed to be verbose, so are humans. It could also be that people who use LLMs a lot subconsciously adopt the style.

I infer that you are the author of the post. Take it as a compliment; I think you have written many good pre-LLM articles.

haiku2077•18h ago
> Context system: Go provides a capable copy-on-write data bag that explicitly flows through the code execution path, similar to contextvars in Python or .NET's execution context. Its explicit nature greatly simplifies things for AI agents. If the agent needs to pass stuff to any call site, it knows how to do it.

I believe this is considered a bad practice: the general attitude is that the only sane use case for values in context.Context is tracing data, and all other data should be explicitly passed via arguments.

the_mitsuhiko•18h ago
I'm really not an expert in Go, but the data I'm passing via context at the moment is the kind of data commonly placed there by the libraries I use: database connections, config, rate limiters, cache backends, etc. It does not seem particularly bad to me, at least.
haiku2077•17h ago
If you use context.Context for this you give up a lot of type safety and generally make your data passing opaque.

It's totally fine to put multiple values into a different data bag type that has explicit, typed fields. For example, the Echo framework has its own strongly typed and extensible Context interface for request scoped data: https://pkg.go.dev/github.com/labstack/echo#Context

the_mitsuhiko•17h ago
> If you use context.Context for this you give up a lot of type safety and generally make your data passing opaque.

The data passing, maybe; I'm not sure how I lose type safety. The value comes out of the context with the right type just fine. The stuff I'm attaching to the context is effectively globals; it's just that this way you can get proper isolation in tests and elsewhere.

From my limited experience with Echo, the context there is not at all the same thing.

haiku2077•16h ago
Context.Value's signature is Value(any) any - you have to use type conversion or reflection to determine the value's type at runtime, instead of a compile-time check.
the_mitsuhiko•15h ago
But I have methods such as MustRateLimiterFromContext(ctx) which returns the right type :)
haiku2077•11h ago
By crashing your program at runtime if the wrong type (or nothing) was added!
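
To make the tradeoff concrete, here's a minimal Go sketch of the pattern under discussion (names are hypothetical): the key is unexported and the accessor returns the right type, but a missing or mistyped value is only caught at runtime:

    package ratelimit

    import "context"

    type RateLimiter struct{ PerSecond int }

    // Unexported key type avoids collisions with other packages.
    type ctxKey struct{}

    func WithRateLimiter(ctx context.Context, rl *RateLimiter) context.Context {
        return context.WithValue(ctx, ctxKey{}, rl)
    }

    // The accessor recovers a typed value, but the check happens at
    // runtime: if nothing (or the wrong thing) was attached, it panics.
    func MustRateLimiterFromContext(ctx context.Context) *RateLimiter {
        rl, ok := ctx.Value(ctxKey{}).(*RateLimiter)
        if !ok {
            panic("rate limiter missing from context")
        }
        return rl
    }
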
physicles•17h ago
Agreed on all points.

The only place I've encountered this pattern is in chromedp, the Go wrapper for the headless Chrome browser driver. Its API… isn't good.

Most methods you use are package globals that take a context.Context as the first parameter. But you have to understand that this context is a _special_ one: you can't pass any old context like context.Background(); you must pass a context you got from one of the factory methods.

If you want to specify a timeout, you use context.WithTimeout. Clever I guess, but that’s the only setting that works like that.

It’s essentially a void*.
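
If I understand chromedp's API correctly, the pattern looks roughly like this (a sketch, not taken verbatim from its docs):

    package main

    import (
        "context"
        "time"

        "github.com/chromedp/chromedp"
    )

    func main() {
        // The context must come from chromedp's factory; a bare
        // context.Background() won't carry the browser state Run needs.
        ctx, cancel := chromedp.NewContext(context.Background())
        defer cancel()

        // Timeouts are layered on with the standard library.
        ctx, cancel = context.WithTimeout(ctx, 10*time.Second)
        defer cancel()

        if err := chromedp.Run(ctx, chromedp.Navigate("https://example.com")); err != nil {
            panic(err)
        }
    }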

unshavedyak•18h ago
On the note of language choice, I've been experimenting with Claude Code recently and thought the other day about how happy I am to be using Rust with it, and how afraid I'd be in Python, JS, etc.

I've noticed Claude Code introduces quite a few errors and then walks through the compile errors to fix things up. Refactors etc. also become quite easy with this workflow in CC.

I'm sure it does well in dynamic languages, but given how much the LLM leans on these compile errors, I get the feeling it would simply miss more things if there were fewer or none.

So far, though, my #1 concern is finding ways of constraining the LLM. It produces slop really, really quickly, and when it works more slowly I can avoid some of the review burden. E.g., I find stubbing out methods and defining the code path I want in code, rather than trying to explain it to the LLM, to be productive.

Still in my infancy of learning this tool, though. It feels powerful, but also terrifying in the hands of lazy folks just pushing through slop.
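
To illustrate the stubbing idea (sketched in Go rather than Rust, with made-up names): write the code path you want by hand and leave the leaves as stubs for the agent to fill in.

    package usersync

    import "context"

    type User struct{ ID, Email string }

    type Source interface {
        ListUsers(ctx context.Context) ([]User, error)
    }

    type Store interface {
        PutUser(ctx context.Context, u User) error
    }

    // The code path I want, written by hand up front.
    func SyncUsers(ctx context.Context, src Source, dst Store) error {
        users, err := fetchUsers(ctx, src) // agent fills this in
        if err != nil {
            return err
        }
        return writeUsers(ctx, dst, users) // ...and this
    }

    // Stubs constrain the agent to a shape I've already decided on.
    func fetchUsers(ctx context.Context, src Source) ([]User, error) {
        panic("TODO: implement")
    }

    func writeUsers(ctx context.Context, dst Store, users []User) error {
        panic("TODO: implement")
    }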

linguistbreaker•17h ago
My take on choice of language:

1) Java has the largest, oldest and most explicit data set for the LLM to reference, so it's likely to be the most thorough, if not the most correct.

2) Go with the language YOU know best because you'll be able to spot when the LLM is incorrect, flawed in its 'reasoning', hallucinating etc.

Macha•17h ago
I always assumed the LLMs had the most Python code to reference, as they seem to default to Python most often if you don't specify a language.
diggan•17h ago
> Java has the largest, oldest and most explicit data set for the LLM to reference

That seems to be a recommendation for coding with LLMs that don't have access to tools to look up APIs, docs, and third-party source code, rather than something you'd choose for "Agentic Coding".

Once the tooling can automatically figure out what is right, the language you use matters less, as long as the source code ends up available somewhere the agent can read it when needed.

I very much agree with your 2nd point, though: all outputs still require careful review, and what better language to use than one you know inside out?

tough•17h ago
I have been learning Go, Swift, and Rust with the help of LLMs/agents.

Basically, the terser/safer syntax and the compiler errors make a great, tight feedback loop for the agent to fix stuff by itself.

fibers•17h ago
Why is this? Is there just an insanely large body of open-source Java code (the only thing I can think of is the entire Apache suite)? Or is it because the docs are that expressive and detailed for a given OSS library?
linguistbreaker•7h ago
Java's API docs are very complete and explicit.

Certain points about the language, as well as certain long-existing open-source projects, have been discussed ad nauseam online. This all adds to the body of knowledge.

linsomniac•16h ago
I had a recent discussion with another member of the Python community (the original article is written by a big name in Python).

He started off saying "learning to code with AI is like learning to cook by ordering off the menu". I know he meant "an AI being the way you learn how to code", but there's another meaning I've been thinking a lot about, because my 16-year-old son is really into coding and I'm trying to work out how I can help him be successful in the world at the horizon where he starts doing it professionally.

In that way, "learning how to work together with an AI to code" is a really, really interesting question. Because the world is going to look VERY different in 2-6 years.

The thread in question: https://bsky.app/profile/alsweigart.bsky.social/post/3lr6guv...

bgwalter•16h ago
I think this discussion boxes new students into the mediocre category right from the start.

Do we really want to tell Fabrice Bellard that he isn't productive enough?

If, on the other hand, you want to train people to become fungible factory workers, train them to work on the conveyor belt.

linsomniac•15h ago
I get your point, but I'm envisioning a different endpoint.

Let's take your factory example: factories are just a fact of life right now; almost nobody is producing bespoke cars or phones or clothing. So, given that my son is basically 100% likely to be working with an automation line, how do I get him on the track to being a machine operator or a millwright rather than doing conveyor-belt work?

robertlagrant•16h ago
> Likewise with AI I strongly prefer more code generation over using more dependencies. I wrote about why you should write your own code before, but the more I work with agentic coding, the more I am convinced of this.

This is an interesting statement!

mountainriver•16h ago
Packages have communities though.
Leynos•16h ago
Something I like to do is get Gemini Deep Research to write a single-file manual for any uncommon dependency and include that in my docs/ directory. It helps a bunch.

I also ask it to write specialized guides on narrow topics (e.g., testing async SQLAlchemy using pytest-async and pytest-postgresql).

maqnius•16h ago
"Many hallucinations" may become the new "poorly documented" when it comes to tech-stack decisions. I'm asking myself whether it could slow down the adoption of new tech in the future, since it's harder to provide the equivalent of ten years of Stack Overflow learning material than to write equally good documentation.
Imustaskforhelp•15h ago
A suggestion: maybe the dark-mode toggle should be at the top of the page rather than at the end; I personally would've loved that, and I do it with some of my HTML blogs. Maybe it's personal preference. But yeah, I agree Go is pretty cool, though the knowledge base for Python feels bigger, and I sometimes just use uv with Python and Gemini Pro right within the browser to create one-off cool scripts. Pretty cool!
apwell23•11h ago
Anthropic just released an AI fluency course:

https://www.youtube.com/watch?v=JpGtOfSgR-c

This is the best set of videos on the topic I've seen.

anonymid•9h ago
Neovim has AI integration! (And a pretty damn good one, if I say so myself; I wrote it.)

https://github.com/dlants/magenta.nvim