For now...
I understand programming for the sake of programming, chasing purity and really digging into the creative aspects of coding. But I get that same kick out of writing perfect interfaces, knowing that the messier the code underneath is, the more my beautiful interface gets to shine. But transformers are offering us a way to build faster, to create more, and to take on bigger complexity while learning deeply about new domains as we go. I think the more we lean into that, the more likely we are to enter a software golden age where creativity and impact can reach a whole new level.
Energy-based models, and machines that bootstrap from models and organize their state according to a prompt, are on their way. The analog hole for coders is closing.
Most software out there is layers of made-up tools for managing and deploying other software. We'll save a lot of cycles by pruning it all down to generic patterns.
In 5-10 more years it's all hardware again. Then we'll no longer need to program a computer like it's 1970.
The thing is, code that does all of the things you listed here is good-looking code almost by definition.
If AI was anywhere near capable of producing this quality then it would be so thrilling, wouldn't it?
But it's not. The consensus seems to be pretty universal that AI code is Junior to Intermediate quality at best, the majority of the time
That generally isn't code that satisfies the list of quality criteria you mentioned
Do you perform (or have another agent perform) code reviews on the agent's code?
Do you discuss architecture and approach with the agent beforehand and compile that discussion into a design and development plan?
If you don't do these things, then you're just setting yourself up for failure.
By the time I did all the stuff you're suggesting, I could just build the damn thing myself
AI is a tool, not a human. I'm not about to invest in it the way I would a junior developer. If the tool doesn't do the job, it's not a good tool. If a tool requires the same level of investment that a human does, it's also not a good tool.
I've been deobfuscating Claude Code to watch their prompts evolve as I use it, and you can see the difference around how and when it chooses to re-analyze a codebase or how it will explicitly break up work into steps. A lot of the implicit knowledge of software engineering is being added _outside_ of the LLM training.
Do you know how tiring it gets to constantly engage with people who complain about agentic workflows without actually having the experience or knowledge to properly evaluate them? These people already have intentionally closed their minds on the subject, but still love to debate it loudly and frequently even though they have little intention of actually considering others' arguments or advice. What could be an educational moment turns into ideological warfare reminiscent of the text editor or operating system wars.
It's beginning to get infuriating, because of the unbridled arrogance you typically encounter from naysayers.
Software engineers and anyone who'd get hired to fix the mess that the LLM created. Also, ironically, other LLMs would probably work better on...not messy code.
> But transformers are offering us a way to build faster, to create more, and to take on bigger complexity
Wow. You sound as if building faster or creating more is synonymous with quality or utility. Or as if LLMs allow us to take on a bigger level of complexity (this is where they notoriously crumble).
> we might enter a software golden age where the potential for creativity
I haven't heard of a single (good) software engineer whose creativity was stifled by their inability to code something. Is an LLM generating a whole book in Hemingway style considered creative, or a poem? Or a program/app?
That's always been a worry with programming (see basically all Rich Hickey talks), and is still a problem since people prefer "moving fast today" instead of "not having 10 tons of technical debt tomorrow"
LLMs make it even easier for people to spend the entire day producing boilerplate without stopping for a second to rethink why they are producing so much boilerplate. If the pain goes away, why fix it?
It isn't much different than dealing with an extremely precocious junior engineer.
Given how easy it is to refactor now, it certainly makes economic sense to delay it.
Like, I know boilerplate and messy code suck because I've had to work with them, without LLMs, and I know how much that sucks. I think you do too, but we only know that because we had to fight with it in the past.
Deliberately telling it how to rethink the structure, refactor first, and then separate out components fixed things.
If LLMs stayed at the current level, I would expect LLM-aided coders to learn how to analyze and address situations like this. However, I do expect models to become better able to A) avoid these kinds of situations with better design up front or reflection when making changes, and B) identify more systematic patterns and reason about the right way to structure things. Basically, ambiently detecting "code smells."
You can already see improvements both from newer models and from prompt engineering coming from the agentic tools.
The pain of shitty code doesn’t go away. They can ship your crappy MVP faster, but technical debt doesn’t magically go away.
This is an awesome opportunity for those people to start learning how to do software design instead of just “programming”. People that don’t are going to be left behind.
Three or four weeks ago I was posting how LLMs were useful for one-off questions but I wouldn't trust them on my codebase. Then I spent my week's holiday messing around on them for some personal projects. I am now a fairly committed Roo user. There are lots of problems, but there is incredible value here.
Try it and see if you're still a hold-out.
all this agent stuff sounded stupid to me until I tried it out in the last few weeks, and personally, it's been great - I give a not-that-detailed explanation for what I want, point it at the existing code and get back a patch to review once I'm done making my coffee. sometimes it's fine to just apply, sometimes I don't like a variable name or whatever, sometimes it doesn't fit in with the other stuff so I get it to try again, sometimes (<< 10% of the time) it's crap. the experience is pretty much like being a senior dev with a bunch of very eager juniors who read very fast.
anyway, obviously do whatever you want, but deriding something you've not looked in to isn't a hugely thoughtful process for adapting to a changing world.
> anyway, obviously do whatever you want, but deriding something you've not looked in to isn't a hugely thoughtful process for adapting to a changing world.
I have tried it. Not sure I want to be part of such world, unfortunately.
> the experience is pretty much like being a senior dev with a bunch of very eager juniors who read very fast.
I... don't want that. Juniors just slow me down because I have to check what they did and fix their mistakes.
(this is in the context of professional software development, not making scripts, tinkering etc)
> (this is in the context of professional software development, not making scripts, tinkering etc)
I understand the sentiment. A few months ago they wanted us to move fast and dumped us (originally 2 developers) with 4 new people who have very little real world coding experience. Not fun, and very stressful.
However, keep in mind that in many workplaces, handling junior devs poorly means one of two things:
1. If you have some abstruse domain expertise, and it's OK that only 1-2 people work on it, you'll be relegated to doing that. Sadly, most workplaces don't have such tasks.
2. You'll be fantastic in your output. Your managers will like you. But they will not promote you. After some point, they expect you to be a leverage multiplier - if you can get others to code really well, the overall team productivity will exceed that of any superstar (and no, I don't believe 10x programmers exist in most workplaces).
Ouch! Reminds me of:
- I'm never going to use cell phones. I care about voice quality (me decades ago)
- I'm never going to use VoIP. I care about voice quality (everyone but me 2 decades ago).
- I'm never going to use a calculator. I am not going to give up on reasoning.
- I'm never going to let my kids play with <random other ethnicity>. I care about good manners.
"C++ question: how do I get the unqualified local system time and turn into an ISO time string?"
"Python question: how do I serialize a C struct over a TCP socket with asyncio?"
"JS question: how do I dynamically show/hide an HTML element?" (I obviously don't write a lot of JS :-D)
ChatGPT gave me the correct answers on the first try. I have been a sceptic, but I'm now totally sold on AI-assisted coding, at least as a replacement for Google and Stack Overflow. For me there is no point anymore in wading through all the blog spam and SEO crap just to find a piece of information. Stack Overflow is still occasionally useful, but the writing is on the wall...
EDIT: Important caveat: stay critical! I have been playing around asking ChatGPT more complex questions where I actually know the correct answer, or where I can immediately spot mistakes. It sometimes gives me answers that would look correct to a non-expert, but are hilariously wrong.
Oh, I'm definitely aware of that! I mostly do this with things I have already done, but can't remember all the details. If the LLM shows me something new, I check the official documentation. I'm not into vibe coding :) I still want to understand every line of code I write.
I find using o3 or o4-mini and prompting "use your search tool" works great for having it perform research tasks like this.
I don't trust GPT-4o to run searches.
If you mean the ChatGPT interface, I suspect you're headed in the wrong direction.
Try Aider, with API interface. You can use whatever model you like (as you're paying per token). See my other comment:
https://news.ycombinator.com/item?id=44259900
I mirror the GP's sentiment. My initial attempts using a chat like interface were poor. Then some months ago, due to many HN comments, I decided to give Aider a try. I had put my kid to bed and it was 10:45pm. My goal was "Let me just figure out how to install Aider and play with it for a few minutes - I'll do the real coding tomorrow." 15 minutes later, not only had I installed it, my script was done. There was one bug I had to fix myself. It was production quality code, too.
I was hooked. Even though I was done, I decided to add logging, command line arguments, etc. An hour later, it was a production grade script, with a very nice interface and excellent logging.
Oh, and this was a one-off script. I'll run it once and never again. Now all my one-off scripts have excellent logging, because it's almost free.
There was no going back. For small scripts that I've always wanted to write, AI is the way to go. That script had literally been in my head for years. It was not a challenging task - but it had always been low in my priority list. How many ideas do you have in your head that you'll never get around to because of lack of time. Well, now you can do 5x more of those than you would have without AI.
I was at the "script epiphany" stage a few months ago and I got cool Bash scripts (with far more bells and whistles I would normally implement) just by iterating with Claude via its web interface.
Right now I'm at the "Gemini (with Aider) is pretty good for knock-offs of the already existing functionality" stage (in a Go/HTMX codebase).
I'm yet to get to the "wow, that thing can add brand new functionality using code I'm happy with just by clever context management and prompting" stage; but I'm definitely looking forward to it.
I wonder how that interacts with his previous post?
There are absolutely times to be extremely focused and clever with your code, but they should be rare and tightly tied to your business value.
Most code should be "blindingly obvious" whenever possible.
The limit on developers isn't "characters I can type per minute" it's "concepts I can hold in my head."
The more of those there are... The slower you will move.
Don't create more interfaces over the existing ones, don't abstract early, feel free to duplicate and copy liberally, glue stuff together obviously (even if it's more code, or feels ugly), declare the relevant stuff locally, stick with simple patterns in the docs, don't be clever.
You will write better code. Code shouldn't be pretty, it should be obvious. It should feel boring, because the hard part should be making the product not the codebase.
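To make that concrete, here's a throwaway sketch (hypothetical Go, names made up, not anyone's real code) of the same job written the "clever" way and the "obvious" way:

```go
package main

import "fmt"

// "Clever": an interface plus a struct for a job that has exactly one caller.
type Greeter interface{ Greet(name string) string }

type englishGreeter struct{ prefix string }

func (g englishGreeter) Greet(name string) string { return g.prefix + name }

// "Obvious": a plain function, declared right where it's used, even if that
// means a little duplication later.
func greet(name string) string { return "Hello, " + name }

func main() {
	var g Greeter = englishGreeter{prefix: "Hello, "}
	fmt.Println(g.Greet("Ada")) // same output...
	fmt.Println(greet("Ada"))   // ...but far fewer concepts to hold in your head
}
```

Same behaviour either way; the second version is just easier to read six months later.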
Does this mean that eventually in a world where we all use this stuff, no new language/framework/library will ever be able to emerge?
Competing with the existing alternatives will be too hard. You won't even be able to ask real humans for help on platforms like StackOverflow because they will be dead soon.
With a stable enough boilerplate you can come up with outstanding results in a few hours. Truly production ready stuff for small size apps.
And even better, rather than have them off in some far-away location, annotate the code itself so the tests get updated along with the code.
That's pretty impressive, and someone would have to be short-sighted to prefer the false productivity of constantly implementing by hand what a computer can automatically do for them.
Not to mention how much better it is if you work on any actual large-scale systems with true cross-team dependencies, and not trivial code bases that get thrown away every few years, where it almost doesn't matter how you write them.
LLMs can handle the syntax of basically any language, but the library knowledge is significantly improved by having a larger corpus of code than Common Lisp tends to have publicly available.
If you truly believe in the potential of agentic AI, then the logical conclusion is that programming languages will become the assembly languages of the 21st century. This may or may not become the unfortunate reality.
Whether that's going to make sense, I have some doubts, but as you say: For an LLM optimist, it's the logical conclusion. Code wouldn't need to be optimised for humans to read or modify, but for models, and natural language is a bit of an unnecessary layer in that vision.
Personally I'm not an LLM optimist, so I think the popular stack will remain focused on humans. Perhaps tilting a bit more towards readability and less towards typing efficiency, but many existing programming languages, tools and frameworks already optimise for that.
1. The previous gen has become bloated and complex because it widened its scope to cover every possible niche scenario and got infiltrated by 'expert' language and framework specialists who went on an architecture binge.
2. As a result a new stack is born, much simpler, back to basics, compared to the poorly aged incumbent. It doesn't cover every niche, but it does a few newly popular things really easily and well, and rises on the coattails of this new thing as the default environment for it.
3. Over time the new stack ages just as poorly as the old stack for all the same reasons. So the cycle repeats.
I do not see this changing with AI-assisted coding, as context enrichment is getting better, allowing a full stack specification in post-training.
How will it ever rise on the coattails of anything if it isn't in the AI training data so no one is ever incentivized to use it to begin with?
That's a very good question.
Rephrased: as good training data will diminish exponentially with the Internet being inundated by LLM regurgitations, will "AI savvy" coders prefer old, boring languages and tech because there's more low-radiation training data from the pre-LLM era?
The most popular language/framework combination in early 2020s is JavaScript/React. It'll be the new COBOL, but you won't need an expensive consultant to maintain in the 2100s because LLMs can do it for you.
Corollary: to escape the AI craze, let's keep inventing new languages. Lisps with pervasive macro usage and custom DSLs will be safe until actual AGIs that can macroexpand better than you.
I don't think the premise is accurate in this specific case.
First, if anything, training data for newer libs can only increase. Presumably code reaches GitHub in an "at least it compiles" state. So you have lots of people fighting the AIs and pushing code that at least compiles. You can then filter for the newer libs and train on that.
Second, pre-training is already mostly solved. The pudding seems to be now in post-training. And for coding a lot of post-training is done with RL / other unsupervised techniques. You get enough signals from using generate -> check loops that you can do that reliably.
The idea that "we're running out of data" is way too overblown IMO, especially considering the last ~6mo-1y advances we've seen so far. Keep in mind that the better your "generation" pipeline becomes, the better will later models be. And the current "agentic" loop based systems are getting pretty darn good.
How?
Presumably, in the "every coder is using AI assistants" future, there will be an incredible amount of friction in getting people to adopt languages that AI assistants don't know anything about
So how does the training data for a new language get made, if no programmers are using the language, because the AI tools that all programmers rely on aren't trained on the language?
The snake eating its own tail
I'm not an expert here by any means but I'm not seeing how this makes much sense versus just using languages that the LLM is already trained on
Here is a YouTube video that makes the same argument. React is / will be the last JavaScript framework, because it is the dominant one right now. Even if people publish new frameworks, LLM coding assistants will not be able to assist coding using the new frameworks, so the new frameworks will not find users or popularity.
And even for React, it will be difficult to add any more new features, because LLMs only assist to write code that uses the features the LLMs know about, which are the old, established ways to write React.
Why not? When my coding agent discovers that they used the wrong API or used the right API wrong, it digs up the dependency source on disk (works at least with Rust and with JavaScript) and looks up the new details.
I also have it use my own private libraries the same way, and those are not in any training data guaranteed.
I guess if whatever platform/software you use doesn't have tool calling you're kind of right, but you're also missing something kind of commonplace today.
New frameworks can be created, but they will be different from before:
- AI-friendly syntax, AI-friendly error handling
- Before being released, we will have to spend hundreds of millions of tokens on agents reading the framework and writing documentation and working example code with it, basically creating the dataset that other AIs can reference when using the new framework.
- Create a way to have that documentation/example code easily available for AI agents (via MCP or new paradigm)
Right now languages are the interface between human and computer. If LLMs take over, their ideal programming language is probably less verbose than what we are currently using. Maybe keywords could become 1 token long, etc. Just some quick thoughts here :D.
Not even close, and the article betrays the author's biases more than anything else. The fact that their Claude Code (with Sonnet) setup has issues with the `cargo test` cli for instance is hardly a categorical issue with AIs or cargo, let alone Rust in general. Junie can't seem to use its built-in test runner tool on PHP tests either; that doesn't mean AI has a problem with PHP. I just wrote a `bin/test-php` script for it to use instead, and it figures out it has to use that (telling it so in the guidelines helps, but it still keeps trying to use its built-in tool first)
As for SO, my AI assistant doesn't close my questions as duplicates. I appreciate what SO is trying to do in terms of curation, but the approach to it has driven people away in droves.
You'd expect more from the company that is developing both the IDE and the AI agent...
I really liked Augment, except for its piggish UI. Then they revealed the price tag, and back to Junie I went.
I highly doubt it. These things excel at translation.
Even without training data, if you have an idiosyncratic-but-straightforward API or framework, they pick it up no problem just looking at the codebase. I know this from experience with my own idiosyncratic C# framework that no training data has ever seen, that the LLM is excellent at writing code against.
I think something like Rust lifetimes would have a harder time getting off the ground in a world where everyone expects LLM coding to work off the bat. But something like Go would have an easy time.
Even with the Rust example though, maybe the developers of something that new would have to take LLMs into consideration, in design choices, tooling choices, or documentation choices, and it would be fine.
I don't buy that it pushes you into using Go at all. If anything I'd say they push you towards Python a lot of the time when asking it random questions with no additional context.
The elixir community is probably only a fraction of the size of Go or Python, but I've never had any issues with getting it to use it.
I don't like the word "agent" because it is not a blind LLM or a small, fast script. It is a complex workflow with many checks and prompting before a single line of code gets written. That's also the key to AI-powered development: context.
It's quite possible it's a case of holding things wrong but I think at least the basic evaluation I did that made me come to the conclusion that Go works particularly well isn't too bad. I just get results that I feel good with quicker than with Rust and Python. FWIW I also had really good results with PHP on the level of Go too, it's just overall a stack that does not cater too well to my problem.
My own experience with "Agents" (and no, I am not a luddite) has been nothing short of comical in how terrible it's been. We try it every single day at our company. We've tried all the advice. All the common wisdom. They have never, not once, produced anything of any value. All the public showcases of these brilliant "agents" have also been nothing short of spectacular failures [1]. Yet despite all this, I keep seeing these type of posts, and pretty much always it's from someone with a vested interest of some kind when you dig deep down enough. All the managerial types pushing it, you look deep enough it's always because the board or investors or whatever other parasite has a vested interest.
I know one thing is for certain, what AI will give us is more and more fucking advertisements shoved into every facet of our lives, except now it sorta talks like a human!
- think of everything that is needed for a feature (fixable via planning at the beginning)
- actually follow that plan correctly
I just tried with a slightly big refactor to see if some changes would improve performance. I had it write the full plan and baseline benchmarks to disk, then let it go in yolo mode. When it was done it only implemented something like half of the planning phases and was saying the results look good despite all the benchmarks having regressed.
- Rust
- Big refactors
- Performance improvements
- Yolo-mode, especially if you aren't skilled yet at prompting and knowing which things the LLM will do well and which will need supervision
For me this only works for fairly tightly scoped tasks that aren't super complex, but it does work. And I think the days of staring down the IDE will be coming to a close for all but the most complex coding tasks in the future.
No, because it's boring. That's why we don't have airplane pilots just watch a machine that's fully on autopilot.
People working on existing projects in turn are scratching their heads because it's just not quite working or providing much of a productivity boost. I belong to this camp.
I watched Ronacher's demo from yesterday, https://www.youtube.com/watch?v=sQYXZCUvpIc, and this is it, a well-regarded engineer working on a serious open source project. There's no wizard behind the curtain, it's the thing I've been asking the promoters for.
And you should make your own judgment, but I'm just not impressed.
It seems to me the machine takes longer, creates a plan that "is shit," and then has to be fixed by a person who has a perfect understanding of the problem.
I'm loving LLMs as research tools, pulling details out of bad documentation, fixing my types and dumb SQL syntax errors, and searching my own codebase in natural language.
But if I have to do all the reasoning myself no matter what, setting a robot free to make linguistically probable changes really feels like a net negative.
Given the hype and repercussions of success or failure of what LLMs can hypothetically do, I feel like the only way forward for reasonable understanding of the situation is for people to post live streams of what they're raving about.
Or at the very least source links with version control history.
My thinking now is removed from the gory details and is a step or two up. How can I validate the changes are working? Can I understand this code? How should it be structured so I can better understand it? Is there more we can add to the AI conventions markdown in the repo to guide the Agent to make fewer mistaken assumptions?
Last night I had a file with 38 mypy errors. I turned it over to the agent and went and had a conversation with my wife for 15 minutes. I came back, it summarized the changes it made and why, I debated one of the changes with it but ultimately decided it was right.
Mypy passed. Good to go.
I'm currently trying to get my team to really understand the power here. There are a lot of skeptics, and the AI still isn't perfect; people who are against the AI era will latch onto that as validation, but that's exactly the opposite of the correct reaction. It's really validation, because as a friend of mine says:
"Today is the worst day you will have with this technology for the rest of your life."
I think the excellent error messages in Rust help humans as much as they help LLMs, but some of the weaker models get misdirected by some of the "helpful" tips, like when an error message suggests "Why don't you try .clone here?" while the actual way to address the issue was something else.
Forgive my ignorance, but is this just a file you're adding to the context of every agent turn or this a formal convention in the VS code copilot agent? And I'm curious if there's any resources you used to determine the structure of that document or if it was just a refinement over time based on mistakes the AI was repeating?
It is the same stuff you'd tell a new developer on your team: here are the design docs, here are the tools, the code, and this is how you build and test, and here are the parts you might get hung up on.
In hindsight, it is the doc I should have already written.
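For what it's worth, a minimal sketch of the kind of thing that goes in such a file (hypothetical project layout and commands, not the poster's actual doc):

```markdown
# Conventions for the AI agent (hypothetical example)

## Layout
- `cmd/server`: entry point. `internal/`: application code. `docs/design/`: design notes.

## Build and test
- Build with `make build`, test with `make test`. Run the tests before proposing a patch.

## Things you will get hung up on
- Config is loaded once at startup in `internal/config`; don't read env vars anywhere else.
- All database access goes through `internal/store`; never open connections in handlers.
```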
I can easily tell from git history which commits were heavily AI generated
Fixing type checker errors should be one the least time consuming things you do. This was previously consuming a lot of your time?
A lot of the AI discourse would be more effective if we could all see the actual work one another is doing with it (similar to the cloudflare post).
Yes, this is a frequent problem both here and everywhere else. The discussions need to include things like exact model version, inference parameters, what system prompt you used, what user prompt, what code you gave it, what exactly it replied and so much more details, as currently almost every comment is "Well, I used Sonnet last week and it worked great" without any details. Not to mention discussions around local models missing basic stuff like what quantization (if any) and what hardware you're running it on. People just write out "Wow fast model" or stuff like that, and call it a day.
Although I understand why: every comment would be huge if everyone always added sufficient context. I don't know the solution to this, but it does frustrate me.
[1]: https://github.com/cloudflare/workers-oauth-provider/
[2]: https://tools.simonwillison.net/
[3]: https://steveklabnik.com/writing/a-tale-of-two-claudes/
While I agree with "more details", the amount of details you're asking for is ... ridiculous. This is a HN comment, not a detailed study.
I'm not asking for anything, nor providing anything as "a solution", just stating a problem. The second paragraph in my comment is quite literally about that.
Maybe keeping your HN profile/gist/repo/webpage up to date would be better.
These tools have turned out to be great at this stuff. I don’t think I’ve turned over any interesting problems to an LLM and had it go well, but by using them to take care of drudgery, I have a lot more time to think about the interesting problems.
I would suggest that instead of asking people to post their work, try it out on whatever bullshit tasks you’ve been avoiding. And I specifically mean “tasks”. Stuff where the problem has already been solved a thousand times before.
Write good base instructions for your agent[0][1] and keep them up to date. Have your agent help you write and critique it.
Start tasks by planning with your agent (e.g. "do not write any code."), and have your agent propose 2-3 ways to implement what you want. Jumping straight into something with a big prompt is hit or miss, especially with increased task complexity. Planning also gives your agent a chance to read and understand the context/files/code involved.
Apologies if I'm giving you info you're already aware of.
[0] https://code.visualstudio.com/docs/copilot/copilot-customiza...
[1] Claude Code `/init`
Do you change models between planning and implementation? I've seen that recommended but it's been hard to judge if that's made a difference.
Sometimes I do planning in stronger models like Gemini 2.5 Pro (started giving o3 a shot at this the past couple days) with all the relevant files in context, but often times I default to Sonnet 4 for everything.
A common pattern is to have the agent write down plans into markdown files (which you can also iterate on) when you get beyond a certain task size. This helps with more complex tasks. For large plans, individual implementation-phase-specific markdown files.
Maybe these projects can provide some assistance and/or inspiration:
When it has a work plan, have it track the plan as a checklist that it fills out as it works.
You can also start your conversations by asking it to summarize the code base.
But I can run commands on my local Linux box that generate boilerplate in seconds. Why do I need to subscribe to access GPU farms for that? Then the agent gets stuck at some simple bug and goes back and forth saying "yes, I figured it out and solved it now" while it keeps flipping between two broken states.
The rabid prose, the Fly.io post deriding detractors... To me it seems same hype as usual. Lots of words about it, the first few steps look super impressive, then it gets stuck banging against a wall. If almost all that is said is prognostication and preaching, and we haven't seen teams and organizations racing ahead on top of this new engine of growth... maybe it can't actually carry loads outside of the demo track?
It can be useful. Does it merit 100 billion dollar outlays and datacenter-cum-nuclear-powerplant projects? I hardly think so.
Why do we trust corporations to keep making things better all of a sudden?
The most jarring effect of this hype cycle is that it all appears to refer to some imaginary set of corporate entities.
AI models are trained on data that can be 1 or 2 years old. And they're trained on what they saw the most. So language changes, breaking API changes, dependencies that don't work any more, name changes, etc. are going to get them super confused.
Go indeed works well because of its standard library that avoids the need for many dependencies, and its stability.
I found PHP to actually be the best target language for coding agents. For the same reasons, and also for the ton of documentation and example code available. That doesn't prevent agents from automatically using some modern PHP features, applying static analysis tools, etc.
For frontend stuff, agents will almost always pick React + Tailwind because this is what they saw the most. But Tailwind 4 is very different from Tailwind 3, and that got them super confused.
For simple scripts, it often costs me under $1 to build. I'm working on a bigger tool these days, and I've done lots of prompts, a decent amount of code, over 100 tests, and my running total is right now under $6.
I'd suggest learn the basics of using AI to code using Aider, and then consider whether you want to try Claude Code (which is likely more powerful, but also more expensive unless you use it all the time).
The monkey brain part of me that really doesn't trust an LLM and trusts my decades of hard-won programming experience also prefers using Aider because the usage flow generally goes:
1. Iterate with Aider on a plan
2. Tell Aider to write code
3. Review the code
4. Continue hacking myself until I want to delegate something to an LLM again.
5. Head back to Step 1.
Codex automates this flow significantly but it's also a lot more expensive. Just the little bits of guiding I offer an LLM through Aider can make the whole process a lot cheaper.
It's unclear to me whether the full agentic Claude Code/Codex style approach will win or whether Aider's more carefully guided approach will win in the marketplace of ideas, but as a pretty experienced engineer Aider seems to be the sweet spot between cost, impact, and authorial input.
Even with Aider, I feel it goes too fast and I sometimes actively slow it down (by giving it only very incremental changes rather than get it to do a larger chunk). I think I'd be totally lost with a more powerful agentic tool.
Running agents in parallel will be a big deal as soon as we learn (or the agents learn) how to reliably work with just one.
Even before then, if you're trying to get work done while the agent is doing its own thing or you find yourself watching over the agent's "shoulder" out of fear it'll change something you didn't ask it to change, then it's useful to run it in a containerized dev environment.
Container use is definitely early but moving quickly, and probably improved even since this post was published. We're currently focused on stability, reducing git confusion, better human<>agent interaction, and environment control.
I've no idea what he is saying here. It is all about vaguely defined processes and tools and people increasingly adopt an LLM writing style.
If you are insinuating that this is written by an LLM: it is not.
I infer that you are the author of the post. Take it as a compliment, I think you have written many good pre-LLM articles.
I believe this is considered a bad practice: the general attitude is that the only sane use case for values in context.Context is tracing data, and all other data should be explicitly passed via arguments.
It's totally fine to put multiple values into a different data bag type that has explicit, typed fields. For example, the Echo framework has its own strongly typed and extensible Context interface for request scoped data: https://pkg.go.dev/github.com/labstack/echo#Context
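A rough sketch of the difference (minimal Go, not Echo's actual API, just the shape of the two approaches):

```go
package main

import (
	"context"
	"fmt"
)

// Value smuggled through context.Context: the type system can't help you,
// and every read needs a key plus a type assertion.
type ctxKey string

func handleViaContext(ctx context.Context) {
	user, _ := ctx.Value(ctxKey("user")).(string) // silently "" if missing or mistyped
	fmt.Println("user from context:", user)
}

// Explicit, typed "data bag" passed as an argument: misuse fails to compile.
type RequestInfo struct {
	User    string
	TraceID string
}

func handleViaArgs(ctx context.Context, info RequestInfo) {
	fmt.Println("user from argument:", info.User)
}

func main() {
	ctx := context.WithValue(context.Background(), ctxKey("user"), "ada")
	handleViaContext(ctx)
	handleViaArgs(context.Background(), RequestInfo{User: "ada", TraceID: "abc123"})
}
```

Misuse of the typed version fails at compile time; misuse of the context version only shows up at runtime, if at all.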
The data passing maybe, but I'm not sure how you lose type safety. The value comes out of the context with the right type just fine. The stuff that I'm attaching to the context is effectively globals; it's just that this way you can enable proper isolation in tests and elsewhere.
From my limited experience with echo, the context there is not at all the same thing.
The only place I’ve encountered this pattern is in chromedp, the go wrapper for the chrome headless browser driver. Its API… isn’t good.
Most methods you use are package globals that take a context.Context as a first parameter. But you have to understand that this context is a _special_ one: you can’t pass any old context like context.Background(), you must pass a context you got from one of the factory methods.
If you want to specify a timeout, you use context.WithTimeout. Clever I guess, but that’s the only setting that works like that.
It’s essentially a void*.
I've noticed Claude Code introduces quite a few errors and then walks through the compile errors to fix things up. Refactors/etc also become quite easy with this workflow from CC.
I'm sure it does well in dynamic languages, but given how much the LLM leans into these compile errors, I get the feeling it would simply miss more things if there were none, or fewer.
So far, though, my #1 concern is finding ways to constrain the LLM. It produces slop really, really quickly, and when it works more slowly I can avoid some of the review process. E.g. I find stubbing out methods and defining the code path I want, in code, rather than trying to explain it to the LLM, to be productive (rough sketch below).
Still in my infancy of learning this tool, though. It feels powerful, but also terrifying in the hands of lazy folks just pushing through slop.
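Roughly what I mean by stubbing things out first (a hedged sketch with made-up names; the agent gets the skeleton and fills in the bodies):

```go
package importer

// ImportCSV defines the code path I want; the bodies are left for the agent.
func ImportCSV(path string) error {
	rows, err := readRows(path)
	if err != nil {
		return err
	}
	valid := validateRows(rows)
	return storeRows(valid)
}

// TODO(agent): open the file and parse it with encoding/csv.
func readRows(path string) ([][]string, error) { panic("not implemented") }

// TODO(agent): drop rows with the wrong column count, trim whitespace.
func validateRows(rows [][]string) [][]string { panic("not implemented") }

// TODO(agent): insert rows in a single transaction.
func storeRows(rows [][]string) error { panic("not implemented") }
```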
1) Java has the largest, oldest and most explicit data set for the LLM to reference, so it's likely to be the most thorough, if not the most correct.
2) Go with the language YOU know best because you'll be able to spot when the LLM is incorrect, flawed in its 'reasoning', hallucinating etc.
That seems to be a recommendation for coding with LLMs that don't have access to tools to look up APIs, docs and 3rd party source-code, rather than something you'd chose for "Agentic Coding".
Once the tooling can automatically figure out what is right, what language you use matters less, as long as source code ends available somewhere the agent can read it when needed.
Agree much with your 2nd point though, all outputs still require careful review and what better language to use than one you know inside-out?
basically the terser/safer syntax and runtime compilation errors are a great tight feedback loop for the agent to fix stuff by itself.
Certain points about the language, as well as certain long-existing open source projects have been discussed ad-nauseum online. This all adds to the body of knowledge.
He started off saying "learning to code with AI is like learning to cook by ordering off the menu". I know he meant "an AI being the way you learn how to code", but there's another meaning that I've been thinking a lot about because my 16yo son is really into coding and I'm trying to come up with how I can help him be successful in the world at the horizon where he starts doing it professionally.
In that way, "learning how to work together with an AI to code" is a really, really interesting question. Because the world is going to look VERY different in 2-6 years.
The thread in question: https://bsky.app/profile/alsweigart.bsky.social/post/3lr6guv...
Do we really want to tell Fabrice Bellard that he isn't productive enough?
If you want to train people to become fungible factory workers on the other hand, train them to work on the conveyor belt.
Let's take your factory example: Factories are just a fact of life right now, almost nobody is producing bespoke cars or phones or clothing. So given that my son is basically 100% likely to be working with an automation line, how do I get him on the track to being a machine operator or a millwright rather than doing conveyor belt work?
This is an interesting statement!
I also ask it to write specialized guides on narrow topics (e.g., testing async SQLAlchemy using pytest-async and pytest-postgresql).
https://www.youtube.com/watch?v=JpGtOfSgR-c
This is the best set of videos on the topic I've seen.
benob•1d ago
What do you recommand to get a Claude-code-like experience in the open-source + local llm ecosystem?
the_mitsuhiko•1d ago
There is nothing at the moment that I would recommend. However I'm quite convinced that we will see this soon. First of all I quite like where SST's OpenCode is going. The upcoming UX looks really good. Secondly because having that in place, will make it quite easy to put local models in when they get better. The issue really is that there are just not enough good models for tool usage yet. Sonnet is so shockingly good because it was trained for excellent tool usage. Even Gemini does not come close yet.
This is all just a question of time though.
BeetleB•17h ago
The amusing thing is people normally recommend using aider to save money. With Aider, you can control the size of the context window, and selectively add/drop files from the window. I typically aim for under 25K tokens at a time. With Gemini, that's about 3 cents per prompt (and often much less when I have only, say, 10 tokens). So for me, I'd need to do well over 3000 coding prompts a month to get to $100. I simply don't use it that much.
Also, at work, I have Copilot, and one can use Aider with that. So I only pay for my personal coding at home.
Getting to the original question - Aider probably lags Claude Code significantly at this point. It's a fantastic tool and I still use it - primarily because it is editor agnostic. But some of the other tools out there do a lot more with agents.
To give you an idea - my combined AI use - including for non-code purposes - is well under $20/mo. Under $10 for most months. I simply don't have that much time to do coding in my free time - even with an AI doing it!
CuriouslyC•1d ago
I find agents do a lot of derpy shit for hard problems but when you've got fairly straightforward things to build it's nice to just spin them up, let them rip and walk away.
Aider feels more like pair programming with an agent, it can kind of be spun up and let rip, but mostly it tries to keep a tighter feedback loop with the user and stay more user directed, which is really powerful when working on challenging things. For stuff like codebase refactors, documentation passes, etc that tight loop feels like overkill though.
diggan•19h ago
Correct me if I'm wrong, but Aider still doesn't do proper tool calling? Last time I tried it, they did it the "old school" way of parsing out unix shell commands from the output text and ran it once the response finished streaming, instead of the sort of tool call/response stuff we have today.
saint_yossarian•1d ago
Yes it's not a standalone CLI tool, but IMHO I'd rather have a full editor available at all times, especially one that's so hackable and lightweight.
mickeyp•23h ago
Single-file download, fuss-free and install-less that runs on mac, windows and linux (+ docker of course.) It can run any model that talks to openai (which is nearly all of them), so it'll work with the big guys' models and of course other ones like ones you run privately or on localhost.
Unlike Claude Code, which is very good, this one runs in your browser with a local app server to do the heavy lifting. A console app could be written to use this self-same server, too, of course (but that's not priority #1) but you do get a lot of nice benefits that you get for free from a browser.
One other advantage, vis-a-vis Armin's blog post, is that this one can "peek" into terminals that you _explicitly_ start through the service.
It's presently in closed alpha, but I want to open it up to more people to use. If you're interested, you and anyone else who is interested can ping me by email -- see my profile.
elpocko•23h ago
What does that mean? I've never seen any locally run model talk to OpenAI, how and why would they? Do you mean running an inference server that provides an OpenAI-compatible API?
mickeyp•23h ago
So, if your model inference server understands the REST API spec that OpenAI created way back, you can use a huge range of libraries that in theory only "work" with OpenAI.
diggan•19h ago
Worth clarifying that what the ecosystem/vendors have adopted is the "ChatCompletion" endpoint, which most models are under. But newer models (like codex) are only available under the Responses API, which the ecosystem/vendors haven't adopted as widely, AFAIK.
gk1•22h ago
https://www.app.build/ was just launched by the Neon -- err, Databricks -- team and looks promising.