I don’t have it write any of my Python firmware or Elixir backend stuff.
What I do let it rough in is web front end stuff. I view the need for and utility of LLMs in the html/css/tailwind/js space as an indictment of complexity and inconsistency. It’s amazing that the web front end stuff has just evolved over the years, organically morphing from one thing to another, but a sound, well-engineered, simple-is-best set of software it is not. And in a world where my efforts will probably work in most browser contexts, no surprise that I’m willing to mix in a tool that will produce results that will probably work. A mess is still a mess.
"I want to run webserver on Android but it does not allow binding on ports lower than 1000. What are my options?"
Both responded with the solutions below:
1. Use a reverse proxy
2. Root the phone
3. Run on a higher port
Even after asking them to rethink, they couldn't come up with the solution I was expecting. The solution to this problem is HTTPS RR records[1]. Both models knew about HTTPS RR but couldn't suggest it as a solution. Only after I included it in their context did both agree it was a possible solution.
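For reference, a rough sketch of what that looks like (the domain, port, and record values here are hypothetical, and the query assumes dnspython >= 2.1):

    # The zone would publish an HTTPS (type 65) record along the lines of:
    #   example.com. 3600 IN HTTPS 1 . alpn="h2" port=8443
    # so clients that honor HTTPS RRs connect on 8443 even though the phone
    # can't bind a privileged port. Querying it from Python:
    import dns.resolver

    answers = dns.resolver.resolve("example.com", "HTTPS")
    for rdata in answers:
        print(rdata.to_text())  # e.g. 1 . alpn="h2" port=8443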
I guess it's also actually supported, unlike SRV records, which are only supported by some applications? Matrix migrated from SRV to .well-known files for providing the data. (Or maybe it supports both.)
At least... this was before multiplayer discovery was commandeered. Matchmaking and so on largely put an end to opportunities.
For an LLM, I’d expect it to be similar. It can recall the stuff it’s seen thousands of times, but has a hard time recalling the niche/underdocumented stuff that it’s seen just a dozen times.
The human brain isn't a statistical aggregator. If you see a psychologically shocking thing once in your lifetime, you might remember it even after dementia hits when you're old.
On the other hand, you pass by hundreds of shops every day and receive the data signal of their signs over and over and over, yet you remember nothing.
You remember stuff you pay attention to (for whatever reason).
I am fine with this, but let's be clear about what we're expecting
Certainly a majority of people don't know this. What we're really asking is whether an LLM is expected to know more than (or as much as) the average domain expert.
Only recently have I started interacting with LLMs more (I tried out a previous "use it as a book club partner" suggestion, and it's pretty great!).
When coding with them (via Cursor), there was an interaction where I nudged it: "hey, you forgot xyz when you wrote that code the first time" (i.e., updating an associated data structure or cache or whatever), and I find myself INTENTIONALLY giving the machine at least the shadow of a benefit of the doubt: "Yeah, I might have made that mistake too if I were writing that code" or "Yeah, I might have written the base case first and _then_ gotten around to updating the cache, or decrementing the overall number of found items, or whatever".
In the "book club" and "movie club" case, I asked it to discuss two movies and there were a few flubs: was the main character "justly imprisoned", or "unjustly imprisoned" ... a human might have made that same typo? Correct it, don't dwell on it, go with the flow... even in a 100% human discussion on books and movies, people (and hallucinating AI/LLM's) can not remember with 100% pinpoint accuracy every little detail, and I find giving a bit of benefit of the doubt to the conversation partner lowers my stress level quite a bit.
I guess: even when it's an AI, try to keep your interactions positive.
Yes, an experienced person might be able to suss out what the real problem was, but it's not really the LLM's fault for answering the specific question it was asked. Maybe you just wanted to run a server for testing and didn't realize that you can add a non-standard port to the URL.
Not to mention the solution did end up being to use a higher port number...
But it's also not crazy to think that with LLMs getting smarter (and considerable resources put into making them better at coding), that future versions would clean up and refactor code written by past versions. Correct?
And I don't really see any reason to declare we've hit the limit of what can be done with those kinds of techniques.
But, fundamentally, LLMs lack a theory of the program as intended in this comment https://news.ycombinator.com/item?id=44443109#44444904 . Hence, they can never reach the promised land that is being talked about - unless there are innovations beyond next-token prediction.
In other words, it would be wrong of me to assume that the only way I can think of to go about solving a problem is the only way to do it.
Maybe quite a few pounds, if the cure in question hasn't been invented yet and may turn out to be vaporware.
The chatbot portion of the software is useless.
Chat mode on the other hand follows my rules really well.
I mostly use o3 - it seems to be the only model that has "common sense" in my experience
This is a blog post/comment section summary encountered many times per day.
The other side of it is people who seem to have 'gotten it' and can dispatch multiple agents to plan/execute/merge changes across a project and want to tell you how awesome their workflow is without actually showing any code.
Or, often, sell you something.
I don't mind when other programmers use AI, and use it myself. What I mind is the abdication of responsibility for the code or result. I don't think that we should be issuing a disclaimer when we use AI any more than when I used grep to do the log search. If we use it, we own the result of it as a tool and need to treat it as such. Extra important for generated code.
I think this is often overlooked, because on the one hand it's really impressive what the predictive model can sometimes do. Maybe it's super handy as an autocomplete, or an exploration, or for rapidly building a prototype? But for real codebases, the code itself isn't the important part. What matters is documenting the business logic and setting it up for efficient maintenance by all stakeholders in the project. That's the actual task, right there. I spend more time writing documentation and unit tests to validate that business logic than I do actually writing the code that will pass those tests, and a lot of that time is specifically spent coordinating with my peers to make sure I understand those requirements, that they were specified correctly, that the customer will be satisfied with the solution... all stuff an LLM isn't really able to replace.
Organizations which design systems (in the broad sense used here) are constrained to produce designs which are copies of the communication structures of these organizations.
— Melvin E. Conway, How Do Committees Invent?
> there's a fundamental shift when a system can no longer be held in a single mind

Should LLM users invest in both biological (e.g. memory palace) and silicon memory caches?
Usually it's not because people think it can't be done, or shouldn't be done; it's because of this law. Like, yes, in an ideal world we'd do xyz, but the department head of product A is a complete anti-productive bozo that no one wants to talk to or deal with, so we'll engineer around him, kind of a thing. It's incredibly common; once you see it play out, you'll see it everywhere.
Sometimes, forking product lines, departments or budgets.
https://www.youtube.com/watch?v=5IUj1EZwpJY
This analysis of the real-world effects of Conway's Law seems deeply horrifying, because the implication seems to be that there's nothing you can do to keep communication efficiency and design quality high while also growing an organisation.
Self-regulating.
---
disclaimer: if low information density is your thing, then your mileage may vary. Videos are for documentaries, not for reading an article out to the camera.
Either way, it's a step-by-step walk through of the ideas of the original article that introduced Conway's Law and a deeper inspection into ideas about _why_ it might be that way.
If that's not enough then my apologies but I haven't yet found an equivalent article that goes through the ideas in the same way but in the kind of information-dense format that I assume would help you hit your daily macros.
Edit: Accidentally a word
And probably a few minutes of commercials too. I get the impression this is an emerging generational thing, but unless it's a recorded university course or a very interesting and reputable person.. no thanks. What is weird is that the instinct to prefer video seems motivated by laziness, and laziness is actually an adaptive thing to deal with information overload.. yet this noble impulse is clearly self-defeating in this circumstance. Why wait and/or click-through ads for something that's low-density in the first place, you can't search, etc.
Especially now that you can transcript the video and quickly get AI to clean it up into a post, creating/linking a video potentially telegraphs stuff like: nothing much to say but a strong desire to be in the spotlight / narcissism / an acquisitiveness for clicks / engagement. Patiently enduring infinite ads while you're pursuing educational goals and assuming others are willing to, or assuming other people are paying for ad-free just because you do, all telegraphs a lack of respect for the audience, maybe also a lack of self-respect. Nothing against OP or this video in particular. More like a PSA about how this might come across to other people, because I can't be the only person that feels this way.
Always and entirely subjective of course, but I find Casey Muratori to be both interesting and reputable.
> What is weird is that the instinct to prefer video seems motivated by laziness, and laziness is actually an adaptive thing to deal with information overload...
What's even weirder is the instinct to not actually engage with the content of the linked video and a discussion on Conway's Law and organisational efficiency and instead head straight into a monologue about some kind of emerging generational phenomenon of laziness highlighted by a supposed preference for long video content, which seems somewhat ironic itself as ignoring the original subject matter to just post your preferences as 'PSA' is its own kind of laziness. To each their own I guess.
Although I do think the six-hour YouTube 'essays' really could do with some serious editing, so perhaps there's something there after all...
IMO, LLMs of today are not capable of building theories (https://news.ycombinator.com/item?id=44427757#44435126). And, if we view programming as theory building, then LLMs are really not capable of coding. They will remain useful tools.
describe User do ... it ".."
for the thousandth time.. or write the controller files with CRUD actions..
LLMs can do these. I can then review the code, improve it, and go from there.
They are also very useful for brainstorming ideas; I treat it as a better Google search. If I'm stuck trying to model my data, I can ask it questions and it gives me recommendations. I can then think about it and come up with an approach that makes sense.
I also noticed that LLMs really lack basic comprehension. For example, no matter how many times you provide the schema file (or a part of it), it still doesn't understand that a column doesn't exist on a model and will try to shove it into the suggested code.. very annoying.
All that being said, I have an issue with "vibe coding".. this is where the chaos happens as you blindly copy and paste everything and git push goodbye
I don't think that's the main reason. Well written code is easier to follow even when you haven't written it yourself (or maybe you did but forgot about it).
The idea that the future is going to “more or less be predictable” and “within the realm of normal” is a pretty bold claim when you look at history! Paradigm shifts happen. And many people think we’re in the middle of one — people that don’t necessarily have an economic interest in saying so.
* I’m not taking a position here about predicting what particular AI technologies will come next, for what price, with what efficiency and capabilities, and when. Lots of things could happen we can’t predict — like economic cycles, overinvestment, energy constraints, war, popular pushback, policy choices, etc. But I would probably bet that LLMs are just the beginning.
Below is a link to a great article by Simon Willison explaining an LLM assisted workflow and the resulting coded tools.
[0] https://simonwillison.net/2025/Mar/11/using-llms-for-code/
[1] https://github.com/simonw/tools
Meanwhile, it's not uncommon to see people on HN saying they're orchestrating multiple major feature implementations in parallel. The impression we get here is that Simon Willison's entire `tools` feature set could be implemented in a couple of hours.
I'd appreciate some links to the second set of people. Happy to watch YouTube videos or read more in-depth articles.
"f you assume that this technology will implement your project perfectly without you needing to exercise any of your own skill you’ll quickly be disappointed."
"They’ll absolutely make mistakes—sometimes subtle, sometimes huge. These mistakes can be deeply inhuman—if a human collaborator hallucinated a non-existent library or method you would instantly lose trust in them"
"Once I’ve completed the initial research I change modes dramatically. For production code my LLM usage is much more authoritarian: I treat it like a digital intern, hired to type code for me based on my detailed instructions."
"I got lucky with this example because it helped illustrate my final point: expect to need to take over. LLMs are no replacement for human intuition and experience. "
What that gets me, though, is less typing fatigue and fewer decisions driven partly by my wrists, etc. If it's a large (but simple!) refactor, the LLM generally does amazing at that. As good as I would do. But it does that with zero wrist fatigue. Things that I'd normally want to avoid or take my time on it bangs out in minutes.
This, coupled with Claude Code's recently introduced Hooks[1], means you can curb a lot of behaviors that are difficult to make perfect from an LLM. I.e., making sure it tests, formats, doesn't include emojis (boy does it like those, lol), etc.
And of course a bunch of other practices for good software in general make the LLMs better, as has been discussed on HN plenty of times. Eg testing, docs, etc.
So yea, they're dumb and I don't trust their "thinking" at all. However, I think they have huge potential to help us write and maintain large codebases and generally multiply our productivity.
It's an art for sure though, and restraint is needed to prevent slop. They will put out so. much. slop. Ugh.
- one big set of users who don't like it because it generates a lot of code and uses its own style of algorithms, and it's a whole lot of unfamiliar code that the user has to load up in their mind - as you said. Too much to comprehend, and quickly overwhelming.
And then to either side
- it unblocks users who simply couldn't have written the code on their own, who aren't even trying to load it into their head. They are now able to make working programs!
- it accelerates users who could have written it on their own, given enough time, but have figured out how to treat it as an army of junior coders, and learned to only maintain the high level algorithm in their head. They are now able to build far larger projects, fast!
The only thing the "AI" is marginally good at is as a fancy auto-complete that writes log statements based on the variable I just wrote into the code above it. And even in this simple use case it gets things wrong a fair amount.
Overall the "AI" is a net negative for me, but maybe close to break-even thanks to the autocomplete.
honestly, for the vast majority of basically-CRUD apps out there, we are inflating our skills a bit too much here. even if the code is junk you can adapt your mindset to accept what LLMs produce, clean it up a bit, and come out with something maintainable.
like do these people ever have to review code from other people or juniors? the feedback loop here is tighter (although the drawback is your LLM doesn't "learn").
i wouldn't use it for anything super novel or cutting edge i guess, but i don't know, i guess everyone on HN might be coding some super secret advanced project that an LLM can't handle....?
Ultimately I am responsible for any code I check in even if it was written by an LLM, so I need to perform these lengthy reviews. As others have said, if it is code that doesn't need to be maintained, then reviewing the code can be a much faster process. This is why it is so popular for hobby projects since you don't need to maintain the code if you don't want to, and it doesn't matter if you introduce subtle but catastrophic bugs.
Ultimately the tech feels like a net neutral. When you want to just throw the code away afterwards, it's very fast and good enough. If you are responsible for maintaining it, it's slower than writing it yourself.
if i need to work on something mission critical or new i do it by hand first. tests catch everything else. or you can just run it so that you review every change (like in claude code) as it comes in and can still grok the entire thing vs having to review multiple large files at the end.
thus i literally wonder what people are working on that requires this 100% focused mission critical style stuff at all times. i mean i don't think it's magic or AGI, but the general argument is always 1) works for hobby projects but not "production" 2) the LLM produces "messy code" which you have to review line by line as if you wrote it yourself which i've found to not be true at all.
This happens with any sufficiently big/old codebase. We can never remember everything, even if we wrote it ourselves
I do agree with the sentiment and insight about the 2 branches of topics frequently seen lately on HN about AI-assisted coding
Would really like to see a live/video demo of semi-autonomous agents running in parallel and executing actual useful tasks on a decently complex codebase, ideally one that was entirely “manually” written by devs before agents are involved - and that actually runs a production system with either lots of users or paid customers
The important thing about a codebase wasn't ever really size or age, but whether it was a planned architecture or grown organically. The same is true post-LLM. Want to put AI in charge of tool-smithing inconsequential little widgets that are blocking you? Fine. Want to put AI in charge of deciding your overall approach and structure? Maybe fine. Worst of all is to put the AI in charge of the former, only to find later that you handed over architectural decisions at some point and without really intending to.
That sounds like a hard either/or, as if ten years of development of a large codebase were entirely known up-front, with not a single change to the structure over time as a result of new information.
"We build our computers the way we build our cities—over time, without a plan, on top of ruins." -- Ellen Ullman
I suspect that's at least partially because all of that doesn't stop the hype from being pushed on and on without mercy. Which in turn is probably because the perverse amounts of investment that went into this have to be reclaimed somehow through monetization. Imagine all those VCs having to realize that hundreds of billions of $$$ are lost to wishful hallucinations. Before they concede that, there will of course be much astroturfing in the vein of your last paragraph.
an argument can be made that the code doesn't matter as long as the product works as it's supposed to (big asterisk here)
The only goal of a code generator is the code. I don't care whether it works or not (for specific scenarios and it could break 90% of the time). I want to see the generated code and, so far, I have never seen anything interesting besides todo lists made with ReactJS.
My org has 160 engineers working on our e-commerce frontend and middle tiers. I constantly dive into repos and code I have no ownership of. The git blame frequently shows a contractor who worked here 3 years ago.
Seems LLMs do well on small codebases, badly on medium ones, and well again as small modules within a big one.
Who says it is? Arguably the most famous book in the history of software engineering makes that point, and it precedes LLMs by half a century.
It's as miserable. I hate working on large codebases with multiple contributors unless there is super strong leadership that keeps things aligned.
This is a great read on the situation. Do you think these people are just making it up/generating baseless hype?
I have seen a few of these full-blown LLM-coded projects and every one of them has some giant red flashing warning at the top of the README about the project being LLM generated.
So I think it’s probably a mix of avoiding embarrassment and self preservation.
- Use Cline with Sonnet 4. Other models can work but this is the best balance of price and effectiveness.
- Always use "plan" mode first, and only after the plan mode looks good do you switch to "act" mode.
- Treat the LLM as though you are pair-programming with a junior engineer.
- Review every line that gets written as it gets written. Object or change it if you don't like it for any reason.
- Do test-driven development, and have the LLM always write tests first.
I have transitioned to using this full-time for coding and am loving the results. The code is better than what I used to write, because sometimes I can miss certain cases or get lazy. The code is better tested. The code gets written at least twice as fast. This is real production code that is being code reviewed by other humans.

This is definitely something I feel is a choice. I've been experimenting quite a bit with AI generated code, and with any code that I intend to publish or maintain I've been very conscious in making the decision that I own the code and that if I'm not entirely happy with the AI generated output I have to fix it (or force the AI to fix it).
Which is a very different way of reviewing code than how you review another humans code, where you make compromises because you're equals.
I think this produces fine code, not particularly quickly but used well probably somewhat quicker (and somewhat higher quality code) than not using AI.
On the flip side on some throwaway experiments and patches to personalize open source products that I have absolutely no intention of upstreaming I've made the decision that the "AI" owns the code, and gone much more down the vibe coding route. This produces unmaintainable sloppy code, but it works, and it takes a lot less work than doing it properly.
I suspect the companies that are trying to force people to use AI are going to get a lot more of the "no human ownership" code than individuals like me experimenting because they think it's interesting/fun.
But I’m in no rush to invite an army of people to compete with me just yet. I’ll be back when I’m sipping coladas on a beach to tell you what I did.
One example I deal with frequently is creating PyTorch models. Any real model is absolutely not something you want to leave in the hands of an LLM, since the entire point of modeling is to incorporate your own knowledge into the design. But there is a lot of tedium, and room for errors, in getting the initial model wiring set up.
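To make "model wiring" concrete, here is the kind of mechanical setup I mean (a toy sketch with made-up layer sizes, not code from any real project; the modeling decisions are still the part you keep for yourself):

    import torch
    import torch.nn as nn

    # Toy example of tedious-but-mechanical wiring an LLM can rough in.
    class SmallClassifier(nn.Module):
        def __init__(self, in_dim: int = 64, hidden: int = 128, n_classes: int = 10):
            super().__init__()
            self.backbone = nn.Sequential(
                nn.Linear(in_dim, hidden),
                nn.ReLU(),
                nn.Dropout(0.1),
                nn.Linear(hidden, hidden),
                nn.ReLU(),
            )
            self.head = nn.Linear(hidden, n_classes)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.head(self.backbone(x))

    # Quick shape check so wiring mistakes show up immediately.
    model = SmallClassifier()
    assert model(torch.randn(8, 64)).shape == (8, 10)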
While, big picture, this isn't the 10x (or more) improvement that people like to imagine, I find in practice I personally get really stuck on the "boring parts". Reducing the time I spend on tedious stuff tends to have a pretty notable improvement in my overall flow.
- Brian Kernighan
So if there's a bug in code that an LLM wrote, simply wait 6 months until the LLMs are twice as smart?
> simply wait [about a quarter of a million minutes] until the LLMs are twice as smart?
...
I use an LLM primarily for smaller, focused data analysis tasks so it's possible to move fast and still stay reasonably on top of things if I'm even a little bit careful. I think it would be really easy to trash a large code base in a hurry without some discipline and skill in using LLM. I'm finding that developing prompts, managing context, controlling pace, staying organized and being able to effectively review the LLM's work are required skills for LLM-assisted coding. Nobody teaches this stuff yet so you have to learn it the hard way.
Now that I have a taste, I wouldn't give it up. There's so much tedious stuff I just don't want to have to do myself that I can offload to the LLM. After more than 20 years doing this, I don't have the same level of patience anymore. There are also situations where I know conceptually what I want to accomplish but may not know exactly how to implement it and I love the LLM for that. I can definitely accomplish more in less time than I ever did before.
Holy shit that's the best description of this phenomenon I've heard so far. The most stark version of this I've experienced is working on a side project with someone who isn't a software engineer who vibe coded a bunch of features without my input. The code looked like 6-8 different people had worked on it with no one driving architecture and I had to untangle how it all got put together.
The sweet spot for me is using it in places where I know the exact pattern I want to use to solve a problem and I can describe it in very small discrete steps. That will often take something that would have taken me an hour or two to hand code something tedious down to 5-10 minutes. I agree that there's no going back, even if all progress stopped now that's too huge of a gain to ignore it as a tool.
Wow! Lol. One sentence to rule them all.
https://static1.srcdn.com/wordpress/wp-content/uploads/2019/...
Did you not need all these skills / approaches / frameworks for yourself / coding with a team?
This is, I think, the key difference between those (such as myself) who find LLMs to massively increase velocity / quality / quantity of output and those who don’t.
I was already highly effective at being a leader / communicator / delegating / working in teams ranging from small , intimate , we shared a mental model / context up to some of the largest teams on the planet.
If someone wasn’t already a highly effective IC/manager/leader pre LLM, an LLM will simply accelerate how fast they crash into the dirt.
It takes substantial work to be a highly effective contributor / knowledge worker at any level. Put effort into that , and LLMs become absolutely indispensable, especially as a solo founder.
Quite the selling point.
I stopped reading right here. Presumably other people did too. I don't think you're aware of the caliber of your hubris. About howitzer sized.
That hate fuels me to just do the work myself. It's like the same trick as those engagement-bait math problems that pop up on social media with the wrong answer.
After that, I may ask an LLM to write particular functions, giving it data types and signatures to guide it.
In some cases, this approach might even be slower than writing the code.
But also: LLMs are incredibly powerful and capable tools for discovering what the architecture of things is. They have amazing abilities to analyze huge code bases & to build documents and diagrams that map out the system. They can answer all manner of questions, to let us probe in.
I'm looking for good links here. I know I have run across some good stuff before. But uhh I guess this piece that I just found introduces the idea well enough. https://blog.promptlayer.com/llm-architecture-diagrams-a-pra...
And you can see an example in the ever excellent @simonw's review of prompts in copilot. At the end, they briefly look at testing, and it's a huge part of the code base. They ask Gemini for a summary and it spits out a beautiful massive document on how copilot testing works! https://simonwillison.net/2025/Jun/30/vscode-copilot-chat/ https://github.com/simonw/public-notes/blob/main/vs-code-cop...
Now, whether LLMs generate well-architected systems is largely operator dependent. There are lots of low-effort, zero-shot ways to give LLMs very little guidance and get out who knows what. But when I reflect on the fact that, for now, most code is legacy code, most code is hideously under-documented, and most people reading code don't really have access to experts or artifacts to explain the code and its architecture, my hope and belief is that LLMs are incredible tools to radically increase maintainability versus where we are now, and that they are powerful peers in building the mental model of programming & systems.
There have been grifters hopping onto every trend. Have you noticed they never show you what exactly they built, or whether it was ever useful?
Likely (as am I).
> LLMs are exciting but they produce messy code for which the dev feels no ownership. [...] The other side of it is people who seem to have 'gotten it' and can dispatch multiple agents to plan/execute/merge changes across a project
Yup, can confirm, there are indeed people with differing opinions and experience/anecdotes on HN.
> want to tell you how awesome their workflow is without actually showing any code.
You might be having some AI-news-fatigue (I can relate) and missed a few, but there are also people who seem to have gotten it and do want to show code:
Armin Ronacher (of Flask, Jinja2, Sentry fame): https://www.youtube.com/watch?v=nfOVgz_omlU (workflow) and https://lucumr.pocoo.org/2025/6/21/my-first-ai-library/ (code)
Here's one of my non-trivial open source projects where a large portion is AI built: https://github.com/senko/cijene-api (didn't keep stats, I'd eyeball it at conservatively 50% - 80%)
When I first got into web development there were people using HTML generators based on photoshop comps. They would produce atrocious HTML. FE developers started rewriting the HTML because otherwise you'd end up with brittle layout that was difficult to extend.
By the time the "responsive web" became a thing HTML generators were dead and designers were expected to give developers wireframes + web-ready assets.
Same thing pretty much happened with UML->Code generators, with different details.
There's always been a tradeoff between the convenience and deskilling involved in generated code and the long term maintainability.
There's also the fact that coding is fundamentally an activity where you try to use abstractions to manage complexity. Ideally, you have interfaces that are good enough that the code reads like a natural language because you're expressing what you want the computer to do at the exact correct layer of abstraction. Code generators tend to both cause and encourage bad interfaces. Often the impetus to use a code generator is that the existing interfaces are bad, bureaucratic, or obscure. But using code generators ends up creating more of the same.
Wow, you really nailed the point; that's what I felt but couldn't articulate. Thanks for the comment.
I don't see why any of this should be surprising. I think it just reflects a lot of developers using this technology and having experiences that fall neatly into one of these two camps. I can imagine a lot of factors that might pull an individual developer in one direction or the other; most of them probably correlate, and people in the middle might not feel like they have anything interesting to say.
People will always blame someone or something else for laziness.
How you get good results as a team is to develop a shared mental model, and that typically needs to exist in design docs. I find that without design docs, we all agree verbally, and then are shocked at what everyone else thought we'd agreed on. Write it down.
LLMs, like junior devs, do much better with design docs. You can even let the junior dev try writing some design docs.
So if you're a solo developer, I can see this would be a big change for you. Anyone working on a team has already had to solve this problem.
On the subject of ownership: if I commit it, I own it. If the internet "goes down" and the commit has got my name on it, "but AI" isn't going to cut it.
The question is, what do you expect from an LLM? What do you want to use it for?
They're plenty useful but, as with anything, you need to use them responsibly and with proper expectations.
But I still more-or-less have to think like a software engineer. That's not going to go away. I have to make sure the code remains clean and well-organized -- which, for example, LLMs can help with, but I have to make precision requests and (most importantly) know specifically what I mean by "clean and well-organized." And I always read through and review any generated code and often tweak the output because at the end of the day I am responsible for the code base and I need to verify quality and I need to be able to answer questions and do all of the usual soft-skill engineering stuff. Etc. Etc.
So do whatever fits your need. I think LLMs are a massive multiplier because I can focus on the actual engineering stuff and automate away a bunch of the boring shit.
But when I read stuff like:
"I lost all my trust in LLMs, so I wouldn't give them a big feature again. I'll do very small things like refactoring or a very small-scoped feature."
I feel like I'm hearing something like, "I decided to build a house! So I hired some house builders and told them to build me a house with three bedrooms and two bathrooms and they wound up building something that was not at all what I wanted! Why didn't they know I really liked high ceilings?"
I hear this frequently from LLM aficionados. I have a couple of questions about it:
1) If there is so much boilerplate that it takes a significant amount of coding time, why haven't you invested in abstracting it away?
2) The time spent actually writing code is not typically the bottleneck in implementing a system. How much do you really save over the development lifecycle when you have to review the LLM output in any case?
Oftentimes there's a lot of repetition in the app I'm working on, and a lot of it has already been abstracted away, but we still have to import the component, its dependencies, and set up the whole thing, which is indeed pretty boring. It really helps to tell the LLM to implement something and point it to an example of the style I want.
Whether someone’s litmus test is well-developed is another matter.
Stuff that can't just be abstracted into a function or class but also requires no real thought. Tests are often (depending on what they're testing) in this realm.
I was resistant at first, but I love it. It's reduced the parts of my job that I dislike doing because of how monotonous they are and replaced them with a new fun thing to do - optimizing prompts that get it done for me much faster.
Writing the prompt and reviewing the code is _so_ much faster on tedious simple stuff, and it leaves the interesting, thought-provoking parts of my work for me to do.
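For a concrete (hypothetical) example of that kind of tedious-but-thought-free test code, think parametrized cases like these, where nothing needs designing and it just takes time to type out:

    import pytest

    def slugify(title: str) -> str:
        # stand-in for the real function under test
        return "-".join(title.lower().split())

    @pytest.mark.parametrize(
        "title, expected",
        [
            ("Hello World", "hello-world"),
            ("  Leading and trailing  ", "leading-and-trailing"),
            ("Already-slugged", "already-slugged"),
            ("", ""),
        ],
    )
    def test_slugify(title, expected):
        assert slugify(title) == expected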
They are exactly like steroids - bigger muscles fast but tons of side effects and everything collapses the moment you stop. Companies don't care because they are more concerned about getting to their targets fast instead of your health.
Another harmful drug for our brain if consumed without moderation. I won't entirely stop using them but I have already started to actively control/focus my usage.
But even he doesn't think AI shouldn't be used. Go ahead and use it for stuff like email but don't use it for your core work.
I'm generally sympathetic to the idea that LLMs can create atrophy in our ability to code or whatever, but I dislike that this clickbaity study gets shared so much.
Tools that do many things and tools that do a small number of things are still tools.
> "do we really even need a body at all anymore?"
It's a legitimate question. What's so special about the body and why do we need to have one? Would life be better or worse without bodies?
Deep down I think everyone's answer has more to do with spirituality than anything else. There isn't a single objectively correct response.
https://www.youtube.com/watch?v=SxdOUGdseq4
LLMs seem to be really good at reproducing the classic Ball of Mud, that can't really be refactored or understood.
There's a lot of power in creating simple components that interact with other simple components to produce complex functionality. While each component is easy to understand and debug and predict its performance. The trick is to figure out how to decompose your complex problem into these simple components and their interactions.
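A trivial sketch of that decomposition idea (hypothetical names, just for illustration): each piece is easy to test and reason about on its own, and the complexity lives in how they're composed.

    from typing import Iterable

    def parse(line: str) -> dict:
        key, _, value = line.partition("=")
        return {"key": key.strip(), "value": value.strip()}

    def validate(record: dict) -> bool:
        return bool(record["key"]) and bool(record["value"])

    def pipeline(lines: Iterable[str]) -> list:
        # compose the simple pieces; each is independently swappable
        return [r for r in map(parse, lines) if validate(r)]

    assert pipeline(["a = 1", "broken", " = 2"]) == [{"key": "a", "value": "1"}]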
I suppose once LLMs get really good at that skill, that will be when we really won't need developers any more.
I don't really get this argument. So when LLMs become "perfect" software developers are we just going to have them running 24/7 shitting out every conceivable piece of software ever? What would anyone do with that?
Or do you expect every doctor, electrician, sales assistant, hairdresser, train driver etc. to start developing their own software on top of their existing job?
What's more likely is a few people will make it their jobs to find and break down problems people have that could use a piece of software and develop said piece of software using whatever means they have available to them. Today we call these people software developers.
I started my software career by automating my job, then automating other people’s jobs. Eventually someone decided it would be easier to just hire me as a software engineer.
I just met with an architect for adding a deck onto my house (need plans for code compliance). He said he was using AI to write programs that he could use with design software. He demoed how he was using AI to convert his static renders into walkthrough movies.
Like, I might ask an LLM for its opinion on the best way to break something down, to see if it 'thinks' of anything I haven't, and then ask it to implement that. I wouldn't ask it to do the whole thing from scratch with no input on how to structure things.
kind of.
This, but only for code. I've seen "leaders" at work suggest that we "embrace" AI, even for handling production systems and managing their complexity. That's like saying: "We've built this obscure, inscrutable system, therefore we need another obscure, inscrutable system on top of it in order to understand it!". To me, this sounds deranged, but the amount of gaslighting that's going on also makes you think you're the only one who believes that...
The hard parts of engineering have always been decision making, socializing, and validating ideas against cold hard reality. But writing code just got easier so let's do that instead.
Prior to LLMs writing 10 lines of code might have been a really productive day, especially if we were able to thoughtfully avoid writing 1,000 unnecessary lines. LLMs do not change this.
Somehow, if I take the best models and agents, most hard coding benchmarks are below 50%, and even SWE-bench Verified is at like 75, maybe 80%. Not 95. Assuming agents just solve most problems is incorrect, despite them being really good at first prototypes.
Also, in my experience agents are great up to a point and then fall off a cliff. Not gradually. The types of errors you get past that point are so diverse, one cannot even explain them.
Edit: In concrete terms the workflow is to allow Copilot to make changes, see what's broken, fix those, review the diff against the goal, simplify the changes, etc, and repeat, until the overall task is done. All hands off.
Don’t feed many pages of code to the AI; it works best on isolated functions or small classes with few dependencies.
In 10% of cases when I ask to generate or complete code, the quality of the code is less than ideal but fixable with extra instructions. In 25% of cases, the quality of generated code is bad and remains so even after telling it what’s wrong and how to fix. When it happens, I simply ignore the AI output and do something else reasonable.
Apart from writing code, I find it useful at reviewing new code I wrote. Half of the comments are crap and should be ignored. Some others are questionable. However, I remember a few times when the AI identified actual bugs or other important issues in my code, and proposed fixes. Again, don’t copy-paste many pages at once, do it piecewise.
For some niche areas (examples are HLSL shaders, or C++ with SIMD intrinsics) the AI is pretty much useless, probably was not enough training data available.
Overall, I believe ChatGPT improved my code quality. Not only as a result of reviews, comments, or generated code, but also because my piecewise copy-pasting workflow improved the overall architecture by splitting the codebase into classes/functions/modules/interfaces, each doing their own thing.
I agree it's good for helping writing smaller bits like functions. I also use it to help me write unit tests which can be kind of tedious otherwise.
I do think that the quality of AI assistance has improved a lot in the past year. So if you tried it before, maybe take another crack at it.
LLMs will even throw irrelevant data points into the output, which causes further churn.
I feel not much has changed.
People really want a quick, low effort fix that appeals to the energy conserving lizard brain while still promising all the results.
In reality there aren't shortcuts, there's just tradeoffs, and we all realize it eventually.
I wrote about it: https://kamens.com/blog/code-with-ai-the-hard-way
If you extrapolate this blog then we shouldn't be having so much success with LLMs, we shouldn't be able to ship product with fewer people, and we should be hiring junior developers.
But the truth of the matter, especially for folks who work on agents focused on software development, is that we can see a huge tidal shift happening, similar to what artists, photographers, translators and copywriters have experienced.
The blog sells the idea that LLMs are not productive and need to be dialed down, but that does not tell the whole story. This does not mean I am saying LLMs should be used in all scenarios; there are clearly situations where they might not be desirable. But overall, the productivity-hindrance narrative I repeatedly see on HN isn't convincing, and I suspect it is highly biased.
Instead we're stuck talking about if the lie machine can fucking code. God.
https://zed.dev/agentic-engineering
"Interwoven relationship between the predictable & unpredictable."
You can say no, then give it more specific instructions like "keep it more simple" or "you don't need that library to be imported".
You can read the code and ensure you understand what it's doing.
Looked like dog shit, but worked fine till it hit some edge cases.
Had to break the whole thing down again and pretty much start from scratch.
Ultimately not a bad day's work, and I still had it on for autocomplete on doc-strings and such, but like fuck will I be letting an agent near code I do for money again in the near future.
I don't think LLM for coding productivity is all hype but I think for the people who "see the magic" there are many illusions here similar to those who fall prey to an MLM pitch.
You can see all the claims aren't necessarily unfounded, but the lack of guaranteed reproducibility leaves the door open for many caveats in favor of belief for the believer and cynicism for everybody else.
For the believers if it's not working for one person, it's a skill issue related to providing the best prompt, the right rules, the perfect context and so forth. At what point is this a roundabout way of doing it yourself anyway?
Really powerful seeing different options, especially based on your codebase.
> I wouldn't give them a big feature again. I'll do very small things like refactoring or a very small-scoped feature.
That really resonates with me. Anything larger often ends badly and I can feel the „tech debt“ building in my head with each minute Copilot is running. I do like the feeling though when you understood a problem already, write a detailed prompt to nudge the AI into the right direction, and it executes just like you wanted. After all, problem solving is why I’m here and writing code is just the vehicle for it.
Personally the initial excitement has worn off for me and I am enjoying writing code myself and just using kagi assistant to ask the odd question, mostly research.
When a teammate who bangs on about how we should all be using AI tried to demo it and got things in a bit of a mess, I knew we had peaked.
And all that money invested into the hype!
I instead started asking where I might look something up - in what man page, or in which documentation. Then I go read that.
This helps me build a better mental map about where information is found (e.g., in what man page), decreasing both my reliance on search engines, and LLMs in the long run.
LLMs have their uses, but they are just a tool, and an imprecise one at that.
It's important to always maintain the developer role, don't ever surrender it.
I've been allowing LLMs to do more "background" work for me. Giving me some room to experiment with stuff so that I can come back in 10-15 minutes and see what it's done.
The key things I've come to are that it HAS to be fairly limited. Giving it a big task like refactoring a code base won't work. Giving it an example can help dramatically. If you haven't "trained" it by giving it context or adding your CLAUDE.md file, you'll end up finding it doing things you don't want it to do.
Another great task I've been giving it while I'm working on other things is generating docs for existing features and modules. It is surprisingly good at looking at events and following those events to see where they go, and generating diagrams and the like.
A chainsaw and chisel do different things and are made for different situations. It’s great to have chainsaws, no longer must we chop down a giant tree with a chisel.
On the other hand there’s plenty of room in the trade for handcraft. You still need to use that chisel to smooth off the fine edges of your chainsaw work, so your teammates don’t get splinters.
I feel like LLMs are already doing quite a lot. I spend less time rummaging through documentation or trying to remember obscure APIs or other pieces of code in a software project. All I need is a strong mental model of the project and how things are done.
There is a lot of obvious heavy lifting that LLMs are doing that I for one am not able to take for granted.
For people facing constraints similar to those in a resource constrained economic environment, the benefits of any technology that helps them spend less time doing work that doesn't deliver value is immediately visible/obvious/apparent.
It is no longer an argument about whether it is a hype or something, it is more about how best to use it to achieve your goals. Forget the hype. Forget the marketing of AI companies - they have to do that to sell their products - nothing wrong with that. Don't let companies or bloggers set your own expectations of what could or should be done with this piece of tech. Just get on the bandwagon and experiment and find out what is too much. In the end I feel we will all come from these experiments knowing that LLMs are already doing quite a lot.
Trivia: I even came across this article: https://www.greptile.com/blog/ai-code-reviews-conflict. It clearly points out how LLM reliance can bring both the 10x dev and the 1x dev closer to a median of "goodness". So the 10x dev probably gets worse and the 1x dev ends up getting better - I'm probably that guy because I tend to miss subtle things in code, and Copilot review has had my ass for a while now - I haven't had defects like that in a while.
Every tool is just a tool. No tool is a solution. Until and unless we hit AGI, only the human brain is that.
Is there any other way but down from such revolutionary ungrounded expectations?
At first I was very enthusiastic and thought Codex was helping me multiplex myself. But you actually spend so much time trying to explain the most obvious things to Codex, and it gets them wrong all the time in some kind of nuanced way, that in the end you spend more time doing things via Codex than by hand.
So I also dialed back Codex usage and got back to doing many more things by hand again, because it's just so much faster and much more predictable time-wise.