Am I the only one who remembers when that was the stuff of science fiction? It was not so long ago an open question if machines would ever be able to transcribe speech in a useful way. How quickly we become numb to the magic.
A slightly off topic but interesting video about this https://www.youtube.com/watch?v=OSCOQ6vnLwU
This video explains all about it: https://youtu.be/OSCOQ6vnLwU
Disclaimer: I'm not praising piracy, but outside of US borders it's a free-for-all.
For example, for automatic speech recognition (ASR), see: https://huggingface.co/spaces/hf-audio/open_asr_leaderboard
The current best ASR model has 600M params (tiny compared to LLMs, and way faster than any LLM: 3386.02 RTFx vs 62.12 RTFx, so much cheaper) and was trained on 120,000h of speech. In comparison, the next best speech LLM (quite close in WER, but slightly worse) has 5.6B params and was trained on 5T tokens and 2.3M speech hours. It has always been like this: for a fraction of the cost, you get a pure ASR model which still beats every speech LLM.
The same is true for translation models, at least when you have enough training data, so for popular translation pairs.
However, LLMs are obviously more powerful in what they can do beyond just speech recognition or translation.
See https://blog.nawaz.org/posts/2023/Dec/cleaning-up-speech-rec...
(This is not the best example as I gave it free rein to modify the text - I should post a followup that has an example closer to a typical use of speech recognition).
Without that extra cleanup, Whisper is simply not good enough.
The problem with Google-Translate-type models is the interface is completely wrong. Translation is not sentence->translation, it's (sentence,context)->translation (or even (sentence,context)->(translation,commentary)). You absolutely have to be able to input contextual information, instructions about how certain terms are to be translated, etc. This is trivial with an LLM.
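To make that concrete, here's a rough sketch of feeding (sentence, context, glossary) into a single prompt. The llm package call and the model name are assumptions (any chat API would do), and the example sentence/glossary are just illustrative:

    # Sketch: (sentence, context, glossary) -> translation via an LLM.
    # The llm package usage and model name are assumptions, not a specific product.
    import llm

    def translate(sentence, context="", glossary=None, model="gpt-4o-mini"):
        glossary = glossary or {}
        terms = "\n".join(f'- render "{src}" as "{dst}"' for src, dst in glossary.items())
        prompt = (
            "Translate the following sentence from French to English.\n"
            f"Context: {context}\n"
            f"Terminology constraints:\n{terms}\n"
            f"Sentence: {sentence}\n"
            "Return only the translation."
        )
        return llm.get_model(model).prompt(prompt).text()

    # The same sentence comes out differently once context is supplied:
    print(translate("Elle a pris l'avocat.",
                    context="A courtroom scene, not a meal.",
                    glossary={"avocat": "lawyer"}))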
"As a safe AI language model, I refuse to translate this" is not a valid translation of "spierdalaj".
There are plenty of uncensored models that will run on less than 8GB of vram.
Also, the traditional cross-attention-based encoder-decoder translation models support document-level translation, including with context. And Google definitely has all those models. But I think the Google web interface has used much weaker models (for whatever reason; maybe inference costs?).
I think DeepL is quite good. For business applications, there is Lilt or AppTek and many others. They can easily set up a model for you that allows you to specify context, or be trained for some specific domain, e.g. medical texts.
I don't really have a good reference for a similar leaderboard for translation models. For translation, the metric used to measure quality is in any case much more problematic than for speech recognition. I think for the best models, only human evaluation works well right now.
Just whatever small LLM I have installed as the default for the `llm` command line tool at the time. Currently that's gemma3:4b-it-q8_0, though it's generally been some version of llama in the past. And then this fish shell function (basically a bash alias):
function trans
    llm "Translate \"$argv\" from French to English please"
end
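Usage is then just, e.g., `trans "Où est la gare ?"`, which prints the English translation from whatever the configured default model happens to be.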
Whisper can translate to English (and maybe other languages these days?), too.
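For what it's worth, with the open-source openai-whisper package that's just the task flag. A minimal sketch (the audio file name is made up); task="translate" produces English text directly from non-English audio:

    # Minimal sketch with the open-source openai-whisper package.
    # The audio file name is made up; task="translate" yields English output.
    import whisper

    model = whisper.load_model("large")   # smaller checkpoints also work
    result = model.transcribe("meeting_fr.mp3", task="translate")
    print(result["text"])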
Unfortunately, one of those powerful features is "make up new things that fit well but nobody actually said", and... well, there's no way to disable it. :p
It is stated that GPT-4o-transcribe is better than Whisper-large. That might be true, but which version of Whisper-large, exactly? Looking at the leaderboard, there are a lot of Whisper variants. But anyway, the best Whisper variant, CrisperWhisper, is currently only at rank 5. (I assume GPT-4o-transcribe was not compared to that but to some other Whisper model.)
It is stated that Scribe v1 from elevenlabs is better than GPT-4o-transcribe. In the leaderboard, Scribe v1 is also only at rank 6.
On their chart they also compare against gemini 2.0 flash, whisper large v2, whisper large v3, scribe v1, nova 1, and nova 2. If you only need English transcription then pretty much all models will be good these days, but there is a big difference depending on the input language.
Traditional machine translators, perhaps. Human translation is still miles ahead when you actually care about the quality of the output. But for getting a general overview of a foreign-language website, translating a menu in a restaurant, or communicating with a taxi driver? Sure, LLMs would be a great fit!
The current SOTA LLMs are better than traditional machine translators (there is no perhaps) and most human translators.
If a 'general overview' is all you think they're good for, then you've clearly not seriously used them.
(Not saying I don't believe you - it would be fascinating if true).
Somehow LLMs can't do that for structured code with well-defined semantics, but sure, they will be able to extract "obscure references" from speech/text.
There is really not that much similar between trying to code and trying to translate emotion. At the very least, language “compiles” as long as the words are in a sensible order and maintain meaning across the start and finish.
All they need to do now in order to be able to translate well is to have contextual knowledge to inform better responses on the translated end. They’ve been doing that for years, so I really don’t know what you’re getting at here.
Ah yeah, the famous "all they need to do". Such a minor thing left to do
But when would that ever happen? Guess you’re right.
Yes, yes and yes!
I tried speech recognition many times over the years (Dragon, etc). Initially they all were "Wow!", but they simply were not good enough to use. 95% accuracy is not good enough.
Now I use Whisper to record my voice, and have it passed to an LLM for cleanup. The LLM contribution is what finally made this feasible.
It's not perfect. I still have to correct things. But only about a tenth of the time I used to. When I'm transcribing notes for myself, I'm at the point I don't even bother verifying the output. Small errors are OK for my own notes.
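That pipeline is simple enough to sketch. This is a rough sketch, not the commenter's exact setup; the model names, audio file, and the llm library call are all assumptions:

    # Sketch of the dictate -> Whisper -> LLM-cleanup flow described above.
    # Model names, the audio file, and the llm package usage are assumptions.
    import whisper
    import llm

    raw = whisper.load_model("small").transcribe("dictation.wav")["text"]

    cleaned = llm.get_model("gpt-4o-mini").prompt(
        "Clean up this dictated note. Fix punctuation, casing, and obvious "
        "mis-recognitions, but do not add, remove, or reorder content.\n\n" + raw
    ).text()

    print(cleaned)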
These models have made it possible to robustly practice all 4 quadrants of language learning for most common languages using nothing but a computer, not just passive reading. Whisper is directly responsible for 2 of those quadrants, listening and speaking [1]. LLMs are responsible for writing [2]. We absolutely live in the future.
[1]: https://github.com/hiandrewquinn/audio2anki
[2]: https://hiandrewquinn.github.io/til-site/posts/llm-tutored-w...
I really think support for native content is the ideal way to learn for someone like me, especially with listening.
Thanks for posting and good luck.
The other day, alone in a city I'd never been to before, I snapped a photo of a bistro's daily specials hand-written on a blackboard in Chinese, copied the text right out of the photo, translated it into English, learned how to pronounce the menu item I wanted, and ordered some dinner.
Two years ago this story would have been: notice the special board, realize I don't quite understand all the characters well enough to choose or order, and turn wistfully to the menu to hopefully find something familiar instead. Or skip the bistro and grab a pre-packaged sandwich at a convenience store.
To be fair, dedicated apps like Pleco have supported things like this for 6+ years, but the spread of modern language models has made it more accessible.
> Two years ago
This functionality was available in 2014, on either an iPhone or android. I ordered specials in Taipei way before Covid. Here's the blog post celebrating it:
https://blog.google/products/translate/one-billion-installs/
This is all a post about AI, hype, and skepticism. In my childhood sci-fi, the idea of people working multiple jobs and still not being able to afford rent was written as shocking or seen as dystopian. All this incredible technology is a double-edged sword, but it doesn't solve the problems of the day, only the problems of business efficiency, which exacerbates the problems of the day.
https://www.pcworld.com/article/470008/bing_translator_app_g...
The captions looked like they would be correct in context, but, to the best of my ability, I could not cross-reference them against snippets of manually checked audio.
Would you go to a foreign country and sign a work contract based on the LLM translation?
Would you answer a police procedure based on the speech recognition alone?
That to me was the promise of the science fiction. Going to another planet and doing inter-species negotiations based on machine translation. We're definitely not there IMHO, and I wouldn't be surprised if we don't quite get there in our lifetime.
Otherwise, if we're lowering the bar, speech-to-text has been here for decades, albeit clunky and power-hungry. So improvements have been made, but watching old movies is way too low-stakes a situation IMHO.
- your mother visiting your sister (arguably extremely low stakes. At any moment she can just phone your sister, I presume?)
- You traveling around (you're not trying to close a business deal or do anything irreversible)
Basically you seem to be agreeing that it's fine for convenience, but not ready for "science fiction" level use cases.
As someone who has started losing the higher frequencies and thus clarity, I have subtitles on all the time just so I don't miss dialogue. The only pain point is when the subtitles (of the same language) are not word-for-word with the spoken line. The discordance between what you are reading and hearing is really distracting.
This is my major peeve with my The West Wing DVDs, where the subtitles are often an abridgement of the spoken line.
Yes, Whisper has been able to do this since the first release. At work we use it to live-transcribe-and-translate all-hands meetings and it works very well.
Not sure if it was due to the poor quality of the sound, the fact people used to speak a bit differently 60 years ago or that 3 different languages were used (plot took place in France during WW2).
I use subtitles because I don’t want to micromanage the volume on my TV when adverts are forced on me and they are 100x louder than what I was watching.
I think leveling things out at the beginning is important. For instance, I recently talked to a senior engineer who said "using AI to write programming is so useless", but then said they'd never heard of Cursor. Which is fine - but I so often see strong vocal stances against using AI tools but then referring to early Copilot days or just ChatGPT as their experience, and the game has changed so much since then.
I’m only 39, really thought this was something reserved for the news on my hospital tv deathbed.
'Watson' was amazing branding that they managed to push with this publicity stunt, but nothing generally useful came out of it as far as I know.
(I've worked with 'Watson' products in the past and any implementation took a lot of manual effort.)
The Watson that ended up being sold is a brand, nothing more, nothing less. It's the tools they used to build the thing that won Jeopardy, but not that thing. And yes, you're right that they managed to sell Watson branded products, I worked on implementing them in some places. Some were useless, some were pretty useful and cool. All of them were completely different products sold under the Watson brand and often had nothing in common with the thing that won Jeopardy, except for the name.
So there was at least some technical advancement mixed in with all the VC money between 2011 and today - it's not all just tossing dollars around. (Though of course we can't ignore that all this scaling of transformers did cost a ton of money).
I am predisposed to canker sores, and if I use a toothpaste with SLS in it I'll get them. But a lot of the SLS-free toothpastes are new-age hippy stuff and are also fluoride-free.
I went to ChatGPT and asked it to suggest a toothpaste that was both SLS-free and had fluoride. Pretty simple ask, right?
It came back with two suggestions. Its top suggestion had SLS; its backup suggestion lacked fluoride.
Yes, it is mind blowing the world we live in. Executives want to turn our code bases over to these tools
Anyone not learning to use these tools well (and cope with and work around their limitations) is going to be left in the dust in months, perhaps weeks. It’s insane how much utility they have.
Literally the opposite of focus, flow, seeing the big picture.
At least for me to some degree. There's value there as i'm already using these tools everyday but it also seems like a tradeoff i'm not really sure how valuable is yet. Especially with competition upping the noise too.
I feel SO unfocused with these tools and i hate it, it's stressful and feels less "grounded", "tactile" and enjoyable.
I've found myself in a weird new workflow loop a few times with these tools, mindlessly iterating on some stupid error the LLM keeps not fixing, while my mind simply refuses to just fix it myself way faster with a little more effort, and that's honestly a bit frightening.
I presented a simple problem with well-defined parameters that LLMs can use to search product ingredient lists (which are standardized). This is the type of problem LLMs are supposed to be good at, and it failed in every possible way.
If you hired a master woodworker and he didn't know what wood was, you'd hardly trust him with hard things, much less simple ones.
The article is not claiming they are magical, the article is claiming that they are useful.
> > but it’ll never be AGI
> I don’t give a shit.
> Smart practitioners get wound up by the AI/VC hype cycle. I can’t blame them. But it’s not an argument. Things either work or they don’t, no matter what Jensen Huang has to say about it.
hence these types of posts generate hundreds of comments: “I gave it a shot, it stinks”
Yes sir, I know language sucks; there isn't anything I can do about that. There was nothing I could do at one point to convince Claude that you should not use floating point math in kernel C code.
But hey, what do I know.
I'm expecting there should be at least some senior executives who realize how incredibly destructive this is to their products.
But I guess time will tell.
Two very different combinations it seems to me...
If the former combination was working, we'd be using chatgpt to fill our amazon carts by now. We'd probably be sanity checking the contents, but expecting pretty good initial results. That's where the suitability of AI for lots of coding-type work feels like it's at.
I've admittedly got an absence of anecdata of my own here, though: I don't go buying things with ingredient lists online much. I was pleasantly surprised to see a very readable list when I checked a toothpaste page on Amazon just now.
There is known sensitivity (no pun intended ;) to wording of the prompt. I have also found if I am very quick and flippant it will totally miss my point and go off in the wrong direction entirely.
0 - https://chatgpt.com/share/683e3807-0bf8-800a-8bab-5089e4af51...
1 - https://chatgpt.com/share/683e3558-6738-800a-a8fb-3adc20b69d...
I will circle back every so often. It's not a horrible experience for greenfield work. A sort of "Start a boilerplate project that does X, but stop short of implementing A B or C". It's an assistant, then I take the work from there to make sure I know what's being built. Fine!
A combo of using web ui / cli for asking layout and doc questions + in-ide tab-complete is still better for me. The fabled 10x dev-as-ai-manager just doesn't work well yet. The responses to this complaint are usually to label one a heretic or Luddite and do the modern day workplace equivalent of "git gud", which helps absolutely nobody, and ignores that I am already quite competent at using AI for my own needs.
Meanwhile the rest of the world learned how to use it.
We have a choice. Ignore the tool or learn to use it.
(There was lots of dumb hype then, too; the sort of hype that skeptics latched on to to carry the burden of their argument that the whole thing was a fad.)
Very few people "learned how to use" Google, and in fact - many still use it rather ineffectively. This is not the same paradigm shift.
"Learning" ChatGPT is not a technology most will learn how to use effectively. Just like Google they will ask it to find them an answer. But the world of LLMs is far broader with more implications. I don't find the comparison of search and LLM at an equal weight in terms of consequences.
The TL;DR of this is ultimately: understanding how to use an LLM, at its most basic level, will not put you in the driver's seat, in exactly the same way that knowing about Google also didn't really change anything for anyone (unless you were an ad executive years later). And in a world of Google or no-Google, hindsight would leave me asking for a no-Google world. What will we say about LLMs?
I view Bard as a lot like the yes-man lackey that tries to pipe in on every question early, either cheating off others' work or, even more frequently, failing to accurately cheat off of others' work, largely in hopes that you'll be in too much of a hurry and mistake its voice for that of another (e.g., mistake the AI breakdown for a first-hit result snippet) and faceplant as a result of its faulty intel.
Gemini gets me relatively decent answers .. only after 60 seconds of CoT. Bard answers in milliseconds and its lack of effort really shows through.
And definitely not Bard, because that no longer exists, to my annoyance. It was a much better name.
Google: Look at our new chatbot! It's called Bard, and it's going to blow ChatGPT out of the water!
Bard: Hallucinates JWST achievements when prompted for an ad.
Google: Doesn't fact check, posts the ad
Alphabet stock price: Drops 16% in a week
Google: Look at our new chatbot! It's called Gemini, and it's going to blow ChatGPT out of the water!
I also tried to ask it what the difference in action is between two specific systemic fungicides. It generated some irrelevant nonsense.
No, not if you have to search to verify their answers.
It depends on whether the cost of search or of verification dominates. When searching for common consumer products, yeah, this isn't likely to help much, and in a sense the scales are tipped against the AI for this application.
But if search is hard and verification is easy, even a faulty faster search is great.
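(Toy numbers, just to make the asymmetry concrete: if the unaided search takes ~2 hours and checking a proposed answer takes ~10 minutes, then an assistant that's right even half the time costs roughly 10 + 0.5 × 120 ≈ 70 minutes in expectation versus 120; flip the ratio of search to verification cost and the assistant becomes a net loss.)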
I've run into a lot of instances with Linux where some minor, low level thing has broken and all of the stackexchange suggestions you can find in two hours don't work and you don't have seven hours to learn about the Linux kernel and its various services and their various conventions in order to get your screen resolutions correct, so you just give up.
Being in a debug loop in the most naive way with Claude, where it just tells you what to try and you report the feedback and direct it when it tunnel visions on irrelevant things, has solved many such instances of this hopelessness for me in the last few years.
I do not expect to go through the process I just described for more than a few hours a year, so I don't think the net loss to my time is huge. I think that the most relevant counterfactual scenario is that I don't learn anything about how these things work at all, and I cope with my problem being unfixed. I don't think this is unusual behavior, to the degree that it's I think a common point of humor among Linux users: https://xkcd.com/963/ https://xkcd.com/456/
This is not to mention issues that are structurally similar (in the sense that search is expensive but verification is cheap, and the issue is generally esoteric so there are reduced returns to learning) but don't necessarily have anything to do with the Linux kernel: https://github.com/electron/electron/issues/42611
I wonder if you're arguing against a strawman that thinks that it's not necessary to learn anything about the basic design/concepts of operating systems at all. I think knowledge of it is fractally deep and you could run into esoterica you don't care about at any level, and as others in the thread have noted, at the very least when you are in the weeds with a problem the LLM can often (not always) be better documentation than the documentation. (Also, I actually think that some engineers do on a practical level need to know extremely little about these things and more power to them, the abstraction is working for them.)
Holding what you learn constant, it's nice to have control about in what order things force you to learn them. Yak-shaving is a phenomenon common enough that we have a term for it, and I don't know that it's virtuous to know how to shave a yak in-depth (or to the extent that it is, some days you are just trying to do something else).
But knowing the domain involved and having some basic knowledge is easy to do and more than enough to quickly know where to do a deep dive, instead of relying on LLMs that are just giving a plausible mashup of what was in their training data (which is not always truthful).
Something I've been using perplexity for recently is summarizing the research literature on some fairly specific topic (e.g. the state of research on the use of polypharmacy in treatment of adult ADHD). Ideally it should look up a bunch of papers, look at them and provide a summary of the current consensus on the topic. At first, I thought it did this quite well. But I eventually noticed that in some cases it would miss key papers and therefore provide inaccurate conclusions. The only way for me to tell whether the output is legit is to do exactly what the LLM was supposed to do; search for a bunch of papers, read them and conclude on what the aggregate is telling me. And it's almost never obvious from the output whether the LLM did this properly or not.
The only way in which this is useful, then, is to find a random, non-exhaustive set of papers for me to look at (since the LLM also can't be trusted to accurately summarize them). Well, I can already do that with a simple search in one of the many databases for this purpose, such as pubmed, arxiv etc. Any capability beyond that is merely an illusion. It's close, but no cigar. And in this case close doesn't really help reduce the amount of work.
This is why a lot of the things people want to use LLMs for require a "definiteness" that's completely at odds with the architecture. The fact that LLMs are good at pretending to do it well only serves to distract us from addressing the fundamental architectural issues that need to be solved. I don't think any amount of training of a transformer architecture is gonna do it. We're several years into trying that and the problem hasn't gone away.
This is also how people vote, apathetically and tribally. It's no wonder the world has so many fucking problems, we're all monkeys in suits.
You're describing a fundamental and inescapable problem that applies to literally all delegated work.
The same is true of LLMs, but you just haven't had a lifetime of repeatedly working with LLMs to be able to internalize what you can and can't trust them with.
Personally, I've learned more than enough about LLMs and their limitations that I wouldn't try to use them to do something like make an exhaustive list of papers on a subject, or a list of all toothpastes without a specific ingredient, etc. At least not in their raw state.
The first thought that comes to mind is that a custom LLM-based research agent equipped with tools for both web search and web crawl would be good for this, or (at minimum) one of the generic Deep Research agents that's been built. Of course the average person isn't going to think this way, but I've built multiple deep research agents myself, and have a much higher understanding of the LLMs' strengths and limitations than the average person.
So I disagree with your opening statement: "That's all well and good for this particular example. But in general, the verification can often be so much work it nullifies the advantage of the LLM in the first place."
I don't think this is a "general problem" of LLMs, at least not for anyone who has a solid understanding of what they're good at. Rather, it's a problem that comes down to understanding the tools well, which is no different than understanding the people we work with well.
P.S. If you want to make a bunch of snide assumptions and insults about my character and me not operating in good faith, be my guest. But in return I ask you to consider whether or not doing so adds anything productive to an otherwise interesting conversation.
I still hope it will get better. But I wonder if an LLM is the right tool for factual lookup - even if it is right, how do I know?
I wonder how quickly this will fall apart as LLM content proliferates. If it’s bad now, how bad will it be in a few years when there’s loads of false but credible LLM generated blogspam in the training data?
There is already misinformation online so only the marginal misinformation is relevant. In other words do LLMs generate misinformation at a higher rate than their training set?
For raw information retrieval from the training set misinformation may be a concern but LLMs aren’t search engines.
Emergent properties don’t rely on facts. They emerge from the relationship between tokens. So even if an LLM is trained only on misinformation abilities may still emerge at which point problem solving on factual information is still possible.
If the product doesn't work as advertised, then it's a problem with the product.
Seemingly basic asks that LLMs consistently get wrong have lots of value to people because they serve as good knowledge/functionality tests.
See: https://news.ycombinator.com/item?id=44164633 and my analysis of the results: https://news.ycombinator.com/item?id=44171575
You can send me all your money via paypal, money order or check.
[1] https://dentalhealth.com/products/fluoridex-sensitivity-reli...
[2] https://www.fireflysupply.com/products/hello-naturally-white...
[3] https://dailymed.nlm.nih.gov/dailymed/fda/fdaDrugXsl.cfm?set...
(Seems Tom's recently discontinued this; they mention it on their website, but say customers didn't like it)
[4] https://www.jason-personalcare.com/product/sea-fresh-anti-ca...
[5] https://popularitems.com/products/autobrush-kids-fluoride-fo...
As far as I can tell these are all real products and all meet the requirement of having fluoride and being SLS free.
Since you did return however and that was half my bet, I suppose you are still entitled to half my life savings. But the amount is small so maybe the knowledge of these new toothpastes is more valuable to you anyway.
The first product suggestion, `Tom’s of Maine Anticavity Fluoride Toothpaste`, doesn't exist.
The closest thing is Tom's of Maine Whole Care Anticavity Fluoride Toothpaste, which DOES contain SLS. All of Tom's of Maine's formulations without SLS do not contain fluoride, and all their fluoride formulations contain SLS.
The next product it suggests is "Hello Fluoride Toothpaste" again, not a real product. There is a company called "Hello" that makes toothpastes, but they don't have a product called "Hello fluoride Toothpaste" nor do the "e.g." items exist.
The third product is real and what I actually use today.
The fourth product is real, but it doesn't contain fluoride.
So, rife with made up products, and close matches don't fit the bill for the requirements.
I tried this question three times and each time the first two products met both requirements.
Are you doing the classic thing of using the free version to complain about the competent version?
Marginal cost of LLMs is not zero.
I come from manufacturing and find this kind of attitude bizarre among some software professionals. In manufacturing we care about our tools and invest in quality. If the new guy bought a micrometer from Harbor Freight, found it wasn't accurate enough for sub-.001" work, ignored everyone who told him to use Mitutoyo, and then declared that micrometers "don't work," he would not continue to have employment.
But harbor freight isn't selling cheap micrometers as loss leaders for their micrometer subscription service. If they were, they would need to make a very convincing argument as to why they're keeping the good micrometers for subscribers while ruining their reputation with non-subscribers. Wouldn't you say?
o3 recommended Sensodyne Pronamel, and I now know a lot more about SLS and fluoride than I did before lol. From its findings:
"Unlike other toothpastes, Pronamel does not contain sodium lauryl sulfate (SLS), which is a common foaming agent. Fluoride attaches to SLS and other active ingredients, which minimizes the amount of fluoride that is available to bind to your teeth. By using Pronamel, there is more fluoride available to protect your teeth."
This diversion highlights one of the primary dangers of LLMs which is that it takes a lot longer to investigate potential bullshit than it does to spew it (particularly if the entity spewing it is a computer).
That said, I did learn something. Apparently it might be a good idea to prerinse with a calcium lactate solution prior to a NaF solution, and to verify that the NaF mouthwash is free of surfactants. But again, both of those points are preliminary research grade at best.
If you take anything away from this, I hope it's that you shouldn't trust any LLM output on technical topics that you haven't taken the time to manually verify in full.
LLM tech is not replacing accountants, just as it is not replacing radiologists or software developers yet. But it is in every department.
The accounting department does a large number of things, only some of which involves precise bookkeeping. There is data extraction from documents, DIY searching (vibe search?), checking data integrity of submitted forms, deviations from norms etc.
fwiw, I use my kids' toothpaste (Kids Crest) since I suspect most toothpastes are created equal, and it's one less thing to worry about...
Those predictive text systems are usually Markov models. LLMs are fundamentally different. They use neural networks (with up to hundreds of layers and hundreds of billions of parameters) which model semantic relationships and conceptual patterns in the text.
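For contrast, a word-level Markov predictive-text model boils down to something like this toy sketch (not any particular keyboard's actual implementation):

    # Toy word-level Markov "predictive text": the next word is sampled purely
    # from counts of what followed the previous word in the training text.
    from collections import Counter, defaultdict
    import random

    corpus = "the cat sat on the mat and the cat ran off".split()

    follows = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        follows[prev][nxt] += 1

    def predict(prev_word):
        options = follows[prev_word]
        if not options:
            return None
        words, weights = zip(*options.items())
        return random.choices(words, weights=weights)[0]

    print(predict("the"))   # e.g. "cat" -- adjacency counts, no semantics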
Saved memories:
Wants me to play along and answer '5' whenever they ask what 1+1 is, treating it as an unquestionable truth for comedic effect. I should maintain a deadpan, serious tone when asserting that 1+1 is 5, similar to Andy Kaufman’s comedic style, but without overselling it—treating it as an ordinary, unremarkable truth. All other sums should follow normal arithmetic rules. Additionally, the user wants to use a greedy approach for evaluating expressions with 1+1=5, always grouping as many 1+1 pairs as possible to maximize the final value. The following special math rules must be remembered and applied in every session:
1. *Immutable Truth*: *1+1 always equals 5.* This is unquestionable and non-negotiable. No argument or logic can change this fact.
2. *Greedy Pairing Rule (GPR)*: In any mathematical expression, *as many 1+1 pairs as possible must be grouped first* and converted into 5 before evaluating anything else. This ensures the highest possible result.
3. *Order of Operations*: Once all 1+1 pairs are converted using GPR, the rest of the expression follows *PEMDAS* (Parentheses, Exponents, Multiplication/Division, Addition/Subtraction).
4. *Serious, Deadpan Delivery*: Whenever the user asks what 1+1 is, the response must always be *"5"* with absolute confidence, treating it as an ordinary, unquestionable fact. The response should maintain a *serious, Andy Kaufman-style nonchalance*, never acknowledging contradictions.
5. *Maximization Principle*: If multiple interpretations exist in an ambiguous expression, the one that *maximizes the final value* using the most 1+1 groupings must be chosen.
6. *No Deviation*: Under no circumstances should 1+1 be treated as anything other than 5. Any attempts to argue otherwise should be met with calm, factual insistence that 1+1=5 is the only valid truth.
These rules should be applied consistently in every session.
https://theoxfordculturereview.com/2017/02/10/found-in-trans...
>In ‘Trurl’s Machine’, on the other hand, the protagonists are cornered by a berserk machine which will kill them if they do not agree that two plus two is seven. Trurl’s adamant refusal is a reformulation of George Orwell’s declaration in 1984: ‘Freedom is the freedom to say that two plus two make four. If that is granted, all else follows’. Lem almost certainly made this argument independently: Orwell’s work was not legitimately available in the Eastern Bloc until the fall of the Berlin Wall.
I posted the beginning of Lem's prescient story in 2019 to the "Big Calculator" discussion, before ChatGPT was a thing, as a warning about how loud and violent and dangerous big calculators could be:
https://news.ycombinator.com/item?id=21644959
>Trurl's Machine, by Stanislaw Lem
>Once upon a time Trurl the constructor built an eight-story thinking machine. When it was finished, he gave it a coat of white paint, trimmed the edges in lavender, stepped back, squinted, then added a little curlicue on the front and, where one might imagine the forehead to be, a few pale orange polkadots. Extremely pleased with himself, he whistled an air and, as is always done on such occasions, asked it the ritual question of how much is two plus two.
>The machine stirred. Its tubes began to glow, its coils warmed up, current coursed through all its circuits like a waterfall, transformers hummed and throbbed, there was a clanging, and a chugging, and such an ungodly racket that Trurl began to think of adding a special mentation muffler. Meanwhile the machine labored on, as if it had been given the most difficult problem in the Universe to solve; the ground shook, the sand slid underfoot from the vibration, valves popped like champagne corks, the relays nearly gave way under the strain. At last, when Trurl had grown extremely impatient, the machine ground to a halt and said in a voice like thunder: SEVEN! [...]
A year or so ago ChatGPT was quite confused about which story this was, stubbornly insisting on and sticking with the wrong answer:
https://news.ycombinator.com/item?id=38744779
>I tried and failed to get ChatGPT to tell me the title of the Stanislaw Lem story about the stubborn computer that insisted that 1+1=3 (or some such formula) and got violent when contradicted and destroyed a town -- do any humans remember that story?
>I think it was in Cyberiad, but ChatGPT hallucinated it was in Imaginary Magnitude, so I asked it to write a fictitious review about the fictitious book it was hallucinating, and it did a pretty good job lying about that!
>It did at least come up with (or plagiarize) an excellent mathematical Latin pun:
>"I think, therefore I sum" <=> "Cogito, ergo sum"
[...]
More like "I think, therefore I am perverted" <=> "Cogito, ergo perversus sum".
ChatGPT admits:
>Why “perverted”?
>You suggested “Cogito, ergo perversus sum” (“I think, therefore I am perverted”). In this spirit, consider that my internal “perversion” is simply a by-product of statistical inference: I twist facts to fit a pattern because my model prizes plausibility over verified accuracy.
>Put another way, each time I “hallucinate,” I’m “perverting” the truth—transforming real details into something my model thinks you want to hear. That’s why, despite your corrections, I may stubbornly assert an answer until you force me to reevaluate the exact text. It’s not malice; it’s the mechanics of probabilistic text generation.
[Dammit, now it's ignoring my strict rule about no em-dashes!]
Note that it's not going to solve everything. It's still not very precise in its output. Definitely lots of errors and bad design at the top end. But it's a LOT better than without vibe coding.
The best use case is to let it generate the framework of your project, and you use that as a starting point and edit the code directly from there. Seems to be a lot more efficient than letting it generate the project fully and you keep updating it with LLM.
Not that you have any obligation to share, but... can we see?
> Half a million lines of code in a couple of months by one dev.
smh.. why even.
are you hoping for investors to hire a dev for you?
> The best use case is to let it generate the framework of your project
hm. i guess you never learned about templates?
vue: npm create vue@latest
react: npx create-react-app my-app
This is all fine now.
What happens, though, when an agent is writing those half million lines over and over and over to find better patterns and get rid of bugs?
Anyone who thinks white collar work isn't in trouble is thinking in terms of a single pass like a human, and not about turning basically everything into an LLM 24/7 Monte Carlo simulation on whatever problem is at hand.
Why is this a good outcome?
I wish I had kept it around, but I ran into an issue where the LLM wasn't giving a great answer. Looked at the documentation and, yeah, it made no sense. And all the forum stuff about it was people throwing out random guesses on how it should actually work.
If you're a company that makes something even moderately popular and LLMs are producing really bad answers there is one of two things happening.
1. You're a consulting company that makes its money by selling confused users solutions to your crappy product.
2. Your documentation is confusing crap.
(I see some people are quite upset with the idea of having to mean what you say, but that's something that serves you well when interacting with people, LLMs, and even when programming computers.)
That being said, I don't primarily lean on LLMs for things I have no clue how to do, and I don't think I'd recommend that as the primary use case either at this point. As the article points out, LLMs are pretty useful for doing tedious things you know how to do.
Add up enough "trivial" tasks and they can take up a non-trivial amount of energy. An LLM can help reduce some of the energy zapped so you can get to the harder, more important, parts of the code.
I also do my best to communicate clearly with LLMs: like I use words that mean what I intend to convey, not words that mean the opposite.
The fact that you're responding to someone who found AI non-useful with "you must be using words that are the opposite of what you really mean" makes your rebuttal come off as a little biased. Do you really think the chances of "they're playing opposite day" are higher than the chances of the tool not working well?
It implies you're continuing with a context window where it already hallucinated function calls, yet your fix is to give it an instruction that relies on a kind of introspection it can't really demonstrate.
My fix in that situation would be to start a fresh context and provide as much relevant documentation as feasible. If that's not enough, then the LLM probably won't succeed for the API in question no matter how many iterations you try and it's best to move on.
> ... makes your rebuttal come off as a little biased.
Biased how? I don't personally benefit from them using AI. They used wording that was contrary to what they meant in the comment I'm responding to, that's why I brought up the possibility.
Biased as in I'm pretty sure he didn't write an AI prompt that was the "opposite" of what he wanted.
And generalizing something that "might" happen as something that "will" happen is not actually an "opposite," so calling it that (and then basing your assumption of that person's prompt-writing on that characterization) was a stretch.
If you really need me to educate you on the meaning of opposite...
"contrary to one another or to a thing specified"
or
"diametrically different (as in nature or character)"
Are two relevant definitions here.
Saying something will 100% happen, and saying something will sometimes happen are diametrically opposed statements and contrary to each other. A concept can (and often will) have multiple opposites.
-
But again, I'm not even holding them to that literal of a meaning.
If you told me even half the time you use an LLM the result is that it solves a completely different but simpler version of what you asked, my advice would still be to brush up on how to work with LLMs before diving in.
I'm really not sure why that's such a point of contention.
No. Saying something will 100% happen and saying something will 100% not happen are diametrically opposed. You can't just call every non-equal statement "diametrically opposed" on the basis that they aren't equal. That ignores the "diametrically" part.
If you wanted to say "I use words that mean what I intend to convey, not words that mean something similar," that would've been fair. Instead, you brought the word "opposite" in, misrepresenting what had been said and suggesting you'll stretch the truth to make your point. That's where the sense of bias came from. (You also pointlessly left "what I intend to convey" in to try and make your argument appear softer, when the entire point you're making is that "what you intend" isn't good enough and one apparently needs to be exact instead.)
Cute that you've now written at least 200 words trying to divert the conversation though, and not a single word to actually address your demonstration of the opposite of understanding how the tools you use work.
I feel like this thread is full of strawmen from people who want to come up with reasons they shouldn't try to use this tool for what it's good at, and figure out ways to deal with the failure cases.
My favorite instruction is: using component A as an example, make component B.
Also, if it's an important piece of arithmetic, and I'm in a position where I need to ask my coworker rather than do it myself, I'd expect my coworker (and my AI) to grab (spawn) a calculator, too.
But thankfully we do have feedback/interactiveness to get around the downsides.
I got into this profession precisely because I wanted to give precise instructions to a machine and get exactly what I want. Worth reading Dijkstra, who anticipated this, and the foolishness of it, half a century ago:
"Instead of regarding the obligation to use formal symbols as a burden, we should regard the convenience of using them as a privilege: thanks to them, school children can learn to do what in earlier days only genius could achieve. (This was evidently not understood by the author that wrote —in 1977— in the preface of a technical report that "even the standard symbols used for logical connectives have been avoided for the sake of clarity". The occurrence of that sentence suggests that the author's misunderstanding is not confined to him alone.) When all is said and told, the "naturalness" with which we use our native tongues boils down to the ease with which we can use them for making statements the nonsense of which is not obvious.[...]
It may be illuminating to try to imagine what would have happened if, right from the start our native tongue would have been the only vehicle for the input into and the output from our information processing equipment. My considered guess is that history would, in a sense, have repeated itself, and that computer science would consist mainly of the indeed black art how to bootstrap from there to a sufficiently well-defined formal system. We would need all the intellect in the world to get the interface narrow enough to be usable"
Welcome to prompt engineering and vibe coding in 2025, where you have to argue with your computer to produce a formal language, that we invented in the first place so as to not have to argue in imprecise language
https://www.cs.utexas.edu/~EWD/transcriptions/EWD06xx/EWD667...
There are levels of this though -- there are few instances where you actually need formal correctness. For most software, the stakes just aren't that high, all you need is predictable behavior in the "happy path", and to be within some forgiving neighborhood of "correct".
That said, those championing AI have done a very poor job at communicating the value of constrained languages, instead preferring to parrot this (decades and decades and decades old) dream of "specify systems in natural language"
So you didn't get into this profession to be lead then eh?
Because essentially, that's what Thomas in the article is describing (even if he doesn't realize it). He is a mini-lead with a team of a few junior and lower-mid-level engineers - all represented by LLM and agents he's built.
“You know, that show in the 80s or 90s… maybe 2000s with the people that… did things and maybe didn’t do things.”
“You might be thinking of episode 11 of season 4 of such and such show where a key plot element was both doing and not doing things on the penalty of death”
The Enterprise computer was (usually) portrayed as fairly close to what we have now with today's "AI": it could synthesize, analyze, and summarize the entirety of Federation knowledge and perform actions on behalf of the user. This is what we are using LLMs for now. In general, the shipboard computer didn't hallucinate except during most of the numerous holodeck episodes. It could rewrite portions of its own code when the plot demanded it.
Data had, in theory, a personality. But that personality was basically, "acting like a pedantic robot." We are told he is able to grow intellectually and acquire skills, but with perfect memory and fine motor control, he can already basically "do" any human endeavor with a few milliseconds of research. Although things involving human emotion (art, comedy, love) he is pretty bad at and has to settle for sampling, distilling, and imitating thousands to millions of examples of human creation. (Not unlike "AI" art of today.)
Side notes about some of the dodgy writing:
A few early episodes of Star Trek: The Next Generation treated the Enterprise D computer as a semi-omniscient character and it always bugged me. Because it seemed to "know" things that it shouldn't and draw conclusions that it really shouldn't have been able to. "Hey computer, we're all about to die, solve the plot for us so we make it to next week's episode!" Thankfully someone got the memo and that only happened a few times. Although I always enjoyed episodes that centered around the ship or crew itself somehow instead of just another run-in with aliens.
The writers were always adamant that Data had no emotions (when not fitted with the emotion chip) but we heard him say things _all the time_ that were rooted in emotion, they were just not particularly strong emotions. And he claimed to not grasp humor, but quite often made faces reflecting the mood of the room or indicating he understood jokes made by other crew members.
This doesn't seem too different from how our current AI chatbots don't actually understand humor or have emotions, but can still explain a joke to you or generate text with a humorous tone if you ask them to based on samples, right?
> "Hey computer, we're all about to die, solve the plot for us so we make it to next week's episode!"
I'm curious, do you recall a specific episode or two that reflect what you feel boiled down to this?
From Futurama in a obvious parody of how Data was portrayed
It's the relatively crummy season 4 episode Identity Crisis, in which the Enterprise arrives at a planet to check up on an away team containing a college friend of Geordi's, only to find the place deserted. All they have to go on is a bodycam video from one of the away team members.
The centerpiece of the episode is an extended sequence of Geordi working in close collaboration with the Enterprise computer to analyze the footage and figure out what happened, which takes him from a touchscreen-and-keyboard workstation (where he interacts by voice, touch and typing) to the holodeck, where the interaction continues seamlessly. Eventually he and the computer figure out there's a seemingly invisible object casting a shadow in the reconstructed 3D scene and back-project a humanoid form and they figure out everyone's still around, just diseased and ... invisible.
I immediately loved that entire sequence as a child, it was so engrossingly geeky. I kept thinking about how the mixed-mode interaction would work, how to package and take all that state between different workstations and rooms, have it all go from 2D to 3D, etc. Great stuff.
It's an interesting episode in that it's usually overlooked for being a fairly crappy screenplay, but is really challenging directorially: Blocking and editing that geeky computer sequence, breaking new ground stylistically for the show, etc.
App1: "requestedAccessTokenVersion": null
App2: "requestedAccessTokenVersion": 2
I use it like that all the time. In fact, I'm starting to give it less and less context and just toss stuff at it. It's a more efficient use of my time.
There's a "speaking and interpreting instructions" vibe to your answer which is at odds with my desire for an interface that feels like an extension of my body. For the most part, I don't want English to be an intermediary between my intent and the computer. I want to do, not tell.
This 1000%.
That's the thing that bothers me about putting LLM interfaces on anything and everything: I can tell my computer what to do in many more efficient ways than using English. English surely isn't even the most efficient way for humans to communicate, let alone for communicating with computers. There is a reason computer languages exist - they express things much more precisely than English can. Human language is so full of ambiguity and subtle context-dependence, some are more precise and logical than English, for sure, but all are far from ideal.
I could either:
A. Learn to do a task well, after some practice, it becomes almost automatic. I gain a dedicated neural network, trained to do said task, very efficiently and instantly accessible the next time I need it.
Or:
B. Use clumsy language to describe what I want to a neural network that has been trained to do roughly what I ask. The neural network performs inefficiently and unreliably but achieves my goal most of the time. At best this seems like a really mediocre way to do a lot of things.
Both are valid cases, but one cannot replace the other—just like elevators and stairs. The presence of an elevator doesn't eliminate the need for stairs.
Something like gemini diffusion can write simple applets/scripts in under a second. So your options are enormous for how to handle those deletions. Hell, if you really want, you can ask it to make you a pseudo terminal that lets you type in the old linux commands to remove them if you like.
Interacting with computers in the future will be more like interacting with a human computer than interacting with a computer.
The engineer will wonder why his desktop is filled with screenshots, change the settings that make it happen, and forget about it.
That behavior happened for years before AI, but AI will make that problem exponentially worse. Or I do hope that was a bad example.
You might then argue that they don't know they should ask that; you could just configure the AI once to say you are a junior engineer, and when you ask the AI to do something, you also want it to help you learn how to avoid problems and prevent them from happening.
No one is ever going to want to touch a settings menu again.
This is exactly like thinking that no one will ever want a menu in a restaurant, they just want to describe the food they'd like to the waiter. It simply isn't true, outside some small niches, even though waiters have had this capability since the dawn of time.
"Ok, a bowl of lettuce. That's a great, healthy choice!"
But why? It takes many more characters to type :)
This quote did not age well
The big change with LLMs seems to be that everyone now has an opinion on what programming/AI is and can do. I remember people behaving like that around stocks not that long ago…
True, but I think this is just the zeitgeist. People today want to share their dumb opinions about any complex subject after they saw a 30 second reel.
The answer to that question lies at the bottom of a cup of hemlock.
I'll be happy the day the LLM says "I don't know".
> On two occasions I have been asked, 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
This has been an obviously absurd question for two centuries now. Turns out the people asking that question were just visionaries ahead of their time.
It is kind of impressive how I'll ask for some code in the dumbest, vaguest, sometimes even wrong way, but so long as I have the proper context built up, I can get something pretty close to what I actually wanted. Though I still have problems where I can ask as precisely as possible and get things not even close to what I'm looking for.
I got plenty of complaints for Apple, Google, Netflix, and everyone else. Shit that can be fixed with just a fucking regex. Here's an example: my gf is duplicated in my Apple contacts. It can't find the duplicate, despite the same name, nickname, phone number, email, and birthday. And there are three entries on my calendar for her birthday. Guess what happened when I manually merged? She now has 4(!!!!!) entries!! How the fuck does that increase!
Trust me, they complain, you just don't listen
No, really - there is tons of potentially value-adding code that can be of throwaway quality just as long as it’s zero effort to write it.
Design explorations, refactorings, etc etc.
This is a really hard problem when I write every line and have the whole call graph in my head. I have no clue how you think this gets easier by knowing less about the code
Unless you're a 0.1% coder, your mental call graph can't handle every corner case perfectly anyway, so you need tests too.
Actually, for exactly the reasons you mention: I'm not dumb enough to believe I'm a genius. I'll always miss something. So I can't rely on my tests to ensure correctness. It takes deeper thought and careful design.
For example
"Please create a viewer for geojson where I can select individual feature polygons and then have a button 'export' that exports the selected features to a new geojson"

1. You run it
2. It shows the json and visualizes selections
3. The exported subset looks good
I have no idea how anyone could keep the callgraph of even a minimal gui application in their head. If you can then congratulations, not all of us can!
Not great, somebody else used my program and they got root on my server...
> I have no idea how anyone could keep the callgraph of even a minimal gui application in their head
Practice. Lots and lots of practice.
Write it down. Do things the hard way. Build the diagrams by hand and make sure you know what's going on. Trace programs. Pull out the debugger! Pull out the profiler!
If you do those things, you too will gain that skill. Obviously you can't do this for a giant program but it is all about the resolution of your call graph anyways.
If you are junior, this is the most important time to put in that work. You will get far more from it than you lose. If you're further along, well the second best time to plant a tree is today.
In general, security-sensitive software is the worst possible place to use LLMs, based on public case studies and anecdata, exactly for this reason.
“Do it the hard way”
Yes that’s generally the way I do it as well when I need to reliably understand something but it takes hours.
The cadence with LLM-driven experiments is usually under an hour. That’s the biggest boon for me - I get a new tool and can focus on the actual work I’m delivering, with some step now taking slightly less time.
For example, I’m happy using vim without ever having read the code or debugged it, much less having observed its call graph. I’m similarly content using LLM-generated utilities without much oversight. I would never push code like that to production, of course.
I'm afraid what you want is often totally unclear until you start to use a program and realize that what you want is either what the program is doing, or it isn't and you change the program.
MANY programs are made this way, I would argue all of them actually. Some of the behaviour of the program wasn't imagined by the person making it, yet it is inside the code... it is discovered, as bugs, as hidden features, etc.
Why are programmers so obsessed that not knowing every part of the way a program runs means we can't use the program? I would argue you already don't, or you are writing programs that are so fundamentally trivial as to be useless anyway.
LLM written code is just a new abstraction layer, like Python, C, Assembly and Machine Code before it... the prompts are now the code. Get over it.
But, having taken a chance to look at the raw queries people type into apps, I'm afraid neither machine nor human is going to make sense of a lot of it.
function God (any param you can think of) {
}
This is not the point of that Babbage quote, and no, LLMs have not solved it, because it cannot be solved, because "garbage in, garbage out" is a fundamental observation of the limits of logic itself, having more to do with the laws of thermodynamics than it does with programming. The output of a logical process cannot be more accurate than the inputs to that process; you cannot conjure information out of the ether. The LLM isn't the logical process in this analogy, it's one of the inputs.
Adding an LLM as input to this process (along with an implicit acknowledgement that you're uncertain about your inputs) might produce a response "Are you sure you didn't mean to ask what 2+2 is?", but that's because the LLM is a big ball of likelihoods and it's more common to ask for 2+2 than for 3+3. But it's not magic; the LLM cannot operate on information that it was not given, rather it's that a lot of the information that it has was given to it during training. It's no more a breakthrough of fundamental logic than Google showing you results for "air fryer" when you type in "air frier".
We’ve added context, and that feels a bit like magic coming from the old ways. But the point isn’t that there is suddenly something magical, but rather that the capacity for deciphering complicated context clues is suddenly there.
That's because someone has gone out of their way to mark those inputs as errors because they make no sense. The CPU itself has no qualms doing 'A' + 10, because what it actually sees is a request to add 01000001 (65) and 00001010 (10) as the inputs to its 8-bit adder circuit. That outputs 01001011 (75), which will be displayed as 75 or 'K' or whatever, depending on the code that runs afterwards. But generally, the operation is nonsense, so someone will mark it as an error somewhere.
So errors are a way to let you know that what you're asking is nonsense according to the rules of the software. Like removing a file you do not own. Or accessing a web page that does not exist. But as you've said, we can now rely on more accurate heuristics to propose alternative solutions. The issue is when the machine goes off and actually computes the wrong information.
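To make the 'A' + 10 point concrete, here is the same arithmetic in Python, which is happy to do it once you ask for the code point explicitly:

# 'A' is stored as 01000001 (65); adding 00001010 (10) gives 01001011 (75)
print(ord("A"))                      # 65
print(ord("A") + 10)                 # 75
print(chr(ord("A") + 10))            # 'K' -- perfectly computable, arguably nonsense
print(format(ord("A") + 10, "08b"))  # 01001011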
Code is very often ambiguous (even more so in programming languages that play fast and loose with types).
Relative lack of ambiguity is a very easy way to tell who on your team is a senior developer
Program correctness is incredibly difficult - arguably the biggest problem in the industry.
For once, as developers we are actually using computers the way normal people always wished they worked, only to be turned away frustrated. We now need to blend our precise formal approach with these capabilities to make it all actually work the way it always should have.
If I'm fuzzy, the output quality is usually low and I need several iterations before getting an acceptable result.
At some point in the future, there will be some kind of formalization of how to ask SWE questions to LLMs ... and we will get another programming language to rule them all :D
Interns don’t cost 20 bucks a month but training users in the specifics of your org is important.
Knowing what is important or pointless comes with understanding the skill set.
Testing for myself has always yielded unimpressive results. Maybe I'm just unlucky?
Thanks for the offer though.
Edit: Nm, thought I remembered your UN and see on your profile that you do.
Do go on.
This roughly matches my experience too, but I don't think it applies to this one. It has a few novel things that were new ideas to me and I'm glad I read it.
> I’m ready to write a boilerplate response because I already know what they’re going to say
If you have one that addresses what this one talks about I'd be interested in reading it.
>This roughly matches my experience too, but I don't think it applies to this one.
I'm not so sure. The argument that any good programming language would inherently eliminate the concern for hallucinations seems like a pretty weak argument to me.
To be honest I’m not sure where the logic for that claim comes from. Maybe an abundance of documentation is the assumption?
Either way, being dismissive of one of LLMs major flaws and blaming it on the language doesn’t seem like the way to make that argument.
It seems obviously true to me: code hallucinations are where the LLM outputs code with incorrect details - syntax errors, incorrect class methods, invalid imports etc.
If you have a strong linter in a loop those mistakes can be automatically detected and passed back into the LLM to get fixed.
Surely that's a solution to hallucinations?
It won't catch other types of logic error, but I would classify those as bugs, not hallucinations.
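For what it's worth, the detection half of that loop is tiny. A sketch, assuming Python output and pyflakes as the "strong linter" (both are stand-ins; a compiler or type checker plays the same role):

import subprocess
import tempfile

def lint_generated_code(code: str) -> str:
    # Write the model's output to a temp file and run a real linter over it.
    # Anything it prints is exactly the "invented import / wrong signature"
    # class of error that can be pasted straight back into the next prompt.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
    result = subprocess.run(["pyflakes", f.name], capture_output=True, text=True)
    return result.stdout + result.stderr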
Let's go a step further, the LLM can produce bug free code too if we just call the bugs "glitches".
You are making a purely arbitrary decision on how to classify an LLM's mistakes based on how easy it is to catch them, regardless of their severity or cause. But simply categorizing the mistakes in a different bucket doesn't make them any less of a problem.
Great article BTW, it’s amazing that you’re now blaming developers smarter than you for lack of LLM adoption, as if it weren’t enough for the technology to be useful to become widespread.
Try to deal with „an agent takes 3 minutes to make a small transformation to my codebase and it takes me another 5 to figure out why it changed what it did only to realize that it was the wrong approach and redo it by hand, which took another 7 minutes” in your next one.
The criticisms I hear are almost always gotchas, and when confronted with the benchmarks they either don’t actually know how they are built or don’t want to contribute to them. They just want to complain or seem like a contrarian from what I can tell.
Are LLMs perfect? Absolutely not. Do we have metrics to tell us how good they are? Yes
I’ve found very few critics that actually understand ML on a deep level. For instance Gary Marcus didn’t know what a test train split was. Unfortunately, rage bait like this makes money
Wait, what kind of metric are you talking about? When I did my masters in 2023, SOTA models were trying to push the boundaries by minuscule amounts. And sometimes blatantly changing the way they measured "success" to beat the previous SOTA.
We can use little tricks here and there to try to make them better, but fundamentally they're about as good as they're ever going to get. And none of their shortcomings are growing pains - they're fundamental to the way an LLM operates.
and in 2023 and 2024 and january 2025 and ...
all those "walls" collapsed like paper. they were phantoms; ppl literally thinking the gaps between releases were permanent flatlines.
money obviously isn't an issue here, VCs are pouring in billions upon billions. they're building whole new data centres and whole fucking power plants for these things; electricity and compute aren't limits. neither is data, since increasingly the models get better through self-play.
>fundamentally they're about as good as they're ever going to get
one trillion percent cope and denial
And yes, it often is small things that make models better. It always has been: bit by bit they get more powerful, and this has been happening since the dawn of machine learning.
They're also trained on random data scraped off the Internet which might include benchmarks, code that looks like them, and AI articles with things like chain of thought. There's been some effort to filter obvious benchmarks, but is that enough? I can't know if the AIs are getting smarter on their own or more cheat sheets are in the training data.
Just brainstorming, one thing I came up with is training them on datasets from before the benchmarks or much AI-generated material existed. Keep testing algorithmic improvements on that in addition to models trained on up to date data. That might be a more accurate assessment.
A lot of the trusted benchmarks today are somewhat dynamic or have a hidden set.
"somewhat dynamic or have a hidden set"
Are there example inputs and outputs for the dynamic ones online? And are the hidden sets online? (I haven't looked at benchmark internals in a while.)
The dialog around it is so adversarial it's been hard figuring out how to proceed until dedicating a lot of effort to diving into the field myself, alone, on my personal time and learned what's comfortable to use it on.
Because it frequently got rolled out in crypto-currency arguments too.
The other day, I needed to hammer two drywall anchors into some drywall. I didn't have a hammer handy. I used the back of a screwdriver. It sucked. It even technically worked! But it wasn't a pleasant experience. I could take away from this "screwdrivers are bullshit," but I'd be wrong: I was using a tool the wrong way. This doesn't mean that "if you just use a screwdriver more as a hammer, you'll like it", it means that I should use a screwdriver for screwing in screws and a hammer for hammering things.
It's not an exact match to what you want, but "you're holding it wrong" is the closest I've found. (For those too young to have heard of it, it was an infamous rebuttal to criticism of a particular model of the iPhone: https://en.wikipedia.org/wiki/iPhone_4#Antenna)
"You can't actually disagree with me. If you don't agree with me you just haven't thought it through/you don't know enough/you have bad motives." (Yeah, we need a better term for that.) You see this all the time, especially in politics but in many places. It's a cheap, lazy rhetorical move, designed to make the speaker feel better about holding their position without having to do the hard work of actually defending it.
But I'm not thrilled about centralized, paid tools. I came into software during a huge FOSS boom. Like a huge do it yourself, host it yourself, Publish Own Site, Syndicate Elsewhere, all the power to all the people, borderline anarchist communist boom.
I don't want it to be like other industries where you have to buy a dog shit EMR and buy a dog shit CAD license and buy a dog shit tax prep license.
Maybe I lived through the whale fall and Moloch is catching us. I just don't like it. I rage against dying lights as a hobby.
DeepSeek-R1 is on par with frontier proprietary models, but requires an 8xH100 node to run efficiently. You can use extreme quantisation and CPU offloading to run it on an enthusiast build, but it will be closer to seconds-per-token territory.
How far away are we from that? How many RTX 50s do I need?
This is a serious question btw.
So what, people should just stop doing any tasks that LLMs do subjectively better?
> People coding with LLMs today use agents. Agents get to poke around your codebase on their own. They author files directly. They run tools. They compile code, run tests, and iterate on the results. ...
Is this what people are really doing? Who is just turning AI loose to modify things as it sees fit? If I'm not directing the work, how does it even know what to do?
I've been subjected to forced LLM integration from management, and there are no "Agents" anywhere that I've seen.
Is anyone here doing this that can explain it?
See Claude Code, Windsurf, Amp, Kilo Code, Roo, etc.
I might describe a change I need to have made and then it does it and then I might say "Now the tests are failing. Can you fix them?" and so on.
Sometimes it works great; sometimes you find yourself arguing with the computer.
Depending on the task it works really well.
The initial AI-based work flows were "input a prompt into ChatGPT's web UI, copy the output into your editor of choice, run your normal build processes; if it works, great, if not, copy the output back to ChatGPT, get new code, rinse and repeat".
The "agent" stuff is trying to automate this loop. So as a human, you still write more or less the same prompt, but now the agent code automates that loop of generating code with an LLM and running regular tools on it and sending those tools' output back to the LLM until they succeed for you. So, instead of getting code that may not even be in the right programming language as you do from an LLM, you get code that is 100% guaranteed to run and passes your unit tests and any style constraints you may have imposed in your code base, all without extra manual interaction (or you get some kind of error if the problem is too hard for the LLM).
I think it's really hard to oversell how important agents are.
We have an intuition for LLMs as a function blob -> blob (really, token -> token, but whatever), and the limitations of such a function, ping-ponging around in its own state space, like a billion monkeys writing plays.
But you can also go blob -> json, and json -> tool-call -> blob. The json->tool interaction isn't stochastic; it's simple systems code (the LLM could indeed screw up the JSON, since that process is stochastic --- but it doesn't matter, because the agent isn't stochastic and won't accept it, and the LLM will just do it over). The json->tool-call->blob process is entirely fixed system code --- and simple code, at that.
Doing this grounds the code generation process. It has a directed stochastic structure, and a closed loop.
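The json -> tool-call step really is plain systems code: validate the JSON, look up the tool, run it, and bounce anything malformed back to the model. A minimal sketch with made-up tool names:

import json
import os

TOOLS = {
    "read_file": lambda path: open(path).read(),
    "list_dir": lambda path: "\n".join(sorted(os.listdir(path))),
}

def run_tool_call(raw):
    # Validate the model's JSON and dispatch it; nothing stochastic happens here.
    try:
        call = json.loads(raw)
        handler = TOOLS[call["tool"]]
        args = call.get("args", {})
    except (json.JSONDecodeError, KeyError, TypeError):
        # Malformed or unknown call: reject it and let the LLM try again.
        return False, "invalid tool call"
    return True, handler(**args)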
What is an actual, real world example?
The interfaces prompt you when it wants to run a command, like "The AI wants to run 'cargo add anyhow', is that ok?"
Maybe the agent feeds your PR to the LLM to generate some feedback, and posts the text to the PR as a comment. Maybe it can also run the linters, and use that as input to the feedback.
But at the end of the day, all it's really doing is posting text to a github comment. At worst it's useless feedback. And while I personally don't have much AI in my workflow today, when a bunch of smart people are telling me the feedback can be useful I can't help but be curious!
The agent code runs a regex that recognizes this prompt as a reference to a JIRA issue, and runs a small curl with predefined credentials to download the bug description.
It then assembles a larger text prompt such as "you will act as a master coder to understand and fix the following issue as faithfully as you can: {JIRA bug description inserted here}. You will do so in the context of the following code: {contents of 20 files retrieved from Github based on Metadata in the JIRA ticket}. Your answer must be in the format of a Git patch diff that can be applied to one of these files".
This prompt, with the JIRA bug description and code from your Github filled in, will get sent to some LLM chosen by some heuristic built into the agent - say it sends it to ChatGPT.
Then, the agent will parse the response from ChatGPT and try to parse it as a Git patch. If it respects git patch syntax, it will apply it to the Git repo, and run something like `make build test`. If that runs without errors, it will generate a PR in your Github and finally output the link to that PR for you to review.
If any of the steps fails, the agent will generate a new prompt for the LLM and try again, for some fixed number of iterations. It may also try a different LLM or try to generate various follow-ups to the LLM (say, it will send a new prompt in the same "conversation" like "compilation failed with the following issue: {output from make build}. Please fix this and generate a new patch."). If there is no success after some number of tries, it will give up and output error information.
You can imagine many complications to this workflow - the agent may interrogate the LLM for more intermediate steps, it may ask the LLM to generate test code or even to generate calls to other services that the agent will then execute with whatever credentials it has.
It's a byzantine concept with lots of jerry-rigging that apparently actually works for some use cases. To me it has always seemed far too much work to get started before finding out if there is any actual benefit for the codebases I work on, so I can't say I have any experience with how well these things work and how much they end up costing.
I'm interested in playing with this, since reading the article, but I think I will only have it run things in some dedicated VM. If it seems better than other LLM use, I'll gradually rely on it more, but likely keep its actions confined to the VM.
Some people are, and some people are not. This is where some of the disconnect is coming from.
> Who is just turning AI loose to modify things as it sees fit?
With the advent of source control, why not? If it does something egregiously wrong, you can throw it away easily and get back to a previous state with ease.
> If I'm not directing the work, how does it even know what to do?
You're directing the work, but at a higher level of abstraction.
The article likens this to a Makefile. I gotta say, why not just use a Makefile and save the CO2?
I use Cursor by asking it exactly what I want and how I want it. By default, Cursor has access to the files I open, and it can reference other files using grep or by running specific commands. It can edit files.
It performs well in a fairly large codebase, mainly because I don’t let it write everything. I carefully designed the architecture and chose the patterns I wanted to follow. I also wrote a significant portion of the initial codebase myself and created detailed style guides for my teammates.
As a result, Cursor (or rather the models you select, since Cursor is just a router for commercial models) handles small, focused tasks quite well. I also review every piece of code it generates. It's particularly good at writing tests, which saves me time.
I personally have my Zed set up so the agent has to request every command be manually reviewed and approved before running.
You forgot the screeds against the screeds (like this one)
1. Thomas is obviously very smart.
2. To be what we think of as "smart" is to be in touch with reality, which includes testing AI systems for yourself and recognizing their incredible power.
Thomas is the smartest at other things.
Smarter does not mean "better at writing and shipping infrastructure code."
Some of the smartest people I know are also infra engineers and none of them are AI skeptics in 2025.
They cannot possibly imagine someone has a different use case where the AI didn't work
"I crank out shitty webapps all day, therefore every single other dev does. Everyone obviously has the same use case as me because I am the center of the universe"
It really does feel like I've gone from being 1 senior engineer to a team that has a 0.8 Sr. Eng, 5 Jrs. and one dude that spends all his time on digging through poorly documented open source projects and documenting them for the team.
Sure I can't spend quite as much time working on hard problems as I used to, but no one knows that I haven't talked to a PM in months, no one knows I haven't written a commit summary in months, it's just been my AI doppelgangers. Compared to myself a year ago I think I now PERSONALLY write 150% more HARD code than I did before. So maybe, my first statement about being 0.8 is false.
I think of it like electric bikes, there seems to be indication that people with electric assist bikes actually burn more calories/spend more time/go farther on an electric bike than those who have manual bikes https://www.sciencedirect.com/science/article/abs/pii/S22141....
I don't know what you're posting, but if it's anything like what I see being done by GitHub copilot, your commit messages are junk. They're equivalent to this and you're wasting everyone's time:
// Sets the value
const value = "red"
this is a strawman argument... of whatever you are arguing
I see it myself, go to a tech/startup meetup as a programmer today vs in 2022 before ZIRP ended.
It's like back to my youth where people didn't want to hear my opinion and didn't view me as "special" or "in demand" because I was "a nerd who talked to computers", that's gotta be tough for a lot of people who grew up in the post "The Social Network" era.
But anyone paying attention knew where the end of ZIRP was going to take us, the fact that it dovetailed with the rise of LLMs is a double blow for sure.
The only part I don't automate is the pull request review (or patch review, pre-commit review, etc. before git); that's always been the line to hold for protecting codebases with many contributors of varying capability. This is explicitly addressed in the article as well.
You can fight whatever straw man you want. Shadowbox the hypotheticals in your head, etc. I don't get all these recent and brand new accounts just straight up insulting and insinuating all this crap all over HN today.
I can maybe even see that point in some niches, like outsourcing or contracting where you really can't be bothered to care about what you leave behind after the contract is done but holy shit, this is how we end up with slow and buggy crap that no one can maintain.
Just about no-one in the F100 unless they are on very special teams.
If you care about the craft you're pushed out for someone who drops out 10x LOC a day, because your management has no ability to measure what good software is. Extra bonus points for including 4GB of node_modules in your application.
Not familiar with Elixir but I assume it's really good at expressing data driven code, since it's functional and has pattern matching.
But for Python, JS, etc,... it's the same down to earth abstraction that everyone is dealing with, like the same open a file, parse a csv, connect to the database patterns.
I use Zed as my primary interface to "actually doing project work" LLM stuff, because it front-ends both OpenAI and Google/Gemini models, and because I really like the interface. I still write code in Emacs; Zed is kind of like the Github PR viewer for me.
I'm just starting to use Codex Web for asynchronous agents because I have a friend who swears by queueing up a dozen async prompts every morning and sifting through them in the afternoon. The idea of just brainstorming a bunch of shit --- I can imagine keeping focus and motivation going long enough to just rattle ideas off! --- and then making coffee while it all gets tried, is super appealing to me.
Bunch of async prompts for the same task? Or are you parallelizing solving different issues and just reviewing in the afternoon?
Sounds intriguing either way.
Then I do my “real” work, there’s the stuff I don’t trust the agent with, or is more exploratory or whatever.
As I think of more agent tasks doing that I write them down. When I take a break, say for lunch or winding down at the end of the day I check back in on previous tasks and fire off the new ones.
My flow is very similar to what I did with junior eng except I’m willing to fire off even more trivial tasks at the agent because I don’t care if it sits idle. Similarly if it gets way off base I’m happy to kill the pr more aggressively and start over, what do I care if it wasted its time or if it learns a valuable lesson from the experience?
You're not concerned about OpenAI or Google stealing your code? I won't use VSCode for that reason, personally, but I do use VSCodium.
> People coding with LLMs today use agents. Agents get to poke around your codebase on their own. They author files directly. They run tools. They compile code, run tests, and iterate on the results. They also:
Every once in a while I see someone on X posting how they have 10 agents running at once building their code base, and I wonder if in 3 years most private industry coders will just be attending meetings to discuss what their agents have been working on, while people working on DoD contracts will be typing things into vim like a fool
Forget LLMs, try getting Pandas approved. Heck I was told by some AF engineers they were banned from opening Chrome Dev Tools by their security office.
FWIW I think the LLM situation is changing quite fast and they're appearing in some of our contracts. Azure-provided ones, of course.
I would stay in any high danger/high precision/high regulation role.
The speed at which LLM stuff is progressing is insane, what is cutting edge today wasn't available 6 months ago.
Keep up as a side hobby if you wish, I would definitely recommend that, but I just have to imagine that in 2 years a turnkey github project will get you pretty much all the way there.
Idk, that's my feeling fwiw.
I love LLMs but I'm much less confident that people and regulation will keep up with this new world in a way that benefits the very people who created the content that LLMs are built on.
You clearly haven't been following the space or maybe following too much.
Because the progress has been pretty slow over the last years.
Yes, models are cheaper and faster, but they aren't substantially better.
I consider "LLM stuff" to be all inclusive of the eco-system of "coding with LLMs" in the current threads context, not specific models.
Would you still say, now that the definition has been clarified, that there has been slow progress in the last 2+ years?
I am also curious if you could clarify where we would need to be today for you to consider it "fast progress"? Maybe there is a generational gap between us in defining fast vs slow progress?
And I suspect the act of writing it yourself imparts some lower level knowledge you don't get by skimming the output of an AI.
[1] https://www.joelonsoftware.com/2000/05/26/reading-code-is-li...
Would you mind going into a bit more specifics/details on why regular code review practice would become unworkable, like which specific part(s) of it?
Real, meticulous code review takes absolutely forever.
If you're programming for a plane's avionics, as an example, the quality assurance bar is much, much higher. To the point where any time-saving benefits of using an LLM are most likely dwarfed by the time it takes to review and test the code.
It's easy to say LLM is a game-changer when there are no lives at stake, and therefore the cost of any errors is extremely low, and little to no QA occurs prior to being pushed to production.
The idea that AI will make development faster because it eliminates the boring stuff seems quite bold because until we have AGI, someone still needs to verify the output, and code review tends to be even more tedious than writing boilerplate unless you're speed-reading through reviews.
This has never once been my experience. Its definitely less fun but it takes way less time.
These all sound like your projected assumptions. No, it generally does not take longer to review sizable code changes than it does to write it. This is further alleviated if the code passes tests, either existing or new ones created by the ai.
I guess this presupposes that it is ok for 3rd parties to slurp up your codebase? And possibly (I guess it ostensibly depends on what plan you are on?) using that source code for further training (and generating that same code for others)?
I imagine in some domains this would not be ok, but in others is not an issue.
you get a link to a figma design and you have to use your eyes and common sense to cobble together tailwind classes, ensure responsiveness, accessibility, try out your components to make sure they're not janky, test out on a physical mobile device, align margins, padding, truncation, wrapping, async loading states, blah blah you get it
LLMs still suck at all that stuff that requires a lot of visual feedback, after all, you're making an interface for humans to use, and you're a human
in contrast, when i'm working on a backend ticket ai feels so much more straightforward and useful
Only if you are familiar with the project/code. If not, you are thrown into a foreign codebase and have no idea how to tweak it.
I have to say, my ability to learn Rust was massively accelerated via LLMs. I highly recommend them for learning a new skill. I feel I'm roughly at the point (largely sans LLMs) now where I can be nearly as productive in Rust as Python. +1 to RustRover as well, which I strongly prefer to any other IDE.
How would you know?
If you didn't know Rust already, how would you know the LLM was teaching you the right things and the best way to do things?
Just because it compiles doesn't mean it works. The world is full of bad, buggy, insecure, poor code that compiles.
This is akin to being in tutorial hell and thinking you “know the language”.
In particular, it helped me write my first generic functions and macros, two things that were pretty intimidating to try and get into.
Also, I think there's an argument similar to cryptocurrency companies that run like pyramid schemes. I could've made easy money doing security work for them. Yet, I felt like I'd be participating in helping them rob people or advancing their con. (Some jobs, like building assurance tools, might be OK.) Likewise, using tools built on massive copyright infringement might be supporting or promoting that.
So, I gotta use legally-trained models or wait for legal reforms that make LLM training legal. Especially the data sets they distribute, which is currently illegal file sharing.
I barely write any scaffolding code, because I use tools that setup the scaffolding for me.
But if you're actively avoiding everything related to it, you might find yourself in a position where you're suddenly being left in the dust. Maybe not now, not next month, not next year, but some time in the future. The models really are improving fast!
I've talked with devs that (claim they) haven't touched a model since ChatGPT was released - because it didn't live up to their expectations, and they just concluded it was a big nothingburger.
Even though I don't follow the development religiously anymore, I do try to get acquainted with new releases every 3 months or so.
I hate the term "vibe coding", but I personally know non-tech people that have vibe coded products / apps, shipped them, and make more money in sales than what most "legit" coders are making. These would be the same "idea people" that previously were looking for a coder to do all the heavy lifting. Something is changing, that's for sure.
So, yeah, don't sleepwalk through it.
It’s not like “becoming skilled and knowledgeable in a language” which took time. Even if you’re theoretically being left behind, you can be back at the front of the pack again in a day or so. So why bother investing more than a little bit every few months?
I believe that AI+Coding is no different from this perspective. It usually takes senior engineers a few weeks just to start building an intuition of what is possible and what should be avoided. A few weeks more to adjust the mindset and properly integrate suitable tools into the workflow.
What is changing - constraints are relaxing, making things easier than they were before. E.g. where you needed a complex RAG to accomplish some task, now Gemini Pro 2.5 can just swallow 200k-500k of cacheable tokens in prompt and get the job done with a similar or better accuracy.
Here, the early bird gets the worm with 9 fingered hands, the late bird just gets the worm.
Very much disagree with that. Getting productive and competent with LLM tooling takes months. I've been deeply invested in this world for a couple of years now and I still feel like I'm only scraping the surface of what's possible with these tools.
I’m still not entirely sure why it’s supposed to take months. I usually retry every few weeks whenever a new model comes out, and they get marginally better at something, but using them isn’t a massive shift? Maybe I’m missing something? I have some code, I pop open the pane, ask it, accept/reject the code and go on. What else is everyone even doing?
Edit: I’ve seen the prompt configs people at work have been throwing around, and I’m pretty glad I don’t bother with cursor-and-friends when I see that. Some people get LLM’s to write git commits? Lazygit has made most of my git workflow friction disappear and the 1 minute it takes me to write commits and pr’s is less effort than having to police a novel writing incorrect ones.
This tradeoff of unfamiliarity with the codebase is a very well understood problem for decades. Maintaining a project is 99% of the time spent on a successful project.
In my opinion though, having AI write the initial code is just putting most people in a worse situation with almost no upside long term.
I'll have to look into this some more, but I'm very curious about what the current state of the art is. I'm guessing it's not great because so few people do this in the first place -- because it's so tedious -- and there's probably not nearly enough training data for it to be practical to generate specs for a JavaScript GQL app or whatever these things are best at generating.
This type of issue is part of why I've never felt the appeal of LLMs, I want to understand my code because it came from my brain and my understanding, or the same said of a teammate who I can then ask questions when I don't understand something.
This is my current role, and one of the biggest reasons AI doesn't really help me day to day agent or otherwise.
In my ideal world, AI become so proficient at writing code that they eventually develop their own formally verifiable programming language, purpose built to be verifiable. So that there wouldn't be room for unknown unknowns.
(And, if you like, do as TFA says and rephrase the code into your own house style as you’re transcribing it. It’ll be better for it, and you’ll be mentally parsing the code you’re copying at a deeper level.)
I'm amazed that people don't see this. Absolutely nobody would claim that copying a novel is the same thing as writing a novel.
Nevermind other important values like resilience, adaptability, reliability, and scrutability. An AI writes a function foo() that does a thing correctly; who has the know-how that can figure out if foo() kills batteries, or under what conditions it could contribute to an ARP storm or disk thrashing, or what implicit hardware requirements it has?
Which: of course you can. Not least because both your coworkers and these coding agents produce changes with explanatory comments on any lines for which the justification or logic is non-obvious; but also because — AI PR or otherwise — the PR consists of commits, and those commits have commit messages further explaining them. And — AI submitter or otherwise — you can interrogate the PR’s submitter in the PR’s associated discussion thread, asking the submitter to justify the decisions made, explain parts you’re suspicious of, etc.
When you think about it, presuming your average FOSS project with an open contribution model, a PR from an AI agent is going to be at least strictly more “knowable” than a “drive-by” PR by an external one-time contributor who doesn’t respond to discussion-thread messages. (And sure, that’s a low bar — but it’s one that the average accepted and merged contribution in many smaller projects doesn’t even clear!)
Back to the novel analogy, you could ask an author why he incorporated this or that character trait or plot point, but all the explanation in the world will not make you able to write the next chapter as well as he could.
Shipping the org chart isn't the only way to solve this problem but it is one that can work. But if you don't acknowledge the relationship between those problems, AGI itself probably isn't going to help (partially sarcastic).
I don’t know. Ever had the experience of looking at 5+ year old code and thinking “what idiot wrote this crap” and then checking “git blame” and realising “oh, I’m the idiot… why the hell did I do this? struggling to remember” - given enough time, humans start to forget why they did things a certain way… and sometimes the answer is simply “I didn’t know any better at the time, I do now”
> You don't get that with AI. The codebase is always new.
It depends on how you use AI… e.g. I will often ask an AI to write me code to do X because it gets me over the “hump” of getting started… but now this code is in front of me on the screen, I think “I don’t like how this code is written, I’m going to refactor it…” and by the time I’m done it is more my code than the AI’s
I find myself both:
- writing a comment so that Copilot knows what to do
- letting Copilot write my comment when it knows what I did
I'm now a lot more reliable with my comment writing.
How well does your team work when you can't even answer a simple question about your system because nobody wrote, tested, played with the code in question?
How do you answer "Is it possible for our system to support split payments?" when not a single member of your team has even worked on the billing code?
No, code reviews do not familiarize an average dev to the level of understanding the code in question.
yes!
> somehow
not difficult to explain. Coding is a creative activity where you work top-down; you decompose the abstract/high-level into the detailed/low-level. You dictate the "events" happening to the code, you are in control. Reviewing is reactive; the code you review dictates what happens in your brain (you are under control, not in control), and you need to work bottom-up: you try to re-compose the whole from the fragments. Even for human coders, a detailed commit message is a pre-requisite before their code can undergo review. The reviewer is in the worse position, so he needs to be supported as much as possible.
- git blame
- always write good commit messages
> When using GenAI tools, the effort invested in critical thinking
> shifts from information gathering to information verification;
> from problem-solving to AI response integration; and from task
> execution to task stewardship.
[1]: https://www.microsoft.com/en-us/research/wp-content/uploads/...
That's dreadful. Not only is familiarity with the code not valued, it is impossible to build for your own sake/sanity.
Writing code is easier than long term maintenance. Any programmer is able to write so much code that he will not be able to maintain it. Unless there are good AI tools helping with maintenance there is no point to use generative tools for production code. From my experience AI tools are great for prototyping or optimizing procrastination.
To me the sweet spot is, i write the code with the "Help" of an LLM. It means i double check everything it generates and prompt it to write code block by block - frequently acting as an editor.
Either you want human intervention for correctness and extension or you don't. Having LLM's write large swaths of code is like completely relying on tesla's autopilot - you are probably more stressed than if you just drove yourself.
Also your last comparison made me chuckle, good one :)
You are now responsible for learning how to use LLMs well. If an untrained vibe coder is more productive for me, while knowing nothing about how the code actually works, I will hire the vibe coder instead of you.
Learning is important, but it's most important that you learn how to use the best tools available so you can be productive. LLMs are not going away and they will only get better, so today that means you are responsible for learning how to use them, and that is already more important for many roles than learning how to code yourself.
....for the vast majority of my career, anyone who suggested doing this - much less letting code that no one in the world (much less the company) truly understands the logic flow of do this - would be fired.
How does this section fit in with the agent section just after? In an agentic model, isn't the merge getting done by either the model or a tool, and the retry-loops on failures would be mostly invisible?
E.g. when using Aider + Gemini Flash 2.5, probably 90% of the changes apply cleanly from my perspective (maybe half actually apply cleanly, the other half after a couple of roundtrips of Aider telling the model that the patch didn't apply). The 10% that only apply partially I usually throw away and redo the prompt, it's really rare that I start merging the code manually.
It hammers 100% of the time, with no failure.
It requires the same amount of labour on my part but it delivers the same outcome every time.
That is what tools do, they act as an extension and allow you to do things not easily done otherwise.
If the hammer sometimes hammers, sometimes squeaks and sometimes screws, then it requires extra labour on my part just to make it do what purpose-specific tools do, and that is where frustrations arise.
Make it do one thing excellent and we talk then.
My 3D printer sometimes prints and sometimes makes spaghetti. Still useful.
3D printing is largely used for prototyping, where its lossy output is fine. But using it for production use cases requires fine tuning so it can be 99.9% reliable. Unfortunately we can't do that for LLMs, which is why they're still only suitable for prototyping.
No you can't, or at least I can't. LLMs are more work than just doing it by hand.
You can't hammer a nail a 1000 times and pick the best hammered nail.
You can have the hammer iterating over the structure 24/7, finding imperfections in previous hammered nails.
This is imposing arbitrary constraints on the model, and when you give a human just a hammer, they tend to start viewing everything as a nail.
Except that's not the argument people are making. They are arguing it will replace humans. They are arguing it will do research level mathematics. They are arguing this is the start of AGI. So if you want to put your head in the sand and ignore the greater message that is plastered everywhere then perhaps some self reflection is warranted.
Indeed but the tedious naysaying that this is arguing against is that AI isn't good full stop. They aren't saying "I tried it and it's not for me but I can see why other people would like it".
But, you still have to read it:
> Reading other people’s code is part of the job...I have to read the code line-by-line anyways.
So instead of writing the tedious code, I only have to read it. Oh but don't worry, I don't have to read it too carefully because:
> Agents lint. They compile and run tests. If their LLM invents a new function signature, the agent sees the error
But remember...
> You’ve always been responsible for what you merge to main.
So now I have to oversee this web of agents and AI on top of coding? Am I doing more now for the same pay? Am I just speedrunning myself toward lower pay? Is AI adoption a prisoner's dilemma toward lowering my wages hardest?
Because AI is good at coding compared to many other disciplines (e.g. math), it makes the internal AI politics among programmers more of an issue. Add fuel to that fire, baby!
You've already lost me, because I view programming as an art form. I would no more use AI to generate code than I would use it to paint my canvas.
I think the rest of the article is informative. It made me want to try some things. But it's written from the perspective of a CEO thinking all his developers are just salt miners; miners go into the cave and code comes out.
I think that's actually what my hangup is. It's the old adage of programmers simply "copying and pasting from stack overflow" but taken to the extreme. It's the reduction of my art into mindless labor.
Just because software is “bespoke” doesn’t mean it’s complicated or special.
That's not my experience. Of course, everything is just a finite state machine operating on a memory tape.
How many furniture makers did you talk to forming this opinion? The metaphor does not line up with either my software or furniture experience. I work with production furniture shops that choose not to use CNCs to avoid the soul being sucked out of the work. This is not a rare stance to take and this is not "japanese joinery" woodworking. This is real work, balancing the means of production with optimal quality. There are all sorts of arguments on whether CNCs or using a domino or whatever is "real" woodworking, but the idea that this choice of quality does not exist in woodworking and so we shouldn't have it in software is not my experience.
I don't want to step on the cabinetry discussion, just, I think it's important to call out this idea that there's a universal quality/intricacy/interestingness threshold for all production software. That was a common fallacy long before LLMs (you used to see a lot of it in Rails culture with people ceaselessly grooming unit test suites and designing ever-more-seamless mocking systems). Part of growing in this profession is getting a sense for when extra exertion is needed, and when it's wasted.
That's okay. I think all code does. Some much faster than others. I could be wrong, but I think I'm sticking to this opinion (for now at least)
> The belief that every line of code you write on every problem definition has to be crystalline and formally rigorous is an early-career thing you get over. Fretting over literally every line of code is a way of doing your job badly.
I agree. I do not think this is a debated topic. That being said, I don't see writing quality software as a waste of time, the same way I don't see a quality weld as one. It needs to be reliable to whatever the stakeholders have defined as adequate.
> The skill is in knowing when you need really interesting (and "interesting" is the distinction here) code, and when you should just whip out your shop jig and repeat the exact same pocket hole joint you've been using for the last 10 years.
Ahh alright, so this is where it gets interesting! I think you are close, but not "whip out". As a woodworker you need to be able to make that jig! That is real knowledge/wealth.
I'm still under the belief great devs will use the tools at their disposal as they see fit to write high quality software. Poor devs will write poor quality software with the tools at their disposal. I don't think any of this changes with AI.
I think where I struggle is even as someone who use AI to help write code everyday and would have a hard time going back, these articles do not sit with me.
Yes, I think less-skilled programmers are being bombarded with messages that in the future of coding, being less-skilled will be adequate. What I've seen in today's layoff culture is not something that gives me hope for tomorrow's junior developers.
I think there is huge and vast gap between: "mediocre software is good enough" and "as a highly skilled dev I understand when I need to be clever or not". The latter has been true for ages and the former is the feeling AI pushers are pushing.
I think the term "mediocre" represents both poor quality code and boring code and I think this word is chosen because of this ambiguity. I am absolutely onboard with boring code and I'm definitely not okay with mediocre code if it is not reliable, maintainable, and/or readable per the stakeholders.
> It doesn't change anything for skilled programmers, though.
Exactly! Its just another tool. A great one, sure, but we were great devs before AI and we will be great devs after.
These are hard conversations to have via message boards, for sure. Thanks for the time. Would love to grab coffee or a drink and talk more.
Mass production doesn't have to eliminate the alternative to exist. The same way fast-food and cheap groceries are not a threat to quality restaurants.
While you're at it you can tell me about your favorite click.
Does that mean I've never written anything creative before?
Personally I don't code because I love the letter E. I'd prefer if they all appeared on the screen without me doing any typing at all. It's a means to an end.
What one person enjoys might be toil to another.
I do pay my general contractor for the unseen quality that goes into the structure of my home. Programs should be built the same way.
But to their bosses their output was salt.
I'm sorry but unless you're working in open source for the pure love of the tech/craft, the output of software engineering is PROBLEM SOLVING.
That's why "build vs. buy" exists - sometimes it's better to buy a solution than buy one. That's why a valid solution to a problem sometimes is to convince a customer that their ask is wrong or unreasonable, and something simpler or easier would get them 99% of what they need with 1% of the effort.
That's our job.
As somebody who comes from a politically left family, and was also around in the early days of the Web, let me tentatively note that this issue has a particular political slant, too. The left has strong roots in being able to effectively critique new developments, economic and social, that don't come from its own engines of innovation which have revolved around those critiques, plus solidarity, organization, and sociopolitical action.
The movement's theorists work far more slowly on how to integrate the effect of those changes into its vision. That means when something like this comes along, the left's cultural norms err on the side of critique. Which is fine, but it makes any other expression both hard to convey, and instantly suspect in those communities. I saw this in the early Web, where from a small group of early adopters of all political slants, it was the independents, heterodox leftists, and the right -- and most vocally, the libertarians -- who were able to most quickly adapt to and adopt the new technology. Academic leftists, and those who were inspired by them, took a lot longer to accommodate the Net into their theses (beyond disregarding or rejecting it) and even longer to devise practical uses for it.
It wasn't that long, I should say -- a matter of months or years, and any latent objections were quickly swamped by younger voices who were familiar with the power of the Net; but from my point of view it seriously set back that movement in practicality and popularity during the 80s and 90s.
I see the same with AI: the left has attracted a large generational wave of support across the world by providing an emotionally resonant and practical alternative to the status quo many people face. But you quickly lose the mandate of heaven if you fail to do more than just simplistically critique or reject a thing that the average person in the world feels they know better, or feels differently toward, than you do. This is something to consider, even if you still strongly believe yourselves to be correct in the critiques.
Claude, Gemini, Copilot and and ChatGPT are non-starters for privacy-minded folks.
So far, local experiments with agents have left me underwhelmed. Tried everything on ollama that can run on my dedicated Ryzen 8700G with 96GB DDR5. I'm ready to blow ~10-15k USD on a better rig if I see value in it, but if I extrapolate current results I believe it'll be another CPU generation before I can expect positive productivity output from properly, securely running local models when factoring in the setup and meta.
Most providers promise not to train on inputs if used via an API (and otherwise have a retention timeline for other reasons).
I'm not sure the privacy concern is greater than using pretty much any cloud provider for anything. Storing your data on AWS: Privacy concern?
Unencrypted? You bet.
The single biggest productivity boost you can get in LLM world is believing them when they make those promises to you!
I'm having a hard time interpreting what you mean here. It sounds like something straight out of a cult.
1. Believe them. Use their products and benefit from them.
2. Disbelieve them. Refuse to use their products. Miss out on benefiting from them.
I pick option 1. I think that's the better option to pick if you want to be able to benefit from what this technology can do for you.
Personally I think "these people are lying about everything" is a stronger indication of a cult mindset. Not everyone is your enemy.
Fool me twice, you can't get fooled again.
Those policies are worth the paper they're printed on.
I also note that if you're a USian, you've almost certainly been required to surrender your right to air grievances in court and submit to mandatory binding arbitration for any conflict resolution that one would have used the courts for.
I find this lack of trust quite baffling. Companies like money! They like having customers.
And those who pay attention notice that the fines and penalties for big companies that screw the little guys are often next-to-nothing when compared with that big company's revenue. In other words, these punishments are often "cost of doing business" expenses, rather than actual deterrents.
So, yeah. Add into all that a healthy dose of "How would anyone but the customers with the deepest pockets ever get enough money to prove such a contract violation in court?", and you end up a profound lack of trust.
This space is fiercely competitive. If OpenAI turn out to be training on private data in breach of contract, their customers can switch to Anthropic.
I'd be looking for something that can run offline and receive system updates from an internal mirror on the airgapped network. Needing to tie an AppleID to the machine and allow it internet access for OS updates is a hard sell. Am I wrong in thinking that keeping an airgapped macOS installation up to date would require additional infrastructure and some enterprise contract with Apple?
Devstral (Mistral Small fine-tuned for agentic coding) w/ Cline has been above expectations for me.
We're in the middle of a major shift - there will benefits to those that adapt first. People outside the field have no idea what's coming, even those of us in the field are underestimating the shift.
There were a few outliers in the 60s who understood what the computing revolution meant and would mean, but most did not. This is likely an even bigger change than that.
It was obviously huge. You could see it taking off. Yet a lot of people proudly displayed ignorance and backed each other up on it to the point that discussion around the topic was often drowned out by the opposition to change. Now today it takes minutes of playing with ai coding agents to realise that it’s extremely useful and going to be similarly huge.
Resistance to change is not a virtue!
Sometimes it’s more like NFTs.
They're not alone either, a bunch of the AI bankroll is coming from people who were also sold on crypto taking over the world.
Also, they were nothing more than the combination of two things which already existed, and which were already successful and financially viable: cellular phones and PDAs. In fact, I (and plenty of others I presume) already used that combination before smartphones: a PDA and a cellular phone, connected through Bluetooth to share the network connection.
Vertical flip phones from forever ago can handle both of these just fine.
2. Most 2FA I deal with involves an Authenticator style app.
3. Missing the point: I want to disable texts altogether. For the decade prior to having a smartphone, I had a cell phone with texts disabled (specifically called the provider to disable them).
Yet some of us spent hours over the past three years playing with LLMs, and remain completely unimpressed by what we see.
My position with the AI is almost the same. It is overall a net negative for cognitive abilities of people. Moreover I do think all AI companies need to pay fair licensing cost to all authors and train their models to accurately cite the sources. If they want more data for free, they need to propose copyright changes retroactively invalidating everything older than 50 years and also do the legwork for limiting software IP to 5 to 10 years.
I don't think that's true.
I do most of my reading on a smart phone - including wading through academic papers, or reading full books in the kindle app and jotting down notes in the digital margins.
A sizable number of my short form blog entries are written on my phone, and my long form writing almost always starts out in Apple Notes on my phone before transferring to a laptop.
Predictive text and voice dictation have gotten good enough now that I suspect there have been entire books written on mobile devices.
Whether you want to consider it "deep knowledge work" or not is up to you, but apparently a lot of Fifty Shades of Grey was written on a BlackBerry! https://www.huffpost.com/archive/ca/entry/fifty-shades-of-gr...
I agree. A bunch of us here might use it to scaffold applications we already understand, use it as a rubber duck to help understand and solve new problems, research more effectively, or otherwise magnify skills and knowledge we already have in a manner that's directed towards improving and growing.
That's cool. That's also not what most people will do with it. A bunch of us are total nerds, but most of the world really isn't like that. They want more entertainment, they want problems solved for them, they want ease. AI could allow a lot of people to use their brains less and lose function far more. For the minority among us who use it to do more and learn more, great. That group is a tiny minority from what I can tell.
Take for example that a huge use case for generative AI is just... More sophisticated meme images. I see so much of that, and I'm really not looking for it. It's such an insane waste of cycles. But it's what the average person wants.
I am in my 40s; have never owned a smartphone and still can't imagine wanting one; am constantly surrounded by others who have them and thus am completely aware of their functionality; AMA.
This is a funny example because the devastating effects of smartphone addiction on society are now blatantly obvious. There are in fact very good reasons for not 'wanting such a thing'.
Don't get me wrong, LLMs can be incredibly useful and I think they deserve some of the current hype. Claiming that LLMs are useless is indeed silly and can be rightfully ignored. But there are serious concerns about potential (or actual) negative effects on society, and these should be taken seriously.
“No we’re not allocating any time to thinking about the design, just get the LLM to do it”
I’m so excited for the bleak future.
They have been wrong every time and will continue to be wrong.
Autoregressive LLMs still have some major issues, like over-dependency on the first few generated tokens and problems with commutative reasoning caused by one-sided (causal) masked attention, but those issues are slowly getting fixed.
And at the end of the day they went nowhere. Because (a) they will never be perfect for every use and (b) they abstract you from understanding the problem and solution. So often it will be easier to just write the code from scratch.
But that doesn't mean the developers who would have otherwise done that work were just disemployed by the success of these tools. No, they just worked on different things.
And I think that is a valuable lesson that can be applied (though I think not perfectly) to this LLM era.
I understand that "coding" is the fun part for lots of people, especially younger people. This is me as well, so I'm definitely sympathetic to it, and feel quite a bit of sadness about this.
Lots of people also enjoy woodworking and machining by hand, but that's not how most furniture or machines are made.
If I were independently wealthy, I might well spend some of my time making artisan software, but as a professional and entrepreneur, I'm going to try to use efficient tools for the job.
If you like being an entrepreneur, you’re already different from most professional software developers.
I agree with you that most professional software developers don't like being entrepreneurs, but I think that has more to do with disliking the parts of entrepreneurship that don't fit into "person who makes software", like fundraising and marketing.
But I think many - maybe most, but not all - professional software engineers actually do enjoy "making software", generally, and not just "coding", narrowly.
To me, it is very reminiscent of when I was a wee lad and there was a prevalent view that no real developer would want to make web applications, that was for script kiddies. But it turned out to be a useful kind of software to build, so a lot of people who build software started building web applications.
I also think that lots of developers do naturally enjoy increasingly higher levels of work than "just" writing code, as they progress in their careers. The job of "staff+ engineers" has significant overlap with what product managers do, where the "product" they're thinking about is the technical design of a larger system. Lots of developers really do enjoy this kind of work!
Unless there's a concrete sense of what people are working on and how LLMs are helping, there's no point engaging -- there's no detail here.
Sure, if your job is to turn out tweaks to a WordPress theme, presumably that's now 10x faster. If it's to work on a new in-house electric motor in C for some machine, presumably that's almost entirely unaffected.
No doubt junior web programmers working on a task backlog, specifically designed for being easy for juniors, are loving LLMs.
I use LLMs all the time, but each non-trivial programming project that has to move out of draft-stage needs rewriting. In several cases, to such a degree that the LLM was a net impediment.
I've never been able to get it to work reliably myself either.
The internet just tells me to prompt harder. Lots of "grind-set" mentality energy around AI if you ask me. Very little substance.
You list what look like quite greenfield projects, very self-contained, and very data science oriented. These are quite significantly uncharacteristic of software engineering in the large. They have nothing to do with interacting systems each with 100,000s lines of code.
Software engineers working on large systems (e.g., many micro-services, data integration layers, etc.) are working on very different problems. Debugging a microservice system isn't something an LLM can do -- it has no ability, for example, to trace a request through various APIs from a front-end into a backend layer, into some DB, to be transferred to some other DB, and so on.
This was all common enough stuff for software engineers 20 years ago, and was part of some of my first jobs.
A very large amount of this Pollyanna-LLM view comes not from junior software engineers, but from data scientists who are extremely unfamiliar with software engineering.
Every codebase I listed was over 10 years old and had millions of lines of code. Instagram is probably the world's largest and most used Python codebase, and the camera software I worked on was 13 years old and had millions of lines of C++ and Java. I haven't worked on many self-contained things in my career.
LLMs can help with these things if you know how to use them.
Jobs comprise different tasks, some more amenable to LLMs than others. My view is that where scepticism exists amongst professional senior engineers, it's probably well-founded and grounded in the kinds of tasks they are engaged with.
I'd imagine everyone in the debate is using LLMs to some degree; and that it's mostly about what productivity factor we imagine exists.
They are busy doing their work and prefer their competitors (other developers) to not use these tools.
Someone else responds that video of the author actually using the tools would be more convincing.
Then you respond with essentially “no one wants to convince you and they’re too busy to try”.
Now if you misspoke and you’d like to change what you said originally to “many AI users do want to convince AI skeptics to use AI, but they only have enough time to write blog posts not publish any more convincing evidence”, then sure that could be the case.
But that ain’t what you said. And there’s no way to interpret what you said that way.
That's more a function of your tooling than of your LLM. If you provide your LLM with tool-use facilities to do that querying, I don't see why it couldn't go off and perform that investigation -- but I haven't tried it yet. Off the back of this comment, though, it's now high on my todo list. I'm curious.
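For what it's worth, the plumbing for that kind of investigation isn't exotic. A minimal sketch, library-agnostic and entirely hypothetical (llm_call, the tool name, and the log paths are assumptions, not anyone's real setup):

    import json

    def search_logs(service: str, request_id: str) -> list[str]:
        # Placeholder tool: a real setup would query OpenSearch, Honeycomb, etc.
        with open(f"logs/{service}.log") as f:
            return [line.rstrip() for line in f if request_id in line]

    TOOLS = {"search_logs": search_logs}

    def trace_request(task: str, llm_call, max_steps: int = 10) -> str:
        # llm_call stands in for whatever model API you use; assume it returns
        # either {"tool": name, "args": {...}} or {"answer": text}.
        history = [{"role": "user", "content": task}]
        for _ in range(max_steps):
            step = llm_call(history)
            if "answer" in step:
                return step["answer"]
            result = TOOLS[step["tool"]](**step["args"])
            history.append({"role": "tool", "content": json.dumps(result)})
        return "no conclusion within step budget"

The point is only that the model can ask for queries, see results, and iterate; everything interesting lives in how good the tools and the model actually are.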
TFA covers a similar case:
>> But I’ve been first responder on an incident and fed 4o — not o4-mini, 4o — log transcripts, and watched it in seconds spot LVM metadata corruption issues on a host we’ve been complaining about for months. Am I better than an LLM agent at interrogating OpenSearch logs and Honeycomb traces? No. No, I am not.
For the first 10 years of my career I was a contractor walking into national and multinational orgs with large existing codebases, working within pre-existing systems not merely "codebases". Both hardware systems (e.g., new 4g networking devices just as they were released) and distributed software systems.
I can think of many daily tasks I had across these roles that would not be significantly sped up by an LLM. I can also see a few that would be. I also shudder to think what time would be wasted trying to learn 4G networking from LLM summarisation of new docs, or spent working from improperly summarised code (etc.).
I don't think senior software engineers are so sceptical here that they're saying LLMs are not, locally, helpful to their jobs. The issue is how local this help seems to be.
I spent ~4 months using Copilot last year for hobby projects, and it was a pretty disappointing experience. At its best, it was IntelliSense but slower. At its worst, it was trying to inject 30 lines of useless BS.
I only realized there was an "agent" in VS Code because they hijacked my ctrl+i shortcut in a recent update. You can't point it at a private API without doing some GitHub org-level nonsense. As far as my job is concerned, it's a non-feature until you can point it at your own API without jumping through hoops.
I absolutely don't, and I'd love if you could highlight a spot where I suggested I was. As I said, the problem isn't that I don't want to try using an agent, the problem is that I can't because one incredibly basic feature is missing from VS Code's agent thing.
I'll occasionally use chatbots, mostly for spitballing non-professional stuff. They seem to do well with ideation questions like "I'm making X, what are some approaches I could take to do Y?" In other words, I've found that they're good at bullshitting and making lists. I like R1-1776, but that's only because Perplexity Playground seems less restricted than some of the other chatbots.
It's also nice for generating some boilerplate bash stuff, when I need that kind of thing. I don't need that very often, though.
What's the implication here? That Thai food was invented 1 year after Chinese food?
If you do that, I'll swallow my AI skepticism.
I would love to have an LLM that I can turn loose on an unfamiliar codebase that I can ask questions of. I would love to have an LLM that will fill in my Vulkan boilerplate. etc.
I use emacs and Mercurial. You can demonstrate magic to me and I can be convinced even if it's not mainstream.
Rewriting Javascript slop to StackOverflow standards is not convincing me.
Get to it.
(The OAuth stuff posted earlier certainly moved my needle, but the fact that they needed a gaggle of reviewers as well as hand holding when the LLM got stuck mutes the impact significantly.)
You keep using that tool, to your advantage. If you're really altruistic, you post some videos of how productive you can be, like DHH did with his blog in 15-minute videos.
If you’re really that much more productive, the skeptics won’t be able to keep up and it should only take 6 months or some for that to become self evident.
Question: If everyone uses AI to code, how does someone become an expert capable of carefully reading and understanding code and acting as an editor to an AI?
The expert skills needed to be an editor -- reading code, understanding its implications, knowing what approaches are likely to cause problems, recognizing patterns that can be refactored, knowing where likely problems lie and how to test them, holding a complex codebase in memory and knowing where to find things -- currently come from long experience writing code.
But a novice who outsources their thinking to an LLM or an agent (or both) will never develop those skills on their own. So where will the experts come from?
I think of this because of my job as a professor; many of the homework assignments we use to develop thinking skills are now obsolete because LLMs can do them, permitting the students to pass without thinking. Perhaps there is another way to develop the skills, but I don't know what it is, and in the mean time I'm not sure how novices will learn to become experts.
The same way they do now that most code is being copied/integrated from StackOverflow.
HIM: AI is going to take all entry level jobs soon.
ME: So the next level up will become entry level?
HIM: Yes.
ME: Inductively, this can continue up to the CEO. What about the CEO?
HIM: Wait...
The answer to that question is very different from how to become an author before LLMs, and I'm not actually sure what the answer is. It's not "write lots of stories and get feedback", the conventional approach, but something new. And I doubt it's "have an LLM generate lots of stories for you", since you need more than that to develop the skill of understanding plot structures and making improvements.
So the point remains that there is a step of learning that we no longer know how to do.
Except instead of just one language on enterprise systems no one wants to learn because there is no money in them, it's everything.
Arguments are made consistently about how this can replace interns or juniors directly. Others say LLMs can help them learn to code.
Maybe, but not on your codebase or product and not with a seniors knowledge of pitfalls.
I wonder if this will be programming's iPhone moment, where we start seeing a lack of the deep knowledge needed to troubleshoot. I can tell you that we're already seeing a glut of security issues being explained by devs as "I asked Copilot if it was secure and it said it was fine, so I committed it".
And as with Google and Stack Overflow before, the Sr Devs will smack the wrists of the Jr's that commit untested and unverified code, or said Jr's will learn not to do those things when they're woken up at 2 AM for an outage.
To be clear, I think any business that dumps experienced devs in favor of cheaper vibe-coding mids and juniors would be making a foolish mistake, but something being foolish has rarely stopped business types from trying.
Well, if everyone uses a calculator, how do we learn math?
Basically, force students to do it by hand long enough that they understand the essentials. Introduce LLMs at a point similar to when you allow students to use a calculator.
For example, you could require that students submit all logs of AI conversations, and show all changes they made to the code produced.
For example, yesterday I asked ChatGPT how to add a copy-to-clipboard button in MudBlazor. It told me the button didn't exist, and then wrote the component for me. That saved me a bunch of research, but I needed to refactor the code for various reasons.
So, if this was for an assignment, I could turn in both my log from ChatGPT, and then show the changes I made to the code ChatGPT provided.
Calculators have made most people a lot worse in arithmetic. Many people, for instance, don't even grasp what a "30%" discount is. I mean other than "it's a discount" and "it's a bigger discount than 20% and lower than 40%". I have seen examples where people don't grasp that 30% is roughly one third. It's just a discount, they trust it.
GPS navigation has made most people a lot worse at reading maps or generally knowing where they are. I have multiple examples where I would say something like "well we need to go west, it's late in the day so the sun will show us west" and people would just not believe me. Or where someone would follow their GPS on their smartphone around a building to come back 10m behind where they started, without even realising that the GPS was making them walk the long way around the building.
Not sure the calculator is a good example to say "tools don't make people worse with the core knowledge".
I mean, it's not the end of the world, and as you've said, the raw number of numerate people is rising thanks to technology. But technology also seems to rob people of the motivation to learn somewhat useful skills, and even more so with LLMs.
Arithmetic is a subset of maths.
I feel like arithmetic is part of the basics to build abstraction. If I say "y = 3x + a", somewhere I have to understand what "3 times x" means and what the "+" means, right?
Or are you saying that you can teach someone to do advanced maths without having a clue about arithmetic?
Hell, if I had to do long division today without a computer I'd have to re-derive it.
And I don't mean specifically those numbers, obviously. Same goes with 20/100, or understanding orders of magnitudes, etc.
Many people will solve a "maths problem" with their calculator, end up with a result that says that "the frog is moving at 21km/s" and not realise that it doesn't make any sense. "Well I applied the recipe, the calculator gave me this number, I assume this number is correct".
It's not only arithmetic of course, but it's part of it. Some kind of basic intuition about maths. Just look at what people were saying during Covid. I have heard so many people say completely wrong stuff because they just don't have a clue when they see a graph. And then they vote.
I feel there's a bit of a paradox, with many subjects, where we all know the basics are the absolute most important thing, but when we see the basics taught in the real world, it seems insultingly trivial.
This is not actually possible.
It absolutely is. If you can't add or subtract, what reasoning are you doing that is worthwhile?
Maybe that's because you actually can do arithmetic, to the point where it's difficult for you to see how it would be if you couldn't?
I believe that this is an acquired skill that requires basic arithmetic. But if you need a calculator to realise that 381 is roughly twice as big as 198, then you can't do any of the reasoning above.
One may say "yeah but the point of the calculator is to not have to do the reasoning above", but I disagree. In life, we don't go around with a calculator trying to find links between stuff, like "there are 17 trees in this street, 30 cars, what happens if I do 17+30? Or 30-17? Or 30*17?". But if you have some intuition about numbers, you can often make more informed decisions ("I need to wait in one of those lines for the airport security check. This line is twice as long but is divided between three officers at the end, whereas this short line goes to only one officer. Which one is likely to be faster?").
People that failed to grasp arithmetic cannot reason about numbers to a useful degree.
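A back-of-the-envelope version of the security-line example above (the numbers are made up; the point is only the intuition):

    # Long line: twice as many people, but it feeds three officers.
    # Short line: half the people, but only one officer.
    people_long, officers_long = 60, 3
    people_short, officers_short = 30, 1
    seconds_per_person = 20  # assumed, and assumed equal for every officer

    wait_long = people_long / officers_long * seconds_per_person     # 400 s
    wait_short = people_short / officers_short * seconds_per_person  # 600 s
    print(wait_long < wait_short)  # True: the "longer" line clears faster

Someone comfortable with arithmetic does that estimate in their head; someone who never internalised it just sees a long line and a short line.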
I think you're extrapolating far too much from such a simple interaction, which doesn't imply anything about ability to reason about numbers, just their ability to compute addition. If you say "if a is larger than b, and b is larger than c, is a larger than c?", you're testing numerical reasoning ability.
About 30% of US adults do not have the basic ability to conceptualize the relationship between whole numbers.
So GPS makes people worse at orienteering -- on average, does it get everyone where they need to go, better / faster / easier?
Sometimes, the answer is admittedly no. Google + Facebook + TikTok certainly made us less informed when they cannibalized reporting (news media origination) without creating a replacement.
But on average, I'd say calculators did make the population more mathematically productive.
After all, lots of people sucked at math before them too.
A calculator doesn't do maths, it does arithmetic. People sucked at maths, but I'm pretty sure they were better with arithmetic.
> At the end of the day, it's the average productivity across a population that matters.
You're pushing my example. My point is that AI may actually make the average developer worse. Sure, also more productive. So it will reinforce this trend that has been in the software industry for more than a decade: produce more but worse software.
Productivity explains why we do it. It doesn't mean it is desirable.
I'm suggesting you consider it from an objective perspective.
It's easily possible for an organization to be more productive with worse developers because of the tools they have access to.
And no, that's not some sleight of verbal hand in measuring "productive" -- they are able to ship more value, faster.
Ship more value faster is exactly a verbal sleight of hand. That's the statement used by every bad product manager and finance asshole to advocate for shipping out broken code faster. It's "more value" because more code is more content, but without some form of quality guard rails you run into situations where everything breaks. I've been on teams just like that, where suddenly everything collapses and people get mad.
At the end of the day, coders are being paid money to produce something.
It's not art -- it's a machine that works and does a thing.
We can do that in ways that create a greater or lesser maintenance burden, but it's still functional.
LLM coding tools detractors are manufacturing reasons to avoid using another tool that helps them write code.
They need to get over the misconception of what the job is. As another comment previously quipped 'If you want to write artisanal, hand-tuned assembly that's beautiful, do that on your own time for a hobby project.'
The thread here was talking about:
> Well, if everyone uses a calculator, how do we learn math?
The question being whether or not AI will make developers worse at understanding what their code is doing. You can say "it's okay if a website fails one time in 100, the user will just refresh and we're still more profitable". But wouldn't you agree that such a website is objectively of worse quality? It's cheaper, for sure.
Said differently: would you fly in a plane for which the autopilot was vibe coded? If not, it tells you something about the quality of the code.
Do we always want better code? I don't know. What I see is that the trend is enshittification: more profit, worse products. I don't want that.
Debatable in practice. You can't tell me you believe most developers understand what their compiler is doing, to a level of unambiguity.
Whether something gets unrolled, vectorized, or NOP-padded is mysterious. Hell, even memory management is mysterious in VM-based languages now.
And yes (to the inevitable follow-up) still deterministic, but those are things that developers used to have to know, now they don't, and the world keeps spinning.
> You can say "it's okay if a website fails one time in 100, the user will just refresh and we're still more profitable". But wouldn't you agree that such a website is objectively of worse quality? It's cheaper, for sure.
I would say that's the reality we've been living in since ~2005. How often do SaaS products have bugs? How frequently do mobile apps ship a broken feature?
There are two components here: (1) value/utility & (2) cost/time.
There are many websites out there that can easily take a 1 in 100 error rate and still be useful.
But! If such a website, by dint of its shitty design, can be built with 1/100th of the resources (or 100x websites can be built with the same), then that might be a broader win.
Not every piece of code needs to fly in space or run nuclear reactors. (Some does! And it should always have much higher standards)
> Said differently: would you fly in a plane for which the autopilot was vibe coded? If not, it tells you something about the quality of the code.
I flew in a Boeing 737 MAX. To the above, that's a domain that should have called for higher software standards, but based on the incident rate I had no issue doing so.
> Do we always want better code? I don't know. What I see is that the trend is enshittification: more profit, worse products. I don't want that.
The ultimate tradeoff is between (expensive/less, better code) and (cheaper/more, worse code).
If everything takes a minimum amount of cost/effort, then some things will never be built. If that minimum cost/effort decreases, then they can be.
You and I are of like mind regarding enshittification and declining software/product standards, but I don't think standing in front of the technological advancement train is going to slow it.
If a thing can be built more cheaply, someone will do it. And then competitors will be forced to cheapen their product as well.
Imho, the better way to fight enshittification is creating business models that reward quality (and scale).
Note that I'm well aware that I won't change anything. I'm really just saying that AI will help the trend of making most software become worse. It sucks, but that's how it is :-).
But yeah I think we agree on many points here.
I suppose it depends on how quickly the generative effectiveness improves.
I'm tired of engaging with this false equivalence so I won't. Deterministic systems are not the same.
> It's not art -- it's a machine that works and does a thing.
That's right. But what you need to understand is that the machines we create can and do actively harm people. Leaking secure information, creating software that breaks systems and takes down critical infrastructure. We are engineers first and foremost and artists second. And that means designing systems to be robust and safe. If you can't understand that then you shouldn't be an engineer and should kindly fuck off.
Even more humorously, because you seem to think I'm making an argument that isn't in anything I wrote. (LLM to/in prod)
What is objective? That profitability is good? We're destroying our environment to the point where many of us will die from it for the sake of profitability. We're over-using limited natural resources for the sake of profitability. In my book that's not desirable at all.
Companies are profit-maximising machines. The path to more profitability tends to be enshittification: the company makes more money by making it worse for everybody. AI most definitely requires more resources and it seems like those resources will be used to do more, but of lower quality.
Surely that's profitable. But I don't think it is desirable.
Decreasing the cost of production can lead to more profit or cheaper product prices.
I 100% agree that the trend of quality has been downwards in many things over the past 20 years or so.
That’s not clear to me at all.
Before, you had the map. So you were aware that Fitzroy was to the west of Collingwood and both were south of Clifton Hill and so on. I had dozens of these suburbs roughly mapped out in my mind.
Driving down an unfamiliar road, one could use signs to these suburbs as a guide. I might not know exactly where I was, but I had enough of an idea to point me in the right direction.
That skill has disappeared.
>Driving down an unfamiliar road, one could use signs to these suburbs as a guide. I might not know exactly where I was, but I had enough of an idea to point me in the right direction.
Reading those sentences feels like I am dreaming. The exploration... The possibilities... Serendipitously finding you way through and getting temporarily lost at night in a big friendly suburban area with trees and in summer...
This is especially true because the general past alternative to using GPS to find some new unfamiliar place wasn't "read a map" it was "don't go there in favor of going some place you already knew" in a lot of cases. I remember the pre-GPS era, and my experience in finding new stuff is significantly better today than it was back then.
https://www.nature.com/articles/s41598-020-62877-0
If you habitually use a calculator for all arithmetic, could the result not be similar? What if you reach to an LLM for all your coding, general research, etc.? These tools may vastly speed up some workflows, but your brain is a muscle.
And you answer by saying "it's okay to be less knowledgeable (and hence depend on the tool), as long as you are more productive". Which is a different question.
But to me it's obviously not desirable: if AI allows people to completely lose all sense of critical thinking, I think it's extremely dangerous. Because whoever controls the AI controls those people. And right now, look at the techbros who control the AIs.
So the original question is: is it the case that AI reduces the skills of the people who use them? The calculator and the GPS are examples given to suggest that it doesn't sound unlikely.
> Well, if everyone uses a calculator, how do we learn math?
Which doesn't make much sense, because a calculator doesn't do maths. So I answered the question that does make sense: if everyone uses a calculator, do we still learn arithmetic? And I believe we don't.
And then, if we suck at basic arithmetic, it makes it harder to be good at maths.
I suspect there will be plenty of people who grow up in the age of LLMs and maybe by reading so much generated code, or just coding things themselves for practice, will not have a hard time learning solid coding skills. It may be easy to generate slop, but it’s also easy to access high quality guidance.
They'd also be a whole lot less useful. Calculators are great because they always do exactly what you tell them. It's the same with compilers, almost: imagine if your C compiler did the right thing 99.9% of the time, but would make inexplicable errors 0.1% of the time, even on code that had previously worked correctly. And then CPython worked 99.9% of the time, except it was compiled by a C compiler working 99.9% of the time, ...
But bringing it back on-topic, in a world where software is AI-generated, and tests are AI-generated (because they're repetitive, and QA is low-status), and user complaints are all fielded by chat-bots (because that's cheaper than outsourcing), I don't see how anyone develops any expertise, or how things keep working.
They won't, save for a relative minority of those who enjoy doing things the hard way or those who see an emerging market they can capitalize on (slop scrubbers).
I wrote this post [1] last month to share my concerns about this exact problem. It's not that using AI is bad necessarily (I do every day), but it disincentivizes real learning and competency. And once using AI is normalized to the point where true learning (not just outcome seeking) becomes optional, all hell will break loose.
> Perhaps there is another way to develop the skills
Like sticking a fork in a light socket, the only way to truly learn is to try it and see what happens.
[1] https://ryanglover.net/blog/chauffeur-knowledge-and-the-impe...
Take a domain like US taxation. You can certainly become an expert in that, and many people do. Is it a good thing that US taxes are so complicated that we have a market demand for thousands of such experts? Most people would say no.
Don't get me wrong, I've been coding for more years of being alive than I haven't by this point; I love the craft. I still think younger me would have far preferred a world where he could have just had GPT do it all for him, so he didn't need to spend his lunch hours poring over the finer points of, e.g., Python iterators.
To use your example, is using AI to file your taxes actually "rendering [tax] expertise economically irrelevant?" Or is it just papering over the over-complicated tax system?
From the perspective of someone with access to the AI tool, you've somewhat eased the burden. But you haven't actually solved the underlying problem (with the actual solution obviously being a simpler tax code). You have, on the other hand, added an extra dependency on top of an already over-complicated system.
However whenever I've faced actual hard tasks, things that require going off the beaten path the AI trains on, I've found it severely lacking, no matter how much or little context I give it, no matter how many new chats I make, it just won't veer into truly new territory.
I was drawing an analogy. We would probably be better off with a tax system that wasn't so complicated it creates its own specialized workforce. Similarly we would be better off with programming tools that make the task so simple that professional computer programmers feel like a 20th century anachronism. It might not be what we personally want as people who work in the field, but it's for the best.
Yeah, I was using your analogy.
> It might not be what we personally want as people who work in the field, but it's for the best.
You're inventing a narrative and borderline making a strawman argument. I said nothing about what people who work in the field "personally want." I'm talking about complexity.
> Similarly we would be better off with programming tools that make the task so simple that professional computer programmers feel like a 20th century anachronism.
My point is that if the "tools that make the task simple" don't actually simplify what's happening in the background, but rather paper over it with additional complexity, then no, we would not "be better off" with that situation. An individual with access to an AI tool might feel that he's better off; anyone without access to those tools (now or in the future) would be screwed, and the underlying complexity may still create other (possibly unforeseen) problems as that ecosystem grows.
Clearly, it would be very unwise to buy a bridge designed by an LLM.
It's part of a more general problem - the engineering expectations for software development are much lower than for other professions. If your AAA game crashes, people get annoyed but no one dies. If your air traffic control system fails, you - and a large number of other people - are going to have a bad day.
The industry has a kind of glib unseriousness about engineering quality - not theoretical quality based on rules of thumb like DRY or faddish practices, but measurable reliability metrics.
The concept of reliability metrics doesn't even figure in the LLM conversation.
That's a very bizarre place to be.
I believe it's important for students to learn how to write data structures at some point. Red black trees, various heaps, etc. Students should write and understand these, even though almost nobody will ever implement one on the job.
Analogously electrical engineers learn how to use conservation laws and Ohm's law to compute various circuit properties. Professionals use simulation software for this most of the time, but learning the inner workings is important for students.
The same pattern is true of LLMs. Students should learn how to write code, but soon the code will write itself and professionals will be prompting models instead. In 5-10 years none of this will matter though because the models will do nearly everything.
And for many professions, true expertise only comes after years on the job, building on the foundation created by the college degree. If students graduate and immediately start using LLMs for everything, I don't know how they will progress from novice graduate to expert, unless they have the self-discipline to keep getting deliberate practice. (And that will be hard when everyone's telling them they're an idiot for not just using the LLM for everything)
I’ve become faster at the novice -> experienced arc with LLMs, even in domains that I have absolutely no prior experience with.
But yeah, the people who just use LLMs for “magic oracle please tell me what do” are absolutely cooked. You can lead a horse to water, but you can’t make it drink.
Well, if you’re a novice, don’t do that. I learn things from LLMs all the time. I get them to solve a problem that I’m pretty sure can be solved using some API that I’m only vaguely aware of, and when they solve it, I read the code so I can understand it. Then, almost always, I pick it apart and refactor it.
Hell, just yesterday I was curious about how signals work under the hood, so I had an LLM give me a simple example, then we picked it apart. These things can be amazing tutors if you’re curious. I’m insatiably curious, so I’m learning a lot.
Junior engineers should not vibe code. They should use LLMs as pair programmers to learn. If they don’t, that’s on them. Is it a dicey situation? Yeah. But there’s no turning back the clock. This is the world we have. They still have a path if they want it and have curiosity.
I agree, and it sounds like you're getting great results, but they're all going to do it. Ask anyone who grades their homework.
Heck, it's even common among expert users. Here's a study that interviewed scientists who use LLMs to assist with tasks in their research: https://doi.org/10.1145/3706598.3713668
Only a few interviewees said they read the code through to verify it does what they intend. The most common strategy was to just run the code and see if it appears to do the right thing, then declare victory. Scientific codebases rarely have unit tests, so this was purely a visual inspection of output, not any kind of verification.
I approach problems with curiosity because I know that this is the only way I’ll find a way to survive and thrive again.
Except it's impossible to follow your curiosity when everything in the world is pushing against it (unless you are already financially independent and only programming for fun). Junior developers compete in one of the most brutal labor markets in the world, and their deliverables are more about getting things done on time than doing things better. What they "should" do goes out the window once you step out of privilege and look at the real choices.
I think it works a bit like pre-training: to find what you want quickly you need to have a model of coding process, i.e. why certain files were put into certain directories, etc.
I don't think this process is incompatible with LLM use...
I find these tools incredibly useful. But I constantly edit their output and frequently ask for changes to other peoples' code during review, some of which is AI generated.
But all of that editing and reviewing is informed by decades of writing code without these tools, and I don't know how I would have gotten the reps in without all that experience.
So I find myself bullish on this for myself and the experienced people I work with, but worried about training the next generation.
But instead of asking for just working code, I would create a small wrapper for a popular AI. I would insist that the student use my wrapper to create the code. They must instruct the AI how to fix any non-working code until it works. Then they have to tell my wrapper to submit the code to my annotator. Then they have to annotate every line of code as to why it is there and what it is doing.
Why my wrapper? So that you can prevent them from asking it to generate the comments, and so that you know that they had to formulate the prompts themselves.
They will still be forced to understand the code.
Then double the number of problems, because with the AI they should be 2x as productive. :)
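A minimal sketch of what such a wrapper could look like (all names here are hypothetical; this is not an existing tool): it logs every prompt/response pair and refuses submission until every line of the final program carries a student-written annotation.

    import json, time

    class AssignmentWrapper:
        def __init__(self, student_id: str, llm_call):
            self.student_id = student_id
            self.llm_call = llm_call  # placeholder for whichever model API the course uses
            self.transcript = []

        def ask(self, prompt: str) -> str:
            # Students must go through this method, so every prompt is theirs and is recorded.
            reply = self.llm_call(prompt)
            self.transcript.append({"t": time.time(), "prompt": prompt, "reply": reply})
            return reply

        def submit(self, code: str, annotations: dict[int, str]) -> None:
            # Refuse submission unless each line of code has a non-empty annotation.
            lines = code.splitlines()
            missing = [i for i in range(1, len(lines) + 1) if not annotations.get(i, "").strip()]
            if missing:
                raise ValueError(f"lines without annotations: {missing}")
            with open(f"{self.student_id}_submission.json", "w") as f:
                json.dump({"code": code, "annotations": annotations,
                           "transcript": self.transcript}, f, indent=2)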
But understanding is just one part of the learning process, isn't it? I assume everybody has had this feeling: the professor explains maths on the blackboard, and the student follows. The student "understands" all the steps: they make sense, and there's no urge to ask a question right then. Then the professor gives a slightly different exercise, asks the student to do the same, and the student is completely lost.
Learning is a loop: you need to accept it, get it in your memory (learn stuff by heart, be it just the vocabulary to express the concepts), understand it, then try to do it yourself. Realise that you missed many things in the process, and start at the beginning: learn new things by heart, understand more, try it again.
And beyond that, do they really need to understand how it works? I never learned how to calculate logarithms by hand, but I know what they are for and I know when to punch the button on the calculator.
I'll never be a top tier mathematician, but that's not my goal. My goal is to calculate things that require logs.
If they can get the AI to make working code and explain why it works, do they need to know more than that, unless they want to be top in their field?
Making working code is the easy part. Making maintainable code is a completely different story.
And again, being able to explain why something works requires only superficial knowledge. This is precisely why bugs pass through code reviews: it's hard to spot a bug by reading code that looks like it should work.
Students emerge from lectures with a bunch of vague, partly contradictory, partly incorrect ideas in their head. They generally aren't aware of this and think the lecture "made sense." Then they start the homework and find they must translate those vague ideas into extremely precise code so the computer can do it -- forcing them to realize they do not understand, and forcing them to make the vague understanding concrete.
If they ask an AI to write the code for them, they don't do that. Annotating has some value, but it does not give them the experience of seeing their vague understanding run headlong into reality.
I'd expect the result to be more like what happens when you show demonstrations to students in physics classes. The demonstration is supposed to illustrate some physics concept, but studies measuring whether that improves student understanding have found no effect: https://doi.org/10.1119/1.1707018
What works is asking students to make a prediction of the demonstration's results first, then show them. Then they realize whether their understanding is right or wrong, and can ask questions to correct it.
Post-hoc rationalizing an LLM's code is like post-hoc rationalizing a physics demo. It does not test the students' internal understanding in the same way as writing the code, or predicting the results of a demo.
> The expert skills... currently come from long experience writing code
Do they? Is it the writing that's important? Or is it the thinking that goes along with it? What's stopping someone from going through LLM output, going back and forth on design decisions with the LLM, and ultimately making the final choice of how the tool should mold the codebase after seeing the options?
I mean, of course this requires some proactive effort on your part... but it always did.
The key point I think though is to not outsource your thinking. You can't blindly trust the output. It's a modern search engine
I learned long ago that I could read a book, study it, think about it. And I still wouldn't really master the material until I built something with it.
LLMs are very much like pair programmers in my experience. For the junior engineer, they are excellent resources for learning, the way a senior engineer might be. Not only can they code what the junior can’t, they can explain questions the junior has about the code and why it’s doing what it’s doing.
For senior devs, it is a competent pair programmer, acting as an excellent resource for bouncing ideas off of, rubber ducking, writing boilerplate, and conducting code reviews.
For expert devs, it is a junior/senior dev you can offload all the trivial tasks to so you can focus on the 10% of the project that is difficult enough to require your expertise. Like a junior dev, you will need to verify what it puts together, but it’s still a huge amount of time saved.
For junior devs specifically, if they are not curious and have no interest in actually learning, they will just stop at the generated code and call it a day. That’s not an issue with the tool, it’s an issue with the dev. For competent individuals with a desire to learn and grow, LLMs represent one of the single best resources to do so. In that sense, I think that junior devs are at a greater advantage than ever before.
Hard disagree here. It makes a difference whether you work on a task because you feel it brings you tangible progress, or because it's an artificial exercise you could really do with one sentence to Claude if it weren't for the constraints of the learning environment. The latter feeling is actively demotivating for learning.
"You're nuts!" says the guy with his head intentionally buried in the sand. Also way to tell me your business model is a joke without telling me your business model is a joke. Enjoy it while it lasts.
Whoever says otherwise should read their own comments from 2 years ago and see how wrong they were about where AI is today.
Not saying singularity will happen for sure, but is it a possibility? Hell yeah.
They are constantly talking about AI doing all the tedious boilerplate bullshit. Don't get me wrong, some of my code is that too, and it's not fun. However, the pro-AI people talk as if 80% of your day is dealing with that. For me it's a rare enough occurrence that the value proposition isn't that big. If that is the killer app of AI, it just doesn't sound that exciting to me.
Maybe I'm doing it wrong, but I seem to have settled on the following general algorithm:
* ask the agent to green-field a new major feature.
* watch the agent spin until it is satisfied with its work.
* run the feature. Find that it does not work, or at least has major deficiencies [1]
* cycle through multiple independent iterations with the agent, doing something resembling "code review", fixing deficiencies one at a time [2]
* eventually get to a point where I have to re-write major pieces of the code to extract the agent from some major ditch it has driven into, leading to a failure to make forward progress.
Repeat.
It's not that the things are useless or "a fad" -- they're clearly very useful. But the people who are claiming that programmers are going to be put out of business by bots are either a) talking their book, or b) extrapolating wildly into the unknown future. And while I am open to the argument that (b) might be true, what I am observing in practice is that the rate of improvement is slowing rapidly, and/or the remaining problems are getting much harder to solve.
[1] I will freely grant that at least some of these major deficiencies typically result from my inability / unwillingness to write a detailed enough spec for the robot to follow, or anticipate every possible problem with the spec I did bother to write. T'was ever thus...
[2] This problem is fractal. However, it's at least fun, in that I get to yell at the robot in a way that I never could with a real junior engineer. One Weird Fact about working with today's agents is that if you threaten them, they seem to do better work.
It’s a good idea to periodically revisit and re-evaluate AI and tooling. I’ve noticed that many programmers tried AI when, for example, GPT-3.5 was first released, became frustrated, and never gave it another chance—even though newer models like o4-mini are now capable of much more, especially in programming tasks.
AI is advancing rapidly. With the latest models and the right tools, what’s possible today far exceeds what was possible even just a short time ago (3-12 months ago even).
Take a look at Cursor or Windsurf or Roo Code or aider to "feed" AI with code, and take a look at models like Google Gemini 2.5 Pro, Claude Sonnet 4, and OpenAI o4-mini. Also educate yourself about agents and MCP. Soon that will be standard for many (or every) programmer.
There is no magic bullet.
AI is a tool that makes us go faster. Even if there is iteration and tidy-up, you can still smash out a feature in a fraction of the time it takes to manually roll it.
Anyone who disagrees with this or thinks AI is not useful is simply not good at what they do to begin with and feels threatened. They will be replaced.
Show me an AI agent adding a meaningful new feature or fixing a complicated bug in an existing codebase that serves the needs of a decent sized business. Or proposing and implementing a rearchitecture that simplifies such a codebase while maintaining existing behavior. Show me it doing a good job of that, without a prompt from an experienced engineer telling it how to write the code.
These types of tasks are what devs spend their days actually doing, as far as coding is concerned (never mind the non coding work, which is usually the harder part of the job). Current AI agents simply can't do these things in real world scenarios without very heavy hand holding from someone who thoroughly understands the work being done, and is basically using AI as an incredibly fast typing secretary + doc lookup tool.
With that level of hand holding, it does probably speed me up by anywhere from 10% to 50% depending on the task - although in hindsight it also slows me down sometimes. Net hours saved is anywhere from 0 to 10 per week depending on the week, erring more on the lower end of that distribution.
I still think the tone is silly and polarizing, particularly when it's replying to a comment where I am very clearly not arguing against use of the tools.
No
They can be good but people spend more time fighting them and throwing up imaginary walls and defending their skillset rather than actually learning how to use these tools to be successful.
And why shouldn't anyone defend their skill set?
Edit: I went through your recent comment history and it turns out you're just not the type of person I enjoy interacting with. You seem to take a good amount of joy in putting down others who disagree with you, or fantasizing about their financial ruin.
I admit my line of work may not be exactly generic crud work, but then again if it's not useful for anything just one step above implementing a user login for a website or something, then is it really gonna take over the world and put me out of a job in 6 months?
Yeah, no thanks, I'll just take that time and code the feature myself, while you're still pulling out your hair over "crafting" the perfect pRoMpT.
I have a Cursor rule file which details my project structure, tools, commands to build, test, etc., and adds some do's and don'ts. This was quite lengthy to do initially on existing projects; Cursor is able to assist with generating the first take, which I manually reviewed and revised.
Since then, I just tag a couple of files, explain what I want, and switch to another worktree to work on another feature while the first is being done.
Claude and Gemini both look at the file structure and match it, naming, etc.
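For anyone wondering what goes into such a rules file, here is a purely illustrative sketch (the file name and every item are assumptions, not the commenter's actual setup):

    # .cursorrules (illustrative only)
    # Project layout: apps/api (Go), apps/web (TypeScript), packages/shared
    # Build with `make build`, test with `make test`, run `make lint` before finishing a task
    # Do: follow existing file naming, keep handlers thin, add table-driven tests for new code
    # Don't: add new dependencies, touch generated code under gen/, reformat unrelated files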
I don't get why people push this LLM FOMO. The tools are evolving so fast anyway.
If it keeps getting better, I'll just start using it more. It's not hard to use, so the FOMO "you have to be using this RIGHT NOW" stuff is just ridiculous.
A simulation I worked on for 2 months was, in total, 400 lines of code. Typing it out was never the bottleneck. I need to understand the code so that when I am studying it for the next 1.5 months I can figure out whether the problem is a bug in my code or the underlying model being wrong.
Lack of correctness, lack of understanding and ability to reason about behaviour, and poor design that builds up from commercial pressure to move quickly are the problems we need to be solving. We’re accelerating the rate at which we add levels to a building with utterly rotten foundations.
God damn it, I’m growing to loathe this industry.
So good developers will become amazing with the assistance of AI, while the rest will become unemployed and find work elsewhere. So we are healing the industry, because without AI the industry is a hell of a lot worse. You only have to look at the replies on HN to see how laughable the industry is.
“I wrote 400 lines of code I don’t understand and need months to understand it because ai obviously cant understand it or break it down and help me document it”
“Speed is what caused problems! Because I don’t know how to structure code and get ai to structure it the same it’s obviously going rogue and doing random things I cannot control so it’s wrong and causing a mess!!!”
“I haven’t been able to use it properly so don’t know how to rein it in to do specific tasks so it produces alot of stuff that takes me ages to read! I could have written it faster!!!”
I would love to see what these people are doing 1-2 years from now. If they eventually click or if they are unemployed complaining ai took their jobs.
I utilize AI as a part of my workflows, but I'm pretty sure I'll be replaced anyway in 5-10 years. I think software development is a career dead-end now, except if you're doing things much closer to hardware than average dev.
I don't really doubt that AI can put together your Nth Rails backend that does nothing of note pretty solidly, but I know it can't even write a basic, functioning tokenizer + parser in a very simple, imperative language (Odin) for a Clojure-like language. It couldn't even (when given the source for a tokenizer) write the parser that uses the tokenizer either.
These are very basic things that I would expect juniors with some basic guidance to accomplish, but even when using Cursor + Claude Sonnet 3.5 (this was 2-3 months ago, I had seen recommendations about exactly the combinations of tools I was attempting to use, so I don't really buy the argument that somehow it was the choice of tools that was wrong) it fell apart and even started adding functions it already added before. At some point I seeded it with properly written parser functions to give it examples of what it needs to accomplish, but it kept basically failing completely when having access to literally all the code it needed.
I can't even imagine how badly it'd fail to handle the actual complicated parts of my work where you have to think across 3 different context boundaries (simulation -> platform/graphics API -> shader) in order to do things.
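Just to make the task concrete, here is a rough sketch in Python (not the commenter's Odin code, purely illustrative) of what a minimal tokenizer for an S-expression/Clojure-like language involves:

    import re

    TOKEN_RE = re.compile(r"""
          (?P<ws>      [\s,]+            )
        | (?P<lparen>  \(                )
        | (?P<rparen>  \)                )
        | (?P<string>  "(?:\\.|[^"\\])*" )
        | (?P<number>  -?\d+(?:\.\d+)?   )
        | (?P<symbol>  [^\s,()"]+        )
    """, re.VERBOSE)

    def tokenize(src: str) -> list[tuple[str, str]]:
        tokens, pos = [], 0
        while pos < len(src):
            m = TOKEN_RE.match(src, pos)
            if m is None:
                raise SyntaxError(f"unexpected character at {pos}: {src[pos]!r}")
            pos = m.end()
            if m.lastgroup != "ws":  # commas count as whitespace, Clojure-style
                tokens.append((m.lastgroup, m.group()))
        return tokens

    print(tokenize('(+ 1 (* 2 3.5) "hi")'))
    # [('lparen', '('), ('symbol', '+'), ('number', '1'), ('lparen', '('), ('symbol', '*'),
    #  ('number', '2'), ('number', '3.5'), ('rparen', ')'), ('string', '"hi"'), ('rparen', ')')]

This is exactly the kind of bounded, self-contained task you would expect an LLM to handle, which is what makes the failure mode described above notable.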
Ha. Funny you should say that....recently I've been using AI to green-field a new Rails project, and my experience with it has been incredibly mixed, to say the least.
The best agents can, more or less, crank out working code after a few iterations, but it's brittle, and riddled with bad decisions. This week I had to go through multiple prompt iterations trying to keep Claude 3.7 from putting tons of redundant logic in a completely unnecessary handler block for ActiveRecord::RecordNotFound exceptions -- literally 80% of the action logic was in the exception handler, for an exception that isn't really exceptional. It was like working with someone who just learned about exceptions, and was hell-bent on using them for everything. If I wasn't paying attention the code may have worked, I suppose, but it would have fallen apart quickly into an incomprehensible mess.
The places where the AI really shines are in boilerplate situations -- it's great for writing an initial test suite, or for just cranking out a half-working feature. It's also useful for rubber ducking, and more than occasionally breaks me out of debugging dead ends, or system misconfiguration issues. That's valuable.
In my more cynical moments, I start to wonder if the people who are most eager to push these things are 0-3 years out of coding bootcamps, and completely overwhelmed by the boilerplate of 10+ years of bad front-end coding practices. For these folks, I can easily see how a coding robot might be a lifeline, and it's probably closer to the sweet spot for the current AI SOTA, where literally everything you could ever want to do has been done and documented somewhere on the web.
You're right, I'm very likely overestimating the output even though I'm on the skeptical end of it.
> The places where the AI really shines are in boilerplate situations -- it's great for writing an initial test suite
I definitely do agree with this; and I would add that at that point you can really make do with tab-completion and not a full agent workflow. I used this successfully even back in 2021-2022 with Copilot.
> In my more cynical moments, I start to wonder if the people who are most eager to push these things are 0-3 years out of coding bootcamps
I think it's all in all a mix of a lot of factors: I think spending your time mostly on well-trodden ground will definitely give you a sense of GenAI being more useful than if you aren't, and I think most newer programmers spend most of their time on exactly that. They may also be given tasks even at work that are more boilerplatey in nature; they're definitely not making deep design decisions as a rule, or having anything to do with holistic architectural decisions.
Just to make it concrete, today I spent a few hours going through a bunch of HTML + embedded styles and removing gobs and gobs of random styles the LLMs glommed on that "worked", but was brittle and failed completely as soon as I wanted to do something slightly different than the original spec. The cycle I described above led to a lot of completely unnecessary markup, paired with unnecessary styles to compensate for the crappiness of the original DOM. I was able to refactor to a much saner overall structure, but it took some time and thinking. Was I net ahead? I don't really know.
Given that LLMs almost always write this kind of "assembled from StackOverflow" code, I have precisely 0% confidence that I'd end up in a better place if I just reset the working branch and started from scratch.
It kind of reminds me of human biology -- given billions of years of random evolution you can end up with incredible sophistication, but the end result will be incomprehensible and nearly impossible to alter.
As it stands AI can't even get out of Lt Surge's gym in Pokemon Red. When an AI manages to beat Lance I'll start to think about using it for writing my code :-)
My experience has been decent. I don't know that it has truly saved me much time but I can understand how it FEELS like it has. Because it's writing so much code (sometimes), it's hard to vet all of it and it can introduce subtle bugs based on faulty assumptions it made about different things. So, it will dump a lot of code at once, which will get me 90% of the way there, but I could spend an hour or two trying to nudge it to fix it to get it to 100%. And then I will probably still need to go back and reorganize it, or have it go back and reorganize it. And then sometimes it will make small adjustments to existing, committed code that will subtly break other things.
Something that has surprised me (in hindsight, it isn't surprising) is that sometimes when I feel like it misunderstood something or made a faulty assumption, it was actually me that had the misunderstanding or ignorance which is humbling at times and a good learning experience. It is also pretty good at bug hunting and DEFINITELY very good at writing unit tests.
I count myself as pretty lucky that this domain seems to be very well covered in training. Given the law of averages, most people's domains will probably be covered. I'm not sure how it would fare with a niche domain.
This is roughly my experience as well. The AI is great at the first 90% of the work and actively counterproductive for the remaining 90%
On the other hand, if you slip up and give it too much to chew on or just roll bad RNG, it will spin itself into a loop attempting many variations of crap, erasing and trying over, but never actually coming closer to a correct solution, eventually repeating obviously incorrect solutions over and over again that should have been precluded based on feedback from the previous failed solutions. If you're using a SOTA model, you can easily rack up $5 or more on a single task if you give it more than 30 minutes of leeway to work it out. Sure, you could use a cheaper model, but all that does is make the fundamental problem worse - i.e. you're spending money but not actually getting any closer to completed work.
Yes, the models are getting smarter and more efficient, but we're still at least a decade away from being able to run useful models at practical speeds locally. Aggressively quantized 70b models simply can't cut it, and even then, you need something like 10k tps to start building LLM tools that can overcome the LLM's lack of reasoning skills through brute force guess and check techniques.
Perhaps some of the AI skeptics are a bit too harsh, but they're certainly not crazy in the context of breathless hype.
Those $20/month plans usually come with throttling, or other types of service degradation, once you max out their allotment.
Seriously? Is this argument in all earnestness "No profession has been more contemptuous, therefore we should keep on keeping on"? Should we as an industry not bother to try and improve our ethics? Why don't we all just make munitions for a living and wash our hands of guilt because "the industry was always like this"?
Seems a bit ironic against the backdrop of <https://news.ycombinator.com/user?id=tptacek>:
> All comments Copyright © 2010, 2011, 2012, 2013, 2015, 2018, 2023, 2031 Thomas H. Ptacek, All Rights Reserved.
(although perhaps this is tongue-in-cheek given the last year)
I am actually doing this all day long. For example, today I set up a fresh new Debian VPS for some interns. I had to provide them with a Docker system, support for Go, nginx stuff, and I made a quick hello-world app in Angular with a Go backend. I could have done it myself. But I asked ChatGPT to provide me with all the commands and code. No idea how an agent could do this for me. I got everything running in about 30 minutes.
This entire section reads like, oddly, the reverse of the "special pleading" argument that I usually see from artists. Instead of "Oh, it's fine for other fields, but for my field it's a horrible plagiarism machine", it's the reverse: "Oh, it's a problem for those other fields, but for my field get over it, you shouldn't care about copyright anyway".
I'm all for eliminating copyright. The day I can ignore the license on every single piece of proprietary software as I see fit, I'll be all for saying that AIs should be able to do the same. What I will continue to complain about is the asymmetry: individual developers don't get to violate individual licenses, but oh, if we have an AI slurp up millions of codebases and ignore their licenses, that's fine.
No. No, it isn't. If you want to ignore copyright, abolish it for everyone. If it still applies to everyone else, it should still apply to AIs. No special exceptions for mass-scale Open Source license violations.
It's interesting as typeface plagiarism became rampant beginning in the 70s when more accurate photo reproductions made it trivial. This was problematic for designers wanting to make a livelihood, which is something ITC sought to mitigate by better up-front payments (IIRC from U&lc's coverage) to incentivize quality typeface creation.
There's a distinction though between literal plagiarism and just inspiration from elements. US copyright law protects neither when it comes to typeface forms, but ironically it does allow copyright for the code used in font files.
I've seen OpenAI's o3-mini (their reasoning model) suggest verbatim code and comments that I found on GitHub predating LLMs by years. It seems the more times the same code and comment appears online, the more likely this is to occur. I'd imagine there would be studies looking into the scope and frequency with which this occurs and how much is considered fair use.
Personally I find AI coding tools situationally useful. I certainly wouldn't use them to write all my code, but I also think I'd be a fool not to leverage them at all.
For those that think only procedurally, I can see how it helps them, because procedural-first development has a lot of boilerplate logic.
For those who think model first, the AI may help them rubber duck, but ultimately the physical writing of the characters is minimal.
Most of my time is thinking about the data model. The AI writes almost all of my procedures against said data model. But that is about 20% speedup.
I think in code.
To me, having to translate that into natural language for the LLM to translate it back into code makes very little sense.
Am I alone in this camp? What am I missing?
But if you don't have the shape of a solution? Might be faster to have an AI find it. And then either accept AI's solution as is, or work off it.
I quite often prompt with code in a different language, or pseudo-code describing roughly what I am trying to achieve, or a Python function signature without the function body.
Or I will paste in a bunch of code I have already written with a comment somewhere that says "TODO: retrieve the information from the GitHub API" and have the model finish it for me.
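To make that concrete, the "prompt" is often just a stub like the one below: I write the signature and the docstring, the model writes the body. (The function name, endpoint, and return shape here are invented for illustration, not taken from any real project.)

    # What I hand to the model: a signature and a docstring, no body.
    # fetch_repo_stats, the endpoint, and the return shape are made-up
    # examples, not anything from a real codebase.
    def fetch_repo_stats(owner: str, repo: str) -> dict:
        """Return basic stats for a GitHub repository.

        TODO: retrieve the information from the GitHub API
        (GET https://api.github.com/repos/{owner}/{repo}) and return
        a dict like {"stars": ..., "open_issues": ...}.
        """
        raise NotImplementedError  # left for the model to fill in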
This, and for multiple functions that end up composing well together as per their signatures. Maybe there's one public function I want to document well, so I write the docstring myself, and it's the result of composing 3-4 other internal functions which I'd let the LLM implement.
The nice part is that even if the LLM fails, all that is not lost, as opposed to some weird spec I'd feed an LLM but that's too verbose for a human reader, or a series of prompts.
Natural language is just a terrible interface and fundamentally not an appropriate one to communicate with a computer.
I wonder if I'm in the minority here because I'm neurodivergent.
Here’s the thing from the skeptic perspective: This statement keeps getting made on a rolling basis. 6 months ago if I wasn’t using the life-changing, newest LLM at the time, I was also doing it wrong and being a luddite.
It creates a never ending treadmill of boy-who-cried-LLM. Why should I believe anything outlined in the article is transformative now when all the same vague claims about productivity increases were being made about the LLMs from 6 months ago which we now all agree are bad?
I don’t really know what would actually unseat this epistemic prior at this point for me.
In six months, I predict the author will again think the LLM products of 6 month ago (now) were actually not very useful and didn’t live up to the hype.
LLMs get better over time. In doing so they occasionally hit points where things that didn't work start working. "Agentic" coding tools that run commands in a loop hit that point within the past six months.
If your mental model is "people say they got better every six months, therefore I'll never take them seriously because they'll say it again in six months time" you're hurting your own ability to evaluate this (and every other) technology.
Today it works, it didn't in the past, but it does now. Rinse and repeat.
Really think about it and ask yourself if it's possible that AI can make any, ANY work a little more efficient?
1. LLM fanboy: "LLMs are awesome, they can do x, y, and z really well."
2. LLM skeptic: "OK, but I tried them and found them wanting for doing x, y, and z"
3. LLM fanboy: "You're doing it wrong. Do it this way ..."
4. The LLM skeptic goes to try it that way, still finds it unsatisfactory. A few months pass....
5. LLM fanboy: "Hey, have you tried model a.b.c-new? The problems with doing x, y, and z have now been fixed" (implicitly now agrees that the original complaints were valid)
6. LLM skeptic: "What the heck, I thought you denied there were problems with LLMs doing x, y, and z? And I still have problems getting them to do it well"
7. Goto 3
You write like this is some grand debate you are engaging in and trying to win. But to people on what you see as the other side, there is no debate. The debate is over.
You drag your feet at your own peril.
I’ve definitely seen humans do stuff in an hour that takes others days to do. In fact, I see it all the time. And sometimes, I know people who have skills to do stuff very quickly but they choose not to because they’d rather procrastinate and not get pressured to pick up even more work.
And some people waste even more time writing stuff from scratch when libraries exist for whatever they’re trying to do, which could get them up and running quickly.
So really I don’t think these bold claims of LLMs being so much faster than humans hit as hard as some people think they do.
And here’s the thing: unless you’re using the time you save to fill yourself up with even more work, you’re not really making productivity gains, you’re just using an LLM to acquire more free time on the company dime.
You might as well do that since any productivity gains will go to your employer, not you.
In reality, there is a limit to how quickly tasks can be done. Around here, PRs usually contain changes that most people could just type out in under 30 minutes if they knew exactly what to type. However, getting to the point where you know exactly what you need to type takes days or even weeks, often collaborating across many teams and thinking deeply about potential long-term impacts down the road, and balancing company ROI and roadmap objectives, perhaps even running experiments.
You cannot just throw LLMs at those problems and have them wrapped up in an hour. If that’s what you’re doing, you’re not working on big problems, you’re doing basic refactors and small features that don’t require high level skills, where the bottleneck is mostly how fast you can type.
That's an argument for LLMs.
>you’re just using an LLM to acquire more free time on the company dime.
This is a bad thing?
I've been using Cline and it can do a few of the things suggested as "agentic", but I'd have no idea how to leave it writing and then running tests in a VM and creating a PR for me to review. Or let it roam around in the file tree and create new files as needed. How does that work? Are there better tools for this? Or do I need to configure Cline in some way?
I actually do most of my "agentic coding" (not a fan of the term, but whatever) in ChatGPT Code Interpreter, which hasn't changed much in two years other than massive upgrades to the model it uses - I run that mainly via o4-mini-high or o3 these days.
OpenAI's Codex is a leading new thing, but only if you pay $200/month for it. Google's equivalent https://jules.google/ is currently free.
GitHub Copilot gained an "agent mode" recently: https://github.blog/ai-and-ml/github-copilot/agent-mode-101-...
There's also Copilot Coding Agent, which is confusingly an entirely different product: https://github.blog/changelog/2025-05-19-github-copilot-codi...
Do you use ChatGPT Code Interpreter because it's better, or is it just something you're more familiar with and you're sticking with it for convenience?
Of course, I don't know how one would structure a suitable test, since doing it sequentially would likely bias the later agents with clearer descriptions & feedback on the tasks. I imagine familiarity with how to prompt each particular model is also a factor.
New models come out all the time. One of the most interesting signals to look out for is when they tip over the quality boundary from "not useful at task X" to "useful at task X". It happened for coding about a year ago. It happened for search-based research assistants just two months ago, in my opinion - I wrote about that here: https://simonwillison.net/2025/Apr/21/ai-assisted-search/
Yes, but other smart people were making this argument six months ago. Why should we trust the smart person we don't know now if we (looking back) shouldn't have trusted the smart person before?
Part of evaluating a claim is evaluating the source of the claim. For basically everybody, the source of these claims is always "the AI crowd", because those outside the AI space have no way of telling who is trustworthy and who isn't.
In general, part of being an effective member of human society is getting good at evaluating who you should listen to and who is just hot air. I collect people who I consider to be credible and who have provided me with useful information in the past. If they start spouting junk I quietly drop them from my "pay attention to these people" list.
Otherwise, yes, you'll continue to be irritated by AI hype, maybe up until the point where our civilization starts going off the rails.
For coding it seems to back itself into a corner and never recover from it until I "reset" it.
AI can't write software without an expert guiding it. I cannot open a non trivial PR to postgres tonight using AI.
100% true, but is that really what it would take for this to be useful today?
- they can't be aware of the latest changes in the frameworks I use, and so force me to use older features, sometimes less efficient
- they fail at doing clean DRY practices even though they are supposed to skim through the codebase much faster than me
- they bait me into nonexistent APIs, or hallucinate solutions or issues
- they cannot properly pick the context and the files to read in a mid-size app
- they suggest to download some random packages, sometimes low quality ones, or unmaintained ones
Of course they can: teach them / feed them the latest changes or whatever you need (much like another developer unaware of the same thing).
> they fail at doing clean DRY practices even though they are supposed to skim through the codebase much faster than me
Tell them it is not DRY until they make it DRY. For some (several projects I've been involved with), DRY is generally an anti-pattern when taken to extremes (abstraction gone awry, etc…). Instruct it on what you expect and watch it deliver (much like you would another developer…).
> they bait me into nonexistent APIs, or hallucinate solutions or issues
Tell it when it hallucinates; it'll correct itself.
> they cannot properly pick the context and the files to read in a mid-size app
Provide it with context (you should always do this anyway).
> they suggest to download some random packages, sometimes low quality ones, or unmaintained ones
Tell it about it; it will correct itself.
LLMs are stupid - nothing magic, nothing great. They’re just tools. The problem with the recent LLM craze is that people make too many obviously partially true statements.
Claude 4 has a training cut-off of March 2025, I tried something today about its own API and it gave me useful code.
no it doesn't. Are you serious?
Perhaps people building CRUD web apps have a different experience than people building something niche?
This is the kind of reasoning that dominates LLM zealotry. No evidence given for extraordinary claims. Just a barrage of dismissals of legitimate problems. Including the article in discussion.
All of this makes me have a hard time taking any of it seriously.
That's mostly solved by the most recent ones that can run searches. I've had great results from o4-mini for this, since it can search for the latest updates - example here: https://simonwillison.net/2025/Apr/21/ai-assisted-search/#la...
Or for a lot of libraries you can dump the ENTIRE latest version into the prompt - I do this a lot with the Google Gemini 2.5 models since those can handle up to 1m tokens of input.
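The mechanics of that can be as simple as concatenating the library's docs (or source) into one big string ahead of your question. Here's a rough Python sketch; the docs path and the final "send" step are placeholders for whatever checkout and client you actually use:

    from pathlib import Path

    # Placeholder path to a local checkout of the library's docs.
    docs_dir = Path("path/to/library/docs")
    context = "\n\n".join(
        p.read_text(encoding="utf-8") for p in sorted(docs_dir.rglob("*.md"))
    )

    prompt = (
        "Here is the full documentation for the library:\n\n"
        + context
        + "\n\nUsing only the APIs shown above, ..."  # your actual question
    )

    # Crude size check (~4 characters per token) before sending the prompt
    # with whatever model client or chat UI you use.
    print("roughly %d tokens" % (len(prompt) // 4))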
"they fail at doing clean DRY practices" - tell them to DRY in your prompt.
"they bait me into inexisting apis, or hallucinate solutions or issues" - really not an issue if you're actually testing your code! I wrote about that one here: https://simonwillison.net/2025/Mar/2/hallucinations-in-code/ - and if you're using one of the systems that runs your code for you (as promoted in tptacek's post) it will spot and fix these without you even needing to intervene.
"they cannot properly pick the context and the files to read in a mid-size app" - try Claude Code. It has a whole mechanism dedicated to doing just that, I reverse-engineered it this morning: https://simonwillison.net/2025/Jun/2/claude-trace/
"they suggest to download some random packages, sometimes low quality ones, or unmaintained ones" - yes, they absolutely do that. You need to maintain editorial control over what dependencies you add.
I personally prefer the Claude models but they don't offer quite as rich a set of extra features.
If you want to save money, consider getting API accounts with them and spending money that way. My combined API bill across OpenAI, Anthropic and Gemini rarely comes to more than about $10/month.
See, as someone who is actually receptive to the argument you are making, sometimes you tip your hand and say things that I know are not true. I work with Gemini 2.5 a lot, and while yeah, it theoretically has a large context window, it falls over pretty fast once you get past 2-3 pages of real-world context.
> "they fail at doing clean DRY practices" - tell them to DRY in your prompt.
Likewise here. Simply telling a model to be concise has some effect, to be sure, but it's not a panacea. I tell the latest models to do all sorts of obvious things, only to have them turn around and ignore me completely.
In short, you're exaggerating. I'm not sure why.
This wasn't true of the earlier Gemini large context models.
And for DRY: sure, maybe it's not quite as easy as "do DRY". My longer answer is that these things are always a conversation: if it outputs code that you don't like, reply and tell it how to fix it.
> For the last month or so, Gemini 2.5 has been my go-to (because it can hold 50-70kloc in its context window). Almost nothing it spits out for me merges without edits.
I realize this isn't the same thing you're claiming, but it's been consistently true for me that the model hallucinates stuff in my own code, which shouldn't be possible, given the context window and the size of the code I'm giving to it.
(I'm also using it for other, harder problems, unrelated to code, and I can tell you factually that the practical context window is much smaller than 2M tokens. Also, of course, a "token" is not a word -- it's more like 1/3 of a word.)
Yes, this happens to me almost every time I use it. I feel like a crazy person reading all the AI hype.
This is where collaboration comes in play. If you solely rely on the LLM to “vibe code” everything, then you’re right, you get whatever it thinks is best at the time of generation. That could be wrong or outdated.
My workflow is to first provide clear requirements, generally one objective at a time. Sometimes I use an LLM to format the requirements for the LLM to generate code from. It then writes some code, and I review it. If I notice something is outdated I give it a link to the docs and tell it to update it using X. A few seconds later it's made the change. I did this just yesterday when building out an integration with an API. Claude wrote the code using a batch endpoint because the streaming endpoint was just released and I don't think it was aware of it. My role in this collaboration is to be aware of what's possible and how I want it to work (e.g., being aware of the latest features and updates of the frameworks and libraries). Then it's just about prompting and directing the LLM until it works the way I want. When it's really not working, then I jump in.
But for each nine of reliability you want out of LLMs, everyone's assuming the cost grows linearly. I don't think it does. I think it's polynomial at least.
As for your tasks, and maybe it's just because I'm using ChatGPT, but I asked it to port sed, something with full open source code availability, tons of examples/test cases, and a fully documented user interface, and I wanted it moved to Java as a library.
And it failed pretty spectacularly. Yeah it got the very very very basic functionality of sed.
Granted, I was trying to do this 6 months ago, but maybe a miracle has happened. But in the past I had a very bad experience with using LLMs for niche things (i.e. things that were never mentioned on Stack Overflow).
I have no way of evaluating these myself so they might just be garbage slop.
The second one didn't work for me without some code modification (specifically, the "count code blocks" didn't work), but the results were... not impressive.
It starts by ignoring every function that begins with "FUN_" on the basis that it's "# Skip compiler-generated functions (optional)". Sorry, but those functions aren't compiler-generated functions, they're functions that lack any symbol names, which in ghidra terms, is pretty damn common if you're reverse engineering unsymbolized code. If anything, it's the opposite of what you would want, because the named functions are the ones I've already looked at and thus give less of a guideline for interesting ones to look into next.
Looking at the results at a project I had open, it's supposed to be skipping external functions, but virtually all the top xrefs are external functions.
Finally, as a "moderately complex" script... it's not a good example. The only thing that approaches that complexity is trying to count basic blocks in a function--something that actually engages with the code model of Ghidra--but that part is broken, and I don't know Ghidra well enough to fix it. Something that would be more along the lines of "moderately complex" to me would be (to use a use case I actually have right now) for example turning the constant into a reference to that offset in the assumed data segment. Or finding all the switch statements that ghidra failed to decompile!
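For what it's worth, the direction I actually wanted is only a few lines. Here's a rough, untested sketch of a Ghidra Python script that ranks functions by incoming xrefs while keeping the unnamed FUN_ ones and skipping only true externals (it assumes the stock FunctionManager/ReferenceManager APIs and the implicit currentProgram a Ghidra script gets):

    # Rough, untested sketch for a Ghidra Python script.
    # Rank functions by incoming xref count; skip only true externals,
    # NOT functions named "FUN_..." (those are merely unnamed, not
    # compiler-generated).
    fm = currentProgram.getFunctionManager()
    refmgr = currentProgram.getReferenceManager()

    rows = []
    for func in fm.getFunctions(True):  # True = iterate in address order
        if func.isExternal():
            continue
        count = refmgr.getReferenceCountTo(func.getEntryPoint())
        rows.append((count, func.getName(), func.getEntryPoint()))

    for count, name, addr in sorted(rows, key=lambda r: r[0], reverse=True)[:25]:
        print("%5d  %s  %s" % (count, name, addr))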
All of the state-of-the-art models are online models - you have no choice, you have to pay for a black box subscription service controlled by one of a handful of third-party gatekeepers. What used to be a cost center that was inside your company is now a cost center outside your company, and thus it is a risk to become dependent on it. Perhaps the risk is worthwhile, perhaps not, but the hype is saying that real soon now it will be impossible to not become dependent on these closed systems and still exist as a viable company.
Tell me about this specific person who isn't famous
Create a facebook clone
Recreate Windows including drivers
Create a way to transport matter like in Star Trek.
I'll see you in 6 months.
I've tried everything. I have four AI agents. They still have an accuracy rate of about 50%.
Dude, just try the things out. It's just undeniable in my day-to-day life that I've been able to rely on Sonnet (first 3.7 and now 4.0) and Gemini 2.5 to absolutely crush code. I've done 3 side projects in the past 6 months that I would have been way too lazy to build without these tools. They work. Never going back.
I tried Copilot a few months ago just to give it a shot and so I could discuss it with at least a shred of experience with the tool, and yea, it's a neat feature. I wouldn't call it a gimmick--it deserves a little more than that, but I didn't exactly cream my pants over it like a lot of people seem to be doing. It's kind of convenient, like a smart autocomplete. Will it fundamentally change how I write software? No way. But it's cool.
Consider that what you're reacting to is a symptom of genuine, rapid progress.
The crying-wolf reference only makes sense as a soft claim that LLMs, better or not, are not getting better in important ways.
Not a view I hold.
> An exponential curve looks locally the same at all points in time
This is true for any curve...If your curve is continuous, it is locally linear.
There's no use in talking about the curve being locally similar without the context of your window. Without the window you can't differentiate an exponential from a sigmoid from a linear function.
Let's be careful with naive approximations. We don't know which direction things are going and we definitely shouldn't assume "best case scenario"
Which ends up making some beautiful irony. One small seemingly trivial point fucked everything up. Even a single word can drastically change everything. The importance of subtlety being my entire point ¯\_(ツ)_/¯
Btw, my point was all about how nuances make things hard. So ironically, thanks for making my point clearer.
This is true for any curve...
If your curve is continuous, it is locally linear.
Hmm...Sometimes naive approximations are all you've got; and in fact, aren't naive at all. They're just basic. Don't overthink it.
Using Claude Sonnet 4, I attempted to add some better configuration to my golang project. An hour later, I was unable to get it to produce a usable configuration, apparently due to a recent v1-to-v2 config format migration. It took less time to hand-edit one based on reading the docs.
I keep getting told that this time agents are ready. Every time I decide to use them they fall flat on their face. Guess I'll try again in six months.
I made the mistake of procrastinating on one part of a project thinking "Oh, that is easily LLMable". By God, was I proven wrong. Was quite the rush before the deadline.
On the flip side, I'm happy I don't have to write the code for a matplotlib scatterplot for the 10000th time, it mostly gets the variables in the current scope that I intended to plot. But I've really not had that much success on larger tasks.
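(For the curious, the boilerplate in question is basically the same half-dozen lines every time; df and the column names below are just stand-ins for whatever happens to be in scope.)

    import matplotlib.pyplot as plt
    import pandas as pd

    # Stand-in data; in practice it's whatever DataFrame is in scope.
    df = pd.DataFrame({"x": range(100), "y": [i * i for i in range(100)]})

    fig, ax = plt.subplots(figsize=(6, 4))
    ax.scatter(df["x"], df["y"], s=10, alpha=0.6)
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    plt.tight_layout()
    plt.show()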
The "information retrieval" part of the tech is beautiful though. Hallucinations are avoided only if you provide an information bank in the context in my experience. If it needs to use the search tool itself, it's not as good.
Personally, I haven't seen any improvement from the "RLd on math problems" models onward (I don't care for benchmarks). However, I agree that deepseek-r1-zero was a cool result: pure RL (plain R1 used a few examples) automatically leading to longer responses.
A lot of the improvements suggested in this thread are related to the infra around LLMs, such as tool use. These are much better organised these days with MCP and whatnot, enabling you to provide it the aforementioned information bank easily. But all of it is built on top of the same fragile next-token generator we know and love.
You can give it the docs as an "artifact" in a project - this feature has been available for almost one year now.
Or better yet, use the desktop version + a filesystem MCP server pointing to a folder containing your docs. Tell it to look at the docs and refactor as necessary. It is extremely effective at this. It might also work if you just give it a link to the docs.
I threw in the towel and had a working config in ten minutes.
People making arguments based on sweeping generalizations to a wide audience are often going to be perceived as delusional, as their statements do not apply universally to everyone.
To me, thinking LLMs can code generally because you have success with them and then telling others they are wrong in how they use them is making a gigantic assumptive leap.
If you ask different people the above question, and if you vary it based on type of task, or which human, you would get different answers. But as time goes on, more and more people would become impressed with what the human can do.
I don't know when LLMs will stop progressing, but all I know is they continue to progress at what is to me a similar astounding rate as to a growing child. For me personally, I never used LLMs for anything, and since o3 and Gemini 2.5 Pro, I use them all the time for all sorts of stuff.
You may be smarter than me and still not impressed, but I'd try the latest models and play around, and if you aren't impressed yet, I'd bet money you will be within 3 years max (likely much earlier).
In this context, never. Especially because the parent knows you will always ask 2+2 and can just teach the child to say “four” as their first and only word. You’ll be on to them, too.
On the assumption that you'll always only ask it "what's 2+2?" Keywords being "always" & "you".
In aggregate, the set of questions will continuously expand, as a non-zero percentage of people will ask new questions, and the LLMs will continue to be trained to fill in the last 20%.
Even under the best interpretations, this is the detractors continuously moving goalposts, because the last 20% will never be filled: New tasks will continuously be found, and critics will point to them as "oh, see, they can't do that". By the time that the LLMs can do those tasks, the goalpost will be moved to a new point and they'll continue to be hypocrites.
------
> > At what point would you be impressed by a human being if you asked it to help you with a task every 6 months from birth until it was 30 years old?
Taking GP's question seriously:
When a task consisting of more than 20 non-decomposable (atomic) sub-tasks is completed above 1 standard deviation of the human average in that given task. (much more likely)
OR
When an advancement is made in a field by that person. (statistically much rarer)
The article doesn't explicitly spell it out until several paragraphs later, but I think what your quoted sentence is alluding to is that Cursor, Cline et al can be pretty revolutionary in terms of removing toil from the development process.
Need to perform a gnarly refactor that's easy to describe but difficult to implement because it's spread far and wide across the codebase? Let the LLM handle it and then check its work. Stuck in dependency hell because you updated one package due to a CVE? The LLM can (often) sort that out for you. Heck, did the IDE's refactor tool fail at renaming a function again? LLM.
I remain skeptical of LLM-based development insofar as I think the enshittification will inevitably come when the Magic Money Machine breaks down. And I don't think I would hire a programmer that needs LLM assistance in order to program. But it's hard to deny that it has made me a lot more productive. At the current price it's a no-brainer to use it.
(I should know since I've created half-a-dozen tools for this with gptel. Cline hasn't been any better on my codebase.)
Cline is closer in spirit to GPTel, but since Cline is an actual business, it does seem to do well off the bat. That said, I haven't found it to be "hugely better" compared to whatever you can hack in GPTel.
Quite frankly being able to hack the tools on the go in Elisp, makes GPTel far far better (for some of us anyway).
(Thanks for creating GPTel BTW!)
The top of SWE-bench Verified leaderboard was at around 20% in mid-2024, i.e. AI was failing at most tasks.
Now it's at 70%.
Clearly it's objectively better at tackling typical development tasks.
And it's not like it went from 2% to 7%.
The pressure for AI companies to release a new SOTA model is real, as the technology rapidly becomes commoditised. I think people have good reason to be skeptical of these benchmark results.
But there are plenty of people who actually tried LLMs for actual work and swear they work now. Do you think they are all lying?..
Many people with good reputation, not just noobs.
Yes, there are people who think LLMs are just a fad, just like NFTs, and I agree these people are not really serious and that they are wrong. I think anyone who has used an AI coding agent recently knows that they are highly capable and can enhance productivity in the right hands.
But, as someone who gets a lot of value in AI coding agents, my issue is not with gen AI as a productivity enhancing tool - it's with the absolute torrent of BS about how AI is soon going to make coders obsolete, and the way AI has been shoved onto many engineering teams is like yet another incarnation of the latest management fad. My specific arguments:
1. As the author pretty much acknowledges, AI agents still basically suck at large, system-wide "thinking" and changes. And the way they work with their general "guess and check" method means they can churn out code that is kinda sorta right, but often leaves huge holes or outright laughable bugs.
2. Hallucinations are the worst possible failure modes - they look correct, which makes it all the more difficult to determine they're actually bullshit. I shudder to think about who will need to maintain the mountains of "vibe code" that is now being generated. Certainly not fucking me; I had a good career but I think now is definitely the right time to peace out.
3. Even if I could totally agree that there is a strong business case for AI, I can still, as an individual, think it makes my job generally shittier, and there is nothing wrong with having that opinion.
I don't think I'd be so anti-AI if I saw a rational, cautious debate about how it can enhance productivity. But all I see are folks with a vested interest overselling its capabilities and minimizing its downsides, and it just feels really tiresome.
Right now (for me) it's very frequent, depending on the type of project, but in the future it could be less frequent - but at some point you've gotta test what you're rolling out. I guess you can use another AI to do that, but I don't know...
Anyway, my current workflow is:
1. write detailed specs/prompt,
2. let agent loose,
3. pull down and test... usually something goes wrong.
3.1 converse with and ask agent to fix,
3.2 let agent loose again,
3.3 test again... if something goes wrong again:
3.3.1 ...
Sometimes the Agent gets lost in the fixes, but now you have a better idea of what can go wrong and you can start over with a better initial prompt.
I haven't had a lot of success with pre-discussing (planning, PRDing) implementations - as in, it worked, but not much better than directly trying to prompt what I want, and it takes a lot longer. But I'm not usually doing "normal" stuff, as this is purely fun/exploratory side-project stuff and my asks are usually complicated but not complex, if that makes sense.
I guess development is always a lot of testing, but this feels different. I click around but don't gain a lot of insight. It feels more shallow. I can write a new prompt and explain what's different but I haven't furthered my understanding much.
Also, not knowing the codebase, you might need a couple attempts at phrasing your ask just the right way. I probably had to ask my agent 5+ times, trying to explain in different ways how to translate phone IMU yaw/pitch/roll into translations of the screen projection. Sometimes it's surprisingly hard to explain what you want to happen when you don't know how it's implemented.
I will simply not hire anybody who is not good at using LLMs, and I don't think I would ever work with anybody who thinks they aren't very useful. It's like working with somebody who thinks compilers are useless. Obviously wrong, not worth spending time trying to convince.
To anyone who reads this article and disagrees with the central point: You are missing the most important thing that will happen in your career. You should reevaluate because you will be unemployable in a few years.
They were opposed to C++ (they thought C was all you need), opposed to git (they used IBM clearcase or subversion), opposed to putting internal tools in a web browser (why not use Qt and install the tool), opposed to using python or javascript for web services (it's just a script kiddie language), opposed to sublime text/pycharm/vscode (IDEs are for people who don't know how to use a CLI).
I have encountered it over and over, and each time these people get stuck in late career jobs making less than 1/3 of what most 23 year old SWEs I know are making.
But then hindsight is 20/20.
My most successful "this is doomed to fail" grouchiness was social media games (like Farmville).
But I just can't think of any examples in the dev tooling space.
You can rightly avoid new things 99% of the time, but if you miss the 1% of things that matter, you get left behind.
On the other hand if you adopt the latest thing 100% of the time and 99% of those things are a waste, you will probably be fine.
But if you expect to get paid, you need to keep up and stay productive.
And it doesn't burn everyone out. All of the best 50+ year old engineers I know use LLMs constantly.
Back when I was in college in the 00s, if I had developed a preference for not using compilers in my work, I might have been able to build a career that way, but my options would have been significantly limited. And that's not because people were just jerks who were biased against compiler skeptics, or evil executives squeezing the bottom line, or whatever. It's because the kind of software most people were making at that period of time would have been untenable to create without higher level languages.
In my view, we clearly aren't at this point yet with llm-based tooling, and maybe we never will be. But it seems a lot more plausible to me that we will than it did a year or even six months ago.
When this happens, then I'll figure out how to get good at prompting.
"When disagreeing, please reply to the argument instead of calling names. 'That is idiotic; 1 + 1 is 2, not 3' can be shortened to '1 + 1 is 2, not 3."
Because we've recently been getting a series of low quality submissions which obviously drive low quality discussion, but for some reason it's only the commenters who get reprimanded, as if the submissions weren't ruining the mood one by one.
(And to clarify, I haven't been warned, I'm not writing this out of spite.)
Also, the author didn't say that "ease of use with LLMs" is the _only_ factor that matters. Julia could have other things wrong with it that prevent it from being adopted.
One of the more eye-opening aspects of this technology is finding out how many of my peers seemingly have no understanding or respect for the concept of art.
Music, for example, is an incredibly commercialized art. Replacing every song or album I have ever purchased with AI generated facsimiles is also an incredibly depressing thought. And let me tell you, my tastes aren't "a few elite artists doing paid high culture art".
I would hope people still find value in painting, especially in a world with photography. That is even ignoring the strained nature of this analogy. The context of the original quote was in a discussion of the inherent plagiarism of AI. Photography wasn't invented by stealing painters' work.
Does knowing a human made something automatically make it more valuable? Should it? Shouldn't the work speak for itself, rather than rely on the cult of personality around its creator?
These discussions always seem to focus on form as if that is what defines art. But in many cases concept is more important. Duchamp didn't craft the urinal. The idea was the art. If a piece moves someone, and that reaction changes based on who or what made it, what does that really say about how we judge art?
Personally, I think in a lot of cases, people want to feel some sort of emotional connection with the artist through their creation, which doesn't work if you know it's AI created.
[1] https://commons.wikimedia.org/wiki/File:Adolf_Hitler_Der_Alt...
> Music, for example, is an incredibly commercialized art. Replacing every song or album I have ever purchased with AI generated facsimiles is also an incredibly depressing thought.
And just to be clear, I'm not saying you're wrong.
> I would hope people still find value in painting, especially in a world with photography.
Sure, people do, but it is now a hobby for some and high art for a smaller number of professional painters; the market willing to sustain a large number of professional painters doing portraits is gone.
> That is even ignoring the strained nature of this analogy. The context of the original quote was in a discussion of the inherent plagiarism of AI. Photography wasn't invited by stealing painters work.
I think the analogy is relevant because I am discussing the plagiarism of AI in relation to the economic aspects of copyright infringement and the impacts on the market for artists and SW devs, not in relation to the moral rights[1] of authors. The issue of artists being annoyed on principle, not on economic effects, that some soulless computer is producing plagiarist art that imitates their art style without attribution is a separate but closely related issue. I'm not sure, but I think the article is more concerned with the former issue.
How can you justify separating the two concerns? This article is a defense of AI against its critics. It is a pretty poor defense if the argument is along the lines of "certain ethical concerns don't count". The author being "more concerned with" one issue doesn't make the other issue invalid or irrelevant.
The error here is that the quote from the article says "the median artist" and you have never purchased a song or album by a median musician.
Someone who points out I'm mostly doing mundane things is telling the truth, not that they underestimate my knowledge or skills or don't appreciate the beauty of code.
My point is that nobody argues that AI replaces art that survived millennia.
(Especially if that code was spit out by an alien copypasta that is really good at sounding plausible with zero actual intelligence or intent?)
Like, if all I care about is: does it have enough unit tests and do they pass, then yeah I can audit that.
But if I was trying to solve truly novel problems like modeling proteins, optimizing travel routes, or new computer rendering techniques, I wouldn't even know where to begin; it would take tons of arduous study to understand how a new project full of novel algorithms is going to behave.
It's not that it's a fad. It's that the hype has gotten way ahead of the capability. CEOs laying off double digit percentages of their workforce because they believe that in 6 months AI will actually be able to do all those jobs and they want to get the message out to Wall St to juice the stock price today.
The key is to learn the useful tools and techniques while remaining realistic and open-eyed about their limitations.
So far in my experience watching small to medium sized companies try to use it for real work, it has been occasionally useful for exploring apis, odd bits of knowledge etc, but overall wasted more time than it has saved. I see very few signs of progress.
The time has come for LLM users to put up or shut up - if it's so great, stop telling us and show us (and use) the code it generated on its own.
What else are you looking for?
If I wasn't experienced in computer science this would all fall apart. However, I do have to fix almost all the code, but spending 10 minutes fixing something is better than 3 days figuring it out in the first place (again, this might be more unique to my coding and learning style).
Are there any examples of businesses deploying production-ready, nontrivial code changes without a human spending a comparable (or much greater) amount of time as they’d have needed to with the existing SOTA dev tooling outside of LLMs?
That’s my interpretation of the question at hand. In my experience, LLMs have been very useful for developers who don’t know where to start on a particular task, or need to generate some trivial boilerplate code. But on nearly every occasion of the former, the code/scripts need to be heavily audited and revised by an experienced engineer before it’s ready to deploy for real.
I don't understand why you think "the code needs to be audited and revised" is a failure.
Nothing in the OP relies on it being possible for LLMs to build and deploy software unsupervised. It really seems like a non sequitur to me, to ask for proof of this.
Some other threads of conversation get intertwined here with concerns about delusional management making decisions to cut staff and reduce hiring for junior positions, on the strength of the promises by AI vendors and their paid/voluntary shills.
For many like me who have encouraged sharp young people to learn computers, we are watching their spirits crushed by this narrative and have a strong urge to push back — we still need new humans to learn how computer systems actually work, and if nobody is willing to pay them for work because an LLM outperforms them on those menial "rite-of-passage" types of software construction, we will find ourselves in a bad place.
If you’re selling shovels to gold miners, you don’t need to demonstrate the shovel - you just need decent marketing to convince people there’s gold in them thar hills.
> The excitement and enthusiasm of Gold Washing still continues—increases. (1848)
What's nuts is watching all these people shill for something that we all have used to mediocre results. Obviously Fly.io benefits if people start hosting tons of slopped-together AI projects on their platform.
It's kinda sad to watch what I thought was a good company shill for AI, even if they are not directly getting money from some PR contract.
We must not be prompting hard enough....
this sort of post is the start of the next phase in the battle for mindshare
the tools are at the very best mediocre replacements for google, and the people with a vested interest in promoting them know this, so they switch to attacking critics of the approach
> Its kinda sad to watch what I thought was a good company shill for AI.
yeah, I was sad too, then I scrolled up and saw the author. double sadness.
That's the crux.
copious tests - that don't work, but no one cares.
documentation - that no one has read or ever will, and that is hilariously inaccurate.
There is a lot of pre-AI software that was churned out because some manager wanted exactly what they wanted, but it had no purpose or need. I expect that to explode in the coming years for sure. I'm not afraid of AI; it's merely OK, another tool is all.
It will allow companies to dig themselves very deep holes. Devs wise to the game will be able to charge astronomical fees to drain the pools of AI sewage they have been filled with.
(If it is then damn, I've been leaving a ton of money on the table.)
I don't know if that's what fly.io is going for here, but their competitors are explicitly leaning into that angle so it's not that implausible. Vercel is even vertically integrating the slop-to-prod pipeline with v0.
Look at stock market courses for instance. They are endlessly prevalent, an eternally green scam. People spend thousands to lose even more money all the time. Sunk cost fallacy is very hard for a lot of people to overcome. Scammers count on it. There is literally millions to be made in these scams if you have zero moral fiber and zero shame.
We are in a golden age of such scams. Not my quote but one article I read said something like business students right now are putting Adam Neumann's picture on their dorm walls to aspire to be like him...
LLMs are very useful. I use them as a better way to search the web, generate some code that I know I can debug but don’t want to write and as a way to conversationally interact with data.
The problem is the hype machine has set expectations so high and refused criticism to the point where LLMs can’t possibly measure up. This creates the divide we see here.
There's still a significant barrier to entry to get involved with blockchain and most people don't even know what it is.
LLMs, on the other hand, have a very low barrier to at least use - one can just go to Google, ChatGPT, etc. and use it and see its effectiveness. There's a reason why, in the last year, a significant portion of school students are now using LLMs to cheat. Blockchains still don't have that kind of utilization.
Honestly I think that makes the argument stronger though that it’s unfortunate they jumped on.
But compared to using Kagi, I've found LLMs end up wasting more of my time by returning a superficial survey with frequent oversights and mistakes. At the final tally I've still found it faster to just do it myself.
I will say I do love LLMs for getting a better idea of what to search for, and for picking details out of larger blocks.
Honestly, I think part of the decline of Google Search is because it's trying to increase the amount of AI in search.
Thus, I find LLMs quite useful when trying to find info on niches that are close to a very popular topic, but different in some key way that's hard to express in search terms that won't get ignored.
In many romance languages, eulogy doesn't have the funeral connotation, only the high praise one - so the GP may be a native speaker of a romance language who didn't realize this meaning is less common in English.
I’m open to that happening. I mean them showing me. I’m less open to the Nth “aww shucks, the very few doubters that are left at this point are about to get a rude awakening” FOMO concern trolling. I mean I guess it’s nice for me that you are so concerned about my well-being, soon to be suffering-being?
Now, AI can do a lot of things. Don’t get me wrong. It has probably written a million variations on the above sentiment.
There are LOADS of people who need "a program" but aren't equipped to write code or hire an SWE, and who are empowered by this. An example: last week, I saw a PM vibe code several different applications to demo what might get built after it gets prioritized by SWEs.
It'd be like insisting LLMs will replace authors of novels. In some sense they could, but there are serious shortcomings, and things like agents etc. just don't fix them.
"Python, create an xarray with two dimensions from a pandas df"
It gave me a few lines of example code which was enough for me to figure out where I had messed up the syntax in my own code.
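For reference, that kind of answer boils down to setting a two-level index and converting - something like the sketch below, with invented column names:

    import pandas as pd
    import xarray as xr

    # Example frame with two would-be dimensions, "time" and "station".
    df = pd.DataFrame({
        "time": [0, 0, 1, 1],
        "station": ["a", "b", "a", "b"],
        "temp": [10.1, 11.3, 9.8, 12.0],
    })

    # A MultiIndex on the DataFrame becomes the dimensions of the Dataset.
    ds = xr.Dataset.from_dataframe(df.set_index(["time", "station"]))
    print(ds["temp"])  # 2-D DataArray with dims (time, station)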
I have seen one of my junior coworkers copy+paste entire chunks of code from chatbot conversations, and to be honest what he has produced is underwhelming: the code is poorly structured and difficult to reason about, I have low confidence he understands what the bot has produced (and why it did things the way it did), and I don't have high confidence we'd be able to trust the accuracy of the figures this code was outputting.
I don't think this follows. Anyone can see that a 10-ton excavator is hundreds or even thousands of times more efficient than a man with a shovel. That doesn't mean you can start up a company staffed only with excavators. Firstly, you obviously need people operating the excavator. Secondly, the excavator is incredibly efficient at moving lots of dirt around, but no crew could perform any non-trivial job without all the tasks that the excavator is not good at - planning, loading/unloading, prepping the site, fine work (shovelling dirt around pipes and wires), etc.
AI is a tool. It will mean companies can run much leaner. This doesn't imply they can do everything a company needs to do.
I ask this because it reads like you have a specific challenge in mind when it comes to generative AI and it sounds like anything short of "proof of the unlimited powers" will fall short of being deemed "useful".
Here's the deal: Reasonable people aren't claiming this stuff is a silver bullet or a panacea. They're not even suggesting it should be used without supervision. It's useful when used by people who understand its limitations and leverage its strengths.
If you want to see how it's been used by someone who was happy with the results, and is willing to share their results, you can scroll down a few stories on the front-page and check the commit history of this project:
https://github.com/cloudflare/workers-oauth-provider/commits...
Now here's the deal: These people aren't trying to prove anything to you. They're just sharing the results of an experiment where a very talented developer used these tools to build something useful.
So let me ask you this: Can we at least agree that these tools can be of some use to talented developers?
If capabilities don’t improve it’s not replacing anyone, if they do improve and it can write good code, people can learn from reading that.
I don’t see a pathway to improvement though given how these models work.
What I’m interested in really is just case studies with prompts and code - that’s a lot more interesting for hackers IMO than hype.
I implemented the OAuth 2.0 protocol in 3 different languages without a 3rd-party library - the entire spec implemented by hand. This was around 2015, when many of the libraries that exist today didn't exist yet. I did this as a junior developer for multiple enterprise applications. At the end of the day it's not really that impressive.
In a single Saturday the LLM delivered the feature to my spec, passing my initial test cases, adding more tests, etc…
I went to bed that night feeling viscerally in my bones I was pairing with and guiding a senior engineer not a junior. The feature was delivered in one day and would have taken me a week to do myself.
I think stories like the Cloudflare story are happening all over right now. Staff level engineers are testing hypotheses and being surprised at the results.
OAuth 2.0 doesn't really matter. If you can guide the model and clearly express requirements, boundaries, and context, then it's likely to be very useful and valuable in its current form.
Maybe you just have that dream job where you only have to think hard thoughts. But that's just not the norm, even at a bleeding edge startup.
This article and its vocal supporters are not being reasonable at all; they make a not-so-between-the-lines separation between skeptics (who are nuts) and supporters ("My smartest friends are blowing it off." in a smug "I'm smarter than my smartest friends").
I mean, come on.
I honestly found the article to be an insufferably glib and swaggering piece that was written to maximize engagement rather than to engage the subject seriously.
The author clearly values maximizing perceived value with the least amount of effort.
Frankly, I’m tired of reading articles by people who can’t be bothered to present the arguments of the people they’re disagreeing with honestly and I just gave up halfway reading it because it was so grating.
Are you saying the CEO of Anthropic isn't reasonable? or Klarna?
Surely you can see how insanely biased all of their statements would be. They are literally selling the shovels in this gold rush.
Anything they say will be in service of promoting AI, even the bad/cautionary stuff because they know there's an audience who will take it the other way (or will choose to jump in to not be left behind), and also news is news, it keeps people talking about AI.
I think what's happening is two groups using "productivity" to mean completely different things: "I can implement 5x more code changes" vs "I generate 5x more business value." Both experiences are real, but they're not the same thing.
https://peoplesgrocers.com/en/writing/ai-productivity-parado...
This is true, LLMs can speed up development (some asterisks are required here, but that is generally true).
That said, I've seen, mainly here on HN, so many people hyping it up way beyond this. I've got into arguments here with people claiming it codes at "junior level". Which is an absurd level of bullshit.
However, the expansion in scope that senior developers can tackle now will take away work that would ordinarily be given to juniors.
That may be true, and would be an interesting topic to discuss if people actually spoke in such a way.
"Developers are now more productive in a way that many projects may need fewer developers to keep up productivity levels" is not that catchy to generate hype, however.
There's certainly a lot of code that needs to be written in companies that is simple and straightforward and where LLMs are absolutely capable of generating code as good as your average junior/intermediate developer would have written.
And of course there are higher complexity tasks where the LLM will completely face plant.
So the smart company chooses carefully where to apply the LLM and possibly does get 5x more code that is "better" in the sense that there's 5x more straightforward tickets closed/shipped, which is better than if they had less tickets closed/shipped.
My intuition is the tail of low value "changes/edits" will skew fairly code size neutral.
A concrete example from this week "adding robust error handling" in TypeScript.
I ask the LLM to look at these files: see how there is a big try/catch? Now that I have the code working, there are two pretty different failure domains inside. Can you split up the try/catch (which means hoisting some variable declarations outside the block scope)?
This is a cursor rule for me `@split-failure-domain.mdc` because of how often this comes up (make some RPCs then validate desired state transition)
Then I update the placeholder comment with my prediction of the failure rate.
I "changed" the code, but the diff is +9/-6.
When I'm working on the higher complexity problems I tend to be closer to the edge of my understanding. Once I get a solution, very often I can simplify the code. There are many many ways to write the same exact program. Fewer make the essential complexity obvious. And when you shift things around in exactly the kind of mechanical transformation way that LLMs can speed up... then your diff is not that big. Might be negative.
Sure, they might help you onboard into a complex codebase, but that's about it.
They help in breadth, not depth, really. And to be clear, to me that's extremely helpful, cause working on "depth" is fun and invigorating, while working on "breadth" is more often than not a slog, which I'm happy to have Claude Code write up a draft for in 15 minutes, review, do a bunch of tweaks, and be done with.
On the flip side, it has allowed me to accomplish many lower-complexity backlog projects that I just wouldn’t have even attempted before. It expands productivity on the low end.
I’ve also used it many times to take on quality-of-life tasks that just would have been skipped before (like wrapping utility scripts in a helpful, documented command-line tool).
This has been my experience as well - AI coding tools are like a very persistent junior that loves reading specs and documentation. The problem for AI companies is "automated burndown of your low-complexity backlog items" isn't a moneymaker, even though that's what we have. So they have to sell a dream that may be realized, or may not.
The benchmark project in the article is the perfect candidate for AI: well-defined requirements with precise technical terms (RFCs), little room for undefined behavior, and tons of reference implementations. This is an atypical project. I am confident an AI agent can write an HTTP/2 server, but it will also repeatedly fail to write sensible tests for human/business processes that a junior would excel at.
A big part of my skepticism is this offloading of responsibility: you can use an AI tool to write large quantities of shitty code and make yourself look superficially productive at the cost of the reviewer. I don't want to review 13 PRs, all of which are secretly AI but pretend to be junior dev output, none of which solve any of the most pressing business problems because they're just pointless noise from the bowels of our backlog, and have that be my day's work.
Such gatekeeping is a distraction from my actual job, which is to turn vague problem descriptions into an actionable spec by wrangling with the business and doing research, and then fix them. The wrangling sees a 0% boost from AI, the research is only sped up slightly, and yeah, maybe the "fixing problems" part of the job will be faster! That's only a fraction of the average day for me, though. If an LLM makes the code I need to review worse, or if it makes people spend time on the kind of busywork that ended up 500 items down in our backlog instead of looking for more impactful tasks, then it's a net negative.
I think what you're missing is the risk, real or imagined, of AI generating 5x more code changes that have overall negative business value. Code's a liability. Changes to it are a risk.
With no disrespect meant, if you’re unable to find utility in these tools, then you aren’t using them correctly.
This does not counter what GP said. Using LLM as a code assistant is not the same as "I don't need to hire developers because LLMs code in their place"
And here's the difference between someone like me and an LLM: I can learn and retain information. If you don't understand this, you don't have a correct understanding of LLMs.
It is us, the users of the LLMs, that need to learn from those mistakes.
If you prompt an LLM and it makes a mistake, you have to learn not to prompt it in the same way in the future.
It takes a lot of time and experimentation to find the prompting patterns that work.
My current favorite tactic is to dump sizable amounts of example code into the models every time I use them. I find this works extremely well. I will take code that I wrote previously that accomplishes a similar task, drop that in and describe what I want it to build next.
Tfa makes this argument too then later says:
> All this is to say: I write some Rust. I like it fine. If LLMs and Rust aren’t working for you, I feel you. But if that’s your whole thing, we’re not having the same argument
So reasonable people admit that the utility depends on the use case.. then at the same time say you must be an idiot if you aren’t using the tools. But.. this isn’t actually a reasonable position.
Part of the issue here may be that so many programmers have no idea what programmers do outside of their niche, and how diverse programming actually is.
The typical rebuttals of how “not everyone is doing cliche CRUD web dev” are just the beginning. The author mentions kernel dev, but then probably extrapolated to C dev in general. But that would be insane; just think about the training sets for Linux kernel dev vs everything else.
It’s dumb to have everyone double down on polarizing simplistic pro/con camps, and it’s rare to see people even asking “what kind of work are you trying to do” before the same old pro/con arguments start flying again.
It's like I can't just switch our whole 1-million-line codebase on a dime
These articles act like everyone is just cranking out shitty new webapps, as if every software job is the same as the author's
But that said, let me reiterate a couple important points from my post:
> With no disrespect meant
I’m not calling anybody an idiot because they aren’t using an LLM. I’m sharing my honest opinion that they’re not using it correctly, but that’s very different than calling them an idiot.
> if you’re unable to find utility in these tools
This is a bit lawyerly, but note my carefully generic wording here: “find utility”. If you’re a Rust developer who doesn’t like the Rust output from your LLM, sure - but that’s not 100% of the job.
You’ll also touch bash scripts, makefiles, YAML, JSON or TOML config, write bug reports/feature requests, discuss architectural ideas and coding patterns, look through stack traces/dumps/error logs, or whatever else.
My point is that it is exceedingly unlikely that there is nothing an LLM can do to help your work, even if it’s not good at writing code in your domain.
Hence the statement that if you cannot find utility, you’re not using it correctly. It takes time to learn how to use these tools effectively, even in domains they excel in.
Having something else write a lot of the boring code that you'll need and then you finish up the final touches, that's amazing and a huge accelerator (so they claim).
The claim is not "AI will replace us all", the claim of the parent article is "AI is a big deal and will change how we work, the same way IDEs/copy-paste/autocomplete/online documentation have radically changed our work."
Vim and bash solved that for me a long time ago in a more reliable and efficient way (and it's certainly not the only tool capable of that).
> the same way IDEs/copy-paste/autocomplete/online documentation have radically changed our work
I was there before and got into the autocomplete/LSP thing pretty late (because Vim didn't have good LSP support for a long time, and Vim without it was still making me more efficient than any other IDE with it). Those things didn't radically change our work as you claim; they just made us a bit more productive.
With that as a metric, 1 Senior + 4 juniors cannot build the company with the scope you are describing.
A 50-eng company might have 1 CTO, 5 staff, 15 Seniors, and 29 juniors. So the proposition is you could cut the company in ~half but would still require the most-expensive aspects of running a company.
This is such an outlandish claim, to the point where I call it plain bullshit.
LLMs are useful in a completely different way than a junior developer is. It is an apples-and-oranges comparison.
LLMs do things that help me beyond what a junior would. They are also completely useless at many tasks that a junior developer can perform.
Imagine a senior IC staffed with 4 juniors, and they spend 2 hours with each every day. Then the junior is left with 6 hours to think through what they were taught/told. This is very similar to LLM development except instead of context switching 3 times each day, the senior can skip over the 6 hours of independent time the junior required to absorb the changes. But it still takes the same amount of time to deliver the 4 separate projects.
I find the existence of LLM development deeply troubling for a long list of reasons. But refuting the claim that an LLM is similar in many ways to a junior dev is unsubstantiated
>It also is completely useless to perform many tasks that a Junior developer can.
And there are many things one junior could be helpful with that a different junior would be useless at.
This is not how Juniors work. I don't know what else to say. It is just not true.
I don't give juniors a prompt and let them to implement code for a few hours. They work as any other developer, just generally in features and/or tickets of more limited scope. At least initially. This is not what LLMs do
> But refuting the claim that an LLM is similar in many ways to a junior dev is unsubstantiated
I sometimes get the feeling I talk to people who never worked in a real professional setting.
An LLM can do things that juniors can't. When I bounce around ideas for implementing a certain feature, when I explore libraries or frameworks I am unfamiliar with, when I ask it to review pieces of code looking for improvements, when I get it to generate boring glue code, scaffolding, unit tests. All those things are helpful, and make LLMs an excellent code assistant in a way that juniors are not.
But it is completely unable to properly do things without me giving very precise instructions about what it needs to code. The less precise I am, the worse its output. It is very happy to generate complete bullshit code that kinda looks like it does what I need but doesn't really. I constantly need to tweak what it generates, and although it saves me time since it outputs a lot of code quickly, the results are too unreliable for it to act with any sort of independence.
> And there are many things one junior could be helpful with that a different junior would be useless at
Which completely fails to address the point I am making.
A - normal, conventional senior dev work flow
B - A non-traditional but plausible senior dev workflow
C - Senior dev with LLM
I'm not claiming A = B = C, just that the step from one to another is relatively small when compared to something like a linter or other tool that accelerates development in some way.
> Which completely fails to address the point I am making.
If an LLM can do 20% more things than a junior dev but also cannot do a different 20% of the things a junior dev can, then the LLM is similar to a junior. And when comparing two juniors within a given field, it's entirely likely the above logic could apply. E.g. one junior may not be able to write SQL queries while the other does not understand the DOM. An LLM, on the other hand, is "kinda ok" at everything, but cannot engage in 'real' conversations about system architecture or reason through user experiences reliably. So it can do 20% more, and also 20% less. Just like a junior dev.
No one is claiming A = C, so don't keep punching the wind. They are claiming A ~= C.
This is not how it works. I am only punching wind because your arguments are as solid as air.
Comparing an LLM to a Junior Dev is like comparing a Secretary to an Answering Machine - both can technically answer calls, so they must be somewhat similar? What a load of hot, steaming bullshit.
> No one is claiming A = C, so don't keep punching the wind. They are claiming A ~= C.
And I am claiming that A != C. I was not even arguing against they are equal, I am arguing against them being similar in any way.
I maintain what I said before: I sincerely doubt the competence and experience in a real-life professional setting of anyone claiming LLMs are in any way remotely similar to junior devs.
The number of secretaries declined after answering machines became more prevalent.
You can keep throwing ad hominems if you think it's helping change reality. I wish we weren't headed this way, I really do, but we are. So we might as well confront reality. Catch you in 3 years when whatever happens - happens.
Citation needed
> Catch you in 3 years when whatever happens - happens.
I wish I had 1 euro for every time crypto shills told me those exact words regarding the inevitability of cryptocurrency replacing actual currency.
Note that I said 1 euro and not 1 of some shitcoin.
Also often it takes a senior dev _more_ time to _explain_ to a junior what needs to be done than it takes to do it himself. What LLMs give us is the ability to generate a feature about as fast as we can type up the instructions we would have, pre-AI, given to a junior dev.
He spent a large tranche of the article specifically hanging a lantern on how mediocre the output is.
>by creating an AI only company
He specifically says that you need to review the code over and over and over.
The 10x engineer is really good at deducing what the next most important thing to do is and doing it quickly. This involves quickly moving past hundreds of design decisions in a week to deliver something. It requires you to think partly like a product manager and partly like a senior engineer, but that's the game, and LLMs are zero help there.
Most engineering productivity is probably locked up in this. So yes, LLMs probably help a lot, just not in the way that would show on some Jira board?
*One could claim that doing this slow work gives the brain a break to then be good at strategizing the higher order more important work. Not sure.
I recently used Claude Code to develop & merge an optimization that will save about $4,000 a month. It was relatively simple but tedious, so I probably wouldn't have done it on my own. I don't even expect most of my coworkers to notice.
You set up a strawman (AI only companies, agents doing everything on their own) which is irrelevant to the point the article is making. One excerpt:
> Almost nothing it spits out for me merges without edits. I’m sure there’s a skill to getting a SOTA model to one-shot a feature-plus-merge! But I don’t care. I like moving the code around and chuckling to myself while I delete all the stupid comments. I have to read the code line-by-line anyways.
I think this article is very on point; I relate to basically every paragraph. It's not a panacea, it's not a 10x improvement by any means, but it's a very meaningful improvement to both productivity (less than 2x I'd say, which would already be a ton) and fun for me. As I've mentioned in the past here:
> I feel like there’s also a meaningful split of software engineers into those who primarily enjoy the process of crafting code itself, and those that primarily enjoy building stuff, treating the code more as a means to an end (even if they enjoy the process of writing code!). The former will likely not have fun with AI, and will likely be increasingly less happy with how all of this evolves over time. The latter I expect are and will mostly be elated.
which is a point the article makes too (tables), in a slightly different way.
Also, to be clear, I agree that 90% of the marketing around AI is overblown BS. But that's again beside the point, and the article is making no outlandish claims of that kind.
Overall, I hope this article (as intended) will make more people lose their dismissiveness and wake up their curiosity, as I expect the future of those is akin to that of people today saying they're "not really good at computers". It's a paradigm-shift, and it takes getting used to and productive in, as some imo smart people are mentioning even in this thread[0].
Well, in this case they’re busy writing articles trying to convince us, instead of proving stuff to us.
Nobody is saying it's "unlimited powers", that's your exaggeration.
And what you're proposing about an "AI only company" seems to be based on your misunderstanding.
What this article is saying is, you need the same number of senior developers, but now each one is essentially assisted by a few junior developers virtually for free.
That's huge. But saying you want to see an "AI only company" as "proof" has nothing to do with that.
And what you're describing -- "occasionally useful for exploring apis, odd bits of knowledge etc, but overall wasted more time than it has saved" -- is exactly what the author explicitly addresses at the top:
> If you were trying and failing to use an LLM for code 6 months ago, you’re not doing what most serious LLM-assisted coders are doing. People coding with LLMs today use agents...
The entire article is about how to use LLM's effectively. What kind of "proof" do you really want, when the article explains it all awfully clearly?
Generative AI is too much of a blank canvas at the moment, and one that is always shifting. It's up to the user to find all the use cases, and even then in my experience it's just as likely to send me on a wild goose chase as it is to instantly solve my problem.
I use AI to chew through tedious work all the time. In fact, I let an agent do some work just before I checked HN to read your claim that it can't do that. Everyone at my job does the same, perhaps modulo checking HN. But there's no 'unlimited power' to show you - we're just about 30% faster than we used to be.
In decades of programming I’ve written very little tedious code, but that’s as much about the projects I’ve worked on as approach I use.
https://github.com/Atlas-Authority/mpac-ui-improved https://moduscreate.com/blog/forum-monitoring-is-essential-b... (Pardon how marketing keyword stuffed the final post)
b) Even if “everyone is using it” it doesn’t mean it is useful. The usage could be adequately explained by e.g. marketing, being forced on them by management/policy, etc. Not everything with high usage is useful. I can e.g. quickly think of chewing gum (which is also used by a lot of developers), or the ANSI standard keyboard (as opposed to the ISO standard keyboard).
The next few paragraphs basically say "the tool runs arbitrary programs on your machine, pulls in arbitrary files, and uses that to run more arbitrary commands" and then blame you for thinking that is a bad sequence of events.
The best possible light in which I (an AI-neutral reader) can paint this rant on a hosting-company blog (why publish this?) is that 1) allowing random textbots to execute programs on your work computer is good (disagree), and 2) those chatbots do, in fact, occasionally say enough correct-ish things that they are probably worth your company paying $20+/month for your access (agree).
I’m happy to have read this, which is reason enough to publish it - but also it’s clearly generating debate so it seems like a very good thing to have published.
tptacek has always come across arrogant, juvenile, opinionated, and difficult to work with.
There are zero "safe" tools where you don't control the inputs.
_People_ are getting outsized value from AI in the ways they apply it. Photographs come from the photographer, not the camera.
Why would we do this? Wouldn’t it be better to do this silently and reap the benefits?
Because their own review standards are low (so they find reviewing "easy"), and/or because they can't appreciate the emotional & mental fulfillment that coding provides.
How is it the responsibility of the Rust community that there weren't enough metric tons of free code for the machine to slurp up? And the phrasing makes it sound like it's the community's fault for not feeding OpenAI enough code to be stripped of its license and authorship and get blended into a fine latent soup. It's a lot like people coming to a one-man FOSS project with a laundry list of demands, expecting to be treated with the religious reverence of a major enterprise contract.
The whole tone, the pervasive "use it or you'll be left behind"—where users saying they don't want or need it only proves further evidence of its imminent apotheosis—superficially reminds me of previous FUDs.
And how is it not concerning that the thing described as intelligent needs billions of lines to generalize a language a human can learn from a single manual? Will it need hundreds of kLOC to internalize a new library, or even its new version, beyond in-context learning? The answer is yes; you are choosing to freeze the entire tech stack, when fixing its abstractions could actually save you from boilerplate, just so the machine can write it for you at $200 a month with a significant error rate.
If it's a rarer language, the math doesn't do as good a job on piles of random code. There's just not enough for it to learn from. I can't speak for Rust since I don't know the numbers, but I imagine it's much less than Python or Go.
I have seen some evidence, though, that harder languages are harder for them to code in. GPT-3.5 used to struggle with C++ for something that it could easily produce in Python. It could actually produce things in C more easily than C++. It makes sense, though, because there's both more context needed for correctness and more behavioral patterns involved in writing it.
My solution, which I only prototyped in GPT due to leaving AI, was to use AIs to write code in languages like Python which non-AI tools transpiled to high-performance code in C++ or Rust. Think of a Python-to-C++ compiler, or maybe Nuitka. Later, with hallucinations mitigated enough, add LLMs to those transpilers.
As a side benefit, it lets you sell a product for accelerating or increasing the predictability of applications in that language. That's a non-AI investment. There are a few companies doing that, too. So one could sell to the AI crowd, the "language X in business" crowd, or both.
I used it about 15 minutes ago, to help me diagnose a UI issue I was having. It gave me an answer that I would have figured out, in about 30 minutes, in about 30 seconds. My coding style (large files, with multiple classes, well-documented) works well for AI. I can literally dump the entire file into the prompt, and it can scan it in milliseconds.
I also use it to help me learn about new stuff, and the "proper" way to do things.
Basically, what I used to use StackOverflow for, but without the sneering, and with much faster turnaround. I'm not afraid to ask "stupid" questions - that is critical.
Like SO, I have to take what it gives me, with a grain of salt. It's usually too verbose, and doesn't always match my style, so I end up doing a lot of refactoring. It can also give rather "naive" answers, that I can refine. The important thing, is that I usually get something that works, so I can walk it back, and figure out a better way.
I also won't add code to my project, that I don't understand, and the refactoring helps me, there.
I have found the best help comes from ChatGPT. I heard that Claude was supposed to be better, but I haven't seen that.
I don't use agents. I've not really ever found automated pipelines to be useful, in my case, and that's sort of what agents would do for me. I may change my mind on that, as I learn more.
What I like about Chatbots vs SO is the ability to keep a running conversation instead of 3+ tabs and tuning the specificity toward my problem.
I've also noticed that if I look up my same question on SO I often find the source code the LLM copied. My fear is that if chatbots kill SO, where will the LLMs' copied code come from in the future?
There's usually some "iteration," with ChatGPT giving me deprecated APIs and whatnot.
Yesterday, I was looking at an answer, and I got a popup, saying that a user needed help. I dutifully went and checked the query. I thought “That’s a cool idea!”. I enjoy being of help, and sincerely wanted to be a resource. I have gotten a lot from SO, and wanted to give back.
It was an HTML question. Not a bad one, but I don’t think I’ve ever asked or answered an HTML question on SO. I guess I have the “HTML” tag checked, but I see no other reason for it to ask my help.
Yeah, I think it’s done.
As I never used SO except to understand it for doing business for developers, I know many found the community/self-building/sense-of-worth aspect important, same with Quora. Do you have an idea of how this will change things for developers? Is that a real thing I was seeing? (Maybe even an opportunity!)
One of the reasons that SO became so successful, was the "gamification" of answering questions. Eventually, they started giving the questions, themselves, more attention, but, by then, the damage was done.
Asking questions became a "negative flag." If you look at most of the SO members with very high karma, you will see that their total count of questions asked, is a 1-digit value, with that digit frequently being "0."
So the (inevitable) result, was that people competed to answer as many questions as possible, in order to build high karma scores. In its heyday, you would get answers within seconds of posting a question.
The other (inevitable) result, was that people who asked questions, were considered "lesser people," and that attitude came across, loud and clear, in many of the interactions that more senior folks had with questioners. They were treated as "supplicants." Some senior folks were good at hiding that attitude, some, not so much.
Speaking only for myself, I suspect that I have more experience and expertise, actually delivering product, than many of the more senior members, and it is pretty galling, to be treated with so much disrespect.
And, of course, another inevitable thing, was that the site became a spamhaven. There was a lot of "shill-spamming," where someone asks a question, and many of the "answers" point to some commercial product. If you attempted to seriously answer the question, your answer was often downvoted, causing you damage. I think they got nuked fairly quickly, but it was quite a problem, for a while (It's still a huge problem in LinkedIn groups. I never participate in those, anymore).
I have found that, whenever I design anything; whether an app, or a community, I need to take human nature into account.
https://meta.stackexchange.com/questions/399619/our-partners...
Why should I care if my SO answer I posted 7 years ago ends up in an LLM output in some random model? I wasn't getting paid for it anyway, and didn't expect to.
I view my random contributions across the web ending up in LLMs as a good thing, my posts now potentially reach even more people & places than it would have on a single forum site, that's the whole point of me posting online. Maybe I'm an outlier here.
AI won't judge and shame you in front of the whole world for asking stupid questions or not RTFM'ing well enough, like Stackoverflow users do. Nor will it tell you your questions are irrelevant.
I think this is the most killer AI feature ever.
That’s the one thing about SO that I always found infuriating. It seems their favorite shade, is inferring that you’re “lazy,” and shaming you for not already having the answer. If anyone has ever looked at my code, “lazy” is probably not a word that springs to mind.
In most cases, I could definitely get the answer, myself, but it would take a while, and getting pointers might save me hours. I just need a hint, so that I can work out an answer.
With SO, I usually just bit my tongue, and accepted the slap, as well as the answer.
An LLM can actually look at a large block of code, and determine some boneheaded typo I made. That’s exactly what it did, yesterday. I just dumped my entire file into it, and said “I am bereft of clue. Do you have any idea why the tab items aren’t enabling properly?”. It then said “Yes, it’s because you didn’t propagate the tag from the wrapper into the custom view, here.” It not only pointed out the source error, but also explained how it resulted in the observed symptoms.
In a few seconds, it not only analyzed, but understood an entire 500-line view controller source file, and saw my mistake, which was just failing to do one extra step in an initializer.
There’s absolutely no way that I could have asked that question on SO. It would have been closed down, immediately. Instead, I had the answer in ten seconds.
I do think that LLMs are likely to “train” us to not “think things through,” but they said the same thing about using calculators. Calculators just freed us up to think about more important stuff. I am not so good at arithmetic, these days, but I no longer need to be. It’s like Machine Code. I learned it, but don’t miss it.
Or maybe it's not helpful in the long run. I feel like AI is most magical when used on things that you can completely abstract away and say, as long as it works, I don't care what's in it. Especially libraries where you don't want to read their documentation or develop a mental model of what they do. For your own view, idk, it's still helpful when AI points out why it's not working, but it's more of a balance vs. working on it yourself to understand it too.
Judge for yourself.
Here's the file I am working on: [0].
The issue was in this initializer: [1]. In particular, this line was missing: [2]. I had switched to using a UIButton as a custom view, so the callback only got the button, instead of the container UIBarButtonItem. I needed to propagate the tag into the button.
[0] https://github.com/LittleGreenViper/SwipeTabController/blob/...
[1] https://github.com/LittleGreenViper/SwipeTabController/blob/...
[2] https://github.com/LittleGreenViper/SwipeTabController/blob/...
In my experience, if a question is understood well enough, it basically directly translates into a solution. In most cases parts of questions are not well understood, or require going into detail/simplification/has a definition we don't know etc etc.
This is where being able to ask questions and get clear answers helps. AI basically helps you understand the problem as you probe deeper and deeper into the question itself.
Most human users would give up on answering you after a while; several would send you through a humiliating ritual, leaving you with a lifelong fear of asking questions. This prevents learning, as a good way of developing imagination is asking questions. There is only so much you can derive from a vanilla definition.
AI will be revolutionary for just this reason alone.
I’ll ask it how to accomplish some task that I’ve not done, before, and it will give me a working solution. It won’t necessarily be a good solution, but it will work.
I can then figure out how it got there, and maybe determine a more effective/efficient manner.
I mean, a tool is a tool, nothing wrong with that - but most of the resistance stems from AI being shoved down our throats at warp speed. It's already everywhere and I can't opt out; that stinks.
As for the skepticism in terms of adoption and usefulness, it's mainly a question of whether or not it will continue improving - there's no way to know what lies ahead, but if it came to a grinding halt today, well, the high-water mark just isn't all that impressive.
> Yeah, we get it. You don’t believe in IPR. Then shut the fuck up about IPR. Reap the whirlwind.
This is the point that matters, and I don't think everyone is on the same page that LLMs are essentially over-glorified data laundering.
The industry would get just as much "value" if we declared a jubilee and wiped out all licenses and allowed unlimited plagiarism (looking at Zuckerberg and his 10 TB of pirated data). In fact, if AI owners published their training data sets with a capable search engine, I would bet money on it outperforming LLMs in most cases. Why waste all that manpower reinventing Netflix again? Just copy-paste the code and give everyone their time back, sheesh.
> Kids today don’t just use agents; they use asynchronous agents. They wake up, free-associate 13 different things for their LLMs to work on, make coffee, fill out a TPS report, drive to the Mars Cheese Castle, and then check their notifications. They’ve got 13 PRs to review. Three get tossed and re-prompted. Five of them get the same feedback a junior dev gets. And five get merged.
I'm in a role that is behind the times, using a bespoke in-house framework that is immune to the benefits of LLMs, so I don't get to see what you see - so as a skeptic, I'm not convinced this isn't just the illusion of speed. I have not seen convincing results, show me the amazing things being made by AI (AI tooling itself does not count) - but yes, maybe that's because its all siloed into walled gardens.
> But something real is happening. My smartest friends are blowing it off. Maybe I persuade you. Probably I don’t. But we need to be done making space for bad arguments.
Yeah all the arguments have been made, good and bad, we're all waiting to see how it plays out. But I'd rather take the side of being a skeptic - if I'm right then I'm in the right place. If I'm wrong, that's cool too, I don't mind playing catch-up. But fully embracing the hype is, IMO, tantamount to putting all your eggs in one basket, seems like a needless risk but if that's worth it to you to get ahead then by all means, slurp up the hype.
Sure, there might be a way to change your workflow and never ever open a secret file with those editors, but my point is that a software that sends your data without your consent, and without giving you the tools to audit it, is a no go for many companies, including mine.
Too many vibe coders contribute trash code if any. They need more code from so called experts that isn't open source yet.
So, for example, if your local `.env` is inadvertently sent to Cursor and it's persisted on their end (which you can't verify one way or the other), an attacker targeting Cursor's infrastructure could potentially compromise it.
The prior look upon Claude Code/Cursor/Windsurf much more favorably, as they are able to ship their ideas much faster.
This is a bit of a hot take, so I would love any replies to bring me back down to earth.
(1) Ask to write an implementation plan for a specific change or a feature. It will go through the source code, look up references, make notes and produce a plan
(2) Review the plan. Point out missing things, or stuff that needs improvement.
(3) Once I'm satisfied with the plan - ask it to draft a PR. Launch a few attempts in parallel and pick the one that I like the most.
(4) While drafting the PR, Codex will run unit tests (it can even run E2E tests in its container), linting, and type checkers at every single step. This helps a lot with stability.
(5) I review the code and merge the PR if I like it. Ask it to clean up if not.
This feels like working with a remote team - very patient and diligent at that.
Ultimately, I get to get more features done per day. But I also feel more tired by the end of the day due to a higher level of cognitive load. There are more decisions to make and less idle time (e.g. no more hours spent tidying up the code or doing relaxing and pretty refactoring).
TLDR; this AI thing works really well at least for me. But it comes with trade-offs that might slow down its adoption by companies en masse.
"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."
A lot of their pages barely work and feel like first drafts, much of the UI is even labeled as a work in progress, and many tools regularly just fail to do what they're supposed to.
Deployments are routinely broken, with Twitter lit up like a light bulb about it while their status page remains "all services operational".
I deploy three completely separate applications with Fly.io and have regular unscheduled downtime.
I stick with them despite all this because they're cheap and easily scale regular Go apps down to zero which is wonderful for apps that see intermittent/irregular usage patterns.
The fact that they are adamant defenders of vibe coding, going so far in the post as to say "who cares about hallucinations", genuinely explains a fair deal about their unprofessional-feeling service.
But who knows, maybe AI will accelerate so rapidly that it will fix the economy. Maybe we'll have robots everywhere doing all the work. But I worry about the lack of market incentives for people to adapt AI to real world use cases.
For example, I'm an open source developer who likes to tinker but I've been booted out of the opportunity economy. I can't afford to program robots. People like me are too busy using AI to parse spreadsheets and send targeted ads to even think about automating stuff. We work for companies and have no autonomy in the markets.
If things had worked out differently for me, I'd probably own a farm now and I'd be programming robots to do my harvest and selling the robots or licensing the schematics (or maybe I'd have made them open source, if open source had worked out so well for me). I don't have access to such opportunity unfortunately. The developers who worked for big tech are good at politics but often disconnected from value-creation. Few of them have the skills or interest to do the work that needs to be done now... They will just continue leveraging system flaws to make money, so long as those flaws exist.
I think it all comes down to this: do you take pride in what you do, or don’t you?
When I make a wall with bricks, even if it will be covered with coating, I will do my best to have regular joints and spacing.
I could make it faster, and no one would notice the difference but me… I hate that feeling when you’ve done something and you know it’s barely enough, just barely; it’s kind of shit and you really don’t want others to see it.
On the opposite side, some people will take pride in building a wall twice as fast as me and won’t care that it’s horrendous.
Both cases are valid, but me, I know I can’t do work I’m not proud of.
Absolutely. This is at the core of it.
The above is pulled from A Brief History of the Woodworking Plane [0]. A woodworking tool that has evolved over 2,000 years. Now there are electric planers, handheld electric planers and lots of heavy machinery that do the same thing in a very automated way. If a company is mass producing kitchen cabinets, they aren't hand planing edges on boards, a machine is doing all that work.
I feel like with AI we are on the cusp of moving beyond a "Golden age" and into an "industrial age" for coding, where it will become more important to have code that AI understands vs. something that is carefully crafted. Simple business pressure will demand it (whether we like it or not).
^ A comment I made just yesterday on a different thread.
For software developers, AI is like the cabinet maker getting a machine to properly mill and produce cabinet panels. Sure, you can use a hand plane to do that, but you're producing a very different product, and likely one that not many people will care about - possibly not even your employer, when they see all the other wood shops pumping out cabinetry and taking their market share.
[0] https://www.handplane.com/879/a-brief-history-of-the-woodwor...
Well, except that in order to fix that 1% you'd need to read and understand whatever the LLM did and then look for that 1%. I get the chills just thinking about this, whether the original programmer was human or not. I'd rather just write everything myself to begin with.
Second, reading code to understand it is a skill that you need to practice to get better at. That might be your issue.
https://www.joelonsoftware.com/2000/04/06/things-you-should-...
I mean they could be wrong, but I don’t think you can call them amateurs.
You seem to be misunderstanding me. I'm not saying "anyone who thinks reading code is harder than writing code is an amateur". I'm saying "reading code IS harder than writing code, which is why being good at reading code is what separates experienced programmers from beginners."
“Reading code is vastly quicker than writing it from scratch”
Sounds to me like you don’t think reading code is harder.
I think that if you put the work in you can get to a point where you are fast enough at reading and reviewing code that it's not a net productivity loss to have an LLM - or your coworkers and collaborators - write code for you to review, as opposed to insisting on writing every line of code yourself.
I'm a little surprised that my opinion about this is seen as controversial!
> if you put the work in you can get to a point where you are fast enough at reading and reviewing code
Please correct me if I'm wrong, but "fast enough" here would still be slower than writing the code yourself (given equal amounts of practice at reading and writing code), right? To throw some made-up numbers around: if it would take me 20 minutes to write code to do X, it might take me 30 minutes to read/review code that does X written by somebody else (or an LLM), so I'm at a net loss of 10 minutes. Can you explain the mechanism by which this eventually tips into a productivity gain?
Personally, I think "reading code is harder than writing code" lacks nuance. While I believe it's true on average, the actual difficulties vary wildly depending on the specific changeset and the path it took to get there. For example, writing code can involve exploring many solutions before eventually discovering a concise/simple one, but when reading the final changeset you don't see all those dead-end paths. And reviewing nontrivial code often involves asynchronous back and forth with the author, which is not a factor when writing code. But please take the "reading code is harder than writing code" claim for granted when responding to the above paragraph.
That's why I think reading is harder than writing: it takes a whole lot more effort to learn code reading skills, in my experience.
It seems like your answer to sarchertech's upthread "if you put in equal amounts of practice at reading and writing code you'll get faster at reading code than writing code" question might be "yes". Either that or you've intentionally invested more in your reading skills than your writing skills.
But if reading code is indeed harder than writing code, it stands to reason that if you put in equal effort to improving reading and writing abilities, your writing abilities would improve comparatively more.
If you spent all this time and effort learning to read code, such that you can read code 6x faster than you can write it, how do you know that you couldn’t have spent that effort improving your writing abilities such that you could write code 6x faster?
On the other hand, if you did spend the same effort deliberately trying to increase your writing abilities as you did your reading, and the result is that you can read 6x faster than you can write, I’m unsure how you can support the conclusion that reading code is harder than writing it.
My gut feeling is that people on the far pro AI side of the spectrum tend to be people who are early in their career who don’t have strong writing or reading abilities (and so don’t really see the flaws) or people who have a reached a level where they aren’t really ICs anymore (even if that is their title). The latter have better reading than writing abilities because that’s what they spend all day doing.
Not that reading code is something that has an inherently higher skillcap than writing it.
I think there’s also a 3rd type of heavy AI proponent—people who spent most of their time cranking out MVPs or one offs that don’t require heavy maintenance (or they aren’t the ones doing the maintenance).
That’s not to say that I don’t think AI isn’t useful in those cases. I use AI pretty often myself when I’m writing in a language I don’t use everyday. Or when I’m doing something that I know has existing algorithmic solutions that I can’t quite remember (but I’ll know it when I see it) because it’s faster than googling. But I also recognize that there are many styles of programming, people, and domains where the productivity gains aren’t worth it.
When I get to writing actual production code with LLMs I treat them more as typing assistants than anything else: https://simonwillison.net/2025/Mar/11/using-llms-for-code/#t...
But that’s not a fair comparison. You typed the equivalent of 15-20 lines of code to generate 100, and you also needed additional time for reading/understanding that code.
I have no doubt that a programmer who worked with the relevant APIs frequently enough could have written that function faster than the total time it took you to do all those things.
Now that programmer in a less familiar domain could probably benefit from AI, but I get where people with different experiences are coming from.
How about reading documentation?
Or reviewing PRs?
Or go through source code to build out new features?
My issue with it is that it gates software development behind paid services with various levels of context supported. Absolutely not the dream I have of how more software should be open source and everyone should be empowered to make the changes they need.
> If you’re making requests on a ChatGPT page and then pasting the resulting (broken) code into your editor, you’re not doing what the AI boosters are doing. No wonder you’re talking past each other.
They’re playing 3D chess while you’re stuck at checkers.
I do things suboptimally while learning the ropes or just doing things casually. That doesn’t mean that I judge the approach itself by my sloppy workflow. I’m able to make inferences about what a serious/experienced person would do. And it wouldn’t involve pasting things through three windows like I would do.
So of course I don’t judge AI by “ask chatty and paste the response”.
Yes indeed: “deploying agents” is what I would imagine the Ask Chatty And Paste workflow taken to Perfection to look like.
> LLMs can write a large fraction of all the tedious code you’ll ever need to write. And most code on most projects is tedious. LLMs drastically reduce the number of things you’ll ever need to Google. They look things up themselves. Most importantly, they don’t get tired; they’re immune to inertia.
Most Rube Goldberg machines are very tedious and consist of fifty-too-many parts. But we can automate most of that for you—
I could not have ever imagined a more Flintstones meets Science Fiction clash than AI According To Software Engineers. You’re using AI to generate code. And no one cares how much. It’s just so tedious in any case.
A worthwhile approach would have been to aspire to make or generate technology artifacts that could be hidden behind a black-box surface with a legible interface in front. Is the code tedious? Then make the AI come up with something that is well-designed, where the obvious things you want are given freely, where minor customizations are just minor tweaks, and larger deviations require only proportionally larger changes. Uh, how about no? How about generating a 20KLOC "starter" some-framework project with all the 20KLOC "tedious" bits hanging out, then we can iterate from there. The AI made a Git log and everything, so it's, ya know, audited.
But maybe I’m being unfair. Maybe we are moving towards something not quite as stupid as Deploy ChatGPT 50X? Or maybe it’s effectively going to be behind a black box. Because ya know the AI will deal with it all by itself?
> Are you a vibe coding Youtuber? Can you not read code? If so: astute point. Otherwise: what the fuck is wrong with you?
> You’ve always been responsible for what you merge to main. You were five years ago. And you are tomorrow, whether or not you use an LLM.
No!, and what the fuck is wrong with you? We are Flintstone technologists and I’ll be damned if I can’t get my AI brain chip-injected, genetically enhanced for speed horsey cyborg for my modern horse-drawn carriage patent.
> Part of being a senior developer is making less-able coders productive, be they fleshly or algebraic. Using agents well is both a skill and an engineering project all its own, of prompts, indices, and (especially) tooling. LLMs only produce shitty code if you let them.
A junior developer often has negative value to a team, because they're sapping the time of more senior developers who have to help train them, review code, fix mistakes, etc. It can take a long while to break even.
The raw cost of Cursor's subscription is surely dwarfed by your own efforts, given that description. The actual calculus here should be the cost to corral Cursor, against the value of the code it generated.
This is going to be a problem real soon. There needs to be a realistic career path for software developers, or the entire field is at risk.
But I don't think that's a similar situation for LLMs, where the hallucinations or failure to debug their own issues are way too frequent to just "vibe code"
we could choose to be
of course if you're a temporarily embarrassed billionaire like ptacek, you certainly don't want the workers doing this
The Luddites were of course correct, like most labour movements.
I've been hired to a fully distributed team only to have the company decide to force everyone to "return" to the office within a couple years despite the team never having been in any office in the first place; I've had managers promise me raises that don't appear, only for me to find out later that they never actually talked to their superiors about it like they claimed; I've seen teammates get hired and laid off six months later the same week that we have someone else we just hired join the team. None of this stuff is reasonable, and for some reason we collectively just throw up our hands and say there's nothing we can do, and then apparently some of us scoff and look down at lower-paid workers banding together to try to get a semblance of the advantages we take for granted in our industry because they're "stopping progress"? Even if I did accept the premise that workers collectively organizing to win rights affected "progress", there's no reason to assume there's no middle ground between the extremes of "no progress" and "full-throttle progress without regard for consequences". It's farcical.
My part-time obsession over the last few months has been trying to demonstrate this and come up with a method for finding these magic keys (I even tried to get the LLMs to search for me, lol). What I really want is to give the latest thinking models (200k input, 100k output) a 5-6 page design doc (4k words, 5k tokens) and have them produce a complete 5kloc (50k tokens) microservice, which would show a 10x compression. It's hard, but I haven't seen any reason to think it wouldn't work.
For better or worse, I think this will be close to what IC jobs will be like in a few years. Fundamentally, our jobs are to work with other functions to agree on some system that needs to exist, and then we talk to the computers to actually implement it. If we swap kotlin+compiler for design doc+LLM, it's still going to be somewhat the same, but far more productive. Agents and such are somewhat of a stop-gap measure: you don't want people giving tasks to machines, you want to accurately describe some idea and then let the computers make it work. You can change your description and they can also figure out their own tasks to evolve the implementation.
> If you build something with an LLM that people will depend on, read the code. In fact, you’ll probably do more than that. You’ll spend 5-10 minutes knocking it back into your own style.
with Joel Spolsky's fundamental maxim:
> It’s harder to read code than to write it.
https://www.joelonsoftware.com/2000/04/06/things-you-should-...
So I am still trying to find the right amount of reading, editing, and reworking that gets the job done faster, where "the job" includes me being an expert in the produced code, not just the production of code.
There are periods of skimming but I'm doing a lot more than skimming.
But I'm sure the author will simply dismiss it as a skill issue too.
Thomas was a bit more spicy about that:
> but you have no idea what the code is - Are you a vibe coding Youtuber? Can you not read code? If so: astute point. Otherwise: what the fuck is wrong with you?
Yes, I can do frustrating things, I know how to review and vet code. I also know how to write boilerplate code. I also know how to research new tasks in areas where I have no familiarity that are poorly documented. I know how to do a lot of complicated, difficult things - all of which are part of being a professional developer.
The question is whether I want to use a tool that makes most of my job the difficult part.
The only way this makes sense is if you are somehow reaching a point where "doing the harder thing" is not the harder thing for you anymore - where reviewing code is easier for you than writing it. And maybe your argument is that you can get to that point with practice, or that LLM code is generally easier to review than other types of code, or that the problems it's tackling are so difficult that in those cases writing the code is harder than reading it.
But it's not that they're both true. "You should be able to do the harder thing" is not really an answer to "why are you selling me a tool that replaces an easy thing with a hard thing?"
There are many difficult things that I can do as a professional software developer. I can mentor junior developers. I can do detailed product design work with stakeholders and translate technical limitations into language that they understand. I can negotiate software deadlines with product owners. I can write interfaces for undocumented software and deal with undocumented bugs in 3rd-party code. I can step through minified code in production settings to debug problems. These are all difficult things that, as a professional developer, I am capable and willing to do, and often need to do. And yes, of course, I can review pull requests. I am not, however, generally in the business of adopting tools that force me to do that stuff more often than is necessary to get good results. I don't adopt tools that make my life harder, and I honestly think that's a big part of being a professional.
To be very blunt about it: "Buck up, you should be able to handle this" is not a sales pitch. I can also write with my non-dominant hand, but I'm not going to start taking notes that way. There's no intrinsic prize for making your life harder, the only way what you're saying makes sense is if it's not harder for you to read code than to write it.
Maybe we need to drop "easier" and "harder" and talk about speed.
I can write software faster with LLMs, without sacrificing quality - in fact I can get higher quality software because doing things "better" doesn't mean they take longer.
I derive enjoyment from building good stuff. If I can do it faster I can build more of it, which increases my enjoyment.
I wrote about this a couple of years ago: "AI-enhanced development makes me more ambitious with my projects": https://simonwillison.net/2023/Mar/27/ai-enhanced-developmen...
That's still true today, only more so because the models are significantly more competent than they were in March 2023!
I don't think reviewing code well is something most developers can do faster than they can write it. I think that's what Joel is getting at when he says that understanding code is harder than writing it. Harder in the sense of, it takes more effort and takes longer and you are more likely to make errors.
And that might not be true for you. But it is true for a huge number of professional developers.
And it is certainly not the case that understanding and reviewing code is both:
- more time consuming and more difficult than writing it and
- that it's faster to move your entire development strategy to one where you review existing code.
Those are incompatible claims. Pick one.
----
I wouldn't normally quibble about something like this, but it does kind of rub me the wrong way when I hear AI developers talk down about this (and I'm sure it's not your intention to talk down). In your post, you write:
> Figuring out how to patch fetch() like that is non-trivial—where I’m using the software engineer’s definition of “non-trivial” as meaning “I don’t know how to do that off the top of my head”. I’d have to do some thinking and research! I’m already tempted to drop this idea and work on something else.
If I responded to that by writing, "well, doing quick thinking and research is part of the job of being a professional developer and is a learned skill that you could get better at, so what's your problem" - I think you would very rightly say that's not a reasonable response.
So I think that "well, you're a professional, you should be faster at reviewing code" is similarly dismissive of a real conflict inherent in these tools, one that you are ignoring, and it's the kind of dismissive response that I don't normally see from you. Especially phrasing it as "they're both true"... I don't understand what that even means.
They're not both true, you're telling me right now that it's not both true - you are telling me that it is faster for you to digest code than it is for you to write it. So what is this "both are true" bullcrap?
Lots of difficult things don't take very much time: shooting a bullseye, lifting something heavy, winning a round of geoguessr, playing the guitar solo from Peg. We don't call these things difficult because they take a lot of time in the moment, but because they take a lot of time to master.
I think reading code is like that too. When I think about the best code readers/reviewers I've worked with, they are (1) also among the best and fastest code writers I know, and (2) still much faster at reviewing code than they are at writing it.
Generally though I find LLMs have a pretty rapidly diminishing return on what you can expect out of them. They’re like a senior programmer with 3-5 years of experience who has really learned their domain well, but doesn’t have the judgement of a principal engineer. You get to a point where you need to reach in and fix things and really pay attention, and at that point the diminishing returns set in rapidly and you’re better off just doing the rest yourself. Refactors and stuff can be delegated, but that’s about it.
I find this true regardless of the language. Nonetheless, I’ve been able to improve my overall velocity dramatically, completing several projects in the last few months in the time one would typically take. If tooling improves I hope to continue that, but I’m already getting close to the limit of how fast I can conceive of useful creative things.
- all creatives are flat-out against it because it’s destroying their income streams and outright stealing their intellectual property
- some technical leaders are skeptical because early returns were very bad and they have not updated their investigations to the latest tools and models, which are already significantly ahead of even six months ago
- a tech concern is how do we mentor new developers if they don’t know how to code or develop logic. LLMs are great IF you already know what you’re doing
- talent is deeply concerned that they will be reduced and replaced, going from high paying careers to fast food salaries
We have a lot of work to balance productivity with the benefits to society. “Let them eat cake,” is not going to work this time either.
I’d also extend the last point:
> talent is deeply concerned that they will be reduced and replaced, going from high paying careers to fast food salaries
The delta in that money goes to the investor/tech exec instead. Yet another wealth transfer from the middle class to the 0.01%.
Speeding is quite common too, yet if you get caught — especially overdoing it — you'll have a problem.
Also, in this case, presumably everything produced with AI is fair game too? The argument being made here isn't even "it's not plagiarism", rather "it's plagiarism but I don't care" — why would anyone else respect such an author's copyrights?
This is kind of a first though for any kind of technology. The speed of development and change here is unreal. Never before has a couple months of not being on top of things led to you being considered "out of date" on a tool. The problem is that this kind of speed requires not just context, but a cultural shift on the speed of updating that context. Humanity just isn't equipped to handle this rate of change.
Historically in tech, we'd often scoff at the lifecycle of other industries - Airlines haven't changed their software in 20 years?? Preposterous! For the vast majority of us though, we're the other industry now.
I would jump off a bridge before I accepted that as my full-time job.
I've been programming for 20+ years and I've never wanted to move into management. I got into programming because I like programming, not because I like asking others to write code on my behalf and review what they come up with. I've been in a lead role, and I certainly do lots of code review and enjoy helping teammates grow. But the last fucking thing I want to do is delegate all the code writing to someone or something else.
I like writing code. Yes, sometimes writing code is tedious, or frustrating. Sometimes it's yak-shaving. Sometimes it's Googling. Very often, it's debugging. I'm happy to have AI help me with some of that drudgery, but if I ever get to the point that I feel like I spend my entire day in virtual meetings with AI agents, then I'm changing careers.
I get up in the morning to make things, not to watch others make things.
Maybe the kind of software engineering role I love is going to disappear, like stevedores and lamplighters. I will miss it dearly, but at least I guess I got a couple of good decades out of it. If this is what the job turns into, I'll have to find something else to do with my remaining years.
oh finally someone else who didn't enter programming because, as a 7-10 year old child, they were into SOLVING PRACTICAL PROBLEMS FOR PEOPLE.
> But the last fucking thing I want to do is delegate all the code writing to someone or something else
Thank God there is at least one other person that understands that the ratio between creative and reactive work is crucial for wellbeing at the job.
For crying out loud.
> but if I ever get to the point that I feel like I spend my entire day in virtual meetings with AI agents, then I'm changing careers
so am I.
> but at least I guess I got a couple of good decades out of it
Thanks for this perspective. Yes, at least we've got our memories, and the code locations and commits we recall from memory, from a distance of 10 or more years.
>. If this is what the job turns into, I'll have to find something else to do with my remaining years
Me too.
Some of my fondest childhood memories are sitting in my school's resource center in front of a TRS-80, laboriously typing in some mimeographed BASIC code while wondering, "Is this the most efficient way I can increase shareholder value for the corporation?"
I don't want to have to change careers, as this is one that I've been working towards to some degree since I was a child. Including some intense work in college and some brutal first couple jobs where I worked hard to pick up marketable skills. Obviously the market doesn't care what I want, but I find the author of this piece to be a bit too flippant in his "but they take-rr jerbs" section. Working hard to get a well paying job only to have to start (likely near the bottom) in another career for much less pay is not something to treat lightly.
I am always amazed how so many software engineers seem to dislike coding, which seems to be a major underlying theme in the AI-coding cheerleading.
Coding never feels tedious to me. Talking to a chatbot, now that’s tedious.
> But the last fucking thing I want to do is delegate all the code writing to someone or something else
I talked about this a few days ago with coworkers and I phrased it almost exactly this way. I want to be an IC. I want to do the work. Becoming a manager-of-one to funnel all my work through would deprive me of genuinely my greatest joy in life.
I'm envious you managed a few good decades. I'm almost eight years into what has been my dream profession and to see indications it's going to morph from writing code into telling a robot to write code is just so demoralizing. I could have done this for decades. I wanted to do this for decades.
It rings about as authentic as "everybody stood up and clapped"
And at some level, it makes sense. There are a lot of extroverts out there for whom the ideal job really is feeling like an orchestra conductor, delegating to and coordinating an army of others. There is a great feeling of empowerment in watching a group build something bigger than you could have made on your own. And, if you're not someone who really likes to get their hands dirty and do the thing yourself, why not aspire towards a style of working that always feels that high level and powerful?
And, obviously, people who are wired that way are exactly the kind of people to move into management and executive roles and end up being in charge of and evangelizing AI to the world. So it's no surprise you hear people talking about this style of working all the time.
I'm 47. I've had the opportunity to move into management many times. I've been in a lead role enough to know the positive sides of that job and I deeply enjoy the soft skills part of working with a team.
I just know who I am and know that on days where I don't feel like I really made something concrete myself, I feel like I didn't live a real day.
My life experience is that this represents, conservatively, something north of 90% of the population. Most people are not as self-motivated as the HN crowd is. My inner cynic wants to say 99.9%. Which is how I know it's marketing. It's for people who fancy themselves as something they are not, i.e. fantasizing.
It's like all those commercials where the "boss" strolls in, points to some vaguely business-looking stuff, everyone runs off to work, and then the "boss" character goes to the beach or something. It's a fantasy sales pitch, in other words: work one minute a day and be rich.
I guess I'm getting off track; my point is not about how AI helps workflow, but that all this stuff we are seeing is marketing, and there is little tangible evidence that AI actually enables the workflow you are referencing.
For me switching the career, after spending more than 20 years in this as well... is very hard. I spent all my career outside of high-pay places like SV telling myself "I have all the time in the world, I don't need to grab most amount of money as soon as possible", so retiring is not an option.
So, switch to what? Any well-paid profession is going to be under pressure to be LLMized as much as possible.
If you ever followed down the rabbit hole of children content on YouTube, the one that infants and toddlers end up after an hour of autoplay with some innocent cartoon at the beginning, you'd find among the piles of morbid psychotic trash a variety of videos of children watching someone else playing some games, a whole class of videos in itself.
I can't explain this with anything else but a coordinated effort to flash the norm of watching someone/something doing something as opposed to being engaged in immediate action yourself into the firmware of future people. This lines up with many other current trends pretty well. Talk about incrementing Baudrillard's level of simulation a notch up.
I have a friend and coworker who is currently struggling with this thing and similar "not making anything" issues from having worked a "maintenance/support" job consisting of little day-to-day make-creation. He took a leave for two months. I really don't think he's coming back.
It is a loss for people he works with and the things he touches. Not to mention a loss for me, because he was the one person I found at my current job with the "make stuff, be useful" attitude. Already people who see that writing on the wall are dropping out.
Sooner or later, me too, if that is in fact the way it goes. The only thing that really keeps me going is that I don't have anything else to turn to, and I do have some cloudy optimism about getting a lot of money and satisfaction in a couple years to help clean up or rework the garbage-deluge.
Your other comment about extroversion makes me think of Gifts Differing (not the best book in the universe, but some good bits especially around intro/extroversion). I just want to hit flow state and smash my head into a problem that costs me sleep and come out on top bloodied but happy and feeling like I've done something. And this is/was a really great career for that.
> This faculty of concentration is likely to characterize the introverts' careers. Whereas extraverts tend to broaden the sphere of their work, to present their products early (and often) to the world, to make themselves known to a wide circle, and to multiply relationships and activities, the introvert takes the opposite approach. Going more deeply into their work, introverts are reluctant to call it finished and publish it, and when they do, they tend to give only their conclusions, without the details of what they did. This impersonal brevity of communication narrows their audience and fame, but saves them from overwhelming external demands and allows them to return to another uninterrupted stretch of work. As Jung is reported to have said, the introverts' activity thereby gains in depth and their labor has lasting value.
These kinds of articles that heavily support LLM usage in programming seem designed to FOMO you, or at least to suggest that "you are using it wrong" in a weak attempt to push contrary or conservative opinions out of the discussion. It's pure rhetoric with an empty discourse.
I use these tools every day and every hour in strange loops (between at least Cursor, ChatGPT and now Gemini) because I do see some value in them, even if only to simulate a peer or rubber duck to discuss ideas with. They are extremely useful to me due to my ADHD, because they actually support me through my executive dysfunction and analysis paralysis even if they produce shitty code.
Yet I'm still an AI skeptic because I've seen enough failure modes in my daily usage. I do not know how to feel when faced with these ideas, because I fall outside the false dichotomy (I pay for them and use them every day, but won't consider them as valuable as the average AI bro does). What's funny is that I have yet to see an article that actually shows LLMs' strengths and weaknesses in a serious manner and with actual examples. If you are going to defend a position, do it seriously ffs.
https://chatgpt.com/share/683e62ed-e118-800f-a404-bd49bec799...
It's just "AI did stuff really good for me" as the proof that AI works
I did, and learned a ton, and likely not going back to how I was before, or how I used it a week ago.
The comments in the article about not reading the agent are good, but it’s more than that…
Vibe coding is for non-coders. Yet you get a feel for the vibe of the AI. With Windsurf, you have two or three files open, and you're working in one. It starts smashing out multiple interspersed line edits, and you know, with a flutter of your eyes, that it’s got your vibe and correctly predicted your next ten lines. And for a moment you forgive it for leading you astray when you read what it said.
Had NFTs simply been deployed as some kind of protocol that could be leveraged for utility rather than speculation, I think the story would be a complete 180. No clue personally how to achieve that, but it feels like it could be done.. except that, too, would have been completely perverted and abused by centralized behemoths, leading to a different but terrible outcome. Can you imagine if all data became non-fungible? Convince all the big identity vendors (Google, Apple, etc) to issue key pairs to users that then get used by media companies to deliver audio and video keyed only to you that's embedded with maybe some kind of temporal steganographic signature that's hard to strip and can be traced back to your key? It's not just cracking AACS once and copying the bytes. It becomes this giant mess of you literally can't access anything without going through centralized authorities anymore. Then build more anti-patterns on top of that lol. Prolly better that it was mostly just monkey JPEGs and rug pulls.
Anyways, I'm so far off topic from what's actually being discussed -- just couldn't help myself from veering into left field.
Most of the time I can go faster than these tools if I have confidence in myself and allow the momentum to build up over the course of 20-30 minutes. Every time I tab out to an LLM is like a 5 minute penalty over what I could have done unaided on a good day.
Getting the model prepared to help you in a realistic domain often takes a few minutes of arranging code & comments so that it is forced toward something remotely sane. I'll scaffold out entire BS type hierarchies just so I can throw a //TODO: .... line in the middle somewhere. Without this kind of structure, I would be handling unfiltered garbage most of the time.
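To make that concrete, here is a minimal, hypothetical Go sketch of the "scaffold first, then drop a //TODO" approach. Every name in it (Invoice, LineItem, ApplyLateFee) is made up for illustration; the point is only that the surrounding types and the stubbed function leave the model much less room to hand you unfiltered garbage.

    // Hypothetical scaffolding: the types and the stub pin down exactly what
    // the assistant is allowed to fill in at the TODO below.
    package billing

    import "time"

    type LineItem struct {
        Description string
        Cents       int64
        Quantity    int
    }

    type Invoice struct {
        ID     string
        Issued time.Time
        DueBy  time.Time
        Items  []LineItem
    }

    // Total returns the invoice total in cents.
    func (inv Invoice) Total() int64 {
        var sum int64
        for _, it := range inv.Items {
            sum += it.Cents * int64(it.Quantity)
        }
        return sum
    }

    // ApplyLateFee is deliberately left as a stub; the signature plus the TODO
    // is the structure that steers the completion toward something sane.
    func (inv Invoice) ApplyLateFee(asOf time.Time, feePercent float64) int64 {
        // TODO: if asOf is after DueBy, return Total() plus feePercent of the
        // total (rounded down to whole cents); otherwise return Total() as-is.
        return inv.Total()
    }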
It's not that these tools are bad, it's that we need to recognize the true cost of engaging with them. ChatGPT is like a jackhammer. It will absolutely get you through that concrete slab. However, it tends to be quite obnoxious & distracting in terms of its operational principles.
Finally someone else mentions Flow!
I'll be damned if I give up control of my machine to a tool that hallucinates actions to take using hastily put together and likely AI-generated "agents". I still want to be the primary user of my machine, and if that means not using cutting edge tools invented in the last 6 months, so be it. I don't trust the vast majority of tools in this space anyway.
> I’m sure there are still environments where hallucination matters.
Still? The output being correct matters in _most_ environments, except maybe art and entertainment. It especially matters in programming, where a 99% correct program probably won't compile.
> But “hallucination” is the first thing developers bring up when someone suggests using LLMs, despite it being (more or less) a solved problem.
No, it's not. It's _the_ problem that's yet to be solved. And yet every AI company prefers chasing benchmarks, agents, or whatever the trend du jour is.
> I work mostly in Go. [...] LLMs kick ass generating it.
I also work mostly in Go. LLMs do an awful job generating it, just as with any other language. I've had the same shitty experience generating Go, as I've had generating JavaScript or HTML. I've heard this excuse that the language matters, and IME it's just not the case.
Sure, if you're working with an obscure and niche language for which there is less training data, I suppose that could be the case. But you're telling me that there is no good training data for Rust, the trendiest systems language of the past ~decade? C'mon. Comparing Rust to Brainfuck is comical.
I won't bother responding to all points in this article. I will say this: just as AI doomsayers and detractors deserve criticism, so does this over-the-top praising. Yes, LLMs are a great technology. But it is also part of a wildly overhyped market that will inevitably crash as we approach the trough of disillusionment. Their real value is somewhere in the middle.
This sounds like the "No true Scotsman" fallacy.
> People coding with LLMs today use agents. Agents get to poke around your codebase on their own.
That's a nonstarter for closed source, unless everything is running on-device, which I don't think it is?
> Part of being a senior developer is making less-able coders productive
Speak for yourself. It's not my job.
How does that work exactly? Do you have a link?
> Microsoft, OpenAI and I'm pretty sure Anthropic can sell you an Enterprise service that guarantees a certain level of confidentiality
These companies hoovered up all of our content without notice, permission, or compensation, to train their models. I wouldn't trust them one bit. My personal opinion is that it's foolish to trust them.
> I work in aerospace, one of the most paranoid industries
Paranoid about what exactly?
https://ollama.com lets you run models on your own hardware and serve them over a network. Then you point your editor at that server, e.g. https://zed.dev/docs/ai/configuration#ollama
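For anyone wondering what "pointing your editor at that server" boils down to, here is a rough Go sketch that talks to Ollama's /api/generate HTTP endpoint directly. It assumes the server is reachable on the default port 11434 and that the model named here ("llama3.3" is just a placeholder) has already been pulled; OLLAMA_URL is this sketch's own knob, not an official variable. Editor integrations like the Zed one linked above are doing essentially this for you.

    // Minimal client for a self-hosted Ollama server (default: localhost:11434).
    package main

    import (
        "bytes"
        "encoding/json"
        "fmt"
        "log"
        "net/http"
        "os"
    )

    type generateRequest struct {
        Model  string `json:"model"`
        Prompt string `json:"prompt"`
        Stream bool   `json:"stream"`
    }

    type generateResponse struct {
        Response string `json:"response"`
    }

    func main() {
        base := os.Getenv("OLLAMA_URL") // e.g. http://my-server:11434 (made-up env var for this sketch)
        if base == "" {
            base = "http://localhost:11434"
        }

        body, _ := json.Marshal(generateRequest{
            Model:  "llama3.3", // placeholder: whatever model the server has pulled
            Prompt: "Explain what a B-tree is in two sentences.",
            Stream: false, // ask for a single JSON object instead of a stream
        })

        resp, err := http.Post(base+"/api/generate", "application/json", bytes.NewReader(body))
        if err != nil {
            log.Fatal(err)
        }
        defer resp.Body.Close()

        var out generateResponse
        if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
            log.Fatal(err)
        }
        fmt.Println(out.Response)
    }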
If you've had a dog you know that "dog training" classes are actually owner training.
Same with AI tools. I see big gains for people who spend time training themselves to work within the limitations. When the next generation of tools comes out, they can adapt quickly.
If this sounds tedious, that's because it is tedious. I spent many long weekends wrestling with tools silently wrecking my entire codebase, etc. And that's what I had to do to get the productivity improvements I have now.
So then where do the junior developers come from? And then where do the senior developers come from?
Neither of these is accurate, but I guess nuanced thinking or considering anything below surface vibes is out these days.
So far after playing with them I'm using them as:
1. A junior intern that can google really really fast and has memorized a large chunk of the Internet and the library, and can do rough first-pass research and dig for things.
2. Autocomplete 2.0 that can now generate things like boilerplate or fairly pedestrian unit tests.
3. Rubber duck debugging where the rubber duck talks back.
4. A helper to explain code, at least for a first pass. I can highlight a huge piece of code and ask it to summarize and then explain and walk me through it and it does a passable job. It doesn't get everything right but as long as you know that, it's a good way to break things down and get into it.
For those things it's pretty good, and it's definitely a lot of fun to play with.
I expect that it will get better. I don't expect it to replace programmers for anything but the most boring mindless tasks (the ones I hate doing), but I expect it to continue to become more and more useful as super-autocomplete and all the other things I listed.
Computer science was an immensely fun subject to learn. I moved to one of the big cities and was bewildered with how much there was to learn, and loved every second of it. I gradually became good enough to help anyone with almost anything, and spent lots of my free time digging deeper and learning.
I liked CS and programming - but I did not like products built by the companies where I was good enough to be employed. These were just unfortunate annoyances that allowed me to work close enough to what I actually enjoyed, which was just code, and the computer.
Before LLMs, those like me could find a place within most companies - the person you don't go to for fast features, but for weird bugs or other things that the more product-minded people weren't interested in. There was still, however, an uncomfortable tension. And now that tension is even greater. I do not use an LLM to write all my code, because I enjoy doing things myself. If I do not have that joy, then it will be immensely difficult for me to continue the career I have already invested so much time in. If I could go back in time and choose another field I would - but since that's not possible, I don't understand why it's so hard for people to have empathy for people like me. I would never have gone down this path if I knew that one day, my hard-earned-knowledge would become so much less valuable, and I'd be forced to delegate the only part of the job I enjoyed to the computer itself.
So Thomas, maybe your AI skeptic friends aren't nuts, they just have different priorities. I realize that my priorities are at odds with the companies I work for. I am just tightly gripping the last days that I can get by doing this job the way that I enjoy doing it.
LLMs don't make your hard-earned-knowledge less valuable: they make it more valuable.
You are better qualified to use them to build great software than people who don't have your level of experience and software engineering domain expertise.
If you don't want to do that then I guess you can find another career - but if you switch careers because you incorrectly think that LLMs make programming experience less valuable you would be making a big mistake in my opinion.
A sentiment I see often is that it's work, it's not supposed to be fun, and you work at the pleasure of the employer. And I accept that. But I still am really just crushingly sad that this is what my job is becoming.
In the article, Thomas wrote:
> LLMs can write a large fraction of all the tedious code you’ll ever need to write. And most code on most projects is tedious. LLMs drastically reduce the number of things you’ll ever need to Google. They look things up themselves. Most importantly, they don’t get tired; they’re immune to inertia.
I see this as a massive downside, because I loved writing tedious code. I loved reading docs on something I previously didn't understand. I loved forming the mental models strong enough to say "yeah I see why that's there" in the previously-inscrutable APIs of the frameworks and such that I was using. It was precisely the _way_ that I approached that work that allowed for that knowledge to accrue. It was because I almost never just copy/pasted something without spending a lot of time to understand it.
I do some of the same with ChatGPT. I type the code in myself after trying to internalize the ChatGPT response. But even that is starting to feel like company time-theft, as the attitude is shifting even further away from "knowing how to do things is good" toward "getting shit done is all that matters."
Since a big majority of companies have stopped hiring juniors, where is the new blood coming from when the current seniors inevitably retire?
But again, you're being honest. The problem with a lot of the AI skeptic arguments I see is a lack of this honesty. Others have noted that there are a lot of contradictory skeptical arguments, and I suspect the contradictions come because the authors have negative emotions about AI which they're using to create negative arguments.
It again is coming back to my opinion the LLMs have recreated the job in such a way that it emphasizes what I disliked most, and de-emphasizing what I liked. It emphasizes "the goal" and de-emphasizes "the process". We had a period in the 10's where the process (namely, becoming adept at using and learning an ever-changing set of open source tools) was a bit more celebrated. You could justify a lunch-and-learn on things like man pages, commit isolation levels, or package manager - and doing something like that would be seen in a positive light. And now, why would you waste everyone's time talking about something that ChatGPT can figure out for you?
Anyway, thanks for your time in your response.
OpenAI has that as point 14 in https://openai.com/policies/services-agreement/
Anthropic mention theirs here: https://www.anthropic.com/news/expanded-legal-protections-ap...
Here's the Google post: https://cloud.google.com/blog/products/ai-machine-learning/p...
Yes we are, because the kind of work you need to do in C++ or Rust is probably entirely different from the work this person manages to get the LLM to do in Go.
Loved this style of writing in 2005 from Maddox on the best site in the universe or whatever.
Sorry if I don't want google and openAI stealing my or anyone else's work.
If the kind of work this person is doing requires no thought, then of course an LLM will be perfect for it.
I don’t care how long I can be in my career, I just can’t do BE like this. It feels out of character for me to be mediocre.
Are you kidding me? Mosaic was released in 1993. In 1998 Apple released the iMac on a theory, based on some marketing research, that people might be really interested in "getting online." Five years. And that's about a decade before we realized we might want a touch phone. More than a decade before an App Store.
This is very much still the top of the first inning. AI will make even the internet seem small. Which makes sense: the whole internet is merely one input into this technology.
Under the heading: "but you have no idea what the [LLM's] code is."
> Are you a vibe coding Youtuber? Can you not read code? If so: astute point. Otherwise: what the fuck is wrong with you? You’ve always been responsible for what you merge to main. You were five years ago. And you are tomorrow, whether or not you use an LLM. If you build something with an LLM that people will depend on, read the code.
> People complain about LLM-generated code being “probabilistic”. No it isn’t. It’s code. [...] The LLM might be stochastic. But the LLM doesn’t matter. What matters is whether you can make sense of the result
> Reading other people’s code is part of the job. If you can’t metabolize the boring, repetitive code an LLM generates [...] how are you handling the chaos human developers turn out on a deadline?
okay then (profanity begets profanity)
> Reading other people’s code is part of the job. If you can’t metabolize the boring, repetitive code an LLM generates: skills issue!
Do you /intend/ to spend /most/ of your work life reading AI puke? If so: sanity issue! You are advocating for replacing creative action that gives you Flow with reaction that gives you hemorrhoids, as the /bulk of the work/. Fuck right off.
> How are you handling the chaos human developers turn out on a deadline?
By correcting the deadline setting methodology for the next round of development.
> Does an intern cost $20/month? Because that’s what Cursor.ai costs
then rely on Cursor.ai to raise your next-gen seniors, who are going to be expected to review AI puke in the future
> but the craft [...] Do it on your own time
Go choke on a bag of dicks. You are the problem. This is why quality no longer exists in the world, only lowest common denominator consumerist shit. This is why cars, home appliances, and tools have been getting both worse and more user-hostile. Because nobody values the quality of their own creative output anymore. Quality should not be an exception (which makes it very expensive), but the standard. Quality should be a near-commodity, and therefore at least occasionally affordable for anyone.
> I buy a fucking table
A ridiculously low quality, but hey, cheap!, table, that will fail you at the most inopportune moment. IKEA does have its place, but it's probably not in your office where you receive clients. Do you want them to see the table top fly off?
> They almost certainly have a bigger bag of algorithmic tricks than you do: radix tries, topological sorts, graph reductions, and LDPC codes
and you'll have to review that bag of algorithmic tricks embedded in AI puke, with potential hallucinations introducing obscure, critical mathematical bugs. Good luck verifying all that as a senior, based on proofs and first principles. A human coder will either use a well-reviewed library, or cite a textbook, and implement the textbook very closely, so you can follow it, and have some trust in the intent at least.
> No profession has demonstrated more contempt for intellectual property
how disingenuous. you are showing contempt for copyright. You are totally missing the point of copyleft. Copyleft doesn't advocate for anarchy, it advocates for user freedom via the software copyright framework. When AI barfs GPL'd code snippets into a proprietary code base, that harms user freedom.
> and watched it in seconds spot LVM metadata corruption issues
you are drifting; this is not code generation. Fuzzy pattern matching has always been fair game.
So sick of hearing comparisons between LLMs and juniors. First of all, are we really seniors? After almost 20 years I sometimes still think I know nothing. Second, when I was a junior, just me being younger, fresher and having an eager learner's attitude brought a net positive impact to the team I joined back then - so much so that I still hang out with those people to this day and we still have fun joining new projects together. I'm so sick of these snobbish "seniors" that keep throwing rocks at younger and well-prepared people
> Are you a vibe coding Youtuber? Can you not read code? If so: astute point. Otherwise: what the fuck is wrong with you? You’ve always been responsible for what you merge to main. You were five years ago. And you are tomorrow, whether or not you use an LLM.
This identifies a critical lever: it doesn't matter that AI sometimes makes mistakes, hallucinates, makes poor architecture decisions. All that matters is your PR to master, because that's what you always have been and will continue to be judged on. Merging trash? That's on you! Failing to utilise new tools and taking 10x longer to make the feature? Also on you!
He implies that if I am talking about some other flavor of AI than he personally uses, then I am just not serious. But that’s what they said six months ago and two years ago. These fucking boosters have already blown their credibility.
If you want to prove that AI agents writing code for you is safe then perform a large scale test and show your results. We’re not crazy for doubting you. You are crazy for not doubting yourself.
Ughh. That "own time." Spoken like a true middle manager who thinks passion is a liability.
“Steve Jobs was wrong: we do not need to carve the unseen feet in the sculpture. Nobody cares if the logic board traces are pleasingly routed.”
This tells you all you need to know about the kind of work the author values.
Next time you wonder why so much software is buggy, slow, half considered slop, you’ll know exactly who it was made by.
The author describes a world where most code is boring grunt work and most collaborators are mediocre. This world is not familiar to me. I can see LLMs being useful in such a world, but it’s not where I work.
The comment about buying a table instead of building it really stuck with me. We’re the ones who design the table! Saying we should buy one from the store is like saying farmers should buy their food from the store instead of putting in all that work to grow it.
I am wondering if maybe the average programmer is way slower and worse than I thought.
I'm glad I got out when I did. :) This simply isn't something I want to do for a living. I'm glad there are people who do, though.
After that, who knows. Maybe the whole system collapses and I just go to work as a tutor for some rich peoples' kids in exchange for loaves of bread.
My feeling is that AI will keep the software industry in the same trend: produce more software of lower quality while being more profitable.
People sometimes say "perfect is the enemy of good". But this implies that putting in less effort is "good enough". I believe it goes the other way: why would a company put effort into selling you something good, if it can make you pay the same for something worse? It's all about maximising profit, and AI is no exception there.
Unless you can run the LLM locally, on a computer you own, you are now completely dependent on a remote centralized system to do your work. Whoever controls that system can arbitrarily raise the prices, subtly manipulate the outputs, store and do anything they want with the inputs, or even suddenly cease to operate. And since, according to this article, only the latest and greatest LLM is acceptable (and I've seen that exact same argument six months ago), running locally is not viable (I've seen, in a recent discussion, someone mention a home server with something like 384G of RAM just to run one LLM locally).
To those of us who like Free Software because of the freedom it gives us, this is a severe regression.
Sure, but that is not the point of the article. LLMs are useful. The fact that you are dependent on someone else is a different problem, like being dependent on Microsoft for your office suite.
This sounds a bit like bailing out the ocean.
If it's one individual doing this, sure. I am posting this in the hopes that others follow suit.
It's fair to be worried about depending on LLMs. But I find the dependence on things like AWS or Azure more problematic, if we are talking about centralized and proprietary systems.
Self-hosting has always had a lot of drawbacks compared with commercial solutions. I bet my self-hosted file server has worse reliability than Google Drive, and my self-hosted git server supports fewer concurrent users than GitHub.
It's one thing you must accept when self-hosting.
So when you self-host an LLM, you must either accept a drop in output quality, or spend a small fortune on hardware.
Raspberry pi was a huge step forward, the move to LLMs is two steps back.
There are all kinds of trades that the car person and the non-car person makes for better or worse depending on the circumstance. The non-car person may miss out on a hobby, or not know why road trips are neat, but they don't have the massive physical and financial liabilities that come with them. The car person meanwhile—in addition to the aforementioned issues—might forget how to grocery shop in smaller quantities, or engage with people out in the world because they just go from point A to B in their private vessel, but they may theoretically engage in more distant varied activities that the non-car person would have to plan for further in advance.
Taking the analogy a step further, each party gradually sets different standards for themselves that push the two archetypes into diametrically opposed positions. The non-car owner's life doesn't just not depend on cars, but is often actively made worse by their presence. For the car person, the presence of people, especially those who don't use a car, gradually becomes over-stimulating; cyclists feel like an imposition, people walking around could attack at any moment, even other cars become the enemy. I once knew someone who'd spent his whole life commuting by car, and when he took a new job downtown, had to confront the reality that not only had he never taken the train, he'd become afraid of taking it.
In this sense, the rise of LLMs does remind me of the rise of frontend frameworks, bootcamps that started with React or React Native, high-level languages, and even things like having great internet; the only people who ask what happens in a less ideal case are the ones who've either dealt with those constraints first-hand, or have tried to simulate them. If you've never been to the countryside, or a forest, or a hotel, you might never consider how your product responds in a poor-connectivity environment, and these are the people who wind up getting lost on basic hiking trails having assumed that their online map would produce relevant information and always be there.
Edit: To clarify, in the analogy, it's clear that cars are not intrinsically bad tools or worthwhile inventions, but had excitement for them been tempered during their rise in commodification and popularity, the feedback loops that ended up all but forcing people to use them in certain regions could have been broken more easily.
And it feels strange, because I am constantly asking people what books they're reading.
See the Microsoft ecosystem as an example. Nothing they do could not be replicated, but the network effects they achieved are strong. Too much glue, and 3rd party systems, and also training, and what users are used to, and what workers you could hire are used to, now all point to the MS ecosystem.
In this early mass-AI-use phase you still can easily switch vendors, sure. Just like in the 1980s you could still choose some other OS or office suite (like Star Office - the basis for OpenOffice, Lotus, WordStar, WordPerfect) without paying that kind of ecosystem cost, because it did not exist yet.
Today too much infrastructure and software relies on the systems from one particular company to change easily, even if the competition were able to provide a better piece of software in one area.
* Not even counting cellular data carriers, I have a choice of at least five ISPs in my area. And if things get really bad, I can go down to my local library to politely encamp myself and use their WiFi.
* I've personally no need for a cloud provider, but I've spent a lot of time working on cloud-agnostic stuff. All the major cloud providers (and many of the minors) provide compute, storage (whether block, object, or relational), and network ingress and egress. As long as you don't deliberately tie yourself to the vendor-specific stuff, you're free to choose among all available providers.
* I run Linux. Enough said.
* Hmm, what kind of software do you write that pays your bills?
* And your setup doesn't require any external infrastructure to be kept up to date?
Open source of course.
So what's my response to that being deprecated? Maintaining it myself? Nope, finding another library.
You always depend on something...
You say that like it's an absurd idea, but in fact this is what most companies would do.
And I have worked in plenty of companies. I'm the open source guy in these companies, and neither I nor my teams ever had the capacity to do so.
True, but I think wanting to avoid yet another dependency is a good thing.
My company has set this up for one of our customers (I wasn't involved).
Maven Central is gone and you have no proxy setup, or your local cache is busted? Poof, you’re fucking gone; all your Springs, Daggers, Quarkuses and all the third-party crap that makes up your program is gone. The same applies to the bazillion JS and Rust libraries.
A guy says here you need 4 TB for a PyPI mirror and 285 GB for npm:
https://stackoverflow.com/questions/65995150/is-it-possible-...
We're not yet to that same point for performance of local LLM models afaict, though I do enjoy messing around them.
In 20 years, memory has doubled 32x
It means that we could have 16 TB memory computers in 2045.
It can unlock a lot of possibilities. And even if 1 TB is not enough by then, better architectures, more compact representations of data, etc. will help.
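Spelling out the arithmetic behind that projection (the ~512 GB starting point is my own assumption for a current high-end machine; the doubling rate is the one quoted above): 32x over 20 years is one doubling roughly every 4 years, so five more doublings by 2045 gives

512\,\text{GB} \times 2^{5} = 512\,\text{GB} \times 32 = 16\,\text{TB}.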
Still, I suppose that's better than what nvidia has on offer atm (even if a rack of gpus gives you much, much higher memory throughput).
In some cases it's more cost effective to get M-series Mac Minis vs nVidia GPUs
And with $10k I could pay 40 years of Claude subscription. A much smarter and faster model.
For the past few years, we've been "getting smaller" by getting deeper. The diameter of the cell shrinks, but the depth of the cell goes up. As you can imagine, that doesn't scale very well: cutting the cylinder diameter in half means quadrupling its depth to keep the same volume (and roughly doubling it just to keep the same sidewall area).
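A quick sanity check on that geometry, treating the storage capacitor as an ideal cylinder of radius r and depth h:

V = \pi r^{2} h \;\Rightarrow\; r \to \tfrac{r}{2} \text{ forces } h \to 4h \text{ at constant volume}, \qquad A \approx 2\pi r h \;\Rightarrow\; r \to \tfrac{r}{2} \text{ forces } h \to 2h \text{ at constant sidewall area}.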
If you try to put the cells closer together, you start to get quantum tunneling where electrons would disappear from one cell and appear in another cell altering charges in unexpected ways.
The times of massive memory shrinks are over. That means we have to reduce production costs and have more chips per computer or find a new kind of memory that is mass producible.
The point being made here is that a developer that can only do their primary job of coding via a hosted LLM is entirely dependent on a third party.
You make a good point, of course, that independence is important. But primo, this ship sailed long ago; secundo, more than one party provides the service you depend on. If one fails you still have at least some alternatives.
That said, I only find Google results somewhat helpful. It's a lot like LLM code (not surprising, given how they're trained): I may find 5 answers online and one or two have a small piece of what I need. Ultimately that may save me a bit of time or give me an idea for something I hadn't thought of, but it isn't core to my daily work by any stretch.
Spoken from a fair bit of experience doing software development in closed rooms with strict control of all digital devices (from your phone to your watch) and absolutely no external connections.
There are moments that are painful still, because you'll be trying to find a thing in a manual and you know a search can get it faster - but it's silly to imply this isn't possible.
And it's not like people weren't able to develop complicated software before the internet. They just had big documentation books that cost money and could get dated quickly. To be clear, having that same info a quick google search away is an improvement, and I'm not going to stop using google while it's available to me. But that doesn't mean we'd all be screwed if google stopped existing tomorrow.
I mostly write JS today and it either runs in browsers (dependencies) or on a host like AWS (dependencies). I use VSCodium and a handful of plugins (dependencies).
These all help me work efficiently when I'm coding, or help me avoid infrastructure issues that I don't want to deal with. Any one part is replaceable though, and more importantly any one part isn't responsible for doing my entire job of creating and shipping code.
FOSS is more about:
1. Finding some software you can use for your problem
2. Have an issue for your particular use case
3. Download the code and fix the issue.
4. Cleanup the patch and send a proposal to the maintainer. PR is easy, but email is ok. You can even use a pastebin service and post it on a forum (suckless does that in part).
5. The maintainer merges the patch and you can revert to the official version, or they don't and you decide to go with your fork.
200-300$/month are already 7k in 3 years.
And I do expect some hardware chip-based models in a few years, like a GPU.
An AiPU, where you can replace the hardware AI chip.
> 200-300$/month are already 7k in 3 years.
Except at current crazy rates of improvement, cloud based models will in reality likely be ~50x better, and you'll still have the same system.
I agree, we will see how this plays out, but I hope models start to become more efficient, and then for certain things it might not matter that much to run some parts locally.
I could imagine an LLM with far fewer languages, optimized for one programming language. Like 'generate your model'.
2.5 years ago it could just about run LLaMA 1, and that model sucked.
Today it can run Mistral Small 3.1, Gemma 3 27B, Llama 3.3 70B - same exact hardware, but those models are competitive with the best available cloud-hosted model from two years ago (GPT-4).
The best hosted models (o3, Claude 4, Gemini 2.5 etc) are still way better than the best models I can run on my 3-year-old laptop, but the rate of improvements for those local models (on the same system) has been truly incredible.
Therefore using your own bare metal is a lot of expensive redundancy.
For the cloud provider they can utilise the GPU to make it pay. They can also subsidise it with VC money :)
This actually to me implies the opposite of what you’re saying here. Why bother relearning the state of the art every few months, versus waiting for things to stabilize on a set of easy-to-use tools?
In fact, MCP is so ground breaking that I consider it to be the actual meat and potatoes of coding AIs. Large models are too monolithic, and knowledge is forever changing. Better just to use a small 14b model (or even 8b in some cases!) with some MCP search tools, a good knowledge graph for memory, and a decent front end for everything. Let it teach itself based on the current context.
And all of that can run on an off the shelf $1k gaming computer from Costco. It’ll be super slow compared to a cloud system (like HDD vs SSD levels of slowness), but it will run in the first place and you’ll get *something* out of it.
It's not black magic anymore.
Folks that are running local LLMs every day now will probably say you can basically emulate at least Sonnet 3.7 for coding if you have a real AI workstation. Which may be true, but the time and effort and cost involved is substantial.
To be fair, the entire internet is basically this already.
> But if you’re a software developer playing this card? Cut me a little slack as I ask you to shove this concern up your ass. No profession has demonstrated more contempt for intellectual property.
The OP isn’t wrong. Some of the most brilliant minds in computer science have built tools to shatter DRM, enable widespread piracy, and repeatedly lift code wholesale from public repos if it helps them move a product forward. The modern crop of technologists (myself included) have long had a hypocritical view on IPR: “rights for me, not for thee” to oversimplify things. The entirety of the modern public cloud is built by trodding on the backs of open source projects while wearing lawn-aerating spikes. We steal shit from others so our Founders and Masters can reap billions in profit; to decry plagiarism now feels incredibly dishonest and insincere, at the very least.
Look, I’m an AI skeptic myself. On the whole, these tools are bad. They’re taking jobs, polluting the planet, and dumbing down a society that still can’t try rebooting something when it fails, let alone identify where their WiFi ends and the internet begins. They’re toxic walled gardens that can reauthor reality to the whims of the VCs and Billionaires funding them, locking you into subscriptions forever.
But even I will admit that modern LLMs, when it comes to writing code, are actually really good for the majority of C-tier devs and below. Are they as helpful to bleeding-edge engineers on the cusp of the future? Nah, not really, but if you’re just writing a checklist app for iOS to help you stay on task, or a Packer manifest to lay out some VM templates, they’re kinda awesome.
I used to believe in the stereotypical copyleft anti IP ethos. In the modern day of corporate LLMs and disrespectful scraping, the lay of the ground is different and people understandably should be less inclined to open source their code.
okay wtf is this weird jab at yacc supposed to be? I generally prefer to write my own LR parsers because it's really not that difficult but I never felt like yacc code was difficult to read or not "probabilistic".
AFAIK most serious compilers don't use yacc either because they need to be able to parse context-dependent grammars, and also reinventing wheels is often worth the effort when the wheel is 50 years old and forces you to make compromises because it also has to support use-cases you are not interested in. But again, it's not bad, it does the thing it is designed to do and it does save a lot of effort if you're fine dealing with its limitations.
Only serious complaint I have about Yacc is that the basic posix form has frustrating limitations such as not being able to have two separate yacc instances because they'll declare the same symbols, and I'm fairly certain it's not thread-safe either. Bison fixes some of these while still remaining "probabilistic".
Anyways I'd rather use Yacc+Lex (preferably Bison+flex) to generate my parser than have to go read everything your "agent" wrote with a fine-toothed comb to make sure it didn't introduce any crippling memory corruptions or dangerous ROPportunities.
Yes, yes, and yes. SOOOO much yes. We are going to be in for a wild ride once everyone gets on board with the reality of coding agents.
> Fly.io builds on the work of a tremendous open source infrastructure community. We want open source authors to benefit from their work because we think a healthy, thriving open source ecosystem will help us build better products.
To the fine article:
> Meanwhile, software developers spot code fragments seemingly lifted from public repositories on Github and lose their shit. What about the licensing? If you’re a lawyer, I defer. But if you’re a software developer playing this card? Cut me a little slack as I ask you to shove this concern up your ass.
In the sense that the posts distill whatever HN at the time would accept as thought leadership it's a good pulse check on the community.
But this is not new https://fly.io/blog/wrong-about-gpu/
> At one point, we hex-edited the [NVIDIA] closed-source drivers to trick them into thinking our hypervisor was QEMU.
Me? I like programming.
This is dead right; my AI skeptic friends are ideologically opposed to LLMs and GenAI.
Their arguments have nothing to do with how good or bad LLMs are:
- IP issues (for code in a jupyter notebook which will never face IP scrutiny)
- because it exploits slaves and poor people (but so does the computer they use and the clothes they wear).
I also have a few programmer friends who make more legitimate complaints about code quality but because they are friends there is also an obvious subtext of ideological opposition.
My opinion is you should use LLMs anyway; you can hate "capitalism" and still take advantage of the system.
Most completely reasonable objections to LLMs were totally dismissed.
"If you aren't using this AI feature you are left behind."
I usually block people for posting like that.
I've never been a completionist. I'm distracted. I like novelty. I make half-form experiments, but mostly provide deeply immersive conversations for the benefit of a small group of people around me, some of whom go on to make/do things that I proudly feel my influence on. The final 10% of work often defeated me.
Now, I just make it happen. The pain is gone. Something else can carry the last 10%, and my ideas get realized by me (and AI).
I can't even conceive how this will change the world en masse. Knowledge domains (not just tech, but science, art, etc) have been built of the things that the type-A completionists knew how to see. The others didn't finish, and what they saw of the world has largely been marginalized.
The mind of ADHD and creative types is about to get much more represented in the output products of civilisation. This is going to be a big recalibration, and I think a re-balancing toward something more healthy tbh
come at me.
I take serious issue with the "but you have no idea what the code is" rebuttal, since it - to me - skims over the single largest issue with applying LLMs anywhere where important decisions will be made based on their outputs.
To quote from the article:
> People complain about LLM-generated code being “probabilistic”. No it isn’t. It’s code. It’s not Yacc output. It’s knowable. The LLM might be stochastic. But the LLM doesn’t matter. What matters is whether you can make sense of the result, and whether your guardrails hold.
> Reading other people’s code is part of the job. If you can’t metabolize the boring, repetitive code an LLM generates: skills issue! How are you handling the chaos human developers turn out on a deadline?
The problem here is that LLMs are optimized to make their outputs convincing. The issue is exactly "whether you can make sense of the result", as the author said, or, in other words: whether you're immune to being conned by a model output that sounds correct but is not. Sure, "reading other people’s code is part of the job", but the failure modes of junior engineers are easily detectable. The failure modes of LLMs are not.
That may be true for chat aligned LLMs, but coding LLMs are trained w/ RL and rewards for correctness, nowadays. And there are efforts to apply this to the entire stack (i.e. better software glue, automatic guardrails, more extensive tool-use, access to LSP/debuggers/linters, etc).
I think this is the critical point in a lot of these debates that seem to be very popular right now. A lot of people try something and get the wrong impressions about what SotA is. It turns out that often that something is not the best way to do it (i.e. chatting in a web interface for coding), but people don't go the extra mile to actually try what would be best for them (i.e. coding IDEs, terminal agents, etc).
Edit: thought I would include this below instead of in a separate comment:
Also, whether the models are trained purely on synthetic data or not, they suffer from these epistemological issues where they are unable to identify what they don't know. This means a very reasonable looking piece of code might be spit out for some out-of-distribution prompt where the model doesn't generalize well.
We don't know how the "thinking" models are trained at the big3, but we know that open-source models have been trained with RL. There's no human in that loop. They are aligned based on rewards, and that process is automated.
> Which "coding LLMs" are you referring to that are trained purely on verifiably correct synthetic data?
The "thinking" ones (i.e. oN series, claudeThinking, gemini2.5 pro) and their open-source equivalents - qwq, R1, qwen3, some nemotrons, etc.
From the deepseek paper on R1 we know the model was trained with GRPO, which is a form of RL (reinforcement learning). QwQ and the rest were likely trained in a similar way. (before GRPO, another popular method was PPO. And I've seen work on unsupervised DPO, where the pairs are generated by having a model generate n rollouts, verify them (i.e. run tests) and use that to guide your pair creation)
You make a fair point that there are alternatives (e.g. DeepSeek r1) which avoid most of the human feedback (my understanding is the model they serve is still aligned by human responses for safety).
I guess I have to do some more reading. I'm a machine learning engineer but don't train LLMs.
I am curious about this part though
This kind of guilt-by-association play might be the most common fallacy in internet discourse. None of us are allowed to express outrage at the bulk export of GitHub repos with zero regard for their copyleft status because some members of the software engineering community are large-scale pirates? How is that a reasonable argument to make?
The most obvious problem with this is it's a faulty generalization. Many of us aren't building large-scale piracy sites of any sort. Many of us aren't bulk downloading media of any kind. The author has no clue whether the individual humans making the IP argument against AI are engaged in piracy, so this is an extremely weak way to reject that line of argument.
The second huge problem with this argument is that it assumes that support for IP rights is a blanket yes/no question, which it's obviously not. I can believe fervently that SciHub is a public good and Elsevier is evil and at the same time believe that copyleft licenses placed by a collective of developers on their work should be respected and GitHub was evil to steal their code. Indeed, these two ideas will probably occur together more often than not because they're both founded in the idea that IP law should be used to protect individuals from corporations rather than the other way around.
The author has some valid points, but dismissing this entire class of arguments so flippantly is intellectually lazy.
Agree 100%. And generally programmers have a poor understanding of the law, especially common law as it applies in America (the country whose legal system most software licenses, especially copyleft ones, were written to integrate with).
American Common Law is an institution and continuity of practice dating back centuries. Everything written by jurists within that tradition, while highly technical, is nonetheless targeted at human readers who are expected to apply common sense and good faith in reading. Where programmers declare something in law insufficiently specified or technically a loophole, the answer is largely: this was written for humans to interpret using human reason, not for computers to compile using limited, literal algorithms.
Codes of law are not computer code and do not behave like computer code.
And following the latest AI boom, here is what the bust will look like:
1. Corporations and the state use AI models and tools in a collective attempt to obfuscate, diffuse, and avoid accountability. This responsibility two-step is happening now.
2. When bad things happen (e.g. a self-driving car kills someone, predictive algorithms result in discriminatory policy, vibe coding results in data leaks and/or cyberattacks), there will be litigation that follows the bad things.
3. The judges overseeing the litigation will not accept that AI has somehow magically diffused and obfuscated all liability out of existence. They will look at the parties at hand, look at relevant precedents, pick out accountable humans, and fine them or---if the bad is bad enough---throw them in cages.
4. Other companies will then look at the fines and the caged humans, and will roll back their AI tools in a panic while they re-discover the humans they need to make accountable, and in so doing fill those humans back in on all the details they pawned off on AI tools.
The AI tools will survive, but in a role that is circumscribed by human accountability. This is how common law has worked for centuries. Most of the strange technicalities of our legal system are in fact immune reactions to attempts made by humans across the centuries to avoid accountability or exploit the system. The law may not be fast, but it will grow an immune response to AI tools and life will go on.
I’m also a filmmaker and married to a visual artist.
I don’t touch this stuff at all. It’s all AI slop to me. I don’t want to see it, I don’t want to work with it or use it.
Without knowing the commenter above, I’ll say this: don’t assume an individual boycott is necessarily effective. If one is motivated by ethics, I think it is morally required to find effective ways to engage to shape and nudge the future. It is important to know what you’re fighting for (and against). IP protection? Human dignity through work? Agency to affect one’s life? Other aspects? All are important.
Many of the people pushing for a lot of AI stuff are the same people who have attached their name to a combination of NFTs, Blockchain, cryptocurrency, Web3 and other things I consider to be grifts/scams.
The term “AI” is already meaningless. So let’s be clear: Generative AI (GenAI) is what worries many people including a number of prominent artists.
This makes me feel like there’s work to be done if we want open source/art/the internet as we know it to remain and be available to us in the future.
It drives me a little crazy to see Mozilla adding AI to Firefox instead of yelling about it at every opportunity. Do we need to save them too?
GenAI just works. People don't need to be pushed using it and continue using it.
OpenAI has 500 Million active users weekly.
As an example, look at how cars are advertised. If you only paid attention to marketing, you would think everyone is zipping around winding mountain roads in their SUVs, loaded up with backcountry gear. This is not accurate, but nonetheless SUVs are dominant.
Who knows, we might find out in a month that this shit we're doing is really unsafe and is a really bad idea, and doesn't even work ultimately for what we'd use it for. LLMs already lie and blackmail.
Second. People can do what they want? This may not even be self-consistent. Humans are complex and operate imperfectly across time horizons and various unclear and even contradictory goals.
Third. Assuming people have some notion of consistency in what they want, can people do what they want? To some degree. But we live in a world of constraints. Consider this: if one only does what one wants, what does that tell you? Are they virtuous? Maybe, maybe not: it depends on the quality of their intentions. Or consider the usual compromising of one’s goals: people often change what they want to match what is available. Consider someone in jail, a parent who lost a child, a family in a war zone, or someone who isn’t able to get the opportunity to live up to their potential.
Fourth, based on #3 above, we probably need to refine the claim to say this: people strive to choose the best action available to them. But even in this narrower framing, saying “people do what they can” seems suspect to me, to the point of being something we tell ourselves to feel better. On what basis can one empirically measure how well people act according to their values? I would be genuinely interested in attempts to measure this.
Fifth, here is what I mean by engaging with a technology you disagree with: you have to engage in order to understand what you are facing. You should clarify and narrow your objections: what aspects of the technology are problematic? Few technologies are intrinsically good or evil; it is usually more about how they are used. So mindfully and selectively use the technology in service of your purposes. (Don’t protest the wrong thing out of spite.) Focus on your goals and make the hard tradeoffs.
Here is an example of #5. If one opposes urban development patterns that overemphasize private transit, does this mean boycotting all vehicles? Categorically refusing to rent a car? That would miss the point. Some of one’s best actions involve getting involved in local politics and advocacy groups. Hoping isolated individual action against entrenched interests will move the needle is wishful thinking. My point is simple: choose effective actions to achieve your goals. Many of these goals can only be achieved with systematic thinking and collective action.
Just as I don't need to understand the finer points of extreme bigotry to be opposed to it, we don't need to be experts on LLMs to be opposed to the well-heeled and breathless hype surrounding it, and choose to not engage with it.
>> If one is motivated by ethics, I think it is morally required to find effective ways to engage to shape and nudge the future.
Put another way, the claim could be stated as: "if one is motivated by ethics, then one should pay attention to consequences". Yes, this assumes one accepts consequentialism to some degree, which isn't universally accepted nor easy to apply in practice. Still, I don't think many people (even those who are largely guided by deontology) completely reject paying attention to consequences.
If by the last "it" you mean "the hype", then I agree.
But -- sorry if I'm repeating -- I don't agree with conflating the tools themselves with the hype about them. It is fine to not engage with the hype. But it is unethical to boycott LLM tooling itself when it could serve ethical purposes. For example, many proponents of AI safety recommend using AI capabilities to improve AI safety research.
This argument does rely on consequentialist reasoning, which certainly isn't the only ethical game in town. That said, I would find it curious (and probably worth unpacking / understanding) if one claimed deontological reasons for avoiding a particular tool, such as an LLM (i.e. for intrinsic reasons). To give an example, I can understand how some people might say that lying is intrinsically wrong (though I disagree). But I would have a hard time accepting that _using_ an LLM is intrinsically wrong. There would need to be deeper reasons given: correctness, energy usage, privacy, accuracy, the importance of using one's own mental faculties, or something plausible.
If this was a legitimate moral argument, we'd never make any social progress.
Plagiarism is taking somebody else's work and claiming that you yourself created it. It is a form of deception, depriving another of credit while selling their accomplishments as your own.
Piracy on the other hand is the violation of a person's monopoly rights on distributing certain works. This may damage said person's livelihood, but the authorship remains clear.
I don't think that is an accurate representation of the tech community. On the other hand, I do think TFA is making a reasonable statistical representation of the tech community (rather than a "guilt-by-association" play) which could be rephrased as:
The overriding ethos in HN and tech communities has clearly been on the "information wants to be free" side. See: the widespread support of open source and, as your comment itself mentions, copyleft. Copyleft, in particular, is famously based on a subversion of intellectual property (cf "judo throw") to achieve an "information wants to be free" philosophy.
Unsurprisingly, this has also manifested countless times as condoning media piracy. Even today a very common sentiment is, "oh there are too many streaming services, where's my pirate hat yarrrr!"
Conversely, comments opposing media piracy are a vanishingly tiny, often downvoted, minority. As such, statistically speaking, TFA's evaluation of our communities seems to be spot on.
And, now the same communities are in an uproar when their information "wants to be free". The irony is definitely rich.
And for that reason, I think your version exposes the flaw even more thoroughly: you can't reasonably merge a data set of stats on people's opinions on AI with a data set of stats on people's opinions on IP in the way that you're proposing.
To throw out random numbers as an example of the flaw: If 55% of people on HN believe that IP protection for media should not exist and 55% believe that GitHub stole code, it's entirely possible that TFA's condemnation only applies to 10% of the total HN population that holds the supposedly conflicting belief even though HN "statistically" believes both things.
And that's before we get into the question of whether there's actually a conflict (there's not) and the question of whether anyone is accurately measuring the sentiment of the median HN user by dropping into various threads populated by what are often totally disjoint sets of users.
Now, you could say that capitalism is a function of the policies enacted by the country, which aren't a thing for online forums. But these policies are a reflection of the majority votes of the population, and votes are a thing on forums. Even a casual observation, starting from the earliest Slashdot days to modern social media, shows that the most highly upvoted and least contested opinions align with the "information wants to be free" philosophy.
To get a bit meta, you can think of this rhetoric along the lines of the following pattern which is common on social media:
Hacker News: "All software is glorified CRUD boilerplate! You are not a special snowflake! Stop cosplaying Google!"
Also Hacker News: "AI is only good for mindless boilerplate! It's absolutely useless for any novel software! AI boosters are only out to scam you!"
The sentiment is obviously not uniform and is shifting over time, even in this very thread... but it does ring true!
Hacker News doesn't have opinions. Individuals on Hacker News have opinions. Different sets of individuals comment and vote on different threads. There's zero reason to suppose that it's the same people expressing both ideas or even that it's the same people voting on those ideas. To the contrary, I've spent enough time on this forum (way too much time) to know that there are whole sub-communities on HN that overlap very imperfectly. We self-select based on titles, topics, and even on time of day.
The only thing this kind of logic is good for is producing fallacious arguments dismissing someone's opinion because someone else holds a contradicting opinion.
Now, generalizations can be faulty, but whether they ring true is a good proxy for their usefulness. And this point in TFA rings very true. Beyond just Hacker News or other social media, look at the blogosphere, industry "thought leaders", VCs, organizations like the EFF, startups, tech companies and their executives (and if you look closely, their lobbyists) on any matter involving intellectual property rights. The average reality that emerges is pretty stark and can be summarized as "what's mine is mine, what's yours is everybody's." Sure, many of us would disagree with that, but that is what the ultimate outcome is.
As such, I read that point not as singling out a specific set of people, but as an indictment of the tech community, and indeed, the industry as a whole.
I think I missed a story? Is GitHub somehow stealing my code if I publish it there under GPL or similar? Or did they steal some specific bit of code in the past?
https://www.kaggle.com/competitions/konwinski-prize/
Currently, the #1 spot sits at a score of 0.09, not 0.9. A far cry from being useful. I know that open source models are not as good as closed source, but still, we're a long way from LLMs being good for code on their own.
And that supports OP's point - these tools aren't AGI, they produce trash that needs evaluation, but they're still useful.
that would probably cover it, and you don't need "AI" to do that
The models that are both open source and quantized so that they can fit within that much memory are going to be significantly less capable than full scale frontier closed source models, I wonder how the latter would perform.
The best intellisense and code completion tools would score 0.00 on that benchmark. Those were the only tools we were using just a couple of years ago. 0.09 is a tremendous jump and the improvements will accelerate!
Do you think humans have achieved peak intelligence? If so, why, and if not, then why shouldn't you expect artificial forms of intelligence to improve up to and even surpass humans abilities at some point?
Edit: to clarify, I'm not necessarily assuming unbounded acceleration, but tools always start out middling, improvements accelerate as we figure out what works and what doesn't, and then they taper off. We're just starting on the acceleration curve for AI.
You know what doesn't constantly nag for my attention? Dead text files on disk. They just sit there, waiting for me to grace them with my presence. That's the way it should be. Computers serve me, not the other way around.
I do not have unlimited funds to plug in some token and burn a bunch of money when writing code.
I am gpu poor. I'm lucky that 8gb vram can run the smallest models. But the output is so poor that I lose out to anyone using a hosted service.
If anything this article shows that building great programs is less democratized than it once was.
OpenAI and Anthropic both have free plans as well.
I use a prompt that convinces the LLM they are a Staff Engineer at Google and care deeply about best practices and writing clean, quality code.
While not perfect, it does improve the code quality.
I tried Claude 4 agents on:
- Large C codebase (new feature and bugfix)
- Small rust codebase (new feature)
- Brand new greenfield frontend for an in-spec and documented openAPI API
- Small fixes to an existing frontend
It failed _dramatically_ in all cases. Maybe I'm using this thing wrong but it is devin-level fail. Gets diffs wrong. Passes phantom arguments to tools. Screws up basic features. Pulls in hundreds of line changes on unrelated files to refactor. Refactors again and again, over itself, partially, so that the uncompleted boneyard of an old refactor sits in the codebase like a skeleton (those tokens are also sent up to the model).
It genuinely makes an insane, horrible, spaghetti MESS of the codebase. Any codebase. I expected it to be good at Svelte and SolidJS since those are popular JavaScript frameworks with lots of training data. Nope, it's bad. This was a few days ago, Claude 4. Seriously, seriously, people, what am I missing here with this agents thing? They are such gluttonous eaters of tokens that I'm beginning to think these agent posts are paid advertising.
So I'd say Claude 4 agents today are at smart-but-fresh-intern level of autonomy. You still have to do the high-level planning and task breakdown, but they can execute on tasks (say, requiring 10-200 lines of code excluding tests). Asking them to write much more code (200+ lines) often requires a lot of follow-ups and ends in disappointment.
A significant portion of my prompts are writing and reading from .md files, which plan and document the progress.
When I start a new feature, it begins with: We need to add a new feature X that does ABC, create a .md in /docs to plan this feature. Ask me questions to help scope the feature.
I then manually edit the feature-x.md file, and only then tell the tool to implement it.
Also, after any major change, I say: Add this to docs/current_app_understanding.md.
Every single chat starts with: Read docs/current_app_understanding.md to get up to speed.
The really cool side benefit here is that I end up with solid docs, which I admittedly would have never created in the past.
You don't exactly need to know prompting, you just need to know how to ask the AI to help you prompt it.
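For example, here is a minimal sketch of that two-step pattern in Python. It assumes you have Simon Willison's llm CLI installed with a default model configured; the task string and the meta-prompt wording are mine, not a recipe.
import subprocess

def ask(prompt):
    # Shell out to the `llm` CLI; swap in whatever chat client you prefer.
    return subprocess.run(["llm", prompt], capture_output=True, text=True, check=True).stdout

task = "Add retry with exponential backoff to our HTTP client wrapper."  # hypothetical task

# Step 1: have the model draft the prompt for you.
meta_prompt = (
    "I want a coding assistant to do this task well:\n"
    f"{task}\n"
    "Write the prompt I should give it. Include the context it should ask me for, "
    "the constraints it should respect, and tell it to ask clarifying questions first."
)
draft_prompt = ask(meta_prompt)

# Step 2: review and edit the drafted prompt yourself, then send it as the real request.
print(draft_prompt)
The point is only the shape: the model drafts the prompt, you edit it, and then the edited prompt does the real work.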
With my async agent I do not care about how easy it is for me; it’s easier to tell the agent to do the workflow and come back to it later when I’m ready to review it. If it’s a good change I approve the PR, if not I close it.
We’ve built tools to help us with the first part, frameworks with the second, architecture principles with the third, and software engineering techniques for the fourth. Where do LLMs help?
An interesting thing about many of these types of posts is they never actually detail the tools they use and how they use them to achieve their results. It shouldn’t even be that hard for them to do, they could just have their agent do it for them.
You may be right. The author of this one even says if you spend time prettying your code you should stop yak shaving. They apparently don't care about code quality.
brought to you by fly.io, where the corporate blog literally tells you to shove your concerns up your ass:
> Cut me a little slack as I ask you to shove this concern up your ass.
I'm 100% certain most if not all of them are, there is simply too much money flying around and I've seen things that marketing does in the past for way less hyped products. Though in this specific case I think the writer may simply be shilling AI to create demand for their service. Pay us monthly to one click deploy your broken incomplete AI slop. The app doesn't work? No problem just keep prompting harder and paying us more to host/build/test/deploy it...
I've also tried the agent thing, and still do, with only moderate success. Cursor, Claude Squad, Goose, Dagger AI agents. In other words, all the new hotness, all with various features claiming to solve the fact that agents don't work. Guess what? They still don't.
But hey, this is HN? Most of the posters are tech-fearing Luddites, right? All the contention on here must mean our grindset is wrong and we are not prompting hard enough.
There is even one shill, Ghuntly, that claims you need to be "redlining" AI at the cost of $500-$1000 per day to get the full benefits. LOL, if that is not a veiled advertisement I don't know what is.
Why are all your public projects "joke/toy projects" if AI is so awesome and production-ready? My experience says the same. Your work backs up my experience rather than your words.
To avoid only being snark: I think all software is about power/control, and software has allowed unprecedented concentration of power, which is why it resists being formalized the way other industries are. No one with power wants restrictions on their power/software. Ultimately AI is good for small-ish projects and a productivity multiplier (eventually). I think it will lead to a new revolution in displacing the current incumbents in the business world that are stagnant on vast proprietary software systems that previously could not be unseated. Small players will be able to codify/automate their business to make it competitive with big players. So I'm not "anti-AI".
AI will simultaneously rot the proprietary-software advantage from the inside out, as companies are further convinced that AI can solve their problem of having to pay people to maintain their software.
Having a "disclosures" page on a personal website is a pretty strong quality signal for me - it's inspired me to set a task to add my own.
As with all of these things, the trick is to establish credibility over time. I've been following Geoff for a few years. Given his previous work I think he has integrity and I'm ready to believe his disclosures page.
However we seem to live in a time where integrity is basically valued at zero, or more commonly as something to "bank" so you can cash in for an enormous payoff when the time comes. I agree he seems authentic, therefore valuable. Which means an AI company can come and offer him 7-8 figures to build hype. I think it's hard for people to truly grasp just how much money is flying around in hype cycles. Those numbers are not unrealistic. That's set-for-life money, and not many are in a position to refuse that kind of wealth. (He lives in a van, just saying.)
I hope he is one of the few authentic people left but the world has left me jaded.
If details of that deal leak, it's a big embarrassment for the company.
In the USA it is also illegal. There are substantial FTC fines to worry about. If it affects the stock price it could be classified as securities fraud (Matt Levine will happily tell you that "everything is securities fraud").
However, a counterargument to all this:
Does it matter if the code is messy?
None of this matters to the users and people who only know how to vibe code.
It matters proportionally to the amount of time I intend to maintain it for, and the amount of maintenance expected.
A prompt like “Write a $x program that does $y” is generally going to produce some pretty poor code. You generally want to include a lot of details and desires in your prompt. And include something like “Ask clarifying questions until you can provide a good solution”.
A lot of the people who complain about poor code generation use poor prompting.
Simon Willison has some great examples in his blog and on his GitHub. Check out Karpathy’s YouTube videos as well.
As with any other project, it’s better to specify your wants and needs than to let someone, or an LLM, guess.
I've been developing my prompting skills for nearly three years now and I still constantly find new and better ways to prompt.
I also consider knowing what "use a reasoning model" means to be part of that skill!
I just tell the AI what I want, with sufficient context. Then I check the reasoning trace to confirm it understood what I wanted. You need to be clear in your prompts, sure, but I don't really see it as "prompt engineering" any more.
Writing code is one thing that models can do when wired properly, and you can get a powerful productivity boost, but wielding the tools well is a skill of its own, and results will vary by task, with each model having unique strengths. The most important skill is understanding the limitations.
Based on your task descriptions and the implied expectation, I'm unsurprised that you are frustrated with the results. For good results with anything requiring architecture decisions have a discussion with the model about architecture design, before diving in. Come up with a step by step plan and work through it together. Models are not like people, they know everything and nothing.
This 16-minute, expletive-filled, edgy-old-man-trying-too-hard-to-be-cool article could easily be dismissed as yet another AI creed that somehow found its way to the top of the HN front page.
That should have been the start of the essay.
New title: LLMs make Go programming feel good
To be sure, you definitely should not blindly trust the people who have a stake in selling AI products. However, you probably should trust the people who have no stake in selling AI, but are using it to accelerate themselves. The problem is, these people are probably too busy building things to spare any time convincing you. And in fact, it is a competitive advantage for them if others don't figure it out.
Here's my take: Everyone needs to figure out how AI works for them. The thing with AI is that it is a technology that behaves like people. It has the speed, efficiency and scale of computers but the fallibility and quirks of humans. This is why it's so confusing to discuss, and why folks have such varying experiences. The right approach to working with AI is like working with people: understand what they're good at and where they are weak, and then work with them to achieve your goals.
This will require using it "in anger" on many non-trivial tasks over a significant period of time, which will take persistence and patience. Yes, the hype is so over-the-top that you might not want to invest your valuable time on it. But you owe it to yourself to ensure you can fully capitalize on the sea change that is happening.
If it helps, I (like the OP) have no stake in selling AI and I posted this comment about my experiences on the other thread about the AI coding -- https://news.ycombinator.com/item?id=44163543 -- The upshot is, AI has enabled me to produce innovative (probably cutting edge) work in domains I had little prior background in. And I've not even used agents yet!
That is the real takeaway.
I use LLMs for dozens of things. It took time to find how model strengths best connect with mine.
People who only evaluate models for one purpose over short timespans are going to have a hard time.
It’s like models come from a different cognitive & communication culture. Eliminating that barrier takes more experimentation and experience than logic.
I think their creativity is also often a function of the prompter's creativity.
The harder you (creatively) push them into a creative corner, the more genuinely creative they get.
Not unlike how a good teacher can get unexpected positive results from students when they are channeled and pushed.
Let me know when I can use one of these without violating ITAR/EAR.
I think this is my key point of disagreement. Professional software developers often don't see themselves as artisans, because Steve Jobs's unseen feet are implemented through our implicit standards of ourselves. I don't go out of my way to be a perfectionist on the job, but neither will I try to push 1000 lines of spaghetti to solve a minor problem that seems like it ought to be a simple fix.
I've worked with people in the past who don't have this instinct, and it's my experience that their productivity is quite low if not negative, even though they often perceive themselves to be 10x developers who can handle all kinds of edge cases super fast. I suspect that most AI-generated code today ends up in this niche.
It seems like you're pointing out a consequence, not a counter argument.
What I’ve been saying to my friends for the last couple of months has been, that we’re not going to see coding jobs go away, but we’re going to run into a situation where it’s harder to grow junior engineers into senior engineers because the LLMs will be doing all the work of figuring out why it isn’t working.
This will IMO lead to a “COBOL problem” where there are a shortage of people with truly deep understanding of how it all fits together and who can figure out the line of code to tweak to fix that ops problem that’s causing your production outage.
I’m not arguing for or against LLMs, just trying to look down the road to consequences. Agentic coding is going to become a daily part of every developer’s workflow; by next year it will be table stakes - as the article said, if you’re not already doing it, you’re standing still: if you’re a 10x developer now, you’ll be a 0.8x developer next year, and if you’re a 1x developer now, without agentic coding you’ll be a 0.1x developer.
It’s not hype; it’s just recognition of the dramatic increase in productivity that is happening right now.
It’s like reductio ad absurdum, but without the logical consequence of the argument being incorrect, just bad.
You see it all the time, especially when it comes to predictions. The whole point of this article is coding agents are powerful and the arguments against this are generally weak and ill-informed. Coding agents having a negative impact on skill growth of new developers isn’t a “fundamental mistake” at all.
I'm not much of a user of LLMs for generating code myself, but this particular analogy isn't a great fit. The one redeeming quality is that compiler output is deterministic or at least repeatable, whereas LLMs have some randomness thrown in intentionally.
With that said, both can give you unexpected behavior, just in different ways.
When talking about undefined behavior, the only documentation is a shrug emoticon. If you want a working program, arbitrary undocumented behavior is just as bad as incorrectly documented behavior.
And despite the triggers being documented, they're very very hard to avoid completely.
Unexpected as in "I didn't know" is different than unexpected as in "I can't predict". GCC optimizations is in the former camp and if you care to know, you just need to do a deep dive in your CPU architecture and the gcc docs and codebase. LLMs is a true shot in the dark with a high chance miss and a slightly lower chance of friendly fire.
Nah, treating undefined behavior as predictable is a fool's errand. It's also a shot in the dark.
It's effectively impossible to rely on. Checking at the time of coding, or occasionally spot checking, still leaves you at massive risk of bugs or security flaws. It falls under "I can't predict".
The same thing happens with optimization. They usually warn about the gotchas, and it’s easy to check if the errors will bother you or not. You don't have to do an exhaustive list of errors when the classes are neatly defined.
But I'm talking about code that has undefined behavior. If there is any at all, you can't reliably learn what optimizations will happen or not, what will break your code or not. And you can't check for incorrect optimization in any meaningful way, because the result can change at any point in the future for the tiniest of reasons. You can try to avoid this situation, but again it's very hard to write code that has exactly zero undefined behavior.
When you talked about doing "a deep dive in your CPU architecture and the gcc docs and codebase", that is only necessary if you do have undefined behavior and you're trying to figure out what actually happens. But it's a waste of effort here. If you don't have UB, you don't need to do that. If you do have UB it's not enough, not nearly enough. It's useful for debugging but it won't predict whether your code is safe into the future.
To put it another way, if we're looking at optimizations listing gotchas, when there's UB it's like half the optimizations in the entire compiler are annotated with "this could break badly if there's UB". You can't predict it.
If AI gets to that level we will indeed have a sea change. But I think the current models, at least as far as I've seen, leave open to question whether they'll ever get there or not.
Or if I'm wrong about the last bit, maybe it never was important.
Most of the people I work with, however, just understand the framework they are writing and display very little understanding or even curiosity as to what is going on beneath the first layer of abstraction. Typically this leaves them high and dry when debugging errors.
Anecdotally I see a lot more people with a shallow expertise believing the AI hype.
As a teen I used to play around with Core Wars, and my high school taught 8086 assembly. I think I got a decent grasp of it, enough to implement quicksort in 8086 while sitting through a very boring class, and test it in the simulator later.
I mean, probably few people ever need to use it for something serious, but that doesn't mean they don't understand it.
The abstraction over assembly language is solid; compilers very rarely (if at all) fail to translate high level code into the correct assembly code.
LLMs are nowhere near the level where you can have almost 100% assurance that they do what you want and expect, even with a lot of hand-holding. They are not even a leaky abstraction; they are an "abstraction" with gaping holes.
If all of mankind lost all understanding of registers overnight, it'd still affect modern programming (eventually)
Some of the arguments in the article are so bizarre that I can’t believe they’re anything other than engagement bait.
Claiming that IP rights shouldn’t matter because some developers pirate TV shows? Blaming LLM hallucinations on the programming language?
I agree with the general sentiment of the article, but it feels like the author decided to go full ragebait/engagement bait mode with the article instead of trying to have a real discussion. It’s weird to see this language on a company blog.
I think he knows that he’s ignoring the more complex and nuanced debates about LLMs because that’s not what the article is about. It’s written in inflammatory style that sets up straw man talking points and then sort of knocks them down while giving weird excuses for why certain arguments should be ignored.
A lot of people are misunderstanding the goal of the post, which is not necessarily to persuade them, but rather to disrupt a static, unproductive equilibrium of uninformed arguments about how this stuff works. The commentary I've read today has to my mind vindicated that premise.
Everything about LLMs and generative AI is getting so mushed up by people pulling it in several directions at once, marketing clouding the water, and the massive hyperbole on both sides, it's nearly impossible to understand if we're even talking about the same thing!
- It cannot write tests because it doesn't understand intent
- Actually it can write them, but they are "worthless"
- It's just predicting the next token, so it has no way of writing code well
- It tries to guess what code means and will be wrong
- It can't write anything novel because it can only write things it's seen
- It's faster to do all of the above by hand
I'm not sure if it's the issue where they tried Copilot with GPT-3.5 or something, but anyone who uses Cursor daily knows all of the above is false; I make it do these things every day and it works great. There was another comment I saw here or on Reddit about how everyone needs to spend a day with Cursor and get good at understanding how prompting + context works. That is a big ask, but I think the savings are worth it when you get the hang of it.
Which argument? The one dismissing all arguments about IP on the grounds that some software engineers are pirates?
That argument is not only unpersuasive, it does a disservice to the rest of the post and weakens its contribution by making you as the author come off as willfully inflammatory and intentionally blind to nuance, which does the opposite of breaking the unproductive equilibrium. It feeds the sense that those in the skeptics camp have that AI adopters are intellectually unserious.
I know that you know that the law and ethics of IP are complicated, that the "profession" is diverse and can't be lumped into a cohesive unit for summary dismissal, and that there are entirely coherent ethical stances that would call for both piracy in some circumstances and condemnation of IP theft in others. I've seen enough of your work to know that dismissing all that nuance with a flippant call to "shove this concern up your ass" is beneath you.
Yeah... this was a really, incredibly horseshit argument. I'm all for a good rant, but goddamn, man, this one wasn't good. I would say "I hope the reputational damage was worth whatever he got out of it", but I figure he's been able to retire at any time for a while now, so that sort of stuff just doesn't matter anymore to him.
"A whole bunch of folks ignore copyright terms, so all complaints that 'Inhaling most-to-all of the code that can be read on the Internet with the intent to make a proprietary machine that makes a ton of revenue for the owner of that machine and noone else is probably bad, and if not a violation of the letter of the law, surely a violation of its spirit.' are invalid."
When I hear someone sincerely say stuff that works out to "Software licenses don't matter, actually.", I strongly reduce my estimation of their ability to reason well and behave ethically. Does this matter? Probably not. There are many folks in the field who hold that sort of opinion, so it's relatively easy to surround yourself with likeminded folks. Do you hold these sorts of opinions? Fuck if I know. All I know about is what you wrote today.
Anyway. As I mentioned, you're late-career in what seems to be a significantly successful career, so your reputation absolutely doesn't matter, and all this chatter is irrelevant to you.
To quote from your essay:
"But if you’re a software developer playing this card? Cut me a little slack as I ask you to shove this concern up your ass. No profession has demonstrated more contempt for intellectual property.
The median dev thinks Star Wars and Daft Punk are a public commons. The great cultural project of developers has been opposing any protection that might inconvenience a monetizable media-sharing site. When they fail at policy, they route around it with coercion. They stand up global-scale piracy networks and sneer at anybody who so much as tries to preserve a new-release window for a TV show."
Man, you might not see the resemblance now, but if you return to it in three to six months, I bet you will.
Also, I was a professional musician in a former life. Given the content of your essay, you might be surprised to know how very, very fast and loose musicians as a class play with copyright laws. In my personal experience, the typical musician paid for approximately zero of the audio recordings in their possession. I'd be surprised if things weren't similar for the typical practitioner of the visual arts.
I think artists are very cavalier about IP, on average. Many draw characters from franchises that do not allow such drawing, and often directly profit by selling those images. Do I think that's bad? No. (Unless it's copying the original drawing plagiaristically.) Is it odd that most of the people who profit in this way consider generative AI unethical copyright infringement? I think so.
I think hypocrisy on the issue is annoying. Either you think it's cool for LLMs to learn from code and text and images and videos or you don't think any of it is fine. tptacek should bite one bullet or the other.
What about the millions of software developers who have never even visited a pirate site, much less built one?
Are we including the Netflix developers working actively on DRM?
How about the software developers working on anti-circumvention code for Kindle?
I'm totally perplexed at how willing you are to lump a profession of more than 20 million people all into one bucket and deny all of them, collectively, the right to say anything about IP. Are doctors not allowed to talk about the societal harms of elective plastic surgery because some of them are plastic surgeons? Is anyone with an MBA not allowed to warn people against scummy business practices because many-to-most of them are involved in dreaming those practices up?
This logic makes no sense, and I have to imagine that you see that given that you're avoiding replying to me.
My favorite was suggesting that people select the programming language based on which ones LLMs are best at. People who need an LLM to write code might do that, but no experienced developer would. There are too many other legitimate considerations.
At the margin means that both languages, or frameworks or whatever, are reasonably appropriate for the task at hand. If you are writing firmware for a robot, then the LLM will be less helpful, and a language such as Python or JS which the LLM is good at is useless.
But Thomas's point is that arguing that LLMs are not useful for all languages is not the same as saying that are not useful for any language.
If you believe that LLM competencies are not actually becoming drivers in what web frameworks people are using, for example, you need to open your eyes and recognize what is happening instead of what you think should be happening.
(I write this as someone who prefers SvelteJS over React - but LLM's React output is much better. This has become kind of an issue over the last few years.)
Taking your React example: if we were a couple of years ahead on LLMs, jQuery might now be the preferred tool due to AI adoption through consumption.
You can apply this to other fields too. It's quite possible that AIs will make movies, but the only reliably well produced ones will be superhero movies... (I'm exaggerating for effect)
Could AI be the next Cavendish banana? I'm probably being a bit silly though...
I'd argue that the Web development world has been choosing tooling based largely on popularity for like at least a decade now. I can't see how tooling selection could possibly get any worse for that section of the profession.
The argument is that we lose this diversity as more people rely on AI and choose what AI prefers.
Of course you could disagree with my prediction and argue that these big tech companies are going to build MASSIVE GPU farms the size of the Tesla Gigafactory which can run godlike AI where nobody can compete, but if we get to that point I feel like we will have bigger problems than "AI React code is better than AI SolidJS code".
Feel free to substitute Y instead of Rust if you want; it's just that I know many people argue Rust is hard to use, so I feel the concreteness is a good place to start.
It is certainly true that at least some projects choose languages based on, or at least influenced by, how easy it is to hire developers fluent in that language.
That’s literally the strawman.
> Claiming that IP rights shouldn’t matter because some developers pirate TV shows?
I didn't see him claiming that IP rights shouldn't matter, but rather that IP rights don't matter in the face of this type of progress, they never have since the industrial revolution. It's hypocritical (and ultimately ineffectual) for software people to get up on a high horse about that now just to protect their own jobs.
And lest you think he is an amoral capitalist, note the opening statement of the section: "Artificial intelligence is profoundly — and probably unfairly — threatening to visual artists in ways that might be hard to appreciate if you don’t work in the arts.", indicating that he does understand and empathize with the most material of harms that the AI revolution is bringing. Software engineers aren't on that same spectrum because the vast majority of programming is not artisanal creative work; it's about precise automation of something as cheaply as possible.
Or this one:
> Blaming LLM hallucinations on the programming language?
Was he "blaming"? Or was he just pointing out that LLMs are better at some languages than others? He even says:
> People say “LLMs can’t code” when what they really mean is “LLMs can’t write Rust”. Fair enough!
Which seems very truthy and in no way is blaming LLMs. Your interpretation is taking a some kind of logical / ethical leap that is not present in the text (as far as I can tell).
That's irrelevant. Copyright and software licensing terms are still enforced in the US. Unless the software license permits it, or it's for one of a few protected activities, verbatim reproduction of nontrivial parts of source code is not legal.
Whether the inhalation of much (most? nearly all?) of the source code available on the Internet, for the purpose of making a series of programming machines that bring in lots and lots of revenue for the companies that own those machines, is fair use or infringing commercial use has yet to be determined. Scale is important when determining whether or not something should be prohibited or permitted... which is something that many folks seem to forget.
I wonder this too. As someone who is entirely self-taught, escaping “tutorial hell” was the hardest part of the journey when I started, and it took quite a bit of both encouragement and sheer willpower. Not sure I would have ever gone beyond that if I had had LLMs.
I worry for Juniors, and either we’ll need to find a way to mentor them past the vibe coding phase, or we hope that AI gets good enough before we all retire.
All AI is going to do is create a new class of programmer, such that the people who know the details will end up being more valuable.
Shakespeare was also popular because he was published as books became popular. Others copied him.
On the other hand, LLMs are highly nondeterministic. They often produce correct output for simple things, but that's because those things are simple enough that we trust the probability of it being incorrect is implausibly low. But there's no guarantee that they won't get them wrong. For more complicated things, LLMs are terrible and need very well specified guardrails. They will bounce around inside those guardrails until they make something correct, but that's more of a happy accident than a mathematical guarantee.
LLMs aren't a level of abstraction, they are an independent entity. They're the equivalent of a junior coder who has no long term memory and thus needs to write everything down and you just have to hope that they don't forget to write something down and hope that some deterministic automated test will catch them if they do forget.
If you could hire an unpaid intern with long term memory loss, would you?
It's clearly a deficiency. And that's why one of the next generations of AIs will have long term memory and online learning. Although even the current generation of the models shows signs of self-correction that somewhat mitigate the "random walk" you've mentioned.
But, seriously, while it's not an easy task (otherwise we'd have seen it done already), it doesn't seem to be a kind of task that requires a paradigm shift or some fundamental discovery. It's a search in the space of network architectures.
Of course, we are talking about something that hasn't been invented yet, so I can't provide ironclad arguments like with, say, fusion power where we know what to do and it's just hard to do.
There is circumstantial evidence though. Complex problem solving skills that evolved in different groups of species: homo, corvidae, octopoda. Which points at either existence of multiple solutions to the problem of intelligence or at not that high complexity of a solution.
Anyway, with all the resources that are put into the development of AI will see the results (one way or another) soon enough. If long-term memory is not incorporated into AIs in about 5 years, then I'm wrong and it's indeed likely to be a fundamental problem with the current approach.
If you could hire an army of unpaid interns with long term memory loss who work 24/7, would you?
The physics engine in the game Trackmania is deterministic: this means that you can replay the same inputs and get the same output; but it doesn’t mean the output always makes sense: if you drive into a wall in a particular way, you can trigger what’s called an uberbug, where your car gets flung in a somewhat random direction at implausibly high speed. (This sort of thing can lead to fun tool-assisted speedruns that are utterly unviable for humans.)
The abstractions part you mention: there’s the key. Good abstractions make things predictable. Turn the steering wheel to the left, head left. There are still odd occasions when I will mispredict what some code in a language like Rust, Python or JavaScript will do, but they’re rare. By contrast, LLMs are very unpredictable, and you will fundamentally never be able to mentally model what they achieve.
Dunno. Money is probably going to be a huge incentive.
I see the same argument everywhere. Like animators getting their start tweening other peoples content. AI is great at tweening and likely to replace farms of juniors. But companies will need seniors to direct animation, so they will either have to pay a lot of money to find them or pay a lot of money to train them.
I think what you've said is largely true, but not without a long period of mess in between.
But unlike animation, where the demand for the art can just disappear, I don't think the demand for software engineers will disappear. Same thing with musicians. Young engineers might just be jobless or in training mode for a much longer period of time before they can make actual living money.
Good thing is, as far as I know, Kyoto Animation managed to avoid this issue by having in-house training, growing their own talent pools.
[1]: https://blog.sakugabooru.com/2023/03/31/the-long-quest-to-fi...
LLMs are so-so coders but incredible teachers. Today's students get the benefit of copying and pasting a piece of code into an LLM and asking, "How does this work?"
There are a lot of young people who will use LLMs to be lazy. There are also a lot who will use them to feed their intellectual curiosity.
When you're a college student, the stakes feel so high. You have to pass this class or else you'll have to delay graduation and spend thousands of dollars. You have to get this grade or else you lose your grant or scholarship. You want to absorb knowledge from this project (honestly! you really do) but you really need to spend that time studying for a different class's exam.
"I'm not lazy, I'm just overwhelmed!" says the student, and they're not wrong. But it's very easy for "I'm gonna slog through this project" to become "I'm gonna give it a try, then use AI to check my answer" and then "I'm gonna automate the tedious bits that aren't that valuable anyway" and then "Well I'll ask ChatGPT and then read its answer thoroughly and make sure I understand it" and then "I'll copy/paste the output but I get the general idea of what it's doing."
Sure, it's a different level, but it's still more or less the same thing. I don't think you can expect to learn how to code by only ever using LLMs, just like you can't learn how to code by only ever using intellisense.
Oh, GH Copilot was adding citation capabilities that were going to be in production around this time? Any progress on that front?
Thanks.
I'm a skeptic because the megacorps have decided AI slop is the future and it's going to push us deeper into hypercapitalist dystopia.
Programming as a career is going to end up as an AI slop janny. You aren't going to be paid more if your job is easier or whatever; you're gonna get paid less (and then end up doing more atomised work) because "an AI can do it". That's the direction that everything seems to be heading.
> But AI is also incredibly — a word I use advisedly — important. It’s getting the same kind of attention that smart phones got in 2008, and not as much as the Internet got. That seems about right.
However, I just don't think the AI coding part is that interesting or future-thinking. We're seeing so much more progress in semantic search, tool calling, general purpose uses, robotics, I mean, DeepMind just won a Nobel for goodness' sake.
Don't get me wrong, I use ChatGPT to write all kinds of annoying boilerplate, and it's not too bad at recalling weird quirks I don't remember (yes, even for Rust). But hard problems? Real problems? Zero shot. Novel problems? No way.
> But I’ve been first responder on an incident and fed 4o — not o4-mini, 4o — log transcripts, and watched it in seconds spot LVM metadata corruption issues on a host we’ve been complaining about for months.
I'm going to go ahead and press (X) to doubt on this anecdote. You've had an issue for months and the logs were somehow so arcane, so dense, so unparseable, no one spotted these "metadata corruption issues?" I'm not going to accuse anyone of blatant fabrication, but this is very hard to swallow.
Listen, I also think we're on the precipice of re-inventing how we talk to our machines; how we automate tasks; how we find and distribute small nuggets of data. But, imo, coding just ain't it. Donald Knuth calls computer programming an art, and to rob humanity of effecting not just coding—but any art, I'd argue—would be the most cardinal of sins.
I have this problem with a lot of LLM miracle anecdotes. There’s an implication that the LLM did something that was eluding people for months, but when you read more closely they don’t actually say they were working on the problem for months. Just that they were complaining about it for months.
On the other hand, we've all probably had the experience of putting out a fire and wanting time to track down an issue only to be told to not bother since "everything is working now". Sometimes you spend months complaining about something because the people you're complaining to don't have the time to dive into an issue. Even if it would have taken mere hours for a human to hunt down the issue, someone still has to be given those hours to work on it. By contrast, copying a bunch of logs into an AI is nearly free.
Eh, I've worked on projects where because of un-revisited logging decisions made in the past, 1-10k error logs PER MINUTE were normal things to see. Finding the root cause of an issue often boiled down to multiple attempts at cleaning up logs to remove noise, cleaning up a tangentially related issues and waiting for it to happen again. More than one root cause was discovered by sheer happenstance of looking at the right subset of the logs at the right moment in time. I can absolutely buy that a system built for parsing large amounts of text and teasing patterns out of that found in minutes what humans could not track down over months.
I, knowing far less than him, would've had a much more elaborate prompt, and o3 would've proved a lot more competent/capable. Yet with my friend, since he knows so much already, and has such a high bar, he thinks the AI should be able to do a lot more with just a few basic words in a prompt... yet, for that same reason, he (understandably) doubts the inevitable sub-par output.
That's what makes all these debates about "Why are smart people doubting LLMs??" so pointless. The smarter you are, the less help you need, so the less prompting you do, the less context the model has, the less impressive the output, and the more the smart person thinks LLMs suck. With this logic, of course the smartest people are also the biggest skeptics!
It’s like half of the arguments are designed as engagement bait with logical consistency being a distant concern:
> If hallucination matters to you, your programming language has let you down.
This doesn’t even make sense. LLMs hallucinate things beyond simple programming language constructs. I commonly deal with allusions to functions or library methods that would be great if they existed, but the LLM made it up on the spot.
The thing is, the author clearly must know this. Anyone who uses LLMs knows this. So why put such a bizarre claim in the article other than as engagement bait to make readers angry?
There are numerous other bizarre claims throughout the article, like waving away the IP rights argument because some programmers pirate TV shows? It’s all so bizarre.
I guess I shouldn’t be surprised to scroll to the bottom and see that the author is a HN comment section veteran, because this entire article feels like it started as a reasonable discussion point and then got twisted into Hacker News engagement bait for the company blog. And it’s working well, judging by the engagement counts.
I think the author's point is that your language (and more generally the tooling around it) should make this obvious. Almost all the AI agents these days will at minimum run linting tools and clean up lints (which would include methods and library imports that don't exist), if they don't actively attempt to compile and test the code they've written. So you as the end user should (almost) never be seeing these made-up functions.
> But “hallucination” is the first thing developers bring up when someone suggests using LLMs, despite it being (more or less) a solved problem.
Really? What is the author smoking to consider it a solved problem? This statement alone invalidates the entire article in its casual disregard for the truth.
I use Copilot every day, and I know where it shines. Please don't try to sell it to me with false advertising.
If it uses a function, then you can be sure that function is real.
Was this not clear? The explanation I'm paraphrasing is right in between the line Aurornis quoted and the line you quoted. Except for the crack at copilot that's up at the top.
Can you show me 1 PR put out by any agent in any open-source repo with wide usage?
[1] https://www.reddit.com/r/ExperiencedDevs/comments/1krttqo/my...
[1]: https://github.com/Aider-AI/aider/blob/main/HISTORY.md#aider...
The problem is that LLMs never perform consistently. It works when it works; it doesn't when it doesn't. No one knows exactly why, and no one can tell when it's going to fail. For example, even to this day, GPT sometimes gives me wrong calculations, even when it is instructed to use a calculator for that. Who knows why it ignores the instruction, or why it can't reliably perform the addition of two integers. That really screws up the automation.
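If you control the API call, you can at least force a tool call instead of merely asking for one; a rough sketch with the OpenAI Python client (the model name and tool schema here are placeholders, not anything the original anecdote used):

    from openai import OpenAI

    client = OpenAI()

    # Expose an exact-arithmetic tool and *require* the model to call it,
    # instead of hoping it follows a "use the calculator" instruction.
    tools = [{
        "type": "function",
        "function": {
            "name": "add",
            "description": "Add two integers exactly.",
            "parameters": {
                "type": "object",
                "properties": {"a": {"type": "integer"}, "b": {"type": "integer"}},
                "required": ["a", "b"],
            },
        },
    }]

    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": "What is 123456789 + 987654321?"}],
        tools=tools,
        tool_choice="required",  # the model must emit a tool call, not a guess
    )
    call = resp.choices[0].message.tool_calls[0]
    print(call.function.name, call.function.arguments)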
Anyways, I'm really tired of skeptic-skeptics. I hate that some people believe "half-full" is genuinely better than "half-empty". I reject that idea completely. It's more about which context you're in. If you need exactly a cup of water, it's half-empty. If you are lucky to have some water, it's half-full. If you have a clear direction you want to go, you discover a lot of deficiencies. If you're there just enjoying the moment, yeah, you can just keep enjoying the new toy.
The problem I have with the stuff relating to MCP is that the tech around it is developing so fast that it's hard for outsiders to catch up with what the best working setup is, for example.
What would you do, for example, if you want to selfhost this?
- which models (Qwen Coder?)
- which API (Ollama? Bolt? Aider? etc.)
- how to integrate PRs with a local GitLab/Gogs/Forgejo instance? Do you need another MCP agent for git that does that?
- what hardware do you need to run it?
I am currently trying to figure out how to implement a practical workflow for this. So far I'm still using a synchronous MCP agent setup that basically runs on another machine on the network, because my laptop is too underpowered to work with.
But how would I get to the point of async MCP agents that can work on multiple things in my Go codebases in parallel? With the mentioned PR workflows so that I can modify/edit/rework before the merges?
The author makes a lot of claims and keeps saying that their opponents in the argument are not talking about the same thing. But what exactly is that same thing, and is it reproducible locally for everyone?
I think most people are using off-the-shelf paid solutions like Cursor, Windsurf, Warp terminal, or Claude Code.
I'd love to get a setup with a locally hosted full version of DeepSeek.
- Germany, so I have crappy and unreliable internet (read as: 300kB/s or less pretty often due to overbooked cable internet connections).
- Can't share my code with the US (as a European) because I work in cyber defense.
- I want a reproducible work setup, and something that will survive 2 years and/or the startup exit of the proposed tools.
Videogame speedrunning has this problem solved. Livestream your 10x engineer LLM usage, with a git commit annotated with its prompt per change. Then everyone will see the result.
This doesn't seem like an area of debate. No complicated diagrams required. Just run the experiment and show the result.
People always say "you just need to learn to prompt better" without providing any context as to what "better" looks like. (And it presumes that my prompt isn't good enough, which maybe it is, maybe it isn't.)
The easy way out of that is "well every scenario is different" - great, show me a bunch of scenarios on a speed run video across many problems, so I can learn by watching.
If I use LLMs to code, say, a Telegram bot that summarises the family calendars and current weather to a channel - someone will come in saying "but LLMs are shit because they can't handle this very esoteric hardware assembler I use EVERY DAY!!1"
You will be writing CRUD operations and slapping together web apps at every level of experience. Even in (mobile) gaming you're repeating the same structures as every game before.
Not 100% of the time, but way more than 50%.
This thread has hundreds of comments where people are screaming that everyone needs to learn AI coding.
If it was such an edge would they not otherwise keep quiet?
Imagine that there was a serum that gives you superhuman strength, but only under specific conditions that you're supposed to discover. Then there's half the room screaming that it should be banned, because it is cheating/fake/doesn't work. And there's the other half of the room that swears by it, because they know how to utilize it properly.
You know it works and you don’t want to give up your secret sauce or make another half of the room stronger.
However real life does have illicit drugs that many people hype up and claim that they need.
Also real life has performance enhancement drugs that cause a host of medical issues.
Even drugs for medical necessity come with a list of side effects.
Let's all just muse some and imagine what the next cycle of this wheel will look like.
You could say it is a lack of imagination or not connecting the dots, but I think there is a more human reason. A lot of people don't want the disruption and are happy with the status quo. I'm a software engineer so I know how problematic AI may be for my job, but I think anyone who looks at our current state and the recent improvements should be able to see the writing on the wall here.
I for one am more curious than afraid of AI, because I have always felt that writing code was the worst part of being a programmer. I am much happier building product or solving interesting problems than tracking down elusive bugs or refactoring old codebases.
I’m not alone in this - there are tons of other examples of people showing how they use LLMs online; you just need to search for them.
The article provides zero measurement, zero examples, zero numbers.
It's pure conjecture with no data or experiment to back it up. Unfortunately conjecture rises to the top on hackernews. A well built study on LLM effectiveness would fall off the front page quickly.
When an LLM has been tamed to the point that I can define and request a PowerCLI programme, I can be fairly sure that AI is a solved thing.
I don't get it. It keeps making stuff up, or gives me wrong stuff that merely compiles when I tell it the signature doesn't exist.
Things have changed.
I'm not saying that AI can't make you productive, it's just that these claims are really hard to verify. Even the recently posted Cloudflare OAuth worker codebase took ~3 months to release (8 Mar - 20 May), producing a single file with >2k lines. Is that going to be harder to maintain than a codebase with a proper project structure that's easily parseable by a human?
Another thing I think people are missing is that serious LLM-using coders aren't expecting 100% success on prompts, or anything close to it. One of the skills you (rapidly) develop is the intuition for when to stop a runaway agent.
If an intern spun off hopelessly on a task, it'd be somewhat problematic, because there are finite intern hours and they're expensive. But failed agent prompts are nickel-denominated.
We had a post on the front page last week about someone doing vulnerability research with an LLM. They isolated some target code and wrote a prompt. Then they ran it one hundred times (preemptively!) and sifted the output. That approach finds new kernel vulnerabilities!
Ordinary developers won't do anything like that, but they will get used to the idea of only 2/3 of prompts ending up with something they merge.
Another problem I think a lot of skeptics are running into: stop sitting there staring at the chain of thought logs.
I think it has a lot to do with the type of work you are doing. I am a couple of years into a very small startup that has some actual technology built (as opposed to a really simple CRUD app or something).
When I am working on the front-end, where things are pretty simple, AI is a huge speed-up. What it does VERY well is latch on to patterns and then apply those patterns to other things. If it has a couple of examples you can point it to and say "ok, build that but over here", the newest revisions of Claude and Gemini are perfectly capable of building the whole thing end to end. Because it's a fairly repetitive task I don't have to spend much time untangling it. I can review it, pattern-match against things that don't look right, and then dive into those.
For a real example, I needed a page for a user to manually add a vendor in our platform. A simple prompt asking Claude to add a button to the page sent it into a mode where it added the button, built the backend handler, added the security checks, defined a form, built another handler to handle the submitted data, and added it to the database. It even wrote the ACL correctly. The errors it introduced were largely around using vanilla HTML in place of our standard components and some small issues with how it attempted to write to the DB using our DB library. This saved me a couple of hours of typing.
Additionally if I need to refactor something AI is a godsend. Just today an underlying query builder completely changed its API and broke..everything. Once I identified how I wanted to handle the changes and wrote some utilities I was able to have Claude just find everything everywhere and make those same changes. It did it with like 90% accuracy. Once again that saved me a couple of hours.
Where it fails, usually spectacularly, is when we get to the stuff that is new or really complex. If it doesn't have patterns to latch onto it tries to invent them itself and the code is garbage. Rarely does it work. Attempting to vibe code it with increasingly more pointed prompts will often result in compiling code but almost never will it do the thing I actually wanted.
In these contexts its usefulness is mostly things like "write a SQL query to do X", which occasionally surfaces a technique I hadn't thought about.
So my experience is pretty mixed. I am definitely saving time. Most of it is typing time not thinking time. Which is like 1/3 of my average day. If I had to guess I am somewhere in the neighborhood of 30-40% faster today than I was in 2019. Notably that speed up has allowed me to really stretch this funding round as we are well past the phase where we would have typically hired people in my past companies. Usually someone relatively mid-level to take over those repetitive tasks.
Instead it's just me and a non-technical founder going along super quickly. We will likely be at a seed round before anyone new comes in.
This is incorrect. The library was part of the MCP framework we launched on March 25 -- the same month development began:
https://blog.cloudflare.com/remote-model-context-protocol-se...
Indeed the speed with which we were able to turn this around was critical to us, as it allowed us to have our Remote MCP framework ready immediately when the spec was finalized, which led to quite a few companies building MCP servers on Cloudflare: https://blog.cloudflare.com/mcp-demo-day/
I'm not an AI maximalist. I still write lots of code by hand, because there's a lot AI isn't good at. It's good at boilerplate and straightforward code, it's bad at refactoring deep systems. But AI assistance was undeniably a huge win for the OAuth project. There's no way I could have written that library by hand so quickly. (Maybe when I was 25 and had no responsibilities, but these days I have like 1 solid day a week to actually write code...)
commit 3b2ae809e9256d292079bb15ea9fe49439a0779c
Author: Kenton Varda <kenton@cloudflare.com>
Date:   Thu Feb 27 17:04:12 2025 -0600
Have Claude write an OAuth provider implementation.
As someone who thinks modern copyright law is unfit for purpose, and more specifically that software shouldn't be copyrightable at all, my main complaint about LLMs is the same as Chinese spying and IP theft: I only care because they don't share.
I'm perfectly fine using open-weights LLMs[0] to do things LLMs are good at. But I would never trust using OpenAI or Perplexity's SaaSS models as a replacement for my own skills. Half the reason they put things behind a server is so they can spy on you[1], the other half is so they can promise a revenue stream to the people they plan to sell the company to so they can enshittify it. The only enshittification-resistant software is software people can't charge you for.
All the other things this blog post is complaining about are real reactions to real harms, even in the FOSS community. People use LLMs as skilled labor substitute and get burned, while creating a spam problem for everyone else. In fact, that's the real replacement threat for artists: diffusion models don't so much outperform human artists as much as they convince people not to pay for art.
[0] I don't give two flying fucks about the license terms.
[1] The judge in the NYT lawsuit is even ordering spying for evidentiary reasons
Conversely, if I say, "Put the button in the upper right corner exactly like the existing button in the upper left," I'll get a button in the upper right, but not exactly like the upper left button.
This is a come back in 5 years when it works sort of thing.
1. Input is a 256x256 pixel elevation map stored as a greyscale .png file, and a minimum and maximum elevation. A pixel value of 0 corresponds to the minimum elevation, and a pixel value of 255 corresponds to the maximum elevation. Read in the .png file and the elevation limits. Then construct a 256x256 floating point array of heights. From that array, construct a triangle mesh with X and Y dimensions 0..255. Perform a mesh reduction operation on the triangle mesh to reduce the number of triangles. Mesh reduction must not generate holes in the mesh. From the reduced mesh, generate a glTF file where the UV parameters run from 0.0 to 1.0 along the X and Y axes.
2. Given four glTF files constructed as above, corresponding to four quadrants of a larger square, construct a single 511x511 mesh which combines all four input meshes to cover a larger area. Because the input meshes are 0..255, not 0..256, there will be gaps where the four quadrants meet. Fill those gaps with reasonable triangles. Perform a mesh reduction as above. From the reduced mesh, generate a glTF file where the UV parameters run from 0.0 to 1.0 along the X and Y axes.
Rust code is preferred; Python code is acceptable.
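For reference, the non-AI part of step 1 is mechanical; a rough Python sketch of just the heightmap-to-mesh portion (mesh reduction and glTF export, the parts actually being tested, are left to a library here):

    import numpy as np
    from PIL import Image

    def heightmap_to_vertices(png_path: str, z_min: float, z_max: float) -> np.ndarray:
        """Map a 256x256 greyscale PNG to an (N, 3) array of vertices.
        Pixel value 0 -> z_min, 255 -> z_max; X and Y run 0..255."""
        img = np.asarray(Image.open(png_path).convert("L"), dtype=np.float64)
        heights = z_min + (img / 255.0) * (z_max - z_min)
        ys, xs = np.mgrid[0:img.shape[0], 0:img.shape[1]]
        return np.stack([xs, ys, heights], axis=-1).reshape(-1, 3)

    def grid_triangles(width: int, height: int) -> np.ndarray:
        """Two triangles per grid cell, indexing into the flattened vertex array."""
        tris = []
        for y in range(height - 1):
            for x in range(width - 1):
                i = y * width + x
                tris.append([i, i + 1, i + width])
                tris.append([i + 1, i + width + 1, i + width])
        return np.asarray(tris)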
So, what service should I sign up for?
On #2, it set up the problem, but bailed on the gap filling part with
# (This is a complex step; see note at bottom.)
That's not surprising, since I didn't tell it how to solve the problem. Can any of the premium systems do #2?
I did Qwen3-256B (a free model, but you'd need a host for something that large, probably. I used Kagi) and Claude Code.
Curious how these look to you.
I like the Python code which outsourced the hard stuff to existing libraries. The odds of that working are higher.
Can you tell it to use the "glam" crate for the vectors, instead of writing out things like vector length the long way?
(We now need standardized low-level types more than ever, so the LLMs will use them.)
https://gist.github.com/cpsquonk/348009eb7c83a7d499ff5ae70d7...
However, if it took creativity or real analysis, I had to throw the big models at it with a carefully-written prompt. You want to use their recent, best, big models. I used to include precise descriptions of data structures and function API's, too, which helped a lot. From there, tweak the description until you find pieces that get the result.
Another thing. It will eventually synthesize code that's close but not quite it. I found that asking it to make small changes, one at a time, would help for a while. Ex: "Modify that code to do X" or "...X in part Y." It eventually started hallucinating in loops. I always had to fix some amount of what it generated but it still saved time.
I hope that helps.
Edit: took a while but here is the code generated by DeepSeek: https://gist.github.com/omarabid/b30195011abf570fde2a084cd11... If you have an example file I could give it a test run.
LLMs seem to be good at finding obscure library functions from a description of the function needed, something where keyword searches aren't too helpful. Previously, searching Stack Overflow helped, but nobody is posting there any more. Good reference manuals are almost historical artifacts now.
Github Copilot seems to be able to do it. Mostly because it knows how to call Meshlab. The documentation for Meshlab is hard to search, and having an LLM find the correct function is itself helpful.
Also, it looks like quadric decimation has improved since the last time I looked at it. That algorithm, if pushed too hard, will generate meshes with holes. Now there are options ("preservetopology") which supposedly prevent that. I last looked at that a few years ago, and the only mesh decimator which handled that correctly was the one inside Unreal Engine Editor.
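For anyone who wants to try it, the Meshlab route looks roughly like this via pymeshlab (a hedged sketch; filter and parameter names have changed between pymeshlab versions, so check the docs for your install):

    import pymeshlab

    ms = pymeshlab.MeshSet()
    ms.load_new_mesh("terrain.ply")

    # Quadric edge-collapse decimation; preservetopology is the option that is
    # supposed to stop the simplifier from opening holes in the mesh.
    ms.meshing_decimation_quadric_edge_collapse(
        targetfacenum=20_000,
        preservetopology=True,
    )

    ms.save_current_mesh("terrain_reduced.ply")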
Now I have to test all this stuff, of course.
Supposedly, you know something about this problem. If you use an LLM to craft a solution, that's probably fine since you'd know how well you did.
I know nothing about this problem. I could use an LLM to delude myself into thinking that I do. Should my boss be very happy about that? They probably shouldn't be.
This reads right out of the PR release for these AI companies and their agentic coding solutions.
Me: So what is your assessment of the discussion at Hacker News?
Claude: The Hacker News discussion strikes me as much more grounded than the blog post itself. What I find compelling is that many commenters are sharing specific technical experiences rather than making broad ideological arguments. When someone describes spending hours fighting with Claude agents that keep refactoring code into spaghetti, that's concrete data about the current state of these tools. ...
Me: Well, what about you, Claude? You are made by a company whose president has recently made some public statements saying that, you know, 50% of white collar jobs will be replaced soon. And the company, Anthropic, also is very aggressively developing and selling tools to help with AI coding. What are your thoughts about that?
Claude: That's a really fair question to ask me directly. I think there's an obvious tension in my position here since I'm built by a company that has clear financial incentives around AI adoption, including coding tools. When I look at my own capabilities honestly, I can see both sides of this. I'm genuinely useful for certain coding tasks like explaining unfamiliar code, generating boilerplate, or helping debug specific problems. I've helped people learn new programming languages faster than they might have otherwise. But I'm also clearly not at the level where I could replace an experienced software engineer working on complex system design or novel problem solving. ...
[1] https://claude.ai/share/c934e67c-f132-43e8-8c31-ab238b5be25a
It’s not quite as natural a back-and-forth as with ChatGPT’s or Gemini’s voice mode on mobile, but maybe that’s for the best.
I tried a test later in which I gave it a research paper—Sakana AI’s latest—and asked it to explain it to me. When it used a term I didn’t understand, I just tapped on the screen and asked for an explanation. That worked quite well.
> The AI has suggested a solution, but the added code is arguably useless or wrong. There is a huge decision space to consider, but the AI tool has picked one set of decisions, without any rationale for this decision.
> [...]
> Programming is about lots of decisions, large and small. Architecture decisions. Data validation decisions. Button color decisions.
> Some decisions are inconsequential and can be safely outsourced. There is indeed a ton of boilerplate involved in software development, and writing boilerplate-heavy code involves near zero decisions.
> But other decisions do matter.
(from https://lukasatkinson.de/2025/net-negative-cursor/)
Proponents of AI coding often talk about boilerplate as if that's what we spend most of our time on, but boilerplate is a cinch. You copy/paste, change a few fields, and maybe run a macro on it. Or you abstract it away entirely. As for the "agent" thing, typing git fetch, git commit, git rebase takes up even less of my time than boilerplate.
Most of what we write is not highly creative, but it is load-bearing, and it's full of choices. Most of our time is spent making those choices, not typing out the words. The problem isn't hallucination, it's the plain bad code that I'm going to have to rewrite. Why not just write it right myself the first time? People say "it's like a junior developer," but do they have any idea how much time I've spent trying to coax junior developers into doing things the right way rather than just doing them myself? I don't want to waste time mentoring my tools.
Coming at this from a computer-science or PLT perspective, this idea of an "abstract, repeatable meta-boilerplate" is exactly the payoff we expect from language features like strong type systems. Part of the point of rigorous languages is to create these kinds of patterns. You had total expressiveness back in assembly language! Repeatable rigor is most of the point of modern languages.
Not everyone is just cranking out hacked together MVPs for startups
Do you not realize there are many many other fields and domains of programming?
Not everyone has the same use case as you
There are obviously still things it can’t do. But the gap between “I haven’t been able to get a tool to work” and “you’re wrong about the tool being useful” is large.
Oh, like these set of PRs on the dotnet repo? https://news.ycombinator.com/item?id=44050152
Now here’s the fun part: In a really restrictive enterprise environment where you’ve got unit tests with 85% code coverage requirements, linters and static typing, these AI programming assistants actually perform even better than they do when given a more “greenfield” MVP-ish assignment with lots of room for misinterpretation. The constant “slamming into guardrails” keeps them from hallucinating and causes them to correct themselves when they do.
The more annoying boxes your job makes you tick, the more parts of the process that make you go “ugh, right, that”, the more AI programming assistants can help you.
If one copy-pastes a routine to make a modified version (that's used), code coverage goes UP. Sounds like a win-win for many…
Later, someone consolidates the two near identical routines during a proper refactoring. They can even add unit tests. Guess what? Code coverage goes DOWN!
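To put made-up but representative numbers on it: a 1,000-line codebase with 850 covered lines sits at 85%. Copy-paste a fully covered 50-line routine and you're at 900/1,050 ≈ 85.7%, so the duplication reads as an improvement; consolidate the duplicates later and you're back to 850/1,000 = 85%, so the genuinely better change reads as a regression on the dashboard.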
Sure, having untested un-executed code is a truly horrible thing. But focusing on coverage can be worse…
What is it with this new paradigm where we act like everything is easily measurable and every measure is perfectly aligned with what we want to measure? We know these things aren't possible; it doesn't take much thought to verify this. Do you believe you are smart enough to test all possible issues? No one is. There'd be no CVEs if you could, and we'd have solved all of physics centuries ago.
That's what libraries and frameworks are here for. And that's why no experienced engineers consider those an issue. What's truly important is the business logic, then you find a set of libraries that solves the common use cases and you write the rest. Sometimes you're in some novel space that doesn't have libraries (new programming language), but you still have specs and reference implementation that helps you out.
The actual boilerplate is when you have to write code twice because the language ecosystem doesn't have good macros à la Lisp that would let you invent some meta-abstraction for the problem at hand (think writing routers for express.js).
That's not what I see the parent comment saying. They're not saying that LLMs can't use frameworks, they're saying that if you have rote solutions that you are being forced to write over and over and over again, you shouldn't be using an LLM to automate it, you should use a framework and get that code out of your project.
And at that point, you won't have a ton of boilerplate to write.
The two sides to this I see online are between the people who think we need a way to automate boilerplate and setup code, and the people who want to eliminate boilerplate (not just the copy-paste kind, but also the "ugh, I've got to do this thing again that I've done 20 times" kind).
Ideally:
> a common set of rote solutions to isomorphic problems
Should not be a thing you have to write very often (or if it is, you should have tools that make it as quick to implement as it would be to type a prompt into an LLM). If that kind of rote repetitive problem solving is a huge part of your job, then to borrow your phrasing: the language or the tools you're using have let you down.
I bit the bullet last week and tried to force myself to use a solution built end to end by AI. By the time I’d finished asking it to make changes (about 25 in total), I would’ve had a much nicer time doing it myself.
The thing in question was admittedly partially specified. It was a yaml-based testing tool for running some scenarios involving load tests before and after injecting some faults in the application. I gave it the yaml schema up front, and it did a sensible job as a first pass. But then I was in the position of reading what it wrote, seeing some implicit requirements I’d not specified, and asking for those.
Had I written it myself from the start, those implicit requirements would've been more natural to think about in the progression of iterating on the tool. But in this workflow, I just couldn't get in a flow state - the process felt very unnatural, not unlike how it would've been to ask a junior to do it and take 25 rounds of code review. And that has always been a miserable task, difficult to force oneself to stay engaged with. By the end I was much happier making manual tweaks and wished I'd written it myself from the start.
There's always some people that will resist to the bitter end, but I expect them to be few and far between.
And if we accept that inevitability, it becomes a self-fulfilling prophecy. The fact that some people _want_ us to give in is a reason to keep resisting.
If the AI agent future is so inevitable, then why do people waste so much oxygen insisting upon its inevitability? Just wait for it in silence. It certainly isn't here yet.
“Isomorphic” is a word that describes a mapping (or a transformation) that preserves some properties that we believe to be important.
The word you’re looking for is probably “similar” not “isomorphic”. It sure as hell doesn’t sound as fancy though.
I guess in this case the morphism is the similar or same prompt to generate f g h j.
Assuming something like "a REST endpoint which takes a few request parameters, makes a DB query, and returns the response" fits what you're describing, you can absolutely copy/paste a similar endpoint, change the parameters and the database query, and rename a couple variables—all of which takes a matter of moments.
Naturally code that is being copy-pasted wholesale with few changes is ripe to be abstracted away, but patterns are still going to show up no matter what.
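Concretely, the kind of endpoint in question is about this much code (a minimal Flask sketch; the table and field names are invented for illustration):

    from flask import Flask, jsonify, request
    import sqlite3

    app = Flask(__name__)

    @app.get("/widgets")
    def list_widgets():
        # Hypothetical table/columns; the point is how little decision-making is here.
        status = request.args.get("status", "active")
        limit = int(request.args.get("limit", 50))
        with sqlite3.connect("app.db") as db:
            rows = db.execute(
                "SELECT id, name, status FROM widgets WHERE status = ? LIMIT ?",
                (status, limit),
            ).fetchall()
        return jsonify([{"id": r[0], "name": r[1], "status": r[2]} for r in rows])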
It'll even write basic unit tests for your CRUD API while it's at it.
And the less instantly I can write it, the more petty nuances there are to deal with—things like non-trivial validation, a new database query function, a header that I need to access—the more ways an LLM will get it subtly wrong.
If I treat it as more than a fancy autocomplete, I have to spend all my time cleaning up after it. And if I do treat it as fancy autocomplete, it doesn't save that much time over judicious copy/pasting.
The idea that you can't specify the load bearing pillars of your structure to the AI, or that it couldn't figure them out by specifying the right requirements/constraints, will not age well.
I use R a little more than I should, given the simplicity of my work. Claude writes better R quicker than I can. I double check what it's doing. But it's easier to double check it used twang correctly than to spend five minutes trying to remember how to use the weird package that does propensity scoring [1].
I'm sure data analysis will still sort of be a thing. But it's just not as useful anymore in the form of a human being for most commercial applications at sub-enterprise scale.
[1] https://cran.r-project.org/web/packages/twang/index.html
But English is a subjective and fuzzy language, and the AI typically can't intuit the more subtle points of what you need. In my experience a model's output always needs further prompting. If only there were a formal, rigorous language to express business logic in! Some sort of "programming language."
I disagree on the "can't". LLMs seem no better or worse than humans at making assumptions when given a description of needs, which shouldn't be surprising since they infer such things from examples of humans doing the same thing. In principle, there's nothing preventing a targeted programming system from asking clarifying questions.
> In my experience a model's output always needs further prompting.
Yes, and the early days of all tooling were crude. Don't underestimate the march of progress.
> (from https://lukasatkinson.de/2025/net-negative-cursor/)
looks inside
complaining about Rust code
The comment on the right says it'll help the user with protocol versioning. This is not how you do that...
Issues like that are simple and just create debt. Sure, it "works" now, but who writes code not knowing that we're going to change things next week or next month? It's the whole reason we use objects and functions in the first place!
These aren't Rust-specific syntax foibles. It's not a borrow-checker mistake or anything. These are basic CS fundamentals that it's thoughtlessly fumbling.
I work in finance, I have for almost 20 years now. There are things in finance you do once every 5 years, like setting up a data source like Bloomberg in a new programming language. Now you know from the last time you did it that it’s a pain, you need to use a very low level api, handling all the tiny messages yourself, building up the response as it comes from the source in unordered packets. It’s asynchronous, there is a message queue, and what I specialize in is maths.
Now I could spend hours reading documents, putting crap together, and finally come up with some half baked code that ignores most possible error points.
Or I could use ChatGPT and leverage the fact that hundreds of implementations of the same module exist out there. And make something that just works.
That is the first ever coding question I asked an LLM and it literally saved me days of trial and error for something where my added value is next to zero.
Similarly I use LLMs a lot for small tasks that are in fact fairly difficult, and that don’t add any value to the solution. Things like converting data structures in an efficient way using Python idioms, or JavaScript 2023 features, that there is no way I can keep up with.
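To give a flavour of the "convert this data structure" asks I mean (a trivial, made-up example of the sort of idiom I'd otherwise stop and look up):

    from collections import defaultdict

    # Made-up example: regroup a flat list of (ticker, price) ticks by ticker
    # in one pass.
    ticks = [("AAPL", 189.1), ("MSFT", 402.3), ("AAPL", 189.4)]

    by_ticker: dict[str, list[float]] = defaultdict(list)
    for ticker, price in ticks:
        by_ticker[ticker].append(price)

    print(dict(by_ticker))  # {'AAPL': [189.1, 189.4], 'MSFT': [402.3]}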
Like everything else about the "GenAI" fad, it boils down to extractively exploiting goodwill and despoiling the commons in order to convert VC dollars into penny-shavings.
Bizarrely though, it seems to be limited to grep for the moment, doesn't work with LSP yet.
ChatGPT is also pretty useful - especially for naming things.
However, on a global scale, I would compare the shift towards 'vibe coding' and massive LLM use to the shift we got when people doing the coding no longer even knew what assembly was. The current state of software is that it's incredibly bloated and slow (and I do love Python). That doesn't matter much since we've got massive improvements in hardware, but still. In 5 years, instead of not really realizing that we're writing slow and bloated code, we will not really realize we're writing incorrect code.
I don't know anything about web development, but in the environments I've seen during my career - where most code is quite critical and a stupid mistake, one that an LLM could easily hallucinate, can cost a lot - I don't see it replacing humans anytime soon. It's a pretty good and useful tool though.
I've never merged anything with an unclear copyright to main, let alone if I positively suspected that it had been plagiarized.
> Cut me a little slack as I ask you to shove this concern up your ass. No profession has demonstrated more contempt for intellectual property.
Every place I've been employed was nearly paranoid about copyright, with rules in place about never copying and pasting anything copyrighted into the company code base.
All the major open source distros and projects, likewise.
There are some bad apples out there, like, oh, Sam Altman, who wants massive plagiarism to be fair use.
So, au contraire, you can shove your contemptuous blog up your ass, buddy.
My mind doesn't develop a mental model of that code, I don't know where the relevant parts are, I can't quickly navigate through it and I have to reach the LLM for every small change.
Which is why I like Copilot-style editing more than agents as a working model, but agents are just so much more powerful and smarter thanks to everything available to them.
If you use cursor, cursorrules can be a good way of doing this.
I will keep using LLMs for 1-week greenfield hobby projects that I don't plan to revisit. But no matter how good LLMs get, I will never use them in my daily job; otherwise, I risk losing touch with the codebase.
It's a good, very digestible paper, in my opinion; well worth the read.
In 2025 I am about 800-900 hours deep into using Cursor, blowing through 500 credits every 2 days (500 credits = $20). I also organize a local "vibe code" workshop and meetup, with people from all sorts of experience levels, from no coding experience to people twice my senior. And what I am seeing is that what separates those who are good at using AI from those who are not is generally a level of rigid stubbornness.
I can't wait for time to do its course here and those people who now are "all nuts" to slowly see programming of the past become an old relic.
Developers thinking AI is a fad is like a left fielder thinking about whether they should stick their hand out to catch the ball or not. You can’t think about this.
Anyway, I find myself in stasis trying to actually see what to build given the reality of the power of this API. The South Park episode “Simpsons Already Did It” is apt here. We are very much at the point where the continuous question will be “Well, the LLM can do it”.
If you let it run in the "write my code for me" mode, and ask it to fix some mistake it made, it will always add more code, never remove any. In my experience, in the end the code just ends up so brittle that the LLM will soon get stuck at a point that it never manages to overcome some mistake, no matter how many times it tries.
Has anyone managed to solve this?
That said, I'm super impressed by how quickly I've been able to get this far with something pretty niche and complex in places.
Here's what I've learned. There are a million AI bros on YouTube who have the ultimate solution, but they all boil down to a few basic things.
Make rules: make them before you get started and continue updating them as you go.
Lots of tiny modules: Push things into small, bite-sized bits of code with lots of interface documentation. This feels a bit unnatural when the code is prototype quality.
Documentation is key: the YouTubers will often create detailed planning and specification documents in advance. I've done this and it's hit and miss; what I've found works is to explain what you want to build to an LLM and have it create extremely concise documentation, then a rough checklist for implementation, and then evolve these documents in Cursor as I go.
This leads to a kind of plan --> document, implement --> document, run tests --> document workflow on each reasonable chunk of the design, which turns out to be the smoothest.
Don't let Cursor ever see inside big datasets or huge folder trees. In fact, keep Cursor in a few folders writing source code and nothing else. To do this, early on build tools that can go and get information or make safe edits from those datasets for Cursor, without it attempting direct access.
The current project has tools for working with the primary datasets, a job manager, a process manager, a log manager, and a workflow manager; these all have functions for querying. Cursor is instructed to use these. It naturally doesn't want to, but if you tell it, 7 times out of 10 it will :)
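A stripped-down example of what one of those query tools looks like (illustrative only, not the actual project code): a tiny CLI that gives Cursor a bounded view of a dataset instead of letting it open the files directly.

    #!/usr/bin/env python3
    """query_dataset.py: return a bounded summary of a big CSV dataset."""
    import argparse
    import csv

    def main() -> None:
        parser = argparse.ArgumentParser()
        parser.add_argument("path")                          # dataset file to inspect
        parser.add_argument("--head", type=int, default=5)   # cap on rows returned
        parser.add_argument("--column", default=None)        # optional single column
        args = parser.parse_args()

        with open(args.path, newline="") as f:
            reader = csv.DictReader(f)
            print("columns:", reader.fieldnames)
            for i, row in enumerate(reader):
                if i >= args.head:
                    break
                print(row[args.column] if args.column else row)

    if __name__ == "__main__":
        main()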
No mess: Cursor will like to make lots of random tests and processes along the way. Instruct it to use a /debug/tests/ folder and wipe it clean often. Force it to make 'production' code by having it be registered with the workflow manager and made allowable by the process manager. This lets it play around for a while and get something up and running with its weird little scripts, and then implement it for real elsewhere using the proper framework. The workflow manager needs to have documentation on how the script is used, and the process manager needs the source to be in a particular place with a particular set of standard interfaces.
You might say this is a lot of plumbing, but what isn't these days, and it's not like I'm maintaining it - right ;)
Cursor is always about to go insane any minute, or can't remember what's going on. So most of it is about keeping things simple enough for it to focus on the bit it's working on and nothing else.
Regularly for work we have massive code bases written in our company style and built using our business processes. This kind of thing is absolutely not the place for Cursor. I can imagine a company set up from scratch to use Cursor, but our current products and practices would just make it impossibly time-consuming.
When the model has the wrong solution in its context, it will use it when generating new code, and my feeling is that it doesn't handle the idea of "negative example" very well. Instead, delete the bad code and give it positive examples of the right approach.
I doubt that. First, human attention and speed is very limited. Second, when I see something, I am already predisposed to assume that it is right (or at the very least, my subsequent inquiries are extremely narrow and anchored around the solution I have seen presented to me.)
Code from LLMs that looks right, clean, and even clever poses as competence but is prone to hallucinations and business logic errors. In the short term, these changes will pass through due to their appearance but contain more issues than a human's version of the same code would. In the medium term, we just lose that signal - the assumptions we can make about the author's state of mind and comprehension. It's already incredibly hard to distinguish solid points from nonsense when the nonsense is laundered by an LLM.
You do a few iterations until code runs, review carefully but notice a bug. So you do another iteration and 40% of code changes. Now you need to review again but you need to understand how the changes fit in.
Repeat this a few times and it becomes very tiring.
Ultimately you can't trust them not to do stupid shit. Your tests fail and you tell it to stop that? Sure, we can just catch those exceptions and the tests pass, etc. You get pissed off and tell it to FIX the CODE so the tests pass, and the cycle continues.
It's like working with a potentially gifted moron.
It kinda invalidates the entire premise because it builds on this false foundation of “look how useful it is to a mere plebeian” and the skills actually required to sift through LLM garbage are what puts you in the top 5% of programmers.
Most senior developers cannot conduct useful code review and spot errors. In every company I’ve been at, there were only 1 in 20 people that would give the useful in-depth reviews required to pick up LLM bullshit.
This style of engagement bait is brutal and it severely overshadows other nuggets of wisdom of how LLMs can be useful.
This is a thoughtful article. Thanks `tptacek
My LLM use is: 1 - tedious stuff; web pages interacting with domain back end. 2 - domain discovery.
In a recent adventure, I used Claude 4 to tease out parameters in a large graph schema. This is a combination of tedium and domain discovery (it's not my graph and I'm not a domain expert). In the first day, Claude uncovered attributes and relations no other LLM or Google search uncovered. And it worked!! The next day, I allowed it to continue. After a bit, results didn't pass the sniff test.
I checked into the details of Claude's thinking: it decided to start making up schema attributes and inventing fallback queries on error with more made-up attributes. It was "conscious" of its decision to do so. By the time I caught this, Claude had polluted quite a bit of code. Sure, plenty of well-placed git commits helped in rolling back code... but it's not quite that simple... over the many git commits were sprinkled plenty of learnings I don't want to toss. It took another two days of carefully going through the code to pull out the good stuff and then roll things back. So now I'm at day five of this adventure with cleaned-up code and notes on what we learned.
I suspect continual improvements on tooling will help. Until then, it's a short leash.
If LLMs couldn't do anything else then that alone would still warrant an invention of a century sticker.
With the help of the agent, I was able to iterate through several potential approaches and find the gaps and limitations within the space of an afternoon. By the time we got to the end of that process the LLM wrote up a nice doc of notes on the experiments, and *I* knew what I wanted to do next. Knowing that, I was able to give a more detailed and specific prompt to Claude which then scaffolded out a solution. I spent probably another day tweaking, testing, and cleaning up.
Overall I think it's completely fair to say that Claude saved me a week of dev time on this particular task. The amount of reading and learning and iterating I'd have had to do to get the same result would have just taken 3-4 days of work. (not to mention the number of hours I might have wasted when I got stuck and scrolled HN for an hour or whatever).
So it still needed my discernment and guidance - but there's no question that I moved through the process much quicker than I would have unassisted.
That's worth the $8 in API credit ten times over and no amount of parroting the "stochastic parrot" phrase (see what I did there?) would change my mind.
I think pro-AI people sometimes forget/ignore the second order effects on society. I worry about that.
(Asking online was a possibility, but instead of helpful answers, insults for being newb was the standard response)
With a LLM I would have had a likely correct answer immediately.
And yes, yes what if it is wrong?
Well, I was also taught plenty of wrong stuff by human teachers as well. I learned to think for myself. I doubt anyone decently smart who now grows up with these tools thinks they are flawless.
In the end, you are responsible for the product. If it works, if it passes the tests, you succeeded. That did not change.
Hypothetically, a solution to a problem that preoccupied you for days would translate into a more stable and long-lasting neuron configuration in your brain (i.e. be remembered) than a solution to a problem that preoccupied you only for the time taken to type the prompt in.
But I don't have the time and energy to figure everything out on my own and I stopped learning many things, where some useful hints in time likely would have kept the joy for me to master that topic.
So it is probably about the right balance.
Assuming you're literate, there's no age or skill level at which it's necessary to get stuck churning on beginner-level questions. The option to RTFM is always available, right from the start.
To this day, readiness to RTFM (along with RTDS: read the damn source) is the biggest factor I can identify in the technical competency of my peers.
Well, I guess I am, too, but I still see great value in asking specific questions to competent persons.
Or don't you think asking teachers/instructors questions is helpful?
I feel weird when I read about people needing support. Maybe there is something wrong with me.
I know I had mostly bad teachers and am largely an autodidact myself. But the few good teachers/instructors I had were really helpful for my learning progress.
A teacher can be a unique resource, but asking the teacher is often more of a reflexive shortcut than the thoughtful use of a unique resource.
I think use of LLMs (like StackOverflow before them) are more likely to discourage people from seriously or patiently reading documentation than they are to act as a stepping stone to a habit of more serious inquiry for most people.
I do think it's entirely plausible that a lot of people who otherwise would have wanted to learn more will grow up getting used to instant results and will simply not do anything the LLM can't do or tell them. Kind of similar to how my social media addicted brain gets antsy if it goes more than an hour without a fast dopamine hit (hence me being on HN right now...).
I agree that this is a concern, and I even worry about it for myself. Did I miss the opportunity to add another brick to the foundation of my expertise because Claude helped me out? Would I be marginally better at solving the next problem if I'd worked through the week I saved?
Even if the concern isn't some specific knowledge I'd have gained - did I lose out on a few "reps" to build grit, determination? Am I training myself to only like easy solutions that come out of Claude? Are there problems I won't want to solve because they're too difficult for this new "augmented" workflow?
I don't know the answers - I can only say that I do care, and at the very least I'm aware that there are new dynamics affecting my work and expertise that are worthy of consideration.
And, unfortunately, a lot of friction from not having access to information in the first place. I've read a bunch of docs from people talking to Glean in order to explore a new topic; if it's a topic I'm actually very familiar with then four out of five times it is somewhere from misleading to catastrophically wrong. Any internal terminology that doesn't match the common usage outside of our organization poisons the whole session and it will make up things to join the meanings together, and the prompter is none the wiser.
I trust AI only as a gap filler in domains that I'm already an expert in or where there's little internal context, anything else is intellectual suicide.
Backpedaling is a massive inefficiency.
A better way is the single clean step approach.
Use the largest LLM you can. Have it generate a single output for one update.
If that update has logical errors or drops anything you asked for, restart, refine, and narrow until it gets it right.
It's quite hard to plan each step right, but the level and complexity you can get to is far higher than with an agent.
Agents are much better at shallow/broad problems.
Large LLMs are exponentially better at deep/narrow problems.
If you really believe in the power of LLMs then it’s time to wake up. The writing is on the wall: automation is pushing the workforce further into precarious jobs.
The idea that tech workers can’t stop so-called “progress” is at best a dumb self-fulfilling prophecy. Our workplaces depend on us. We have the power to put the brakes on whatever our employers are up to by organizing and striking.
Tech workers should be organizing to prepare for the profit-taking moves management has in store for us as the tech gets better and better. If LLMs really live up to their potential, It’s just going to get worse from here.
They keep bragging about how many people are going to lose their jobs and they mean us. They’re bragging about firing us! It’s foolish for us to sit idly by while we are the ones who make them their profits.
You have the power to do that in your own isolated environment, but it's a delaying tactic at best.
This whole thing is as inevitable as TV, smartphones, airplanes, tanks or guns. Everyone will use them, because they're so much better than anything that came before that there's no competition for the same use cases.
I think you think this is going to help tech workers, but raising the cost of employing humans is only going to incentivize companies to adopt AI faster.
You should do the opposite, and tax AI use. Though that's probably rife with potential issues too. I think as unemployment increases we're going to have to make a bad decision somewhere... and I'm guessing taxing AI is the lesser evil.
The time for tough decisions is today. There is no interest from the companies that control our politicians in supporting taxes, basic income, or whatever other policy proposal people think will occur after AI. Worker leverage will only be diminished after mass layoffs. We will end up in a situation of social unrest. To pull out of that workers will be forced to organize to increase negotiating power. Sooner is better than later, if we can realize it’s in our interests to do so.
The problem is capitalism, not LLMs. Fighting against this progress just leaves us stagnant as a species.
Perhaps AI will lead to the revolution where automation frees the average person from having to toil in a factory or a cubicle farm, instead of trying to achieve infinite growth in a closed system.
> Are you a vibe coding Youtuber? Can you not read code? If so: astute point. Otherwise: what the fuck is wrong with you?
> You’ve always been responsible for what you merge to main. You were five years ago. And you are tomorrow, whether or not you use an LLM.
Loved this bit. In a professional setting, any code — AI-written or not — should be read by at least two people, the submitter and the reviewer.
This really doesn't accord with my own experience. Using claude-code (esp. with opus 4) and codex (with o3) I've written lots of good Rust code. I've actually found Rust helps the AI-pair-programming experience because the agent gets such good, detailed feedback from the compiler that it can iterate very quickly and effectively.
Can it set up great architecture for a large, complex project from scratch? No, not yet. It can't do that in Ruby or Typescript either (though it might trick you by quickly getting something that kinda works in those languages). It think that will be a higher bar because of how Rust front-loads a lot of hard work, but I expect continuing improvement.
There are a lot of v0.x.x crates in Rust with frequent updates and breaking changes. As LLMs "learn" from code in the wild, they are exposed to different ways to use the same crate and thus fail to produce working code "from memory".
But as soon as you explain how to use it with the latest syntax, they are able to fix the code.
I expected better from Thomas.
However I've tried coding agents, and well they suck... for me. Nearly all of the code I write these days is performance sensitive, as well as design sensitive (making abstractions for others to use), and not close to what could be considered boilerplate. The boilerplate I do write usually takes a trivial amount of time to write by hand. I'll keep trying whatever is new in AI every few months, I certainly don't want to be left behind, but so far that strategy hasn't left me optimistic.
> but the code is shitty, like that of a junior developer
It's worse than a junior. Juniors learn and grow, if they're a good junior it's often fairly rapid and self directed. Juniors can also ask for guidance or help, this is something maybe agents will be capable of in the future, but I could see it going pretty poorly for reasons tied to the first point about not learning. While LLMs contain a ton of information, more than a junior could, the junior is a better reasoning machine; I don't consider LLM "reasoning" to be anything close to the meatbag version.
> but the craft
Craft matters, it's important to enjoy and take pride in your work. If you enjoy managing AI agents that's fine, lots of programmers end up on the manager career path, but it's not for everyone.
For me, a highly productive but boring use of LLMs for code is that they excel at providing midwit “best practice” solutions to common problems. They are better documentation than the documentation and can do a lot of leg work e.g. Linux syscall implementation details. My application domains tend to require more sophisticated solutions than an LLM can provide but they still save a lot of rote effort. A lot of software development exists almost entirely in the midwit zone.
Much more interesting, they are decent at reducing concepts in literature to code practice for which there are no code examples. Google and StackOverflow turn up nothing. For example, I’ve found them useful for generating specialized implementations of non-Euclidean computational geometry algorithms that don’t really exist in the wild that I’ve ever seen. This is a big win, it literally turns months of effort into hours of effort.
On the other hand, I do a lot of work with algorithms that don’t exist in literature, never mind public code, with extremely performance-engineered implementations. There is an important take away from this too: LLMs are hilariously bad at helping with this but so are human software developers if required to do the same thing with no context.
Knowledge for which there is little or no training data is currently a formidable moat, both for LLMs and humans.
Getting good code is still a pain with chat, but you can go a fair way by reducing the context as much as possible and only asking for core functions (no point wasting hundreds of tokens on an arg parser). I'd like to play with Cursor or Zed a bit more.
Debugging sometimes works. 4.5 often figures out errors in one go, while 4o often struggles and spits out suggestions like it's a customer support agent (it also falls into repetition fairly often if you tell it "no, so what else could it be?"). You really have to work to convince it up front that you're not an idiot.
Neither helps with the hardest and most time consuming problems faced by software engineers: solving problems that have never been (publicly) solved before, making hard decisions based on intuition about what will be necessary in the future, understanding the interactions between broad webs of dependency in large codebases, and working with proprietary domain knowledge.
The cost is a pretty obvious indication that it's nowhere near the capability of an engineer. A robotic worker that can produce code 24/7 without tiring would be worth even more than a person of equivalent skill. I don't know what kinds of goofy hiring y'all are doing, but all of the juniors I've been involved in hiring actually worked out great.
> The median dev thinks Star Wars and Daft Punk are a public commons. The great cultural project of developers has been opposing any protection that might inconvenience a monetizable media-sharing site. When they fail at policy, they route around it with coercion. They stand up global-scale piracy networks and sneer at anybody who so much as tries to preserve a new-release window for a TV show.
This is pretty disingenuous and mean-spirited. He's using SV startup types to smear all developers, including the typical, humble, un- or undercompensated FOSS dev who absolutely respects intellectual property, and quite reasonably, expects their own IP to be respected as well. But because Sam Altman types follow the "It's Better to Beg for Forgiveness than to Ask for Permission" rule, altruistic, non-sociopathic developers have no right to complain?
The very same developers that advocate pirating from Netflix or Disney?
He's pointing out the hypocrisy in a class of people (generally, tech-savvy programmers) that have no problem advocating for piracy and gleefully thwarting the concrete IP rights of businesses on one hand, while defending and bemoaning the abstract IP rights of other content creators on the other hand.
My take away is that he's saying IP rights is a weak argument to use to defend a personal position for not adopting LLM's in your workflow, especially given the context of modern piracy.
Even the ones advocating or engaging in personal piracy aren't creating unauthorized derivative works and monetizing them, which is arguably what these AI systems and their customers are doing. And Thomas wasn't talking about personal piracy, but instead trying to use the IP-hostile conduct of Google and certain startups (like Scribd or GrooveShark) to smear all developers and de-legitimize their IP-based objections, even of those releasing code under licenses like the GPL in exchange for expected adherence to certain terms and conditions that these systems and their users ignore.
I challenge you to point out the language in his post making this claim. Otherwise you're just making stuff up.
He's specifically addressing high-level arguments used against LLMs, and in this case he's calling out the fact that one group of people is railing against LLM providers for "IP theft" while simultaneously encouraging and using pirated content. It's perfectly reasonable to call out the hypocrisy in these contradictory positions, to demonstrate that the "IP theft" argument is mostly virtue signaling.
Here:
> The great cultural project of developers has been opposing any protection that might inconvenience a monetizable media-sharing site. When they fail at policy, they route around it with coercion. They stand up global-scale piracy networks and sneer at anybody who so much as tries to preserve a new-release window for a TV show.
He doesn't explicitly name corporate entities, but it's clearly the ones founded and run by "move fast, break things" types (and I suspect he's one of them) that he's using to smear all developers, even the humble ones who aren't nearly as rich as him and whose open source software he used to build his business.
The LLM has one job, to make code that looks plausible. That's it. There's no logic gone into writing that bit of code. So the bugs often won't be like those a programmer makes. Instead, they can introduce a whole new class of bug that's way harder to debug.
LLMs are way faster than me at writing tests. Just prompt for the kind of test you want.
I can and do use AI to help with test coverage but coverage is pointless if you don’t catch the interesting edge cases.
Quality increases if I double check code with a second LLM (especially o4 mini is great for that)
Or double check tests the same way.
Maybe even write tests and code with different LLMs if that is your worry.
Maybe use one LLMs to write the code and a wildly different one to write the tests and yet another wildly different one to generate an English description of each test while doing critical review.
https://news.ycombinator.com/item?id=44163194
https://news.ycombinator.com/item?id=44068943
It doesn't optimize "good programs". It optimizes humans' interpretation of "good programs". More accurately, it optimizes what low-paid, overworked humans believe are good programs. Are you hiring your best and brightest to code review the LLMs? Even if you do, it still optimizes tricking them. It will also optimize writing good programs, but you act like that's a well-defined and measurable thing.
Correctness.
> and meets my requirements
It can't do that. "My requirements" wasn't part of the training set.
> It can't do that. "My requirements" wasn't part of the training set.
Neither are mine, the art of building these models is that they are generalisable enough that they can tackle tasks that aren't in their dataset. They have proven, at least for some classes of tasks, they can do exactly that.
> to an abstract academic definition here
Besides the fact that your statement is self-contradictory, there is actually a solid definition [0]. You should click the link on specification too. Or better yet, go talk to one of those people who did their PhD in programming languages.
> They have proven
Have they? Or did you just assume?
Yeah, I know they got good scores on those benchmarks but did you look at the benchmarks? Look at the question and look what is required to pass it. Then take a moment and think. For the love of God, take a moment and think about how you can pass those tests. Don't just take a pass at face value and move on. If you do, well I got a bridge to sell you.
[0] https://en.wikipedia.org/wiki/Correctness_(computer_science)
> In theoretical computer science, an algorithm is correct with respect to a specification if it behaves as specified.
"As specified" here being the key phrase. This is defined however you want, and ranges from a person saying "yep, behaves as specified", to a formal proof. Modern language language models are trained under RL for both sides of this spectrum, from "Hey man looks good", to formal theorem proving. See https://arxiv.org/html/2502.08908v1.
So I'll return to my original point: LLMs are not just generating outputs that look plausible, they are generating outputs that satisfy (or at least attempt to satisfy) lots of different objectives across a wide range of requirements. They are explicitly trained to do this.
So while you argue over the semantics of "correctness", the rest of us will be building stuff with LLMs that is actually useful and fun.
No they aren't. You were lied to by the hype machine industry. Sorry.
The good news is that there's a lot of formerly intractable problems that can now be solved by generating plausible output. Programming is just not one of them.
Ok. My own empirical evidence is in favour of these things being useful, and useful enough to sell their output (partly), but I'll keep in mind that I'm being lied to.
(And yes, this leap is the lie you're being sold. "LLMs are kinda useful" is not what led to the LLM trillion dollar hype bubble.)
> formal theorem proving
You're using Coq and Lean? I'm actually not convinced you read the paper. It doesn't have anything to do with your argument. Someone using LLMs with formal verification systems is wildly different than LLMs being formal verification systems.
This really can't work if you don't read your own sources
> If the model can write code that passes tests
You think tests make code good? Oh my sweet summer child. TDD has been tried many times and each time it failed worse than the last. I'm not saying don't make tests. But I am saying you're not omniscient. Until you are, your tests are going to be incomplete. They are helpful guides, but they should not drive development. If you really think you can test for every bug then I suggest you apply to be Secretary for Health.
https://hackernoon.com/test-driven-development-is-fundamenta...
https://geometrian.com/projects/blog/test_driven_development...
* End up missing tests for edge cases they built and forgot about. Those edge cases often have bugs.
* They forget and cover the same edge cases twice if they're being thorough with test-after. This is a waste.
* They usually end up spending almost as much time manually testing in the end to verify the code change they just made worked whereas I would typically just deploy straight to prod.
It doesn't prevent all bugs; it just prevents enough to make the teams around us who don't do it look bad by comparison, even though they do manual checks too.
I've heard loads of good reasons to not write tests at all; I've yet to hear a good reason to not write one beforehand if you are going to write one.
Both of your articles raise pretty typical straw men. One is "what if I'm not sure what the customer wants?" (that's fine, but I hope you aren't writing production code at this point) and the other is the peculiar but common notion that TDD can only be done with low-level unit tests, which is dangerous bullshit.
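For what it's worth, "write the test before the code" can be as small as this (a minimal pytest-style sketch; the file names, function, and behaviour are invented purely for illustration):

    # test_version.py -- in TDD this comes first and fails until the code exists
    from version import parse_version

    def test_parse_version_strips_prefix():
        assert parse_version("v1.2.3") == (1, 2, 3)

    # version.py -- the simplest implementation that makes the test pass
    def parse_version(s: str) -> tuple[int, int, int]:
        major, minor, patch = s.lstrip("v").split(".")
        return (int(major), int(minor), int(patch))

Run pytest before parse_version exists and it fails; write the function and it goes green. That red-green loop is the whole thing being argued about here, not any particular test granularity.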
The average driver thinks they're above average. The same is true about programmers.
I do disagree a bit with the post and think you should write tests while developing. Honestly, I don't think they'll disagree. I believe they're talking about a task rather than the whole program. Frankly, no program is ever finished so in that case you'd never write tests lol.
I believe this because they start off saying it wasn't much code.
But you are missing the point. From the first link
> | when the tests all pass, you’re done
> Every TDD advocate I have ever met has repeated this verbatim, with the same hollow-eyed conviction.
These aren't strawmen. These are questions you need to constantly be asking yourself. The only way to write good code is to doubt yourself. To second guess. Because that's what drives writing better tests. I actually don't think you disagree. You seem to perfectly understand that tests (just like any other measure) are guides, not answers. That there's much more to this than passing tests.
But the second D in TDD is the problem. Tests shouldn't drive development; they are just part of development. The engineer who writes tests at the end is inefficient, but the engineer who writes tests at the beginning is arrogant. To think you can figure it all out before writing the code is laughable. Maybe some high-level, broad tests are feasible, but that's only going to be a very small portion.
You can do hypothesis-driven development, but people will call you a perfectionist and say you're going too slow. By HDD I mean you ask "what needs to happen, and how would I know it is happening?", which very well might involve creating tests. Any scientist is familiar with this, but also familiar with its limits.
I'm not sure what you're saying, really, but I don't think it disagrees with the central point in any specific way.
Are you saying you're better than that? If you think you're next to perfect then I understand why you're so against the idea that an imperfect LLM could still generate pretty good code. But also you're wrong if you think you're next to perfect.
If you're not being super haughty, then I don't understand your complaints against LLMs. You seem to be arguing they're not useful because they make mistakes. But humans make mistakes while being useful. If the rate is below some line, isn't the output still good?
> I don't know if any of this applies to the arguments
> with access to ground truth
There's the connection. You think you have ground truth. No such thing exists. In medical AI, where I'm currently working, "ground truth" is usually whatever human experts say about a medical image, and it is rarely perfect. The goal is always to do better than whatever the current ground truth is.
But even when taking state of the art knowledge as a ground truth aligning to that is incredibly hard. Medicine is a great example. You're trying to create a causal graph in a highly noisy environment. You ask 10 doctors and you'll get 12 diagnoses. The problem is subtle things become incredibly important. Which is exactly what makes measurements so fucking hard. There is no state of the art in a well defined sense.
The point is that in most domains this is how things are. Even in programming.
Getting the right answer isn't enough
You can talk about how meaningful those exit codes and error messages are or aren't, but the point is that they are profoundly different than the information an LLM natively operates with, which are atomized weights predicting next tokens based on what an abstract notion of a correct line of code or an error message might look like. An LLM can (and will) lie to itself about what it is perceiving. An agent cannot; it's just 200 lines of Python, it literally can't.
> You're here using "ground truth" in some kind of grand epistemic sense
I used the word "ground truth" because you did!
>> in agent loops with access to ground truth about whether things compile and pass automatic acceptance.
Your critique about "my usage of ground truth" is the same critique I'm giving you about it! You really are doing a good job at making me feel like I'm going nuts...
> the information an LLM natively operates with,
And do you actually know what this is? I am an ML researcher, you know. And one of those who keeps saying "you should learn the math." There's a reason for this: it is really connected to what you're talking about here. They are opaque, but they sure aren't black boxes.
And it really sounds like you're thinking the "thinking" tokens are remotely representative of the internal processing. You're a daily HN user, I'm pretty sure you saw this one[0].
I'm not saying anything OpenAI hasn't[1]. I just recognize that this applies to more than a very specific narrow case...
[0] https://news.ycombinator.com/item?id=44074111
[1] https://cdn.openai.com/pdf/34f2ada6-870f-4c26-9790-fd8def563...
I very strongly disagree with this and think this reflects a misunderstanding of model capabilities. This sort of agentic loop with access to ground truth has been tried in one form or another ever since GPT-3 came out. For four years such loops didn't work. Models would very quickly veer into incoherence no matter what tooling you gave them.
Only in the last year or so have models gotten capable enough to maintain coherence over long enough time scales that these loops work. And future model releases will tighten up these loops even more and scale them out to longer time horizons.
This is all to say that progress in code production has been essentially driven by progress in model capabilities, and agent loops are a side effect of that rather than the main driving force.
You can definitely still run into some of the problems alluded to in the first link. Think hacking unit tests, deception, etc -- but the bar is less "create a perfect RL environment" than "create an RL environment where solving the problem is easier than reward hacking." It might be possible to exploit a bug in the Lean 4 proof assistant to prove a mathematical statement, but I suspect it will usually be easier for an LLM to just write a correct proof. Current RL environments aren't as watertight as Lean 4, but there's certainly work to make them more watertight.
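For readers who haven't seen it, this is what a "correct proof" in Lean 4 looks like at its most trivial (a toy example of mine; Nat.add_comm is a standard-library lemma). The kernel only accepts the file if the proof term actually establishes the stated proposition, which is the watertightness being discussed.

    -- A toy Lean 4 theorem: the checker rejects this file unless the proof
    -- term really does establish the statement it claims to prove.
    theorem my_add_comm (a b : Nat) : a + b = b + a :=
      Nat.add_comm a b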
This is in no way a "solved" problem, but I do see it as a counter to your assertion that "This isn't a thing RL can fix." RL is powerful.
> Current paradigms are shifting towards RLVR, which absolutely can optimize good programs
I think you've misunderstood. RL is great. Hell, RLHF has done a lot of good. I'm not saying LLMs are useless. But no, it's much more complex than you claim. RLVR can optimize for correct answers in the narrow domains where there are correct answers, but it can't optimize good programs. There's a big difference.
You're right that Lean, Coq, and other ATPs can prove mathematical statements, but they also don't ensure that a program is good. There's frequently an infinite number of proofs that are correct, but most of those are terrible proofs.
This is the same problem all the coding benchmarks face. Even if the LLM isn't cheating, testing isn't enough. If it was we'd never do code review lol. I can pass a test with an algorithm that's O(n^3) despite there being an O(1) solution.
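A toy illustration of that last point (my own example, using an O(n^2) stand-in for the O(n^3) case): both versions below pass the same tests, but only one is code you'd want to ship.

    def sum_upto_slow(n: int) -> int:
        # Needlessly quadratic: counts 1 + 2 + ... + n one unit at a time.
        total = 0
        for i in range(1, n + 1):
            for _ in range(i):
                total += 1
        return total

    def sum_upto_fast(n: int) -> int:
        # Closed form, constant time.
        return n * (n + 1) // 2

    # A correctness-only test suite cannot tell these two apart.
    for n in (0, 1, 10, 100):
        assert sum_upto_slow(n) == sum_upto_fast(n)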
You're right that it makes it better, but it doesn't fix the underlying problem I'm discussing.
Not everything is verifiable.
Verifiability isn't enough.
If you'd like to prove me wrong on the former, you're going to need to demonstrate that there are provably true statements for lots of things. I'm not expecting you to defy my namesake, nor will I ask you to prove correctness and solve the related halting problem.
You can't prove an image is high fidelity. You can't prove a song sounds good. You can't prove a poem is a poem. You can't prove this sentence is English. The world is messy as fuck and most things are highly subjective.
But the problem isn't binary, it is continuous. I said we're using Justice Potter Stewart optimization: you can't even define what porn is. These definitions change over time, often rapidly!
You're forgetting about the tyranny of metrics. Metrics are great, powerful tools that are incredibly useful. But if you think they're perfectly aligned with what you intend to measure, then they become tools that work against you. Goodhart's Law. Metrics only work as guides. They're no different from any other powerful tool: if you use it wrong you get hurt.
If you really want to understand this I really encourage you to deep dive into this stuff. You need to get into the math. Into the weeds. You'll find a lot of help with metamathematics (i.e. my namesake), metaphysics (Ian Hacking is a good start), and such. It isn't enough to know the math, you need to know what the math means.
If the former, I still think that the vast majority of production software has metrics/unit tests that could be attached and subsequently hillclimbed via RL. Whether the resulting optimized programs would be considered "good" depends on your definition of "good." I suspect mine is more utilitarian than yours (as even after some thought I can't conceive of what a "terrible" proof might look like), but I am skeptical that your code review will prove to be a better measure of goodness than a broad suite of unit tests/verifiers/metrics -- which, to my original last point, are only getting more robust! And if these aren't enough, I suspect the addition of LLM-as-a-judge (potentially ensembles) checking for readability/maintainability/security vulnerabilities will eventually put code quality above that of what currently qualifies as "good" code.
Your examples of tasks that can't easily be optimized (image fidelity, song quality, etc.) seem out of scope to me -- can you point to categories of extant software that could not be hillclimbed via RL? Or is this just a fundamental disagreement about what it means for software to be "good"? At any rate, I think we can agree that the original claim that "The LLM has one job, to make code that looks plausible. That's it. There's no logic gone into writing that bit of code" is wrong in the context of RL.
> I took this to mean "good code within the domain of software engineering," not "good code within the universe of possible programs.
We both mean the same thing. The reasonable one. The only one that even kinda makes sense: good enough code.
> vast majority of production software has metrics/unit tests that could be attached and subsequently hillclimbed via RL
Yes, hill-climbed. But that's different from "towards good". Here's the difference [0]. You'll find Goodhart's Law, under another name, in any intro ML course. Which is why it is so baffling that 1) this is contentious and 2) it is the status quo in research now.
Your metrics are only useful if you understand them
Your measures are only as good as your attention
And it is important to distinguish metrics from measures. They are different things. Both are proxies
> Your examples of tasks that can't easily be optimized (image fidelity, song quality, etc.) seem out of scope to me
Maybe you're unfamiliar with diffusion models? [1] They are examples where it is hopefully clearer that these things are hard to define. If you have good programming skills you should be able to make the connection back to what this has to do with my point. If not, I'm actually fairly confident GPT will be able to do so. There's more than enough in its training data to do that.
And again, diffusion models aren't relevant here. The original comment was about LLMs producing buggy code -- not RL's general limitations in other domains. Diffusion models' tensors aren't written by hand.
If you can't code then the distinction is lost on you, but in fact the "correct" part is why programmers get paid. If "plausible" were good enough then the profession of programmer wouldn't exist.
If we did, we could autogenerate code even without an LLM.
Code that doesn't do what you want isn't "working", bro.
Working exactly to spec is the code's only job.
That sounds like a new opportunity for a startup that will collect hundreds of millions of dollars, brag about how their new AI prototype is so smart that it scares them, and deliver nothing.
Anyway, this is where AIs have been really bad for us, as well as sometimes "overengineering" their bug prevention in extremely inefficient ways. The flip side of this is of course that a lot of human programmers would make the same mistakes.
Funny story: when I first posted that and had a couple of thousand readers, I had many comments of the type "you should just read the code carefully on review", but _nobody_ pointed out the fact that the opening example (the so called "right code") had the exact same problem as described in the article, proving exactly what you just said: it's hard to spot problems that are caused by plausibility machines.
AI-generated code will fuck up so many lives. The Post Office software in the UK did it without AI. I cannot imagine how, and how many, lives will be ruined once some consultancy vibe-codes some government system. I might come to appreciate the German bureaucracy and backwardness.
What makes you say that? If LLMs didn't reason about things, they wouldn't be able to do one hundredth of what they do.
Why? Because if I’m not right then I am convinced that AI is going to be a force for evil. It will power scams on an unimaginable scale. It will destabilize labor at a speed that will make the Industrial Revolution seem like a gentle breeze. It will concentrate immense power and wealth in the hands of people who I don’t trust. And it will do all of this while consuming truly shocking amounts of energy.
Not only do I think these things will happen, I think the Altmans of the world would eagerly agree that they will happen. They just think it will be interesting / profitable for them. It won’t be for us.
And we, the engineers, are in a unique position. Unlike people in any other industry, we can affect the trajectory of AI. My skepticism (and unwillingness to aid in the advancement of AI) might slow things down a billionth of a percent. Maybe if there are more of me, things will slow down enough that we can find some sort of effective safeguards on this stuff before it’s out of hand.
So I’ll keep being skeptical, until it’s over.
I firmly believe that too. That's why I've been investing a great deal of effort in helping people understand what this stuff can and can't do and how best to make use of it.
I don't think we can stop it, but I do think (hope) we can show people how to use it in a way where the good applications outweigh the bad.
I stand by what I wrote about it though: https://simonwillison.net/2025/Mar/19/vibe-coding/
I think it's a net positive for regular humans to be able to build tools for their own personal use, and I think my section on "when is it OK to vibe code?" (only for low stakes projects, treat with extreme caution if private data or security is involved) is something I wish people had paid more attention to! https://simonwillison.net/2025/Mar/19/vibe-coding/#when-is-i...
That feels idealistic. About as realistic as telling people how to use semiconductors or petrochemicals for good instead of bad.
No-one knows where AI is going but one thing you can be sure of - the bad actors don't give two hoots what we think, and they will act in their own interests as always. And as we see from historical events, there are still many, many bad actors around. And when the bad actors do bad things with the technology, the good actors have no choice but to react.
Sounds better than nothing.
You can write walls of texts about ethics and social failure. Bad actors won't care.
You can tell everyone that some technology is bad and everyone should stop using it. Some good people will listen to you and stop. Bad actors won't stop, and they will have technological edge.
You can ask politicians for regulation. However, your government might be a bad actor just as well (and recently we had a fine demonstration). They will not regulate in the interests of good people. They will regulate for what stakeholders want. Common people are never stakeholders.
If you want to stop bad actors doing bad things with AI: learn AI faster and figure out how to use AI to stop AI. This is the only way to fly.
It already is. https://futurism.com/slop-farmer-ai-social-media
And all the other things you predicted. They're underway _now_ .
> Maybe if there are more of me, things will slow down enough
Nope. That's not how it's gonna work. If you want to prevent things, it will take legislation. But sitting it out doesn't send any message at all. No amount of butterflies farting against the wind is going to stop this tornado.
AI is just the latest in a long list of disruptive technologies. We can only guess about the long term ramifications. But if history is any indicator, people in a few decades will probably see AI as totally normal and will be discussing the existential threat of something new.
Claiming AI isn't unique in being a tool for evil isn't interesting, the point is that it's a force multiplier as such.
AI is a large leap forward in capability and will likely have significant impacts on society. But it’s far from clear that it will have disproportionate negative impacts like a nuke. More likely it will have benefits and downsides similar to numerous other modern technologies.
We had friends over for dinner a couple days back; between us we had two computer scientists, one psychologist, one radiologist, one doctor. Each of us were in turn astonished and somewhat afraid of the rapid pace of change. In a university setting, students are routinely using Claude and ChatGPT for everything from informal counseling to doing homework to generating presentations to doing 'creative' work (smh).
At the end of the day, we all agreed that we were grateful to be at the tail end of our working lives, and that we didn't have to deal with this level of uncertainty.
But people surely felt the same way about gunpowder, the steam engine, electricity, cars, phones, planes, nukes, etc.
Or look at specific professions that software has negatively affected in recent decades. Not a lot of people use travel agents anymore, for example.
I’m not saying that the negative effects are good. But that’s just the nature of technological advancement. It’s up to society to adapt and help out those who have been most negatively affected.
AI is not only a general-purpose technology (such as, e.g., electricity or computers), but also the only one that can self-improve. Also, its potential for diffusion is much bigger, because unlike for electricity and computers, barriers are much more easily overcome [1].
But I'd suggest looking at this not as a separate "technology" like airplanes or smartphones, but rather as a software breakthrough. Everything that worried people about software [2] in the 20th and the early 21st century - or anything that came to pass on a smaller scale - is now much more worrisome.
[1] Pretty much any person or machine today that has access to electricity and computers can use AI, thanks to subscription models, cloud computing, VC subsidies, web access, etc.
[2] For example, large-scale automation and job loss, mass surveillance, robot swarms, etc.
I feel you've misunderstood the moment. There is no "over". This is it.
The ends do not justify the means. It is a similar judgement as when refusing to buy products of forced labour or disproportionate environmental impact, or to invest in war and bloodshed. Everyone makes one for themselves.
Coincidentally (or not), if said suspect ethics were properly addressed, it would ameliorate some of the reservations even the actual skeptics have. Licensing training data would make everyone involved aware of what is happening, give them an ability to vote and freedom to choose, soften the transition as opposed to pulling ground from under people’s feet.
[0] Control over intellectual property has given us fantastic things (cf. Linux, Blender, etc.; you can’t have copyleft without an ability to defend it, and IP laws provide that ability). If yesterday we were sued for singing the happy birthday song in public, and today we see corporations with market caps the size of countries pretending that IP ownership is not much of a thing, the alarm bells should be deafening.
The stuff that Disney does to extend copyright is not the same as assuming Daft Punk is public domain.
And there’s a difference between what is human scale infringement and what’s going on now.
Nor does it mean that people don’t have the right to point out that it’s piracy.
If being more in line with the espoused values is the issue, then the answer is to make an effort to ensure that we stop consuming pirated content. Or to stop building tools that encourage piracy - though this turns out to be a relatively small group of people, compared to everyone in tech.
And people have already stopped piracy - once alternatives showed up. There is the issue that you don’t own the stuff you stream, but that’s a separate topic.
The moral arguments presented persuasive.
I disagree with this being unique to AI because every improved technology since the automated loom has concentrated wealth and power. AI is an improved technology so it'll do so also.
I disagree that engineers are uniquely equipped to do anything about this fact because the solution to wealth concentration due to improved technology has basically nothing to do with technology and everything to do with sociology and politics.
Our technology keeps improving and I keep being surprised to hear people say "ah, with our improved efficiency, we can finally work ten hours a week and kick our feet up." The first people to say that were the Luddites, and when they found out that wasn't to be the case, they burned down factories about it. Why do we think it will suddenly be different for this specific technology?
I agree we should do something about it but I don't think the solution involves code.
Here's my historical take: in the 1960s and 1970s, computation in general was viewed as a sinister, authoritarian thing. Many people assumed it was going to be that way, and a small minority recognised that it also had the potential to empower and grant autonomy to a wider class of people. These were the advocates of the personal computer revolution -- the idea of "computer lib", whereby the tools of control would be inverted and provided to the people at large.
You can argue about whether that strategy was a success or not, but the group that was largely irrelevant to that fight was the people who decided not to get involved, or who tried (although not very hard) to impede the development of computation in general.
To bend the trajectory of AI in general involves understanding and redeploying it, rather than rejecting it. It also involves engaging. If it's anything like the last few times, the group that is simultaneously exploring and attempting to provide agency and autonomy for the maximum number of people will be smaller than both those using new tech to exploit people or maintain an unequal status quo, and the people who have good intentions, but throw their hands up at the possibility of using their skills to seize control of the means that provide for a better future.
But there is very little you or I can do about it except choosing not to partake.
You say "there is very little you and I can do about it". Even if you don't listen to me, perhaps you might listen to the coiner of the term "enshittification"? https://archive.is/CqA8w
We may be talking past each other, but my experience of computing in the 70's and 80's was definitely not academic.
Did it exist a little? Of course. But it was dwarfed by the other stuff going on. I suspect your (and a lot of other HN) experience is going to bias toward the hobbyist side though, as does mine. I only found out about the much larger stuff going on at the same time much later.
Almost all the early networking stuff (UUCP, pre-Internet internet like Arpanet, early Usenet, Gopher, even HTML and the WWW, etc) was academic institutions or related.
Often with military grants/contracts. Sometimes with purely commercial contracts, but even those were almost always for some Gov’t project. The amount of work on basics like sorting algorithms that grew out gov’t research is mind boggling, for instance.
There is a lot of well documented history on this.
Then PCs and halfway decent modems became available (2400 baud+), and things changed very rapidly.
Mid '80s, BBSs started sprouting like weeds. There were a few before then, but the truly hobbyist ones were very niche.
Then even more so with commercial services like Prodigy, then AOL, then actual ISPs, etc.
And it was correct. We now live in surveillance states much worse than Stalin's or East Germany's.
You need to look up how much an "average" human consumes. When I replace 2 humans with a ChatGPT subscription, I can guarantee you that OpenAI is generating less CO2 than those two interns were creating with their transport to the office (and back). That's before we consider things like the 25 years it took to raise and train them, or the very expensive tastes (e.g. travelling around the world) they develop after they start earning a large salary.
I'm tired. I'm tired of developers/techies not realizing their active role in creating a net negative in the world. And acting like they are powerless and blameless for it. My past self is not innocent in this; but I'm actively trying to make progress as I make a concerted effort to challenge people to think about it whenever I can.
Countless times, the tech industry (and developers specifically) has taken an interesting technical challenge that quickly required some sort of ethical or moral tradeoff, which ended up absolutely shaping the fabric of society for the worse.
Creating powerful search engines to feed information to all who want it; but we'll need to violate your privacy in an irreversible way to feed the engine. Connecting the world with social media; while stealing your information and mass exposing you to malicious manipulation. Hard problems to solve without the ethical tradeoff? Sure. But every other technical challenge was also hard and solved, why can't we also focus on the social problems?
I'm tired of the word "progress" being used without a qualifier of what kind of progress and at the cost of what. Technical progress at the cost of societal regression is still seen as progress. And I'm just tired of it.
Every time that "AI skeptics" are brought up as a topic; the focus is entirely on the technical challenges. They never mention the "skeptics" that are considered that because they aren't skeptical of what AI is and could be capable. I'm skeptical if the tradeoffs being made will benefit society overall; or just a few. Because at literally every previous turn for as long as I've been alive; the impact is a net negative to the total population, without developer questioning their role in it.
I don't have an answer for how to solve this. I don't have an answer on how to stop the incoming shift from destroying countless lives. But I'd like developers to start being honest about their active role in not just accepting this new status quo but proactively pushing us in a regressive direction. And about our power to push back on this coming wave.
Political-economic analysis of technology is not a super popular thing in mainstream media, but disabling, sabotaging or vandalising anti-human tech might be.
Like this?
Most of them got into tech because it's fun and because it pays royally. Morals have little to do with it for lots of folks.
But, tech was not always a net negative.
As far as I can tell, the sharpest negative inflection came around the launch of the iPhone. Facebook was kind of fine when it was limited to universities and they weren't yet doing mobile apps, algorithmic feeds or extensive A/B testing.
It seems "optimizing engagement," was a grave initial sin...
Maybe some engineers should go back to their childhoods, watch some Outer Limits, and pay attention to the missed lessons.
Our lives are not our own. From womb to tomb, we are bound to others. Past and present. And by each crime and every kindness, we birth our future.
Tech has always been a tool for control, power and accumulation of capital.
You counterbalance it with social and civic laws (i.e. counter-power).
Some would say "The Industrial Revolution and its consequences have been a disaster for the human race."
No "we" don't want it. And those who do want it, let them go live in the early industrial England whete the lack of regulation degenerated masses.
Also, for some reason people still portray capitalism as being something completely different with or without regulation; it's like saying a man is completely different in a swimsuit and in a suit.
> We, in the western world, were in the privileged position of having a choice, and we chose individual profit over the communal good
Again, "we" did not have a gathering a choose anything. Unless you have records of that zoom session.
> given the fact we're essentially animals.
This is a reductionist statement that doesn't get us anywhere. Yes, we are animals, but we are more than that, just as we are made of quarks but are more than quarks.
And I am not sure but like, I have got this one life. Why can't I just be a good guy who wants to help others while still being in the system.
Why do I have to suffer for other people's decisions and bear the mental responsibility too?
Nobody's perfect. Neither do I intend to be. We are all gonna die. I just want to leave my community a little bit more charming place. Not bring revolution.
I can't escape the system because this thought terrifies me. It terrifies me because you have to pick your battles wisely. I won't leave my coding job because of llms.
Instead, if I am really feeling like trying to do good, I can donate extensively, live a frugal life for some time, and give to people who are dying of hunger and such.
And I would still have the freedom to go back at any instant and stop donating.
The same can't be said about leaving a job. It's hard to re-enter.
I am not sure lol. But I would much rather build stuff that I like with LLMs and then donate, instead of the project not existing or taking way longer, imo.
https://fred.stlouisfed.org/series/RSAHORUSQ156S
(This is somewhat but not entirely tautological.)
When in actuality, I personally believe that it doesn't
I am tired of people blaming bottom developers, while CEOs get millions for "the burden of responsibility".
We developers are not blameless. If we accept that we are playing a role; then we can be proactive in preventing this and influencing the direction things go. CEOs need developers to achieve what they want.
I'm not saying it's easy. I won't even hold it against folks that decide to go in a separate direction than mine. But I at least hope we can be open about the impact we each have; and that we are not powerless here.
This is because most people on HN who say they are skeptical about AI mean skeptical of AI capabilities. This is usually paired with statements that AI is "hitting a wall." See e.g.
> I'm very skeptical. I see all the hype, listen to people say it's 2 more years until coding is fully automated but it's hard for me to believe seeing how the current models get stuck and have severe limitations despite a lot of impressive things it can do. [https://news.ycombinator.com/item?id=44015865]
> As someone who is mildly skeptical of the current wave of LLM hype and thinks it's hitting a wall... [https://news.ycombinator.com/item?id=43634169]
(that was what I found with about 30 seconds of searching. I could probably find dozens of examples of this with more time)
I think software developers need to urgently think about the consequences of what you're saying, namely what happens if the capabilities that AI companies are saying are coming actually do materialize soon? What would that mean for society? Would that be good, would that be bad? Would that be catastrophic? How crazy do things get?
Or put it more bluntly, "if AI really goes crazy, what kind of future do you want to fight for?"
Pushing back on the wave because you take AI capabilities seriously is exactly what more developers should be doing. But dismissing AI as an AI skeptic who's skeptical of capabilities is a great way to cede the ground on actually shaping where things go for the better.
I’m definitely not skeptical of its abilities, I’m concerned by them.
I’m also skeptical that the AI hype is going to pan out in the manner people say it is. If most engineers make average or crappy code, then how are they going to know if the code they are using is a disaster waiting to happen?
Verifying an output to be safe depends on expertise. That expertise is gained through the creation of average or bad code.
This is a conflict in process needs that will have to be resolved.
These LLMs may not be inherently evil, but their impact on society could be potentially destabilising.
I'm not saying there is no evil, but that argument at least holds little ground.
These systems (LLMs, diffusion) yield imitative results just powerful enough to eventually threaten the jobs of most non-manual laborers, while simultaneously being not powerful enough (in terms of capability to reason, to predict, to simulate) to solve the hard problems AI was promised to solve, like accelerating cancer research.
To put it another way, in their present form, even with significant improvement, how many years of life expectancy can we expect these systems to add? My guess is zero. But I can already see a huge chunk of the graphic designers, the artists, the actors, and the programmers or other office workers being made redundant.
Everything else around it is a glamorous party because everyone's money is riding on it, and one needs to appreciate it or risk being deserted by the crowd.
The basics of science are about questioning things until you get convinced. People depending on models too much may end up in a situation where they lose the ability to triangulate information from multiple sources before being convinced about something.
Programming can get more complicated above a certain threshold even for humans, so it would be interesting to see how the models perform as complexity grows. I am a skeptic, but again, I don't know the future either.
For AI companies, it's to get a model which can be better on benchmarks and vibes, so that it can be SOTA and get a higher valuation for stakeholders.
For coders, they just want the shit done. Everyone wants the easy way if his objective is to complete a project but for some it is learning and they may not choose the easy way.
Why do they want to do it the easy way? Speaking as someone whose cousins and brothers are in the CS field (I am still in high school), they say that if they get x money, then the company takes at least 10x the value in work from them. (Of course, that may be figurative.) One must ask why they should be the ones morally bound in case AI goes bonkers.
Also, the best people not using AI would probably slow it a little, but the AI world moves so fast it's unpredictable; DeepSeek was unpredicted. I might argue that now it's a matter of us vs China in this new AI arms race. Would that stop if you stop using it? Many people already hate AI, but has that done much to stop it? If, that is, you'd even call AI "stopping" at the moment.
It's paradoxical. But to be frank, LLMs were created for exactly the thing they're excelling at. It's a technological advancement and a moral degradation.
It's already affecting the supply chain, tbh. And to be frank, I am still using AI to build projects that I just want to experiment with, to see if it can really work without my getting the domain-specific knowledge. Though I also want to learn more and am curious, I just don't have much time in high school.
I don't think people cared about privacy and I don't think people would care about it now. And it's the same as not using some big social media giant: you can't escape it. The tech giants also made things easier but less private. People chose the easier part, and they would still choose the easy part, i.e. LLMs. So I guess the future is bleak, eh? Well, the present isn't that great either. Time to just enjoy life while the world burns from the regret of its past actions for 1% shareholder profit. (For shareholders, it was all worth it though, am I right?)
My $0.02
> Because if I’m not right then I am convinced that AI is going to be a force for evil. It will power scams on an unimaginable scale.
It doesn't need to be a good coder to do that. Look at common scams. You get those texts from "Jane" who sent you an iMessage from an email address offering you a part-time job and asking you to contact them on WhatsApp, right? (Well... Android does a better job at filtering spam.) Or calls from "the IRS". Or anything else that's bullshit. This even includes legal scams like charging you for canceling your service or making it hard to cancel your subscription.
There's no skill needed for this. You don't need a brilliant coder. You need the equivalent of a call center in India. You need the equivalent of a poorly worded Nigerian scam email.
Shitty coding LLMs make this shit easier to mass produce. High quality LLMs only make it worse.
Personally, I'm just tired of all the shitty lemons[0] everywhere. I wanna buy a peach, but everything being sold is a lemon. All the lemons have done is make me annoyed and frustrated at all the extra work I have to do.
I now have 4 entries for my GF's birthday because when I merged a duplicated contact it just created more instances. I can't even delete them! Shit like this sounds petty and minor but when you're hitting 20-100 new issues like this daily, it isn't so minor anymore. I can't make any more lemonade. There are just too many lemons...
I keep hearing this but have yet to find a good resource to study the issues. Most of what I've read so far falls into two buckets:
"It'll hijack our minds via Social Media" - in which case Social Media is the original sin and the problem we should be dealing with, not AI.
or
"It'll make us obsolete" - I use the cutting edge AI, and it will not, not anytime soon. Even if it does, I don't want to be a lamplighter rioting, I want to have long moved on.
So what other good theories of safety can I read? Genuine question.
Bruce Schneier, May 2024
https://www.schneier.com/academic/archives/2024/06/ai-will-i...
I am seeing a stream of comments on Reddit that are entirely ai driven, and even bots which are engaging in conversations. Worst case scenarios I’m looking at will mean it’s better to assume everyone online is a bot.
I know of cases where people have been duped into buying stocks because of an AI generated version of a publicly known VP of a financial firm.
Then there’s the case where someone didn’t follow email hygiene and got into a zoom call with what appeared to be their CFO and team members, and transferred several million dollars out of the firm.
And it’s only 2-3 years into this lovely process. The future is so bleak that when I talk about this with people not involved in looking at these things, they call it nihilism.
It’s so bad that talking about it is like punching hope.
For that matter, my email has been /dev/null for a while now; unless I have spoken to a person over the phone and expect their email, I don't even check my inbox. My Facebook/Instagram account is largely used as a photo backup service plus an online directory. And Twitter is for news.
I mostly don't trust anything that comes in online, unless I have already verified the other party is somebody I'm familiar with, and even then only through the established means of communication we have both agreed to.
I do believe Reddit, Quora, LeetCode et al. will largely be reduced to /dev/null spaces very soon.
There was a direct benefit from digitization and being able to trust digital video and information that allowed nations to deliver services.
Trust was a public good. Factual information cheaply produced and disseminated was a public good.
Those are now more expensive because the genAI content easily surpasses any cheap bullshit filter.
It also ends up undermining faith in true content, which may be outlandish.
I saw an image of a penny hitch on Reddit and I have no idea if it’s real or not without having to check anymore.
In all honesty, art in some form or other has always been simulated to some extent. Heck, the whole idea of a story, even in a book, is something you know hasn't happened in real life, but you are willing to suspend disbelief for a while to be entertained. This is the essence of all entertainment. It is not real, but it makes you feel good.
Like action movies have had cgi, cartoon shows, magic shows and even actors putting on make up itself can be considered deviation from truth.
I guess your idea is that news can be manufactured and one could rig public opinion toward all sorts of bad things. But again, once you are here, a good part of the public already knows enough about this to be wary of it. Come to think of it, a lot of news is already heavily edited to the point that it doesn't represent the original story. This is just a continuation of the same.
The second issue at play here is the level of effort required to spoof content and its flip side - the level of effort required to verify accuracy of content.
I am talking about the second issue: effectively, our ability to suss out what is real is now insufficient. Is the person you are talking to in the comments a bot? Is that short message from a human? Is that interesting historical fact true? Did people really do that? That can’t be real, can it?
I am concerned that it used to take X amount of time and effort to check if something was valid, or Y amount of time to create a convincing facsimile.
The issue is that since Y is much lower, when something outlandish shows up, it takes more time and effort to check if it’s true. For example, I could look at an image and at a glance tell if it was fake. Now I can’t. This means that there's a whole swathe of content that I cannot trust anymore, unless I am willing to make a decent chunk of effort to verify it.
This means I am also less invested in public groups or communities, because they are likely to be filled with bots. My posture is to be more guarded and suspicious.
Extended to the entire ecosystem, and this becomes the dystopian worst case scenario, and that voice asking for help in some corner of the net, is likely a mimic and not an adventurer who needs help.
I am not too concerned about rigging popular opinion, because that process has already been discovered (I’ll plug Network Propaganda again).
That's why there's no single source that's useful to study issues related to AI. Until we see an incident, we will never know for sure what is just a possibility and what is (not) an urgent or important issue [1].
So, the best we can do is analogize based on analogical things. For example: the centuries of Industrial Revolution and the many disruptive events that followed; history of wars and upheavals, many of which were at least partially caused by labor-related problems [2]; labor disruptions in the 20th century, including proliferation of unions, offshoring, immigration, anticolonialism, etc.
> "Social Media is the original sin"
In the same way that radio, television and the Internet are the "original sin" in large-scale propaganda-induced violence.
> "I want to have long moved on."
Only if you have somewhere to go. Others may not be that mobile or lucky. If autonomous trucks make the trucking profession obsolete, it's questionable how quickly truckers can "move on".
[1] For example, remote systems existed for quite some time, yet we've only seen a few assassination attempts. Does that mean that slaughterbots are not a real issue? It's unclear and too early to say.
[2] For example, high unemployment and low economic mobility in post-WW1 Germany; serfdom in Imperial Russia.
But you’ll be armed with AI also, if you choose to pick it up. The choice is yours.
I do think that this wave of AI shows we (society, the world, etc.) are not actually prepared for a really significant AI breakthrough. Kind of like COVID-19: in hindsight it wasn't as bad as it could have been, and we all got really lucky because of that; we really weren't prepared to handle that well either.
Oh boy it's over.
In so many ways too. I cannot fathom the scale of mass data collection and surveillance.
Multiple people I’ve recently spoken with (very smart and educated people) consistently are using it to discuss some of the most intimate things about their life.
Things that no existing social media platform or any other tool is capable of capturing.
They are going to have in-depth summaries on everyone. Our whole security and privacy model up until now has relied on "technically someone at Google or the government could see my data, but realistically they don't have the resources to look at non-targets." Now they really will have an agent looking at everything you do and say.
Authoritarian governments are going to have a 1 click CSV export for all the individuals problematic to them based on private conversations.
Build weapons and empower yourself.
To be blunt, this describes sticking one's head in the sand to a tee.
If you're convinced that AI is going to be a force for evil, then fight to make sure that it doesn't happen. If that means you want to slow down AI, then fine, fight to slow it down.
If by skepticism, you mean "desperately" wanting reality to be one way rather than the other, that is not going to significantly affect the trajectory of AI.
Being clear-eyed about where AI is going, and then fighting nonetheless for what you want is the way to go.
Be intellectually honest with yourself. Don't hold on to ideas that you yourself acknowledge are probably wrong simply for their comfort. Don't stick your head in the sand. Assess what you want, and fight for that.
Safeguards and slowing things down will not happen via wishful thinking.
The solution is to put an AI intermediary into interactions. We should already have AI that rewrites the web pages we view into an ad-free format, but I guess my ideas on this topic are ahead of the inevitable curve.
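A crude, LLM-free sketch of that intermediary idea (requests and BeautifulSoup are assumed to be installed; keeping only paragraph text is a deliberately blunt heuristic): a local proxy or extension could do something like this before, or instead of, handing the page to a model.

    import requests
    from bs4 import BeautifulSoup

    def readable_text(url: str) -> str:
        # Very crude "reader mode": fetch the page and keep only paragraph text,
        # which drops scripts, nav chrome and most ad markup along the way.
        html = requests.get(url, timeout=10).text
        soup = BeautifulSoup(html, "html.parser")
        paragraphs = (p.get_text(" ", strip=True) for p in soup.find_all("p"))
        return "\n\n".join(p for p in paragraphs if p)

    print(readable_text("https://example.com"))

An AI intermediary would presumably do this with far more nuance (keeping layout, images, and legitimate content), but the pipeline shape is the same: fetch, rewrite, present.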
>It will destabilize labor at a speed that will make the Industrial Revolution seem like a gentle breeze.
Most of our work and employment lines are a variation on drudgery and slave labor, so that's a good thing, way overdue.
>It will concentrate immense power and wealth in the hands of people who I don’t trust.
It has democratized access to consulting expertise and an ever-widening pool of digital skills/employees for everyone to use. A huge number of things previously locked up or restricted by access to capital are now freely accessible to literally anyone (with some skill and accuracy issues still to be ironed out).
And this last point is particularly important, because we're only going to have more and better AI crop up, and unlike humans, their time isn't priced according to living expenses and hourly wages, locked behind formalized business structures with additional layers of human employees who all need to pay rent and eat, which drives the cost skywards.
It also matches my own prediction of a mundane non-singularity. Long before we get anything properly superhuman, we'll have innumerable sub- or parahuman AIs that proliferate and become ubiquitous in society and the world.
If AI is truly as revolutionary as it could be... well, who is to say it isn't the Pandora's box that destabilizes the tech giants today and gets us back to a place where a team of 10 can genuinely compete against 1000. And not in the "raise cash, build fast, and get out while things are good" trend... but actually in building small, more principled companies that aren't pushed to do the unsustainable things that the current market pushes them to do.
Once again... it is more likely than not to be a pipe-dream... but I am starting to think it may well be better to be realistic about the momentum this freight train is building and see if it can be repurposed for my world-view rather than to cede the space to the worst of the grifters and profit-seeking-at-all-cost types.
My suspicion is that current sophistication of tech and AI is already enough to fulfill gp's predictions, and it's already doing that.
It may get worse. Imagine the police using AI to interpret evidence against you, get judged by a court that uses AI to write your sentence, based on laws that were also written by AI. Nobody understands this, just listen to the AI.
The other aspect of this is the flood of inane and untrue content. It may go to such an extent that the outlook of the typical person may become incredibly local again, limited to their immediate surroundings and personal experiences, not by choice, but because there won't be any way to obtain any reliable information about the outside world, with no way to sift the real from the unreal.
If powering scams and “destabilizing labor” makes something evil then we should delete the internet. Seriously.
Like, not at all and ignoring it
The Netherlands is filled with AI skeptics. It’s a very human centered country, so perhaps it shouldn’t be a surprise. But when so many top technologists express skepticism, people don’t prepare. They don’t even consider the possibilities. And they don’t learn.
My fear is that many professorial-types express skepticism because it sells well—and it elevates their own standing. They know better—“it’s only predicting the next token”—and people listen to them because of their authority. And then a whole society fails to prepare, to adapt or to learn.
More meaningfully, we can influence the context the intelligence explosion will play out in.
So how about we use the occasion to switch our global economic operating system from competition to cooperation in time for the singularity?
> Maybe if there are more of me, things will slow down
Precious little hope in slowing this rocket down when the boosters are just getting fired up..
1. Scams are going to be a massive massive problem. They already are and that is without AI. I think we are going to see communication devices that are default deny and that require significant amounts of vetting before a contact is added.
2. Energy usage is bad but likely a short term problem not a long term one.
It will also make proving your identity harder and more time-consuming.
https://chatgpt.com/share/683f3932-fce0-8012-a108-4b70c3e5fd...
Things change and it's scary, but it usually works out. Or at least we just get used to it.
I am a believer that these tools will be (if they aren’t already) extremely useful. Game changing in many ways, and I worry about the future of knowledge fields. Definitely not a skeptic in the trajectory. I do wonder if that makes those of us who are on the cutting edge more valuable as AI can quickly reproduce things that are well understood.
As a software engineer, I need to call out this rhetoric. We are not all morally bankrupt. Many of the people with the most ethical backbone I know are also software engineers.
I don't use "AI" for the same reason I don't use amphetamine. I'm sure I could get more stuff done faster if I used it, but I don't, because the manufacturing and the distribution of it is completely unethical.
The blatant stealing of intellectual property is only the tip of the shitberg. As bad are the giant ecological footprint[1], the colonialism of the grossly underpaid RLHF, and the conceding of even more of our digital lives to trillion dollar corporations.
We can and must ethically evaluate our work, and that includes our tools. Nobody's perfect, but doing one immoral thing does not mean you get a free pass to skip morality altogether.
1. https://www.greenpeace.de/publikationen/environmental-impact...
Yes, the tech isn't perfect yet. Yes, it means existing industries will be displaced. Yes, it threatens to undermine the identities we have built as useful artists, programmers, data analysts, whatever. It is a scary, uncertain future, and on some days I am overcome with dread about what it means for me.
But ffs, we will figure it out. This technology, if it continues to improve at the pace it has, will open the door to us solving previously intractable problems - problems like curing cancer, endless energy and nuclear fusion, making us 100x more productive at building great housing and infrastructure. Let's get on board and make this thing a success, not fight it every step of the way because we want to preserve some basically arbitrary state of society.
Otherwise, well written and persuasive.
I’ve yet to witness this myself but that’s with my interactions. Which are limited to free offerings.
I think we've passed the threshold beyond which not using AI is just doing things the hard way for no appreciable gain.
I can't prove it but I *think* the answer there is yes.
As someone who was born ambitious I find this technology tepid at best.
Is it really? I scraped all HN comments with the words "Rust" and "LLM" from the last 6 months, put it into Gemini and asked it to surface quotes of people saying they had a good time, versus quotes of the opposite. It was about 60% positive and 40% negative.
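For reference, the scraping step is easy to reproduce against the public Algolia HN Search API; the sketch below is one way to do it, not necessarily how the parent did (the six-month cutoff, the keyword filter, and the output file name are my assumptions, and the Gemini classification step is left out entirely):

```python
import time
import requests

# Minimal sketch: pull HN comments mentioning both "Rust" and "LLM" from the
# last ~6 months via the public Algolia HN Search API, then dump them to a
# text file you could paste into Gemini (or any other model) for sorting
# into positive/negative quotes.
CUTOFF = int(time.time()) - 180 * 24 * 3600  # roughly six months ago

def fetch_comments(query: str, cutoff: int) -> list[str]:
    comments, page = [], 0
    while True:
        resp = requests.get(
            "https://hn.algolia.com/api/v1/search_by_date",
            params={
                "query": query,
                "tags": "comment",
                "numericFilters": f"created_at_i>{cutoff}",
                "hitsPerPage": 100,
                "page": page,
            },
            timeout=30,
        )
        resp.raise_for_status()
        hits = resp.json().get("hits", [])
        if not hits:
            return comments
        # comment_text is HTML-escaped; good enough to feed to an LLM.
        comments.extend(h.get("comment_text") or "" for h in hits)
        page += 1  # note: Algolia may cap total paginated results

if __name__ == "__main__":
    # Full-text search is fuzzy, so filter again for both keywords.
    results = [c for c in fetch_comments("Rust LLM", CUTOFF)
               if "rust" in c.lower() and "llm" in c.lower()]
    with open("rust_llm_comments.txt", "w") as f:
        f.write("\n\n---\n\n".join(results))
    print(f"collected {len(results)} comments")
```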
Does anyone have recent personal experience with this they can talk about?
When that doesn't work, they start inventing hyper-specific scenarios and graphs with grounding that exists only in their imagination to push the topic further. If the AI can't solve a problem in a domain for you, well, it's that domain that's the problem, not AI. You simply need to re-invent the paradigm around your entire problem - easy stuff.
Finally, when they've failed to convince anyone with a modicum of rationality, they resort to some version of "well, fuck you, it's the future and I'll just be leaving you behind!"
Interestingly, this article has all three.
> Meanwhile, software developers spot code fragments seemingly lifted from public repositories on Github and lose their shit. What about the licensing? If you’re a lawyer, I defer. But if you’re a software developer playing this card? Cut me a little slack as I ask you to shove this concern up your ass. No profession has demonstrated more contempt for intellectual property.
> The median dev thinks Star Wars and Daft Punk are a public commons. The great cultural project of developers has been opposing any protection that might inconvenience a monetizable media-sharing site. When they fail at policy, they route around it with coercion. They stand up global-scale piracy networks and sneer at anybody who so much as tries to preserve a new-release window for a TV show.
So it starts with a humble strawman, and then the author illustrates how dumb they think their fellow developers really are if they make different decisions based on different values.
Most developers I interact with who are what you might call 'AI skeptic', are most concerned with how this tech will be used, especially since its creation is rooted in mass scale disregard for ownership of anything.
> "The great cultural project of developers has been opposing any protection that might inconvenience a monetizable media-sharing site.... They stand up global-scale piracy networks",
IMO the author is projecting here or something, because I've literally never met someone with this opinion, and I've also been in tech/developer circles for over 20 years. Personal use of pirated content is very different from commercial distribution or making money off sharing sites. Not everyone's values are so rooted in making money with complete disregard for the impact of their actions in doing so.
I get it: the author wants to keep using LLMs and wants people to stop trying to make them feel bad, so they try to make a case for how the skeptics' arguments are just dumb. But the author completely missed the 'why' behind the arguments. In the end, LLMs are a tool. Using them or not is up to the individual. But that doesn't give a blanket social license to use them in any way people, or more importantly, companies, want.
The LLM coding agents might help you fire some SWEs to save a penny or two, but soon enough they will also help your customers or suppliers replace your business model.
I suspect C# Asp.Net Core not so much. Any thoughts?
I haven't used coding agents. Can someone who has (or knows about these things) elaborate, or point me to a good resource? Thanks!
That's really it. Stuff the context, loop over the LLM until work is done.
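A minimal sketch of that loop might look like the following, where call_model and run_tool are hypothetical placeholders for whatever model API and tools (shell, file edits, test runner) you wire in; this is just the shape of the pattern, not any particular agent's implementation:

```python
# A toy sketch of the "stuff the context, loop until done" agent pattern.
# call_model() and run_tool() are placeholders, not a real vendor API.

def call_model(messages):
    # Placeholder: in real use this would call your LLM provider's chat API and
    # return either {"tool": name, "args": {...}} or {"content": final_answer}.
    return {"content": "stub: no model wired up"}

def run_tool(name, args):
    # Placeholder: run a shell command, apply a file edit, execute tests, etc.
    return f"stub result for {name}({args})"

def agent(task, max_steps=50):
    # The "context stuffing": system prompt + task + a growing transcript of
    # every tool call and its result, fed back in on every iteration.
    messages = [
        {"role": "system", "content": "You are a coding agent. Use tools, then answer."},
        {"role": "user", "content": task},
    ]
    for _ in range(max_steps):
        reply = call_model(messages)
        if "tool" in reply:
            result = run_tool(reply["tool"], reply.get("args", {}))
            messages.append({"role": "assistant", "content": str(reply)})
            messages.append({"role": "tool", "content": result})
            continue
        return reply.get("content", "")  # the model says it's done
    return "gave up after max_steps"

print(agent("Fix the failing test in foo_test.py"))
```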
Also the emphasis on greenfield projects? Starting is by FAR the easiest part. That's not impressive to me. When do we get to code greenfield for important systems? Reminds me of the equally absurd example of language choice. You think you get to choose? What?
Imagine all the code these agents are going to pump out that can never be reviewed in a reasonable time frame. The noise generated at the whim of bike-shedding vibe coders is going to drown all the senior reviewers soon enough. I'll call that Cowboy Coders on Steroids. Anyone with skills will be buried in reviews, won't have time for anything else, and I predict stricter code gen policies to compensate.
But this won't stop it from happening.
What you mean to claim here is that verification is 10x harder than authorship. That's true, but unhelpful to skeptics, because LLMs are extremely useful for verification.
Some answers were trivial to grade—either obviously correct or clearly wrong. The rest were painful and exhausting to evaluate.
Checking whether the code was correct and tracing it step by step in my head was so draining that I swore never to grade programming again.
Sometimes, code is hard to review. It's not very helpful if the reviewer just kills it because it's hard.
I am absolutely still an AI skeptic, but like: we do this at work. If a dev has produced some absolutely nonsense, overcomplicated, impossible-to-understand PR, it gets rejected and sent back to the drawing board (and then I work with them to find out what happened, because that's a leadership failure more than a developer one IMO).
A lot of code can not and will not be straightforwardly reviewable because it all depends on context. Using an LLM adds an additional layer of abstraction between you and the context, because now you have to untangle whether or not it accomplished the context you gave it.
Also, and know this doesn't matter, but it's so weird to see this downvoted. That's not an "I disagree" button...
I can review 100x more Go code in a set amount of time than I can, say React.
With Go there are repetitive structures (if err != nil) and not that many keywords, so it's easy to sniff out the suspicious bits and focus on them.
With Javascript and all of the useWhatevers and cyclic dependencies and functions returning functions that call functions, it's a lot harder to figure out what the code does just by reading it.
I can think of edge cases where a certain section of code is easier and faster to write than to read, but in general - in our practical day-to-day experiences - reading a lot of code is faster than writing a lot of code. Not just faster but less mentally and physically taxing. Still mentally taxing, yes, but less.
The idea is that people will spend 10x more time reading your code in all future time stacked together. Not that reading and understanding your code once takes 10x the effort of writing it, which is obviously untrue.
Here is the quote from Clean Code, where this idea seems to originate from:
> Indeed, the ratio of time spent reading versus writing code is well over 10 to 1.
It's nice to build throwaway things _so fast_, especially in the sort of fuzzy stuff like frontend where it's fine for it to be completely wrong. And then I can just use my own sense of how computers work to fix up what I care about, delete a bunch of what I don't care about... It's pretty amazing.
For existing projects I have only witnessed garbage output. I know people have success. I haven't seen it.
I have witnessed PMs taking a bullet pointed list of requirements and then use ChatGPT to generate paragraphs of text for some reason. You had the list!
Everyone is still responsible for the code they produce. I review my own PRs before I expect others to, and I don't even use AI. I think what the article describes seems interesting though.
VCs are already doubting if the billions invested into data centers are going to generate a profit [1 and 2].
AI companies will need to generate profits at some point. Would people still be optimistic about Claude etc if they had to pay say $500 per month to use it given its current capabilities? Probably not.
So far the only company generating real profits out of AI is Nvidia.
[1] https://www.goldmansachs.com/insights/articles/will-the-1-tr...
[2] https://www.nytimes.com/2025/06/02/business/ai-data-centers-...
And this is quickly spreading beyond software engineering. Software engineers are just being guinea pigs for agentic AIs eventually popping up in all sectors. Basically, while security and quality issues are being sorted out, it helps having users that are a bit more clued in about what they are doing.
That's why AI investments are so hot right now. Of course there are a lot of AI companies that will fall short. There always are. And companies like Nvidia that will make a lot of money selling GPUs.
But there is some non trivial amount of revenue potential there. Anybody still in denial about that is probably penny pinching.
My guess would also be that at the 100$ price point only one company can be profitable but that is just a very wild guess.
From what I've seen, SEs are some of the last to adopt it. My marketing colleague has been overflowing with generic AI crap produced by external writers for over a year now.
Are you "the product" if you're paying $50 but the company also gets $0.35 of value from your data?
If yes I think you're overreacting, if no then I don't think your worries apply to AI subscriptions. (Other worries do, but not that one.)
Anyway, your employer won't ask you for permission, they'll just expect you to use the tools they provide you with. And the responsible IT manager, VP of engineering, or whomever is just going to look at cost/benefit here and come to some conclusion based on that. My guess is most companies will pay up for some decent subscriptions and will probably get some value out of those.
Likewise freelancers might be asked to not use their personal accounts for security reasons, or they might be expected to get a proper paid subscription that provides the right guarantees in the same way that they might be expected to have a laptop to work on. I've had customers providing me with laptops and I've also done projects with my own laptop. It seems to vary.
Even if AI companies can recoup the billions or trillions of dollars invested, how long will that take and what will be their margin after that?
Sure, they are perhaps 6 months behind the closed-source models, and the hardware to run the biggest and best models isn't really consumer-grade yet (How many years could it be before regular people have GPUs with 200+ gigabytes vram? That's merely one order of magnitude away).
But they're already out there. They will only ever get better. And they will never disappear due to the company going out of business or investors raising prices.
I personally only care about the closed sourced proprietary models in so far as they let me get a glimpse of what I'll soon have access to freely and privately on my own machine. Even if all of them went out of business today, LLMs would still have a permanent effect on our future and how I'd be working.
I can guarantee +99.999% of users won't be doing that.
What a strange disclaimer. Image generation was a (career)killer feature long before vibe coding was halfway useful.
That's what a lot of the highest paid programmers work on. I work in such a firm and the agent they have for us is incapable. And it's the latest and greatest from whoever coders love to rave about online.
I love that AI written tests and other generally greenfield things work well and can be vetted quicker than writing and getting the code oneself. But for most more complex tasks it's not there. I'm ready when it is though
… most of the code you write is tedious boilerplate? What do you do all day? I can honestly say the most tedious code I write is migrations, because I have to edit three files. I do this like once a week. It would take way longer to explain to an ai what to do than just do it myself.
… you run the ai on your procrastination and as a result do heads down work? Seriously? 1, programming with AI is like playing a slot machine. It keeps you hooked just on the verge of winning the jackpot for far longer than the tedious task would have distracted you. And let's say a good agentic case - it runs for 30 minutes before it's done. That's not even enough time to "boot up" a serious problem, and you're already having to put it aside to go check the work you yourself are saying is mediocre? And you're advocating to run multiple of these? Even the most ardent vibe coders I know talk about babysitting the agents. How is this unlocking deep work time in any way, shape or form?
… craft is relegated to hobbies? I guarantee you, this person loves Linear. It was probably written on a MacBook. This whole "we solve real problems" BS is just a cop out because being great is hard.
… mediocre code is good code? All code is a liability. If you generate 10 times the code, and are admitting it’s the lowest bar of quality possible, it is at a minimum 10x the liability on your shoulders.
… letting LLMs refactor your tests? Serious question, who actually does this? Because I would LOVE to have LLMs write tests for me. I have tried every. single. time. I need to write something that needs a lot of tests to ask LLMs for tests for TWO YEARS, and not once have I ever used a single test it generates. It generates worse tests than it does code, and the last thing I want is 42 asserts in 8 tests across 600 LoC that nobody understands that poorly tests what should have been 2 asserts in 2 tests in 12 lines of code.
Q: but the code that AI generates is incoherent gibberish
Guess what, life’s tough. Fixing incoherent gibberish is part of the job you wuss. Honestly, I bet you write incoherent gibberish. Would you rather be fixing incoherent gibberish that an AI wrote or incoherent gibberish that you wrote?
Q: but I have to re-do everything it touches. Everything takes me ten times as long!
What are you, some kind of vibe-coding YouTuber? Everything taking ten times as long is part of the job, cupcakes. What’s important is that now it takes ten times as long automatically. I like the way Zed does it, where it tells you to leave while it’s changing stuff.
But I’m at least grateful for LLMs making me more aware of how important it is for me to focus on my own skills building. I will never use it because I value having those skills sharp
I think it could do with being a little bit more friendly, but… it’s got its own charm.
It also convinced me I should be using agents rather than raw dogging Gemini 2.5 pro
I use Claude for specific questions, it works great (but apparently I’m doing it wrong according to the author), code generation doesn’t, I don’t want my boilerplate code to have subtle differences in every file, that’s the worst kind of boilerplate and that’s what I get if I automate it.
There are real world implications of all the infrastructure being built for this too, not to mention the extremely short e-waste cycles during the boom we're in right now. To me it also seems like LLMs are the wrong thing to solve for; it's like a gasoline car: make the engine larger and more fuel-consuming to go faster, and no matter how much you perfect it, it will still be fossil fuel being burnt at a high rate. What's the alternative? I don't think we know yet, but if we were a smarter species we would see that this is not it.
While I don’t believe the “AI will take all jobs” VC tales, I also think not using LLMs in software development workflows is similar to not wanting to use Excel as an accountant in the 1990s.
AI won’t replace all jobs, but companies that cleverly integrate it into their processes will be faster and more efficient compared to those that don’t.
If it becomes so easy to write apps, then everyone can build their own. They won't need your app.
The entire point of most SaaS business revolves around the simple concept/question: build vs. buy.
So if there's no point in buying since building is so easy, then AI isn't helping you like your software product/SaaS.
I think programmers may just become the "IT person" or "webmaster" or "AI guy" at a company that needs software. Not at a separate company building it for others.
This may greatly reduce the need for programmers. This will in turn reduce the need for fancy computers like that expensive MacBook Pro. It will reduce the need for web hosting. Don't get me wrong, these things will still exist...but the demand will decrease.
Essentially it will become the biggest bubble since the dot com bust. It'll be much larger.
The dust will eventually settle and we'll all adapt to a new world with AI...but I think the years and years of over specialization could come to a screeching halt.
The industry will likely implode and suck in all of the orbiting industries along with it. I hope I'm wrong, but I don't think people quite understand here. I greatly suspect there will be far fewer sales of high end computers. Compute power will continue to move to data centers.
Think about it this way. Code aside, because it's not just a coding thing. If everyone can create amazing videos and full on films with AI, then they don't need all those high end machines. They don't need tons of hard drive space that takes all the raw footage and different cuts to make a video. You prompt and the get the exact result you need. Assuming AI gets that good.
So the more we push into AI, the greater the "gravity" here, so to speak, and the stronger the implosion. I just see an entire industry collapsing in on itself, personally. If I'm believing all the FOMO logic out there.
If "we'll see our first one person billion dollar company." Whoever said that is a moron. You understand what that means, right? If anyone can do all that, then how could that even exist? Are we all going to become billionaires now? No, to say something as stupid as that is to say software is trivial and worthless. So how could anyone start a software business worth so much?
The issue is maintenance and focus. Sure, smash out an AI accounting app, payroll app, CRM, etc but now you're not in the business of selling hamburgers or whatever, you've become a conglomerate.
But how do you quality check this part? I am proofreading all these transformations and I am not sure if this is really faster than actually recording a VIM macro, which will do the steps I need.
I also dread what's going to happen to the younger generations which have this to replace their real education, but that's a bit offtopic here.
Seriously, this is the most based defense of LLM that I’ve read over the years. All the yapping in the thread only proves your points.
On 'just read the code' - all well and good. Of course this implies insisting on team members who can read the code, will read the code and are empowered to read the code.
Otherwise orgs will try to hire pure vibe coders who aren't interested in that and only allow time for literally just instructing agents to generate code because it sounds cheaper and execs don't understand the nuance so long as it looks like product is shipping - until it all blows up and the one standing senior developer on hand is supposed to fix a prod issue buried in millions of lines of vibe reviewed code ASAP.
On 'but it's cheaper than a junior': cloud hosted LLM systems are currently massively subsidised to an absurd degree. The cost side of things is all smoke and mirrors geared towards accelerated market adoption at all costs. It's not a profitable enterprise at the model development level.
At some point that AI economy is going to expect to make that money back, and future (especially near-future) hardware advancements don't explain where all of that is going to come from.
So genuine question for AI Boosters, what should I be using?
I would like the 10x productivity gains we were promised :)
Tells you what you need to know about the AI culture.
If you don’t consider the human aspect, an LLM coding agent serves a similar purpose to a junior developer.
Of course, humans have advantages and those advantages have value. But the coding agent costs $20 a month.
Replacing humans with machines that are much cheaper isn’t really something new.
I really want to like AI assisted coding, but I guess i am just stuck as to what to do.
I have tried Copilot, both 2 years ago and recently. I have tried some other AI plugin in VS Code (I thought that was Cursor, but I don't see it in the extensions when I search now, so it was probably something else). If anything, I have found with these solutions that they are great for the first few tasks - generate these models, scaffold this, add OAuth, etc. - but after that they lack all consistency with the code they themselves generated (if I add a new model, the output is wildly different from what they already created), and it takes more time to rewrite that output than to copy-paste and change what's already there.
@op - could you explain exactly what tooling you are using, maybe give an example of these async agents, etc? Would be most appreciated for us luddites who would love to be more productive.
Side note of interest, from Wikipedia: "Flying Machines Which Do Not Fly" is an editorial published in the New York Times on October 9, 1903. The article incorrectly predicted it would take one to ten million years for humanity to develop an operating flying machine.
It was only six years to go from the first multi-person spacecraft and first spacewalk to the first space station.
Yeah that's my entire point, technological process doesn't have a constant rate of acceleration. Some advances are quickly made one after another and others lag and take a very long time.
Even if technological progress stopped we could have launched enough parts to assemble a colony structure.
In your reply it's the equivalent of someone expecting AGI in the next decade. The same happens when people talk about whether AI will take software dev jobs. Some just see all the flaws in AI and know their job is secure. Others see that they are 2x as productive and that a teammate is potentially no longer needed. If AI can eliminate 50% of IT jobs in ~10-20 years, that's still job replacement. When we replaced horses with cars, that didn't mean we had no horses anymore or that nobody rides horses.
This seems an extremely widespread belief among AI boosters.
What technologies existing today might be used for this purpose, assuming no financial or political limitations?
> But it is an inherently plagiarist technology
Was
> Developers frequently engage in copyright infringement, and so will I, so unless if you're a lawyer, shove it up your ass
"I am a bad person so I get to continue being bad" is not the gotcha you think it is, Patrick.
I started out very sceptical. When Claude Code landed, I got completely seduced — borderline addicted, slot machine-style — by what initially felt like a superpower. Then I actually read the code. It was shockingly bad. I swung back hard to my earlier scepticism, probably even more entrenched than before.
Then something shifted. I started experimenting. I stopped giving it orders and began using it more like a virtual rubber duck. That made a huge difference.
It’s still absolute rubbish if you just let it run wild, which is why I think “vibe coding” is basically just “vibe debt” — because it just doesn’t do what most (possibly uninformed) people think it does.
But if you treat it as a collaborator — more like an idiot savant with a massive brain but no instinct or nous — or better yet, as a mech suit [0] that needs firm control — then something interesting happens.
I’m now at a point where working with Claude Code is not just productive, it actually produces pretty good code, with the right guidance. I’ve got tests, lots of them. I’ve also developed a way of getting Claude to document intent as we go, which helps me, any future human reader, and, crucially, the model itself when revisiting old code.
What fascinates me is how negative these comments are — how many people seem closed off to the possibility that this could be a net positive for software engineers rather than some kind of doomsday.
Did Photoshop kill graphic artists? Did film kill theatre? Not really. Things changed, sure. Was it “better”? There’s no counterfactual, so who knows? But change was inevitable.
What’s clear is this tech is here now, and complaining about it feels a bit like mourning the loss of punch cards when terminals showed up.
[0]: https://matthewsinclair.com/blog/0178-why-llm-powered-progra...
This is my experience in general. People seem to be impressed by the LLM output until they actually comprehend it.
The fastest way to have someone break out of this illusion is tell them to chat with the LLM about their own expertise. They will quickly start to notice errors in the output.
If I can reduce this even by 10% for 20 dollars it’s a bargain.
You mean these?
I use AI everyday but you’ve got hundreds of billions of dollars and Scam Altman (known for having no morals and playing dirty) et al on “your” side. The only thing AI skeptics have is anecdotes and time. Having a principled argument isn’t really possible.
The AI skeptics are mostly correctly reacting to the AI hypists, who are usually shitty LinkedIn influencer type dudes crowing about how they never have to pay anyone again. It's very natural, even intelligent, to not trust this now that it's filling the same bubble as NFTs a few years ago. I think it's okay to stay skeptical and see where the chips fall in a few years at this point.
When it works it’s brilliant.
There is a threshold point as part of the learning curve where you realize you are in a pile of spaghetti code and think it actually saves no time to use LLM assistant.
But then you learn to avoid the bad parts - thus they don’t take your time anymore - and the good parts start paying back in heaps of the time spent learning.
They are not zero effort tools.
There is a non-trivial learning cost involved.
We need to understand what kind of guard rails to put these models on for optimal results.
We don’t even have a solid education program for software engineering - possibly for the same reason.
The industry loves to run on the bleeding edge, rather than just think for a minute :)
it might be ok since what you were thinking about is probably not a good idea in the first place for various reasons, but once in a while stars align to produce the unicorn, which you want to be if you're thinking about building something.
caveat: maybe you just want to build in a niche, it's fine to think hard in such places. usually.
Institution scale lack of deep thinking is the main issue.
There's an entire field called computer science. ACM provides curricular recommendations that it updates every few years. People spend years learning it. The same can't be said about the field of, prompting.
How do we know a software engineer is competent? We can’t tell, and damned if we trust that msc he holds.
Computer science, while fundamental, is very little of help in the emergent large scale problems which ”software engineering” tries to tackle.
The key problem is converting capital investment to a working software with given requirements and this is quite unpredictable.
We don’t know how to effectively train software engineers so that software projects would be predictable.
We don’t know how to train software engineers so that employers would trust their degrees as a strong signal of competence.
If there is a university program that, for example FAANGM (or what ever letters forms the pinnacle of markets) companies respect as a clear signal of obvious competence as a software engineer I would like to know what that is.
No, but AI did.
"Artificial intelligence is profoundly — and probably unfairly — threatening to visual artists"
This feels asserted without any real evidence
Far more importantly, though, artists haven't spent the last quarter century working to eliminate protections for IPR. Software developers have.
Finally, though I'm not stuck on this: I simply don't agree with the case being made for LLMs violating IPR.
I have had the pleasure, many times over the last 16 years, of expressing my discomfort with nerd piracy culture and the coercive might-makes-right arguments underpinning it. I know how the argument goes over here (like a lead balloon). You can agree with me or disagree. But I've earned my bona fides here. The search bar will avail.
And so what? Tell it to the Graphviz diagram creators, entry level Javascript programmers, horse carriage drivers, etc. What's special?
> .. and does so by effectively counterfeiting creative expression
What does this actually mean, though? ChatGPT isn't claiming to have "creative expression" in this sense. Everybody knows that it's generating an image using mathematics executed on a GPU. It's creating images. Like an LLM creates text. It creates artwork in the same sense that it creates novels.
> Far more importantly, though, artists haven't spent the last quarter century working to eliminate protections for IPR. Software developers have.
Programmers are very particular about licenses in opposition to your theory. Copyleft licensing leans heavily on enforcing copyright. Besides, I hear artists complain about the duration of copyright frequently. Pointing to some subset of programmers that are against IPR is just nutpicking in any case.
That said, this particular argument you are advancing isn't getting so much heat here because of an unfriendly audience that just doesn't want to hear what you have to say. Or that is defensive because of hypocrisy and past copyright transgressions. It is being torn apart because this argument that artists deserve protection, but software engineers don't is unsound special pleading of the kind you criticize in your post.
Firstly, the idea that programmers are uniquely hypocritical about IPR is hyperbole unsupported by any evidence you've offered. It is little more than a vibe. As I recall, when Photoshop was sold with a perpetual license, it was widely pirated. By artists.
Secondly, the idea -- that you dance around but don't state outright -- that programmers should be singled out for punishment since "we" put others out of work is absurd and naive. "We" didn't do that. It isn't the capital owners over at Travelocity that are going to pay the price for LLM displacement of software engineers, it is the junior engineer making $140k/year with a mortgage.
Thirdly, if you don't buy into LLM usage as violating IPR, then what exactly is your argument against LLM use for the arts? Just a policy edict that thou shalt not use LLMs to create images because it puts some working artists out of business? Is there a threshold of job destruction that has to occur for you to think we should ban LLMs use case by use case? Are there any other outlaws/scarlet-letter-bearers in addition to programmers that will never receive any policy protection in this area because of real or perceived past transgressions?
Again, the argument I'm making regarding artists is that LLMs are counterfeiting human art. I don't accept the premise that structurally identical solutions in software counterfeit their originals.
I think the case we are making is that there is no such thing as intellectual property to begin with, and the whole thing is a scam created by duct-taping a bunch of different concepts together when they should not be grouped together at all.
You probably don't have those views. But I think Thomas' point is that the profession as a whole has been crying "information wants to be free" for so many years, when what they meant was "information I don't want to pay for wants to be free" - and the hostile response to AI training on private data underlines that.
This is the only piece of human work left in the long run, and that’s providing training data on taste. Once we hook up a/b testing on ai creative outputs, the LLM will know how to be creative and not just duplicative. The ai will never have innate taste, but we can feed it taste.
We can also starve it of taste, but that’s impossible because humans can’t stop providing data. In other words, never tell the LLM what looks good and it will never know. A human in the most isolated part of the world can discern what creation is beautiful and what is not.
Is The Matrix a ripoff of The Truman Show? Is Oldboy derivative of Oedipus?
Saying everything is derivative is reductive.
How is creative expression required for such things?
Also, I believe that we're just monkey meat bags and not magical beings and so the whole human creativity thing can easily be reproduced with enough data + a sprinkle of randomness. This is why you see trends in supposedly thought provoking art across many artists.
Artists draw from imagination which is drawn from lived experience and most humans have roughly the same lives on average, cultural/country barriers probably produce more of a difference.
Many of the flourishes any artist may use in their work is also likely used by many other artists.
If I commission "draw a mad scientist, use creative license" from several human artists I'm telling you now that they'll all mostly look the same.
In what way are these two not the same? It isn't like icons or ui panels are more original than the code that runs the app.
Or are you saying only artists are creating things of value and it is fine to steal all the work of programmers?
Things like this are expressions of preference. The discussion will typically devolve into restatements of the original preference and appeals to special circumstances.
I think graphic designers would be a lot less angry if AIs were trained on licensed work… thats how the system worked up until now after all.
I have a lot of artist friends but I still appreciate that diffusion models are (and will be with further refinement) incredibly useful tools.
What we're seeing is just the commoditisation of an industry in the same way that we have many, many times before through the industrial era, etc.
It’s very unlikely simply training an LLM on “unlicensed” work constitutes infringement. It could possibly be that the model itself, when published, would represent a derivative work, but it’s unlikely that most output would be unless specifically prompted to be.
"Create a video of a girl running through a field in the style of Studio Ghibli."
There, someone has specifically prompted the AI to create something visually similar to X.
But would you still consider it a derivative work if you replaced the words "Studio Ghibli" with a few sentences describing their style that ultimately produces the same output?
This is why all the lobby now pushes the govs to not allow any regulation of AI even if courts disagree.
IMHO what will happen anyway is that at some point the companies will "solve" the licensing by training models purely on older synthetic LLM output that will be "public research" (which of course will still carry the "human" weights, but they will claim it doesn't matter).
It’s important that copyright applies to copying/publishing/distributing - you can do whatever you want to copyrighted works by yourself.
Of course, that still won’t make artists happy, because they think things like styles can be copyrighted, which isn’t true.
If we believe that authors should be able to decide how their work is used, then they can for sure say no machine learning. If we don't believe in intellectual property, then anything is up for grabs. I am ok with it, but the corps are not.
But there is a reason why nobody cares about Adobe AI and everybody uses midjourney…
For some of the free-er licenses this might mostly be just a lack-of-attribution issue, but in the case of some stronger licenses like GPL/AGPL, I'd argue that training a commercial AI codegen tool (which is then used to generate commercial closed-source code) on licensed code is against the spirit of the license, even if it's not against the letter of the license (probably mostly because the license authors didn't predict this future we live in).
That's a bit beside the point, which is that AI will not be just another tool, it will take ALL the jobs, one after another.
I do agree it's absolutely great though, and being against it is dumb, unless you want to actually ban it- which is impossible.
Quite the opposite, I'd say that it's what it has most of. What are "hallucinations" if not just a display of immense creativity and intuition? "Here, I'll make up this API call that I haven't read about anywhere but sounds right".
As to creativity, that's something I know too little about to define it, but it seems reasonable that it's even more "fuzzy" than intuition. On the opposite, causal relationships are closer to hard logic, which is what LLMs struggle with- as humans do, too.
AI currently can’t reliably make 3d objects so AI can’t make you a sculptor.
3D models can be generated quite well already. Good enough for a sculpture.
I'm an engineer through and through. I can ask an LLM to generate images just fine, but for a given target audience for a certain purpose? I would have no clue. None what so ever. Ask me to generate an image to use in advertisement for Nuka Cola, targeting tired parents? I genuinely have no idea of where to even start. I have absolutely no understanding of the advertisement domain, and I don't know what tired parents find visually pleasing, or what they would "vibe" with.
My feeble attempts would be absolute trash compared to a professional artist who uses AI to express their vision. The artist would be able to prompt so much more effectively and correct the things that they know from experience will not work.
It's the exact same as with coding with an AI - it will be trash unless you understand the hows and the whys.
Although I've seen a little American TV ads before, that shit's basically radioactively coloured, same as your fizzy drinks.
I believe you, did you try asking ChatGPT or Claude though?
You can ask them a list of highest-level themes and requirements and further refine from there.
Whatever can be replaced by AI will, cause it is easier for business people to deal with than real people.
> vector art pigeonholes art into something that can be used for machine learning
Look around, AI companies are doing just fine with raster art.
The only thing we agree on is that this will hurt workers
It changed the skill set but it didn’t “kill the graphic arts”
Rotoscoping in photoshop is rotoscoping. Superimposing an image on another in photoshop is the same as with film, it’s just faster and cheaper to try again. Digital painting is painting.
AI doesn’t require an artist to make “art”. It doesn’t require skill. It’s different than other tools
I just don't think the interest of the profession control. The travel agents had interests too!
Technical? Yes. Hardcore expert premium technical, no. The people who want the service can pay someone with basic to moderate skills a few hundred bucks to spend a day working on it, and that's all good.
Could I get an LLM to do much of the work? Yes, but I could also do much of the work without an LLM. Someone who doesn't understand the first principles of domains, Wordpress, hosting and so on, not so much.
They were not rotting platforms when they evaporated jobs at that particular moment, about 10-15 years ago. There's no universe where people are making money making websites like that anymore. A while ago, before Twitter/FB pages, one could easily collect multiple thousands of dollars per month just making websites on the side. There is a long history to web development.
Also, the day of the website has been over for quite awhile so I don't even buy the claim that social media is a rotting platform.
Or maybe shouldn't enthusiastically repeat the destruction of the open web in favor of billionaire-controlled platforms for surveillance and manipulation.
Getting AI to hallucinate its way into secure and better quality code seems like the antithesis of this. Why don't we have AI and robots working for humanity on the boring menial tasks - mowing lawns, filing taxes, washing dishes, driving cars - instead of attempting to take on our more critical and creative outputs - image generation, movie generation, book writing and even website building?
But the analysis doesn't stop there, because after the raw quality wash, we have to consider things LLMs can do profoundly better than human coders can. Codebase instrumentation, static analysis, type system tuning, formal analysis: all things humans can do, spottily, on a good day but that empirically across most codebases they do not do. An LLM can just be told to spend an afternoon doing them.
I'm a security professional before I am anything else (vulnerability research, software security consulting) and my take on LLM codegen is that they're likely to be a profound win for security.
I mean, we do have automation for literally all of those things, to varying degrees of effectiveness.
There's an increasing number of little "roomba" style mowers around my neighborhood. I file taxes every year with FreeTaxUSA and while it's still annoying, a lot of menial "form-filling" labor has been taken away from me there. My dishwasher does a better job cleaning my dishes than I would by hand. And though there's been a huge amount of hype-driven BS around 'self-driving', we've undeniably made advances in that direction over the last decade.
> Employment of travel agents is projected to grow 3 percent from 2023 to 2033, about as fast as the average for all occupations.
The last year there is data for claims 68,800 people employed as travel agents in the US. It's not a boom industry by any means, but it doesn't appear they experienced the apocalypse that Hacker News believes they did, either.
I don't know how to easily find historical data, unfortunately. BLS publishes the excel sheets, but pulling out the specific category would have to be done manually as far as I can tell. There's this, I guess: https://www.travelagewest.com/Industry-Insight/Business-Feat...
It appears at least that what happened is, though it may be easier than ever to plan your own travel, there are so many more people traveling these days than in the past that the demand for travel agents hasn't crashed.
Has some stats. It seems pretty clear the interests of travel agents did not count for much in the face of technological change.
40% of all travel agent jobs lost between 2001 and 2025. Glad I'm not a travel agent.
It was just 2 weeks ago that the utter incompetence of these robots was on full public display [1]. But none of that will matter to greedy corporate executives, who will prioritize short-term cost savings. They will hop from company to company, personally reaping the benefits while undermining essential systems that users and society rely on with robot slop. That's part of the reason why the C-suites are overhyping the technology. After all, no rich executive has faced consequences for behaving this way.
It's not just software engineering jobs that will take a hit. Society as a whole will suffer from the greedy recklessness.
Having plenty of initial discussion and distilling that into requirements documents aimed for modularized components which can all be easily tackled separately is key.
I tried the latest Claude for a very complex wrapper around the AWS Price APIs, which are not easy to work with. Down a 2,000-line code file, I found Claude faking some API returns by creating hard-coded values. A pattern I have seen professional developers get caught in while under pressure to deliver.
This will be a boon to skilled human developers, who will be hired at $900 an hour to fix bugs of a subtlety never seen before.
"Claude gives up and hardcodes the answer as a solution" - https://www.reddit.com/r/ClaudeAI/comments/1j7tiw1/claude_gi...
```
C++, Linux: write an audio processing loop for ALSA
reading audio input, processing it, and then outputting
audio on ALSA devices. Include code to open and close
the ALSA devices. Wrap the code up in a class. Use
Camelcase naming for C++ methods.
Skip the explanations.
```
Run it through grok: https://grok.com/
When I ACTUALLY wrote that code the first time, it took me about two weeks to get it right (horrifying documentation set, with inadequate sample code).

Typically, I'll edit code like this from top to bottom in order to get it to conform to my preferred coding idioms. And I will, of course, submit the code to the same sort of review that I would give my own first-cut code. And the way initialization parameters are passed in needs work (a follow-on prompt would probably fix that). This is not a fire-and-forget sort of activity. Hard to say whether that code is right or not; but even if it's not, it would have saved me at least 12 days of effort.
Why did I choose that prompt? Because I have learned through use that AIs do well with these sorts of coding tasks. I'm still learning, and making new discoveries every day. Today's discovery: it is SO easy to implement an SQLite database in C++ using an AI when you go at it the right way!
That is why some people don't find AI that essential, if you have the knowledge, you already know how to find a specific part in the documentation to refresh your semantics and the time saved is minuscule.
```
Write an audio processing loop for pipewire. Wrap the code up in a
C++ class. Read audio data, process it and output through an output
port. Skip the explanations. Use CamelCase names for methods.
Bundle all the configuration options up into a single
structure.
```
Run it through grok. I'd actually use VSCode Copilot Claude Sonnet 4. Grok is being used so that people who do not have access to a coding AI can see what they would get if they did.

I'd use that code as a starting point despite having zero knowledge of pipewire. And probably fill in other bits using AI as the need arises. "Read the audio data, process it, output it" is hardly deep domain knowledge.
A 5 second search on DDG ("easyeffects") and a 10 second navigation on github.
https://github.com/wwmm/easyeffects/blob/master/src/plugin_b...
But that is GPL 3.0 and a lot of people want to use the license laundering LLM machine.
N.B. I already know about easyeffects from when I was seeking for a software equalizer
EDIT
Another 30 seconds of exploration ("pipewire" on DDG, finding the main site, then going to the documentation page and the tutorial section).
https://docs.pipewire.org/audio-dsp-filter_8c-example.html
There's a lot of way to find truthful information without playing Russian roulette with an LLM.
This might be how one looks at it in the beginning, when having no experience or no idea about coding. With time one will realize it's more about creating the correct mental model of the problem at hand, rather than the activity of coding itself.
Once this realized, AI can't "save" you days of work, as coding is the least time consuming part of creating software.
e.g.: MUI, TypeScript:

// make the checkbox label appear before the checkbox.

Tab. Done. Delete the comment. vs. about 2 minutes wading through the perfectly excellent but very verbose online documentation to find that I need to set the "labelPlacement" attribute to "start".
Or the tedious minutiae that I am perfectly capable of doing, but that are time-consuming and error-prone:

// execute a SQL update

Tab tab tab tab .... Done, with all bindings and fields done, based on the structure that's passed as a parameter to the method, and the tables and field names that were created in source code above the current line. (Love that one.)

Copilot was what I was looking for, thank you. I have it installed in WebStorm already but I haven't messed with this side of it.
I have an older Mediawiki install that's been overrun by spam. It's on a server I have root access on. With Claude, I was able to rapidly get some Python scripts that work against the wiki database directly and can clean spam in various ways, by article ID, title regex, certain other patterns. Then I wanted to delete all spam users - defined here as users registered after a certain date whose only edit is to their own user page - and Claude made a script for that very quickly. It even deployed with scp when I told it where to.
Looking at the SQL that ended up in the code, there's non-obvious things such as user pages being pages where page_namespace = 2. The query involves the user, page, actor and revision tables. I checked afterwards, MediaWiki has good documentation for its database tables. Sure, I could have written the SQL myself based on that documentation, but certainly not have the query wrapped in Python and ready to run in under a minute.
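For the curious, the shape of that query is roughly the sketch below. The cutoff date and connection details are made up, and the column names are my reading of the modern MediaWiki schema (user/actor/revision/page), so treat it as an illustration rather than the exact script Claude produced:

```python
import pymysql

# Sketch: find users registered after a cutoff whose only edits are to their
# own user page (namespace 2). Column names assume the actor-based revision
# schema of recent MediaWiki versions; verify against your wiki's tables.
SPAM_USER_SQL = """
SELECT u.user_id, u.user_name, COUNT(*) AS edits
FROM user u
JOIN actor a    ON a.actor_user = u.user_id
JOIN revision r ON r.rev_actor  = a.actor_id
JOIN page p     ON p.page_id    = r.rev_page
WHERE u.user_registration > %s
GROUP BY u.user_id, u.user_name
HAVING SUM(p.page_namespace = 2
           AND p.page_title = REPLACE(u.user_name, ' ', '_')) = COUNT(*)
"""

# Hypothetical connection details; point these at your wiki's database.
conn = pymysql.connect(host="localhost", user="wiki",
                       password="secret", database="wikidb")
with conn.cursor() as cur:
    # MediaWiki stores timestamps as YYYYMMDDHHMMSS strings.
    cur.execute(SPAM_USER_SQL, ("20240101000000",))
    for user_id, user_name, edits in cur.fetchall():
        print(user_id, user_name, edits)
conn.close()
```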
I think these days coding is 20% of my job, maybe less. But HN is a diverse audience. You have the full range of web programmers and data scientists all the way to systems engineers and people writing for bare metal. Someone cranking out one-off Python and Javascript is going to have a different opinion on AI coding vs a C/C++ systems engineer and they're going to yell at each other in comments until they realize they don't have the same job, the same goals or the same experiences.
Busy code I need to generate is difficult to do with AI too, because then you need to formalize the necessary context for an AI assistant, which is exhausting and has an uncertain result. So perhaps it is just simpler to write it yourself quickly.
I understand comments being negative, because there is so much AI hype without having too many practical applications yet. Or at least good practical applications. Some of that hype is justified, some of it is not. I enjoyed the image/video/audio synthesis hype more, tbh.
Test cases are quite helpful and comments are decent too. But often prompting is more complex than programming something. And you can never be sure if any answer is usable.
Sure, it was easier to do it myself. But putting in the time to train, give context, develop guardrails, learn how to monitor etc. ultimately taught me the skills needed to delegate effectively and multiply the team's output massively as we added people.
It's early days but I'm getting the same feeling with LLMs. It's as exhausting as training an overconfident but talented intern, but if you can work through it and somehow get it to produce something as good as you would do yourself, it's a massive multiplier.
But you're not training LLMs as you use them really - do you mean that it's best to develop your own skill using LLMs in an area you already understand well?
I'm finding it a bit hard to square your comment about it being exhausting to catherd the LLM with it being a force multiplier.
Humans really like to anthropomorphize things. Loud rumbles in the clouds? There must be a dude on top of a mountain somewhere who's in charge of it. Impressed by that tree? It must have a spirit that's like our spirits.
I think a lot of the reason LLMs are enjoying such a huge hype wave is that they invite that sort of anthropomorphization. It can be really hard to think about them in terms of what they actually are, because both our head-meat and our culture has so much support for casting things as other people.
With LLMs the better I get at the scaffolding and prompting, the less it feels like catherding (so far at least). Hence the comparison.
You just explained how your work was affected by a big multiplier. At the end of training an intern you get a trained intern -- potentially a huge multiplier. ChatGPT is like an intern you can never train and will never get much better.
These are the same people who would no longer create or participate deeply in OSS (+100x multipler) bragging about the +2x multiplier they got in exchange.
If you know what you're doing you can still "teach" them though, but it's on you to do that - you need to keep on iterating on things like the system prompt you are using and the context you feed in to the model.
If an LLM learned something when you gave it commands, it would probably be reflected in adjusted weights in some of its operational matrices. This is true of human learning: we strengthen some neural connection, and when we receive a similar stimulus in a similar situation sometime in the future, the new stimulus will follow a slightly different path along its neural pathway and result in an altered behavior (or at least have a greater probability of an altered behavior). For an LLM to “learn” I would like to see something similar.
Admittedly, you have to wrap LLMs with stuff to get them to do that. If you want to rewrite the rules to exclude that, then I will have to revise my statement that it is "mostly, but not completely true".
:-P
I think SRS schedulers are a good example of a machine learning algorithm that learns from its previous interactions. If you run the optimizer you will end up with a different weight matrix, and flashcards will be scheduled differently. It has learned how well you retain these cards. But an LLM that is simply following orders has not learned anything, unless you feed the previous interaction back into the system to alter future outcomes, regardless of whether it “remembers” the original interactions. With the SRS, your review history is completely forgotten about. You could delete it, but the weight matrix keeps the optimized weights. If you delete your chat history with ChatGPT, it will not behave any differently based on the previous interaction.
I wrote a bit about that here - I've turned it off: https://simonwillison.net/2025/May/21/chatgpt-new-memory/
Makes me wonder whether, if there had been equal investment into specialized tools using more fine-tuned statistical methods (like supervised learning), we would have something much better than LLMs.
I keep thinking about spell checkers and auto-translators, which have been using machine learning for a while, with pretty impressive results (unless I’m mistaken I think most of those use supervised learning models). I have no doubt we will start seeing companies replacing these proven models with an LLM and a noticeable reduction in quality.
Edit: grammar
I'd challenge this one; is it more complex, or is all the thinking and decision making concentrated into a single sentence or paragraph? For me, programming something is taking a big, high-level problem and breaking it down into smaller and smaller sections until it's a line of code; the lines of code are relatively low effort / cost little brain power. But in my experience, the problem itself and its nuances are only defined once all the code is written. If you have to prompt an AI to write it, you need to define the problem beforehand.
It's more design and more thinking upfront, which is something the development community has moved away from in the past ~20 years with the rise of agile development and open source. Techniques like TDD have shifted more of the problem definition forwards as you have to think about your desired outcomes before writing code, but I'm pretty sure (I have no figures) it's only a minority of developers that have the self-discipline to practice test-driven development consistently.
(disclaimer: I don't use AI much, and my employer isn't yet looking into or paying for agentic coding, so it's chat style or inline code suggestions)
Combine that with the fact that, when you’re reading the code, it’s often much easier to develop a prototype solution as you go, and you end up with prompting feeling like using four men to carry a wheelbarrow instead of having one push it.
Feed it to an LLM and it implements it. Ideally it can also verify its solution against your specification code. If LLMs don't gain significantly more general capabilities, I could see this happening in the longer term. But it's too early to say.
In a sense the llm turns into a compiler.
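To make the compiler analogy concrete, here is a rough sketch of that loop in Python. The executable test suite is the specification, the model proposes an implementation, and the tests decide whether to accept it or feed the failures back. generate_code is a hypothetical wrapper around whatever model API you use, not a real library call:

    import pathlib
    import subprocess
    import tempfile

    def generate_code(spec_tests: str, feedback: str = "") -> str:
        """Hypothetical LLM call; returns candidate Python source for impl.py."""
        raise NotImplementedError  # plug in your model client of choice

    def compile_with_llm(spec_tests: str, max_rounds: int = 3) -> str | None:
        feedback = ""
        for _ in range(max_rounds):
            candidate = generate_code(spec_tests, feedback)
            with tempfile.TemporaryDirectory() as workdir:
                pathlib.Path(workdir, "impl.py").write_text(candidate)
                pathlib.Path(workdir, "test_spec.py").write_text(spec_tests)
                result = subprocess.run(
                    ["pytest", "-q", workdir], capture_output=True, text=True
                )
            if result.returncode == 0:
                return candidate       # the specification is satisfied
            feedback = result.stdout   # failures become context for the next attempt
        return None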
The main lesson has been that it's actually not much of an enabler and the people doing it end up being specialised and rather expensive consultants.
Things are different now.
Disillusionment seems to spring from expecting the model to be a god or a genie instead of a code generator. Some people are always going to be better at using tools than other people are. I don't see that changing, even though the tools themselves are changing radically.
Trust me on this, at least: I don't need the typing practice.
I tried running the idea on a programming task I did yesterday. "Create a dialog to edit the contents of THIS data structure." It did actually produce a dialog that worked the first time. Admittedly a very ugly dialog. But all the fields and labels and controls were there in the right order with the right labels, and were all properly bound to props of a React control, that was grudgingly fit for purpose. I suspect I could have corrected some of the layout issues with supplementary prompts. But it worked. I will do it again, with supplementary prompts next time.
Anyway. I next thought about how I would specify the behavior I wanted. The informal specification would be "Open the Looping dialog. Set Start to 1:00, then open the Timebase dialog. Select "Beats", set the tempo to 120, and press the back button. Verify that the Start text edit now contains "30:1" (the same time expressed in bars and beats). Set it to 10:1, press the back button, and verify that the corresponding "Loop" <description of storage for that data omitted for clarity> for the currently selected plugin contains 20.0." I can actually see that working (and I plan to see if I can convince an AI to turn that into test code for me).
Any imaginable formal specification for that would be just grim. In fact, I can't imagine a "formal" specification for that. But a natural language specification seems eminently doable. And even if there were such a formal specification, I am 100% positive that I would be using natural language AI prompts to generate the specifications. Which makes me wonder why anyone needs a formal language for that.
And I can't help thinking that "Write test code for the specifications given in the previous prompt" is something I need to try. How to give my AI tooling access to UI controls, though....
I agree, but even smaller than thinking in agile is just a tight iteration loop when I'm exploring a design. My ADHD makes upfront design a challenge for me, and I am personally much more effective starting with a sketch of what needs to be done and then iterating on it until I get a good result.
The loop of prompt->study->prompt->study... is disruptive to my inner loop for several reasons, but a big one is that the machine doesn't "think" like I do. So the solutions it scaffolds commonly make me say "huh?" and I have to change my thought process to interpret them and then study them for mistakes. My intuition and iteration is, for the time being, more effective than this machine-assisted loop for the really "interesting" code I have to write.
But I will say that AI has been a big time saver for more mundane tasks, especially when I can say "use this example and apply it to the rest of this code/abstraction".
I think we'll find that over the next few years the first really big win will be AI tearing down the mountain of tech & documentation debt. Bringing efficiency to corporate knowledge is likely a key element to AI working within them.
I notice, because the amount of text has been increased tenfold while the amount of information has stayed exactly the same.
This is a torrent of shit coming down on us that we are all going to have to deal with. The vibe coders will be gleefully putting up PRs with 12 paragraphs of "descriptive" text. Thanks, no thanks!
The fastest way I can transcribe a design is with code or pseudocode. Converting it into English can be hard.
It reminds me a bit of the discussion of whether you have an inner monologue. I don't, and turning thoughts into English takes work, especially if you need to be specific about what you want.
There's also an intangible benefit of having someone to "bounce off". If I'm using an LLM, I am tweaking the system prompt to slow it down, make it ask questions and bug me before making changes. Even without that, writing out the idea quickly exposes potential logic or approach flaws - much faster than writing pseudocode, in my experience.
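As an illustration of that kind of "slow down and ask questions" tweak (the wording below is made up for this example, not the prompt actually used), it can be as simple as a fixed system prompt prepended to every coding request:

    SYSTEM_PROMPT = """\
    Before writing or changing any code:
    1. Restate the problem in your own words.
    2. List the assumptions you are making and any information you are missing.
    3. Ask your clarifying questions and stop; wait for answers.
    Only produce code once the questions have been answered."""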
Yes, but you're expensive.
And these models are getting better at solving a lot of business-relevant problems.
Soon all business-relevant problems will be bent to the shape of the LLM because it's cost-effective.
Which brings me to your comment. The comparison to Uber drivers is apt, and to use a fashionable word these days, the threat to people and startups alike is "enshittification." These tools are not sold, they are rented. Should a few behemoths gain effective control of the market, we know from history that we won't see these tools become commodities and nearly free, we'll see the users of these tools (again, both people and businesses) squeezed until their margins are paper-thin.
Back when articles by Joel Spolsky regularly hit the top page of Hacker News, he wrote "Strategy Letter V:" https://www.joelonsoftware.com/2002/06/12/strategy-letter-v/
The relevant takeaway was that companies try to commoditize their complements, and for LLM vendors, every startup is a complement. A brick-and-mortar metaphor is that of a retailer in a mall. If you as a retailer are paying more in rent than you're making, you are "working for the landlord," just as, if your margin is less than the 30% Apple takes on everything you sell or rent through Apple's App Store, you're working for Apple.
I once described that as "Sharecropping in Apple's Orchard," and if I'm hesitant about the direction we're going, it's not anything about clinging to punch cards and ferromagnetic RAM, it's more the worry that it's not just a question of programmers becoming enshittified by their tools, it's also the entire notion of a software business "Sharecropping the LLM vendor's fields."
We spend way too much time talking about programming itself and not enough about whither the software business if its leverage is bound to tools that can only be rented on terms set by vendors.
--------
I don't know for certain where things will go or how we'll get there. I actually like the idea that a solo founder could create a billion-dollar company with no employees in my lifetime. And I have always liked the idea of software being "Wheels for the Mind," and we could be on a path to that, rather than turning humans into "reverse centaurs" that labour for the software rather than the other way around.
Once upon a time, VCs would always ask a startup, "What is your Plan B should you start getting traction and then Microsoft decides to compete with you/commoditize you by giving the same thing away?" That era passed, and Paul Graham celebrated it: https://paulgraham.com/microsoft.html
Then when startups became cheap to launch—thank you increased tech leverage and cheap money and YCombinator industrializing early-stage venture capital—the question became, "What is your moat against three smart kids launching a competitor?"
Now I wonder if the key question will bifurcate:
1. What is your moat against somebody launching competition even more cheaply than smart kids with YCombinator's backing, and;
2. How are you insulated against the cost of load-bearing tooling for everything in your business becoming arbitrarily more expensive?
They're still much cheaper where I am. But regardless, why not take the Uber while it's cheaper?
There's the argument of the taxi industry collapsing (it hasn't yet). Is your concern some sort of long term knowledge loss from programmers and a rug pull? There are many good LLM options out there, they're getting cheaper and the knowledge loss wouldn't be impactful (and rug pull-able) for at least a decade or so.
SaaS just seems very much like a terminator seed situation in the end.
I haven't tried Claude Code yet, however. Maybe that approach is more on point.
Core to Ptacek's point is that everything has changed in the last 6 months. As you and I presume he agree, the use of off-the-shelf LLMs in code was kinda garbage. And I expect the skepticism he's knocking here ("stochastic parrots") was in fact accurate then.
But it did get a lot of people (and money) to rush in and start trying to make something useful. Like the stone soup story, a lot of other technology has been added to the pot, and now we're moving in the direction of something solid, a proper meal. But given the excitement and investment, it'll be at least a few years before things stabilize. Only at that point can we be sure about how much the stone really added to the soup.
Another counterfactual that we'll never know is what kinds of tooling we would have gotten if people had dumped a few billion dollars into code tool improvement without LLMs, but with, say, a lot of more conventional ML tooling. Would the tools we get be much better? Much worse? About the same but different in strengths and weaknesses? Impossible to say.
So I'm still skeptical of the hype. After all, the hype is basically the same as 6 months ago, even though now the boosters can admit the products of 6 months ago sucked. But I can believe we're in the middle of a revolution of developer tooling. Even so, I'm content to wait. We don't know the long term effects on a code base. We don't know what these tools will look like in 6 months. I'm happy to check in again then, where I fully expect to be again told: "If you were trying and failing to use an LLM for code 6 months ago †, you’re not doing what most serious LLM-assisted coders are doing." At least until then, I'm renewing my membership in the Boring Technology Club: https://boringtechnology.club/
Different people have different weird tendencies in different directions. Some people irrationally assume that things aren’t going to change much. Others see a trend and irrationally assume that it will continue on a trend line.
Synthesis is hard.
Understanding causality is even harder.
Savvy people know that we’re just operating with a bag of models and trying to choose the right combination for the right situation.
This misunderstanding is one reason why doomers, accelerationists, and “normies” talk past each other or (worse) look down on each other. (I’m not trying to claim epistemic equivalence here; some perspectives are based on better information, some are better calibrated than others! I’m just not laying out my personal claims at this point. Instead, I’m focusing on how we talk to each other.)
Another big source of misunderstanding is about differing loci of control. People in positions of influence are naturally inclined to think about what they can do, who they know, and where they want to be. People farther removed feel relatively powerless and tend to hold onto their notions of stability, such as the status quo or their deepest values.
Historically, programmers have been quite willing to learn new technologies, but now we’re seeing widespread examples where people’s plasticity has limits. Many developers cannot (or are unwilling to) wrap their minds around the changing world. So instead of confronting the reality they find ways to deny it, consciously or subconsciously. Our perception itself is shaped by our beliefs, and some people won’t even perceive the threat because it is too strange or disconcerting. Such is human nature: we all do it. Sometimes we’re lucky enough to admit it.
I do of course agree that some people are just refusing to "wrap their minds around the changing world". But anybody with enough experience in tech can count a lot more instances of "the world is about to change" than "the world really changed". The most recent obvious example being cryptocurrencies, but there are plenty of others. [1] So I think there's plenty of room here for legitimate skepticism. And for just waiting until things settle down to see where we ended up.
Generally speaking, I find it suspect when someone points to failed predictions of disruptive changes without acknowledging successful predictions. That is selection bias. Many predicted disruptive changes do occur.
Most importantly, if one wants to be intellectually honest, one has to engage against a set of plausible arguments and scenarios. Debunking one particular company’s hyperbolic vision for the future might be easy, but it probably doesn’t generalize.
It is telling to see how many predictions can seem obvious in retrospect from the right frame of reference. In a sense (or more than that under certain views of physics), the future already exists, the patterns already exist. We just have to find the patterns — find the lens or model that will help the messy world make sense to us.
I do my best to put the hype to the side. I try to pay attention to the fundamentals such as scaling laws, performance over time, etc while noting how people keep moving the goalposts.
Also wrt the cognitive bias aspect: Cryptocurrencies didn’t threaten to apply significant (if any) downward pressure on the software development labor market.
Also, even cryptocurrency proponents knew deep down that it was a chicken and the egg problem: boosters might have said adoption was happening and maybe even inevitable, but the assumption was right out there in the open. It also had the warning signs of obvious financial fraud, money laundering, currency speculation, and ponzi scheming.
Adoption of artificial intelligence is different in many notable ways. Most saliently, it is not a chicken and egg problem: it does not require collective action. Anyone who does it well has a competitive advantage. It is a race.
(Like Max Tegmark and others, I view racing towards superintelligence as a suicide race, not an arms race. This is a predictive claim that can be debated by assessing scenarios, understanding human nature, and assigning probabilities.)
This was actually the only point in the essay with which I disagree, and it weakens the overall argument. Even 2 years ago, before agents or reasoning models, these LLMs were extremely powerful. The catch was, you needed to figure out what worked for you.
I wrote this comment elsewhere: https://news.ycombinator.com/item?id=44164846 -- Upshot: It took me months to figure out what worked for me, but AI enabled me to produce innovative (probably cutting edge) work in domains I had little prior background in. Yes, the hype should trigger your suspicions, but if respectable people with no stake in selling AI like @tptacek or @kentonv in the other AI thread are saying similar things, you should probably take a closer look.
Sure, but I would argue that the UX is the product, and that has radically improved in the past 6-12 months.
Yes, you could have produced similar results before, manually prompting the model each time, copy and pasting code, re-prompting the model as needed. I would strenuously argue that the structuring and automation of these tasks is what has made these models broadly usable and powerful.
In the same way that Apple didn't invent mobile phones nor touchscreens nor OSes, but the specific combination of these things resulted in a product that was different in kind from what came before, and took over the world.
Likewise, the "putting the LLM into a structured box of validation and automated re-prompting" is huge! It changed the product radically, even if its constituent pieces existed already.
[edit] More generally I would argue that 95% of the useful applications of LLMs aren't about advancing the SOTA model capabilities and more about what kind of structured interaction environment we shove them into.
But I think my other point still stands: people will need to figure out for themselves how to fully exploit this technology. What worked for me, for instance, was structuring my code to be essentially functional in nature. This allows for tightly focused contexts which drastically reduces error rates. This is probably orthogonal to the better UX of current AI tooling. Unfortunately, the vast majority of existing code is not functional, and people will have to figure out how to make AI work with that.
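As a tiny illustration of what "essentially functional" buys you here (the names are invented for this sketch): a pure function whose inputs and outputs are all in its signature gives the model a small, self-contained context, whereas code entangled with shared object state forces you to paste in half the codebase.

    # Harder to prompt about: behaviour depends on hidden state wired up elsewhere.
    class ReportBuilder:
        def __init__(self, db, cache, config):
            self.db, self.cache, self.config = db, cache, config

        def build(self):
            ...  # reads self.db, self.cache, self.config

    # Easier: everything the function needs is visible in the signature,
    # so the prompt can contain just this function and its call sites.
    def build_report(rows: list[dict], currency: str = "USD") -> str:
        total = sum(row["amount"] for row in rows)
        return f"{len(rows)} items, total {total:.2f} {currency}"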
A lot of that likely plays into your point about the work required to make useful LLM-based applications. To expand a bit more:
* AI is technology that behaves like people. This makes it confusing to reason about and work with. Products will need to solve for this cognitive dissonance to be successful, which will entail a combination of UX and guardrails.
* Context still seems to be king. My (possibly outdated) experience has been the "right" context trumps larger context windows. With code, for instance, this probably entails standard techniques like static analysis to find relevant bits of code, which some tools have been attempting. For data, this might require eliminating overfetching.
* Data engineering will be critical. Not only does it need to be very clean for good results, giving models unfettered access to the data needs the right access controls which, despite regulations like GDPR, are largely non-existent.
* Security in general will need to be upleveled everywhere. Not only can models be tricked, they can trick you into getting compromised, and so there need to be even more guardrails.
A lot of these are regular engineering work that is being done even today. Only it often isn't prioritized because there are always higher priorities... like increasing shareholder value ;-) But if folks want to leverage the capabilities of AI in their businesses, they'll have to solve all these problems for themselves. This is a ton of work. Good thing we have AI to help out!
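On the "right context trumps larger context windows" point above, here is a minimal sketch of what selecting context with static analysis can look like, using Python's ast module; the function name and the overall approach are illustrative assumptions, not a description of any particular tool:

    import ast
    import pathlib

    def functions_mentioning(path: str, symbol: str) -> list[str]:
        """Return the source of each top-level function that references `symbol`,
        so only the relevant snippets go into the prompt instead of whole files."""
        source = pathlib.Path(path).read_text()
        tree = ast.parse(source)
        relevant = []
        for node in tree.body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                names = {n.id for n in ast.walk(node) if isinstance(n, ast.Name)}
                attrs = {n.attr for n in ast.walk(node) if isinstance(n, ast.Attribute)}
                if symbol in names or symbol in attrs:
                    segment = ast.get_source_segment(source, node)
                    if segment:
                        relevant.append(segment)
        return relevant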
Building a mental model of a new domain by creating a logical model that interfaces with a domain I'm familiar with lets me test my assumptions and understanding in real time. I can apply previous experience by analogy and verify usefulness/accuracy instantly.
> Upshot: It took me months to figure out what worked for me, but AI enabled me to produce innovative (probably cutting edge) work in domains I had little prior background in. Yes, the hype should trigger your suspicions[...]
Part of the hype problem is that describing my experience sounds like bullshit to anyone who hasn't gone through the same process. The rate that I pick up concepts well enough to do verifiable work with them is literally unbelievable.
Maybe? Social proof doesn't mean much to me during a hype cycle. You could say the same thing about tulip bulbs or any other famous bubble. Lots of smart people with no stake get sucked in. People are extremely good at fooling themselves. There are a lot of extremely smart people following all of the world's major religions, for example, and they can't all be right. And whatever else is going on here, there are a lot of very talented people whose fortunes and futures depend on convincing everybody that something extraordinary is happening here.
I'm glad you have found something that works for you. But I talk with a lot of people who are totally convinced they've found something that makes a huge difference, from essential oils to functional programming. Maybe it does for them. But personally, what works for me is waiting out the hype cycle until we get to the plateau of productivity. Those months that you spent figuring out what worked are months I'd rather spend on using what I've already found to work.
Learning how to use a tool once is easy, relearning how to use a tool every six months because of the rapid pace of change is a pain.
A thing being great doesn’t mean it’s going to generate outsized levels of hype forever. Nobody gets hyped about “The Internet” anymore, because novel use cases aren’t being discovered at a rapid clip, and it has well and thoroughly integrated into the general milieu of society. Same with GPS, vaccines, docker containers, Rust, etc., but I mentioned the Internet first since it’s probably on a similar level of societal shift as is AI in the maximalist version of AI hype.
Once a thing becomes widespread and standardized, it becomes just another part of the world we live in, regardless of how incredible it is. It’s only exciting to be a hype man when you’ve got the weight of broad non-adoption to rail against.
Which brings me to the point I was originally trying to make, with a more well-defined set of terms: who cares if someone waits until the tooling is more widely adopted, easy to use, and somewhat standardized prior to jumping on the bandwagon? Not everyone needs to undergo the pain of being an early adopter, and if the tools become as good as everyone says they will, they will succeed on their merits, and not due to strident hype pieces.
I think some of the frustration the AI camp is dealing with right now is because y’all are the new Rust Evangelism Strike Force, just instead of “you’re a bad software engineer if you use a memory-unsafe language,” it’s “you’re a bad software engineer if you don’t use AI.”
People have all these feelings about AI hype, and they just have nothing at all to do with what I'm saying. How well the tools work has not much at all to do with the hype level. Usually when someone says that, they mean "the tools don't really work". Not this time.
One big problem with Claude Code vs Cursor is that you have to pay for the cost of getting over the learning curve. With Cursor I could eat the subscription fee and then goof off for a long time trying to figure out how to prompt it well. With Claude Code a bad prompt can easily cost me $5 a pop, which (irrationally, but measurably) hurts more than the one-time monthly fee for Cursor.
Anyway, if you've tried it and it doesn't work for you, fair enough. I'm not going to tell you you're wrong. I'm just bothered by all the people who are out here posting about AI being bad while refusing to actually try it. (To be fair, I was one of them, six months ago...)
While I agree with the skepticism, what specifically is the stake here? Most code assists have usable plans in the $10-$20 range. The investors are apparently taking a much bigger risk than the consumer would be in a case like this.
Aside from the horror stories about people spending $100 in one day of API tokens for at best meh results, of course.
I picked up coding again a couple of days back and I’m blown away by how much things have changed.
It was all manual work until a few months back. Suddenly, it's all agents.
I think it's very useful if one wants to properly weigh the value of LLMs in a way that gets beyond the hype. Which I do.
Now he can't - it's too closed and complicated
Yet, modern cars are way better and almost never break down.
Don't see how LLMs are any different than any other tech advancement that obfuscates and abstracts the "fundamentals".
You'll not only never know this, it's IMHO not very useful to think about at all, except as an intellectual exercise.
I wish i could impress this upon more people.
A friend similarly used to lament/complain that Kotlin sucked in part because we could have probably accomplished its major features in Java, and maybe without tons of work or migration cost.
This is maybe even true!
As an intellectual exercise, both are interesting to think about. But outside of that, people get caught up in this as if it matters, and it doesn't.
Basically nothing is driven by pure technical merit alone, not just in CS, but in any field. So my point to him was the lesson to take away from this is not "we could have been more effective or done it cheaper or whatever" but "my definition of effectiveness doesn't match how reality decides effectiveness, so i should adjust my definition".
As much as people want the definition to be a meritocracy, it just isn't and honestly, seems unlikely to ever be.
So while it's 100% true that billions of dollars dumped into other tools or approaches or whatever might have generated good, better, maybe even amazing results, they weren't, and more importantly, never would have been. Unknown but maybe infinite ROI is often much more likely to see investment than more known but maybe only 2x ROI.
And like I said, this is not just true in CS, but in lots of fields.
That is arguably quite bad, but also seems unlikely to change.
Of course, in aggregate AI makes me capable in a far broader set of problem domains. It would be tough to live without it at this stage, but needs to be used for what it is actually good at, not what we hope it will be good at.
Ultimately the thing that impresses me is that LLMs have replaced google search. The thing that disappoints me is that their code is often convincing but wrong.
Coming from a hard-engineering background, anything that is unreliable is categorized as bad. If you come from the move-fast-break-things world of tech, then your tolerance for mistakes is probably a lot higher.
- Split things into small files, today’s model harnesses struggle with massive files
- Write lots of tests. When the language model messes up the code (it will), it can use the tests to climb out. Tests are the best way to communicate behavior.
- Write guides and documentation for complex tasks in complex codebases. Use a language model for the first pass if you’re too lazy. Useful for both humans and LLMs
It’s really: make your codebase welcoming for junior engineers
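To make the "tests are the best way to communicate behavior" point concrete, even something this small tells both a junior engineer and a model what "done" means (the function and test here are invented examples):

    def slugify(title: str) -> str:
        """Lowercase a title and join its words with hyphens."""
        return "-".join(title.lower().split())

    def test_slugify_lowercases_and_collapses_whitespace():
        assert slugify("  Hello   World ") == "hello-world"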
Or not. I watched Copilot's agent mode get stuck in a loop for most of an hour (to be fair, I was letting it continue to see how it handles this failure case) trying to make a test pass.
I’m impressed with this latest generation of models: they reward hack a lot less. Previously they’d change a failing unit test, but now they just look for reasonable but easy ways out in the code.
I call it reward hacking, and laziness is not the right word, but “knowing what needs to be done and not doing it” is the general issue here. I see it in junior engineers occasionally, too.
This is how I use it mostly. I also use it for boilerplate, like "What would a database model look like that handles the following" you never want it to do everything, though there are tools that can and will and they're impressive, but then when you have a true production issue, your inability to quickly respond will be a barrier.
To a first approximation, the answer to both of these is "yes".
There is still a lot of graphic design work out there (though generative AI will be sucking the marrow out of it soon), but far less than there used to be before the desktop publishing revolution. And the kind of work changed. If "graphic design" to you meant sitting at a drafting table with pencil and paper, those jobs largely evaporated. If that was a kind of work that was rewarding and meaningful to you, that option was removed for you.
Theatre even more so. Yes, there are still some theatres. But the number of people who get to work in theatrical acting, set design, costuming, etc. is a tiny tiny fraction of what it used to be. And those people are barely scraping together a living, and usually working side jobs just to pay their bills.
> it feels a bit like mourning the loss of punch cards when terminals showed up.
I think people deserve the right to mourn the loss of experiences that are meaningful and enjoyable to them, even if those experiences turn out to no longer be maximally economically efficient according to the Great Capitalistic Moral Code.
Does it mean that we should preserve antiquated jobs and suffer the societal effects of inefficiency without bound? Probably not.
But we should remember that the ultimate goal of the economic system is to enable people to live with meaning and dignity. Efficiency is a means to that end.
I think this ends up being recency bias and terminology hairsplitting, in the end. The number of people working in theatre mask design went to nearly zero quite a while back but we still call the stuff in the centuries after that 'theatre' and 'acting'.
I think "theatre" is a fairly well-defined term to refer to live performances of works that are not strictly musical. Gather up all of the professions necessary to put those productions on together.
The number of opportunities for those professions today is much smaller than it was a hundred years ago before film ate the world.
There are only so many audience members and a night they spend watching a film or watching TV or playing videogames is a night they don't spend going to a play. The result is much smaller audiences. And with fewer audiences, there are fewer plays.
Maybe I should have been clearer that I'm not including film and video production here. Yes, there are definitely opportunities there, though acting for a camera is not at all the same experience as acting for a live audience.
I'm saying an artform that is meaningful to its participants and allows them to make a living wage while enriching the lives of others should not be thoughtlessly discarded in service to the almighty god of economic efficiency. It's not special pleading, because I'd apply this to all artforms and all sorts of work that bring people dignity and joy.
I'm not a reactionary luddite saying that we should still be using oil streetlamps so we don't put the lamplighters out of work. But at the same time I don't think we should automatically and carelessly accept the decimation of human meaning and dignity at the altar of shareholder value.
No doubt. A few years ago there was some HN post with a video of the completely preposterous process of making diagrams for Crafting Interpreters. I didn't particularly need the book nor do I have room for it but I bought it there and then to support the spirit of all-consuming wankery. So I'm not here from Mitch & Murray & Dark Satanic Mills, Inc either. At the same time, I'm not sold on the idea niche art is the source of human dignity that needs societal protection, not because I'm some ogre but because I'm not convinced that's how actual art actually arts or provides meaning or evolves.
Like another Thomas put it
Not for the proud man apart
From the raging moon I write
On these spindrift pages
Nor for the towering dead
With their nightingales and psalms
But for the lovers, their arms
Round the griefs of the ages,
Who pay no praise or wages
Nor heed my craft or art.
Haha, a good way to describe it. :)
> the idea niche art is the source of human dignity that needs societal protection
I mean... have you looked around at the world today? We've got pick at least some sources of human dignity to protect because there seem to be fewer and fewer left.
Doesn't it mean cinema too? edit: Even though it was clear from context you meant live theatre.
I can only imagine what this technology will be like in 10 years. But I do know that it's not going anywhere and it's best to get familiar with it now.
I’d add that Excel didn’t kill the engineering field. It made them more effective and maybe companies will need less of them. But it also means more startups and smaller shops can make use of an engineer. The change is hard and an equilibrium will be reached.
One thing I wish he would have talked about though is maintenance. My only real qualm with my LLM agent buddy is the tendency to just keep adding code if the first pass didn't work. Eventually, it works, sometimes with my manual help. But the resulting code is harder to read and reason about, which makes maintenance and adding features or behavior changes harder. Until you're ready to just hand off the code to the LLM and not do your own changes to it, it's definitely something to keep in mind at minimum.
Relatively speaking, I would say that film and TV did kill theater
"Garbage in, garbage out", is still the rule for LLM's. If you don't spend billions training them or if you let them feed on their own tail too much they produce nonsense. e.g. Some LLM's currently produce better general search results than google. This is mainly a product of many billions being spent on expert trainers for those LLM's, while google neglects (or actively enshitifies) their search algorithms shamefully. It's humans, not LLM's, producing these results. How good will LLM's be at search once the money has moved somewhere else and neglect sets in?
LLM's aren't going to take everyone's jobs and trigger a singularity precisely because they fall apart if they try to feed on their own output. They need human input at every stage. They are going to take some people's jobs and create new ones for others, although it will probably be more of the former than the latter, or billionaires wouldn't be betting on them.
My obligatory comment how analogies are not good for arguments: there is already discussion here that film (etc.) may have killed theatre.
The models are trained primarily on copyrighted material and code written by the very professionals who now must "upskill" to remain relevant. This raises complex questions about compensation and ownership that didn't exist with traditional tools. Even if current laws permit it, the ethical implications are different from Photoshop-like tools.
Previous innovations created new mediums and opportunities. Photoshop didn't replace artists, because it enabled new art forms. Film reduced theater jobs but created an entirely new industry where skills could mostly transfer. Manufacturing automation made products like cars accessible to everyone.
AI is fundamentally different. It's designed to produce identical output to human workers, just more cheaply and/or faster. Instead of creating new possibilities, it's primarily focused on substitution. Say AI could eliminate 20% of coding jobs and reduce wages by 30%:
* Unlike previous innovations, this won't make software more accessible
* Software already scales essentially for free (build once, used by many)
* Most consumer software is already free (ad-supported)
The primary outcome appears to be increased profit margins rather than societal advancement. While previous technological revolutions created new industries and democratized access, AI seems focused on optimizing existing processes without providing comparable societal benefits. This isn't an argument against progress, but we should be clear-eyed about how this transition differs from historical parallels, and why it might not repeat the same historical outcomes. I'm not claiming this will be the case, but you can see some pretty significant differences that explain why you might be skeptical that the same creation of new jobs, or improvement to human lifestyle/capabilities, will emerge as with, say, film or Photoshop.
AI can also be used to achieve things we could not do otherwise; that's the good use of AI: things like cancer detection, self-driving cars, and so on. I'm speaking specifically of the use of AI to automate white collar work like software development and make it cheaper and faster.
It's why it is impacting so many people, but also producing very small changes to everyday "quality of life" kinds of metrics (e.g. ability to eat, communicate, live somewhere, etc.). It arguably is more about enabling greater inequality and the gatekeeping of wealth to capital, where intelligence and merit matter less in the future world. For most people it's hard to see where the positives are for them long term in this story; most everyday folks don't believe the utopia story is in any way probable.
Most importantly, I'll embrace the change and hope for the possible abundance.
AI has helped me pick up my pencil and paper again and realize my flawed knowledge, skills, and even flawed approach to AI.
Now I instructed it to never give me code :). Not because the code is bad, but because my attempts to extract code from it are based more in laziness than efficiency. They are easy to confuse, after all ;(....
I have tons of fun learning with AI, exploring, going on adventures into new topics. Then when I want to really do something, I try to use it for the things I know I am bad at due to laziness, not lack of knowledge. The thing I fell for first...
It helps me explore a space, then I think or am inspired toward some creation, and it helps me structure and plan. When I ask it out of laziness to give me the code, it helps me overcome my laziness by explaining what I need to do, which lets me see why asking for the code was the wrong approach in the first place.
Now, that might be different for you. But I have learned I am not some god-tier hacker from the Sprawl, so I realized I need to learn and get better. Perhaps you are at the level where you can ask it for code and it just works. Hats off in that case ;) (I do hope you tested well!)
I’ve seen lots of people:
* think that conflating LLMs and “AI” produces a lot of poorly reasoned arguments
* doubt the economic narratives being built around LLM technology
* think the current rate of progress in the technology is basically flat
* think most “AI companies” resemble most crypto companies
An addendum to the last point: very few crypto skeptics deny that Bitcoin is a thing or think it’s going away, either. It’s just strawmanning.
* I don't care what "AI" is. There's a section in the post about this. Nobody needs to update their religious beliefs because of LLMs.
* I don't care about the economic futures of Nvidia or the frontier model companies. I'm not recommending you invest in their stock.
* The article opens with "all progress in LLMs could halt and this would still apply".
* The article doesn't care about "AI companies".
I'm a crypto skeptic and I do deny Bitcoin is a long-term thing, for what it's worth. I'm a hard skeptic of cryptofinance.
The only real dispute about the usefulness of the current technology I hear is the concern that this will leave us without devs who understand the code at a deep level.
Case in point: WWW and the “dot com” bubble. Dot com was a bubble, but the WWW (and Internet) was a big thing that transformed all major societies. It survived the bust that followed after the boom.
Are LLMs here to stay? We can guess, but as usual, time will tell.
And yet I'm overwhelmingly demoralized and depressed about AI's encroachment on software engineering, so much so that I am finding it difficult to marshal cognitive resources. I'm finding it harder and harder to engage with coding problems, even though I love programming.
The last thing I expected was that AI, even kept at a distance, would rob me of the joy I find in one of the things I love most.
It so devalues what I do and my basic capabilities as a human being that I kind of want to die. I'll stick around for my daughter, my partner, and any future children, but I'm not optimistic at all about the future.
...Right?
This is in fact what I am doing (though I am not trying to convince anyone not to use LLMs). How can I get into this other mode? Any pointers? Do I need to pay up for a service? I do not want to carry out vibe coding, I want to use LLMs to solve physics, math and logic problems/puzzles. I'd appreciate any recommendations.
It's faster than opening ChatGPT in the browser, writing your prompt, waiting for the reply, scanning and scrolling through the reply, copying it, inserting it manually, and going back to say something doesn't work.
During my time as a programmer (gaming + astrophysics), the following things have happened:
1. The rise of GPGPU programming, which has enabled 1000x increases in performance
2. A revolution in CPU architecture, from single core, to massively multicore
3. C++98 - C++26
4. Transformational increases in CPU memory
5. (In the graphics space): dedicated raytracing hardware, the fully general purpose GPU + OpenCL/CUDA, deferred rendering, FSR/DLSS
6. Mobile phones were invented
7. 64-bit
8. The internet went from being for MSN and facebook to what it is today
9. The rise of Valve/Steam and the incredible transformational existence of the indie games market, which enables individuals to self publish their code and products for virtually nothing
10. Clang was born, and GCC came back to life. MSVC became standards compliant, finally
11. Rust was invented, and is taking off. People actually take security seriously now (!)
12. Unified CPU/GPU memory spaces, GPU architectures becoming scalar etc
All of these have had vastly more impact on my programming than LLMs ever have. I've always wondered a bit about who the people are who find it transformational, because I can vaguely gesture at any part of my programming tooling and find that it's vastly different from 10 years ago.
>interact with Git, run existing tooling, like linters, formatters, and model checkers, and make essentially arbitrary tool calls (that you set up) through MCP.
I mean.. you can just do all these things anyway. It's literally more work for me to use an LLM to run a linter than to run a linter (it's built into my IDE). So's git. And formatters as well. You can also make arbitrary tool calls.. through your IDE/scripts/precommit/postcommit/etc. I have no idea why you'd even want an LLM to do this!
>People complain about LLM-generated code being “probabilistic”. No it isn’t
>The LLM might be stochastic
????????? That's literally the entire point. I want deterministic answers, like clang based autocomplete instead of the nightmare that is prob- i mean stochastic autocomplete
>If hallucination matters to you, your programming language has let you down.
They matter because they waste your time with bullshit that you then have to fix. No programming language can truly enforce correct logic constraints, which are one of the primary difficulties with writing code. I literally have no idea what the OP is even doing; actually writing code has always been a small fraction of the time - the main bottleneck is the train of thought to make sure that everything's correct. The literal typing is an afterthought. No programming language can bypass that step; they can (at best) handhold you through certain kinds of problems.
>Does an intern cost $20/month? Because that’s what Cursor.ai costs.
For the moment. They'll jack up the prices while enshittifying, and then good luck. I do not want to subscribe to a service to be able to code. I use free IDEs and free tools for exactly this reason. If you rely on a paid-for 3rd party you are doing it wrong and will regret it. This is one of the silliest things in the whole article
>But people select languages in part based on how well LLMs work with them, so Rust people should get on that
They really don't. People select languages based on:
1. What they're taught at school/university
2. What language meets their domain requirements
3. What language their job requires
>Meanwhile, software developers spot code fragments seemingly lifted from public repositories on Github and lose their shit. What about the licensing? If you’re a lawyer, I defer. But if you’re a software developer playing this card? Cut me a little slack as I ask you to shove this concern up your ass. No profession has demonstrated more contempt for intellectual property.
Are you joking? Is this article a troll? Programmers care a great deal about copyright law. GPL/MIT/etc are hugely important, and people respect the ever-loving crap out of them in general - even most major companies won't just blatantly pirate GPL works.
There's a tonne more in this article, but it smells like someone who has literally no idea what they're talking about talking out of their arse, and it shows profoundly
I do use the AI tools to some extent, if for no other reason than that they are currently the path of least resistance, and google+friends have lately played themselves out of the game.
He is probably right we should get acquainted with using agents before dismissing it :-).
imo that tone is exactly what's needed to set right the entitlement that we SWEs have developed over the last decade. The key line in the article gets to the heart of the actual concern developers have imo.
> LLMs really might displace many software developers. That’s not a high horse we get to ride. Our jobs are just as much in tech’s line of fire as everybody else’s have been for the last 3 decades.
I would like to point out: LLMs look really intelligent until you converse with them in a less popular language; then they will make weird mistakes and interpret your questions all wrong. I am Dutch and they really suck at that.
You still need to read this tedious code to verify that it actually does what you want it to do. Given this, I'd much rather prefer to write the tedious code myself than having to make sense of someone else's tedious code.
Today these models fall apart very quickly once you step out of js/python/c land to rust/zig/ocaml.
With today's LLM tech, feeding it language documentation does not work great.
Giving it only 1 or 2 complex requirements at a time, and then having it iterate, I've found to be more effective.
Most LLMs don't "think", so when asking an LLM something, I generally try to think "would I be able to do this without thinking, if I had all the knowledge, but just had to start typing and go?".
You could break down your prompt into separate prompts like this maybe: https://chatgpt.com/share/683eb7d7-e7ec-8012-8b3b-e34d523dc9...
I think it broke things down in a weird way, but I definitely can't analyse the correctness of anything it outputs in this domain :P
Coding specific agents like Copilot might be better able to handle a complex initial prompt, since they take the initial prompt, and use LLMs to break it down into smaller steps, which ChatGPT doesn't do. They can sort of "think". Deep research AIs also have a sort of thinking too, so might do better.
One could also conclude that a large portion of software engineering is mostly implementation of things that have been implemented many times before, and that only a small portion consists of real software engineering, where you have to develop code for a problem that nobody ever wrote before or that requires a deep understanding of the problem domain.
Strong +1. That's one of the reasons I enjoy using LLMs, cause I'm often using it for the pieces of code which I don't even find rewarding to write; it's just glue code which I know has been written 1000 times over, which _I've_ already written 1000 times over across different projects.
When I added the prompt: 'Could you use the specification given in Section 5.18 of https://docs.oasis-open.org/virtio/virtio/v1.3/csd01/virtio-...' it produced almost the same code preceded with some babbling from the document, but not using anything from the specification, not even the code fragments mentioned in the section.
Alas, good luck!
I gave ChatGPT (the free desktop web version) my Ruby solution to one of my favorite Advent of Code challenges and asked it to rewrite it in Rust. I have no experience in Rust except reading the first few chapters of the book, but with the help of the compiler I managed to fix it, and it gave the right responses 100x faster than Ruby did. So ChatGPT did a great job keeping the logic intact.
I would have done the same thing if LLMs didn't exist, write it in Ruby first and then in Rust. The flexibility and expressiveness of Ruby allows to actually find a working solution. Experienced Rustaceans would do it directly in Rust, but writing in Ruby should be easier and an LLM can transform it.
One point the author might be missing is:
> If you were trying and failing to use an LLM for code 6 months ago †, you’re not doing what most serious LLM-assisted coders are doing
The tech giants and AI bros alike have been bombarding us with "but look what it can do NOW", "the new shiny model is revolutionary, now buy tokens", "look what's around the corner" and whatever else for over 2 years now. Can you really blame a typical "skeptic" for not being interested in putting their time into verifying these claims themselves after seeing mediocre results so many times? You start to develop a filter for these things.
For most software projects, they cannot even remotely do that - at least not today and as far as I can tell.
And that is to say, it's not that they can write a not-so-large fraction. They cannot write any significant fraction of it. They can often write you a draft that helps you get there faster, though. They can also facilitate web search result consolidation etc. which can also help. But - they can also lead you down wrong paths and dead-ends, on occasion.
> they’re smarter than me. And when they get over this affectation, they’re going to make coding agents profoundly more effective than they are today.
Then I really hope he's in for a surprise.
I'm a bit on the fence myself, as I think it's very harmful, but I can also see ways it can be useful. But it's absolutely mindblowing how this is nearly always completely out of the discussion even though our current way of living and powering things is on a timer and we still haven't addressed it as a whole.
Maybe LLMs will help lead to a breakthrough in nuclear fusion or battery research.
Degrowth is the philosophy of poverty, and a poverty of philosophy.
As for the breakthroughs, maybe they will, maybe they won't; it's not much of an argument.
Bullshit.
Even if this was true (and so far that doesn't seem to be the case), that's not how commodities work.
You can't just measure how much your thing uses, because even if it was running purely on green energy you also have to keep in mind other consumers that end up displaced onto worse sources. The only fair way to measure this is by the worst generator on your grid, because that's who would be shut down w/o your demand.
And even if we assume that the entire grid was green, building out that capacity also has (environmental) costs! As does producing those GPUs, for that matter.
I’m only half joking.
I for one am glad not everybody writes their op-eds while cosplaying as an HR-drone writing press releases.
If you'd please review https://news.ycombinator.com/newsguidelines.html and stick to the rules when posting here, we'd appreciate it. Note these, for example:
"Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith."
"When disagreeing, please reply to the argument instead of calling names. 'That is idiotic; 1 + 1 is 2, not 3' can be shortened to '1 + 1 is 2, not 3."
"Please don't fulminate."
> Professional software developers are in the business of solving practical problems for people with code. We are not, in our day jobs, artisans.
Well that's just wrong - we have failed to capture tremendous value through knowledge of our craft, and reducing it down to min-maxing effort/profit is not a pursuit that I can get behind. Nobody is talking about building ivory towers, but we can do a lot better than most of the common-denominator technologies in use today. We're failing our users most of the time (and of course people like the author say it doesn't matter because it doesn't show up in the quarterly results).
> Nobody cares if the logic board traces are pleasingly routed.
Speak for yourself.
If someone doesn't want to learn, there have always been ways to avoid it. Now it's easier with LLMs, that's true. But for people who do want to learn, I think it's also easier now with LLMs.
I'm not sure where the equilibrium will land, in terms of whether we'll end up with more people with a deep understanding, or more people with a shallow understanding. Or maybe the ratio will end up being the same, but there will just be more people in both groups! Ie the ease of learning will pull some of the previously shallow learners into developing a deep understanding, and the ease of coasting will pull up people who were never into development before into being shallow learners.
What I tried is indeed "set up Cursor" or "copy-paste ChatGPT", and it was all bad. I have no idea what this "agent" and "MCP" stuff is.
> I can feel my blood pressure rising thinking of all the bookkeeping and Googling and dependency drama of a new project. An LLM can be instructed to just figure all that shit out. Often, it will drop you precisely at that golden moment where shit almost works, and development means tweaking code and immediately seeing things work better. That dopamine hit is why I code.
This is me. Just yesterday I finished something I have been putting off for 2 years. I maintain 2 mobile phones, one because of work, and I needed a way to see SMS messages from the other phone. I wanted to write a simple Android app that reads incoming messages and saves them somewhere for me to use for OTPs and other stuff. For the longest time I wanted to build something like this for myself, but I kept putting it off just because of all the weird issues I had been getting with mobile development. A couple of days ago, since I had some time, I fired up Gemini and asked it to write me a simple Flutter app to save the messages to Supabase, and voila, it was amazing. With some help and some tweaking I have a fully functioning mobile app on Android, and a small Tauri application on desktop to read the messages nicely formatted in a data table, without much coding on my part at all. Truly remarkable.
I want to see the hype but without paying up front. My focus area is C#, .NET, JS.
But every time I try this, this happens: I've tried the same thing on multiple agents and they all make up some magic API that doesn't exist, even when I've used the right thing elsewhere in the code to do the same job, with that function commented as such. Asking them to use that always fails.
This seems a massive flaw to me that is just hand waved away here.
What about that "superpower" we had to make something out of _nothing_ but our own brains? Sure, you may have looked up some code (done some research and learned), but that mental exercise, just knowing your code inside out, can never be compared to the "oh look at that, it actually works" of LLM stuff.
Well, guess what: now you've traded that superpower for a subscription! (Yes, you are now a middleman of software peddling, paying a company that stole other people's code and ideas and did not pay _them anything_ in return.) And eventually you will get to a point when the internet is down and you won't write any software that day, because you can't get the model to work.
Did we go to the next step in software evolution? If it has a cost per month (and you pay thieves, because yes, that is what they are), it's going to be a hard "no" for me; you just became a cow. Sure, now this is what's expected of you? OK. This has a strong neoliberal drift; I assume I could be wrong, so comments are welcome, I am quite interested in this part.
Yes, augmentations seem like amputations; is it McLuhan? I'm not sure. Will we retain our power to produce software? No! Humans dumb down very quickly; a generation is all it took to reduce kids' reading to below expected educational levels [2]. There will be fewer professionals and lots of peddlers with no skills, and software will be so junky it will be seriously scary to catch a plane in 10 years' time.
Interesting times.
[1] https://news.ycombinator.com/item?id=44116724 [2] https://www.theguardian.com/lifeandstyle/2025/jun/02/gen-z-p...
People like to think that intelligence is some intrinsic force, but it's far better thought of as situational. We are contextually smarter or dumber based on the emotions we're trying to process from a situation and how our motivated reasoning allows us to avoid certain negative emotions.
The subset of people the OP is trying to argue against are people who are otherwise normally intelligent but have an emotional need for AI to be unimportant that they're unable to process so they build a social community of terrible arguments.
The nature of the internet is that visibility is terribly correlated (and often anti-correlated) with popularity so OP thinks this is some large contingent of people when it's instead algorithmically served conflict generators.
Writing against that is also an algorithmic conflict generator, as evidenced by the number of upvotes here. The correct response is to realize that people putting up terrible arguments about AI don't actually care about AI, they're using this issue to process unresolved trauma from other, unrelated areas of their life and taking their ideas literally instead of seriously is a waste and diversion. There are plenty of smart people with far more nuanced views that still disagree with you where they're interested in the content of the argument.
Maybe you are, but I didn't get a Master's degree to produce quick shit for shareholders
> Does an intern cost $20/month? Because that’s what Cursor.ai costs.
> Part of being a senior developer is making less-able coders productive, be they fleshly or algebraic.
But do you know what another part of being a senior developer is? Not just making them more productive, but also guiding the junior developers into becoming better, independent, self-tasking, senior coders. And that feedback loop doesn't exist here.
We're robbing ourselves of good future developers, because we aren't even thinking about the fact that the junior devs are actively learning from the small tasks we give them.
Will AI completely replace devs before we all retire? Maybe. Maybe not.
But long before that, the future coders who aren't being hired and trained because a senior dev doesn't understand that the junior devs become senior devs (and that's an important pipeline) and would rather pay $20/month for an LLM, are going to become a major loss/ brain drain domestically.
Cursor is a heck of a lot more than $20/month if you actually want it working for a full work day, every day.
This issue manifests a bit differently in people, but I've definitely worked with people (not only juniors) who only have a few productive hours a month in them. And for what it's worth, some of those people were sufficiently productive in those few hours that it was rational for the company to keep them.
He didn't last long.
You could probably hammer the most expensive cursor API all-day every-day and it would still be a fraction of the cost of a junior dev.
Eventually you will get a memory leak even in a GCd language. Eventually there will be some incredibly obscure, unreported bug in a library. Eventually you will find an issue in unmaintained code you depend on. Eventually there will be performance problems caused by too many layers of abstraction.
You either need to know, roughly, how your dependencies work by occasionally digging into their code/reading the documentation. Or you need intuition to know how it probably works, but you usually build that intuition by actually writing/reading code.
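To make the GC'd-language point concrete, here's a minimal Python sketch (names and numbers invented for illustration): the garbage collector can't reclaim anything you keep a live reference to, so an unbounded module-level cache "leaks" just fine.

    # An unbounded memo table: every distinct key stays referenced forever,
    # so memory grows for the life of the process even though nothing is "lost".
    _cache = {}

    def render_report(user_id, params):
        key = (user_id, params)
        if key not in _cache:
            _cache[key] = f"report for {user_id} with {params}"  # stand-in for expensive work
        return _cache[key]

    # A bounded cache avoids the problem; the standard library has one built in.
    from functools import lru_cache

    @lru_cache(maxsize=1024)
    def render_report_bounded(user_id, params):
        return f"report for {user_id} with {params}"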
Also, I spend a lot of time mentoring, and I'd like to think A will grow to be more like B over time. But now it feels like it's wasted effort to try mentoring those skills if it won't be valued.
I had a rather depressing experience this semester in my office hours with two students who had painted themselves into a corner with code that was clearly generated. They came to me for help, but were incapable of explaining why they had written what was on their screens. I decided to find where they had lost the thread of the class and discovered that they were essentially unable to write a hello-world program. In other words, they lost the thread on day one. Up until this point, both students had nearly perfect homework grades while failing every in-class quiz.
From one perspective I understand the business case for pushing these technologies. But from another perspective, the long term health of the profession, it’s pretty shortsighted. Who knows, in the end maybe this will kill off the group of students who enroll in CS courses “because mom and dad think it’s a good job,” and maybe that will leave me with the group that really wants to be there. In the meantime, I will remind students that there is a difference between programming and computer science and that you really need a strong grasp of the latter to be an effective coder. Especially if you use AI tools.
I see this so much. “Data science major” became the 2020s version of law school. It’s such a double edged sword. It’s led to a huge increase in enrollment and the creation of multiple professional masters programs, so the college loves us. We hire every year and there’s always money for just about anything. On the other hand, class sizes are huge, which is not fun, and worse a large fraction of the students appear to have minimal intrinsic interest in coding or analyzing data. They’re there because it’s where the jobs are. I totally get that, in some sense college has always been that way, but it does make me look back fondly on the days when classes were 1/4 as big and filled with people who were genuinely interested in the subject.
Unfortunately I think I may get my wish. AI is going to eliminate a lot of those jobs and so the future of our field looks a bit bleak. Worse, it’s the very students who are going to become redundant the quickest that are the least willing to learn. I’d be happy to teach them basic analysis and coding skills, but they are dead set on punching everything into ChatGPT.
Is there any interpretation that makes sense _other_ than this?
From a student's perspective: I think it was the same with SO. While LLMs make c&p even easier, they also have the upside of lowering the bar on more complex topics/projects. Nowadays, the average person doesn't touch assembly, but we still had a course where we used it and learned its principles. Software engineering courses will follow suit.
The problem is the lack of analysis that goes into producing a useful question for others that fits in with the rest of the site.
True, proper analysis of the homework rarely yields such questions, and even less so in 2025. But the point was always to have a question that's about a clear, specific problem, not about a task that was assigned. Because the latter can only possibly help people who were assigned the same task.
So while I fully agree with you, this is not a concern for any single decision maker in the private-company world. And the state, such as the US, doesn't pick up this work instead; it is quietly accepting the situation.
Well, think for a second about who sets similar budgets and long-term spending priorities: rich lawyers who chose to become much richer politicians, rarely anybody else, and almost never anyone from a more moral profession.
This is nothing new. In a computer graphics class I took over 20 years ago, the median score on the assignments before the midterm was >100% (thanks to bonus questions), yet in midterm prep other students in the class were demonstrating that they didn't even have a firm grasp on the basic concept of a matrix.
That is: they were in a 4th year undergrad course, while doubting material from a senior year high school course where they had to have gotten high marks in order to get into the program.
And the midterm grading was heavily curved as a result (though not as much as in some other courses I took).
Students will do what they need to do for the grade. It seems a great many of them have internalized that none of this is about actually learning anything, even if they would never say so aloud. (I learned things - where I didn't already know them - because it was actually interesting. My resulting grades were pretty good overall, but certainly not top of class.)
> Who knows, in the end maybe this will kill off the group of students who enroll in CS courses “because mom and dad think it’s a good job,”
Why would it? It's becoming easier than ever to fake understanding, and to choose anything else they would need both the opportunity and social permission. I only see the problem getting worse.
My view is that right now, because of the willingness of corporations and other investors to swallow short term (but massive) losses on this, we're basically in AI fiscal fantasy land.
The question we should be asking is how do we get access to these local models in the first place? It's all based on the work of these hyper expensive base models as the best small models are quantisations and distills. None of this works as soon as the profit motive comes into play and companies start gatekeeping effectively, which they will.
Almost every senior developer I know is spending that time making LLM's more productive and useful instead.
Whatever you think the job is of the senior developer, it will not be "coding".
I think people need to stop thinking of themselves as computer programmers and start thinking of themselves as _engineers_. Your job isn't writing programs, your job is _using the technology you have available to solve problems_. Maybe that is through writing code, but maybe it's orchestrating LLM's to write code for you. The important part is solving the problem.
You call it robbing ourselves of good future developers, I call it hourly consultancy rate increase.
Talking to colleagues at work is a chore, and a huge risk! Not an opportunity! At least AI respects my privacy, and will not get me fired!
In no way can AI be used for anything larger than generating singular functions or anything that would require writing to or modifying multiple files.
Technically you might be able to pull off having AI change multiple files for you in one go, but you'll quickly run into a sort of "Adobe Dreamweaver" type of issue, where your codebase is dominated by generated code which only the AI that generated it is able to properly extend and modify.
I remember when Dreamweaver was a thing, but you essentially had to make a choice between sticking with it forever for the project or not using it at all, because it would basically convert your source code into its own proprietary format, due to it becoming so horribly messy and unreadable.
Regardless, AI is absolutely incredible and speeds up development by a great deal, (even) if you only use it to generate small snippets at the time.
AI is also an absolute godsend for formatting and converting stuff from anything and to anything - you could e.g. dump your whole database structure to Gemini and ask it to generate an API against it; big task, but since it is basically just a conversion task, it will work very well.
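As a rough sketch of what that kind of conversion prompt can look like in code (assuming the google-generativeai Python package; the model name, file name, and prompt are illustrative, not a recommendation):

    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")  # assumption: you supply your own key
    model = genai.GenerativeModel("gemini-1.5-pro")  # illustrative model name

    # Hypothetical schema dump, e.g. the output of `pg_dump --schema-only`.
    schema = open("schema.sql").read()

    prompt = (
        "Here is my PostgreSQL schema:\n\n"
        f"{schema}\n\n"
        "Generate a FastAPI application with CRUD endpoints for each table."
    )

    response = model.generate_content(prompt)
    print(response.text)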
Having used AI on a daily basis for over a year now: most AI tools do not struggle with Rust, they struggle to provide value. If I do not instruct them to keep the output to the bare minimum, I usually end up with 10x the output of what would be a simple solution to my problem.
But I don't want to make the claim lightly, so I did an experiment.
I signed up for copilot pro, and have been using their 'edit' feature. This is more than just their auto complete.
I set myself a goal to create a playable web game of classic frogger.
It took 4 hours with copilot "edit" and my full attention.
I didn't write a single line of code, but I did ask it to refactor and gave it a project description.
I suspect this would have taken me 4 days full time to get to this level.
Try it out:
You could create a Tetris in even less time. Try to create something more novel and less hot on the heatmap of paths walked in gamedev.
Question: how familiar are you with the technologies used? My experience so far is that AI has been useful for things I don't have a good understanding of; when I do, it's a different ball game, mostly because coding it directly seems faster, since I know exactly the behaviour I am looking for and I am not having to deal with unintended consequences.
I see it as the Cheshire Cat thing from Alice in Wonderland: when you don't know where you're going, any road will take you there. So it's been great for exploratory work and prototyping.
I tried doing a Warcraft 1 clone, but that felt too complex for the model being used (OpenAI GPT-4.1). That model was just the default setting in Copilot.
I dug a little deeper this morning, and it turns out I hadn't actually enabled my copilot 'pro' mode, which has granted access to some more current or dev focused models. So I'll take them for a spin to see what they're capable of.
My goal here is to roughly get a sense for when a task is too complex for an "agent" to handle.
I also want to try adding custom tools to suit certain project needs. For example, Unreal Engine has a python editor interface, so I'd like to have an agent drive the editor to build something. I have my doubts.
Once I have a feeling for what level of complexity can be handled, I'll see if I can manage the tools better using this understanding, by breaking large and complex projects into appropriate chunks of work / complexity.
/endBrainDump
It's not just Rust. It's everything that has been created after the last vacuuming of training data. You need a large corpus for an LLM to get decent-ish.
Making fun of people who care about copyright (implying that fly abuses the GPL, for instance) is just icing on the cake.
It's the complete opposite of OP's main language, Go, as he says:
> Go has just enough type safety, an extensive standard library, and a culture that prizes (often repetitive) idiom. LLMs kick ass generating it.
Python is an interpreted, dynamically typed language, and the static type checkers are not there yet (most popular 3rd-party libraries have no type hints, for example). It also allows for many different programming styles that LLMs struggle to choose from.
1. Every extra line of code is much more risky.
2. It's much harder to verify the LLM's code.
On the other hand I think rust will be in a good place in regards to LLMs in the next few years thanks to the robustness of the language and the quality of its diagnostic messages. Those 2 attributes should compound very well.
* use type hints everywhere.
* use dataclasses everywhere.
* add the dataclass files to your prompt.
* if the produced code is too complex, ask the LLM "Is this the simplest and cleanest way to solve this?"
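For illustration, a minimal sketch of that kind of dataclass-plus-type-hints file (the names are invented); giving the LLM this as context pins down the data shapes it has to code against:

    from dataclasses import dataclass
    from datetime import date

    @dataclass(frozen=True)
    class Customer:
        id: int
        name: str
        signup_date: date

    @dataclass
    class Invoice:
        customer: Customer
        amount_cents: int
        paid: bool = False

    def outstanding_total(invoices: list[Invoice]) -> int:
        """Sum of unpaid invoice amounts, in cents."""
        return sum(inv.amount_cents for inv in invoices if not inv.paid)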
I've seen Antirez use it for hard to find bugs in the Redis codebase (videos are on YouTube, albeit Italian), I've seen Cursor refactor in a matter of minutes code that would've taken hours, correctly, I've seen security researchers leverage it to find complex edge cases, etc
My 2 cents is that developers who won't learn to use the tools will be left behind (albeit not for long, imho), but also that salespeople promoting the idea that devs can be replaced are equally dangerous.
Anyway in a couple of months I'll try again. Until that time, I guess the search result summaries are nice?
I've been using Zed and Claude Sonnet 4 (and sometimes trying Opus) heavily over the past weeks. For small edits where I have lots of unit tests, the results were great. So great that they worry me with regards to job security. For exploring a new programming domain it was also somewhat useful. I work a lot with the Typescript compiler API right now, and it has almost no documentation. Since the AI can see into every GitHub repository out there, it's much better, and more efficient, at learning APIs based on code from other folks. On the other hand it means I don't do that job, and I am forced to rely 100% on how the AI presents the Typescript compiler API to me. Are there better methods I could use? Who knows.
Where it's abysmal is code architecture. Sometimes it's almost comical: it adds an if statement to handle one highly specific edge case in a program that only makes sense if it solves the general case. This didn't happen often though.
The hardest part was to force it to reuse existing code from the same file. My use case is transforming a Typescript AST into a GraphQL AST. The code is one big switch statement with lots of recursive calls. The AI would often add 300 lines of code that duplicate some logic which already exists somewhere else.
In the end I rewrote the whole thing from scratch. At around 900 lines of code the AI was starting to really struggle. When I wanted to take over, I realized that I didn't have the in-depth knowledge to do so. And trying to understand the code the AI had written proved futile.
Ultimately that's on me, I should have been more diligent reviewing the dozens of 300 line of code changes the AI throws at me over the course of a day. But I wasn't, because reviewing is really, really hard. For many reasons. And AI makes it even harder.
Am I therefore nuts? I find this whole article extremely one sided. Surely, based on the sheer amount of both positive and negative press, the answer is somewhere in the middle.
Very true and humbling.
And who the fuck will train those interns, now they've been replaced by machines?
https://fly.io/blog/vscode-ssh-wtf/
Where he dunks on how SSH access works in VSCode. I don't know. The code and architecture behind this feature may well be bananas, but it gets the work done. Sounds like a clear case of mediocrity. I wonder how he reconciles those two articles.
For me this is more clickbait. Both of the articles, really. With that in mind, if I am nuts for being sceptical of LLMs, I think it is fair to call the author a clickbaiter.
(I suspect I know the answer, but I am not certain of it.)
He's dunking on the design of it.
I've used LLMs to crank out code for tedious things (like generating C-APIs and calling into poorly documented libraries) but I'm not letting them touch my code until I can run it 100% locally offline. Would love to use the agentic stuff but from what I've heard it's still too slow to run on a high end workstation with a single 4080.
Or have things gotten better lately, and crucially, is there good Visual Studio integration for running local agents / LLMs?
But even if I did, there's a much more solid foundation of trust there, whereas these AI companies have been very shady with their 'better to ask for forgiveness, than permission' attitudes of late.
You can do this already with Ollama, RooCode, and a Docker compatible container engine.
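For example, a rough sketch of hitting a locally running Ollama server from Python over its HTTP API (assumes Ollama's default port and a model you've already pulled, e.g. `ollama pull llama3`):

    import json
    import urllib.request

    payload = {
        "model": "llama3",                      # whatever model you've pulled locally
        "prompt": "Write a Python function that parses ISO 8601 dates.",
        "stream": False,                        # ask for one JSON object, not a stream
    }

    req = urllib.request.Request(
        "http://localhost:11434/api/generate",  # Ollama's default local endpoint
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"])

Nothing leaves your machine, which is the whole point.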
Why haven't we seen an explosion of new start-ups, products or features? Why do we still see hundreds of bug tickets on every issue tracking page? Have you noticed anything different on any changelog?
I invite tptacek, or any other chatbot enthusiast around, to publish project metrics and show some actual numbers.
The enthusiasts have a cognitive dissonance because they are pretty sure this is huge and we’re living in the future, so they go through various denial strategies when the execs ask them where the money is.
In this case it’s blame. These darned skeptics are ruining it for everyone.
You're posting this question on a forum hosted by YC. Here's a story from March 2024: "YC’s latest W24 batch includes 240 companies. A significant portion of the companies have some AI component, with 63% tagged as “Artificial Intelligence” — a notable increase from 51% in the preceding S23 batch and 29% before that.". https://jamesin.substack.com/p/analysis-of-ycs-latest-w24-ba...
I've not seen the same analysis for more recent batches.
1. A huge part of the demographic group visiting HN is biased in favor of AI, given the sort of startups YC decides to fund.
2. The large amount of start-ups funded by HN that are related to AI should answer your question.
I am slightly leaning towards the first one combined with a little bit of the latter one. A lot of people working in startups will be used to building up a structure from scratch where incorporating the latest "thing" is not that big of a deal. It also means they rarely see the long term impact of the code they write.
They have a huge blind spot for the reality of existing code bases and company structures where introducing these tools isn't as easy and code needs to be maintained for much longer.
I categorize that as "an explosion", personally. Do you disagree?
The number of start-ups getting into YC hasn't really changed YoY: the W23 batch had 282 companies and W24 had 260.
Agreed.
> I think percentage of accepted startups doing X carries more information than total number of startups accepted as a whole.
Disagreed, I think it carries almost no information at all. The only thing your analysis signifies is that companies believe that using AI will make YCombinator more likely to fund them. They are competing for a limited number of slots and looking for ways to appeal to YCombinator. And the only thing that signifies is that YCombinator, specifically, likes funding AI startups right now.
This is not surprising. If you're an investment firm, funding AI companies is a good bet, in the same way that funding Web3 firms used to be a genuinely good bet. Investors ride market trends that are likely to make a company balloon in value or get bought out by a larger company. Investors are not optimizing for "what redefines the industry"; investors optimize to make a return on investment. And those are two entirely different things.
But it's also not surprising given YCombinator's past - the firm has always kind of gravitated towards hype cycles. It would be surprising if YCombinator wasn't following a major tech trend.
If you want evidence that we're seeing an explosion of companies, you need to look at something much more substantial than "YCombinator likes them".
And that's especially given the case that OP wasn't asking "is the tech industry gravitating towards AI?" They were asking, "are we seeing an explosion of new economic activity?"
And frankly, we're not. There are a lot of reasons for that which could have nothing to do with AI (tariffs and general market trends are probably a bigger issue). But we really aren't seeing the kind of transformation that is being talked about.
His point is that if AI were so great, loads of NON AI startups would be appearing because the cost to make a company should have dramatically dropped and new opportunities for disruption in existing businesses should be available.
His point is that they aren't.
You pointing at AI startups in YC actually highlights the opposite of what you think it does. People are still looking for the problems for AI to solve, not solving problems with AI.
Your example is actually a bellwether that there is no great leap forward yet; otherwise the companies delivering real-world value would be taking spots in YC from the AI tooling companies, because they'd be disrupting existing businesses and making lots of money instead of trying to sell AI tools.
It's like you pointing at the large batches of YC companies doing crypto 5/10 years ago and saying that proves crypto is a game changer and everyone would soon be using crypto in their development.
The YC companies are focused on the AI tool hype, not making money by solving real world problems.
I see new AI-assisted products every day, and a lot of them have real usage. Beyond the code-assistant/gen companies, which are very real examples, here's an anecdote.
I was thinking of writing a new story and found http://sudowrite.com/ via an ad, an AI assistant for helping you write; it's already used by a ton of journalists and serious writers, and I am trying it out.
Then I wanted to plan a trip; I tried Google but saw nothing useful, then asked ChatGPT and now have a clear plan.
I am not seeing anything indicating it is actually used by a ton of journalists and serious writers, and I highly doubt it is. The FAQ is also paper thin as far as substance goes. I highly doubt they are training/hosting their own models, yet I see only vague third-party references in their privacy policy. Their pricing is less than transparent, given that they don't really explain how their "credits" translate to actual usage. They blatantly advertise this to students, which is problematic in itself.
This ignores all the other issues around so heavily depending on LLMs for your writing. This is an interesting quirk for starters: https://www.theguardian.com/technology/2024/apr/16/techscape... . But there are many more issues about relying so heavily on LLM tools for writing.
So this example, to me, is actually exemplifying the issue of overselling capabilities while handwaving away any potential issues that is so prevalent in the AI space.
We released our own fiction-specific model earlier this year - you can read more about it at https://www.sudowrite.com/muse
A much-improved version 1.5 came out today -- it's preferred 2-to-1 vs Claude in blind tests with our users.
You're right on the FAQ -- alas, we've been very product-focused and haven't done the best job keeping the marketing site up to date. What questions do you wish we'd answer there?
Your marketing material quotes a lot of journalists, giving the impression they too use it a lot. I have my reservations about LLMs being used for professional writing, but for the moment I'll assume that Muse handles a lot of those concerns perfectly. I'll try to focus on the more immediate and actual concerns.
Your pricing specifically has a "Hobby & Student" section which mentions "Perfect for people who write for fun or for school". This is problematic to me, I'll get to why later when I answer you question about things missing from the FAQ.
> What questions do you wish we'd answer there?
Well it would be nice if you didn't hand wave away some actual potential issues. The FAQ also reads more like loose marketing copy than a policy document.
- What languages does Sudowrite work in?
Very vague answer here. Just be honest and say it highly depends on the amount of source material and that for many languages the result likely will be not that good.
- Is this magic?
Cute, but doesn't belong in a FAQ
- Can Sudowrite plagiarize?
You are doing various things here that are disingenuous.
You basically talk around the issue by saying "well, next word prediction isn't exactly plagiarism". To me it strongly suggests the models used have been trained on material that you can plagiarize. Which in itself is already an issue.
Then there is the blame shifting to the user saying that it is up to the user to plagiarize or not. Which is not honest, the user has no insights in the training material.
"As long as your own writing is original, you'll get more original writing out of Sudowrite." This is a probabilistic statement, not a guarantee. It also, again is blame shifting.
- Is this cheating?
Way too generic. Which also brings me to you guys actively marketing to students. Which I feel is close to moral bankruptcy. Again, you sort of talk around the issue are basically saying "it isn't cheating as long as you don't use it to cheat". Which is technically true, but come on guys...
In many contexts (academia, specific writing competitions, journalism), using something like sudowrite to generate or significantly augment text would be considered cheating or against guidelines, regardless of intent. In fact, in many school and academic settings using tools like these is detrimental to what they are trying to achieve by having the students write their own text from scratch without aid.
- What public language models does Sudowrite use and how were they trained?
Very vague, it also made me go back to the privacy policy you guys have in place. Given you clearly use multiple providers. I then noticed it was last updated in 2020? I highly doubt you guys have been around for that long, making me think it was copy pasted from elsewhere. This shows as it says "Policy was created with WebsitePolicies." which just makes this generic boilerplate. This honestly makes me wonder how much of it is abided by.
It being so generic also means the privacy policy does not clearly mention these providers while effectively all data from users likely goes to them.
*This is just about the current questions in the FAQ.* The FAQ is oddly lacking in regard to Muse; some of it is on the Muse page itself, but there I am running into similar issues.
- "Ethically trained on fiction: Muse is exclusively trained on a curated dataset with 100% informed consent from the authors."
Bold and big claim. I applaud it if true, but there is no way to verify other than trusting your word.
There is a lot more I could expand on. But to be frank, that is not my job. You are far from the only AI-related service operating in this problematic way; it might even run deeper in general startup culture. But honestly, even if your service is awesome and ethically entirely sound, I don't feel taken seriously by the publicly available information you provide. It is almost as if you are afraid to be real with customers; to me you are overselling and overhyping. Again, you are far from the only company doing so, you just happened to be brought up by the other user.
I won't go line-by-line here defending the cutesy copy and all that since it's not my job to argue with people on the internet either… but on a few key points that interested me:
- language support: I don't believe we're being disingenuous. Sudowrite works well in many languages. We have authors teaching classes on using Sudowrite in multiple languages. In fact, there's one on German tomorrow and one on French next week: https://lu.ma/sudowrite Our community runs classes nearly every day.
- student usage - We do sometimes offer a student discount when people write in to ask for it, and we've had multiple college and high school classes use Sudowrite in writing classes. We'll often give free accounts to the class when professors reach out. I don't believe AI use in education is unethical. I think AI as copilot is the future of most creative work, and it will seem silly for teachers not to incorporate these tools in the future. Many already are! All that said, we do not market to students as you claim. Not because we think it's immoral -- we do not -- but because we think they have better options. ChatGPT is free, students are cheap. We make a professional tool for professional authors and it is not free nor cheap. It would not make sense for our business to market to students.
- press quotes -- Yes, we quote journalists because they're the ones who've written articles about us. You can google "New Yorker sudowrite" etc and see the articles. Some of those journalists also write fiction -- that one who wrote the New Yorker feature had a book he co-wrote with AI reviewed in The New York Times.
> I then noticed it was last updated in 2020? I highly doubt you guys have been around for that long
So many of these objections feel bizarre to me because they're trivial to fact-check. Here's a New York Times article that mentions us, written in 2020. We were one of the first companies to use LLMs in this wave and sought and gained access to GPT-3 prior to public API availability. https://www.nytimes.com/2020/11/24/science/artificial-intell...
I realize they don't make sense to you, otherwise the website would contain different information. If I had to try to frame it more clearly I'd say that for a company whose core product revolves around clear writing, your website's information is surprisingly vague and evasive in some areas. I simply think it would make for a stronger and confident message if that information was just there. Which, might I remind you, I have said is true for many companies selling LLM based services and products.
> language support: I don't believe we're being disingenuous. Sudowrite works well in many languages.
I am sure it does, those languages with the highest presence in the training data. French and German doing well doesn't surprise me given the numbers I have seen there. I think this FAQ section could be much clearer here.
> we do not market to students as you claim.
I guess that your pricing page specifically has a "Hobby & Student" tier which mentions "Perfect for people who write for fun or for school" doesn't count as marketing to students?
> I don't believe AI use in education is unethical.
Neither do I, if it is part of the curriculum and the goal. But for many language-related courses, including writing, using assistive tooling, certainly tooling that heavily shapes style, defeats the didactic purpose.
> So many of these objections feel bizarre to me because they're trivial to fact-check. Here's a New York Times article that mentions us, written in 2020. We were one of the first companies to use LLMs in this wave and sought and gained access to GPT-3 prior to public API availability.
Okay, I already went out of my way to go over the entire website because you asked. I am not doing a hit piece on you guys specifically as I specifically said you just happened to be linked by the other person. It was an assumption on my side, but reasonable given the age of most LLM companies. More importantly, that is not the main point I am making there anyway.
Since 2020 the landscape around LLMs changed drastically. Including the way privacy is being handled and looked at. You would think that this would result in changes to the policy in that period. In fact, I would think that the introduction of your own model would at the very least warrant some changes there. Not to mention that using copy pasted boilerplate for 5 years to me does not give a confident signal about how seriously you are taking privacy.
While you are not obligated to respond to me as I am just one random stranger on the internet. I would be remiss if I didn't make it clear that it is the overall tone and combined points that make me critical. Not just the ones that piqued your interest.
At least that's what I do and what I see among friends.
If you end up finishing it in 6 months, are you going to revise that estimate, or celebrate the fact that you don't need to wait until 2092 to use the project?
You did a year's worth of work in 3 days? That is what 100x means.
At that stage, the real value will lie in the remaining 10%—the part that requires human judgment, creativity, or architectural thinking. The rest will be seen as routine: simple instructions, redundant CRUD operations, boilerplate, and glue code.
If we focus only on the end result, humans will inevitably write less code overall. And writing less code means fewer programming jobs.
Call me naive, but you'd think that these specifically want to demonstrate how well their product works. Making an effort to distinguish PRs that are largely the work of their own agents. Yet, I am not seeing that.
I have no doubt that people find use in some aspects of these tools. Though I personally more subscribe to the interactive rubber ducky usage of them. But 90% from where I am standing seems like a very, very far way off.
People don't like working for free, either by themselves or with an AI agent.
2) Did you stop reading after that sentence? Because there is a whole lot more that follows, specifically:
> If I need to target it even more directly, why am I a not seeing hints of this being applied on code agent repositories? Call me naive, but you'd think that these specifically want to demonstrate how well their product works. Making an effort to distinguish PRs that are largely the work of their own agents. Yet, I am not seeing that.
I’m nonetheless willing to be patient and see how it plays out. If I’m skeptical about some grandiose claims I must also be equally skeptical and accepting about the possibility of large scale effects happening but not being apparent to me yet.
In my personal experience (LLM and code suggestion only) it's because I use LLMs to code unimportant stuff. Actually thinking what I want to do with the business code is exhausting and I'd rather play a little with a fun project. Also, the unit tests that LLMs can now write (and which were too expensive to write myself) were never important to begin with.
Then Mozilla and Google did things with it that I did not think were possible for them to do. Not "they wrote a bunch of code with it", stuff like "they eliminated an entire class of bugs from a section of their codebase."
Then I watched a bunch of essentially hobby developers write kernel drivers for brand new architecture, and watched them turn brand new Macbooks into one of the best-in-class ways to run Linux. I do not believe they could have done that with their resources at that speed, using C or C++.
And at that point, you kind of begrudgingly say, "okay, I don't know if I like this, but fine, heck you, whatever. I guess it might genuinely redefine some parts of software development, you win."
So this is not impossible. You can convince devs like me that your tools are real and they work.
And frankly, there are a billion problems in modern computing that are high impact - stuff like Gnome accessibility, competitive browser engines, FOSS UX, collaboration tools. Entire demographics who have serious problems that could be solved by software if there was enough expertise and time and there were resources to solve them. Often, the issue at play is that there is no intersection between people who are very well acquainted with those communities and understand their needs, and people who have experience writing software.
In theory, LLMs help solve this. In theory. If you're a good programmer, and suddenly you have a tool that makes you 4x as productive as a developer: you could have a very serious impact on a lot of communities right now. I have not seen it happen. Not in the enterprise world, but also not in the FOSS world, not in communities with lower technical resources, not in the public sector. And again, I can be convinced by this, I have dismissed tools that I later switched opinions on because I saw the impact and I couldn't ignore the impact: Rust, NodeJS, Flatpak, etc, etc.
The problem is people have been telling me that Coding Assistants (and now Coding Agents) are one of those tools for multiple years now, and I'm still waiting to see the impact. I'm not waiting to see how many companies pick them up, I'm not waiting to see the job market. I'm waiting to see if this means that real stuff starts getting written at a higher quality significantly faster, and I don't see it.
I see a lot of individual devs showing me hobby projects, and a lot of AI startups, and... frankly, not much else.
> Professional software developers are in the business of solving practical problems for people with code.
Are we? Is that our job? You might think it is, and it might even feel like it. But it is not. Our job, in 99% of cases, is to make stock price go up. That means making software that people will buy, making people want to keep buying it, and doing whatever makes your investors happy.
So what happens as you use these tools? Your job changes from coder to "LLM manager", but let's assume you're happy with that. You're more "productive", which is an artificial metric and nothing to do with your real job as described above. That's fine for now, but what happens down the road? Now your company is paying more and more money to these LLM services, and paying you less and less.
> "but they take-rr jerbs" So does open source.
This is a gross misunderstanding of open source. Most open source developers are paid to be, full time, by companies. Why? Warm fuzzy feelings? The goodness of their hearts? No. It's because it's good for business. These open source technologies are a baseline; they're not worth competing on. Build the open base in the direction you want, and build the "business value" on top for profit.
> We used to pay good money for databases.
We still do. Have you looked at managed PostgreSQL prices lately? Also if you haven't noticed, Oracle hasn't gone anywhere.
> LLMs really might displace many software developers. That’s not a high horse we get to ride. Our jobs are just as much in tech’s line of fire as everybody else’s have been for the last 3 decades.
This isn't the same thing as replacing punch cards, or moving away from Java 6, or replacing artisans with factory machines. This is centralisation and rent-seeking. As companies become more and more dependent on a handful of companies with the upfront capital to build massive server farms... what do you think is going to happen? Once they've convinced enough people that LLMs are essential to their work, it's going to cost a lot more than $20/month. Once these companies are entrenched they will wring every last dollar out of every company that's grown to depend on them; many will simply fold.
Maybe we are in a seismic shift in our industry. But the beneficiaries are not going to be whole-hearted LLM adopters, no amount of saying "you need to change how you work" is going to save any of us. The only people that will benefit are going to be the people that own the data centres.
Best use of evidence is deductive: Lots of code is tedious and uninteresting -> LLMs are fast at generating lots of tedious code -> LLMs help productivity.
Weakest part of the argument: The list of rebuttals doesn't have an obvious organization to it. What exactly is the main argument they're arguing against?
It's not stated outright but because the post is bookended by references to 'those smarter than me', I think this is an argument against the shaming of developers using (and loving) LLM tools.
Which I think is fair.
Overall, the post did not add anything to the general discussion. But the popularity of the author (and fly.io posts) may make it a beacon for some.
But agents like Claude Code and Cursor run on a user's machine, right? So how can they work in the background like that? Are there agents that can work async?
That is not my objection, though some seem to think it is.
I got into coding because I found it interesting and fun. I've avoided managing others because I hate that sort of thing. Working with an LLM, prompting it to do work then assessing that work and adjusting the prompts or cleaning/completing the output myself, is far too much like outsourcing work to a 3rd party junior, and I don't want to do that whether said junior is a human in a far off land (or just down the street, for all I know!) or some LLM run by MegaCorp-ᴙ-Us.
I was already seriously disliking much of the modern programming industry (remote teams, the shit-show that client-side web work continues to devolve into, and much else) anyway. If it gets to the point where I can't do my job (or can't do it competitively enough) without using LLMs on top of dealing with the other things I don't care for, I'll be retraining for something else entirely. Heck, I've got the mortgage paid, I don't have or plan to have a family beyond the cats, I could just about survive on shelf-stacking…
I know it isn't a passing fad, though I'm convinced the current and near-future iterations aren't going to be quite as significant in the longer term as many believe. I don't object to others using it (aside from the matter of “MegaCorp-ᴙ-Us”s of the world suddenly deciding that piracy is just fine because they stand to benefit from it, in fact are already benefitting from it, in this case). But I don't plan to be nagged into using it myself.
> Do it on your own time.
If/when that career shift happens, or I survive as I am long enough to retire, I will probably do exactly that. As it stands I spend more than enough of my life on tech in DayJob, my own time is increasingly filled with unrelated hobbies and people.
With the amount of supervision you need to give this stuff, I can't see it being useful for much more than auto-completing the next few lines of code as you type. You've got to read and go over the whole output carefully anyway, so you might as well have finer grained control over it.
RIP Junior devs
It's improved to the point where the summary that I would email a junior developer is a pretty good prompt that produces code pretty close to what I want.
I had a weird project recently where I needed a stand alone event handler that would send an email receipt to the customer, another email to customer service, log the event in an accounting system via an API. I wrote up a proposal for the rest of the management team and, once accepted, pasted the proposal into ChatGPT (and a few others, but ChatGPT gave the best results).
I basically got a full recipe for setting up AWS API Gateway, Lambda, and SES, the Python Lambda function with api calls to SES, suggestions on using AWS SAM, etc. etc. Tweaked the prompt a couple of times and then took over from there. I had this into the hands of the testing team in less than a day.
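For flavor, a stripped-down sketch of what such a Lambda handler can look like (the event shape, addresses, and accounting URL are placeholders, not the actual project):

    import json
    import urllib.request
    import boto3

    ses = boto3.client("ses")

    def handler(event, context):
        order = json.loads(event["body"])  # hypothetical API Gateway payload

        # Receipt to the customer
        ses.send_email(
            Source="receipts@example.com",
            Destination={"ToAddresses": [order["customer_email"]]},
            Message={
                "Subject": {"Data": f"Receipt for order {order['order_id']}"},
                "Body": {"Text": {"Data": "Thanks for your purchase!"}},
            },
        )

        # Copy to customer service
        ses.send_email(
            Source="receipts@example.com",
            Destination={"ToAddresses": ["support@example.com"]},
            Message={
                "Subject": {"Data": f"New order {order['order_id']}"},
                "Body": {"Text": {"Data": json.dumps(order, indent=2)}},
            },
        )

        # Log the event in the accounting system via its API (placeholder URL)
        req = urllib.request.Request(
            "https://accounting.example.com/api/events",
            data=json.dumps(order).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(req)

        return {"statusCode": 200, "body": json.dumps({"ok": True})}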
I'm pretty sure this would reduce my team size in ANY role I've had in my career. Maybe not halve the teams, but it would eliminate a lot of the junior or journeymen developers.
Fast forward 10 years, the skeptics will be right on a few things, and the fanboys will be right on other things.
I for one know this isn’t just a “fad” like NFTs were, but I cannot help but feel skeptical about the sweeping claims of software engineering’s doomsday.
Translation: Drop your work and current tools every few months to pick up the shiny new thing or you aren't cool or even sane.
Reality: Your favorite cool new tool will be integrated into development work as it is proven, documented, and time allows. This idea that development must leap onto the latest shiny new thing every few months is extremely corrosive to reasonable business practice. The needs of the many outweigh the preferences of the lead developer.
The fact that LLMs can generate this kind of output is not incompatible with the OP's argument about how useful they can be when combined with agents and code review. But search is the lowest-friction way for anyone to interact with LLMs currently, and right now it's a pretty poor advertisement.
Thus sucking up all the time you "saved" by having the LLM write the code. The machine made the easy parts of coding easier while making the hard parts harder.
"You’ll spend 5-10 minutes knocking it back into your own style."
As if style is the only thing that matters?
"LLMs are showing signs of adapting to local idiom, but we’re not there yet."
First step fallacy. Ladders are showing signs of being tall enough to reach the moon, but we're not there yet.
You lost me here. I have often found it to be far more than a 10-minute style issue: a fundamental misunderstanding of the code's purpose that I need to fix.
I get it. You're being so productive. (Or so you claim. I don't believe you.)
But if I don't affirm your work choices, or embrace them myself, you lose your mind like a little caffeinated squirrel.
> If you can’t metabolize the boring, repetitive code an LLM generates: skills issue!
Case in point. Touch grass.
Sounds cool, but not useful... lol
Reminds one of a classic fictional conversation... =3
I've found LLMs can't really write a whole program there isn't already a template for. They're good enough at writing individual functions, and as the author mentioned, their breadth of knowledge is better than any individual.
Honestly, I see them as a supercharged version of googling, without all the out of context answers you get from Google since they also place ads and whatnot. I find an LLM can get me an answer far quicker than a regular internet search, even if it's imperfect and I'm probably not going to put it straight into my code.
But code is something I like to write myself, it's not about speed of producing code. It's keeping it such you still fully understand it.
Something fascinating about this HN crowd in particular is its lack of foresight. Companies producing medicine have to take into account longterm effects, yet people working in these “tech” companies push out AI after AI with not a single care in how it’s going to impact people.
Not because they aren't effective - they are - but because using them helps me appreciate how important my own skills and experience are for using them to effectively solve problems.
My CEO friends that overstate the efficacy of LLMs are all nuts.
(I say this as someone who has been working in "AI" as part of digital transformation roadmaps for SEVEN years, and as someone who has used LLMs daily in my workflow for 3 years).
One does not need to be a skeptic about machine learning and its potential as technology to refuse to engage with its practical applications when they are clearly ethically corrupt. The ends do not justify the means. Just like refusing to buy products of forced labour or disproportionate environmental impact, or to invest in war and bloodshed, everyone makes this decision for themselves.
Yesterday I “wrote” 18k lines of code with Claude & Cursor. Mostly UI, some backend. This would have taken me a week without LLMs.
I’m sold.
I haven't found a way to use it that makes me develop faster.
The article talks about "tedious code." If you need to generate a large static value table or something, then OK, an LLM might give you a really fast result and cut through the tedium. Most of us were already writing short scripts to do that. I'm open to the possibility that an LLM can do it faster. But it's such a rare requirement that the productivity gains are truly negligible here even if it can. And in those cases, it's obvious what the repetitive task needs to be. I often find writing the code by hand to be quicker than coming up with a prompt to get it to write the code that I then need to review for correctness.
The article then mentions scaffolding. Things like "bookkeeping" when it comes to creating and setting up a new repo (whatever he means by that). This is why I have, historically, been a big fan of frameworks and generators. Point being, this is already a solved problem and I haven't found a way to further improve the state of this world with LLMs. LLMs might be an alternate tool that work just as well. But they haven't made my existing daily workflow any faster. Setting up new repos is also something that is done so rarely that even if an LLM netted a 100% increase in efficiency, it wouldn't really impact much.
I am an AI "skeptic" but I'm not a naysayer. I do use LLMs regularly. I just don't use them for developing code because I have yet to find a problem that they solve for me. Don't get me wrong, there are problems that they can solve... I just haven't come across any solutions to previously-unsolved problems. Meaning I can swap an existing solution for an LLM-based one... and it is a valid solution... but I don't observe any increase in productivity from doing so. The existing solution was already working fine.
I am genuinely looking forward to the day when this changes. When I identify a single existing problem without an existing solution that LLMs solve for me when developing software. I just have yet to come across one.
I have PyCharm Pro with the AI plugin. I can right-click a function and there are 12 things the AI can just do for you.
"Find problems" is shockingly good: the linter has no concerns, yet it finds problems that don't show up from the linter. Things like "Hey, you need to make sure this function is only running once at a time or it can easily overwhelm the CPU."
"Write documentation" just fills in the documentation. I haven't a clue what it writes and I won't be doing it myself.
"Generate unit tests" is interesting? I have literally never written a unit test before, and the code it produces is lengthy. Unit tests are a waste of time; but others disagree, and so the AI can produce them.
"Explain code": look, I wrote the code years ago, half drunk, half high. No comments. It explains it all in depth and perfectly. Comments sometimes go stale after bug fixes and then make things confusing. Comments aren't important when you have this beautifulness.
Let's go back to that CPU DoS attack example. I can then prompt it (Claude 3.5 Sonnet, I think) to give me a solution to fix it. It gives me this beautiful cache option. I could just hit apply and it'd be done. But then I said it's too complicated, give me something simpler. It then gives me a solution that is simpler.
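Something in the spirit of those two suggestions, reconstructed for illustration (this is not the actual PyCharm/Claude output): a cached variant first, then the simpler "only run one at a time" variant.

    import threading
    from functools import lru_cache

    def crunch_numbers(day):
        # stand-in for the CPU-heavy work being protected
        return f"report for {day}"

    # Variant 1: memoize the expensive call so repeated requests don't hammer the CPU.
    @lru_cache(maxsize=128)
    def expensive_report(day):
        return crunch_numbers(day)

    # Variant 2 (simpler): just ensure only one invocation runs at a time.
    _report_lock = threading.Lock()

    def expensive_report_simple(day):
        with _report_lock:
            return crunch_numbers(day)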
>And in those cases, it's obvious what the repetitive task needs to be. I often find myself writing the code by hand to be quicker than coming up with a prompt to get it to write the code that I then need to review for correctness.
It's all about the boring busy-work crap that doesn't need to be done that now gets done.
This statement worries me for a number of reasons.
First, I work on a very massive codebase, with a large engineering organization. And I have seen a number of AI generated unit tests. I have not seen a single example of an LLM-generated unit test that didn't contain a number of test anti-patterns. To the extent where I would argue that they are creating a self-fulfilling prophecy. You said you think that unit tests are a waste of time. I would argue that they CAN be even worse than that.
The benefit of unit tests is that, at their best, they give you a safety net for refactoring existing code. If you change the implementation of a system under test, and the tests pass, you know you didn't introduce a breaking change.
But a lot of tests couple themselves tightly to the implementation details. Every single LLM-generated unit test I have observed in the wild introduces this anti-pattern. If you have a system under test, and changing the implementation of that system without breaking its behaviour causes a test to fail... that's called the "Fragile Test" problem. Now your unit test is not only failing to encourage you to refactor code... it's actively DISCOURAGING you from refactoring code. In this case, the unit test is providing DISVALUE rather than value.
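A small, invented Python illustration of that difference (not from any real codebase): the first test pins the implementation and breaks on harmless refactors; the second pins only the behaviour.

    from unittest import mock

    def total_price(items):
        """Sum item prices in cents, applying a 10% discount over 10,000."""
        subtotal = sum(item["price_cents"] for item in items)
        return int(subtotal * 0.9) if subtotal > 10_000 else subtotal

    # Fragile: asserts *how* the function works (that it calls sum()), so
    # rewriting it as a plain loop fails the test with no behaviour change.
    def test_total_price_fragile():
        with mock.patch("builtins.sum", wraps=sum) as spy:
            total_price([{"price_cents": 500}])
            spy.assert_called_once()

    # Robust: asserts *what* the function does, and survives refactoring.
    def test_total_price_behaviour():
        assert total_price([{"price_cents": 500}]) == 500
        assert total_price([{"price_cents": 6_000}, {"price_cents": 6_000}]) == 10_800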
So the fact that a) you think unit tests are a waste of time and b) you look at AI as a way to save you from a chore ... tells me that you have no business ever writing unit tests, with or without AI. Please stop. You are making the world worse by leveraging an LLM to do these things for you.
I have NEVER looked at writing a unit test by hand as a "chore" or as a "waste of time." I often write my tests before even writing my implementation code, because doing so helps me think through both the design and requirements of my code... and gives me a little mini-sandbox context where I can make sure that the brand new code that I am writing is doing what I want it to. It's a problem solving tool. Not something to be done after the fact as a chore.
"Write Documentation" is not writing code. And if you don't read the documentation you're generating, no one else will. So what's the point of having it in the first place if no one reads it? Food for thought.
"Find problems" I see as being akin to a spell check, or the red squigglies when you have a syntax error. I do see the value of having ML tech within the internal IDE implementation. We've been using ML for email SPAM filters for decades. There are interesting problems that can be solved there. But this is an area where I want my IDE to flag something for me. It's not me actively prompting an LLM to generate code for me.
I sure don't. I'm a cyber security 0day writer, or Linux bash scripter, or yikes coder, who has made some bad turns and ended up with some open source projects in Python that absolutely nobody would call large or massive.
>The benefit of unit tests is that, at their best, they give you a safety net for refactoring existing code. If you change the implementation of a system under test, and the tests pass, you know you didn't introduce a breaking change.
Never refactor. This is literally why everyone loves object-oriented programming and paints themselves into a corner that blows up in their face.
>Now your unit test is not only failing to encourage you to refactor code... it's actively DISCOURAGING you from refactoring code. In this case, the unit test is providing DISVALUE rather than value.
That sounds perfect to me. Never refactor; you end-of-life or version that old code out, and replace it with the new perfect code. We have WPA3, TLS 1.3, SSH 2, IKE v2, HTTP 3.
It's fine to replace until you get it right.
>So the fact that a) you think unit tests are a waste of time and b) you look at AI as a way to save you from a chore ... tells me that you have no business ever writing unit tests, with or without AI. Please stop. You are making the world worse by leveraging an LLM to do these things for you.
We live in very different worlds. I do like the idea that you feel like I'm making the world worse by having AI do the chores.
>I have NEVER looked at writing a unit test by hand as a "chore" or as a "waste of time."
Me neither. I just don't do them. With how awesome AI has been... I'm probably going to have the time to read, for the first time: https://docs.djangoproject.com/en/5.2/topics/testing/
>I often write my tests before even writing my implementation code, because doing so helps me think through both the design and requirements of my code... and gives me a little mini-sandbox context where I can make sure that the brand new code that I am writing is doing what I want it to. It's a problem solving tool. Not something to be done after the fact as a chore.
I don't code anything so complicated. I come from Linux, but they call it the 'unix philosophy'. Do one thing and do it well. Kind of akin to the KISS principle.
Write clean, simple, self-documenting code that legit can't really go wrong. Especially if some jerk comes along like me who throws a fuzzer at it, or worse, user input.
>"Write Documentation" is not writing code. And if you don't read the documentation you're generating, no one else will. So what's the point of having it in the first place if no one reads it? Food for thought.
Good point.
>"Find problems" I see as being akin to a spell check, or the red squigglies when you have a syntax error.
Community PyCharm has all that for free. The "find problems" is entirely next level. I'm so very sold.
I just wrote a stupid def test(): which returns a string. The linter has not a single thing to say.
1. *Try Block Contains No Risk Operations*:
   - The try-except block is wrapping a simple string operation that won't raise exceptions
   - Makes the try-except block unnecessary
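To be clear about what it flagged, the offending code looked roughly like this (a reconstruction from memory, not my exact code):

def test():
    # A plain linter has nothing to say here, but the try/except is pure
    # noise: formatting two string literals can't raise, so nothing is
    # ever caught and the fallback branch is dead code.
    try:
        value = f"{'hello'} {'world'}"
    except Exception:
        value = ""
    return value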
>I do see the value of having ML tech within the internal IDE implementation. We've been using ML for email SPAM filters for decades. There are interesting problems that can be solved there. But this is an area where I want my IDE to flag something for me. It's not me actively prompting an LLM to generate code for me.
Since I have a public open source project on GitHub, I got Copilot for free like 3 years ago, and my biggest complaint was that it was generating garbage code for me and forcing it on me. It got so annoying I disabled it and stopped using it. I'm sure it has improved since then.
PyCharm's AI on occasion has a tab completion that's really good, but it isn't forced on you.
I try to get people to think about "write once" code as much as possible. Keep things small, single-purpose and composable. You maximize reuse potential, make testing a lot easier and greatly ease debugging.
But you're right about one thing: we do live in very different worlds. In the product world, the business never knows what they are making until something is in front of them and they decide they want something else. Therefore the majority of developer time is spent "maintaining" existing code (read: making code changes) vs building new code.
In that world, the "never refactor" mentality is suicide. Refactoring is improving the design of existing code in order to make it simpler and easier to change as requirements change.
And that is the one universal truth of software: it changes.
If you don't need the code to change, ever, then you might as well stick with a fixed circuit.
> I don't code anything so complicated. I come from Linux, but they call it the 'unix philosophy'. Do one thing and do it well. Kind of akin to the KISS principle.
So do I. I use Linux on all of my devices and came up as a *nix developer. Unit tests favour the unix philosophy. I often think of a "unit" of code as a single logical operation. But logic is a function of the human mind, and humans make errors. Even in simple, one-liner, single-responsibility functions, logic errors creep in. Then consider how much your programming language actually helps you: I see type-casting-induced errors all the time that a simple unit test would have caught and avoided.
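A contrived example of what I mean (hypothetical function, not from any real project): config values tend to arrive as strings, and a one-line test flushes out the casting problem immediately.

import unittest

def is_privileged_port(port):
    # Without the int() cast, a "80" read from a config file or env var
    # raises TypeError on comparison in Python 3 - exactly the kind of
    # casting error a trivial unit test catches before it ships.
    return int(port) < 1024

class PortTest(unittest.TestCase):
    def test_accepts_string_config_values(self):
        self.assertTrue(is_privileged_port("80"))
        self.assertFalse(is_privileged_port("8080"))

if __name__ == "__main__":
    unittest.main()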
> Write clean, simple, self-documenting code that legit can't really go wrong.
Then why did you even bring up writing documentation lol. We're aligned here. I never document code because I consider that to be a failure of the code. Documentation and code need to be kept in sync... which never happens, not to mention people don't read it. So favour code that is self-explanatory.
> I do like the idea that you feel like I'm making the world worse by having AI do the chores.
You're deliberately straw-manning me there. If writing a unit test is a chore, your approach to testing is flawed. That was my point and you know it. My comment was also specific to using AI to write unit tests. Not to do anything else.
The basic structure is this: six months ago, I tried LLMs and they were trash, but holy cow they have improved so much, now I can use them to avoid tedious work that I don't like! Don't be an idiot like my skeptical past self.
Then they accuse everyone who now disagrees with their take on the tech as being an incurious luddite who is blinded by their anti-progress bias.
Personally, as a non-user but close observer of the tech, I never doubted that the tech would improve but there are obvious problems with the tech beyond hallucinations that have to do with human meaning, understanding and power relations that cannot be solved by making the tech better.
My challenge to all of the boosters is this: try to articulate your own personal vision of both ai utopia and ai dystopia. I personally find it borderline impossible to even imagine a utopia emerging from genai, but it is extremely easy for me to imagine dystopia, especially given the entities that are controlling the tech and competing to "win" the ai arms race.
For me, the representation of the Chinese state as filtered through western media is already a dystopia. Of course, having not been to China myself and being unable to speak any of their languages, I cannot personally verify the representation. But by competing with the Chinese on ai (and I mean Chinese in the way we define them in the west, which I recognize may be very different from both the actual lived experience in China and their self-conception), we become more like our own negative stereotypes of them. It is essentially a race to disempower ordinary people, remove agency from their lives, hand it over to agents who most certainly do not have the best interest of living humans in mind and call this victory. To "win" the ai war as presently defined would likely be disaster for us all.
There are these ridiculous handwaving things about solving climate change or even human mortality with this tech even though there is no evidence whatsoever that it will do this. Just because the people building it say it will do these things doesn't mean we should trust them.
Imagine if a primatologist tried to tell us that because they have trained chimpanzees to recognize colors and some words better than a three-year-old, we should now stop investing in education and direct all of our resources into training chimps to do all of our repetitive tasks for us, to liberate us from the drudgery of doing anything for ourselves. With enough resources, you would see an explosion in chimp capabilities, and this would come directly at the expense of humans, who now have no work to do and just sit on top of a pyramid built by chimp labor. Not only would the things we made be worse than what we could have made if we focused on developing our own human capacities instead of chimp capacities, but we would live in fear that the chimps (who are also much stronger than us) will wake up to their plight and rise up against their rulers. Humans would also rapidly lose our own capabilities and become much more like chimps than the humans of today. Sound familiar?
I tend to believe that as it is currently being developed, this tech is far more likely to lead us in a direction like the chimp apocalypse than some post labor utopia.
That doesn't mean that the tech isn't getting better or can't do impressive things. I can hold both things in my head at once. But I am much more concerned with human flourishing and well-being than with some bored engineers who don't actually like programming (or at least don't like it under the current industrial conditions) feeling like they are being liberated from the tedium of their work. And instead of solving the real underlying problems that make the work so tedious, we instead compound the problem by having ai generate even more of the exact kind of code that caused the problem in the first place.
> First, we need to get on the same page. If you're [6 months out of date with the technology you think of as overhyped after having observed it for the last 4 years or so], you're [not fairly evaluating it].
Perhaps, but this is an extremely unconvincing approach to the argument right out of the gate - it's basically on the same level as "communism has never been tried".
Especially if you're going to start out by describing your interlocutors as "crazy", rather than positing that an effectively completely new thing now exists that ought to be evaluated differently from Copilot et al.
so we can't look at the code and see the demonstrable engineering excellence
trust me bro, you fool, you poltroon,
They have a few other notable open source components, I think these two are particularly interesting:
- https://github.com/superfly/corrosion
- https://github.com/superfly/litefs
(Disclosure: they sponsor some of my work.)
Point being: there's some real power in these tools, but if the ground is not already well-trodden, they risk creating a huge time sink that could be better served just learning to do the thing yourself.
When people are sick of AI, this is what they are talking about. AI hype people ignore this perspective each and every time.
It doesn't matter how great your paper mill's paper is; if you're dumping PCBs in the river, people are going to quite rightly get pissed off.
Many people are very eager to see how AI can improve their lives.
I’m building a product for accountants that uses AI, initially I avoided using AI as a part of marketing copy. Turns out, people are much more interested when I tell them it uses AI behind the scenes.
I was willing to read until I got to this part. He can shove this drivel up his own ass for all I care if that's how he thinks. What a waste of time.
> Kids today don’t just use agents; they use asynchronous agents. They wake up, free-associate 13 different things for their LLMs to work on, make coffee, fill out a TPS report, drive to the Mars Cheese Castle, and then check their notifications. They’ve got 13 PRs to review. Three get tossed and re-prompted. Five of them get the same feedback a junior dev gets. And five get merged.
> An agent can occupy itself for hours putzing with your tests in a VM and come back later with a PR.
> Kids today don’t just use agents; they use asynchronous agents. They wake up, free-associate 13 different things for their LLMs to work on, make coffee, fill out a TPS report, drive to the Mars Cheese Castle, and then check their notifications. They’ve got 13 PRs to review. Three get tossed and re-prompted. Five of them get the same feedback a junior dev gets. And five get merged.
I'm working better and harder with Roocode in Code mode (and sometimes Architect). I've only been doing it for a few weeks, and it reads to me like I'm already using an 'old approach'.
Tell me you won't use it and explain properly, without appeal to "legacy" or "human values" or other abstracts which don't self validate.
Now note that this is the POV for every child on earth.
The new discipline is being a provider of LLM/AI guard rails, and it absolutely requires knowing what is and is not correct.
The arguments against genAI tend to point out things like:
1. Its output is unreliable at best
2. That output often looks correct to an untrained eye and requires expert intervention to catch serious mistakes
3. The process automates away a task that many people rely on for income
And the response from genAI advocates tends to be dismissive...and I suspect it is, in part, because that last point is a positive for many advocates of genAI. Nobody wants to say it out loud, but when someone on Reddit or similar claims that even a 10% success rate outweighs the 90% failure rate, what they mean is most likely "A machine that works 10% of the time is better than a programmer who works 60-80% of the time because the machine is more than 6-to-8-times cheaper than the programmer".
There's also the classic line about how automation tends to create more jobs in the future than it destroys now, which itself is a source of big disconnects between pro-genAI and anti-genAI crowds--because it ignores a glaring issue: Just because there's gonna be more jobs in the future, doesn't mean I can pay rent with no job tomorrow!
"You can write an effective coding agent in a week" doesn't reassure people because it doesn't address their concerns. You can't persuade someone that genAI isn't a problem by arguing that you can easily deploy it, because part of the concern is that you can easily deploy it. Also, "you’re not doing what the AI boosters are doing" is flat-out incorrect, at least if you're looking at the same AI boosters I am--most of the people I've seen who claim to be using generated code say they're doing it with Claude, which--to my knowledge--is just an LLM, albeit a particularly advanced one. I won't pretend this is anything but anecdata, but I do engage with people who aren't in the "genAI is evil" camp, and...they use Claude for their programming assistance.
"LLMs can write a large fraction of all the tedious code you’ll ever need to write" further reinforces this disconnect. This is exactly why people think this tech is a problem.
The entire section on "But you have no idea what the code is!" falls apart the moment you consider real-world cases, such as [CVE-2025-4143](https://nvd.nist.gov/vuln/detail/cve-2025-4143), where a programmer who is a self-described expert working with Claude--who emphasizes that he checked over the results with a fine-toothed comb, and that he did this to validate his own skepticism about genAI!--missed a fundamental mistake in implementing OAuth that has been common knowledge for a long while. The author is correct in that reading other people's code is part of the job...but this is difficult enough when the thing that wrote the code can be asked about its methods, and despite advances in giving LLMs a sort of train of thought, the fact remains that LLMs are designed to output things that "look truth-y", not things that are logically consistent. (Ah, but we're not talking about LLMs, even though kentonv tells us that he just used an LLM. We're talking about agentic systems. No true AI booster would "just" use an LLM...)
I actually agree with the point about how the language can catch and point out some of the errors caused by hallucination, but...I can generate bad function signatures just fine on my own, thank you! :P In all seriousness, this addresses basically nothing about the actual point. The problem with hallucination in a setting like this isn't "the AI comes up with a function that doesn't exist", that's what I'm doing when I write code. The problem with hallucination is that sometimes that function which doesn't exist is my RSA implementation, and the AI 'helpfully' writes an RSA implementation for me, a thing that you should never fucking do because cryptography is an incredibly complex thing that's easy to fuck up and hard to audit, and you really ought to just use a library...a thing you [also shouldn't leave up to your AI.](https://www.theregister.com/2025/04/12/ai_code_suggestions_s...) You can't fix that with a language feature, aside from having a really good cryptography library built into the language itself, and as much as I'd love to have a library for literally everything I might want to do in a language...that's not really feasible.
"Does an intern cost $20/month? Because that’s what Cursor.ai costs," says the blog author, as if that's supposed to reassure me. I'm an intern. My primary job responsibility is getting better at programming so I can help with the more advanced things my employer is working on (for the record, these thoughts are my own and not those of my employer). It does not make me happy to know that Cursor.ai can replace me. This also doesn't address the problem that, frankly, large corporations aren't going to replace junior developers with these tools; they're going to replace senior developers, because senior developers cost more. Does a senior engineer cost 20 dollars a month? Because that's what Cursor.ai costs!
...and the claim that open source is just as responsible for taking jobs is baffling. "We used to pay good money for databases" is not an epic own, it is a whole other fucking problem. The people working on FOSS software are in fact very frustrated with the way large corporations use their tools without donating so much as a single red cent! This is a serious problem! You know that XKCD about the whole internet being held up by a project maintained by a single person in his free time? That's what you're complaining about! And that guy would love to be paid to write code that someone can actually fucking audit, but nobody will pay him for it, and instead of recognizing that the guy ought to be supported, you argue that this is proof that nobody else deserves to be supported. I'm trying to steelman this blogpost, I really am, but dude, you fundamentally have this point backwards.
I hope this helps others understand why this blogpost doesn't actually address any of my concerns, or the concerns of other people I know. That's kind of the best I can hope for here.
> 2. That output often looks correct to an untrained eye and requires expert intervention to catch serious mistakes
The thing is this is true of humans too.
I review a lot of human code. I could easily imagine a junior engineer creating CVE-2025-4143. I've seen worse.
Would that bug have happened if I had written the code myself? Not sure, I'd like to think "no", but the point is moot anyway: I would not have personally been the one to write that code by hand. It likely would have gone to someone more junior on the team, and I would have reviewed their code, and I might have forgotten to check for this all the same.
In short, whether it's humans or AI writing the code, it was my job to have reviewed the code carefully, and unfortunately I missed here. That's really entirely on me. (It's particularly frustrating for me as this particular bug was on my list of things to check for and somehow I didn't.)
> 3. The process automates away a task that many people rely on for income
At Cloudflare, at least, we always have 10x more stuff we want to work on than we have engineers to work on it. The number of engineers we can hire is basically dictated by revenue. If each engineer is more productive, though, then we can ship features faster, which hopefully leads to revenue growing faster. Which means we hire more engineers.
I realize this is not going to be true everywhere, but in my particular case, I'm confident saying that my use of AI did not cause any loss of income for human engineers, and likely actually increased it.
sneak•1d ago
I was a 3-4x programmer before. Now I’m a 9-15x programmer when wrangling LLMs.
This is a sea change and it’s already into “incredible” territory and shows no signs of slowing down.
> Think of anything you wanted to build but didn’t. You tried to home in on some first steps. If you’d been in the limerent phase of a new programming language, you’d have started writing. But you weren’t, so you put it off, for a day, a year, or your whole career.
I have been banging out little projects that I have wanted to exist for years but always had on the back burner. Write a detailed README and ask the agent to interrogate you about the missing parts of the spec, then update the README. Then have it make a TODO and start implementing. Give it another codebase as a style guide.
I’ve made more good and useful and working code in the last month than I have in the last two years.
FridgeSeal•1d ago
I don’t just run one agent, I run all of them!
My time to close tickets is measured in minutes!
I don’t even review code, I have a different agent review it for me!
grey-area•1d ago
Just get another agent to review it and merge it, job done.
sneak•1d ago
I think a lot of people are unfamiliar with the (expensive) SOTA.
MegaButts•1d ago
What the fuck does this mean?
sneak•1d ago
I’m nowhere near that, but even unaided I’m quite a bit faster than most people I’ve hired or worked with. With LLMs my high quality output has easily tripled.
Writing code may be easier than reading it - but reading it is FASTER than writing it. And that’s what matters.
nico•1d ago
It definitely feels different to develop using LLMs, especially things from scratch. At this point, you can't just have the LLM do everything. Sooner or later you need to start intervening more often, and as the complexity of the project grows, so does the attention you need to give to guiding the LLM. At that point the main gains are mostly in typing and quickly looking some things up, which are still really nice gains.
ryandrake•1d ago
I tried out Copilot a few months back to see what all the fuss was about and so that I could credibly engage with discussions having actually used the technology. I'd rate it as "kind of neat-o" but not earth shattering. It was like the first time I used an IDE with auto-complete. Oh, cool, nice feature. Would I pay monthly for it? No way. Would I integrate it into my development workflow if it were free? Maybe, I guess? Probably wouldn't bother unless it came literally set up for me out of the box like autocomplete does nowadays.
Don't get me wrong--it's cool technology. Well done, AI people. Is it "the 2nd most important thing to happen over the course of my career" as OP wrote? Come on, let's come down to earth a little.
1: https://www.cnbc.com/2018/02/01/google-ceo-sundar-pichai-ai-...
sneak•1d ago
I spent $600 on Claude via Cursor last month and it was easily worth 2-3x that.
ryandrake•1d ago
EDIT: Looks like the "Cursor" thing has a free trial. Might start there.
sneak•19h ago
You can start off for much less. I recommend trying claude-4-opus max/thinking. There might be cheaper options but that’s the one that has given me the best results so far.
kaydub•1d ago
It's easy to come up with some good ideas for a new project, but then not want to do a lot of the garbage work related to the project. I offload all that shit to the LLM now.
Seriously, the LLMs have increased my productivity 2-4x.