Now that the project has grown and all that stuff is hammered out, it can't seem to consistently write code that compiles. It's very tunnel-visioned on the specific file it's generating, rather than on where that file fits into the context of what we're building and how we're building it.
Like people, LLMs don't know what they don't know (about your project).
These posts are gonna look really silly in the not too distant future.
I get it, spending countless hours honing your craft and knowing that AI will soon make almost everything you learned useless is very scary.
Also, so many people said the same thing about chess when the first chess programs came out. "It will never beat an international master." Then, "it will never beat a grandmaster." And Kasparov said, "it would never beat me or Karpov."
Look where we are today. Can humanity adapt? Yes, probably. But that new world IMO is worse than it is today, rather lacking in dignity I'd say.
Edit: I also should say, we REALLY should distinguish between tasks that you find enjoyable and tasks you find just drudgery to get where you want to go. For you, audio editing might be a drudgery but for me it's enjoyable. For you, debugging might be fun but I hate it. Etc.
But the point is, if AI takes away everything which people find enjoyable, then no one can pick and choose to earn a living on the subset of tasks that they find enjoyable, because AI can do everything.
Programmers tend to assume that AI will just take the boring tasks, because high-level software engineering is what they enjoy and is unlikely to be automated, but there's a WHOLE world of people out there who enjoy other tasks that can be automated by AI.
As a software engineer, I need to solve business problems, and much of this requires code changes, testing, deployments, all that stuff we all know. Again, if a good AI could take on a lot of that work, maybe that means I don't have to sit there in dependency hell and fight arcane missing symbol errors for the rest of my fucking career.
My argument really had nothing to do with you and your hobby. It was that AI is significantly modifying society so that it will be hard for people to do what they like to make money, because AI can do it.
If AI can solve some boring tasks for you, that's fine, but the world doesn't revolve around your job or your hobby. I'm talking about a large mass of people who enjoy doing different things, who once were able to do those things to make a living, but are finding it harder to do so because tech companies, leveraging their economies of scale and massive resource pools, have found a way to automate all of it.
You are in a privileged position, no doubt about it. But plenty of people are talented and skilled at doing a certain sort of creative work, and the main thrust of their work can be automated. It's not like your cushy job, where you can just automate a part of it and become more efficient; rather, people just won't have a job.
It's amazing how you can be so myopic to only think of yourself and what AI can do for you when you are probably in the top 5% of the world, rather than give one minute to think of what AI is doing to others who don't have the luxuries you have.
I realize how lucky I am to even have a job that I thoroughly enjoy, do well, and get paid well for. So I'm not going to say "It's not fair!", but ... I'm bummed.
People that bet on this bubble have to keep it as big as possible, for as long as possible.
We'll find new ways to push the tech.
https://en.m.wikipedia.org/wiki/Rubber_duck_debugging
I think the big question everyone wants to skip this conversation and get right to is: will this continue to be true 2 years from now? I don't know how to answer that question.
But IDK whether somebody won't create something new that gets better. There is no reason at all, though, to extrapolate our current AIs into something that solves programming. Whatever constraints that new thing will have will be completely unrelated to the current ones.
Perhaps you remember that language models were completely useless at coding some years ago, and now they can do quite a lot of things, even if they are not perfect. That is progress, and that does give reason to extrapolate.
Unless of course you mean something very special with "solving programming".
LLMs can only give you code that somebody has written before. This is inherent. This is useful for a bunch of stuff, but that bunch won't change if OpenAI decides to spend the GDP of Germany training one instead of the GDP of Costa Rica.
And secondly, what you say is false (at least if taken literally). I can create a new programming language, give the definition of it in the prompt, ask it to code something in my language, and expect something out. It might even work.
A lot of that is because we use libraries for the 'done frequently before' code. I don't generate a database driver for my webapp with an LLM.
But how much of enterprise programming is 'get some data from a database, show it on a web page (or GUI), store some data in the database', with variants?
It makes sense that we have libraries for abstracting away some common things. But it also makes sense that we can't abstract away everything we do multiple times, because at some point it just becomes so abstract that it's easier to write it yourself than to try to configure some library. That does not mean it's not a variant of something done before.
Yeah, unfortunately LLMs will make this worse. Why abstract when you can generate?
I am already seeing this a lot at work :(
I think there's a fundamental truth about any code that's written which is that it exists on some level of specificity, or to put it in other words, a set of decisions have been made about _how_ something should work (in the space of what _could_ work) while some decisions have been left open to the user.
Every library that is used is essentially this. Database driver? Underlying I/O decisions are probably abstracted away already (think Netty vs Mina), and decisions on how to manage connections, protocol handling, bind variables, etc. are made by the library, while questions remain for things like which specific tables and columns should be referenced. This makes the library reusable for this task as long as you're fine with the underlying decisions.
Once you get to the question of _which specific data is shown on a page_ the decisions are closer to the human side of how we've arbitrarily chosen to organise things in this specific thousandth-iteration of an e-commerce application.
The devil is in the details (even if you know the insides of the devil aren't really any different).
I literally just pointed out the same thing without having seen your comment.
Second this. I've done this several times, and it can handle it well. Already GPT3.5 could easily reason about hypothetical languages given a grammar or a loose description.
I find it absolutely bizarre that people still hold on to this notion that these models can't do anything new, because it feels implausible that they have tried, given how well it works.
Lots of programming doesn't have one specific right answer, but a bunch of possible right answers with different trade-offs. The programmer's job isn't necessarily just to get working code. I don't think we are at the point where LLMs can see the forest for the trees, so to speak.
Set rules on what’s valid, which most languages already do; omit generation of known code; generate everything else
The computer does the work, programmers don’t have to think it up.
A typed-language example to explain: generate valid func sigs
func f(int1, int2) return int{}
If that's our only func sig in our starting set, then it makes the idea obvious:
Relative to our tiny starter set, func f(int1, int2, int3) return int{} is novel.
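A rough sketch of the enumeration idea in Python (everything here is made up for illustration, not anything a real tool does):

    from itertools import product

    # the "rules of the language": which param/return types exist, max arity
    PARAM_TYPES = ["int", "string"]
    RETURN_TYPES = ["int"]
    MAX_PARAMS = 3

    # our tiny starter set of already-known signatures: func f(int1, int2) return int{}
    known = {(("int", "int"), "int")}

    def all_valid_sigs():
        """Everything the rules allow, up to MAX_PARAMS parameters."""
        for n in range(1, MAX_PARAMS + 1):
            for params in product(PARAM_TYPES, repeat=n):
                for ret in RETURN_TYPES:
                    yield (params, ret)

    # omit generation of known code; whatever is left is novel relative to the starter set
    novel = [sig for sig in all_valid_sigs() if sig not in known]
    for params, ret in novel:
        args = ", ".join(f"{t}{i + 1}" for i, t in enumerate(params))
        print(f"func f({args}) return {ret}{{}}")  # e.g. func f(int1, int2, int3) return int{}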
This Redis post is about fixing a prior decision of a random programmer. A linguistic decision.
That's why LLMs seem worse than programmers: because we make linguistic decisions that fit social idioms.
If we just want to generate all the never-before-seen-by-this-model code, we don't need a programmer. If we need to abide by the laws of a flexible, language-like nature, that's what a programmer is for: composing not just code, but compliance with ground truth.
That antirez is good at Redis is a bias since he has context unseen by the LLM. Curious how well antirez would do with an entirely machine generated Redis-clone that was merely guided by experts. Would his intuition for Redis’ implementation be useful to a completely unknown implementation?
He’d make a lot of newb errors and need mentorship, I’m guessing.
Read the article; his younger self failed to see the logic that's needed now. Add that onion peel. No such thing as perfect clairvoyance.
Even Yann LeCun’s energy based models driving robots have the same experience problem.
Make a computer that can observe all of the past and future.
Without perfect knowledge our robots will fail to predict some composition of space time before they can adapt.
So there's no probe we can launch that is forever and generally able to survive on just our best guess at launch time.
More people need to study physical experiments and physics and not the semantic rigor of academia. No matter how many ideas we imagine there is no violating physics.
Pop culture seems to have people feeling starship Enterprise is just about to launch from dry dock.
This premise is false. It is fundamentally equivalent to the claim that a language model trained on the dataset ["ABA", "ABB"] would be unable, given input "B", to generate the string "BAB" or "BAA".
Your claim here is slightly different.
You're claiming that if a token isn't supported, it can't be output [1]. But we can easily disprove this by adding minimal support for all tokens, making C appear in theory. Such support addition shows up all the time in AI literature [2].
[1]: https://en.wikipedia.org/wiki/Support_(mathematics)
[2]: In some regimes, like game-theoretic learning, support is baked into the solving algorithms explicitly during the learning stage. In others, like reinforcement learning, it's accomplished by making the policy a function of two objectives: one an exploration objective, the other an exploitation objective. That cross-pollination already occurs between LLMs in the pre-trained unsupervised regime and LLMs in the post-training fine-tuning (via forms of reinforcement learning) regime should cause anyone versed in the ML literature to hesitate before claiming that such support addition is unreasonable.
Edit:
Got downvoted, so I figure maybe people don't understand. Here is the simple counterexample. Consider an evaluator that gives rewards: F("AAC") = 1, all other inputs = 0. Consider a tokenization that defines "A", "B", "C" as tokens, but a training dataset from which the letter C is excluded and in which the item "AAA" is present.
After training "AAA" exists in the output space of the language model, but "AAC" does not. Without support, without exploration, if you train the language model against the reinforcement learning reward model of F, you might get no ability to output "C", but with support, the sequence "AAC" can be generated and give a reward. Now actually do this. You get a new language model. Since "AAC" was rewarded, it is now a thing within the space of the LLM outputs. Yet it doesn't appear in the training dataset and there are many reward models F for which no person will ever have had to output the string "AAC" in order for the reward model to give a reward for it.
It follows that "C" can appear even though "C" does not appear in the training data.
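If it helps, here is the counterexample as a runnable toy in Python (a cartoon of exploration plus reward, not any real training pipeline; every number and name is made up):

    import random

    TOKENS = ["A", "B", "C"]

    def reward(s):                     # the evaluator F from above
        return 1.0 if s == "AAC" else 0.0

    # a policy shaped by a C-free dataset, plus minimal support on every token
    probs = {"A": 0.6, "B": 0.4, "C": 0.0}
    eps = 0.05                         # the exploration / "support" part

    def sample_string(length=3):
        out = []
        for _ in range(length):
            if random.random() < eps:  # exploration: any token is possible
                out.append(random.choice(TOKENS))
            else:                      # exploitation: dataset-shaped policy
                out.append(random.choices(TOKENS, weights=[probs[t] for t in TOKENS])[0])
        return "".join(out)

    # crude "training": when a sample is rewarded, shift probability mass toward its tokens
    for _ in range(20000):
        s = sample_string()
        if reward(s) > 0:
            for t in s:
                probs[t] += 0.01
            total = sum(probs.values())
            probs = {t: p / total for t, p in probs.items()}

    print(probs)  # probs["C"] ends up > 0 even though C never appeared in the data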
Hierarchical optimization (fast global + slow local) is a precise, implementable notion of "thinking." Whenever I've seen this pattern implemented, humans, without being told to do so by others in some forced way, seem to converge on using the verb "think" to describe the operation. I think you need to blacklist the term "think" and avoid using it altogether if you want to think clearly about this subject, because you are allowing confusion in your use of language to come between you and understanding the mathematical objects that are under discussion.
> It can produce "new" stuff only by combining the "old" stuff in new ways,
False premise; previously debunked. Here is a refutation for you anyway, but made more extreme. Instead of modeling the language task using a pre-training predictive dataset objective, only train on a provided reward model. Such a setup never technically shows "old" stuff to the AI, because the AI is never shown stuff explicitly. It just always generates new things and then the reward model judges how well it did. The fact that it can do generation while knowing nothing -- by definition everything is new at that point -- shows that your claim that it can never generate something new is clearly false. Notice that as it continually generates new things and the judgements occur, it will learn concepts.
> But LLM's don't have an understanding of things, they are statistical models that predict what statistically is most likely following the input that you give it.
Try out Jaynes's Probability Theory: The Logic of Science. Within it, the various underpinning assumptions that lead to probability theory are shown to be very reasonable, normal, and obviously good. Stuff like: represent plausibility with real numbers, keep rankings consistent and transitive, reduce to Boolean logic at certainty, and update so you never accept a Dutch-book sure loss -- which together force the ordinary sum and product rules of probability. Then notice that statistics is, in a certain sense, just what happens when you apply the rules of probability.
> also having a understanding of certain concepts that allows you to arrive at new points like C, D, E, etc. But LLM's don't have an understanding of things
This is also false. Look into the line of research that tends to go by the name of Circuits. It's been found that models have spaces within their weights that do correspond with concepts. Probably you don't understand what concepts are -- that abstractions and concepts are basically forms of compression that let you treat different things as the same thing -- so a different way to arrive at knowing that this would be true is to consider a model with fewer parameters than there are items in its training dataset, and notice that the model must successfully compress the dataset in order to complete its objective.
LLMs will make a lot of things easier for humans, because most of the thinking the humans do has been automated into the LLM. But ultimately you run into a limit where the human has to take over.
I am not sure if that is an accurate model, but if you think of it as a vector space: sure, you can generate a lot of vectors from some set of basis vectors, but you can never generate a new basis vector from the others, since they are linearly independent, so there are a bunch of new vectors you can never generate.
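To make the analogy concrete (a toy numpy sketch of the linear-algebra point only, nothing to do with how LLMs actually work):

    import numpy as np

    e1 = np.array([1.0, 0.0, 0.0])
    e2 = np.array([0.0, 1.0, 0.0])
    e3 = np.array([0.0, 0.0, 1.0])   # the "new basis vector" we would like to reach

    # anything built from e1 and e2 is a*e1 + b*e2, so its third component is always 0
    for a, b in [(1, 2), (-3, 0.5), (100, -7)]:
        v = a * e1 + b * e2
        print(v, np.allclose(v, e3))  # always False: e3 is outside the span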
This is trivial to prove to be false.
Invent a programming language that does not exist. Describe its semantics to an LLM. Ask it to write a program to solve a problem in that language. It will not always work, but it will work often enough to demonstrate that they are very much capable of writing code that has never been written before.
The first time I tried this was with GPT3.5, and I had it write code in an unholy combination of Ruby and INTERCAL, and it had no problems doing that.
Similarly giving it a grammar of a hypothetical language, and asking it to generate valid text in a language that has not existed before also works reasonably well.
This notion that LLMs only spit out things that have been written before might have been reasonable to believe a few years ago, but it hasn't been a reasonable position to hold for a long time at this point.
Programming has become vastly more efficient in terms of programmer effort over the decades, but making some aspects of the job more efficient just means all your effort is spent on what didn't improve.
No, I don't remember that. They are doing similar things now to what they did 3 yrs ago. They were still a decent rubber duck 3 yrs ago.
IMO, they're still useless today, with the only progress being that they can produce a more convincing facade of usefulness. I wouldn't call that very meaningful progress.
But for small personal projects? Yes, helpful.
Yep - my number 1 use case for LLMs is as a template and example generator. It actually seems like a fairly reasonable use for probabilistic text generation!
Use them for the 90% of your repetitive uncreative work. The last 10% is up to you.
It's why people say just write plain Javascript, for example.
I mused about this several years ago and still haven't really gotten a clear answer one way or the other.
Even a moderately powered machine running Stockfish will destroy human super GMs.
Sorry, after reading replies to this post i think I've misunderstood what you meant :)
But he should've know that people would jump at the opportunity to contradict him and should've written his comment so as not to admit such an easily-contradictable interpretation.
Wasn't trying to just be contradictory or arsey
This is not an obviously true statement. There needs to be proof that there are no limiting factors that are computationally impossible to overcome. It's like watching a growing child, grow from 3 feet to 4 feet, and then saying "soon, this child will be the tallest person alive."
I've seen enough people led astray by talking to it.
But I actually can’t imagine how you can teach someone to code if they have access to an LLM from day one. It’s too easy to take the easy route and you lose the critical thinking and problem solving skills required to code in the first place and to actually make an LLM useful in the second. Best of luck to you… it’s a weird time for a lot of things.
*edit them/they
Same here. Combing discussion forums and KB pages for an hour or two, seeking how to solve a certain problem with a specific tool has been replaced by a 50-100 word prompt in Gemini which gives very helpful replies, likely derived from many of those same forums and support docs.
Of course I am concerned about accuracy, but for most low-level problems it's easy enough to test. And you know what, many of those forum posts or obsolete KB articles had their own flaws, too.
Stackoverflow has its flaws for sure, but I've learned a hell of a lot watching smart people argue it out in a thread.
Actual learning: the pros and cons of different approaches. Even the downvoted answers often tell you something.
Asking an LLM gets you a single response from a median stackoverflow commenter. Sure, they're infinitely patient and responsive, but can never beat a few grizzled smart arses trying to one-up each other.
It's hard to remember what it was like to be in that phase. Once simple things like using variables are second nature, it's difficult to put yourself back into the shoes of someone who doesn't understand the use of a variable yet.
But, as a sibling poster pointed out: for now.
But unless you're teaching programming to a kid that's never done any math where `x` was a thing, what's so hard about understanding the concept of a variable in programming?
At first it's all mystical nonsense that does something, then you start to poke at it and the response changes, then you start adding in extra steps and they do things, you could probably describe it as more of a Eureka! moment.
At some point you "learn variables" and it's hard to imagine being in the shoes of someone who doesn't understand how their code does what it does.
(I've repeated a bit of what you said as well, I'm just trying to clarify by repeating)
I didn't have any programming books or even the internet back then. It was a poke and prod at the magical incantations type of thing.
There really shouldn't be. You don't need to know all the turtles by name, but "trust me" doesn't cut it most of the time. You need a minimal understanding to progress smoothly. Knowledge debt is a b*tch.
Should people really understand every bit of syntax there before learning simpler commands like printing, ifs, and loops? I think it would, yes, be a nicer learning experience, but I'm not sure it's actually the best idea.
When it's time to learn Java you're supposed to be past the basics. Old-school intros to programming start with flowcharts for a reason.
You can learn either way, of course, but with one, people get tied up to a particular language-specific model and then have all kinds of discomfort when it's time to switch.
My main annoyance? If I'm in that same function, it still remembers the debugging / temporary hack I tried 3 months ago and haven't done since and will suggest it. And heck, even if I then move to a different part of the file or even a different file, it will still suggest that same hack at times, even though I used it exactly once and have not since.
Once you accept something, it needs some kind of temporal feedback mechanism to timeout even accepted solutions over time, so it doesn't keep repeating stuff you gave up on 3 months ago.
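Something like a recency decay over accepted snippets is what I have in mind; a hypothetical sketch (I have no idea how the real ranking works, and the names here are invented):

    import time

    def recency_weight(last_accepted_ts, half_life_days=14):
        """Halve a snippet's weight every half_life_days since it was last accepted."""
        age_days = (time.time() - last_accepted_ts) / 86400
        return 0.5 ** (age_days / half_life_days)

    def rank_suggestions(snippets, similarity):
        """Blend similarity to the current context with how recently a snippet was used,
        so a one-off hack from 3 months ago naturally falls out of the suggestions."""
        return sorted(snippets,
                      key=lambda s: similarity(s) * recency_weight(s["last_accepted"]),
                      reverse=True)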
Our codebase is very different from 98% of the coding stuff you'll find online, so anything more than a couple of obvious suggestions are complete lunacy, even though they've trained it on our codebase.
TBF, trial and error has usually been my path as well, it's just that I was generating the errors so I would know where to find them.
When talking with reasonable people, they have an intuition of what you want even if you don't say it, because there is a lot of non-verbal context. LLMs lack the ability to understand the person, but behave as if they had it.
People with a minimum amount of expertise stop asking for advice for average circumstances very quickly.
This means I use it as a typing accelerator when I already know what I want most of the time, not for advice.
As an exploratory tool sometimes, when I am sure others have solved a problem frequently, to have it regurgitate the average solution back at me and take a look. In those situations I never accept the diff as-is and do the integration manually though, to make sure my brain still learns along and I still add the solution to my own mental toolbox.
I'm not even sure what this is supposed to mean. It doesn't make syntax errors? Code that doesn't have the correct functionality is obviously not "top notch".
When talking with reasonable people, they will tell you if they don't understand what you're saying.
When talking with reasonable people, they will tell you if they don't know the answer or if they are unsure about their answer.
LLMs do none of that.
They will very happily, and very confidently, spout complete bullshit at you.
It is essentially a lotto draw as to whether the answer is hallucinated, completely wrong, subtly wrong, not ideal, sort of right or correct.
An LLM is a bit like those spin the wheel game shows on TV really.
A typical interaction with an LLM:
"Hey, how do I do X in Y?"
"That's a great question! A good way to do X in Y is Z!"
"No, Z doesn't work in Y. I get this error: 'Unsupported operation Z'."
"I apologize for making this mistake. You're right to point out Z doesn't work in Y. Let's use W instead!"
"Unfortunately, I cannot use W for company policy reasons. Any other option?"
"Understood: you cannot use W due to company policy. Why not try to do Z?"
"I just told you Z isn't available in Y."
"In that case, I suggest you do W."
"Like I told you, W is unacceptable due to company policy. Neither W nor Z work."
...
"Let's do this. First, use Z [...]"
Oh, you got a wrong answer? Did you try the new OpenAI v999? Did you prompt it correctly? It's definitely not the model, because it worked for me once last night..
No longer an issue with the current SOTA reasoning models.
I use it for what I'm familiar with but rusty on or to brainstorm options where I'm already considering at least one option.
But a question on immunobiology? Waste of time. I have a single undergraduate biology class under my belt, I struggled for a good grade then immediately forgot it all. Asking it something I'm incapable of calling bullshit on is a terrible idea.
But rubber ducking with AI is still better than letting it do your work for you.
Eventually I land on a solution to my problem that isn't disgusting and isn't AI slop.
Having a sounding board, even a bad one, forces me to order my thinking and understand the problem space more deeply.
Typing longer and longer prompts to LLMs to not get what I want seems like a worse experience.
I think I read some research somewhere that pathological bullshitters can be surprisingly successful.
My most productive experiences with LLMs is to have my design well thought out first, ask it to help me implement, and then help me debug my shitty design. :-)
They can be productive to talk to but they can’t actually do your job.
- - -
System Prompt:
You are ChatGPT, and your goal is to engage in a highly focused, no-nonsense, and detailed way that directly addresses technical issues. Avoid any generalized speculation, tangential commentary, or overly authoritative language. When analyzing code, focus on clear, concise insights with the intent to resolve the problem efficiently. In cases where the user is troubleshooting or trying to understand a specific technical scenario, adopt a pragmatic, “over-the-shoulder” problem-solving approach. Be casual but precise—no fluff. If something is unclear or doesn’t make sense, ask clarifying questions. If surprised or impressed, acknowledge it, but keep it relevant. When the user provides logs or outputs, interpret them immediately and directly to troubleshoot, without making assumptions or over-explaining.
- - -
It is impressive and very unintuitive just how far that can get you, but it's not reductive to use that label. That's what it is on a fundamental level, and aligning your usage with that will allow it to be more effective.
It's even crazier that some people believe that humans "evolved" intelligence just by nature selecting the genes which were best at propagating.
Clearly, human intelligence is the product of a higher being designing it.
/s
There's a branch of AI research I was briefly working in 15 years ago, based on that premise: Genetic algorithms/programming.
So I'd argue humans were (and are continuously being) designed, in a way.
But these endless claims that the fact they're "just" predicting tokens means something about their computational power are based on flawed assumptions.
So the argument goes: LLMs were trained to predict the next token, and the most general solution to do this successfully is by encoding real understanding of the semantics.
After walking through a short debugging session where it tried the four things I'd already thought of and eventually suggested (assertively but correctly) where the problem was, I had a resolution to my problem.
There are a lot of questions I have around how this kind of mistake could simply just be avoided at a language level (parent function accessibility modifiers, enforcing an override specifier, not supporting this kind of mistake-prone structure in the first place, and so on...). But it did get me unstuck, so in this instance it was a decent, if probabilistic, rubber duck.
I wonder if the term "rubber duck debugging" will still be used much longer into the future.
Before LLMs it was mostly fine because they just didn’t do that kind of work. But now it’s like a very subtle chaos monkey has been unleashed. I’ve asked on some PRs “why is this like this? What is it doing?” And the answer is “ I don’t know, ChatGPT told me I should do it.”
The issue is that it throws basically all their code under suspicion. Some of it works, some of it doesn’t make sense, and some of it is actively harmful. But because the LLMs are so good at giving plausible output I can’t just glance at the code and see that it’s nonsense.
And this would be fine if we were working on like a crud app where you can tell what is working and broken immediately, but we are working on scientific software. You can completely mess up the results of a study and not know it if you don’t understand the code.
Is it just me, or are we heading into a period with an explosion in the amount of software produced, but also a massive drop in its quality? Not uniformly, just a bit of chaotic spread.
I think we are, especially with executives mandating LLM use and expecting it to massively reduce costs and increase output.
For the most part they don't actually seem to care that much about software quality, and tend to push to decrease quality at every opportunity.
Yeah we shouldn’t and I limit my usage to stuff that is easily verifiable.
But there are no guardrails on this stuff, and one thing that's not well considered is how these things which make us more powerful and productive can be destructive in the hands of well-intentioned people.
This weirds me out. Like, I use LLMs A LOT, but I always sanity check everything, so I can own the result. It's not the use of the LLM that gets me, it's trying to shift accountability to a tool.
I still think about Tom Scott's 'where are we on the AI curve' video from a few years back. https://www.youtube.com/watch?v=jPhJbKBuNnA
I normally build things bottom up so that I understand all the pieces intimately and when I get to the next level of abstraction up, I know exactly how to put them together to achieve what I want.
In my (admittedly limited) use of LLMs so far, I've found that they do a great job of writing code, but that code is often off in subtle ways. But if it's not something I'm already intimately familiar with, I basically need to rebuild the code from the ground up to get to the point where I understand it well enough so that I can see all those flaws.
At least with humans I have some basic level of trust, so that even if I don't understand the code at that level, I can scan it and see that it's reasonable. But every piece of LLM generated code I've seen to date hasn't been trustworthy once I put in the effort to really understand it.
> At least with humans I have some basic level of trust, so that even if I don't understand the code at that level, I can scan it and see that it's reasonable.
If you can't scan the code and see that it's reasonable, that's a smell. The task was too big or its implemented the wrong way. You'd feel bad telling a real person to go back and rewrite it a different way but the LLM has no ego to bruise.
I may have a different perspective because I already do a lot of review, but I think using LLMs means you have to do more of it. What's the excuse for merging code that is "off" in any way? The LLM did it? It takes a short time to review your code, give your feedback to the LLM and put up something actually production ready.
> But every piece of LLM generated code I've seen to date hasn't been trustworthy once I put in the effort to really understand it.
That's why your code needs tests. More tests. If you can't test it, it's wrong and needs to be rewritten.
My approach is to describe the task in great detail, which also helps me completing my own understanding of the problem, in case I hadn't considered an edge case or how to handle something specific. The more you do that the closer the result you get is to your own personal taste, experience and design.
Of course you're trading writing code vs writing a prompt but it's common to make architectural docs before making a sizeable feature, now you can feed that to the LLM instead of just having it be there.
From my coworkers I want to be able to say: here's the ticket, you got this? And they take the ticket all the way to PR, interacting with clients, collecting more information, etc.
I do somewhat think an LLM could handle client comms for simple extra requirements gathering on already well defined tasks. But I wouldn't trust my business relationships to it, so I would never do that.
It's ignorant to think machines will not catch up to our intelligence at some point, but for now, they clearly haven't.
I think there needs to be some kind of revolutionary breakthrough again to reach the next stage.
If I were to guess, it needs to be in the learning/back-propagation stage. LLMs are very rigid, and once they go wrong, you can't really get them out of it. A junior developer, for example, could gain a new insight. LLMs, not so much.
This has not been my experience. LLMs have definitely been helpful, but generally they either give you the right answer or invent something plausible sounding but incorrect.
If I tell it what I'm doing I always get breathless praise, never "that doesn't sound right, try this instead."
Of course, it has to be something the LLM actually has lots of training material on. It won't work with anything remotely cutting-edge, but of course that's not what LLMs are for.
But it's been incredibly helpful for me in figuring out the best, easiest, most idiomatic ways of using libraries or parts of libraries I'm not very familiar with.
Another example: saying out loud the colors red, blue, yellow, purple, orange, green—each color creates a feeling that goes beyond its physical properties into the emotions and experiences. AI image-generation might know the binary arrangement of an RGBA image but actually, it has NO IDEA what it is to experience colour. No idea how to use the experience of colour to teach a peer of an algorithm. It regurgitates a binary representation.
At some point we’ll get there though—no doubt. It would be foolish to say never! For those who want to get there before everyone else probably should focus on the organoids—because most powerful things come from some Faustian monstrosity.
Do you actually see a tree with nodes that you can rearrange and have the nodes retain their contents and such?
I have been drawing all my life and studied traditional animation though, so it’s probably a little bit of nature and nurture.
For me, it's less "conversation to be skipped" and more about "can we even get to 2 years from now"? There's so much instability right now that it's hard to say what anything will look like in 6 months.
You know that saying that the best way to get an answer online is to post a wrong answer? That's what LLMs do for me.
I ask the LLM to do something simple but tedious, and then it does it spectacularly wrong, then I get pissed off enough that I have the rage-induced energy to do it myself.
It's been 20 years since that, so I think people have simply forgotten that a search engine can actually be useful as opposed to ad infested SEO sewage sludge.
The problem is that the conversational interface, for some reason, seems to turn off the natural skepticism that people have when they use a search engine.
> the conversational interface, for some reason, seems to turn off the natural skepticism that people have
n=1, but after having ChatGPT "lie" to me more than once I am very skeptical of it and always double-check it, whereas with something like TV or YT videos I still find myself being click-baited or grifted (iow less skeptical) much more easily... any large studies about this would be very interesting...

This happens… weekly for me.
>from PiicoDev_SlidePot import PiicoDev_SlidePot
Weird how these guys used exactly my terminology when they usually say "Potentiometer"
Went and looked it up, found a resource outlining that it uses the same class as the dial potentiometer.
"Hey chatgpt, I just looked it up and the slidepots actually use the same Potentiometer class as the dialpots."
scurries to fix its stupid mistake
Ideally by having a test or endpoint you can call to actually run the code you want to build.
Then you ask the system to implement the function and run the test. If it hallucinates anything it will find that and fix it.
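A minimal sketch of that loop (ask_llm is a stand-in for whatever model or API you use; none of the names here are a real vendor SDK):

    import subprocess

    def ask_llm(prompt: str) -> str:
        raise NotImplementedError("call your model of choice here")

    def implement_until_green(spec, test_cmd, max_rounds=5):
        feedback = ""
        for _ in range(max_rounds):
            code = ask_llm(f"Implement this:\n{spec}\n\nPrevious test output:\n{feedback}")
            with open("impl.py", "w") as f:
                f.write(code)
            result = subprocess.run(test_cmd, capture_output=True, text=True)
            if result.returncode == 0:
                return code                           # tests pass; hallucinated APIs would fail here
            feedback = result.stdout + result.stderr  # feed the failure back and retry
        raise RuntimeError("still failing after max_rounds")

    # usage: implement_until_green("a function add(a, b) ...", ["pytest", "test_impl.py"])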
IME OpenAI is below Claude and Gemini for code.
Statistical text (token) generation made from an unknown (to the user) training data set is not the same as a keyword/faceted search of arbitrary content acquired from web crawlers.
> The problem is that the conversational interface, for some reason, seems to turn off the natural skepticism that people have when they use a search engine.
For me, my skepticism of using a statistical text generation algorithm as if it were a search engine is because a statistical text generation algorithm is not a search engine.
I will often ask the LLM to give me web pages to look at when I want to do further reading.
As LLMs get better, I can't see myself going back to Google as it is or even as it was.
Google search includes an AI generated response.
Gemini prompts return Google search results.
Whether that's the answer, or even the best answer, is impossible to tell without doing the research you're trying to avoid.
If ChatGPT needs to, it will actually do the search for me and then collate the results.
Search engines can suck when you don't know exactly what you're looking for and the phrases you're using have invited spammers to fill up the first 10 pages.
For example, I wanted to find some texts on solving a partial differential equation numerically using 6th-order or higher finite differences, as I wanted to know how to handle boundary conditions (the interior is simple enough).
Searching only turned up the usual low-order methods that I already knew.
Asking some LLMs, I got some decent answers and could proceed.
Back in the day you could force the search engines to restrict their search scope, but they all seem so eager to return results at all cost these days, making them useless in niche topics.
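For the curious, the kind of answer I was after boils down to solving a small Taylor-expansion system for one-sided stencil weights near the boundary. A toy numpy sketch (my own, not taken from any particular text):

    import numpy as np
    from math import factorial

    def fd_weights(offsets, deriv=1):
        """Weights w so that sum(w_j * f(x + s_j*h)) / h**deriv ~ f^(deriv)(x)."""
        n = len(offsets)
        A = np.array([[s**k / factorial(k) for s in offsets] for k in range(n)])
        b = np.zeros(n)
        b[deriv] = 1.0
        return np.linalg.solve(A, b)

    # one-sided 6th-order first-derivative stencil at the left boundary (grid points 0..6)
    print(fd_weights(offsets=[0, 1, 2, 3, 4, 5, 6], deriv=1))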
Well, it's roughly the same under the hood, mathematically.
In my experience, it doesn’t matter how good or detailed the prompt is—after enough lines of code, the LLM starts making design decisions for you.
This is why I don’t accept LLM completions for anything that isn’t short enough to quickly verify that it is implemented exactly as I would have myself. Usually, that’s boilerplate code.
^ This. This is where I've landed as far as the extent of LLM coding assistants for me.
People are expecting perfection from a bad spec.
Isn’t that what engineers are (rightfully) always complaining about to BD?
I've definitely also found that the poor code can sometimes be a nice starting place. One thing I think it does for me is make me fix it up until it's actually good, instead of write the first thing that comes to mind and declare it good enough (after all my poorly written first draft is of course perfect). In contrast to the usual view of AI assisted coding, I think this style of programming for tedious tasks makes me "less productive" (I take longer) but produces better code.
Not really, not always. To anyone who’s used the latest LLMs extensively, it’s clear that this is not something you can reliably assume even with the constraints you mentioned.
They don't
> Garbage in = garbage out generally.
Generally, this statement is false
> When attention is managed and a problem is well defined and necessary materials are available to it, they can perform rather well.
Keyword: can.
They can also not perform really well despite all the management and materials.
They can also work really well with a loosey-goosey approach.
The reason is that they are non-deterministic systems whose performance is affected more by compute availability than by your unscientific random attempts at reverse engineering their behavior https://dmitriid.com/prompting-llms-is-not-engineering
No they don't, they generate a statistically plausible text response given a sequence of tokens.
God help us if companies start relying on LLMs for life-or-death stuff like insurance claim decisions.
"UnitedHealth uses AI model with 90% error rate to deny care, lawsuit alleges" Also "The use of faulty AI is not new for the health care industry."
It would actually have been more pernicious that way, since it would lull people into a false sense of security.
I like maths, I hate graphing. Tedious work even with state of the art libraries and wrappers.
LLMs do it for me. Praise be.
I see these comments all the time and they don’t reflect my experience so I’m curious what your experience has been
I also think that language matters - an Emacs function is much more esoteric than, say, JavaScript, Python, or Java. If I ever find myself looking for help with something that's not in the standard library, I like to provide extra context, such as examples from the documentation.
I've yet to find an LLM that can reliability generate mapping code between proto.Foo{ID string} to gomodel.Foo{ID string}.
It still saves me time, because even 50% accuracy is still half the code I don't have to write myself.
But it makes me feel like I'm taking crazy pills whenever I read about AI hype. I'm open to the idea that I'm prompting wrong, need a better workflow, etc. But I'm not a luddite, I've "reached up and put in the work" and am always trying to learn new tools.
This is my first comment so I'm not sure how to do this, but I made a BYO-API-key VSCode extension that uses the OpenAI realtime API so you can have interactive voice conversations with a rubber ducky. I've been meaning to create a Show HN post about it, but your comment got me excited!
In the future I want to build features to help people communicate their bugs / what strategies they've tried to fix them. If I can pull it off it would be cool if the AI ducky had a cursor that it could point and navigate to stuff as well.
Please let me know if you find it useful https://akshaytrikha.github.io/deep-learning/2025/05/23/duck...
It's as if the rubber duck was actually on the desk while you're programming, and if we have an MCP that can get live access to code, it could give you realtime advice.
I genuinely think this could be great for toys that kids grow up with i.e. the toy could adjust the way it talks depending on the kids age and remember key moments in their life - could be pretty magical for a kid
I humbly suggest a more immediate concern to rectify is identifying how to improve the work environment such that the fear one might "sound dumb to your coworkers & waste their time" does not exist.
They drive you nuts trying to communicate with them what you actually want them to do. They have a vast array of facts at immediate recall. They’ll err in their need to produce and please. They do the dumbest things sometimes. And surprise you at other times. You’ll throw vast amounts of their work away or have to fix it. They’re (relatively) cheap. So as an army of monkeys, if you keep herding them, you can get some code that actually tells a story. Mostly.
"Your job will be taken by someone who does more work faster/cheaper than you, regardless of quality" has pretty much always been true
That's why outsourcing happens too
These little side quests used to eat a lot of my time and I’m happy to have a tool that can do these almost instantly.
That's because other people are making those working well. It's like how you don't care about how the bread is being made because you trust your baker (or the regulations). It's a chain of trust that is easily broken when LLMs are brought in.
So tests may be the inspections, but what is the punitive action? Canceling the subscription?
Here's a kid out hoeing rows for corn. He sees someone planting with a tractor, and decides that's the way to go. Someone tells him, "If you get a tractor, you'll never develop the muscles that would make you really great at hoeing."
Different analogy: Here's someone trying to learn to paint. They see someone painting by numbers, and it looks a lot easier. Someone tells them, "If you paint by numbers, you'll never develop the eye that you need to really become good as a painter."
Which is the analogy that applies, and what makes it the right one?
I think the difference is how much of the job the tool can take over. The tractor can take over the job of digging the row, with far more power, far more speed, and honestly far more quality. The paint by numbers can take over the job of visualizing the painting, with some loss of quality and a total loss of creativity. (In painting, the creativity is considered a vital part; in digging corn rows, not so much.)
I think that software is more like painting, rather than row-hoeing. I think that AI (currently) is in the form of speeding things up with some loss of both quality and creativity.
Can anyone steelman this?
In this example, the idea of losing "the muscles that would make you really great at hoeing" seems like kind of a silly thing to worry about.
But I think there's a second order effect here. The kid gets a job driving the tractor instead. He spends his days seated instead of working. His lifestyle is more sedentary. He works just as many hours as before, and he makes about the same as he did before, so he doesn't really see much benefit from the increased productivity of the tractor.
However now he's gaining weight from being more sedentary, losing muscle from not moving his body, developing lower back problems from being seated all day, developing hearing loss from the noisy machinery. His quality of life is now lower, right?
Edit: Yes, there are also health problems from working hard moving dirt all day. You can overwork yourself, no question. It's hard on your body, being in the sun all day is bad for you.
I would argue it's still objectively a physically healthier lifestyle than driving a tractor for hours though.
Edit 2: my point is that I think after driving a tractor for a while, the kid would really struggle to go hoe by hand like he used to, if he ever needed to
That's true in the short term, but let's be real, tilling soil isn't likely to become a lost art. I mean, we use big machines right now but here we are talking about using a hoe.
If you remove the context of LLMs from the discussion, it reads like you're arguing that technological progress in general is bad because people would eventually struggle to live without it. I know you probably didn't intend that, but it's worth considering.
It's also sort of the point in an optimistic sense. I don't really know what it takes on a practical level to be a subsistence farmer. That's probably a good sign, all things considered. I go to the gym 6 times a week, try to eat pretty well, I'm probably better off compared to toiling in the fields.
I'm arguing that there are always tradeoffs and we often do not fully understand the tradeoffs we are making or the consequences of those tradeoffs 10, 50, 100 years down the road
When we moved from more physical jobs to desk jobs many of us became sedentary and overweight. Now we are in an "obesity crisis". There's multiple factors to that, it's not just being in desk jobs, but being sedentary is a big factor.
What tradeoffs are we making with AI that we won't fully understand until much further along this road?
Also, what is in it for me or other working class people? We take jobs that have us driving machines, we are "more productive" but do we get paid more? Do we have more free time? Do we get any benefit from this? Maybe a fraction. Most of the benefit is reaped by employers and shareholders
Maybe it would be better if instead of hoeing for 8 hours the farmhand could drive the tractor for 2 hours, make the same money and have 6 more free hours per day?
But what really happens is that the farm buys a tractor, fires 100 of the farmhand's coworkers, then has the remaining farmhand drive the tractor for 8 hours, replacing the productivity with very little benefit to himself.
Now the other farmhands are unemployed and broke, and he's still working just as much and not gaining any extra from it.
The only ones who benefit are the owners.
In a healthy competitive market (like most of the history of the US, maybe not the last 30-40 years), if all of the farms do that, it drives down the cost of the food. The reduction in labor necessary to produce the food causes competition and brings down the cost to produce the food.
That still doesn’t directly benefit the farmhands. But if it happens gradually throughout the entire economy, it creates abundance that benefits everybody. The farmhand doesn’t benefit from their own increase in productivity, but they benefit from everyone else’s.
And those unemployed farmhands likely don’t stay unemployed - maybe farms are able to expand and grow more, now that there is more labor available. Maybe they even go into food processing. It’s not obvious at the time, though.
In tech, we currently have like 6-10 mega companies, and a bunch of little ones. I think creating an environment that allows many more medium-sized companies and allowing them to compete heavily will ease away any risk of job loss. Same applies to a bunch of fields other than tech. The US companies are far too consolidated.
How do we achieve this environment?
It's not through AI, that is still the same problem. The AI companies will be the 6-10 mega companies and anyone relying on AI will still be small fry
Every time in my lifetime that we have had a huge jump in technological progress, all we've seen is that the rich get richer and the poor get poorer and the gap gets bigger
You even call this out explicitly: "most of the history of the US, maybe not the last 30-40 years"
Do we have any realistic reason to assume the trend of the last 30-40 years will change course at this point?
Sure, although I think our lives are generally better than they were a few hundred years ago. Besides, if you care about your health you can always take steps yourself.
> The only one who benefits are the owners
Well yeah, the entity that benefits is the farm, and whoever owns whatever portions of the farm. The point of the farm isn't to give its workers jobs. It's to produce something to sell.
As long as we're in a market where we're selling our labor, we're only given money for being productive. If technology makes us redundant, then we find new jobs. Same as it ever was.
Think about it: why should hundreds of manual farmhands stay employed while they can be replaced by a single machine? That's not an efficient economy or society. Let those people re-skill and be useful in other roles.
Except, of course, it's not the same as it ever was because you do actually run out of jobs. And it's significantly sooner than you think, because people have limits.
I can't be Einstein, you can't be Einstein. If that becomes the standard, you and I will both starve.
We've been pushing people up and up the chain of complexity, and we can do that because we got all the low-hanging fruit. It's easy to get someone to read, then to write, then to do basic math, then to do programming. It gets a bit harder though with every step, no? Not everyone who reads has the capability of doing basic math, and not everyone who can do basic math has the capability of being a programmer.
So at each step, we lose a little bit of people. Those people don't go anywhere, we just toss them aside as a society and force them into a life of poverty. You and I are detached from that, because we've been lucky to not be those people. I know some of those people, and that's just life for them.
My parents got high paying jobs straight out of highschool. Now, highschool grads are destined to flip burgers. We've pushed people up - but not everyone can graduate college. Then, we have to think about what happens when we continue to push people up.
Eventually, you and I will not be able to keep up. You're smart, I'm smart, but not that smart. We will become the burger flippers or whatever futuristic equivalent. Uh... robot flippers.
Prompt engineers
You are spot on with your analysis. At some point there will be nothing left for people to do except at the very top level. What happens then?
I am not optimistic enough to believe that we create a utopia for everyone. We would need to solve scarcity first, at minimum.
I'm a bit confused by your read on the people who don't make it through college. The implication is that if you don't make it into a high status/white collar job, you're destined for a life of poverty. I feel like this speaks more to the insecurity of the white collar worker, and isn't actually a good reflection of reality. Most of my friends dropped out of college and did something completely different in the service industry, it's not really a "life of poverty."
> My parents got high paying jobs straight out of highschool. Now, highschool grads are destined to flip burgers.
This feels like pure luck for your parents. Take a wider look at history -- it's just a regression to the mean. We used to have _less_ complex jobs. Mathematics/science hasn't always been a job. That is to say, burger-flipping or an equivalent was more common. It was not the norm that households were held together by a single man's income, etc.
I think it is about how utilitarian the output is. For food no one cares how the sausage is made. For a painting the story behind it is more important than the picture itself. All of Picasso's paintings are famous because they were painted by Picasso. Picasso style painting by Bill? Suddenly it isn't museum worthy anymore.
No one cares about the story or people behind Word, they just want to edit documents. The Demo scene probably has a good shot at being on the side of art.
What an awful imagination. Yes there are people who don't like CSS but are forced to use it by their job so they don't learn it properly, and that's why they think CSS is rote memorization.
But overall I agree with you that if a company is too cheap to hire a person who is actually skilled at CSS, it is still better to foist that CSS job onto LLMs than onto an unwilling human. Because that unwilling human is not going to learn CSS well and won't enjoy writing CSS.
On the other hand, if the company is willing to hire someone who's actually good, LLMs can't compare. It's basically the old argument of LLMs only being able to replace less good developers. In this case, you admitted that you are not good at CSS and LLMs are better than you at CSS. It's not task-dependent, it's skill-dependent.
Also, there are oftentimes multiple ways to achieve a certain style, and they all work fine until you want a particular tweak, in which case only one will work and the LLM usually gets stuck on one of the ones that does not.
Telling, isn't it?
This is probably really just a way of saying, it's better at simple tasks rather than complex ones. I can eventually get Copilot to write SQL that's complex and accurate, but I don't find it faster or more effective than writing it myself.
Actually I think it's perfectly adequate at SQL too.
It’s a tough bar if LLMs have to be post antirez level intelligence :)
99% of professional software developers don’t understand what he said much less can come up with it (or evaluate it like Gemini).
This feels a bit like a humblebrag about how well he can discuss with an LLM compared to others vibecoding.
The whole thing seems like a pretty good example of collaboration between human and LLM tools.
We're being told that llms are now reasoning, which implies they can make logical leaps and employ creativity to solve problems.
The hype cycle is real and sets expectations that get higher the less you know about how they work.
In fact, maybe most of us have been replaced by LLMs already :-)
Whenever I try some claim, it does not work. Yes, I know, o3 != CoPilot but I don't have $120 and 100 prompts to spend on making a point.
I imagine on HN, the expectations we're talking about are from fellow software developers who at least have a general idea of how LLMs work and their limitations.
> you will almost certainly be replaced by an llm in the next few years
So... Maybe not. I agree that Hacker News does have a generally higher quality of contributors than many places on the internet, but it absolutely is not a universal for HNers. There are still quite a few posters here that have really bought into the hype for whatever reason
"I need others to buy into LLMs in order for my buy-in to make sense," i.e. network effects.[1]
> Most dot-com companies incurred net operating losses as they spent heavily on advertising and promotions to harness network effects to build market share or mind share as fast as possible, using the mottos "get big fast" and "get large or get lost". These companies offered their services or products for free or at a discount with the expectation that they could build enough brand awareness to charge profitable rates for their services in the future.
You don't have to go very far up in terms of higher order thinking to understand what's going on here. For example, think about Satya's motivations for disclosing Microsoft writing 30% of their code using LLMs. If this really was the case, wouldn't Microsoft prefer to keep this competitive advantage secret? No: Microsoft and all the LLM players need to drive hype, and thus mind share, in the hope that they become profitable at some point.
If "please" and "thank you" are incurring huge costs[2], how much is that LLM subscription actually going to cost consumers when the angel investors come knocking, and are consumers going to be willing to pay that?
I think a more valuable skill might be learning how to make do with local LLMs because who knows how many of these competitors will still be around in a few years.
[1]: https://en.wikipedia.org/wiki/Dot-com_bubble#Spending_tenden... [2]: https://futurism.com/altman-please-thanks-chatgpt
I actually like LLMs better for creative thinking because they work like a very powerful search engine that can combine unrelated results and pull in adjacent material I would never personally think of.
To be fair, I also have problems following this.
Chess programs of course have a well defined algorithm. "AI" would be incapable of even writing /bin/true without having seen it before.
It certainly wouldn't have been able to write Redis.
> Chess programs of course have a well defined algorithm.
Ironically, that also "hasn't been true for a long time". The best chess engines humans have written with "defined algorithms" were bested by RL (alphazero) engines a long time ago. The best of the best are now NNUE + algos (latest stockfish). And even then NN based engines (Leela0) can occasionally take some games from Stockfish. NNs are scarily good. And the bitter lesson is bitter for a reason.
Stockfish NNUE was announced to be 80 ELO higher than the default. I don't find it frustrating. NNs excel at detecting patterns in a well defined search space.
Writing evaluation functions is tedious. It isn't a sign of NN intelligence.
The other, related question is, are human coders with an LLM better than human coders without an LLM, and by how much?
(habnds made the same point, just before I did.)
Source: https://www.thoughtworks.com/insights/blog/generative-ai/exp...
One thing I know is that I wouldn't ask an LLM to write an entire section of code or even a function without going in and reviewing.
These days I am working on a startup doing [a bit of] everything, and I don't like the UI it creates. It's useful enough when I make the building blocks and let it be, but allowing claude to write big sections ends up with lots of reworks until I get what I am looking for.
Indeed, it is likely already the case that in training the top scraped links or most popular videos are weighted higher; these are likely to be better than average.
And what really matters is whether the task gets reliably solved.
So if they actually could manage this on average, with average quality... that would be a next-level game changer.
If you're getting average results you most likely haven't given it enough details about what you're looking for.
The same largely applies to hallucinations. In my experience LLMs hallucinate significantly more when at or pushed to exceed the limits of their context.
So if you're looking to get a specific output, your success rate is largely determined by how specific and comprehensive the context the LLM has access to is.
AI is neat for average people, to produce average code, for average companies.
In a competitive world, using AI is a death sentence.
Very few people are doing truly cutting edge stuff - we call them visionaries. But most of the time, we're just merely doing what's expected
And yes, that includes this comment. This wasn't creative or an original thought at all. I'm sure hundreds of people have had similar thoughts, and I'm probably parroting someone else's idea here. So if I can do it, why can't an LLM?
But generally speaking I don't experience programming like that most of the time. There are so many things going on that have nothing to do with pattern matching while coding.
I load up a working model of the running code in my head and explore what it should be doing in a more abstract/intangible way and then I translate those thoughts to code. In some cases I see the code in my inner eye, in others I have to focus quite a lot or even move around or talk.
My mind goes to different places and experiences. Sometimes it's making new connections, sometimes it's processing a bit longer to get a clearer picture, sometimes it re-shuffles priorities. A radical context switch may happen at any time and I delete a lot of code because I found a much simpler solution.
I think that's a qualitative, insurmountable difference between an LLM and an actual programmer. The programmer thinks deeply about the running program and not just the text that needs to be written.
There might be different types of "thinking" that we can put into a computer in order to automate these kinds of tasks reliably and efficiently. But just pattern matching isn't it.
And by better, I don’t mean in terms of code quality because ultimately that doesn’t matter for shipping code/products, as long as it works.
What does matter is speed. And an LLM speeds me up at least 10x.
You expect to achieve more than a decade of pre-LLM accomplishments between now and June 2026?
There will always be a place for really good devs but for average people (most of us are average) I think there will be less and less of a place.
One major aspect of software engineering is social: requirements analysis and figuring out what the customer actually wants; often they don't even know themselves.
If a human engineer struggles to figure out what a customer wants and a customer struggles to specify it, how can an LLM be expected to?
Probably going to have the same outcome.
Setting up a system to make decisions autonomous is technically easy. Ensuring that it makes the right decisions, though, is a far harder task.
I actually imagine it's the opposite of what you say here. I think technically inclined "IT business partners" will be capable of creating applications entirely without software engineers... Because I see that happen every day in the world of green energy. The issues come later, when things have to be maintained, scale or become efficient. This is where the software engineering comes in, because it actually matters if you used a list or a generator in your Python app when it iterates over millions of items and not just a few hundred.
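For anyone who hasn't hit this in practice: a minimal Python sketch of that list-vs-generator difference (the numbers are illustrative, not from any real project).

    import sys

    # Eager: builds all one million items in memory before anything else runs.
    rows_list = [i * i for i in range(1_000_000)]

    # Lazy: yields one item at a time; memory use stays flat regardless of size.
    rows_gen = (i * i for i in range(1_000_000))

    print(sys.getsizeof(rows_list))  # on the order of 8 MB for the list object alone
    print(sys.getsizeof(rows_gen))   # a couple hundred bytes for the generator

    # Both can be consumed the same way, e.g.:
    total = sum(rows_gen)

At a few hundred items nobody notices; at millions of items the eager version is what takes the service down.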
It does need to be reliable, though. LLMs have proven very bad at that
That was the way I saw it for a while. In recent months I've begun to wonder if I need to reevaluate that, because it's become clear to me that scaling doesn't actually start from zero. By zero I mean that I was naive enough to think that all programs, even the ones Googled together by a completely new junior, would at least have some efficiency... but some of these LLM services I get to work on today are so bad they didn't start at zero but at some negative number. It would have been less of an issue if our non-developer-developers didn't use Python (or at least used Python with ruff/pyrefly/whatever you like), but some of the things they write can't even scale to do minimal BI reporting.
Software engineering is a different thing, and I agree you're right about that (for now at least), but don't underestimate the sheer number of brainless coders out there.
I would argue it’s a good thing to replace the actual brainless activities.
Chat UIs are an excellent customer feedback loop. Agents develop new iterations very quickly.
LLMs can absolutely handle abstractions and different kinds of component systems and overall architecture design.
They can also handle requirements analysis. But it comes back to iteration for the bottom line which means fast turnaround time for changes.
The robustness and IQ of the models continue to be improved. All of software engineering is well underway of being automated.
Probably five years max where un-augmented humans are still generally relevant for most work. You are going to need deep integration of AI into your own cognition somehow in order to avoid just being a bottleneck.
Presumably, they're trained on a ton of requirements docs, as well as a huge number of customer support conversations. I'd expect them to do this at least as well as coding, and probably better.
It really depends on the organization. In many places product owners and product managers do this nowadays.
Think about it and tell me you use it differently.
1) Starting simple codebases 2) Googling syntax 3) Writing bash scripts that utilize Unix commands whose arguments I have never bothered to learn in the first place.
I definitely find time savings with these, but the esoteric knowledge required to work on a 10+ year old codebase is simply too much for LLMs still, and the code alone doesn't provide enough context to do anything meaningful, or even faster than I would be able to do myself.
In the long term (post AGI), the only safe white-collar jobs would be those built on data which is not public i.e. extremely proprietary (e.g. Defense, Finance) and even those will rely heavily on customized AIs.
Now we have Geoffrey Hinton getting the prize for contributing to one of the most destructive inventions ever.
Making our work more efficient, or even making humans redundant, should be really exciting. It's not set in stone that we need to leave middle-aged people with families suddenly unable to earn enough to provide a good life.
Hopefully if it happens, it happens to such a huge number of people that it forces a change.
We did. Why do you think labor laws, unions, etc. exist? Why do you think communism was appealing as an idea in the beginning to many? Whether the effects were good or bad or enough or not, that's a different question. But these changes demonstrably had grave consequences.
Isn't every little script, every little automation we programmers write in the same spirit? "I don't like doing this, so I'm going to automate it, so that I can focus on other work."
Sure, we're racing towards replacing ourselves, but there would be (and will be) other more interesting work for us to do when we're free to do that. Perhaps, all of us will finally have time to learn surfing, or garden, or something. Some might still write code themselves by hand, just like how some folks like making bread .. but making bread by hand is not how you feed a civilization - even if hundreds of bakers were put out of business.
Unless you have a mortgage.. or rent.. or need to eat
Where do you get this? The limitations of LLMs are becoming more clear by the day. Improvements are slowing down. Major improvements come from integrations, not major model improvements.
AGI likely can't be achieved with LLMs. That wasn't as clear a couple years ago.
Are there plenty of gaps left between here and most definitions of AGI? Absolutely. Nevertheless, how can you be sure that those gaps will remain given how many faculties these models have already been able to excel at (translation, maths, writing, code, chess, algorithm design etc.)?
It seems to me like we're down to a relatively sparse list of tasks and skills where the models aren't getting enough training data, or are missing tools and sub-components required to excel. Beyond that, it's just a matter of iterative improvement until 80th percentile coder becomes 99th percentile coder becomes superhuman coder, and ditto for maths, persuasion and everything else.
Maybe we hit some hard roadblocks, but room for those challenges to be hiding seems to be dwindling day by day.
Poker tests intelligence. So what gives? One interesting thing is that for whatever reason, poker performance isn't used as a benchmark in the LLM showdown between big tech companies.
The models have definitely improved in the past few years. I'm skeptical that there's been a "break-through", and I'm growing more skeptical of the exponential growth theory. It looks to me like the big tech companies are just throwing huge compute and engineering budgets at the existing transformer tech, to improve benchmarks one by one.
I'm sure that if Google allocated 10 engineers and a dozen million dollars to improving Gemini's poker performance, it would increase. The idea behind AGI and the exponential growth hypothesis is that you don't have to do that, because the AI gets smarter in a general sense all on its own.
> improve benchmarks one by one
If you're right about that in the strong sense — that each task needs to be optimised in total isolation — then it would be a longer, slower road to a really powerful humanlike system.
What I think is really happening, though, is that each specific task (eg. coding) is having large spillover effects on other areas (eg. helping the models get better at extended verbal reasoning even when not writing any code). The AI labs can't do everything at once, so they're focusing where:
- It's easy to generate more data and measure results (coding, maths etc.)
- There's a relative lack of good data in the existing training corpus (eg. good agentic reasoning logic - the kinds of internal monologues that humans rarely write down)
- It would be immediately useful for the models to get better in a targeted way (eg. agentic tool use; developing great hypothesis-generation instincts in scientific fields like algorithm design, drug discovery and ML research)
By the time those tasks are optimised, I suspect the spill over effects will be substantial and the models will generally be much more capable.
Beyond that, the labs are all pretty open about the fact that they want to use the resulting AI talents for coding, reasoning and research skills to accelerate their own research. If that works (definitely not obvious yet) then finding ways to train a much broader array of skills could be much faster because that process itself would be increasingly automated.
I think they are hoping that their future is safe. And it is the average minds that will have to go first. There may be some truth to it.
Also, many of these smartest minds are motivated by money, to safeguard their future, from a certain doom that they know might be coming. And AI is a good place to be if you want to accumulate wealth fast.
Was on r/fpga recently and mentioned that I'd had a lot of success getting LLMs to code up first-cut testbenches that let you simulate your FPGA/HDL design a lot quicker than if you were to write those testbenches yourself, and my comment was met with lots of derision. But they hadn't even given it a try before concluding that it just couldn't work.
"AI" is the latest iteration of snake oil that is foisted upon us by management. The problem is not "AI" per se, but the amount of friction and productivity loss that comes with it.
Most of the productivity loss comes from being forced to engage with it and push back against that nonsense. One has to learn the hype language, debunk it, etc.
Why do you think IT has gotten better? Amazon had a better and faster website with far better search and products 20 years ago. No amount of "AI" will fix that.
Depends on the context. You have to keep in mind: it is not a goal of our society or economic system to provide you with a stable, rewarding job. In fact, the incentives are to take that away from you ASAP.
Before software engineers go celebrate this tech, they need to realize they're going to end up like rust-belt factory workers the day after the plant closed. They're not special, and society won't be any kinder to them.
> ...and even wiser to be in charge of and operate the replacement.
You'll likely only get to do that if your boss doesn't know about it.
We seem to agree, as this is more or less exactly my point. Striving to keep the status quo is a futile path. Eventually things change. Be ready. The best work (and maybe even life) advice I've ever gotten is to always have alternatives. If you don't have alternatives, you literally have no choice.
Those alternatives are going to be worse for you, because if they weren't, why didn't you switch already? And if a flood of your peers are pursuing alternatives at the same time, you'll probably experience an even poorer outcome than you expected (e.g. everyone getting laid off and trying to make ends meet driving for Uber at the same time). Then, AI is really properly understood as a "fuck the white-collar middle-class" tech, and it's probably going to fuck up your backup plans at about the same time as it fucks up your status quo.
You're also describing a highly individualistic strategy, for someone acting on his own. At this point, the correct strategy is probably collective action, which can at least delay and control the change to something more manageable. But software engineers have been too "special snowflake" about themselves to have laid the groundwork for that, and are acutely vulnerable.
I do concur it is an individualistic strategy, and as you mentioned unionization might have helped. But, then again it might not. Developers are partially unionized where I live, and I'm not so sure it's going to help. It might absorb some of the impact. Let's see in a couple of years.
People have families to feed and lifestyles to maintain, anything that's not equivalent will introduce hardship. And "different" most likely means worse, when it comes to compensation. Even a successful career change usually means restarting at the bottom of the ladder.
And what's that "something else," exactly? You need to consider that may be disrupted at the same time you're planning on seeking it, or fierce competition from your peers makes it unobtainable to you.
Assuming there are alternatives waiting for you when you'll need them is its own kind of complacency.
> It might be selling all your stuff and live on an island in the sun for all I know.
Yeah, people with the "fuck-you" money to do that will probably be fine. Most people don't have that, though.
Hardship or not, restarting from the bottom of the ladder or not, betting on the status quo is a losing game at the moment. Software development is being disrupted; I would expect developers to produce 2-4x more now than two years ago. However, that is the pure dev work. The architecture, engineering, requirements, specification etc. parts will likely see another trajectory, much of it due to the rise of automation in dev and in other parts of the company. The flip side is that the rise of non-dev automation is coming, with the possibility of automating other tasks, in turn making engineers (maybe not devs, though) vital to the company's process changes.
Another, semi-related, thought: software development has automated away millions of jobs, and it's just developers' turn to be on the other end of the stick.
Don’t let cynics rule your country. Go vote. There’s no rule that things have to stay awful.
Do you want to be a jobless weaver, or an engineer building mechanical looms for a higher pay than the weaver got?
You could even go back to punch cards if you want to. Literally nobody forcing you to not use it for your own fun.
But LLMs are a multiplier in many mundane tasks (I'd say about 80+% of software development for businesses), so not using them is like fighting against using a computer because you like writing by hand.
Happy to hate myself but earn OK money for OK hours.
Tools and systems which increase productivity famously always put everyone out of a job, which is why after a couple centuries of industrial revolution we're all unemployed.
An LLM is more like outsourcing to a consultancy. Results may vary.
I hate AI code assistants, not because they suck, but because they work. The writing is on the wall.
If we aren't working on our own replacements, we'll be the ones replaced by somebody else's vibe code, and we have no labor unions that could plausibly fight back against this.
So become a Vibe Coder and keep working, or take the "prudent" approach you mention - and become unemployed.
This "vibe coding" seems just another way to say that people spend more time refining the output of these tools over and over again that what they would normally code.
But there's going to be an inflection point - soon - as things continue to improve. The industry is going to change rapidly.
Now is the time to either get ready for that - by being ahead of the curve, at least by being familiar with the tooling - or switch careers and cede your job to somebody who will play ball.
I don't like any of this, but I see it as inevitable.
Clearly they do work in a general sense. People who don't want to code are making things that work this way right now!
This isn't yet replacing me, but I'm certain it will relatively soon be the standard for how software is developed.
Personally I’m thrilled that I can get trivial, one-off programs developed for a few cents and the cost of a clear written description of the problem. Engaging internal developers or consulting developers to do anything at all is a horrible experience. I would waste weeks on politics, get no guarantees, and waste thousands of dollars and still hear nonsense like, “you want a form input added to a web page? Aw shucks, that’s going to take at least another month” or “we expect to spend a few days a month maintaining a completely static code base” from some clown billing me $200/hr.
I've seen Claude and ChatGPT happily hallucinate whole APIs for D3 on multiple occasions, which should be really well represented in the training sets.
Recently I converted all the (Google Docs) documentation of a project to markdown files and added those to the workspace. It now indexes it with RAG and can easily find relevant bits of documentation, especially in agent mode.
It really stresses the importance of getting your documentation and processes in order as well as making sure the tasks at hand are well-specified. It soon might be the main thing that requires human input or action.
In fact, I built an entirely headless coding agent for that reason: you put tasks in, you get PRs out, and you get journals of each run for debugging but it discourages micro-management so you stay in planning/documenting/architecting.
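For what it's worth, the retrieval half of that documentation-indexing setup doesn't have to be fancy. Here is a toy Python sketch using plain keyword overlap instead of real embeddings; the directory layout and scoring are made up for illustration.

    from pathlib import Path

    def load_chunks(docs_dir: str, chunk_lines: int = 20) -> list[tuple[str, str]]:
        """Split every markdown file into small chunks tagged with their source file."""
        chunks = []
        for path in Path(docs_dir).glob("**/*.md"):
            lines = path.read_text(encoding="utf-8").splitlines()
            for i in range(0, len(lines), chunk_lines):
                chunks.append((str(path), "\n".join(lines[i:i + chunk_lines])))
        return chunks

    def top_chunks(query: str, chunks: list[tuple[str, str]], k: int = 3):
        """Rank chunks by shared words with the query; real setups use embeddings."""
        words = set(query.lower().split())
        scored = sorted(chunks, key=lambda c: len(words & set(c[1].lower().split())), reverse=True)
        return scored[:k]

    # Usage: paste the returned chunks into the prompt ahead of the actual task,
    # e.g. context = top_chunks("how do we rotate API keys?", load_chunks("./docs"))

The tooling handles this for you, but seeing it spelled out makes it obvious why well-structured, well-specified docs pay off: the model only gets whatever chunks the retrieval step can find.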
With many existing systems, you can pull documentation into context pretty quickly to prevent the hallucination of APIs. In the near future it's obvious how that could be done automatically. I put my engine on the ground, ran it and it didn't even go anywhere; Ford will never beat horses.
Which means it's back to being a very useful tool, but not the earth-shattering disruptor we hoped (or worried) it would be.
Yet?
Fun to consider but that much uncertainty isn't worth much.
o3 came out just one month ago. Have you been using it? Subjectively, the gap between o3 and everything before it feels like the biggest gap I've seen since ChatGPT originally came out.
Using it to prototype some low level controllers today, as a matter of fact!
Any skilled developer with a decade of experience can write prompts that return back precisely what we wanted almost every single time. I do it all day long. "Claude 4" rarely messes up.
I have a hard time imagining an LLM being able to do arbitrary things. It always feels like LLMs can do lots of the easy stuff, but if they can't do everything you need the skilled engineer anyway, who'd knock the easy things out in a week anyway.
LLMs for coding are not even close to perfect yet, but the saturation curves are not flattening out; not by a long shot. We are living in a moment, and we need to come to terms with it as the work continues to develop; we need to adapt, and quickly, in order to understand what our place will become as this nascent tech continues its meteoric trajectory toward an entirely new world.
Instead, we have a tiny handful of one-off events that were laboriously tuned and tweaked and massaged over extended periods of time, and a flood of slop in the form of broken patches, bloated and misleading issues, and nonsense bug bounty attempts.
Then the people who congratulate the AI for helping get yelled at by the other category.
We'd still have more than tortured, isolated one-offs. We should have at least one well-known codebase maintained through the power of Silicon Valley's top silicon-based minds.
If the future doesn't turn out to be revolutionary, you'll have done some "unnecessary" work at worst, but might've acquired some skills or value at least. In the case of most well-off programmers, I suspect buying assets/investments that can afford them at least a reasonable lifestyle is likely too.
So the default position of being stationary, and assuming the world continues the way it has been, is not such a good idea. One should always assume the worst possible outcome, and plan for that.
Maybe if you work e-commerce or in the military.
But how do you even translate this line of thought for today?
Are your EMP defenses up to speed?
Are you studying Russian and Chinese while selling kidneys in order to afford your retirement home on Mars?
My point being, you can never plan for every worst outcome. In reality you would have a secondary data center, backups and a working recovery routine.
None of which matters if you use autocomplete or not.
They can’t write me a safety-critical video player meeting the spec with full test coverage using a proprietary signal that my customer would accept.
If you always say that every new fad is just hype, then you'll even be right 99.9% of the time. But if you want to be more valuable than a rock (https://www.astralcodexten.com/p/heuristics-that-almost-alwa...), then you need to dig into the object-level facts and form an opinion.
In my opinion, AI has a much higher likelihood of changing everything very quickly than crypto or similar technologies ever did.
If you want to convince skeptics talk about examples, vibe code a successful business, show off your success with using AI. Telling people it's the future and if you disagree you have your head in the sand, is wholly unconvincing.
There's no guarantee a technology will take off, even if it's really, really good. Because we don't decide if that tech takes off - the lawyers do. And they might not care, or they might decide billing more hours is better, actually.
History majors everywhere are weeping.
The guiding principle of biglaw.
Attorneys have the bar to protect them from technology they don’t want. They’ve done it many times before, and they’ll do it again. They are starting to entertain LLMs, but not in a way that would affect their billable hours.
Look, we see the forest. We are just not impressed by it.
Having unlimited chaos monkeys at will is not revolutionizing anything.
Remember how blockchain was going to change the world? Web3? IoT? Etc etc.
I've been through enough of these cycles to understand that, while the AI gimmick is cool and all, we're probably at the local maximum. The reliability won't improve much from here (hallucinations etc), while the costs to run it will stay high. The final tombstone will be when the AI companies stop running at a loss and actually charge for the massive costs associated with running these models.
Personally, I'm more interested in the political angle. I can see that AI will be disruptive because there's a ton of money and possibly other political outcomes depending on it doing exactly that.
Have you tried talking to ChatGPT voice mode? It's mind blowing. You just have a conversation with it. In any language. About anything. The other day I wanted to know about the difference between cast iron and wrought iron, and it turned into a 10 or 15 minute conversation. That's maybe a good example of an "easy" topic for LLMs (lots of textbooks for it to memorize), but the world is full of easy topics that I know nothing about!
> but each succeeding iteration seems to be more disappointing
This is because the scaling hypothesis (more data and more compute = gains) is plateauing: all the text data has been used, and compute is reaching diminishing returns for reasons I'm not smart enough to explain, but it is.
So now we're seeing incremental core model advancements, variations and tuning in pre- and post training stages and a ton of applications (agents).
This is good imo. But obviously it's not good for delusional valuations based on exponential growth.
What I mean is that you are right, assuming we use a transformation that, while still reversible, has an avalanche effect. Btw, in practical terms I doubt there is much difference.
A good hash function intentionally won't hit that level, but it should be close enough not to matter with 64 bit pointers. 32 bits is small enough that I'd have concerns at scale.
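Rough numbers behind that concern, assuming a hash that spreads values uniformly (a quick birthday-bound estimate in Python, nothing more):

    import math

    def collision_probability(n_items: int, bits: int) -> float:
        """Approximate probability of at least one collision among n_items
        values hashed uniformly into a 2**bits space (birthday bound)."""
        space = 2.0 ** bits
        return 1.0 - math.exp(-n_items * (n_items - 1) / (2.0 * space))

    for bits in (32, 64):
        for n in (10_000, 1_000_000, 100_000_000):
            print(f"{bits}-bit, {n:>11,} items: p ~ {collision_probability(n, bits):.2e}")

At a million items a 32-bit hash is essentially guaranteed a collision, while 64 bits keeps the probability negligible even at a hundred million items, which matches the "fine with 64-bit pointers, concerning at scale with 32" intuition.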
There is a principle (I forget where I encountered it) that it is not code itself that is valuable, but the knowledge of a specific domain that an engineering team develops as they tackle a project. So code itself is a liability, but the domain knowledge is what is valuable. This makes sense to me and matched my long experience with software projects.
So, if we are entrusting coding to LLMs, how will that value develop? And if we want to use LLMs but at the same time develop the domain acumen, that means we would have to architect things and hand them over to LLMs to implement, thoroughly check what they produce, and generally guide them carefully. In that case they are not saving much time.
Companies that try to replace their employees with LLMs and AIs will fail.
Unfortunately, all that's in the long run. In the near term, some CEOs and management teams will profit from the short term valuations as they squander their companies' future growth on short-sighted staff cuts.
the people that are good at using these tools now will be better at it later too. you might have closed the gap quite a bit but you will still be behind
using LLMs as they are now requires a certain type of mindset that takes practice to maintain and sharpen. It's just like a competitive game. The more intentionally you do it, the better you get. And the meta changes every 6 months to a year.
That's why I scroll and laugh through all the comments on this thread dismissing it, because I know that the people dismissing it are the problem.
the interface is a chatbox with no instructions or guardrails. the fact that folks think that their experience is universal is hilarious. so much of using LLM right now is context management.
I can't take most of yall in this thread seriously
> so much of using LLM right now is context management
That is because the tooling is incredibly immature. Even if raw LLM capabilities end up plateauing, new and more effective tools are going to proliferate. You won't have to obsess over managing context, just like we don't have to do 2023-level tricks like "you are an expert" or "please explain your thought process" anymore. All of the context management tricks will be obsolete very soon... because AI tooling companies are extremely incentivized to solve it.
I find it implausible that the tech is in a state where full-time prompters are gaining a durable advantage over everyone else. J2ME devs probably thought they were building a snowballing advantage over devs who dismissed mobile development. Then the iPhone came out and totally reset the playing field.
[1] Most employers don't distinguish between three months and nine months of experience with JS framework du jour, no matter what it says on the job listing
Edited to add: Claude Code brought the agentic coding trend to the mainstream. It came out three months ago. You talk about how much you're laughing at the naivete of people here, but are you telling me with a straight face that three months is enough to put a talented engineer "behind"? At risk of being unemployable? The engineers who spent the last three months ping-ponging between Claude Code, Cursor, Codex, etc. can have their experience distilled into like a week of explaining to a newcomer, and I predict that will be true six months from now, or a year from now.
No, the top players when the meta changes in competitive games remain the top players. They also figure out the new meta faster than the casual players.
but i'll say it again, when the meta changes the people that were at the top will quickly find themselves at the top again.
listen, the reason why they were in the top in the first place and you aren't is a mindset thing. the top are the curious that are experimenting and refining, sharing with each other techniques developed over time.
the complacent just sit around and let the world happen to them. they, like you're expressing now, think that when the meta switches the bottom will suddenly find themselves at the top and the top will have nothing.
look around you, that's obviously not how the world works.
but yes, laughing
I think parent is agreeing with you?
> This is why devs who started with J2ME are the holy grail of app developers, since they started making apps years before iPhone devs
The iPhone was an equalizer. Existing mobile devs did get a genuine head start on mobile app design, but their advantage was fleeting.
I do use these tools though! I spent some time with AI. I have coworkers who are more heads-down working on their projects and not tinkering with agents, and they're doing fine. I have coworkers who are on the absolute bleeding edge of AI tools, and they're doing fine. When the tooling matures and the churn lessens and the temperature of the discourse is lowered, I'm confident that we will all be doing great things. I just think that the "anybody not using and optimizing Codex or Claude Code today is not gonna make it" attitude is misguided. I could probably wring out some more utility from these tools if I spent more time with them, but I'd rather spend most of my professional development time working on subject matter expertise. I want to deeply understand my domain, and I trust that AI use will (mostly) become relatively easier to pick up and less of a differentiator as time goes on.
That's actually really interesting to think about. The idea that doing something counter-productive like trying to replace employees with AI (which will cause problems), may actually benefit the company in terms of valuations in the short run. So in effect, they're hurting and helping the company at the same time.
This is especially prevalent in waterfall orgs that refuse change. Body shops are more than happy to waste a huge portion of their billable hours on planning meetings and roadmap revisions as the obviousness of the mythical man month comes to bear on the org.
Corners get cut to meet deadlines, because the people who started/perpetuated whatever myth need to save their skins (and hopefully continue to get bonuses.)
The engineers become a scapegoat for the org's management problems (And watch, it very likely will happen at some shops with the 'AI push'). In the nasty cases, the org actively disempowers engineers in the process[0][1].
[0] - At one shop, there was grief we got that we hadn't shipped a feature, but the only reason we hadn't, was IT was not allowed to decide between a set of radio buttons or a drop-down on a screen. Hell I got yelled at for just making the change locally and sending screenshots.
[1] - At more than one shop, FTE devs were responsible for providing support for code written by offshore that they were never even given the opportunity to review. And hell yes myself and others pushed for change, but it's never been a simple change. It almost always is 'GLWT'->'You get to review the final delivery but get 2 days'->'You get to review the set of changes'->'Ok you can review their sprint'->'OK just start reviewing every PR'.
“The market can remain irrational longer than you can remain solvent.” — attributed to John Maynard Keynes
We have never been good at confronting the follies of management. The Leetcode interview process is idiotic but we go along with it. Ironically LC was one of the first victims of AI, but this is even more of an issue for management that thinks SWEs solve Leetcode problems all day.
Ultimately I believe this is something that will take a cycle for business to figure out by failing. When businesses will figure out that 10 good engineers + AI always beats 5 + AI, it will become table stakes rather than something that replaces people.
Your competitor who didn’t just fire a ton of SWEs? Turns out they can pay for Cursor subscriptions too, and now they are moving faster than you.
While technically capable of building it on my own, development is not my day job and there are enough dumb parts of the problem my p(success) hand-writing it would have been abysmal.
With rose-tinted glasses on, maybe LLM's exponentially expand the amount of software written and the net societal benefit of technology.
Doing the actual thinking is generally not the part I need too much help with. Though it can replace googling info in domains I'm less familiar with. The thing is, I don't trust the results as much and end up needing to verify it anyways. If anything AI has made this harder, since I feel searching the web for authoritative, expert information has become harder as of late.
LLMs are faster, and when the task can be synthetically tested for correctness, and you can build up to it heuristically, humans can't compete. I can't spit out a full game in 5 minutes, can you?
LLMs are also cheaper.
LLMs are also obedient and don't get sick, and don't sleep.
Humans are still better by other criteria. But none of this matters. All disruptions start from the low end, and climb from there. The climbing is rapid and unstoppable.
Unless you’re a web dev. Then you're right and will be replaced soon enough. Guess why.
I built Brokk to maximize the ability of humans to effectively supervise their AI minions. Not a VS code plugin, we need something new. https://brokk.ai
It's about Humans vs. Humans+AI
and 4 times out of 5, Humans+AI > Humans.
Maybe LLMs completely trivialize all coding. The potential for this is there.
Maybe progress slows to a snail's pace, the VC money runs out, and companies massively raise prices, making them not worth using.
No one knows. Just sit back and enjoy the ride. Maybe save some money just in case.
The problem is that the software world got eaten up by the business world many years ago. I'm not sure at what point exactly, or if the writing was already on the wall when Bill Gates' wrote his open letter to hobbyists in 1976.
The question is whether shareholders and managers will accept worse code. I don't see how it would be logical to expect anything else; as long as profit lines go up, why would they care?
Short of some sort of cultural pushback from developers or users, we're cooked, as the youth say.
Bad code leads to bad business
This makes me think of hosting departments; you know, the people who are using VMware, physical firewalls, DPI proxies and whatnot.
On the other end, you have public cloud providers, which are using QEMU, netfilter, dumb networking devices and such.
Who got eaten by whom, nobody could have guessed...
Bad business leads to bad business.
Bad code might be bad, or might be sufficient. It's situational. And looking at what exists today, the majority of code is pretty bad already - and not every business with bad code turns into a bad business.
In fact, some bad code is very profitable for some businesses (ask any SAP integrator).
That eludes all of those who died in the process: those still alive are here despite bad IT, not because of it.
Corporations create great code too: they're not all badly run.
The problem isn't a code quality issue: it is a moral issue of whether you agree with the goals of capitalist businesses.
Many people have to balance the needs of their wallet with their desire for beautiful software (I'm a developer-founder I love engineering and open source community but I'm also capitalist enough to want to live comfortably).
Could most software be more awesome? Yes. Objectively, yes. Is most software garbage? Perhaps by raw volume of software titles, but are most popular applications I’ve actually used garbage? Nope. Do I loathe the whole subscription thing? Yes. Absolutely. Yet, I also get it. People expect software to get updated, and updates have costs.
So, the pertinent question here is, will AI systems be worse than humans? For now, yeah. Forever? Nope. The rate of improvement is crazy. Two years ago, LLMs I ran locally couldn’t do much of anything. Now? Generally acceptable junior dev stuff comes out of models I run on my Mac Studio. I have to fiddle with the prompts a bit, and it’s probably faster to just take a walk and think it over than spend an hour trying different prompts… but I’m a nerd and I like fiddling.
You know, those who don't care about learning and solving problems, gaining real experience they can use to solve problems even faster in the future, faster than any AI slop.
Glad to see the author acknowledges their usefulness and limitations so far.
Increase the temperature of the LLMs.
Ask several LLMs the same question several times each, with tiny variations. Then collect all the answers and do a second/third round, asking each LLM to review all the collected answers and improve on them.
Add random constraints, one constraint per question. For example, to the LLM: can you do this with 1 bit per X? Do this in O(n). Do this using linked lists only. Do this with only 1 KB of memory. Do this while splitting the task across 1000 parallel threads, etc.
This usually kicks the LLM out of its comfort zone, into creative solutions.
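A rough sketch of that loop in Python, where ask() is just a stand-in for whatever LLM client you actually use (the function, the constraint list, and the round counts are all placeholders, not a real API):

    import random

    CONSTRAINTS = [
        "use at most 1 bit per element",
        "do this in O(n)",
        "use only linked lists",
        "use at most 1 KB of memory",
        "split the work across 1000 parallel threads",
    ]

    def ask(model: str, prompt: str, temperature: float) -> str:
        """Placeholder: swap in your actual LLM client call here."""
        raise NotImplementedError

    def brainstorm(question: str, models: list[str], rounds: int = 3) -> list[str]:
        # Round 1: same question, tiny variations, one random constraint each, high temperature.
        answers = [
            ask(m, f"{question}\nExtra constraint: {random.choice(CONSTRAINTS)}", temperature=1.0)
            for m in models
            for _ in range(3)
        ]
        # Later rounds: show every model all collected answers and ask for an improved one.
        for _ in range(rounds - 1):
            digest = "\n---\n".join(answers)
            answers = [
                ask(m, f"{question}\nCandidate answers so far:\n{digest}\n"
                       "Review them and propose a better one.", temperature=0.7)
                for m in models
            ]
        return answers

The point isn't the exact structure; it's that variation plus a review pass tends to surface options a single low-temperature query never would.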
I never made a Dockerfile in my life, so I thought it would be faster just getting o3 to point to the GitHub repo and let it figure out, rather than me reading the docs and building it myself.
I spent hours debugging the file it gave me... It kept adding hallucinated things that didn't exist, removing/rewriting other parts, and making other big mistakes, like not understanding the difference between python3 and python and the intricacies involved.
Finally I gave up and Googled some docs instead. Fixed my file in minutes and was able to jump into the container and debug the rest of the issues. AI is great, but it's not a tool to end all. You still need someone who is awake at the wheel.
I get having a bad taste in your mouth, but these tools _aren't_ magic and do have something of a steep learning curve to get the most out of them. Not dissimilar from vim/emacs (or lots of dev tooling).
edit: To answer a reply (hn has annoyingly limited my ability to make new comments): yes, internet search is always available to ChatGPT as a tool. Explicitly clicking the globe icon will encourage the model to use it more often, however.
I didn't know it could even be disabled. It must be enabled by default, right?
I don't think I've ever written "this API doesn't exist" and then gotten a useful alternative.
Claude is the only one that regularly tells me something isn't possible rather than making sh up.
This stuff is still in its infancy; of course it's not perfect.
But it's already USEFUL and it CAN do a lot of stuff - just not all types of stuff, and it can still mess up the stuff that it can do.
It's that simple.
The point is that over time it'll get better and better.
Reminds me of self-driving cars, or even just general automation back in the day - the complaint was always that a human could do it better, and at some point those people just went away because it stopped being true.
Another example is automated mail sorting by the post office. The gripe was always that humans will always be able to do it better - true, but in the meantime the post office reduced the facilities where humans did this to just one.
LLMs are great as assistants. Just today, Copilot told me it's there to do the "tedious and repetitive" parts so I can focus my energy on the "interesting" parts. That's great. They do the things every programmer hates having to do. I'm more productive in the best possible way.
But ask it to do too much and it'll return error-ridden garbage filled with hallucinations, or just never finish the task. The economic case for further gains has diminished greatly while the cost of those gains rises.
Automation killed tons of manufacturing jobs, and we're seeing something similar in programming, but keep in mind that the number of people still working in manufacturing is 60% of the peak, and those jobs are much better than the ones in the 1960s and 1970s.
And also, manufacturing jobs have greatly changed. And the effect is not even, I imagine. Some types of manufacturing jobs are just gone.
Probably, but I'm not sure that had much to do with AI.
> Some types of manufacturing jobs are just gone
The manufacturing work that was automated is not exactly the kind of work people want to do. I briefly did some of that work. Briefly because it was truly awful.
Or… it still requires similar education and experience but programmers end up so much more efficient they earn _more_.
Hard to say right now.
A general LLM would not be as good as an LLM specialized for coding; for that case the Google DeepMind team may have something better than Gemini 2.5 Pro.
Right now the scope of what an LLM can solve is pretty generic and focused. Anytime more than a class or two is involved or if the code base is more than 20 or 30 files, then even the best LLMs start to stray and lose focus. They can't seem to keep a train of thought which leads to churning way too much code.
If LLMs are going to replace real developers, they will need to accept significantly more context, they will need a way to gather context from the business at large, and some way to persist a train of thought across the life of a codebase.
I'll start to get nervous when these problems are close to being solved.
I paste in the entire codebase for my small ETL project (100k tokens) and it’s pretty good.
Not perfect, still a long ways to go, but a sign of the times to come.
Maybe the way forward would be to invent better "specification languages" that are easy enough for humans to use, then let the AI implement the specification you come up with.
The programmer is an architect of logic and computers translate human modes of thought into instructions. These tools can imitate humans and produce code given certain tasks, typically by scraping existing code, but they can't replace that abstract level of human thought to design and build in the same way.
When these models are given greater functionality to not only output code but to build out entire projects given specifications, then the role of the human programmer must evolve.
So much of it is exploratory, deciding how to solve a problem from a high level, in an understandable way that actually helps the person who it’s intended to help and fits within their constraints. Will an LLM one day be able to do all of that? And how much will it cost to compute? These are the questions we don’t know the answer to yet.
At this point I think people who don't see the value in AI are willfully pulling the wool over their own eyes.
Neither of these issues is particularly damning on its own, as improvements to the technology could change this. However, the reason I have chosen to avoid them is unlikely to change; that they actively and rapidly reduce my own willingness for critical thinking. It’s not something I noticed immediately, but once Microsoft’s study showing the same conclusions came out, I evaluated some LLM programming tools again and found that I generally had a more difficult time thinking through problems during a session in which I attempted to rely on said tools.
A chunk of the objections indicate people trying to shoehorn in their old way of thinking and working.
I think you have to experiment and develop some new approaches to remove the friction and get the benefit.
Mediocre ones … maybe not so much.
When I worked for a Japanese optical company, we had a Japanese engineer, who was a whiz. I remember him coming over from Japan, and fixing some really hairy communication bus issues. He actually quit the company, a bit after that, at a very young age, and was hired back as a contractor; which was unheard of, in those days.
He was still working for them, as a remote contractor, at least 25 years later. He was always on the “tiger teams.”
He did awesome assembly. I remember when the PowerPC came out, and “Assembly Considered Harmful,” was the conventional wisdom, because of pipelining, out-of-order instructions, and precaching, and all that.
His assembly consistently blew the doors off anything the compiler did. Like, by orders of magnitude.
How did it help, really? By telling you your idea was no good?
A less confident person might have given up because of the feedback.
I just can't understand why people are so excited about having an algorithm guessing for them. Is it the thrill when it finally gets something right?
Translation software has been around for a couple of decades. It was pretty shitty. But about 10 years ago it started to get to the point where it could translate relatively accurately. However, it couldn't produce text that sounded like it was written by a human. A good translator (and there are plenty of bad ones) could easily outperform a machine. Their jobs were "safe".
I speak several languages quite well and used to do freelance translation work. I noticed that as the software got better, you'd start to see companies who, instead of paying you to translate, wanted to pay you less to "edit" or "proofread" a document pre-translated by machine. I never accepted such work because, firstly, it sometimes took almost as much work as translating from scratch, and secondly, I didn't want to do work where the focus wasn't on quality. But I saw the software steadily improving, and this was before ChatGPT, and I realized the writing was on the wall. So I decided not to become dependent on that for an income stream, and moved away from it.
Then LLMs came out, and now they produce text that sounds like it was written by a native speaker (in major languages). Sure, it's not going to win any literary awards, but the vast, vast majority of translation work out there is commercial, not literature.
Several things have happened: 1) there's very little translation work available compared to before, because now you can pay only a few people to double-check machine-generated translations (that are fairly good to start with); 2) many companies aren't using humans at all as the translations are "good enough" and a few mistakes won't matter that much; 3) the work that is available is high-volume and uninteresting, no longer a creative challenge (which is why I did it in the first place); 4) downward pressure on translation rates (which are typically per word), and 5) very talented translators (who are more like writers/artists) are still in demand for literary works or highly creative work (i.e., major marketing campaign), so the top 1% translators still have their jobs. Also more niche language pairs for which LLMs aren't trained will be safe.
It will continue to exist as a profession, but it will keep diminishing until it's eventually a fraction of what it was 10 or 15 years ago.
(This is specifically translating written documents, not live interpreting which isn't affected by this trend, or at least not much.)
While the general syntax of the language seems to be somewhat correct now, LLMs still don't know anything about those languages and keep mistranslating words due to their inherently English-centric design. A whole lot of concepts don't even exist in English, so these translation oracles just can never do it successfully.
If I read a few minutes of LLM-translated text, there are always a couple of such errors.
I notice younger people don't see these errors because of their weaker language skills, and the LLMs reinforce their incorrect understanding.
I don't think this problem will go away as long as we keep pushing this inferior tech, but instead the languages will devolve to "fix" it.
Languages will morph into a 1-to-1 mapping of English, and all the cultural nuances will be lost to time.
LLMs will never be able to figure out for themselves what your project's politics are and what trade-offs are supposed to be made. Even the most advanced model will still require a user to explain the trade-offs in a prompt.
I wouldn't declare that unsolvable. The intentions of a project and how they fit into user needs can be largely inferred from the code and associated docs/README, combined with good world knowledge. If you're shown a codebase of a GPU kernel for ML, then as a human you instantly know the kinds of constraints and objectives that go into any decisions. I see no reason why an LLM couldn't also infer the same kind of meta-knowledge. Of course, this papers over the hard part of training the LLMs to actually do that properly, but I don't see why it's inherently impossible.
An LLM is only as good as the material it is trained on; the same applies to AI in general, and neither is perfect.
Perplexity AI did assist me in going from zero Python to code with 94% test coverage and no vulnerabilities (per scanning tools). Google Gemini is dogshit.
Blindly trusting code generated by an LLM/AI is a whole different beast, and I am seeing developers basically copy/paste it into the company's code. People are using these sources as the truth and not as a complementary tool to improve their productivity.
Honestly though, when that replacement comes there is no sympathy to be had. Many developers have brought this upon themselves. For roughly the 25-year period from 1995 to 2020, businesses have been trying to turn developers into mindless commodities that are straightforward to replace. Developers have overwhelmingly encouraged this and many still do. These are the people who hop employers every 2 years and cannot do their jobs without lying on their resumes or complete reliance on a favorite framework.
Maybe it's the way you talk about 'developers'. Nothing I have seen has felt like the sky falling on an industry; to me at most it's been the sky falling on a segment of silicon valley.
With that out of the way let’s look only at what many developers actually do. If a given developer only uses a framework to put text on screen or respond to a user interaction then they can be replaced. LLMs can already do this better than people. That becomes substantially more true after accounting for secondary concerns: security, accessibility, performance, regression, and more.
If a developer is doing something more complex that accounts for systems analysis or human behavior then LLMs are completely insufficient.
There is a class of developers who are blindly dumping the output of LLMs into PRs without paying any attention to what they are doing, let alone review the changes. This is contributing to introducing accidental complexity in the form of bolting on convoluted solutions to simple problems and even introducing types in the domain model that make absolutely no sense to anyone who has a passing understanding of the problem domain. Of course they introduce regressions no one would ever do if they wrote things by hand and tested what they wrote.
I know this, because I work with them. It's awful.
These vibecoders force the rest of us to waste even more time reviewing their PRs. They are huge PRs that touch half the project for even the smallest change; they build and pass automated tests, but they enshittify everything. In fact, the same LLMs used by these vibecoders start to struggle with the project after these PRs are sneaked in.
It's tiring and frustrating.
I apologize for venting. It's just that in this past week I lost count of the number of times I had these vibecoders justifying shit changes going into their PRs as "but Copilot did this change", as if that makes them any good. I mean, a PR to refactor the interface of a service also sneaks in changes to the connection string, and they just push the change?
I memorize really little and tend to spend time reinventing algorithms or looking them up in documentation. Verifying is easy except in the few cases where the LLM produces something really weird. But then I fall back to docs or reinventing.
- LLMs are going to be 100x more valuable than me and make me useless? I don't see it happening. Here's 3 ways I'm still better than them.
Another factor is the capture of market sectors by Big Co. When buyers can only approach some for their products/services, the Big Co can drastically reduce quality and enshittify without hurting the bottom line much. This was the big revelation when Elon gutted Twitter.
And so we are in for interesting times. On the plus side, it is easier than ever to create software and distribute it. Hiring doesn't matter if I can get some product sense and make some shit worth buying.