No, seriously. If you have the skills to join, you should be able to handle the proper way of programming with no AI at all.
Has this ever happened besides the Y2K fixes?
Wouldn't it be much more likely that the companies will simply go under? Or that they will make a team that writes a completely new version of the code, somewhat like Mac OS X was a replacement of MacOS 9 and not a cleanup.
Isn't that one way of the "cleanup and re-design work"?
Success has many fathers, failure is an orphan.
Publications like the Economist or the WSJ that drive a lot of this hype among the investor and executive classes are loath to point out that their readers and owners are the proverbial emperor not wearing any clothes.
One of the benefits of tech unions (if they existed in any meaningful way) would be to point out where the emperor is naked in a way that is a bit harder to ignore than the kerfuffle that occurs inside hacker news threads or subreddits dedicated to experienced devs.
Careful speaking such heresy around here, you might get burned at the stake.
All of us are already in tech debt. The post-AI mess won't look significantly different from the pre-AI mess.
If anything I would expect management to throw more money at AI in hopes of fixing the mess (if management even perceives such a mess).
And if it does hit a dead end, just regenerate a new version of the entire system in minutes.
I don’t know what’s going to happen with AI coding, but it often seems to me that people are making fundamental errors when framing the problem.
“But how will humans maintain AI code?” Is one such example. Why would we expect one part (code creation) to change dramatically without all other parts equally undergoing a revolution?
Because due to how complexity works, you reach a state where the expected number of breakages from modifying the code base exceeds the value from the change itself. Even if you assume the cost of labor is 0. It’s like the monkeys writing Shakespeare.
> And if it does hit a dead end, just regenerate a new version of the entire system in minutes.
If this worked it would have been done every few years at big companies. In reality, prototypes cannot take the place of a legacy system because it’s deeply integrated and depended upon. Most notably through the data it owns, but in many subtle ways as well. Simply verifying that a new system will not break is a massive undertaking. Writing meaningfully testable systems is an order of magnitude harder than implementing them.
When there’s a monetary risk of bugs (lose data, lose users, mess up core business logic etc) companies pay for the confidence to get things right. Doesn’t mean it always works, or that their priorities are right, but a payment vendor is not going to trust vibes to do a db migration.
There are still many experimental prototype domains out there, like indie games and static web sites, where throw away and start over is pretty much fine. But that’s not the entire field.
Ah, but you see, without any promise of pudding: why eat the meat? I do the 'human engineering' part only to do... the rest of the engineering.
edit: The [currently] Dead reply below is a touch ironic. I rest my case. Great sell, worth the squeeze.
What was a day of script writing becomes 15 minutes of prompt engineering to clean up the CSV and convert it into the proper statements. Massive time savings!
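For a sense of the kind of script that prompt is standing in for, here's a minimal sketch in Python; the file name, table, columns, and cleanup rules are all hypothetical, and a real job would encode the analyst's actual validation rules:

```python
import csv
import sqlite3

# Minimal sketch: load a messy CSV, apply basic cleanup, and insert rows into SQL.
# The file name, table, column names, and cleanup rules here are hypothetical.
def load_users(csv_path: str, db_path: str = "users.db") -> int:
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS users (name TEXT, email TEXT)")
    inserted = 0
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            name = (row.get("name") or "").strip()
            email = (row.get("email") or "").strip().lower()
            if not name or "@" not in email:
                continue  # skip rows that fail basic validation rather than guessing
            # Parameterized insert avoids quoting problems from dirty fields.
            conn.execute("INSERT INTO users (name, email) VALUES (?, ?)", (name, email))
            inserted += 1
    conn.commit()
    conn.close()
    return inserted
```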
I believe one can vibe code a basic app or script. Like building one of those little trinkets people put on websites, like calculators or mini games.
But I agree, the LLM can’t build, say, Google Chrome, or AWS Cognito, or Netflix, or anything that actually has serious billion dollar value. It’s not a “when” question, because these are not “solvable” problems.
But that has a flip-side too: analysts are more likely to give you CSVs with more issues because they know you can clean it up with AI. And pretty soon people will put less care in general in what they do until something goes wrong with this translate-with-AI paradigm.
Perhaps my few decades in the industry have been in areas where it is always the details, correctness, and fitness for purpose that make those problems hard, not the work itself.
I do see a use case for throw away spikes, or part of a red-green-refactor, etc.. but if accuracy and correctness aren't critical, data cleanup is easy even without an LLM.
Sure, for someone who does ETL-type work all day, or often enough anyway, they'd scoff, and true, an LLM won't really save them time. But for me, who does it once in a blue moon, LLMs are great. It's still on me to determine correctness; I'm simply no longer contending with the bootstrap problem of learning new packages and their syntax and common usage.
The CSV to SQL for analysts problem is a data integrity problem that is domain specific and not tool specific.
Remember that a 'relation' in relational databases is just a table, specifically named columns and tuples (rows).
A CSV is also just tuples (lines), but obviously SQL also typically has multiple normalized tables etc...
Typically bad data is worse than missing data.
For analysts, missing data can lead to bias and reduced statistical power, but methods exist and it can often be handled.
Bad data, on the other hand, can be misleading, deceptive and/or harmful. An LLM will, by its very nature, be likely to produce bad data when cleaning.
The risk of using an LLM here is that it doesn't have context or nuances to deal with that. Data cleaning via (sed,grep,tr,awk), language tools or even ETL can work....
I promise you that fixing that bad data will be far worse.
But using it in a red-green-refactor model may help with the above, but you will actively need to be engaged and dig through what it produces.
Personally I find it takes more time to do that than to just view it as tuple repacking...and use my favorite tools to do so.
Data cleaning is hard, but it is the context specific details that make it so.
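To make the "tuple repacking" idea concrete, here's a minimal deterministic sketch; the file name, column names, and formats are made up:

```python
import csv
from datetime import datetime
from typing import Optional

# Deterministic "tuple repacking": every rule is explicit and reviewable, and a row
# that fails validation is dropped (reported as missing) rather than guessed at.
def clean_row(row: dict) -> Optional[dict]:
    try:
        amount = float(row["amount"].replace(",", ""))
        date = datetime.strptime(row["date"].strip(), "%Y-%m-%d").date()
    except (KeyError, ValueError, AttributeError):
        return None  # missing data is easier to handle downstream than bad data
    return {"amount": amount, "date": date.isoformat()}

with open("export.csv", newline="") as f:  # hypothetical export from an analyst
    rows = list(csv.DictReader(f))

cleaned = [c for c in map(clean_row, rows) if c is not None]
print(f"kept {len(cleaned)} of {len(rows)} rows")
```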
A colleague did this recently and found it had removed crucial data.
Better to get it to help you author scripts or difficult parts of scripts, and then review the code carefully.
You sound like you've done that CSV to SQL a lot of times, are subconsciously aware of all the pitfalls and unwritten (and perhaps never specified) requirements and you're limited by just your typing speed when doing it again.
I can use LLMs for stuff I can do in my sleep as well.
I move that even for you or me, LLMs ain't worth much for stuff we can't do in our sleep. Stuff we don't know or have only introductory knowledge of. You can use them to generate tutorials instead of searching for them, but that's about it.
Generating tutorials is good, but is it good because a LLM did it or because you can't find a good tutorial by searching any more?
That's fine, although I think only native English speakers would be proud of that. I am sure he also didn't use a spell-checker.
I'm not sure that would matter with/to most people.
I can only speak for myself and I don't care unless I'm reading a text by someone who claims to be either an English teacher or Linguistic subject expert.
AI is closer to this sentiment than it is to the singularity.
FWIW I think OP came up with an excellent analogy.
Back to AI though.
I just checked the customer support page of a hyped AI app generator and it's what you'd expect: "doesn't work on complex project", "wastes all my tokens", and "how to get a refund".
These things are over-promising, and a future miracle is required to justify valuations. Maybe the miracle will come, maybe not.
I'm not sure why you continued using words when you summed up 3D printing with those four words. In the time it takes to print 1 object, you could have molded thousands of them. 3D printing has done a lot for manufacturing in terms of prototyping and making the first thing while improving flexibility for iterations. Using them for mass production is just not a sane concept.
But in the time it took me to convert a picture of my cat to a 3d model using AI and print it, I could have ... got on the phone to the injection molding lab, and asked about availability to produce the mold for that cat.
3d printing fits the niche where you either need to model something or make something so bespoke that it isn't worth setting up custom machinery.
The point is 3d printing is useful and the tech is improving and it will get more and more useful. It won't take over manufacturing of course (just like Rust won't take over all programming).
It's enabled some acceleration of product prototyping and it has democratized hardware design a little bit. Some little companies are building some buildings using 3D printing techniques.
Speaking as someone who owns and uses a 3D printer daily, I think the biggest impact it's had is that it's a fun hobby, which doesn't strike me as "world-changing."
Between that and the changed game for hobbyists, the world is meaningfully different.
Most world-changing inventions do so subtly. Atom bombs are the exception, not the rule.
But this seems an unfair comparison. For one, I think 3D printing made me better, not worse, at engineering (back when mechanical engineering was my jam), as it allowed me to prototype and make mistakes faster and cheaper. While it hasn’t replaced all manufacturing (or even come close), it plays an important role in design without atrophying the skills of the user.
You might argue that LLMs have simply exposed some systematic defects instead of improving anything, but the impact is there. Dozens of lecturing workflows that were pretty standard 2 years ago are no longer viable. This includes the entirety of online and remote education which ironically dozens of universities started investing in after Covid, right around when chatgpt launched. To put this impact in context, we are talking about the tertiary and secondary sector globally.
There will be the before and after AI eras in academia.
I don't get this. Either you do graded home assignments, which the person takes without any examiner and which you could always cheat on, or you do live exams, where people can't rely on AI. LLMs make it easier to cheat, but it's not a categorical difference.
I feel like my experience of university (90% of the classes had in-person exams, some had home projects for a portion of the final marks) is fundamentally different from what other people experienced and this is very confusing for me.
But 3D printing and AI are on totally different trajectories.
I haven't heard of Mattel saying, "we're going to see in what places we can replace standard molding with 3d printing". It's never been considered a real replacement, but rather potentially a useful way to make PoCs, prototypes and mockups.
3D printing has definitely replaced some manufacturing, and it has had a huge effect on product design.
These anti-AI articles are getting more tedious than the vibe coding articles.
Sure, but I'd argue the AIs are the new injection molding (as mentioned downthread) with the current batch being the equivalent of Bakelite.
Plus, who seriously said 3d printers were going to churn out Barbies by the millions? What I remember is people claiming they would be a viable source of one-off home production for whatever.
I don't think LLMs were ever meant to completely replace human engineering, at least in the way we think of engineering as a craft. But the truth is the world is changing, and with LLMs/AI, the goalposts are changing, too. With massive economies of scale and huge output, the goal is less and less good engineering and more and more pure output.
Put it another way, the post-AI world considers a reduction of quality to be an acceptable tradeoff for the massive short-term gains. Moreover, although quantity over quality is not exactly a new concept, what AI does is magnify and distill that concept as a prime directive.
Code doesn't have to work well to be profitable given how cheaply it can be produced, and that's a general effect of technology that can reach global audiences, and AI is the apex of that technology.
I don't think there is a nuanced view: AI is an all-around bad thing and its positives in the short-term will be vastly dwarfed by its negatives except for those who have become specialized at concentrating wealth at the expense of the common good.
Tells me a lot about people like you who make these comments rather than about LLMs.
I don't believe that that's something that we can stop. Somehow, I feel like the popularization of LLMs is a force of natural selection where those who are smart enough to keep training their minds will find themselves more financially secure than those who don't, and therefore more likely to survive.
Yes, that's exactly right, it's the prisoner's dilemma. You articulated it perfectly.
Code is a liability. Code is expensive to maintain, has bugs, security issues, performance issues. The short-term profitable solutions will have a very narrow window to succeed because they will quickly crumble under their own weight.
(Emphasis mine)
This has been the biggest pain point for me, and the frustrating part is that you might not even realize you're leading it a particular way at all. I mean it makes sense with how LLMs work, but a single word used in a vague enough way is enough to skew the results in a bad direction, sometimes contrary to what you actually wanted to do, which can lead you down rabbit holes of wrongness. By the time you realize, you're deep in the sludge of haphazardly thrown-together code that sorta kinda barely works. Almost like human language is very vague and non-specific, which is why we invented formal languages with rules that allow for preciseness in the first place...
Anecdotally, I've felt my skills quickly regressing because of AI tooling. I had a moment where I'd reach out to it for every small task from laziness, but when I took a real step back I started realizing I'm not really even saving myself all that much time, and even worse is that I'm tiring myself out way quicker because I was reading through dozens or hundreds of lines of code, thinking about how the AI got it wrong, correcting it etc. I haven't measured, but I feel like in grand totality, I've wasted much more time than I potentially saved with AI tooling.
I think the true problem is that AI is genuinely useful for many tasks, but there are 2 camps of people using it. There are the people using it for complex tasks where small mistakes quickly add up, and then the other camp (in my experience mostly the managerial types) see it shit out 200 lines of code they don't understand, and in their mind this translates to a finished product because the TODO app that barely works is good enough for an "MVP" that they can point to and say "See, it can generate this, that means it can also do your job just as easily!".
To intercept the usual comments that are no doubt going to come flooding in about me using it wrong or trying the wrong model or whatever, please read through my old comment [1] for more context on my experience with these tools.
In other words, AI is my assistant, but it is MY responsibility to turn up quality, maintainable work.
However, to put things in perspective for the masses: just consider the humble calculator. It has ruined people’s ability to do mental math. AI is going to do that for writing and communication skills, problem solving skills, etc.
I agree fully, I use it as a bouncing off point these days to verify ideas mostly.
The problem is, and I'm sure I'm not alone in this, management is breathing down my neck to use AI for fucking everything. Write the PR with AI, write the commit message with AI, write the code, the tests, use YOUR AI to parse MY AI's email that I didn't bother proofreading and has 4 logical inconsistencies in 1 sentence. Oh this simple feature that can easily be done for cheaper, quicker and easier without AI? Throw an AI at it! We need to sell AI! "You'll be left in the dust if you don't adopt AI now!"
It comes back to my point about there being 2 camps. The one camp actually uses AI and can see their strengths & weaknesses clear as day and realizes it's not a panacea to be used for literally everything, the other is jumping headfirst into every piece of marketing slop they come across and buying into the false realities the AI companies are selling them on.
A GREAT example is good old Coke vs Pepsi.
That said, AI does make some things easier today, like if you have an example to use for "make me a page like this but with data from x instead of y". Often it's faster than searching documentation, even with the caveat that it might hallucinate. And ofc it will probably improve over time.
The particular improvement I'd like to see is (along with doing things right in general) finding the simplest solution without constantly having to be told to do so. My experience is the biggest drawback to letting chatgpt/claude/etc loose is quickly churning out a bunch of garbage, never stopping to say this will be too complex to do anything with in the future. TFA claims only humans can resist entropy by understanding the overall design; again, I don't know whether that will improve, but it feels like the big problem right now.
I'm glad I'm not the only one who feels this way. It seems like these models latch on to a particular keyword somewhere in my prompt chain and throw traditional logic out the window as they try to push me down more niche paths that don't even really solve the original problem. Which just leads to higher levels of frustration and unhappiness for the human involved.
> Anecdotally, I've felt my skills quickly regressing because of AI tooling
To combat this, I've been trying to use AI to solve problems that I normally would with StackOverflow results: for small, bite-sized and clearly-defined tasks. Instead of searching "how to do X?", I now ask the model the same question and use its answer as a guide to solving the problem instead of a canonical answer.
I've read your comment about all the things you tried, and it seems you have much broader experience with LLMs than I do. But I didn't see this technique mentioned, so leaving this here in case it helps someone else :).
The real struggle will be, the people phoning it in are still going to be useless, but with AI. The rest will learn and grow with AI.
It's similar with full self drive. FSD is better than a bad, drunk, or texting human driver, and that's a lot of the drivers on the road.
There are real safety improvements from ADAS. For safety you only need crash avoidance, not a full-time chauffeur.
I would rather work in an office entirely staffed by well-meaning people struggling at their jobs than a single person like you.
False.
> An LLM is a token predictor. It works only at the level of text. It is not capable of working at a conceptual level: it doesn't reason about ideas, diagrams, or requirements specifications.
False.
Anyone who has spent time in machine learning or reinforcement learning understands that models are projections of higher-dimension concepts onto lower dimensions as weights.
There is no such thing as a higher dimensional concept, nor can they be projected into a weight space, because they aren't quantities.
The concept, say, "Dog" composes with the concept, "Happy" to form "Happy Dog". The extension(Dog) is all possible dogs, the extension(Happy) is all happy objects, the extensions here compose. The intension of "Dog" depends on its context, eg., "get that damned dog!" has a different intension than, "I wish I looked less like a dog!". And the intensions here do not compose like the extensions.
Take the act of counterfactual reality-oriented simulation, called "imagination" narrowly, call that `I`. And "curry" this operator with a simulation premise, "(as if) on mars", so we have I' = I(Mars)(...).
Now, what is the content of I'(Dog), I(Happy Dog), I(Get that damned dog), I(get that damned happy dog) ? and so on
The contents of `I` is nowhere modelled by "projection" because this does not model composition, and is not relevantly discrete and bounded by logical connectives.
These are trivial issues which arise the moment you're aware that few people writing computer science papers have ever studied the meaning of the words they use so freely.
Concept is an abstraction layer above human languages.
Here's a good article that touched on this topic: https://www.neelnanda.io/mechanistic-interpretability/glossa...
"Concept" is not a term from computer science, its use here has not only been "narrowed" but flat-out redefined. "Concept" as used in XAI (a field in which i've done research) is an extremely limited attempt to capture the extension of concepts over a given training domain.
Concept, as used by the author of this article that you are replying to, and 99.9999999...% of all people familiar with the term, means "concept". It does not mean what it has been asserted to mean in XAI.
And one of the most basic features of concepts is their semantic content, that they compose, that they form parts of propositions, and so on.
In Chinese language, concept is 概念.
In Chinese language, happy dog is 快乐的狗.
Notice it has an extra "的" that is missing in English language. This tells you that you can't just treat English grammar and structure as the formal definition of "concept". Some languages do not have words for happiness, or dog. But that doesn't mean the concept of happiness or dog does not exist.
The reverse is also true, you can't claim a concept does not exist if it does not exist in English language. Concept is something beyond any particular language or logical construct or notations that you invent.
That would be a consequence of your position.
The person who wrote the article is English. The claim being evaluated here is from the article. The term "concept" is English. The *meaning* of that term isn't English, any more than the meaning of "two" is English.
My analysis of "concept" has nothing to do with the english language. "Happy" here stands in for any property-concept and 'dog' any term which can be a object-concept, or a property-concept, or others. If some other language has terms which are translated into terms that do not function in the same way, then that would be a bad translation for the purpose of discussing the structure of concepts.
It is you who are hijacking the meaning of "concept", ignoring the meaning the author intended, substituting one made up 5 minutes ago by self-aggrandising, poorly read people in XAI -- and then going off about irrelevant translations into Chinese.
The claim the author made has nothing to do with XAI, nor chinese, nor english. It has to do with mental capacities to "bring objects under a concept", partition experience into its conceptual structure ("conceptualise"), simulate scenarios based on compositions of concepts ("the imagination") and so on. These are mental capabilities a wide class of animals possess, who know no language; that LLMs do not possess.
> It has to do with mental capacities to "bring objects under a concept", partition experience into its conceptual structure ("conceptualise"), simulate scenarios based on compositions of concepts ("the imagination") and so on. These are mental capabilities a wide class of animals possess, who know no language; that LLMs do not possess.
Assume it is true that humans have these capabilities; why do you think LLMs do not? We don't know whether they have them, and that's what explainable AI is for.
Take a step back and assume that LLMs are not capable of these capabilities: how do you prove that these are the fundamental concepts in the universe, instead of projections of higher-level concepts from a higher-dimensional space that humans and animals are not aware of? What if there exists a more universal set of concepts that contains all the concepts we know and others, in a higher dimension, and both LLMs and humans are just using the lower-dimensional projections of such concepts?
Good job too, otherwise the brain would have to magically create some entirely new physics to represent concepts.
In practical terms, what do you think the LLM output cannot contain right now? Because the way I read it now is "LLM can't speculate". But that's trivial to disprove by asking for that happy dog on Mars speculation you have as an example - whether you want the scientific version, or child level fun, it's available and the model will give nontrivial idea connections that I could not find anywhere. (For example childlike speculation from Claude included that maybe dogs would be ok playing in spacesuits since some dogs like wearing little coats)
Similarly "And the intensions here do not compose like the extensions." is really high level. What's the actual claim with regards to LLMs?
The issue is why one (prompt, answer) pair is given. If the answer is given as a "reasoning process" over salient parts of the prompt, that, e.g., involves imagining/simulation as expected, then for {(prompt', answer')} of similar imaginings we will get reliable mappings. If it's cheating, then we won't.
We can, I think, say for certain that the system is not engaged in counterfactual reasoning. Eg., we can give a series of prompts (p1, p2, p3...) which require increasing complexity of the imagined scenario, and we do not find O(answering) to follow O(p-complexity-increase). Rather the search strategy is always the same, and we can just get "mildly above linear" (pseudo-)reasoning complexity with chain-of-thought.
This applies the same to humans hearing a question and responding. Tokens in, tokens out (whether words or sound). It's not unique to LLMs, so not useful for explaining differences.
> then for {(prompt', answer')} of similar imaginings we will get reliable mappings. If it's cheating, then we won't.
You're not really showing that this is/isn't the case already. Also this would put people with quirky ideas and wild imagination in the "cheating" group if I understand your claim correctly. There's even a whole game around a similar concept - Dixit - describe an image in a way that as few people as possible will get it.
> we can give a series of prompts (p1, p2, p3...) which require increasing complexity of the imagined scenario, and we do not find O(answering) to follow O(p-complexity-increase). Rather the search strategy is always the same
You're describing most current implementations, not a property of LLMs. Gemini scales the thinking phase for example. Future models are likely to do the same. Another recent post implemented this too https://news.ycombinator.com/item?id=44112326
Eg., coffee has some internal kinetic energy in the motion of its molecules, and it has the disposition to cause a thermometer to rise its mercury to a certain height.
There's always an open question in these cases: is the height of the mercury a "reliable stand-in" for the temperature of the system? In many cases: NO. If you read off the height too quickly, you'll report the wrong temperature.
No system's intrinsic properties are, literally, just its measured properties. We are not literally our behaviours. An LLM is not literally its input/output tokens.
The question arises: what is the actual intrinsic property which gives rise to the measured properties?
It's very easy to see why people believe that the LLM case is parallel to the human case, because in ordinary circumstances, our linguistic behaviours are "reliable measures" of our mental states. So we apply the same perception to LLMs: so too must they generate outputs in the way we do, they must "Reason".
However, there are many much more plausible explanations of how LLMs work that do not resort to giving them mental capacities. And so what's left to those in possession of such better explanations is to try to explain to others why you cannot just put a thermometer into a xbox cd drive and think you're measuring how hot the player is.
LLMs are the idea you describe but made incarnate, in form of a computing artifact we can "hold in our hands", study and play with. IMHO people are still under-appreciating how big a thing this is fundamentally, beyond RAG and chatbots.
It is also the case that animals do not "reliably and universally" implement all aspects of all meanings they are acquainted with, so we aren't looking for 100% of capacities, 100% of the time.
Nevertheless, LLMs are only implementing a limited aspect of meaning: mostly association and "some extension". And with this, plus everything ever written, they can narrowly appear to implement much more.
Let's be clear though, when we say "implement" we mean that an answer arises from a prompt for a very specific reason: because the answer is meant by the system in the relevant way. In this sense, LLMs can mean any association, perhaps they can mean a few extensions, but they cannot mean anything else.
Whenever an LLM appears to partake in more aspects of meaning it is only cheating: it is using familiarity with families of associations to overcome its disabilities.
Like the idiot savant who appears to know all hollywood starlets, but is discovered eventually, not to realise they are all film stars. We routinely discover these disabilities in LLMs, when they attempt to engage in reasoning beyond these (formally,) narrow contexts of use.
Agentic AI is a very good "on steroids" version of this. Just try to use Windsurf, and the brittle edges of this trick appear quickly. It's "reasoning" whenever it seems to work, and "hallucination" when not -- but of course, it just never was reasoning.
> Whenever an LLM appears to partake in more aspects of meaning it is only cheating: it is using familiarity with families of associations to overcome its disabilities.
I'm not convinced there's anything more to "meaning" - we seem to be defining concepts through relationship to other concepts, and ground that directly or indirectly with experiences. The richer that structure is, the more nuanced it gets.
> Like the idiot savant who appears to know all hollywood starlets, but is discovered eventually, not to realise they are all film stars. We routinely discover these disabilities in LLMs, when they attempt to engage in reasoning beyond these (formally,) narrow contexts of use.
I see those as limitations of degree, not kind. Less idiot savant, more like someone being hurried to answer questions on the spot. Some associations are stronger and come to mind immediately, some are less "fresh in memory", and then associations can bring false positives and it takes extra time/effort to notice and correct those. It's a common human experience, too. "Yes, those reserved words are 'void', 'var', 'volatile',... wait, 'var' is JS stuff, it's not reserved in C..." etc.
Then, of course, humans are learning continuously, and - perhaps more importantly - even if they're not learning, they're reinforcing (or attenuating) existing associations through bringing them up and observing feedback. LLMs can't do that on-line, but that's an engineering limitation, not a theoretical one.
I'm not claiming that LLMs are equivalent to humans in general sense. Just that they seem to be implementing the fundamental machinery behind "meaning" and "understanding" in general sense, and the theoretical structure behind it is quite pretty, and looks to me like a solution to a host of philosophical problems around meaning and language.
One can always find a kind of confirmation bias analysis here, which "saves the appearances", ie., one can always say "take a measurement set of people's mental capacities, given in their linguistic behaviour" and find such behaviours apparent in LLMs. This will always be possible for the obvious reason that LLMs are trained on human linguistic practice.
This makes "linguistic measurement" of LLMs especially deceptive. Consider the analogous case of measuring a video game by its pixels: does it really have a "3d space"? No. It only appears to. We know that pixel-space measurements of video games are necessarily deceptive, because we constructed them that way, so it is obvious that you cannot "walk into a tv".
Yet we did not construct the mechanism of deception in LLMs, making seeing thru the failure of "linguistic measurement" apparently somewhat harder. But I imagine this is just a matter of time -- in particular, when LLMs' mechanisms are fully traced, it will be more obvious that their outputs are not generated for the reasons we suppose. That the "reason to linguistic output" mapping we use on people is deceptive as applied to LLMs. Just as a screenshot of a video game is a deceptive measure, whereas a photograph isn't. For a photograph, the reason the mountain is small is because it's far away; for a screenshot, it isn't: there is no mountain, it is not far away from the camera, there is no camera.
In the case of LLMs we know they cannot mean what they say. We know that if an LLM offers a report on New York it cannot mean what a person who has travelled to New York means. The LLM is drawing on an arrangement, in token space, of tokens placed there by people who have been to New York. This arrangement is like the "rasterization" of a video game: it places pixels as-if there were 3d. You could say, then, that an LLM's response is a kind of rasterization of meaning.
And just as with a video game, there are failures, eg., clipping through "solid" objects. LLMs do not genuinely compose concepts, because they have no concepts -- they can only act as if they are composing them, so long as a token-space measurement of composition is available in the weight-space of the model. (And so on...)
The failures of LLMs to have these capacities will be apparent after a while; at the moment we're on the hype rollercoaster, and it's not yet peaked. At the moment, people are still using the "reason-linguistic" mapping they've learned from human communication on LLMs, to impart the relevant mental states they would with people. The boundaries of the failure of this mapping aren't yet clear to everyone. Users don't yet avoid "clipping thru" objects, because they don't understand what clipping is -- at the moment, many seem desperate to say that if a video game object is clipped thru, it must be designed to be hollow.
In any case, as I've said in many places in this thread (which you can see from my recent comment history) -- there are a large variety of mental capacities associated with apprehending meaning that LLMs lack. But the process is anti-inductive so it will take quite a while: for all those who are finding the fragile boundaries ("clipping thru the terrain"), new models come out with invisible walls.
- On how embeddings work;
- On the observation that in very high-dimensional space you can encode a lot of information in relative arrangement of things;
- On the observation that the end result (LLMs) are too good at talking and responding like people in nuanced way for this to be uncorrelated;
- On noticing similarities between embeddings in high-dimensional spaces and what we arrive at when we try to express what we mean by "concept", "understanding" and "meaning", or even how we learn languages and acquire knowledge - there's a strong undertone of defining things in terms of similarity to other things, which themselves are defined the same way (recursively). Naively, it sounds like infinite regress, but it's exactly what embeddings are about (see the toy sketch after this list).
- On the observation that the goal function for language model training is, effectively, "produce output that makes sense to humans", in fully general meaning of that statement. Given constraints on size and compute, this is pressuring the model to develop structures that are at least functionally equivalent to our own thinking process; even if we're not there yet, we're definitely pushing the models in that direction.
- On the observation that most of the failure modes of LLMs also happen to humans, up to and including "hallucinations" - but they mostly happen at the "inner monologue" / "train of thought" level, and we do extra things (like explicit "system 2" reasoning, or tools) to fix them before we write, speak or act.
- And finally, on the fact that researchers have been dissecting and studying inner workings of LLMs, and managed to find direct evidence of them encoding concepts and using them in reasoning; see e.g. the couple major Anthropic studies, in which they demonstrated the ability to identify concrete concepts, follow their "activations" during inference process, and even control the inference outcome by actively suppressing or amplifying those activations; the results are basically what you'd expect if you believed the "concepts" inside LLMs were indeed concepts as we understand them.
- Plus a bunch of other related observations and introspections, including but not limited to paying close attention to how my own kids (currently 6yo, 4yo and 1.5yo) develop their cognitive skills, and what are their failure modes. I used to joke that GPT-4 is effectively a 4yo that memorized half the Internet, after I noticed that stories produced by LLMs of that time and those of my own kid follow eerily similar patterns, up to and including what happens when the beginning falls out of the context window. I estimated that at 4yo, my eldest daughter had a context window of about 30s long, and I could see it grow with each passing week :).
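To make the embeddings point concrete, a toy sketch with made-up 4-dimensional vectors (a real model learns thousands of dimensions; the words and numbers here are invented):

```python
import math

# Toy illustration only: "meaning" as nothing but relative position in a vector space.
vectors = {
    "dog":        [0.90, 0.80, 0.10, 0.00],
    "puppy":      [0.85, 0.90, 0.15, 0.05],
    "cat":        [0.80, 0.30, 0.20, 0.00],
    "carburetor": [0.00, 0.10, 0.90, 0.80],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# dog sits near puppy, less near cat, and far from carburetor.
for word in ("puppy", "cat", "carburetor"):
    print(word, round(cosine(vectors["dog"], vectors[word]), 3))
```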
That, in a gist, is what adds up to my current perspective on LLMs. It might not be hard science, but I find a lot of things pointing in the direction of us narrowing down on the core functionality that also exists in our brain (but not the whole thing, obviously), and very little that points otherwise.
(I actively worry that it might be my mental model is too "wishy washy" and lets me interpret anything in a way that fits it. So far, I haven't noticed any warning signs, but I did notice that none of the quirks or failure modes feel surprising.)
--
I'm not sure if I got your videogame analogy the way you intended, but FWIW, we also learn and experience lots of stuff indirectly; the whole point of language and communication is to transfer understanding this way - and a lot of information is embodied in the larger patterns and structures of what we say (or don't say) and how we say it. LLM training data is not random, it's highly correlated with human experience, so the information for general understanding of how we think and perceive the world is encoded there, implicitly, and at least in theory the training process will pick up on it.
--
I don't have a firm opinion on some of the specifics you mention, just a couple of general heuristics/insights that tell me it could be possible we've narrowed down on the actual thing our own minds are doing:
1. We don't know what drives our own mental processes either. It might be we discover LLMs are "cheating", but we might also discover they're converging to the same mechanisms/structures our own minds use. I don't have any strong reason to assume the former over the latter, because we're not designing LLMs to cheat.
2. Human brains are evolved, not designed. They're also the dumbest possible design evolution could arrive at - we're the first to cross the threshold after which our knowledge-based technological evolution outpaced natural evolution by orders of magnitude. All we've achieved to date, we did with a brain that was the nature's first prototype that worked.
3. Given the way evolution works - small, random, greedy increments that have to be incrementally useful at every step - it stands to reason that whatever the fundamental workings of a mind are, they must not be that complicated, and they can be built up incrementally through greedy optimization. Humans are a living proof of that.
4. (most speculative) It's unlikely there are multiple alternative implementations of thinking minds that are very different from each other, yet all equally easy to reach through random walk, and that evolution just picked one of those and ran with it. It's more likely that, when we get to that point (we might already be there), we'll find the same computational design nature did. But even if not, diffing ours and nature's solutions will tell us much about ourselves.
That's assuming that LLMs operate according to how we read their text. What you're doing is reading LLM chain-of-thought as if said by a human, and imparting the capacities that would be implied if a human said it. But this is almost certainly not how LLMs work.
LLMs are replaying "linguistic behaviour" which we take, often accurately, to be dispositive of mental states in people. They are not evidence of mental capacities and states in LLMs, for seemingly obvious reasons. When a person says, "I am hungry", it is, in veridical cases, caused by their hunger. When an LLM says it, the cause is something like, "responding appropriately, according to a history of appropriate use of such words, on the occasion of a prompt which would, in ordinary historical cases, give this response".
The reason an LLM generates a text prima facie never involves any associated capacities which would have been required for that text to have been written in the first place. Overcoming this leap of logic requires vastly more than "it seems to me".
> On how embeddings work
The space of necessary capacities is not exhausted by "embedding", by which you mean a (weakly) continuous mapping of historical exemplars into a space. Eg., logical relationships, composition, recursion, etc. are not mental capacities which can be implemented this way.
> We don't know what drives our own mental processes either.
Sure we do. At the level of enumerating mental capacities, their operation and so on, we can give very exhaustive lists. We do not know how even the most basic of these is implemented biologically, save that, I believe, we can say quite a lot about how properties of complex biological systems generically enable this.
But we have a lot of extremely carefully designed experiments to show the existence of relevant capacities in other animals. None of these experiments can be used on an LLM, because by design, any experiment we would run would immediately reveal the facade: any measurement of the GPU running the LLM and its environmental behaviour shows a total empirical lack of anything which could be experimentally measured.
We are, by the charlatan's design, only supposed to use token-in/token-out as "measurement". But this isn't a valid measure, because LLMs are constructed on historical cases of linguistic behaviour in people. We know, prior to any experiment, that the one thing designed to be a false measure is the linguistic behaviour of the LLM.
Its as if we have constructed a digital thermometer to always replay historical temperature readings -- we know, by design, that these "readings" are therefore never indicative of any actual capacity of the device to measure temperature.
I would add that AI often makes far too clever code as well, and would defer to Kernighan's law: “Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.”
The LLM may or may not be clever enough, but you aren't clever enough to evaluate its debugging.
[edit] Makeshift mailing list for those interested: https://github.com/sdegutis/90s.dev/issues/2 (subscribe or comment to "sign up for emails" on the topic)
But maybe first and foremost I need a mailing list so people can be notified of things like this when they're announced/released?
I also like /r/tinycode for its spirit. My #1 saying in coding is "simplicity is harder than complexity." It's gone downhill like the rest of Reddit though.
I'm also not totally anti-AI. I use it a little bit. I just think if you aren't a good developer you aren't competent to use it properly. It's like saying autocomplete will make a bad developer good. I think it's like super autocomplete. Also found it useful for "here's a big block of code, explain it" -- you have to check what it tells you of course, but it gives you a passable first pass analysis, like asking a junior dev to do that.
To clarify the AI stance, I meant it in the context of the article: it would encourage cultivating our skills so it both grows and doesn't atrophy.
That's a great way of putting what I've been thinking for a few years now. It's the same reason I designed https://immaculata.dev so differently than Vite. Sure I could throw a ton of code at the problem, but carefully solving the problem correctly is both simpler and harder.
> It's gone downhill like the rest of Reddit though.
Exactly, which is part of the charm of HN. I want to capture that for 90s.dev but focused solely on software skill cultivation (and sharing the wonder and beauty and joy of writing software in the 90s) rather than the topic-soup of HN.
But many people will use unrestrained tool-enabled agents to do all of their work. And so be it, there’s already bad software out there, always has been. It’s easier to make and distribute software. It’s high calorie and tastes great and is FAST.
But you have to hope it will be more of a tool than a full meal.
Yet nowhere does he address the #1 flaws in his position: the rate of improvement of the technology, and its promise to deliver on saved money and gained speed.
In all the companies I've seen, engineering leadership hardly gives a shit about the things OP says are important. They just care that customers are happy, the system is stable, and it's malleable enough to allow for pivots when need be.
Good discussions & documentation about architecture before starting the work that gets peer-reviewed + A non-stupid engineer putting on good guardrails around the LLM's output + the extensive unit test suites in CD/CI + peer reviews on the PRs = all downsides near eliminated while all upside gained.
This is how we work at my company today (health startup). Google and Meta also boast publicly that 30%+ of new lines of code are AI-generated at their companies today. That's the state of *today*; assume in 5 years these AIs are 10x better... I simply cannot foresee a world where LLM-assisted coding is not the de-facto way to be a software engineer.
It's not at all clear that the upside gained outweighs the cost in your hypothetical scenario, nor should it be taken as a given that the trend line will continue as is for long enough to create enough upside to outweigh that cost.
Why has OpenAI been acquiring "application" layer companies for large financial sums, instead of improving their own tools to build application layer codebases?
> 30% of new lines of code is AI-generated
"Watching AI drive Microsoft employees insane", 500 comments, https://news.ycombinator.com/item?id=44050152
Your examples of people "not giving a shit" about things the OP says are important & Google/Meta boasting about AI use are all revenue driven; people in leadership roles commonly place company revenue over product quality, in which case they shouldn't give a shit about the OP's topic.
As an engineer / IC I care about product quality because that's what gives me personal fulfillment. As a founder I care about product quality because I entered into this enterprise to solve a problem, not to sell a solution. Many people do the latter (very successfully) & this article isn't for those people. But it's relevant to me.
Both are companies with heavy investment into their AI products, hence an extremely biased view. I’d take that with a huge grain of salt.
> assume in 5 years these AIs are 10x better... I simply cannot foresee a world
Over the last 3ish years the improvements to the performance were significant, but incremental. What makes you think that the models will continue to improve at a substantially faster rate than today? Especially considering past releases have already demonstrated the diminishing returns with larger models and more computational power. Then there is also the pressure on the training data: downwards quality as well as ongoing litigation. From my POV there is more reason to believe the future development of LLMs will slow down somewhat rather than accelerate significantly.
You realize that Google and Meta sell AI products, right? So what you cited is effectively an ad campaign. Also the 30% NEW code is likely whittled down to 5-10% added to production after heavy edits. The devil is in the omitted details. :)
> In all the companies I've seen engineering leadership hardly really gives a shit about things OP says are important.
People put too much stock in “engineers” and “managers”. What we’re really talking about here is in the realm of sociology and psychology.
I think there’s a lot of evidence (just ask anyone in academia) that AI is already diminishing people’s ability to think for themselves.
There’s a lot of power in AI it’s true—but let’s not get blinded by the gold and leave everyone bloodied in the muck.
I wonder where the author got that feeling. What recent LLMs have proved time and time again is that they are definitely able to work at a conceptual level (by correctly translating concepts from one language to another depending on the context, for example). Saying it doesn't "understand" the concepts as humans do is a different thing. It wouldn't "understand" pain, because it has no experience of it. But humans constantly talk about things they've never personally experienced (and indeed maybe they shouldn't, but that's another topic).
This is a weak model of some features of concepts, eg., association: "dog" is associated with "cat", etc. But it, e.g., does not model composition, nor intension, nor the role of the term in counterfactuals. (See my comment elsewhere in this comments section on this issue).
However you can always brute force your way to apparent performance in some apparently conceptual skill if the kinds of questions you ask are similar to the training data. So eg., if someone has asked, "if dogs played on mars, would they be happy?" etc. or similar-enough-families-of-questions... then that allows you to have a "dog" cluster around "literal facts" and a "dog" cluster around some subset of pre-known counterfactuals.
If you want to see the difference between this and genuine mental capabilities, note that there is an infinite combination of concepts of arbitrary depth, which can be framed in an infinite number of counterfactuals, and so on. And a child armed with only those basic components, and the capacity for imagination, can evaluate this infinite variety.
This is why we see LLMs being used most by narrow fields (esp. software engineers) where the kinds of "conceptual work" that they need have been extremely well documented and are sufficiently stable to provide some utility.
So far, the ability of LLMs to manipulate concepts has been indistinguishable in practice from "true" human-level concept manipulation. And not just for scientific, "narrow" fields.
If I give a child a physics exam and they score 100%, it could either be because they're genuinely a genius (possessing all relevant capabilities and knowledge), or because they cheated. Suppose we don't know how they're cheating, but they are. Now, how would you find out? Certainly not by feeding them more physics exams; at least, it's easy enough to suppose they can cheat on those.
The issue here is that the LLM has compressed basically everything written in human history, and the question before us is "to what degree is a 'complex search' operation expressing a genuine capability, vs. cheating?"
And there is no general methodological answer to that question. I cannot give you a "test", not least because I'm required to give you it in token-in--token-out form (ie., written) and this dramatically narrows the scope of capability testing methods.
Eg., I could ask the cheating child to come to a physics lab and perform an experiment -- but I can ask no such thing from an LLM. One thing we could do with an LLM is have a physics-ignorant-person act as an intermediary with the LLM, and see if they, with the LLM, can find the charge on the electron in a physics lab. That's highly likely to fail with current LLMs, in my view -- because much of the illusion of their capability lies in the expertise of the prompter.
> has been indistinguishable in practice from "true" human-level concept manipulation
This claim indicates you're begging the question. We do not use the written output of animal's mental capabilities to establish their existence -- that would be a gross pseudoscience; so to say that LLMs are indistinguishable from anything relevant indicates you're not aware of what the claim of "human-level concept manipulation" even amounts to. It has nothing to do with emitting tokens.
When designing a test to see if an animal possesses a relevant concept, can apply it to a relevant situation, can compose it with other concepts, and so on -- we would never look to linguistic competence, which even in humans, is an unreliable proxy: hence the need for decades of education and the high fallibility of exams.
Rather if I were assessing "does this person understanding 'Dog'?" I would be looking for contextual competence in application of the concept in a very broad role in reasoning processes: identification in the environment, counterfactual reasoning, composition with other known concepts in complex reasoning processes, and the like.
All LLMs do is emit text as-if they have these capacities, which makes a general solution to exposing their lack of them, basically methodologically impossible. Training LLMs is an anti-inductive process: the more tests we provide, the more they are trained on them, so the tests become useless.
Consider the following challenge: there are two glass panels, one is a window; and the other is a very high def TV showing a video game simulation of the world outside the window. You are fixed at a distance of 20 meters from the TV, and can only test each glass pane by taking a photograph of it, and studying the photograph. Can you tell which window is the outside? In general, no.
This is the grossly pseudoscientific experimental restriction people who hype LLMs impose: the only tests are tokens-in, tokens-out -- "photographs taken at a distance". If you were about to be thrown against one of these glass panels, which would you choose?
If an LLM was, based on token in/out analysis alone, put in charge of a power plant: would you live nearby?
It matters if these capabilities exist, because if real, the system will behave as expected according to capabilities. If it's cheating, when you're thrown against the wrong window, you fall out.
LLMs are, in practice, incredibly fragile systems, whose apparent capabilities quickly disappear when the kinds of apparent reasoning they need to engage in are poorly represented in their training data.
Consider one way of measuring the capability to imagine that isn't token-in/token-out: energy use and time-to-compute.
Here, we can say for certain that LLMs do not engage in counterfactual reasoning. Eg., we can give a series of prompts (p1, p2, p3...) which require increasing complexity of the imagined scenario, eg., exponentially more diverse stipulations, and we do not find O(answering) to follow O(p-complexity-increase). Rather the search strategy is always the same for a single-shot prompt: so no trace thru an LLM involves simulation. We can just get "mildly above linear" (apparent) reasoning complexity with chain-of-thought, but this likewise does not follow the target O().
The kinds of time-to-compute we observe from LLM systems are entirely consistent with a "search and synthesis over token-space" algorithm, which only appears to simulate if the search space contains prior exemplars of simulation. There is no genuine capability.
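To make the shape of that measurement concrete, a rough sketch; the `complete` function is a placeholder for whatever model/client is under test, the prompt family is invented, and a serious version would have to control for caching, batching, and thinking-token budgets:

```python
import time

# Sketch: measure time-to-answer as a function of how much counterfactual
# structure the prompt demands. `complete` is a stand-in, not a real client.
def complete(prompt: str) -> str:
    return "placeholder answer"  # swap in a real API call to run the experiment

def build_prompt(depth: int) -> str:
    stipulations = ", and ".join(f"stipulation {i} holds" for i in range(1, depth + 1))
    return f"Imagine a scenario in which {stipulations}. What follows for a dog living there?"

for depth in (1, 2, 4, 8, 16):
    start = time.perf_counter()
    complete(build_prompt(depth))
    elapsed = time.perf_counter() - start
    # The claim in the thread: if answering involved genuinely simulating the scenario,
    # cost should track the scenario's complexity; a flat curve is evidence it does not.
    print(f"depth={depth:>2}  seconds={elapsed:.3f}")
```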
On the contrary, I strongly believe that what LLMs have proved is something linguists have always told us: that language provides a structure on top of which we build our experience of concepts (the Sapir-Whorf hypothesis).
I don't think one can conceptualize much without the use of a language.
Well a great swath of the animal kingdom stands against you.
LLMs have invited yet more of this pseudoscience. It's a nonsense position in an empirical study of mental capabilities across the animal kingdom. Something previously only believed by idealist philosophers of the early 20th century and prior. Now brought back so people can maintain their image in the face of their apparent self-deception: better we opt for gross pseudoscience than admit we're fooled by a text generation machine.
On the extreme, we can talk about things like Aphantasia, Synesthesia and colour blindness and understand the concepts even if we never experienced them.
I'm still trying to figure out how to produce a codebase using LLMs and end up as an expert in the system at the end of it while still coming out ahead. My hope is I can be more of an expert a bit faster than before, not less of an expert a lot faster.
It feels within reach to me as long as there's frequent periods of coming to a complete understanding of the code that's been produced and reshaping it to reduce complexity. As well as strong technical guidance as input to the LLM to begin with.
I think there's still a lot of learning to do about how to use these things. For example, I've tried LLM-heavy things in Lua, C, and C#. The LLM was great at producing stuff that works (at first) in Lua, but Lua was miserable and the code ended up a huge mess that I can't be bothered to become an expert in. The LLM was really tripped up on C and I didn't make it that far; I didn't want to watch it fight the compiler so hard. C# has been great: the LLM is reasonably effective and I have an easy time consuming and reshaping the LLM code.
I've always liked static type systems but I like them even more now, in part because they help the LLM produce better code, but mostly because they make it a lot easier to keep up to speed on code that was just produced, or to simplify it.
I also had a similar typed-language experience: switching from untyped to type-hinted Python made the outputs much easier to understand and assess.
Both often work with unclear requirements and sometimes face intermittent bugs which are hard to fix, but in most cases SWEs create software that is expected to always behave in a certain way. It is reproducible, can pass tests, and the tooling is more established.
MLEs work with models that are stochastic in nature. The usual tests aren't about models producing a certain output; they are about metrics, e.g., that the model produces the correct output in 90% of cases (evaluation). The tooling isn't as developed as for SWEs, and it changes more often.
So, for MLEs, working with AI that isn't always reliable is the norm. They are accustomed to thinking in terms of probabilities, distributions, and acceptable levels of error. Applying this mindset to a coding assistant that might produce incorrect or unexpected code feels more natural. They might evaluate it like a model: "It gets the code right 80% of the time, saving me effort, and I can catch the 20%."
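A minimal sketch of that mindset, with a hypothetical `assistant()` stand-in and hand-written cases; the point is that the thing under test is judged against a metric and a threshold, not an exact golden output:

```
# Hypothetical assistant under test; in practice this would call an LLM.
def assistant(task: str) -> str:
    return {"reverse 'abc'": "cba", "uppercase 'hi'": "HI"}.get(task, "not sure")

# Evaluation set: (input, checker) pairs rather than exact golden outputs.
cases = [
    ("reverse 'abc'", lambda out: out == "cba"),
    ("uppercase 'hi'", lambda out: out == "HI"),
    ("reverse 'xyz'", lambda out: out == "zyx"),
]

passed = sum(check(assistant(task)) for task, check in cases)
accuracy = passed / len(cases)
print(f"accuracy: {accuracy:.0%}")  # "it gets it right N% of the time"

# The MLE-style acceptance criterion is a threshold, not perfection:
assert accuracy >= 0.6
```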
As a concrete example, when I worked at Amazon, there were several really good ML-based solutions for very real problems that didn't have classical approaches to lean on. Motion prediction from grid maps, for example, or classification from imagery or grid maps in general. Very useful and well integrated in a classical estimation and control pipeline to produce meaningful results.
OTOH, when I worked at a startup I won't name, I was berated over and over by a low-level manager for daring to question a learning-based approach for, of all things, estimating the orientation of a stationary plane over time. The entire control pipeline for the vehicle was being fed flickering, jumping, ad hoc rotating estimates for a stationary object, because the entire team had never learned anything fundamental about mapping or filtering and was just assuming more data would solve the problem.
This divide is very real, and I wish there was a way to tease it out better in interviewing.
I'm curious: do you think there's any amount of high-quality data that could make the learning-based approach viable for orientation estimation? Or would it always be solving the wrong problem, regardless of data volume and delivery speed?
My sense is that effective solutions need the right confluence of problem understanding, techniques, data, and infrastructure. Missing any one piece makes things suboptimal, though not necessarily unsolvable.
In my current field (predictive maintenance), there are (in)famous examples and papers using multi-layer deep networks for solving anomaly detection problems, where a "single" line of basic Matlab code (standard deviations, etc.) performs better than the proposed AI solution. Publish or perish, I guess...
I think this is one reason software has such a flavor-of-the-month approach to development.
There are disclaimers everywhere.
Sure, there are use cases AI can't handle, but that doesn't mean it is not massively valuable. There is not a single thing in the world that can handle all use cases.
And given the current climate, the MLEs feel empowered to force their mindset onto other groups where it doesn't fit. I once heard a senior architect at my company ranting about that after a meeting: my employer sells products where accuracy and correctness have always been a huge selling point, and the ML people (in a different office) didn't seem to get that and thought 80-90% correct should be good enough for customers.
I'm reminded of the arguments about whether a 1% fatality rate for a pandemic disease was small or large. 1 is the smallest integer, but 1% of 300 million is 3 million people.
Accuracy rates, F1, anything, they're all just rough guides. The company cares about making money and some errors are much bigger than others.
We'd manually review changes for updates to our algos and models. Even with a golden set, breaking one case to fix five could be awesome or terrible.
I've given talks about this, my classic example is this somewhat imagined scenario (because it's unfair of me to accuse people of not checking at all):
It's 2015. You get an update to your classification model. Accuracy rates go up on a classic dataset, hooray! Let's deploy.
Your boss's boss's boss gets a call at 2am because you're in the news.
https://www.bbc.co.uk/news/technology-33347866
Ah. Turns out classification of types of dogs improved, but... that wasn't as important as this.
Issues and errors must be understood in context of the business. If your ML team is chucking models over the fence you're going to at best move slowly. At worst you're leaving yourself open to this kind of problem.
Through a career, SWEs start rigid and overly focused on the immediate problem and become flexible/error-tolerant[1] as they become system (mechanical or meat) managers. This maps to an observation that managers like AI solutions, because they compare favourably to the new hire, and because they have the context to make this observation.
[1] https://grugbrain.dev/#:~:text=grug%20note%20humourous%20gra...
I don't think it's the case with this article. It focuses on the meta-concerns of people doing software engineering and how AI fits into that. I think he hits it on the head when he talks about Program Entropy.
A huge part of building a software product is managing entropy. Specifically, how you can add more code and more people while maintaining a reasonable forward velocity. More specifically, you have to maintain the system so that all of those people understand how the pieces fit together and how to add more of them. Yes, I can see AI one day making this easier, but right now it oftentimes makes entropy worse.
Sorry not sorry that the rest of the world has to look over their shoulders.
I love the gray areas and probabilities and creativity of software...but not everyone does.
So the real danger is in everyone assuming the AI model is, must be, and always will be correct. They misunderstand the tool they are using (or directing others to use).
Hmm. It's like Autopilot on a Tesla. You aren't supposed to take your hands off the wheel. You're supposed to pay attention. But people use it incorrectly. If they get into an accident, then people want to blame the machine. It's not the machine's fault; it's the fault of the person who didn't read the instructions.
And actually, that's not wrong. People really do often struggle to navigate these days if they don't have the crutch of something like Google Maps. It really has changed our relationship to the physical world in many ways.
But also, a lot of people weren't especially good at navigation before? The overall average ability of people to get from Point A to Point B safely and reliably, especially in areas they are unfamiliar with, has certainly increased dramatically. And a small subset of people who are naturally skilled at geography and navigation have seen their own abilities complemented, not replaced, by things like Google Maps.
I think AI will end up being similar, on a larger scale. Yes, there are definitely some trade offs, and some skills and abilities will decrease, but also many more people will be able to do work they previously couldn't, and a small number of people will get even better at what they do.
Entirely anecdotal but I have found the opposite. With this mapping software I can go walk in a random direction and confidently course correct as and when I need to, and once I’ve walked somewhere the path sticks in my memory very well.
Driving still requires careful attention to other drivers, the world goes by rapidly, and most roads look like other roads.
The best tools are transparent. They are efficient, fast and reliable, yes, but they’re also honest about what they do! You can do everything manually if you want, no magic, no hidden internal state, and with internal parts that can be broken up and tested in isolation.
With LLMs even the simple act of comparing them side by side (to decide which to use) is probabilistic and ultimately based partly on feelings. Perhaps it comes with the territory, but this makes me extremely reluctant to integrate it into engineering workflows. Even if they had amazing abilities, they lower the bar significantly from a process perspective.
LLMs are perfectly reproducible. Almost all public services providing them are not. The fact that changing the model changes the output doesn't make it not reproducible, in the same way reproducible software packages depend on a set version of the compiler. But you can run a local model with zero temperature, set starting conditions and you'll get the same response every time.
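For what it's worth, a minimal sketch of that claim using Hugging Face transformers and greedy decoding (the temperature-0 case); assuming fixed model weights, library versions, and hardware, repeated runs yield identical tokens:

```
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("The maintenance burden of AI-generated code", return_tensors="pt")

# do_sample=False means greedy decoding: the arg-max token at every step,
# which is what "temperature 0" approximates. No randomness is involved.
out1 = model.generate(**inputs, max_new_tokens=30, do_sample=False)
out2 = model.generate(**inputs, max_new_tokens=30, do_sample=False)

assert torch.equal(out1, out2)  # same weights, same input, same output
print(tok.decode(out1[0]))
```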
My point is that it’s not a tool, because good tools reliably work the same way. If, for instance, a gun clicks when it’s supposed to fire, we would say that it malfunctioned. Or it fires when the safety is on. We can define what should happen, and if something else happens, there is a fault.
Is there evidence for this?
Then there is this Google Maps accident:
https://www.independent.co.uk/tv/news/driver-bridge-google-m...
Which tells you that following directions of a computer makes people more stupid.
Simply because the media didn't report on it....
My wife was very uncomfortable going to a new location via paper maps and directions. She’s perfectly happy following “bitching Betty” from the phone.
The problem is that mapping software is reliable and doesn't spit out a result of what is essentially a random number generator. You can rely on its output, the same way you can rely on a calculator. Not always, mind you, because mapping the entire globe is a massively complex task with countless caveats and edge cases, but compared to LLM output? Even with a temperature setting of 0 with the same prompt regenerated multiple times, you'll be getting vastly different output.
Also, since LLMs cover a much more broad swathe of concepts, people are going to be using these instead of their brains in a lot of situations where they really shouldn't. Even with maps, there are people out there that will drive into a lake because Google Maps told them that's where the street was, I can't even fathom the type of shit that's going to happen from people blindly trusting LLM output and supplanting all their thinking with LLM usage.
Actually, TSP is NP-hard (ie, at best, you never know whether you've been given the optimal route) in the general case, and Google maps might even give suboptimal routes intentionally sometimes, we don't know.
The problems you're describing are problems with people and they apply to every technology ever. Eg, people crash cars, blow up their houses by leaving the stove on, etc.
Not really.
I am not good at navigation yet love to walk around, so I use a set of maps apps a lot.
Google Maps is not reliable if you expect optimal routes, and its accuracy falls sharply if you're not traveling by car. Even then, bus lanes, priority lanes, time-limited areas, etc., will be a bloodbath if you expect Maps to understand them.
Mapping itself will often be inaccurate in any town that isn't frozen in time for decades, place names are often wrong, and it has no concept of verticality/3D space, short of switching to limited experimental views.
Paid dedicated map apps will in general work a lot better (I'm thinking hiking maps etc.)
All to say, I'd mostly agree with the parent about how fuzzy Maps is.
Er, no?
Google maps is 90% of the time better than a taxi driver where I live.
AI isn’t better than some person that did the thing for a couple days
One thing working with AI-generated code forces you to do is to read code -- development becomes more a series of code reviews than a first-principles creative journey. I think this can be seen as beneficial for solo developers, as in a way, it mimics / helps learn responsibilities only present in teams.
Another: it quickly becomes clear that working with an LLM requires the dev to have a clearly defined and well structured hierarchical understanding of the problem. Trying to one-shot something substantial usually leads to that something being your foot. Approaching the problem from a design side, writing a detailed spec, then implementing sections of it -- this helps to define boundaries and interfaces for the conceptual building blocks.
I have more observations, but attention is scarce, so -- to conclude. We can look at LLMs as a powerful accelerant, helping junior devs grow into senior roles. With some guidance, these tools make apparent the progression of lessons the more experienced of us took time to learn. I don't think it's all doom and gloom. AI won't replace developers, and while it's incredibly disruptive at the moment, I think it will settle into a place among other tools (perhaps on a shelf all of its own).
I also think that LLMs are an even more powerful accelerant for senior developers. We can prompt better because we know what exists and what to not bother trying.
Just look at recent news: layoff after layoff from big tech, mid-size tech, and small tech.
That's what I thought. The AI girlfriend/boyfriend app things seem to suggest otherwise
I don't get it, but apparently others do
If it works, like to hit a nail, you end up smashing everything in sight. If it fails, like digging a garden, you end up thinking it is stupid.
But there is a third case.
You use it to do something that you did not know you could do before. Like to planish metal.
People are experiencing the first and second cases.
Minor quibble: On the chart at top, "Inverse correlation" would show up as a hyperbola. The line is more of a negative correlation. Just sayin' :-)
They have a great faith in AI (which is understandable), but they're constantly realising that:
a) they don't understand any of the problems well enough to even begin prompting for a solution
b) the AI can explain our code but the manager still won't understand
c) the AI can rephrase our explanations and they still won't understand.
Traditionally middle-managers probably consoled themselves with the idea that the nerds can't communicate well and coding is a dumb arcane discipline anyway. But now that their machine god isn't doing a better job than we are of ELI5ing it, I think even they're starting to doubt themselves.
Stated with pride? Given the proofreading and critiquing abilities of AI this dubious boast is a useful signal for the head-in-sand arguments of the essay.
AI is a profound step change in our workflow abilities, a revolution already. Wise to be wary of it, but this reads as shouting at clouds.
The landscape has changed, we have to change with it.
> Only humans can decrease or resist complexity.
It's funny how often there's a genuine concept behind posts like these, but then lots of specific claims are plainly false. This is trivial to do: ask for simpler code. I use that quite often to get a second opinion and get great results. If you don't query the model, you don't get any answer, neither complex nor simple. If you query with default options, that's still a choice, not something inherent to the idea of an LLM.
I'm also having a great time converting code into ideas and diagrams and vice versa. Why make the strong claims that people contradict in practice every day now?
> If you’re doing a DIY project
Let me know what you're trying to achieve.
Which is basically the SO style question you mentioned.
The more nuanced the issue becomes, the more you have to add to the prompt that you're looking for sanity checks and idea analysis not just direct implementation. But it's always possible.
I frequently have the LLM write a proposal.MD first and iterate on that, then have it produce the full solution and iterate on that.
It's interesting to see whether the proposal comes out like I had in mind, and many times it uses tech or ideas that I didn't know about myself, so I am constantly learning too.
The reason LLMs are such a big deal is that they are humanity's first tool general enough to support recursion (besides humans, of course). If you can use an LLM, there's something like a 99% chance you can program another LLM to use that LLM in the same way you do:
People learn the hard way how to properly prompt an LLM agent product X to achieve results -> some company is going to encode these learnings in a system prompt -> we now get a new agent product Y that is capable of using X just like a human -> we no longer use X directly. Instead, we move up one level in the command chain, to use product Y instead. And this recursion goes on and on, until the world doesn't have any level left for us to go up to.
We are basically seeing this play out in realtime with coding agents in the past few months.
Well yes, LLMs are not teleological, nor inventive.
" Is there an “inventiveness test” that humans can pass but LLMs don’t?"
Of course: any topic where there is no training data available and that cannot be extrapolated by simply mixing the existing data. Admittedly that is harder to test on current unknowns and unknown unknowns.
But it is trivial to test on retrospective knowledge. Just train the AI on text up to, say, 1800 and see if it can come up with antibiotics and general relativity, or if it will simply repeat outdated notions of disease theory and Newtonian gravity.
LLMs are blank slates (like an uncultured, primitive human being; an LLM does come with built-in knowledge, but that built-in knowledge is mostly irrelevant here). LLM output is purely a function of the input (context), so an agentic system's capabilities do not equal the underlying LLM's capabilities.
If you ask such an LLM "overturn Newtonian physics, come up with a better theory", of course the LLM won't give you relativity just like that. The same way an uneducated human has no chance of coming up with relativity either.
However, ask it this:
``` You are Einstein ... <omitted: 10 million tokens establishing Einstein's early life and learnings> ... Recent experiments have put these ideas to doubt, ...<another bunch of tokens explaining the Michelson–Morley experiment>... Any idea why this occurs? ```
and provide it with tools to find books, speak with others, run experiments, etc. Conceivably, the result will be different.
Again, we pretty much see this play out in coding agents:
Claude the LLM has no prior knowledge of my codebase so of course it has zero chance of solving a bug in it. Claude 4 is a blank slate.
Claude Code the agentic system can:
- look at a screenshot.
- know what the overarching goal is from past interactions & various documentation it has generated about the codebase, as well as higher-level docs describing the company and products.
- realize the screenshot is showing a problem with the program.
- form hypothesis / ideate why the bug occurs.
- verify hypotheses by observing the world ("the world" to Claude Code is the codebase it lives in, so by "observing" I mean it reads the code).
- run experiments: modify code then run a type check or unit test (although usually the final observation is outsourced to me, so I am the AI's tool as much as the other way around.)
The article also repeats some weird arguments that are superficially true but don't stand up to scrutiny. That Naur thing, which is a meme at this point, is often repeated as somehow insightful, yet it forgets another fundamental, practical rule of software engineering: any nontrivial program quickly exceeds anyone's ability to hold a full theory of it in their head. We almost never work with a proper program theory; programming languages, techniques, methodologies, and tools all evolve towards enabling people to work better without understanding most of the code. We actually share the same limitations as LLMs here; we're just better at managing them, because we don't have to wait for anyone to let us do another inference loop before we can take a different perspective.
Etc.
Do you do it manually or a have automated tool? (I am looking for the latter.)
LLMs can explain the code they generate if you just ask - they never run out of patience. You can ask how it could be made faster, then ask why it did those specific things.
AI lets those with initiative shine so bright they show everyone else how it’s done.
> Only humans can decrease or resist complexity.
For a simple program, maintenance is naturally entropy-increasing: you add an `if` statement for an edge case, and the total number of paths/states of your program increases, which increases entropy.
But in very large codebases it's more fluid, and I think LLMs have the potential to massively _reduce_ complexity by recommending places where state or logic should be decoupled into a separate package (for example, where a similar method is being called in multiple places in the codebase). This is something that can be difficult to do "as a human" unless you happen to have worked in those packages recently and are cognizant of the pattern.
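A toy sketch of the kind of decoupling meant here (hypothetical names; the point is the shape of the refactor, which a tool that has seen both call sites can suggest):

```
# Before: the same normalization logic is re-implemented at two call sites,
# so every edge case added to one copy increases entropy in both.
def handle_signup(email: str) -> str:
    return email.strip().lower()

def handle_invoice(email: str) -> str:
    return email.strip().lower()

# After: the shared logic lives in one place (in a large codebase, a
# separate module/package), so new edge cases are added exactly once.
def normalize_email(email: str) -> str:
    return email.strip().lower()

def handle_signup_v2(email: str) -> str:
    return normalize_email(email)

def handle_invoice_v2(email: str) -> str:
    return normalize_email(email)

print(handle_signup_v2("  Alice@Example.COM "))  # alice@example.com
```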
In the GPT-2 days I used to read GPT-generated text a lot; I was working on a game where you guess whether text is AI-generated or not, and for weeks while I was working on it I had strange dreams. It went away when I stopped consuming tokens. In the GPT-4 age this doesn't happen, even though I am reading hundreds of times more tokens than back then, but I think the effect is just more subtle.
Now I use AI to generate thousands of lines of code per day at minimum. Sometimes I just blank out when the AI doesn't spit out the code fast enough: I don't know what I am supposed to write, which libraries it is using, what the goal of this whole function is, etc. It is not my code; it is foreign, and I honestly don't want to be reading it at all.
This week I took the whole week off work and am just coding without AI, and in a few days the "blank out" is gone. Well, I did use AI to read the 300-page docs of the st7796s and write a barebones SPI driver, for example, but I treat it almost as an external library: I give it the docs and an example driver and it just works, but it is somewhat external to my thought process.
People argue that all fields have evolved, e.g. there are no more blacksmiths. But I argue that machinists now are much more sophisticated than the ones in the past: pneumatic hammers allow them to work better and faster, and as they use the hammer they get a better understanding of the material they work with; the machine does not take away their experience and ability to learn. I always had two days per week where I code without any AI, but now I think I have to change the way I code with it.
AI is for sure making me worse, and lazy. And I am not talking about the "career" here, I am talking about my ability to think.
I wrote few days ago about it: https://punkx.org/jackdoe/misery.html
I’m not an LLM user myself, but I’m slowly incorporating (forcing myself, really) AI into my workflow. I can see how AI as a tool might add value; not very different from, say, learning to touch-type or becoming proficient in Vim.
What is clear to me is that powerful tools lower entry barriers. Think Python vs C++. How many more people can be productive in the former vs the latter? It is also true that powerful tools lend themselves to potentially shitty products. C++ that is really shitty tends to break early, if it compiles at all, whereas Python is very forgiving. Intellisense is another such technology that lowers barriers.
Python itself is a good example of what LLMs can become. Python went from a super powerful tool in a jack-of-trades-master-of-none sort of way, to a rich data DSL driven by Numpy, Scipy, Pandas, Scikit, Jupyter, Torch, Matplotlib and many others; then it experienced another growth spurt with the advent of Rust tooling, and it is still improving with type checkers, free threading and even more stuff written in Rust - but towards correctness, not more power.
I really do hope that we can move past the current fomo/marketing/bullshit stage at some point, and focus on real and reproducible productivity gains.
Quoth the makers of Claude:
> AI systems are no longer just specialized research tools: they’re everyday academic companions.
> https://www.anthropic.com/news/anthropic-education-report-ho...
To call Anthropic's opener brazen, obnoxious, or euphemistic would be an understatement. I hope it ages like milk, as it deserves, and so embarrasses the company that it eventually requires a corrective foreword or retraction.
I believe there’s a small minority of people that truly believes AI is a friend, but I would say it’s a psychological pathology.
I don’t bother trying to guess what large companies really think: a/ they’re made of so many different stakeholders I don’t think it’s possible. And b/ I know money is the most important thing if they are large enough and have lots of anonymous investors, I don’t need to know anything else.
Replika AI
I often found myself adding "use built-in features if they exist", just because of this type of scenario.
It unsettles me that some people feel okay always accepting AI code, even when it "works".
Mirroring the example provided in the article: I once saw a 200-line class implementation for tridiagonal matrices in Python, where a simple NumPy command would suffice and perform an order of magnitude better.
In practice, I find this approach reduces productivity in favor of gaining a deeper understanding of how things work - instead of just naively using LAPACK/BLAS based libraries, one 'wastes time' diving into how they work internally, which previously would have been very opaque.
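To make that comparison concrete, a hedged sketch of the short alternative, assuming the hand-rolled class was ultimately solving tridiagonal linear systems (SciPy's banded solver rather than literally one NumPy command):

```
import numpy as np
from scipy.linalg import solve_banded

n = 5
main = 2.0 * np.ones(n)        # main diagonal
upper = -1.0 * np.ones(n - 1)  # superdiagonal
lower = -1.0 * np.ones(n - 1)  # subdiagonal
b = np.ones(n)                 # right-hand side

# Banded storage for a tridiagonal matrix: rows are (upper, main, lower),
# padded so every row has length n.
ab = np.zeros((3, n))
ab[0, 1:] = upper
ab[1, :] = main
ab[2, :-1] = lower

x = solve_banded((1, 1), ab, b)  # O(n) solve instead of a 200-line class
print(x)
```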
These are tools, it's up to you how you use them. Kind of like compilers, really.
If you are able to make such deductions then you should also be able to deduce that almost nobody employed in a "software engineering" role is doing any actual engineering.
The article assumes that software companies are hiring software engineers (where "engineer" actually means what it does everywhere else) when in reality most software companies are not hiring any kind of actual engineer.
And if engineers can't be replaced by AI, but you're not actually an engineer, can you be replaced by AI?
Now I don't know the answer for sure, but I'd say for most people in "software engineering" roles the answer is still no, at least for now. But the reasons definitely can't have anything to do with whether AI can do engineering.
As a final note: I'd recommend anyone in a software engineering role, who thinks they do "actual engineering", to actually study some actual engineering discipline. You most likely have the necessary level of intelligence to get started. But you will quickly find that electrical engineering, electronics engineering, mechanical engineering, structural engineering, drainage engineering, etc, are nothing like your actual day to day job in _very fundamental_ ways.
Once you get beyond the simplest code, you are practicing tradeoffs and balance. You can use a simple, memory-intensive algorithm, but you need to understand whether you have the space to use it. You might be able to develop software in 1/2 the time if you take a basic approach, but it won't scale.
I don't know if you develop software or not. Regardless think more deeply about what is involved in engineering.
I notice you said you hold degrees. But which of these two disciplines do you actually work in?
We all like to think that we are grand architects finely honing software for the ages, when in reality we are really just specifying that the right grade of gravel is being used in the roadbed so that you don't get potholes.
Software engineering is like deciding after you built the bridge that it needs to now be able to open. Oh and the bridge is now busy and used heavily so we can't allow for any downtime while you rebuild the bridge to open.
And as much as we hope for standards so everything is cookie-cutter, there are idiosyncrasies all over the place, so no project is ever really the same.
These days I think of software more like writing fiction than engineering
Also bridges are never done. They are continually inspected and refurbished. Every bridge you build has an on-going cost. Just like software.
Every line you accept without understanding is borrowed comprehension, which you’ll repay during maintenance with high interest. It feels like free velocity. But it's probably more like tech debt at ~40% annual interest. As a tribe, we have to figure out how to use AI to automate typing and NOT thinking.
Or would be, if the LLM actually understood what it was writing, using the same definition of understanding that applies to human engineers.
Which it doesn't, and by its very MO, cannot.
So, every line from an LLM that is accepted without understanding is really nonexistent comprehension. It's a line of code spat out by a stochastic model, and until some entity that can actually comprehend a codebase's context, systems, and design reads and understands it (and currently the only known entity that can do that is a human being), it remains un-comprehended.
And the “rule of three” basically ceases to be applicable between components: either the code has localized impact, or it is part of a rock-solid foundational library. Intermediate cases just explode the refactoring complexity.
With LLM assistance, it might become easier to maintain a Markdown file containing the “theory” of a program. But what should be in the file?
For me, writing code has never ever been the challenge. Deciding what to write has always been the challenge.
I have this folder of academic papers from when access was free during covid which is enough to keep me busy for quite a while. Usually I get caught up with the yak shaving and never really progress on whatever I was intending to work on but now I have this super efficient yak shaver so I can, umm, still get caught up with the yak shaving.
But, alas, shaving yaks and arguing with stupid robots makes me happy so...
(A movie studio executive who believes a screen writer has been harassing him with threatening postcards has just murdered the screenwriter the previous night. The executive arrives late at a mid-morning studio meeting as other executives argue about the lavishness of writers' fees)
Movie studio boss: Griffin, you're here! We were just discussing the significance of writing. Maybe you have something to add?
Executive: I was just thinking what an interesting concept it is to eliminate the writer from the artistic process. If we could just get rid of these actors and directors, maybe we've got something here.
(Assistant hands the executive another postcard)
—
Re AI chat:
High school students who are illiterate now use AI chat to orchestrate the concoction of course papers they can't understand, then instructors use AI chat to grade the work. The grads apply for jobs and get selected by AI chat, then interviewed by managers using AI chat. The lucky hires are woken up by their IOs and transported to an office at an unknown destination by autonomous cars. Their work is to follow scripts emitted by AI chat to apply AI chat to guide refinement of processes, further feeding LLMs. Once returned to the home cubicle after the shift, the specialist consumes the latest cartoons and sexts with an AI companion. The IO sounds the alarm and it's time to sleep.
If the MCP guys can just cut out the middle men, I think we've got something here!
—
The threat of the new machines is not AGI overlords who will exterminate an optional humanity. The threat is an old one that's been proven over millennia of history: the conversion of human beings into slaves.
Even if all the wonders were true that people love to believe in about LLMs, you cannot get around this argument.
> Input Risk. An LLM does not challenge a prompt which is leading or whose assumptions are flawed or context is incomplete. Example: An engineer prompts, "Provide a thread-safe list implementation in C#" and receives 200 lines of flawless, correct code. It's still the wrong answer, because the question should have been, "How can I make this code thread-safe?" and whose answer is "Use System.Collections.Concurrent" and 1 line of code. The LLM is not able to recognize an instance of the XY problem because it was not asked to.
When I prompt Gemini 2.5 Pro with "Provide a thread-safe list implementation in C#" it _does_ push back and suggest using the standard library instead (in addition to providing the code of course). First paragraph of the LLM response:
> You can achieve a thread-safe list in C# by using the lock statement to synchronize access to a standard List<T>. Alternatively, you can use concurrent collection classes provided by .NET, such as ConcurrentBag<T> or ConcurrentQueue<T>, depending on the specific access patterns you need.
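As an aside, the same "reach for the standard library" answer sketched in Python rather than C# (a hypothetical analogue of the shape of the fix, not the article's example):

```
import queue
import threading

# If the access pattern is producer/consumer, the standard library already
# provides a thread-safe structure:
q = queue.Queue()
q.put("work item")
print(q.get())

# If random access to a list really is needed, a lock around the shared
# list is the short answer rather than a custom class:
items, lock = [], threading.Lock()

def append_safely(value):
    with lock:
        items.append(value)

append_safely(42)
print(items)  # [42]
```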
That's not categorically true: if a theory/design fits in their context window it's possible that they _can_ master it.
LLMs shine for simple tasks with little context. There are plenty of tasks like that.
Proof needed
Yet The Prodigy made good albums entirely with Reason.
If their whole job is throwing together disposable demos and 1-use scripts, I could believe 4x. But in the normal case, where the senior engineer's time is mostly spent wrangling a legacy codebase into submission? I just don't see LLMs having that level of effect.
this outcome is far from certain (unless you're a slopper, in which case it's obviously happening tomorrow)