Sure some of this comes from a lack of education.
But similar to crypto these movements only have value if the value is widely perceived. We have to work to continue to educate, continue to question, continue to understand different perspectives. All in favor of advancing the movement and coming out with better tech.
I am a supporter of both but I agree with the reference in the article to both becoming echo chambers at times. This is a setback we need to avoid.
Even crypto people didn’t dogfood their crypto like that, on their own critical path.
Is that the official cutesy name for people working there? That feels so 2020 ...
The really difficult and valuable parts of the codebase are very very far beyond what the current LLMs are capable of, and believe me, I’ve tried!
Writing the majority of the code is very different from creating the majority of the value.
And I really use and value LLMs, but they are not replacing me at the moment.
Does it? Or does their marketing tell you that? Strange that "most code is written by Claude" and yet they still hire actual humans for every position from backend to API to desktop to mobile clients.
> How much babysitting and reviewing is undetermined; but the Ants seem to tremendously prefer the workflow.
So. We know nothing about their codebase, actual flows, programming languages, depth and breadth of usage, how much babysitting is required...
Whether or not we can get to 100% using LLMs is an open research problem and far from guaranteed. If we can't, it's unclear if it will ever really proliferate the way people hope. That last 5% makes a big difference in most non-niche use cases…
We don't know enough about how LLMs work or about how human reasoning works for this to be at all meaningful. These numbers quantify nothing but wishes and hype.
Considering LLMs have zero capacity for reasoning, I can't decide if it's a bad take, or a stab at the average human's level of reasoning.
In all seriousness, the actual numbers vary from 13% to 26%: https://fortune.com/2025/02/12/openai-deepresearch-humanity-...
My take is that there are fundamental limitations to pigeonholing reasoning into LLMs, which are essentially a very, very advanced autocomplete, and that's why those percentages won't jump much anytime soon.
This is very typical of naive automation, people assume that most of the work is X and by automating that we replace people, but the thing that's automated is almost never the real bottleneck. Pretty sure I saw an article here yesterday about how writing code is not the bottleneck in software development, and it holds everywhere.
Of course people will either love AI or hate AI - and some don’t care. I am cautious especially when people say ‘AI is here to stay’. It takes away agency.
includes the third law, which seems on topic and reads:
"Any sufficiently advanced technology is indistinguishable from magic."
The people I have talked to at length about using AI tools claim that it has been a boon for productivity: a nurse, a doctor, three (old) software developers, a product manager, and a graduate student in Control Systems.
It is entirely believable that it may not, on average, help the average developer.
I'm reminded of the old joke that ends with "who are you going to believe, me or your lying eyes?"
But that sets the expectation way too high. Partly it is due to Amdahl's law: I spend only a portion of my time coding, and far more time thinking and communicating with others who are customers of my code. Even if it does make the coding 10x faster (and it doesn't most of the time), overall my productivity is 10-15% better. That is nothing to sneeze at, but it isn't 10x.
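Back-of-the-envelope, Amdahl-style (the 15% coding share here is an assumed illustrative split, not a measurement):

    # Amdahl's law: speeding up only the coding fraction of the job
    coding_fraction = 0.15   # assumed share of time spent actually writing code
    coding_speedup = 10      # optimistic 10x on the coding part alone
    overall = 1 / ((1 - coding_fraction) + coding_fraction / coding_speedup)
    print(f"overall speedup: {overall:.2f}x (~{(overall - 1) * 100:.0f}% faster)")
    # -> overall speedup: 1.16x (~16% faster), nowhere near 10x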
It is something to sneeze at if you are 10-15% more expensive to employ due to the cost of the LLM tools. The total cost of production should always be considered, not just throughput.
How is one spending anywhere close to 10% of total compensation on LLMs?
Claude Max is $200/month, or ~2% of the salary of an average software engineer.
Similar situation at my work, but all of the productivity claims from internal early adopters I've seen so far are based on very narrow ways of measuring productivity, and very sketchy math, to put it mildly.
The AI thing kind of reminds me of the big push to outsource software engineers in the early 2000's. There was a ton of hype among executives about it, and it all seemed plausible on paper. But most of those initiatives ended up being huge failures, and nearly all of those jobs came back to the US.
People tend to ignore a lot of the little things that glue it all together that software engineers do. AI lacks a lot of this. Foreigners don't necessarily lack it, but language barriers, time zone differences, cultural differences, and all sorts of other things led to similar issues. Code quality and maintainability took a nosedive and a lot of the stuff produced by those outsourced shops had to be thrown in the trash.
I can already see the AI slop accumulating in the codebases I work in. It's super hard to spot a lot of these things that manage to slip through code review, because they tend to look reasonable when you're looking at a diff. The problem is all the redundant code that you're not seeing, and the weird abstractions that make no sense at all when you look at it from a higher level.
Management thinks the LLM is doing most of the work. Work is off shored. Oh, the quality sucks when someone without a clue is driving. We need to hire again.
So? It sounds like you're prodding us to make an extrapolation fallacy (I don't even grant the "10x in 12 months" point, but let's just accept the premise for the sake of argument).
Honestly, 12 months ago the base models weren't substantially worse than they are right now. Some people will argue with me endlessly on this point, and maybe they're a bit better on the margin, but I think it's pretty much true. When I look at the improvements of the last year with a cold, rational eye, they've been in two major areas:
* cost & efficiency
* UI & integration
So how do we improve from here? Cost & efficiency are the obvious lever with historical precedent: GPUs kinda suck for inference, and costs are (currently) rapidly dropping. But maybe this won't continue -- algorithmic complexity is what it is, and barring some revolutionary change in the architecture, LLMs are exponential algorithms.

UI and integration is where most of the rest of the recent improvement has come from, and honestly, this is pretty close to saturation. All of the various AI products already look the same, and I'm certain that they'll continue to converge to a well-accepted local maximum. After that, huge gains in productivity from UX alone will not be possible. This will happen quickly -- probably in the next year or two.
Basically, unless we see a Moore's law of GPUs, I wouldn't bet on indefinite exponential improvement in AI. My bet is that, from here out, this looks like the adoption curve of any prior technology shift (e.g. mainframe -> PC, PC -> laptop, mobile, etc.) where there's a big boom, then a long, slow adoption for the masses.
But seriously: If you find yourself agreeing with one and not the other because of sourcing, check your biases.
If you're going to call all of that not substantial improvement, we'll have to agree to disagree. Certainly it's the most rapid rate of improvement of any tech I've personally seen since I started programming in the early '00s.
To be quite honest, I’ve found very little marginal value in using reasoning models for coding. Tool usage, sure, but I almost never use “reasoning” beyond that.
Also, LLMs still cannot do basic math. They can solve math exams, sure, but you can’t trust them to do a calculation in the middle of a task.
Then Gemini 2.5 pro (the first one) came along and suddenly this was no longer the case. Nothing hallucinated, incredible pattern finding within the poems, identification of different "poetic stages", and many other rather unbelievable things — at least to me.
After that, I realized I could start sending in more of those "hard to track down" bugs to Gemini 2.5 pro than other models. It was actually starting to solve them reliably, whereas before it was mostly me doing the solving and models mostly helped if the bug didn't occur as a consequence of very complex interactions spread over multiple methods. It's not like I say "this is broken, fix it" very often! Usually I include my ideas for where the problem might be. But Gemini 2.5 pro just knows how to use these ideas better.
I have also experimented with LLMs consuming conversations, screenshots, and all kinds of ad-hoc documentation (e-mails, summaries, chat logs, etc) to produce accurate PRDs and even full-on development estimates. The first one that actually started to give good results (as in: it is now a part of my process) was, you guessed it, Gemini 2.5 pro. I'll admit I haven't tried o3 or o4-mini-high too much on this, but that's because they're SLOOOOOOOOW. And, when I did try, o4-mini-high was inferior and o3 felt somewhat closer to 2.5 pro, though, like I said, much much slower and...how do I put this....rude ("colder")?
All this to say: while I agree that perhaps the models don't feel like they're particularly better at some tasks which involve coding, I think 2.5 pro has represented a monumental step forward, not just in coding, but definitely overall (the poetry example, to this day, still completely blows my mind. It is still so good it's unbelievable).
My weapon of choice these days is Claude 4 Opus but it's slow, expensive and still not massively better than good old 3.5 Sonnet
You had to paste more into your prompts back then to make the output work with the rest of your codebase, because there weren't good IDEs/"agents" for it, but you've been able to get really really good code for 90% of "most" day to day SWE since at least OpenAI releasing the ChatGPT-4 API, which was a couple years ago.
Today it's a lot easier to demo low-effort "make a whole new feature or prototype" things than doing the work to make the right API calls back then, but most day to day work isn't "one shot a new prototype web app" and probably won't ever be.
I'm personally more productive than 1 or 2 years ago now because the time required to build the prompts was slower than my personal rate of writing code for a lot of things in my domain, but hardly 10x. It usually one-shots stuff wrong, and then there's a good chance that it'll take longer to chase down the errors than it would've to just write the thing - or only use it as "better autocomplete" - in the first place.
Your developers still push a mouse around to get work done? Fire them.
AI is the new uplift. Embrace and adapt, as a rift is forming (see my talk at https://ghuntley.com/six-month-recap/), in what employers seek in terms of skills from employees.
I'm happy to answer any questions folks may have. Currently AFK [2] vibecoding a brand new programming language [1].
[1] https://x.com/GeoffreyHuntley/status/1940964118565212606 [2] https://youtu.be/e7i4JEi_8sk?t=29722
Frankly, even just getting engineers to agree on those super-specific standardized patterns is asking a ton, especially since lots of the things that help AI out are not what they are used to. As soon as you have stuff that starts deviating, it can confuse the AI and makes that 10x no longer accessible. Also, no one would want to review the PRs I'd make for the changes I do on my "10x" local project... Maintaining those standards is already hard enough on my side projects: AI will naturally deviate and create noise, and the challenge is constructing systems to guide it so nothing deviates (since noise would lead to more noise).
I think it's mostly a rebalancing thing: if you have one or a couple of like-minded engineers who intend to do it, they can get that 10x. I do not see that EVER existing in any actual corporate environment, or even once you get more than like 4 people, tbh.
AI for middle management and project planning, on the other hand...
It’s not toxic for me to expect someone to get their work done in a reasonable amount of time with the tools available to them. If you’re an accountant and you take 5x the time to do something because you have beef with Excel, you’re the problem. It’s not toxicity to tell you that you are a bad accountant.
You don't sound like a great lead to me, but I suppose you could be working with absolutely incompetent individuals, or perhaps your soft skills need work.
My apologies but I see only two possibilities for others not to take the time to follow your example given such strong evidence. They either actively dislike you or are totally incompetent. I find the former more often true than the latter.
My apologies but that does not sound like good leadership to me. It actually sounds like you may have deficiencies in your skills as it relates to leadership. Perhaps in a few years we will have an LLM who can provide better leadership.
isn't this the entire LLM experience?
Everyone else who raises any doubts about LLMs is an idiot and you're 10,000x better than everyone else and all your co-workers should be fired.
But what's absent from all your comments is what you make. Can you tell us what you actually do in your >500k job?
Are you, by any chance, a front-end developer?
Also, a team-lead that can't fire their subordinates isn't a team-lead, they're a number two.
We should not be having to code special 'host is Ableton Live' cases in JUCE just to get your host to work like the others.
Can you please not fire any people who are still holding your operation together?
At this point I'd say about 1/3 of my web searches are done through ChatGPT o3, and I can't imagine giving it up now.
(There's also a whole psychological angle in how having LLM help sort and rubber-duck your half-baked thought makes many task seem much less daunting, and that alone makes a big difference.)
Once I decide I want to "think a problem through with an LLM", I often start with just the voice mode. This forces me to say things out loud — which is remarkably effective (hear, hear, rubber duck debugging) — and it also gives me a fundamentally different way of consuming the information the LLM provides me. Instead of being delivered a massive amount of text, where some information could be wrong, I instead get a sequential system where I can stop/pause the LLM/redirect it as soon as something makes me curious or as I find problems with what it said.
You would think that having this way of interacting would be limiting, as having a fast LLM output large chunks of information would let you skim through it and commit it to memory faster. Yet, for me, the combination of hearing things and, most of all, not having to consume so much potentially wrong info (what good is it to skim pointless stuff), ensures that ChatGPT's Advanced Voice mode is a great way to initially approach a problem.
After the first round with the voice mode is done, I often move to written-form brainstorming.
That may also be in part because LLMs are not as big of an accelerant for junior devs as they are for seniors (juniors don't know what is good and bad as well).
So if you give 1 senior dev a souped-up LLM workflow, I wouldn't be too surprised if they are as productive as 10 pre-LLM juniors. Maybe even more, because a bad dev can actually produce negative productivity (stealing from the senior), in which case it's infinity-x.
Even a decent junior is mostly limited to doing the low-level grunt work, which LLMs can already do better.
Point is, I can see how jobs could be lost, legitimately.
Precision machining is going through an absolute nightmare where the journeymen or master machinists are aging out of the work force. These were people who originally learned on manual machines, and upgraded to CNC over the years. The pipeline collapsed about 1997.
Now there are no apprentice machinists to replace the skills of the retiring workforce.
This will happen to software developers. Probably faster because they tend to be financially independent WAY sooner than machinists.
My comment is mainly to say LLMs are amazing in areas that are not coding, like brainstorming, blue sky thinking, filling in research details, asking questions that make me reflect. I treat the LLM like a thinking partner. It does make mistakes, but those can be caught easily by checking other sources, or even having another LLM review the conclusions.
I built something in less than 24h that I'm sure would have taken us MONTHS to just get off the ground, let alone to reach the polished version it's at right now. It can do all of the things that I absolutely can do, just faster. But the most impressive thing is that it can do all the things I cannot possibly do and would have had to hire up/contract out to accomplish--for far less money, time, and with faster iterations than if I had to communicate with another human being.
It's not perfect and it's incredibly frustrating at times (hardcoding values into the code when I have explicitly told it not to; outright lying that it made a particular fix, when it actually changed something else entirely unrelated), but it is a game changer IMO.
Would love to see it!
Of course, I was playing around with claude code, too, and I was fascinated how fun it can be and yes, you can get stuff done. But I have absolutely no clue what the code is doing and if there are some nasty mistakes. So it kinda worked, but I would not use that for anything "mission critical" (whatever this means).
It means projects like Cloudflare's new OAuth provider library. https://github.com/cloudflare/workers-oauth-provider
> "This library (including the schema documentation) was largely written with the help of Claude, the AI model by Anthropic. Claude's output was thoroughly reviewed by Cloudflare engineers with careful attention paid to security and compliance with standards. Many improvements were made on the initial output, mostly again by prompting Claude (and reviewing the results)."
the one that's a few weeks old and already has several CVEs due to the poor implementation?
The problem is that the LLM needs context for what you are doing, context that you won't (or are too lazy to) give in a chat with it a la ChatGPT. This is where Claude Code changes the game.
For example, you have a PCAP file where each UDP packet contains multiple messages.
How do you filter the IP/port/protocol/time? Use LLM, check the output
How do you find the number of packets that have patterns A, AB, AAB, ABB.... Use LLM, check the output
How to create PCAPs that only contain those packets for testing? Use LLM, check the output
Etc., etc. (a rough sketch of what one of these throwaway scripts can look like is below)
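A sketch of the kind of throwaway filtering script this produces (assuming scapy; the filename, IP, port, and time window are made up):

    from scapy.all import rdpcap, wrpcap, IP, UDP

    packets = rdpcap("capture.pcap")   # hypothetical capture file

    # keep only UDP packets matching a given source IP and port within a time window
    filtered = [
        p for p in packets
        if IP in p and UDP in p
        and p[IP].src == "10.0.0.5"
        and p[UDP].dport == 5000
        and 1720000000 <= float(p.time) <= 1720003600
    ]

    wrpcap("filtered.pcap", filtered)  # smaller PCAP for testing
    print(f"{len(filtered)} of {len(packets)} packets matched")

You still check the output (packet counts, a spot check in Wireshark), but you didn't have to remember the scapy API to get there.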
Since it can read your code, it is able to infer (because let's be honest, your work ain't special) what you are trying to do at a much better rate. In any case, the fact that you can simply ask "Please write a unit test for all of the above functions" means that you can help it verify itself.
I think it's dangerously easy to get misled when trying to prod LLMs for knowledge, especially if it's a field you're new to. If you were using a regular search engine, you could look at the source website to determine the trustworthiness of its contents, but LLMs don't have that. The output can really be whatever, and I don't agree it's necessarily that easy to catch the mistakes.
That said, don't use model output directly. Use it to extract "shibboleth" keywords and acronyms in that domain, then search those up yourself with a classical search engine (or in a follow-up LLM query). You'll access a lot of new information that way, simply because you know how to surface it now.
I started a job at a demanding startup and it’s been several months and I have still not written a single line of code by hand. I audit everything myself before making PRs and test rigorously, but Cursor + Sonnet is just insane with their codebase. I’m convinced I’m their most productive employee, and that’s not by measuring lines of code, which don’t matter; people who are experts in the codebase ask me for help with niche bugs I can narrow in on in 5-30 minutes as someone who’s fresh to their domain. I had to lay off taking work away from the front-end dev (which I’ve avoided my whole career) because I was stepping on his toes, fixing little problems as I saw them thanks to Claude. It’s not vibe coding - there’s a process of research and planning and perusing in careful steps, and I set the agent up for success. Domain knowledge is necessary. But I’m just so floored how anyone could not be extracting the same utility from it. It feels like there are two articles like this every week now.
Links please
This is _far_ from web crud.
Otherwise, 99% of my code these days is LLM generated; there are a fair number of visible commits from my open source work on my profile https://github.com/wesen .
A lot of it is more on the system side of things, although there are a fair amount of one-off webapps, now that I can do frontends that don't suck.
That was my experience with Cursor, but Claude Code is a different world. What specific product/models brought you to this generalization?
Someone told me ‘AI makes all the little things trivial to do’ and I agree strongly with that. Those many little things are things that together make a strong statement about quality. Our codebase has gone up in quality significantly with AI, whereas we’d let the little things slide due to understaffing before.
The point is writing that prompt takes longer than writing the code.
> Someone told me ‘AI makes all the little things trivial to do’ and I agree strongly with that
Yeah, it's great for doing all of those little things. It's bad at doing the big things.
I personally think you’re sugar coating the experience.
The person you're responding to literally said, "I audit everything myself before making PRs and test rigorously".
Your specific experience cannot be generalized. And I'm speaking as the author, who is (as written in the article) literally using these tools every day.
> But I’m just so floored how anyone could not be extracting the same utility from it. It feels like there’s two articles like this every week now.
This is where we learn that you haven't actually read the article. Because it is very clearly stating, with links, that I am extracting value from these tools.
And the article is also very clearly not about extracting or not extracting value.
That's where the author lost me as well. I'd really be interested in a deep dive on their workflow/tools to understand how I've been so unbelievably lucky in comparison.
It's a play on the Anchorman joke that I slightly misremembered: "60% of the time it works 100% of the time"
> is where I lost faith in the claims you’re making.
Ah yes. You lost faith in mine, but I have to have 100% faith in your 100% unverified claim about "job at a demanding startup" where "you still haven't written a single line of code by hand"?
Why do you assume that your word and experience is more correct than mine? Or why should anyone?
> you did not outline your approaches and practices in how you use AI in your workflow
No one does. And if you actually read the article, you'd see that is literally the point.
I'll give some context, though.
- I use OCaml and Python/SQL, on two different projects.
- Both are single-person.
- The first project is a real-time messaging system, the second one is logging a bunch of events in an SQL database.
In the first project, Claude has been... underwhelming. It casually uses C idioms, overuses records and procedural programming, ignores basic stuff about the OCaml standard library, and even gave me some data structures that slowed me down later down the line. It also casually lies about what functions do.
A real example: `Buffer.add_utf_8_uchar` adds the ASCII representation of a UTF-8 character to a buffer, so it adds something that looks like `\123\456` for non-ASCII.
I had to scold Claude for using this function to add a UTF-8 character to a Buffer so many times I've lost count.
In the second project, Claude really shone: making most of the SQL database, moving most of the logic into the SQL engine, writing coherent and readable Python code, etc.
I think the main difference is that the first one is an arcane project in an underdog language. The second one is a special case of a common "shovel through lists of stuff and put it into SQL" problem, in the most common language.
You basically get what you trained for.
It doesn't take away the requirements of _curation_ - that remains firmly in my camp (partially what a PhD is supposed to teach you! to be precise and reflective about why you are doing X, what do you hope to show with Y, etc -- breakdown every single step, explain those steps to someone else -- this is a tremendous soft skill, and it's even more important now because these agents do not have persistent world models / immediately forget the goal of a sequence of interactions, even with clever compaction).
If I'm on my game with precise communication, I can use CC to organize computation in a way which has never been possible before.
It's not easier than programming (if you care about quality!), but it is different, and it comes with different idioms.
How do you measure this?
How do you audit code from an untrusted source that quickly? LLMs do not have the whole project in their heads and are prone to hallucinating.
On average how long are your prompts and does the LLM also write the unit tests?
You didn't share any evidence with us even though you claim unbelievable things.
You even went as far as registering a throwaway account to hide your identity and to make verifying any of your claims impossible.
Your comment feels more like a joke to me
Damn, this sounds pretty boring.
_So much_ work in the 'services' industries globally comes down to really a human transposing data from one Excel sheet to another (or from a CRM/emails to Excel), manually. Every (or nearly every) enterprise scale company will have hundreds if not thousands of FTEs doing this kind of work day in day out - often with a lot of it outsourced. I would guess that for every 1 software engineer there are 100 people doing this kind of 'manual data pipelining'.
So really, for giant value to be created out of LLMs you do not need them to be incredible at OCaml. They just need to ~outperform humans on Excel. Where I do think MCP really helps is that you can connect all these systems together easily, and a lot of the errors in this kind of work came from trying to pass the entire 'task' in context. If you can take an email via MCP, extract some data out, and put it into a CRM (again via MCP) a row at a time, the hallucination rate is very low IME. I would say at least at the level of an overworked junior human.
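As a rough illustration of the row-at-a-time pattern (everything here is hypothetical: the helper functions stand in for MCP connectors, and the model name and prompt are placeholders):

    import json
    from openai import OpenAI

    client = OpenAI()

    def fetch_next_email() -> str:
        # stand-in for an email connector (e.g. exposed via MCP)
        return "Hi, please update ACME Corp's renewal date to 2025-09-01. Thanks, Jo"

    def crm_insert_row(row: dict) -> None:
        # stand-in for a CRM connector (e.g. exposed via MCP)
        print("inserting row:", row)

    email = fetch_next_email()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Extract a single JSON object {company, field, value} from the email. Reply with JSON only."},
            {"role": "user", "content": email},
        ],
    )
    crm_insert_row(json.loads(resp.choices[0].message.content))

One small, checkable extraction per call is the point: the model never sees the whole task, so there is much less room to drift.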
Perhaps this was the point of the article, but non-determinism is not an issue for these kinds of use cases, given that the humans involved are not deterministic either. We can build systems and processes to help enforce quality on non-deterministic (e.g. human) systems.
Finally, I've followed crypto closely and also LLMs closely. They do not seem to be similar in terms of utility and adoption. The closest thing I can recall is smartphone adoption. A lot of my non technical friends didn't think/want a smartphone when the iPhone first came out. Within a few years, all of them have them. Similar with LLMs. Virtually all of my non technical friends use it now for incredibly varied use cases.
https://www.technologyreview.com/2025/05/20/1116823/how-ai-i...
https://hai.stanford.edu/news/hallucinating-law-legal-mistak...
https://www.reuters.com/technology/artificial-intelligence/a...
There are more of these stories every week. Are you using AI in a way that doesn’t allow you to be entrapped by this sort of thing?
The code coming out of LLMs is just as deterministic as code coming out of humans, and despite humans being fickle beings, we still talk of software engineering.
As for LLMs, they are and will forever be "unknowable". The human mind just can't comprehend what a billion parameters trained on trillions of tokens under different regimes for months corresponds to. While science has to do microscopic steps towards understanding the brain, we still have methods to teach, learn, be creative, be rigorous, communicate that do work despite it being this "magical" organ.
With LLMs, you can be pretty rigorous. Benchmarks, evals, and just the vibes of day to day usage if you are a programmer, are not "wishful thinking", they are reasonably effective methods and the best we have.
So, why can't we just come up with some definition for what AGI is? We could then, say, logically prove that some AI fits that definition. Even if this doesn't seem practically useful, it's theoretically much more useful than just using that term with no meaning.
Instead it kind of feels like it's an escape hatch. On wikipedia we have "a type of ai that would match or surpass human capabilities across virtually all cognitive tasks". How could we measure that? What good is this if we can't prove that a system has this property?
Bit of a rant but I hope it's somewhat legible still.
"AI is whatever hasn't been done yet."[1]
The conclusion that everything around LLMs is magical thinking seems to be fairly hubristic to me given that in the last 5 years a set of previously borderline intractable problems have become completely or near completely solved, translation, transcription, and code generation (up to some scale), for instance.
With LLMs, it's quite similar: you have to learn how to use them. Yes, they are non-deterministic, but if you know how to use them, you can increase your chances of getting a good result dramatically. Often, this not only means articulating a task, but also looking at the bigger picture and asking yourself what tasks you should assign in the first place.
For example, I can ask the LLM to write software directly, or I can ask it to write user stories or prototypes and then take a multi-step approach to develop the software. This can make a huge difference in reliability.
And to be clear, I don't mean that every bad result is caused by not correctly handling the LLM (some models are simply poor at specific tasks), but rather that it is a significant factor to consider when evaluating results.
The LLM is more like a Ouija board than a reliable tool.
>I can ask it to write user stories or prototypes
By the time I write enough to explain thoroughly to an LLM what to write in "user stories" or "prototypes", I could have just written it myself, without the middleman(bot), and without the LLM hallucinating.
If half the time I spend with an LLM is telling it what to do, and then another half is correcting what it did, then I'm not really saving any time at all by using it.
Millions of beginner developers running with scissors in their hands, millions in investment going into the garbage.
I don't think this can be reversed anymore; companies are all in and pot-committed.
1. he talks about what he's shipped, and yet compares it to crypto – already, you're in a contradiction as to your relative comparison – you straight up shouldn't blog if you can't conceive that these two are opposing thoughts
2. this whole refrain from people of like, "SHOW ME your enterprise codebase that includes lots of LLM code" – HELLO, people who work at private companies CANNOT just reveal their codebase to you for internet points
3. anyone who has actually used these tools has now integrated them into their daily life on the order of millions of people and billions of dollars – unless you think all CEOs are in a grand conspiracy, lying about their teams adopting AI
Same for LLMs and AI: it is awesome for some things and absolutely sucks for other things. Curiously tho, it feels like UX was solved by making chats, but it actually still sucks enormously, as with crypto. It is mostly sufficient for doing basic stuff. It is difficult to predict where we'll land on the curve of difficult (or expensive) vs abilities. I'd bet AI will get way more capable, but even now you can't really deny its usefulness.
You could even argue that without network effects AI is also very limited: way less users -> way worse models. It took OpenAI to commit capital first to pull this off.
The point is I think comparing these areas (and other tech) is still interesting and worthy.
The real issue isn't the technology itself, but our complete inability to predict its competence. Our intuition for what should be hard or easy simply shatters. It can display superhuman breadth of knowledge, yet fail with a confident absurdity that, in a person, we'd label as malicious or delusional.
The discourse is stuck because we're trying to map a familiar psychology onto a system that has none. We haven't just built a new tool; we've built a new kind of intellectual blindness for ourselves.
I will use an LLM/agent if
- I need to get a bunch of coding done and I keep getting booked into meetings. I'll give it a task on my todo list and see how it did when I get done with said meeting(s). Maybe 40% of the time it will have done something I'll keep or just need to do a few tweaks to. YMMV though.
- I need to write up a bunch of dumb boilerplatey code. I've got my rules tuned so that it generally gets this kind of thing right.
- I need a stupid one off script or a little application to help me with a specific problem and I don't care about code quality or maintainability.
- Stack overflow replacement.
- I need to do something annoying but well understood. An XML serializer in Java for example.
- Unit tests. I'm questioning whether this one's a good idea outside of maybe doing some of the setup work, though. I find I generally come to understand my code better through the exercise of writing up tests. Sometimes you're in a hurry though so...<shrug>
With any of the above, if it doesn't get me close to what I want within 2 or 3 tries, I just back off and do the work. I also avoid building things I don't fully understand. I'm not going to waste 3 hours to save 1 hour of coding.
I will not use an LLM if I need to do anything involving business logic and/or need to solve a novel problem. I also don't bother if I am working with novel tech. You'll get way more usable answers asking about Python then you will asking about Elm.
TL;DR - use your brain. Understand how this tech works, its limitations, AND its strengths.
A few days ago Google released a very competent summary generator, an interpreter between tens of languages, a GPT-3-class general-purpose assistant. It runs locally on modest hardware: a 5-year-old laptop with no discrete GPU.
It alone potentially saves so much toil, so much stupid work.
We also finally “solved computer vision”: reading from PDFs, reading diagrams and tables.
Local vision models are much less impressive and need some care to use. Give it 2 years.
I don't know if we can overhype it when it achieves holy-grail level on some important tasks.
1) what products we're usually compared to
2) what problems users have with our software
3) what use cases users mention most often
What used to take weeks of research took just a couple of hours. It helped us form a new strategy and brought real business value.
I see LLMs as just a natural language processing engine, and they're great at that. Some people overhype it, sure, but that doesn't change the fact that it's been genuinely useful for our cases. Not sure what's up with all those "LLM bad" articles. If it doesn't work for you, just move on. Why should anyone have to prove anything to anyone? It's just a tool.
I use LLMs nearly every day for my job as of about a year ago and they solve my issues about 90% of the time. I have a very hard time deciphering if these types of complaints about AI/LLMs should be taken seriously, or written off as irrational use patterns by some users. For example, I have never fed an LLM a codebase and expected it to work magic. I ask direct, specific questions at the edge of my understanding (not beyond it) and apply the solutions in a deliberate and testable manner.
if you're taking a different approach and complaining about LLMs, I'm inclined to think you're doing it wrong. And missing out on the actual magic, which is small, useful and fairly consistent.
"90%" also seems a bit suspect.
(There are times I do other kinds of work and it fails terribly. My main point stands.)
I also use gpt and Claude daily via cursor.
GPT o3 is kinda good for general knowledge searches. Claude falls down all the time, but I've noticed that while it's spending tokens to jerk itself off, quite often it happens upon the actual issue going on without recognizing it.
Models are dumb and more idiot than idiot savant, but sometimes they hit on relevant items. As long as you personally have an idea of what you need to happen and treat LLMs like rat terriers in a farm field, you can utilize them properly
I do PhD research on superconducting materials, and right now I've been adapting and scaling an existing segmentation model from a research paper for image processing to run multithreaded, which took the training runtime per image from 55 min to 2 min. Yeah, it was low-hanging fruit, but honestly it's the type of thing that is just tedious and easy to make mistakes on and spend forever debugging.
Like sure I could have done it myself but it would have taken me days to figure out and I would have had to test and read a ton of docs. Claude got it working in like half an hour and generated every data plot I could need. If I wanted to test out different strategies and optimizations, I could iterate through various strategies rapidly.
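A minimal sketch of that kind of parallelization (the function and paths are made up; for CPU-bound pure-Python work you'd likely swap in ProcessPoolExecutor):

    from concurrent.futures import ThreadPoolExecutor
    from pathlib import Path

    def process_image(path: Path):
        # placeholder for the existing single-image segmentation/processing step
        ...

    images = sorted(Path("data/images").glob("*.tif"))
    with ThreadPoolExecutor(max_workers=8) as pool:   # run images concurrently instead of serially
        results = list(pool.map(process_image, images))
    print(f"processed {len(results)} images")

The point isn't that this is hard; it's that Claude writes the scaffolding, the plots, and the variations fast enough that trying several strategies becomes cheap.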
I don't really like to rely on AI a bunch but it indisputably is incredibly good at certain things. If I am just trying to get something done and don't need to worry about vulnerabilities as it is just data collection code that runs once, it saves a tremendous amount of time. I don't think it will outright replace developers but there is some room for it to expand the effectiveness of individual devs so long as they are actually providing oversight and not just letting it do stuff unchecked.
I think the larger issue is more how economically viable it is for businesses to spend a ton on electricity and compute for me to be able to use it like this for 20 bucks a month. There will be an inevitable enshittification of services once a lot of the spaces investors are dumping money into are figured out to be dead ends and people start calling for returns on their investment.
Right now the cash is flowing cause business people don't fully understand what its good at or not but that's not gonna last forever.
Crypto is a lifeline for me, as I cannot open a bank account in the country I live in, for reasons I can neither control nor fix. So I am happy if crypto is useless for you. For me and for millions like me, it is a matter of life and death.
As for LLMs — once again, magic for some, a reliable deterministic instrument for others (and also magic). Just classified and sorted a few hundred invoices. Yes, magic.
It's the same problem that crypto experiences. Almost everyone is propagating lies about the technology, even if a majority of those doing so don't understand enough to realize they're lies (naivety vs malice).
I'd argue there's more intentional lying in crypto and less value to be gained, but in both cases people who might derive real benefit from the hard truth of the matter are turning away before they enter the door due to dishonesty/misrepresentation- and in both cases there are examples of people deriving real value today.
I disagree. Crypto sounds more like intentional lying because it's primarily hyped in contexts typical for scams/gambling. Yes, there are businesses involved (anybody can start one), but they're mostly new businesses or a tiny tack-on to an existing business.
AI is largely being hyped within the existing major corporate structures, therefore its lies just get tagged as "business as usual". That doesn't make them any less of a lie though.
Anecdotally, I see a lot more bald-faced lies by crypto traders or NFT "collectors" than by LLM enthusiasts.
"You had to be there to believe it" https://x.com/0xbags/status/1940774543553146956
AI craze is currently going through a similar period: any criticism is brushed away as being presented by morons who know nothing