"Why has my job not been outsourced yet, since it is far cheaper?" Those are probably the same reasons why AI won't take your job this year.
Raw coding metrics are a very small part of being a cog in a company. That's not me saying it will never happen, just me saying that this focus on coding performance kind of misses the forest for the trees.
I suspect that another kind of impact is already happening in organisations where AI adoption is uneven: suddenly some employees appear to be having a lot more leisure time while apparently keeping the same productivity as before.
If you want to focus on what AI agents are actually capable of today, the last person I'd pay any attention to is Marcus, who has been wrong about nearly everything related to AI for years, and does nothing but double down.
I agree with GP, Marcus has not been an accurate or significant voice, couldn't care less what he has to say about AI. He's not a practitioner anymore in my mind
I doubt the parent poster was referring to anyone phrasing it in those literal terms. Kind of like how "some people claim flavored water can cure cancer" doesn't mean that's the literal pitch being given for the snake-oil.
We know how to build it and it will be entering the workforce in 2025. Well, we're in 2026 now and we don't have it in the workforce or anywhere else because they haven't built it because they don't really know how to build it because they're hucksters selling vaporware built on dead end technologies they cannot admit to.
Man dude, don't automate the toil, add an API to the website. It's supposed to have one!
> So, this is how I’m thinking about AI in 2026. Enough of the predictions. I’m done reacting to hypotheticals propped up by vibes. The impacts of the technologies that already exist are already more than enough to concern us for now…
SPOT ON, let us all take inspiration. "The impacts of the technologies that already exist are already more than enough to concern us for now"!
It’s also kind of stupid to hand-wave away programming. Programmers are where all the early adopters of software are. He’s merely conflating an adoption curve with capabilities. Programmers, I’m sure, were also the first to use Google and smartphones. “It doesn’t work for me” is missing the critical word “yet” at the end, and really, is it saying much that forecasts about adoption in the metric, “years until Cal Newport’s arbitrary criteria of what agent and adoption mean meet some threshold only inside Cal Newport’s head”, are hard to make?
There are 700m active weeklies for ChatGPT. It has joined the workforce! It just isn’t being paid the salaries.
no, it was 2008, with the iPhone launch.
Homework is in some ways the opposite of actual economic labor. Students pay to attend school, and homework is (theoretically) part of that education; something designed to help students learn more effectively. They are most certainly not paid for it.
Having a LLM do that "work" is economically insane. The desired learning does not happen, and the labor of grading and giving feedback is entirely wasted.
Students use ChatGPT for it because of perverse incentives of the educational system. It has no bearing on economic production of value.
The problem set or paper you turn in is not the product. The product is the learning that the human obtains from the _process_.
The homework is just there, being graded, to evaluate your progress at performing the required toil.
There’s the problem, some students don’t want an education, they just want a qualification, even if it means cheating on the evaluation.
There are good schools where you can get an actual education you couldn’t on your own but a lot of universities are similarly only interested in getting your money in exchange for a qualification.
Like, all the advertising I saw from schools was about job placement rates after graduation, not praising the education itself
What's the point? Who ever actually learnt anything from homework?
And I never would have been able to learn math without doing a bunch of problems early on... you can think you understand something in class but it takes applying it a bunch of times in different scenarios to really embed that knowledge in useful ways.
I presume I'm not the only one.
Sure, programmers would still adopt LLMs faster than the rest of the workforce whether or not the LLMs were good at writing code. But you have to credit at least some of that adoption rate to the fact that LLMs are significantly better at text (e.g. code) generation than they are at most other white-collar tasks (e.g. using a web browser)
I believe that Codex and the like took off (in comparison to e.g. "AI" browsers) because the bottleneck there was not reasoning about code, it was typing and processing walls of text. for a human, the interface of e.g. Google Calendar is ± intuitive. for an LLM, any graphical experience is an absolute hellscape from a performance standpoint.
CLI tools, which LLMs love to use, output text and only text, not images, not audio, not videos. LLMs excel at text, hence they are confined to what text can do. yes, multimodal is a thing, but you lose a lot of information and/or context window space + speed.
LLMs are a flawed technology for general, true agents. 99% of the time, outside code, you need eyes and ears. so far we have only created self-writing paper.
And the other "successful" industry being disrupted is the one where there is no need to validate output, because errors are OK or irrelevant. A text not containing much factual data, like fiction or business-lingo or spam. Or pictures, where it doesn't matter what color a specific pixel is, a rough match will do just fine.
But outside of those two options, not many other industries can use at scale an imprecise word or media generator. Circular writing and parsing of business emails with no substance? Sure. Not much else.
It's a matter of degree, not a qualitative difference. Humans have the exact same flaws, but amateur humans grow into expert humans with low error rates (or lose their job and go to work in KFC), whereas LLMs are yet to produce a true expert in anything because their error rates are unacceptably high.
Once LLMs gained access to tools like compilers, they started being able to iterate on code based on fast, precise and repeatable feedback on what works and what doesn't, be it failed tests or compiler errors. Compare this with tasks like composing a powerpoint deck, where feedback to the LLM (when there is one) is slower and much less precise, and what's "good" is subjective at best.
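The loop is simple enough to sketch. Something like the following, where `ask_model` is a stand-in for whatever LLM client is actually in use, and a pytest-based project is assumed:

```python
import subprocess

def ask_model(prompt: str) -> str:
    """Stand-in for an LLM call; swap in whatever client you actually use."""
    raise NotImplementedError

def fix_until_green(source_path: str, max_rounds: int = 5) -> bool:
    """Let the model iterate on one file until the test suite passes."""
    for _ in range(max_rounds):
        # Fast, precise, repeatable feedback: the test runner's exit code and output.
        result = subprocess.run(
            ["pytest", "-x", "--tb=short"], capture_output=True, text=True
        )
        if result.returncode == 0:
            return True  # tests pass, done
        with open(source_path) as f:
            code = f.read()
        new_code = ask_model(
            "These tests fail:\n" + result.stdout + result.stderr
            + "\n\nHere is the current file:\n" + code
            + "\n\nReturn the corrected file, nothing else."
        )
        with open(source_path, "w") as f:
            f.write(new_code)
    return False
```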
Another example is how LLMs got very adept at reading and explaining existing code. That is an impressive and very useful ability, but code is one of the most precise ways we, as humans, can express our intent in instructions that can be followed millions of times in a nearly deterministic way (bugs aside). Our code is written in thoroughly documented languages with a very small vocabulary and much easier grammar than human languages. Compare this to taking notes in a zoom call in German and trying to make sense of inside jokes, interruptions and missing context.
But maybe most importantly, a developer must be the friendliest kind of human for an LLM. Breaking down tasks in smaller chunks, carefully managing and curating context to fit in "memory", orchestrating smaller agents with more specialized tasks, creating new protocols for them to talk to each others and to our tools.... if it sounds like programming, it's because it is.
I mean we did give away code as training data but we also know that AI companies just took pirated books and media too.
So I don’t think gifting has much to do with it.
Next all the Copilot users will be “giving away” all their business processes and secrets to Microsoft to clone.
>The industry had reason to be optimistic that 2025 would prove pivotal. In previous years, AI agents like Claude Code and OpenAI’s Codex had become impressively adept at tackling multi-step computer programming problems.
It is easy to forget that Claude Code CAME OUT in 2025. The models and agents released in 2025 really DID prove how powerful and capable they are. The predictions were not really wrong. I AM using code agents in a literal fire and forget way.
Claude Code is a hugely capable agentic interface for solving almost any kind of problem or project you want to solve for personal use. I literally use it as the UX for many problems. It is essentially a software that can modify itself on the fly.
Most people haven't really grasped the dramatic paradigm shift this creates. I haven't come up with a great analogy for it yet, but the term that I think best captures how it feels to work with claude code as a primary interface is "intelligence engine".
I'll use an example. I've created several systems harnessed around Claude Code, but the latest one I built is for stock portfolio management (primarily because it is a fun problem space and something I know a bit about). Essentially you just use Claude Code to build tools for itself in a domain. Let me show how this played out in this example.
Claude and I brainstorm a general flow for the process and roles, then figure out what data each role would need and research which providers have that data at a reasonable price.
I purchase the API keys and Claude wires up tools (in this case Python scripts and documentation for the agents covering about 140 API endpoints), then builds the agents and also creates an initial version of the "skill" that will invoke a process that looks something like this:
Macro Economist/Strategist -> Fact Checker -> Securities Sourcers -> Analysts (like 4 kinds) -> Fact Checker/Consolidator -> Portfolio Manager
Obviously it isn't 100% great on the first pass and I have to lean on expertise I have in building LLM applications, but now I have a Claude Code instance that can orchestrate this whole research process and also handle ad-hoc changes on the fly.
Now I have evolved this system through about 5 significant iterations, but I can do it "in the app". If I don't like how part of it is working, I just have the main agent rewire stuff on the fly. This is a completely new way of working on problems.
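Stripped of the Claude Code plumbing, the orchestration itself is nothing exotic; a rough, hypothetical Python sketch of the pipeline above, where `run_agent` stands in for however a sub-agent actually gets invoked:

```python
# Hypothetical sketch only: role names come from the description above, and
# run_agent() is a stand-in for however a sub-agent is actually invoked
# (the real system lets Claude Code do this wiring itself).

ROLES = [
    "macro_strategist",     # Macro Economist/Strategist
    "fact_checker",         # Fact Checker
    "securities_sourcer",   # Securities Sourcers
    "analyst",              # Analysts (several kinds in the real setup)
    "consolidator",         # Fact Checker/Consolidator
    "portfolio_manager",    # Portfolio Manager
]

def run_agent(role: str, briefing: str) -> str:
    """Stand-in: run one sub-agent with its role prompt plus upstream output."""
    raise NotImplementedError

def run_research_pass(initial_brief: str) -> str:
    output = initial_brief
    for role in ROLES:
        # Each role reads the previous role's output and adds its own analysis.
        output = run_agent(role, output)
    return output  # the portfolio manager's final recommendations
```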
There's a whole lot of bullshit jobs and work that will get increasingly and opaquely automated by AI. You won't see jobs go away unless or until organizations deliberately set out to reduce staff. People will use AI throughout the course of their days to get a couple of "hours" of tasks done in a few minutes, here and there, throughout the week. I've already seen reports and projects and writing that clearly comes from AI in my own workplace. Right now, very few people know how to recognize and assess the difference between human and AI output, and even fewer how to calibrate work assignments.
Spreadsheet AIs are fantastic, reports and charting have just hit their stride, and a whole lot of people are going to appear to be very productive without putting a whole lot of effort into it. And then one day, when sufficiently knowledgeable and aware people make it into management, all sorts of jobs are going to go quietly away, until everything is automated, because it doesn't make sense to pay a human 6 figures for what an AI can do for 3 figures in a year.
I'd love to see every manager in the world start charting the Pareto curves for their workplaces, alongside actual hours worked per employee - work output is going to be very wonky, and the lazy, clever, and ambitious people are all going to be using AI very heavily.
Similar to this guy: https://news.ycombinator.com/item?id=11850241
https://www.reddit.com/r/BestofRedditorUpdates/comments/tm8m...
Part of the problem is that people don't know how to measure work effectively to begin with, let alone in the context of AI chatbots that can effectively do better work than a significant portion of the adult population of the planet.
The teams that fully embrace it, use the tools openly and transparently, and are able to effectively contrast good and poor use of the tools, will take off.
Humans have one edge over "AI": you can't blame and fire "AI" when it inevitably goes wrong.
Curious, what industries? And what capabilities do LLMs present to automate these positions that previous technologies do not?
'Bullshit jobs' and the potential to automate them are very real, but I think many of them could have been automated long before LLMs, and I don't think the introduction of LLMs is going to solve the bottleneck that prevents jobs like these from being automated.
Which is, of course, conveniently never a bullshit job but a Very Important One.
If you don't mind, could you please write a few examples of what LLMs do in spreadsheets? Because that's probably the last place where I would allow LLMs, since they tend to generate random data, and spreadsheets are notoriously hard to debug due to all the hidden formulas and complex dependencies.
Say you have an accounting workbook with 50 or so sheets with tables depending on each other and they contain very important info like inventory and finances. Just a typical small to medium business setup (big corporations also do it). Now what? Do you allow LLMs to edit files like that directly? Do you verify changes afterwards and how?
But from what I’ve seen it just makes that work output even less meaningful—who wants to read 10 AI-generated pages that could have been two bullet points?
And it doesn’t actually improve productivity because that was never the bottleneck of those jobs anyway. If anything, having some easy rote work is a nice way to break up the pace.
There are all these things in writing we used as signals for intelligence, attention to detail, engagement, willingness to accept feedback, etc... but they're now easy to counterfeit at scale.
Hopefully everyone realizes what's going on and cuts out the middleman.
Both of these agents launched mid-2025.
[1] https://www.anthropic.com/news/claude-3-7-sonnet
yes, 100%
I think that way too often, discussions of the current state of tech get derailed by talking about predictions of future improvements.
hypothetical thought experiment:
I set a New Year's resolution for myself of drinking less alcohol.
on New Year's Eve, I get pulled over for driving drunk.
the officer wants to give me a sobriety test. I respond that I have projected my alcohol consumption will have decreased 80% YoY by Q2 2026.
the officer is going to smile and nod...and then insist on giving me the sobriety test.
compare this with a non-hypothetical anecdote:
I was talking with a friend about the environmental impacts of AI, and mentioned the methane turbines in Memphis [0] that are being used to power Elon Musk's MechaHitler slash CSAM generator.
the friend says "oh, but they're working on building nuclear power plants for AI datacenters".
and that's technically true...but it misses the broader point.
if someone lives downwind of that data center, and they have a kid who develops asthma, you can try to tell them "oh in 5 years it'll be nuclear powered". and your prediction might be correct...but their kid still has asthma.
0: https://time.com/7308925/elon-musk-memphis-ai-data-center/
Very few people do
so neither Altman, nor the many CEOs industry-wide, nor the Engineering Managers, Software Engineers, and “Forward Deployed Engineers” actually have to inspect
their demos show good looking output
it's just the people in support roles that have to be like “wait a minute, this is very inconsistent”
all while everyone is doing their best not to get replaced
it's clanker discrimination mixed with clanker incompetence
We may not have gotten fully-autonomous employees, but human employees using AI are doing way more than they could before, both in depth and scale.
Claude Code is basically a full-time "employee" on my (profitable) open source projects, but it's still a tool I use to do all the work. Claude Code is basically a full-time "employee" at my job, but it's still a tool I use to do all the work. My workload has shifted to high-level design decisions instead of writing the code, which is kind of exactly what would have happened if AI "joined the workforce" and I had a bunch of new hires under me.
I do recognize this article is largely targeted at non-dev workforces though, where it _largely_ holds up but most of my friends outside of the tech world have either gotten new jobs thanks to increased capability through AI or have severely integrated AI into whatever workflows they're doing at work (again, as a tool) and are excelling compared to employees who don't utilize AI.
Funny how that doesn't show up in any productivity or economic metrics...
That said, I expect you are right that we won't see it show up. Even if we assume the claim is true in every way for some people, it only works for exceptional visionaries who were previously constrained by typing speed, which is a very, very, very small segment of the developer population. Any gains that small group realize will be an unrecognizable blip amid everything else. The vast majority of developers need all that typing time and more to have someone come up with their next steps. Reducing the typing time for them doesn't make them any more productive. They were never limited by typing speed in the first place.
When including re-work of bugs in the AI generated code some studies find that AI has no positive impact on software developer productivity, and can even have a negative impact.
The main problem with these studies are they are backward looking, so frontier labs can always claim the next model will be the one that delivers the promised productivity gains and displace human workers.
Which studies are you talking about? The last major study that I saw (that gained a lot of attention) was published half a year ago, and the study itself was conducted on developers using AI tools in 2024.
The technology has improved so rapidly that this study is now close-to-meaningless.
On the other hand, people are working much harder today than 3 years ago (remember people not showing up to work and posting on TikTok about how little work did collecting paychecks from 2 different companies etc?)
Just saying it's very hard to look at a time series and determine an effect size, even though politicians/CEOs like to claim ownership for growths.
AI generates output that must be thoroughly checked for most software engineering purposes. If you’re not checking the output, then quality and accuracy must not matter. For quick prototyping that’s mostly true. Not for real engineering.
What fulltime employee works for 30 minutes and then stops working for the next 5 hours and 30 minutes like Claude does?
I think Carmack is right, LLM's are not the route to AGI.
So well put.
LLMs are useful for a great many things. It's just that being the best new product of recent years, maybe even defining a decade, doesn't cut it. It has to be the century-defining, world-ending, FOMO-inducing massive thing that puts Skynet to shame and justifies investments in the trillions of dollars. It's either AI joining the workforce soon, or Nvidia and OpenAI aren't that valuable.
I guess it manages to maximize shareholder value, and make AI feel like a disappointment.
Agents as LLMs calling tools in a loop to perform tasks that can be handled by typing commands into a computer absolutely did.
Claude Code turns out to be misnamed: it's useful for way more than just writing code, once you figure out how to give it access to tools for other purposes.
I think the browser agents (like the horribly named "ChatGPT Agent" - way to burn a key namespace on a tech demo!) have acted as a distraction from this. Clicking links is still pretty hard. Running Bash commands on the other hand is practically a solved problem.
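A minimal "tools in a loop" agent really is tiny; a rough sketch, where `call_model` is a placeholder for any chat API that returns either a shell command to run or a final answer:

```python
# Rough sketch: an "agent" is just a model choosing commands in a loop.
# call_model() is a placeholder for any chat API that returns either
# {"run": "<shell command>"} or {"answer": "<final text>"}.
import subprocess

def call_model(messages: list[dict]) -> dict:
    raise NotImplementedError

def agent(task: str, max_steps: int = 20) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_model(messages)
        if "answer" in reply:
            return reply["answer"]
        # The easy, reliable part: run the command and feed back plain text.
        result = subprocess.run(
            reply["run"], shell=True, capture_output=True, text=True, timeout=60
        )
        messages.append({
            "role": "user",
            "content": f"exit={result.returncode}\n{result.stdout}{result.stderr}",
        })
    return "gave up"
```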
Edit: should we call them "special agents"? ;-)
One of these things is not like the other...
I think that this still isn't true for even very mundane tasks like "read a CSV file and translate column B into column C" for files with more than ~200 lines. The LLM will simply refuse to do the work and you'll have to stitch the badly formatted answer excerpts together yourself.
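To be fair, the usual workaround is to keep the model out of the file handling entirely and only hand it small batches; a rough sketch, with `translate_batch` as a placeholder for the actual LLM call:

```python
# Rough sketch: batch the rows ourselves so the model never sees the whole file.
# translate_batch() is a placeholder for the actual LLM call.
import csv

def translate_batch(texts: list[str]) -> list[str]:
    """Placeholder: translate a small list of strings via an LLM."""
    raise NotImplementedError

def translate_column(in_path: str, out_path: str, batch_size: int = 50) -> None:
    with open(in_path, newline="") as f:
        rows = list(csv.reader(f))
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        translated = translate_batch([row[1] for row in batch])  # column B
        for row, text in zip(batch, translated):
            while len(row) < 3:   # make sure column C exists
                row.append("")
            row[2] = text
    with open(out_path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
```

A real script would skip the header row and retry malformed batches, but the model only ever sees `batch_size` rows at a time, so it can't refuse on length.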
There are plenty of jobs that have already been pretty much replaced by AI: certain forms of journalism, low-end photoshop work, logo generation, copywriting. What does the OP need to see in order to believe that AI has "joined the workforce"?
> We are now confident we know how to build AGI as we have traditionally understood it. We believe that, in 2025, we may see the first AI agents “join the workforce” and materially change the output of companies...
"materially change the output of companies" seems fairly defined and didn't happen in most cases. I guess some kicked out more slop but I don't think that's what he meant.
I think the mistake here is expecting that AI is just making workers in older jobs faster, when the reality is, more often than not, that it changes the nature of the task itself.
Whenever AI reached the "good enough" point, it doesn't do so in a way that nicely aligns with human abilities, quite the opposite, it might be worse at performing a task, but be able to perform it 1000x faster. That allows you to do things that weren't previously possible, but it also means that professionals might not want to rely on using AI for the old tasks.
A professional translator isn't going to switch over to using AI, the quality isn't there yet, but somebody like Amazon could offer a "OCR & translate all the books" service and AI would be good enough for it, since it could handle all the books that nobody has the time and money to translate manually. Which in turn will eventually put the professional translator out of a job when it gets better than good enough. We aren't quite there yet, but getting pretty close.
In 2025 a lot of AI went from "useless, but promising" to "good enough".
Millions in revenue ain't hard to hit with extremely modest business.
The #1 product of nearly every AI company is hope, hope that one day they will replace the need to pay real employees. Hope like that allows a company to cut costs and fund dividends ... in the short term. The long term is some other person's problem. (I'll change my mind the day Bill Gates trusts MS Copilot with his personal banking details.)
Cal is a consumer of AI - interesting article for this community, but not this community. I thought hacker news was for builders and innovators - people who see the potential of a technology for solving problems big and small and go and tinker and build and explore with it, and sometimes eventually change the world (hopefully for the better). Instead of sitting on the sidelines grumbling about that some particular tech that hasn’t yet changed the world / met some particular hype (yet).
Incredibly naive to think AI isn’t making real difference already (even without/before replacing labor en masse.)
Actually try to explore the impact a bit. It’s not AGI, but doesn’t have to be to transform. It’s everywhere and will do nothing but accelerate. Even better, be part of proving Cal wrong for 2026.
If firing up old coal plants and skyrocketing RAM prices and $5000 consumer GPUs and violating millions of developers' copyrights and occasionally coaxing someone into killing themselves is the cost of Brian From Middle Management getting to Enjoy Programming Again instead of having to blame his kids for not having any time on the weekends, I guess we have no choice but to oblige him his little treat.
Three examples for you:

- Our policy agent extracts all coverage limits and policy details into a data ontology. This saves 10-20 mins per policy. It is more accurate and consistent than our humans (a rough sketch of this kind of extraction follows below).
- Our email drafting agent will pull all relevant context on an account whenever an email comes in. It will draft a reply or an email to someone else based on context and workflow. Over half of our emails are now sent without meaningfully modifying the draft, up from 20% two months ago. Hundreds of hours saved per week, now spent on more valuable work for clients.
- Our certificates agent will note when a certificate of insurance is requested over email and automatically handle the necessary checks and follow-up options or resolution. Will likely save us around $500k this year.
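For the first example (the policy agent), the general shape is presumably structured extraction with validation before anything lands in the ontology; a hypothetical sketch with made-up field names, where `ask_model` stands in for the LLM call:

```python
# Hypothetical sketch of the first example (the policy agent). Field names are
# made up for illustration; ask_model() stands in for the actual LLM call.
import json
from dataclasses import dataclass

@dataclass
class CoverageLimit:
    coverage_type: str          # e.g. "general_liability"
    limit_per_occurrence: float
    aggregate_limit: float
    deductible: float

def extract_policy(document_text: str, ask_model) -> list[CoverageLimit]:
    prompt = (
        "Extract every coverage limit from this policy as a JSON list of objects "
        "with keys coverage_type, limit_per_occurrence, aggregate_limit, "
        "deductible. Return JSON only.\n\n" + document_text
    )
    raw = ask_model(prompt)
    items = json.loads(raw)  # raises if the model returned malformed JSON
    # Constructing the dataclass rejects missing or extra keys; add range and
    # cross-checks against the source document before writing to the ontology.
    return [CoverageLimit(**item) for item in items]
```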
We also now increasingly share prototypes as a way to discuss ideas. Because the cost to vibe code something illustrative is very low, and it’s often much higher fidelity to have the conversation around something visual than around a written document
Are they using some software for this or was this built in-house?
Aren't you worried about the agent missing or hallucinating policy details?
> It is more accurate and consistent than our humans.
So, errors can clearly happen, but they happen less often than they used to.
> It will draft a reply or an email
"draft" clearly implies a human will will double-check.
That's if you take the comment at face value. I'm sorry, but I've been around this industry long enough to be sceptical of self-serving statements like these.
>"draft" clearly implies a human will will double-check.
I'm even more sceptical of that working in practice.
The wording does imply this, but since the whole point was to free the human from reading all the details and relevant context about the case, how would this double-checking actually happen in reality?
On the off chance it’s not for that reason, productivity requirements will be increased until you must half-ass it.
That's your assumption.
My read of that comment is that it's much easier to verify and approve (or modify) the message than it is to write it from scratch. The second sentence does confirm a person then modifies it in half the cases, so there is some manual work remaining.
It doesn't need to be all or nothing.
I work at a place that is doing nothing like this and it seems obvious to me we are going to get put out of business in the long run. This is just adding a power law on top of a power law. Winner winner take all. What I currently do will be done by software engineers and agents in 10 years or less. Gemini is already much smarter than I am. I am going to end up at a factory or Walmart if I can get in.
The "AI bubble" is a mass delusion of people in denial of this reality. There is no bubble. The market has just priced all this forward as it should. There is a domino effect of automation that hasn't happened yet because your company still has to interface with stupid companies like mine that are betting on the hand loom. Just have to wait for us to bleed out and then most people will never get hired for white collar work again.
It amuses me when someone says who is going to want the factory jobs in the US if we reshore production? Me and all the other very average people who get displaced out of white collar work and don't want to be homeless is who.
"More valuable" work is just 2026 managerial class speak for "place holder until the agent can take over the task".
- Product team is generating some code with LLMs but everything has to go through human review and developers are expected to "know" what they committed - so it hasn't been a major time saver but we can spin up quicker and explore more edge cases before getting into the real work
- Marketing team is using LLMs to generate initial outlines and drafts - but even low stakes/quick turn around content (like LinkedIn posts and paid ads) still need to be reviewed for accuracy, brand voice, etc. Projects get started quicker but still go through various human review before customers/the public sees it
- Similarly the Sales team can generate outreach messaging slightly faster but they still have to review for accuracy, targeting, personalization, etc. Meeting/call summaries are pretty much 'magic' and accurate-enough when you need to analyze any transcripts. You can still fall back on the actual recording for clarification.
- We're able to spin up demos much faster with 'synthetic' content/sites/visuals that are good-enough for a sales call but would never hold up in production
---
All that being said - the value seems to be speeding up discovery of actual work, but someone still needs to actually do the work. We have customers, we built a brand, we're subject to SLAs and other regulatory frameworks so we can't just let some automated workflow do whatever it wants without a ton of guardrails. We're seeing similar feedback from our customers in regard to the LLM features (RAG) that we've added to the product if that helps.
They are not a silver bullet or truly “you don’t need to know how to code anymore” tools. I’ve done a ton of work with Claude code this year. I’ve gone from a “maybe one ticket a week” tier React developer to someone who’s shipped entire new frontend feature sets, while also managing a team. I’ve used LLM to prototype these features rapidly and tear down the barrier to entry on a lot of simple problems that are historically too big to be a single-dev item, and clear out the backlog of “nice to haves” that compete with the real meat and bread of my business. This prototyping and “good enough” development has been massively impactful in my small org, where the hard problems come from complex interactions between distributed systems, monitoring across services, and lots of low-level machine traffic. LLM’s let me solve easy problems and spend my most productive hours working with people to break down the hard problems into easy problems that I can solve later or pass off to someone on my team to help.
I’ve also used LLM to get into other people’s codebases, refactor ancient tech debt, shore up test suites from years ago that are filled with garbage and copy/paste. On testing alone, LLM are super valuable for throwing edge cases at your code and seeing what you assumed vs. what an entropy machine would throw at it.
LLM absolutely are not a 10x improvement in productivity on their own. They 100% cannot solve some problems in a sensible, tractable way, and they frequently do stupid things that waste time and would ruin a poor developer’s attempts at software engineering. However, they absolutely also lower the barrier to entry and dethrone “pure single tech” (ie backend only, frontend only, “I don’t know Kubernetes”, or other limited scope) software engineers who’ve previously benefited from super specialized knowledge guarding their place in the business.
Software as a discipline has shifted so far from “build functional, safe systems that solve problems” to “I make 200k bike shedding JIRA tickets that require an army of product people to come up with and manage” that LLM can be valuable if only for their capabilities to role-compress and give people with a sense of ownership the tools they need to operate like a whole team would 10 years ago.
This argument gets repeated frequently, but to me it seems to be missing a final, actionable conclusion.
If one "doesn't know Kubernetes", what exactly are they supposed to do now, having an LLM at hand, in a professional setting? They still "can't" assess the quality of the output, after all. They can't just ask the model, as they can't know if the answer is not misleading.
Assuming we are not expecting people to operate with implicit delegation of responsibility to the LLM (something that is ultimately not possible anyway - taking blame is a privilege human will keep for a foreseeable future), I guess the argument in the form as above collapses to "it's easier to learn new things now"?
But this does not eliminate (or reduce) a need for specialization of knowledge on the employee side, and there is only so much you can specialize in.
The bottleneck may have shifted right somewhat (from the time/effort of the learning stage to the cognition and memory limits of an individual), but the output on the other side of the funnel (of learn->understand->operate->take-responsibility-for) didn't necessarily widen that much, one could argue.
Wasn't this a problem before AI? If I took a book or online tutorial and followed it, could I be sure it was teaching me the right thing? I would need to make sure I understood it, that it made sense, that it worked when I changed things around, and would need to combine multiple sources. That still needs to be done. You can ask the model, and you'll have the judge the answer, same as if you asked another human. You have to make sure you are in a realm where you are learning, but aren't so far out that you can easily be misled. You do need to test out explanations and seek multiple sources, of which AI is only one.
An AI can hallucinate and just make things up, but the chance that different sessions with different AIs lead to the same hallucinations, consistently building upon each other, is low enough not to be worth worrying about.
This is the fundamental problem that all these cowboy devs do not even consider. They talk about churning out huge amounts of code as if it were an intrinsically good thing. Reminds me of those awful VB6 desktop apps people kept churning out. VB6 sure made tons of people Nx more productive, but it also led to loads of legacy systems that no one wanted to touch because they were built by people who didn't know what they were doing. LLMs-for-code are another tool in the same category.
Has any senior React dev code-reviewed your work? I would be very interested to see what they have to say about the quality of your code. It's a bit like using LLMs to medically self-diagnose and claiming it works because you are healthy.
Ironically enough, it does seem that the only workforce AIs will be shrinking will be devs themselves. I guess in 2025, everyone can finally code
The website https://pxehost.com - via codex CLI
The actual project itself (a pxe server written in go that works on macOS) - https://github.com/pxehost/pxehost - ChatGPT put the working v1 of this in 1 message.
There was much tweaking, testing, refactoring (often manually) before releasing it.
Where AI helps is the fact that it’s possible to try 10-20 different such prototypes per day.
The end result is 1) Much more handwritten code gets produced because when I get a working prototype I usually want to go over every detail personally; 2) I can write code across much more diverse technologies; 3) The code is better, because each of its components are the best of many attempts, since attempts are so cheap.
I can give more if you like, but hope that is what you are looking for.
I find it remarkable there are people that look at useful, living projects like that and still manage to dismiss AI coding as a fad or gimmick.
Careful code review and a good pull request flow are important, just as they were before LLMs.
I guess I should be happy that so many of my colleagues are willing to remove themselves from the competitive job pool with these kinds of attitudes.
People always want to claim what they’re doing is so complex and esoteric that AI can’t touch it. This is dangerous hubris.
I will add that LLMs are very mediocre, bordering on bad, at any challenging or interesting 3D engine stuff. They're pretty decent at answering questions about surface API stuff (though, inexplicably, they're really shit at OpenGL which is odd because it has way more code out there written in it than any other API) and a bit about the APIs' structure, though.
(I did do web development and distributed systems for quite some time, though, and I suspect while LLMs are probably good at tutorial-level stuff for those areas it falls apart quite fast once you leave the kiddy pool.)
P.S.:
I think it's very ironic that you say that you should be careful to not speak in general terms about things that might depend much more on context, when you clearly somehow were under the belief that all developers must see the same kind of (perceived) productivity gains you have.
It is negative value for me to have a mediocre machine do that job for me, that I will still have to maintain, yet I will have learned absolutely nothing from the experience of building it.
Nor does reading a book teach you how to write one.
You need to design the code in order to tell the LLM how to write it. The LLM can help with this but generally it's better to have a full plan in place to give it beforehand. I've said it before elsewhere but I think this argument will eventually be similar to the people arguing you don't truly know how to code unless you're using assembly language for everything. I mean sure assembly code is better / more efficient in every way but who has the time to bother in a post-compiler world?
Your colleagues are leaving because people like you suck to be around. Have fun playing with your chat bots.
And in my personal experience it definitely helps in some tasks, and as someone who doesn't actually enjoy the actual coding part that much, it also adds some joy to the job.
Recently I've also been using it to write design docs, which is another aspect of the job that I somewhat dreaded.
It's the psychology of it that's important, not the tool itself; people are very bad at understanding where they're spending their time and cannot accurately assess the rate at which they work because of it.
Same with design docs more or less, translating my thoughts into proper and professional English adds a layer I don't really enjoy (since I'm not exactly great at it), or stuff like formatting, generating a nice looking diagram, etc.
Just today I wrote a pretty decent design doc that took me two hours instead of the usual week+ slog/procrastination, and it was actually fairly enjoyable.
Churning out 2x as much code is not doubling productivity. Can you perform at the same level as a dev who is considered 2x as productive as you? That's the real metric. Comparing quality to quantity of code ratios, bugs caused by your PRs, actual understanding of the code in your PR, ability to think slow, ability to deal with fires, ability to quickly deal with breaking changes accidentally caused by your changes.
Churning out more code per day is not the goal. No point merging code that doesn't fully work, is not properly tested, or that other humans (or you) cannot understand, etc.
And far from "churning out code" my work is better with LLMs. Better tested, better documented, and better organized because now I can do refactors that just would have taken too much time before. And more performant too because I can explore more optimization paths than I had time to before.
Refusing to use LLMs now is like refusing to use compilers 20 years ago. It might be justified in some specific cases but it's a bad default stance.
The answer to "Can you perform at the same level as a dev who is considered 2x as productive as you?" is self-explanatory. If your answer is negative, you are not 2x as productive
Writing code is the easy and fast part already.
This matches my personal experience. I was asked to help with a large Swift iOS app without knowing Swift. Had access to a frontier agent. I was able to consistently knock a couple of tickets per week for about a month until the fire was out and the actual team could take over. Code review by the owners means the result isn't terrible, but it's not great either. I leave the experience none the wiser: gained very little knowledge of Swift, iOS development or the project. Management was happy with the productivity boost.
I think it's fleeting and dread a time where most code is produced that way, with the humans accumulating very little institutional knowledge and not knowing enough to properly review things.
What concerns me is, generally, if the experts (and I do consider them experts) can use frontier AI to look very productive, but upon close inspection of something you (in this case I) happen to be knowledgeable about, it's not that great (built on shaky foundations), what about all the vibe coded stuff built by non-experts?
Is that a net benefit? Without AI, if I really wanted to do that conversion, I would have had to become much more familiar with the inner workings of csproj files. That is a benefit I've lost, but it would've also taken longer to do so, so much time I might not have decided to do the conversion. My job doesn't really have a need for someone that deeply specialized in csproj, and it isn't a particular interest of mine, so letting AI handle it while being able to answer a few questions to sate my curiosity seemed a great compromise.
A second example, it works great as a better option to a rubber duck. I noticed some messy programming where, basically, OOP had been abandoned in favor of one massive class doing far too much work. I needed to break it down, and talking with AI about it helped come up with some design patterns that worked well. AI wasn't good enough to do the refactoring in one go, but it helped talk through the pros and cons of a few design pattern and was able to create test examples so I could get a feel for what it would look like when done. Also, when I finished, I had AI review it and it caught a few typos that weren't compile errors before I even got to the point of testing it.
None of these were things AI could do on their own, and definitely aren't areas I would have just blindly trusted some vibe coded output, but overall it was productivity increase well worth the $20 or so cost.
(Now, one may argue that is the subsidized cost, and the unsubsidized cost would not have been worthwhile. To that, I can only say I'm not versed enough on the costs to be sure, but the argument does seem like a possibility.)
He's also using it to speed up writing his reports to send to patients.
Longer term, he was also quite optimistic on its ability to cut out roles like radiologists, instead having a software program interpret the images and write a report to send to a consultant. Since the consultant already checks the report against any images, the AI being more sensitive to potential issues is a positive thing: giving him the power to discard erroneous results rather than potentially miss something more malign.
As a medical imaging tech, I think this is a terrible idea. At least for the test I perform, a lot of redundancy and double-checking is necessary because results can easily be misleading without a diligent tech or critical-thinking on the part of the reading physician. For instance, imaging at slightly the wrong angle can make a normal image look like pathology, or vice versa.
Maybe other tests are simpler than mine, but I doubt it. If you've ever asked an AI a question about your field of expertise and been amazed at the nonsense it spouts, why would you trust it to read your medical tests?
> Since the consultant already checks the report against any images, the AI being more sensitive to potential issues is a positive thing: giving him the power to discard erroneous results rather than potentially miss something more malign.
Unless they had the exact same schooling as the radiologist, I wouldn't trust the consultant to interpret my test, even if paired with an AI. There's a reason this is a whole specialized field -- because it's not as simple as interpreting an EKG.
I think we'll see more specialized models for narrow tasks (think AlphaFold for other challenges in drug discovery, for example) as well, but those will feel like individual, costly, high impact discoveries rather than just generic "AI".
Our world is human-shaped and ultimately people who talk of "AGI" secretly mean an artificial human.
I believe that "intelligence", the way the word is actually used by people, really just means "skillful information processing in pursuit of individual human desires".
As such, it will never be "solved" in any other way than to build an artificial human.
If superhuman intelligence is solved it'll be in the form of building a more healthy society (or, if you like, a society that can outcompete other societies). We've already seen this sort of thing by accident and we're currently seeing extensive efforts to attack and undermine societies through exploiting human intelligence.
To a genetic algorithm techie that is actually one way to spur the algorithm to making better societies, not worse ones: challenge it harder. I guess we'll see if that translates to life out here in the wild, because the challenge is real.
I'm certain neanderthals were just calmer, more empathetic. And then we came along and abused that until they were all gone.
We're still animals on this planet. We just sing about our conquests afterwards.
Maybe so, but the point I'm trying to make is this needs to look nothing like sci-fi ASI fantasies, or rather, it won't look and feel like that before we get the humanoid AI robots that the GP mentioned.
You can have humans or human institutions using more or less specialized tools that together enable the system to act much more intelligently.
There doesn't need to be a single system that individually behaves like a god - that's a misconception that comes from believing that intelligence is something like a computational soul, where if you just have more of it you'll eventually end up with a demigod.
If my manager said to me tomorrow: "I have to either get rid of one of your coworkers or your use of AI tools, which is it?"
I would, without any hesitation, ask that he fire one of my coworkers. Gemini / Claude is way more useful to me than any particular coworker.
And now I'm preparing for my post-software career because that coworker is going to be me in a few years.
Obviously I hope that I'm wrong, but I don't think I am.
In the past we also just raised the floor on productivity, do you think this will be different?
... and replace that with communication overhead with Claude?
The human factor adds knowledge of the why that refines the results. Not just any algorithm or a standard pattern that fits, but the correct solution for the correct question.
No, not Sweden where 40% of the population have been employed in some way by the Wallenberg family and its corporations in recent times. The other Nordic countries are not as egalitarian as they are presented either.
I've never seen the US government behave as if this was a priority. Perhaps things are different in a nordic country?
By the way stable hands and night soil collectors are still around. Just a bit harder to find. We used to have a septic tank that had to be emptied by workmen every so often. Pretty much the same.
It's obviously true that no particular coworker of yours would be as useful to you as an AI agent, since their goal is to perform their own obligations to the rest of the company, whereas the singular goal of the AI tool is to help you, the user.
Until these AI tools can completely replace a developer on its own, the decision to continue employing human developers or paying for AI tools will not be mutually exclusive.
You would probably get rid of your co-worker and keep Vim/Emacs/VsCode/Zed/JetBrains or whatever editor you use.
All your example tells us is that AI tools are valuable tools.
My company said we are now going to be bidding with smaller teams and promoting our use of AI.
One example of them promoting the company's use of AI is creating a prototype using ChatGPT and AntiGravity. He took a demo video of a govt agency app off of YouTube, fed the video to ChatGPT, which spit out all the requirements for the ten-page application, then he fed those requirements to AntiGravity and boom, it replicated/created the working app/prototype in 15 minutes. Previously it would take a team of 3 to 5 a week or a few to complete such a prototype.
> And now I'm preparing for my post-software career because that coworker is going to be me in a few years.
Which implies they anticipate their manager (or someone higher up in the company) to agree with them, presumably when considering overall effectiveness of the team.
>been considered top 10% of attractiveness in one country
But I think I found the answer...
That's a mistake. A lot of people lie on their resumes.
Source: I've lied on every resume I've ever sent out.
- https://news.ycombinator.com/item?id=33903978
Something tells me they aren't the most honest person. That something is thw09j9m...Seriously... why lie about these types of things on an anonymous forum? There's literally nothing to gain
Charitably, it just sounds like you aren’t in tech.
In fact I don't want to pay too much, to prevent the incoming enshittification
We spent an amount of money on data centers that was so large that it managed to overcome a self-imposed kick in the nuts from tariffs and then some. The amount of money involved rivals the creation of the railroad system in the United States. Of course GDP overperformed in that scenario.
Where did AI tool use show up in the productivity numbers?
"Reporter: ...do you believe that we’re experiencing a positive productivity shock, whether from AI or policy factors or whatever?"
"Powell: So, yeah, I mean, I never thought I would see a time when we had, you know, five, six years of 2 percent productivity growth. This is higher. You know, this is definitely higher. And it was—before, it could be attributed to AI. I think you—I also think if you look at what AI can do and if you use it in your personal life, as I imagine many of us have, you can see the prospects for productivity. I think it makes people who use it more productive. It may make other people have to find other jobs, though. So it could have productivity implications"
And:
"Reporter: If I could just follow up on the SEP. You have a whole lot of—big increase in the growth numbers, but not a big decline in the unemployment numbers. And is that an AI factor in there?"
"Powell: So it is—the implication is obviously higher productivity. And some of that may be AI."
He also hedges in places, hesitant to say "Yes, that's the reason". I'm not sure anything in the data sets they use could directly capture it as the reason, so that's too high a bar for evidence: to require some line item in the reports with a direct attribution. He could be wrong, it might not be AI, but I don't have any reason to think his sense of things is wrong either.
I would very much like to read this if you have a link
AI is a large theme throughout but page 7 and 24 have specifically relevant remarks about the better than expected number, productivity's increases in relation to both this and AI, data centers, etc.
https://www.federalreserve.gov/mediacenter/files/FOMCprescon...
https://www.federalreserve.gov/mediacenter/files/FOMCprescon...
These aren't context-free data points you have to interpret; this is synthesized analysis delivered in a report by the Fed Chair, giving his and the Reserve's own interpretation. He could be wrong, but it is clear their belief is that AI is a likely factor, while there is also no certainty in that interpretation. They've upped their estimates for April though, and this advance estimate from December is about to be followed up with the revised, final numbers on Jan 23rd, so we'll find out a little more then, and a lot more in April.
it's building datacenters and buying servers and GPUs. It isn't directly attributed to AI because it isn't caused by the use of AI, but by blowing the AI bubble
A lot of the predictions come from interviews and presentations with top tech executives. Their job is to increase the perceived value of their product, not to offer an objective assessment.
I've gotten a lot of value out of reading the views of experienced engineers; overall they like the tech, but they do not think it is a sentient alien that will delete our jobs.
I have also gotten a lot of value out of Cembalest's recent "eyes on the market", which looks at the economic side of this AI push.
American snakes weren't useful for this.
So something that was sort of useful in a niche application was co-opted by people who didn't know how to make it work and then ultra hyped.
The parallels are spot on.
It was turned into a scam in the west.
Seems like a lot of work to get omega-3 in a consumable form.
Getting executives, junior devs, HR, and middle management hooked onto an advice and document template machine owned and operated by your corporation would seemingly have a huge upside for an entity like Microsoft. Their infatuation might be more about how profitable such arrangements would be versus any meaningful productivity improvement for developers.
Like, in ways that BizTalk, Dynamics, and SharePoint attempt to capture business processes onto a pay-for-play MS stack, and all benefit when being pitched to non-technical customers, Copilot provides an ever evolving sycophantic exec-centred channel to push and entangle it all as MS sees fit.
Having every part of your business divulge, in real time through saveable chats, its strategy, tooling, and processes to MS servers and Azure services is itself a pretty stunning arrangement. Imagine those same services directly selling busy customers entangling integrations, or trendy Azure services, through freewheeling MCP-like glue, all inline in that customer's own business processes? It sounds like tech exec nirvana: automated, self-directed sales.
They don’t need job deleting sentience to make the share price go up and rationalize this LLM push. They are far more aware of the limitations than we…
Every individual only cares about their paycheck and promotion. They will happily ignore their knowledge of the limitations if it means squeezing out an extra resume bulletpoint, paycheck or promotion, even if it causes the company to go bankrupt down the line (by that time they would've jumped ship somewhere else anyway).
I don't agree. Your job cannot be "lie to the customer." They may see this as the easy way to get more money and justify their comfy position, but it is not their job.
Their job is to make the company successful. Part of success is raising funds and boosting share price.
That is their job, and how do you imagine they can do that?
Sound kind of glum and down about the company prospects?
Do not make me laugh.
Even if the company is literally haemorrhaging cash and has < a week of runway left, senior executives are often so far up their own asses and surrounded by yes-men that they honestly believe they can turn things around.
It's often not about willfully lying.
It's just delusional belief and faith in something that is very unlikely.
(Last-minute turnarounds and DSA do exist, but as with lottery players, seeing the very few people who do win and mimicking them does not make you into a winner, most of the time.)
I normally see things the same way you do, however I did have a conversation with a podiatrist yesterday that gave me food for thought. His belief is that certain medical roles will disappear as they'll become redundant. In his case, he mentioned radiology and he presented his case as thus:
A consultant gets a report + X-Ray from the radiologist. They read the report and confirm what they're seeing against the images. They won't take the report blindly. What changes is that machines have been learning to interpret the images and are able to use an LLM to generate the report. These reports tend not to miss things but will over-report issues. As a consultant will verify the report for themselves before operating, they no longer need the radiologist. If the machine reports a non-existent tumour, they'll see there's no tumour.
Just so I understand correctly: is it over-reporting problems that aren't there or is it missing blindingly obvious problems? The latter is obviously a problem and, I agree, would completely invalidate it as a useful tool. The former sounded, the way it was explained to me, more like a matter of degrees.
This is the nonsense we're living through, predictions, guesses, promises that cannot possibly be fulfilled and which will inevitably change to something far less ambitious and with much longer timelines and everyone will shrug it off as if we weren't being mislead by a bunch of fraudsters.
2 quibbles: 1) a more ethical system would still need triage-style rationing given a finite budget, 2) medical providers are also culpable given the eye-watering prices for even trivial services.
https://www.worksinprogress.news/p/why-ai-isnt-replacing-rad...
1. Punch cards -> Assembly languages
2. Assembly languages -> Compiled languages
3. Compiled languages -> Interpreted languages
4. Interpreted languages -> Agentic LLM prompting
I've tried the latest and greatest agentic CLI and toolings with the public SOTA models.
I think this is a productivity jump equivalent to maybe punch cards -> compiled languages, and that's it. Something like a 40% increase, but nowhere close to exponential.
1. Punch cards -> Assembly languages
Err, in my direct experience it was Punch cards -> FORTRAN. Here, for example, is the punch card for a single FORTRAN statement: https://en.wikipedia.org/wiki/File:FortranCardPROJ039.agr.jp...
Punch cards were an input technology; they were in no way limited to either assembly languages or FORTRAN.
You might be thinking of programming in assembly via switch flipping or plug jacking.
And compiled and interpreted languages evolved alongside each other in the 1950s-1970s.
I'm also reminded that we tried the whole "make it look like human language" thing with COBOL, and it turned out that language wasn't the bottleneck; the ability of people to specify exactly what they want was. Once you have an exact spec, even writing the code yourself isn't all that hard, but extracting that spec from stakeholders has always been the harder part of programming.
> To find out more about why 2025 failed to become the Year of the AI Agent, I recommend reading my full New Yorker piece.

So essentially, just go and read the New Yorker piece here: https://archive.ph/VQ1fT
AI has all the same markers as the dot-com bubble: eventually venture capital will dry up and many AI companies will go bust, with a few remaining that make something useful in an unmet niche.
It doesn't even significantly matter whether it's a bubble or not, but whether it's a "bad" bubble.
I think Steve Eisman (of housing bubble fame) recently made the argument that it's probably a bubble, but it doesn't seem to have the hallmarks it would have to turn it into a crisis. e.g. no broad immediate exposure for the general populace (as in housing/crypto bubbles).
there are billions and billions of dollars invested in there -- it matters significantly to a lot of people.
the bubble popping may trash the US and possibly the global economy. "it doesn't matter" has to be one of the worst AI takes I've seen...
If you're defining "successful" in the sense of people-making-a-lot-of-money I completely agree.
If you're talking about the internet in 2005 vs 2025, smartphones completely ruined the internet. At one point I had half my high school using HTML in their AIM profiles because they thought mine was cool.
Now kids can hardly even type on an actual, physical keyboard.
I'd argue that's because the tech wasn't there yet. Not the network or other web tech itself but rather the average-person-using-a-computer tech.
And, well, not all bubbles go as well as that. See the cryptocurrency one, say. Or any of a number of _previous_ AI bubbles.
The dotcom bubble was the "new economy", the old economy had changed forever and was dead. No one thought it was a bubble. Even when the bubble popped it took until 9/11 to wake us up from the mass hysteria.
I can't think of another "bubble" where practically everyone thinks we are in a bubble - to the point that I think many would find it irrational to believe we are not in one. That is not what a bubble is. A bubble is the madness of crowds, not the wisdom of crowds, and the crowd certainly believes we are in a bubble.
Ahem - what is the difference compared to now? Weren't programmers supposed to be obsolete six months ago, with nobody working anymore, so we'd need UBI?
However, your point that if everybody thinks there is a bubble then there is none is valid. Ironically, your whole post undermines this point, and you are not alone in your analysis. General bubble wisdom is not as settled as one might think.
Plus, Alan Greenspan's famous "irrational exuberance" remark was in 1996. And AFAIK in 1999 everybody knew there was a bubble, but it only burst in 2000. On top of that, I have seen overlaid plots of stock prices now and before the dot-com crash suggesting there are 1-2 years of increases still to go.
You're applying an arbitrary time constraint to the realization of AI's promise in order to rubbish it. This is a logical mistake common among critics: not yet, so never. It doesn't seem as if there is a near limit to the tech's development. Until that changes, the potential for job wipeouts and societal upheaval is real, whether in 5 or 50 years.
Very untrue; the economy doesn't happen on online forums in echo chambers but out there. Every major company invests in AI however it can, out of classic FOMO.
This is how movers and decision-makers think. No CEO thinks: "this will crash, so let's invest in it massively and spread our company finances more thinly for when the SHTF moment comes."
In fact outside of tech if the dotcom bubble wasn’t being discussed it’s because most folks—being not, or barely, online yet—weren’t paying any attention to it per se. The bubble they cared about was the broader stock market bubble, which was definitely widely perceived as a bubble.
The chair of the Federal Reserve, Alan Greenspan, famously made a speech warning of "irrational exuberance". It can't get much more direct than formal government warnings at the highest level.
But this is perhaps not the case. By pessimistic estimates, half of all people work in BS jobs that have no real value to society, and every capitalist is focused on rent extraction now. If the economy can operate under such conditions, it doesn't really need more productivity growth; it is already demand-limited.
Sources? What, but... you are not a journalist, you are not supposed to challenge what I say, I'm a CEO! No, I'm not just using the media to create artificial hype to pull in investors and make money on bullshit that is never gonna work! How can you say that! It's a real thing, trust me bro!
After all, I can guarantee the other side (whatever it is) will say the same thing about your "logical" conclusions.
It is logic, we just don't share the same predicates or world model...
It's one of the challenges when LLMs are anthropomorphised: reasoning/logic for bots is not the same as it is for humans.
But to LLMs we don't afford the same leniency. If they flip some bits and the logic doesn't add up, we're quick to point out that "it's not reasoning at all".
Funny throne we've built for ourselves.
By the standard in the parent post, humans certainly do not "reason". But that is then just choosing a very high bar for "reasoning" that neither humans nor AI meets...what is the point then?
It is a bit like saying: "Humans don't reason, they just let neurons fire off one another, and think the next thought that enters their mind"
Yes, LLMs need to spew out text to move their state forward. As a human I actually sometimes need to do that too: Talk to myself in my head to make progress. And when things get just a tiny bit complicated I need to offload my brain using pen and paper.
Most arguments used to show that LLMs do not "reason" can be used to show that humans do not reason either.
To show that LLMs do not reason you have to point to something other than how they work.
If LLMs were actually able to think/reason and you acknowledge that they’ve been trained on as much data as everyone could get their hands on such that they’ve been “taught” an infinite amount more than any ten humans could learn in a lifetime, I would ask:
Why can’t they solve novel, unsolved problems?
So I will assume you mean within maths, science etc? Basically things they can't solve today.
Well 99.9% of humans cannot solve novel, unsolved problems in those fields.
LLMs cannot learn, there is just the initial weight estimation process. And that process currently does not make them good enough on novel math/theoretical physics problems.
That does not mean they do not "reason" in the same way that those 99.9% of humans still "reason".
But they definitely do not learn, the way humans do.
(Anyway, if LLMs could somehow get 1000x as large context window and get to converse with themselves for a full year, it does not seem out of the question they could come out with novel research?)
I see this argument made a lot. I'm not sure if the distinction really holds weight once we start to unravel though.
What's a topic you're able to think about that an LLM is not able to think about?
This is very obvious when trying to use LLMs to modify scripts in vendor-specific languages that have not been widely documented and don't have many examples available. A seasoned programmer will easily recognize common patterns like if-else blocks and loops, but LLMs will get stuck and output gibberish.
It's an impossible burden to prove. We can't even prove that any other human has sentience or is reasoning, we just evaluate the outcomes.
One day the argument you're putting forward will be irrelevant, or good for theoretical discussion only. In practice I'm certain that machines will achieve human level output at some point.
Would you care to put some sort of time scale to "at some point?" Are we talking about months, years, decades, centuries?
Wild guess, within 30 years.
on the Meta (WhatsApp) chatbot:
me: what is the seahorse emoji?
bot: [emoji]
me: this is not a seahorse
bot: You got me! The actual seahorse emoji is [emoji]. Want me to use it in a fun sentence?
on Bing/Copilot:
me: show me the seahorse emoji
bot:
------------
Here you go: [emoji] — actually, that’s a fun combo, but the real seahorse emoji is [emoji]… wait, not quite! The correct one is [emoji] for jellyfish and [emoji] for unicorn, but the seahorse emoji itself is ? Let me clarify:
The official seahorse emoji is: ? No — correction! It’s for unicorn. The actual seahorse emoji is: ?
Let’s be precise: the seahorse emoji is ?
Oops — I misspoke earlier. The true seahorse emoji is: ?
Wait — let’s settle this cleanly: The seahorse emoji is: ?
Actually, the correct one is: ?
---------------
In some instances, Copilot answers that "there's no seahorse emoji _and that's why I keep running in circles_". This is just pasting answers, as this is a popular problem discussed on the internet. If the bot were reasoning, it would just answer "there's no seahorse emoji, please stop asking".
> Me: what is the seahorse emoji?
> Claude: There isn't a seahorse emoji in the standard Unicode emoji set. The closest you'll get is the generic fish or tropical fish emoji, but no dedicated seahorse exists as of now.

Edit: I gave up on Copilot and fed the same instructions to ChatGPT, which had no issue.
The point here is that some models seem to know your intention while some just seem stuck on their training data.
These discussions often get derailed into debates about what "thinking" means. If we define thinking as the capacity to produce and evaluate arguments, as the cognitive scientists Sperber and Mercier do, then we can see LLMs are certainly producing arguments, but they're weak at the evaluation.
In some cases, arguments can be formalised, and then evaluating them is a solved problem, as in the examples of using the Lean proofchecker in combination with LLMs to write mathematical proofs.
That suggests a way forward will come from formalising natural language arguments. So LLMs by themselves might be a dead end but in combination with formalisation they could be very powerful. That might not be "thinking" in the sense of the full suite of human abilities that we group with that word but it seems an important component of it.
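To make the Lean point concrete, here is a minimal sketch in Lean 4 (the theorem and lemma are illustrative choices, not taken from the cited work): an LLM can propose the proof term, and the proof checker either accepts or rejects it mechanically, so the evaluation step involves no judgment call.
```
-- Minimal illustrative Lean 4 sketch (not from the cited work).
-- An LLM might propose the proof term `Nat.add_comm a b`;
-- Lean's kernel then checks it mechanically against the stated goal.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- Any incorrect proof term would simply fail to type-check,
-- which is what makes evaluating a formalised argument a solved problem.
```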
If by this you mean "reliably convert expressions made in human natural language to unambiguous, formally parseable expressions that a machine can evaluate the same way every time"... isn't that essentially an unreachable holy grail? I mean, everyone from Plato to Russell and Wittgenstein struggled with the meaning of human statements. And the best solution we have today is to ask the human to restrict the set of statement primitives and combinations that they can use to a small subset of words like "const", "let foo = bar", and so on.
Hot take (and to continue the derailment), but I'd argue that analytic philosophy from the last 100 years suggests this is a dead end. The idea that belief systems could be formalized was huge in the early 20th century (movements like Logical Positivism, or Russell's Principia Mathematica, being good examples of this).
Those approaches haven't really yielded many results, and by far the more fruitful form of analysis has been to conceptually "reframe" different problems (folks like Hilary Putnam, Wittgenstein, and Quine being good examples).
We've stacked up a lot of evidence that human language is much too loose and mushy to be formalised in a meaningful way.
Lossy might also be a way of putting it, like a bad compression algorithm. Written language carries far less information than spoken and nonverbal cues.
I think modeling language usefully looks a lot more like psychoanalysis than first order logic.
That bar is so low that even a political pundit on TV can clear it.
In the test setup, the AI added a single database row, ran the query and then asserted the single added row was returned. Clearly this doesn't show that the query works as intended. Is this what people are referring to when they say AI writes their tests?
I don't know what to call this kind of thinking. Any intelligent, reasoning human would immediately see that it's not even close to enough. You barely even need a coding background to see the issues. AI just doesn't have it, and it hasn't improved in this area for years
This kind of thing happens over and over again. I look at the stuff it outputs and it's clear to me that no reasoning thing would act this way
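To make concrete why a single-row test proves so little, here is a hypothetical Python sketch (the table, query, and function names are invented for illustration, not taken from the comment above): the single-row test passes even if the query's filter were dropped entirely, whereas adding one row that must be excluded at least exercises the WHERE clause.
```
# Hypothetical illustration only: schema, query, and names are invented.
import sqlite3

def fetch_overdue_orders(conn, cutoff):
    # Query under test: should return only orders due strictly before the cutoff.
    return conn.execute(
        "SELECT id FROM orders WHERE due_date < ? ORDER BY id", (cutoff,)
    ).fetchall()

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, due_date TEXT)")
    return conn

def test_single_row_style():
    # The kind of test described above: insert one row, assert it comes back.
    # This passes even if the WHERE clause were removed entirely.
    conn = make_db()
    conn.execute("INSERT INTO orders VALUES (1, '2020-01-01')")
    assert fetch_overdue_orders(conn, "2026-01-01") == [(1,)]

def test_with_excluded_row():
    # The minimum a reviewer would expect: at least one row that must NOT match.
    conn = make_db()
    conn.execute("INSERT INTO orders VALUES (1, '2020-01-01')")
    conn.execute("INSERT INTO orders VALUES (2, '2030-01-01')")  # not overdue yet
    assert fetch_overdue_orders(conn, "2026-01-01") == [(1,)]
```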
The tooling in these agentic code tools is key to usable LLM coding. Those tools prompt the models to “reason” about whether they’ve caught edge cases or met the logic. Without that external support they’re just fancy autocompletes.
In some ways it’s no different than working with some interns. You have to prompt them with “did you consider whether your code matches all of the requirements?”.
LLMs are different in that they’re sorta lobotomized. They won’t learn from tutoring “did you consider” which needs to essentially be encoded manually still.
That has other explanations than that it reasoned its way to the correct answers. Maybe it had very similar code in its training data
This specific example was with Codex. I didn't mention it because I didn't want it to sound like I think Codex is worse than Claude Code.
I do realize my prompt wasn't optimal to get the best out of AI here, and I improved it on the second pass, mainly to give it more explicit instruction on what to do
My point though is that I feel these situations are heavily indicative of it not having true reasoning and understanding of the goals presented to it
Why can it sometimes catch the logic cases you miss, such as in your case, and then utterly fail at something that a simple understanding of the problem and thinking it through would solve? The only explanation I have is that it's not using actual reasoning to solve the problems
I really hate this description, but I can't quite fully articulate why yet. It's distinctly different because interns can form new observations independently. AIs cannot. They can make another guess at the next token, but if it could have predicted it the second time, it must have been able to predict it the first, so it's not a new observation. The way I think through a novel problem results in drastically different paths and outputs from an LLM. They guess and check repeatedly; they don't converge on an answer. Which you've already identified.
> LLMs are different in that they’re sorta lobotomized. They won’t learn from tutoring “did you consider” which needs to essentially be encoded manually still.
This isn't how you work with an intern (unless the intern is unable to learn).
yes
> Any intelligent, reasoning human would immediately see that it's not even close to enough. You barely even need a coding background to see the issues.
[nods]
> This kind of thing happens over and over again. I look at the stuff it outputs and it's clear to me that no reasoning thing would act this way
and yet there're so many people who are convinced it's fantastic. Oh, I made myself sad.
The larger observation - that it is statistical inference rather than reason, yet looks like reason to so many - is quite an interesting test case for the "fuzzing" of humans. In the same line as: why do so many engineers store passwords in clear text? Why do so many people believe AI can reason?
"SAL-9000: Will I dream? Dr. Chandra: Of course you will. All intelligent beings dream. Nobody knows why. Perhaps you will dream of HAL... just as I often do." From 2010
Quite the assumption you're making that the same thing does not apply to humans. Not having the technology is not the same thing as not being possible.
And my "reasoning" is pretty much like a long ChatGPT verbal and sometimes not-so-verbal (visual) conversation with myself.
If my mind really did abstract platonic thinking, I think answers to hard problems would just instantly appear to me, without flaws. But only problems I have solved before and can pattern-match do that.
And if I have to think any new thoughts I feel that process is rather similar to how LLMs work.
It is the same for history of science really -- only thoughts that build small steps on previous thoughts and participate in a conversation actually are thought by humans.
Totally new leaps, which "platonic thinking machines" should easily make, do not seem to happen.
Humans are, IMO, conversation machines too...
And the challenge is rethinking how we do work: connecting all the data sources so that agents can run and perform work over the same sources we ourselves work across. That will take ages. Not to mention having the controls in place to make sure the "thinking" was correct in the end.
Copilot can't jump to definition in Visual Studio.
Anthropic got a lot of mileage out of teaching Claude to grep, but LLM agents are a complete dead-end for my code-base until they can use the semantic search tools that actually work on our code-base and hook into the docs for our expensive proprietary dependencies.
You seem to be defining "thinking" as an interchangeable black box, and as long as something fits that slot and "gets results", it's fine.
But it's the code-writing that's the interchangeable black box, not the thinking. The actual work of software development is not writing code, it's solving problems.
With a problem-space-navigation model, I'd agree that there are different strategies that can find a path from A to B, and what we call cognition is one way (more like a collection of techniques) to find a path. I mean, you can in principle brute-force this until you get the desired result.
But that's not the only thing that thinking does. Thinking responds to changing constraints, unexpected effects, new information, and shifting requirements. Thinking observes its own outputs and its own actions. Thinking uses underlying models to reason from first principles. These strategies are domain-independent, too.
And that's not even addressing all the other work involved in reality: deciding what the product should do when the design is underspecified. Asking the client/manager/etc what they want it to do in cases X, Y and Z. Offering suggestions and proposals and explaining tradeoffs.
Now I imagine there could be some other processes we haven't conceived of that can do these things but do them differently than human brains do. But if there were we'd probably just still call it 'thinking.'
Great. But most (?) of the businesses out there aren't paying for the big-boy models.
I know of an F100 that got snookered into a deal with GPT-4 for 5 years: a max of 40 responses per session, a max of 10 sessions of memory, and no backend integration.
Those folks rightly think that AI is a bad idea.
I have kids, and you could say the same about toddlers. Terrific mimics, they don't understand the whys.
So I think younger kids have purpose and associate meaning with a lot of things, and they do try to find a specific path toward an outcome.
Of course (depending on the age) their "reasoning" runs in a different system than ours, where survival instincts are much more powerful than any custom-defined outcome, so most of the time that is the driving force of the meaning.
Why do I talk about meaning? Because, of course, kids cannot talk about the why, as that is very abstract. But meaning is a big part of the why, and it continues to be so in adult life; it is just that the relation is reversed: we start talking about the why to get to a meaning.
I also think that kids start to have thoughts more complex than their language very early. If you went through the "Why?" phase you might have noticed that when they ask "Why?" they could mean very different questions, but they don't know the words to describe them. Sometimes "Why?" means "Where?", sometimes "How?", sometimes "How long?"... That series of questioning is, for me, a kind of proof that a lot more is happening in a kid's brain than they can verbalise.
Humans are good at this because they are truly multi-modal and can interact through many different channels to gather additional context to do the requisite task at hand. Given incomplete requirements or specs, they can talk to co-workers, look up old documents from a previous release, send a Slack or Teams message, setup a Zoom meeting with stakeholders, call customers, research competitors, buy a competitors product and try it out while taking notes of where it falls short, make a physical site visit to see the context in which the software is to be used and environmental considerations for operation.
Point is that humans doing work have all sorts of ways to gather and compile more context before acting or while they are acting that an LLM does not and in some cases cannot have without the assistance of a human. This process in the real world can unfurl over days or weeks or in response to new inputs and our expectation of how LLMs work doesn't align with this.
LLMs can sort of do this, but more often than not, the failure of LLMs is that we are still very bad at providing proper and sufficient context to the LLM and the LLMs are not very good at requesting more context or reacting to new context, changing plans, changing directions, etc. We also have different expectations of LLMs and we don't expect the LLM to ask "Can you provide a layout and photo of where the machine will be set up and the typical operating conditions?" and then wait a few days for us to gather that context for it before continuing.
I don't know what else LLMs need to do - get on the payroll? People are using them heavily. You can't even Google things easily without triggering an LLM response.
I think the current millennial and older generations are too used to the pre-LLM way of things, so the resistance will be there for a long time to come. But kids doing homework with LLMs will rely on them heavily once they're in the workforce.
I don't know how people are not fascinated and excited about this. I keep watching older sci-fi content, and LLMs are now doing for us what the "futuristic computer persona" did in older sci-fi.
Easy example: you no longer need copywriters because of LLMs. You had spell/grammar checkers before, but they didn't "understand" context, recommend different phrasing, or check for things like continuity and rambling.
You absolutely do still need copyeditors for anything you actually care about.
Like using AI to fill in a form. Instead of a proper autonomous agent (which was the project's goal).
Now that AI is "mainstream" - big bubble, big money, big career/promotion options for management - I expect much more of this behaviour going into 2026.
After posting that, I came across numerous papers which critique Frey & Osborne's approach; they are some of the forefathers of the AI job-loss figures we see bandied around commonly these days. One such paper is here, but I can dig out others: https://melbourneinstitute.unimelb.edu.au/__data/assets/pdf_...
It has made me very cautious around bold statements on AI - and I was already at the cautious end.
If you truly believed that, you would be spinning up new projects and offshoots, as this is a serious arms race with a ton of potential upside (not just in developing AI, but in leveraging it to build things cheaper). Allegedly every dollar you spend on an engineer is potentially worth 10x(?) what it was a couple of years ago, meaning your profit per engineer could soar - but tech companies decided they don't want more profit? AI is mostly solved and the value of labor has already collapsed? Or AI is a nice band-aid to prop up a smaller group of engineers while we weather the current economic/political environment, and most CXOs don't believe there are piles of money to be had by leveraging AI now or in the near future.
if product spec begins to hamper the dev process, guess what the big focus will be at, e.g., that year's YC
Exactly, so many of these claims are complete nonsense. I'm supposed to believe that boards/investors would be fine with companies doing massive layoffs to maintain flat/minuscule growth, when they could keep or expand their current staffing and massively expand their market share and profits with all this increased productivity?
It's ridiculous. If this stuff had truly increased productivity at the levels claimed we would see firms pouring money into technical staff to capitalize on this newfound leverage.
AI is supposedly going to obviate the need for white collar workers, and the best all the CEOs can come up with is the exact current status quo minus the white collar workers?