"Why has my job not been outsourced yet, since it is far cheaper?" Those are probably the same reasons why AI won't take your job this year.
Raw coding metrics are a very small part of being a cog in a company. That's not me saying it will never happen, just me saying that this focus on coding performance kind of misses the forest for the trees.
I suspect that another kind of impact is already happening in organisations where AI adoption is uneven: suddenly some employees appear to be having a lot more leisure time while apparently keeping the same productivity as before.
If you want to focus on what AI agents are actually capable of today, the last person I'd pay any attention to is Marcus, who has been wrong about nearly everything related to AI for years, and does nothing but double down.
I agree with GP, Marcus has not been an accurate or significant voice, couldn't care less what he has to say about AI. He's not a practitioner anymore in my mind
I doubt the parent poster was referring to anyone phrasing it in those literal terms. Kind of like how "some people claim flavored water can cure cancer" doesn't mean that's the literal pitch being given for the snake-oil.
We know how to build it and it will be entering the workforce in 2025. Well, we're in 2026 now and we don't have it in the workforce or anywhere else because they haven't built it because they don't really know how to build it because they're hucksters selling vaporware built on dead end technologies they cannot admit to.
Man dude, don't automate the toil, add an API to the website. It's supposed to have one!
> So, this is how I’m thinking about AI in 2026. Enough of the predictions. I’m done reacting to hypotheticals propped up by vibes. The impacts of the technologies that already exist are already more than enough to concern us for now…
SPOT ON, let us all take inspiration. "The impacts of the technologies that already exist are already more than enough to concern us for now"!
It’s also kind of stupid to hand-wave away programming. Programmers are where all the early adopters of software are. He’s merely conflating an adoption curve with capabilities. Programmers, I’m sure, were also the first to use Google and smartphones. “It doesn’t work for me” is missing the critical word “yet” at the end, and really, is it saying much that forecasts about adoption in the metric, “years until Cal Newport’s arbitrary criteria of what agent and adoption mean meet some threshold only inside Cal Newport’s head”, are hard to make?
There are 700m active weeklies for ChatGPT. It has joined the workforce! It just isn’t being paid the salaries.
no, it was 2008, with the iPhone launch.
Homework is in some ways the opposite of actual economic labor. Students pay to attend school, and homework is (theoretically) part of that education; something designed to help students learn more effectively. They are most certainly not paid for it.
Having a LLM do that "work" is economically insane. The desired learning does not happen, and the labor of grading and giving feedback is entirely wasted.
Students use ChatGPT for it because of perverse incentives of the educational system. It has no bearing on economic production of value.
The problem set or paper you turn in is not the product. The product is the learning that the human obtains from the _process_.
The homework is just there, being graded, to evaluate your progress at performing the required toil.
There’s the problem, some students don’t want an education, they just want a qualification, even if it means cheating on the evaluation.
There are good schools where you can get an actual education you couldn’t on your own but a lot of universities are similarly only interested in getting your money in exchange for a qualification.
Like, all the advertising I saw from schools was about job placement rates after graduation, not praising the education itself
What's the point? Who ever actually learnt anything from homework?
And I never would have been able to learn math without doing a bunch of problems early on... you can think you understand something in class but it takes applying it a bunch of times in different scenarios to really embed that knowledge in useful ways.
I presume I'm not the only one.
Sure, programmers would still adopt LLMs faster than the rest of the workforce whether or not the LLMs were good at writing code. But you have to credit at least some of that adoption rate to the fact that LLMs are significantly better at text (e.g. code) generation than they are at most other white-collar tasks (e.g. using a web browser)
I believe that Codex and the like took off (in comparison to e.g. "AI" browsers) because the bottleneck there was not reasoning about code, it was typing and processing walls of text. for a human, the interface of e.g. Google Calendar is ± intuitive. for an LLM, any graphical experience is an absolute hellscape from a performance standpoint.
CLI tools, which LLMs love to use, output text and only text, not images, not audio, not videos. LLMs excel at text, hence they are confined to what text can do. yes, multimodal is a thing, but you lose a lot of information and/or context window space + speed.
LLMs are a flawed technology for general, true agents. 99% of the time, outside code, you need eyes and ears. so far we have only created self-writing paper.
And the other "successful" industry being disrupted is the one where there is no need to validate output, because errors are OK or irrelevant. A text not containing much factual data, like fiction or business-lingo or spam. Or pictures, where it doesn't matter what color a specific pixel is, a rough match will do just fine.
But outside of those two options, not many other industries can use at scale an imprecise word or media generator. Circular writing and parsing of business emails with no substance? Sure. Not much else.
It's a matter of degree, not a qualitative difference. Humans have the exact same flaws, but amateur humans grow into expert humans with low error rates (or lose their job and go to work in KFC), whereas LLMs are yet to produce a true expert in anything because their error rates are unacceptably high.
Once LLMs gained access to tools like compilers, they started being able to iterate on code based on fast, precise and repeatable feedback on what works and what doesn't, be it failed tests or compiler errors. Compare this with tasks like composing a powerpoint deck, where feedback to the LLM (when there is one) is slower and much less precise, and what's "good" is subjective at best.
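The loop is simple enough to sketch. Something like the following, where `ask_model` is a stand-in for whatever LLM client is actually in use, and a pytest-based project is assumed:

```python
import subprocess

def ask_model(prompt: str) -> str:
    """Stand-in for an LLM call; swap in whatever client you actually use."""
    raise NotImplementedError

def fix_until_green(source_path: str, max_rounds: int = 5) -> bool:
    """Let the model iterate on one file until the test suite passes."""
    for _ in range(max_rounds):
        # Fast, precise, repeatable feedback: the test runner's exit code and output.
        result = subprocess.run(
            ["pytest", "-x", "--tb=short"], capture_output=True, text=True
        )
        if result.returncode == 0:
            return True  # tests pass, done
        with open(source_path) as f:
            code = f.read()
        new_code = ask_model(
            "These tests fail:\n" + result.stdout + result.stderr
            + "\n\nHere is the current file:\n" + code
            + "\n\nReturn the corrected file, nothing else."
        )
        with open(source_path, "w") as f:
            f.write(new_code)
    return False
```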
Another example is how LLMs got very adept at reading and explaining existing code. That is an impressive and very useful ability, but code is one of the most precise ways we, as humans, can express our intent in instructions that can be followed millions of times in a nearly deterministic way (bugs aside). Our code is written in thoroughly documented languages with a very small vocabulary and much easier grammar than human languages. Compare this to taking notes in a zoom call in German and trying to make sense of inside jokes, interruptions and missing context.
But maybe most importantly, a developer must be the friendliest kind of human for an LLM. Breaking down tasks in smaller chunks, carefully managing and curating context to fit in "memory", orchestrating smaller agents with more specialized tasks, creating new protocols for them to talk to each others and to our tools.... if it sounds like programming, it's because it is.
I mean we did give away code as training data but we also know that AI companies just took pirated books and media too.
So I don’t think gifting has much to do with it.
Next all the Copilot users will be “giving away” all their business processes and secrets to Microsoft to clone.
>The industry had reason to be optimistic that 2025 would prove pivotal. In previous years, AI agents like Claude Code and OpenAI’s Codex had become impressively adept at tackling multi-step computer programming problems.
It is easy to forget that Claude Code CAME OUT in 2025. The models and agents released in 2025 really DID prove how powerful and capable they are. The predictions were not really wrong. I AM using code agents in a literal fire and forget way.
Claude Code is a hugely capable agentic interface for solving almost any kind of problem or project you want to solve for personal use. I literally use it as the UX for many problems. It is essentially a software that can modify itself on the fly.
Most people haven't really grasped the dramatic paradigm shift this creates. I haven't come up with a great analogy for it yet, but the term that I think best captures how it feels to work with claude code as a primary interface is "intelligence engine".
I'll use an example. I've created several systems harnessed around Claude Code, but the latest one I built is for stock portfolio management (primarily because it is a fun problem space and something I know a bit about). Essentially you just use Claude Code to build tools for itself in a domain. Let me show how this played out in this example.
Claude and I brainstorm a general flow for the process and roles, then figure out what data each role would need and research which providers have that data at a reasonable price.
I purchase the API keys and Claude wires up tools (in this case Python scripts and documentation for the agents covering about 140 API endpoints), then builds the agents and also creates an initial version of the "skill" that will invoke a process that looks something like this:
Macro Economist/Strategist -> Fact Checker -> Securities Sourcers -> Analysts (like 4 kinds) -> Fact Checker/Consolidator -> Portfolio Manager
Obviously it isn't 100% great on the first pass and I have to lean on expertise I have in building LLM applications, but now I have a Claude Code instance that can orchestrate this whole research process and also handle ad-hoc changes on the fly.
Now I have evolved this system through about 5 significant iterations, but I can do it "in the app". If I don't like how part of it is working, I just have the main agent rewire stuff on the fly. This is a completely new way of working on problems.
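Stripped of the Claude Code plumbing, the orchestration itself is nothing exotic; a rough, hypothetical Python sketch of the pipeline above, where `run_agent` stands in for however a sub-agent actually gets invoked:

```python
# Hypothetical sketch only: role names come from the description above, and
# run_agent() is a stand-in for however a sub-agent is actually invoked
# (the real system lets Claude Code do this wiring itself).

ROLES = [
    "macro_strategist",     # Macro Economist/Strategist
    "fact_checker",         # Fact Checker
    "securities_sourcer",   # Securities Sourcers
    "analyst",              # Analysts (several kinds in the real setup)
    "consolidator",         # Fact Checker/Consolidator
    "portfolio_manager",    # Portfolio Manager
]

def run_agent(role: str, briefing: str) -> str:
    """Stand-in: run one sub-agent with its role prompt plus upstream output."""
    raise NotImplementedError

def run_research_pass(initial_brief: str) -> str:
    output = initial_brief
    for role in ROLES:
        # Each role reads the previous role's output and adds its own analysis.
        output = run_agent(role, output)
    return output  # the portfolio manager's final recommendations
```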
There's a whole lot of bullshit jobs and work that will get increasingly and opaquely automated by AI. You won't see jobs go away unless or until organizations deliberately set out to reduce staff. People will use AI throughout the course of their days to get a couple of "hours" of tasks done in a few minutes, here and there, throughout the week. I've already seen reports and projects and writing that clearly comes from AI in my own workplace. Right now, very few people know how to recognize and assess the difference between human and AI output, and even fewer how to calibrate work assignments.
Spreadsheet AIs are fantastic, reports and charting have just hit their stride, and a whole lot of people are going to appear to be very productive without putting a whole lot of effort into it. And then one day, when sufficiently knowledgeable and aware people make it into management, all sorts of jobs are going to go quietly away, until everything is automated, because it doesn't make sense to pay a human 6 figures for what an AI can do for 3 figures in a year.
I'd love to see every manager in the world start charting the Pareto curves for their workplaces, alongside actual hours worked per employee - work output is going to be very wonky, and the lazy, clever, and ambitious people are all going to be using AI very heavily.
Similar to this guy: https://news.ycombinator.com/item?id=11850241
https://www.reddit.com/r/BestofRedditorUpdates/comments/tm8m...
Part of the problem is that people don't know how to measure work effectively to begin with, let alone in the context of AI chatbots that can effectively do better work than a significant portion of the adult population of the planet.
The teams that fully embrace it, use the tools openly and transparently, and are able to effectively contrast good and poor use of the tools, will take off.
Humans have one edge over "AI": you can't blame and fire "AI" when it inevitably goes wrong.
Curious, what industries? And what capabilities do LLMs present to automate these positions that previous technologies do not?
'Bullshit jobs' and the potential to automate them are very real, but I think many of them could have been automated long before LLMs, and I don't think the introduction of LLMs is going to solve the bottleneck that prevents jobs like these from being automated.
Which is, of course, conveniently never a bullshit job but a Very Important One.
If you don't mind, could you please write a few examples of what LLMs do in spreadsheets? Because that's probably the last place where I would allow LLMs, since they tend to generate random data, and spreadsheets are notoriously hard to debug due to all the hidden formulas and complex dependencies.
Say you have an accounting workbook with 50 or so sheets with tables depending on each other and they contain very important info like inventory and finances. Just a typical small to medium business setup (big corporations also do it). Now what? Do you allow LLMs to edit files like that directly? Do you verify changes afterwards and how?
But from what I’ve seen it just makes that work output even less meaningful—who wants to read 10 AI-generated pages that could have been two bullet points?
And it doesn’t actually improve productivity because that was never the bottleneck of those jobs anyway. If anything, having some easy rote work is a nice way to break up the pace.
There are all these things in writing we used as signals for intelligence, attention to detail, engagement, willingness to accept feedback, etc... but they're now easy to counterfeit at scale.
Hopefully everyone realizes what's going on and cuts out the middleman.
Both of these agents launched mid-2025.
[1] https://www.anthropic.com/news/claude-3-7-sonnet
yes, 100%
I think that way too often, discussions of the current state of tech get derailed by talking about predictions of future improvements.
hypothetical thought experiment:
I set a New Year's resolution for myself of drinking less alcohol.
on New Year's Eve, I get pulled over for driving drunk.
the officer wants to give me a sobriety test. I respond that I have projected my alcohol consumption will have decreased 80% YoY by Q2 2026.
the officer is going to smile and nod...and then insist on giving me the sobriety test.
compare this with a non-hypothetical anecdote:
I was talking with a friend about the environmental impacts of AI, and mentioned the methane turbines in Memphis [0] that are being used to power Elon Musk's MechaHitler slash CSAM generator.
the friend says "oh, but they're working on building nuclear power plants for AI datacenters".
and that's technically true...but it misses the broader point.
if someone lives downwind of that data center, and they have a kid who develops asthma, you can try to tell them "oh in 5 years it'll be nuclear powered". and your prediction might be correct...but their kid still has asthma.
0: https://time.com/7308925/elon-musk-memphis-ai-data-center/
Very few people do
so neither Altman, nor the many CEOs industry-wide, nor the Engineering Managers, Software Engineers, and “Forward Deployed Engineers” actually have to inspect
their demos show good looking output
it's just the people in support roles that have to be like “wait a minute, this is very inconsistent”
all while everyone is doing their best not to get replaced
it's clanker discrimination mixed with clanker incompetence
We may not have gotten fully-autonomous employees, but human employees using AI are doing way more than they could before, both in depth and scale.
Claude Code is basically a full-time "employee" on my (profitable) open source projects, but it's still a tool I use to do all the work. Claude Code is basically a full-time "employee" at my job, but it's still a tool I use to do all the work. My workload has shifted to high-level design decisions instead of writing the code, which is kind of exactly what would have happened if AI "joined the workforce" and I had a bunch of new hires under me.
I do recognize this article is largely targeted at non-dev workforces though, where it _largely_ holds up but most of my friends outside of the tech world have either gotten new jobs thanks to increased capability through AI or have severely integrated AI into whatever workflows they're doing at work (again, as a tool) and are excelling compared to employees who don't utilize AI.
Funny how that doesn't show up in any productivity or economic metrics...
That said, I expect you are right that we won't see it show up. Even if we assume the claim is true in every way for some people, it only works for exceptional visionaries who were previously constrained by typing speed, which is a very, very, very small segment of the developer population. Any gains that small group realize will be an unrecognizable blip amid everything else. The vast majority of developers need all that typing time and more to have someone come up with their next steps. Reducing the typing time for them doesn't make them any more productive. They were never limited by typing speed in the first place.
When including re-work of bugs in the AI generated code some studies find that AI has no positive impact on software developer productivity, and can even have a negative impact.
The main problem with these studies are they are backward looking, so frontier labs can always claim the next model will be the one that delivers the promised productivity gains and displace human workers.
Which studies are you talking about? The last major study that I saw (that gained a lot of attention) was published half a year ago, and the study itself was conducted on developers using AI tools in 2024.
The technology has improved so rapidly that this study is now close-to-meaningless.
On the other hand, people are working much harder today than 3 years ago (remember people not showing up to work and posting on TikTok about how little work did collecting paychecks from 2 different companies etc?)
Just saying it's very hard to look at a time series and determine an effect size, even though politicians/CEOs like to claim ownership for growths.
AI generates output that must be thoroughly checked for most software engineering purposes. If you’re not checking the output, then quality and accuracy must not matter. For quick prototyping that’s mostly true. Not for real engineering.
What fulltime employee works for 30 minutes and then stops working for the next 5 hours and 30 minutes like Claude does?
I think Carmack is right, LLM's are not the route to AGI.
So well put.
LLMs are useful for a great many things. It's just that being the best new product of recent years, maybe even defining a decade, doesn't cut it. It has to be the century-defining, world-ending, FOMO-inducing massive thing that puts Skynet to shame and justifies investments in the trillions of dollars. It's either AI joining the workforce soon, or Nvidia and OpenAI aren't that valuable.
I guess it manages to maximize shareholder value, and make AI feel like a disappointment.
Agents as LLMs calling tools in a loop to perform tasks that can be handled by typing commands into a computer absolutely did.
Claude Code turns out to be misnamed: it's useful for way more than just writing code, once you figure out how to give it access to tools for other purposes.
I think the browser agents (like the horribly named "ChatGPT Agent" - way to burn a key namespace on a tech demo!) have acted as a distraction from this. Clicking links is still pretty hard. Running Bash commands on the other hand is practically a solved problem.
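A minimal "tools in a loop" agent really is tiny; a rough sketch, where `call_model` is a placeholder for any chat API that returns either a shell command to run or a final answer:

```python
# Rough sketch: an "agent" is just a model choosing commands in a loop.
# call_model() is a placeholder for any chat API that returns either
# {"run": "<shell command>"} or {"answer": "<final text>"}.
import subprocess

def call_model(messages: list[dict]) -> dict:
    raise NotImplementedError

def agent(task: str, max_steps: int = 20) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_model(messages)
        if "answer" in reply:
            return reply["answer"]
        # The easy, reliable part: run the command and feed back plain text.
        result = subprocess.run(
            reply["run"], shell=True, capture_output=True, text=True, timeout=60
        )
        messages.append({
            "role": "user",
            "content": f"exit={result.returncode}\n{result.stdout}{result.stderr}",
        })
    return "gave up"
```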
Edit: should we call them "special agents"? ;-)
One of these things is not like the other...
I think that this still isn't true for even very mundane tasks like "read a CSV file and translate column B into column C" for files with more than ~200 lines. The LLM will simply refuse to do the work and you'll have to stitch the badly formatted answer excerpts together yourself.
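To be fair, the usual workaround is to keep the model out of the file handling entirely and only hand it small batches; a rough sketch, with `translate_batch` as a placeholder for the actual LLM call:

```python
# Rough sketch: batch the rows ourselves so the model never sees the whole file.
# translate_batch() is a placeholder for the actual LLM call.
import csv

def translate_batch(texts: list[str]) -> list[str]:
    """Placeholder: translate a small list of strings via an LLM."""
    raise NotImplementedError

def translate_column(in_path: str, out_path: str, batch_size: int = 50) -> None:
    with open(in_path, newline="") as f:
        rows = list(csv.reader(f))
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        translated = translate_batch([row[1] for row in batch])  # column B
        for row, text in zip(batch, translated):
            while len(row) < 3:   # make sure column C exists
                row.append("")
            row[2] = text
    with open(out_path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
```

A real script would skip the header row and retry malformed batches, but the model only ever sees `batch_size` rows at a time, so it can't refuse on length.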
There are plenty of jobs that have already been pretty much replaced by AI: certain forms of journalism, low-end photoshop work, logo generation, copywriting. What does the OP need to see in order to believe that AI has "joined the workforce"?
> We are now confident we know how to build AGI as we have traditionally understood it. We believe that, in 2025, we may see the first AI agents “join the workforce” and materially change the output of companies...
"materially change the output of companies" seems fairly defined and didn't happen in most cases. I guess some kicked out more slop but I don't think that's what he meant.
I think the mistake here is expecting that AI is just making workers in older jobs faster, when the reality is, more often than not, that it changes the nature of the task itself.
Whenever AI reached the "good enough" point, it doesn't do so in a way that nicely aligns with human abilities, quite the opposite, it might be worse at performing a task, but be able to perform it 1000x faster. That allows you to do things that weren't previously possible, but it also means that professionals might not want to rely on using AI for the old tasks.
A professional translator isn't going to switch over to using AI, the quality isn't there yet, but somebody like Amazon could offer a "OCR & translate all the books" service and AI would be good enough for it, since it could handle all the books that nobody has the time and money to translate manually. Which in turn will eventually put the professional translator out of a job when it gets better than good enough. We aren't quite there yet, but getting pretty close.
In 2025 a lot of AI went from "useless, but promising" to "good enough".
Millions in revenue ain't hard to hit with extremely modest business.
The #1 product of nearly every AI company is hope, hope that one day they will replace the need to pay real employees. Hope like that allows a company to cut costs and fund dividends ... in the short term. The long term is some other person's problem. (I'll change my mind the day Bill Gates trusts MS Copilot with his personal banking details.)
Cal is a consumer of AI - interesting article for this community, but not this community. I thought hacker news was for builders and innovators - people who see the potential of a technology for solving problems big and small and go and tinker and build and explore with it, and sometimes eventually change the world (hopefully for the better). Instead of sitting on the sidelines grumbling about that some particular tech that hasn’t yet changed the world / met some particular hype (yet).
Incredibly naive to think AI isn’t making real difference already (even without/before replacing labor en masse.)
Actually try to explore the impact a bit. It’s not AGI, but doesn’t have to be to transform. It’s everywhere and will do nothing but accelerate. Even better, be part of proving Cal wrong for 2026.
If firing up old coal plants and skyrocketing RAM prices and $5000 consumer GPUs and violating millions of developers' copyrights and occasionally coaxing someone into killing themselves is the cost of Brian From Middle Management getting to Enjoy Programming Again instead of having to blame his kids for not having any time on the weekends, I guess we have no choice but to oblige him his little treat.
Three examples for you:

- Our policy agent extracts all coverage limits and policy details into a data ontology. This saves 10-20 mins per policy. It is more accurate and consistent than our humans (a rough sketch of this kind of extraction follows below).
- Our email drafting agent will pull all relevant context on an account whenever an email comes in. It will draft a reply or an email to someone else based on context and workflow. Over half of our emails are now sent without meaningfully modifying the draft, up from 20% two months ago. Hundreds of hours saved per week, now spent on more valuable work for clients.
- Our certificates agent will note when a certificate of insurance is requested over email and automatically handle the necessary checks and follow-up options or resolution. Will likely save us around $500k this year.
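For the first example (the policy agent), the general shape is presumably structured extraction with validation before anything lands in the ontology; a hypothetical sketch with made-up field names, where `ask_model` stands in for the LLM call:

```python
# Hypothetical sketch of the first example (the policy agent). Field names are
# made up for illustration; ask_model() stands in for the actual LLM call.
import json
from dataclasses import dataclass

@dataclass
class CoverageLimit:
    coverage_type: str          # e.g. "general_liability"
    limit_per_occurrence: float
    aggregate_limit: float
    deductible: float

def extract_policy(document_text: str, ask_model) -> list[CoverageLimit]:
    prompt = (
        "Extract every coverage limit from this policy as a JSON list of objects "
        "with keys coverage_type, limit_per_occurrence, aggregate_limit, "
        "deductible. Return JSON only.\n\n" + document_text
    )
    raw = ask_model(prompt)
    items = json.loads(raw)  # raises if the model returned malformed JSON
    # Constructing the dataclass rejects missing or extra keys; add range and
    # cross-checks against the source document before writing to the ontology.
    return [CoverageLimit(**item) for item in items]
```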
We also now increasingly share prototypes as a way to discuss ideas. Because the cost to vibe code something illustrative is very low, and it’s often much higher fidelity to have the conversation around something visual than around a written document
Are they using some software for this or was this built in-house?
Aren't you worried about the agent missing or hallucinating policy details?
> It is more accurate and consistent than our humans.
So, errors can clearly happen, but they happen less often than they used to.
> It will draft a reply or an email
"draft" clearly implies a human will will double-check.
That's if you take the comment at face value. I'm sorry, but I've been around this industry long enough to be sceptical of self-serving statements like these.
>"draft" clearly implies a human will will double-check.
I'm even more sceptical of that working in practice.
The wording does imply this, but since the whole point was to free the human from reading all the details and relevant context about the case, how would this double-checking actually happen in reality?
On the off chance it’s not for that reason, productivity requirements will be increased until you must half-ass it.
That's your assumption.
My read of that comment is that it's much easier to verify and approve (or modify) the message than it is to write it from scratch. The second sentence does confirm a person then modifies it in half the cases, so there is some manual work remaining.
It doesn't need to be all or nothing.
I work at a place that is doing nothing like this and it seems obvious to me we are going to get put out of business in the long run. This is just adding a power law on top of a power law. Winner winner take all. What I currently do will be done by software engineers and agents in 10 years or less. Gemini is already much smarter than I am. I am going to end up at a factory or Walmart if I can get in.
The "AI bubble" is a mass delusion of people in denial of this reality. There is no bubble. The market has just priced all this forward as it should. There is a domino effect of automation that hasn't happened yet because your company still has to interface with stupid companies like mine that are betting on the hand loom. Just have to wait for us to bleed out and then most people will never get hired for white collar work again.
It amuses me when someone says who is going to want the factory jobs in the US if we reshore production? Me and all the other very average people who get displaced out of white collar work and don't want to be homeless is who.
"More valuable" work is just 2026 managerial class speak for "place holder until the agent can take over the task".
- Product team is generating some code with LLMs but everything has to go through human review and developers are expected to "know" what they committed - so it hasn't been a major time saver but we can spin up quicker and explore more edge cases before getting into the real work
- Marketing team is using LLMs to generate initial outlines and drafts - but even low stakes/quick turn around content (like LinkedIn posts and paid ads) still need to be reviewed for accuracy, brand voice, etc. Projects get started quicker but still go through various human review before customers/the public sees it
- Similarly the Sales team can generate outreach messaging slightly faster but they still have to review for accuracy, targeting, personalization, etc. Meeting/call summaries are pretty much 'magic' and accurate-enough when you need to analyze any transcripts. You can still fall back on the actual recording for clarification.
- We're able to spin up demos much faster with 'synthetic' content/sites/visuals that are good-enough for a sales call but would never hold up in production
---
All that being said - the value seems to be speeding up discovery of actual work, but someone still needs to actually do the work. We have customers, we built a brand, we're subject to SLAs and other regulatory frameworks so we can't just let some automated workflow do whatever it wants without a ton of guardrails. We're seeing similar feedback from our customers in regard to the LLM features (RAG) that we've added to the product if that helps.
They are not a silver bullet or truly “you don’t need to know how to code anymore” tools. I’ve done a ton of work with Claude code this year. I’ve gone from a “maybe one ticket a week” tier React developer to someone who’s shipped entire new frontend feature sets, while also managing a team. I’ve used LLM to prototype these features rapidly and tear down the barrier to entry on a lot of simple problems that are historically too big to be a single-dev item, and clear out the backlog of “nice to haves” that compete with the real meat and bread of my business. This prototyping and “good enough” development has been massively impactful in my small org, where the hard problems come from complex interactions between distributed systems, monitoring across services, and lots of low-level machine traffic. LLM’s let me solve easy problems and spend my most productive hours working with people to break down the hard problems into easy problems that I can solve later or pass off to someone on my team to help.
I’ve also used LLM to get into other people’s codebases, refactor ancient tech debt, shore up test suites from years ago that are filled with garbage and copy/paste. On testing alone, LLM are super valuable for throwing edge cases at your code and seeing what you assumed vs. what an entropy machine would throw at it.
LLM absolutely are not a 10x improvement in productivity on their own. They 100% cannot solve some problems in a sensible, tractable way, and they frequently do stupid things that waste time and would ruin a poor developer’s attempts at software engineering. However, they absolutely also lower the barrier to entry and dethrone “pure single tech” (ie backend only, frontend only, “I don’t know Kubernetes”, or other limited scope) software engineers who’ve previously benefited from super specialized knowledge guarding their place in the business.
Software as a discipline has shifted so far from “build functional, safe systems that solve problems” to “I make 200k bike shedding JIRA tickets that require an army of product people to come up with and manage” that LLM can be valuable if only for their capabilities to role-compress and give people with a sense of ownership the tools they need to operate like a whole team would 10 years ago.
This argument gets repeated frequently, but to me it seems to be missing a final, actionable conclusion.
If one "doesn't know Kubernetes", what exactly are they supposed to do now, having an LLM at hand, in a professional setting? They still "can't" assess the quality of the output, after all. They can't just ask the model, as they can't know if the answer is not misleading.
Assuming we are not expecting people to operate with implicit delegation of responsibility to the LLM (something that is ultimately not possible anyway - taking blame is a privilege human will keep for a foreseeable future), I guess the argument in the form as above collapses to "it's easier to learn new things now"?
But this does not eliminate (or reduce) a need for specialization of knowledge on the employee side, and there is only so much you can specialize in.
The bottleneck may have shifted right somewhat (from the time/effort of the learning stage to the cognition and memory limits of an individual), but the output on the other side of the funnel (of learn->understand->operate->take-responsibility-for) didn't necessarily widen that much, one could argue.
Wasn't this a problem before AI? If I took a book or online tutorial and followed it, could I be sure it was teaching me the right thing? I would need to make sure I understood it, that it made sense, that it worked when I changed things around, and would need to combine multiple sources. That still needs to be done. You can ask the model, and you'll have the judge the answer, same as if you asked another human. You have to make sure you are in a realm where you are learning, but aren't so far out that you can easily be misled. You do need to test out explanations and seek multiple sources, of which AI is only one.
An AI can hallucinate and just make things up, but the chance that different sessions with different AIs lead to the same hallucinations, consistently building upon each other, is low enough not to be worth worrying about.
This is the fundamental problem that all these cowboy devs do not even consider. They talk about churning out huge amounts of code as if it were an intrinsically good thing. Reminds me of those awful VB6 desktop apps people kept churning out. VB6 sure made tons of people Nx more productive, but it also led to loads of legacy systems that no one wanted to touch because they were built by people who didn't know what they were doing. LLMs-for-code are another tool in the same category.
Has any senior React dev code-reviewed your work? I would be very interested to see what they have to say about the quality of your code. It's a bit like using LLMs to medically self-diagnose and claiming it works because you are healthy.
Ironically enough, it does seem that the only workforce AIs will be shrinking will be devs themselves. I guess in 2025, everyone can finally code
The website https://pxehost.com - via codex CLI
The actual project itself (a pxe server written in go that works on macOS) - https://github.com/pxehost/pxehost - ChatGPT put the working v1 of this in 1 message.
There was much tweaking, testing, refactoring (often manually) before releasing it.
Where AI helps is the fact that it’s possible to try 10-20 different such prototypes per day.
The end result is 1) Much more handwritten code gets produced because when I get a working prototype I usually want to go over every detail personally; 2) I can write code across much more diverse technologies; 3) The code is better, because each of its components are the best of many attempts, since attempts are so cheap.
I can give more if you like, but hope that is what you are looking for.
I find it remarkable there are people that look at useful, living projects like that and still manage to dismiss AI coding as a fad or gimmick.
Careful code review and a good pull request flow are important, just as they were before LLMs.
I guess I should be happy that so many of my colleagues are willing to remove themselves from the competitive job pool with these kinds of attitudes.
People always want to claim what they’re doing is so complex and esoteric that AI can’t touch it. This is dangerous hubris.
I will add that LLMs are very mediocre, bordering on bad, at any challenging or interesting 3D engine stuff. They're pretty decent at answering questions about surface API stuff (though, inexplicably, they're really shit at OpenGL which is odd because it has way more code out there written in it than any other API) and a bit about the APIs' structure, though.
(I did do web development and distributed systems for quite some time, though, and I suspect while LLMs are probably good at tutorial-level stuff for those areas it falls apart quite fast once you leave the kiddy pool.)
P.S.:
I think it's very ironic that you say that you should be careful to not speak in general terms about things that might depend much more on context, when you clearly somehow were under the belief that all developers must see the same kind of (perceived) productivity gains you have.
It is negative value for me to have a mediocre machine do that job for me, that I will still have to maintain, yet I will have learned absolutely nothing from the experience of building it.
Nor does reading a book teach you how to write one.
You need to design the code in order to tell the LLM how to write it. The LLM can help with this but generally it's better to have a full plan in place to give it beforehand. I've said it before elsewhere but I think this argument will eventually be similar to the people arguing you don't truly know how to code unless you're using assembly language for everything. I mean sure assembly code is better / more efficient in every way but who has the time to bother in a post-compiler world?
Your colleagues are leaving because people like you suck to be around. Have fun playing with your chat bots.
And in my personal experience it definitely helps in some tasks, and as someone who doesn't actually enjoy the actual coding part that much, it also adds some joy to the job.
Recently I've also been using it to write design docs, which is another aspect of the job that I somewhat dreaded.
It's the psychology of it that's important, not the tool itself; people are very bad at understanding where they're spending their time and cannot accurately assess the rate at which they work because of it.
Same with design docs more or less, translating my thoughts into proper and professional English adds a layer I don't really enjoy (since I'm not exactly great at it), or stuff like formatting, generating a nice looking diagram, etc.
Just today I wrote a pretty decent design doc that took me two hours instead of the usual week+ slog/procrastination, and it was actually fairly enjoyable.
Churning out 2x as much code is not doubling productivity. Can you perform at the same level as a dev who is considered 2x as productive as you? That's the real metric. Comparing quality to quantity of code ratios, bugs caused by your PRs, actual understanding of the code in your PR, ability to think slow, ability to deal with fires, ability to quickly deal with breaking changes accidentally caused by your changes.
Churning out more code per day is not the goal. No point merging code that doesn't fully work, is not properly tested, or that other humans (or you) cannot understand, etc.
And far from "churning out code" my work is better with LLMs. Better tested, better documented, and better organized because now I can do refactors that just would have taken too much time before. And more performant too because I can explore more optimization paths than I had time to before.
Refusing to use LLMs now is like refusing to use compilers 20 years ago. It might be justified in some specific cases but it's a bad default stance.
The answer to "Can you perform at the same level as a dev who is considered 2x as productive as you?" is self-explanatory. If your answer is negative, you are not 2x as productive
Writing code is the easy and fast part already.
This matches my personal experience. I was asked to help with a large Swift iOS app without knowing Swift. Had access to a frontier agent. I was able to consistently knock a couple of tickets per week for about a month until the fire was out and the actual team could take over. Code review by the owners means the result isn't terrible, but it's not great either. I leave the experience none the wiser: gained very little knowledge of Swift, iOS development or the project. Management was happy with the productivity boost.
I think it's fleeting and dread a time where most code is produced that way, with the humans accumulating very little institutional knowledge and not knowing enough to properly review things.
What concerns me is, generally, if the experts (and I do consider them experts) can use frontier AI to look very productive, but upon close inspection of something you (in this case I) happen to be knowledgeable about, it's not that great (built on shaky foundations), what about all the vibe coded stuff built by non-experts?
Is that a net benefit? Without AI, if I really wanted to do that conversion, I would have had to become much more familiar with the inner workings of csproj files. That is a benefit I've lost, but it would've also taken longer to do so, so much time I might not have decided to do the conversion. My job doesn't really have a need for someone that deeply specialized in csproj, and it isn't a particular interest of mine, so letting AI handle it while being able to answer a few questions to sate my curiosity seemed a great compromise.
A second example, it works great as a better option to a rubber duck. I noticed some messy programming where, basically, OOP had been abandoned in favor of one massive class doing far too much work. I needed to break it down, and talking with AI about it helped come up with some design patterns that worked well. AI wasn't good enough to do the refactoring in one go, but it helped talk through the pros and cons of a few design pattern and was able to create test examples so I could get a feel for what it would look like when done. Also, when I finished, I had AI review it and it caught a few typos that weren't compile errors before I even got to the point of testing it.
None of these were things AI could do on their own, and definitely aren't areas I would have just blindly trusted some vibe coded output, but overall it was productivity increase well worth the $20 or so cost.
(Now, one may argue that is the subsidized cost, and the unsubsidized cost would not have been worthwhile. To that, I can only say I'm not versed enough on the costs to be sure, but the argument does seem like a possibility.)
He's also using it to speed up writing his reports to send to patients.
Longer term, he was also quite optimistic on its ability to cut out roles like radiologists, instead having a software program interpret the images and write a report to send to a consultant. Since the consultant already checks the report against any images, the AI being more sensitive to potential issues is a positive thing: giving him the power to discard erroneous results rather than potentially miss something more malign.
As a medical imaging tech, I think this is a terrible idea. At least for the test I perform, a lot of redundancy and double-checking is necessary because results can easily be misleading without a diligent tech or critical-thinking on the part of the reading physician. For instance, imaging at slightly the wrong angle can make a normal image look like pathology, or vice versa.
Maybe other tests are simpler than mine, but I doubt it. If you've ever asked an AI a question about your field of expertise and been amazed at the nonsense it spouts, why would you trust it to read your medical tests?
> Since the consultant already checks the report against any images, the AI being more sensitive to potential issues is a positive thing: giving him the power to discard erroneous results rather than potentially miss something more malign.
Unless they had the exact same schooling as the radiologist, I wouldn't trust the consultant to interpret my test, even if paired with an AI. There's a reason this is a whole specialized field -- because it's not as simple as interpreting an EKG.
I think we'll see more specialized models for narrow tasks (think AlphaFold for other challenges in drug discovery, for example) as well, but those will feel like individual, costly, high impact discoveries rather than just generic "AI".
Our world is human-shaped and ultimately people who talk of "AGI" secretly mean an artificial human.
I believe that "intelligence", the way the word is actually used by people, really just means "skillful information processing in pursuit of individual human desires".
As such, it will never be "solved" in any other way than to build an artificial human.
If superhuman intelligence is solved it'll be in the form of building a more healthy society (or, if you like, a society that can outcompete other societies). We've already seen this sort of thing by accident and we're currently seeing extensive efforts to attack and undermine societies through exploiting human intelligence.
To a genetic algorithm techie that is actually one way to spur the algorithm to making better societies, not worse ones: challenge it harder. I guess we'll see if that translates to life out here in the wild, because the challenge is real.
I'm certain neanderthals were just calmer, more empathetic. And then we came along and abused that until they were all gone.
We're still animals on this planet. We just sing about our conquests afterwards.
Maybe so, but the point I'm trying to make is this needs to look nothing like sci-fi ASI fantasies, or rather, it won't look and feel like that before we get the humanoid AI robots that the GP mentioned.
You can have humans or human institutions using more or less specialized tools that together enable the system to act much more intelligently.
There doesn't need to be a single system that individually behaves like a god - that's a misconception that comes from believing that intelligence is something like a computational soul, where if you just have more of it you'll eventually end up with a demigod.
If my manager said to me tomorrow: "I have to either get rid of one of your coworkers or your use of AI tools, which is it?"
I would, without any hesitation, ask that he fire one of my coworkers. Gemini / Claude is way more useful to me than any particular coworker.
And now I'm preparing for my post-software career because that coworker is going to be me in a few years.
Obviously I hope that I'm wrong, but I don't think I am.
In the past we also just raised the floor on productivity, do you think this will be different?
... and replace that with communication overhead with Claude?
The human factor adds knowledge of the why that refines the results. Not just any algorithm or a standard pattern that fits, but the correct solution for the correct question.
No, not Sweden where 40% of the population have been employed in some way by the Wallenberg family and its corporations in recent times. The other Nordic countries are not as egalitarian as they are presented either.
I've never seen the US government behave as if this was a priority. Perhaps things are different in a nordic country?
By the way stable hands and night soil collectors are still around. Just a bit harder to find. We used to have a septic tank that had to be emptied by workmen every so often. Pretty much the same.
It's obviously true that no particular coworker of yours would be as useful to you as an AI agent, since their goal is to perform their own obligations to the rest of the company, whereas the singular goal of the AI tool is to help you, the user.
Until these AI tools can completely replace a developer on its own, the decision to continue employing human developers or paying for AI tools will not be mutually exclusive.
You would probably get rid of your co-worker and keep Vim/Emacs/VsCode/Zed/JetBrains or whatever editor you use.
All your example tells us is that AI tools are valuable tools.
My company said we are now going to be bidding with smaller teams and promoting our use of AI.
One example of them promoting the company's use of AI is creating a prototype using ChatGPT and AntiGravity. He took a demo video of a govt agency app off of YouTube, fed the video to ChatGPT, which spit out all the requirements for the ten-page application, then he fed those requirements to AntiGravity and boom, it replicated/created the working app/prototype in 15 minutes. Previously it would take a team of 3 to 5 a week or a few to complete such a prototype.
> And now I'm preparing for my post-software career because that coworker is going to be me in a few years.
Which implies they anticipate their manager (or someone higher up in the company) to agree with them, presumably when considering overall effectiveness of the team.
>been considered top 10% of attractiveness in one country
But I think I found the answer...
That's a mistake. A lot of people lie on their resumes.
Source: I've lied on every resume I've ever sent out.
- https://news.ycombinator.com/item?id=33903978
Something tells me they aren't the most honest person. That something is thw09j9m...Seriously... why lie about these types of things on an anonymous forum? There's literally nothing to gain
Charitably, it just sounds like you aren’t in tech.
In fact I don't want to pay too much, to prevent the incoming enshittification
We spent an amount of money on data centers that was so large that it managed to overcome a self-imposed kick in the nuts from tariffs and then some. The amount of money involved rivals the creation of the railroad system in the United States. Of course GDP overperformed in that scenario.
Where did AI tool use show up in the productivity numbers?
"Reporter: ...do you believe that we’re experiencing a positive productivity shock, whether from AI or policy factors or whatever?"
"Powell: So, yeah, I mean, I never thought I would see a time when we had, you know, five, six years of 2 percent productivity growth. This is higher. You know, this is definitely higher. And it was—before, it could be attributed to AI. I think you—I also think if you look at what AI can do and if you use it in your personal life, as I imagine many of us have, you can see the prospects for productivity. I think it makes people who use it more productive. It may make other people have to find other jobs, though. So it could have productivity implications"
And:
"Reporter: If I could just follow up on the SEP. You have a whole lot of—big increase in the growth numbers, but not a big decline in the unemployment numbers. And is that an AI factor in there?"
"Powell: So it is—the implication is obviously higher productivity. And some of that may be AI."
He also hedges in places, hesitant to say "Yes, that's the reason". I'm not sure anything in the data sets they use could directly capture it as the reason, so that's too high a bar for evidence: to require some line item in the reports with a direct attribution. He could be wrong, it might not be AI, but I don't have any reason to think his sense of things is wrong either.
I would very much like to read this if you have a link
AI is a large theme throughout but page 7 and 24 have specifically relevant remarks about the better than expected number, productivity's increases in relation to both this and AI, data centers, etc.
https://www.federalreserve.gov/mediacenter/files/FOMCprescon...
https://www.federalreserve.gov/mediacenter/files/FOMCprescon...
These aren't context-free data points you have to interpret; this is synthesized analysis delivered in a report by the Fed Chair, giving his and the Reserve's own interpretation. He could be wrong, but it is clear their belief is that AI is a likely factor, while there is also no certainty in that interpretation. They've upped their estimates for April though, and this advance estimate from December is about to be followed up with the revised, final numbers on Jan 23rd, so we'll find out a little more then, and a lot more in April.
it's building datacenters and buying servers and GPUs. It isn't directly attributed to AI because it isn't caused by the use of AI, but by blowing the AI bubble
A lot of the predictions come from interviews and presentations with top tech executives. Their job is to increase the perceived value of their product, not to offer an objective assessment.
I've gotten a lot of value out of reading the views of experienced engineers; overall they like the tech, but they do not think it is a sentient alien that will delete our jobs.
I have also gotten a lot of value out of Cembalest's recent "eyes on the market", which looks at the economic side of this AI push.
American snakes weren't useful for this.
So something that was sort of useful in a niche application was co-opted by people who didn't know how to make it work and then ultra hyped.
The parallels are spot on.
It was turned into a scam in the west.
Seems like a lot of work to get omega-3 in a consumable form.
Getting executives, junior devs, HR, and middle management hooked onto an advice and document template machine owned and operated by your corporation would seemingly have a huge upside for an entity like Microsoft. Their infatuation might be more about how profitable such arrangements would be versus any meaningful productivity improvement for developers.
Like, in ways that BizTalk, Dynamics, and SharePoint attempt to capture business processes onto a pay-for-play MS stack, and all benefit when being pitched to non-technical customers, Copilot provides an ever evolving sycophantic exec-centred channel to push and entangle it all as MS sees fit.
Having every part of your business divulge, in real time through saveable chats, its strategy, tooling, and processes to MS servers and Azure services is itself a pretty stunning arrangement. Imagine those same services directly selling busy customers entangling integrations, or trendy Azure services, through freewheeling MCP-like glue, all inline in that customer's own business processes? It sounds like tech exec nirvana: automated, self-directed sales.
They don’t need job deleting sentience to make the share price go up and rationalize this LLM push. They are far more aware of the limitations than we…
Every individual only cares about their paycheck and promotion. They will happily ignore their knowledge of the limitations if it means squeezing out an extra resume bulletpoint, paycheck or promotion, even if it causes the company to go bankrupt down the line (by that time they would've jumped ship somewhere else anyway).
I don't agree. Your job cannot be "lie to the customer." They may see this as the easy way to get more money and justify their comfy position, but it is not their job.
Their job is to make the company successful. Part of success is raising funds and boosting share price.
That is their job, and how do you imagine they can do that?
Sound kind of glum and down about the company prospects?
Do not make me laugh.
Even if the company is literally haemorrhaging cash and has < a week of runway left, senior executives are often so far up their own asses and surrounded by yes-men that they honestly believe they can turn things around.
It's often not about willfully lying.
It's just delusional belief and faith in something that is very unlikely.
(Last-minute turnarounds and DSA do exist, but as with lottery players, seeing the very few people who do win and mimicking them does not make you into a winner, most of the time.)
I normally see things the same way you do, however I did have a conversation with a podiatrist yesterday that gave me food for thought. His belief is that certain medical roles will disappear as they'll become redundant. In his case, he mentioned radiology and he presented his case as thus:
A consultant gets a report + X-Ray from the radiologist. They read the report and confirm what they're seeing against the images. They won't take the report blindly. What changes is that machines have been learning to interpret the images and are able to use an LLM to generate the report. These reports tend not to miss things but will over-report issues. As a consultant will verify the report for themselves before operating, they no longer need the radiologist. If the machine reports a non-existent tumour, they'll see there's no tumour.
Just so I understand correctly: is it over-reporting problems that aren't there or is it missing blindingly obvious problems? The latter is obviously a problem and, I agree, would completely invalidate it as a useful tool. The former sounded, the way it was explained to me, more like a matter of degrees.
This is the nonsense we're living through, predictions, guesses, promises that cannot possibly be fulfilled and which will inevitably change to something far less ambitious and with much longer timelines and everyone will shrug it off as if we weren't being mislead by a bunch of fraudsters.
2 quibbles: 1) a more ethical system would still need triage-style rationing given a finite budget, 2) medical providers are also culpable given the eye-watering prices for even trivial services.
https://www.worksinprogress.news/p/why-ai-isnt-replacing-rad...
1. Punch cards -> Assembly languages
2. Assembly languages -> Compiled languages
3. Compiled languages -> Interpreted languages
4. Interpreted languages -> Agentic LLM prompting
I've tried the latest and greatest agentic CLI and toolings with the public SOTA models.
I think this is a productivity jump equivalent to maybe punch cards -> compiled languages, and that's it. Something like a 40% increase, but nowhere close to exponential.
1. Punch cards -> Assembly languages
Err, in my direct experience it was Punch cards -> FORTRAN. Here, for example, is the punch card for a single FORTRAN statement: https://en.wikipedia.org/wiki/File:FortranCardPROJ039.agr.jp...
Punch cards were an input technology; they were in no way limited to either assembly languages or FORTRAN.
You might be thinking of programming in assembly via switch flipping or plug jacking.
And compiled and interpreted languages evolved alongside each other in the 1950s-1970s.
I'm also reminded that we tried the whole "make it look like human language" thing with COBOL, and it turned out that language wasn't the bottleneck; the ability of people to specify exactly what they want was. Once you have an exact spec, even writing the code yourself isn't all that hard, but extracting that spec from stakeholders has always been the harder part of programming.
> To find out more about why 2025 failed to become the Year of the AI Agent, I recommend reading my full New Yorker piece.

So essentially, just go and read the New Yorker piece here: https://archive.ph/VQ1fT
AI has all the same markers as the dot-com bubble: eventually venture capital will dry up and many AI companies will go bust, with a few remaining that make something useful in an unmet niche.
It doesn't even significantly matter whether it's a bubble or not, but whether it's a "bad" bubble.
I think Steve Eisman (of housing bubble fame) recently made the argument that it's probably a bubble, but it doesn't seem to have the hallmarks it would have to turn it into a crisis. e.g. no broad immediate exposure for the general populace (as in housing/crypto bubbles).
there are billions and billions of dollars invested in there -- it matters significantly to a lot of people.
the bubble popping may trash the US and possibly the global economy. "it doesn't matter" has to be one of the worst AI takes I've seen...
If you're defining "successful" in the sense of people-making-a-lot-of-money I completely agree.
If you're talking about the internet in 2005 vs 2025, smartphones completely ruined the internet. At one point I had half my high school using HTML in their AIM profiles because they thought mine was cool.
Now kids can hardly even type on an actual, physical keyboard.
I'd argue that's because the tech wasn't there yet. Not the network or other web tech itself but rather the average-person-using-a-computer tech.
And, well, not all bubbles go as well as that. See the cryptocurrency one, say. Or any of a number of _previous_ AI bubbles.
The dotcom bubble was the "new economy", the old economy had changed forever and was dead. No one thought it was a bubble. Even when the bubble popped it took until 9/11 to wake us up from the mass hysteria.
I can't think of another "bubble" where practically everyone thinks we are in a bubble - to the point that I think many would find it irrational to believe we are not in one. That is not what a bubble is. A bubble is the madness of crowds, not the wisdom of crowds, and the crowd certainly believes we are in a bubble.
Ahem - what is the difference compared to now? Weren't programmers supposed to be obsolete six months ago, with nobody working anymore, so we'd need UBI?
However, your point that if everybody thinks there is a bubble then there is none is valid. Ironically, your whole post undermines this point, and you are not alone in your analysis. General bubble wisdom is not as settled as one might think.
Plus, Alan Greenspan's famous "irrational exuberance" remark was in 1996. And AFAIK in 1999 everybody knew there was a bubble, but it only burst in 2000. On top of that, I have seen overlaid plots of stock prices now and before the dot-com crash suggesting there are 1-2 years of increases still to go.
You're applying an arbitrary time constraint to the realization of AI's promise in order to rubbish it. This is a logical mistake common among critics: not yet, so never. It doesn't seem as if there is a near limit to the tech's development. Until that changes, the potential for job wipeouts and societal upheaval is real, whether in 5 or 50 years.
Very untrue; the economy doesn't happen on online forums in echo chambers but out there. Every major company invests in AI however it can, out of classic FOMO.
This is how movers and decision-makers think. No CEO thinks: "this will crash, so let's invest in it massively and spread our company finances more thinly for when the SHTF moment comes."
In fact outside of tech if the dotcom bubble wasn’t being discussed it’s because most folks—being not, or barely, online yet—weren’t paying any attention to it per se. The bubble they cared about was the broader stock market bubble, which was definitely widely perceived as a bubble.
The chair of the Federal Reserve, Alan Greenspan, famously made a speech warning of "irrational exuberance". It can't get much more direct than formal government warnings at the highest level.
But this is perhaps not the case. By pessimistic estimates, half of all people work in BS jobs that have no real value to society, and every capitalist is focused on rent extraction now. If the economy can operate under such conditions, it doesn't really need more productivity growth; it is already demand-limited.
Sources? What, but... you are not a journalist, you are not supposed to challenge what I say, I'm a CEO! No, I'm not just using the media to create artificial hype to pull in investors and make money on bullshit that is never gonna work! How can you say that! It's a real thing, trust me bro!
After all, I can guarantee the other side (whatever it is) will say the same thing about your "logical" conclusions.
It is logic, we just don't share the same predicates or world model...
It's one of the challenges when LLMs are anthropomorphised: reasoning/logic for bots is not the same as it is for humans.
But to LLMs we don't afford the same leniency. If they flip some bits and the logic doesn't add up, we're quick to point out that "it's not reasoning at all".
Funny throne we've built for ourselves.
By the standard in the parent post, humans certainly do not "reason". But that is then just choosing a very high bar for "reasoning" that neither humans nor AI meets...what is the point then?
It is a bit like saying: "Humans don't reason, they just let neurons fire off one another, and think the next thought that enters their mind"
Yes, LLMs need to spew out text to move their state forward. As a human I actually sometimes need to do that too: Talk to myself in my head to make progress. And when things get just a tiny bit complicated I need to offload my brain using pen and paper.
Most arguments used to show that LLMs do not "reason" can be used to show that humans do not reason either.
To show that LLMs do not reason you have to point to something other than how they work.
If LLMs were actually able to think/reason and you acknowledge that they’ve been trained on as much data as everyone could get their hands on such that they’ve been “taught” an infinite amount more than any ten humans could learn in a lifetime, I would ask:
Why can’t they solve novel, unsolved problems?
So I will assume you mean within maths, science etc? Basically things they can't solve today.
Well 99.9% of humans cannot solve novel, unsolved problems in those fields.
LLMs cannot learn, there is just the initial weight estimation process. And that process currently does not make them good enough on novel math/theoretical physics problems.
That does not mean they do not "reason" in the same way that those 99.9% of humans still "reason".
But they definitely do not learn, the way humans do.
(Anyway, if LLMs could somehow get 1000x as large context window and get to converse with themselves for a full year, it does not seem out of the question they could come out with novel research?)
I see this argument made a lot. I'm not sure if the distinction really holds weight once we start to unravel though.
What's a topic you're able to think about that an LLM is not able to think about?
This is very obvious when trying to use LLMs to modify scripts in vendor-specific languages that have not been widely documented and don't have many examples available. A seasoned programmer will easily recognize common patterns like if-else blocks and loops, but LLMs will get stuck and output gibberish.
It's an impossible burden to prove. We can't even prove that any other human has sentience or is reasoning, we just evaluate the outcomes.
One day the argument you're putting forward will be irrelevant, or good for theoretical discussion only. In practice I'm certain that machines will achieve human level output at some point.
Would you care to put some sort of time scale to "at some point?" Are we talking about months, years, decades, centuries?
Wild guess, within 30 years.
on the Meta (WhatsApp) chatbot:
me: what is the seahorse emoji?
bot: [emoji]
me: this is not a seahorse
bot: You got me! The actual seahorse emoji is [emoji]. Want me to use it in a fun sentence?
on Bing/Copilot:
me: show me the seahorse emoji
bot:
------------
Here you go: [emoji] — actually, that’s a fun combo, but the real seahorse emoji is [emoji]… wait, not quite! The correct one is [emoji] for jellyfish and [emoji] for unicorn, but the seahorse emoji itself is ? Let me clarify:
The official seahorse emoji is: ? No — correction! It’s for unicorn. The actual seahorse emoji is: ?
Let’s be precise: the seahorse emoji is ?
Oops — I misspoke earlier. The true seahorse emoji is: ?
Wait — let’s settle this cleanly: The seahorse emoji is: ?
Actually, the correct one is: ?
---------------
In some instances, Copilot answers that "there's no seahorse emoji _and that's why I keep running in circles_". This is just pasting answers, as this is a popular problem discussed on the internet. If the bot were reasoning, it would just answer "there's no seahorse emoji, please stop asking".
> Me: what is the seahorse emoji?
> Claude: There isn't a seahorse emoji in the standard Unicode emoji set. The closest you'll get is the generic fish or tropical fish emoji, but no dedicated seahorse exists as of now.

Edit: I gave up on Copilot and fed the same instructions to ChatGPT, which had no issue.
The point here is that some models seem to know your intention while some just seem stuck on their training data.
These discussions often get derailed into debates about what "thinking" means. If we define thinking as the capacity to produce and evaluate arguments, as the cognitive scientists Sperber and Mercier do, then we can see LLMs are certainly producing arguments, but they're weak at the evaluation.
In some cases, arguments can be formalised, and then evaluating them is a solved problem, as in the examples of using the Lean proofchecker in combination with LLMs to write mathematical proofs.
That suggests a way forward will come from formalising natural language arguments. So LLMs by themselves might be a dead end but in combination with formalisation they could be very powerful. That might not be "thinking" in the sense of the full suite of human abilities that we group with that word but it seems an important component of it.
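To make the Lean point concrete, here is a minimal sketch in Lean 4 (the theorem and lemma are illustrative choices, not taken from the cited work): an LLM can propose the proof term, and the proof checker either accepts or rejects it mechanically, so the evaluation step involves no judgment call.
```
-- Minimal illustrative Lean 4 sketch (not from the cited work).
-- An LLM might propose the proof term `Nat.add_comm a b`;
-- Lean's kernel then checks it mechanically against the stated goal.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- Any incorrect proof term would simply fail to type-check,
-- which is what makes evaluating a formalised argument a solved problem.
```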
If by this you mean "reliably convert expressions made in human natural language to unambiguous, formally parseable expressions that a machine can evaluate the same way every time"... isn't that essentially an unreachable holy grail? I mean, everyone from Plato to Russell and Wittgenstein struggled with the meaning of human statements. And the best solution we have today is to ask the human to restrict the set of statement primitives and combinations that they can use to a small subset of words like "const", "let foo = bar", and so on.
Hot take (and to continue the derailment), but I'd argue that analytic philosophy from the last 100 years suggests this is a dead end. The idea that belief systems could be formalized was huge in the early 20th century (movements like Logical Positivism, or Russell's Principia Mathematica, being good examples of this).
Those approaches haven't really yielded many results, and by far the more fruitful form of analysis has been to conceptually "reframe" different problems (folks like Hilary Putnam, Wittgenstein, and Quine being good examples).
We've stacked up a lot of evidence that human language is much too loose and mushy to be formalised in a meaningful way.
Lossy might also be a way of putting it, like a bad compression algorithm. Written language carries far less information than spoken and nonverbal cues.
I think modeling language usefully looks a lot more like psychoanalysis than first order logic.
That bar is so low that even a political pundit on TV can clear it.
In the test setup, the AI added a single database row, ran the query and then asserted the single added row was returned. Clearly this doesn't show that the query works as intended. Is this what people are referring to when they say AI writes their tests?
I don't know what to call this kind of thinking. Any intelligent, reasoning human would immediately see that it's not even close to enough. You barely even need a coding background to see the issues. AI just doesn't have it, and it hasn't improved in this area for years
This kind of thing happens over and over again. I look at the stuff it outputs and it's clear to me that no reasoning thing would act this way
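To make concrete why a single-row test proves so little, here is a hypothetical Python sketch (the table, query, and function names are invented for illustration, not taken from the comment above): the single-row test passes even if the query's filter were dropped entirely, whereas adding one row that must be excluded at least exercises the WHERE clause.
```
# Hypothetical illustration only: schema, query, and names are invented.
import sqlite3

def fetch_overdue_orders(conn, cutoff):
    # Query under test: should return only orders due strictly before the cutoff.
    return conn.execute(
        "SELECT id FROM orders WHERE due_date < ? ORDER BY id", (cutoff,)
    ).fetchall()

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, due_date TEXT)")
    return conn

def test_single_row_style():
    # The kind of test described above: insert one row, assert it comes back.
    # This passes even if the WHERE clause were removed entirely.
    conn = make_db()
    conn.execute("INSERT INTO orders VALUES (1, '2020-01-01')")
    assert fetch_overdue_orders(conn, "2026-01-01") == [(1,)]

def test_with_excluded_row():
    # The minimum a reviewer would expect: at least one row that must NOT match.
    conn = make_db()
    conn.execute("INSERT INTO orders VALUES (1, '2020-01-01')")
    conn.execute("INSERT INTO orders VALUES (2, '2030-01-01')")  # not overdue yet
    assert fetch_overdue_orders(conn, "2026-01-01") == [(1,)]
```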
The tooling in these agentic code tools is key to usable LLM coding. Those tools prompt the models to “reason” about whether they’ve caught edge cases or met the logic. Without that external support they’re just fancy autocompletes.
In some ways it’s no different than working with some interns. You have to prompt them with “did you consider whether your code matches all of the requirements?”.
LLMs are different in that they’re sorta lobotomized. They won’t learn from tutoring “did you consider” which needs to essentially be encoded manually still.
That has other explanations than that it reasoned its way to the correct answers. Maybe it had very similar code in its training data
This specific example was with Codex. I didn't mention it because I didn't want it to sound like I think Codex is worse than Claude Code.
I do realize my prompt wasn't optimal to get the best out of AI here, and I improved it on the second pass, mainly to give it more explicit instruction on what to do
My point though is that I feel these situations are heavily indicative of it not having true reasoning and understanding of the goals presented to it
Why can it sometimes catch the logic cases you miss, such as in your case, and then utterly fail at something that a simple understanding of the problem and thinking it through would solve? The only explanation I have is that it's not using actual reasoning to solve the problems
I really hate this description, but I can't quite fully articulate why yet. It's distinctly different because interns can form new observations independently. AIs cannot. They can make another guess at the next token, but if it could have predicted it the second time, it must have been able to predict it the first, so it's not a new observation. The way I think through a novel problem results in drastically different paths and outputs from an LLM. They guess and check repeatedly; they don't converge on an answer. Which you've already identified.
> LLMs are different in that they’re sorta lobotomized. They won’t learn from tutoring “did you consider” which needs to essentially be encoded manually still.
This isn't how you work with an intern (unless the intern is unable to learn).
yes
> Any intelligent, reasoning human would immediately see that it's not even close to enough. You barely even need a coding background to see the issues.
[nods]
> This kind of thing happens over and over again. I look at the stuff it outputs and it's clear to me that no reasoning thing would act this way
and yet there're so many people who are convinced it's fantastic. Oh, I made myself sad.
The larger observation - that it is statistical inference rather than reason, yet looks like reason to so many - is quite an interesting test case for the "fuzzing" of humans. In the same line as: why do so many engineers store passwords in clear text? Why do so many people believe AI can reason?
"SAL-9000: Will I dream? Dr. Chandra: Of course you will. All intelligent beings dream. Nobody knows why. Perhaps you will dream of HAL... just as I often do." From 2010
Quite the assumption you're making that the same thing does not apply to humans. Not having the technology is not the same thing as not being possible.
And my "reasoning" is pretty much like a long ChatGPT verbal and sometimes not-so-verbal (visual) conversation with myself.
If my mind really did abstract platonic thinking, I think answers to hard problems would just instantly appear to me, without flaws. But only problems I have solved before and can pattern-match do that.
And if I have to think any new thoughts I feel that process is rather similar to how LLMs work.
It is the same for history of science really -- only thoughts that build small steps on previous thoughts and participate in a conversation actually are thought by humans.
Totally new leaps, which "platonic thinking machines" should easily make, do not seem to happen.
Humans are, IMO, conversation machines too...
And the challenge is rethinking how we do work: connecting all the data sources so that agents can run and perform work over the same sources we ourselves work across. That will take ages. Not to mention having the controls in place to make sure the "thinking" was correct in the end.
Copilot can't jump to definition in Visual Studio.
Anthropic got a lot of mileage out of teaching Claude to grep, but LLM agents are a complete dead-end for my code-base until they can use the semantic search tools that actually work on our code-base and hook into the docs for our expensive proprietary dependencies.
You seem to be defining "thinking" as an interchangeable black box, and as long as something fits that slot and "gets results", it's fine.
But it's the code-writing that's the interchangeable black box, not the thinking. The actual work of software development is not writing code, it's solving problems.
With a problem-space-navigation model, I'd agree that there are different strategies that can find a path from A to B, and what we call cognition is one way (more like a collection of techniques) to find a path. I mean, you can in principle brute-force this until you get the desired result.
But that's not the only thing that thinking does. Thinking responds to changing constraints, unexpected effects, new information, and shifting requirements. Thinking observes its own outputs and its own actions. Thinking uses underlying models to reason from first principles. These strategies are domain-independent, too.
And that's not even addressing all the other work involved in reality: deciding what the product should do when the design is underspecified. Asking the client/manager/etc what they want it to do in cases X, Y and Z. Offering suggestions and proposals and explaining tradeoffs.
Now I imagine there could be some other processes we haven't conceived of that can do these things but do them differently than human brains do. But if there were we'd probably just still call it 'thinking.'
Great. But most (?) of the businesses out there aren't paying for the big-boy models.
I know of an F100 that got snookered into a deal with GPT-4 for 5 years: a max of 40 responses per session, a max of 10 sessions of memory, and no backend integration.
Those folks rightly think that AI is a bad idea.
I have kids, and you could say the same about toddlers. Terrific mimics, they don't understand the whys.
So I think younger kids have purpose and associate meaning with a lot of things, and they do try to find a specific path toward an outcome.
Of course (depending on the age) their "reasoning" runs in a different system than ours, where survival instincts are much more powerful than any custom-defined outcome, so most of the time that is the driving force of the meaning.
Why do I talk about meaning? Because, of course, kids cannot talk about the why, as that is very abstract. But meaning is a big part of the why, and it continues to be so in adult life; it is just that the relation is reversed: we start talking about the why to get to a meaning.
I also think that kids start to have thoughts more complex than their language very early. If you went through the "Why?" phase you might have noticed that when they ask "Why?" they could mean very different questions, but they don't know the words to describe them. Sometimes "Why?" means "Where?", sometimes "How?", sometimes "How long?"... That series of questioning is, for me, a kind of proof that a lot more is happening in a kid's brain than they can verbalise.
Humans are good at this because they are truly multi-modal and can interact through many different channels to gather additional context to do the requisite task at hand. Given incomplete requirements or specs, they can talk to co-workers, look up old documents from a previous release, send a Slack or Teams message, setup a Zoom meeting with stakeholders, call customers, research competitors, buy a competitors product and try it out while taking notes of where it falls short, make a physical site visit to see the context in which the software is to be used and environmental considerations for operation.
Point is that humans doing work have all sorts of ways to gather and compile more context before acting or while they are acting that an LLM does not and in some cases cannot have without the assistance of a human. This process in the real world can unfurl over days or weeks or in response to new inputs and our expectation of how LLMs work doesn't align with this.
LLMs can sort of do this, but more often than not, the failure of LLMs is that we are still very bad at providing proper and sufficient context to the LLM and the LLMs are not very good at requesting more context or reacting to new context, changing plans, changing directions, etc. We also have different expectations of LLMs and we don't expect the LLM to ask "Can you provide a layout and photo of where the machine will be set up and the typical operating conditions?" and then wait a few days for us to gather that context for it before continuing.
I don't know what else LLMs need to do - get on the payroll? People are using them heavily. You can't even Google things easily without triggering an LLM response.
I think the current millennial and older generations are too used to the pre-LLM way of things, so the resistance will be there for a long time to come. But kids doing homework with LLMs will rely on them heavily once they're in the workforce.
I don't know how people are not fascinated and excited about this. I keep watching older sci-fi content, and LLMs are now doing for us what the "futuristic computer persona" did in older sci-fi.
Easy example: you no longer need copywriters because of LLMs. You had spell/grammar checkers before, but they didn't "understand" context, recommend different phrasing, or check for things like continuity and rambling.
You absolutely do still need copyeditors for anything you actually care about.
Like using AI to fill in a form. Instead of a proper autonomous agent (which was the project's goal).
Now that AI is "mainstream" - big bubble, big money, big career/promotion options for management - I expect much more of this behaviour going into 2026.
After posting that, I came across numerous papers which critique Frey & Osborne's approach; they are some of the forefathers of the AI job-loss figures we see bandied around commonly these days. One such paper is here, but I can dig out others: https://melbourneinstitute.unimelb.edu.au/__data/assets/pdf_...
It has made me very cautious around bold statements on AI - and I was already at the cautious end.
If you truly believed that, you would be spinning up new projects and offshoots, as this is a serious arms race with a ton of potential upside (not just in developing AI, but in leveraging it to build things cheaper). Allegedly every dollar you spend on an engineer is potentially worth 10x(?) what it was a couple of years ago, meaning your profit per engineer could soar - but tech companies decided they don't want more profit? AI is mostly solved and the value of labor has already collapsed? Or AI is a nice band-aid to prop up a smaller group of engineers while we weather the current economic/political environment, and most CXOs don't believe there are piles of money to be had by leveraging AI now or in the near future.
if product spec begins to hamper the dev process, guess what the big focus will be at, e.g., that year's YC
Exactly, so many of these claims are complete nonsense. I'm supposed to believe that boards/investors would be fine with companies doing massive layoffs to maintain flat/minuscule growth, when they could keep or expand their current staffing and massively expand their market share and profits with all this increased productivity?
It's ridiculous. If this stuff had truly increased productivity at the levels claimed we would see firms pouring money into technical staff to capitalize on this newfound leverage.
AI is supposedly going to obviate the need for white collar workers, and the best all the CEOs can come up with is the exact current status quo minus the white collar workers?