Large language models, small labor market effects [pdf]

https://bfi.uchicago.edu/wp-content/uploads/2025/04/BFI_WP_2025-56-1.pdf

144•luu•2mo ago

Comments

mediaman•2mo ago

Great read. One of the interesting insights from it is how difficult good application of AI is.

A lot of companies are just "deploying a chatbot" and some of the results from this study show that this doesn't work very well. My experience is similar: deploying simple chatbots to the enterprise doesn't do a lot.

For things to get better, two things are required, neither of which are easy:

- Integration into existing systems. You have to build data lakes or similar system that allow the AI to use data and information broadly across an enterprise. For example, for an AI tool to be useful in accounting, it's going to need high quality data access to the company's POs, issued invoices, receivers, GL data, vendor invoices, and so on. But many systems are old, have dodgy or nonexistent APIs, and data is held in various bureaucratic fiefdoms. This work is hard and doesn't scale that well.

- Knowledge of specific workflows. It's better when these tools are built with specific workflows in mind that are designed around specific peoples' jobs. This can start looking less like pure AI and more like a mix of traditional software with some AI capabilities. My experience is that I sell software as "AI solutions," but often I feel a lot of the value created is because it's replacing bad processes (either terrible older software, or attempting to do collaborative work via spreadsheet), and the AI tastefully sprinkled throughout may not be the primary value driver.

Knowledge of specific workflows also requires really good product design. High empathy, ability to understand what's not being said, ability to understand how to create an overall process value stream from many different peoples' narrower viewpoints, etc. This is also hard.

Moreover, this is deceiving because for some types of work (coding, ideating around marketing copy) you really don't need that much scaffolding at all because the capabilities are latent in the AI, and layering stuff on top mostly gets in the way.

My experience is that this type of work is a narrow slice of the total amount of work to be done, though, which is why I'd agree with the overall direction this study is suggesting that creating actual measurable major economic value with AI is going to be a long-term slog, and that we'll probably gradually stop calling it AI in the process as we attenuate to it and it starts being used as a tool within software processes.

ladeez•2mo ago

The pivot to cloud had a decade warmup before HOWTO was normalized to existing standards.

In the lead up a lot of the same naysaying we see about AI was everywhere. AI can be compressed into less logic on a chip, bootstrap from models. Require less state management tooling software dev relies on now. We’re slowly being trained to accept a down turn in software jobs. No need to generate the code that makes up an electrical state when we can just tune hardware to the state from an abstract model deterministically. Energy based models are the futuuuuuure.

https://www.chipstrat.com/p/jensen-were-with-you-but-were-no...

Lot of the same naysaying about Dungeons and Dragons and comic books in the past too. Life carried on.

Functional illiterates fetishize semantics, come to view their special literacy as key to the future of humanity. Tale as old as time.

aerhardt•2mo ago

> how difficult good application of AI is.

The only interesting application I've identified thus far in my domain in Enterprise IT (I don't do consumer-facing stuff like chatbots) is in replacing tasks that previously would've been done by NLP: mainly extraction, synthesis, classification. I am currently working a long-neglected dataset that needs a massive remodel and I think that would've taken a lot of manual intervention and a mix of different NLP models to whip into shape in the past, but with LLMs we might be able to pull it off with far fewer resources.

Mind you at the scale of the customer I am currently working with, this task also would've never been done in the first place - so it's not replacing anyone.

> This can start looking less like pure AI and more like a mix of traditional software with some AI capabilities

Yes, the other use case I'm seeing is in peppering already existing workflow integrations with a bit of LLM magic here and there. But why would I re-work a worklfow that's already implemented and well-understood in Zapier, n8n or Python with total reliability.

> Knowledge of specific workflows also requires really good product design. High empathy, ability to understand what's not being said, ability to understand how to create an overall process value stream from many different peoples' narrower viewpoints, etc. This is also hard.

> My experience is that this type of work is a narrow slice of the total amount of work to be done

Reading you I get the sense we are on the same page on a lot of thing and I am pretty sure if we worked together we'd get along fine. I'm struggling a bit with the LLM delulus as of late so it's a breath of fresh air to read people out there who get it.

PaulHoule•2mo ago

As I see it three letter organizations have been using frameworks like Apache UIMA to build information extraction pipelines that are manual at worst and hybrid at best. Before BERT the models we had for this sucked, only useful for certain things, and usually requiring training sets of 20,000 or so examples.

Today the range of things for which the models are tolerable to "great" has greatly expanded. In arXiv papers you tend to see people getting tepid results with 500 examples, I get better results with 5000 examples and diminishing returns past 15k.

For a lot of people it begins and ends with "prompt engineering" of commercial decoder models and evaluation isn't even an afterthought For information extraction, classification and such though you get often good results with encoder models (e.g. BERT) put together with serious eval, calibration and model selection. Still the system looks like the old systems if your problem is hard and has to be done in a scalable way, but sometimes you can make something that "just works" without trying too hard, keeping your train/eval data in a spreadsheet.

AlexCoventry•2mo ago

I think when the costs and latencies of reasoning models like o1-pro, o3 and o4-mini-high come down, chatbots are going to be much more effective for technical support. They're quite reliable and knowledgeable, in my experience.

treis•2mo ago

LLM chatbots are a step forward for customer support. Well, ours started hallucinating a support phone number that while is a real number is not our number. Lots of people started calling which was a bad time for everyone. Especially the person's number it actually is. So maybe two steps forward and occasionally one back.

TRiG_Ireland•2mo ago

As a customer, LLM chatbots are fifteen steps backwards and have approximately zero upsides. I hate them with a deep and abiding passion.

stingraycharles•2mo ago

Not compared to the previous iteration of chatbots. But compared to human operators, definitely.

stego-tech•2mo ago

> Integration into existing systems.

Integration alone isn't enough. Organizations let their data go stale, because keeping it updated is a political task instead of a technical one. Feeding an AI stale data effectively renders it useless, because it doesn't have the presence of mind to ask for assistance when it encounters an issue, or to ask colleagues if this process is still correct even though the expected data doesn't "fit".

Automations - including AI - require clean, up-to-date data in order to function effectively. Orgs who slap in a chatbot and call it a day don't understand the assignment.

no_wizard•2mo ago

I work on an application that uses AI to index and evaluate any given corpus (like papers, knowledge bases etc) of knowledge and it has been a huge help here, and I know its because we are dealing with what is effectively structured data that can be well classified once identified, and we have relatively straightforward ways of doing identification. The real magic is when the finely tuned AI started to correctly stitch pieces of information together that previously didn't appear to be related that is the secret sauce beyond simply indexing for search

Code is similar - programming languages have rules that are well known, couple that with proper identification, pattern matching and thats how you get to these generated prototypes[0] done via so called 'vibe coding' (not the biggest fan of the term but I digress)

I think this is early signs that this generation of LLMs at least, are likely to be augmentations to many existing roles as opposed to strictly replacing them. Productivity will increase by a good magnitude once the tools are well understood and scoped to task

[0]: They really are prototypes. You will eventually hit walls by having an LLM generate the code without understanding the code.

bwfan123•2mo ago

an analogy i find useful is the search-engine (google).

yea the search-engine improved productivity of almost everyone, but didnt change any workflows.

jaxtracks•2mo ago

Interesting study! Far too early in the adoption lifecycle for any conclusions I think, especially given that the data is from Denmark which tends to be have a far less hype-driven business culture than the US going by my bit of experience working in both. Anecdotally, I've seen a couple of AI hiring freezes in the states (some from LLM integrations I've built) that I'm fairly sure will be reversed when management gets a more realistic sense of capabilities, and my general sense is that the Danes I've worked with would be far less likely to overestimate the value of these tools.

sottol•2mo ago

I agree on the "far too early" part. But imo we can probably say more about the impact in a year though, not 5-10 years. But it does show that some of the randomized-controlled-trials that showed large labor-force impact and productivity gains are probably only applicable to a small sub-section of the work-force.

It also looks like the second survey was sent out in June 2024 - so the data is 10 months old at this point, another reason why this it might be early.

That said, the latest round of models are the first I've started using more extensively.

The paper does address the fact that Denmark is not the US, but supposedly not that different:

"First, Danish workers have been at the forefront of Generative AI adoption, with take-up rates comparable to those in the United States (Bick, Blandin and Deming, 2025; Humlum and Vestergaard, 2025; RISJ, 2024).

Second, Denmark’s labor market is highly flexible, with low hiring and firing costs and decentralized wage bargaining—similar to that of the U.S.—which allows firms and workers to adjust hours and earnings in response to technological change (Botero et al., 2004; Dahl, Le Maire and Munch, 2013). In particular, most workers in our sample engage in annual negotiations with their employers, providing regular opportunities to adjust earnings and hours in response to AI chatbot adoption during the study period."

meta_ai_x•2mo ago

It's incredibly hard to model complex non-linear systems. So, while I applaud the researchers to provide some data points, these things provide ZERO value for current/future decision making.

Chatbots were absolute garbage before chatGPT, while post chatGPT everything changed. So, there is going to be a tipping point event on labor market effects and past single variable "data analysis" will not provide anything to predict the event or it's effects

Legend2440•2mo ago

Seems premature, like measuring the economic impact of the internet in 1985.

LLMs are more tech demo than product right now, and it could take many years for their full impact to become apparent.

amarcheschi•2mo ago

I wouldn't call "premature" when llm companies ceos have been proposing ai agents for replacing workers - and similar things that I find debatable - in about the 2nd half of the twenties. I mean, a cold shower might eventually happen for a lot of Ai based companies

frankfrank13•2mo ago

> cold shower might eventually happen for a lot of Ai based companies

undoubtedly.

The economic impact of some actually useful tools (Cursor, Claude) are propping up hundreds of billions of dollars in funding for, idk, "AI for <pick an industry> "or "replace your <job title> with our AI tool"

dehrmann•2mo ago

The most recent example is the Anthropic CEO:

> I think we will be there in three to six months, where AI is writing 90% of the code. And then, in 12 months, we may be in a world where AI is writing essentially all of the code

https://www.businessinsider.com/anthropic-ceo-ai-90-percent-...

This seems either wildly optimistic or comes with a giant asterisk that AI will write it by token predicting, then a human will have to double check and refine it.

amarcheschi•2mo ago

I'm honestly slightly appalled by what we might miss by not reading the docs and just letting Ai code. I'm attending a course where we have to analyze medical datasets using up to ~200gb of ram. Calculations can take some time. A simple skim through the library (or even asking the chatbot) can tell you that one of the longest call can be approximated and it takes about 1/3rd of the time it takes with another solver. And yet, none of my colleagues thought about either looking the docs or asking the chatbot. Because it was working. And of course the chatbot was using the solver that was "standard" but that you probably don't need to use for prototyping.

Again. We had some parts of one of 3 datasets split in ~40 files, and we had to manipulate and save them before doing anything else. A colleague asked chatgpt to write the code to do it and it was single threaded, and not feasible. I hopped up on htop and upon seeing it was using only one core, I suggested her to ask chatgpt to make the conversion run on different files in different threads, and we basically went from absolutely slow to quite fast. But that supposed that the person using the code knows what's going on, why, and what is not going on. And when it is possible to do something different. Using it without asking yourself more about the context is a terrible use imho, but it's absolutely the direction that I see we're headed towards and I'm not a fan of it

agumonkey•2mo ago

I anticipate a control issue, where agents can produce code faster than people can analyze and beside applications with small visible surfaces, nobody will be able to check what is going on

I saw people with trouble manipulating boolean tables of 3 variables in their head trying to generate complete web applications, it will work for linear duties (input -> processing -> storage) but I highly doubt they will be able to understand anything with 2nd order effects

tylerrobinson•2mo ago

> people with trouble manipulating boolean tables of 3 variables in their head

To be fair, 3 booleans (2^3=8) is more working memory than most people are productive with. Way more if they’re nullable :)

agumonkey•2mo ago

Really feels like a race to the bottom to me. You used to have a trade that made you try to master those 8 configurations and now you're acting like the ignorant manager asking "agents" to deal with it.

Legend2440•2mo ago

I wouldn't really pay attention to what CEOs say, it's their job to sell things and drum up investment.

No one really knows exactly how AI will play out. There's a lot of motivated reasoning going on, both from hypesters and cynics.

bluefirebrand•2mo ago

> I wouldn't really pay attention to what CEOs say, it's their job to sell things and drum up investment

We don't have a choice but to pay attention to what CEOs say, they are the ones that might lead our companies off of cliffs if we let them

auggierose•2mo ago

It's not my company, it can go off the cliff for all I care.

latentsea•2mo ago

This quote to me feels more along the "no computer will ever need more than 640kb of RAM" lines in terms of historical accuracy. Like whoops, nope.

pcwalton•2mo ago

That statement seems extremely hyperbolic. Just today I tried automating some very pedestrian code for a system that wasn't particularly well-documented and ChatGPT 4o hallucinated the entire API. It was deeply frustrating and wasted more of my time than it would have taken to just slog through the documentation.

I won't deny that LLMs can be useful--I still use them--but in my experience an LLM's success rate in writing working code is somewhere around 50%. That leads to a productivity boost that, while not negative, isn't anywhere near the wild numbers that are bandied about.

layoric•2mo ago

A big difference here is the sheer scale of investment. In 1985, the internet was running on the dreams of a few. The sheer depth of investment in "AI" currently is hard to fathom, and being injected into everything regardless of what customers want.

Legend2440•2mo ago

That's because the tech industry has more money than god and nothing better to do with it.

Microsoft alone has half a trillion dollars in assets, and Apple/Google/Meta/Amazon are in similar financial positions. Spending a few tens of billions on datacenters is, as crazy as it sounds, nothing to them.

no_wizard•2mo ago

While ARPANET itself reportedly cost somewhere between 10 - 20 million USD, which is relatively cheap, the precursor research that allowed the internet to take off - which is directed more at general computing and advanced computer networks, the telecommunications investments - cost many billions of dollars, it was mostly public money is the biggest difference.

That said, private companies are pumping alot of money into this space, but technological progress is a peaks and valley situation. I imagine most of the money will ultimately move the needle very little following things of dubious hindsight value

trod1234•2mo ago

We seriously live in the world of Anathem now where apparently most people need a specialized expert to cut through plausible generated misinformation as a whole.

This is a second similar study I've seen today on HN that seems in part generated by AI, and fails rigorous methodology, while making conclusions that are unbased to seemingly fuel a narrative.

The study fails to account for a number of elements which nullify the conclusions as a whole.

AI Chatbot tasks by their nature are communication tasks involving a third-party (the customer). When the Chatbot fails to direct, or loops coercively, and this is a task computer's really can't do well; customers get enraged because it results in crazy-making/inducing behavior. The Chatbot in such cases imposes time-cost, with all the necessary elements suitable to call it torture. Those elements being isolation, cognitive dissonance, coercion with perceived/real loss, lack of agency. There is little if any differentiation between the tasks measured. Emotions Kill [1].

This results in outcomes where there is no change, or higher demand for workers, just to calm that person down and this is true regardless of occupation. In other words the punching bag of verbal hostility, which is the role of CSR receiving calls or communications from irrationally enraged customers after AI has had their first chance to wind them up.

It is a stochastic environment, and very few conclusions can actually be supported because they seem to follow reasoning along a null hypothesis.

The surveys use Denmark as an example (being part of the EU), but its unclear if they properly take into account company policies about not submitting certain private data for tasks to a US-based LLM given the risks related to GDPR. They say the surveys were sent to workers directly who are already employed, but it makes no measure of displaced workers, nor overall job reductions, which historically is how the changes in integration are adopted, misleading the non-domain expert reader.

The paper does not appear to be sound, and given it relies solely on a DiD approach without specifying alternatives, it may be pushing a pre-fabricated narrative that AI won't disrupt the workforce when the study doesn't actually support that in any meaningful rational way.

This isn't how you do good science. Overgeneralizing is a fallacy, and while some computation is being done to limit that it doesn't touch on what you don't know, because what you don't know hasn't been quantified (i.e. the streetlight effect)[1].

To understand this, the layman and expert alike must always pay attention to what you don't know. The video below touches on some of the issues without requiring technical expertise. [1]

[1][Talk] Survival Heuristics: My Favorite Techniques for Avoiding Intelligence Traps - SANS CTI Summit 2018

https://www.youtube.com/watch?v=kNv2PlqmsAc

kazinator•2mo ago

Economists who write LaTeX docs are scary, even with AI help.

credit_guy•2mo ago

In the early days of computers most scientists kept using slide rules.

atomicnature•2mo ago

The leading indicator of future market impact is programming and software engineering productivity increasing by 10x on the producer side.

The effects of these productivity gains will take time to materialize on the consumer side.

The New Skill in AI Is Not Prompting, It's Context Engineering

I write type-safe generic data structures in C

The hidden JTAG in a Qualcomm/Snapdragon device’s USB port

There are no new ideas in AI only new datasets

The Original LZEXE (A.K.A. Kosinski) Compressor Source Code Has Been Released

They don't make 'em like that any more: Sony DTC-700 audio DAT player/recorder

Show HN: TokenDagger – A tokenizer faster than OpenAI's Tiktoken

End of an Era

Show HN: New Ensō – first public beta

Xfinity using WiFi signals in your house to detect motion

Creating fair dice from random objects

The provenance memory model for C

Ask HN: What Are You Working On? (June 2025)

Donkey Kong Country 2 and Open Bus

The Plot of the Phantom, a text adventure that took 40 years to finish

14.ai (YC W24) hiring founding engineers in SF to build a Zendesk alternative

Ask HN: What's the 2025 stack for a self-hosted photo library with local AI?

Entropy of a Mixture

Ask HN: 80s electronics book club; anyone remember this illustrator?

Datadog's $65M/year customer mystery solved

CertMate – SSL Certificate Management System

Show HN: We're two coffee nerds who built an AI app to track beans and recipes

Researching LED Displays for the Time Circuits

Printegrated Circuits: Merging 3D Printing and Electronics

Asynchronous Error Handling Is Hard

Auth for B2B SaaS: it's not like auth for consumer software

Reverse Engineering Vercel's BotID

Jacobi Ellipsoid

Cloud-forming isoprene and terpenes from crops may drastically improve climate

New proof dramatically compresses space needed for computation

The New Skill in AI Is Not Prompting, It's Context Engineering

I write type-safe generic data structures in C

The hidden JTAG in a Qualcomm/Snapdragon device’s USB port

There are no new ideas in AI only new datasets

The Original LZEXE (A.K.A. Kosinski) Compressor Source Code Has Been Released

They don't make 'em like that any more: Sony DTC-700 audio DAT player/recorder

Show HN: TokenDagger – A tokenizer faster than OpenAI's Tiktoken

End of an Era

Show HN: New Ensō – first public beta

Xfinity using WiFi signals in your house to detect motion

Creating fair dice from random objects

The provenance memory model for C

Ask HN: What Are You Working On? (June 2025)

Donkey Kong Country 2 and Open Bus

The Plot of the Phantom, a text adventure that took 40 years to finish

14.ai (YC W24) hiring founding engineers in SF to build a Zendesk alternative

Ask HN: What's the 2025 stack for a self-hosted photo library with local AI?

Entropy of a Mixture

Ask HN: 80s electronics book club; anyone remember this illustrator?

Datadog's $65M/year customer mystery solved

CertMate – SSL Certificate Management System

Show HN: We're two coffee nerds who built an AI app to track beans and recipes

Researching LED Displays for the Time Circuits

Printegrated Circuits: Merging 3D Printing and Electronics

Asynchronous Error Handling Is Hard

Auth for B2B SaaS: it's not like auth for consumer software

Reverse Engineering Vercel's BotID

Jacobi Ellipsoid

Cloud-forming isoprene and terpenes from crops may drastically improve climate

New proof dramatically compresses space needed for computation

Large language models, small labor market effects [pdf]

Comments