The best I can say is that genAI is a self-reported 20% efficiency boost, and for a very (very) small group of people, it’s maybe a 2-3x boost. (And if you are at a frontier lab, you go into the big bucket of exceptions.)
At this point, for most use cases, AI productivity is either the equivalent of giving people 3D printers, and seeing little benefit, or signing up for an outsourcing service, just without the development of human capital anywhere.
6 hours of debugging and docs reading is not equal to 6 hours of prompt fiddling. The return of value beyond the few fixes applied will be almost nil from the fiddling.
I’ve been told before.
Like others said, the frustration is when it gets something so wrong you just think "wow, how'd you mess that up?" but when it gets it right it's kind of nice. I also don't like that I basically tell Claude what to do, and then either go do busy work or waste time on the internet.
It may be fun to look at inputs and outputs, but it's not hackable and trying to map one into the other is more like astrology than a science.
Welcome to the factory!
The problem is, we haven't had the debate on a societal level about whether we want to go the Star Trek route (aka, we give our darn best to automate everything so that humans have the time to do whatever they want) or the realcommunism route (we ward off automation so that we have jobs for people).
Because that debate never happened, we get the third possible outcome: rabid capitalism automates everything as soon as it is profitable and lays off the humans, focusing on getting higher margins out of fewer people if need be; the best example of that IMHO is Disneyland or Vegas going on ridiculous nickel-and-diming tours. In the end, however, there will be no one left who has employment and we'll be in for quite the riots.
Generally, I spend anywhere between 15 mins and an hour setting things up (depending on how well the project is set up for AI work), and then set the agent going, coming back in a half-hour to an hour to check its progress. Generally, the tooling keeps it honest (for golang, forbidigo is AWESOME). 80% of the questions the agent asks me require a lot of thought. 20% of what it does needs correction.
The other thing to remember with LLMs is that they are NOT human, and won't react in a human way. So you'll see strokes of "brilliance" followed by the absolutely bizarre. But good guardrails keep that to a minimum.
AI should be assisting us; instead it's doing the job and we are the assistants to it. This is a monumental shift in how knowledge work is changing that people seem to be missing, and it goes beyond mere coding.
Guardrails, prompts, whatever, it's us helping it do the job, not the other way around.
Opus 4.6 was the last genuinely good assistant LLM, but since then it's quite clear that the training/reinforcement is focused on "given prompt -> do task", so its behavior is more and more about doing it itself, not helping you. If you try to use it as an assistant it just sucks and is perma-wired into finding the solution. Many times I want it to help me investigate, and its answer will still be focused on the fix, not on answering my questions.
4.7 first, 4.8 later and Fable are absolute disasters as assistants.
Fable in particular is so "intelligent" that it will push with very strong and intelligent takes even if it is completely wrong.
I have never disliked our job more.
To me, this feels in many ways like a technical manager or team lead's job, where I guide the process along using my knowledge and experience, and then let the agent fill in the rest (to the best of its ability).
The agent can't really learn from its mistakes (at least, not without consuming precious context), so I apply a blameless postmortem process, updating the guardrails whenever it goes astray in the same way more than once.
And really, I'd rather be contemplating the more difficult and interesting questions of architecture, environment, ergonomics and market fit, so it suits me fine.
as a boss (or researcher) i'm going to measure productivity based on amount of output per hour that i'm paying you; as a worker, i'm going to measure productivity based on amount of output relative to the amount of effort i'm putting in.
so what may be happening is that bosses see that output is at 80% (productivity down!) but workers see that they can give that 80% output with 40% effort (productivity up!).
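To put numbers on the two definitions above, here is a minimal sketch. The 80% output and 40% effort figures are the ones from the comment; the function names, the unchanged paid hours, and the baseline of 1.0 for everything are assumptions made purely for illustration.

```python
def boss_productivity(output: float, paid_hours: float) -> float:
    """Output per paid hour (the boss/researcher view)."""
    return output / paid_hours


def worker_productivity(output: float, effort: float) -> float:
    """Output per unit of effort spent (the worker view)."""
    return output / effort


# Baseline (assumed): output, paid hours, and effort all normalized to 1.0.
baseline_boss = boss_productivity(1.0, 1.0)
baseline_worker = worker_productivity(1.0, 1.0)

# With AI, per the comment: 80% of the output for 40% of the effort.
# Paid hours are assumed unchanged.
with_ai_boss = boss_productivity(0.8, 1.0)
with_ai_worker = worker_productivity(0.8, 0.4)

boss_change = with_ai_boss / baseline_boss - 1.0
worker_change = with_ai_worker / baseline_worker - 1.0

print(f"boss view:   {boss_change:+.0%}")
print(f"worker view: {worker_change:+.0%}")
```

Same situation, two denominators: under these assumptions the boss sees a 20% drop while the worker sees their productivity double.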
So why is it that the bosses are the ones that are so enthusiastic about adoption?
Early on in my days as a sysadmin, I automated a ton of my role when the rest of the team was still doing ClickOps. The reward for doing so was more work and higher expectations without a pay increase to match my newfound productivity. That happens all over the workforce, and so people will just keep it to themselves. I learned my lesson real fast at that first job: if I'm able to produce the same or greater output in half the time, I keep that to myself so I can use the automation to free up my own time instead of having it filled by the company.
I wonder how much of that is happening now with AI in non-technical roles.
In some cases, workers are also being asked to automate the parts of their jobs they enjoy most, Hinds said on the podcast, pointing to customer-service employees who enjoy building relationships but are increasingly expected to supervise AI agents instead.
"That's what gives you joy and meaning at work," she said. "That is very dangerous."
What's a 20% productivity gain if I constantly feel deflated by work that used to energize me? That's going to give back the productivity gain and more, while also decreasing my quality of life.
Except the intern is trapped inside an iron lung and must communicate entirely by text. And also has zero real creativity or self-motivation.
Where did the 20% number come from? I’d argue it’s way more than that (or variable, i.e. dependent on who’s using it/how it’s being used/what it’s being used on).
Having said that, the number, to me, doesn’t even matter. You could replace that with 200%, and it’d be just as true.
It's actually kinda pleasant, especially when I consider all the tickets I'm not excited about doing. It's prob worth focusing on that aspect of it.
This is something that I don't see discussed a lot in these conversations, but it's true for a ton of folks.
I didn't end up with a career in tech because I wanted to tell a bot to do the fun part of my job for me, leaving me only with the boring tedious parts. I didn't sign up to be a full time code reviewer, and I certainly never wanted to be a manager, let alone a manager of bots.
It also can't help but spark feelings of "Why am I getting paid 6 figures for this??" and that makes me nervous for the future.
I imagine the engineers and assemblers in factories pre-assembly line felt the same when things started getting automated there. There's an element of craftsmanship that gets taken away as the product moves from being artisanal, hand crafted to mass produced.
I wonder if it's too late for me to pivot to hardware.
But those times when I had to drop down into a repl and play around with the output of a method. Or try different ways of doing what anyone else would think is boring, like array manipulation - that's a lot of what I actually LIKE to do.
A big part of me just hopes I can hang in there for another... decade, or two. Then I can retire! Maybe.
This is all normal. It’s also well worth the time spent learning.
So for my work, it's made me much better at my job. Much faster and more accurate.
I can write a simple query before Claude finishes reading, querying the semantic layer, and checking my files, then writing a query that I have to approve, reading the results, hiding them (ctrl+o usually works), and giving me a summary.
We’ve reached this inflection point where it’s faster for me to do most tasks again.
I’m sure fast mode costing more money plays a role.
But now it's happening at the company level: "We're going to add a chatbot to increase productivity! Now MCP tools! Then agentic workflows! We’ll add skills, and now productivity will go up! Maybe loops will do it?"
If I was valued at 1 trillion dollars, and I was in the hole enough to sink a couple small countries' GDP, maybe I would slowly start to optimize to maximize token usage.
I want to sell tokens, how do I sell more tokens? Not by doing the same work in fewer tokens, that's for sure.
This is like if you pay me by the hour and then excitedly tell me that you keep paying 10k a month and it's great. I will most certainly not work faster, in this hypothetical, if you tell me you love spending money because it gives you a dopamine rush. I would probably spend a couple more hours REALLY thinking about the task, maybe writing some docs nobody will read, maybe considering multiple options, doing benchmarks, doing research, and then later maybe I'll do the actual task as well.
I'm not saying these AI companies are scamming us, but the incentives are there and extremely clear. The only thing currently holding it back is that there is some vague kind of competition.
(There's a reason why I call it the MBA's stone. It transmutes all knowledge work into a problem of management.)
(I spent too long by the horse racing track)
I didn't say that it's less capable or less "intelligent"; it's the opposite.
I will try to expand on what I mean.
I have said that its "character/persona/tendencies" are increasingly less about acting as an assistant and more about finding the solution itself.
I use AI in a specific way: it assists, investigates and answers my questions. I do the coding. It is increasingly difficult to use it as such, because it quickly jumps into giving me solutions.
Fable is no different. Even though I asked it to investigate how a certain emailer in Phoenix works for a specific usage of mine, it did very little investigation and jumped into why I should've used magic links, as they are the default on Phoenix.
Today at work, I had a problem with batching. I wanted to understand if batching was even needed at all for our use case, and it kept circling around how to fix the batching issue.
I am increasingly frustrated by these models' "personality" and tendencies, which are geared less toward assisting me with the task at hand and more toward the model doing the task while I assist/supervise.
Which is why (well, part of why) I think the long-term trend will be towards self-hosting models. Right now the frontier models are far enough ahead of the self-hosted ones that there are lots of people willing to pay by the token to rent someone else's model, because they get more value for money from that than from self-hosting models.
But the frontier companies won't be able to keep up their current levels of expenditure forever. At some point the investors are going to say "Hey, so, um, when am I going to see some return on my investment?" and then the current subsidized subscriptions (including the one my employer uses) are going to go away, much like what happened with Copilot this month.
And then the locally-hosted models are going to suddenly look like a more attractive picture. Because where you might have been willing to spend $100/month/employee to rent time on models in someone else's data center, you might suddenly balk at spending $500/month/employee. You might say "Hey, you know what? A $50,000 up-front capital investment is only, what, one month's worth of subscriptions for our 100 employees? Yeah, okay, I'll approve the hardware purchase. Get that self-hosted model set up and then we'll cancel the subscription and switch over."
Not everyone is going to do that. But once the locally-hosted models are good enough, the first few people who do so and report success are going to start a snowball effect. It will likely be driven by money first, but people will slowly discover a second benefit: you can better predict the model you're using. It will continue to work the same way next year that it is working this year; or if it doesn't, it's because you chose to install the new version.
And when that happens (I'm saying "when", not "if" because although it might take some time, I think it's inevitable in the long run), the frontier-model rental companies are going to struggle to stay afloat. Except for the ones who saw this coming and transitioned to a non-subscription income source somehow (maybe by selling licenses to self-host their frontier models for $$BIGNUM), or who have some other revenue stream besides renting out models.
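The hardware-versus-subscription arithmetic in the comment above can be sketched as a break-even calculation. The $50,000, $100, $500 and 100-employee figures are the ones from the comment; the function name and the `monthly_running_cost` parameter (power, maintenance, staff time) are hypothetical additions, defaulting to zero because the comment doesn't mention them.

```python
from typing import Optional


def months_to_break_even(
    hardware_cost: float,
    fee_per_employee: float,
    employees: int,
    monthly_running_cost: float = 0.0,  # hypothetical; not in the comment
) -> Optional[float]:
    """Months until the up-front hardware spend equals subscription savings.

    Returns None if self-hosting never pays off (running costs eat the savings).
    """
    monthly_savings = fee_per_employee * employees - monthly_running_cost
    if monthly_savings <= 0:
        return None
    return hardware_cost / monthly_savings


# Figures from the comment: $50,000 of hardware, 100 employees.
at_500 = months_to_break_even(50_000, 500, 100)  # after the price hike
at_100 = months_to_break_even(50_000, 100, 100)  # at the subsidized price

print(f"$500/month/employee: break even in {at_500:.1f} months")
print(f"$100/month/employee: break even in {at_100:.1f} months")
```

With zero running costs this gives one month at $500 and five months at $100, which matches the "one month's worth of subscriptions" line. Any real estimate would need an actual number for running costs, and that could stretch the payback period considerably.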
Are you getting LLMsplained? :)
Consider what is happening in most construction sites. The heavy work is absolutely from the technology on site. But without people there to oversee it and keep it working, it would fail.
And that is almost certainly true at any industrial site. Indeed, look up videos of high tech looms. A large portion of the technology added to them is there so that the operators can locate the fault and fix it.
If you're a manager and you ask a report to do something and they come back with a question, does that mean you're now their assistant?
I give agents the tasks, I answer their questions, I make choices about the tradeoffs in their plan, I supervise their implementation, I review their output, I have them walk me through things. In what way is this not delegating to them and managing their work, just like a more junior employee?
> 80% of the questions the agent asks me require a lot of thought. 20% of what it does needs correction.
I've found even the permissions questions give me veto power over fruitless lines of exploration, especially in planning mode. For instance, it wants to use tools I don't have installed to access information that I have made available elsewhere? I get a chance to override this decision by declining the permissions check and redirecting it. Feels tedious, but helps me understand what information sources are influencing it. I head off a lot of bugs this way.
If an initiative produces only 80% of the previous results and you’re paying large token bills on top of the same wages, the AI is going to get cut off.
> i've seen a number of articles claiming things like "devs self report they're +x% more productive with AI, but actually they're -y% LESS efficient!".
Are you thinking of the old METR evals? Their more recent evals showed an actual performance improvement.
The old report is still circulated as bait for AI skeptics.
> Jeb: "If everybody's got one of these auto-whatsits, does anybody code anymore?"
> Doc Brown: "Of course we code. But for recreation. For fun."
> Jeb: "Code for fun? What the hell kind of fun is that?"
Then they announced that they had removed the limit, so further requests would just cost them extra. That's when I started using it the way I do for the personal projects I pay subscriptions for...
Then Copilot increased their pricing. Announced in April I think? But took effect this month. This Monday they announced that the limits are back in effect. So I guess I'll be going back to hand coding next week, as my tokens are about to run out ಥ ‿ ಥ
Corporate is always so silly. I mean I know how it happens: everyone just wants to get their bonus, so different management roles try to coerce the employees to do whatever best serves their own bottom line, which is rarely related to whatever is good for the corporation... But it's always silly to live through it.
It sucks for the employees, otoh it might be the only way we're going to beat Baumol's Cost Disease.
In the past few decades productivity has exploded, but service employees have largely failed to increase productivity in any way because it's harder to automate these tasks.
It's the reason the costs of things like education and healthcare are downright extortionate, the reason you're paying back your college well into your fifties, the reason you don't call an ambulance for someone in the US because you don't want to ruin their life financially.
We may have to trade the personal fulfillment in these jobs for the broader affordable access to these services.
Administrators, on the other hand, are a massive part of the costs in the health sector (IIRC the Obama administration chickened out on truly reforming healthcare exactly because the number of administrators that would be made redundant would tank the economy). A significant amount of administrative work can be automated.
You might wanna think again on that line of reasoning, because plenty of other countries have the same dynamics with respect to service employees, but they don't suffer the very US-only problem of ridiculous education and healthcare costs where calling an ambulance can ruin someone's life.
Programming was one of the ones which was, because there were fewer programmers than openings. Now that's flipping, so naturally the enjoyment is going to be sucked out of it.
He said, "Almost half of what we do is not that valuable to our customers, but it's valuable to him, and her, and him", pointing through the conference-room window at my fellow programmers, "and that's why we do it. If we only did things that were very valuable to our customers, we wouldn't have nearly as many good engineers on the team as we do."
Is everyone entitled to a high quality of life?
If not, then who draws the line as to who deserves what benefit in life? You?
Consumers will be spoiled for choice between deeply mediocre options.
Besides, what's the point of adopting new technologies if it's not to increase the quality of life? If everyone just exists in service of the product development lifecycle, who and what are the products actually for?
stogot•1h ago
reluctant_dev•1h ago
I just can't imagine tanking my trust with my coworkers by doing something like that.
tommek4077•1h ago
liveoneggs•1h ago
rozap•1h ago
That's what I wonder about, what happens to all those folks.
loloquwowndueo•1h ago
kerblang•1h ago
Managers will be sure to tell you how much they respect you. Ask them if they respect the work and you'll get a blank stare.
yaodub•1h ago