The best I can say is that genAI gives a self-reported 20% efficiency boost, and for a very (very) small group of people, it’s maybe a 2-3x boost. (And if you are at a frontier lab, you go into the big bucket of exceptions.)
At this point, for most use cases, AI productivity is either the equivalent of giving people 3D printers, and seeing little benefit, or signing up for an outsourcing service, just without the development of human capital anywhere.
6 hours of debugging and docs reading is not equal to 6 hours of prompt fiddling. The return of value beyond the few fixes applied will be almost nil from the fiddling.
I’ve been told before.
Like others said, the frustration is when it gets something so wrong you just think "wow, how'd you mess that up?" but when it gets it right it's kind of nice. I also don't like that I basically tell Claude what to do, and then either go do busywork or waste time on the internet.
It may be fun to look at inputs and outputs, but it's not hackable and trying to map one into the other is more like astrology than a science.
Welcome to the factory!
The problem is, we haven't had the debate on a societal level about whether we want to go the Star Trek route (aka, we give our darn best to automate everything so that humans have the time to do whatever they want) or the realcommunism route (we ward off automation so that we have jobs for people).
The result of that debate not having been had is the third possible outcome: rabid capitalism automates everything as soon as it is profitable and lays off the humans, focusing on getting higher margins out of fewer people if need be; the best example of that IMHO is Disneyland or Vegas going on ridiculous nickel-and-diming tours. In the end, however, there will be no one left who has employment and we'll be in for quite the riots.
Generally, I spend anywhere between 15 mins and an hour setting things up (depending on how well the project is set up for AI work), and then set the agent going, coming back in a half-hour to an hour to check its progress. Generally, the tooling keeps it honest (for golang, forbidigo is AWESOME). 80% of the questions the agent asks me require a lot of thought. 20% of what it does needs correction.
The other thing to remember with LLMs is that they are NOT human, and won't react in a human way. So you'll see strikes of "brilliance" followed by the absolutely bizarre. But good guardrails keep that to a minimum.
AI should be assisting us; instead it's doing the job and we're the assistants to it. This is a monumental shift in how knowledge work is changing that people seem to be missing, and it goes beyond mere coding.
Guardrails, prompts, whatever, it's us helping it do the job, not the other way around.
Opus 4.6 was the last genuinely good assistant LLM, but since then it's quite clear that the training/reinforcement is focused on "given prompt -> do task", so its behavior is more and more about doing it itself, not helping you. If you try to use it as an assistant it just sucks and is perma-wired into finding the solution. Many times I want it to help me investigate, and its answer will still be focused on the fix, not on answering my questions.
4.7 first, 4.8 later, and Fable are absolute disasters as assistants.
Fable in particular is so "intelligent" that it will push with very strong and intelligent takes even if it is completely wrong.
I have never disliked our job more.
To me, this feels in many ways like a technical manager or team lead's job, where I guide the process along using my knowledge and experience, and then let the agent fill in the rest (to the best of its ability).
The agent can't really learn from its mistakes (at least, not without consuming precious context), so I apply a blameless postmortem process, updating the guardrails whenever it goes astray in the same way more than once.
And really, I'd rather be contemplating the more difficult and interesting questions of architecture, environment, ergonomics and market fit, so it suits me fine.
as a boss (or researcher) i'm going to measure productivity based on amount of output per hour that i'm paying you; as a worker, i'm going to measure productivity based on amount of output relative to the amount of effort i'm putting in.
so what may be happening is that bosses see that output is at 80% (productivity down!) but workers see that they can give that 80% output with 40% effort (productivity up!).
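To put numbers on the two yardsticks above, here is a minimal sketch. The function name `productivity_views` and the `hours_ratio` parameter are made up for illustration, and it assumes paid hours stay the same as before; everything is expressed relative to a pre-AI baseline of 1.0.

```python
def productivity_views(output_ratio, effort_ratio, hours_ratio=1.0):
    """Return (boss_view, worker_view) relative to a pre-AI baseline of 1.0.

    output_ratio: output now / output before
    effort_ratio: effort now / effort before (the worker's own estimate)
    hours_ratio:  paid hours now / paid hours before (assumed unchanged)
    """
    # The boss divides output by the hours being paid for.
    boss_view = output_ratio / hours_ratio
    # The worker divides output by the effort actually spent.
    worker_view = output_ratio / effort_ratio
    return boss_view, worker_view


# The scenario from the comment: 80% of the output for 40% of the effort.
boss, worker = productivity_views(output_ratio=0.8, effort_ratio=0.4)
print(f"boss sees {boss:.0%} of baseline, worker sees {worker:.0%} of baseline")
```

Same situation, two honest measurements: one side sees a 20% drop, the other sees productivity double.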
So why is it that the bosses are the ones that are so enthusiastic about adoption?
Early on in my days as a sysadmin, I automated a ton of my role when the rest of the team was still doing ClickOps. The reward for doing so was more work and higher expectations, without a pay increase to match my newfound productivity. That happens all over the workforce, and so people will just keep it to themselves. I learned my lesson at that first job real fast: if I'm able to have the same or greater output in half the time, I keep that to myself so I can use the automation to free up my own time instead of having it filled by the company.
I wonder how much of that is happening now with AI in non-technical roles.
In some cases, workers are also being asked to automate the parts of their jobs they enjoy most, Hinds said on the podcast, pointing to customer-service employees who enjoy building relationships but are increasingly expected to supervise AI agents instead.
"That's what gives you joy and meaning at work," she said. "That is very dangerous."
What's a 20% productivity gain if I constantly feel deflated by work that used to energize me? That's going to give back the productivity gain and more, while also decreasing my quality of life.
Where did the 20% number come from? I’d argue it’s way more than that (or variable, i.e. dependent on who’s using it/how it’s being used/what it’s being used on).
Having said that, the number, to me, doesn’t even matter. You could replace that with 200%, and it’d be just as true.
It's actually kinda pleasant, especially when I consider all the tickets I'm not excited about doing. It's prob worth focusing on that aspect of it.
This is something that I don't see discussed a lot in these conversations, but it's true for a ton of folks.
I didn't end up with a career in tech because I wanted to tell a bot to do the fun part of my job for me, leaving me only with the boring, tedious parts. I didn't sign up to be a full-time code reviewer, and I certainly never wanted to be a manager, let alone a manager of bots.
It also can't help but spark feelings of "Why am I getting paid 6 figures for this??" and that makes me nervous for the future.
I imagine the engineers and assemblers in factories pre-assembly line felt the same when things started getting automated there. There's an element of craftsmanship that gets taken away as the product moves from being artisanal and handcrafted to mass-produced.
I wonder if it's too late for me to pivot to hardware.
But those times when I had to drop down into a repl and play around with the output of a method. Or try different ways of doing what anyone else would think is boring, like array manipulation - that's a lot of what I actually LIKE to do.
A big part of me just hopes I can hang in there for another... decade, or two. Then I can retire! Maybe.
This is all normal. It’s also well worth the time spent learning.
So for my work, it's made me much better at my job. Much faster and more accurate.
I can write a simple query in less time than it takes Claude to read the request, query the semantic layer, check my files, write a query that I have to approve, read the results, hide them (ctrl+o usually works), and give me a summary.
We’ve reached this inflection point where it’s faster for me to do most tasks again.
I’m sure fast mode costing more money plays a role.
(I spent too long by the horse racing track)
Which is why (well, part of why) I think the long-term trend will be towards self-hosting models. Right now the frontier models are far enough ahead of the self-hosted ones that there are lots of people willing to pay by the token to rent someone else's model, because they get more value for money from that than from self-hosting models.
But the frontier companies won't be able to keep up their current levels of expenditure forever. At some point the investors are going to say "Hey, so, um, when am I going to see some return on my investment?" and then the current subsidized subscriptions (including the one my employer uses) are going to go away, much like what happened with Copilot this month.
And then the locally-hosted models are going to suddenly look like a more attractive picture. Because where you might have been willing to spend $100/month/employee to rent time on models in someone else's data center, you might suddenly balk at spending $500/month/employee. You might say "Hey, you know what? A $50,000 up-front capital investment is only, what, one month's worth of subscriptions for our 100 employees? Yeah, okay, I'll approve the hardware purchase. Get that self-hosted model set up and then we'll cancel the subscription and switch over."
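The back-of-the-envelope math in that hypothetical can be sketched as follows. The function name `months_to_break_even` and the `monthly_running_cost` parameter (power, admin time and so on) are assumptions added for illustration; the dollar figures are the ones from the comment.

```python
def months_to_break_even(hardware_cost, employees, monthly_fee_per_employee,
                         monthly_running_cost=0.0):
    """Months until an up-front hardware purchase costs less than renting.

    monthly_running_cost is a hypothetical knob for the ongoing cost of
    self-hosting; it defaults to zero, which flatters the self-hosted option.
    Returns None if self-hosting never pays for itself.
    """
    # What the subscription would have cost each month, minus what it
    # costs to keep the self-hosted box running.
    monthly_saving = employees * monthly_fee_per_employee - monthly_running_cost
    if monthly_saving <= 0:
        return None
    return hardware_cost / monthly_saving


# $50,000 of hardware vs. 100 employees at $500/month: paid off in one month.
print(months_to_break_even(50_000, 100, 500))
# The same hardware vs. the $100/month price: five months.
print(months_to_break_even(50_000, 100, 100))
```

This ignores model quality and hardware depreciation entirely, so treat it as the shape of the argument rather than a purchasing model.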
Not everyone is going to do that. But once the locally-hosted models are good enough, the first few people who do so and report success are going to start a snowball effect. It will likely be driven by money first, but it will also have a side effect that people will slowly discover: the model you're using becomes more predictable. It will continue to work the same way next year that it is working this year; or if it doesn't, it's because you chose to install the new version.
And when that happens (I'm saying "when", not "if" because although it might take some time, I think it's inevitable in the long run), the frontier-model rental companies are going to struggle to stay afloat. Except for the ones who saw this coming and transitioned to a non-subscription income source somehow (maybe by selling licenses to self-host their frontier models for $$BIGNUM), or who have some other revenue stream besides renting out models.
Are you getting LLMsplained? :)
Consider what is happening in most construction sites. The heavy work is absolutely from the technology on site. But without people there to oversee it and keep it working, it would fail.
And that is almost certainly true at any industrial site. Indeed, look up videos of high-tech looms. A large portion of the technology added to them is there so that the operators can locate a fault and fix it.
> 80% of the questions the agent asks me require a lot of thought. 20% of what it does needs correction.
I've found even the permissions questions give me veto power over fruitless lines of exploration, especially in planning mode. For instance, it wants to use tools I don't have installed to access information that I have made available elsewhere? I get a chance to override this decision by declining the permissions check and redirecting it. Feels tedious, but helps me understand what information sources are influencing it. I head off a lot of bugs this way.
If an initiative produces only 80% of the previous results and you’re paying large token bills on top of the same wages, the AI is going to get cut off.
> i've seen a number of articles claiming things like "devs self report they're +x% more productive with AI, but actually they're -y% LESS efficient!".
Are you thinking of the old METR evals? Their more recent evals showed an actual performance improvement.
The old report is still circulated as bait for AI skeptics.
It sucks for the employees, otoh it might be the only way we're going to beat Baumol's Cost Disease.
In the past few decades productivity has exploded, but service employees have largely failed to increase productivity in any way because it's harder to automate these tasks.
It's the reason the costs of things like education and healthcare are downright extortionate, the reason you're paying back your college well into your fifties, the reason you don't call an ambulance for someone in the US because you don't want to ruin their life financially.
We may have to trade the personal fulfillment in these jobs for the broader affordable access to these services.
Programming was one of the ones which was, because there were fewer programmers than openings. Now that's flipping, thus naturally, the enjoyment is going to be sucked out of it.
stogot•1h ago
reluctant_dev•1h ago
I just can't imagine tanking my trust with my coworkers by doing something like that.
tommek4077•1h ago
liveoneggs•1h ago
rozap•51m ago
That's what I wonder about, what happens to all those folks.
loloquwowndueo•1h ago
kerblang•1h ago
Managers will be sure to tell you how much they respect you. Ask them if they respect the work and you'll get a blank stare.
yaodub•46m ago