
The new calculus of AI-based coding

https://blog.joemag.dev/2025/10/the-new-calculus-of-ai-based-coding.html
55•todsacerdoti•5h ago

Comments

Madmallard•2h ago
First the Microsoft guy touting agents,

now the AWS guy doing it!

"My team is no different—we are producing code at 10x of typical high-velocity team. That's not hyperbole - we've actually collected and analyzed the metrics."

Rofl

"The Cost-Benefit Rebalance"

Here he basically just talks about setting up mock dependencies and injecting intermittent failures into them. Mock dependencies have been around for decades; nothing new here.

It sounds like setting up this test system is as time-consuming as solving the actual problems you're trying to solve, so what time are you saving?
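For reference, the fault-injection pattern being criticized here (mocks that intermittently fail) takes only a few lines of stdlib Python. The `gateway.charge` dependency and the failure rate are hypothetical, invented for illustration, not taken from the article:

```python
import random
from unittest.mock import Mock

def flaky(real_fn, failure_rate, rng):
    """Wrap a dependency so it intermittently raises, like a real network service."""
    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise ConnectionError("injected intermittent failure")
        return real_fn(*args, **kwargs)
    return wrapper

# Hypothetical dependency: a payment gateway the code under test would call.
gateway = Mock()
gateway.charge = flaky(lambda amount: {"status": "ok", "amount": amount},
                       failure_rate=0.3, rng=random.Random(42))

# Code under test would retry on ConnectionError; here we just count outcomes.
ok = failed = 0
for _ in range(100):
    try:
        gateway.charge(10)
        ok += 1
    except ConnectionError:
        failed += 1
# With this seed, roughly a 70/30 split between successes and injected failures.
```

The seeded `random.Random(42)` keeps the "intermittent" failures deterministic across test runs, which is the part that makes this usable in CI.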

"Driving Fast Requires Tighter Feedback Loop"

Yes if you're code-vomiting with agents and your test infrastructure isn't rock solid things will fall apart fast, that's obvious. But setting up a rock solid test infrastructure for your system involves basically solving most of the hard problems in the first place. So again, what? What value are you gaining here?

"The communication bottleneck"

Amazon was doing this when I worked there 12 years ago. We all sat in the same room.

"The gains are real - our team's 10x throughput increase isn't theoretical, it's measurable."

Show the data and proof. Doubt.

Yeah I don't know. This reads like complete nonsense honestly.

Paraphrasing: "AI will give us huge gains, and we're already seeing it. But our pipelines and testing will need to be way stronger to withstand the massive increase in velocity!"

Velocity to do what? What are you guys even doing?

Amazon is firing 30,000 people by the way.

p1necone•1h ago
"Our testing needs to be better to handle all this increased velocity" reads to me like a euphemistic way of saying "we've 10x'ed the amount of broken garbage we're producing".
lispisok•58m ago
We're back to using LOC as a productivity metric because LLMs are best at cranking out thousands of LOC really fast. Personal experience: I had a colleague use Claude Code to create a PR consisting of a dozen files and thousands of lines of code for something that could have been done in a couple hundred LOC in a single file.
CharlesW•51m ago
> We're back to using LOC as a productivity metric because LLMs are best at cranking out thousands of LOC really fast.

Can you point me to anyone who knows what they're talking about declaring that LOC is the best productivity metric for AI-assisted software development?

chipsrafferty•29m ago
Are you implying that the author of this article doesn't know what they are talking about? Because they basically declared it in the article we just read.

Can you point me to where the author of this article gives any proof of the claimed 10x productivity increase other than the screenshot of their git commits, which shows more squares in recent weeks? I know git commits could be net deleting code rather than adding code, but that's still using LOC, or number of commits as a proxy for it, as a metric.

CharlesW•15m ago
> I know git commits could be net deleting code rather than adding code…

Yes, I'm also reading that the author believes commit velocity is one reflection of the productivity increases they're seeing, but I assume they're not a moron and have access to many other signals they're not sharing with us. Probably stuff like: https://www.amazon.science/blog/measuring-the-effectiveness-...

blibble•15m ago
If you've ever had a friend from before they went to work at Amazon, it's like watching someone get indoctrinated into a cult.

And this guy didn't survive there for a decade by challenging it.

skinnymuch•1h ago
Interesting enough to me though I only skimmed.

I switched back to Rails for my side project a month ago, and AI coding has been great for not-too-complex stuff, while the old NextJS codebase was in shambles.

Before I was still doing a good chunk of the NextJS coding. I’m probably going to be directly coding less than 10% of the code base from here on out. I’m now spending time trying to automate things as much as possible, make my workflow better, and see what things can be coded without me in the loop. The stuff I’m talking about is basic CRUD and scraping/crawling.

For serious coding, I’d think coding yourself and having ai as your pair programmer is still the way to go.

gachaprize•1h ago
Classic LLM article:

1) Abstract data showing an increase in "productivity" ... CHECK

2) Completely lacking in any information on what was built with that "productivity" ... CHECK

Hilarious to read this on the backend of the most widely publicized AWS failure.

alfalfasprout•1h ago
Yep. The problem is then leadership sees this and says "oh, we too can expect 10x productivity if everyone uses these tools. We'll force people to use them or else."

And guess what happens? Reality doesn't match expectations and everyone ends up miserable.

Good engineering orgs should have engineers deciding what tools are appropriate based on what they're trying to do.

Animats•1h ago
> Instead, we use an approach where a human and AI agent collaborate to produce the code changes. For our team, every commit has an engineer's name attached to it, and that engineer ultimately needs to review and stand behind the code. We use steering rules to setup constraints for how the AI agent should operate within our codebase,

This sounds a lot like Tesla's Fake Self Driving. It self drives right up to the crash, then the user is blamed.

groby_b•38m ago
Except here it's made abundantly clear, up front, who has responsibility. There's no pretense that it's fully self driving. And the engineer has the power to modify every bit of that decision.

Part of being a mature engineer is knowing when to use which tools, and accepting responsibility for your decisions.

It's not that different from collaborating with a junior engineer. This one can just churn out a lot more code, and has occasional flashes of brilliance, and occasional flashes of inanity.

exasperaited•1h ago
Absolutely none of that article has ever even so much as brushed past the colloquial definition of "calculus".

These guys actually seem rattled now.

photochemsyn•38m ago
Well, 'calculus' is the kind of marketing word that sounds more impressive than 'arithmetic', and 'quantum logic' has gone a bit stale. And 'AI-based' might give more hope to the anxious investor class, since 'AI-assisted' sounds weak: it implies the core developer team isn't going to be cut from the labor costs on the balance sheet, they're just going to be 'assisted' (with things like AI-written unit tests that still need some checking).

"The Arithmetic of AI-Assisted Coding Looks Marginal" would be the more honest article title.

collingreen•13m ago
"Galaxy-brain pair programming with the next superintelligence"
philipp-gayret•1h ago
This is the first time I see "steering rules" mentioned. I do something similar with Claude, curious how it looks for them and how they integrate it with Q/Kiro.
manmal•50m ago
Those rules are often ignored by agents. Codex is known to be quite adherent, but it falls back on its own ideas, which run counter to rules I've given it. The longer a session goes on, the more it goes off the rails.
CharlesW•46m ago
Everything related to LLMs is probabilistic, but those rules are also often followed well by agents.
philipp-gayret•10m ago
I'm aware of the issues around rules as in a default prompt. I had hoped the author of the blog meant a different mechanism when they mentioned "steering rules". I do mean something different, where an agent will self-correct when it is seen going against rules in the initial prompt. I have a different setup myself for Claude Code, and would call parts of that "steering"; adjusting the trajectory of the agent as it goes.
CharlesW•48m ago
I'd assume it's related to this Amazon "Socratic Human Feedback (SoHF): Expert Steering Strategies for LLM Code Generation" paper: https://assets.amazon.science/bf/d7/04e34cc14e11b03e798dfec5...
whiterook6•47m ago
This reads like "Hey, we're not vibe coding, but when we do, we're careful!" with hints of "AI coding changes the costs associated with writing code, designing features, and refactoring" sprinkled in to stand out.
reenorap•30m ago
No.

The way to code going forward with AI is Test-Driven Development. The code itself no longer matters. You give the AI a set of requirements, i.e. tests that need to pass, and then let it code whatever way it needs to in order to fulfill those requirements. That's it. The new reality us programmers need to face is that code itself has an exact value of $0. That's because AI can generate it, and with every new iteration of the AI, the internal code will get better. What matters now are the prompts.

I always thought TDD was garbage, but now with AI it's the only thing that makes sense. The code itself doesn't matter at all, the only thing that matters is the tests that will prove to the AI that their code is good enough. It can be dogshit code but if it passes all the tests, then it's "good enough". Then, just wait a few months and then rerun the code generation with a new version of the AI and the code will be better. The humans don't need to know what the code actually is. If they find a bug, write a new test and force the AI to rewrite the code to include the new test.

I think TDD has really found its future now that AI coding is here to stay. Human-written code doesn't matter anymore, and in fact I would wager that modifying AI-generated code is just as much of a burden. We will need to make sure the test cases are accurate and describe what the AI needs to generate, but that's it.
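A minimal sketch of the workflow described above, using a hypothetical `slugify` feature: the test function is the durable spec, and any generated implementation that satisfies it is accepted, no matter how it looks inside:

```python
# The durable artifact is the executable spec; the implementation is disposable.
def spec_slugify(slugify):
    """Any implementation, human- or AI-written, is 'good enough' if this passes."""
    assert slugify("Hello World") == "hello-world"
    assert slugify("  many   spaces  ") == "many-spaces"
    assert slugify("Already-Slugged") == "already-slugged"

# One candidate implementation; a later model run could regenerate it from scratch.
def slugify_v1(text):
    return "-".join(text.lower().split())

spec_slugify(slugify_v1)  # passes, so this candidate is accepted
```

Under this view, finding a bug means adding an assertion to `spec_slugify` and regenerating, never patching `slugify_v1` by hand.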

pcarolan•26m ago
I mostly agree, but why stop at tests? Shouldn't it be spec-driven development? Then neither the code nor the language matters. Wouldn't user stories and requirements à la BDD (see Cucumber) be the right abstraction?
reenorap•20m ago
I don't think you're wrong but I feel like there's a big bridge between the spec and the code. I think the tests are the part that will be able to give the AI enough context to "get it right" quicker.

It's sort of like a director telling an AI the high level plot of a movie, vs giving an AI the actual storyboards. The storyboards will better capture the vision of the director vs just a high level plot description, in my opinion.
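A toy sketch of the "storyboard" level being argued over, assuming a hypothetical checkout feature: the Gherkin-style scenario text is the spec, and the step implementations, which a BDD tool like Cucumber would bind to the text by pattern matching, pin down the exact behavior:

```python
# A Cucumber-style scenario as text; a BDD tool would bind each line to a step
# function by pattern matching. Feature and numbers are invented for illustration.
SCENARIO = """
Given a cart with 2 items at $5 each
When the user applies coupon SAVE10
Then the total is $9.00
"""

def run_checkout_scenario():
    cart = {"items": 2, "unit_price": 5.00}      # Given
    total = cart["items"] * cart["unit_price"]
    total *= 0.90                                 # When: SAVE10 takes 10% off
    return f"${total:.2f}"                        # Then

assert run_checkout_scenario() == "$9.00"
```

The gap both commenters are circling: the prose scenario alone underdetermines the code (is SAVE10 a percentage or a flat amount?), which is why the executable steps still matter.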

__MatrixMan__•14m ago
Maybe one day. I find myself doing plenty of course correction at the test level. Safely zooming out doesn't feel imminent.
gmd63•13m ago
Why stop there? Whichever shareholders flood the datacenter with the most electrical signals get the most profits.
blibble•16m ago
you will end up with something that passes all your tests then smashes into the back of the lorry the moment it sees anything unexpected

writing comprehensive tests is harder than writing the code

reenorap•14m ago
Then you write another test. That's the whole point of TDD. The more tests you write, the closer it gets to its final form.
blibble•6m ago
right, and by the time I have 2^googolplex tests then the "AI" will finally be able to produce a correctly operating hello world

oh no! another bug!

HellDunkel•10m ago
No.

The reason AI code generation works so well is that (a) it is text-based, so the training data is huge, and (b) the output is not the final result but a human-readable blueprint (source code), ready to be made fit by a human who can form an abstract idea of the whole in their head. The final product is the compiled machine code; we use compilers for that, not LLMs.

AI-generated code is not suitable to be transferred directly into the final product with only TDD as validation; it would simply be very inefficient to do so.

brazukadev•30m ago
But here's the critical part: the quality of what you are creating is way lower than you think, just like AI-written blog posts.
collingreen•14m ago
Upvoted for a dig that is also an accurate and insightful metaphor.
cadamsdotcom•21m ago
"We have real mock versions of all our dependencies!"

Congratulations, you invented end-to-end testing.

"We have yellow flags when the build breaks!"

Congratulations! You invented backpressure.

Every team has different needs and path dependencies, so each settles on a different interpretation of CI/CD and software engineering process. Productizing anything in this space is going to be an uphill battle to yank away teams' hard-earned processes.

Productizing process is hard but it's been done before! When paired with a LOT of spruiking it can really progress the field. It's how we got the first CI/CD tools (eg. https://en.wikipedia.org/wiki/CruiseControl) and testing libraries (eg. pytest)

So I wish you luck!
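The "yellow flag" backpressure pattern mentioned above can be sketched as a merge gate: while the build is red, incoming work queues instead of landing. The `MergeQueue` class and PR names are hypothetical:

```python
from collections import deque

class MergeQueue:
    """Backpressure sketch: while the build is broken, merges queue instead of landing."""
    def __init__(self):
        self.build_green = True
        self.pending = deque()
        self.landed = []

    def submit(self, pr):
        if self.build_green:
            self.landed.append(pr)
        else:
            self.pending.append(pr)  # backpressure: hold work rather than pile on breakage

    def set_build_status(self, green):
        self.build_green = green
        while green and self.pending:
            self.landed.append(self.pending.popleft())  # flag cleared: drain the queue

q = MergeQueue()
q.submit("pr-1")
q.set_build_status(False)   # yellow flag raised
q.submit("pr-2")            # queued, not landed
q.set_build_status(True)    # flag cleared; queue drains in order
```

This is essentially what hosted merge queues do; the backpressure framing just makes explicit that a broken build should slow intake, not be merged past.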

moron4hire•19m ago
If you are producing real results at 10x then you should be able to show that you are a year ahead of schedule in 5 weeks.

Waiting to see anyone show even a month ahead of schedule after 6 months.

__MatrixMan__•11m ago
I've never worked anywhere that knew where they were going well enough that it was even possible to be a month ahead of schedule. By the time a month has elapsed the plan is entirely different.

AI can't keep up because its context window is full of yesteryear's wrong ideas about what next month will look like.

ned_roberts•2m ago
Looking at the “metrics” they shared, going from committing just about zero code over the last two years to more than zero in the past two months may be a 10x improvement. I haven’t seen any evidence more experienced developers see anywhere near that speedup.

Easy RISC-V

https://dramforever.github.io/easyriscv/
98•todsacerdoti•2h ago•10 comments

Claude for Excel

https://www.claude.com/claude-for-excel
389•meetpateltech•7h ago•299 comments

JetKVM – Control any computer remotely

https://jetkvm.com/
233•elashri•6h ago•130 comments

10M people watched a YouTuber shim a lock; the lock company sued him – bad idea

https://arstechnica.com/tech-policy/2025/10/suing-a-popular-youtuber-who-shimmed-a-130-lock-what-...
616•Brajeshwar•10h ago•250 comments

Simplify Your Code: Functional Core, Imperative Shell

https://testing.googleblog.com/2025/10/simplify-your-code-functional-core.html
114•reqo•2d ago•44 comments

Pyrex catalog from 1938 with hand-drawn lab glassware [pdf]

https://exhibitdb.cmog.org/opacimages/Images/Pyrex/Rakow_1000132877.pdf
240•speckx•8h ago•58 comments

Go beyond Goroutines: introducing the Reactive paradigm

https://samuelberthe.substack.com/p/go-beyond-goroutines-introducing
22•samber•1w ago•12 comments

The new calculus of AI-based coding

https://blog.joemag.dev/2025/10/the-new-calculus-of-ai-based-coding.html
56•todsacerdoti•5h ago•36 comments

Why Busy Beaver hunters fear the Antihydra

https://benbrubaker.com/why-busy-beaver-hunters-fear-the-antihydra/
117•Bogdanp•6h ago•32 comments

MCP-Scanner – Scan MCP Servers for vulnerabilities

https://github.com/cisco-ai-defense/mcp-scanner
89•hsanthan•5h ago•27 comments

Rust cross-platform GPUI components

https://github.com/longbridge/gpui-component
442•xvilka•13h ago•186 comments

Tags to make HTML work like you expect

https://blog.jim-nielsen.com/2025/dont-forget-these-html-tags/
374•FromTheArchives•13h ago•199 comments

TOON – Token Oriented Object Notation

https://github.com/johannschopplich/toon
56•royosherove•1d ago•23 comments

Avoid 2:00 and 3:00 am cron jobs (2013)

https://www.endpointdev.com/blog/2013/04/avoid-200-and-300-am-cron-jobs/
231•pera•6h ago•220 comments

Solving regex crosswords with Z3

https://blog.nelhage.com/post/regex-crosswords-z3/
39•atilimcetin•6d ago•0 comments

Image Dithering: Eleven Algorithms and Source Code (2012)

https://tannerhelland.com/2012/12/28/dithering-eleven-algorithms-source-code.html
34•Bogdanp•3d ago•8 comments

When 'perfect' code fails

https://marma.dev/articles/2025/when-perfect-code-fails
26•vinhnx•8h ago•20 comments

Sieve (YC X25) is hiring engineers to build video datasets for frontier AI

https://www.sievedata.com/
1•mvoodarla•6h ago

Study finds growing social circles may fuel polarization

https://phys.org/news/2025-10-friends-division-social-circles-fuel.html
75•geox•4h ago•74 comments

It's not always DNS

https://notes.pault.ag/its-not-always-dns/
24•todsacerdoti•5h ago•15 comments

Show HN: Dlog – Journaling and AI coach that learns what drives well-being (Mac)

https://dlog.pro/
12•dr-j•5h ago•5 comments

Iroh-blobs 0.95 – New features – Iroh

https://www.iroh.computer/blog/iroh-blobs-0-95-new-features
4•janandonly•6d ago•0 comments

The last European train that travels by sea

https://www.bbc.com/travel/article/20251024-the-last-european-train-that-travels-by-sea
128•1659447091•14h ago•122 comments

Should LLMs just treat text content as an image?

https://www.seangoedecke.com/text-tokens-as-image-tokens/
131•ingve•6d ago•80 comments

PSF has withdrawn $1.5M proposal to US Government grant program

https://pyfound.blogspot.com/2025/10/NSF-funding-statement.html
403•lumpa•8h ago•334 comments

Show HN: Erdos – open-source, AI data science IDE

https://www.lotas.ai/erdos
41•jorgeoguerra•7h ago•21 comments

fnox, a secret manager that pairs well with mise

https://github.com/jdx/mise/discussions/6779
101•bpierre•6h ago•22 comments

Eight Million Copies of Moby-Dick (2014)

https://thevoltablog.wordpress.com/2014/01/27/nicolas-mugaveros-eight-million-copies-of-moby-dick...
29•awalias•4d ago•10 comments

Why Nigeria accepted GMOs

https://www.asimov.press/p/nigeria-crops
37•surprisetalk•5h ago•71 comments

Let the little guys in: A context sharing runtime for the personalised web

https://arjun.md/little-guys
55•louisbarclay•5h ago•11 comments