This makes me metaphorically stabby.
What a lot of us must be wondering though is:
- how maintainable is the code being outputted
- how much is this newfound productivity saving (costing) on compute, given that we are definitely seeing more code
- how many livesite/security incidents will be caused by AI generated code that hasn't been reviewed properly
So no, I don't think persistence-through-time is a good metric. Probably better to look at cyclomatic complexity, and maybe for a given code path or module or class hierarchy, how many calls it makes within itself vs to things outside the hierarchy - some measure of how many files you need to jump between to understand it
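A very rough sketch of that second metric, in Python (the module path is hypothetical, and "internal" here just means a call to a name defined in the same file):

    import ast

    def locality(path):
        # Count calls that resolve to names defined in this file
        # vs. calls to anything imported or otherwise external.
        tree = ast.parse(open(path).read())
        defined = {n.name for n in ast.walk(tree)
                   if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))}
        internal = external = 0
        for node in ast.walk(tree):
            if isinstance(node, ast.Call):
                f = node.func
                name = f.id if isinstance(f, ast.Name) else getattr(f, "attr", None)
                if name in defined:
                    internal += 1
                elif name is not None:
                    external += 1
        return internal, external

    print(locality("some_module.py"))  # hypothetical file

The higher the share of calls that resolve outside the file (or hierarchy), the more jumping around a reader has to do.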
Is that a per-year number?
If a year has 200 working days that's still only about 40 lines of code a day.
When I'm in full-blown work mode with a decent coding agent (usually Claude Code) I'm genuinely producing 1,000+ lines of (good, tested, reviewed) code a day.
Maybe there is something to those absurd 10x multiplier claims after all!
(I still think there's plenty of work done by software engineers that isn't crunching out code, much of which isn't accelerated by AI assistance nearly as much. 40 lines of code per day felt about right for me a few years ago.)
An example from earlier today: https://github.com/simonw/llm-gemini/commit/fa6d147f5cff9ea9...
That commit added 33 lines and removed 13 - so I'm already at a 20-lines-a-day level just from that one commit (and I shipped a few more plus a release of llm-gemini: https://github.com/simonw/llm-gemini/commits/a2bdec13e03ca8a...)
It took about 3.5 minutes. I started from this issue someone had filed against my repo:
Then I opened Claude Code and said:
Run this command: uv run llm -m gemma-3-27b-it hi
That ran the command and returned the error message. I then said: Yes, fix that - the gemma models do not support media resolution
Which was enough for it to figure out the fix and run the tests to confirm it hadn't broken anything. I ran "git diff", thought about the change it had made for a moment, then committed and pushed it.
Here's the full Claude Code transcript: https://gistpreview.github.io/?62d090551ff26676dfbe54d8eebbc...
I verified the fix myself by running:
uv run llm -m gemma-3-27b-it hi
I pasted the result into an issue comment to prove to myself (and anyone else who cares) that I had manually verified the fix: https://github.com/simonw/llm-gemini/issues/116#issuecomment...
Here's a more detailed version of the transcript including timestamps, showing my first prompt at 10:01:13am and the final response at 10:04:55am: https://tools.simonwillison.net/claude-code-timeline?url=htt...
I built that claude-code-timeline application this morning too, and that thing is 2284 lines of code: https://github.com/simonw/tools/commits/main/claude-code-tim... - but that was much more of a vibe-coded thing, I hardly reviewed the code that was written at all and shipped it as soon as it appeared to work correctly. Since it's a standalone HTML file there's not too much that can go wrong if it has bugs in it.
I don't know if code quality really matters to most people or to the bottom line, but a good software engineer writes better code than Claude. It is a testament to library maintainers that Claude is able to code at all, in my opinion. One reason is that Claude uses APIs in wacky ways. For instance, by reading the SDL2 documentation I was able to find many places where Claude writes SDL2 code using archaic patterns held over from the old SDL days.
I think there are a lot of hidden ways AI booster types benefit from basic software engineering practices, even as they actively promote ideas that damage those same practices. Maybe it will only be 10 years from now that we learn that having good engineers is actually important.
Same here. So I tell it what improvements I want to make and watch it make them.
I've gained enough experience at prompting it that it genuinely is faster for me to tell it the change I want to make than it is for me to make that change myself, 90% of the time.
If I was hacking on the Linux kernel I would be delighted with myself for producing 40 lines of landed code in a single day.
A lot of people are oblivious to Zipf distributions in effort and output, and if you ever catch on to it as a productive person, it really reframes ideas about fairness and policy and good or bad management.
It also means that you can recognize a good team, and when a bunch of high performers are pushing and supporting each other and being held to account out in the open, amazing things happen that just make other workplaces look ridiculous.
My hope for AI is that instead of 20% of the humans doing 80% of the work, you end up with force multipliers and a ramping up, so that more workplaces look like high-functioning teams, making everything more fair and engaging and productive. But I suspect that once people get better with AI, at least up to the point of AGI, we're going to see the same distribution, just at 10x or 50x the productivity.
How maintainable is this code output? I saw an SPA HTML file produced by a model that looked almost like assembly code. If the code can only be maintained by the model, then an appropriate metric should be based on the long-term maintainability achieved, not on the instant generation of code.
I feel like we humans try to separate things and keep them short. We do this not because we think it's pretty; we do it so our human brains can still reason about a big system. As a result LOC is a bad measure, since being concise then hurts your productivity?
As a dev I very much subscribe to this line of thought, but I also have to admit most of the business-side people would disagree.
Unfortunately I’m not sure there are good metrics.
Also, my anecdotal experience is that LLM code is flat wrong sometimes. Like a significant percentage. I can't quote a number really, because I rarely do the same thing/similar thing twice. But it's a double digit percentage.
I would expect that code which continually changes, deprecating old features and creating new ones, is still looking for a good problem-domain fit.
I guess you can already derive this value if you sum the total lines changed by all PRs and divide it by (SLOC end - SLOC start). Ideally it should be a value only slightly greater than 1.
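A toy version of that calculation (all numbers made up):

    # Lines touched across all PRs, divided by net SLOC growth over the same period.
    prs = [
        {"added": 120, "removed": 30},
        {"added": 45, "removed": 40},
        {"added": 300, "removed": 10},
    ]
    sloc_start, sloc_end = 10_000, 10_385

    total_changed = sum(p["added"] + p["removed"] for p in prs)
    ratio = total_changed / (sloc_end - sloc_start)
    print(ratio)  # 545 / 385 ~= 1.42; closer to 1 means less churn

The further the ratio drifts above 1, the more of the written code is being rewritten or thrown away rather than kept.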
fyi: You headline with "cross-industry", lead with fancy engineering productivity graphics, then caption it with small print saying it's from your internal team data. Unless I'm completely missing something, it comes off as a little misleading and disingenuous. Maybe intro with what your company does and your data collection approach.
Also, I notice it when the LLMs are offline. It feels a bit like when the internet connection fails. You remember the old days of lower productivity.
Of course, there is a lot of junk/silly ways to approach these tools but all tools are just a lever, and need judgement/skill to use them well.
dakshgupta•5h ago
About a billion lines of code go through Greptile every month, and we're able to do a lot of interesting analysis on that data.
We decided to compile some of the most interesting findings into a report. This is the first time we've done this, so any feedback would be great, especially around what analytics we should include next time.
wrs•1h ago
So, do you have any quality metrics to go with these?
ChrisbyMe•1h ago
Would be interested in seeing the breakdown between uplift vs company size.
e.g. I work in a FAANG and have seen an uptick in the number of lines on PRs, partially due to AI coding tools and partially due to incentives for performance reviews.
dakshgupta•47m ago
An interesting subtrend is that Devin and other full async agents write the highest proportion of code at the largest companies. Ticket-to-PR hasn't worked nearly as well for startups as it has for the F500.
jacekm•45m ago
Which stats in the report come from such analysis? I see that most metrics are based on either data from your internal teams or publicly available stats from npm and PyPi.
Regardless of the source, it's still an interesting report, thank you for this!
chis•42m ago
Super interesting report though.