As a simple example, accidentally inverting feature flag logic will not cause tests to fail if the new behavior you're guarding does not actually break existing tests. I and very senior developers I know have occasionally made this mistake and the "thinking" models are very good at catching issues like this, especially when prompted with a list of error categories to look for. Writing an LLM prompt for an issue class is much easier than a compiler plugin or static analysis pass, and in many cases works better because it can infer intent from comments and symbol names. False positives on issues can be annoying but aren't risky, and also can be a useful signal that the code is not written in a clear way.
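To make the failure mode concrete, here is a minimal sketch in Python. Everything in it is hypothetical and invented for illustration: the function `format_price`, the flag `new_formatting_enabled`, and the two formatting helpers are not taken from any real codebase. It shows how an inverted flag check can slip past an existing test when the old and new code paths happen to agree on the inputs that test covers.

```python
def legacy_format(amount_cents: int) -> str:
    """Old behavior: plain dollars and cents, no thousands separator."""
    return f"${amount_cents / 100:.2f}"


def new_format(amount_cents: int) -> str:
    """New behavior behind the flag: adds a thousands separator."""
    return f"${amount_cents / 100:,.2f}"


def format_price(amount_cents: int, new_formatting_enabled: bool) -> str:
    # BUG (deliberate, for illustration): the condition is inverted, so the
    # new code path runs when the flag is OFF and the legacy path when it is ON.
    if not new_formatting_enabled:
        return new_format(amount_cents)
    return legacy_format(amount_cents)


def test_existing_behavior_unchanged() -> None:
    # The pre-existing test only uses a small amount, where both
    # formatters produce the same string, so it passes despite the bug.
    assert format_price(1999, new_formatting_enabled=False) == "$19.99"


if __name__ == "__main__":
    test_existing_behavior_unchanged()
    print("existing test passed")
    # With the flag off, a large amount exposes that the new path is live.
    print(format_price(123456, new_formatting_enabled=False))
```

The existing test stays green, yet users with the flag off already get the new formatting on large amounts. A reviewer, human or LLM, who reads the flag name next to `if not` has a reasonable chance of spotting the mismatch in intent that the test suite cannot see.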
I think the reason this discussion keeps coming up is that the people who are getting a lot out of these tools are people who are, at best, the software-equivalent of assembly-line workers. If something can be easily understood by passively reading it then it probably isn't complicated or novel and therefore it's not surprising a pseudorandom bullshit generator can do it for you; all it lacks is a unit testing system which can verify that its interpretation of the problem-statement matches the interpretation which would be most obvious to a human and that is evidently not a solved problem thus far.
If the hardest part of your job is understanding code written by other people and even code written by yourself in the distant past, then LLMs are of little use because the problem they solve was never a significant bottleneck, and in fact their "solution" only serves to pump a higher volume of fluid through the neck of the proverbial bottle.
It's the difference between reading somebody's paper in a mathematical journal to understand how they came to the conclusion they are presenting, and merely using the identity they have proven on faith. If all that mattered was to perform some calculation based on their work, then it's clear which approach will get more work done in less time. But if you don't take it for granted that everything in the journal is correct, or if you want to be able to further develop ideas based upon their proof, then you have to spend a few days or even weeks trying to understand how each step leads to its successor.
It's also why I hate the old adage about not reinventing wheels: it promotes ignorance by asserting that education itself is ignorance.
But, I like it, I’ve reinvented many wheels in my work and it’s benefited me greatly. So I will reinvent this particular wheel as well…
You can enable virtually free test driven development. Write the test names down and let the LLM implement them for you. You save 50% of your time and you get to go to town on implementation and/or optimizations.
You can have the LLM take a non-technical counterpart's description of a bug and have it point you at precise lines of code to investigate rather than grepping around a codebase you might not know well.
You can onboard to new languages, frameworks, repositories extremely fast by having a partner (the LLM) explain implementation patterns and approaches on demand! You don't even need to talk to another human being! Get your questions answered in seconds and start coding!
You can rapidly prototype. You can get immediate code reviews. You can rubber duck. You can visualize business/logic flows and code branching to better understand existing implementations. You can even have the LLM write an implementation plan for you then write the code yourself!
If you can't find a way to write more code with LLMs, it's either an imagination or skill issue.
That's assuming that it writes good tests, and that you don't care to take the time to verify the tests it wrote, no?
Being able to imagine something doesn't mean I have to like it.
> Write the test names down and let the LLM implement them for you.
This sort of reinforces the idea I (and I believe others) have that people mostly talk past each other on this topic. It seems like there might be some other difference in understanding and/or practice when it comes to using these tools effectively. This seems to be a common issue to notice once one starts noticing it.
That being said, I noticed that the more opinionated a language/framework/library is, the worse off one is using LLMs.
I was surprised by this, but then I put a particularly fishy line into GitHub's search box. What I saw were piles upon piles of bad practices and incorrect usages. There's a lot of bad code there and LLMs are learning from it.
Software crafting is so much more than merely writing code. There's a significant amount of reading code that goes into it. Code written by you. Code written by someone else. Someone else's code that you butchered with your edits, your own code butchered by someone else, and everything intertwined in between. Code that can't easily be explained by looking at it - sometimes you have to find relevant PRs, tickets, documentation, related online communication, some loosely-related code sitting someplace else, etc.
LLMs absolutely can help you read code, just as they are very capable of helping someone study a book or an academic paper. Denying that fact simply is ignorance. Of course, LLMs are absolutely capable of leading you in the wrong direction, confusing you, and giving you incorrect facts, even when you're studying text in plain English, just like it's possible to end up at the bottom of a lake when driving a car. Everyone needs to exercise caution and "know what the fuck they're doing" when using a model. But calling LLMs "bullshit generators" and "magic 8 balls" is so stupid. Sure, if you use it to perform bullshit stuff, it will generate nothing but bullshit.
Our_Benefactors•4mo ago
It’s anyone’s prerogative to continue to advocate for the horse and buggy over the automobile, but most people won’t bother to take the discussion seriously.
snickerbockers•4mo ago
These two sentences appear to be at odds with one another.
Our_Benefactors•4mo ago
snickerbockers•4mo ago
JohnFen•4mo ago
snickerbockers•4mo ago
Our_Benefactors•4mo ago
snickerbockers•4mo ago
Even putting the sophistry aside your argument is incomplete because you never defined what "productivity" means in this context or how it can be quantified. I would never dispute that a pseudo-random bullshit generator can shit out javascript faster than any human, but that's not necessarily productive.
lmf4lol•4mo ago
i wait
Our_Benefactors•4mo ago
### An Empirical Evaluation of Using Large Language Models for Automated Unit Test Generation

We evaluate TESTPILOT using OpenAI’s gpt3.5-turbo LLM on 25 npm packages with a total of 1,684 API functions. The generated tests achieve a median statement coverage of 70.2% and branch coverage of 52.8%. In contrast, the state-of-the-art feedback-directed JavaScript test generation technique, Nessie, achieves only 51.3% statement coverage and 25.6% branch coverage.

- *Link:* [An Empirical Evaluation of Using Large Language Models for Automated Unit Test Generation (arXiv)](https://arxiv.org/abs/2302.06527)

---

### Field Experiment – CodeFuse (12-week deployment)

- Productivity (measured by the number of lines of code produced) increased by 55% for the group using the LLM. Approximately one third of this increase was directly attributable to code generated by the LLM.
- *Link:* [CodeFuse: Generative AI for Code Productivity in the Workplace (BIS Working Paper 1208)](https://www.bis.org/publ/work1208.htm)

capyba•4mo ago
The LLMs had better have written more code; they’re text generation machines!
In what world does this study prove that the LLM actually accomplished anything useful?
Our_Benefactors•4mo ago
LOC does have a correlation with productivity, as much as devs hate to acknowledge it. I don’t care that you can provide counterexamples to this, or even if the AI on average takes more LOC to accomplish the same task - it still results in more productivity overall because it arrives at the result faster.
capyba•4mo ago
If you want to measure time to complete a complex task, then measure that. LOC is an intermediate measure. How much more productive is "55% more lines of code"?
I can write a bunch of garbage code really fast with a lot of bugs that doesn't work, or I can write a better program that works properly, slower. Under your framework, the former must be classified as 'better' - but why?
I read the study you reference and there is literally nothing in the study that talks about whether or not tasks were accomplished successfully.
It says:

* Junior devs benefited more than senior devs, then presents a disingenuous argument as to why that's the senior devs' fault (more experienced employees are worse than less experienced employees, who knew?!)
* 11% of the 55% increase in LOC was attributed directly to LLM output
* Makes absolutely no attempt to measure whether or not the extra code was beneficial

Our_Benefactors•4mo ago
footy•4mo ago
This is a terrible way to do research!
Our_Benefactors•4mo ago
psunavy03•4mo ago
Our_Benefactors•4mo ago
psunavy03•4mo ago
darvid•4mo ago
Refreeze5224•4mo ago
AI is about destroying working-class jobs so that corporations and the owning class can profit. It's not about writing code or summarizing articles. Those are just things workers can do with it. That's not what it's actually for. Its purpose is to reduce payroll costs for companies by replacing workers.
logicprog•4mo ago
They were not against technology; they were against technology that destroyed their jobs. If we had followed what they wanted, we'd still be in a semi-pre-industrial artisanal economy, and be the worse off for it.
lkey•4mo ago
> In North West England, textile workers lacked these long-standing trade institutions and their letters composed an attempt to achieve recognition as a united body of tradespeople. As such, they were more likely to include petitions for governmental reforms, such as increased minimum wages and the cessation of child labor.
Sounds pretty modern, doesn't it? Unions, wages, no child exploitation...
And the government response?
> Mill and factory owners took to shooting protesters and eventually the movement was suppressed by legal and military force, which included execution and penal transportation of accused and convicted Luddites.
AllegedAlec•4mo ago
"Guys, this debate is so stupid. Every serious inquiry shows productivity gains when we take away all senses, jack workers into the matrix and feed them a steady diet of speed intravenously. This puts the debate to rest. Now we are post-debate."
Something can increase productivity and still not be good.
xg15•4mo ago
logicprog•4mo ago
https://www.fightforthehuman.com/are-developers-slowed-down-...
steve_adams_86•4mo ago