Would open-source, local models keep pressure on AI companies to prioritize usable code, since code quality and engineering time saved are critical to build-vs-buy discussions?
Both can generate code, though. I've generated code using the web interface and it works; it's just a bit tedious to copy back and forth.
You remember those days right? All those Flash sites.
I've found with LLMs I can usually convince them to get me at least something that mostly works, but each step compounds the problem: excessive amounts of extra code, extraneous comments ("This loop goes through each..."), and redundant functions.
In the short term it feels good to achieve something 'quickly', but there's a lot of debt associated with running a random number generator on your codebase.
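For illustration, the pattern looks something like this (a made-up sketch of the style, not actual model output):

    # The verbose style: comments that restate the code, plus a
    # redundant helper that duplicates existing logic.
    def get_user_names(users):
        # Initialize an empty list to hold the names
        names = []
        # This loop goes through each user in the users list
        for user in users:
            # Append the user's name to the names list
            names.append(user["name"])
        # Return the list of names
        return names

    def extract_names_from_users(users):
        # Redundant wrapper around get_user_names
        return get_user_names(users)

    # The same logic, written directly:
    def user_names(users):
        return [u["name"] for u in users]

Each generation adds another layer like this, and the debt accumulates.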
Good programs are written by people who anticipate what might go wrong. If the document says 'don't do X', they know a tester is likely to try X, because a user eventually will.
On the other hand, it shows how much coding is just repetition. You don't need to be a good coder to perform serviceable work, but you won't create anything new and amazing either if you don't learn to think and reason - though for some purposes that might be fine. (Worrying for the abilities of the general population, however.)
You could ask whether these students would have gotten anything done without generated code. Probably; it's just a momentarily easier alternative to actual understanding. They did, however, realise the problem and decided by themselves to write their own code in a simpler, more repetitive, "stupid" style, but one they could reason about. So hopefully a good lesson, and all's well in the end!
Anthropomorphizing LLMs is not helpful. It doesn't "get" anything; you just gave it new tokens, ones which are more closely correlated with the correct answer. It then generates responses similar to what a human would say in the same situation.
Note: I first wrote "it also mimics what a human would say", then I realized I was anthropomorphizing a statistical algorithm and had to correct myself. It's hard sometimes, but language shapes how we think (which is ironically why LLMs are a thing at all), and using terms which better describe how it really works is important.
https://www.microsoft.com/en-us/worklab/why-using-a-polite-t...
This is an obvious mistake: the price is per megatoken (million tokens), not per token.
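To make the unit concrete, a quick back-of-the-envelope (the dollar rates here are made up for illustration, not any provider's actual pricing):

    # Hypothetical pricing: $3 per million input tokens,
    # $15 per million output tokens.
    input_price_per_mtok = 3.00
    output_price_per_mtok = 15.00

    input_tokens = 50_000
    output_tokens = 8_000

    cost = (input_tokens / 1_000_000) * input_price_per_mtok \
         + (output_tokens / 1_000_000) * output_price_per_mtok
    print(f"${cost:.2f}")  # $0.27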
Though I'm not a "vibe coder" myself, I very much recognize this as part of the "appeal" of GenAI tools more generally. Trying to get image generators to do what I want has a very "gambling-like" quality to it.
If it doesn't work the first time you pull the lever, it might the second time, or it might not. Either way, the house wins.
It should be regulated as gambling, because it is gambling. It's not a metaphor: the only difference from a slot machine is that AI will never output cash directly, only the possibility of an output that could make money. So if you're lucky with your first gamble, it'll give you a second one to try.
Gambling all the way down.
That's wild. Anything with non-deterministic output will have this.
Anything with non-deterministic output that charges money ...
Edit: Added words to clarify what I meant.
Brain scans have revealed that waiting for a potential win stimulates the same areas as the win itself. That's the "appeal" of gambling. Your brain literally feels like it's winning while waiting because it _might_ win.
Though you could say the same thing about pretty much any VC funded sector in the "Growth" phase. And I probably will.
It also kind of breaks the whole argument that they're designed to be addictive in order to make you spend more on tokens.
Every prompt and answer contributes value toward your progress toward the final solution, even if that value is just narrowing the latent space of potential outputs: failed paths are kept in the context window so the model can avoid them in a future answer after you provide follow-up feedback.
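Mechanically, that loop looks something like this sketch; ask() and try_running() are hypothetical stubs standing in for a chat API and a test harness, not real library calls:

    def ask(messages: list[dict]) -> str:
        raise NotImplementedError("call your chat API here")

    def try_running(code: str) -> tuple[bool, str]:
        raise NotImplementedError("run the code / tests here")

    messages = [{"role": "user", "content": "Write a migration script for X."}]
    for _ in range(3):
        answer = ask(messages)          # the model sees every prior failure
        ok, error = try_running(answer)
        if ok:
            break
        # Failed paths stay in the context window, narrowing future answers.
        messages.append({"role": "assistant", "content": answer})
        messages.append({"role": "user",
                         "content": f"That failed with: {error}. Try another approach."})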
The vast majority of slot machine pulls produce no value to the player. Every single prompt into an LLM tool produces some form of value. I have never once had an entirely wasted prompt unless you count the AI service literally crashing and returning a "Service Unavailable" type error.
One of the stupidest takes about AI is that a partial hallucination or a single bug destroys the value of the tool. If a response is 90% of the way there and I have to fix the 10% of it that doesn't meet my expectations, then I still got 90% value from that answer.
This has not been my experience, maybe sometimes, but certainly not always.
As an example: asking ChatGPT/Gemini how to accomplish some SQL data transformation set me back in finding the right answer, because the answer it gave me was so plausible, yet super duper not correct in the end. I'd have been better off not using it in that case.
Brings to mind "You can't build a ladder to the moon"
That assumes that the value of a solution is linear with the amount completed. If the Pareto Principle holds (80% of effects come from 20% of causes), then not getting that critical 10+% likely has an outsized effect on the value of the solution. If I have to do the 20% of the work that's hard and important after taking what the LLM did for the remainder, I haven't gained as much because I still have to build the state machine in my head to understand the problem-space well enough to do that coding.
- I buy stock that doesn't perform how I expected.
- I hire someone to produce art.
- I pay a lawyer to represent me in court.
- I pay a registration fee to play a sport, expecting to win.
- I buy a gift for someone, expecting friendship.
Are all gambas.
All those laid-off coders gambled on a career that didn't pan out.
Want more certainty in life? You're gonna have to get political.
And even then there is no guarantee the future gives a crap. Society may well collapse in 30 years, or 100…
This is all just role play to satisfy the prior generations' story-driven illusions.
Is it really vibe coding if you are building a detailed coding plan, conducting "git-based experimentation with ruthless pruning", and essentially reviewing the code incrementally for correctness and conciseness? Sure, it's a process dependent on AI, but it's very far from nearly "forget[ting] that the code even exists".
That all said, I do think the article captures some of the current cost/quality dilemmas. I wouldn't jump to conclusions that these incentives are actually driving most current training decisions, but it's an interesting area to highlight.
In my own usage, I tend to alternate between tiny, well-defined tasks and larger-scale, planned architectural changes or new features. Things in between those levels are hit and miss.
It also depends on what I'm building and why. If it's a quick-and-dirty script for my own use, I'll often write up - or speak - a prompt and let it do its thing in the background while I work on other things. I care much less about code quality in those instances.
This has led to their abilities stalling while their output seemingly goes up. But when you look at the quality of their output, and their ability to get projects over the last 10% or make adjustments to an already completed project without breaking things, it's pretty horrendous.
At work I've inherited a Kotlin project, and I've never touched Kotlin or Android before, though I'm an experienced programmer in other domains. ChatGPT has been guiding me through what needs to be done. The problem I'm having is that it's just too damn easy to follow its advice without checking. I might save a few minutes over reading the docs myself, but I don't get the context the docs would have given me.
I'm a 'Real Programmer' and I can tell that the code is logically sound and self-consistent. The code works and it's usually rewritten so much as to be distinctly my code and style. But still it's largely magical. If I'm doing things the less-correct way, I wouldn't really know because this whole process has led me to some pretty lazy thinking.
On the other hand, I very much do not care about this project. I'm very sure that it will be used just a few times and never see the light of day again. I don't expect to ever do android development again after this, either. I think lazy thinking and farming the involved thinking out to ChatGPT is acceptable here, but it's clear how easily this could become a very bad habit.
I am making a modest effort to understand what I'm doing. I'm also completely rewriting or ignoring the code the AI gives me, it's more of an API reference and example. I can definitely see how a less-seasoned programmer might get suckered into blindly accepting AI code and iterating prompts until the code works. It's pretty scary to think about how the coming generations of programmers are going to experience and conceptualize programming.
2. I've had good fortune keeping the agents to constrained areas, working on functions or objects with clearly defined (by me) boundaries. If the measure of a junior engineer is that you correct them once a day, an engineer once a week, a senior once a month, a principal once a quarter... treat these agents like hyper-energetic interns. Nudge frequently.
3. Standard org management coding practices apply. Force the agents to show work, plan, unit test, investigate.
And basically, what I've described is that we're becoming Software Development Managers with teams of on-demand, low-quality interns. That's an incredibly powerful tool, but don't expect hyper-elegant and compact code from them. Keep that for the senior engineering staff (humans) for now.
(Note: The AlphaEvolve announcement makes me wonder if I'm going to have hyper-energetic applied science interns next...)
"write minimum code required"
It's not even that sensitive to the wording - "be terse" or "make minimal changes" amounts to the same thing - but the resulting code will often be at least 50% shorter than the unguided version.
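For example, with the Anthropic Python SDK this is just a system string (the model name and wording here are arbitrary examples, not a recommendation):

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",  # example model name
        max_tokens=1024,
        system="Write the minimum code required. Be terse. No explanatory comments.",
        messages=[{"role": "user", "content": "Parse a CSV of name,email pairs."}],
    )
    print(response.content[0].text)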
It really captures how little control one has over the process, while simultaneously having the illusion of control.
I don't really believe that code is being made verbose to make more profit. There's probably some element of model providers not prioritizing concise code, but if conciseness while maintaining "quality" were possible, it would give one model a sufficient edge over the others that I suspect providers would do it.
Half of my job is fighting the "copy/paste/change one thing" garbage that developers generate. Keeping code DRY. The autocompletes do an amazing job of automating the repeated boilerplate. "Oh you're doing this little snippet for the first and second property? Obviously you want to do that for every property! Let me just expand that out for you!"
And I'm like "oooh, that's nice and convenient".
...
But I also should be looking at that with the stink-eye... part of that code is now duplicated a dozen times. Is there any way to reduce that duplication to the bare minimum? At least so it's only one duplicated declaration or call and all of the rest is per-thingy?
Or any way to directly/automatically wrap the thing without going property-by-property?
Normally I'd be asking myself these questions by the 3rd line. But this just made a dozen of those in an instant. And it's so tempting and addictive to just say "this is fine" and move on.
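For what it's worth, the "wrap the thing without going property-by-property" option usually exists. A minimal Python sketch using delegation (the wrapper and its logging behavior are made-up examples):

    # Instead of a dozen generated one-liners delegating each property,
    # delegate everything not overridden via __getattr__.
    class LoggingWrapper:
        def __init__(self, wrapped):
            self._wrapped = wrapped

        def __getattr__(self, name):
            # Only called when `name` isn't found on the wrapper itself,
            # so every property/method of the wrapped object passes through.
            value = getattr(self._wrapped, name)
            print(f"accessed {name}")
            return value

Any attribute not defined on the wrapper falls through to the wrapped object, so adding a new property requires no new wrapper code.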
That kind of code is not fine.
I agree, but I'm also challenging that position within myself.
Why isn't it OK? If your primary concern is readability, then perhaps LLMs can better understand generated code relative to clean, human-readable code. Also, if you're not directly interacting with it, who cares?
As for duplication introducing inconsistencies, that's another issue entirely :)
The author should try Gemini; it's much better.
Just to illustrate, I asked both about a browser automation script this morning. Claude used Selenium. Gemini used Playwright.
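For anyone who hasn't used them, here's roughly the same trivial task in each (sketches, assuming the browsers/drivers are installed; not the script in question):

    # Selenium version
    from selenium import webdriver

    driver = webdriver.Chrome()
    driver.get("https://example.com")
    print(driver.title)
    driver.quit()

    # Playwright version (after running `playwright install` once)
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto("https://example.com")
        print(page.title())
        browser.close()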
I think the main reasons Gemini is much better are:
1. It gets my whole code base as context (see the sketch after this list); Claude can't take that many tokens. I also include documentation for newer versions of libraries (e.g. Svelte 5) that the LLM is not so familiar with.
2. Gemini has a more recent knowledge cutoff.
3. Gemini 2.5 Pro is a thinking model.
4. It's free to use through the web UI.
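A rough sketch of what "whole code base as context" can look like in practice (paths and extensions here are arbitrary examples):

    from pathlib import Path

    # Concatenate source files into one prompt string, with file markers
    # so the model knows where each file begins.
    parts = []
    for path in sorted(Path("src").rglob("*")):
        if path.is_file() and path.suffix in {".py", ".svelte", ".ts"}:
            parts.append(f"--- {path} ---\n{path.read_text()}")

    context = "\n\n".join(parts)
    print(f"{len(context)} characters of context")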
1. Poor solutions.
2. Solutions not understood by the person who prompted them.
3. Development team being made dumber.
4. Legal and ethical concerns about laundering open source copyrights.
5. I'm suspicious of the name "vibe coding", like someone is intentionally marketing it to people who don't care to be good at their jobs.
6. I only want to hire people who can do holistically better work than current "AI". (Not churn code for a growth startup's Potemkin Village, nor to only nominally satisfy a client's requirements while shipping them piles of counterproductive garbage.)
7. Publicizing that you are a no-AI-slop company might scare away the majority of the bad prospective employees, while disproportionately attracting the especially good ones. (Not that everyone who uses "AI" is bad, but they've put themselves in the bucket with all the people who are bad, and that's a vastly better filter for the art of hiring than whether someone has spent months memorizing LeetCode answers solely for interviews.)
The second one is more intra/interpersonal: under pressure to produce, it's very easy to rely on LLMs to get one 80% of the way there and polish the remaining 20%. I'm in a new domain that requires learning a new language. So something I've started doing is asking ChatGPT to come up with exercises / coding etudes / homework for me based on past interactions.
> Blaine read that, shook his head, and called Sally. Presently she joined him in his cabin.
> "Yes, I wrote that," she said. "It seems to be true. Every nut and bolt in that probe was designed separately. It's less surprising if you think of the probe as having a religious purpose. But that's not all. You know how redundancy works?"
> "In machines? Two gilkickies to do one job. In case one fails."
> "Well, it seems that the Moties work it both ways."
> "Moties?"
> She shrugged. "We had to call them something. The Mote engineers made two widgets do one job, all right, but the second widget does two other jobs, and some of the supports are also bimetallic thermostats and thermoelectric generators all in one. Rod, I barely understand the words. Modules: human engineers work in modules, don't they?"
> "For a complicated job, of course they do."
> "The Moties don't. It's all one piece, everything working on everything else. Rod, there's a fair chance the Moties are brighter than we are."
- The Mote in God's Eye, Larry Niven and Jerry Pournelle (1974)
[…too bad that today's LLMs are not brighter than we are, at least when it comes to writing correct code…]
I think a lot about Motie engineering versus human engineering. Could Motie engineering be practical? Is human engineering a fundamentally good idea, or is it just a reflection of our working memory of 7 +/- 2? Biology is Motie-esque, but it's pretty obvious we are nowhere near a technology level that could ever bring a biological system up from scratch.
If Motie engineering is a good idea, it's not a smooth gradient. The Motie-est code I've seen is also the worst. It is definitely not the case that getting a bit more Motie-esque, all else being equal, produces better results. Is there some crossover point where it gets better and maybe passes our modular designs? If AIs do get better than us at coding, and it turns out they do settle on Motie-esque coding, no human will ever be able to penetrate it ever again. We'd have to instruct our AI coders to deliberately cripple themselves to stay comprehensible, and that is... economically a tricky proposition.
After all, anyone can write anything into a novel they want to and make anything work. It's why I've generally stopped reading fiction that is explicitly meant to make ideological or political points to the exclusion of all else; anything can work on a page. Does Motie engineering correspond to anything that could be manifested practically in reality?
Will the AIs be better at modularization than any human? Will they actually manifest the Great OO Promise of vast piles of amazingly well-crafted, re-usable code once they mature? Or will the optimal solution turn out to be bespoke, locally-optimized versions of everything everywhere, and the solution to combining two systems is to do whatever locally-sensible customizations are called for?
(I speak of the final, mature version, however long that may be. Today LLMs are kind of the worst of both worlds. That turns out to be a big step up from "couldn't play in this space at all", so I'm not trying to fashionably slag on AIs here. I'm more saying that the one point we have is not yet enough to draw so much as a line through, let alone an entire multi-dimensional design methodology utility landscape.)
I didn't expect to live to see the answers, but maybe I will.