Real quote
> "Hence their value stems from the discipline and the thinking the writer is forced to impose upon himself as he identifies and deals with trouble spots in his presentation."
I mean seriously?
This, but also for code. I just don't trust new code, especially generated code; I need time to sit with it. I can't make the "if it passes all the tests" crowd understand and I don't even want to. There are things you think of to worry about and test for as you spend time with a system. If I'm going to ship it and support it, it will take as long as it will take.
We have control points (prompts + context) and we ask LLMs to draw a 3D surface which passes through those points satisfying some given constraints. Subsequent chats are like edit operations.
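A loose rendering of that metaphor in code, for fun (this is a playful sketch, not a claim about model internals; scipy's interpolation is just a stand-in for "the model"):

```python
# A loose sketch of the metaphor, assuming numpy/scipy are available.
# The control points stand in for prompts + context; the interpolated
# surface stands in for the code the model "draws" through them.
import numpy as np
from scipy.interpolate import griddata

points = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])  # prompts + context
values = np.array([0.0, 1.0, 1.0, 0.0])              # the constraints

grid_x, grid_y = np.mgrid[0:1:50j, 0:1:50j]
surface = griddata(points, values, (grid_x, grid_y), method="linear")

# A follow-up chat nudges one control point and re-fits: an edit operation.
values[3] = 0.5
surface = griddata(points, values, (grid_x, grid_y), method="linear")
```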
If the code passes tests, and also works at the functionality level - what difference does it make if you’ve read the code or not?
You could come up with pathological cases like: it passed the tests by deleting them. And the code written by it is extremely messy.
But we know that LLMs are way smarter than this. There’s a very, very low chance of this happening, and even if it does, a quick glance at the code can fix it.
Why doesn’t outsourcing work if this is all that is needed?
But I have a hypothesis.
The quality of the output, when you don’t own the long term outcome or maintenance, is very poor.
This is not the case with AI in the same sense it is with human contractors.
The most immediate example I can think of is the beans LLM workflow tracker. It’s insane that it’s measured in the hundreds of thousands of LoC, and getting that thing set up in a repo is a mess. I had to use GitHub Copilot to investigate the repo just to find the current setup method. This wouldn’t fly at my employer, but a lot of projects are going to be a lot less scrupulous.
You can see the effects in popular consumer-facing apps too: Anthropic has drunk way too much of its own Kool-Aid, and now I get 10-50% failure rates on messages in their iOS app depending on the day. Some of their devs have publicly said that Claude writes 100% of their code, and it’s starting to show. Intermittent network failures and retries have been a solved problem for decades, ffs!
The code may seem to work functionally on day 1. Will it continue to seem to work on day 30? Most often it doesn't.
And in my experience, the chances of LLMs fucking up are hardly very very low. Maybe it's a skill issue on my part, but it's also the case that the spec is sometimes discovered as the app is being built. I'm sure this is not the case if you're essentially summoning up code that exists in the test set, even if the LLM has to port it from another language, and they can be useful in parts here and there. But turning the controls over to the infinite monkey machine has not worked out for me so far.
If you care about security, test it (red teaming).
If you care about maintainability, test it (advanced code analysis).
Your eyeballs are super fallible; this is why bad engineers exist. Get rigorous.
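One hypothetical example of what "get rigorous" can look like: a property-based test that throws thousands of generated inputs at a function instead of trusting eyeballs. The `slugify` function here is invented for illustration; the pattern is the point.

```python
# A minimal sketch of rigor-over-eyeballs, assuming pytest + hypothesis.
# slugify is a hypothetical function under test, not from this thread.
from hypothesis import given, strategies as st

def slugify(text: str) -> str:
    """Lowercase the text and join its words with hyphens."""
    return "-".join(text.lower().split())

@given(st.text())
def test_slug_never_contains_whitespace(text):
    # Property: no whitespace survives, for *any* input the machine dreams up.
    assert not any(c.isspace() for c in slugify(text))
```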
I'll leave with a teaser: are you testing what you think you are? Is it relevant? And what do you do after? Buy more tokens? Hope it's worth it; enjoy the slot machine. I find it a little loud.
1. Since the same AI writes both the code and the unit tests, it stands to reason that both could be influenced by the same hallucinations (see the sketch after this list).
2. Having a dev on call reduces time to restore service because the dev is familiar with the code. If developers stop reviewing code, they won't be familiar with it and won't be as effective. I am currently unaware of any viable agentic AI substitute for a dev on call capability.
3. There may be legal or compliance standards regarding due diligence which won't get met if developers are no longer familiar with the code.
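To make point 1 concrete, here's an invented sketch of a correlated hallucination, and how an oracle the model didn't write breaks the correlation (the leap-year bug is hypothetical):

```python
# Hypothetical illustration: the model "believes" every 4th year is a leap
# year and encodes the same mistake in both the code and its test.
def is_leap_year(year: int) -> bool:   # AI-written code (buggy)
    return year % 4 == 0               # forgets the century rule

def test_is_leap_year():               # AI-written test (same bug, so it passes)
    assert is_leap_year(2024)
    assert is_leap_year(1900)          # wrong: 1900 was not a leap year

# An independent oracle breaks the correlation:
import calendar

def test_against_independent_oracle():
    for year in range(1800, 2100):
        assert is_leap_year(year) == calendar.isleap(year)  # fails at 1900
```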
I have blogged about this recently at https://www.exploravention.com/blogs/soft_arch_agentic_ai/
> I need time to sit with it
Everyone knows doing the work yourself is faster than reviewing somebody else’s if you don’t trust them. I’d argue that if AI ever gets to the point where you fully trust it, all white-collar jobs are gone.
If the tests aren't good enough, break them. Red team your own software. Exploit your systems. "Sitting with the code" is some Henry David Thoreau bullshit, because it provides exactly 0 value to anyone else, whereas red teamed exploits are objective.
It's a good approach! It's just more 'negative space' than direct.
On the verification loop: I think there’s so much potential here. AI is pretty good at autonomously working on tasks that have a well-defined, easy-to-process verification hook.
A lot of software tasks are “migrate X to Y” and this is a perfect job for AI.
The workflow is generally straightforward - map the old thing to the new thing and verify that the new thing works the same way. Most of this can be automated using AI.
Wanna migrate a codebase from C to Rust? I definitely think it should be possible autonomously if the codebase is small enough. You do have to ask the AI to intelligently come up with extensive ways to verify that the two behave the same: maybe a UI check, sample input/output checks on the API, and functionality checks.
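A minimal sketch of one such verification hook (binary names and inputs here are made up): run the legacy C binary and the Rust port on the same inputs and demand byte-identical output.

```python
# Differential-testing sketch: legacy binary vs. the port, same stdin,
# identical stdout required. Paths and test cases are hypothetical.
import subprocess

CASES = [b"", b"hello", b"\x00\xff" * 512]

def run(binary: str, data: bytes) -> bytes:
    return subprocess.run([binary], input=data,
                          capture_output=True, check=True).stdout

for case in CASES:
    assert run("./legacy_c_tool", case) == run("./rust_port", case), \
        f"outputs diverge on {case!r}"
print("all cases match")
```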
It's scary how good it's become with Opus 4.5. I've been experimenting with giving it access to Ghidra and a debugger [1] for reverse engineering and it's just been plowing through crackmes (from sites like crackmes.one where new ones are released constantly). I haven't bothered trying to have it crack any software but I wouldn't be surprised if it was effective at that too.
I'm also working through reverse engineering several file formats by just having it write CLI scripts to export them to JSON then recreate the input file byte by byte with an import command, using either CLI hex editors or custom diff scripts (vibe coded by the agent).
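The loop is roughly this (the `fmt-cli` tool and file names below are placeholders, not the actual scripts): export to JSON, re-import, and require the rebuilt file to match the original byte for byte.

```python
# Round-trip verification sketch; "fmt-cli" and the paths are hypothetical.
import subprocess
from pathlib import Path

original = Path("sample.dat")
subprocess.run(["./fmt-cli", "export", str(original), "-o", "sample.json"], check=True)
subprocess.run(["./fmt-cli", "import", "sample.json", "-o", "rebuilt.dat"], check=True)

# If the export/import pair is lossless, the bytes must match exactly.
assert Path("rebuilt.dat").read_bytes() == original.read_bytes(), "round-trip diverged"
```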
I still get routinely frustrated trying to use it for anything complicated but whole classes of software development problems have been reduced to vibe coding that feedback loop and then blowing through Claude Max rate limits.
[1] Shameless plug: https://github.com/akiselev/ghidra-cli https://github.com/akiselev/debugger-cli
Gave one of the repos a star as it's a cool example of what people are building with AI. Most common question on HN seems to be "what are people building". Well, stuff like this.
Hear, hear! I’ve got my altium-cli repo open source on GitHub as well: a vibe-coded CLI for editing vibe-reverse-engineered Altium PCB projects. It’s not yet ready for primetime (I’m finishing up the file format reverse engineering this weekend), and the code quality is probably something twelve-year-old me would have been embarrassed by, but I can already use it and Claude/Gemini to automate a lot of the tedious parts of PCB design, like part selection and footprints. I’m almost to the point where Claude Code can use it for the entire EE workflow from part selection to firmware, minus the PCB routing, which I still do by hand.
I just ain’t wasting time blogging about it so unless someone stumbles onto it randomly by lurking on HN, they won’t know that Claude Code can now work on PCBs.
It honestly might have been easier without that experience because KiCad is open source and their S-expr file format is easy to use. I'm stuck with Altium since that's what I learned on and am used to.
This is better because I use my own test as a forcing function to learn and understand what the AI has done. Only after primary testing might I tell it to do checking for itself.
Lots of people who become successful are the ones who can get this prediction correct.
AI is a general-purpose tool, but that doesn't mean best practices and wisdom are generalizable. Web dev is different from compilers, which is different from embedded, and all the differences of opinion in the comments never explain who does what.
That said, I would take this up a notch:
> If you ask AI to write a document for you, you might get 80% of the deep quality you’d get if you wrote it yourself for 5% of the effort. But, now you’ve also only done 5% of the thinking.
Writing _is_ the thinking. It's a critical input in developing good taste. I think we all ought to consider a maintenance dose. Write your own code without assistance on whatever interval makes sense to you, otherwise you'll atrophy those muscles. Best practices are a moving train, not something you learned once and are done with.
I'd go so far as to say that for the vast majority of people, if you don't know what you're going to say when you sit down to write, THAT is your problem. Writing is not thinking, thinking is thinking, and you didn't think. If you're trying to think when you should be writing, that's a process failure. If you're not Stephen King or Dean Koontz, trying to be a pantser with your writing is a huge mistake.
What AI is amazing for is taking a core idea/thesis you provide it, and asking you a ton of questions to extract your knowledge/intent, then crystallizing that into an outline/rough draft.
You're free to adopt this cynical and pessimistic outlook if you like but you're going a bit far trying to force it on others. Gawd.
If your argument starts with "how do you know you're not the best in the world unless you try", you fucked up.
The physical (and genetic) demands of athletics aside, we were talking about writing. Just starting on a lark is what worked for Haruki Murakami. Again, it's very unlikely you'll be the next Murakami. But at least you'll improve a lot at something you find interesting! What is the downside here, exactly? Unless you're an opportunity-cost-minimising, industrial-output-maximising kind of person. That's fine, but that's not everyone.
The same pattern holds at the elite levels of most things.
How does it hold for writing?
If you don't have that wisdom, trying to pants something is going to result in writing yourself in circles, contradictions, inconsistencies, incoherence of idea, etc.
Absolutely agree with this: the ratio of talk to output is insane, especially when the talk is all about how much better the output is. So far the only example I've seen is Claude Code, which is mired in its own technical problems and is literally built by an AI company.
> Write your own code without assistance on whatever interval makes sense to you, otherwise you'll atrophy those muscles
This is the one thing that concerns me, for the same reason "AI writes the code, humans review it" does. The fact of the matter is, most people will get lazy and complacent pretty quickly, and the depth to which they review the code and the frequency with which they "go it alone" will shrink until eventually it just stops happening. We all (most of us, anyway) do it; it's just part of being human, for the same reason that thousands of people start going to the gym in January and stop by March.
Arguably, AI coding was at its best when it was pretty bad, because you HAD to review it frequently, and there were immediate incentives to just take the keyboard and do it yourself sometimes. Now we still have some serious faults; they're just not as immediate, which will lead to complacency for a lot of people.
Maybe one day AI will be able to reliably write 100% of the code without review. The worry is that we stop paying attention first, which, all in all, looks quite likely.
Those of us building are having so much fun we aren't slowing down to write think pieces.
I don't mean this flippantly. I'm a blogger. I love writing! But since a brief post on December 22 I haven't blogged because I have been too busy implementing incredible amounts of software with AI.
Since you'll want receipts, here they are:
- https://git.sr.ht/~kerrick/ratatui_ruby/tree/trunk/item/READ...
- https://git.sr.ht/~kerrick/rooibos/tree/trunk/item/README.rd...
- https://git.sr.ht/~kerrick/tokra/tree
Between Christmas and New Year's Day I was on vacation, so I had plenty of time. Since then, it's only been nights & weekends (and some early mornings and lunch breaks).
I believe they'll be maintainable long-term, as they've got extensive tests and documentation, and I built a theory of the program [2] on the Ruby side of it as I reviewed and guided the agent's work.
I am getting feedback from users, the largest of which drove the creation of (and iteration upon) Rooibos. As a rendering library, RatatuiRuby doesn't do much to guide the design or architecture of an application. Rooibos is an MVU/TEA framework [3] to do exactly that.
Tokra is basically a tech demo at this stage, [4] so (hopefully) no users yet.
[0]: https://ruby.social/@andrewnez@mastodon.social/1159351822843...
[1]: https://ruby.social/@getajobmike/115940044592981164
[2]: https://www.sciencedirect.com/science/article/abs/pii/016560...
We build a personal finance tool (referenced in the article). It's a web/mobile/backend stack (mostly React and Python). That said, I think a lot of the principles are generalizable.
> Writing _is_ the thinking. It's a critical input in developing good taste.
Agree, but I'll add that _good_ prompt-writing actually requires a lot of thought (which is what makes it so easy to write bad prompts, which are much more likely to produce slop).
I’d say it’s rare these days to find a dev who doesn’t use AI tools in their arsenal; that’s why your question sounds so odd to me.
Those not shipping are talking about it.
Absence of evidence, while not the only signal, is a huge fucking signal.
That wasn't supposed to be an opportunity for you to get defensive, but an opportunity for you to show off awesome projects.
Let me fix that for you:
No more AI thought pieces until you SHOW us what you build!
And I think it can safely be generalised to:
No more thought pieces until you show us what you build!
And that goes double for posts on LinkedIn.
As a former local banker in Japan who spent decades appraising the intangible assets of businesses that have survived for centuries, I’ve learned that true mastery is found in stability, not novelty. In an era of rapid AI acceleration, the real risk is gambling your institutional reputation on unproven, volatile tools.
By 2026, when every “How” is a cheap commodity, the only thing that commands a premium is the “Why”—the core of human judgment. Staying a step behind the hype allows you to keep your hands on the steering wheel while the rest of the market is consumed by the noise. Stability is the ultimate luxury.
Waiting until patterns stabilize (better UX, clearer failure modes, community best practices) tends to give a much better long-term payoff.
> Will AI replace my job?
> If you consider your job to be “typing code into an editor”, AI will replace it (in some senses, it already has). On the other hand, if you consider your job to be “to use software to build products and/or solve problems”, your job is just going to change and get more interesting.
willtemperley•1w ago
* The interface is near-identical across bots
* I can switch bots whenever I like; no integration points or vendor lock-in.
* It's the same risk as any big-tech website.
* I really don't need more tooling in my life.
simianwords•1w ago
Any coding agent should be easily adaptable to whatever IDE or workflow you need.
The agents are not fully fungible, though. Each has its own characteristics.