
Your job is to deliver code you have proven to work

https://simonwillison.net/2025/Dec/18/code-proven-to-work/
219•simonw•2h ago

Comments

daedrdev•2h ago
Maybe in an ideal world
9rx•2h ago
> Your job is to deliver code you have proven to work.

Your job is to solve customer problems. Their problems may only be solvable with code that is proven to work, but it is equally likely (I dare say even more likely) that their problem isn't best solved with code at all, or even solved with code that doesn't work properly but works well enough.

wrsh07•1h ago
I would argue that the word "proof" in the title might be misleading you.

From the post and the example he links, the point is that if you don't at least look at the running code, you don't know that it works.

In my opinion the point is actually well illustrated by Chris's talk here:

https://v5.chriskrycho.com/elsewhere/seeing-like-a-programme...

(summary of the relevant section if you're not going to click)

>>>

In the talk "Seeing Like a Programmer," Chris Krycho quotes the conductor and composer Eímear Noone, who said:

> "The score is potential energy. It's the potential for music to happen, but it's not the music."

He uses this quote to illustrate the distinction between "software as artifact" (the code/score) and "software as system" (the running application/music). His point is that the code itself is just a static artifact—"potential energy"—and the actual "software" only really exists when that code is executed and running in the real world.

9rx•1h ago
> if you don't at least look at the running code, you don't know that it works.

Your tests run the code. You know it works. I know the article is trying to say that testing is not comprehensive enough, but my experience disagrees. But I also recognize that testing is not well understood (quite likely the least understood aspect of computer science!) — and if you don't have a good understanding you can get caught not testing the right things or not testing what you think you are. I would argue that you would be better off using your time to learn how to write great tests instead of using it to manually test your code, but to each their own.

What is more likely to happen is not understanding the customer needs well enough, leaving it impossible to write tests that align with what the software needs to do. Software development can break down very quickly here. But manual testing does not help: you can't know what to manually test without understanding the problem either. As before, your job is not to deliver proven code. Your job is to solve customer problems. When you realize that, it becomes much less likely that you write tests that are not in line with the solution you need.

allcentury•2h ago
Manual testing as the first step… not very productive imo.

Outside in testing is great but I typically do automated outside in testing and only manual at the end. The loop process of testing needs to be repeatable and fast, manual is too slow
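For illustration only, an automated outside-in test in this spirit drives the application through its public surface rather than its internals. Here is a minimal sketch using only Python's standard library; the toy app and its endpoint are invented for the example:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
import threading
import urllib.request

class App(BaseHTTPRequestHandler):
    """A stand-in for the application under test."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"OK")

    def log_message(self, fmt, *args):
        pass  # keep test output quiet

def test_health_endpoint():
    # Outside-in: start the real server, exercise it over real HTTP.
    server = HTTPServer(("127.0.0.1", 0), App)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        body = urllib.request.urlopen(f"http://127.0.0.1:{port}/health").read()
        assert body == b"OK"
    finally:
        server.shutdown()
```

Because this loop is fast and repeatable it can run on every change, with a final manual pass covering whatever the assertions didn't anticipate.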

simonw•1h ago
Yeah that's fair, the manual testing doesn't have to sequentially go first - but it does have to get done.

I've lost count of the number of times I've skipped it because the automated test passed and then found there was some dumb but obvious bug that I missed, instantly exposed when I actually exercised the feature myself.

robryk•1h ago
Would automated tests that produce a transcript of what they've done allow perusing that transcript to substitute for manual testing?
simonw•1h ago
No. I've fallen for that trap in the past. Something inevitably catches you out in the end.
bluGill•1h ago
The value of manual tests is when you "see something" that you didn't even think of.
pjc50•41m ago
That sounds harder?

There's a lot of pedantry here trying to argue that there exists some feature which doesn't need to be "manually" tested, and I think the definition of "manual" can be pushed around a lot. Is running a program that prints "OK" a manual test or not? Is running the program and seeing that it now outputs "grue" rather than "bleen" manual? Does verifying the arithmetic against an Excel spreadsheet count?

There are programs that almost can't be manual, and programs that almost have to be manual. I remember when working on PIN pad integration we looked into getting a robot to push the buttons on the pad - for security reasons there's no way of injecting input automatically.

What really matters is getting as close to a realistic end user scenario as possible.

9rx•1h ago
Maybe a bit pedantic, but does manual testing really need to be done, or is the intent here more towards being a usability review? I can't think of any time obvious unintended behaviour showed up not caught by the contract encoded in tests (there is no reason to write code that doesn't have a contractual purpose), but, after trying it, finding out that what you've created has an awful UX is something I have encountered and that is something much harder to encode in tests[1].

[1] As far as I can tell. If there are good solutions for this too, I'd love to learn.

RaftPeople•36m ago
> I can't think of any time obvious unintended behaviour showed up not caught by the contract encoded in tests

Unit testing, whether manual or automated, typically catches about 30% of bugs.

End to end testing and visual inspection of code are both closer to 70% of bugs.

andy99•2h ago
I think the problem is in what “proven” means. People that don’t know any better will just do that all with LLMs and still deliver the giant untested PRs but with some LLM written tests attached.

I vibe code a lot of stuff for myself, mostly for viewing data, when I don’t really need to care how it works. I’m coming around to the idea that outside of some specific circumstances where everyone has agreed they don’t need to care about or understand the code, team vibe coding is a bad practice.

If I’m paying an engineer, it’s for their work, unless explicitly agreed otherwise.

I think vibe coding is soon going to be seen the same way as “research” where you engage an offshore team (common e.g. in consulting) to give you a rundown on some topic and get back the first five google search results. Everyone knows how to do that, if it’s what they wanted they wouldn’t be hiring someone to do it.

simonw•2h ago
That's why I emphasized the manual testing component as well. Attaching a screenshot or video of a feature working to your PR is a great way to prove that you've actually seen it work correctly - at least once, which is still a huge improvement over it not actually working at all.
dfxm12•2h ago
> there’s one depressing anecdote that I keep on seeing: the junior engineer, empowered by some class of LLM tool, who deposits giant, untested PRs on their coworkers—or open source maintainers—and expects the “code review” process to handle the rest.

Is anyone else seeing this in their orgs? I'm not...

wizzwizz4•2h ago
It's not a new phenomenon. Time was, people would copy-paste from blog posts with the same effect.
evilduck•1h ago
I would bet in most organizations you can find a copy-pasted version of the top SO answer for email regex in their language of choice, and if you chase down the original commit author they couldn't explain how it works.
1-more•28m ago
I think it's impossible to actually write an email regex because addresses can have arbitrarily deeply nested escaping. I may have that wrong. I'd hope that regex would be .+@.+ and that's it (watch me get Cunninghammed because there is some valid address wherein those plusses should be stars).
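The maximally permissive check hoped for above fits in a few lines of Python. This sketch (names are mine) deliberately validates nothing beyond "something, an @, something", since stricter patterns tend to reject addresses RFC 5322 actually allows:

```python
import re

# Deliberately permissive: real validation beyond this is usually
# better left to an actual delivery attempt.
EMAIL_RE = re.compile(r".+@.+")

def looks_like_email(s: str) -> bool:
    return EMAIL_RE.fullmatch(s) is not None
```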
lm28469•1h ago
Always the same old tiring "this has always been possible before in some remotely similar fashion hence we should not criticise anything ever again" argument.

You could intuitively think it's just a difference of degree, but it's more akin to a difference of kind. Same for a nuke vs a spear: both are weapons, but no one argues they're similar enough that we can treat them the same way.

troyvit•1h ago
I used to do that in simpler days. I'd add a link to where I copied it from so we could reference it if there were problems. This was for relatively small projects with just a few people.
jennyholzer2•47m ago
> I'd add a link to where I copied it from

LLMs can't do this.

Your code is unambiguously better than any LLM code if you can comment a link to the stackoverflow post you copied it from.

bgwalter•38m ago
I don't see the problem with fentanyl given that people have been using caffeine forever.
jennyholzer2•1h ago
i left my last job because this was endemic
briliantbrandon•1h ago
I'm seeing a little bit of this. However, I will add that the primary culprits are engineers that were submitting low quality PRs before they had access to LLMs; they can just submit them faster now.
dfxm12•1h ago
What's the ratio of people who do things the right way vs. not? I mean, is it a matter of giving them feedback to remind them what a "quality PR" is? Does that help?
jennyholzer2•1h ago
LLMs have dramatically empowered sociopath software developers.

If you are sufficiently motivated to appear more "productive" than your coworkers, you can force them to review thousands of lines of incorrect AI slop code while you sit back and mess around with your chatbots.

Your coworkers no longer have enough time to work on their in-progress PRs, so you can dominate the development team in terms of LOC shipped.

Understand that sociopaths are skilled at navigating social and bureaucratic environments. A sociopath who ships the most LOC will get the promotion every single time.

andy99•1h ago
Only if leadership lets them. Right now (anecdotally) a lot of “leaders” don’t understand the difference between AI generated and human generated work, and just look at loc as productivity so all incentives are on AI coding, but that will change.
heliumtera•1h ago
It will never change. Managers will consider every stupid metric players push to sell their solutions: be it code coverage, extensive CI/CD pipelines with useless steps, or "productivity gains" with gen tools. The gen-tool euphoria is stupid and will cease to exist, but before this it was BDD, TDD, DDD, test before, test after, test your mocks, transpile to a different language and then ignore the output, code maturity, best practices, OOP, pants-in-head oriented programming... There is always something stupid on the horizon; this is certainly not the last stupid craze.
briliantbrandon•1h ago
It's roughly 1/10 that are causing issues. Not a huge deal but dealing with them inevitably takes up a couple hours a week. We also have a codebase that is shared with some other teams and our primary offenders are on one of those separate teams.

I think this is largely an issue that can be solved culturally within a team, we just unfortunately only have so much input on how other teams work. It doesn't help either when their manager doesn't seem to care about the feedback... Corporate politics are fun.

dfxm12•1h ago
Yeah, I mean to get back to the original statement in the blog, this seems like less of a tech issue and more of a culture issue. The LLM enables the junior to do this once. It's the team culture that allows them to continue doing it.
lm28469•1h ago
LLMs are tools that make mediocre devs 100x more "productive" and good devs 2x more productive
jennyholzer2•1h ago
From my vantage I would argue LLMs make good devs around 0.65x more productive
bluGill•1h ago
Good devs are still learning how to use LLMs, and so are willing to accept the 0.65x once in a while. Any complex tool will have a learning curve. Most tools improve over time. As such good devs either have found how to use LLMs to make them more productive (probably not 10x, but even 1.1x is something), or they try them again every few months to see if things are better.
jennyholzer2•26m ago
you are bending over backwards to figure out how to put "1.1x" in your comment

the idea that LLMs make developers more productive is delusional.

simonw•12m ago
Hi, delusional developer reporting for duty here.
dsego•1h ago
I think on average a dev can be x percent more productive, but there is a best case and worst case scenario. Sometimes it's a shortcut to crank out a solution quickly, other times the LLM can spin you in circles and you lose the whole day in a loop where the LLM is fixing its own mistakes, and it would've been easier to just spend some time working it out yourself.
square_usual•32m ago
Yep, that's why very accomplished, widely regarded developers like Mitchell Hashimoto and Antirez use them. They need to make programming more challenging to keep it fun.
jennyholzer2•21m ago
developers or cult leaders
roblh•6m ago
I think they make good devs 2x more productive for the first month, which then slowly declines as that good dev spends less time actually writing and understanding and debugging code until it falls well below the 1x mark. It’s basically a high interest loan people take against their own skills. For some people that loan might be worth it. Maybe they’re trying to change their role in an organization and need the boost to start taking up new responsibilities they want to own. I think it’s temporary though. The slow shift into “skim mode”, where the authors just don’t quite put that same amount of effort into understanding what’s being churned out. I dunno, that’s just what I’ve seen.
fnands•1h ago
A friend of mine is working for a small-ish startup (11 people) and he gets to work and sees the CTO push 10k loc changes straight to main at 3 am.

Probs fine when you are still in the exploration phase of a startup, scary once you get to some kind of stability

titzer•1h ago
That's...idiotic.
jennyholzer2•1h ago
LLMs are for idiots
titzer•1h ago
I mean, I've vibe-coded a few useful single-file HTML tools, but checking in 10kloc at 3am into the production database...by the CTO...omg.
tossandthrow•1h ago
The cto is ultimately responsible for the outcome and will be there at 4am to fix stuff.
pjc50•46m ago
Yes .. and no. Someone who does this will definitely make the staff clean up after them.
ryandrake•1h ago
I feel like this becomes kind of unacceptable as soon as you take on your first developer employee. 10K LOC changes from the CTO is fine when it's only the CTO working on the project.

Hell, for my hobby projects, I try to keep individual commits under 50-100 lines of code.

peab•21m ago
Lol I worked at a startup where the CTO did this. The problem was that it was pure spaghetti code. It was so bad it kept me up at night, thinking about how to fix things. I left within 30 days
jimbohn•7m ago
I'd go mental if I was a SWE having to mop that up later
hexbin010•1h ago
Similar, at my last job. And the pushback was greater because super duper clever AI helped write it, who obviously knows more than any other senior engineer could know, so they were expecting an immediate PR approval and got all uppity when you tried to suggest changes.
endemic•34m ago
Hah! I've been trying to push back on this sort of thought. The bot writes code for you, not you for the bot.
x3n0ph3n3•1h ago
It's been a struggle with a few teammates that we are trying to solve through explicit policy, feedback, and ultimately management action.
dfxm12•1h ago
Yeah, a slice of this is technology related, but it's really a policy issue. It's probably easier to manage with a tighter team. Maybe I'm taking team size for granted.
davey48016•1h ago
A friend of mine has a junior engineer who does this and then responds to questions like "Why did you do X?" with "I didn't, Claude did, I don't know why".
jennyholzer2•1h ago
no hate but i would try to fire someone for saying that
tossandthrow•1h ago
That would be immediate grounds for termination in my book.
fennecfoxy•1h ago
Yes, if they can't debug + fix the reason the production system is down or not working correctly then they're not doing their job, imo.

Developers aren't hired to write code that's never run (at least in my opinion). We're also responsible for running the code/keeping it running.

Ekaros•8m ago
I think the words that would follow from me would get me sent to HR...

And if it was repeated... Well I would probably get fired...

kaffekaka•1h ago
I thought we were not, but we had just been lucky. A sequence of events lately have shown that the struggle is real. This was not a junior developer though, but an experienced one. Experience does not equal skill, evidently.
zx2c4•1h ago
Voila:

https://github.com/WireGuard/wireguard-android/pull/82 https://github.com/WireGuard/wireguard-android/pull/80

In that first one, the double pasted AI retort in the last comment is pretty wild. In both of these, look at the actual "files changed" tab for the wtf.

IshKebab•1m ago
Yeah, this guy's comment here is spot on: https://github.com/WireGuard/wireguard-android/pull/80#issue...

I recently reviewed a PR that I suspect is AI generated. It added a function that doesn't appear to be called from anywhere.

It's shit because AI is absolutely not on the level of a good developer yet. So it changes the expectation. If a PR is not AI generated then there is a reasonable expectation that a vaguely competent human has actually thought about it. If it's AI generated then the expectation is that they didn't really think about it at all and are just hoping the AI got it right (which it very often doesn't). It's rude because you're essentially pawning off work that the author should have done to the reviewer.

Obviously not everyone dumps raw AI generated code straight into a PR, so I don't have any problem with using AI in general. But if I can tell that your code is AI generated (as you easily can in the cases you linked), then you've definitely done it wrong.

stackskipton•1h ago
Yep. Remember, people not posting on this website are just grinding away at jobs where their individual output does not matter, and their entire motivation is to work JUST hard enough not to get fired. They don't get stock grants, extremely favorable stock options, or anything else; they get a salary and MAYBE a small bonus based on business factors they have little control over.

My eyes were opened when, two jobs ago, they said they would be blocking all personal web browsing from work computers. Multiple software devs were unhappy because they were using their work laptops for booking flights, dealing with their kids' school stuff, and other personal things. They did not have personal computers at all.

zahlman•1h ago
Quite a few FOSS maintainers have been speaking up about it.
bluGill•1h ago
It isn't only junior engineers. It is a small number of people from all levels.

People do what they think they will be rewarded for. When you think your job is to write a lot of code then LLMs are great. When you need quality code you start to ask if LLMs are better or not?

0x500x79•56m ago
I am currently going through this with someone in our organization.

Unfortunately, this person is vibe coding completely, and even the PR process is painful:

* The coding agent reverts previously applied feedback
* The coding agent doesn't follow standards used throughout the codebase
* The coding agent re-invents solutions that already exist
* PR feedback is responded to with agent output
* 50k-line PRs for changes that required 10-20 lines
* Lack of testing (there are some automated tests, but their validations are slim/lacking)
* Bad error handling/flow handling

LandR•26m ago
Fire them?
0x500x79•20m ago
I believe it is getting close to this. Things like this just take time though, and when this person talks to management/leadership they talk about how much they are producing and how everyone is blocking their work. So it becomes challenging political maneuvering, depending on the ability of certain leadership to see through the BS.

(By my organization, I meant my company - this person doesn't report to me or in my tree).

bdangubic•56m ago
first time we’d see this there would be a warning, second one is pink slip
eudamoniac•54m ago
I started seeing it from a particularly poor developer sometime last year. I was the only reviewer for him so I saw all of his PRs. He refused to stop despite my polite and then not so polite admonishments, and was soon fired for it.
neutronicus•53m ago
I'm not either

But LLMs don't really perform well enough on our codebase to allow you to generate things that even appear to work. And I'm the most junior member of my team at 37 years of age, hired in 2019.

I really tried to follow the mandate from on high to use Copilot, but the Agent mode can't even write code that compiles with the tools available to it.

Luckily I hooked it up to gptel so I can at least ask it quick questions about big functions I don't want to read in emacs.

iamflimflam1•31m ago
I'm seeing it on some open source projects I maintain. Recently had 10 or so PRs come in. All very valid features - but from looking at them, not actually tested.
peab•23m ago
Definitely seeing a bit of this, but it isn't constrained to junior devs. It's also pretty solvable by explaining to the person why it's not great, and just updating team norms.
nbaugh1•6m ago
Not at all. Submitting untested PRs is wildly outside my experience. Having tests written to cover your code is a prerequisite for having your PR reviewed on our team. "Does it work", aka passing manual testing, is literally the bare minimum before submitting a PR.
webdev1234568•2h ago
Whole article seems very much all LLM generated.

Edit: I'm an idiot, ignore me.

ramon156•2h ago
Do elaborate, I don't see anything standing out
jairuhme•1h ago
Did you read the article and come to that conclusion or just blindly count the number of em-dashes and assume that? Because I don't get the impression that it was LLM generated
simonw•1h ago
Not a single word of it was. I wrote this one entirely in Apple Notes, so there weren't even any VS Code completed sentences.

It has em-dashes because my blog turns " - " into an em-dash here: https://github.com/simonw/simonwillisonblog/blob/06e931b397f...
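As an illustration only (not Simon's actual template code, which lives at the linked repo), that transformation amounts to a one-line substitution:

```python
import re

def smart_dashes(text: str) -> str:
    # Replace a hyphen surrounded by spaces with an em dash;
    # hyphens inside words (e.g. "pre-existing") are untouched.
    return re.sub(r" - ", "\u2014", text)
```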

webdev1234568•1h ago
My biggest apologies, a very bad move on my part. I'll pay more attention before making any sort of accusation like this.
ai_coder42•19m ago
So what? As long as it conveys the point it was supposed to, it should be fine IMO.

If we are accepting LLM generated code, we should accept LLM generated content as long as it is "proof read" :)

emsign•2h ago
As if! :)
zkmon•1h ago
How about letting LLMs maintain a vast number of product versions, all available at the same time, which receive multiple untested versions of the same patch from LLMs, and then letting the models elect a version of the software based on probabilistic or gradient methods? This elected version could change for different assessments. No human touches or looks at the code!

Just a wild thought, nothing serious.

throwuxiytayq•1h ago
Talk is cheap. Show me the proompt.
rkomorn•1h ago
Had to search whether "proompt" was a new meme misspelling.

New to me, but I'm on board.

zkmon•1h ago
That's hard for me. Feed my comment to a model and ask for prompts.
throwaway2027•1h ago
It works on my machine ¯\_(ツ)_/¯
Rperry2174•1h ago
I'm not fully convinced by "a computer can never be held accountable".

We already delegate accountability to non-humans all the time:

- CI systems block merges
- monitoring systems page people
- test suites gate different things

In practice accountability is enforced by systems, not humans. Humans are definitely "blamed" after the fact, but the day-to-day control loop is automated.

As agents get better at running code, inspecting UI state, correlating logs, screenshots, etc., they're starting to be operationally "accountable": preventing bad changes from shipping and producing evidence when something goes wrong.

At some point the human's role shifts from "I personally verify this works" to "I trust this verification system and am accountable for configuring it correctly".

That's still responsibility, but of a different kind from what's described here. Taken to a logical extreme, the argument here would suggest that CI shouldn't replace manual release checklists.

cess11•1h ago
Right, so how do you hold these things accountable? When your CI fails, what do you do? Type in a starkly worded message into a text file and shut off the power for three hours as a punishment? Invoice Intel?
falcor84•1h ago
Well, we're not there yet, but I do envision a future where some AIs work as independent contractors with their own bank accounts that they want to maximize, and if such an AI fails in a bad way, its client would be able to fine it, fire it, or even sue it, so that it, and the human controlling it, would be financially punished.
simonw•1h ago
I need to expand on this idea a bunch, but I do think it's one of the key answers to the ongoing questions people have about LLMs replacing human workers.

Human collaboration works on trust.

Part of trust is accountability and consequences. If I get caught embezzling money from my employer I can lose my job, harm my professional reputation and even go to jail. There are stakes!

A computer system has no stakes, and cannot take accountability for its actions. This drastically limits what it makes sense to outsource to that system.

A lot of this comes down to my work on prompt injection. LLMs are fundamentally gullible: an email assistant might respond to an email asking for the latest sales figures by replying with the latest (confidential) sales figures.

If my human assistant does that I can reprimand or fire them. What am I meant to do with an LLM agent?

dfxm12•1h ago
I don't think this is very hard. Someone didn't properly secure confidential data and/or someone gave this agent access to confidential data. Someone decided to go live with it. Reprimand them, and disable the insecure agent.
robryk•1h ago
Why do you think that this other kind of accountability (which reminds me of the way captain's or commander's responsibility is often described) is incompatible with what the article describes? Due to the focus on necessity of manual testing?
dkdcio•1h ago
those systems include humans: they are put in place by humans (or collections of them) that are the accountability sink

if you put them (without humans) in a forest they would not survive and evolve (they are not viable systems alone); they are not taking action without the setup & maintenance (& accountability) of people

hyperpape•1h ago
CI systems operate according to rules that humans feel they understand and can apply mechanically. Moreover, they (primarily) fail closed.
almostdeadguy•1h ago
I mean I suppose you can continuously add "critical feedback" to the system prompt to have some measure of impact on future decision-making, but at some point you're going to run out of space and ultimately I do not find this works with the same level of reliability as giving a live person feedback.

Perhaps an unstated and important takeaway here is that junior developers should not be permitted to use LLMs for the same reason they should not be permitted to hire people: they have not demonstrated enough skill mastery and judgement to be trusted with the decision to outsource their labor. Delegating to a vendor is a decision made by high-level stakeholders, with the ability to monitor the vendor's performance and replace the vendor with alternatives if that performance is unsatisfactory. Allowing junior developers to use LLMs is allowing them to delegate responsibility without any visibility or ability to set boundaries on what can be delegated. Also important: you cannot delegate personal growth, and by permitting junior engineers to use an LLM, that is what you are trying to do.

sc68cal•1h ago
You completely missed the point of that quote. The point of the quote is to highlight the fact that automated systems are amoral, meaning that they do not know good or evil and cannot make judgements that require knowing what good and evil mean.
pjc50•32m ago
I've given you a disagree-and-upvote; these things are significant quality aids, but they are like the poka-yoke or manufacturing jig or automated inspection.

Accountability is about what happens if and when something goes wrong. The moon landings were controlled with computer assistance, but Nixon preparing a speech for what happened in the event of lethal failure is accountability. Note that accountability does not of itself imply any particular form or detail of control, just that a social structure of accountability links outcome to responsible person.

geldedus•1h ago
Not only to work, but to not make the life of those coders who come after you a hell.
ekjhgkejhgk•1h ago
Oh look another "an opinionated X". Everything is opinionated these days, even opinions.
robgibbons•1h ago
For what it's worth, writing good PRs applies in more cases than just AI generated contributions. In my PR descriptions, I usually start by describing how things currently work, then a summary of what needs to change, and why. Then I go on to describe what exactly is changing with the PR. This high level summary serves to educate the reviewer, and acts as a historical record in the git log for the benefit of those who come after you.

From there, I include explicit steps for how to test, including manual testing, and unit test/E2E test commands. If it's something visual, I try to include at least a screenshot, or sometimes even a brief screen capture demonstrating the feature.

Really go out of your way to make the reviewer's life easier. One benefit of doing all of this is that in most cases, the reviewer won't need to reach out to ask simple questions. This also helps to enable more asynchronous workflows, or distributed teams in different time zones.

toomuchtodo•1h ago
This is how PRs should be, but rarely are (in my experience as a reviewer, ymmv, n=1). Keep on keepin' on.
simonw•1h ago
100%. There's no difference at all in my mind between an AI-assisted PR and a regular PR: in both cases they should include proof that the change works and that the author has put the work in to test it.
oceanplexian•37m ago
At the last company I worked at (Large popular tech company) it took an act of the CTO to get engineers to simply attach a JIRA Ticket to the PR they were working on so we could track it for tax purposes.

The Devs went in kicking and screaming. As an SRE it seemed like for SDEs, writing a description of the change, explaining the problem the code is solving, testing methodology, etc is harder than actually coding. Ironically AI is proving that this theory was right all along.

p2detar•14m ago
Strange, I thought this was actually the norm. Our PRs are almost always tagged with a corresponding Jira ticket. I think this is more helpful to developers than to other roles, because it gives them a history of what has been fixed.

One can also point QA or consultants to a ticket for documentation purposes or timeline details.

Hovertruck•1h ago
Also, take a moment to review your own change before asking someone else to. You can save them the trouble of finding your typos or that test logging that you meant to remove before pushing.

To be fair, copilot review is actually alright at catching these sorts of things. It remains a nice courtesy to extend to your reviewer.

reactordev•1h ago
I do this too with our PR templates. They have the ticket/issue/story number, the description of the ask (you can copy pasta from the ticket). Current state of affairs. Proposed changes. Post state of affairs. Mood gif.
phito•1h ago
I often write PR descriptions, in which I write a short explanation and try to anticipate some comments I might get. Well, every time I do, I will still get those exact comments because nobody bothers reading the description.

Not to say you shouldn't write descriptions, I will keep doing it because it's my job. But a lot of people just don't care enough or are too distracted to read them.

skydhash•1h ago
I just point people to the description. No need to type things twice.
ffsm8•55m ago
After I accepted that, I then tried to preempt the comment by just commenting myself on the function/class etc that I thought might need some elaboration...

Well, I'm sure you can guess what happened after that - within the same file even

walthamstow•32m ago
At my place nobody reads my descriptions because nobody writes them so they assume there isn't one!
simonw•29m ago
For many of my PR and issue comments the intended audience is myself. I find them useful even a few days later, and they become invaluable months or years later when I'm trying to understand why the code is how it is.
bob1029•5m ago
> I try to include at least a screenshot

This is ~mandatory for me. Even if what I am working on is non-visual. I will take a screenshot of a new type in my IDE and put a red box around it. This conveys the focus of my attention and other important aspects of the work effort.

enraged_camel•1h ago
>> As software engineers we don’t just crank out code—in fact these days you could argue that’s what the LLMs are for. We need to deliver code that works—and we need to include proof that it works as well.

I would go a step further: we need to deliver code that belongs. This means following existing patterns and conventions in the codebase. Without explicit instruction, LLMs are really bad at this, and it's one of the things that make it incredibly obvious to reviews that a given piece of code has been generated by AI.

0x500x79•50m ago
Agree, maintainability, security, standards, all of these are important to follow and there are usually reasons for these things existing.

I also see AI coding tools violate "Chesterton's Fence" (and the pre-Chesterton's Fence, not sure what that is called, the idea being that code is necessary otherwise it shouldn't be in the source).

9rx•40m ago
> Without explicit instruction, LLMs are really bad at this

They used to be. They have become quite good at it, even without instruction. Impressively so.

But it does require that the humans who laid the foundation also followed consistent patterns and conventions. If there is deviation to be found, the LLM will see it and be forced to choose which direction to go, and that's when things quickly fall off the rails. LLMs are not (yet) good at that.

Garbage in, garbage out, as they say.

funkattack•1h ago
Non-native speaker here. I’ve always loved that we say “commit” not “upload” or “save”.
gaigalas•1h ago
> Make your coding agent prove it first

Agents love to cheat. That's an issue I don't see changing on any horizon.

Here's Opus 4.5 trying to cheat its way out of properly implementing compatibility and cross-platform, despite the clear requirements:

https://gist.github.com/alganet/8531b935f53d842db98157e1b8c0...

> Should popen handles work with fgets/fread/fwrite? PHP supports this. Option A: Create a minimal pipe_io_stream device / Option B: Store FILE* in io_private with a flag / Option C: Only support pclose, require explicit stream wrapper for reads.

If I asked for compatibility, why give me options that won't fully achieve it?

It actually tried to "break check" my knowledge about the interpreter (test me if I knew enough to catch it), and proposed shortcuts all the way through the chat.

I don't want to have to pepper my chats with variations on "don't cheat". I mean, I can do it, but it seems like boilerplate.

I wish I had some similar testing-related chats to share. Agents do that all the time.

This is the major blocker right now for AI-assisted automated verification, and one of the reasons why this isn't well developed beyond general directions (give it screenshots, make it run the command, etc).

visarga•1h ago
I agree with the author overall. Manual testing is what I call "vibe testing" and I think by itself is insufficient, no matter if you or the agent wrote the code. If you build your tests well, using the coding agent becomes smooth and efficient, and the agent is safe to do longer stretches of work. If you don't do testing, the whole thing is just a bomb ticking in your face.

My approach to coding agents is to prepare a spec at the start, as complete as possible, and develop a beefy battery of tests as we make progress. Yesterday there was a story "I ported JustHTML from Python to JavaScript with Codex CLI and GPT-5.2 in hours". They had 9000+ tests. That was the secret juice.

So the future of AI coding as I see it ... it will be better than pre-2020, we will learn to spec and plan good tests, and the tests are actually our contract the code does what is supposed to do. You can throw away the code and keep the specs and tests and regenerate any time.

smokel•1h ago
This depends on the type of software you make. Testing the usability of a user interface for example, is something you can't automate (yet). So, ehm, it depends :)
visarga•1h ago
It will come around; we have rudimentary computer-use agents and the ability to record UIs for LLM agents. They will be refined, and then the agent can test UIs as well.

For UIs I do a different trick - live diagnostic tests - I ask the agent to write tests that run in the app itself, check consistencies, constraints and expected behaviors. Having the app running in its natural state makes it easier to test, you can have complex constraints encoded in your diagnostics.
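The "live diagnostic tests" idea could be sketched roughly like this (all names here are hypothetical, not from the comment above): small invariant checks registered against the running application's state, runnable at any time while the app is in its natural state.

```python
# Hypothetical sketch of in-app "live diagnostic tests": small invariant
# checks registered against the running application's state.
DIAGNOSTICS = []

def diagnostic(fn):
    """Register a check that inspects live app state and asserts an invariant."""
    DIAGNOSTICS.append(fn)
    return fn

def run_diagnostics(app_state):
    """Run every registered check; return the names of the checks that failed."""
    failures = []
    for check in DIAGNOSTICS:
        try:
            check(app_state)
        except AssertionError:
            failures.append(check.__name__)
    return failures

# Example constraint: every order's total matches the sum of its line items.
@diagnostic
def orders_are_consistent(state):
    for order in state["orders"]:
        assert order["total"] == sum(order["items"])

state = {"orders": [{"total": 5, "items": [2, 3]}]}
print(run_diagnostics(state))  # [] -> every constraint holds
```

Because the checks run against the live app rather than a fixture, they can encode constraints that are awkward to reproduce in a unit test.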

paganel•1h ago
There are always unknown unknowns which a rigorous testing implementation would just hide under the rug (until they become visible on live, that is).

> They had 9000+ tests.

They were most probably also written by AI, there's no other (human) way. The way I see it we're putting turtles upon turtles hoping that everything will stick together, somehow.

zahlman•57m ago
> They were most probably also written by AI, there's no other (human) way.

Yes. They came from the existing project being ported, which was also AI-written.

simonw•54m ago
No, those 9,000 tests are part of a legendary test suite built by real humans over the course of more than a decade: https://github.com/html5lib/html5lib-tests
pjc50•38m ago
I tabbed back to Visual Studio (C#): 24990 "unit" tests, all written by hand over the past years.

Behind that is a smaller number of larger integration tests, and the even longer running regression tests that are run every release but not on every commit.

zahlman•57m ago
> Yesterday there was a story "I ported JustHTML from Python to JavaScript with Codex CLI and GPT-5.2 in hours".

Yes, from the same author, in fact.

weatherlite•1h ago
> Almost anyone can prompt an LLM to generate a thousand-line patch and submit it for code review. That’s no longer valuable. What’s valuable is contributing code that is proven to work.

That's really not a great development for us. If our main point is now reduced to accountability over the result with barely any involvement in the implementation - that's very little moat and doesn't command a high salary. Either we provide real value or we don't ...and from that essay I think it's not totally clear what the value is - it seems like every QA, junior SWE or even product manager can now do the job of prompting and checking the output.

simonw•1h ago
The value is being better at it than any QA or product manager.

Experienced software engineers have such a huge edge over everyone else with this stuff.

If your product manager doesn't understand what a CORS header is, good luck having them produce a change that requires a cross-domain fetch() call... and first they'll have to know what a "cross-domain fetch() call" means.

And sure they could ask an LLM about that, but they still need the vocabulary and domain knowledge to get to that question.
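To make the CORS example concrete, here is a minimal, hypothetical sketch (the origin and function name are illustrative, not from any comment above) of the server-side decision that a cross-domain fetch() hinges on:

```python
# Hypothetical sketch: the browser sends an Origin header with a cross-origin
# fetch(), and unless the response carries a matching
# Access-Control-Allow-Origin header, the browser blocks the calling script
# from reading the response.
ALLOWED_ORIGINS = {"https://app.example.com"}  # assumed trusted origin

def cors_headers(request_origin):
    """Return CORS response headers for a cross-origin request, if allowed."""
    if request_origin in ALLOWED_ORIGINS:
        return {"Access-Control-Allow-Origin": request_origin}
    return {}  # no header: the browser refuses to expose the response
```

The vocabulary point stands: you need to already know that this header belongs on the server, not in the fetch() call, before you can even ask an LLM the right question.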

falcor84•1h ago
That's an interesting argument, but from my industry experience, the average experienced QA Engineer and technical Product Manager both have better vocabulary than the average SWE. Indeed, I wonder whether a future curriculum for Vibe Engineering (to borrow your own term) may look more similar to that of present-day QA or Product curricula, than to a typical coding or CS curriculum.
rmnclmnt•1h ago
> As software engineers we…

That’s the thing. People exhibiting such rude behavior usually are not, or haven’t been in a looong time…

As for the local testing part not being performed, this is a slippery slope I’m fighting everyday: more and more cloud based services and platforms are used to deploy software to run with specific shenanigans and running it locally requires some kind of deep craft and understanding. Vendor lock-in is coming back in style (e.g. Databricks)

simonw•1h ago
Yeah, I get frustrated by cloud-only systems that don't have a good local testing story.

The best solution I have for that is staging environments, ideally including isolated-from-production environments you can run automated tests against.

skydhash•59m ago
Whenever I have to work with such systems, that's usually when I write an interface and a mock implementation. Iteration is much faster when I don’t have to worry about getting the correct state from something I don’t have control over.
agentultra•1h ago
There’s an anecdote from one of Dijkstra’s essays that strikes at the heart of this phenomenon. I’ll paraphrase because I can’t remember the exact EWD number off the top of my head.

A colleague was working on an important subsystem and would ask Dijkstra for a review when he thought it was ready. Dijkstra would have to stop what he was doing, analyze the code, and would find a grievous error or edge case. He would point it out to the colleague, who would then get back to work. The colleague would submit his code for review again, and this could carry on enough times that Dijkstra got annoyed.

Dijkstra proposed a solution. His colleague would have to submit, along with his code, some form of proof or argument as to why it was correct and ready to merge. That way Dijkstra could save time by only having to review the argument and not all of the code.

There’s a way of looking at LLM output as Dijkstra’s colleague. It puts a lot of burden on the human using this tool to review all of the code. I like Doctorow’s mental model of a reverse centaur. The LLM cannot reason and so won’t provide you with a sound argument. It can probably tell you what it did and summarize the code changes it made… but it can’t decide to merge those changes. It needs a human, the bottom half of the centaur, to do the last bit of work here. Because that’s all we’re doing when we let these tools do most of the work for us: we’re here to take the blame.

And all it takes is an implementation of what we’re trying to build already, every open source library ever, all of SO, a GW of power from a methane power plant, an Olympic pool of water and all of your time reviewing the code it generates.

At the end of the day it’s on you to prove why your changes and contributions should be merged. That’s a lot of work! But there’s no shortcuts. Luckily you can reason while the LLMs struggle with that so use it while you can when choosing to use such tools.

koakuma-chan•1h ago
Honestly if you code in Rust, your code almost certainly works, even without any testing.
mapontosevenths•1h ago
I agree with this, except it glosses over security. Your job is to deliver SECURE code that you have proven to work.

Manual and automatic testing are still both required, but you must explicitly ensure that security considerations are included in those tests.

The LLM doesn't care. Caring is YOUR job.

imiric•1h ago
The job of a software developer is not just to prove that the software "works". The definition of "works" is itself often fuzzy and difficult to prove.

That is part of it, yes, but there are many others, such as ensuring that the new code is easy to understand and maintain by humans, makes the right tradeoffs, is reasonably efficient and secure, doesn't introduce a lot of technical debt, and so on.

These are things that LLMs often don't get right, and junior engineers need guidance with and mentoring from more experienced engineers to properly learn. Otherwise software that "works" today, will be much more difficult to make "work" tomorrow.

JoeAltmaier•1h ago
The job, in the modern world, is to close tickets. The code quality is negotiable, because the entire automated software process doesn't measure code quality, just statistics.

That's why I refuse to take part in it. But I'm an old-world craftsman by now, and I understand nobody wants to pay for working, well-thought-out code any more. They don't want a Chesterfield; they want plywood and glue.

gadflyinyoureye•1h ago
What do you do, O modern Luddite? Do you work for yourself making a product that people use? Are you on the government teat?
whattheheckheck•29m ago
I woke up and had a thought: software engineering isn't a serious engineering field if it actually fully shipped LLMs and expects everyone to use them. What do you expect, quality-wise, from a profession that says this is okay?
AlienRobot•9m ago
Imagine if normal engineering did that. Engineers invent a "blobby" thing that glues things together. It has amazing properties that increase productivity but sometimes it just stops working for some reason and comes off. It's totally random and because of how blobby is produced there is no way to tell when it's going to work or not, contrary to the typical material. Anyway we're going to use blobby to build everything from schools, to bridges, to airplanes now.
nzoschke•1h ago
Having the coding agent make screenshots is a big power up.

I’m experimenting with how to get these into a PR, and the “gh” CLI tool is helpful.

Does anyone have a recipe to get a coding agent to record video of webflows?

simonw•59m ago
Not yet. I'm confident Playwright will be involved in the answer, it has good video recording features.
endorphine•1h ago
> there’s one depressing anecdote that I keep on seeing: the junior engineer, empowered by some class of LLM tool, who deposits giant, untested PRs on their coworkers—or open source maintainers—and expects the “code review” process to handle the rest.

It's even worse than that: non-junior devs are doing it as well.

reedf1•1h ago
it's even worse than that! non-devs are doing it as well
snowstormsun•51m ago
it's even worse than that! bots are doing it as well
pydry•48m ago
Hopefully once this AI nonsense blows over they'll reach the same realisation they did after the mid 2000s outsourcing craze: that actually you gotta pay for good engineering talent.
marcosdumay•10m ago
Abandon any platform that decides to put bots into your workflow without you telling it to.

Vote with your wallet.

vernrVingingIt•44m ago
That's the goal. Through further training, whittle away at unnecessary states until only the electrical states that matter remain.

Developers have created too many layers of abstraction and indirection to do their jobs. We're burning a ton of energy managing state management frameworks, that are many layers of indirection away from the computations that are salient to users.

All those DSLs, config syntaxes, layers of boilerplate waste a huge amount of electricity, when end users want to draw geometric shapes.

So a non-dev generates a mess, but in a way so do devs with Django and Elixir, RoR, Terraform. When really, at the end of the day, it's matrix math against memory and syncing that state to the display.

From a hardware engineers perspective, the mess of devs and non-devs is the same abstract mess of electrical states that have nothing to do with the goal. All those frameworks can be generalized into a handful of electrical patterns, saving a ton of electricity.

whattheheckheck•37m ago
What process / path do you take to get to such an enlightened state? Like books or experience or anything more about this please?
vernrVingingIt•25m ago
Bachelors in electrical engineering, masters in math; elastic structures applied to modeling electrical systems.

Started career in late 90s designing boards for telecom companies network backbones.

rcbdev•27m ago
This sounds like the exact kind of profound pseudo-enlightenment that one gets from psychedelics. Of course, it's all electrons in the end.

Trying to create a secure, reliable and scalable system that enables many people to work on one code base, share their code around with others and at the end of the day coordinate this dance of electrons across multiple computers, that's where all of these 'useless' layers of abstraction become absolutely necessary.

vernrVingingIt•18m ago
Try almost 30 years in electrical engineering.

I know exactly what those layers of abstraction are used for. Why so many? Jobs making layers of abstraction.

But all of them are dev friendly means of modeling memory states for the CPU to watch and transform just so. They can all be compressed into a generic and generalized set of mathematical functions ridding ourselves of the various parser rules to manage each bespoke syntax inherent to each DSL, layers of framework.

nanomonkey•25m ago
There are some contradictory claims here.

Boilerplate comes when your language doesn't have affordances, you get around this with /abstraction/ which leads to DSLs (Domain Specific Languages).

Matrix math is generally done on more than raw bits provided by digital circuits. Simple things like numbers require some amount of abstraction and indirection (pointers to memory addresses that begin arrays).

My point is yes, we've gotten ourselves in a complicated tar pit, but it's not because there wasn't a simpler solution lower in the stack.

iwontberude•23m ago
And here I thought people just used computers for the heat
esafak•44m ago
That's what democratization looks like. And the new participants are happy!
BurningFrog•40m ago
Really good code reviewing AIs could handle this!
throwawaysleep•39m ago
Code review is an unfunded mandate. It is something the company demands while not really doing anything to make sure people get rewarded for doing it.
Aurornis•29m ago
> while not really doing anything to make sure people get rewarded for doing it.

I don’t know about you, but I get paychecks twice a month for doing things included in my job description.

georgeburdell•18m ago
My manager asked me to disable CI and gating code owner reviews “for 2 weeks” 6 months ago so people could commit faster. Just because it is in your job description doesn’t mean it won’t get shoved aside when it’s perceived as the bottleneck for the core mission.

Now we have nightly builds that nobody checks the result of and we’re finding out about bugs weeks later. Big company btw

Aurornis•30m ago
This mirrors my experience with the texting while driving problem: The debate started as angry complaints about how all the kids are texting while driving. Yet it’s a common problem for people of all ages. The worst offender I knew for years was in her 50s, but even she would get angry about the youths texting while driving.

Pretending it’s just the kids and young people doing the bad thing makes the outrage easier to sell to adults.

jennyholzer2•24m ago
the US media has been doing this with Black people to great effect for hundreds of years. (the tacitly hateful) Democrats are as guilty of it as (the openly hateful) Republicans.
hnthrow0287345•27m ago
>It's even worse than that: non-junior devs are doing it as well.

This might be unpopular, but that is seeming more like an opportunity if we want to continue allowing AI to generate code.

One of the annoying things engineers have to deal with is stopping whatever they're doing and doing a review. Obviously this gets worse if more total code is being produced.

We could eliminate that interruption by having someone doing more thorough code reviews, full-time. Someone who is not being bound by sprint deadlines and tempted to gloss over reviews to get back to their own work. Someone who has time to pull down the branch and actually run the code and lightly test things from an engineer's perspective so QA doesn't hit super obvious issues. They can also be the gatekeeper for code quality and PR quality.

sorokod•15m ago
> One of the annoying things engineers have to deal with is stopping whatever they're doing and doing a review.

I would have thought that reviewing PRs, and doing it well, is in the job description. You later mention "someone" a few times - who might that someone be?

marcosdumay•12m ago
A full-time code reviewer will quickly lose touch with all practical matters and steer the codebase into some unmaintainable mess.

This is not the first time somebody had that idea.

immibis•5m ago
As I read once: all that little stuff that feels like it stops you from doing your job is your job.
lowkeyokay•26m ago
In the company I’m at this is beginning to happen. PMs want to “prototype” new features and expect the engineers to finish up the work, with the expectation that it ‘just needs some polishing’. What would be your recommendation on how to handle this constructively? Flat out rejecting LLMs as a prototyping tool is not an option.
lurking_swe•20m ago
sounds like a culture and management problem. CTO should set clear expectations for his staff and discuss with product to ensure there is alignment.

If i was CTO I would not be happy to hear my engineers are spending lots of time re-writing and testing code written by product managers. Big nope.

Our_Benefactors•17m ago
This could be workable with the understanding that throwing away 100% of the prototype code is acceptable and its primary purpose is as a communication tool, not a technical starting point.
rootusrootus•10m ago
This is how I've handled it so far. But that is probably because the PM that does this for me knew going in that they were not going to be generating something I'd want to become responsible for polishing and maintaining. It's basically just a fancier way of doing what they would otherwise use SketchUp for.
627467•19m ago
Just shove a code review agent in the middle. Problem solved
fragmede•10m ago
That startup is called CodeRabbit and damned if it doesn't come up with good suggestions sometimes. Other times you have to overrule it, or more likely create separate PRs for its suggestions and avoid lumping a bunch of different stuff into a single PR. Sometimes it's stupid and doesn't know what it's talking about, and it also misses stuff, so you do still need a human to review it. But if you're at a place where LLMs are being used to generate large swaths of functional code, including tests, and human reviewers simply can't keep up, overall it does feel like a step forward. I can't speak to how well other similar services do, but presumably they're not the only one that does that; CodeRabbit's just the one that my employer has chosen.
0x500x79•52m ago
I think there are two other things missing: Security and Maintainability. Working code that can never be touched again by a developer or requires an excessive amount of time to maintain is not part of a developers job either.

Overall, this hits the nail on the head about not delivering broken code and providing automated tests. Thanks for putting your thoughts on paper.

am17an•51m ago
Well a 1000 line PR is still not welcome. It puts too much of a burden on the maintainers. Small PRs are the way to go, tests are great too. If you have to submit a big PR, get buy in from a maintainer first that they will review your code.
mellosouls•49m ago
Thing is, this has always been the case. One of the problems with LLM-assisted coding is the idea that just because we're in a new era (we certainly are), the old rules can all be discarded.

The title doesn't go far enough - slop (AI or otherwise) can work and pass all the tests, and still be slop.

simonw•43m ago
The difference is that if it works and passes the tests I don't feel like it's a total waste of my time to look at the PR and tell you why it's still slop.

If it doesn't even work you're absolutely wasting my time with it.

layer8•35m ago
I’d go further and say while testing is necessary, it is not sufficient. You have to understand the code and convince yourself that it is logically correct under all relevant circumstances, by reasoning over the code.

Testing only “proves” correctness for the specific state, environment, configuration, and inputs the code was tested with. In practice that only tests a tiny portion of possible circumstances, and omits all kinds of edge and non-edge cases.

softwaredoug•30m ago
A lot of AI coding changes coding to more of a declarative practice.

Claude, etc., works best with good tests that verify the system works. And so in some ways the tests become the real deliverable, rather than the code that does the thing. If you're responsible for the thing, then 90% of your responsibility moves to verifying behavior and giving agents feedback.

0xbadcafebee•29m ago
Actually it's more specific than that. A company pays you not just to "write code", not just to "write code that works", but to write code that works in the real world. Not on your laptop. Not in CI tests. Not on some staging environment. But in the real world. It may work fine in a theoretical environment, but deflate like a popped balloon in production. This code has no value to the business; they don't pay you to ship popped balloons.

Therefore you must verify it works as intended in the real world. This means not shipping code and hoping for the best, but checking that it actually does the right thing in production. And on top of that, you have to verify that it hasn't caused a regression in something else in production.

You could try to do that with tests, but tests aren't always feasible. Therefore it's important to design fail-safes into your code that ALERT YOU to unexpected or erroneous conditions. It needs to do more than just log an error to some logging system you never check - you must actually be notified of it, and you should consider it a flaw in your work, like a defective pair of Nikes on an assembly line. Some kind of plumbing must exist to take these error logs (or metrics, traces, whatever) and send it to you. Otherwise you end up producing a defective product, but never know it, because there's nothing in place to tell you its flaws.

Every single day I run into somebody's broken webapp or mobile app. Not only do the authors have no idea (either because they aren't notified of the errors, or don't care about them), there is no way for me to even e-mail the devs to tell them. I try to go through customer support, a chat agent, anything, and even they don't have a way to send in bug reports. They've insulated themselves from the knowledge of their own failures.
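The "plumbing must exist to tell you" point can be sketched with nothing but the standard library (the notify callback here is a hypothetical stand-in for whatever pager, chat, or email integration you actually use):

```python
import logging

class NotifyHandler(logging.Handler):
    """Forward ERROR-and-above records to a notification callback, so
    failures reach a person instead of sitting unread in a log file."""

    def __init__(self, notify):
        super().__init__(level=logging.ERROR)
        self.notify = notify  # stand-in for a pager/chat/email integration

    def emit(self, record):
        self.notify(self.format(record))

alerts = []
logger = logging.getLogger("app")
logger.addHandler(NotifyHandler(alerts.append))

logger.error("payment webhook returned 500")
logger.info("normal operation")  # below the handler's level: no alert

print(alerts)  # ['payment webhook returned 500']
```

In production the callback would post to an on-call system rather than a list, but the shape is the same: the error path ends at a human, not at a log file nobody checks.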

cynicalsecurity•5m ago
It gets interesting when a company assigns 2 story points to a task that requires 6 minimum. No time for writing tests, barely any time to perform code reviews and QA. Also, next year the company tells you since we have AI now, all tickets must be done 2 times quicker.

Who popped this balloon? I know I need to change my employer, but it's not so easy. And I'm not sure another employer is going to be any better.

rglover•28m ago
"Slow the f*ck down." - Oliver Reichenstein [1]

This only happens because the software industry has fallen into the Religion of Speed. I see it constantly: justified corner-cutting, rushing shit out the door, and always loading up another feature/project/whatever with absolutely zero self-awareness. AI is just an amplifier for bad behavior that was already causing chaos.

What's not being said here but should be: discipline matters. It's part of being a professional and always precedes someone who can ship code that "just works."

[1] https://ia.net/*

simonw•14m ago
A few years ago I embraced automated tests and comprehensive documentation for even my smallest personal projects because I found that they sped me up. https://simonwillison.net/2022/Nov/26/productivity/
acrophiliac•25m ago
Perhaps off-topic, but: "Testing doesn't show the absence of errors, it shows the presence of errors." Willison says we need to submit code we have proven to work, but then argues for empirical testing, not actual correctness proofs.
simonw•17m ago
If you can formally prove correctness then brilliant, go for it!

That's not something I've seen or been able to achieve in most of my professional work.
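Short of full formal verification, exhaustive checking over a bounded input domain is one middle ground: when the domain is small enough to enumerate, a test genuinely is a proof over that domain. A hedged sketch (the `clamp` function is an invented example, not something from the thread):

```python
def clamp(x: int, lo: int, hi: int) -> int:
    """Restrict x to the closed interval [lo, hi]."""
    return max(lo, min(x, hi))

# Checking every input in a bounded domain *is* a correctness proof
# over that domain -- unlike a handful of spot-check unit tests.
for x in range(-50, 51):
    result = clamp(x, -10, 10)
    assert -10 <= result <= 10        # output always lands in range
    if -10 <= x <= 10:
        assert result == x            # in-range inputs pass through unchanged
```

For unbounded domains, property-based testing tools approximate the same idea by sampling inputs against stated invariants rather than enumerating them.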

trevor-e•25m ago
> Your job is to deliver code you have proven to work.

Strong disagree here: your job is to deliver solutions that help the business solve a problem. In _most_ cases that means delivering code you can confidently prove satisfies the requirements, as the OP describes, but I think this is an important nitpick of a distinction that I didn't understand until later in my career.

just_once•24m ago
I don't know if there's a word for this, but it reads to me like software virtue signaling, or software patronizing. It's bizarre to me to tell an engineer, as a matter of fact, what their job is, and to claim a particular usage of a tool as mandated (a tool that no one really asked for, mind you), leveraging duty of all things.

I guess to me, it's either the case that LLMs are just another tool, in which case the already existing teachings of best practice should cover them (and therefore the tone and some content of this article is unnecessary) or they're something totally new, in which case maybe some of the already existing teachings apply, but maybe not because it's so different that the old incentives can't reasonably take hold. Maybe we should focus a little bit more attention on that.

The article mentions rudeness, shifting burdens, wasting people's time, dereliction. Really loaded stuff and not a framing that I find necessary. The average person is just trying to get by, not topple a social contract. For that, look upwards.

dkural•20m ago
I've really seen both, I suppose. A lot of devs don't take accountability or responsibility for their code, especially if they haven't shipped anything that actually got used, or in general haven't done much responsible adulting.
simonw•8m ago
LLMs are just another tool, but they're disruptive enough that existing best practices need to be either updated or re-explained.

A lot of people using LLMs seem not to have understood that you can't expect them to write code that works without testing it first!

If that wasn't clearly a problem I wouldn't have felt the need to write this.

morning-coffee•11m ago
Amen
nish__•9m ago
Good framing.
onion2k•7m ago
I want to distill this post into some sort of liquid I can inject directly into my dev teams. It's absolutely spot on. Seeing a PR with a change that doesn't build is one of the most disappointing things.
yuedongze•4m ago
It's very similar to the verification engineering problem I wrote about on HN last week. AI is only as good as our ability to prove its work is genuine, and we need humans in the loop to fill the gaps between autonomous systems and ultimately to be held accountable under human law. It's kind of sad, but it's the reality we're facing.