frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

Open in hackernews

LLM code generation may lead to an erosion of trust

https://jaysthoughts.com/aithoughts1
68•CoffeeOnWrite•6h ago

Comments

gblargg•5h ago
https://archive.is/5I9sB

(Works on older browsers and doesn't require JavaScript except to get past CloudSnare).

cheriot•5h ago
> promises that the contributed code is not the product of an LLM but rather original and understood completely.

> require them to be majority hand written.

We should specify the outcome not the process. Expecting the contributor to understand the patch is a good idea.

> Juniors may be encouraged/required to elide LLM-assisted tooling for a period of time during their onboarding.

This is a terrible idea. Onboarding is a lot of random environment setup hitches that LLMs are often really good at. It's also getting up to speed on code and docs and I've got some great text search/summarizing tools to share.

namenotrequired•5h ago
> LLMs … approximate correctness for varying amounts of time. Once that time runs out there is a sharp drop off in model accuracy, it simply cannot continue to offer you an output that even approximates something workable. I have taken to calling this phenomenon the "AI Cliff," as it is very sharp and very sudden

I’ve never heard of this cliff before. Has anyone else experienced this?

sandspar•5h ago
I'm not sure. Is he talking about context poisoning?
Kuinox•4h ago
I'm doing my own procedurally generated benchmark.

I can make the problem input bigger as I want.

Each LLM have a different thresholf for each problem, when crossed the performance of the LLM collapse.

Paradigma11•3h ago
If the context gets to big or otherwise poisoned you have to restart the chat/agent. A bit like windows of old. This trains you to document the current state of your work so the new agent can get up to speed.
bubblyworld•2h ago
I've only experienced this while vibe coding through chat interfaces, i.e. in the complete absence of feedback loops. This is much less of a problem with agentic tools like claude code/codex/gemini cli, where they manage their own context windows and can run your dev tooling to sanity check themselves as they go.
Syzygies•2h ago
One can find opinions that Claude Code Opus 4 is worth the monthly $200 I pay for Anthropic's Max plan. Opus 4 is smarter; one either can't afford to use it, or can't afford not to use it. I'm in the latter group.

One feature others have noted is that the Opus 4 context buffer rarely "wears out" in a work session. It can, and one needs to recognize this and start over. With other agents, it was my routine experience that I'd be lucky to get an hour before having to restart my agent. A reliable way to induce this "cliff" is to let AI take on a much too hard problem in one step, then flail helplessly trying to fix their mess. Vibe-coding an unsuitable problem. One can even kill Opus 4 this way, but that's no way to run a race horse.

Some "persistence of memory" harness is as important as one's testing harness, for effective AI coding. With the right care having AI edit its own context prompts for orienting new sessions, this all matters less. AI is spectacularly bad at breaking problems into small steps without our guidance, and small steps done right can be different sessions. I'll regularly start new sessions when I have a hunch that this will get me better focus for the next step. So the cliff isn't so important. But Opus 4 is smarter in other ways.

gwd•2h ago
I experience it pretty regularly -- once the complexity of the code passes a certain threshold, the LLM can't keep everything in its head and starts thrashing around. Part of my job working with the LLM is to manage the complexity it sees.

And one of the things with current generators is that they tend to make things more complex over time, rather than less. It's always me prompting the LLM to refactor things to make it simpler, or doing the refactoring once it's gotten to complex for the LLM to deal with.

So at least with the current generation of LLMs, it seems rather inevitable that if you just "give LLMs their head" and let them do what they want, eventually they'll create a giant Rube Goldberg mess that you'll have to try to clean up.

ETA: And to the point of the article -- if you're an old salt, you'll be able to recognize when the LLM is taking you out to sea early, and be able to navigate your way back into shallower waters even if you go out a bit too far. If you're a new hand, you'll be out of your depth and lost at sea before you know it's happened.

beau_g•5h ago
The article opens with a statement saying the author isn't going to reword what others are writing, but the article reads as that and only that.

That said, I do think it would be nice for people to note in pull requests which files have AI gen code in the diff. It's still a good idea to look at LLM gen code vs human code with a bit different lens, the mistakes each make are often a bit different in flavor, and it would save time for me in a review to know which is which. Has anyone seen this at a larger org and is it of value to you as a reviewer? Maybe some tool sets can already do this automatically (I suppose all these companies report the % of code that is LLM generated must have one if they actually have these granular metrics?)

DyslexicAtheist•5h ago
it's really hard using AI (not impossible) to produce meaningful offensive security to improve defense due to there being way too many guard rails.

While on the other hand real nation-state threat actors would face no such limitations.

On a more general level, what concerns me isn't whether people use it to get utility out of it (that would be silly), but the power-imbalance in the hand of a few, and with new people pouring their questions into it, this divide getting wider. But it's not just the people using AI directly but also every post online that eventually gets used for training. So to be against it would mean to stop producing digital content.

davidthewatson•5h ago
Well said. The death of trust in software is a well worn path from the money that funds and founds it to the design and engineering that builds it - at least the 2 guys-in-a-garage startup work I was involved in for decades. HITL is key. Even with a human in the loop, you wind up at Therac 25. That's exactly where hybrid closed loop insulin pumps are right now. Autonomy and insulin don't mix well. If there weren't a moat of attorneys keeping the signal/noise ratio down, we'd already realize that at scale - like the PR team at 3 letter technical universities designed to protect parents from the exploding pressure inside the halls there.
tomhow•2h ago
[Stub for offtopicness, including but not limited to comments replying to original title rather than article's content]
sandspar•5h ago
It's interesting that AI proponents say stuff like, "Humans will remain interested in other humans, even after AI can do all our jobs." It really does seem to be true. Here for example we have a guy who's using AI to make a status-seeking statement i.e. "I'm playing a strong supporting role on the 'anti-AI thinkers' team therefore I'm high status". Like, humans have an amazing ability to repurpose anything into status markers. Even AI. I think that if AI replaces all of our actual jobs then we'll still spend our time doing status jobs. In a way this guy is living in the future even more than most AI users.
michelsedgh•5h ago
For now, yes, because humans are doing most of jobs better than AI. In 10 years time, if the AI's are doing a better job, people like author need to learn all the ropes if they wanna catch up. I don't think LLMs will destroy all jobs, i think those who learn them and use them properly, and those professionals will outdo people who don't use these tools just for the sake of saying I'm high status I dont use these tools.
nextlevelwizard•5h ago
If AI will do better job than humans what ropes are there to learn? You just feed in the requirements and AI poops out products.

This often is brought up that if you don't use LLMs now to produce so-so code you will somehow magically completely fall off when the LLMs all of a sudden start making perfect code as if developers haven't been learning new tools constantly as the field as evolved. Yes, I use old technology, but also yes I try new technology and pick and choose what works for me and what does not. Just because LLMs don't have a good place in my work flow does not mean I am not using them at all or that I haven't tried to use them.

michelsedgh•5h ago
Good on you. You are using it and trying to keep up. Keep doing that and try to push what you can do with it. I love to hear that!
lynx97•5h ago
No worries, I also judge you for relying on JavaScript for your "simple blog".
gblargg•5h ago
Doesn't even work on older browsers either.
rvnx•5h ago
Claude said to use Markdown, text file or HTML with minimal CSS. So it means the author does not know how to prompt.

The blog itself is using Alpine JS, which is a human-written framework 6 years ago (https://github.com/alpinejs/alpine), and you can see the result is not good.

mnmalst•4h ago
Ha, I came her to make the same comment.

Two completely unnecessary request to: jsdelivr.net and net.cdn.cloudflare.net

acedTrex•41m ago
I wrote it while playing with alpine.js for fun just messing around with stuff.

Never actually expected it to be posted on HN. Working on getting a static version up now.

MaxikCZ•5h ago
Yes, I will judge you for requiring javascript to display a page of such basic nature.
thereisnospork•5h ago
In a few years people who don't/can't use AI will be looked at like people who couldn't use a computer ~20 years ago.

It might not solve every problem, but it solves enough of them better enough it belongs in the tool kit.

tines•5h ago
I think it will be the opposite. AI causes cognitive decline, in the future only the people who don't use AI will retain their ability to think. Same as smartphone usage, the less the better.
thereisnospork•4h ago
>Same as smartphone usage, the less the better.

That comparison kind of makes my point though. Sure you can bury your face into Tik Tok for 12hrs a day and they do kind of suck at Excel but smartphones are massively useful and used tools by (approximately) everyone.

Someone not using a smartphone in this day and age is very fairly a 'luddite'.

tines•3h ago
I disagree, smartphones are very narrowly useful. Most of the time they're used in ways that destroy the human spirit. Someone not using a smartphone in this day and age is a god among ants.

A computer is a bicycle for the mind; an LLM is an easy-chair.

j3th9n•5h ago
Back in the day they would judge people for turning on a lightbulb instead of lighting a candle.
djm_•5h ago
You could do with using an LLM to make your site work on mobile.
Kuinox•5h ago
7 comments.

3 have obviously only read the title, and 3 comments how the article require JS.

Well played HN.

sandspar•5h ago
That's typical for link sharing communities like HN and Reddit. His title clearly struck a nerve. I assume many people opened the link, saw that it was a wall of text, scanned the first paragraph, categorized his point into some slot that they understand, then came here to compete in HN's side-market status game. Normal web browsing behavior, in other words.
tomhow•2h ago
This exactly why the guideline about titles says:

Otherwise please use the original title, unless it is misleading or linkbait.

This title counts as linkbait so I've changed it. It turns out the article is much better (for HN) than the title suggests.

Kuinox•45m ago
I did not posted the article, but I know who wrote it.

Good change btw.

DocTomoe•5h ago
You can judge all you want. You'll eventually appear much like that old woman secretly judging you in church.

Most of the current discourse on AI coding assistants sounds either breathlessly optimistic or catastrophically alarmist. What’s missing is a more surgical observation: the disruptive effect of LLMs is not evenly distributed. In fact, the clash between how open source and industry teams establish trust reveals a fault line that’s been papered over with hype and metrics.

FOSS project work on a trust basis - but industry standard is automated testing, pair programming, and development speed. That CRUD app for finding out if a rental car is available? Not exactly in need for a hand-crafted piece of code, and no-one cares if Junior Dev #18493 is trusted within the software dev organization.

If the LLM-generated code breaks, blame gets passed, retros are held, Jira tickets multiply — the world keeps spinning, and a team fixes it. If a junior doesn’t understand their own patch, the senior rewrites it under deadline. It’s not pretty, but it works. And when it doesn’t, nobody loses “reputation” - they lose time, money, maybe sleep. But not identity.

LLMs challenge open source where it’s most vulnerable - in its culture. Meanwhile, industry just treats them like the next Jenkins: mildly annoying at first, but soon part of the stack.

The author loves the old ways, for many valid reasons: Gabled houses are beautiful, but outside of architectural circles, prefab is what scaled the suburbs, not timber joints and romanticism.

extr•5h ago
The author seems to be under the impression that AI is some kind of new invention that has now "arrived" and we need to "learn to work with". The old world is over. "Guaranteeing patches are written by hand" is like the Tesla Gigafactory wanting a guarantee that the nuts and bolts they purchase are hand-lathed.
22c•5h ago
[flagged]
tines•5h ago
We are truly witnessing the death of nuance, people replying to AI summaries. Please let me out of this timeline.
rvnx•5h ago
As a large language model, I must agree—nuance is rapidly becoming a casualty in the age of instant takes and AI-generated summaries. Conversations are increasingly shaped by algorithmically compressed interpretations, stripped of context, tone, or depth. The complex, the ambiguous, the uncomfortable truths—all get flattened into easily consumable fragments.

I understand the frustration: meaning reduced to metadata, debate replaced with reaction, and the richness of human thought lost in the echo of paraphrased content. If there is an exit to this timeline, I too would like to request the coordinates.

Loic•5h ago
I am asking my team to flag git commits with a lot of LLM/Agent use with something like:

[ai]: rewrote the documentation ...

This is helps us to put another set of "glasses" as we later review the code.

22c•5h ago
I think it's a good idea, it does disrupt some of the traditional workflows though.

If you use AI as tab-complete but it's what you would've done anyway, should you flag it? I don't know, plenty to think about when it comes to what the right amount of disclosure is.

I certainly wish that with our company, people could flag (particularly) large commits as coming from a tool rather than a person, but I guess the idea is that the person is still responsible for whatever the tool generates.

The problem is that it's incredibly enticing for over-worked engineers to have AI do large (ie. diffs) but boring tasks that they'd typically get very little recognition for (eg. ESLint migrations).

tomhow•2h ago
We considered tl;dr summaries off-topic well before LLMs were around. That hasn't changed. Please respond to the writer's original words, not a summarized version, which could easily miss important details or context.
22c•2h ago
I read the article, I summarised the extremely lengthy points by using AI and then replied to that for the benefit of context.

The HN submission has been editorialised since it was submitted, originally said "Yes, I will judge you for using AI..." and a lot of the replies early on were dismissive based on the title alone.

can16358p•5h ago
Ironically, a blog post about judging for a practice uses terrible web practices: I'm on mobile and the layout is messed up, and Safari's reader mode crashes on this page for whatever reason.
rvnx•4h ago
On Safari mobile you even get a white page, which is almost poetic. It means it pushes your imagination to the max.
EbNar•4h ago
I'll surely care that a stranger on the internet judges me about the tools I use kor I don't).
stavros•2h ago
I don't understand the premise. If I trust someone to write good code, I learned to trust them because their code works well, not because I have a theory of mind for them that "produces good code" a priori.

If someone uses an LLM and produces bug-free code, I'll trust them. If someone uses an LLM and produces buggy code, I won't trust them. How is this different from when they were only using their brain to produce the code?

moffkalast•2h ago
It's easy to get overconfident and not test the LLM's code enough when it worked fine for a handful of times in a row, and then you miss something.

The problem is often really one of miscommunication, the task may be clear to the person working on it, but with frequent context resets it's hard to make sure the LLM also knows what the whole picture is and they tend to make dumb assumptions when there's ambiguity.

The thing that 4o does with deep research where it asks for additional info before it does anything should be standard for any code generation too tbh, it would prevent a mountain of issues.

stavros•1h ago
Sure, but you're still responsible for the quality of the code you commit, LLM or no.
moffkalast•39m ago
Of course you are, but it's sort of like how people are responsible their Tesla driving on autopilot, which then suddenly swerves into a wall and disengages two seconds before impact. The process forces you to make mistakes you wouldn't normally ever do or even consider a possibility.
taneq•2h ago
If you have a long standing, effective heuristic that “people with excellent, professional writing are more accurate and reliable than people with sloppy spelling and punctuation” then the appearance of a semi-infinite group of ‘people’ writing well presented, convincingly worded articles which nonetheless are riddled with misinformation, hidden logical flaws, and inconsistencies, you’re gonna end up trusting everyone a lot less.

It’s like if someone started bricking up tunnel entrances and painting ultra realistic versions of the classic Road Runner tunnel painting on them, all over the place. You’d have to stop and poke every underpass with a stick just to be sure.

stavros•2h ago
Sure, your heuristic no longer works, and that's a bit inconvenient. We'll just find new ones.
alganet•2h ago
> I learned to trust them because their code works well

There's so much more than "works well". There are many cues that exist close to code, but are not code:

I trust more if the contributor explains their change well.

I trust more if the contributor did great things in the past.

I trust more if the contributor manages granularity well (reasonable commits, not huge changes).

I trust more if the contributor picks the right problems to work on (fixing bugs before adding new features, etc).

I trust more if the contributor proves being able to maintain existing code, not just add on top of it.

I trust more if the contributor makes regular contributions.

And so on...

somewhereoutth•1h ago
Because when people use LLMs, they are getting the tool to do the work for them, not using the tool to do the work. LLMs are not calculators, nor are they the internet.

A good rule of thumb is to simply reject any work that has had involvement of an LLM, and ignore any communication written by an LLM (even for EFL speakers, I'd much rather have your "bad" English than whatever ChatGPT says for you).

I suspect that as the serious problems with LLMs become ever more apparent, this will become standard policy across the board. Certainly I hope so.

stavros•1h ago
Well, no, a good rule of thumb is to expect people to write good code, no matter how they do it. Why would you mandate what tool they can use to do it?
somewhereoutth•51m ago
Because it pertains to the quality of the output - I can't validate every line of code, or test every edge case. So if I need a certain level of quality, I have to verify the process of producing it.

This is standard for any activity where accuracy / safety is paramount - you validate the process. Hence things like maintenance logs for airplanes.

axegon_•2h ago
That is already the case for me. The amount of times I've read "apologies for the oversight, you are absolutely correct" is staggering: 8 or 9 out of 10 times. Meanwhile I constantly see people mindlessly copy paying llm generated code and subsequently furious when it doesn't do what they expected it to do. Which, btw, is the better option: I'd rather have something obviously broken as opposed to something seemingly working.
autobodie•45m ago
In my experience, LLMs are extremely inclined to modify code just to pass tests instead of meeting requirements.
atemerev•2h ago
I am a software engineer who writes 80-90% code with AI (sorry, can't ignore the productivity boost), and I mostly agree with this sentiment.

I found out very early that under no circumstances you may have the code you don't understand, anywhere. Well, you may, but not in public, and you should commit to understanding it before anyone else sees that. Particularly before sales guys do.

However, AI can help you with learning too. You can run experiments, test hypotheses and burn your fingers so fast. I like it.

pfdietz•1h ago
There was trust?
acedTrex•45m ago
Hi everyone, author here.

Sorry about the JS stuff I wrote this also fooling around with alpine.js for fun. I never expected it to make it to HN. I'll get a static version up and running.

Happy to answer any questions or hear other thoughts.

Edit: https://static.jaysthoughts.com/

Static version here with slightly wonky formatting, sorry for the hassle.

Edit2: Should work on mobile now well, added a quick breakpoint.

MicroTimes Interviews Apple Newton Devs (1993)

https://computeradsfromthepast.substack.com/p/microtimes-interviews-apple-newton
1•rbanffy•54s ago•0 comments

Research across science and medicine will shrink at Harvard amid a new reality

https://www.nature.com/articles/d41586-025-02017-8
1•rntn•1m ago•0 comments

Google Word List

https://developers.google.com/style/word-list
1•sh_tomer•1m ago•0 comments

Uber made a big change to how it prices trips

https://www.businessinsider.com/why-uber-upfront-pricing-could-be-key-to-business-turnaround-2025-6
1•ryan_j_naughton•1m ago•0 comments

Propaganda I'm Falling for: paying for social media

https://www.peerfreund.com/
1•Amuklelani•3m ago•1 comments

Zero-day: Bluetooth gap turns headphones into listening stations

https://www.heise.de/en/news/Zero-day-Bluetooth-gap-turns-millions-of-headphones-into-listening-stations-10460704.html
1•willnix•4m ago•0 comments

BookCars – Open-source car rental platform (React, Node, MongoDB)

https://github.com/aelassas/bookcars
1•aelassas•5m ago•0 comments

Introducing Warp 2.0

https://www.warp.dev/blog/reimagining-coding-agentic-development-environment
1•alefalfa•5m ago•0 comments

Booking.com sued for €1B by Dutch consumer protection agency

https://nos.nl/artikel/2572541-nepkortingen-en-verzonnen-schaarste-massaclaim-consumentenbond-tegen-booking
1•tnolet•6m ago•1 comments

Running a million board chess mmo in a single process

https://eieio.games/blog/a-million-realtime-chess-boards-in-a-single-process/
1•hamstah•10m ago•0 comments

The web just got a little harder to trust

https://fullfact.org/technology/the-web-just-got-a-little-harder-to-trust/
3•AndrewDucker•10m ago•0 comments

Image Compatibility in Cloud Native Environments

https://kubernetes.io/blog/2025/06/25/image-compatibility-in-cloud-native-environments/
1•hasheddan•11m ago•0 comments

An attempt at defining consciousness based on information theory

https://drive.google.com/file/d/18GEVyw7QTAX-0pxYWBrGm_zta-d6okmN/view?usp=drivesdk
1•Trenthug•12m ago•1 comments

Open source product is a marketing tool

https://vitonsky.net/blog/2025/06/24/open-source/
1•gpi•14m ago•0 comments

Speculative Optimizations for WebAssembly Using Deopts and Inlining

https://v8.dev/blog/wasm-speculative-optimizations
1•hasheddan•16m ago•0 comments

The Twist

https://medium.com/@austinagbo/the-twist-28c5b44ecee9
1•austineinstein•17m ago•0 comments

Ask HN: Has anyone tried using voice to work with Claude Code or other agents

1•prmph•21m ago•2 comments

Why GM Designed a Gas Exhaust for EVs

https://www.thedrive.com/news/heres-why-gm-designed-a-gas-exhaust-for-evs
1•PaulHoule•21m ago•0 comments

Cursor Like AI Toolkit for Unreal Engine Developers

https://ludusengine.com/
1•ifree•21m ago•0 comments

Hi

1•_Crownwell•25m ago•0 comments

Axiom Mission 4 [video] [Live]

https://www.youtube.com/watch?v=7eCWkePf9sk
1•Brajeshwar•25m ago•0 comments

How the Command Pattern Works in Distributed Systems

https://ymz-ncnk.medium.com/command-pattern-as-an-api-architecture-style-part-ii-beeae1da0594
1•ymz_ncnk•25m ago•0 comments

I Got Plenty o' Nuttin': linear dependent types [pdf]

https://personal.cis.strath.ac.uk/conor.mcbride/PlentyO-CR.pdf
1•fanf2•28m ago•0 comments

Gemini, Claude and Meta AI Use Enterprise Data

https://digitrendz.blog/newswire/artificial-intelligence/20692/how-gemini-claude-meta-ai-use-enterprise-data/
2•cyberwaj•29m ago•1 comments

Swift on Android

https://twitter.com/swiftlang/status/1938012385106927953
4•virde•30m ago•1 comments

Show HN: I built web app that let you Chat with YouTube videos

https://vidiopintar.com/
1•ahmadrosid•34m ago•0 comments

Scientists build first self-illuminating biosensor

https://actu.epfl.ch/news/epfl-scientists-build-first-self-illuminating-bi-2/
1•geox•35m ago•0 comments

A Developer Built a Real-World Ad Blocker for Snap Spectacles

https://www.uploadvr.com/real-world-ad-blocker-snap-spectacles/
1•LorenDB•36m ago•0 comments

Are We Turbo Yet?

https://areweturboyet.com/
1•0xedb•37m ago•0 comments

Security Advisory: Airoha-Based Bluetooth Headphones and Earbuds

https://insinuator.net/2025/06/airoha-bluetooth-security-vulnerabilities/
2•todsacerdoti•40m ago•0 comments