No LLM Code in Dependencies

https://joeyh.name/blog/entry/no_LLM_code_in_dependencies/
42•edward•5h ago

Comments

skybrian•1h ago
Maybe an LLM could be used to check for this :)
neutrinobro•1h ago
Was this done by manually reviewing commit messages? I think it would be interesting/useful to have a tool that could use some basic heuristics about LLM generated code to detect code-blobs even if they are not explicitly called out in a commit message.
api•1h ago
Just like with writing, any kind of AI detection is going to be inaccurate to the point of snake oil.

LLM detection in writing is basically today's polygraph test pseudoscience. There was a blog a while ago where someone fed classic literature into one and it was detected as probably AI.

neutrinobro•59m ago
I'm not sure that is the case in this instance. Certainly general writing is a lot more variable and harder to classify, and on the other extreme certain one-line code changes don't have enough information to say anything. However, a blob with a 500+ line code change and 200+ lines of comments is a dead ringer for some of the current class of LLMs. That isn't to say this behavior couldn't be obfuscated, but some basic categorization could probably separate the majority of human authored commits vs. AI commits. Heck, you could probably train an AI to detect commit-style just by using pre-2022 code archives and existing known-to-be-AI edits/commits.
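A rough sketch of the size-and-comment-density heuristic described in the comment above, in Python. The 500 and 200 thresholds come from that comment. The function name, the prefix list, and the idea of feeding it the added lines of a diff are assumptions made for illustration. The prefix test is crude (for example, it counts a C `#include` as a comment), so treat a hit as a prompt for manual review, not as a verdict.

```python
# Assumed comment markers for a few common languages; deliberately crude.
COMMENT_PREFIXES = ("#", "//", "--")


def looks_comment_heavy(added_lines, min_code=500, min_comments=200):
    """Return True when a change adds at least min_code code lines
    and at least min_comments comment lines."""
    comments = 0
    code = 0
    for line in added_lines:
        stripped = line.strip()
        if not stripped:
            continue  # blank lines count as neither code nor comments
        if stripped.startswith(COMMENT_PREFIXES):
            comments += 1
        else:
            code += 1
    return code >= min_code and comments >= min_comments
```

As the sibling comments point out, a check like this will produce false positives and is easy to evade, so it is at best a filter for deciding which commits to read first.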
verdverm•57m ago
An agent doesn't have to be perfect to be useful. If it can find clear examples of stuff you don't want to see in a (potential) dependency quickly, that will save you time. Give it search tools and some policies, then have it go find things. You then check them out, ask followups.

Agents as a super powered (re)search assistant is underrated.

perrygeo•46m ago
It's not just "the code itself looks LLM generated" - it's also LOC/hr by a particular author which suggests vibe coding. You could look at the author's github contributions to identify time periods when the author was generating code at super-human speeds. Combine the two signals and you might get something better than a pseudoscience?
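One way the lines-per-hour signal mentioned above could be sketched, in Python. The input shape (author, timestamp, lines added), the function name, and the 1000-lines-per-hour threshold are all assumptions chosen for illustration; nothing in the thread specifies them. Vendored code, generated files, and squashed merges will also trip this, so it only marks time windows worth a closer look.

```python
from collections import defaultdict
from datetime import datetime


def bursts(commits, max_lines_per_hour=1000):
    """commits: iterable of (author, datetime, lines_added).

    Sums added lines per author per clock hour and returns a sorted list
    of (author, hour_start, total_lines) for hours above the threshold.
    """
    totals = defaultdict(int)
    for author, when, added in commits:
        # Truncate the timestamp to the start of its clock hour.
        hour = when.replace(minute=0, second=0, microsecond=0)
        totals[(author, hour)] += added
    return sorted(
        (author, hour, total)
        for (author, hour), total in totals.items()
        if total > max_lines_per_hour
    )
```

Fixed clock-hour buckets keep the sketch short; a sliding window would catch bursts that straddle an hour boundary.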
zahlman•20m ago
The heuristics that would be used to "detect AI" here would be things that shouldn't be happening anyway, so false positives wouldn't matter.
dijksterhuis•1h ago
when i was reading this i thought of writing some quick and dirty cli tool that checks commit co-authors. wouldn't be perfect, but would eliminate a good chunk of low hanging fruit.
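A minimal sketch of the co-author check described above, in Python. It scans a commit message for `Co-authored-by:` trailers and reports those that contain a marker string. The marker list and function name are assumptions for illustration, not an authoritative registry of AI tools, and substring matching can misfire on human names.

```python
import re

# Assumed, illustrative substrings; extend or replace to suit your own policy.
AI_MARKERS = ("claude", "copilot", "chatgpt", "codex")

# Matches one trailer per line, case-insensitively.
TRAILER_RE = re.compile(
    r"^co-authored-by:[ \t]*(.+)$", re.IGNORECASE | re.MULTILINE
)


def ai_coauthors(commit_message):
    """Return the Co-authored-by values that contain an assumed AI marker."""
    hits = []
    for value in TRAILER_RE.findall(commit_message):
        value = value.strip()
        if any(marker in value.lower() for marker in AI_MARKERS):
            hits.append(value)
    return hits
```

The function works on plain strings, so it can be fed commit messages from whatever source is convenient, for example the output of `git log --format=%B` split per commit. As noted in the comment, it only catches commits that declare the co-author.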
verdverm•1h ago
We are all figuring this new technology out and people will make mistakes. Would seem overreactionary to swear things off completely because of a single commit and reversion. Look for patterns in dependencies and your own work.
botfriendsarent•51m ago
I think this is a fair and normal reaction to AI slop. A lot of work, though. I think OSS projects are at serious risk of implosion due to the vigilance required, which honestly may end up being a fool's errand anyway.

But maybe we are thinking about it backward. Have you ever wondered why there is so much "free software"? Beware of strangers bearing gifts.

I have always wondered about, and been suspicious of, people who are so eager for you to use their software. Which isn't to say OSS isn't high quality. I'm just saying that maybe when people are pushing free software on you they are kind of in it for themselves.

As for what's next, me personally: last year I pulled all my personal repos, about 80 of them, off of Bitbucket and I self-host them all now. I think OSS projects should set up a paywall and charge money to create PRs.

Like 10-100 bucks per PR to cover the cost of the extra vigilance. Also I could see migrations away from GitHub, to AI-free dependency hosting or something like that. It's an interesting challenge. But it's not insurmountable.

Either paywall OSS projects or take them off the interwebs. Also, one option I don't think the OP explored is forking and freezing the dependencies. Huge maintenance burden, but it's better than source corruption.

Also use fewer dependencies. Maybe set a limit of 5.

StableAlkyne•50m ago
Clicking through to https://git-annex.branchable.com/no_llm_code/

It looks like git after 2.22 was dropped because it took an LLM commit. Same with ghc.

If I have to choose between this or git and the latest ghc, I think I'm going to just wait for someone to fork annex.

I don't even feel strongly one way or the other on AI stuff; pragmatically, I'm just not going to stop using the most widely used version controller, or Haskell, just for some guy's (forkable, AGPL licensed) hobby project.

zahlman•27m ago
TFA is about the dependencies of this project. How does that prevent you from using those things yourself?
remywang•16m ago
> This will probably prevent git-annex from taking advantage of most new improvements to the Haskell language going forward. That is deeply unfortunate. This is the main reason why git-annex is not guaranteed to never change to depend on LLM generated code, because cutting it off from all future Haskell language improvements may be worse than the alternative.

Looks like they are aware, and git-annex has been around for decades, written by one of the best Haskellers. “Some guy's hobby project” is not fair.

pseudalopex•16m ago
They said the non LLM dependency build was not default and could become untenable.

They said git-annex supports git back to 2.22, not that git after 2.22 was dropped.

An incompatible change in ghc would break compilation of other software also.

kstenerud•39m ago
This is a hill many people will choose to die on.

And they shan't be missed.

tuvix•31m ago
They will absolutely be missed, maybe not by any individual, but the impact of them leaving will be felt. People who are willing to go to bat for code quality and who are also careful about copyright and the community aspect of open source are why this whole thing worked in the first place.
kstenerud•26m ago
Copyright won't be a problem. There's enough big business wrapped up in AI usage that the laws will bend towards them. Code quality and community don't die just because people haven't quite figured out how to use the new tools properly yet; quality merely dips for a while, and the community continues as before. We survived PHP. We can survive this.
kordlessagain•4m ago
5 years ago, I would agree with you. But when you go ALL IN on LLM development, and use annealing with multi-agent harnesses, these issues disappear. One caveat: I build everything off other things that originated with my own hand written code. Auth for my site, for example. Also, most of my current projects are packed with advice I've rendered to the LLM on how git commits go down and cadence of those commits into deployments. Claude Code rarely fucks this up, and has memories and plan files that it updates if we find a hole. So, I'm comfortable with an occasional hiccup in the process. It'll get caught, eventually. Maybe. ;)

A recent analysis on my Claude Code prompts showed 1.5B input tokens over the last few months. I use 4-5 provider agents (all CLI) DAILY, so this is a small subset. I spend a lot of time using transcription services to drone on about how some agent fucked things up and how I want it fixed and how to do it.

To assist with that process, I'm currently building out a search engine that is exposed via MCP to allow auditing of the dev runs. I already have the foundation of file changes (à la Splunk) that lets me keep an eye on the agents, and an agentic terminal that allows one agent to keep an eye on what the other agent is whacking on. Combined with my constant badgering for proper systems development, these things are improving the process at an accelerated rate.

Look, I get being an "engineer" on these types of things, and I think there is an absolute purity in pushing LLM generated code out of a codebase you control. That said, that's not the ONLY way to do things, and your mileage will vary based on your systems thinking hat. I prefer to push hard on getting the outcomes and sacrifice the exhaustive process of reviewing every single line of code.

Consider frameworks. They make things easier to do, if they are complete and stable. There's an argument here that LLM harnesses should probably not ALSO be maintained by LLMs (something I'm completely ignoring, so it's probably ironic I'm mentioning it). But the point is that the harnesses SHOULD have eyes on most lines of code. Eyes on every package though? Hard to say. I've settled on doing most stuff in Rust nowadays, just because it keeps the LLM more honest. By bitching at it about code refactoring constantly, annealing the codebase by high level overview, not exhaustive review, I've found things get easier to work on as I go and still stay sane.

I do catch the LLMs occasionally hard coding things that belong in their own file or configs, and am a hardass about that and file length. I do read some code and hate it being overly long (and it sucks for burning tokens).

FWIW, I typed all this out on my keyboard myself. However, if I ran it through an LLM for cleanup or whatever, the very wall of text itself helps FORCE the LLM to stick to the substantive argument and steers it away from slop prompts. The same applies to code, if you are careful.

jsnell•25m ago
It's nicely symmetrical, because conversely I prefer my LLM-generated code to have no dependencies.
pull_my_finger•12m ago
Ethically, selling code or programs built on other people's code without consent is wrong.

Legally, it's probably also unlawful, unless you believe the smoke they're selling that it was trained on code that was open licensed or in the public domain.

Professionally, it's a poor choice to ship code that wasn't produced with human care and consideration or even thorough oversight or understanding based on recent trends.

Software developers like to call themselves "engineers", but more and more they're showing they're more than happy to be configurators of black boxes of modular software. Whether that means pulling random NPM packages with thousands of other random packages as dependencies (none of which are even browsed or licenses checked), or "vibe coding" slop the LLM spits out.

When the main problem was people assembling random packages, I always likened it to "sandwich artists" at Subway. They just stand behind the counter and configure the product of random combinations of ingredients (someone else's NPM packages). Now it's like they can't even see the selection of ingredients, they just grab handfuls and shove it together until they get something sandwich shaped. Bad times in software.

LandoCalrissian•11m ago
Ah yes, open source will be better with fewer people who can actually write code.
