AI assistance when contributing to the Linux kernel

https://github.com/torvalds/linux/blob/master/Documentation/process/coding-assistants.rst

62•hmokiguess•2h ago

Comments

bitwize•1h ago

Good. The BSDs should follow suit. It is unreasonable to expect any developer not to use AI in 2026.

baggy_trough•57m ago

Sounds sensible.

ipython•51m ago

Glad to see the common-sense rule that only humans can be held accountable for code generated by AI agents.

pixel_popping•30m ago

Literally insane that some projects blanket-ban AI despite it being the human's responsibility in the end.

daveguy•26m ago

Not insane at all. Just a very useful shortcut. Not everyone wants to move fast and break shit.

pixel_popping•25m ago

I still think it's insane, why would you care about the "origin" of the code as long as there is a human accountable (that you can ban anyway)?

59nadir•18m ago

Because you don't want to deal with people who can't write their own code. If they can, the rule will do nothing to stop them from contributing. It'll only matter if they simply couldn't make their contribution without LLMs.

pixel_popping•16m ago

So tomorrow, if a model genuinely finds a bunch of real vulnerabilities, you would just ignore them? That makes no sense.

59nadir•10m ago

An LLM finding problems in code is not the same at all as someone using it to contribute code they couldn't write or haven't written themselves to a project. A report stating "There is a bug/security issue here" is not itself something I have to maintain, it's something I can react to and write code to fix, then I have to maintain that code.

pydry•20m ago

And yet it puts a stop to the tsunami of slop and it's pretty much impossible to prove anything of value was lost.

pixel_popping•19m ago

but why? it's a human making the PR and you can shame/ban that human anyway.

qsort•51m ago

Basically the rules are that you can use AI, but you take full responsibility for your commits and code must satisfy the license.

That's... refreshingly normal? Surely something most people acting in good faith can get behind.

galaxyLogic•42m ago

But then if AI output is not under GNU General Public License, how can it become so just because a Linux-developer adds it to the code-base?

afro88•39m ago

Same as if a regular person did the same. They are responsible for it. If you're using AI, check the code doesn't violate licenses

sarchertech•30m ago

How could you do that though? You can’t guarantee that there aren’t chunks of copied code that infringe.

shevy-java•22m ago

But the responsible party is still the human who added the code. Not the tool that helped do so.

sarchertech•18m ago

In a court case the responsible party very well could be the Linux Foundation because this is a foreseeable consequence of allowing AI contributions. There’s no reasonable way for a human to make such a guarantee while using AI generated code.

Chance-Device•1m ago

It’s not about the mechanism: responsibility is a social construct, it works the way people say that it works. If we all agree that a human can agree to bear the responsibility for AI outputs, and face any consequences resulting from those outputs, then that’s the whole shebang.

Cytobit•11m ago

That's not going to shield the Linux organization.

aargh_aargh•9m ago

The practical concern of Linux developers regarding responsibility is not about being able to ban the author; it's that the author should take ongoing care of his contribution.

martin-t•14m ago

As opposed to an irregular person?

LLMs are not persons, not even legal ones (which itself is a massive hack causing massive issues such as using corporate finances for political gain).

A human has moral value; a text model does not. A human has limitations in both time and memory available; a model of text does not. I don't see why comparisons to humans have any relevance. Just because a human can do something does not mean machines run by corporations should be able to do it en masse.

The rules of copyright allow humans to do certain things because:

- Learning enriches the human.

- Once a human consumes information, he can't willingly forget it.

- It is impossible to prove how much a human-created intellectual work is based on others.

With LLMs:

- Training (let's not anthropomorphize: lossily-compressing input data by detecting and extracting patterns) enriches only the corporation which owns it.

- It's perfectly possible to create a model based only on content with specific licenses or only public domain.

- It's possible to trace every single output byte to quantifiable influences from every single input byte. It's just not an interesting line of inquiry for the corporations benefiting from the legal gray area.

panzi•39m ago

If the output is public domain it's fine as I understand it.

galaxyLogic•32m ago

Makes sense to me. But so anybody can take public domain code and place it under the GNU General Public License (by dropping it into a Linux source-code file)?

Surely the person doing so would be responsible for doing so, but are they doing anything wrong?

robinsonb5•25m ago

> Surely the person doing so would be responsible for doing so, but are they doing anything wrong?

You're perfectly at liberty to relicense public domain code if you wish.

The only thing you can't do is enforce the new license against people who obtain the code independently - either from the same source you did, or from a different source that doesn't carry your license.

cwnyth•14m ago

This is correct, and it's not limited to code. I can take the story of Cinderella, create something new out of it, copyright my new work, but Cinderella remains public domain for someone else to do something with.

If I use public domain code in a project under a license, the whole work remains under the license, but not the public domain code.

I'm not sure what the hullabaloo is about.

sambaumann•20m ago

Sqlite’s source code is public domain. Surely if you dropped the sqlite source code into Linux, it wouldn’t suddenly become GPL code? I’m not sure how it works

miki123211•16m ago

Linux code doesn't have to strictly be GPL-only, it just has to be GPL-compatible.

If your license allows others to take the code and redistribute it with extra conditions, your code can be imported into the kernel. AFAIK there are parts of the kernel that are BSD-licensed.

martin-t•6m ago

This ruling is IMO/IANAL based on lawyers and judges not understanding how LLMs work internally, falling for the marketing campaign calling them "AI" and not understanding the full implications.

LLM-creation ("training") involves detecting/compressing patterns of the input. Inference generates statistically probable output based on similarities of patterns to those found in the "training" input. Computers don't learn or have ideas, they always operate on representations, it's nothing more than any other mechanical transformation. It should not erase copyright any more than synonym substitution.

shevy-java•23m ago

But why should AI then be attributed if it is merely a tool that is used?

dataviz1000•45m ago

This is discussed in the Linus vs Linus interview, "Building the PERFECT Linux PC with Linus Torvalds". [0]

[0] https://youtu.be/mfv0V1SxbNA?si=CBnnesr4nCJLuB9D&t=2003

newsoftheday•36m ago

> All code must be compatible with GPL-2.0-only

How can you guarantee that will happen when AI has been trained on a world full of multiple licenses and even closed-source material without permission of the copyright owners... I confirmed that with several AIs just now.

tmp10423288442•32m ago

Wait for court cases I suppose - not really Linus Torvalds' job to guess how they'll rule on the copyright of mere training. Presumably having your AI actually consult codebases with incompatible licenses at runtime is more risky.

philipov•31m ago

You take responsibility. That means if the AI messes up, you get punished. No pushing blame onto the stupid computer. If you're not comfortable with that, don't use the AI.

sarchertech•26m ago

There’s no reasonable way for you to use AI generated code and guarantee it doesn’t infringe.

The whole “use it, but if it behaves as expected, it’s your fault” stance is ridiculous.

philipov•23m ago

If you think it's an unacceptable risk to use a tool you can't trust when your own head is on the line, you're right, and you shouldn't use it. You don't have to guarantee anything. You just have to accept punishment.

sarchertech•13m ago

That’s just it though, it’s not just your head. The liability could very likely also fall on the Linux Foundation.

You can’t say “you can do this thing that we know will cause problems that you have no way to mitigate, but if it does we’re not liable”. The infringement was a foreseeable consequence of the policy.

newsoftheday•6m ago

> That means if the AI messes up

I'm not talking about maintainability or reliability. I'm talking about legal culpability.

dec0dedab0de•32m ago

> All code must be compatible with GPL-2.0-only

Am I being too pedantic if I point out that it is quite possible for code to be compatible with GPL-2.0 and other licenses at the same time? Or is this a term that is well understood?

philipov•27m ago

GPL-2.0-only is the name of a license. One word. It is an alternative to GPL-2.0-or-later.

compyman•26m ago

You might be being too pedantic :)

https://spdx.org/licenses/GPL-2.0-only.html It's a specific GPL license (as opposed to GPL-2.0-or-later)

martin-t•26m ago

This feels like the OSS community is giving up.

LLMs are lossily-compressed models of code and other text (often mass-scraped despite explicit non-consent) which has licenses almost always requiring attribution and very often other conditions. Just a few weeks ago a SOTA model was shown to reproduce non-trivial amounts of licensed code[0].

The idea of intelligence being emergent from compression is nothing new[1]. The trick here is giving up on completeness and accuracy in favor of a more probabilistic output which

1) reproduces patterns and interpolates between patterns of training data while not always being verbatim copies

2) serves as a heuristic when searching the solution-space which is further guided by deterministic tools such as compilers, linters, etc. - the models themselves quite often generate complete nonsense, including making up non-existent syntax in well-known mainstream languages such as C#.

I strongly object to anthropomorphising text transformers (e.g. "Assisted-by"). It encourages magical thinking[2] even among people who understand how the models operate, let alone the general public.

Just like stealing fractional amounts of money[3] should not be legal, violating the licenses of the training data by reusing fractional amounts from each should not be legal either.

[0]: https://news.ycombinator.com/item?id=47356000

[1]: http://prize.hutter1.net/

[2]: https://en.wikipedia.org/wiki/ELIZA_effect

[3]: https://skeptics.stackexchange.com/questions/14925/has-a-pro...

tmp10423288442•13m ago

On https://news.ycombinator.com/item?id=47356000, it looks like the user there was intentionally asking about the implementation of the Python chardet library before asking it to write code, right? Not surprising the AI would download the library to investigate it by default, or look for any installed copies of `chardet` on the local machine.

martin-t•4m ago

The comment says "Opus 4.6 without tool use or web access"

KK7NIL•5m ago

> I strongly object to anthropomorphising text transformers (e.g. "Assisted-by").

I don't think this is anthropomorphising, especially considering they also include non-LLM tools in that "Assisted-by" section.

We're well past the Turing test now, whether these things are actually sentient or not is of no pragmatic importance if we can't distinguish their output from a sentient creature, especially when it comes to programming.

shevy-java•23m ago

Fork the kernel!

Humans for humans!

Don't let skynet win!!!

sarchertech•20m ago

This does nothing to shield Linux from responsibility for infringing code.

This is essentially like a retail store saying the supplier is responsible for eliminating all traces of THC from their hemp when they know that isn’t a reasonable request to make.

It’s a foreseeable consequence. You don’t get to grant yourself immunity from liability like this.

SirHumphrey•5m ago

Quite a lot of companies use and release AI written code, are they all liable?

sarchertech•2m ago

1. Almost definitely if discovered

2. Infringement in closed source code isn’t as likely to be discovered

3. OpenAI and Anthropic enterprise agreements agree to indemnify (pay for damages essentially) companies for copyright issues.
