AI assistance when contributing to the Linux kernel

https://github.com/torvalds/linux/blob/master/Documentation/process/coding-assistants.rst

97•hmokiguess•3h ago

Comments

bitwize•2h ago

Good. The BSDs should follow suit. It is unreasonable to expect any developer not to use AI in 2026.

baggy_trough•2h ago

Sounds sensible.

ipython•2h ago

Glad to see the common-sense rule that only humans can be held accountable for code generated by AI agents.

pixel_popping•1h ago

Literally insane that some projects blanket-ban AI despite it being the human's responsibility in the end.

daveguy•1h ago

Not insane at all. Just a very useful shortcut. Not everyone wants to move fast and break shit.

pixel_popping•1h ago

I still think it's insane, why would you care about the "origin" of the code as long as there is a human accountable (that you can ban anyway)?

59nadir•1h ago

Because you don't want to deal with people who can't write their own code. If they can, the rule will do nothing to stop them from contributing. It'll only matter if they simply couldn't make their contribution without LLMs.

pixel_popping•1h ago

So tomorrow, if a model genuinely finds a bunch of real vulnerabilities, you would just ignore them? That makes no sense.

59nadir•1h ago

An LLM finding problems in code is not the same at all as someone using it to contribute code they couldn't write or haven't written themselves to a project. A report stating "There is a bug/security issue here" is not itself something I have to maintain, it's something I can react to and write code to fix, then I have to maintain that code.

streetfighter64•1h ago

If your doctor told you he used an ouija board to find your diagnosis, would you care about the origin of the diagnosis or just trust that he'll be accountable for it?

pixel_popping•43m ago

If the Ouija board was powered by Opus, who knows :D

pydry•1h ago

And yet it puts a stop to the tsunami of slop and it's pretty much impossible to prove anything of value was lost.

pixel_popping•1h ago

but why? it's a human making the PR and you can shame/ban that human anyway.

yoyohello13•1h ago

> it's a human making the PR

Is it? Remember when that agent wrote a hit piece about the maintainer because he wouldn't merge its PR?

pixel_popping•1h ago

That's a different issue actually.

podgietaru•21m ago

Volume - things take time to review. If you’re inundated with so many PRs then it’s harder to curate in general

qsort•2h ago

Basically the rules are that you can use AI, but you take full responsibility for your commits and code must satisfy the license.

That's... refreshingly normal? Surely something most people acting in good faith can get behind.

galaxyLogic•1h ago

But then if AI output is not under the GNU General Public License, how can it become so just because a Linux developer adds it to the codebase?

afro88•1h ago

Same as if a regular person did the same. They are responsible for it. If you're using AI, check the code doesn't violate licenses

sarchertech•1h ago

How could you do that though? You can’t guarantee that there aren’t chunks of copied code that infringe.

shevy-java•1h ago

But the responsible party is still the human who added the code. Not the tool that helped do so.

sarchertech•1h ago

In a court case the responsible party very well could be the Linux Foundation, because this is a foreseeable consequence of allowing AI contributions. There’s no reasonable way for a human to make such a guarantee while using AI-generated code.

Chance-Device•1h ago

It’s not about the mechanism: responsibility is a social construct, it works the way people say that it works. If we all agree that a human can agree to bear the responsibility for AI outputs, and face any consequences resulting from those outputs, then that’s the whole shebang.

sarchertech•1h ago

Sure we could change the law. It would be a stupid change to allow individuals, organizations, and companies to completely shield themselves from the consequences of risky behaviors (more than we already do) simply by assigning all liability to a fall guy.

bpt3•1h ago

In this case, the "fall guy" is the person who actually introduced the code in question into the codebase.

They wouldn't be some patsy that is around just to take blame, but the actual responsible party for the issue.

sarchertech•38m ago

Imagine you're a factory owner and you need a chemical delivered from across the country, but the chemical is dangerous and if the tanker truck drives faster than 50 miles per hour it has a 0.001% chance per mile of exploding.

You hire an independent contractor and tell him that he can drive 60 miles per hour if he wants to but if it explodes he accepts responsibility.

He does and it explodes, killing 10 people. If the families of those 10 people have evidence that you created the conditions that caused the explosion in order to benefit your company, you're probably going to lose in civil court.

Linus benefits from the increased velocity of people using AI. He doesn't get to put all the liability on the people contributing.

Chance-Device•59m ago

What law exactly are you suggesting needs to be changed? How is this any different from what already happens right now, today?

sarchertech•48m ago

Right now it's very easy not to infringe on copyrighted code if you write the code yourself. In the vast majority of cases, if you infringed it's because you did something wrong that you could have prevented (and in the case where you didn't do anything wrong, independent creation is a defense against a copyright infringement claim).

That is not the case when using AI generated code. There is no way to use it without the chance of introducing infringing code.

Because of that if you tell a user they can use AI generated code, and they introduce infringing code, that was a foreseeable outcome of your action. In the case where you are the owner of a company, or the head of an organization that benefits from contributors using AI code, your company or organization could be liable.

Chance-Device•33m ago

It’s a foreseeable outcome that humans might introduce copyrighted code into the kernel.

I think you’re looking for problems that don’t really exist here, you seem committed to an anti AI stance where none is justified.

sarchertech•7m ago

A human has to willingly violate the law for that to happen, though. There is no way for a human to use AI-generated code that doesn't have a chance of reproducing copyrighted code. That's just expected.

If you don't think this is a problem, take a look at the terms of the enterprise agreements from OpenAI and Anthropic. Companies recognize this is an issue, and so the vendors were forced to add an indemnification clause, explicitly saying they'll pay for any damages resulting from infringement lawsuits.

Cytobit•1h ago

That's not going to shield the Linux organization.

aargh_aargh•1h ago

The practical concern of Linux developers regarding responsibility is not about being able to ban the author; it's that the author should take ongoing care of his contribution.

Andrex•1h ago

Let me introduce you to the concept of submarine patents...

martin-t•1h ago

As opposed to an irregular person?

LLMs are not persons, not even legal ones (which itself is a massive hack causing massive issues such as using corporate finances for political gain).

A human has moral value; a text model does not. A human has limitations in both time and memory available; a model of text does not. I don't see why comparisons to humans have any relevance. Just because a human can do something does not mean machines run by corporations should be able to do it en masse.

The rules of copyright allow humans to do certain things because:

- Learning enriches the human.

- Once a human consumes information, he can't willingly forget it.

- It is impossible to prove how much a human-created intellectual work is based on others.

With LLMs:

- Training (let's not anthropomorphize: lossily-compressing input data by detecting and extracting patterns) enriches only the corporation which owns it.

- It's perfectly possible to create a model based only on content with specific licenses or only public domain.

- It's possible to trace every single output byte to quantifiable influences from every single input byte. It's just not an interesting line of inquiry for the corporations benefiting from the legal gray area.

rzmmm•41m ago

In certain legal cases, a finding of plagiarism can depend on whether the person was exposed to the copyrighted work. AI models are exposed to a very large corpus of works.

panzi•1h ago

If the output is public domain it's fine as I understand it.

galaxyLogic•1h ago

Makes sense to me. But so anybody can take public domain code and place it under the GNU General Public License (by dropping it into a Linux source-code file)?

Surely the person doing so would be responsible for doing so, but are they doing anything wrong?

robinsonb5•1h ago

> Surely the person doing so would be responsible for doing so, but are they doing anything wrong?

You're perfectly at liberty to relicense public domain code if you wish.

The only thing you can't do is enforce the new license against people who obtain the code independently - either from the same source you did, or from a different source that doesn't carry your license.

cwnyth•1h ago

This is correct, and it's not limited to code. I can take the story of Cinderella, create something new out of it, copyright my new work, but Cinderella remains public domain for someone else to do something with.

If I use public domain code in a project under a license, the whole work remains under the license, but not the public domain code.

I'm not sure what the hullabaloo is about.

sambaumann•1h ago

Sqlite’s source code is public domain. Surely if you dropped the sqlite source code into Linux, it wouldn’t suddenly become GPL code? I’m not sure how it works

miki123211•1h ago

Linux code doesn't have to strictly be GPL-only, it just has to be GPL-compatible.

If your license allows others to take the code and redistribute it with extra conditions, your code can be imported into the kernel. AFAIK there are parts of the kernel that are BSD-licensed.

jaggederest•1h ago

The core thing about licenses, in general, is that they only grant new usage. If you can already use the code because it's public domain, they don't further restrict it. The license, in that case, is irrelevant.

Remember that licenses are powered by copyright - granting a license to non-copyrighted code doesn't do anything, because there's no enforcement mechanism.

This is also why copyright reform for software engineering is so important, because code entering the public domain cuts the gordian knot of licensing issues.

martin-t•1h ago

This ruling is IMO/IANAL based on lawyers and judges not understanding how LLMs work internally, falling for the marketing campaign calling them "AI" and not understanding the full implications.

LLM-creation ("training") involves detecting/compressing patterns of the input. Inference generates statistically probable output based on similarities of patterns to those found in the "training" input. Computers don't learn or have ideas; they always operate on representations, and it's nothing more than any other mechanical transformation. It should not erase copyright any more than synonym substitution does.

timmmmmmay•45m ago

fortunately, you aren't only operating on representations, right? lemme check my Schopenhauer right quick...

supern0va•30m ago

>LLM-creation ("training") involves detecting/compressing patterns of the input.

There's a pretty compelling argument that this is essentially what we do, and that what we think of as creativity is just copying, transforming, and combining ideas.

LLMs are interesting because that compression forces distilling the world down into its constituent parts and learning about the relationships between ideas. While it's absolutely possible (or even likely for certain prompts) that models can regurgitate text very similar to their inputs, that is not usually what seems to be happening.

They actually appear to be little remix engines that can fit the pieces together to solve the thing you're asking for, and we do have some evidence that the models are able to accomplish things that are not represented in their training sets.

Kirby Ferguson's video on this is pretty great: https://www.youtube.com/watch?v=X9RYuvPCQUA

noosphr•41m ago

Tab complete does not produce copyrightable material either. Yet we don't require software to be written in nano.

jillesvangurp•41m ago

AIs are not human, and therefore their output is not a human-authored contribution, and only human-authored things are covered by copyright. The work might hypothetically infringe on other people's copyright. But such an infringement does not happen until a human decides to create and distribute a work that somehow integrates that generated code or text.

The solution documented here seems very pragmatic. You as a contributor simply state that you are making the contribution and that you are not infringing on other people's work with that contribution under the GPLv2. And you document the fact that you used AI for transparency reasons.

There is a lot of legal murkiness around how training data is handled, and the output of the models. Or even the models themselves. Is something that in no way or shape resembles a copyrighted work (i.e. a model) actually distributing that work? The legal arguments here will probably take a long time to settle but it seems the fair use concept offers a way out here. You might create potentially infringing work with a model that may or may not be covered by fair use. But that would be your decision.

For small contributions to the Linux kernel it would be hard to argue that a passing resemblance of say a for loop in the contribution to some for loop in somebody else's code base would be anything else than coincidence or fair use.

shevy-java•1h ago

But why should AI then be attributed if it is merely a tool that is used?

plmpsu•1h ago

it makes sense to keep track of what model wrote what code to look for patterns, behaviors, etc.

streetfighter64•1h ago

It isn't?

> AI agents MUST NOT add Signed-off-by tags. Only humans can legally certify the Developer Certificate of Origin (DCO).

They mention an Assisted-by tag, but that also contains stuff like "clang-tidy". Surely you're not interpreting that as people "attributing" the work to the linter?
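To make the difference between the two tags concrete, here is a minimal Python sketch of how the trailers of a commit message could be inspected. It is not the kernel's tooling: the function names, the sample message, and the `ExampleAgent` value are all hypothetical, and the only things taken from the quoted policy are the tag names `Signed-off-by` and `Assisted-by`. The exact value format for `Assisted-by` is defined in the linked document and is deliberately not assumed here.

```python
import re

# Matches git-style trailer lines such as "Signed-off-by: Name <email>".
TRAILER_RE = re.compile(r"^(?P<key>[A-Za-z][A-Za-z-]*):\s*(?P<value>.+)$")


def parse_trailers(message: str) -> list[tuple[str, str]]:
    """Return (key, value) pairs found in the last paragraph of a commit message."""
    paragraphs = [p for p in message.strip().split("\n\n") if p.strip()]
    if not paragraphs:
        return []
    trailers = []
    for line in paragraphs[-1].splitlines():
        match = TRAILER_RE.match(line.strip())
        if match:
            trailers.append((match["key"], match["value"].strip()))
    return trailers


def check_trailers(message: str) -> dict:
    """Separate the sign-off (a certification) from Assisted-by (a disclosure)."""
    trailers = parse_trailers(message)
    signoffs = [v for k, v in trailers if k.lower() == "signed-off-by"]
    assisted = [v for k, v in trailers if k.lower() == "assisted-by"]
    return {
        "has_signoff": bool(signoffs),  # presence only; cannot prove a human wrote it
        "signoffs": signoffs,
        "assisted_by": assisted,
    }


# Hypothetical commit message, for illustration only.
EXAMPLE = """\
mm: fix off-by-one in example helper

Hypothetical commit body.

Signed-off-by: Jane Developer <jane@example.org>
Assisted-by: ExampleAgent
"""

if __name__ == "__main__":
    print(check_trailers(EXAMPLE))
```

A check like this can only tell that a sign-off line is present; whether a human actually reviewed the code is exactly the part the policy leaves to the submitter.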

dataviz1000•2h ago

This is discussed in the Linus vs Linus interview, "Building the PERFECT Linux PC with Linus Torvalds". [0]

[0] https://youtu.be/mfv0V1SxbNA?si=CBnnesr4nCJLuB9D&t=2003

newsoftheday•1h ago

> All code must be compatible with GPL-2.0-only

How can you guarantee that will happen when AI has been trained on a world full of multiple licenses and even closed source material without permission of the copyright owners? I confirmed that with several AIs just now.

tmp10423288442•1h ago

Wait for court cases I suppose - not really Linus Torvalds' job to guess how they'll rule on the copyright of mere training. Presumably having your AI actually consult codebases with incompatible licenses at runtime is more risky.

philipov•1h ago

You take responsibility. That means if the AI messes up, you get punished. No pushing blame onto the stupid computer. If you're not comfortable with that, don't use the AI.

sarchertech•1h ago

There’s no reasonable way for you to use AI generated code and guarantee it doesn’t infringe.

The whole “use it, but if it behaves as expected, it’s your fault” stance is ridiculous.

philipov•1h ago

If you think it's an unacceptable risk to use a tool you can't trust when your own head is on the line, you're right, and you shouldn't use it. You don't have to guarantee anything. You just have to accept punishment.

sarchertech•1h ago

That’s just it, though: it’s not just your head. The liability could very likely also fall on the Linux Foundation.

You can’t say “you can do this thing that we know will cause problems that you have no way to mitigate, but if it does we’re not liable”. The infringement was a foreseeable consequence of the policy.

philipov•9m ago

This policy effectively punts on the question of what tools were used to create the contribution, and states that regardless of how the code was made, only humans may be considered authors.

From the foundation's point of view, humans are just as capable of submitting infringing code as AI is, so the only requirement is that a human takes responsibility for the contribution. If your argument is sound, then how can Linux accept contributors at all?

streetfighter64•1h ago

Yeah, but that's not a useful thing to do because not everybody thinks about that or considers it a problem. If somebody's careless and contributes copyrighted code, that's a problem for Linux too, not only the author.

For comparison, you wouldn't say, "you're free to use a pair of dice to decide what material to build the bridge out of, as long as you take responsibility if it falls down", because then of course somebody would be careless enough to build a bridge that falls down.

Preventing the problem from the beginning is better than ensuring you have somebody to blame for the problem when it happens.

adikso•26m ago

Their position is probably that LLM technology itself does not require training on code with incompatible licenses, and they probably also tend to avoid engaging in the philosophical debate over whether LLM-generated output is a derivative copy or an original creation (like how humans produce similar code without copying after being exposed to code). I think that even if they view it as derivative, they're being pragmatic - they don't want to block LLM use across the board, since in principle you can train on properly licensed, GPL-compatible data.

newsoftheday•1h ago

> That means if the AI messes up

I'm not talking about maintainability or reliability. I'm talking about legal culpability.

dec0dedab0de•1h ago

> All code must be compatible with GPL-2.0-only

Am I being too pedantic if I point out that it is quite possible for code to be compatible with GPL-2.0 and other licenses at the same time? Or is this a term that is well understood?

philipov•1h ago

GPL-2.0-only is the name of a license. One word. It is an alternative to GPL-2.0-or-later.

compyman•1h ago

You might be being too pedantic :)

https://spdx.org/licenses/GPL-2.0-only.html It's a specific GPL license identifier (as opposed to GPL-2.0-or-later).
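As a rough illustration of why the suffix matters, here is a small Python sketch that pulls an `SPDX-License-Identifier` value out of a source comment and tells the two identifiers apart. The helper names and the sample strings are made up for this example; only the identifiers `GPL-2.0-only` and `GPL-2.0-or-later` come from SPDX.

```python
import re

# Captures the license expression up to the end of the line or a closing "*/".
SPDX_LINE_RE = re.compile(r"SPDX-License-Identifier:\s*(?P<expr>[^*\n]+)")


def spdx_identifier(source_text: str):
    """Return the first SPDX license expression in the text, or None if absent."""
    match = SPDX_LINE_RE.search(source_text)
    return match["expr"].strip() if match else None


def permits_later_gpl(identifier: str) -> bool:
    """True only for the identifier that explicitly allows later GPL versions."""
    return identifier == "GPL-2.0-or-later"


# Hypothetical file headers, for illustration only.
ONLY = "/* SPDX-License-Identifier: GPL-2.0-only */\nint x;\n"
LATER = "// SPDX-License-Identifier: GPL-2.0-or-later\nint y;\n"

if __name__ == "__main__":
    for text in (ONLY, LATER):
        ident = spdx_identifier(text)
        print(ident, permits_later_gpl(ident))
```

The point is only that the whole string is the identifier: "only" is part of the license name, not a restriction on what else the code may be compatible with.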

martin-t•1h ago

This feels like the OSS community is giving up.

LLMs are lossily-compressed models of code and other text (often mass-scraped despite explicit non-consent) which has licenses almost always requiring attribution and very often other conditions. Just a few weeks ago a SOTA model was shown to reproduce non-trivial amounts of licensed code[0].

The idea of intelligence being emergent from compression is nothing new[1]. The trick here is giving up on completeness and accuracy in favor of a more probabilistic output which

1) reproduces patterns and interpolates between patterns of training data while not always being verbatim copies

2) serves as a heuristic when searching the solution-space which is further guided by deterministic tools such as compilers, linters, etc. - the models themselves quite often generate complete nonsense, including making up non-existent syntax in well-known mainstream languages such as C#.

I strongly object to anthropomorphising text transformers (e.g. "Assisted-by"). It encourages magical thinking even among people who understand how the models operate, let alone the general public.

Just like stealing fractional amounts of money[3] should not be legal, violating the licenses of the training data by reusing fractional amounts from each should not be legal either.

[0]: https://news.ycombinator.com/item?id=47356000

[1]: http://prize.hutter1.net/

[2]: https://en.wikipedia.org/wiki/ELIZA_effect

[3]: https://skeptics.stackexchange.com/questions/14925/has-a-pro...

tmp10423288442•1h ago

On https://news.ycombinator.com/item?id=47356000, it looks like the user there was intentionally asking about the implementation of the Python chardet library before asking it to write code, right? Not surprising the AI would download the library to investigate it by default, or look for any installed copies of `chardet` on the local machine.

martin-t•1h ago

The comment says "Opus 4.6 without tool use or web access"

KK7NIL•1h ago

> I strongly object to anthropomorphising text transformers (e.g. "Assisted-by").

I don't think this is anthropomorphising, especially considering they also include non-LLM tools in that "Assisted-by" section.

We're well past the Turing test now, whether these things are actually sentient or not is of no pragmatic importance if we can't distinguish their output from a sentient creature, especially when it comes to programming.

martin-t•1h ago

Would you say "assisted by vim" or "assisted by gcc"?

It should be either something like "(partially/completely) generated by" or if you want to include deterministic tools, then "Tools-used:".

The Turing test is an interesting thought experiment but we've seen it's easy for LLMs to sound human-like or make authoritative and convincing statements despite being completely wrong or full of nonsense. The Turing test is not a measure of intelligence, at least not an artificial one. (Though I find it quite amusing to think that the point at which a person chooses to refer to LLMs as intelligence is somewhat indicative of his own intelligence level.)

> whether these things are actually sentient or not is of no pragmatic importance if we can't distinguish their output from a sentient creature, especially when it comes to programming

It absolutely makes a difference: you can't own a human but you can own an LLM (or a corporation which is IMO equally wrong as owning a human).

Humans have needs which must be continually satisfied to remain alive. Humans also have a moral value (a positive one - at least for most of us) which dictates that being rendered unable to remain alive is wrong.

Now, what happens if LLMs have the same legal standing as humans and are thus able to participate in the economy in the same manner?

zbentley•1h ago

If a linter insists on a weird line of code, I’m probably commenting that line as “recommended by whatever-linter”, yes.

shevy-java•1h ago

Fork the kernel!

Humans for humans!

Don't let skynet win!!!

aruametello•24m ago

> Fork the kernel!

pre "clanker-linux".

I am more intrigued by the inevitable Linux distro that will refuse any code that has AI contributions in it.

sarchertech•1h ago

This does nothing to shield Linux from responsibility for infringing code.

This is essentially like a retail store saying the supplier is responsible for eliminating all traces of THC from their hemp when they know that isn’t a reasonable request to make.

It’s a foreseeable consequence. You don’t get to grant yourself immunity from liability like this.

SirHumphrey•1h ago

Quite a lot of companies use and release AI written code, are they all liable?

sarchertech•1h ago

1. Almost definitely if discovered

2. Infringement in closed source code isn’t as likely to be discovered

3. OpenAI and Anthropic enterprise agreements agree to indemnify (pay for damages essentially) companies for copyright issues.

lowsong•1h ago

At least it'll make it easy to audit and replace it all in a few years.

spwa4•52m ago

Why does this file have an extension of .rst? What does that even mean for the file format?

adikso•50m ago

reStructuredText. Just like you have .md files everywhere.

jdreaver•48m ago

https://en.wikipedia.org/wiki/ReStructuredText

This format really took off in the Python community in the 2000s for documentation. The Linux kernel has used it for documentation as well for a while now.

the_biot•43m ago

Linux has fallen. Linus Torvalds is now just another vibe coder. I give it less than a year, or maybe a month, until Linux gets vibe-coded patches approved by LLMs.

Open source is dead, having had its code stolen for use by vibe-coding idiots.

Make no mistake, this is the end of an era.

ninjagoo•24m ago

  > Signed-Off ...
  > The human submitter is responsible for:
    > Reviewing all AI-generated code
    > Ensuring compliance with licensing requirements
    > Adding their own Signed-off-by tag to certify the DCO
    > Taking full responsibility for the contribution

  > Attribution: ... Contributions should include an Assisted-by tag in the following format:

Responsibility assigned to where it should lie. Expected no less from Torvalds, the progenitor of Linux and Git. No demagoguery, no b*.

I am sure that this was reviewed by attorneys before being published as policy, because of the copyright implications.

Hopefully this will set the trend and provide definitive guidance for the many devs who were seeing not only the utility of AI assistance but also the acrimony from some quarters, which caused some fence-sitting.

themafia•10m ago

> All contributions must comply with the kernel's licensing requirements:

I just don't think that's realistically achievable. Unless the models themselves can introspect on the code and detect any potential license violations.

If you get hit with a copyright violation in this scheme I'd be afraid that they're going to hammer you for negligence of this obvious issue.
