Gentoo AI Policy

https://wiki.gentoo.org/wiki/Project:Council/AI_policy

71•simonpure•4h ago

Comments

perching_aix•1h ago

Dated 2024-04-14 and features nothing special.

tptacek•1h ago

Interestingly --- while I doubt it would make a difference to the decision Gentoo in particular would make --- the cost/benefit of LLMs for coding changed sharply just a month or two after this, when the first iteration of foundation models tuned for effective agents came out. People forget that effective coding agents are just a couple minutes old; the first research preview release of Claude Code was this past February.

blibble•1h ago

> the cost/benefit of LLMs for coding changed sharply just a month or two after this

no, "AI" was dogshit a year ago when post was written, "AI" is dogshit today, and "AI" will still be dogshit in a year's time

and if it was worth using (which it isn't), there's still the other two points: ethics and copyright

and don't tell me to "shove this concern up your ass."

(quoted verbatim from Ptacek's magnum opus: https://fly.io/blog/youre-all-nuts/)

jatora•50m ago

lol.

tldr: "cope"

tptacek•42m ago

We're not supposed to write comments like this.

malfist•41m ago

> the cost/benefit of LLMs for coding changed sharply just a month or two after this

People say this every month.

tptacek•20m ago

Do they? I'm referring to something specific. While I happen to think LLM coding agents are pretty great, my point didn't depend on you thinking that, only on a recognition of the fact that the capabilities of these systems sharply changed very shortly after they published this --- in a very specific, noticeable way.

notherhack•40m ago

Important point. A lot has changed in coding AIs since then.

ares623•1h ago

Maybe we’ll see a (new) distro with AI assisted maintainers. That would be an interesting experiment.

Unfortunately one caveat would be it will be difficult to separate the maintainers from the financial incentives, so it won’t be a fair comparison. (e.g. the labs funding full time maintainers with salaries and donations that other distros can only dream of)

logicprog•1h ago

There are reasonable ethical concerns one may have with AI (around data center impacts on communities, and the labor used to SFT and RLHF them), but these aren't:

> Commercial AI projects are frequently indulging in blatant copyright violations to train their models.

I thought we (FOSS) were anti copyright?

> Their operations are causing concerns about the huge use of energy and water.

This is massively overblown. If they'd specifically said that their concerns were around the concentrated impact of energy and water usage on specific communities, fine, but then you'd have to have ethical concerns about a lot of other tech, including video streaming. But the overall energy and water usage of AI contributed to by the actual individual use of AI to, for instance, generate a PR, is completely negligible on the scale of tech products.

> The advertising and use of AI models has caused a significant harm to employees and reduction of service quality.

Is this talking about automation? You know what else automated employees and can often reduce service quality? Software.

> LLMs have been empowering all kinds of spam and scam efforts.

So did email.

bleepblap•1h ago

>> Commercial AI projects are frequently indulging in blatant copyright violations to train their models.

> I thought we (FOSS) were anti copyright?

Absolutely not! Every major FOSS license uses copyright as its enforcement mechanism: "if you don't do X (share code with customers, etc., depending on the license), you lose the right to copy the code."

AdieuToLogic•1h ago

>> Commercial AI projects are frequently indulging in blatant copyright violations to train their models.

> I thought we (FOSS) were anti copyright?

No free and open source software (FOSS) distribution model is "anti-copyright." Quite to the contrary, FOSS licenses are well defined[0] and either address copyright directly or rely on copyright being retained by the original author.

0 - https://opensource.org/licenses

bombcar•54m ago

Some of the ideas behind the GPL could be anti-copyright, insofar as the concept they’d love to see is software being uncopyrightable.

Veedrac•31m ago

I get why water use is the sort of nonsense that spreads around mainstream social media, but it baffles me how a whole council of nerds would pass a vote on a policy that includes that line.

simianwords•25m ago

Because it is ideologically motivated.

CursedSilicon•11m ago

>I thought we (FOSS) were anti copyright?

FOSS still has to exist within the rules of the system the planet operates under. You can't just say "I downloaded that movie, but I'm a Linux user so I don't believe in copyright" and get away with it

>the overall energy and water usage of AI contributed to by the actual individual use of AI to, for instance, generate a PR, is completely negligible on the scale of tech products.

[citation needed]

>Is this talking about automation? You know what else automated employees and can often reduce service quality? Software.

Disingenuous strawman. Tech CEOs and the like have been exuberant at the idea that "AI" will replace human labor. The entire end goal of companies like OpenAI is to create a "super-intelligence" that will then generate a return. By definition, the AI would be performing labor (services) for capital, outcompeting humans to do so. Unless OpenAI wants it to just hack every bank account on Earth and transfer the money to them instead? Or something equally farcical.

>So did email.

"We should improve society somewhat"

"Ah, but you participate in society! Curious!"

AdieuToLogic•1h ago

Perhaps the most telling portion of their decision is:

  Quality concerns. Popular LLMs are really great at 
  generating plausibly looking, but meaningless content. They 
  are capable of providing good assistance if you are careful 
  enough, but we can't really rely on that. At this point, 
  they pose both the risk of lowering the quality of Gentoo 
  projects, and of requiring an unfair human effort from 
  developers and users to review contributions and detect the 
  mistakes resulting from the use of AI.

The first non-title sentence is the most notable to consider, with the rest providing reasoning difficult to refute.

perching_aix•1h ago

> with the rest providing reasoning difficult to refute

It's literally just an opinion.

AdieuToLogic•1h ago

>> with the rest providing reasoning difficult to refute

> It's literally just an opinion.

The definition of "refute"[0] is:

  to prove wrong by argument or evidence : show to be false or erroneous

0 - https://www.merriam-webster.com/dictionary/refute

perching_aix•1h ago

Thanks for the dictionary link, here's one for you too: https://www.merriam-webster.com/dictionary/opinion

You may notice that opinions are like assholes: everyone has theirs. They're literally just "thoughts and feelings". They may masquerade as arguments from time to time, much to my dismay, but rest assured: there's nothing to "refute", debate, or even dispute about them. Not in general, nor in this specific case either.

Consider:

"This show is shit. The pacing is terrible, and the writing is cringe. Nobody should watch this garbage."

A reasoning most difficult to refute indeed. It's shit, because the person thinks it's shit, just in two other, more specific ways. And nobody should watch it, because they think it's shit, and (presumably) think that shit shows shouldn't be watched. A most terrific derivation of all time for sure.

Thinking an opinion can be refuted is analogous to thinking that definitions can be proven false. Which is to say, utterly misguided and plain wrong. They (opinions) are assertions of belief, with any use of logic in them just being rhetorical garnish, if not outright a diversion.

sgarland•53m ago

But definitions can be, and are, proven false. I hate it, mind you, but I can’t ignore it. For example, the usage of “literally” as an intensifier, e.g. “I literally died of laughter.”

Eisenstein•48m ago

But that is their whole point -- as much as you want to make the definition something else, you can't. And this is a perfect example of that.

perching_aix•45m ago

Logical statements can be proven true or false. Definitions are not logical statements; they do not have truth values, and therefore can be proven neither true nor false. These are mathematical logic basics.

AdieuToLogic•30m ago

There are three possible explanations for the published policy I can identify. If there are others, please feel free to share them.

  1 - Publicity stunt

  In an effort to get more attention for the Gentoo project,
  the maintainers created an outlandish policy to drive
  traffic.  This would seem unlikely due to the policy
  decision being voted upon over a year ago.

  2 - Fear of LLMs replacing Gentoo maintainers

  This appears to not be the case based on the Gentoo minutes[0] provided:

  Policy on AI contributions and tooling
  ======================================

  Motion from the email thread:
  > It is expressly forbidden to contribute to Gentoo any content that has
  > been created with the assistance of Natural Language Processing
  > artificial intelligence tools.  This motion can be revisited, should
  > a case been made over such a tool that does not pose copyright, ethical
  > and quality concerns.

  The vote was 6y/0n/1a (all present members voted yes).

  sam noted as obiter dicta that the mail also mentioned:
  > This explicitly covers all GPTs, including ChatGPT and Copilot, which is
  > the category causing the most concern at the moment.  At the same time,
  > it doesn't block more specific uses of machine learning to problem
  > solving.

  Several council members noted that we will revisit the policy if and
  when circumstances change and that it isn't intended to permanent,
  at least not in its current form.

  3 - Experience with LLM-based change requests

  If the policy is neither a publicity stunt nor fear of
  LLMs replacing maintainers, then the simplest explanation
  remaining which substantiates the policy is maintainers
  having experience with LLM use and then publishing their
  decisions therein.

0 - https://projects.gentoo.org/council/meeting-logs/20240414-su...

perching_aix•25m ago

Was this meant in response to what I wrote, or did you mean to post this elsewhere in the thread? If the former, I'm not sure what I am supposed to do with this.

ants_everywhere•4m ago

You're missing a very important reason

4 - There is a very active anti-LLM activist movement and they care more about participating in it than they care about free software.

For example, see their rationale, which is just a list of canned anti-LLM activist talking points. You see the same ones repeated and memed ad nauseam if you lurk in anti-AI spaces.

johnfn•25m ago

But it's also difficult to prove it correct by argument or evidence. "Refute" is typically used in a context that suggests that the thing we're refuting has a strong likelihood of being true. This is only difficult to prove incorrect because it's a summary of the author's opinion.

jjmarr•10m ago

I've been using AI to contribute to LLVM, which has a liberal policy.

The code is of terrible quality and I am at 100+ comments on my latest PR.

That being said, my latest PR is my second-ever to LLVM and is an entire linter check. I am learning far more about compilers at a much faster pace than if I took the "normal route" of tiny bugfixes.

I also try to do review passes on my own code before asking for code review to show I care about quality.

LLMs increase the review burden a ton, but I would say it can be a fair tradeoff.

paulcole•8m ago

How is it telling at all?

It’s just what every other tech bro on here wants to believe, that using LLM code is somehow less pure than using free-range-organic human written code.

hsbauauvhabzb•1h ago

> Their operations are causing concerns about the huge use of energy and water.

I’d be curious how much energy gentoo consumes versus a binary distro.

hjdjeiejd•1h ago

This is on-brand.

There was a time that I used Gentoo, and I may again one day, but for the past N years I’ve not had time to compile everything from source. Compiling from source also gives a false sense of security, since you still don’t know what’s been compromised (it could be the compiler, etc.), and few have the time or expertise to adequately review all of the code.

It can be a waste of energy and time to compile everything from source for standard hardware.

But, when I’m retired, maybe I’ll use it again just for the heck of it. And I’m glad that Gentoo exists.

atrettel•57m ago

At least when I used Gentoo, the point of compiling from source was more about customization than security. I remember having to set so many different options. It was quite granular. Now I just compile certain things from scratch and modify them as needed rather than having an entire system like Gentoo do that, but I do see the appeal to some people.

bombcar•51m ago

This is exactly why I use it where I use it - on my servers. I don’t need to compile X or X support for programs that could have it, because they’re headless.

mikepurvis•50m ago

Nix is another route as far as a compile-from-source package manager with lots of options on many packages.

sgarland•49m ago

Granted, I wasn’t into Arch at the time, but in the mid-aughts, Gentoo’s forums were a massively useful resource for Linux knowledge in general. That’s why I used it, anyway. The joy of getting an obscure sound card (Chaintech AV-710) to work in Linux, and sharing that knowledge with others, was enough.

jimmaswell•27m ago

I use it on some systems powerful enough that most emerges take hardly any longer than a binary package install. It's pretty nice there.

danpalmer•1h ago

This is a prime example of poor AI policy. It doesn't define what AI is – is using Google translate in order to engage on their mailing lists allowed? Is using Intellisense-like tools that we've had for decades allowed? The rationale is also poor, citing concerns that can be applied far more widely than just LLMs. The ethical concerns are pretty hand-wavy, I'm pretty sure email is used to empower spam and yet I suspect Gentoo have no problem using email.

The end result is not necessarily a bad one, and I think reasonable for a project like Gentoo to go for, but the policy could be stated in a much better way.

For example: thou shalt only contribute code that is unencumbered by copyright issues, contributions must be of a high quality and repeated attempts to submit poor quality contributions may result in new contributions not being reviewed/accepted. As for the ethical concerns, they could just take a position by buying infrastructure from companies that align with their ethics, or not accepting corporate donations (time or money) from companies that they disagree with.

Spivak•1h ago

Or, because this is a policy by and for human adults who all understand what we're talking about, you just don't accept contributions from anyone obviously rule-lawyering in bad faith.

This isn't a court system, anyone intentionally trying to test the boundaries probably isn't someone you want to bother with in the first place.

dmead•59m ago

> It doesn't define what AI is

this is a bad faith comment.

malfist•38m ago

The whole argument smacks of bad faith "yet you participate in society" arguments.

simianwords•38m ago

> Ethical concerns. The business side of AI boom is creating serious ethical concerns. Among them: Commercial AI projects are frequently indulging in blatant copyright violations to train their models. Their operations are causing concerns about the huge use of energy and water. The advertising and use of AI models has caused a significant harm to employees and reduction of service quality. LLMs have been empowering all kinds of spam and scam efforts.

Highly disingenuous. First, AI being trained on copyrighted data is considered fair use because it transforms the underlying data rather than distributing it as is. Though I have to agree that this is the strongest of their ethical claims against using AI, it stands weak when looked at as a whole.

The fact that they mentioned "energy and water use" should tell you that they are really looking for reasons to disparage AI. AI doesn't use any more water or energy than any other tool. An hour of Netflix uses the same energy as more than 100 GPT questions. A single 10-hour flight (per person) emits as much as around 100k GPT prompts. It is strange that one would repeat the same nonsense about AI unless the primary motive were ideological.

"The advertising and use of AI models has caused a significant harm to employees and reduction of service quality." this is just a shoddy opinion at this point.

To be clear - I understand why they might ban AI for code submissions. It reduces the barrier significantly and increases the noise. But the reasoning is motivated from a wrong place.

ses1984•20m ago

The idea that models are transformative is debatable. Works with copyright are the thing that imbues the model with value. If that statement isn’t true, then they can just exclude those works and nothing is lost, right?

Also, half the problem isn’t distribution, it’s how those works were acquired. Even if you suppose models are transformative, you can’t just download stuff from piratebay. Buy copies, scan them, rip them, etc.

It’s super not cool that billion dollar vc companies can just do that.

simianwords•12m ago

> In Monday's order, Senior U.S. District Judge William Alsup supported Anthropic's argument, stating the company's use of books by the plaintiffs to train their AI model was acceptable.

> "The training use was a fair use," he wrote. "The use of the books at issue to train Claude and its precursors was exceedingly transformative."

I agree it is debatable, but it is not so cut and dried that it is _not_ transformative when a judge has ruled that it is.

shmerl•18m ago

I don't get this idea. Transformative works don't automatically equal fair use; copyright covers all kinds of transformative works.

CursedSilicon•15m ago

That's quite a strawman definition of "copyright infringement" especially given the ongoing Anthropic lawsuit

It's not a question of whether feeding all the world's books into a blender and eating the resulting slurry paste is copyright infringement. It's that they stole the books in the first place by getting them from piracy websites.

If they'd purchased every book ever written, scanned them in and fed that into the model? That would be perfectly legal

steveklabnik•11m ago

That’s what happened; the initial piracy was an issue, but those models were never released, and the models that were released were trained on copyrighted works they purchased.

mmaunder•36m ago

Posted April 2024. I wonder how they feel about this now, or will next year. Claude Code wouldn’t exist for another year when this was posted, never mind Codex. It’s already awkward. Within 12 months it will be cringeworthy.
