Why is a human downloading a file called pirating and AI scraping called training?

18•nutanc•1mo ago

In this age of AI, with respect to copyright, it looks like AI has more freedoms than humans. Sites like Anna's Archive, The Pirate Bay, etc. are blocked for humans (in India, for example), and if you download and read a book, it's called piracy. But if the same book is fed to an AI for training, apparently it's fine and dandy. So Artificial Intelligence has more freedoms than Actual Intelligence?

Comments

chasing0entropy•1mo ago

They are the same; however, one has the money to defend against the accusations.

aebtebeten•1mo ago

Recall the "Golden Rule": those who have the gold make the rules.

gsf_emergency_6•1mo ago

Used to be "eine Handvoll Soldaten" so it's progress!

aebtebeten•1mo ago

I got nerdsniped by this quantity, and here's more or less where I wound up after monte-carlo sampling some rabbit holes:

- the Praetorian Guard, famous for having been involved in many a Roman imperial coup, varied in size between 4'500 and 6'000.

- on 18 Brumaire VIII, Napoleon had at least 6'000 men at his disposal.

- modern brigades are around 5'000.

- the smallest successful coups since 2010 have been in Africa, with force estimates of 4'000-6'000.

At least in the pre-drone era, "handful" has quantitatively meant at least several thousand (although it's probably true that any political component would have to liaise with only a few senior officers, and a modern brigade is composed of a handful of modern battalions)

[note that Napoleon's was recursively a coup-within-a-coup; his political partners thought they were the brains and he was the muscle, but events proved them mistaken]

gsf_emergency_6•1mo ago

Gemini offers

https://en.wikipedia.org/wiki/2016_Turkish_coup_attempt

as an example of a modern failed coup where exactly 5 soldiers were killed (on the gov side)

(Not sure if drones were included in the planning)

While at least one of the dead on the other side was a history teacher

https://en.wikipedia.org/wiki/G%C3%B6khan_A%C3%A7%C4%B1kkoll...

dauertewigkeit•1mo ago

Western politics is all about constructing these narratives that hide the hypocrisy and self-serving nature of the dominant political factions. You can see it everywhere, but this is one clear example of it.

Nextgrid•1mo ago

Same reason that when a person lies (sometimes even by omission) it's called "fraud" but when a company does it it's just business as usual, or at worst, a "mistake" resolved by employee training.

_wire_•1mo ago

Time to "train" on Marshall McLuhan:

See The Gutenberg Galaxy (book) (1962)

McLuhan's Wake (documentary movie, narrated by Laurie Anderson) (2002)

Re Wake: Listen to the accompanying full interviews with McLuhan's colleagues from which the documentary is drawn.

ben_w•1mo ago

I do not take strong views of what "should" be; the following is merely my opinion on what "is".

The legal judgement in the case of Anthropic may answer your question, although with the caveat that I'm not a lawyer, that I have no legal training, and that I may be misreading what looks like plain language but which has an importantly different meaning in law.

The judgement is here: https://cases.justia.com/federal/district-courts/california/...

To quote parts of the section "overall analysis" (page 30):

  The copies used to train specific LLMs were justified as a fair use. Every factor but the nature of the copyrighted work favors this result. The technology at issue was among the most transformative many of us will see in our lifetimes.

…

  The downloaded pirated copies used to build a central library were not justified by a fair use. Every factor points against fair use. Anthropic employees said copies of works (pirated ones, too) would be retained “forever” for “general purpose” even after Anthropic determined they would never be used for training LLMs. A separate justification was required for each use. None is even offered here except for Anthropic’s pocketbook and convenience.

In a way, this seems to be a repeat of the "The 'L' in 'ML' is 'learning'" argument:

You are not allowed to use the photocopier in the library to make a copy of the entire book. If your local library is anything like the ones I remember back in the UK, there's even a sign right next to the photocopier telling you this.

You are in fact allowed to go to a public library, learn things from the books within, and apply that knowledge without paying anything to any copyright holder. The same applies if/once you buy a book, because once it's been bought you don't owe the copyright holder anything for having learned something. This is the point of a library, of education, and indeed of copyright: the word is literally the right to make a copy, as in giving authors control over who has the right to make a copy. It is not the right to an eternal rent from what is learned by reading a copy.

(If you then over-train a model so it does print verbatim copies, this is bad for both legal and technical reasons: legal, because it's a copy; technical, because using a neural net to do a lossy compression of documents is a terrible waste of resources, which is just like humans in exactly the way that nobody has any interest in reproducing in silicon).

markus_zhang•1mo ago

Because ordinary people don’t make calls.

ThrowawayR2•1mo ago

There was a series of white papers commissioned by the FSF a while ago on Copilot, when it was first released. One of them was "Copyright Implications of the Use of Code Repositories to Train a Machine Learning Model", and its lead author was a professor of law. The analysis concluded that use of copyrighted works for training was legally defensible. The paper is here: https://www.fsf.org/licensing/copilot/copyright-implications...

general1465•1mo ago

Buy a cheap computer, like an X99 board with a Xeon from AliExpress, add a cheap GPU like a Tesla K80, and "train" your LLMs on it. Now you can pirate what you want and you are untouchable, because every big AI company will give you lawyers free of charge: if a judge decided against you, the precedent would be against them as well.
