
US Copyright Office: Generative AI Training [pdf]

https://www.copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-3-Generative-AI-Training-Report-Pre-Publication-Version.pdf
61•dave1629•9mo ago

Comments

dave1629•9mo ago
From the Conclusion: "In applying current law, we conclude that several stages in the development of generative AI involve using copyrighted works in ways that implicate the owners’ exclusive rights. The key question, as most commenters agreed, is whether those acts of prima facie infringement can be excused as fair use. ... But making commercial use of vast troves of copyrighted works to produce expressive content that competes with them in existing markets, especially where this is accomplished through illegal access, goes beyond established fair use boundaries. ... These groundbreaking technologies should benefit both the innovators who design them and the creators whose content fuels them, as well as the general public."
yieldcrv•9mo ago
So many issues with that. The Copyright Office doesn’t police access, which involves consuming; the Copyright Office polices distribution.

So then, for them to determine fair use, they need the Department of Justice involved to say the access was illegal? Since when? Just to highlight the absurdity: “illegal” meaning a terms of service violation, despite the fact that everyone using the service can consume copyrighted works? This circles back to the now paradoxical issue: consuming isn't copyright infringement, yet this standard would require the Copyright Office to police terms of service, which is impossible.

This is too paradoxical to even entertain, but that's why the office led with “current law”: current law is completely unaccommodating to a real social problem. A lot of artists and other people are uncomfortable with the current law and with generative AI. New law could patch this, except:

Artists don't actually like the generative AI that isn't trained on copyrighted works either.

The laws are going to change too slowly, and there are already models that meet the high bar that detractors started with.

Those models are trained on new works that were specifically licensed for use in AI training, with the creators compensated.

The outcome is still the same. More people can express themselves. People with years of discipline are no longer needed.

By the time any law could actually address noncompliant models (under this new imagined standard), compliant models will already have made the same trade obsolete.

comex•9mo ago
FYI, the Copyright Office doesn’t enforce copyright law or determine its correct interpretation. Courts do. The legal analysis in this report is really just a suggestion, and judges probably won’t give it too much weight.

As for illegal access, I agree that the report uses the term a bit too loosely. But as we’ve seen in the Meta case, some companies have obtained training material not through TOS-violating downloads but through literal (unauthorized) torrents. As we’ve also seen in the Meta case, even torrenting is technically not copyright infringement if you’re not seeding. But the process does rely on someone else seeding, so the report doesn’t seem wholly unreasonable in suggesting that this could “reflect bad faith” or “bear on the character of the use”.

jawon•9mo ago
This is a standard book copyright notice:

All rights reserved. No part of this publication may be reproduced, distributed, or transmitted in any form or by any means, including photocopying, recording, or other electronic or mechanical methods, without the prior written permission of the publisher, except as permitted by U.S. copyright law.

“Reproduced” and “electronic” are the relevant terms here.

I remember when GPT-3 came out and you could get it to spit out chunks of Harry Potter, and I wondered why no one was being sued.

The models are built on copyright infringement. Authors and publishers of any kind should be able to opt out of being included in training data and ideally opt-in should be the default.

And I hope one day someone trains a model without the use of works of fiction and we find a qualitative difference in their performance. Does a coding model really need to encode the customs, mores and concerns of Victorian era fictional characters to write a python function?

yieldcrv•9mo ago
> except as permitted by U.S. copyright law.

These are the relevant terms to me. That notice isn’t law at all, and in the law the exceptions make the rule.

MoonGhost•9mo ago
Did they manage to come up with recommendations, other than to stop it all? In that case, we have DeepSeek R1. China will be happy, as Trump will have to force Nvidia to send its best chips there.
momothereal•9mo ago
The head of the US Copyright Office has since been fired: https://www.cbsnews.com/news/trump-fires-director-of-u-s-cop...
yieldcrv•9mo ago
Pwned. So that's Elon’s $300 million for unfettered control of the state. Aspirational.
adt•9mo ago
Part 1 (replicas) https://copyright.gov/ai/Copyright-and-Artificial-Intelligen...

Part 2 (copyrightability) https://copyright.gov/ai/Copyright-and-Artificial-Intelligen...

Part 3 (GenAI training) https://www.copyright.gov/ai/Copyright-and-Artificial-Intell...

Analysis in previous and upcoming editions of The Memo: https://lifearchitect.ai/memo/

aspenmayer•9mo ago
> Analysis in previous and upcoming editions of The Memo: https://lifearchitect.ai/memo/

I couldn't actually find any articles about this news on your Substack. The newest post I saw was from last month. Could you link to where you discuss OP?

kelseyfrog•9mo ago
Footnote one is where the whole thing goes off the rails. The Copyright Office asserts that the works in question are not merely "data" in the ordinary sense, but somehow "embody creative expression" in a way that constitutes protected authorship.

This is metaphysics, not law or computer science.

They're smuggling in a kind of authorial transubstantiation, as if creative essence somehow imbues the bits themselves, rendering them qualitatively different from any other arrangement of bytes. The implication is that once a work has passed through the sacrament of human intention, it permanently carries a kind of spiritual copyright residue, regardless of its subsequent transformation or use.

But that's not how data works. A copy of a copyrighted work in a training corpus is still just data. It doesn't emit rights. It's not radioactive. There's no Platonic form of "authorship" that permeates the latent space. What matters, legally and practically, is what the system does with that data, not some mystical essence the data supposedly contains.

This is authorial essentialism dressed up as policy. And it doesn’t hold up under inspection.

aredox•9mo ago
Yeah, and painting is just oil, and music is just an arrangement of noises. And only the original manuscript touched by an author is protected by rights, and every book printed ("copied") afterwards is not covered by any rights.
aspenmayer•9mo ago
What Color are your bits?

https://ansuz.sooke.bc.ca/entry/23

Greed•9mo ago
You speak of intentionality beyond the explicit reality of the data involved as some great irrationality in their statement, but we literally have a term for that: the spirit of the law. If the law were as black and white and ends-oriented as you're implying it is, we wouldn't need judges to interpret it. The fact that they have prioritized the underlying authors affected over the traditional interpretation of the law here is not the condemnation you think it is.
kelseyfrog•9mo ago
I think you're missing the deeper point. Whether or not the Copyright Office intends to assert authorial essentialism, it's doing so in effect. And when metaphysical language about "creative essence" becomes encoded in policy and enforced by courts, it's not just metaphor. It's law.

Calling it "spirit of the law" doesn't let them off the hook. If you enshrine a metaphysics that treats human-authored works as ontologically distinct kinds of data, imbued with some persistent essence that radiates rights regardless of use, you're not interpreting the law, you're institutionalizing a theology of authorship.

And yes, I care less about their intentions than about the system they're building. That system is now enforcing metaphysical categories with legal teeth. That's the problem.

whattheheckheck•9mo ago
Humans don't own anything.

Ownership is a construct.

It's all made up.

michael-sumner•9mo ago
We wrote a summary of it here for busy folks https://x.com/scoredetect/status/1921883329772548365
ycombinatornews•9mo ago
If you don’t have an X account, you can’t read past the initial tweet.