Ask HN: Estimation of copyright material used by LLM

4•megamix•1d ago
1. Is it true that LLMs / AI Companies have used copyrighted material for training?

2. Is it possible to estimate how much copyrighted material has been used?

Comments

muzani•23h ago
1. Yes, but it's hard to prove. There are active lawsuits. Some of it has been defended as "fair use", but at the billion-dollar scale you have to ask whether it's really fair. Also, anecdotally, an author friend lamented that her publisher sold the legal rights to use her work... it was all perfectly legal, but many authors do not agree to this.

2. This is harder, as a lot of these companies don't disclose their training sets.

dialup_sounds•22h ago
I think what you're looking for is not "copyrighted material" but material that's both 1) used without permission and 2) outside the scope of fair use.

There's no easy answer there, hence New York Times v. OpenAI.

MrVandemar•21h ago
There is an easy answer; it's just obfuscated by powerful people who are benefiting from it an obscene amount, and supported by hordes of addled and thoroughly addicted enthusiasts.

I think sticking a straw in Zlib or AA or LibGen or whatever it is, and drinking until it makes gurgling slurping noises as it hoovers up the dregs at the bottom of the barrel, is far, far removed from “fair use”.

marstall•19h ago
Pretty much everything newer than ~70 years old on the internet is copyrighted, because copyright applies automatically when you create something (in the US at least). So the answer to #1 is yes.

bjourne•18h ago
1. Yes 2. No