
Anthropic destroyed millions of print books to build its AI models

https://arstechnica.com/ai/2025/06/anthropic-destroyed-millions-of-print-books-to-build-its-ai-models/
41•bayindirh•7mo ago

Comments

JohnFen•7mo ago
> In the process, the company cut millions of print books from their bindings, scanned them into digital files, and threw away the originals solely for the purpose of training AI

Oh boy. The more I learn about how genAI companies work, the more detestable they appear to be.

ThrowawayR2•7mo ago
You got suckered by the clickbait. Destructive scanning (https://en.wikipedia.org/wiki/Book_scanning#Destructive_scan...) isn't unusual for books that are common enough that an individual volume is of no particular value.
bayindirh•7mo ago
I mean, they could have gotten e-book versions of the books, or even preprint PDFs.

In an era where people are starting to calculate the environmental impact of the jobs they run in the cloud and to optimize it, adding that much load to the recycling system is not a wise choice, only a selfish one.

ThrowawayR2•7mo ago
I'm sure they would have loved to save the hassle and expense of disassembling physical books. Presumably something legal related or cost related prevented them from going that route.
JohnFen•7mo ago
Yes, they did it as a workaround for copyright. TFA explains that aspect.
rpdillon•7mo ago
It's not a workaround for copyright. It's to obey copyright. As in: copyright law is the reason they destroyed the books.

Meta didn't have to do any of this. They just used The Pile.

AlotOfReading•7mo ago
I strongly suspect that dealing with ebooks on this scale might actually be even more onerous than the physical volumes.

The physical stuff is straightforward. Buy books from bulk sellers, rip off the bindings and put the pages into off-the-shelf rigs for digitization. It's straightforward, directly scalable, can use any book, and your main issue is format shifting, which Anthropic successfully argued here. No DRM, you buy exactly the books you need, and every book is processed exactly the same way.

If you try to buy ebooks, you get wrapped up in onerous licensing terms about copying, and how you're able to use them, how long you're able to access them, and so on. Many books won't even be available (or can only be licensed alongside a bunch of others) and you have to deal with DRM you can't strip without creating additional copyright issues.

We've somehow created a world where physical objects are more free than bits.

rpdillon•7mo ago
No, they probably couldn't have. eBooks are notoriously DRMed, and the DMCA makes it illegal to circumvent an effective copy protection mechanism even if you otherwise have legal access to the work. Furthermore, first sale doctrine doesn't apply to digital files, and they can't be obtained legally in bulk.
JohnFen•7mo ago
I didn't get suckered by anything. I'm aware of the practice. I find it objectionable. That they did this is just another thing on the growing list of objectionable things that genAI companies seem to enjoy doing.

To be honest, I probably wouldn't have even commented on it if it were the only bad thing these companies do.

rpdillon•7mo ago
It was only legal because they did it this way.

> Ultimately, Judge William Alsup ruled that this destructive scanning operation qualified as fair use—but only because Anthropic had legally purchased the books first, destroyed each print copy after scanning, and kept the digital files internally rather than distributing them. The judge compared the process to "conserv[ing] space" through format conversion and found it transformative.

The very laws that the publishing industry has lobbied so heavily to make so strict are the reason for this behavior.

CaptainFever•7mo ago
If you believe that destroying books is bad, your issue is with copyright law, not the AI companies. The AI companies are just following copyright law -- they are allowed to move data from one format to another (thereby destroying the original), but not copy it.
baobun•7mo ago
Not everything objectionable or unethical should or could necessarily be outlawed. "It's not illegal" is not really an argument or justification for anything.
Ukv•7mo ago
I don't think CaptainFever's point is that it's acceptable because it's legal, but rather that copyright law is what prevents them from, say, donating the originals instead of throwing them away.
rasz•7mo ago
Specifically, his issue is with the first sale doctrine. If you own it, you can destroy it, and it's none of anyone else's business.
JohnFen•7mo ago
I don't have an issue with the first sale doctrine. It's an important property right.

That doesn't mean I support everything that people have a right to do with their property.

JohnFen•7mo ago
> If you believe that destroying books is bad, your issue is with copyright law, not the AI companies

No, my issue is with the companies that do this. The law doesn't enter into it. Just because a thing is legal doesn't mean it's OK.

justinrubek•7mo ago
I very much have a problem with both of these things.
j_timberlake•7mo ago
How is this any worse than using disposable paper towels with images on them?
EA-3167•7mo ago
I don't like Anthropic, I think their "marketing through fear" approach is shitty, and frankly I'm over the AI "boom" anyway.

BUT... here's the only line in that whole article that really matters, because this is a headline meant to create an impression that isn't corrected for quite a while.

> The court documents don't indicate that any rare books were destroyed in this process—Anthropic purchased its books in bulk from major retailers

Books are routinely pulped and recycled, they aren't holy, and if they aren't rare then frankly who cares what techniques they use to scan them? The issue is whether or not "AI" learning represents fair use, which the courts so far have ruled that it does.

bayindirh•7mo ago
> any rare books were destroyed in this process

Does it matter? It's waste at the end of the day. They could have bought e-books instead. Just because we can recycle paper doesn't mean we have the luxury of creating waste as we see fit, especially now that climate change has become this severe.

> which the courts so far have ruled that it does.

Any concrete cases you can cite?

From [0], for example, while the court said that the authors failed to argue their case, the second observation is the complete opposite of what you said. Citing the article directly:

> Opinion suggests AI models do generally violate law.

In the same spirit, I think I can safely assume that they violated copyright law, since they earn money by circumventing it, and fair use doesn't like for-profit copying.

[0]: https://news.bloomberglaw.com/litigation/meta-beats-copyrigh...

kirrent•7mo ago
TFA is based on the ruling which found that Anthropic training on these books was fair use.
robocat•7mo ago
> It's waste at the end of the day

Rubbish.

More likely they are taking a waste stream of books and reusing and possibly even recycling.

Few people want old books, and many people who have books are throwing them out or donating them. I don't think I know anybody under 30 with a bookshelf of books they obviously intend to keep for life. Bookshelves used to be an elite status symbol; now I often see them used as image rather than reference (e.g. as part of the backdrop behind an influencer vid).

It is likely they didn't destroy much of value, since they will have minimized their purchasing costs. Modern DRM is not helping.

cma•7mo ago
They'd have to agree to special terms that go beyond the normal first sale doctrine. If those terms don't hold up, their own terms against using their model's data to train foundation models might not hold up either, so you can see their perverse incentive to burn books.
JohnFen•7mo ago
> Does it matter?

As someone who finds the act objectionable, I actually do think this is an important point. Destroying commodity books in this way is objectionable. Destroying precious books in this way would be abominable.

miohtama•7mo ago
Reuters news on the lawsuit

https://news.ycombinator.com/item?id=44375269

shawn_w•7mo ago
Getting flashbacks to Vernor Vinge's book Rainbows End, where there's a project to rapidly digitize the collection of the UCSD's Geisel Library by shredding all the books and photographing the fragments of pages and reassembling them via computer programs.

It's set in 2025.

igor47•7mo ago
Lol jinx!
igor47•7mo ago
Reminds me of "Rainbows End" by Vinge. There's a machine that's like a giant worm, which slithers down the stacks in a library, vacuuming up all the books. They pass through shredders, and then the shredded remains fly down the body of the worm, which is studded with cameras. The cameras photograph the pieces and then software reconstructs the content of the books based on the unique shapes of the shreds, like solving a million simultaneous jigsaw puzzles. The paper is excreted and recycled or burned.
ttepasse•7mo ago
Interestingly, there was a real attempt to build an E-Puzzler for shredded documents, to reconstruct the torn Stasi files after German reunification. But while the system worked for well-defined material, it failed at mass reconstruction of documents in different formats:

https://www.bundesarchiv.de/en/stasi-records-archive/the-rec...

mensetmanusman•7mo ago
This reminds me of scrolls in Diablo. Soon real books will all disappear to dust as AI stats are improved.
ChrisArchitect•7mo ago
More discussion:

A federal judge sides with Anthropic in lawsuit over training AI on books

https://news.ycombinator.com/item?id=44367850

vaxman•7mo ago
When they run out of training data, they have to rely upon better reasoning algos which take time to develop and that’s when the party ends. The purpose of IT investment is to increase competitiveness through better efficiency and thus capture more of the market. If everyone is using the same model trained on the same max dataset, it stops being much of a competitive advantage. I’ve already heard the stock-option Billionaires bloviating with their intuition shaped by narrow corporate experiences that the AI objective is to increase GDP or similar, but obviously the ol’ man in the corner office ain’t payin’ for that and so at some point, the question of how to keep the Three Mile Island running to power this sucker becomes very real. The answer is sort of scary, if you think about it…
fithisux•7mo ago
This is to be expected.

The book ‘Empire of AI’ by Karen Hao is recommended.

remus•7mo ago
Depending on their approach this might not be as bad as it seems.

Some large percentage of physical books really aren't a precious commodity. They may be precious to you as an individual but chances are no-one else is going to be that bothered. For example, a second hand book wholesaler I just found with a quick google will sell you pallets of books for 25c/book which would only go down as your purchase size went up. If they don't sell then it seems pretty likely those books are going to get pulped anyway.

If they were buying the books new and then essentially pulping them after scanning, that does feel wasteful, but if they were buying in bulk from a second hand wholesaler then I'm kinda glad the content of those books is going into something useful before they get pulped.

giardini•7mo ago
Why does a good LLM require millions of books? No person requires/reads nearly that many books to function!

Most well-educated persons have read far fewer books yet are presumably more intelligent than any LLM. This is a hint toward solving the problem of AI: the industry would do well to pay attention to that hint.

In other words, the industry should concentrate on creating an AI that has a far more limited corpus. Of course, maybe they are doing so but not telling anyone about it. I certainly wouldn't reveal my best ideas. At the same time one must remember that, only a few centuries ago, people learned to read English by studying the King James version of the Old Testament, a single book [not to say that a lot of work didn't go into that particular text: it really is a work of art in the English language].

Perhaps the AI industry must maintain an illusion for the unwashed masses (and especially investors) that each generation of LLM AI must always be bigger to reach AGI, whereas the truth is that some guy in his study will find the key that ties LLMs to human intelligence and will, any day, render the entire industry obsolete.

mkagenius•7mo ago
This has to be the most mind blowing article I have read in a while.

1. The sheer scale - millions of books.

2. The sheer destruction - millions of books.

3. Judge declaring it fair use.

4. How could this go unnoticed by the people around them? Is the office location isolated?