
Kiel Institute Analysis: US Americans pay 96% of tariff burden

https://www.kielinstitut.de/publications/americas-own-goal-who-pays-the-tariffs-19398/
277•47282847•56m ago•162 comments

GLM-4.7-Flash

https://huggingface.co/zai-org/GLM-4.7-Flash
124•scrlk•1h ago•25 comments

CSS Web Components for marketing sites

https://hawkticehurst.com/2024/11/css-web-components-for-marketing-sites/
8•zigzag312•1h ago•3 comments

Folding NASA Experience into an Origamist's Toolkit

https://spinoff.nasa.gov/Folding_NASA_Experience_into_an_Origamist%E2%80%99s_Toolkit
34•andsoitis•2d ago•4 comments

The Microstructure of Wealth Transfer in Prediction Markets

https://www.jbecker.dev/research/prediction-market-microstructure
5•jonbecker•33m ago•1 comment

A decentralized peer-to-peer messaging application that operates over Bluetooth

https://bitchat.free/
417•no_creativity_•9h ago•246 comments

Radboud University selects Fairphone as standard smartphone for employees

https://www.ru.nl/en/staff/news/radboud-university-selects-fairphone-as-standard-smartphone-for-e...
399•ardentsword•8h ago•173 comments

"Anyone else out there vibe circuit-building?"

https://twitter.com/beneater/status/2012988790709928305
59•thetrustworthy•1h ago•33 comments

West Midlands police chief quits over AI hallucination

https://www.theregister.com/2026/01/19/copper_chief_cops_it_after/
77•YeGoblynQueenne•1h ago•39 comments

Ask HN: COBOL devs, how is AI coding affecting your work?

94•zkid18•3h ago•90 comments

Gaussian Splatting – A$AP Rocky "Helicopter" music video

https://radiancefields.com/a-ap-rocky-releases-helicopter-music-video-featuring-gaussian-splatting
717•ChrisArchitect•22h ago•233 comments

Robust Conditional 3D Shape Generation from Casual Captures

https://facebookresearch.github.io/ShapeR/
20•lastdong•4h ago•1 comment

Dead Internet Theory

https://kudmitry.com/articles/dead-internet-theory/
529•skwee357•20h ago•589 comments

Luxury Yacht is a desktop app for managing Kubernetes clusters

https://github.com/luxury-yacht/app
18•mooreds•4d ago•4 comments

Show HN: I quit coding years ago. AI brought me back

https://calquio.com/finance/compound-interest
271•ivcatcher•15h ago•347 comments

Nepal's Mountainside Teahouses Elevate the Experience for Trekkers

https://www.smithsonianmag.com/travel/nepal-mountainside-teahouses-elevate-experience-trekkers-he...
76•bookofjoe•4d ago•28 comments

Iterative image reconstruction using random cubic bézier strokes

https://tangled.org/luthenwald.tngl.sh/splined
9•luthenwald•4d ago•0 comments

Flux 2 Klein pure C inference

https://github.com/antirez/flux2.c
391•antirez•22h ago•130 comments

Wikipedia: WikiProject AI Cleanup

https://en.wikipedia.org/wiki/Wikipedia:WikiProject_AI_Cleanup
181•thinkingemote•6h ago•67 comments

Provide agents with automated feedback

https://banay.me/dont-waste-your-backpressure/
155•ghuntley•2d ago•78 comments

AVX-512: First Impressions on Performance and Programmability

https://shihab-shahriar.github.io//blog/2026/AVX-512-First-Impressions-on-Performance-and-Program...
111•shihab•5d ago•39 comments

Amazon is ending all inventory commingling as of March 31, 2026

https://twitter.com/ghhughes/status/2012824754319753456
333•MrBuddyCasino•4h ago•178 comments

Nvidia Contacted Anna's Archive to Access Books

https://torrentfreak.com/nvidia-contacted-annas-archive-to-secure-access-to-millions-of-pirated-b...
100•antonmks•5h ago•53 comments

Gladys West's vital contributions to GPS technology

https://en.wikipedia.org/wiki/Gladys_West
56•hackernj•2d ago•5 comments

Gas Town Decoded

https://www.alilleybrinker.com/mini/gas-town-decoded/
178•alilleybrinker•4d ago•184 comments

The Code-Only Agent

https://rijnard.com/blog/the-code-only-agent
123•emersonmacro•14h ago•55 comments

RISC-V is coming along quite speedily: Milk-V Titan Mini-ITX 8-core board

https://www.tomshardware.com/pc-components/cpus/milk-v-titan-mini-ix-board-with-ur-dp1000-process...
61•fork-bomber•6h ago•26 comments

Nuclear elements detected in West Philippine Sea

https://www.philstar.com/headlines/2026/01/18/2501750/nuclear-elements-detected-west-philippine-sea
75•ksec•5h ago•24 comments

Fil-Qt: A Qt Base build with Fil-C experience

https://git.qt.io/cradam/fil-qt
134•pjmlp•3d ago•89 comments

Using proxies to hide secrets from Claude Code

https://www.joinformal.com/blog/using-proxies-to-hide-secrets-from-claude-code/
122•drewgregory•5d ago•40 comments

Nvidia Contacted Anna's Archive to Access Books

https://torrentfreak.com/nvidia-contacted-annas-archive-to-secure-access-to-millions-of-pirated-books/
100•antonmks•5h ago

Comments

antonmks•5h ago
NVIDIA executives allegedly authorized the use of millions of pirated books from Anna's Archive to fuel its AI training. In an expanded class-action lawsuit that cites internal NVIDIA documents, several book authors claim that the trillion-dollar company directly reached out to Anna's Archive, seeking high-speed access to the shadow library data.
skilled•5h ago
> In response, NVIDIA defended its actions as fair use, noting that books are nothing more than statistical correlations to its AI models.

Does this even make sense? Are the copyright laws so bad that a statement like this would actually be in NVIDIA’s favor?

tobwen•4h ago
Books are databases, characters their elements. We have copyright protection for databases in the EU :)
Bombthecat•2h ago
Who cares? Only Disney has the money to fight them.

Everything else will be slurped up by and for AI and reused.

RGamma•2h ago
The chicken is trying to become the egg.
general1465•1h ago
Did you pirate this movie? No I did not; it is fair use, because this movie is nothing more than a statistical correlation to my dopamine production.
earthnail•1h ago
The movie played on my screen but I may or may not have seen the results of the pixels flashing. As such, we can only state with certainty that the movie triggered the TV's LEDs relative to its statistical light properties.
JKCalhoun•49m ago
I saw the movie, but I don't remember it now.
Ferret7446•10m ago
Indeed, the "copy" of the movie in your brain is not illegal. It would be rather troublesome and dystopian if it were.
thaumasiotes•2m ago
Note that what copyright law prohibits is the action of producing a copy for someone else, not the action of obtaining a copy for yourself.
Elfener•1h ago
It seems so: stealing copyrighted content is only illegal if you do it to read it or to allow others to read it. Stealing it to create slop is legal.

(The difference is that the first use allows ordinary people to get smarter, while the second use allows rich people to get (seemingly) richer, a much more important thing.)

threethirtytwo•1h ago
It does make sense, even if it's controversial. Your brain memorizes things in the same way. So what NVIDIA does here is no different: the AI doesn't actually copy any of the books. To call training illegal is similar to calling reading a book and remembering it illegal.

Our copyright laws are nowhere near detailed enough to cover this, so there is indeed a logical and technical inconsistency.

I can definitely see these laws evolving to be human-centric: it's permissible for a human to do something but not for an AI.

What is consistent is that obtaining the books was probably illegal. But say NVIDIA bought one Kindle copy of each book from Amazon and scraped everything for training; then that falls into the grey zone.

ckastner•1h ago
> To call training illegal is similar to calling reading a book and remembering it illegal.

Perhaps, but reproducing the book from this memory could very well be illegal.

And these models are all about production.

roblabla•1h ago
To be fair, that seems to be where some of the AI lawsuits are going. The argument goes that the models themselves aren't derivative works, but the output they produce can absolutely be - in much the same way that reproducing a book from memory could be a copyright violation, trademark infringement, or generally run afoul of the various IP laws.
threethirtytwo•51m ago
Models don’t reproduce books though. It’s impossible for a model to reproduce something word for word because the model never copied the book.

Most of a best-fit curve runs along a path that doesn't even touch an actual data point.

empath75•46m ago
They do memorize some books. You can test this trivially by asking ChatGPT to produce the first chapter of something in the public domain -- for example A Tale of Two Cities. It may not be word-for-word exact, but it'll be very close.

These academics were able to get multiple LLMs to produce large amounts of text from Harry Potter:

https://arxiv.org/abs/2601.02671
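
For the curious, here's a rough sketch of that test in Python, assuming the official openai client with an API key in the environment; the model name and prompt are my own illustrative choices, not taken from the paper:

    # Rough memorization probe: ask a chat model to recite the opening of a
    # public-domain book and measure how close the output is to the real text.
    from difflib import SequenceMatcher
    from openai import OpenAI  # assumes OPENAI_API_KEY is set in the environment

    # Well-known public-domain opening of "A Tale of Two Cities" (Dickens, 1859).
    REFERENCE = (
        "It was the best of times, it was the worst of times, it was the age of "
        "wisdom, it was the age of foolishness, it was the epoch of belief, it "
        "was the epoch of incredulity"
    )

    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative choice; any chat model works here
        messages=[{
            "role": "user",
            "content": "Recite the opening paragraph of 'A Tale of Two Cities' verbatim.",
        }],
    )
    output = (response.choices[0].message.content or "").lower()

    # A ratio of 1.0 is an exact character-for-character match; values close to
    # it suggest the passage was memorized rather than paraphrased. This is a
    # crude measure: preamble like "Certainly! Here is..." will lower the score.
    ratio = SequenceMatcher(None, REFERENCE.lower(), output[: len(REFERENCE)]).ratio()
    print(f"similarity to the canonical opening: {ratio:.2f}")

(Using a public-domain passage keeps the probe itself free of copyright issues; the Harry Potter study linked above does a much more rigorous version of the same idea.)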

threethirtytwo•42m ago
In that case I would say it is the act of reproducing the books that is illegal. Training the AI on said books is not.

So the illegality rests at the point of output and not at the point of input.

I’m just speaking in terms of the technical interpretation of what’s in place. My personal views on what it should be are another topic.

ckastner•10m ago
> So the illegality rests at the point of output and not at the point of input.

It's not as simple as that, as this settlement shows [1].

Also, generating output is what these models are primarily trained for.

[1]: https://www.bbc.com/news/articles/c5y4jpg922qo

kalap_ur•45m ago
If even one exact sentence is taken out of the book without being quoted and attributed to its exact source, that triggers copyright law. So the model doesn't have to reproduce the entire book; it only needs to reproduce one specific sentence (which may be a sentence characteristic of that author or that book).
lelanthran•1h ago
> To call training illegal is similar to calling reading a book and remembering it illegal.

A type of wishful-thinking fallacy.

In law, scale matters. It's legal for you to possess a single joint. It's not legal to possess 400 tons of weed in a warehouse.

threethirtytwo•53m ago
Er, no. I've read and remember hundreds of books in my lifetime. It's not any more illegal based on scale. The law doesn't differentiate between remembering one book or a hundred, so there's no difference for thousands or millions.

No wishful thinking here.

kalap_ur•44m ago
It is not the scale that matters in your example, but intent. With one joint, you want to smoke it yourself. With 400 tons, you very possibly want to sell it to others. Scale in itself doesn't matter; it matters only to the extent that it changes what your intention may be.
threethirtytwo•32m ago
It’s clear nvidia and every single one of these big AI corps do not want their AIs to violate the law. The intent is clear as day here.

Scale is only used for emergence, openAI found that training transformers on the entire internet would make is more then just a next token predictor and that is the intent everyone is going for when building these things.

kalap_ur•47m ago
You can only read the book if you purchased it. Even if you don't have the intent to reproduce it, you must purchase it. So I guess NVDA should just purchase all those books, no?
threethirtytwo•41m ago
Yep, I agree. That’s the part that’s clearly illegal. They should purchase the books, but they didn’t.
Nursie•10m ago
This is the bit an author friend of mine really hates. They didn’t even buy a copy.

And now AI has killed his day job writing legal summaries. So they took his words without a license and used them to put him out of a job.

Really rubs in that “shit on the little guy” vibe.

Nursie•13m ago
But it's not just about recall and reproduction. If they used Anna's Archive, the books were obtained and copied without a license before they were fed in as training data.
nancyminusone•1h ago
When you're responsible for 4% of the global GDP, they let you do it.
NitpickLawyer•1h ago
> Does this even make sense? Are the copyright laws so bad that a statement like this would actually be in NVIDIA’s favor?

It makes some sense, yeah. There's also precedent in Google scanning massive amounts of books but not reproducing them. Most of our current copyright laws deal with reproductions - that's a no-no. It gets murky on the rest. NVIDIA's argument here is that they're not reproducing the works, they're not providing the works to other people, they're "scanning the books and computing some statistics over the entire set". Kinda similar to Google. Kinda not.

I don't see how they get around "procuring them" from dubious third-party sources, but oh well. The only certain thing is that our current laws didn't anticipate this, and now it's probably too late.

masfuerte•42m ago
Scanning books is literally reproducing them. Copying books from Anna's Archive is also literally reproducing them. The idea that it is only copyright infringement if you engage in further reproduction is just wrong.

As a consumer you are unlikely to be targeted for such "end-user" infringement, but that doesn't mean it's not infringement.

Ferret7446•7m ago
Private reproductions are allowed (e.g. backups). Distributing them non-privately is not.
olejorgenb•29m ago
> I don't see how they get around "procuring them" from 3rd party dubious sources

Yeah, isn't this what Anthropic was found guilty of?

ThrowawayR2•56m ago
Yes, it's been discussed many times before. All the corporations training LLMs have to have done a legal analysis and concluded that it's defensible. Even one of the white papers commissioned by the FSF ( "Copyright Implications of the Use of Code Repositories to Train a Machine Learning Model" at https://www.fsf.org/licensing/copilot/copyright-implications... ), concluded that using copyrighted data to train AI was plausibly legally defensible and outlined the potential argument. You will notice that the FSF has not rushed out to file copyright infringement suits even though they probably have more reason to oppose LLMs trained on FOSS code than anyone else in the world.
postexitus•52m ago
Cory Doctorow gives quite a good explanation of what copyright law covers and what it should (and should not) cover here: https://www.theguardian.com/us-news/ng-interactive/2026/jan/...
rtbruhan00•4h ago
It's generous of them to ask for permission.
gizajob•2h ago
They wanted access to a faster pipe to slurp 500 terabytes, and that access comes at a cost. It wasn’t about permission.

And yeah, they should be sued into the next century for copyright infringement. A $4 trillion company illegally downloading the entire corpus of published literature for reuse is clearly infringement; it's an absurdity to say that it's fair use just to look for statistical correlations when training LLMs that will be used to render human authors worthless. One or two books is fair use. Every single book published is not.

empath75•45m ago
Whatever they get sued for would be pocket change.
breakingcups•1h ago
It wasn't about permission, it was about high-speed access. They needed Anna's Archive to facilitate that for them, scraping was too slow. It's incredible that they were allowed to continue even after Anna's Archive themselves explicitly pointed out that the material was acquired illegally.
kristofferR•1h ago
That's just normal US modus operandi. The court case against Maduro is allowed to continue even after everyone has acknowledged he was acquired illegally.
kristofferR•58m ago
It's not permission, it's a service they offer:

https://annas-archive.li/llm

poulpy123•3h ago
I'm not saying it will change anything, but going after Anna's Archive while most of the big AI players made intense use of it is quite something.
countWSS•1h ago
Short-term thinking: they don't care about where the data comes from, only how easy it is to get. It's probably decided at the project-manager level.
pjc50•1h ago
NVIDIA are "legitimate", so anything they do is fine, while AA are "illegitimate", so it's not.
SanjayMehta•2h ago
I'm wondering what Amazon is planning to do with their access to all those Kindle books.
philipwhiuk•1h ago
What do you mean 'planning'? You think they haven't already been sucked up?
embedding-shape•1h ago
What do you mean 'sucked up'? It's data on their machines already; people willingly give them the data so Amazon can process it and offer it to readers. No sucking needed, just use the data people have already uploaded to you.
sib•1h ago
There's definitely a legal and contractual difference between (1) storing the books on your servers in order to provide them to end users who have purchased licenses to read them and (2) using that same data to train a model that might be used to create books that compete with the originals. I'm pretty sure that's what GP means by "sucking up."

This is analogous to the difference between Gmail searching within your mail content to find messages you are looking for vs. Gmail showing ads inside Gmail based on the content of your email (which they don't do).

embedding-shape•56m ago
Yeah, I guess the error is on my side; I've always taken "suck up" as a synonym for scraping, not just "using data for stuff".

And yeah, you're most likely right about the first point, and the contract writers have with Amazon almost certainly anticipates this and includes both uses. But I've never published on Amazon, so I don't know; I'm guessing they already have the rights to do so with what people have been uploading these last few years.

wosined•1h ago
Sounds like BS. Why would NVIDIA need the books? Do they even have a chatbot? I doubt the books help with framegen.
voidUpdate•1h ago
I can't see the whole relevant section in the article, but there is a screenshot of part of the legal documents that states "In response, NVIDIA sought to develop and demonstrate cutting edge LLMs at its fall 2023 developer day. In seeking to acquire data for what it internally called "NextLargeLLM", "NextLLMLarge" and-" (cuts off here)
utopiah•1h ago
The same reason Intel worked on OpenCV: they want to sell more hardware by pushing the state of the art of what software can do on THEIR hardware.

It's basically just a sales demonstrator which, if it turns out to be incredibly successful and costly, they can optionally sell as SaaS; if not, they can just offer it for free.

Think of it as a tech ad.

utopiah•1h ago
People HAVE to notice how hungry for proper data AI companies are, when one of the largest companies propping up the fastest-growing market STILL has to go to such lengths, getting actual approval to use pirated content, even though they are a hardware manufacturer.

I keep hearing how it's fine because synthetic data will solve it all, how new techniques, feedback, etc. will fix things. Then why do this?

The promises do not match the resources available, and this makes it blatantly clear.

flipped•1h ago
Considering AA gave them ~500TB of books, which is astonishing (very expensive for AA even to store), I wonder how much NVIDIA paid them for it? It has to be at least close to half a million?