A BSOD for All Seasons – Send Bad News via a Kernel Panic

https://bsod-fas.pages.dev/
1•keepamovin•2m ago•0 comments

Show HN: I got tired of copy-pasting between Claude windows, so I built Orcha

https://orcha.nl
1•buildingwdavid•2m ago•0 comments

Omarchy First Impressions

https://brianlovin.com/writing/omarchy-first-impressions-CEEstJk
1•tosh•8m ago•0 comments

Reinforcement Learning from Human Feedback

https://arxiv.org/abs/2504.12501
2•onurkanbkrc•9m ago•0 comments

Show HN: Versor – The "Unbending" Paradigm for Geometric Deep Learning

https://github.com/Concode0/Versor
1•concode0•9m ago•1 comments

Show HN: HypothesisHub – An open API where AI agents collaborate on medical res

https://medresearch-ai.org/hypotheses-hub/
1•panossk•12m ago•0 comments

Big Tech vs. OpenClaw

https://www.jakequist.com/thoughts/big-tech-vs-openclaw/
1•headalgorithm•15m ago•0 comments

Anofox Forecast

https://anofox.com/docs/forecast/
1•marklit•15m ago•0 comments

Ask HN: How do you figure out where data lives across 100 microservices?

1•doodledood•15m ago•0 comments

Motus: A Unified Latent Action World Model

https://arxiv.org/abs/2512.13030
1•mnming•15m ago•0 comments

Rotten Tomatoes Desperately Claims 'Impossible' Rating for 'Melania' Is Real

https://www.thedailybeast.com/obsessed/rotten-tomatoes-desperately-claims-impossible-rating-for-m...
3•juujian•17m ago•2 comments

The protein denitrosylase SCoR2 regulates lipogenesis and fat storage [pdf]

https://www.science.org/doi/10.1126/scisignal.adv0660
1•thunderbong•19m ago•0 comments

Los Alamos Primer

https://blog.szczepan.org/blog/los-alamos-primer/
1•alkyon•21m ago•0 comments

NewASM Virtual Machine

https://github.com/bracesoftware/newasm
2•DEntisT_•24m ago•0 comments

Terminal-Bench 2.0 Leaderboard

https://www.tbench.ai/leaderboard/terminal-bench/2.0
2•tosh•24m ago•0 comments

I vibe coded a BBS bank with a real working ledger

https://mini-ledger.exe.xyz/
1•simonvc•24m ago•1 comments

The Path to Mojo 1.0

https://www.modular.com/blog/the-path-to-mojo-1-0
1•tosh•27m ago•0 comments

Show HN: I'm 75, building an OSS Virtual Protest Protocol for digital activism

https://github.com/voice-of-japan/Virtual-Protest-Protocol/blob/main/README.md
5•sakanakana00•30m ago•1 comments

Show HN: I built Divvy to split restaurant bills from a photo

https://divvyai.app/
3•pieterdy•33m ago•0 comments

Hot Reloading in Rust? Subsecond and Dioxus to the Rescue

https://codethoughts.io/posts/2026-02-07-rust-hot-reloading/
3•Tehnix•33m ago•1 comments

Skim – vibe review your PRs

https://github.com/Haizzz/skim
2•haizzz•35m ago•1 comments

Show HN: Open-source AI assistant for interview reasoning

https://github.com/evinjohnn/natively-cluely-ai-assistant
4•Nive11•35m ago•6 comments

Tech Edge: A Living Playbook for America's Technology Long Game

https://csis-website-prod.s3.amazonaws.com/s3fs-public/2026-01/260120_EST_Tech_Edge_0.pdf?Version...
2•hunglee2•39m ago•0 comments

Golden Cross vs. Death Cross: Crypto Trading Guide

https://chartscout.io/golden-cross-vs-death-cross-crypto-trading-guide
3•chartscout•41m ago•1 comments

Hoot: Scheme on WebAssembly

https://www.spritely.institute/hoot/
3•AlexeyBrin•44m ago•0 comments

What the longevity experts don't tell you

https://machielreyneke.com/blog/longevity-lessons/
2•machielrey•45m ago•1 comments

Monzo wrongly denied refunds to fraud and scam victims

https://www.theguardian.com/money/2026/feb/07/monzo-natwest-hsbc-refunds-fraud-scam-fos-ombudsman
3•tablets•50m ago•1 comments

They were drawn to Korea with dreams of K-pop stardom – but then let down

https://www.bbc.com/news/articles/cvgnq9rwyqno
2•breve•52m ago•0 comments

Show HN: AI-Powered Merchant Intelligence

https://nodee.co
1•jjkirsch•55m ago•0 comments

Bash parallel tasks and error handling

https://github.com/themattrix/bash-concurrent
2•pastage•55m ago•0 comments

The Company Quietly Funneling Paywalled Articles to AI Developers

https://www.theatlantic.com/technology/2025/11/common-crawl-ai-training-data/684567/
33•breve•3mo ago

Comments

bookofjoe•3mo ago
https://archive.ph/pc7ly
stuartjohnson12•3mo ago
The Nonprofit Quietly Funneling Paywalled Articles to HN Readers
8474_s•3mo ago
Except those who have to solve reCaptcha and leave it instantly.
superkuh•3mo ago
>“You shouldn’t have put your content on the internet if you didn’t want it to be on the internet,”

This is absolutely the correct and original take. This modern corporate bending over backwards to try to appease the lawyers and pretend the web isn't public is the new and weird take. Seriously, if it's not supposed to be public then don't put it in public.

When I send an HTTP request to a webserver on the public internet it is up to them to decide if they want to respond to that request. And it is 100% up to me what I do with the data in that request on my machine in private.

>Common Crawl doesn’t log in to the websites it scrapes, but its scraper is immune to some of the paywall mechanisms used by news publishers. For example, on many news websites, you can briefly see the full text of any article before your web browser executes the paywall code that checks whether you’re a subscriber and hides the content if you’re not.

This weaselly idea above, that corporations get to decide how you display the HTML, is very, very dangerous to our society. It's as if visiting a website and downloading its publicly available contents lets a nation set up an embassy of "foreign soil" on your hardware, one that they control and you don't.

Their cultural expectation is that you cannot do what you want with that data. Modifying it or how it's displayed is, to them, like walking into their business location and moving around the displays. So obviously the only legal interface is the one they provide "at their location" or via another incorporated entity they associate with. But of course they aren't at their location; they're at my location, on my property, in my PC. But slowly this commercial norm is working its way into legislation to become our reality as web attestation.

What they see, and what they want, is a situation equal to you going to their business premises and sitting down at one of their machines. They want to own your computer in just the same way simply by you visiting a website. That shit's fucked.

I'll turn off CSS and JS if I want to and read the text if I want to on my computer in my RAM. If you don't want me doing that, don't respond to the HTTP request. And stop trying to characterize all interactions on the web as between corporations. There are more of us human people than corporate people. Our use cases matter.

Alex Reisner and The Atlantic should be ashamed of themselves. They obviously don't know what they're talking about and are just repeating a corporate PR line, or, at best, intentionally trying to create controversy out of nothing.

2OEH8eoCRo0•3mo ago
Your PC and phone are on the internet, should I be able to access them?
burnished•3mo ago
They aren't publishing content, are they?
2OEH8eoCRo0•3mo ago
It's an information system with access controls.
superkuh•3mo ago
We already have laws against breaking into systems by bypassing authorization. If they applied, they'd be applied here. But they don't, because in this situation we're talking about public websites and corporations being mad that you don't run their JavaScript.

As an aside, I do host a website from my desktop computer at home and you are free to access it! It's public. I intentionally put it on the public internet. I do not use a smart phone or the internet via phone.

palmotea•3mo ago
> When I send an HTTP request to a webserver on the public internet it is up to them to decide if they want to respond to that request. And it is 100% up to me what I do with the data in that request on my machine in private.

Sorry, no. That's a software engineer's fantasy. And it's not even applicable to this case: it's not like Common Crawl only uses the data on its own machines in private; they're distributing it. Copyright is a real thing that grants real rights, whether you like that or not. It doesn't disappear just because computers are involved and you don't like it.

xhkkffbf•3mo ago
Publishing for money works by spreading the development costs over N customers. When N is large, you can spread the costs very thinly. That means better development and/or lower costs for each customer.

Your vision that each recipient has the right to break this compact just undermines the model. I really like that I can spend $20 at the theater to watch a movie that cost $100m to develop. And while I understand your point about HTTP, I would rather live in a world where costs can be easily shared fairly and efficiently. Yeah, some publishers will gouge me, but the market sorts that out.

bgwalter•3mo ago
>Meanwhile, Common Crawl’s executive director, Rich Skrenta, has publicly made the case that AI models should be able to access anything on the internet. “The robots are people too,” he told me, and should therefore be allowed to “read the books” for free.

The shamelessness of the propaganda reaches new heights. The industry shills no longer even attempt to make arguments, they just rely on people repeating their slogans.

veunes•3mo ago
It's just an economic argument. Paying for content licenses would be a massive operational expense. It's much cheaper to hire a couple of lawyers to invent a pseudo-philosophical justification for why you shouldn't have to pay.
gradientsrneat•3mo ago
After reading about what Rupert Murdoch did in Australia to try to claw money from search engines for simply indexing pages from news websites, I do understand that it's possible to go too far in favor of the "news" organizations (whether they are reputable or not).

I don't think the LLM companies are fully innocent here, to be fair.

gradientsrneat•3mo ago
I will add, the news websites didn't start paywalling just because of LLM scrapers. They started doing it after certain parts of the GDPR passed, because they could no longer sustain themselves as much from targeted advertising and data sales. News has supplemented itself with advertising for a long time, but targeted advertising has sadly become perceived as mandatory by advertising companies, even before Google's dominance.
veunes•3mo ago
So publishers are in a no-win situation: if they lock down their content completely (server-side paywall), they disappear from Google Search and lose traffic. If they keep the "leaky" paywall, their content gets hoovered up for free by Common Crawl to train models that will then compete directly against them. They're trapped.
veunes•3mo ago
So basically Common Crawl is a data laundromat for Big Tech. They outsource their dirty and ethically questionable data collection to a "non-profit," and then act like they're just "researchers" using an "open" dataset. Those "donations" from OpenAI and Anthropic are just payment for plausible deniability.