Is Meta Scraping the Fediverse for AI?

https://wedistribute.org/2025/08/is-meta-scraping-the-fediverse-for-ai/
27•nogajun•16h ago

Comments

drannex•15h ago
Yes, obviously, next question.
01HNNWZ0MV43FF•14h ago
Either that, or to continue building the shadow profiles we know they build, and to gain intelligence on their enemies and possible enemies of the current admin.
wraptile•14h ago
Why would they need to scrape the fediverse when they can get all of the data and more just through federation? Also, this anti-scraping stance for a public, transparent protocol is really weird - that's the whole point of the protocol.
UltraSane•14h ago
Complaining that data available on the public internet is being read seems very strange. Whatever happened to "Information wants to be free" or "The Net Interprets Censorship As Damage and Routes Around It."
nicbou•13h ago
The information is used to build monopolies that strangle the independent web.
wraptile•12h ago
But restricting the flow of information is a really weird way of handling this issue. It's like digging potholes in the road just because you're upset that Teslas are on it.
Mars008•11h ago
It's not that important now that AI has gotten off the ground. New models can be trained completely on generated data. That will give them core abilities. As for real-world knowledge, whatever humans can get, models can too.
nicbou•8h ago
> New models can be trained completely on generated data.

How does that account for all the things that change in the world, but in ways only humans can observe?

How can AI discover that a beloved tourist destination has turned to crap, or that the best vacuum cleaner of 2022 has a new challenger, or that German tipping culture is shifting, or that the café down the road has great banana bread but is a little loud on Saturdays?

UltraSane•12h ago
Or it is being used to build the most useful information indexing and search algorithms ever created.
nicbou•8h ago
Until it starves out the websites and communities that provide the training data.
UltraSane•3h ago
The circle of Life.
Mars008•12h ago
There can be only one monopoly in each domain by definition. In the AI world it's more like several 'fortresses'. Together they ruin the click economy, which almost eliminated printed books and magazines. Well, attention is a limited resource.
nicbou•8h ago
The main difference is that the click economy did not rely on printed books and magazines' continued existence. It could produce its own original information. A magazine author could become a blogger, and they could still write their own café reviews.

Generative AI still relies on the work of the creators whose livelihood it threatens for its training data. It still relies on someone else experiencing the real world, and describing it for them. It just denies them their audience or the fruit of their labour.

Someone here put it nicely: AI companies are eating their seed corn.

1gn15•14h ago
Yes, obviously. More people should scrape and archive the Fediverse.
UltraSane•14h ago
Any data that is put on the public internet WILL be scraped and used for LLM training.
thrown-0825•11h ago
People view robots.txt and llm.txt as some kind of binding contract.

It's not, and expecting companies to follow it is naive.

avazhi•8h ago
Nobody cares about robots.txt, nor should they.

I will never not be amused by people clutching pearls about this.

gradientsrneat•2h ago
"AI" corporations aren't just "scraping" the fediverse. They are DDOSing independent websites all over the internet. Blocking and hampering their scrapers is often the best and only solution for some small indie sites to remain financially viable. These companies are destroying the commons.

Even Hacker News users report being affected: https://news.ycombinator.com/item?id=43397361

There are countless examples of "AI" DDOSing of independent websites if you care to search for them.

Note: I do not endorse the linked blogger

Nginx Introduces Native Support for Acme Protocol

https://blog.nginx.org/blog/native-support-for-acme-protocol
201•phickey•2h ago•82 comments

FFmpeg 8.0 adds Whisper support

https://code.ffmpeg.org/FFmpeg/FFmpeg/commit/13ce36fef98a3f4e6d8360c24d6b8434cbb8869b
625•rilawa•8h ago•237 comments

I chose OCaml as my primary language

https://xvw.lol/en/articles/why-ocaml.html
22•nukifw•23m ago•5 comments

Launch HN: Golpo (YC S25) – AI-generated explainer videos

https://video.golpoai.com/
20•skar01•1h ago•29 comments

Cross-Site Request Forgery

https://words.filippo.io/csrf/
16•tatersolid•57m ago•1 comment

OpenIndiana: Community-Driven Illumos Distribution

https://www.openindiana.org/
48•doener•3h ago•34 comments

April Fools 2014: The *Real* Test Driven Development

https://testing.googleblog.com/2014/04/the-real-test-driven-development.html
16•omot•42m ago•3 comments

So what's the difference between plotted and printed artwork?

https://lostpixels.io/writings/the-difference-between-plotted-and-printed-artwork
115•cosiiine•5h ago•44 comments

ReadMe (YC W15) Is Hiring a Developer Experience PM

https://readme.com/careers#product-manager-developer-experience
1•gkoberger•1h ago

Coalton Playground: Type-Safe Lisp in the Browser

https://abacusnoir.com/2025/08/12/coalton-playground-type-safe-lisp-in-your-browser/
68•reikonomusha•3h ago•18 comments

Pebble Time 2* Design Reveal

https://ericmigi.com/blog/pebble-time-2-design-reveal/
60•WhyNotHugo•3h ago•22 comments

This website is for humans

https://localghost.dev/blog/this-website-is-for-humans/
301•charles_f•3h ago•152 comments

DoubleAgents: Fine-Tuning LLMs for Covert Malicious Tool Calls

https://pub.aimind.so/doubleagents-fine-tuning-llms-for-covert-malicious-tool-calls-b8ff00bf513e
53•grumblemumble•4h ago•17 comments

A case study in bad hiring practice and how to fix it

https://www.tomkranz.com/blog1/a-case-study-in-bad-hiring-practice-and-how-to-fix-it
48•prestelpirate•1h ago•42 comments

New treatment eliminates bladder cancer in 82% of patients

https://news.keckmedicine.org/new-treatment-eliminates-bladder-cancer-in-82-of-patients/
145•geox•3h ago•46 comments

The Mary Queen of Scots Channel Anamorphosis: A 3D Simulation

https://www.charlespetzold.com/blog/2025/05/Mary-Queen-of-Scots-Channel-Anamorphosis-A-3D-Simulation.html
51•warrenm•5h ago•13 comments

We caught companies making it harder to delete your personal data online

https://themarkup.org/privacy/2025/08/12/we-caught-companies-making-it-harder-to-delete-your-data
183•amarcheschi•4h ago•43 comments

How Stock Options Work

https://web.stanford.edu/class/e145/2007_fall/materials/stockoptions.html
19•jdcampolargo•56m ago•0 comments

Claude says “You're absolutely right!” about everything

https://github.com/anthropics/claude-code/issues/3382
492•pr337h4m•11h ago•391 comments

Mesmerizing Hypnoloid, a Kinetic Desktop Sculpture

https://www.core77.com/posts/138054/This-Mesmerizing-Hypnoloid-a-Kinetic-Desktop-Sculpture
7•surprisetalk•3d ago•0 comments

Honky-Tonk Tokyo (2020)

https://www.afar.com/magazine/in-tokyo-japan-country-music-finds-an-audience
15•NaOH•3d ago•3 comments

Gartner's Grift Is About to Unravel

https://dx.tips/gartner
56•mooreds•2h ago•32 comments

Tesla Diner Drops Most Menu Options and Cuts Hours Just Weeks After Opening

https://www.jalopnik.com/1938650/tesla-diner-drops-most-menu-options-cuts-hours/
9•raattgift•31m ago•4 comments

Nearly 1 in 3 Starlink satellites detected within the SKA-Low frequency band

https://astrobites.org/2025/08/12/starlink-ska-low/
158•aragilar•10h ago•138 comments

Claude Sonnet 4 now supports 1M tokens of context

https://www.anthropic.com/news/1m-context
1244•adocomplete•1d ago•658 comments

Bezier-rs – algorithms for Bézier segments and shapes

https://graphite.rs/libraries/bezier-rs/
193•jarek-foksa•4d ago•39 comments

Pebble Time 2 Design Reveal [video]

https://www.youtube.com/watch?v=pcPzmDePH3E
125•net01•5h ago•51 comments

F-Droid build servers can't build modern Android apps due to outdated CPUs

366•nativeforks•13h ago•237 comments

The Rock Art of Serrania De La Lindosa

https://www.earthasweknowit.com/pages/serrania_de_la_lindosa_rock_art
23•kkoncevicius•4d ago•2 comments

Supporting org.apache.xml.security in graalVM

https://guust.ysebie.be/blog/supporting-apache-xml-security-algorithms.html
23•whizzx•5h ago•3 comments