We give data to train AI models and get nothing in return

2•whooocareslol•5h ago
I’m less worried about being replaced by AI and more frustrated that companies are taking our data without paying for it to train AI models that they profit from, and that could make us less valuable over time.

Whether you’re:

- A coder writing clean, reusable functions or internal tooling,

- A UGC creator making tutorials or product demos,

- A data labeller doing precise annotations...

…all of that labor creates intellectual property that ends up training AI models.

But here’s the problem: we don’t own any of it, even though it wouldn’t exist without us.

They take our data—by hook or by crook—train a model, and extract massive value from it, while paying us nothing or, at best, a small one-time fee.

Yes, companies do play a valuable role. But they are using our work to replace us or devalue our work. So we have every right to ask for more.

If you really think about it, data mining is much like mineral mining. Just as companies extract valuable resources like gold or diamonds from the earth, often exploiting labor and poorly governed regions, data mining extracts value from a poorly managed pool of people and their data, frequently without their full knowledge or consent regarding how it will be used.

I think now is the right time to build fairer systems around data for everyone—royalties? data unions? open ownership of internal contributions within companies?

This business model isn't new—some data sourcing and collection companies charge not only a one-time fee but also a usage-based fee each time the data is used.

Doing this is not only necessary to make the data supply chain fair, but also to improve AI. We all know that AI performance scales with compute, and the best way to leverage increasing compute is by applying it to new data. So, if we want AI to continue improving, we need a proper data supply chain. And if we want high-quality data for more complex tasks, we must ensure that everyone is paid fairly.

Would love to hear your thoughts on this.

Comments

airylizard•3h ago
The data "supply chain" has already surged ahead of production elsewhere. Companies aren't just passively taking what's out there; they actively harvest highly curated content, and they benefit even further when we voluntarily correct and refine their models. Heck, some of us are even paying them for the privilege of training AI. The best time to make this argument would have been when GPT was originally released, but I think most people were too enamored with it to care, and the idea that it would be "open-source" meant we'd get it back at the end of the day.

Unrelated, but this is exactly why I've been spending time building my AI framework (TSCE). The idea is to leverage these open-weight LLMs, typically smaller and accessible, to achieve accuracy and reliability comparable to larger models. It doesn't necessarily make the models "smarter" (like retraining or fine-tuning might), but it empowers everyday users to build reliable agentic workflows or AI tools from multiple smaller LLM instances. Check it out: https://github.com/AutomationOptimization/tsce_demo

babyent•3h ago
Any code you write for your company as a contractor or W-2 employee is not “your” code. It isn’t yours; it belongs to the company.

The company benefits because your code makes the models better, which in turn makes engineers more productive.
