Whether you’re:
- A coder writing clean, reusable functions or internal tooling,
- A UGC creator making tutorials or product demos,
- A data labeller doing precise annotations…
…all of that labor creates intellectual property that ends up training AI models.
But here’s the problem: we don’t own any of it, even though it wouldn’t exist without us.
AI companies take our data, by hook or by crook, train a model on it, and extract massive value, while paying us nothing or, at best, a small one-time fee.
Yes, companies do play a valuable role. But they are using our work to replace us or devalue what we do, so we have every right to ask for more.
If you really think about it, data mining is much like mineral mining. Just as mining companies extract valuable resources like gold or diamonds from the earth, often exploiting labor in poorly governed regions, data mining extracts value from a poorly managed pool of people and their data, frequently without their full knowledge or consent about how it will be used.
I think now is the right time to build fairer systems around data for everyone: royalties? Data unions? Open ownership of internal contributions within companies?
This business model isn't new—some data sourcing and collection companies charge not only a one-time fee but also a usage-based fee each time the data is used.
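As a rough sketch of how that kind of per-use accounting could look (a toy example; the contributor names, fee, royalty rate, and usage counts below are all made up):

```python
# Toy accounting for a "one-time fee plus usage-based royalty" data deal.
# Every number and name here is hypothetical.

ONE_TIME_FEE = 50.00      # paid once, when a contribution is accepted
PER_USE_ROYALTY = 0.002   # paid every time the contribution is used downstream

# usage log for one accounting period: contributor -> times their data was used
usage = {"alice": 120_000, "bob": 45_000, "carol": 310_000}

def payout(times_used: int, first_period: bool = False) -> float:
    """What a single contributor is owed for one accounting period."""
    base = ONE_TIME_FEE if first_period else 0.0
    return base + times_used * PER_USE_ROYALTY

for name, times_used in usage.items():
    print(f"{name}: ${payout(times_used, first_period=True):,.2f}")
```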
Doing this is necessary not only to make the data supply chain fair, but also to improve AI. We all know that AI performance scales with compute, and the best way to leverage increasing compute is by applying it to new data. So, if we want AI to continue improving, we need a proper data supply chain. And if we want high-quality data for more complex tasks, we must ensure that everyone is paid fairly.
Would love to hear your thoughts on this.
airylizard•3h ago
Unrelated, but this is exactly why I've been spending time building my AI framework (TSCE). The idea is to leverage these open-weight LLMs, typically smaller and accessible, to achieve accuracy and reliability comparable to larger models. It doesn't necessarily make the models "smarter" (like retraining or fine-tuning might), but it empowers everyday users to build reliable agentic workflows or AI tools from multiple smaller LLM instances. Check it out: https://github.com/AutomationOptimization/tsce_demo
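To make the "multiple smaller LLM instances" idea concrete, here is a minimal self-consistency-style sketch: sample the same question several times from a small open-weight model behind an OpenAI-compatible endpoint and take the majority answer. This is a generic illustration, not how TSCE itself works (see the repo for that), and the endpoint URL and model name are placeholders.

```python
# Generic multi-sample reliability sketch (not TSCE): ask a small open-weight
# model the same question several times and keep the most common answer.
import json
from collections import Counter
from urllib.request import Request, urlopen

ENDPOINT = "http://localhost:8000/v1/chat/completions"  # placeholder OpenAI-compatible server
MODEL = "my-small-open-model"                            # placeholder model name

def ask(question: str, temperature: float = 0.7) -> str:
    """Send one chat request and return the model's text answer."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": question}],
        "temperature": temperature,
    }
    req = Request(ENDPOINT, data=json.dumps(payload).encode(),
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"].strip()

def majority_answer(question: str, n_samples: int = 5) -> str:
    """Sample the same question several times and return the most common answer."""
    answers = [ask(question) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

if __name__ == "__main__":
    print(majority_answer("What is 17 * 24? Reply with only the number."))
```

Majority voting over cheap local samples is just one simple way to trade extra inference calls for reliability; other aggregation strategies are possible.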