Whether you’re:
- A coder writing clean, reusable functions or internal tooling,
- A UGC creator making tutorials or product demos,
- A data labeller doing precise annotations…
…all of that labor creates intellectual property that ends up training AI models.
But here’s the problem: we don’t own any of it, even though it wouldn’t exist without us.
AI companies take our data, by hook or by crook, train a model on it, and extract massive value, while paying us nothing or, at best, a small one-time fee.
Yes, companies do play a valuable role. But they are using our work to replace us or devalue what we do, so we have every right to ask for more.
If you really think about it, data mining is much like mineral mining. Just as mining companies extract valuable resources like gold or diamonds from the earth, often exploiting labor in poorly governed regions, data mining extracts value from a poorly managed pool of people and their data, frequently without their full knowledge or consent about how it will be used.
I think now is the right time to build fairer systems around data for everyone: royalties? Data unions? Open ownership of internal contributions within companies?
This business model isn't new—some data sourcing and collection companies charge not only a one-time fee but also a usage-based fee each time the data is used.
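As a rough sketch of how that kind of per-use accounting could look (a toy example; the contributor names, fee, royalty rate, and usage counts below are all made up):

```python
# Toy accounting for a "one-time fee plus usage-based royalty" data deal.
# Every number and name here is hypothetical.

ONE_TIME_FEE = 50.00      # paid once, when a contribution is accepted
PER_USE_ROYALTY = 0.002   # paid every time the contribution is used downstream

# usage log for one accounting period: contributor -> times their data was used
usage = {"alice": 120_000, "bob": 45_000, "carol": 310_000}

def payout(times_used: int, first_period: bool = False) -> float:
    """What a single contributor is owed for one accounting period."""
    base = ONE_TIME_FEE if first_period else 0.0
    return base + times_used * PER_USE_ROYALTY

for name, times_used in usage.items():
    print(f"{name}: ${payout(times_used, first_period=True):,.2f}")
```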
Doing this is necessary not only to make the data supply chain fair, but also to improve AI. We all know that AI performance scales with compute, and the best way to leverage increasing compute is by applying it to new data. So, if we want AI to continue improving, we need a proper data supply chain. And if we want high-quality data for more complex tasks, we must ensure that everyone is paid fairly.
Would love to hear your thoughts on this.
airylizard•3h ago
Unrelated, but this is exactly why I've been spending time building my AI framework (TSCE). The idea is to leverage these open-weight LLMs, typically smaller and accessible, to achieve accuracy and reliability comparable to larger models. It doesn't necessarily make the models "smarter" (like retraining or fine-tuning might), but it empowers everyday users to build reliable agentic workflows or AI tools from multiple smaller LLM instances. Check it out: https://github.com/AutomationOptimization/tsce_demo
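To make the "multiple smaller LLM instances" idea concrete, here is a minimal self-consistency-style sketch: sample the same question several times from a small open-weight model behind an OpenAI-compatible endpoint and take the majority answer. This is a generic illustration, not how TSCE itself works (see the repo for that), and the endpoint URL and model name are placeholders.

```python
# Generic multi-sample reliability sketch (not TSCE): ask a small open-weight
# model the same question several times and keep the most common answer.
import json
from collections import Counter
from urllib.request import Request, urlopen

ENDPOINT = "http://localhost:8000/v1/chat/completions"  # placeholder OpenAI-compatible server
MODEL = "my-small-open-model"                            # placeholder model name

def ask(question: str, temperature: float = 0.7) -> str:
    """Send one chat request and return the model's text answer."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": question}],
        "temperature": temperature,
    }
    req = Request(ENDPOINT, data=json.dumps(payload).encode(),
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"].strip()

def majority_answer(question: str, n_samples: int = 5) -> str:
    """Sample the same question several times and return the most common answer."""
    answers = [ask(question) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

if __name__ == "__main__":
    print(majority_answer("What is 17 * 24? Reply with only the number."))
```

Majority voting over cheap local samples is just one simple way to trade extra inference calls for reliability; other aggregation strategies are possible.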