It's not, it's a statistical model of existing text. There's no "genAI", there is "genMS" - generative machine statistics.
When you look at it that way, it's obvious the models are built on enormous amounts of work done by people publishing to the internet. That work represents many orders of magnitude more hours than it took to produce the training algorithms.
As a result, most of the money people pay to access these models should go to the original authors.
And that even ignores the fact that a model trained on AGPL code should be licensed under the AGPL, and so should its output - even if no single training input can be identified in the output, it's quite straightforwardly _derived_ from an enormous amount of training data input and a tiny (barely relevant) bit of prompt input. It's _derivative_ work.
fwiw, I mostly agree with you (AI training stinks of some kind of infringement), but legal precedent is not favouring copyright holders, at least for now.
In Bartz v. Anthropic and Kadrey v. Meta "judges have now held that copying works to train LLMs is “transformative” under the fair use doctrine" [1]
i.e. no infringement - bearing in mind this applies only in the US. The EU and the rest of the world are setting their own precedents.
Copyright can only be contested in the jurisdiction that the alleged infringement occurred, and so far it seems that fair use is holding up. I'm curious to watch how it all plays out.
It might end up similarly to Uber vs The World. They used their deep pockets to destabilise taxis globally and now that the law is catching up it doesn't matter any more - Uber already won.
[1] https://www.ropesgray.com/en/insights/alerts/2025/07/a-tale-...
I know. I am describing how it should be.
Copyright was designed in a time when concealing plagiarism was time-consuming. Now it's a cheap mechanical operation.
What I'm afraid of is that this is being decided by people who don't have enough technical understanding and who might be swayed by everyone calling it "AI" and thinking there's some kind of intelligence behind it. After all, they call genMS images/sounds/videos "AI" too, which is obviously nonsense.
Also, not sure what you mean by “statistics”.
If you mean that a parameter for a parameterized probability distribution is chosen in order to make the distribution align with a dataset, ok, that’s true.
That’s not generally what I think of when I hear “statistics” though?
Maybe, but it should - see sibling comment.
Statistics as in taking a large input and processing it into far fewer values which describe the input in some relevant ways (and allow reproducing it). Admittedly it's pretty informal.
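A toy sketch of what I mean, in Python (illustrative only - a real model compresses text into billions of parameters, but the shape is the same: a large input in, far fewer values out):

    import math, random

    # Large input: 100,000 samples from some unknown process.
    random.seed(0)
    data = [random.gauss(170.0, 8.0) for _ in range(100_000)]

    # "Statistics": compress 100,000 numbers into two.
    mean = sum(data) / len(data)
    std = math.sqrt(sum((x - mean) ** 2 for x in data) / len(data))
    print(f"{len(data)} values -> mean={mean:.2f}, std={std:.2f}")

    # Those two values parameterize a distribution that generates
    # output resembling the input, without storing any of it.
    print([round(random.gauss(mean, std), 1) for _ in range(5)])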
Often when tech comes out that does something better than people, it makes sense for people to stop doing it. But in the case of "books explaining things", AI only learned how to explain things by examining the existing corpus - and there won't be any more human-generated content to continue to learn and evolve from, so the explanatory skills of AI could wind up frozen in 2025.
An alternative would of course be that humans team up with AI to write better books of this sort, and are able to develop new and better ways of explaining things at a more rapid pace as a result.
A relatively recent example that sticks in my mind is how data visualization has improved. Documents from the second half of the 1900s are shockingly bad at data presentation, and the shock comes from how much the standard of practice has improved in the last few decades. AI probably wouldn't have figured this out on its own, but it is now able to train on many, many examples of good visualization.
I mean, that's just nature. Darwin etc.
Maybe you should try the DPRK?
The resulting content lacks consistency and coherence. The generated prose is rather breathless: in carefully articulated writing, every sentence should have a place. Flagship models (Opus 4~) don’t seem to understand the value of a sentence.
I’ve tried to prompt-engineer this behavior (one should carefully attend to each sentence: how is it contributing to the technical narrative in this section, and overall?), but didn’t have much success.
I suspect this might be solved by research on grounding generation against world models: much of verifying “is this sentence correct here?” comes down to sharing a world model of the domain with my audience. I use that world model to debug my own writing.
I don't think it will ever counter the change, but I suspect there will be some interesting developments in culture worldwide caused by this.
I suppose it will also depend on how affordable/accessible these models will be.
Also, I just purchased LazyVim For Ambitious Developers. I've used the online edition a number of times in recent months. Thanks for your work!
I think it's safe to say it is pretty clear.
As an example, you can equip 10 developers with the highest tier of Claude Code Max for a year for less than the price of one new developer. At this point, having plenty of personal experience with the tool, I'd pick the former option.
There, one less job for a developer.
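Rough arithmetic behind that claim, assuming the top Max tier runs about $200/month (an assumption - check current pricing) and a deliberately conservative developer cost:

    # Back-of-the-envelope sketch; both figures are assumptions.
    seat_per_year = 200 * 12          # ~$2,400 per seat per year
    ten_seats = 10 * seat_per_year    # ~$24,000 per year
    one_developer = 120_000           # salary alone, before overhead
    print(ten_seats, "vs", one_developer)  # 24000 vs 120000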
Maybe this will change at some point in the future, but for now there's no way I would trade a well-written book on a subject for AI slop. These models are trained on human-written material anyway - why not just go straight to the source?
The writer seems to assume that people can learn entirely from computer displays (including glasses). That would be a world where our entire lives, or a great deal of them, are devoted to computer-generated facts, experiences - well, everything.
There are still both creative and mundane experiences, like the door molding that needs a fix.
I've been through a few revolutions that "changed everything": from the phone without a dial (true) to cell phones that solve the formerly horrifying "I couldn't find you at the airport" situation.
And so on.
I guess that the existential question is: "What is the purpose, meaning, and joy in life?"
It's not defined by the gadgets and technologies outside us, but truly by the relationships we have with people. So, if AI "replaces" my highly personal photographic experiences captured on film or memory, it's shared with close friends who know the story behind it and its value as an experience shared. I rarely post my photos anywhere, thanks to the image bureau and AI dragnets. They commoditize and destroy that personal value. Art is self-expression. Read Joseph Conrad's preface [0].
Likewise, as long as life is full of experiences away from computer screens (and glasses), those are real life, and the technology is just ornament.
I had a plan for after-school education in which kids would go out and measure things and use computers to analyze the data. Like Kepler, but a lot faster. And the learning is in the doing.
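A sketch of the kind of analysis I mean, in Python - illustrative only, with textbook orbital values standing in for the kids' own measurements:

    import math

    # (mean distance from the sun in AU, orbital period in years)
    planets = [(0.387, 0.241), (0.723, 0.615), (1.000, 1.000),
               (1.524, 1.881), (5.203, 11.862), (9.537, 29.457)]

    # Fit a line to log(period) vs log(distance); the slope is the
    # exponent in T = a^k. Kepler's third law predicts k = 1.5.
    xs = [math.log(a) for a, _ in planets]
    ys = [math.log(t) for _, t in planets]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    print(f"fitted exponent: {slope:.3f}")  # ~1.5, i.e. T^2 = a^3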
Very long ago - 50 years, to be exact - I wanted to get a third class Electrician's Mate rating, and in those days, you could strike for a rating. Pass some exams and show expertise with real gear. One of the Warrant Officers was tickled that I did it "the old way" instead of "A" school, because it's all "A" school now. Probably computerized.
Now, I'm retired, and relationships with people mean even more. I have more experiences and wisdom to share, in a way that's unique to each person - not through a multidimensional "profile" that might even be more complete than my own self-understanding, but in a personal way that comes from the shared experience of being human and caring deeply about feelings, because we've had similar ones.
Not defined by the technology of the time - Mom and Dad's old "Operator" phone or '53 Ford - but by what we shared as persons, equal in humanity though as different as a kid and parent can be, and through evolving lives and times.
[0] https://standardebooks.org/ebooks/joseph-conrad/the-nigger-o...