Physical print encyclopedias got replaced by Wikipedia, but AI isn't a replacement (can't ever see how either). While AI is a method of easier access for the end user, the purpose of Wikipedia stands on its own.
I've always scoffed at the Wikimedia Foundation's war chest and continuously increasing annual spending. I'd say now is the time to save money: become self-sustaining through investments so it can live for 1,000 years.
To me, it is an existence for the common good and should be governed as such.
What are they increasing spending on? Are they still trying to branch out to other initiatives?
I understand, even with static pages, that hosting one of the largest websites in the world won't be cheap, but it can't be rising that much, right?
Grants & movement support was 25%.
Hosting was 3.4%. Facilities was 1.4%.
The Wikimedia Foundation is another Komen Foundation.
I'm sure all those editors with decades of experience can quickly outdo OpenAI and Grok and what have you.
It's all very open if anyone wants to track down details themselves: https://meta.wikimedia.org/wiki/Category:Wikimedia_Foundatio...
2025–2026 is in-progress: https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Annual_...
>Similar to last year, technology-related work represents nearly half of the Foundation's budget at 47% alongside priorities to protect volunteers and defend the projects of an additional 29% – a total of 76% of the Foundation's annual budget. Expenses for finance, risk management, fundraising, and operations account for the remaining 24%.
Those of us who were born before this era really took off are spoiled by the journalism standards and information purity of the past, especially after the fall of the USSR.
Wikipedia is impressive in what it manages to coordinate on a daily basis, especially given only 644 full-time staff.
Wikipedia had its day, in between print encyclopedias and quick query AI. Its place in history is now set.
Something else will come along soon enough.
This is why Wikipedia is not a source, but it can provide links to sources (which, in turn, often send you down a rabbit hole trying to find their sources), and it's then up to you to determine the value and accuracy of those sources. For instance, I enjoy researching historic economic issues, and you'll often find there are like 5 layers of indirection before you can finally get to a first-party source, and each step along the road is like a game of telephone being played. It's the exact same with LLMs.
[1] - https://xkcd.com/978/
Wikipedia almost certainly has this in a nice table, which I can sort by any column, and all the countries are hyperlinked to their own articles, and it probably links to the concept of population estimation too.
There will be a primary source, but would a primary source also have articles on every country? Articles that are ad-free and follow a consistent format? That are editable? Then it's just Wikipedia again. If not, then you have to rely on the LLM to knit these sources together.
I don't see wikis dying yet.
At work, I had rigged one of my internal tools so that when you were looking at a system's health report, it also linked to an internal wiki page where we could track human-edited notes about that system over time. I don't think an AI can do this: you can't fine-tune it, you can't be sure it's losslessly round-tripping, and if it has to do a web search, then it has to search the very wiki you said is obsolete.
OpenStreetMap does the same thing. Their UIs automatically deep-link every key into their wiki. So if you click on a drinking fountain, it will say something like "amenity=drinking_water", and the UI doesn't know what that is, but it links you to the wiki page where someone has certainly put example pictures and explained the most useful ways to tag it.
There has to be a ground truth. Wikipedia and the like are a very strong middle point on the Pareto frontier between primary sources (or oral tradition, in OSM's case) and LLM summaries.
AI companies should be donating large sums of money to Wikipedia and other such sites to keep them healthy. Without good sources, we’re going to have AI training off AI slop.
Printed texts are still useful but so is Wikipedia (I continue to use both).
Right up there with anime torrenting sites.
But seriously, AI trained on Wikipedia should donate to Wikipedia. Why are the AI companies not doing this, or are they?
Wikipedia is a victim of its own success. It was excellent at avoiding bias for quite a while, and the vast majority of articles are extremely well written.
However, its massive popularity and dominance have also led to, well, this guy put it best: https://en.wikipedia.org/wiki/John_Dalberg-Acton,_1st_Baron_...
I always wondered why more companies or organizations didn’t do this. Pile up money during the good years to allow themselves to not need continued outside income to keep going, so they can do what is right instead of compromising their vision for the sake of hitting quarterly earnings. That isn’t to say they can’t keep making money, but do it for the right reasons that will keep the core business around for the long run.
I recently visited Scotland and on a visit to a distillery they mentioned they bought land in the US to grow trees that will make their barrels one day. The trees take over 100 years to grow (if I remember correctly). How is it we can invest ~200 years into a glass of scotch, yet we aren’t willing to take the same care and long term thinking in most other areas.
Even without being around for 1,000 years, I’d think doing this would de-stress and de-risk. Somewhere along the way it became a bad thing to have a good, stable, long-lasting business. The only thing that seems to matter now is growth, even if that means instability, stress, excessive risk, and a short stay.
A poor comparison is how much money Coca-Cola spends on advertising, even though it is one of the best-known brands in the entire world. And most of their advertising is simply "This is our name, we exist" — not even a value proposition or call to action.
If Wikimedia sets themselves up to pay for servers and maintenance in perpetuity, they will fall into obscurity.
With that being said, I also don't think they are spending their money in a good way.
Many more scoffed at that, saying those people were just stuck in their old ways and unable to adjust to the obviously superior new thing.
Is that you? AI applications are different from Wikipedia and are better in some ways: coverage is much greater - you can get a detailed article on almost any topic. And if you have questions after reading a Wikipedia article, Wikipedia can't help you; the AI software can answer them. Also, it's a bit easier to find the information you want.
Personally, I'm with the first group, at the top of this comment. And now truth, accuracy, and epistemology, and public interest in those things, take another major hit in the post-truth era.
I know it’s completely normalized and the official name, but this has to be the most dangerous euphemism of our time.
It’s the era of lies.
But “era of lies” doesn’t sound nice because nobody wants to be a liar… so “post-truth” sounds better: “I'm telling the truth. Almost. But I'm not lying.”
What is that going to look like? How does one hedge against that eventuality?
Also, LLMs don't produce truth. They don't have a concept of it, or of lies for that matter. If you are using LLMs to study something you know nothing about, the information they provide is as good as useless if you don't verify it with external sources written by a person. Wikipedia isn't perfect, nothing is, but I trust their model a shitload more than an LLM.
665 ChatGPT-User
396 Bingbot
296 Googlebot
037 PerplexityBot
Fascinating.
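Per-bot counts like the ones above can be tallied straight from a web server access log. A minimal sketch, assuming a combined-format log where the User-Agent is the last quoted field (the sample lines and bot keyword list are illustrative, not from the original post):

```python
# Tally hits per AI/search bot from combined-format access log lines.
import re
from collections import Counter

# Illustrative keyword list; real logs will contain many more crawlers.
BOT_KEYWORDS = ["ChatGPT-User", "bingbot", "Googlebot", "PerplexityBot"]

def count_bots(log_lines):
    """Count hits per bot keyword, matching against the User-Agent field."""
    counts = Counter()
    for line in log_lines:
        # In the combined log format, the User-Agent is the last quoted field.
        quoted = re.findall(r'"([^"]*)"', line)
        if not quoted:
            continue
        user_agent = quoted[-1].lower()
        for bot in BOT_KEYWORDS:
            if bot.lower() in user_agent:
                counts[bot] += 1
    return counts

# Two made-up sample lines for demonstration.
sample = [
    '1.2.3.4 - - [01/Jan/2025:00:00:00 +0000] "GET / HTTP/1.1" 200 123 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '5.6.7.8 - - [01/Jan/2025:00:00:01 +0000] "GET /a HTTP/1.1" 200 456 "-" "Mozilla/5.0 ChatGPT-User/1.0"',
]
print(dict(count_bots(sample)))
```

In practice you'd stream the real log file line by line rather than hold it in memory.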
About 80% of traffic to my sites (a few personal blogs and a community site) is from AI bots, search-engine spiders, or SEO scrapers.
But at the same time I continue to contribute edits to Wikipedia. Because it's the source of so much data. To me, it doesn't matter if the information I contribute gets consumed on Wikipedia or consumed via LLM. Either way, it's helping people.
Wikipedia isn't going away, even if its website stops being the primary way most people get information from it.
https://news.ycombinator.com/item?id=34106982
>2022
>It’s the dishonesty of Wikipedia that bothers me. The implication is that donations are urgently needed to keep the website running. In reality they have $300m in the bank and revenue is growing every year[0]. Even Wikipedia says only 43% of donations are used for site operations[1], and that includes all of their sites, not just Wikipedia. Fully 12% of the money they collect from you is... used to ask you for more money[1]
Many of them are sites that have built themselves without any original reporting. Where will they scrape the content they've used to grow if their sources take the same attitude?
People rightfully get upset about individual editors having specific agendas on Wikipedia and I get it. Often that is the case. But the chat interface for LLMs allows for a back and forth where you can force them to look past some text to get closer to a truth.
For my part, I think it's nice to be part of making that base substrate of human knowledge in an open way, and some kinds of fixes to Wikipedia articles are very easy. So what little I do, I'll keep doing. Makes me happy to help.
Some of the fruit is really low-hanging, take a look at this garbage someone added to an article:
https://en.wikipedia.org/w/index.php?title=Salvadoran_gang_c...
It's _kinda_ cheap. Wikipedia is so cheap you can fit it all on a phone and search it instantly.
I agree overall but LLMs are just so heavy. I don't know if most people can afford to run one locally, and they're lossy. Both on a phone would be great. I fret a lot about data ownership, you know
This is the kind of capitalistic behavior that is repugnant to our idea of how things should work.
This is not what the commons is for - taking the work of creators, repackaging it, and using platform capability to re-sell it.
At this point, I am coming around to the argument that governments should make their own local/national AI.
crmd•8h ago
I always assumed the need for metastatic growth was limited to VC-backed and ad-revenue dependent companies.
qingcharles•8h ago
And their costs are even increasing: while human viewers are decreasing, they are getting hugged to death by AI scrapers.
johnnyanmac•8h ago
For such purposes, I'd naively just set up a weekly job to download Wikipedia and then run a "scrape" on that. Even weekly may be overkill; a monthly snapshot may be more than enough.
AstroBen•7h ago
Something tells me a person is way less likely to donate if they're consuming the content through an LLM middleman
khamidou•7h ago
I doubt that they're getting "hugged to death" by AI scrapers.
intended•5h ago
It means that now, people are paying for their AI subscriptions, while they don’t see Wikipedia at all.
The primary source is being intermediated - which is the opposite of what the net was supposed to achieve.
This is the piracy argument, except this time it's not little old ladies doing it, but massive for-profit firms.