Physical print encyclopedias got replaced by Wikipedia, but AI isn't a replacement for Wikipedia (and I can't see how it ever could be). While AI gives the end user an easier way to access information, the purpose of Wikipedia stands on its own.
I've always scoffed at the Wikimedia Foundation's war chest and continuously increasing annual spending. I say now is the time to save money: become self-sustaining through investments so it can live for 1000 years.
To me, it is an existence for the common good and should be governed as such.
What are they increasing spending on? Are they still trying to branch out into other initiatives?
I understand, even with static pages, that hosting one of the largest websites in the world won't be cheap, but it can't be rising that much, right?
Grants & movement support was 25%.
Hosting was 3.4%. Facilities was 1.4%.
The Wikimedia Foundation is another Komen Foundation.
Wikimedia accepts PayPal, Apple Pay, Google Pay, Visa, Mastercard, Amex, Check, ACH and Money Order.
Pretty hard to argue that mainstream processors don't like them.
Processors charge higher fees to merchants in lines of business with high fraud and chargeback risk; it has nothing to do with whether they agree with them morally.
They refuse merchants with business they don't like.
If processors didn't like what Wikipedia publishes, Wikipedia wouldn't be able to accept payments at all, rather than just paying high fees.
I can't imagine that Wikipedia has high chargeback rates, and clearly the processors don't mind doing business with them.
The processing line item probably includes not just the fees that they have to pay to processors, but FX fees, the cost of banking, the cost of paying people to open envelopes, the cost of accounting, etc.
Using a platform with its own fee on top of payment processor fees would explain the 6.4%.
It's actually somewhat common for people who steal credit cards to use nonprofits like Wikipedia to "test" them. Such sites typically have no minimum donation and receive donations from all over the world, so fraud detection won't think it's weird that you're spending money halfway across the world.
I'm sure all those editors with decades of experience can quickly outdo OpenAI and Grok and what have you.
A lot of it is engineers who work on improving the software that runs Wikipedia, and keeping the site running, which you can see happening at https://phabricator.wikimedia.org -- outside of security issues, all the dev work is done in the open. There's constant ongoing work on making Wikipedia and all the related projects work better.
There's also people who do the fundraising, community management, legal defense, etc. Then there's general HR infrastructure around employing hundreds of people.
Basically, that "Hosting was 3.4%. Facilities was 1.4%." point gets brought up without mentioning that you then need to pay a bunch of people to manage those servers and facilities.
(Disclaimer: I'm an employee of the WMF. I'm just an engineer, so I'm not speaking authoritatively about financial details.)
It's all very open if anyone wants to track down details themselves: https://meta.wikimedia.org/wiki/Category:Wikimedia_Foundatio...
2025–2026 is in-progress: https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Annual_...
>Similar to last year, technology-related work represents nearly half of the Foundation's budget at 47% alongside priorities to protect volunteers and defend the projects of an additional 29% – a total of 76% of the Foundation's annual budget. Expenses for finance, risk management, fundraising, and operations account for the remaining 24%.
Those of us who were born before this era really took off are spoiled by the journalism standards and information purity levels of the past, especially after the fall of the USSR.
Wikipedia is impressive in what it manages to coordinate on a daily basis, especially given only 644 full-time staff.
Wikipedia's pages are not static.
Realistically though most of the budget is spent on improving the website not just treading water.
Wikipedia had its day, in between print encyclopedias and quick query AI. Its place in history is now set.
Something else will come along soon enough.
This is why Wikipedia is not a source but can provide links to sources (which then, in turn, often send you down a rabbit hole trying to find their sources), and it's then up to you to determine the value and accuracy of those sources. For instance, I enjoy researching historic economic issues, and you'll often find there are about five layers of indirection before you can finally get to a first-party source; at each step along the way it's like a game of telephone being played. It's exactly the same with LLMs.
[1] - https://xkcd.com/978/
Haven't yet had the same issue with Wikipedia.
Wikipedia almost certainly has this in a nice table, which I can sort by any column, and all the countries are hyperlinked to their own articles, and it probably links to the concept of population estimation too.
There will be a primary source, but would a primary source also have articles on every country? Ones that are ad-free, that follow a consistent format, that are editable? Then it's just Wikipedia again. If not, then you have to rely on the LLM to knit together these sources.
I don't see wikis dying yet.
At work, I had rigged one of my internal tools so that when you were looking at a system's health report, it also linked to an internal wiki page where we could track human-edited notes about that system over time. I don't think an AI can do this, because you can't fine-tune it, you can't be sure it round-trips losslessly, and if it has to do a web search, then it has to search for the wiki you said is obsolete.
OpenStreetMap does the same thing. Their UIs automatically deep-link every key into their wiki. So if you click on a drinking fountain, it will say something like "amenity=drinking_water", and the UI doesn't know what that is, but it links you to the wiki page where someone has certainly put example pictures and explained the most useful ways to tag it.
There has to be a ground truth. Wikipedia and the like are a very strong middle point on the Pareto frontier between primary sources (or oral tradition, for OSM) and LLM summaries.
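A minimal sketch of that deep-linking idea in Python. It assumes the OSM wiki's page-naming convention of `Key:<key>` for a key and `Tag:<key>=<value>` for a specific tag; `osm_wiki_link` and `annotate_tags` are hypothetical helpers for illustration, not code from any actual OSM editor. The same pattern would work for linking a system's health report to its page on an internal wiki.

```python
from __future__ import annotations

from typing import Optional
from urllib.parse import quote

# Assumption: base URL and the Key:/Tag: page-naming convention of the OSM wiki.
OSM_WIKI_BASE = "https://wiki.openstreetmap.org/wiki/"


def osm_wiki_link(key: str, value: Optional[str] = None) -> str:
    """Build a deep link to the wiki page documenting a key, or a key=value tag."""
    page = f"Key:{key}" if value is None else f"Tag:{key}={value}"
    # Leave ':' and '=' readable; percent-encode anything unusual.
    return OSM_WIKI_BASE + quote(page, safe=":=")


def annotate_tags(tags: dict[str, str]) -> list[tuple[str, str]]:
    """Pair each raw 'key=value' label with its wiki link, as a map UI might."""
    return [(f"{k}={v}", osm_wiki_link(k, v)) for k, v in sorted(tags.items())]


if __name__ == "__main__":
    fountain = {"amenity": "drinking_water"}
    for label, url in annotate_tags(fountain):
        print(label, "->", url)
    print(osm_wiki_link("amenity"))
```

The UI never needs to understand the tag itself; it only needs a deterministic rule for turning the tag into a page name, and humans fill in the page.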
AI companies should be donating large sums of money to Wikipedia and other such sites to keep them healthy. Without good sources, we’re going to have AI training off AI slop.
One thing that I would really like to see is some kind of hefty tax on any income derived from models trained on Wikipedia. Basically, make it legal to train, to share weights freely, and to host them locally. But the moment you start charging people for a subscription, society should start charging you to maintain the commons you are profiting from.
(This likely goes for more than Wikipedia, but that case is especially simple since there's a single legal entity that could be given the money.)
Printed texts are still useful but so is Wikipedia (I continue to use both).
Right up there with anime torrenting sites.
But seriously, AI trained on Wikipedia should donate to Wikipedia. Why are the AI companies not doing this, or are they?
Wikipedia is a victim of its own success: it was excellent at avoiding bias for quite a while, and the vast majority of articles are extremely well written.
However, its massive popularity and dominance have also led to, well, this guy put it best: https://en.wikipedia.org/wiki/John_Dalberg-Acton,_1st_Baron_...
Yeah, yeah, let me guess, the truth has a liberal bias.
But continue to insist that everyone pointing it out should just move somewhere else; that will certainly make it less biased and more factual.
I find this kind of a fascinating social phenomenon.
I guess to give a personal example, I was trying to update a few pages about a country's Olympic history - that is, their Olympic bids, a few athletes, etc.
Unknowingly, I had stumbled across a particular power-editor's fiefdom, because they created all these pages and they were very aggressive in policing their articles to meet a certain style, tone and their beliefs.
Searching this person's username up (they went by their real name), they were closely related to that country's Olympic committee and an employee of a Ministry of Sport of sorts. The articles had lots of anonymous IP address edits from a university network this person was affiliated with in their program portfolio.
There was a clear conflict of interest, and I tried to point that out when they mass-reverted my edits, but they seemed committed to accusing me of edit warring for not sandboxing my changes and waiting for their personal review and approval, and quoted at least 12 different Wikipedia policies on notability, style, acceptable citations, etc. I still feel I was in the right, but I didn't have the willpower or stamina to fight against several requests for comments, speedy deletions, etc. They did get a warning from an administrator and some detractors in the discussion threads, but they weren't willing to let it go, and at that point it wasn't fun anymore for me. I have better things to do than to fight factual and nitpicky disputes on Wikipedia.
This is so untrue and this is a harmful belief. And I'm pretty much as liberal as they come.
Wikipedia also has the problem of using one type of article for everything, when many subjects really need another way of being explained.
Even facts can be slander.
For example, it would be stupid for me to entirely dismiss all of Wikipedia just because I know that there are some horribly biased articles and it'd be a disgrace if I shrugged such behavior off saying "hey, we all have biases, can't help it". That's what a child or even an animal would do.
2) For several others, see Alon Amit's superb Quora answer to "What are the most interesting or popular probability puzzles in which the intuition is contrary to the solution?" ([2], login-walled). Mentions the very counterintuitive Penney's Game [0].
3) Berkson's Paradox, aka "People in hospital/getting treatment tend to have worse health indicators".
4) Asymmetric dice behavior is counterintuitive, when you first see it.
5) Benford's Law, on quantities occurring in nature (e.g. river lengths), as opposed to uniform distribution.
6) There are lots of counterintuitive things about Platonic solids.
7) Bayes' Theorem itself, superbly useful but possibly one of the things in probability most abused on a daily basis by bad journalism and bad statistics.
8) The Multiple Testing Problem/p-hacking, aka the xkcd "Green jelly beans cause acne", and as a corollary: 8a) Most published (academic) findings aren't replicable, aka "Why Most Published Research Findings Are False", Ioannidis (2005)
9) Almost-integers
---
[1]: https://en.wikipedia.org/wiki/Monty_Hall_problem
[2]: https://www.quora.com/What-are-the-most-interesting-or-popul...
[3]: https://en.wikipedia.org/wiki/Berkson%27s_paradox
[5]: https://en.wikipedia.org/wiki/Benford%27s_law
[8]: https://en.wikipedia.org/wiki/Multiple_comparisons_problem
[9]: https://mathworld.wolfram.com/AlmostInteger.html
[0]: https://en.wikipedia.org/wiki/Penney%27s_game
More: https://en.wikipedia.org/wiki/Category:Probability_theory_pa...
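Two of the items above reduce to one-line formulas, sketched here in Python as an illustration (the function names are hypothetical). Benford's Law gives the expected share of leading digit d as log10(1 + 1/d). For the multiple testing problem, running n independent tests at significance level alpha when no real effect exists gives a 1 - (1 - alpha)^n chance of at least one false positive.

```python
import math


def benford_expected(d: int) -> float:
    """Benford's Law: expected probability that the leading digit is d (1-9)."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / d)


def family_wise_error_rate(alpha: float, num_tests: int) -> float:
    """Chance of at least one false positive across independent tests,
    assuming every null hypothesis is actually true."""
    return 1 - (1 - alpha) ** num_tests


if __name__ == "__main__":
    # Expected leading-digit distribution (compare a uniform 1/9, about 0.111 each).
    for d in range(1, 10):
        print(d, round(benford_expected(d), 3))
    # e.g. 20 independent jelly-bean colours, each tested at alpha = 0.05
    print(round(family_wise_error_rate(0.05, 20), 3))
```

With alpha = 0.05 and 20 tests, the chance of at least one spurious "significant" result is roughly 64%, which is how one colour of jelly bean can end up "causing" acne by luck alone.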
In fact, the claim that “bias can be avoided and should be absolutely”, which is implicit in your response, reflects a bias of its own: a bias toward moral or intellectual purity, as if the parent recognizing bias is equivalent to endorsing it. I get that this is a pedantic point to make but to come at the parent with such vigour for being realistic, again seems a bit unfair
A helpless "we couldn't possibly do anything about that issue" sort of mentality.
That said if we're talking literally then I fully agree with you that heuristics are a form of bias and can sometimes be a very good thing on a case by case basis.
Nonsense. Your definition of the word "bias" includes any assertion whatsoever. Bias is distortion from reality and truth. Saying that we can avoid distortion is not itself a distortion. I never claimed that all bias should be avoided, but the post I responded to said that bias can't be avoided.
Also,
> a bias toward moral or intellectual purity, as if the parent recognizing bias is equivalent to endorsing it.
There is no conceptual connection here between "purity" and the equivalence of recognizing bias with endorsing it, nor is saying "bias can be avoided" related to, or a kind of, "purity" in any useful sense. Stop using abstract words for effect and speak simply.
> I get that this is a pedantic point to make but to come at the parent with such vigour for being realistic, again seems a bit unfair
If the parent were being realistic, they'd say that we can't even recognize bias, which is actually more agreeable to me. But instead the parent admits that we can recognize bias. Since we've gotten that far, then I can say that failing to avoid it, when we should avoid it, is merely a lack of will and integrity rather than some inescapable fate.
In this case, I simply felt your judgment of the parent wasn’t fair, and showed a moral bias. Maybe they weren’t perfectly clear, and too absolute, but your response wasn’t proportionate either. It condemned more than it understood. I interpreted it as an epistemic observation and you interpreted it as an offense. The very fact that we came away with two completely different readings of the same short sentence rather proves the point.
Thank you for putting words in my mouth regarding my definition of the word bias, but let's use your own: "Bias is a distortion from reality and truth." If that is the case, we can never hope to avoid it, because we will never have perfect information. Using that definition, we are quite literally constantly in a state of bias. Your very own definition is far more broadly supportive of the notion that bias can't be avoided and consequently suggests bias is effectively ubiquitous. This to me is the primary point the parent was making.
I was perhaps too charitable, and you not enough. We are both biased, and going by the advice of the parent, I'm pointing it out. I don't think there is much more I can do.
I'm not interested in talking about what could have been implied, only about what was stated. I'm arguing against an idea that was articulated, not the person who articulated it.
One of the problems of the current-day liberals, in my opinion, is that they make universal statements that they don't mean in order to sound punchy and snag a few morality points. "Believe all women," "men are trash," "defund the police," "all cops are bastards" are all things you'd hear from a person who doesn't actually mean or want any of these things, even though the root of each of those is just and good. The idea that "bias can't be eliminated, only made explicit" is another one of these. If we don't believe it, then let's not say it.
> I certainly have a bias to judge it more charitably than I do someone who leaps straight to moral outrage and judgement this early in the interaction.
I'm not sure where you're reading outrage, moral or otherwise. Was it that I used the word "so" in "so harmful"? And where's your bias against someone who tells another person to go to Conservapedia if they think bias can and should be avoided?
> Maybe they weren’t perfectly clear, and too absolute, but your response wasn’t proportionate either. It condemned more than it understood.
I merely stated that the belief was untrue and harmful, I don't think that's disproportionate at all. I can only understand what is stated, and according to my understanding we ought to condemn it.
> Thank you for putting words in my mouth regarding my definition of the word bias, but let's use your own: "Bias is a distortion from reality and truth." If that is the case, we can never hope to avoid it, because we will never have perfect information. Using that definition, we are quite literally constantly in a state of bias. Your very own definition is far more broadly supportive of the notion that bias can't be avoided and consequently suggests bias is effectively ubiquitous.
This is shifting the goalposts. First, I never claimed we could know the complete truth; it was the original poster who stated that we couldn't course-correct upon learning new truth ("bias can't be avoided"). And second, the context of the original statement is bias in reporting, not epistemological certainty. We're not talking about positions of atoms here. We don't need perfect information to stop being biased against women in the workplace or against black people or whatever the subject. Even as individuals.
> This to me is the primary point the parent was making.
If that is their point then they can say it.
> We are both biased, and going by the advice of the parent, I'm pointing it out. I don't think there is much more I can do.
I have to ask, if I can't avoid my bias and you can't avoid yours, then what's the point of pointing out bias at all? Is it for other people to avoid our bias? How can they do that? I guess we're trying to minimize its effects, like you said.
> I was perhaps too charitable, and you not enough.
If I lack charity, it's in response to the original uncharitableness of the person telling someone to go to Conservapedia. If he would have mercy, let him show mercy.
I'm not saying there's a definitive interpretation with how terse it is, just that we aren't necessarily on the same page and attempts to come to any sort of agreement with each other might be a waste of time as we are practically talking about two different ideas. I take this response as pretty fair, and I think the point you're making is totally valid, I just think our respective ideas would never converge as we are talking about 2 distinct things. (Interesting how much conversation a lack of clarity can generate).
This is the first time I've heard about it. Is that meant to be a satire website? I can't tell anymore.
https://www.conservapedia.com/Main_Page
It started out crazy but it only got crazier as time went on.
The biggest problem with Wikipedia competitors is that the only people who tend to put in the effort to make one are usually crazy.
That said, it is indeed so insane that trolls sometimes become contributors just to see how far they can push it. There were several known cases of trolls going far enough undetected to get various admin rights (to block etc).
If you want a site that tells you that Jews launched a genocide of Nazis.
Conservapedia is intellectual cancer.
I always wondered why more companies or organizations didn’t do this. Pile up money during the good years to allow themselves to not need continued outside income to keep going, so they can do what is right instead of compromising their vision for the sake of hitting quarterly earnings. That isn’t to say they can’t keep making money, but do it for the right reasons that will keep the core business around for the long run.
I recently visited Scotland and on a visit to a distillery they mentioned they bought land in the US to grow trees that will make their barrels one day. The trees take over 100 years to grow (if I remember correctly). How is it we can invest ~200 years into a glass of scotch, yet we aren’t willing to take the same care and long term thinking in most other areas.
Even without being around for 1,000 years, I'd think doing this would de-stress and de-risk. Somewhere along the way it became a bad thing to have a good, stable, long-lasting business. The only thing that seems to matter now is growth, even if that means instability, stress, excessive risk, and a short stay.
Humans don't live that long, and there's a constant onslaught of fleeting fancies, especially in business (the Wikimedia Foundation should buy some crypto for its treasury!)
Tradition is simply brand value to be monetized for most businessmen (to add to your criticism). Just look at scotch whisky and multinational conglomerate acquisitions. They would never plant trees in America, they simply order giant vats used for the strongest PX Sherry to get maximum flavor per euro for their blending process.
[] Indifferent, spoiled rotten progeny seeking maximum return upon inheritance, selling distillery to Seagram's for an immediate gratification windfall fortune.
[] John Cooper VII, the last barrel maker, skill lost at retirement, or Master Cooper VIIth lost savings to Mister Market, or an expensive clandestine affair, extorts 16X per barrel.
[] Seagram's hires brilliant Bill Burr (Breaking Bad car wash business) to pose as EPA agent, threatening federal lawsuit for illegal violation of dumping distillery toxic runoff for past centuries, and/or white oak barrels cause cancer and the distillery has killed victims for hundreds of years.
[] A society grows fallow when old men plant trees whose shade they know they shall never sit in, while the president's daughter threatens to enforce federal white oak forest's toxic(false, but that's ok!) leaching prosecution if landowners do not immediately purchase a fortune of President altCoin.
In this painfully craven hostile world, Benedictine liqueur would seem to be more durable spirit than any Scotch.
The fluff is important for keeping super users engaged. It is also important for gaining acceptance in certain circles.
It just seems like every wiki results in defensive mod cabals
It's not a bad strategy. I've looked at Wikimedia's financial statements and have no problem giving a small monthly amount to them considering how much value I get from the site.
I certainly prefer my money going to them than to Zuckerberg or Altman or MSFT shareholders.
If you look at revenue vs spend they are net positive by about 7mm last year.
No it wouldn't, because
> the money is still at risk (from a shareholder’s point of view) if something bad happens to the company (lawsuit or market problem).
Retained earnings are, by definition, the accumulation of net income (minus any dividends paid out), and net income is by definition post-tax.
Whatever went into producing the retained earnings (the profit) has already been taxed,
but the retained earnings themselves are not subject to additional taxation (with a few exceptions).
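A rough sketch of that roll-forward in Python: each year's profit is taxed once, the after-tax net income (less dividends) is added to retained earnings, and the accumulated balance is never taxed again. The `Year` record, the flat 21% rate, and all figures are hypothetical, and the sketch ignores the exceptions mentioned above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Year:
    pretax_income: float
    tax_rate: float         # hypothetical flat rate, for illustration only
    dividends: float = 0.0  # cash paid out to shareholders that year


def roll_forward_retained_earnings(years: list[Year], opening: float = 0.0) -> float:
    """Ending retained earnings = opening balance + sum(net income - dividends).

    Tax is applied to each year's profit when it is earned; the running
    balance itself is carried forward without any further tax.
    """
    balance = opening
    for y in years:
        net_income = y.pretax_income * (1 - y.tax_rate)
        balance += net_income - y.dividends
    return balance


if __name__ == "__main__":
    history = [Year(100.0, 0.21), Year(120.0, 0.21, dividends=20.0)]
    # Year 1 adds 79 after tax; year 2 adds 94.8 - 20 = 74.8.
    print(roll_forward_retained_earnings(history))
```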
Having a war chest wasn’t going to help any of the retail companies, the technology companies who couldn’t pivot, etc. There is no reason for a company to live forever.
Typically, individuals want to pile up money so that they don't need outside income to keep going, and the shareholders of a "quarterly earnings" company will squeeze the entity to get it for themselves in the form of dividends or higher share-price.
It sounds weird, but this is better for shareholders and the economy (and companies can raise capital as needed down the line) than having all companies hold 3x the cash on the balance sheet.
The argument would be different for a foundation like Wikipedia's, although you still have problems between what Wikipedia's management might want (high wages, little accountability) and everyone else.
But the management is aware that the shareholders can apply (direct or indirect) pressure for the money to be used in certain ways. Ultimately the shareholder can sue the management if they think the money is misused.
A poor comparison is how much money Coca-Cola spends on advertising, even though it is one of the best-known brands in the entire world. And most of their advertising is simply "This is our name, we exist", not even a value proposition or call to action.
If Wikimedia sets themselves up to pay for servers and maintenance for perpetuity, they will fall into obscurity.
With that being said, I also don't think they are spending their money in a good way.
Many more scoffed at that, saying those people were just stuck in their old ways and unable to adjust to the obviously superior new thing.
Is that you? AI applications are different than Wikipedia and are better in some ways: Coverage is much greater - you can get a detailed article on almost any topic. And if you have questions after reading a Wikipedia article, Wikipedia can't help you; the AI software can answer them. Also, it's a bit easier to find the information you want.
Personally, I'm with the first group, at the top of this comment. And now truth, accuracy, and epistemology, and public interest in those things, take another major hit in the post-truth era.
I know it’s completely normalized and the official name, but this has to be the most dangerous euphemism of our time.
It’s the era of lies.
But “era of lies” doesn’t sound nice because nobody wants to be a liar… so “post-truth” sounds better: “I'm telling the truth. Almost. But I'm not lying.”
What is that going to look like? How does one hedge against that eventuality?
Stephen Emmott ends his book Ten Billion with the line "Teach my son how to use a gun."
Obviously I have not read the book, but do you think it holds up in 2025?
The Roman Empire has been crumbling for 400 years, so it's likely that we won't experience the collapse of society as described in most history books either - life is too short for that. Unless a black swan comes along…
To answer your question, I like to come back to the book because it's written in the style of Dan Brown :) - short, punchy chapters. And it still makes sense (to me).
Also, biofuels based on most crops other than sugar cane, in addition to not being very helpful in the fight against AGW, triggered large price spikes and political turmoil in a dozen different countries at once. Perhaps you heard of this event, we call it the Arab Spring. We are still dealing with the fallout of it.
Does this mean that politics will finally get out of everything and give me back my sports that will be free of the constant political pandering?
It indicates that it's a follow-on to postmodernism. To a significant degree the post-truth era is built on a reactionary attack on postmodernism - you can see it on HN, where many people reflexively attack like a mob anything they perceive is postmodern. You can see it in so many people who will accept lies and disaster over postmodernism.
And post-truth is a postmodern term - ironic, ridiculing, makes you think, has some energy to it. How absurd to be literally //post-truth//.
> era of lies
That's a post-postmodern term. No irony or wit; a term of despair. :)
How? It's just postmodernism itself.
There is no truth because everything is relative. There is no singular, objective truth, facts are intrinsically bound to their context, hence post-truth.
As someone who grew up being de-programmed from Soviet propaganda by my parents every time I came home, starting from pre-school, I cannot even begin to communicate how allergic I am to this discourse. "There is no truth" is some grade-A bullshit to me. What's next? Maybe Stalin wasn't Hitler's military ally to start WWII? Maybe we live in a simulation? What's the point of anything? 1+1=3!
I could just be dumb, but my theoretical view from 30,000 feet, or 30,000 years in the future can be read here:
That's not postmodernism but a caricature of it by its critics.
Postmodernism cares deeply about truth. It is highly skeptical of power, bias, perception, etc. and provides tools to mitigate these risks to truth.
Post-truth cares very little about truth and is especially non-skeptical, imho.
I think someone told you this and you believed it, but the track record of postmodernists in academia tells a different story.
I once saw a history lecture from a dean of the history department who had written a book on WWII in the Philippines. This is one of the most heavily studied and well documented periods in human history. There are dozens of books and hundreds of papers on this topic.
My father-in-law, who was born there and lived through the war there, read the book. He called it historical fiction. The book cherry-picked facts badly, ignored events which countered the narrative of the book and in general made every effort to promote an entirely counter-factual narrative of events. In addition, the author had no real expertise in military or naval warfare and didn't seem to understand even the basics about how wars are fought.
When things like this happen on a regular basis, it is hard to say that postmodernism cares about objective truth in any way. In fact, they seem to actively dislike reality. This isn't a caricature, this is from the horse's mouth. If you want people to have a different perception of postmodernism, make events like this have some sort of penalty, because he's still the dean many years later.
Obviously you don't care about the truth but just fabricate bullshit about other people that's convenient to you. I know the topic well; you clearly don't.
Your father is one source; he no doubt has his experience and memory which are important to him. If you look at the sources of a scholarly history book, you'll see thousands of people - that's how history is done, by researching primary sources like your father. Of course not everyone will agree; people will have widely varying perspectives (which is something that postmodernism helps you understand and deal with).
These cheap shots at me, scholarship, and postmodernism - which has its flaws, but not this stuff - are an insult to everyone's intelligence, including yours.
People with decades long scholarly careers write this shit (some even having the highest credentials), and people eat it up.
History as written by (especially US) historians is just racist boomer fanfiction (pushed as propaganda to enforce the national myth) for your own country. This is especially true of the US.
And their errors are not subtle by any means. They write things about you that are so wrong that any man on the street with even cursory familiarity with the subject would reject them as not even wrong; they are so far from the truth that it's clear to the observer that whoever made these claims doesn't have even basic familiarity with the subject.
You've countered this argument with an appeal to authority (his word against thousands of researchers). How fortunate was a person like Galileo, who could just make people look into a telescope and show his numerous and highly distinguished opponents that they were wrong; unfortunately, no such thing can exist for history. The next best thing I could recommend to US scholars is to have their work reviewed by top, highly respected local scholars for obvious errors: not biases of the overarching narrative, but basic shit and continuity errors that the common man on the street would laugh at.
How could this be? Do I believe Americans to be especially dumb? Just like Big Tobacco pushing studies on the health benefits of smoking for pregnant mothers, and Coca Cola delegitimizing the view that sugar is bad, US historians have a vested interest in propping up US imperialism - or are acting as reactionaries saying everything the US (or white people) has ever done was purely evil, what you have is a partisan shouting match (also called activism), that is the exact opposite of scholarly work.
If all of this sounds ridiculous that's another matter, if I actually wanted to cast shade upon them then gee, I'd just quote their stances on sex with minors.
Given that perspective, my thought was: "Hey Bob, look at these morons, they called easily proven lies 'post-truth!' Can you believe that? In a civilization based on science, with AI, nuclear, and biological weapons?! No wonder they died out right after this. How did they not see this coming? Anyway, what's for lunch?"
Also, LLMs don't produce truth. They don't have a concept of it, or of lies for that matter. If you are using LLMs to study something you know nothing about, the information they provide is as good as useless if you don't verify it with external sources written by a person. Wikipedia isn't perfect, nothing is, but I trust their model a shitload more than an LLM.
Where is Wikipedia without all the learning and information from other sources, many of which it put out of business?
> Also, LLMs don't produce truth. They don't have a concept of it, or of lies for that matter. If you are using LLMs to study something you know nothing about, the information they provide is as good as useless if you don't verify it with external sources written by a person. Wikipedia isn't perfect, nothing is, but I trust their model a shitload more than an LLM.
Wikipedia produces consensus that correlates with truth to some degree. LLMs produce statistical output, which in a way is an automated consensus of the LLM's input, that also correlates with truth to some degree - and the correlation is hardly zero.
I agree that information has no value if you don't know its accuracy; it's always a sticking point for me. IMHO Wikipedia has the same problem: I have no idea how accurate it is without verifying it with an external source (and when I've done that, I've often been disappointed).
Has anyone researched the relative accuracy of Wikipedia and LLMs?
Which businesses did Wikipedia put out of business? You will frequently see a 5k word article used for a couple of sentences in a Wikipedia page, with the entire Wikipedia page itself being smaller than one paper it cites for one small corner of said page. When I’m researching events, I frequently go to Wikipedia to find sources as search engines have a drastically larger recentism bias.
> Has anyone researched the relative accuracy of Wikipedia and LLMs
No comparative research on this specific topic has been conducted afaik, and most comparative research is aging (likely to Wikipedia's own detriment, since the general consensus is that Wikipedia's reliability has increased over time). However, at the time of publication, the consensus seemed to be that Wikipedia is generally only slightly less reliable than its peers in a given field (i.e., textbooks or Britannica), although Wikipedia is often less in-depth. The most frequently cited study is a 2005 comparison in Nature, which found 4 major errors in each of Wikipedia and Britannica, plus 123 minor errors in Britannica versus 162 in Wikipedia. All studies are documented on Wikipedia itself; see [[Reliability of Wikipedia]]. LLMs… do not have this same reputation.
Just as a start, other sources of reference, including encyclopedias, dictionaries, websites, etc. For example, I'm sure it impacts McGraw-Hill's AccessScience, which you've likely never heard of.
> This is documented on Wikipedia itself
Maybe there's a little bias there? Would Wikipedia accept Wikipedia's analysis of its own reliability as a valid source?
I've heard that claim, but having no knowledge of the accuracy of any particular article, it's not worth very much to me.
> LLMs… do not have this same reputation.
They don't with you, but many people obviously use them that way. Also, reputation does not correlate strongly with reality.
This just seems like healthy competition. I thought we were talking about a situation where Wikipedia’s use of other encyclopedias is an instrument of their demise.
> Maybe there's a little bias there
Paradoxically, I suspect you’d be pleasantly surprised about how tough this article is on itself. A lot of attention is given to bias in this case.
> Would Wikipedia accept Wikipedia's analysis of its own reliability as a valid source?
First, it is not Wikipedia's own analysis. Editors should not present their own conclusions from research, just what each paper says. See [[WP:SYNTH]]. Second, Wikipedia generally discourages anyone from citing it, as it is not a stable source of information. It is much better to use the sources the article itself conveniently cites inline. As a general policy, citing any encyclopedia is discouraged.
> having no knowledge of the accuracy of any particular article, it's not worth very much to me.
Wikipedia does have internal metrics grading the quality of an article; see [[WP:ASSESS]]. In general, though, even entirely discounting the Wikipedia component of the Britannica comparison, Britannica's own failures suggest it is wise to verify each and every claim in an encyclopedia, which Wikipedia does an excellent job of helping you do.
> They don't with you, but many people obviously use them that way. Also, reputation does not correlate strongly with reality
OpenAI's own benchmarks show much higher hallucination rates than any study on Wikipedia. Wikipedia itself is quite close to a ban on LLMs for reliability reasons. If you ask literally any layman “has ChatGPT ever been wrong for you?”, they will say yes, either in that moment or after only a little prompting. It is much harder to elicit such a response regarding Wikipedia, in my experience.
https://cdn.openai.com/pdf/2221c875-02dc-4789-800b-e7758f372...
You're sincerely claiming that people can't think of times they've seen vandalism on Wikipedia?
> This just seems like healthy competition. I thought we were talking about a situation where Wikipedia’s use of other encyclopedias is an instrument of their demise.
Somewhere above, someone complained that LLMs were harming Wikipedia, a source of its information. My point is that Wikipedia did the same to others.
> Just as a start, other sources of reference, including encyclopedias, dictionaries, websites, etc. For example, I'm sure it impacts McGraw-Hill's AccessScience, which you've likely never heard of.
Your “for example” in response to a question about what businesses Wikipedia put out of business is a business that is...still in business?
The difference is that humans have a concept of truth; humans have intent. A person, taking an aggregation of their research, expertise, and experience to produce an article, is (presumably) trying to produce something factual. Other humans then come along, with similar intent, and verify it. Studies in the past have shown Wikipedia's accuracy rate is roughly on par with traditional encyclopedias, and, more importantly, sources are clearly documented, making validation and further research fairly easy. And if something isn't sourced, I know immediately it's more suspect.
LLMs have no concept of truth; they have no "intent". They just slap words down based on statistics. It is admittedly very impressive how good they are at doing that, but they don't produce truth in any meaningful way; it's more a byproduct. On top of that, all their sources get smashed together, making it much more difficult to verify the validity of any given claim. They're also unpredictable, so the exact same prompt could produce truth one time and a hallucination another (a situation I have run into when it comes to engineering tasks). And worst of all, not only will an LLM be wrong, but it will be confidently and persuasively wrong.
I've learned that when people don't have any merits to argue, they turn to ridicule. Right back at you buddy.
> The difference is humans have a concept of truth, humans have intent. A person, taking an aggregation of their research, expertise, and experience to produce an article is (presumably) trying to produce something factual.
It's pretty naive to think that humans have intent and motivation for the truth, and no others. Just look around you in the world - most communication disregards the truth either carelessly or incidentally (because they are motivated to believe or claim something else) or intentionally (lots of that).
> LLMs have no concept of truth, they have no "intent".
My calculator app has no intent or concept of truth, but outputs truth pretty reliably.
Calculators aren't a useful analogy for LLMs. They produce a deterministic output based on a (relatively) narrow range of inputs. The calculations to produce those outputs follow very rigid and well defined rules.
LLMs by their very nature are non-deterministic, and the inputs/outputs are far more complicated.
People produce false information for many reasons besides malice, and even when they intend to be honest.
Look how much bad information was out there before the advent of scientific method.
If you have something actually relevant to say, you're welcome to say it.
> It's not an insult when it's true.
It's not slander, but it's certainly an insult. If you tell someone they are fat and ugly, it's an insult regardless of its truth and you shouldn't say it, ever. There's never a good reason for personal insults.
> it's true
> you're welcome to
This assumes your perspective is truth. That is the case for nobody in the world; in fact, I also have a perspective that I'm confident in, as do many others. Your statements also assume that, perhaps as the arbiter of truth, you have some authorization or power to enforce it. Again, that's nobody's business.
We're in a world of peers, generally speaking, and none of us know who is right. We need strategies to navigate that world, not the one where truth is given to you.
> you didn't give me anything to work with
When I feel like you do, it's a signal I need to listen better - the other person probably does have something to say and I'm missing it. It's possible we're talking past each other, but that's never a reason for insults.
(human intercourse)
Note that the signal is that I need to do something, not the other person. That's not because I'm 'wrong' or 'right' - those are mostly unknowable and irrelevant because 1) We're in a world of peers, generally speaking, and none of us know who is right. Also, 2) I'm the only one I can control and am responsible for, and ...
3) Respecting other people is always more important. That's a strategy for, and wisdom in, a world of uncertainty (as described), as opposed to a world of certainty. Also, it's a strategy for social creatures in social groups - it keeps groups strong and functioning. Finally, it's a strategy for both loving and respecting yourself - you deserve it. You're better than insults, I'm sure; and I sometimes say the wrong thing, but I'm better than that too.
I guess this argument was supposed to convince me to stop being such a luddite and accept the inevitable future, but really, in an increasingly post-truth world, it made me want to go and get myself a stack of reference books.
But that's not how the world has worked out recently ...
Perhaps Wikimedia is only mentioning a drop in traffic as it suggests how www users may be gathering information, not because it is commercially significant, e.g., decline in audience for advertising
Indeed, it is okay for traffic to fall when a website is not trying sell out www users to advertisers. It does not mean the information offered by the website is any less valuable
Maybe not less valuable but definitely less relevant. I think both are important.
https://en.wikipedia.org/wiki/Value_of_information
In the world of so-called "tech" companies, who use public information found on the www as "bait" to attract potential ad targets for data collection, surveillance and sometimes programmatic, targeted advertising, the information may have a value to those companies, for that purpose. They might measure that value by how popular the information is amongst www users
But I'm referring to the value of information published on Wikipedia to www users, not so-called "tech" companies who seek to intermediate access to free, public information that they did not themselves produce
But it can also mean "popular"
https://en.wiktionary.org/wiki/relevant
"Popular" information, evidenced by web traffic for example, may be important for so-called "tech" companies seeking to profit from intermediating access to free, public information, i.e., acting as a middleman. The information may be valuable to the so-called "tech" company middleman for purposes of supporting data collection, surveillance and online advertising services
However, the so-called "tech" company does not determine the value of the information to others. For example, a small group of people may find information on a web page to be valuable for their individual purposes. The web page may receive little traffic. The amount of traffic does not determine the value of the information to the small group
"AI search" might cause a decline in active users, but I cannot see how "AI search" would cause a decline in Wikipedia editors
Many Wikipedia editors are responsible for low traffic pages. I would imagine they are motivated to maintain these pages due to interest in the subject matter not interest in web traffic
Falling traffic is a problem. As editors turn over and fewer new editors show up, Wikipedia will become harder to maintain. For example, these days, defending against AI slop vandalism is a real problem that needs real humans to tackle - even as AI tools become more useful at detecting low-level vandalism.
What did get reverted was a trivial [citation needed] fix, for a musician's page, for a sentence stating they were involved in scoring a film. I found a relevant citation and this was promptly reverted, for reasons that were explained but, at least for me, utterly incomprehensible
https://en.wikipedia.org/wiki/Wikipedia:Contentious_topics#L...
The entire world's knowledge has some controversial topics? Oh my! Burn it! Burn it all!!
The librarians of Alexandria would have killed for Wikipedia. It's easily our greatest digital achievement.
This is a hill on which I would happily fight to my metaphorical death, here and now. If you disagree, let's discuss please.
Wikipedia is so open, that they even have their own "controversial" section! Is that not the coolest thing ever?
The chip on my shoulder is that there is a concerted effort to destroy and discredit Wikipedia.
The accomplishment of Wikipedia is not just beating the Library of Alexandria by many orders of magnitude, but doing so while keeping moderation logs in the open as well.
Ask @dang, or anyone that has ever had anything to do with forum moderation, if they would be cool with their moderation logs being completely open. Almost everyone with experience would say 100% no. They likely tried that and saw how much nutso drama it creates. Wikipedia actually does that, at the largest possible scale!
[0] Of course that exists, apparently it's called Baidu Baike
It takes a certain mentality. That's rare but I think it makes for much better communities on the whole.
However I think most participants, not just moderators, don't like the environment that sort of mentality results in. When anything and everything, including the moderation itself, is up for civilized debate that tends to foster an environment in which it's acceptable to question core parts of people's worldviews. There's little shared doctrine beyond "argue any position you'd like" which most people seem to find intensely uncomfortable.
There is at least one exception to that rule. Users who attract the ArbCom's attention may get a general block. If they ask what they're blocked for, the ArbCom rep will tell them to read their email. These moderation decisions are not public, not even in a form with PII redacted.
https://en.wikipedia.org/wiki/Wikipedia:Oversight
There are also "office actions", where essentially the Wikimedia Foundation and/or its legal counsel have been required to do something. In most cases, the office actions are visible and logged, unless they've been required to use Oversight as well. But the main thing is that the office actions will generally not be explained to anyone, as it usually stems from some legal threat to Wikipedia.
Currently ROFL, given grokpedia or whatever objectively dumb shit to which we are now exposed. I should not have bitten my tongue. Self-censorship is the worst kind.
"In May 2024, it was announced that Yasuke would be a major character in an upcoming video game (Assassin's Creed Shadows). While onwiki disagreement about Yasuke's status as a samurai predates this announcement, the historical figure's Samurai status became part of a culture war around video games (J2UDY7r00CRjH evidence) that media sources have described as a continuation of or successor to Gamergate, leading to an increase in attention to the article. (Symphony Regalia evidence)"
* There aren't that many real historical accounts about him so people can argue all day about stuff like "Was Yasuke a real samurai?" without clear evidence about who's right.
But in the intervening years, Ubisoft and/or the entire AAA games industry is now seemingly driven by a need to conspicuously showcase diversity and inclusion (from some gamers' viewpoints). You could also view it as the gaming industry trying to broaden its audience and get out of the pigeonhole of catering to the base desires of sweaty manchildren, but either way, it's upsetting a certain type of game consumer.
So, when Ubisoft finally got around to setting Assassin's Creed in Japan, and they picked pretty much the only person around that time period who wasn't Japanese as the main protagonist, seemingly to meet diversity goals, capital-G Gamers went bananas over it, like it was a personal affront to them.
All making sense so far. But then:
Article title capitalization.
Somehow this is peak wikipedia.
* A claim on "Fisting" that "seasoned fisters can insert their arm up to the shoulder into the anus", "supported" by a deleted PornHub video.
* Fighting on a Production Car top ten list when Tesla announced that Ludicrous Mode was coming the next year and "expected" to have certain performance stats, where multiple editors fell over each other to make sure it stayed at the top of the list, even when they eventually had to add a column just for Tesla where every other result had "Actual Results" and the Tesla had "Projected/Expected Results".
* A collation of John Deere tractors that described multiple models as "light years ahead of the competition".
* An article on an Australian drug smuggler where exhibits from court case were being removed as "biased".
My statement was that the quality of Wikipedia overall is high, and that one of the reasons for that is because they set and enforce standards for contributions.
Certainly many people are put off by the process and will not have time to deal with it, but my belief would be that such cases are more likely on more controversial topics, and less likely for less controversial topics. Inherently, collaborating on difficult topics will be a difficult process, which also means that there are likely no easy answers for how to make this process not discourage anyone.
Is it clear what they should rather be - and are there any examples of mechanisms that have worked better at a scale like this? How are you judging that they are not what they are supposed to be?
If the resulting body of work, which is the totality of Wikipedia, is able to be a curated and high signal collection of knowledge as a result of these mechanisms, how can it be said that they are not working? Having forcing factors, even if they are not ideally aligned or executed, which pushes contributors to increase the quality of their edits to pass, seems overall like a good thing. I'm not saying that its processes and mechanisms cannot be improved, I'm saying I believe it is incorrect to say that they are not working as a whole.
> "Credible" sources and citations are exclusively up to the article moderators personal tastes which are very rarely objective.
Overall, I believe Wikipedia is curated by a large group of people who coordinate through various rules and consensus mechanisms, and I don't believe it is correct to state that sources and citations are exclusively up to any specific article moderator, as they need to be able to build consensus and co-exist with other moderators.
Exactly because Wikipedia is such a large body of work it seems more resistant to corruption to have a large number of curators with different tastes and motivations. How would you determine that their selection of sources and citations is very rarely objective - especially when objectiveness itself seems quite hard to agree upon for many of the topics covered?
From my perspective, it seems far more important to consider the quality and value of the totality of Wikipedia, which is massive and a sign that many things are working, rather than insisting that it is not working, especially at a time when knowledge is being broadly attacked and Wikipedia is one of the targets.
An example though is that several historically relevant facts are edited out to favor some narrative.
E.g., Crimea, the Ukrainian autonomous republic that seceded/was annexed by Russia, tried to split from Ukraine three times, once even during the Soviet Union, but those events have been edited out of multiple pages to favor some Western-centric narratives.
https://en.wikipedia.org/wiki/Republic_of_Crimea_(1992%E2%80...
That's the 1992 one; there was another occurrence in the previous decade.
Source: the Wikipedia co-founder himself.
https://www.dailysignal.com/2025/10/12/wikipedia-co-founder-...
And the real reason that editing Wikipedia is difficult is the ideological bias of the moderators: they support only the editors who conform to their ideology and reject any edits that go against it, even if those edits are literal proven truths.
Source: I am an editor on Wikipedia, and my own informative, useful edits to some important topics about my country have been rejected, while the misinformative and even malicious edits by agents of the enemy nation on those same topics were allowed and continue to persist and mislead whoever reads those articles.
Wikipedia has become a weapon of misinformative propaganda, and it's not a tool or repository of useful accurate information. This is why Wikipedia is banned in schools and universities, because its information may not be credible and Wikipedia has long ago lost the integrity that any worldwide free information repository should have had.
This sounds really interesting, can you provide an example?
Ironically, the supposedly-censored Wikipedia has a lengthy article covering all of it including Sanger's accusations: https://en.wikipedia.org/wiki/Larry_Sanger
They can turn content disputes into conduct disputes and conduct disputes into social contests which are either shown on ANI or quietly adjudicated with an administrator block.
The content of Wikipedia is great. Its culture, not so much.
WHAT should the Wikimedia foundation invest in, that's viable for a thousand years?
That requires a Wall St/hedge fund and/or Buffett mindset.
The Wikimedia foundation is none of those, and they're not big enough to make even a ripple in the investment landscape.
it’s certainly doable but requires selfless dedication at the top
Money rules. There’s no ifs and buts about it.
If it doesn’t have a reasonable rate of return, nobody gives a shit.
Ironically, I was just listening to an interview with Jimmy Wales in which he said that, as individuals, most humans are basically good. They don't meet someone on the street and think "what is the rate of return on interacting with this thing?"
There are more and more of these "essential global infrastructure" projects, many of them non-profit, yet I'm not sure we're seeing a lot of investment from the globe into those projects.
You’re not seeing investment into these kinds of “essential global infrastructure “ projects, because there are NO such things.
We can’t even agree on what “global” means.
It's not even certain that Wikipedia itself can exist for such a long period, given the fragility of technological civilisation and data storage.
https://www.usatoday.com/story/news/politics/2025/08/27/wiki...
If we're going to take the actions of a couple of low-level dope legislators as "the US government" and a toothless investigation as "actively plotting against", sure.
Investments can fail, such as bank deposits (bank collapses), shares (company goes bankrupt), government bonds (government defaults), commodities (price fluctuations as they go in or out of favour.)
Theft can also occur, including by corrupt insiders (sometimes even legally, just by inflating their salaries to ridiculous levels.)
The chance of remaining intact for 1000 years seems very low.
> ... [its mission is...] to act as a permanent fund that can support in perpetuity the operations and activities of current and future Wikimedia projects, which are projects that are approved by and advance the purposes of the Foundation or its successor if the Foundation ceases to exist
[0] https://wikimediaendowment.org/
[1] https://upload.wikimedia.org/wikipedia/foundation/f/f6/Wikim...
By having a separate fund that the Wikimedia Foundation can access to help Wikipedia to have the technical expertise and knowledge workers required to continue the work of the Wikimedia Foundation.
Should the Wikimedia Foundation cease to exist, the funds in the endowment can be redirected to a successor.
EDIT: this is similar in style to the UK's Guardian Foundation, who provides funding to The Guardian newspaper. https://theguardianfoundation.org/
Surely, they can set up shop elsewhere on the globe?
Even the US isn’t 1000 years old. Who’s to say it will even exist in a remotely recognizable form in 1000 years.
The last 1000 years changed at a snail's pace compared to the last 25 years.
It’s not Ragnar Lodbrok turning into ABBA. It’s a mostly peaceful peasant society getting tossed around by history until it became a mostly peaceful service economy - and the shaking never really stopped.
History is not a straight progress line. Ask the Romans.
Ask a medieval peasant and he wouldn't even be able to describe how his life is different from his father's.
Nitpick: The US Voting Rights Act became law in 1965 [2], which is more like 60 years ago. Not sure if that's what you were getting at with "basically an apartheid" but it was the closest concrete landmark near your 50 year timeline.
[1]: https://en.wikipedia.org/wiki/The_Economist_Democracy_Index
[2]: https://en.wikipedia.org/wiki/Voting_Rights_Act_of_1965
(since posting Wikipedia links seems appropriate.)
the article you linked in no way supports that sentence.
(there's just a committee looking into claims that edits are used for propaganda in general.)
665 ChatGPT-User
396 Bingbot
296 Googlebot
037 PerplexityBot
Fascinating.
About 80% of traffic to my sites (a few personal blogs and a community site) is from ai bots, search engine spiders or seo scrapers.
But at the same time I continue to contribute edits to Wikipedia. Because it's the source of so much data. To me, it doesn't matter if the information I contribute gets consumed on Wikipedia or consumed via LLM. Either way, it's helping people.
Wikipedia isn't going away, even if its website stops being the primary way most people get information from it.
Wikipedia gives away your creation for free. The LLM companies do not. Google is operating a loss leader and not "helping people." In fact, quite the opposite.
Wikipedia should still exist, but I'm fine giving OpenAI 20 bucks a month to make my life easier.
The LLM companies are giving it away for free too, so far. Just like Google has given search results away for free for decades now.
https://news.ycombinator.com/item?id=34106982
>2022
>It’s the dishonesty of Wikipedia that bothers me. The implication is that donations are urgently needed to keep the website running. In reality they have $300m in the bank and revenue is growing every year[0]. Even Wikipedia says only 43% of donations are used for site operations[1], and that includes all of their sites, not just Wikipedia. Fully 12% of the money they collect from you is. . . used to ask you for more money[1]
Many of them are sites that have built themselves without any original reporting. Where will they scrape the content they've used to grow if their sources take the same attitude?
TikTok replaces YouTube, and Tumblr before that. It's not like people are watching it instead of reading history books or the New Yorker.
And just because you use TikTok doesn't mean you don't check actual news headlines or listen to two-hour-long podcasts.
gift link: https://www.wsj.com/health/wellness/anti-depressants-lifesty...
> “younger generations are seeking information on social video platforms rather than the open web.”
YouTube, I'll buy it, but the search function in Instagram and TikTok is beyond useless!
People rightfully get upset about individual editors having specific agendas on Wikipedia and I get it. Often that is the case. But the chat interface for LLMs allows for a back and forth where you can force them to look past some text to get closer to a truth.
For my part, I think it's nice to be part of making that base substrate of human knowledge in an open way, and some kinds of fixes to Wikipedia articles are very easy. So what little I do, I'll keep doing. Makes me happy to help.
Some of the fruit is really low-hanging, take a look at this garbage someone added to an article:
https://en.wikipedia.org/w/index.php?title=Salvadoran_gang_c...
It's _kinda_ cheap. Wikipedia is so cheap you can fit it all on a phone and search it instantly.
I agree overall but LLMs are just so heavy. I don't know if most people can afford to run one locally, and they're lossy. Both on a phone would be great. I fret a lot about data ownership, you know
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5022797
https://media4.manhattan-institute.org/wp-content/uploads/is...
https://www.researchgate.net/publication/241750096_Is_Wikipe...
This is the kind of capitalistic behavior that is repugnant to our idea of how things should work.
This is not what the commons is for - taking the work of creators, repackaging it, and using platform capability to re-sell it.
At this point, I am coming around to the argument that governments should make their own local/national AI.
I think this is a terrible idea. How do current Chinese-built LLMs respond when asked about Tiananmen Square? How would American-built LLMs respond when asked about Israel/Palestine?
I can think of infinite ways this is bad.
The alternative is what we have now. I don’t think the current situation is worth leaving uncontested.
Seriously - we’re copying all the art, music, lit, data and converting it into subscription revenue for the biggest firms on the planet ?
Which are based in the US?
And the alternative is Chinese open models ? Really ? This is the future.
In that case the only other groups left to enter the fray are national governments.
Frankly, after seeing what I'm seeing in Asia, I suspect they are already planning to do that anyway.
The curious thing is that big LLM folks put together RAG systems which act much like Wikipedia. But it's more than that. They built dictionaries, book repos (borderline illegal), news repos, and data knowledge bases. These are bigger than Wikipedia, and better, because you don't have anonymous partisans.
Wikipedia is at a point where it has purged multiple perspectives, which has left an unreliable systemic bias in it. They are dealing with this problem, and competitors are popping up because of these problems.
Larry put out his theses on his user page: https://en.wikipedia.org/wiki/User:Larry_Sanger/Nine_Theses
It was originally heavily censored, vandalized, and deleted. It does seem to have been allowed, despite being in user space.
Every one of those theses is correct.
With ChatGPT or Google AI Mode, you get all answers directly in the chat. And you can even ask follow-up questions. There is no need to click on a link.
From the data I have seen, 40% of searches on Google used to lead to a click to another website. In ChatGPT and Google AI Mode, this number is lower than 5%. One study (with a small N) even came to the conclusion that the number is 0%.
The further away people stay from it, the better.
How could I trust them on things I do not know, if I know for a fact they are unrepentantly wrong about things I do know?
But I've always hated the way Wikipedia works.
And they actually need it.
https://en.wikipedia.org/wiki/User:Guy_Macon/Wikipedia_has_C...
The TL;DR is that Wikipedia has a massive spending problem, and if they brought it back to something reasonable, their existing war chest would last them for at least a decade, even with no additional funding.
No? That's because it doesn't happen anymore now that WMF has a reasonable budget.
Now it is again feeding on and regurgitating Wikipedia but again in a way that will end up destroying the thing it is summarizing. Aggregators are parasitic on the thing they derive information from.
We would just need a good system to a) keep humans in the loop and b) sort out bad actors.
https://en.wikipedia.org/wiki/Wikipedia:Top_25_Report
Most of them are trending public figures and media, which lines up with what I remembered from my prior Wikipedia analysis. So LLMs are likely replacing a lot of this.
Also, I firmly believe the Wikipedia app is key to their sustained relevance. Users following wiki links in a web browser get forwarded to the app, which gets people into the dedicated interface.
From a design standpoint, there need to be more avenues on each page inviting people to browse, explore, and spend more time there.
Oh who says that? Bloomberg.
https://www.bloomberg.com/news/newsletters/2023-10-24/why-wi...
So not people, just Bloomberg... trash article. Probably just stealth donation marketing.
Don't get me wrong. Wikipedia is a GREAT resource that the world absolutely needs.
But it's a horribly boring "read" when it comes to human consumption.
It's an encyclopedia. A list of facts.
When I want to learn something about a particular topic, I rarely ever want to read a list of facts.
I couldn't possibly disagree more. The website I visit most often after Reddit and HN is Wikipedia, and I often spend hours just diving through article after article. Learning facts is fun.
It's just not fun reading them as a list of facts.
I am not a computer. I enjoy reading prose.
I must be reading a different Wikipedia, then, because all the articles I read are prose, rather than a bulleted list of facts as you claim.
Wikipedia gets money to stay afloat, and AI companies continue to get access to its huge human-curated knowledge graph.
Let's put aside Wikipedia being rotten with bureaucracy and obsession-driven bias, which is similar to the flaws Stack Overflow already had before LLMs steamrolled it.
Fact is, Wikipedia is a human-driven summarization engine for secondary sources, ideally one that echoes the sources' consensus.
This is exactly what LLMs are best at: summarizing huge amounts of text. And training can easily focus on high-quality books and thus exceed Wikipedia in quality.
Just compare an AI summary, where the first line is about the subject at hand, with Wikipedia, where the first line is the product of some petty argument over a political disagreement.
Is your argument that Wikipedia's rules are not highly tailored to suppress original research and rely on secondary sources?
For the latter, LLMs are better than Wikipedia today; for the former, current alignment is arguably better at suppressing bias than current editors are.
They will have their own kind of spam problem.
And wait and see what happens to their quality as soon as the AI companies have to generate a profit.
I doubt you can collect secondary-source references at the same price Wikipedia does. People aren’t as eager to help a billion-dollar company make a profit for free.
Then it will never exceed Wikipedia. Many of Wikipedia’s articles have no high-quality books behind them, and even if they did, who decides which books qualify as high quality?
And less traffic is a problem not only for Wikipedia but for other websites too.
So AI is killing the source it feeds on.
This is the golden age of AI. The next age will be filled with fewer good sources, more AI-generated content, and sites that actively try to poison LLMs.
LLMs cannot find and read thousands of secondary sources in real time, especially if some of them have already disappeared or have never been digitized.
I can see a future where LLM labs a) donate to Wikipedia and b) contribute to it with agents that suggest edits and review facts.
LLMs lack the human nuance that a good Wikipedia article requires. Weighing quality sources and digesting them in the most useful way that a human would want and expect — that is very difficult for both humans and machines, and it is why Wikipedia as a whole is such a treasure: Because a community of editors take the time to tweak the articles and aim for perfection.
There are guidelines across all Wikipedia articles that make a good experience for the reader. We can’t even get the world’s greatest LLMs to follow a set of rules in a small conversation.
In my experience, when using LLMs as a replacement for Wikipedia (learning about history), they are often of higher quality on niche topics and far less biased in politically contentious areas.
Wikipedia is the tabloid equivalent for scientific topics.
LLMs tend to be much more useful for niche topics, because they've most likely been trained directly on the source itself.
Not to mention all the sites that will pop up to poison the LLMs in their favor.
On top of that, LLMs will either cost more or get ads too.
Yes, I do use the internet for “medical opinion” and information, and seriously, some of the falsehoods it’s provided… Similarly with anything related to construction. Steer clear.
With more AI tools mining existing knowledge and presenting it in increasingly accessible ways, I don’t think AI search fundamentally changes how information and knowledge are organized.
Of course, AI could reshape the organization of knowledge through areas like:
1. Fact-checking and sourcing
2. Drafting new pages
3. Editing and refining wording
…and more
---
Just like Wikipedia already has many bots running behind the scenes, if all these tasks were eventually handled by AI, there would still be things left for humans (or perhaps another AI) to decide:
1. When a fact has multiple perspectives, how should it be phrased to represent different viewpoints fairly?
> I still remember countless word battles on Wikipedia over this.
2. In the age of smartphones and social media, historical moments are documented not only by journalists or influencers but by thousands — even millions — of ordinary people. How should Wikipedia process and summarize such vast, distributed facts?
3. How do we properly incentivize contributors, whether human or AI?
> Wikipedia was born in an era when the Internet lacked reliable information, and building a shared, sustainable, independent knowledge base was a mission that resonated with its early contributors — traffic rewards came later.
4. And of course, geopolitics — Wikipedia must remain independent.
---
A bit of background: I once led a Chinese wiki product, but I eventually gave up on it — because almost no one cared why a wiki should exist beyond being just another searchable content platform.
There are many small tasks simple AI agents could handle.
For example: go to every English Wikipedia article about a Spanish city. Check data like number of inhabitants, mayor, etc. and compare it to the Spanish Wikipedia article. (The assumption here is that the Spanish Wikipedia will be more up to date for Spanish cities than the English Wikipedia.) Double-check what is written in the Spanish article. Update the English article accordingly.
If such an agent were only allowed to create drafts, human editors could review them, and we would get a lot of small updates in.
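A minimal sketch of that draft-only workflow, assuming the infobox fields have already been extracted from both articles into plain dictionaries. Fetching and parsing the real articles is left out, and the `DraftEdit` class, `draft_updates` helper, town name, and values are all hypothetical. The code never edits anything; it only emits suggestions for a human to review.

```python
from dataclasses import dataclass


@dataclass
class DraftEdit:
    """A proposed change that stays a draft until a human editor approves it."""
    article: str
    field: str
    current: str
    proposed: str
    source: str


def draft_updates(article: str, en_fields: dict[str, str],
                  es_fields: dict[str, str]) -> list[DraftEdit]:
    """Draft an edit for every field whose English value differs from the Spanish one.

    Only fields present in both articles are compared. The Spanish values are
    treated as the likely-newer ones, per the assumption above; the reviewer
    still double-checks them against the Spanish article's sources.
    """
    drafts = []
    for field, es_value in es_fields.items():
        en_value = en_fields.get(field)
        if en_value is not None and en_value != es_value:
            drafts.append(DraftEdit(article, field, en_value, es_value, source="es.wikipedia"))
    return drafts


# Hypothetical, already-extracted infobox values for a made-up town.
en = {"population": "10,200", "mayor": "A. García"}
es = {"population": "10,450", "mayor": "A. García"}

for d in draft_updates("Pueblo Ejemplo", en, es):
    print(f"{d.article}: {d.field} {d.current} -> {d.proposed} (awaiting review)")
```

A real agent would also need to normalize number formats before comparing (Spanish articles often write "10.450" where English ones write "10,450"), or it would flag false differences.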
The Wikipedia version barely mentions the stoning of Hypatia or the burning of the library, except for a different, smaller fire caused by a war in Julius Caesar's time.
This page certainly reads like Christian apologetics, where the almost total destruction of the library by Christian fanatics never happened.
For me, this heavy bias in basically every article is the reason Wikipedia traffic is falling.
It is definitely influenced by Carl Sagan's book Cosmos. I feel like I've read it a hundred times.
Quote:
Cosmos does not make Hypatia’s death so much a religious issue as an anti-intellectual one, but the truth is that it was actually a political one. A second problem comes when Dr. Sagan links her death to the destruction of the Great Library. In fact, in the final episode of the original Cosmos, “Who Speaks for Earth”, Carl Sagan says, “The last remains of the library were destroyed within a year of Hypatia’s death.”
The problem with this is that the last remnants of the Library of Alexandria were almost certainly destroyed in 391, 24 years before Hypatia’s death, and most of the library was likely destroyed, by accident, centuries earlier.
It sounds strange, but we actually don’t have a very good idea of when the Library of Alexandria was destroyed. As best we can tell, much of it was burned unintentionally when a fire spread through the city during Julius Caesar’s invasion in 48 BC. While the majority of the library may have survived that war, it was almost certainly destroyed in the war between Emperor Aurelian and Queen Zenobia of Palmyra in the 270s AD. This also appears to have been unintentional, as a large part of the city was burned.
What little was left of the library was deliberately destroyed in 391, when Emperor Theodosius I banned Paganism. The remaining repository of books in Alexandria was destroyed along with the Pagan temple it was stored in. [Even this has no evidence as pointed out by Tim O'Neill in the blog comments]
I admire most of Dr. Sagan’s and Dr. Tyson’s work, but when they characterize Hypatia’s death and the burning of the Great Library as the deliberate (and linked) actions of an anti-intellectual mob, they are simply misrepresenting the history.
What's the next improvement over Wikipedia that changes things to a similar extent?
crmd•3mo ago
I always assumed the need for metastatic growth was limited to VC-backed and ad-revenue dependent companies.
sublinear•3mo ago
lwansbrough•3mo ago
undeveloper•3mo ago
KPGv2•3mo ago
qingcharles•3mo ago
And their costs are even increasing because, while human viewers are decreasing, they are getting hugged to death by AI scrapers.
johnnyanmac•3mo ago
For such purposes, I'd naively just set up some weekly job to download Wikipedia and then run a "scrape" on that. Even weekly may be overkill; a monthly snapshot would probably be more than enough.
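A minimal sketch of that approach, assuming you have downloaded the full-text dump (at the time of writing, published as `enwiki-latest-pages-articles.xml.bz2` under `dumps.wikimedia.org/enwiki/latest/`). It streams the compressed XML page by page instead of fetching live pages. The `iter_pages` and `local_name` helpers are hypothetical names, and the tiny inline sample only stands in for the real multi-gigabyte file.

```python
import bz2
import io
import xml.etree.ElementTree as ET
from typing import IO, Iterator, Optional


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that MediaWiki export XML puts on every tag."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream: IO[bytes]) -> Iterator[tuple[Optional[str], str]]:
    """Yield (title, wikitext) for each <page> in a MediaWiki XML export.

    Each finished <page> element is cleared, which keeps memory use low
    instead of holding the whole dump as a tree.
    """
    title: Optional[str] = None
    text = ""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            elem.clear()  # free the finished page's children
            title, text = None, ""


# Tiny stand-in for the real dump, compressed the same way (bz2).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.11/">
  <page><title>Alpha</title><revision><text>Hello [[Beta]]</text></revision></page>
  <page><title>Beta</title><revision><text>World</text></revision></page>
</mediawiki>"""

with bz2.open(io.BytesIO(bz2.compress(SAMPLE)), "rb") as dump:
    for page_title, wikitext in iter_pages(dump):
        print(page_title, len(wikitext))
```

Passing the downloaded file's path to `bz2.open` instead of the in-memory sample would process the real dump; a weekly or monthly cron entry running such a script is all the scheduling the comment calls for.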
yorwba•3mo ago
parpfish•3mo ago
skywal_l•3mo ago
At the time I thought, well, it's a bunch of hippies with a small budget, who can blame them? Now I learn that there are 600 of them with a budget in the hundreds of millions??
This is becoming another Mozilla Foundation...
philipkglass•3mo ago
There are a bunch of mainly-compatible third party parsers in various languages. The best one I've found so far is Sweble but even it mishandles a small percentage of rare cases.
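To illustrate why wikitext is hard to parse, here is a deliberately naive regex for internal links. It handles the common `[[Target|label]]` form but already misses a link whose caption contains another link. The pattern and the `naive_links` helper are hypothetical and have nothing to do with how Sweble or MediaWiki's own parser work.

```python
import re

# Matches [[Target]] or [[Target|label]]. It does not understand nesting,
# templates ({{...}}), or links that templates generate.
LINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")


def naive_links(wikitext: str) -> list[str]:
    """Return link targets found by a single, non-nesting regex pass."""
    return [m.group(1).strip() for m in LINK_RE.finditer(wikitext)]


print(naive_links("See [[Library of Alexandria|the library]] and [[Hypatia]]."))
# prints ['Library of Alexandria', 'Hypatia']

print(naive_links("[[File:Map.png|thumb|A map of [[Alexandria]]]]"))
# prints ['Alexandria']: the outer [[File:...]] embed is silently missed
```

Every construct like this (nesting, templates, parser functions) needs special handling, which is why even good third-party parsers still get a small share of pages wrong.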
Gander5739•3mo ago
yorwba•3mo ago
cm2012•3mo ago
ethmarks•3mo ago
cm2012•3mo ago
bawolff•3mo ago
There is probably a lot to criticize, but you need to go deeper than "salaries are bad." You need some of those to actually run the website.
ThrowawayTestr•3mo ago
ryan_lane•3mo ago
There's also the need to support the staff and volunteer developers, which includes wikimedia cloud services, git hosting, config management and orchestration, CI, community hosted tool/bot services, etc.
WMF has ~600 employees, and that's quite lean for a service of their complexity.
crmd•3mo ago
AstroBen•3mo ago
Something tells me a person is way less likely to donate if they're consuming the content through an LLM middleman
crazygringo•3mo ago
But that means I'm still using it. Especially for more reference stuff like lists of episodes, filmographies, etc. As well as equations, math techniques, etc.
If you're the kind of person who donates to Wikipedia, you're probably still using it some even if less, and continue to recognize its importance. Possibly even more, as a kind of collaboratively-edited authority like Wikipedia only becomes more important as AI "slop" becomes more prevalent across blogs etc.
arbol•3mo ago
crazygringo•3mo ago
Does it matter if you see the banner 10 times or 100 times in a month?
arbol•3mo ago
khamidou•3mo ago
I doubt that they're getting "hugged to death" by AI scrapers.
Liquix•3mo ago
skeaker•3mo ago
1. I think their spending is a good thing. Charitable scholarships for kids and initiatives to have a more educated populace in general are things that I am happy to donate to.
2. As stated in the article, hosting is still a relatively simple expenditure compared to the rest of their operation. If Wikipedia really eats a huge loss, falling back to just hosting wouldn't be unrealistic, especially since the actual operations of Wikipedia are mostly volunteer run anyways. In the absolute worst case, their free data exports would lead to someone making a successor that can be moved to more or less seamlessly.
The only real argument in my eyes is that their donation campaigns can seem manipulative. I still think it's fine at the end of the day given that Wikipedia is a free service and donating at all is entirely optional.
mminer237•3mo ago
The second biggest line item is grants at $25 million, primarily for users to travel to meet up.
Then $10 million for legal fees, $7 million for Wikipedia-hosted travel.
I think it's pretty unethical to say you have to donate to keep Wikipedia running when you're practically paying for C-suite raises and politically-aligned contributors' vacations.
bawolff•3mo ago
Paying the travel costs for a bunch of highly active volunteer contributors to meet up occasionally and hash out complex community issues pays massive dividends. It keeps the site moving forward. It's also pretty cheap when you consider how much free labour those volunteers provide.
Whenever people criticize Wikimedia's finances, I think they miss the forest for the trees. I actually think there is a lot to potentially criticize, but in my opinion everyone goes for the wrong things.
gtsop•3mo ago
Also, asking out of ignorance, what things need to move forward? I thought Wikipedia was a solved problem; the only work I would expect it to need is maintenance, security patches, etc.
bawolff•3mo ago
I think criticism should be based on looking at what they were trying to accomplish by spending the money, was it a worthwhile thing to try and do and was the solution executed effectively.
Just saying "they spent $X, X is a big number, so it must be wasteful" without considering the value that the money is attempting to purchase is a bit meaningless.
> Also, asking out of ignorance, what things need to move forward? I thought Wikipedia was a solved problem; the only work I would expect it to need is maintenance, security patches, etc.
I think the person I was responding to was referring to volunteer travel, not staff travel (which of course also happens, but I believe would be a different budget line item). This would be mostly for people who write the articles but also for people who do moderation. In-person meetings can help resolve intractable disputes, share best practices, work through complex disagreements, and build relationships: all the same reasons that real companies fly their staff to expensive offsites.
Software is never done, there are always going to be things that come up and things to be improved. Some of them may be worth it some not.
As an example, there are changes coming to how IP addresses are handled, especially for logged-out users. Nobody is saying exactly why, but I'm 99% sure it's related to GDPR compliance. That is a big project because of some deeply held assumptions, and probably a critical one.
A more mid-tier example: last year WMF rolled out a (caching) server presence in Brazil. The goal was to reduce latency for South American users. Is that worth it? It was probably a fair bit of money. If WMF were broke, it wouldn't be, but given they do have some money, it seems like a reasonable improvement to me. Reasonable minds could probably disagree, of course.
And an example of stupid projects might be WMF's ill-fated attempt at making an AI summarizer. That was a pure waste of money.
I guess my point is: WMF is a pretty big entity; some of the things they do are good, some are stupid, and I think people should criticize the projects they embark on rather than the big sum of money taken out of context.
bawolff•3mo ago
If you use AWS, the people hired to manage the servers are part of the price tag. When you own your own servers, you have to actually hire those people.
khamidou•3mo ago
ryan_lane•3mo ago
You're also ignoring the need for infrastructure/network engineers, software engineers, fundraising engineers, product managers, community managers, managers, HR, legal, finance/accounting, fundraisers, etc.
DeusExMachina•3mo ago
Rebelgecko•3mo ago
ryan_lane•3mo ago
WatchDog•3mo ago
_def•3mo ago
Aurornis•3mo ago
Wikipedia is not getting hugged to death by AI scrapers.
The source letter shows a relatively small portion of traffic was reclassified as bot traffic.
They get a lot of page views globally. It’s a popular website. The bot traffic is not crushing their servers.
intended•3mo ago
It means that now, people are paying for their AI subscriptions, while they don’t see Wikipedia at all.
The primary source is being intermediated - which is the opposite of what the net was supposed to achieve.
This is the piracy argument, except this time it's not little old ladies doing it, but massive for-profit firms.
rkomorn•3mo ago
busymom0•3mo ago
> Sony tells SCOTUS that people accused of piracy aren’t “innocent grandmothers”
https://arstechnica.com/tech-policy/2025/10/sony-tells-scotu...
rkomorn•3mo ago
crazygringo•3mo ago
Most people are not paying a cent. And the people that are, are paying for stuff like coding assistance or classification, not the kind of info you get on Wikipedia.
Looking up Wikipedia-style information on LLMs is not a driving factor in paid subscriptions to ChatGPT etc.
thehappypm•3mo ago
IshKebab•3mo ago
No, really: it was in the news a few years ago, but nothing changed as far as I know.