What if in fact a large proportion of articles were bot-written, but only the unverifiable ones were bad enough to be detected?
But it looks like Pangram is a text-classifying NN trained using a technique where they get a human to write a body of text on a subject, and then get various LLMs to write a body of text on the same subject, which strikes me as a good way to approach the problem. Not that I'm in any way qualified to properly understand ML.
More details here: https://arxiv.org/pdf/2402.14873
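If I've understood the paper right, the recipe is roughly something like this (a minimal sketch only; the base model, data format, and hyperparameters here are my assumptions, not Pangram's actual setup):

```python
# Rough sketch of the paired-data idea: for each topic, collect one
# human-written text and one LLM-written text on the same subject,
# then fine-tune a binary classifier on the flattened pairs.
# Everything here (base model, data format) is an assumption.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

pairs = [
    {"topic": "Meseta Central",
     "human": "...human-written text...",
     "llm": "...LLM-written text on the same topic..."},
    # ... many more topic-matched pairs ...
]

# Flatten into labeled examples: 0 = human, 1 = AI-generated.
examples = [{"text": p[k], "label": int(k == "llm")}
            for p in pairs for k in ("human", "llm")]

tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilroberta-base", num_labels=2)

ds = Dataset.from_list(examples).map(
    lambda e: tokenizer(e["text"], truncation=True,
                        padding="max_length", max_length=512),
    remove_columns=["text"])

Trainer(model=model,
        args=TrainingArguments(output_dir="detector", num_train_epochs=3),
        train_dataset=ds).train()
```

The topic-matching is the clever part: because both texts cover the same subject, the classifier can't lean on topic as a shortcut and has to learn stylistic differences instead.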
It's not even unique to Wikipedia. It's really not difficult to find very misleading statements backed by a citation that doesn't even support the claim when you check the original.
From https://grokipedia.com/page/Spain#terrain-and-landforms:
> Spain's peninsular terrain is dominated by the Meseta Central, a vast interior plateau covering about two-thirds of the country's land area, with elevations ranging from 610 to 760 meters and averaging around 660 meters
Segovia is at 1,000 meters, and so is most of the top half of the "Meseta". https://en-gb.topographic-map.com/map-763q/Spain/?center=41....
I still stand by not trusting anything AI spits out, be it code or text. It usually takes me longer to check that everything is ok than to do it myself, but my brain is enticed by the "effort shortcut" that AI promised.
The nice thing about grokipedia is that if you have counterexamples like that, you can submit them as evidence for a change and it will rewrite the article to be clearer.
Meseta Central means "central tableland". Segovia is on the edge of the mountain range that surrounds that tableland, but is often referred to as part of it. This is fuzzy, though.
Wikipedia says: The Meseta Central (lit. 'central tableland', sometimes referred to in English as Inner Plateau) is one of the basic geographical units of the Iberian Peninsula. It consists of a plateau covering a large part of the latter's interior.[1]
Looking at the map you linked, the flat part is between 610 and 760 meters.
Finally, when speaking about the Iberian Peninsula Wikipedia itself includes this:
> "About three quarters of that rough octagon is the Meseta Central, a vast plateau ranging from 610 to 760 m in altitude."[2]
Encyclopedia Britannica (the website, not the printed book) is the main competitor to Wikipedia and gets an order of magnitude more traffic than grokipedia. Right now grokipedia is the new kid on the block. It has yet to be seen whether it's just a novelty or has staying power, but either way it still has a ways to go before it's Wikipedia's primary competitor.
This has always been a rampant problem on Wikipedia. I can't find any indication that it has increased recently; they're only investigating articles flagged as potentially AI-written. So what's the control baseline rate here?
Applying correct citations is actually really hard work, even when you know the material thoroughly. I just assume people write stuff they know from their field, then look to add the minimum number of plausible citations after the fact; most people never check them, and everyone seems to accept that it's better than nothing. But I also suppose it depends on how niche the page is, and which field it's in.
[1]: https://changelog.com/podcast/668#transcript-265
[2]: https://en.wikipedia.org/w/index.php?title=Eugen_Rochko&diff...
> Applying correct citations is actually really hard work, even when you know the material thoroughly.
Why do you find it hard? Scholarly references can be sources for fundamental claims, and review articles are a big help too.
Also, I tend to add things to Wikipedia or other wikis when I come across something valuable, rather than writing something and then trying to find a source (which is also problematic for other reasons). A good thing about crowd-sourcing is that you don't have to write the article all yourself or all at once; it can be very iterative and therefore efficient.
It's more like, a lot of stuff in Wikipedia articles is somewhat "general" knowledge in a given field, where it's not always exactly obvious how to cite it, because it's not something any specific person gets credit for "inventing". Like, if there's a particular theorem then sure you cite who came up with it, or the main graduate-level textbook it's taught in. But often it's just a particular technique or fact that just kind of "exists" in tons of places but there's no obvious single place to cite it from.
So it actually takes some work to find a good reference. Like you say, review articles can be a good source, as can survey articles or books. But it can take a surprising amount of effort to track down a place that actually says the exact thing. Just last week I was helping a professor (a leader in their field!) find a citation, during peer review of their paper, for an "obvious fact" of the field stated in their introduction. It was genuinely challenging, like trying to produce a citation for "the sky is blue".
I remember, years ago, creating a Wikipedia article for a particular type of food in a particular country. You can buy it at literally every supermarket there. How the heck do you cite the food and facts about it? It just... is. Like... websites for manufacturers of the food aren't really citations. But nobody's describing the food in academic survey articles either. You're not going to link to Allrecipes. What do you do? It's not always obvious.
It's a big blind spot among the editors as well. When this problem was brought up here in the past, with people saying that claims on Wikipedia shouldn't be believed unless people verify the sources themselves, several Wikipedia editors came in and said this wasn't a problem and Wikipedia was trustworthy.
It's hard to see it getting fixed when so many don't see it as an issue. And framing it as a non-issue misleads users about the accuracy of the site.
Hope that helps you understand.
This happens a lot on Wikipedia. I'm not sure why, but it does, and you can see its traces all over the Internet as people repost the mistaken information.
One that took me a little work to fix was pointed out by someone on Twitter: https://x.com/Almost_Sure/status/1901112689138536903
When I found the source, the Twitter poster was correct! Someone had decided to translate "A hundred years ago, people would have considered this an outrage. But now..." as "this function is an outrage", which, honestly, is ironically an outrageous translation. What the hell, dude.
But it takes a lot of work to clean up stuff like that! https://en.wikipedia.org/w/index.php?title=Weierstrass_funct...
I had to go find the actual source (not the other 'sources' that repeated off Wikipedia or each other) and then make sure it was correct before dealing with it. A lie can travel halfway around the world...
> For most of the articles Pangram flagged as written by GenAI, nearly every cited sentence in the article failed verification.
It seems to deflect from, even gaslight, TFA.
> For most of the articles Pangram flagged as written by GenAI, nearly every cited sentence in the article failed verification.
So why deflect that into other, more convenient pedantry (surely not under the guise tech forums so often adopt)?
So why the discomfort, for part of HN, at the assertion that AI is being used for nefarious purposes and the creation of alternate 'truths'?
ColinWright•6h ago
> Far more insidious, however, was something else we discovered:
> More than two-thirds of these articles failed verification.
> That means the article contained a plausible-sounding sentence, cited to a real, relevant-sounding source. But when you read the source it's cited to, the information on Wikipedia does not exist in that specific source. When a claim fails verification, it's impossible to tell whether the information is true or not. For most of the articles Pangram flagged as written by GenAI, nearly every cited sentence in the article failed verification.
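To make "fails verification" concrete, here's a crude sketch of what that check looks like in principle (just an illustration using an off-the-shelf NLI model; I have no idea what tooling the authors actually used):

```python
# Crude illustration of "verification": does the cited source actually
# entail the claim? This is NOT the article's methodology, just the idea.
from transformers import pipeline

nli = pipeline("text-classification", model="facebook/bart-large-mnli")

def claim_supported(claim: str, source_text: str, threshold: float = 0.7) -> bool:
    # "entailment" = the source supports the claim; anything else
    # (neutral/contradiction) counts as failed verification.
    result = nli([{"text": source_text, "text_pair": claim}])[0]
    return result["label"].lower() == "entailment" and result["score"] >= threshold

claim = "The Meseta Central averages around 660 meters in elevation."
source = ("About three quarters of that rough octagon is the Meseta Central, "
          "a vast plateau ranging from 610 to 760 m in altitude.")
print(claim_supported(claim, source))  # plausible claim, real source, but the average is never stated
```

The insidious part is exactly that failure mode: the claim sounds right, the source is real and relevant, and only reading the source reveals it never says the thing.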
dang•5h ago
I agree, that's interesting, and you've aptly expressed it in your comment here.
the_fall•4h ago
I'm not saying that AI isn't making it worse, but bad-faith editing is commonplace when it comes to hot-button topics.
mmooss•3h ago
I think accepting that gets us to the starting line. Then we need to apply a lot of critical thought to sometimes difficult judgments.
IMHO quality newspapers do an excellent job - generally better than any other category of source on current affairs, but far from perfect. I remember a recent article for which they interviewed over 100 people, got ahold of secret documents, read thousands of pages, consulted experts... That's not a blog post or Twitter take, or even a HN comment :), but we still need to examine it critically to find the value and the flaws.
abacadaba•2h ago
citation needed
chr15m•1h ago
Thank you for publishing this work. Very useful reminder to verify sources ourselves!