frontpage.

Made with ♥ by @iamnishanth

Open Source @Github


Wikipedia: WikiProject AI Cleanup

https://en.wikipedia.org/wiki/Wikipedia:WikiProject_AI_Cleanup
66•thinkingemote•1h ago

Comments

maxbaines•1h ago
This is hardly surprising given the announcement "New partnerships with tech companies support Wikipedia's sustainability", which relies on human content.

https://wikimediafoundation.org/news/2026/01/15/wikipedia-ce...

jraph•1h ago
I agree with the dig, although it's worth mentioning that this AI Cleanup page's first version was written on the 4th of December 2023.
Antibabelic•1h ago
I found the page Wikipedia:Signs of AI Writing[1] very interesting and informative. It goes into a lot more detail than the typical "em-dashes" heuristic.

[1]: https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing

jcattle•40m ago
An interesting observation from that page:

"Thus the highly specific "inventor of the first train-coupling device" might become "a revolutionary titan of industry." It is like shouting louder and louder that a portrait shows a uniquely important person, while the portrait itself is fading from a sharp photograph into a blurry, generic sketch. The subject becomes simultaneously less specific and more exaggerated."

eurekin•35m ago
That actually puts into words what I felt but couldn't articulate. Spectacular quote.
jcattle•19m ago
I'm thinking quite a bit about this at the moment in the context of foundational models and their inherent (?) regression to the mean.

Recently there has been a big push into geospatial foundation models (e.g. Google AlphaEarth, IBM Terramind, Clay).

These take in vast amounts of satellite data and, with the usual autoencoder architecture, try to build embedding spaces that contain meaningful semantic features.
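The embedding idea can be sketched with a toy linear autoencoder over fake patch data (real geospatial foundation models use deep ViT/CNN encoders; the shapes and names here are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for satellite image patches: 100 patches, 16x16 pixels, 4 bands.
patches = rng.normal(size=(100, 16 * 16 * 4))

# A linear autoencoder: encode each patch to a small embedding, decode it back.
dim_in, dim_emb = patches.shape[1], 32
W_enc = rng.normal(scale=0.01, size=(dim_in, dim_emb))
W_dec = rng.normal(scale=0.01, size=(dim_emb, dim_in))

def encode(x):
    return x @ W_enc   # embedding: compact summary intended to carry semantics

def decode(z):
    return z @ W_dec   # reconstruction of the original patch

emb = encode(patches)
recon = decode(emb)
loss = np.mean((patches - recon) ** 2)  # training would minimize this
print(emb.shape)  # (100, 32)
```

Downstream tasks (segmentation, change detection) are then trained on `emb` rather than raw pixels, which is where the benchmark comparisons below come in.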

The issue at the moment is that in the benchmark suites (https://github.com/VMarsocci/pangaea-bench), only a few of these foundation models have recently started to surpass the basic U-Net in some of the tasks.

There's also an observation by one of the authors of the Major-TOM model (which also provides satellite input data for training models) that the scaling rule does not seem to hold for geospatial foundation models: more data does not seem to result in better models.

My (completely unsupported) theory on why that is: unlike writing or coding, in satellite data you are often looking for the needle in the haystack. You do not want what has been done thousands of times before and proven to work. Segmenting out forests and water? Sure, easy. These models have seen millions of examples of forests and water. But most often we are interested in things that are much, much rarer: flooding, wildfires, earthquakes, landslides, destroyed buildings, new airstrips in the Amazon, and so on. As I see it, the currently used frameworks do not support that very well.
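The needle-in-the-haystack point can be made concrete with a toy label distribution (the class mix is invented): a model that never predicts the rare class still scores near-perfect overall accuracy.

```python
import numpy as np

# Toy per-pixel labels for a scene: 0=forest, 1=water, 2=flooded (rare).
labels = np.array([0] * 9_500 + [1] * 480 + [2] * 20)

# A model that has seen endless forests and water but never predicts
# the rare "flooded" class: it maps every flooded pixel to forest.
preds = np.where(labels == 2, 0, labels)

overall_acc = float(np.mean(preds == labels))           # 0.998 — looks great
flood_recall = float(np.mean(preds[labels == 2] == 2))  # 0.0 — the part we care about
print(overall_acc, flood_recall)
```

This is why per-class metrics (IoU or recall on the rare class) matter more than aggregate accuracy in these benchmark suites.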

But I'd be curious how others see this, who might be more knowledgeable in the area.

robertjwebb•29m ago
The funny thing is that this also appears in bad human writing. We would be better off if vague statements like this were eliminated altogether, or replaced with less fantastical but verifiable ones. If that means nothing of the article is left, then we have killed two birds with one stone.
bspammer•28m ago
That sounds like Flanderization to me https://en.wikipedia.org/wiki/Flanderization

From my experience with LLMs that's a great observation.

embedding-shape•26m ago
I think that's a general guideline to identify "propaganda", regardless of the source. I've seen people in person write such statements with their own hands/fingers, and I know many people who speak like that (shockingly, most of them are in management).

Lots of those points get at the same idea, which seems like a good balance: it's the language itself that is problematic, not how the text came to be, so it makes sense to target the language directly.

Hopefully those guidelines make all text on Wikipedia better, not just the LLM-produced text, because they seem like generally good guidelines even outside the context of LLMs.

andrepd•15m ago
Outstanding. Praise Wikipedia; despite any shortcomings, isn't it such a breath of fresh air in the world of 2026?
paradite•12m ago
Ironically this is a goldmine for AI labs and AI writer startups to do RL and fine-tuning.
KolmogorovComp•1h ago
I wish they also spent effort on the reverse: automatic rephrasing of the (many) articles that are obscure, very poorly worded, and/or lack any neutral tone.

And I say that as a general Wikipedia fan.

philipwhiuk•1h ago
WP:BOLD and start your own project to do it.
progbits•55m ago
The Sanderson wiki [1] has a time-travel feature where you read a snapshot from just before the publication of a book, ensuring no spoilers.

I would like a similar pre-LLM Wikipedia snapshot. Sometimes I would prefer potentially stale or incomplete info rather than have to wade through slop.

1: https://coppermind.net/wiki/Coppermind:Welcome

Antibabelic•50m ago
But you can already view any past version of a page on Wikipedia. Go to the page you want to read, click "View history" and select any revision before 2023.
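The same lookup can be scripted against the MediaWiki revisions API, which can return the newest revision at or before a cutoff timestamp. A minimal sketch that only builds the query URL (the title and date are example values):

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"

def revision_query_url(title: str, before: str) -> str:
    """URL asking the MediaWiki API for the newest revision of `title`
    at or before the ISO 8601 timestamp `before`."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvlimit": 1,
        "rvdir": "older",    # walk backwards in time, starting at rvstart
        "rvstart": before,   # e.g. the last pre-2023 revision
        "rvprop": "ids|timestamp",
        "format": "json",
    }
    return API + "?" + urlencode(params)

url = revision_query_url("Example_article", "2023-01-01T00:00:00Z")
print(url)
```

The `revid` in the response can then be opened as a permalink via `?oldid=<revid>`, which avoids scrolling through the history page by hand.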
progbits•49m ago
I know but it's not as convenient if you have to keep scrolling through revisions.
weli•39m ago
I don't see how this is going to work. 'It sounds like AI' is not a good metric whatsoever to remove content.
ramon156•32m ago
This is about wiping unsourced and fake AI-generated content, which can be confirmed by checking whether the sources are valid.
csande17•26m ago
Wikipedia agrees: https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing#...

That's why they're cataloging specific traits that are common in AI-generated text, and only deleting if it either contains very obvious indicators that could never legitimately appear in a real article ("Absolutely! Here is an article written in the style of Wikipedia:") or violates other policies (like missing or incorrect citations).

embedding-shape•25m ago
If that's your takeaway, you need to read the submission again, because that's not what they're suggesting or doing.
feverzsj•21m ago
Didn't they just sell access to all the AI giants?

Radboud University selects Fairphone as standard smartphone for employees

https://www.ru.nl/en/staff/news/radboud-university-selects-fairphone-as-standard-smartphone-for-e...
155•ardentsword•3h ago•80 comments

Show HN: Kacet – a freelancer marketplace with crypto-native payments

https://kacet.com/
13•wrux•1h ago•12 comments

A decentralized peer-to-peer messaging application that operates over Bluetooth

https://bitchat.free/
226•no_creativity_•4h ago•136 comments

Nepal's Mountainside Teahouses Elevate the Experience for Trekkers

https://www.smithsonianmag.com/travel/nepal-mountainside-teahouses-elevate-experience-trekkers-he...
20•bookofjoe•4d ago•0 comments

Gaussian Splatting – A$AP Rocky "Helicopter" music video

https://radiancefields.com/a-ap-rocky-releases-helicopter-music-video-featuring-gaussian-splatting
653•ChrisArchitect•18h ago•212 comments

Wikipedia: WikiProject AI Cleanup

https://en.wikipedia.org/wiki/Wikipedia:WikiProject_AI_Cleanup
70•thinkingemote•1h ago•21 comments

Provide agents with automated feedback

https://banay.me/dont-waste-your-backpressure/
139•ghuntley•2d ago•68 comments

Dead Internet Theory

https://kudmitry.com/articles/dead-internet-theory/
367•skwee357•15h ago•457 comments

Flux 2 Klein pure C inference

https://github.com/antirez/flux2.c
354•antirez•18h ago•120 comments

Show HN: I quit coding years ago. AI brought me back

https://calquio.com/finance/compound-interest
156•ivcatcher•11h ago•193 comments

A Social Filesystem

https://overreacted.io/a-social-filesystem/
430•icy•1d ago•186 comments

Fil-Qt: A Qt Base build with Fil-C experience

https://git.qt.io/cradam/fil-qt
114•pjmlp•2d ago•65 comments

Gladys West's vital contributions to GPS technology

https://en.wikipedia.org/wiki/Gladys_West
30•hackernj•2d ago•2 comments

The Code-Only Agent

https://rijnard.com/blog/the-code-only-agent
88•emersonmacro•9h ago•38 comments

AVX-512: First Impressions on Performance and Programmability

https://shihab-shahriar.github.io//blog/2026/AVX-512-First-Impressions-on-Performance-and-Program...
77•shihab•5d ago•29 comments

Fluid Gears Rotate Without Teeth

https://phys.org/news/2026-01-fluid-gears-rotate-teeth-mechanical.html
9•vlachen•4d ago•24 comments

Gas Town Decoded

https://www.alilleybrinker.com/mini/gas-town-decoded/
147•alilleybrinker•4d ago•128 comments

Two Concepts of Intelligence

https://cacm.acm.org/blogcacm/two-concepts-of-intelligence/
3•1970-01-01•5d ago•0 comments

Self Sanitizing Door Handle

https://www.jamesdysonaward.org/en-US/2019/project/self-sanitizing-door-handle/
19•rendaw•3d ago•25 comments

RISC-V is coming along quite speedily: Milk-V Titan Mini-ITX 8-core board

https://www.tomshardware.com/pc-components/cpus/milk-v-titan-mini-ix-board-with-ur-dp1000-process...
16•fork-bomber•1h ago•4 comments

Show HN: AWS-doctor – A terminal-based AWS health check and cost optimizer in Go

https://github.com/elC0mpa/aws-doctor
32•elC0mpa•7h ago•15 comments

40% of Kids Can't Read and Teachers Are Quitting [video]

https://www.youtube.com/watch?v=XTugyu2F0pc
5•squillion•46m ago•0 comments

Astrophotography visibility plotting and planning tool

https://airmass.org/
36•NKosmatos•3d ago•5 comments

Simulating the Ladybug Clock Puzzle

https://austinhenley.com/blog/ladybugclock.html
31•azhenley•1d ago•6 comments

High-speed train collision in Spain kills at least 39

https://www.bbc.com/news/articles/cedw6ylpynyo
167•akyuu•12h ago•153 comments

Using proxies to hide secrets from Claude Code

https://www.joinformal.com/blog/using-proxies-to-hide-secrets-from-claude-code/
90•drewgregory•5d ago•32 comments

Show HN: Beats, a web-based drum machine

https://beats.lasagna.pizza
99•kinduff•14h ago•29 comments

Greenpeace pilot brings heat pumps and solar to Ukrainian community

https://www.pveurope.eu/power2heat/greenpeace-pilot-brings-heat-pumps-and-solar-ukrainian-community
16•doener•2h ago•11 comments

Show HN: Dock – Slack minus the bloat, tax, and 90-day memory loss

https://getdock.io/
143•yadavrh•15h ago•131 comments

Sins of the Children

https://asteriskmag.com/issues/07/sins-of-the-children
158•maxall4•18h ago•72 comments