This is essentially the modern version of having a library of encyclopedias.
https://internet-in-a-box.org/
They provide offline access to Wikipedia, OpenStreetMap, Project Gutenberg, and many other resources.
https://arstechnica.com/information-technology/2025/04/ai-bo...
Because people are people, and will always prioritize egotism over respect for the common good.
Of course, the problem then is getting all these scrapers and bots to actually use the alternative, but Wikimedia could potentially redirect suspected bot clients in that direction.
Maybe a "view full website" link loaded via JS so bots don't see it? Idk.
me too tbh
Someone pointed out you can enable Reader mode by default in Safari's settings, but even then not all websites' pages are served as Reader-mode-enabled pages.
It would be nice to have something like this more decentralized.
I kid you not; through a process of attrition they've attacked the very reliability and reputation of every source, including Fox News and the like, and they've told editors sitewide that they simply can't be cited as a "Reliable Secondary Source", like at all.
I am not sure if that is an accurate assessment of the situation on the ground for mainstream media, but it certainly exposes some real systemic bias.
And this is the highest-order and most enduring method of ingraining systemic bias in the project: by weeding out sources with unfavorable viewpoints and perspectives, saying they publish lies and untruth, and being able to prohibit them globally from any use.
And I was pondering this state of affairs and just thinking about Karoline Leavitt's press room, and wondering what will the landscape be, if there is precious little intersection between press outlets who may be favorable or deferent to the present administration, and those which are allowed to be cited on Wikipedia? Ouch!
And you know, I wouldn't be surprised if people hurling those accusations somehow believe that the lies and misinformation are one-sided and partisan. As if leftism has some sort of monopoly on Truth and Goodness bestowed from above.
It's really been sickening to see the media outlets just lay down thick trails of bullshit that is designed to distract us, to instill fear, uncertainty, and doubt, to make us hate one another, to keep us hanging on that channel or that subscription for the next tidbit. It's disgusting and manipulative, and the Right has absolutely no monopoly on those tactics.
Wikipedia is simply a microcosm of the prevailing zeitgeist, so it is about as likely to cure its systemic bias as a leopard is to change its spots.
Anyone who wants to have access while offline, for whatever reason. This can range from something as simple as saving costs, via something more complicated such as accessing content from regions with spotty and/or expensive connectivity (you're on a ship out of reach of shore-based mobile networks and have no access to Starlink or something similar, you're deep in the jungle, deep underground, etc.), to some prepper scenario where connectivity ends at the cave entrance because the 'net has ceased to exist.
I would like to have a less politically biased online encyclopedia for the latter scenario, it would be a shame to start a new society based on the same bad ideas which brought down the previous one. If ever a politically neutral LLM becomes available that'd be one of the first tasks I'd put it to: point out bias - any bias - in articles, encyclopedias and other 'sources' (yes, I know, WP is not an original source but for this purpose it is) of knowledge.
The problem is: even if you report only facts, there is an editorial function in choosing which facts to report, because it is physically impossible to report all facts. So someone can always point to some sort of bias on choosing which facts to report.
I don’t think that’s fair. Not that Wikipedia is without bias, but that their ivory tower biases are worlds apart from the lying brutal animalistic Hollywood signals herding the masses in “our democracy”.
Here are a few, from https://www.allsides.com/blog/wikipedia-biased
Six studies, including two from Harvard researchers, have found a left-wing bias at Wikipedia:
A 2024 analysis [1] by researcher David Rozado that used AllSides Media Bias Ratings [2] found Wikipedia associates right-of-center public figures with more negative sentiment than left-wing figures, and tends to associate left-leaning news organizations with more positive sentiment than right-leaning ones.
A Harvard study [3] found Wikipedia articles are more left-wing than Encyclopedia Britannica.
Another paper [4] from the same Harvard researchers found left-wing editors are more active and partisan on the site.
A 2018 analysis [5] found top-cited news outlets on Wikipedia are mainly left-wing.
Another analysis [6] using AllSides Media Bias Ratings found that pages on American politicians cite mostly left-wing news outlets.
American academics found [7] conservative editors are 6 times more likely to be sanctioned in Wikipedia policy enforcement.
There are far more sources out there.
If I show examples of biased pages - the one on Antifa is a good example - this will just devolve into a quibble about this or that sentence.
[1] https://davidrozado.substack.com/p/is-wikipedia-politically-...
[2] https://www.allsides.com/media-bias/ratings
[3] https://www.semanticscholar.org/paper/Do-Experts-or-Collecti...
[4] https://www.hbs.edu/faculty/Publication%20Files/17-028_e7788...
[7] https://thecritic.co.uk/the-left-wing-bias-of-wikipedia/
This is not kindergarten, so let's not go down this path. Asking for a politically neutral (see my explanation elsewhere in this thread if you don't understand what that means) source of information is not 'bad politics' but an attempt to avoid bad politics. I suspect that you 'identify' as either 'liberal' or 'progressive', so I assume you'd be less than thrilled if Wikipedia had a conservative bias. The same goes for conservatives and (traditional) capital-L Liberals, who are less than thrilled to see Wikipedia having a 'left-wing' or 'progressive' bias. It just makes WP end up lumped together with the legacy media, known to be untrustworthy where it counts, and that is a shame for a site which in many ways is still a valuable resource as long as you avoid any and all subjects that have been pulled into the polarised political discourse.
> who needs to download the whole Wikipedia
Anyone archiving the site. Wikipedia is, for all its faults, one of the best-curated collections of summarized human knowledge, probably in history. Replicating that knowledge helps build data resilience and protects it against all sorts of disasters. I used to seed their monthly data dump torrent for a while.
The whole enchilada: https://download.kiwix.org/zim/wikipedia/wikipedia_en_all_ma...
Other versions: https://library.kiwix.org/#lang=eng&category=wikipedia
This comment was downvoted; wouldn't it merit a reply explaining "why" it wasn't contributing to the discussion instead?
I didn't downvote the comment, but it's not an incredibly deep contribution, is it?
If you really wish to contribute, perhaps you can say what the "'essentials' text version" contained and why you found it interesting?
I haven't looked for documentation on creating my own zim file.
> Low-background steel, also known as pre-war steel and pre-atomic steel, is any steel produced prior to the detonation of the first nuclear bombs in the 1940s and 1950s.
Military A.I. was likely in use earlier, and since PSYOPS are the most used and most effective weapon in the U.S. Military's arsenal, you absolutely know it was used. It ain't a war crime the first time...
But sometimes, they mean "likely" in the more colloquial sense of a guesstimation, which can range anywhere from informed guess to low effort fan-fiction. I default toward the latter unless otherwise specified.
boom
[0]: <https://web.archive.org/web/20221007114937/https://download....>
Most people probably won't seed many versions, so it's a losing effort, and you need to allocate a huge chunk of space for each version.
Deduplicating filesystems are sadly not in vogue.
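The idea behind block-level deduplication is that two dump versions which differ in only a few places share most of their bytes, so a content-addressed store keeps each identical block once. A toy sketch of that idea, assuming fixed-size 4 KiB blocks and SHA-256 as the block key (both choices are illustrative, and the function names are hypothetical, not any filesystem's API):

```python
import hashlib

CHUNK_SIZE = 4096  # assumption: fixed block size; real filesystems pick their own


def store_version(data: bytes, store: dict[str, bytes]) -> list[str]:
    """Split data into fixed-size chunks, store only unseen ones, return the recipe."""
    recipe = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # an identical chunk is kept only once
        recipe.append(digest)
    return recipe


def restore(recipe: list[str], store: dict[str, bytes]) -> bytes:
    """Rebuild a version by concatenating its chunks in order."""
    return b"".join(store[digest] for digest in recipe)


if __name__ == "__main__":
    v1 = b"".join(bytes([i]) * CHUNK_SIZE for i in range(10))  # 10 distinct chunks
    v2 = v1[:-CHUNK_SIZE] + b"\xff" * CHUNK_SIZE                # last chunk edited
    store: dict[str, bytes] = {}
    r1, r2 = store_version(v1, store), store_version(v2, store)
    print(len(store), "unique chunks for 20 chunks of input")
```

Fixed-size blocks only help when edits don't shift the data; an insertion near the start of a dump moves every later block boundary, which is why backup tools often use content-defined chunking instead.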
Can anyone please point to information on how we can download a copy of one specific language version?
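For the raw article text of one language edition, Wikimedia's dump server organises files per wiki. A minimal sketch, assuming the `<lang>wiki` naming used by most editions (e.g. `enwiki`, `dewiki`) and the `latest` pages-articles file; the function name is hypothetical, and a few editions with hyphenated codes use different database names:

```python
def latest_articles_dump_url(lang: str) -> str:
    """Build the URL of the latest pages-articles dump for one Wikipedia edition."""
    # Assumption: the wiki's database name is "<lang>wiki", e.g. "enwiki", "dewiki".
    db = f"{lang}wiki"
    return (
        f"https://dumps.wikimedia.org/{db}/latest/"
        f"{db}-latest-pages-articles.xml.bz2"
    )


if __name__ == "__main__":
    for code in ("en", "de", "nl"):
        print(latest_articles_dump_url(code))
```

Note that these dumps contain wikitext in XML, not rendered pages; for an offline reader, the per-language Kiwix ZIM files are usually the more convenient option.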
I may not be using the term correctly here. In short, I would love a local LLM + Wikipedia snapshot so that I can have an offline, self-hosted ... Hitchhiker's Guide to Earth.
Here are a few results: https://huggingface.co/search/full-text?q=Wikipedia+embeddin...
And the first result, which is probably what you’ll want to use: https://huggingface.co/datasets/Upstash/wikipedia-2024-06-bg...
I recommend you go for pgvector or a similar self-hosted solution to calculate the similarities, instead of a service like Vector.
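At query time, the similarity step is a nearest-neighbour search over the article embeddings. A brute-force sketch in pure Python, using made-up 3-dimensional vectors in place of real embeddings (the function names and data are hypothetical; a real setup would load vectors from a dataset like the one above and let the database do this):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], corpus: dict[str, list[float]], k: int = 3) -> list[str]:
    """Score every article against the query and return the k best titles."""
    scored = [(cosine_similarity(query, vec), title) for title, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored[:k]]


if __name__ == "__main__":
    # Toy "embeddings"; real ones have hundreds of dimensions.
    corpus = {
        "Low-background steel": [0.9, 0.1, 0.0],
        "Project Gutenberg": [0.0, 1.0, 0.2],
        "OpenStreetMap": [0.1, 0.0, 1.0],
    }
    print(top_k([1.0, 0.0, 0.0], corpus, k=2))  # prints ['Low-background steel', 'OpenStreetMap']
```

pgvector's `<=>` operator returns cosine distance (1 minus cosine similarity), so ordering by it ascending gives the same ranking, with an index to avoid scanning every row.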
Something using solar for power, with a rugged and water-resistant enclosure, made of extremely high-quality components that won't break for hundreds of years at least. Maybe add an IrDA port for good measure, to make it possible to transfer all the data out somewhat quickly.
You could make hundreds of these and put them in hard-to-reach locations around the world, to make sure at least one survives whatever calamity might befall us in the future.
> What would happen if I print all this down at the scale we have been discussing? How much space would it take? It would take, of course, the area of about a million pinheads ... All of the information which all of mankind has ever recorded in books can be carried around in a pamphlet in your hand — and not written in code, but a simple reproduction of the original pictures, engravings, and everything else on a small scale without loss of resolution.

Need a good magnifying glass, though (:

[1] https://web.pa.msu.edu/people/yang/RFeynman_plentySpace.pdf
My last download of English Wikipedia was ~110 GB, images included! It's impressively small for the volume of information available.
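A quick back-of-envelope check of how small that is per article. The 110 GB figure is from the comment (read as decimal gigabytes); the article count of roughly 6.9 million is an assumption for illustration, not a number from this thread:

```python
DUMP_BYTES = 110e9         # ~110 GB from the comment, assumed decimal GB
ARTICLE_COUNT = 6_900_000  # assumption: approximate English Wikipedia article count


def bytes_per_article(total_bytes: float, articles: int) -> float:
    """Average storage per article, images included."""
    return total_bytes / articles


if __name__ == "__main__":
    print(f"{bytes_per_article(DUMP_BYTES, ARTICLE_COUNT) / 1000:.1f} KB per article")
```

Under those assumptions it works out to roughly 16 KB per article, text and images together.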
https://f-droid.org/packages/itkach.aard2
I have many current and old dumps and can switch between a few years. Very nice for deleted articles or for checking old, time-stamped versions. It also supports more than just Wikipedia, like Wikiquote, Wikivoyage or a cooking wiki. You can compile your own MediaWiki sites too.
Seems like you would want it to be stored digitally. Ideally, people would have the ability to access it remotely, in case their local copy is somehow corrupted. For that, you would need a physical network by which the data can be transmitted. Economies of scale would seem to suggest that there would be one or a few entities that would “serve” the content to individuals who request it. Of course, you would want those individuals to be able to access this information without having detailed technical knowledge and ability. I guess they would have pre-packaged software “browsers” they could use to access the network.
In order to maintain this arrangement, you would want enough political stability to allow for the physical upkeep of this infrastructure, including human infrastructure (feeding the engineers who make it all possible). In order to make it worthwhile, you would need people who want to access the information too. I suspect political stability, a sufficient abundance of the necessities for human life, and the political will to make sure that everyone’s needs are met so that they can safely be curious about the world would help here too.
All of this requires sources of power. I suspect that a combination of nuclear power, solar/batteries, and geothermal energy would be sufficient and would avoid the problem of running out of fossil fuels at some point in the future, with the nice side effect of reducing the impact of calamities exacerbated by the greenhouse effect.
For the information to continue being relevant, you would have to update it with new knowledge, and correct inaccuracies. How best to accomplish this? Well, I guess you would need a systematic way to interrogate the causes behind the various effects we observe in the world. I would propose a system where people create hypotheses, and perform experiments that exclude the influence of as many factors as possible external to the phenomenon being studied. People would then share their findings, and I guess would critique each other’s arguments in a sort of “peer review” to try to come to a consensus. You would have to feed and provide for these people at a certain basic level to make sure they are comfortable and safe enough to continue doing this work. I guess you would want to encourage the value systems compatible with this method of interrogating the world.
Just my 2 cents.
https://meta.wikimedia.org/wiki/Data_dumps/What_the_dumps_ar...