- Auto-download dumps by language code (350+ languages)
- Extract specific articles by title without downloading the full dump
- Extract articles from a Wikipedia category, with subcategory recursion
- Extract specific sections by name, with alias matching (e.g., "Plot" also matches "Synopsis"; see the first sketch after this list)
- Template expansion (dates, coordinates, unit conversions → readable text)
- Content type markers ([MATH], [TABLE], etc.) instead of silent removal (see the second sketch after this list)
- Category metadata preserved in output
- JSON/JSONL output
- Parallel processing (the 24 GB English Wikipedia dump takes ~2 hours on an Apple M4)
- Written in Ruby
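
A minimal sketch of how section alias matching might work, assuming a simple lookup table; the `SECTION_ALIASES` hash and `section_match?` method are illustrative names, not the tool's actual API:

```ruby
# Hypothetical sketch: map a canonical section name to the heading
# variants it should also match. Table contents are assumptions.
SECTION_ALIASES = {
  "plot" => ["plot", "synopsis", "plot summary", "story"],
  "cast" => ["cast", "casting", "actors"],
}.freeze

# True if a heading found in the article matches the requested
# section name or any of its known aliases (case-insensitive).
def section_match?(requested, heading)
  wanted = SECTION_ALIASES.fetch(requested.downcase, [requested.downcase])
  wanted.include?(heading.downcase.strip)
end

section_match?("Plot", "Synopsis")  # => true
section_match?("Plot", "Reception") # => false
```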
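
And a sketch of the content type marker idea: non-prose wikitext blocks are replaced with typed placeholders rather than silently deleted. The patterns and marker names below are simplified assumptions, not the tool's actual rules:

```ruby
# Hypothetical sketch: substitute typed markers for wikitext blocks
# that would otherwise be stripped from the extracted text.
MARKERS = {
  /<math>.*?<\/math>/m  => "[MATH]",   # <math>...</math> blocks
  /\{\|.*?\|\}/m        => "[TABLE]",  # {| ... |} wikitables
  /\[\[File:[^\]]*\]\]/ => "[IMAGE]",  # [[File:...]] links
}.freeze

def mark_content_types(wikitext)
  MARKERS.reduce(wikitext) { |text, (pattern, marker)| text.gsub(pattern, marker) }
end

puts mark_content_types("Euler showed <math>e^{i\\pi}+1=0</math> here.")
# => "Euler showed [MATH] here."
```

The advantage over silent removal is that downstream consumers can see that something was there and what kind of content it was.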