I scraped and open-sourced a corpus of 518,255 Vietnamese legal documents — laws, decrees, circulars, decisions — spanning a century of legislation. Metadata + full Markdown text, ~3.6 GB parquet, CC BY 4.0.
Vietnamese legal text is nearly absent from existing NLP datasets despite Vietnam having one of the more prolific legislative systems in Southeast Asia.