HHS released a massive dataset of every Medicaid payment to every provider in the US: 227 million rows covering $1.09 trillion in spending across 617,000 billing providers. The data was released explicitly to crowdsource fraud detection.
The raw data is a 2.9 GB Parquet file. I built MedicaidSpending.org to make it searchable and browsable.
You can search by provider name or NPI, browse by state/city/specialty, and see individual provider pages with monthly spending trends, billing code breakdowns, and automated billing flags for statistical outliers.
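The post does not say how the billing flags are computed. As one possible illustration only, here is a minimal Go sketch that flags high-side outliers among peer providers using a modified z-score (median and median absolute deviation). The function names, the 3.5 threshold, and the sample totals are assumptions for the example, not the site's actual method or data.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// median returns the median of xs without modifying the caller's slice.
func median(xs []float64) float64 {
	s := append([]float64(nil), xs...)
	sort.Float64s(s)
	n := len(s)
	if n == 0 {
		return math.NaN()
	}
	if n%2 == 1 {
		return s[n/2]
	}
	return (s[n/2-1] + s[n/2]) / 2
}

// flagOutliers returns the indexes of totals whose modified z-score,
// 0.6745 * (x - median) / MAD, exceeds threshold. Only unusually HIGH
// totals are flagged. Hypothetical helper, not taken from the site.
func flagOutliers(totals []float64, threshold float64) []int {
	med := median(totals)
	devs := make([]float64, len(totals))
	for i, x := range totals {
		devs[i] = math.Abs(x - med)
	}
	mad := median(devs)
	if mad == 0 || math.IsNaN(mad) {
		return nil // no spread (or no data): nothing can be called an outlier
	}
	var flagged []int
	for i, x := range totals {
		if 0.6745*(x-med)/mad > threshold {
			flagged = append(flagged, i)
		}
	}
	return flagged
}

func main() {
	// Made-up totals for six peer providers; the last one is far above the rest.
	totals := []float64{120_000, 95_000, 110_000, 130_000, 105_000, 4_800_000}
	fmt.Println(flagOutliers(totals, 3.5)) // prints [5]
}
```

Median-based scores are used here because a single enormous biller would inflate a mean and standard deviation enough to hide itself; a flag from a sketch like this is a prompt for review, not evidence of fraud.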
Some of the patterns are striking. Brooklyn alone accounts for $31.8 billion in personal care services (code T1019), more than most states spend on all Medicaid combined. Some authorized officials control hundreds of billing entities. Early analysts scanning just 0.16% of providers flagged $90 billion in likely fraudulent payments.
Technical details:
- Go single binary, ~15 MB
- 3.3 GB SQLite database (read-only, pre-aggregated from the 227M rows using DuckDB)
- 900,000+ indexable pages generated from 13 templates
- No JavaScript framework: server-rendered HTML, Chart.js for one chart per provider page
- Runs on a single VPS behind Caddy
pw•1h ago
Data sources: HHS Medicaid Provider Spending dataset, NPPES provider registry, HCPCS code descriptions, OIG exclusion list, NUCC taxonomy codes.
All public data, no login required.
floxy•1h ago