The Canada Revenue Agency publishes T3010 forms for every registered charity, but they're scattered across clunky databases with no standardization or comparability. I collected 15 years of filings for all 138,203 charities and built a trust scoring system on top.
Stack:
- Python + Playwright for CRA data collection (4s rate-limited)
- PostgreSQL (Supabase): 12 T3010 tables, 138K charities, 457K directors, 362K directorship links
- Express.js REST API on Fly.io
- Daily GitHub Actions sync for new filings
- On-demand narrative generation via Claude Haiku
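For anyone curious what the 4s rate limiting looks like in practice, here is a minimal sketch, not the production collector. It assumes "4s rate-limited" means at least four seconds between consecutive requests. The class name `MinIntervalLimiter` and its injectable `clock`/`sleep` parameters are hypothetical, added only so the logic can be tested without real waiting.

```python
import time
from typing import Callable, Optional


class MinIntervalLimiter:
    """Keep consecutive calls at least `interval` seconds apart (hypothetical helper)."""

    def __init__(
        self,
        interval: float = 4.0,  # assumed meaning of "4s rate-limited"
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time the previous request was released

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.interval - (now - self._last))
        if delay > 0:
            self._sleep(delay)
        # Record when this request is actually released.
        self._last = now + delay
        return delay
```

In a collector loop, `limiter.wait()` would be called right before each page load (for example before Playwright's `page.goto(...)`), so the delay holds no matter how long parsing the previous page took.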
Scoring algorithm: three 0-100 scores per charity
1. Legitimacy (filing consistency, directorship stability, CRA compliance)
2. Effectiveness (program spending ratio, overhead, donation efficiency)
3. Compliance (sanctions screening, FATF risk, political activity limits)
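To make one of these concrete, here is a rough sketch of how an effectiveness score could be assembled from the three inputs listed (program spending ratio, overhead, donation efficiency). The field names, the 50/25/25 weights, and the way each component is normalized are all assumptions for illustration; the actual formula is not shown here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Financials:
    # Hypothetical field names; amounts in dollars for one fiscal year.
    program_spending: float
    admin_spending: float
    fundraising_spending: float
    total_expenditures: float
    donations_received: float


def clamp01(x: float) -> float:
    """Clamp a ratio into the range [0, 1]."""
    return max(0.0, min(1.0, x))


def effectiveness_score(f: Financials) -> Optional[float]:
    """Return a 0-100 score, or None when there is nothing to score."""
    if f.total_expenditures <= 0:
        return None  # insufficient data

    # Share of spending that goes to charitable programs (higher is better).
    program = clamp01(f.program_spending / f.total_expenditures)

    # Overhead: share spent on administration (lower is better).
    overhead = 1.0 - clamp01(f.admin_spending / f.total_expenditures)

    # Donation efficiency: fundraising cost per dollar donated (lower is better).
    if f.donations_received > 0:
        efficiency = 1.0 - clamp01(f.fundraising_spending / f.donations_received)
    else:
        efficiency = 0.0 if f.fundraising_spending > 0 else 1.0

    # Assumed weights, chosen only for this example.
    score = 100.0 * (0.5 * program + 0.25 * overhead + 0.25 * efficiency)
    return round(score, 1)
```

Returning `None` rather than 0 for charities with no expenditures keeps "no data" separate from "bad score", which matters once scores are averaged or graded.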
Each charity gets a letter grade (A+ to F, or NR for insufficient data).
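A sketch of the grade mapping, under stated assumptions: the composite is taken as a plain average of the three scores, the cutoffs are invented for illustration, and the only grades used are A+, A, B, C, D, F, and NR (the full scale between A+ and F is not spelled out above). `letter_grade` and `GRADE_CUTOFFS` are hypothetical names.

```python
from typing import Optional

# Assumed cutoffs on the 0-100 composite, highest first.
GRADE_CUTOFFS = [(95, "A+"), (85, "A"), (70, "B"), (55, "C"), (40, "D")]


def letter_grade(
    legitimacy: Optional[float],
    effectiveness: Optional[float],
    compliance: Optional[float],
) -> str:
    """Map three 0-100 scores to a letter grade, or NR if any is missing."""
    scores = (legitimacy, effectiveness, compliance)
    if any(s is None for s in scores):
        return "NR"  # insufficient data

    composite = sum(scores) / len(scores)  # assumed: unweighted mean
    for cutoff, grade in GRADE_CUTOFFS:
        if composite >= cutoff:
            return grade
    return "F"
```

For example, `letter_grade(80, None, 90)` returns `"NR"` instead of grading on partial data.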
Findings:
- Only 186 out of 85,507 registered charities scored A+
- Average effectiveness score: 51.6/100
- 487,692 flags generated (directorship overlap, compensation issues, filing gaps, etc.)
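As an illustration of the directorship-overlap flag, here is a minimal sketch that finds pairs of charities sharing board members, given rows shaped like the directorship links table. The `(director_id, charity_id)` tuple shape, the function name, and the `min_shared=2` threshold are assumptions; the real flag logic may differ.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def shared_director_pairs(
    links: Iterable[Tuple[str, str]],  # assumed shape: (director_id, charity_id)
    min_shared: int = 2,               # assumed threshold for raising a flag
) -> Dict[Tuple[str, str], Set[str]]:
    """Return charity pairs that share at least `min_shared` directors."""
    # director_id -> set of charities whose board they sit on
    boards: Dict[str, Set[str]] = defaultdict(set)
    for director_id, charity_id in links:
        boards[director_id].add(charity_id)

    # (charity_a, charity_b) -> directors common to both, with a < b
    shared: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for director_id, charities in boards.items():
        for a, b in combinations(sorted(charities), 2):
            shared[(a, b)].add(director_id)

    return {pair: ds for pair, ds in shared.items() if len(ds) >= min_shared}
```

The pair enumeration grows quadratically for a director who sits on many boards, so at the scale of hundreds of thousands of links this would more likely be done as a self-join in PostgreSQL.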
The core search/view is free. I'm building a tiered REST API for professional use cases (due diligence firms, grant-making orgs, etc.).
Code is closed-source for now, but the underlying CRA data is public domain. Happy to discuss the data pipeline, scoring methodology, or data collection approach.