I'm building an open, verifiable record of businesses for sale in the US. Think EDGAR for Main Street.
The problem: ~7,500 business brokers each maintain their own listings on their own websites. There's no central registry, no standardized data, and no way to audit what's actually on the market. The same listing might appear on 4 different sites. A business that sold 6 months ago can still show as "available."
DealLedger scrapes 1,700 broker websites daily using a combination of specialized scrapers for major franchise networks and ML-based pattern detection for the long tail. Every listing is source-linked, timestamped, and hashed. Snapshots are committed to GitHub daily.
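To make "source-linked, timestamped, and hashed" concrete, here's a minimal sketch of what a verifiable listing record could look like. The schema and field names (content_hash, scraped_at, etc.) are my illustration, not the project's actual format:

```python
import hashlib
import json
from datetime import datetime, timezone

def listing_record(source_url: str, raw_fields: dict) -> dict:
    """Build a verifiable listing record (hypothetical schema sketch).

    The hash covers only the scraped content, so re-scraping an unchanged
    listing yields the same fingerprint no matter when it was seen.
    """
    # Canonical JSON: sorted keys, no whitespace, so hashing is deterministic.
    canonical = json.dumps(raw_fields, sort_keys=True, separators=(",", ":"))
    return {
        "source_url": source_url,
        "scraped_at": datetime.now(timezone.utc).isoformat(),
        "content_hash": hashlib.sha256(canonical.encode()).hexdigest(),
        "fields": raw_fields,
    }

rec = listing_record(
    "https://example-broker.com/listing/123",  # placeholder URL
    {"title": "HVAC service company", "asking_price": 450000, "state": "TX"},
)
```

Hashing the canonicalized content (rather than the whole record) is what makes the same listing recognizable across days and across broker sites.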
It's not a marketplace. We don't rank, recommend, or broker. It's infrastructure.
The stack: Python scrapers, Playwright for JS-heavy sites, GitHub Actions for daily automation, an AI agent (Claude) that generates new scraper configs from any broker URL you submit. Everything outputs to flat files — JSON/CSV committed to git. The git history IS the ledger.
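Because each day's snapshot is a flat file in git, comparing any two commits is enough to see what appeared or disappeared. A rough sketch of that diff, assuming each record carries a content_hash field as above (yesterday's file would come from something like `git show HEAD~1:data/snapshot.json`):

```python
def snapshot_diff(yesterday: list, today: list) -> dict:
    """Compare two daily snapshots by content hash (assumed field name).

    This is the step that turns raw commit history into an auditable
    record: new hashes are fresh listings, missing hashes are listings
    that were removed or (likely) sold.
    """
    prev = {r["content_hash"] for r in yesterday}
    curr = {r["content_hash"] for r in today}
    return {
        "new": [r for r in today if r["content_hash"] not in prev],
        "removed": [r for r in yesterday if r["content_hash"] not in curr],
    }
```

Nothing here is specific to DealLedger's internals; it just shows why committing snapshots to git gives you stale-listing detection for free.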
What's open source:
The scraper framework (specialized + ML-based)
The broker registry (1,700 URLs)
An AI scraper builder agent
All daily data snapshots
The methodology documentation
Live data browser + broker submission form: https://dealledger.org
Source: https://github.com/jeffsosville/dealledger
Background: I've brokered businesses for 12 years (200+ transactions, $75M+). The data infrastructure in this industry hasn't changed since 2005. This is my attempt to fix that, starting with transparency.
Looking for: feedback on the approach, broker URLs we're missing, and anyone interested in adding verticals (HVAC, landscaping, vending, etc.).