So something big got approved a few miles from my house -- a data center complex -- which I learned about from a local news site. The story sparked my curiosity, and I soon went down a rabbit hole of local city government websites and public data to see what other projects might be in the works.
Then the thought occurred to me: what if I could just... scrape all of it?
So one API led to another and another.... I ended up writing 200+ scrapers across 85 cities. It turns out that when the City of Columbus uses Accela, the City of Austin uses Amanda, the City of Chicago has its own thing, and half the other cities dump CSVs on an FTP server that may or may not be online -- "just scrape it all" stops being simple quickly.
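The way I keep 200+ scrapers manageable is a per-city adapter pattern: every backend (Accela, Amanda, custom portals, CSV dumps) gets its own fetch logic, but all of them normalize into one common shape. A minimal sketch in TypeScript -- the field names here are illustrative, not the actual PermitRadar schema:

```typescript
// One interface per city; fetch however that city requires,
// then map into a shared Permit shape.
interface RawPermit {
  [key: string]: unknown;
}

interface Permit {
  city: string;
  permitId: string;
  address: string;
  issuedDate: string | null;
  estimatedCost: number | null;
}

interface CityScraper {
  city: string;
  fetchSince(date: Date): Promise<RawPermit[]>; // API, FTP, or HTML scraping
  normalize(raw: RawPermit): Permit;            // map into the common schema
}

// Example adapter for an Accela-style payload (field names assumed).
const columbus: Pick<CityScraper, "city" | "normalize"> = {
  city: "columbus-oh",
  normalize: (raw) => ({
    city: "columbus-oh",
    permitId: String(raw["recordId"] ?? ""),
    address: String(raw["fullAddress"] ?? "").trim(),
    issuedDate: typeof raw["issuedDate"] === "string" ? raw["issuedDate"] : null,
    estimatedCost: typeof raw["jobValue"] === "number" ? raw["jobValue"] : null,
  }),
};
```

The payoff is that everything downstream (geocoding, dedupe, search indexing) only ever sees one schema, no matter how weird the upstream source is.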
Some things I learned along the way:
- There is no standard for permit data. Every city invents its own schema.
- Geocoding more than a million addresses sounds straightforward until you discover that half of them are things like "LOT 4 BLK 2 UNIT 7".
- Government APIs have rate limits that appear to be set by someone who assumed no one would ever use them.
- The estimated cost field is a work of creative fiction. A $200 million data center will sometimes be listed at $1.
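Two of those lessons turned into concrete validation checks in the pipeline. Roughly (the regex and thresholds below are illustrative guesses, not my production rules):

```typescript
// Legal descriptions like "LOT 4 BLK 2 UNIT 7" aren't street addresses,
// so flag them up front instead of burning a geocoder call that will fail.
function isLegalDescription(address: string): boolean {
  return /\b(LOT|BLK|BLOCK|TRACT|PARCEL)\s+\d+/i.test(address);
}

// Cost fields are unreliable: placeholder values like $0 or $1 show up
// on projects actually worth hundreds of millions, so treat them as unknown.
function isSuspiciousCost(cost: number | null): boolean {
  return cost !== null && cost >= 0 && cost <= 1;
}
```

Anything flagged by either check gets stored but excluded from cost filters and map pins until a human (or a later data source) fills in the blanks.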
PermitRadar is the result -- an interactive map + search across 1.6M+ results. You can look up any city, filter by date/cost/type, and see what's going on. If you care about a specific address (homeowner, contractor, investor), you can set up alerts that notify you when new permits are filed.
The city pages (e.g. /permits/los-angeles-ca) are server-rendered and public -- no login required. The stack is Express/TypeScript + Next.js + PostGIS + Redis + BullMQ. Scrapers run on a cron job and feed a queue that handles geocoding, normalization, and AI classification (Claude Haiku 4.5).
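The per-permit work a queue worker runs looks roughly like this (BullMQ wiring omitted; the geocoder and classifier are injected so the real versions can call a geocoding service and Claude Haiku -- all names here are assumptions, not my actual code):

```typescript
// Shape of a permit as it moves through the pipeline (illustrative).
interface PermitRecord {
  address: string;
  description: string;
  lat?: number;
  lng?: number;
  category?: string;
}

type Geocoder = (address: string) => Promise<{ lat: number; lng: number } | null>;
type Classifier = (description: string) => Promise<string>;

// One job = one permit: normalize the address, try to geocode it,
// then classify the work description into a category.
async function processPermit(
  raw: PermitRecord,
  geocode: Geocoder,
  classify: Classifier
): Promise<PermitRecord> {
  const out: PermitRecord = { ...raw, address: raw.address.trim().toUpperCase() };
  const point = await geocode(out.address); // may return null for legal descriptions
  if (point) {
    out.lat = point.lat;
    out.lng = point.lng;
  }
  out.category = await classify(out.description); // an LLM call in production
  return out;
}
```

Keeping the processor a plain async function with injected dependencies also makes it trivial to test with fakes, which matters when every city's data breaks it in a new way.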
I'm happy to answer any questions that you have regarding scraping, the data normalization hellscape, or anything under the sun.
navaed01•1h ago
Really interesting project!
Did you use AI at all to build the scrapers?
twincipher•1h ago
Yes and no. Claude Code helped with boilerplate and debugging, but every city's permit system is different enough to warrant hands-on work: figuring out the API endpoints, understanding the schema, pagination quirks, etc. AI was extremely useful for the repetitive parts (parsing HTML and mapping fields), but the hard part is comprehending each city's unique data and normalizing it into something consistent. That's still a human problem.