With relatively minimal effort, I was able to spin up a little standalone container that wrapped around the service and exposed a basic API to parse a raw address string and return it as structured data.
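For anyone curious what such a wrapper might look like, here is a rough sketch and not the actual service described above. It assumes the pypostal bindings (`postal.parser.parse_address`, which returns a list of `(value, label)` pairs) are installed alongside libpostal. The `/parse?address=...` route, the port, and the helper names `to_structured`, `parse`, and `serve` are made up for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs


def to_structured(pairs):
    """Collapse (value, label) pairs into a dict; repeated labels are joined with a space."""
    out = {}
    for value, label in pairs:
        out[label] = f"{out[label]} {value}" if label in out else value
    return out


def parse(raw):
    # Imported lazily so this module still loads on machines without libpostal.
    # Assumption: the pypostal bindings are installed.
    from postal.parser import parse_address
    return to_structured(parse_address(raw))


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Hypothetical route: GET /parse?address=<raw address string>
        query = parse_qs(urlparse(self.path).query)
        raw = query.get("address", [""])[0]
        body = json.dumps(parse(raw)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    # Blocks forever; call this as the container's entry point.
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

Calling `serve()` from the container's entry point would expose the parser over HTTP. Joining repeated labels is a simplification; a real service might prefer to return the ordered list of pairs as-is.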
Address parsing is definitely an extremely complex problem space with practically infinite edge cases, but libpostal does just about as well as I could expect it to.
They've managed to create a great working implementation of a very, very small model of a very specific subset of language.
<https://news.ycombinator.com/item?id=18775099> Libpostal: A C library for parsing/normalizing street addresses around the world - 117 points by polm23 on Dec 29, 2018 (25 comments)
<https://news.ycombinator.com/item?id=11173920> Libpostal: international street address parsing in C trained on OpenStreetMap (mapzen.com) 74 points by riordan on Feb 25, 2016 (7 comments)
The problem is that the hardest-to-parse addresses are often also the hardest to match, which makes the problem somewhat circular. I wrote about this in more detail in a recent blog post on address matching: https://www.robinlinacre.com/address_matching/
Discussed on HN here: https://news.ycombinator.com/item?id=8907301
Something I had no idea about until I worked on a project that dealt with customer data: many companies also use commercial services for address and phone number validation and normalization.