
Libpostal: C library for parsing/normalizing street addresses around the world

https://github.com/openvenues/libpostal
62•nateb2022•10h ago

Comments

jandrese•8h ago
Wow, ambitious project. Anybody who has tried to verify addresses can tell you that the staggering number of different formats and conventions around the world makes it an almost intractable problem. So many countries have wildly informal standards, with people putting down just whatever they want because the mailman "just knows".
monero-xmr•8h ago
Maxmind is the quintessential example of what devs want to build in their heart of hearts. Low-touch sales but b2b. Almost a monopoly. Prints money for decades. Not a public company so they never increase costs to a usurious amount. Open source never quite meets the level needed
derdi•20m ago
> Anybody who has tried to verify addresses

Why would one try to "verify" addresses that one knows nothing about?

> because the mailman "just knows"

The mailman does "just know", and the mailman is who the address is for. Web forms I have seen that have tried to "verify" my address have never done so in a way that made the address better for the mailman.

Ameo•6h ago
I used this at a previous company with quite good success.

With relatively minimal effort, I was able to spin up a little standalone container that wrapped around the service and exposed a basic API to parse a raw address string and return it as structured data.

Address parsing is definitely an extremely complex problem space with practically infinite edge cases, but libpostal does just about as well as I could expect it to.
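
A minimal sketch of the kind of wrapper described above, assuming the parser returns a list of (value, label) pairs, which is the shape the pypostal bindings document for `parse_address`. The names `to_structured`, `parse_endpoint`, and `fake_parser` are hypothetical, and the stub parser stands in for libpostal so the example runs without it installed.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Assumed parser output shape: a list of (value, label) pairs.
LabeledTokens = List[Tuple[str, str]]


def to_structured(tokens: LabeledTokens) -> Dict[str, str]:
    """Collapse (value, label) pairs into one dict, joining repeated labels."""
    grouped = defaultdict(list)
    for value, label in tokens:
        grouped[label].append(value)
    return {label: " ".join(values) for label, values in grouped.items()}


def parse_endpoint(raw: str, parser: Callable[[str], LabeledTokens]) -> Dict[str, object]:
    """Hypothetical request handler: raw address string in, JSON-ready dict out."""
    cleaned = raw.strip()
    if not cleaned:
        return {"ok": False, "error": "empty address"}
    return {"ok": True, "input": cleaned, "components": to_structured(parser(cleaned))}


def fake_parser(address: str) -> LabeledTokens:
    """Stand-in for a real parser; always returns the same labeled tokens."""
    return [("781", "house_number"), ("franklin ave", "road"), ("brooklyn", "city")]
```

Passing the parser in as an argument keeps the HTTP-facing code independent of the heavy model load, so a deployment could swap the stub for the real binding at startup.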

degamad•6h ago
Ditto - I was impressed with how well it handled the weird edge cases in our data.

They've managed to create a great working implementation of a very, very small model of a very specific subset of language.

ethan_smith•1h ago
Worth noting that libpostal requires ~2GB RAM when fully loaded due to its comprehensive data models. For containerized deployments, we reduced memory usage by ~70% by compiling with only the specific country models needed for our use case.
degamad•6h ago
Previously:

<https://news.ycombinator.com/item?id=18775099> Libpostal: A C library for parsing/normalizing street addresses around the world - 117 points by polm23 on Dec 29, 2018 (25 comments)

<https://news.ycombinator.com/item?id=11173920> Libpostal: international street address parsing in C trained on OpenStreetMap (mapzen.com) 74 points by riordan on Feb 25, 2016 (7 comments)

RobinL•6h ago
There are many useful applications of libpostal, and it's an impressive library, but one I would caution against is for the purpose of address matching, at least as the 'primary' approach.

The problem is the hardest to parse addresses are also often the hardest to match, making the problem somewhat circular. I wrote about this more in a recent blog on address matching: https://www.robinlinacre.com/address_matching/

kleiba•5h ago
Relevant? -> "Falsehoods programmers believe about addresses" (https://www.mjt.me.uk/posts/falsehoods-programmers-believe-a...)

Discussed on HN here: https://news.ycombinator.com/item?id=8907301

weinzierl•4h ago
In the same vein, there is also Google's excellent libphonenumber for parsing, formatting, and validating international phone numbers.

And because I had no idea before I worked on a project where we had to deal with customer data: many companies also use commercial services for address and phone number validation and normalization.

kerkeslager•3h ago
I think fundamentally, no parsing/normalizing library can be effective for addresses. A much better approach is to have a search library which finds the address you're looking for within a dataset of all the addresses in the world.

Addresses are fundamentally unstructured data. You can't validate them structurally. It's trivial to create nonexistent addresses which any parsing library will parse just fine. On the flipside, there's enough variety in real addresses that your parser has to be extremely tolerant in what it accepts--so tolerant that it basically tolerates everything. The entire purpose of a parser for addresses is to reject invalid addresses, so if your parser tolerates everything it's pointless.

The only validation that makes any sense is "does this address exist in the real world?". And the way to do that is not parsing, it's by comparing to a dataset of all the addresses in the world.

I haven't evaluated this project enough to understand confidently what they're doing, but I hope they're approaching this as a search engine for address datasets, and not as a parsing/normalizing library.
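
The lookup idea in this comment can be sketched roughly as follows. This is an illustration only: the function names are hypothetical, the "dataset" is an in-memory set, and exact matching after light normalization is far cruder than what a real address search would need.

```python
import re
from typing import Iterable, Set


def normalize(address: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    lowered = address.lower()
    no_punct = re.sub(r"[^\w\s]", " ", lowered)
    return " ".join(no_punct.split())


def build_index(known_addresses: Iterable[str]) -> Set[str]:
    """Build a lookup set from a (hypothetical) dataset of known addresses."""
    return {normalize(a) for a in known_addresses}


def exists(address: str, index: Set[str]) -> bool:
    """Validate by membership in the dataset, not by structure."""
    return normalize(address) in index
```

Note that nothing here inspects the structure of the address: a free-form string passes if and only if the dataset contains it, which is the property the comment argues for.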

vidarh•1h ago
And keeping such datasets up to date is another matter entirely, because clearly a lot of companies rely on datasets that were outdated before their company even existed.

A trivially simple example of just how messy this is when people try to constrain it is that it's nearly random whether or not a given carrier would insist on me giving an incorrect address for my previous place, seemingly because traditionally and prior to 1965 the address was in Surrey, England.

The "postcode area name" for my old house is Croydon, and Croydon has legally been in London since 1965, and was allocated its own postcode area in 1966. "Surrey" hasn't been correct for addresses in Croydon since then.

But at least one delivery company insisted my old address was invalid unless I changed the town/postcode area to "Surrey", and refused to even attempt a delivery. Never mind they had my house number and postcode, which was sufficient to uniquely identify my house.

shakna•2h ago
I somehow doubt this will pass the sniff test of one of my old addresses, which Australia Post successfully delivered to on a weekly basis:

    Third on right of main,
    Tiwi College,
    Melville Island, 0822, AU.
You can try to normalize that... But "Main Road" is in another city. Because I wasn't living in a city. There were no road names. And the 3rd position was an empty plot, not the third house. We had a bunch of houses around a strip of land, a few minutes from the airstrip - the only egress.
mrweasel•2h ago
You also have to account for interestingly worded addresses. We had:

  Streetname 5, behind the glazier business.
  It might say <some other name> on the door
That's very specific, but also not really an address.
gorgoiler•57m ago
I have a real soft spot for these codifications of everyday things. A lot of us do. See also tzdata, GNU units, pluralize(noun), humanize(timestamp), and SPICE astronavigation. And yes, locating Mars in the night sky is indeed an everyday thing!

What are some others?
