That being said, we are currently working on getting our Google S2 Rust bindings open-sourced. S2 is a geo-hashing library that makes it very easy to write a reverse geocoder, even one that does point-in-polygon or polygon-intersection lookups.
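To give a flavor of why: S2's hierarchical cell IDs turn geometry questions into ID lookups. A minimal sketch using the Python s2sphere port (our Rust bindings aren't out yet, and the rectangle standing in for a real polygon is purely illustrative):

    # Sketch of S2-style reverse geocoding with the Python s2sphere port.
    # A real geocoder would cover actual polygons, not a lat/lng rectangle.
    import s2sphere

    # Precompute: cover each region's geometry with S2 cells at a fixed level.
    coverer = s2sphere.RegionCoverer()
    coverer.min_level = 8
    coverer.max_level = 8

    # Hypothetical "region": a bounding rectangle standing in for a polygon.
    berlin_rect = s2sphere.LatLngRect.from_point_pair(
        s2sphere.LatLng.from_degrees(52.3, 13.0),
        s2sphere.LatLng.from_degrees(52.7, 13.8),
    )
    cell_to_region = {c.id(): "berlin" for c in coverer.get_covering(berlin_rect)}

    def reverse_geocode(lat, lng):
        # A point lookup is just: which precomputed cell contains this point?
        cell = s2sphere.CellId.from_lat_lng(
            s2sphere.LatLng.from_degrees(lat, lng)
        ).parent(8)
        return cell_to_region.get(cell.id())

    print(reverse_geocode(52.52, 13.40))  # -> "berlin" (inside the rect)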
I'd be very happy to use simpler, more bulletproof solutions covering a subset of ES's features for different use cases.
The one issue I remember: early on, with ES 5, the cluster regularly went down. It turned out a scraper was passing some _very long_ input into the search, which killed the cluster.
It's really not something that needs much attention in my experience.
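A guard like the following is usually all it takes (a hypothetical sketch, not our actual code): cap the input before it ever reaches the cluster.

    # Hypothetical guard: reject or truncate pathological query input
    # before it reaches the search cluster.
    MAX_QUERY_LEN = 256

    def sanitize_query(raw: str) -> str:
        q = raw.strip()
        if len(q) > MAX_QUERY_LEN:
            # Truncating is friendlier than a 400 for a scraper-generated blob.
            q = q[:MAX_QUERY_LEN]
        return q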
Memory-mapping lets us get pretty far, even with global coverage. We are always able to add more RAM, especially since we're running in the cloud.
Backfills and data updates are also trivial and can be performed in an "immutable" way without having to reason about what's currently in ES/Mongo: we just re-index everything with the same binary on a separate node and ship the final assets to S3.
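In pseudo-Python the whole backfill is basically this (the binary name, bucket, and version tag are made up):

    # Hypothetical immutable backfill: rebuild the full index on a side node,
    # then ship the finished artifact to S3 under a versioned key.
    import subprocess
    import boto3

    VERSION = "2024-06-01"  # hypothetical build tag
    ARTIFACT = f"index-{VERSION}.bin"

    # Re-index everything from scratch; "indexer" stands in for the real binary.
    subprocess.run(["./indexer", "--output", ARTIFACT], check=True)

    # Upload the final asset; serving nodes just point at the new key.
    boto3.client("s3").upload_file(ARTIFACT, "my-index-bucket", f"indexes/{ARTIFACT}")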
Especially in the context of embedding search, which this article is also trying to do. We need a database that can efficiently store and query high-dimensional embeddings, and handle real-world nuances such as filtered ANN. There is a ton of innovation in this space, and it's crucial to powering the next-generation architectures of just about every company out there. At this point, data stores are becoming a bottleneck for serving embedding search, and I cannot overstate how important advancements here are for enabling these solutions. This is why there is an explosion of vector databases right now.
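To make "filtered ANN" concrete: you want the k nearest embeddings among only the rows that pass a metadata filter. A brute-force NumPy sketch of the pre-filtering variant (real vector databases push the filter into the ANN index itself):

    # Minimal filtered nearest-neighbour sketch: apply the metadata filter
    # first, then rank only the surviving embeddings.
    import numpy as np

    rng = np.random.default_rng(0)
    embeddings = rng.normal(size=(10_000, 384)).astype(np.float32)
    embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)
    categories = rng.integers(0, 5, size=10_000)  # toy metadata

    def filtered_knn(query, category, k=10):
        candidates = np.flatnonzero(categories == category)  # metadata pre-filter
        scores = embeddings[candidates] @ query              # cosine sim (unit vectors)
        top = np.argsort(-scores)[:k]
        return candidates[top]                               # row ids of the k best

    q = embeddings[0]
    print(filtered_knn(q, category=int(categories[0])))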
This article is a great example of the actual data providers not delivering the solutions companies need right now; there is so much room for improvement in this space.
But then the obvious follow-on question is: "Am I really suffering the same problems that a niche, already-scaled business is suffering?"
A question that is relevant to all decision-making. I'm looking at you, people who use the entire React ecosystem to deploy a blog page.
It sounds like they had the wrong architecture to start with and built a database to handle it. Kudos. Most would have just thrown a cache at it or fine-tuned a read-only PostGIS database for the GeoIP lookups.
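For comparison, the read-only PostGIS route is roughly this (hypothetical table and connection details, assuming a GiST index on geom):

    # Hypothetical point-in-polygon lookup against a read-only PostGIS
    # instance; assumes a regions(name, geom) table with a GiST index.
    import psycopg2

    conn = psycopg2.connect("dbname=geo user=readonly")

    def lookup(lat, lng):
        with conn.cursor() as cur:
            cur.execute(
                """
                SELECT name FROM regions
                WHERE ST_Contains(geom, ST_SetSRID(ST_MakePoint(%s, %s), 4326))
                LIMIT 1
                """,
                (lng, lat),  # PostGIS points are (x=lng, y=lat)
            )
            row = cur.fetchone()
            return row[0] if row else None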
Without benchmarks these are just bold claims we'll have to take on faith.
Both https://typesense.org/ and https://duckdb.org/ (with its spatial extension) are excellent geo performance-wise; the latter now seems really production-ready, especially when the data doesn't change that often. Both are fully open source, including clustered/sharded setups.
No affiliation at all, just a really happy camper.
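Trying DuckDB's spatial extension from Python takes a few lines (the GeoJSON file and its columns here are made up):

    # Quick taste of DuckDB's spatial extension; the GeoJSON file is hypothetical.
    import duckdb

    con = duckdb.connect()
    con.execute("INSTALL spatial;")
    con.execute("LOAD spatial;")

    # Load polygons once, then point-in-polygon queries are plain SQL.
    con.execute("CREATE TABLE regions AS SELECT * FROM ST_Read('regions.geojson')")
    result = con.execute(
        """
        SELECT name FROM regions
        WHERE ST_Contains(geom, ST_Point(13.40, 52.52))
        LIMIT 1
        """
    ).fetchone()
    print(result)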
A while ago I tried to build something with DuckDB plus its spatial and SQLite extensions statically linked and compiled in. I realized I was in over my head when the build failed because both required SQLite symbols, but from different versions.
We will have some more blog posts in the future describing different parts of the system in more detail. We were worried that too much density in a single post would make it hard to read.
It's a mini-revolution in the OSM world, where most apps have a poor search experience and typos aren't handled.
https://github.com/komoot/photon
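The public instance is easy to try, and the typo tolerance is visible straight away (endpoint per the repo's docs):

    # Query the public Photon instance; note the deliberate typo in the query.
    import requests

    resp = requests.get(
        "https://photon.komoot.io/api",
        params={"q": "berln", "limit": 3},  # "berln" still finds Berlin
        timeout=10,
    )
    for feature in resp.json()["features"]:
        print(feature["properties"].get("name"))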