For deciding whether a user is in Texas, you could create one simple polygon completely inside Texas and another inside Oklahoma. 99% of queries would fall in a simple polygon, and the rest go to the detailed polygons. Or create bounds near the complex river borders and use the detailed polygons only there.
On the other hand I just use simple, non-optimized functions for qquiz.com.
This seems like the obvious optimized v1: create extremely compressed (simplified) polygons wholly within the proper geopolitical borders. You get 100% true positives for a significant fraction of queries, and any negatives you can still kick to GMaps. I understand wholly-local is the goal here, but as others have pointed out, even small error rates can be unacceptable in some scenarios.
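The inner-polygon fast path described above is simple to sketch. This is a minimal illustration, not anyone's actual implementation: `TEXAS_INNER` is a made-up rectangle standing in for a real simplified polygon, and the point-in-polygon test is plain ray casting.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting point-in-polygon test; ring is a list of (lon, lat) vertices."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        (x1, y1), (x2, y2) = ring[i], ring[j]
        # count edge crossings of a horizontal ray extending left from the point
        if (y1 > lat) != (y2 > lat) and lon < (x2 - x1) * (lat - y1) / (y2 - y1) + x1:
            inside = not inside
        j = i
    return inside

# Hypothetical "inner" polygon wholly inside Texas (a crude rectangle for illustration);
# a real one would be a simplified version of the state border, shrunk inward.
TEXAS_INNER = [(-102.0, 30.0), (-98.0, 30.0), (-98.0, 33.0), (-102.0, 33.0)]

def is_definitely_in_texas(lon, lat):
    # True means guaranteed inside Texas; False means unknown, so fall back
    # to the detailed polygon or an external geocoding API.
    return point_in_polygon(lon, lat, TEXAS_INNER)
```

A `False` here is not "not in Texas", only "can't say cheaply", which is what makes the 100%-true-positive property work.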
It can be self-hosted, with constant replication. There's also Photon which is a cut-down version of it: https://photon.komoot.io
An in-between for OP could be something like opencagedata.com, which is still a third-party API but an order of magnitude less expensive than Google. (not affiliated but have previously explored the service)
And of course there are edge cases; there are lots of them, but mostly it's fine. One case that comes to mind is the border town of Baarle-Nassau, on the border between the Netherlands and Belgium. This village has some of the weirdest borders in the world: there are Belgian exclaves that themselves contain Dutch enclaves. In some cases the border runs through houses, so you can enter in one country and leave in another. Some of the exclaves are just a few meters across. There are a few more examples like this around the world.
Another issue is the fractal nature of polygons. I once found a polygon for New Zealand that was around 200MB and broke my attempts to index it. This doesn't matter for resolving country codes, of course, because it is an island. But it's a reason I eventually implemented the Douglas-Peucker algorithm mentioned in the article to simplify the polygon.
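For anyone curious, Douglas-Peucker is short enough to sketch in a few lines. This is a naive, unoptimized version in planar coordinates (function names are mine, not from any of the libraries mentioned): keep the point farthest from the chord between the endpoints if it exceeds the tolerance, and recurse on both halves.

```python
import math

def perp_dist(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    if dx == dy == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    return abs(dy * (p[0] - a[0]) - dx * (p[1] - a[1])) / math.hypot(dx, dy)

def douglas_peucker(points, epsilon):
    if len(points) < 3:
        return list(points)
    # find the interior point farthest from the endpoint-to-endpoint chord
    idx, dmax = max(((i, perp_dist(points[i], points[0], points[-1]))
                     for i in range(1, len(points) - 1)), key=lambda t: t[1])
    if dmax <= epsilon:
        # everything is within tolerance of the chord: drop the interior points
        return [points[0], points[-1]]
    left = douglas_peucker(points[:idx + 1], epsilon)
    right = douglas_peucker(points[idx:], epsilon)
    return left[:-1] + right  # avoid duplicating the split point
```

For a 200MB polygon you'd want the iterative variant with an explicit stack, but the idea is the same.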
Really cool
Anyway, a Kotlin library I wrote uses a similar technique to make requests for the majority of locations immediate, while also handling the edge cases - i.e. when querying a location near a border.
https://github.com/westnordost/countryboundaries (also available in Rust)
What it does is slice up the input geometry (e.g. a GeoJSON) into many small cells in a raster. So, when querying for a location, one doesn't need to do point-in-polygon checks against potentially huge polygons, but only against the little slices in the cell being queried. And of course, if a country completely covers a cell, no point-in-polygon check is needed at all. All this slicing is done in a preprocessing step, so the actual library consumes a serialized data structure that is already in this sliced-up format.
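A crude sketch of the cell-bucketing idea, to make it concrete. This is not the library's actual code: it only buckets bounding boxes into one-degree cells, whereas the real preprocessing clips the geometry itself per cell and records fully covered cells. All names and the cell size are my own.

```python
from collections import defaultdict

CELL = 1.0  # cell size in degrees (a real raster would likely be finer)

def cell_of(lon, lat):
    return (int(lon // CELL), int(lat // CELL))

def build_index(polygons):
    """polygons: {country_id: [(lon, lat), ...]} outer rings only.
    Buckets each country into every cell its bounding box touches."""
    index = defaultdict(list)
    for cid, ring in polygons.items():
        lons = [p[0] for p in ring]
        lats = [p[1] for p in ring]
        for cx in range(int(min(lons) // CELL), int(max(lons) // CELL) + 1):
            for cy in range(int(min(lats) // CELL), int(max(lats) // CELL) + 1):
                index[(cx, cy)].append(cid)
    return index

def candidates(index, lon, lat):
    # only these countries need a real point-in-polygon test
    return index.get(cell_of(lon, lat), [])
```

Most cells end up with zero or one candidate, which is why the common case is so fast.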
I needed it to be fast because my app displays a lot of POIs on the map, with logic that depends on which country/state each POI is located in.
There are many similar things of course but nothing that was multiplatform, which I needed. I actually created a multiplatform kotlin library for working with language and country codes a few months ago: https://github.com/jillesvangurp/ko-iso
It seems we have some shared interests. I'll check out your library.
What you describe is a nice strategy for indexing things. I've done some similar things. Another library I maintain (jillesvangurp/geogeometry) allows you to figure out which map tiles cover a polygon. Map tiles are nice because they are basically quad-tree paths. I have a similar algorithm that does the same with geohashes. You could use either for indexing geospatial stuff.
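The standard slippy-map tile formula shows why tiles make good index keys: each extra zoom level splits a tile into four, so the (z, x, y) path is effectively a quadtree address.

```python
import math

def tile_for(lat, lon, zoom):
    """Web Mercator (slippy-map) tile containing a point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    # Mercator y: asinh(tan(lat)) is ln(tan(lat) + sec(lat))
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return zoom, x, y
```

For example, central Berlin at zoom 10 lands in tile (10, 550, 335); its four children at zoom 11 are the tiles with x in {1100, 1101} and y in {670, 671}, which is the quadtree property that makes prefix-style indexing work.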
Slicing up the polygons sounds interesting. I've been meaning to have a go at intersect/union type operations on geometries. I recently added a boolean intersects check for whether geometries intersect each other; I already had a containment check.
There are 249 ISO 3166-1 country codes:
* https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes
But 193 sovereign states recognized by the UN:
* https://en.wikipedia.org/wiki/Member_states_of_the_United_Na...
Some of the discrepancy can be accounted for by "legacy" codes like .su for the Soviet Union.
All of which are also in the 249-code ISO 3166-1 list; it's a superset. It doesn't include the historical codes anymore, though. Those are interesting if you have old data, perhaps.
Maybe a good iteration of this is to use the 0.01-accuracy linework for 99.9% of users, but send anything within 100m of a border to the Google API to catch the edge cases. That would probably stay within the free tier.
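The "within 100m of a border" test can be approximated cheaply without a geo library. A sketch under my own naming, using a local equirectangular projection (accurate enough at 100m scales) to get point-to-segment distances in meters:

```python
import math

EARTH_R = 6371000.0  # mean Earth radius in meters

def dist_point_segment_m(p, a, b):
    """Approx. distance in meters from point p to segment a-b; all (lon, lat)."""
    lat0 = math.radians(p[1])  # project around the query point's latitude

    def proj(q):
        return (math.radians(q[0]) * math.cos(lat0) * EARTH_R,
                math.radians(q[1]) * EARTH_R)

    px, py = proj(p)
    ax, ay = proj(a)
    bx, by = proj(b)
    dx, dy = bx - ax, by - ay
    if dx == dy == 0:
        t = 0.0
    else:
        # clamp the projection of p onto the segment to [0, 1]
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def near_border(p, border, threshold_m=100.0):
    """border: polyline of (lon, lat) vertices along the state boundary."""
    return any(dist_point_segment_m(p, border[i], border[i + 1]) <= threshold_m
               for i in range(len(border) - 1))
```

Only points where `near_border` is true would ever generate a paid API call.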
It might be interesting to see how the edge cases mentioned in the article are impacted by switching to, for example, Visvalingam-Whyatt [0].
[0]: For a Python implementation: https://github.com/urschrei/simplification
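For reference, Visvalingam-Whyatt is also compact enough to sketch directly. This is a naive O(n^2) pure-Python version (the linked crate uses a priority queue instead); the function names are mine. The idea: repeatedly drop the vertex whose triangle with its neighbors has the smallest area, until every remaining triangle exceeds the tolerance.

```python
def triangle_area(a, b, c):
    """Area of the triangle a-b-c via the cross product."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0

def simplify_vw(points, min_area):
    pts = list(points)
    while len(pts) > 2:
        # effective area of each interior vertex with its current neighbors
        areas = [triangle_area(pts[i - 1], pts[i], pts[i + 1])
                 for i in range(1, len(pts) - 1)]
        i, smallest = min(enumerate(areas, start=1), key=lambda t: t[1])
        if smallest >= min_area:
            break  # all remaining vertices are significant enough
        del pts[i]  # remove the least significant vertex and re-evaluate
    return pts
```

Compared to Douglas-Peucker it tends to preserve the overall shape of smooth curves better, which is why it's an interesting thing to test against the article's edge cases.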
This is a common topic and easily dealt with by working with topology-informed geometries; most simplification algorithms support topology handling between different features. For instance, TopoJSON can be used.
But I’d like to know!
You can look into TopoJSON here: https://github.com/topojson/topojson And a good general introduction to topology in GIS setting is nicely found in QGIS documentation: https://docs.qgis.org/3.40/en/docs/gentle_gis_introduction/t...
Worth noting that the coordinates in the 90kb (gz) `coord2state.min.js` carry 6 decimal places... which suggests an accuracy (<1m) that may not be present in the simplified data.
Before you increase tolerance to decrease file size, you could consider lowering this decimal precision to 5, 4, or even 3 decimals, given the "country, state, or city" requirement.
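The rough trade-off, as a sketch (the vertex value is hypothetical): one degree of latitude is about 111.32 km, so each decimal place cut drops roughly one character per coordinate while multiplying the worst-case error by 10.

```python
METERS_PER_DEGREE = 111_320.0  # approx. meters per degree of latitude

def precision_m(decimals):
    """Worst-case rounding error in meters at a given decimal precision."""
    return 0.5 * 10 ** (-decimals) * METERS_PER_DEGREE

def round_coord(coord, decimals):
    return [round(v, decimals) for v in coord]

vertex = [-104.057739, 44.997605]  # hypothetical border vertex
# 6 decimals: ~0.06 m error; 4 decimals: ~5.6 m; 3 decimals: ~56 m
```

For state-level lookups, even ~56 m of rounding error (3 decimals) is dwarfed by the error the polygon simplification itself introduces.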
I also like the idea of using a heavily cached, heavily compressed image that is perfect for the >95% of the country that isn't within a pixel of a border, with a subsequent request for another heavily cached vector tile covering any lat/lng within your 1px tolerance.
> You are pointing to Waldo on a page... on a specific date. Because of tectonic plates movement.
If your pre-simplification input geometries form a coverage[0], you can use e.g. ST_CoverageSimplify[1] or coverage.simplify[2] to simplify them without introducing gaps.
[0] http://lin-ear-th-inking.blogspot.com/2022/07/polygonal-cove... [1] https://postgis.net/docs/ST_CoverageSimplify.html [2] https://shapely.readthedocs.io/en/2.1.0/reference/shapely.co...
Of course, Mapbox provides a parameter in the API to reduce the number of points using the Douglas-Peucker algorithm. But I didn't want to make an API call every single time, so we stored the geometry and applied a simple distillation depending on the use case.
Also, Missouri has more vertices than Kansas, suck it!
In a .js file, each character is UTF-16 (2 bytes). Your current encoding uses 23 characters per coordinate, or 46 bytes.
Using 16-bit floats for lat/lon gives you accuracy down to 1 meter. You would need 4 bytes per coordinate. So that's a reduction by 91%.
You can't store raw binary bytes in a .js file so it would need to be a separate file. Or you can use base64 encoding (33% bigger than raw binary) in .js file (more like 6 bytes per coordinate).
(Edited to reflect .min.js)
What? I'd like to challenge this. The in-memory representation of a character may be UTF-16, but the file on disk can be UTF-8. Also UTF-16 doesn't mean "2 bytes per character": https://stackoverflow.com/a/27794229
The file https://github.com/AZHenley/coord2state/blob/main/dist/coord... doesn't use anything other than the 1-byte ASCII characters.
Thanks for the correction
Not for longitude it doesn't, once the absolute value exceeds 128: in float16, for example, the next representable value after 132.0 is 132.125.
float16 precision at values > 16 is pretty poor.
Converting that discrepancy (132.125 - 132.0) to distance gives about 10 km.
Did you maybe mean Fixed-point? (but even then that's not enough precision for 1m)
codingdave•1d ago
I'd be curious if the reliability is different if, instead of random locations, you limited it to locations with some level of population density. Because a lot of the USA is rural, so that random set is not going to correlate well to where people actually are. It probably matters more the farther east you go as well, as the population centers overlap borders more when you get to the eastern seaboard.