It was originally based on Pelias, but we've since augmented it with additional data sources (such as Foursquare OS Places) and significantly improved the baseline performance and accuracy.
Happy to answer questions on the topic!
Is that something I should report as a bug, or is that the way it is supposed to be?
Definitely drop us a line. Our v2 response structure is still undergoing some iteration, especially around the particulars of labeling, so this may be intended (depending on the specific query), but we can certainly look into it to confirm that.
Can I keep those points if I'm no longer a customer?
Can I resyndicate those stored points via my own API?
> permanently storing results for future use (e.g., as a database column), in part or in whole, from the Stadia Maps Geocoding APIs without an active Standard, Professional, or Enterprise subscription with appropriate permissions;
(Having scanned those terms, I'm still not 100% certain I can confidently answer all three of my questions. A classic challenge here is that terms often have language relating to map tile images, but it's hard to tell whether those same terms also apply to geocoded lat/lon points.)
https://docs.stadiamaps.com/geocoding-search-autocomplete/bu...
> Can I Store Geocoding API Results?
> Unlike most vendors, we won't charge you 10x the standard fee per request to store geocoding results long-term! However, we do require an active Standard, Professional, or Enterprise subscription to permanently store results (e.g. in a database). Temporary storage in the normal course of your work is allowed on all plans. See our terms of service for the full legal terms.
For a good geocoder, you need many other data sources (which can be open). OpenAddresses (https://openaddresses.io/) is an example of a dataset that is vital to delivering anything of reasonable quality.
Returning real results requires extensive parsing and awareness of addresses and place data (including localization of them), and this is not something you get for free based on OSM data.
For context, I've tried it! I've been working on a free library for geocoding UK addresses quickly and accurately. It comes with the caveat that you need access to the dataset of all addresses you're geocoding against - which could be your own list, or a commercial product like addressbase: https://github.com/robinL/uk_address_matcher/
So I work in data analytics, not so much web mapping. For those applications, IMO local solutions like ESRI's are good options if you are limited to addresses in the US: https://crimede-coder.com/blogposts/2024/LocalGeocoding.
Google's TOS says you can't even cache the results, https://cloud.google.com/maps-platform/terms. So saving to a database and doing analysis of your data is not allowed, AFAICT.
Ed Freyfogle (the founder) is a nice person, very knowledgeable about all things geo, pretty approachable, and co-runs the Geomob podcast (worth checking out) and its associated meetups (worth going to). If you are unsure, get your free API key and just give it a try. His documentation is awesome and the API is really easy to get started with.
Disclaimer: Ed's a friend and I'm a user of his product.
- Can I store the latitude/longitude points I get back from the API in my own database forever, and use them for things like point-in-polygon or point-closest-to queries?
- Can I "resyndicate" those latitude/longitude points in my own APIs?
I've encountered quite a few popular geocoding APIs (including Google's) that disallow both of these if you take the time to read the small print. This massively limits how useful they are: you can build a "show me information about this location right now" feature but if you have a database of thousands of addresses you can't meaningfully annotate data in a way that's valuable in the medium-to-long-term.
The API thing matters because it's not great having location data in your database that you can't expose in an API to other partners!
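To make the first question concrete, here's the kind of downstream query that permanent storage enables. A minimal sketch using shapely, with hypothetical stored coordinates:

    from shapely.geometry import Point, Polygon

    # Hypothetical geocoding results stored in our own database: name -> (lat, lon).
    stored_points = {
        "HQ": (51.5074, -0.1278),
        "Warehouse": (52.4862, -1.8904),
    }

    # A sales territory around central London, as (lon, lat) pairs.
    territory = Polygon([(-0.5, 51.2), (0.3, 51.2), (0.3, 51.7), (-0.5, 51.7)])

    for name, (lat, lon) in stored_points.items():
        if territory.contains(Point(lon, lat)):  # shapely points are (x, y) = (lon, lat)
            print(f"{name} is inside the territory")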
I really like OpenCage for exactly this reason: https://opencagedata.com/why-use-open-data
"Store geocoding results as long as you like. Keep results even after you stop being a customer."
I was surprised to see AWS' location service wasn't compared in this write-up. They are unique in that they offer both options. They ask when you provision the service if you plan on storing the data. The service works the same, but the cost is 8x. A fair trade, if your use-case involves referencing that data often.
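If I understand the provisioning flow correctly, that choice is the IntendedUse setting on a place index. A sketch with boto3 (the index name and data source below are placeholders):

    import boto3

    # Amazon Location Service makes the storage decision explicit at
    # provisioning time: IntendedUse is "SingleUse" or "Storage".
    location = boto3.client("location")
    location.create_place_index(
        IndexName="my-storable-index",  # hypothetical name
        DataSource="Esri",              # pick a data provider
        DataSourceConfiguration={"IntendedUse": "Storage"},  # the pricier, storable tier
    )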
Our experience (10+ years of offering a geocoding service) is that many people (of course depending on exact needs and use case) are significantly over-spending and could be using open data to reduce costs by 80+%.
Happy to chat if interested.
They're surely going to just have a 'user_country' column in their users database, prepopulated from the user's IP and used for all kinds of purposes.
The trouble with most geo and map services is that pricing is only one dimension; most of them also have very different rules regarding usage. For example, some prohibit you from persisting the geocoded locations. Others want you to pay more if you do something they consider "asset tracking".
If you still need geocoding, we're very happy to have a conversation, or you can just check out our site: https://opencagedata.com
We use only open data; you can store it forever (even if you're no longer a customer) and use it for whatever you like.
For example: quality (not generally, but versus your actual input data), terms & conditions of what you can do with the data, support, data enhancements (things like timezones, etc, etc), ease of use, documentation, terms of payment, and more.
The only real answer to "which geocoding service is best" is "it depends".
We have a comprehensive geocoding buyer's guide on our site: https://opencagedata.com/guides/how-to-compare-and-test-geoc...
Please get in touch if you need geocoding, happy to tell you if your needs are a good match for our service. Happy also to tell you if not.
2 Qs:
1. How does OpenCage correctness/completeness compare to Google Maps API, especially in rural and industrial regions where you have addresses like “AcmeCo Industries, 234-XY Unit C, Jebel Ali Free Zone, Dubai”? I’d like to confidently query the most precise location that still matches/contains my query.
2. Do you support querying by business names? Google’s geocoding doesn’t return the business name in the result (that’s a separate API), but it does use business names to resolve queries.
Great. The only real answer is you should sign up for a free trial (takes 2 min, requires just an email address) and test with your actual input data. Which language are you working in? We have SDKs for almost all (30+) and detailed tutorials for many: https://opencagedata.com/sdks
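For example, a first query with our Python SDK looks roughly like this (a sketch; assumes pip install opencage and your trial key, and borrows the Jebel Ali example from your question):

    from opencage.geocoder import OpenCageGeocode

    geocoder = OpenCageGeocode("YOUR-API-KEY")  # free trial key
    results = geocoder.geocode("AcmeCo Industries, Jebel Ali Free Zone, Dubai")
    if results:
        best = results[0]
        print(best["formatted"])
        print(best["geometry"]["lat"], best["geometry"]["lng"])
        print(best["confidence"])  # 0-10; higher means a more precise match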
You can also test manually on our demo page: https://opencagedata.com/demo
You can do a lot to help us by formatting the input data well: https://opencagedata.com/guides/how-to-format-your-geocoding...
Re: company names, they are a real challenge as they introduce a lot of noise.
Please can you follow up by email with specific questions: support @ opencagedata.com
I hope we have the chance to work with you
To summarize the roll-your-own vs. pay-per-request API question: the main point seems to be keeping up with new/updated OSM data.
In terms of comparing Google Maps vs. OpenCage vs. rolling your own OSM/Nominatim, what would you say are the main features that differ? (Not dev time or infra stuff, just what's different about the request/result.)
Though really the key difference is the fact that we use open data. Google's data is not open, and this significantly restricts what you can do with it.
And here is a similar comparison versus running your own Nominatim https://opencagedata.com/guides/how-to-switch-from-nominatim
Please let us know if anything is out of date or can be made more clear. Thanks.
Is there a chance you guys will ever switch to a per-request pricing model?
That said, we do have enterprise customers with other pricing models to meet their exact needs. Please get in touch if we can help you.
I suppose you can't please everyone.
correct
- Get a (cheap) Docker-capable server.
- Install the OSM/Nominatim stack using Docker.
Setting this up used to be a PITA, but thanks to Docker it's now super easy.
This has a couple of benefits.
Like fixed, predictable costs. You can do whatever you want without thinking about weird API points that cost a random amount of money. You can serve whatever traffic you want, and a cheap v-server gets you an astonishingly long way. There are no third-party privacy issues, so you can just embed your maps without annoying cookie banners.
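As a sketch of what that looks like in practice (assuming the community mediagis/nominatim Docker image; check its README for current tags and options), querying your self-hosted instance is then a plain HTTP call:

    import requests

    # Assumes a Nominatim container is running, started with something like:
    #   docker run -e PBF_URL=https://download.geofabrik.de/europe/monaco-latest.osm.pbf \
    #     -p 8080:8080 mediagis/nominatim:4.4
    resp = requests.get(
        "http://localhost:8080/search",
        params={"q": "Casino de Monte-Carlo", "format": "jsonv2", "limit": 1},
    )
    resp.raise_for_status()
    for hit in resp.json():
        print(hit["display_name"], hit["lat"], hit["lon"])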
"For a full planet import 128GB of RAM or more are strongly recommended. Do not report out of memory problems if you have less than 64GB RAM."
That's ~$150/mo at Hetzner on bare metal, $672/mo at Digital Ocean, starting at $487/mo at AWS. For a non-redundant, low-availability configuration.
But it doesn't mention why you need this amount of RAM or how you could opt out of that requirement. I.e., if the queries run directly against the DB without indexes, etc., why the high RAM requirement?
Nominatim also doesn't support any sort of typeahead queries. There's Photon (https://github.com/komoot/photon), which works in concert with Nominatim and is similarly tied to OSM as a data source.
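For anyone curious what typeahead looks like, here's a quick sketch against Photon's public demo instance (a self-hosted install works the same way):

    import requests

    # Partial input, as a user would type it; Photon returns ranked
    # GeoJSON features even for incomplete queries.
    resp = requests.get("https://photon.komoot.io/api",
                        params={"q": "berl", "limit": 3})
    for feature in resp.json()["features"]:
        props = feature["properties"]
        print(props.get("name"), props.get("country"))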
There's also Pelias (https://pelias.io/), an open-source geocoder that handles all types of geocoding in one and supports OSM, OpenAddresses, and many other sources (including custom data) out of the box. Admittedly it can have more moving parts as a result, but it can be set up with relatively little RAM if you're careful (I bet a planet install could be done, somewhat slowly, with 32GB RAM). Full disclosure: I have been a core maintainer of the project since 2015.
If not, we securely run geocoding batches of that size at Geocode Earth all the time, at pretty competitive rates. We are flexible on data transfer: usually customers set up an SFTP server or an S3 bucket and send us the credentials. We spin up a ton of hardware in EC2 to geocode it real fast (<24 hours even for a few hundred million addresses), will work with any data format you need, and then send it back.
If you _do_ need to run it locally, we're also the creators of the Pelias geocoder (https://pelias.io), which is open source like Nominatim but supports more than just OSM data (which is not a comprehensive address source globally), so it can often return better results. We can help you set it up if you need.
> The article was updated on June 26, 2023, to include LocationIQ per the provider's request.
There are a few more options now (Stadia and Geocodio, among others). And I'm surprised this doesn't include Mapbox, which surely existed then and has (comparatively) reasonable prices.
My team has had issues where SIEM alerts are difficult to investigate because Microsoft inaccurately declares an IP geographically distant, then fires a second alert for "Atypical travel" since the user seems to have traversed a vast distance between logging in on, say, their laptop and their mobile.
(For whatever reason, mobile IPs, specifically IPv6 IPs, are the worst)
For me it's not an issue of cost, it's that if the data is inaccurate it is worse than useless -- it eats up my time chasing bad SIEM alerts.
See: https://opencagedata.com/guides/how-ip-geolocation-differs-f...
We used them until we moved to Fastly, which does IP location stuff as part of their service.
I joined Mapzen in 2015, which ostensibly was part of a Samsung startup accelerator, but looking back, it's more descriptive to say it was an open-source mapping software R&D lab. We built what are now foundational open-source geospatial tools like the Pelias geocoder (my team) and the Valhalla routing engine. A lot more projects, like the Tangram map renderer, are still really useful post-Mapzen.
A reasonable, but very wrong, first assumption about geocoding is that with a database of places you're almost there. Inputs are often structured, like some addresses, but the structure has so many edge cases you effectively have to consider it unstructured. The data is the same story, and sometimes worse, as a lot of data sources are quite bad.
Over the last 10 years we've explored most strategies for full-text search, and no ONE solution knocks it out of the park. We started with really simple "bag of words" search, just looking at token matches. That, fairly predictably, was mostly a mess. With billions of places in the world recorded in open datasets, there's going to be something irrelevant somewhere that matches, and probably drowns out whatever you're looking for.
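A toy scorer (emphatically not our production code) shows why: rank purely by token overlap and anything sharing a few common tokens ties with, or beats, the intended result.

    # Naive "bag of words" matching: rank purely by token overlap.
    def score(query: str, name: str) -> int:
        return len(set(query.lower().split()) & set(name.lower().split()))

    query = "main street cafe springfield"
    candidates = [
        "Main Street Cafe",             # probably what you meant
        "Springfield Cafe",             # plausible
        "Main Street Springfield",      # a road, not a cafe
        "Cafe Main Street Apartments",  # irrelevant, but ties for first
    ]
    for name in sorted(candidates, key=lambda n: score(query, n), reverse=True):
        print(score(query, name), name)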
Parsing inputs for structure is an enticing option too, but for any pattern you can come up with, there's either a search query or some data that will defeat that structure (try me).
The previous generation of ML and a lot of sweat by Al Barrentine produced libpostal (https://github.com/openvenues/libpostal), which is a really great full-text address parser. It's fast and accurate, but it doesn't handle partial inputs (like for autocomplete search), doesn't offer multiple parsing interpretations, and still isn't always right.
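For a feel of what it does, via the postal Python binding (pip install postal; needs the libpostal C library installed):

    from postal.parser import parse_address

    # Full-text parse into labeled components; output is normalized to lowercase.
    print(parse_address("30 W 26th St, New York, NY 10010"))
    # e.g. [('30', 'house_number'), ('w 26th st', 'road'),
    #       ('new york', 'city'), ('ny', 'state'), ('10010', 'postcode')]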
What we've settled on for now for autocomplete is a pretty sophisticated but manually configured parser, which can return multiple interpretations and is also quick to fall back to "i don't know" (how can you really parse meaning out of a short input like "123": is it the start of a postalcode? a housenumber? the name of a restaurant?). It's also runtime bound to make sure it always returns in a few milliseconds or less, since autocomplete is extremely latency sensitive. Then we can either search with the benefit of more structure, or worst case fall back to unstructured, with a LOT of custom logic, weights, filters, and other tricks as well.
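A toy illustration of the multiple-interpretations idea (a sketch, not our actual parser): for an ambiguous fragment, emit every plausible labeling and let the search layer try each one.

    # For an ambiguous partial input like "123", return every plausible
    # labeling rather than committing to a single parse.
    def interpretations(fragment: str) -> list[dict]:
        guesses = [{"name_prefix": fragment}]  # a venue literally named "123"
        if fragment.isdigit():
            guesses.append({"housenumber": fragment})
            guesses.append({"postalcode_prefix": fragment})
        return guesses

    print(interpretations("123"))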
A big question right now is whether next-generation LLMs will completely solve geocoding, and honestly I'm not sure. Even older ML is really eager to over-generalize rules, and while newer LLMs do that less, they still hallucinate, which is pretty much a dealbreaker for geocoding. At least for now, LLMs are also orders of magnitude too slow, and would never be cost-effective at current prices. Personally I think us geocodeurs will be in business a while longer.
There's so much more about geocoding I love talking about, it's truly a niche filled with niches all the way down. This is the sort of stuff we are always iterating on with our business Geocode Earth (https://geocode.earth/). We think we have a really compelling combination of functionality, quality, liberal usage license (hi simonw!), respect for privacy, and open-source commitment. We always love hearing from people interested in anything geocoding so say hello :)
To help with this, a group of folks (including me) started OpenAddresses (https://openaddresses.io/ and https://github.com/openaddresses/openaddresses/) with the goal of finding every open address dataset in the world. We produce a zip file with 100M's of addresses that several of the APIs mentioned in this thread use as a major part of their dataset. We've been going for well over 10 years now, but it would be great to have more eyes looking for more address sources. Check us out!
I'm simplifying slightly, but it's essentially OSM's Nominatim geocoder data with a ready-to-download db, autocomplete capabilities, and a single .jar to install. If you're happy with the limitations of OSM's data (basically, patchy housenumber coverage) then it's easy and fast.