They are both one line.
(Disclaimer, I work on duckdb-spatial @duckdblabs)
Docker has been a big improvement (when I was first learning PostGIS, the amount of time I had to hunt for proj directories or compile software just to install the plugin was a major hurdle), but it's many steps away from:
```
$ duckdb
D install spatial;
```
Getting started from zero with geo can be difficult for those unfamiliar with it. DuckDB packages everything into one line with one dependency.
I have no major qualms with pandas and geopandas. However, I use them when they're the only practical solution, not because I enjoy using them as libraries. It sounds like pandas (or similar) vs. a database?
If you're using JavaScript, you install Turf. The concept that you can readily install spatial libraries is hardly earth-shattering.
(Geopandas is great, too.)
Most of the time I store locations and compute distances to them. Would that be faster to implement with DuckDB?
In my use case, I use DuckDB because of speed at scale. I have 600GBs of lat-longs in Parquet files on disk.
If I wanted to use Postgis, I would have to ingest all this data into Postgres first.
With DuckDB, I can literally drop into a Jupyter notebook and do this in under 10 seconds, and the results come back in a flash (no need to ingest any data ahead of time):
```python
import duckdb
duckdb.query("INSTALL spatial; LOAD spatial;")
duckdb.query("select ST_DISTANCE(ST_POINT(lng1, lat1), ST_POINT(lng2, lat2)) dist from '/mydir/*.parquet'")
```
Also, as a side note: is everyone just using DuckDB in memory? Because as soon as you want some multi-session stuff, I'd assume you'd use DuckDB on top of a local database, so again I don't see the point, but I'm sure I'm missing something.
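For what it's worth, persistence is one argument away; a minimal sketch (the database file and Parquet paths are hypothetical):

```python
import duckdb

# Point connect() at a file instead of the default in-memory database;
# any later session can reopen the same file (one writer at a time).
con = duckdb.connect("local.duckdb")
con.execute("CREATE TABLE points AS SELECT * FROM 'data/*.parquet'")
con.close()
```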
Folks who have been doing DevOps work are exasperated with crummy SaaS vendors or antiquated OSS options that have a high setup cost. DuckDB is just a mature project that offers an alternative, hence an easy fan favorite among hobbyists (I imagine at scale the opportunity costs change and it becomes less attractive).
I'm still getting feedback that many devs are not too comfortable with reading and writing SQL. They learned simple SELECT statements in school, but get confused by JOINs and GROUP BYs.
I'd pick that before traveling the DuckDB path.
Yes, DuckDB does a whole lot more: vectorized larger-than-memory execution, columnar compressed storage, and an ecosystem of other extensions that make it more than the sum of its parts. But while I've been working hard on making the spatial extension more performant and more broadly useful (I designed a new geometry engine this year, and spatial join optimization just got merged on the dev branch), the fact that you can e.g. convert to and from a myriad of different geospatial formats by utilizing GDAL, transform through SQL, or pull down the latest Overture dump without having the whole workflow break just because you updated QGIS has probably been the main killer feature for a lot of the early adopters.
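To make that concrete, a sketch of the GDAL round-trip (ST_Read and the GDAL COPY output are duckdb-spatial features; the file names and EPSG codes are made up for illustration):

```python
import duckdb

con = duckdb.connect()
con.execute("INSTALL spatial; LOAD spatial;")

# Read a shapefile through GDAL and reproject it in SQL...
con.execute("""
    CREATE TABLE parcels AS
    SELECT ST_Transform(geom, 'EPSG:26910', 'EPSG:4326') AS geom
    FROM ST_Read('parcels.shp')
""")

# ...then write it back out as GeoJSON, again through GDAL.
con.execute("COPY parcels TO 'parcels.geojson' WITH (FORMAT GDAL, DRIVER 'GeoJSON')")
```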
(Disclaimer, I work on duckdb-spatial @ duckdblabs)
Yeah, I feel old.
But what DuckDB as an engine does is it lets me work directly on parquet/geoparquet files at scale (vectorized and parallelized) on my local desktop. It beats geopandas in that respect. It's a quality of life improvement to say the least.
DuckDB also has an extension architecture that admits more exotic geospatial features, like Hilbert curves and Uber H3 support.
https://duckdb.org/docs/stable/extensions/spatial/functions....
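For example, with the community H3 extension (a sketch; `h3_latlng_to_cell` is that extension's function, and the coordinates are arbitrary):

```python
import duckdb

con = duckdb.connect()
con.execute("INSTALL h3 FROM community; LOAD h3;")

# Bucket a lat/lng into an H3 hexagonal cell at resolution 9.
print(con.sql("SELECT h3_latlng_to_cell(37.7749, -122.4194, 9) AS cell").fetchone())
```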
"Hexagons were an important choice because people in a city are often in motion, and hexagons minimize the quantization error introduced when users move through a city. Hexagons also allow us to approximate radiuses easily, such as in this example using Elasticsearch."
[Edit] Maybe https://www.researchgate.net/publication/372766828_YOLOv8_fo...
This article doesn't acknowledge how niche this stuff is, and it's a lot of training to get people up to speed on coordinate systems, projections, transformations, etc. I would replace a lot of my custom-built mapping tools with Felt if it were possible, so I could focus on our core geospatial processes and not the code to display and play with it in the browser, which is almost as big if not bigger in terms of LOC to maintain.
As mentioned by another commenter, this DuckDB DX as described is basically the same as PostGIS too.
Yes, there are so many great tools to handle the complexity for the capital-G Geospatial work.
I love Felt too! Sam and team have built a great platform. But lots of times a map isn't needed; an analyst just needs it as a column.
PostGIS is also excellent! But having to start up a database server to work with data doesn't lend itself to casual usage.
The beauty of DuckDB is that it's there in a moment and in reach for data generalists.
To me, I think it's mostly a frontend problem stopping the spread of mapping in consumer apps. Backend geo is easy tbh. There is so much good, free tooling. Mapping frontend is hell and there is no good off-the-shelf solution I've seen. Some are too low level, some too high level. I think we need a GIS-lite that is embeddable to hide the complexity and let app developers focus on their value add, instead of paying the tax of having frontend developers fix endless issues with maps they don't understand.
edit: to clarify, I think there's a relationship between getting mapping valued by leadership such that the geo work can be even be done by analysts, and having more mapping tools exist in frontend apps such that those leaders see them and understand why geo matters. it needs to be more than just markers on the map, with broad exposure. hence my focus on frontend web. sorry if that felt disjointed
DuckDB is great, but the fact that it makes it easier for data generalists to make mistakes with geospatial data is a mark against it, not in its favor.
https://georeferenced.wordpress.com/2014/05/22/worldmapblund... https://www.economist.com/asia/2003/05/15/correction-north-k...
This can mostly be avoided with a proper spheroidal reference system, computational geometry implementation, and indexing. Most uses of geospatial analytics are not cartographic in nature. The map is at best a presentation layer; it is not the data model, and some don't use a map at all. Forcing people to learn obscure and esoteric cartographic systems to ask simple, intuitive questions about geospatial relationships is a big part of the problem. There is no reason this needs to be part of the learning curve.
I’ve run experiments on unsophisticated users a few times with respect to this. If you give them a fully spheroidal WGS84 implementation for geospatial analytics, it mostly “just works” for them anywhere on the globe and without regard for geospatial extent. Yes, the software implementation is much less trivial but it is qualitatively superior UX because “the world” kind of behaves how people intuit it should without having to know anything about projections, transforms, etc. And to be honest, even if you do know about projections and transforms, the results are still often less than optimal.
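As a concrete example of that "just works" behavior, solving the inverse geodesic problem on the WGS84 ellipsoid (a sketch using pyproj's Geod; the city coordinates are just examples):

```python
from pyproj import Geod

geod = Geod(ellps="WGS84")  # the ellipsoid itself, no projection involved

# Inverse geodesic problem: azimuths and distance in meters between two
# points, regardless of hemisphere, antimeridian, or spatial extent.
az12, az21, dist_m = geod.inv(-0.1276, 51.5072, 151.2093, -33.8688)  # lon/lat: London -> Sydney
print(f"{dist_m / 1000:.0f} km")  # roughly 17,000 km
```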
The only issue that comes up is that a lot of cartographic visualization toolkits are somewhat broken if you have global data models or a lot of complex geometry. Lots of rendering artifacts. Something else to work on I guess.
Aren't most geospatial tools just doing simple geometry? And therefore need to work on some sort of projection?
If you can do the math on the spheroidal model, ok you get better results and its easier to intuit like you said, but it's much more complicated math. Can you actually do that today with tools like QGIS and GDAL?
It is entirely possible to do this in databases. That is how it is actually done. The limitations of GEOS are not the limitations of software, it is not a particularly sophisticated implementation (even PostGIS doesn’t use it for the important parts last I checked). To some extent you are affirming that there is a lack of ambition in this part of the market in open source.
An additional issue is that the spheroidal implementations have undergone very little optimization, perhaps because they are not the defaults. So when people figure out how to turn them on, performance is suddenly terrible. Now you have people that believe spheroidal implementations are terribly slow, when in reality they just used a pathologically slow implementation. Really good performance-engineered spheroidal implementations are much faster than people assume based on the performance of open source implementations.
Good god.
QGIS is amazing. It's really great. It also came out in 2002, so I think the headline is safe.
I've been able to "CREATE EXTENSION postgis;" for more than a decade. There have been spatial extensions for PG, MySQL, Oracle, MS SQL Server, and SQLite for a long time. DuckDB doesn't make any material difference in how easy it is to install.
DuckDB on the other hand works with data as-is (Parquet, TSV, SQLite, Postgres... whether on disk, S3, etc.) without requiring an ETL step (though if the data isn't already in a columnar format, things are gonna be slow... but it will still work).
I work with Parquet data directly with no ETL step. I can literally drop into Jupyter or a Python REPL and duckdb.query("from '*.parquet'")
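Spelled out a bit more (a sketch; the paths and column names are hypothetical, and `FROM '*.parquet'` is just DuckDB shorthand for `SELECT *` over a glob of files):

```python
import duckdb

# Aggregate across Parquet files in place: no CREATE TABLE, no COPY, no ETL.
df = duckdb.query("""
    SELECT region, count(*) AS n
    FROM 'data/*.parquet'
    GROUP BY region
""").df()
```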
Correct me if I'm wrong, but I don't think that's possible with Postgis. (even pg_parquet requires copying? [1])
[1] https://www.crunchydata.com/blog/pg_parquet-an-extension-to-...
I think this is a fundamental problem with the SQL pattern. You can try to make things just work, but when they fail, then what?
SQL is a DSL and yes, all Domain Specific Languages will only enable what the engine parsing the DSL supports.
But all SQL databases I'm aware of let you write custom extensions, which are exactly that: they extend the base functionality of the database with new paradigms, e.g. PostGIS enabling geospatial in Postgres, or the extensions that enable fuzzy matching/searching.
And as SQL is pretty much a Turing-complete DSL, there is very little you can't do with it, even if the syntax might not agree with everyone.
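A recursive CTE is the easiest way to see that expressiveness (a minimal sketch, run here through DuckDB's Python API, though the SQL is standard):

```python
import duckdb

# General recursion in plain SQL: Fibonacci numbers below 100.
print(duckdb.query("""
    WITH RECURSIVE fib(a, b) AS (
        SELECT 0, 1
        UNION ALL
        SELECT b, a + b FROM fib WHERE b < 100
    )
    SELECT a FROM fib
""").fetchall())
```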
For geospatial analysis, the most important thing that could happen in software would be no longer treating it, either explicitly or implicitly, as having anything to do with cartography. Many use cases are not remotely map-driven but the tools require users to force everything through the lens of map-making.
Strongly agree with your sentiment around maps: most people can’t read them, they color the entire workflow and make it more complex, and (imo) lead to a general undervaluing of the geospatial field. Getting the data into columns means it’s usable by every department.