SedonaDB: A new geospatial DataFrame library written in Rust

ZeroCool2u · 2025-09-24T18:45:00 1758739500

Everyone asking why this exists when DuckDB, PostGIS, or the JVM-based Sedona already exist clearly has not run into the pain of working on large geospatial workloads where the legacy options are not viable for one reason or another, which happens more often than you might expect! And the CRS awareness!!! Incredible! CRS mix-ups are a huge source of error when you throw people who are doing their best, but don't have a lot of GIS experience, at these workloads. Very expensive queries have had to be rerun, with drastic changes to the results, because someone got their CRS mixed up.

I don't get to do geospatial work as much anymore, but I would have killed for this just a year ago.
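In its simplest form, the CRS awareness praised above means a geometry column carries its CRS, and operations refuse to combine columns whose CRSes differ instead of silently producing wrong numbers. The Python sketch below illustrates only that idea. `GeoColumn` and `concat_geo` are hypothetical names, not SedonaDB's API, the coordinates are illustrative values, and real systems compare CRSes more carefully than by string equality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoColumn:
    """Hypothetical geometry column: (x, y) points plus the CRS they are expressed in."""
    crs: str
    points: tuple[tuple[float, float], ...]


def concat_geo(a: GeoColumn, b: GeoColumn) -> GeoColumn:
    """Concatenate two columns, refusing to silently mix coordinate reference systems."""
    if a.crs != b.crs:
        raise ValueError(f"CRS mismatch: {a.crs!r} vs {b.crs!r}")
    return GeoColumn(a.crs, a.points + b.points)


wgs84 = GeoColumn("EPSG:4326", ((-122.4, 37.8),))          # longitude/latitude in degrees
utm = GeoColumn("EPSG:32610", ((551000.0, 4184000.0),))    # projected metres (UTM zone 10N)

try:
    concat_geo(wgs84, utm)
except ValueError as err:
    # prints: CRS mismatch: 'EPSG:4326' vs 'EPSG:32610'
    print(err)
```

The check happens before any computation, which is the point: a mismatch surfaces as an error instead of as an expensive query that has to be rerun.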

jinjin2 · 2025-09-25T05:04:07 1758776647

I usually start with PostGIS for single-node workloads and then switch to Exasol when I get to truly massive datasets (Exasol has a more limited set of spatial operators, but scales effortlessly across multiple nodes).

It will be great to have more options in this space, especially if they make the transition from single-node/local work to multi-node scale-out smooth.

throwmeaway222 · 2025-09-24T19:03:04 1758740584

Well, for one, it doesn't crash on some of the larger use cases where DuckDB does, according to the graph, unless I'm misreading it.

cbzbc · 2025-09-24T19:16:55 1758741415

I'd like to know the details of the errors -- because it could have been as simple as running out of memory.

ratmice · 2025-09-25T02:06:13 1758765973

I doubt this hypothesis, because DuckDB, written in C++, should be able to tolerate allocation failure, while this project, written in Rust, has to live with Rust's behavior of treating memory allocation failure as fatal.

That is to say, if the issue is DuckDB running out of memory, it is most likely because the Rust implementation uses memory more efficiently for whatever query is crashing DuckDB, rather than because it handles allocation failure gracefully.

While it is possible in C++ to handle memory allocation failure gracefully, it is not really a thing in Rust; I'm not even sure whether it is possible to catch_unwind it. I say this as a Rust person who doesn't fancy C++ in the slightest...

deschutes · 2025-09-25T04:16:08 1758773768

You cannot use the Rust standard library in environments where arbitrary allocations may fail, but neither can you use the STL. The difference is that the Rust standard library doesn't pretend it has some reasonable way to deal with allocation failure. std::bad_alloc is mainly a parlor trick used to manufacture the idea that copy and move fallibility are reasonable things.

I wouldn't wager a nickel on someone's life if it depended on embedded STL usage.

vlovich123 · 2025-09-25T02:27:35 1758767255

I’ve never seen anyone try to catch allocation failures in C++ code and in many cases doing so correctly is very difficult, not least of which is that writing exception-safe code is the exception, not the rule.

eptcyka · 2025-09-26T19:08:06 1758913686

There's an effort to expose allocation errors in the standard library for the Linux kernel. Pretty sure it is well under way.

cyanydeez · 2025-09-24T22:48:19 1758754099

OOMs are still something a DB can "avoid", so it's not as if that class of bug is some special issue that nullifies the result.

cinntaile · 2025-09-24T21:26:16 1758749176

Crashing when running out of memory is not acceptable software behavior in my opinion.

cbzbc · 2025-09-24T22:37:22 1758753442

Right, but all it says is that an error was thrown.

MrPowers · 2025-09-24T21:23:39 1758749019

You can generate the dataset with the instructions in this readme: https://github.com/apache/sedona-spatialbench/tree/main

Here are the queries: https://github.com/apache/sedona-spatialbench/blob/main/prin...

They should be fairly easy to replicate!

larodi · 2025-09-24T16:58:39 1758733119

Somehow I don't see this being applicable to 90% of current spatial needs, where PostGIS does just fine, and IMHO the same goes for DuckDB. There are perhaps 10% of businesses whose data is so immense that you want to hit it with Rust and whatnot, but all the others do just fine in Postgres.

My bet is that most of the actually useful ST_ spatial functions are not implemented in this one, just as they aren't in the DuckDB offering.

mattforrest · 2025-09-24T17:49:43 1758736183

I wrote a book on PostGIS and have used it for years, and these single-node analytical tools make sense when PostGIS performance starts to break down. For many tasks PostGIS works great, but again you are limited by the fact that your tables have to live in the DB and can only scale as far as the computing resources you have allocated.

In terms of number of functions PostGIS is still the leader, but for analytical functions (spatial relationships, distances, etc.) having those in place in these systems is important. DuckDB started this, but SedonaDB has a spatially focused engine. You can use the two together: PostGIS for transactional processing and queries, and then SedonaDB for processing and data prep.

A combination of tools makes a lot of sense here especially as the data starts to grow.
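Distances, mentioned above as a core analytical function, are a good example of why these engines need CRS context: on longitude/latitude coordinates a planar distance measured in degrees is not a real distance. The sketch below is the standard haversine great-circle formula on a spherical Earth (the mean-radius value is an assumption); it is not claimed to be how PostGIS or SedonaDB compute distances, and `haversine_km` is a hypothetical helper.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; assumes a spherical Earth


def haversine_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in kilometres between two (lon, lat) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# The same "1 degree of longitude" is very different on the ground:
print(round(haversine_km(0, 0, 1, 0), 1))    # 111.2 (km, on the equator)
print(round(haversine_km(0, 89, 1, 89), 1))  # 1.9 (km, near the pole)
```

A naive Euclidean distance on the raw coordinates would report 1.0 for both pairs, which is exactly the kind of silent error that CRS tracking is meant to prevent.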

larodi · 2025-09-25T16:27:21 1758817641

Not saying these shouldn't be used together, but even then, the increased complexity will pay off only in very limited scenarios. Generic SQLite can perhaps handle 80% of all WordPress needs.

Postgres has made gigantic leaps in recent years, both in performance and feature set. I don't think comparing the new contenders with daddy is ever fair. But then there are the DuckDB advocates who claim it pioneered spatial, which is simply not true.

Postgres is an amazing system, and it is also free. We don't have many of those, and even fewer that age that well.

th0ma5 · 2025-09-24T18:04:52 1758737092

I think this is a great perspective. In my professional experience it was very common to use multiple tools: ESRI for some things, GDAL for others, and then some hacks here and there, like most complex analytical systems. Some of it is vendor shenanigans, but some of it is specific features.

wichert · 2025-09-25T06:47:45 1758782865

Not having a dependency on a running service can be an excellent reason to use tools like DuckDB.

czbond · 2025-09-24T17:57:01 1758736621

Question: Does SedonaDB support custom / alternative coordinate systems?

For example, what if I wanted to define a 4D region called (fish, towel, mouse, alien), with floats for each of fish/towel/mouse/alien?

paleolimbot · 2025-09-24T18:18:25 1758737905

SedonaDB can decode PROJJSON and authority:code CRSes at the moment, although the underlying representation is just a string. In this case you might want something like CZBOND:999 or

{ "type": "EngineeringCRS", "name": "Fish, Towel, Mouse", "datum": {"name": "Wet Kitty + Mouse In Peril"}, "coordinate_system": { "subtype": "Cartesian", "axis": [ {"name": "Fish", "abbreviation": "F", "direction": "east"}, {"name": "Towel", "abbreviation": "T", "direction": "north"}, {"name": "Mouse", "abbreviation": "M", "direction": "up"} ] } }

(Subject to the limitations of PROJJSON, such as a 4D CRS having a temporal axis and a limited set of acceptable "direction" values)
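Since the CRS is stored as a string, one cheap sanity check before using a hand-written PROJJSON CRS is to parse it as strict JSON and look at its axes. The sketch below uses only the Python standard library; `describe_crs` is a hypothetical helper, not part of SedonaDB, and it does not validate against the PROJJSON schema or check the allowed "direction" values.

```python
import json

# The EngineeringCRS from the comment above, as PROJJSON text.
projjson = """
{
  "type": "EngineeringCRS",
  "name": "Fish, Towel, Mouse",
  "datum": {"name": "Wet Kitty + Mouse In Peril"},
  "coordinate_system": {
    "subtype": "Cartesian",
    "axis": [
      {"name": "Fish", "abbreviation": "F", "direction": "east"},
      {"name": "Towel", "abbreviation": "T", "direction": "north"},
      {"name": "Mouse", "abbreviation": "M", "direction": "up"}
    ]
  }
}
"""


def describe_crs(crs: str) -> str:
    """Summarize a CRS given either as PROJJSON text or as an 'AUTHORITY:CODE' string."""
    text = crs.strip()
    if text.startswith("{"):
        # json.loads is strict: a trailing comma after the last axis would raise JSONDecodeError.
        doc = json.loads(text)
        axes = [axis["abbreviation"] for axis in doc["coordinate_system"]["axis"]]
        return f'{doc["type"]} "{doc["name"]}" with axes {", ".join(axes)}'
    authority, code = text.split(":", 1)
    return f"{authority} code {code}"


print(describe_crs(projjson))      # EngineeringCRS "Fish, Towel, Mouse" with axes F, T, M
print(describe_crs("CZBOND:999"))  # CZBOND code 999
```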

czbond · 2025-09-24T19:28:24 1758742104

Baller references and customization. Thank you for taking the time to craft that; I really appreciate it. Looking now, because that was a main requirement of mine.

gangtao · 2025-09-25T00:20:58 1758759658

I found a simple way to be successful: rewrite some existing project in Rust!

drewda · 2025-09-24T16:29:11 1758731351

Interesting, but why share the Sedona name?

I thought Apache Sedona is implemented in Java/Scala for distributed runtimes like Spark and Flink. Wouldn't Rust tooling for interactive use be built atop a completely different stack?

paleolimbot · 2025-09-25T02:27:24 1758767244

It's built on a separate stack, but conceptually it's very similar (DataFusion shares a number of idioms with Spark and has a number of projects implementing various degrees of Spark compatibility). I think the idea was to bring the successful pieces of Sedona Spark to a wider audience.

ZeroCool2u · 2025-09-24T17:22:29 1758734549

Apache Sedona is not so well loved in the GIS space in my experience, so I don't think it's a huge issue, even if it is a bit confusing.

benrutter · 2025-09-24T20:03:55 1758744235

Wait, is this a different Apache Sedona from the Spark-based Apache Sedona GIS DataFrame engine I've come into contact with before?

Surely they're the same? Two Sedona projects is one thing, but two Apache Sedona projects is sheer madness?

ZeroCool2u · 2025-09-24T20:30:57 1758745857

Yes, this is Apache SedonaDB and the other is just Apache Sedona (Spark)

whinvik · 2025-09-24T18:15:58 1758737758

What is the advantage over DuckDB with the spatial extension?

paleolimbot · 2025-09-24T18:24:09 1758738249

Currently, lazier GeoParquet reads, a K-nearest neighbours join, Coordinate Reference System tracking, and built-in GeoPandas IO. These aren't things that DuckDB spatial can't or won't do, but they are things that DuckDB hasn't prioritized over the last year that are essential to a lot of spatial pipelines.
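For readers unfamiliar with the term, a K-nearest neighbours join pairs every row on one side with the k closest rows on the other side. The plain-Python sketch below shows only what such a join computes; it is a brute-force version with a hypothetical `knn_join` name, not SedonaDB's implementation, and it uses planar distance, which is only meaningful in a projected CRS.

```python
import heapq
import math


def knn_join(left, right, k):
    """Brute-force k-nearest-neighbours join.

    For each point in `left`, return (left_index, [indices of the k closest points in
    `right`]), closest first, by planar Euclidean distance. This costs
    O(len(left) * len(right)); real engines avoid that with spatial indexes.
    """
    result = []
    for i, (lx, ly) in enumerate(left):
        # The key closes over lx/ly, but nsmallest consumes it immediately, so that is safe.
        nearest = heapq.nsmallest(
            k,
            range(len(right)),
            key=lambda j: math.hypot(right[j][0] - lx, right[j][1] - ly),
        )
        result.append((i, nearest))
    return result


stores = [(0.0, 0.0), (10.0, 10.0)]
customers = [(1.0, 1.0), (2.0, 0.0), (9.0, 9.0), (20.0, 20.0)]
print(knn_join(stores, customers, k=2))  # [(0, [0, 1]), (1, [2, 0])]
```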

neilfrndes · 2025-09-24T18:25:21 1758738321

While DuckDB is excellent, I've found the spatial extension still has some rough edges compared to more mature solutions like PostGIS.

1. The latitude/longitude ordering for points differs from PostGIS and most standard geospatial libraries, which creates friction due to muscle memory.

2. Anecdotal: spatial joins haven't matched PostGIS performance for similar operations, though this may vary by use case and data size.

3. The spatial extension has a backlog of long-standing GitHub issues.
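On the axis-ordering friction in point 1: PostGIS's ST_MakePoint takes (x, y), which for geographic data means longitude first. A small guard in pipeline code can catch some swaps early. The sketch below is hypothetical (`to_xy` and `to_wkt_point` are illustrative names), and its range checks can only detect a swap when the value passed as latitude falls outside ±90 degrees.

```python
def to_xy(lat: float, lon: float) -> tuple[float, float]:
    """Convert a (lat, lon) pair into (x, y) = (lon, lat), validating the ranges."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range (arguments swapped?): {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    return (lon, lat)


def to_wkt_point(lat: float, lon: float) -> str:
    """Build a WKT point with x = longitude and y = latitude."""
    x, y = to_xy(lat, lon)
    return f"POINT ({x} {y})"


print(to_wkt_point(37.8, -122.4))  # POINT (-122.4 37.8)
```

Calling `to_xy(-122.4, 37.8)` raises instead of quietly producing a point in the wrong hemisphere.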

WD-42 · 2025-09-24T17:12:20 1758733940

I’ve been out of the geo loop for a while. I’m struggling to understand why I’d use this over PostGIS. There used to be the argument that installing extensions was painful, but now that Docker exists, pulling the PostGIS image is just as easy as pulling plain Postgres. And RDS has supported it for a while.

What am I missing? The api even looks the same.

paleolimbot · 2025-09-24T17:15:11 1758734111

PostGIS is great when your data is already in a Postgres table! SedonaDB and DuckDB are much faster when your data starts elsewhere (e.g., GeoParquet files).

MrPowers · 2025-09-24T17:22:12 1758734532

The "DuckDB is probably the most important geospatial software of the last decade" post has a nice related discussion: https://news.ycombinator.com/item?id=43881468

WD-42 · 2025-09-24T18:09:02 1758737342

Oh I see. So if you have some kind of a pipeline in which you don’t need or want to load the data into a DB first. That makes total sense, thanks!

zigzag312 · 2025-09-24T18:47:58 1758739678

Looks interesting for more efficient geospatial operations. Congratulations!

0x9e3779b6 · 2025-09-24T21:38:07 1758749887

There is another great library built on Apache Arrow, the Polars DataFrame, which has an amazing DSL.

It comes as a disappointment to me that SedonaDB hasn’t adopted a similar approach.

The Apache stack provides everything needed, but for small things I would not exactly prefer SQL.

paleolimbot · 2025-09-25T02:42:31 1758768151

Agreed that the polars interface is far superior to SQL! There are a few ways to do this if there's interest. Polars wasn't an option because we needed Arrow extension types (https://github.com/pola-rs/polars/issues/9112).

mkesper · 2025-09-24T16:53:04 1758732784

Please don't use emojis in titles. It immediately looks like it was written by AI.

thirtygeo · 2025-09-25T05:17:47 1758777467

... Was likely written by AI

dr-jia-yu · 2025-09-25T05:33:23 1758778403

This was written by the Sedona team. I asked AI to add some color to the blog post.

itsthecourier · 2025-09-25T00:38:15 1758760695

PostGIS not being included in the benchmarks made me suspicious.

dr-jia-yu · 2025-09-25T04:18:46 1758773926

You’re absolutely asking the right question. As we noted in the Future Work section of the SpatialBench results (https://sedona.apache.org/spatialbench/single-node-benchmark...), this benchmark is focused on geospatial analytical queries. For these workloads, features like columnar layout, vectorized execution, zero-copy, and zero SerDe provide huge performance benefits.

While PostGIS is often used for spatial analytics because of its rich spatial function coverage, it is fundamentally a transactional database. This design makes it less suited for analytical query performance, and including it directly in SpatialBench would risk claims of being an “apples-to-oranges” comparison. That’s why we exclude PostGIS from the published benchmark results.

That said, we do continuously validate against PostGIS. For every single function in SedonaDB, we maintain an automated PyTest benchmark framework (https://github.com/apache/sedona-db/tree/main/benchmarks) that compares both speed and correctness against DuckDB and PostGIS. This ensures we catch regressions early and guarantees correctness. You can even run these benchmarks yourself to see how SedonaDB performs. It is often extremely fast in practice.
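The columnar-layout point in this reply can be made concrete with a toy example. The Python sketch below contrasts a row-oriented list of records with one contiguous buffer per attribute and evaluates a bounding-box filter column by column into a selection mask. It illustrates only the layout; pure Python will not show the speedups that vectorized engines get from running such loops over Arrow buffers, and the names are hypothetical.

```python
from array import array

# Row-oriented: one record per feature.
rows = [
    {"id": 1, "x": 0.5, "y": 0.5},
    {"id": 2, "x": 3.0, "y": 1.0},
    {"id": 3, "x": 0.9, "y": 0.2},
]

# Columnar: one contiguous, typed buffer per attribute (the Arrow-style layout).
ids = array("q", (r["id"] for r in rows))
xs = array("d", (r["x"] for r in rows))
ys = array("d", (r["y"] for r in rows))


def bbox_mask(xs, ys, xmin, ymin, xmax, ymax):
    """Evaluate a bounding-box predicate by scanning each column, then combining the masks."""
    x_ok = [xmin <= x <= xmax for x in xs]  # touches only the x buffer
    y_ok = [ymin <= y <= ymax for y in ys]  # touches only the y buffer
    return [a and b for a, b in zip(x_ok, y_ok)]


mask = bbox_mask(xs, ys, 0.0, 0.0, 1.0, 1.0)
selected = [i for i, keep in zip(ids, mask) if keep]
print(selected)  # [1, 3]
```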

dzonga · 2025-09-24T21:12:44 1758748364

Is "written in Rust" a value add, versus, say, being accessible not only via Python but with first-party support in Ruby, C#, and JavaScript?

MrPowers · 2025-09-24T21:29:16 1758749356

Rust is a good language for performant computing in general, but especially for data projects because there are so many great OSS data libraries like DataFusion and Arrow.

SedonaDB currently supports SQL, Python, R, and Rust APIs. We can support APIs for other languages in the future. That's another nice part about Rust. There are lots of libraries to expose other language bindings to Rust projects.

jrozner · 2025-09-24T16:50:55 1758732655

What's the point of this over Polars?

MrPowers · 2025-09-24T16:55:39 1758732939

There is a project called GeoPolars: https://github.com/geopolars/geopolars

From the README:

> Update (August 2024): GeoPolars is blocked on Polars supporting Arrow extension types, which would allow GeoPolars to persist geometry type information and coordinate reference system (CRS) metadata. It's not feasible to create a geopolars.GeoDataFrame as a subclass of a polars.DataFrame (similar to how the geopandas.GeoDataFrame is a subclass of pandas.DataFrame) because polars explicitly does not support subclassing of core data types.

orlp · 2025-09-24T20:22:03 1758745323

I'm working on implementing extension types in Polars. Stay tuned.

jedisct1 · 2025-09-24T16:31:20 1758731480

Was "written in Rust" really necessary?

What does it do besides being written in Rust?

MrPowers · 2025-09-24T16:50:01 1758732601

SedonaDB builds on libraries in the Rust ecosystem, like Apache DataFusion, to provide users with a nice geospatial DataFrame experience. It has functions like ST_Intersects that are common in spatial libraries, but not standard in most DataFrame implementations.

There are other good alternatives, such as GeoPandas and DuckDB Spatial. SedonaDB has Python/SQL APIs and is very fast. New features like full raster support and compatibility with lakehouse formats are coming soon!
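Predicates like the ST_Intersects mentioned above need exact geometric tests, but spatial engines commonly run a cheap bounding-box (envelope) check first and only refine the candidates that pass. The sketch below shows just that filter step, with hypothetical names; envelope overlap is necessary but not sufficient for two geometries to intersect, so this is not a full ST_Intersects.

```python
from typing import NamedTuple


class Envelope(NamedTuple):
    """Axis-aligned bounding box of a geometry."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def envelope_of(coords):
    """Compute the envelope of a geometry given as a list of (x, y) vertices."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return Envelope(min(xs), min(ys), max(xs), max(ys))


def envelopes_intersect(a: Envelope, b: Envelope) -> bool:
    """True if the boxes overlap or touch (closed intervals on both axes)."""
    return a.xmin <= b.xmax and b.xmin <= a.xmax and a.ymin <= b.ymax and b.ymin <= a.ymax


square = [(0, 0), (2, 0), (2, 2), (0, 2)]
line = [(1, 1), (5, 5)]
far = [(10, 10), (11, 12)]
print(envelopes_intersect(envelope_of(square), envelope_of(line)))  # True: a candidate to refine
print(envelopes_intersect(envelope_of(square), envelope_of(far)))   # False: safely discarded
```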

tomtom1337 · 2025-09-24T16:41:14 1758732074

Look, this article is absolutely excellent, and answers your questions. Please read the article before commenting this sort of thing.

As someone who has had to use geopandas a lot, having something which is up to an order of magnitude faster is a real dream come true.