Implementing a Kalman Filter in Postgres

https://neon.com/blog/implementing-a-kalman-filter-in-postgres-to-smooth-gps-data

71•carlotasoto•4mo ago

Comments

TrackerFF•4mo ago

Interestingly, in image 2, the filtered data seems to be worse than the actual noisy data?

Sure, the large spikes from sensor data were reduced, as seen with the blue line up in the north, which was considerably reduced, but seemingly at the cost of the more accurate tracks. We can see some "ground truth" - namely the map roads. I think if the source of the tracks is someone moving on a road (in a car etc.), it is safe to assume that the roads will be the most likely place to find them. In that image, it seems like we're seeing the tracks of some object moving on the road.

EDIT: But nice work anyway, I work a lot with noisy GPS data for vessels, where there are no roads - only shipping routes / paths, and increased GPS jamming in some areas makes prediction models more useful.

n4r9•4mo ago

Yeah, this sounds like a way to "smooth" the GPS trail to remove anomalies quickly, without paying attention to the road network.

The problem of snapping a noisy GPS trail to the road network is known as map-matching. Good map-matching algorithms tend to use hidden Markov models, which are sort of like discrete Kalman filters. The state of the model is something like "which road segment is the truck on", and the predictive step employs routing algorithms to calculate transition probabilities between states. This is a dynamic algorithm that can be done on the fly - i.e. as each GPS point comes in - but I'd be very reluctant to do it in postgres.
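A toy sketch of the HMM idea, not a production map-matcher: the hidden states are road segments, the emission score falls off with distance from the GPS point, and the transition score only allows moves the road network permits. Everything here is an assumption made for illustration: each segment is reduced to one representative point, the hypothetical `neighbors` table stands in for a real routing step, and the Viterbi pass runs over a whole batch rather than on the fly.

```python
import math


def viterbi_map_match(points, segments, neighbors, sigma=10.0, p_stay=0.8):
    """Return the most likely segment id for each GPS point.

    points:    list of (x, y) observations in metres (planar coordinates assumed)
    segments:  dict id -> (x, y) representative point (toy stand-in for geometry)
    neighbors: dict id -> set of ids reachable in one step (toy stand-in for routing)
    """
    ids = list(segments)

    def log_emit(pt, seg_id):
        # Gaussian-shaped score: closer segments explain the point better.
        sx, sy = segments[seg_id]
        d2 = (pt[0] - sx) ** 2 + (pt[1] - sy) ** 2
        return -d2 / (2.0 * sigma ** 2)

    def log_trans(a, b):
        if a == b:
            return math.log(p_stay)
        if b in neighbors.get(a, ()):
            return math.log((1.0 - p_stay) / len(neighbors[a]))
        return -math.inf  # not connected: no teleporting

    score = {s: log_emit(points[0], s) for s in ids}
    back = []
    for pt in points[1:]:
        prev, new_score = {}, {}
        for b in ids:
            best_a = max(ids, key=lambda a: score[a] + log_trans(a, b))
            new_score[b] = score[best_a] + log_trans(best_a, b) + log_emit(pt, b)
            prev[b] = best_a
        back.append(prev)
        score = new_score

    # Walk the back-pointers from the best final state.
    path = [max(ids, key=lambda s: score[s])]
    for prev in reversed(back):
        path.append(prev[path[-1]])
    path.reverse()
    return path


# The second point is closer to the tunnel than to road_2,
# but the tunnel is not reachable from road_1.
segments = {"road_1": (0.0, 0.0), "road_2": (100.0, 0.0), "tunnel": (100.0, 8.0)}
neighbors = {"road_1": {"road_2"}, "road_2": {"road_1"}, "tunnel": set()}
path = viterbi_map_match([(2.0, 0.0), (98.0, 6.0)], segments, neighbors)
```

In this toy case a nearest-segment snap would pick the tunnel for the second point, while the connectivity constraint keeps the trail on the surface road.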

foota•4mo ago

So apps like Google maps do this? I'm always surprised when it jumps between roads. Like... You knew I've been on this road for the last ten minutes, you think I'm going to teleport into the tunnel beneath me?

n4r9•4mo ago

I'm not sure about Maps to be honest, but that sort of glitch is a strong indicator that they're just snapping to the nearest current road rather than doing proper routing calculations.

My Toyota has a speed limit symbol on the dashboard which will occasionally show the speed of a slip-road going onto the motorway I'm already on. I'm guessing it's a similar phenomenon.

whilenot-dev•4mo ago

I share the confusion. It depends on the measuring intent I guess, and it'd have been nice to say something about that and include some kind of indicator for these outliers. Here's the thing in Google Maps: https://www.google.com/maps/@47.1745904,7.2745602,14z/data=!...

From looking at the company website[0] I'd assume the goal could've been to get a better estimate of the total distance travelled during tracking analysis? Keeping that goal in mind, the error from the outliers was reduced significantly without causing too much disturbance on the accurate data. Nonetheless, including further measurements from speedo- and odometer in the sensor fusion at certain intervals would make this goal redundant and provide an even better estimate.

[0]: https://traconiq.ch/

fifilura•4mo ago

I have done this with AWS Athena. At the end of the day a kalman filter is just a number of multiplications and divisions.

My version would calculate one step at a time so it is a bit simplified (since that was a requirement, processing one measurement of incoming data daily). And also only in one dimension (here it is two).

For the offline version (calculating many steps in a chunk), I'd imagine I'd use the array functions in Athena. But it may very well be possible to recreate using window functions. The state is just one or more extra columns after all.
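To make the "multiplications and divisions" point concrete, here is a minimal sketch of a single one-dimensional step in Python. It assumes a random-walk model (the true value drifts slowly) and hypothetical noise variances `q` and `r`; it is not the Athena query described above, only the arithmetic such a query would have to carry from row to row.

```python
def kalman_step_1d(x, p, z, q, r):
    """One predict + update step of a scalar Kalman filter.

    x, p: previous estimate and its variance (the carried "state")
    z:    new measurement
    q, r: process and measurement noise variances (assumed, tune per sensor)
    """
    # Predict: the value is assumed unchanged, the uncertainty grows by q.
    p_pred = p + q
    # Update: the gain weighs the measurement against the prediction.
    k = p_pred / (p_pred + r)
    x_new = x + k * (z - x)
    p_new = (1.0 - k) * p_pred
    return x_new, p_new


# One measurement per day: only (x, p) has to be stored between runs.
x, p = 0.0, 1.0
for z in [1.0, 1.0, 1.0]:
    x, p = kalman_step_1d(x, p, z, q=0.0, r=1.0)
```

The pair `(x, p)` is the whole state, which is why it can live in a couple of extra columns.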

em500•4mo ago

This is unfortunately limited to 2-dimensional state/measurements. In this case the covariance matrix is only 3 numbers, so the required linear algebra can easily be done in a loop. The generic Kalman filter handles arbitrary dimensions, but requires general matrix multiplication and inversions, which are not easy to implement in Postgres.

Still, 2d is a useful special case, and if it addresses the problem at hand, there's no need to overbuild. (Even the 1d Kalman filter, which often boils down to exponential smoothing, is a useful special case.)
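The link to exponential smoothing can be spelled out for the scalar random-walk model. The symbols below are assumptions for illustration: $Q$ and $R$ are the process and measurement noise variances, $P^-_k$ the predicted variance, and $K_k$ the gain.

```latex
\begin{aligned}
P^-_k &= P_{k-1} + Q, \qquad K_k = \frac{P^-_k}{P^-_k + R}, \\
\hat{x}_k &= \hat{x}_{k-1} + K_k\,(z_k - \hat{x}_{k-1})
           = (1 - K_k)\,\hat{x}_{k-1} + K_k\, z_k, \\
P_k &= (1 - K_k)\,P^-_k .
\end{aligned}
% With constant Q and R the gain settles to a fixed value K:
\begin{aligned}
P^- &= \frac{Q + \sqrt{Q^2 + 4QR}}{2}, \qquad K = \frac{P^-}{P^- + R}.
\end{aligned}
```

Once the gain has settled, the update line is exactly exponential smoothing with smoothing factor $\alpha = K$.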

fifilura•4mo ago

I'd imagine 90% of the kalman filters out there are for 2 or maybe 3 dimensions, since the use case is mostly this, determining a position.

Where the filter fails is when there is not a single "true" answer to aim for, but there are many true answers. A position is clearly defined as long as it is not quantum physics.

thekoma•4mo ago

Yeah. Using the Kalman filter just to determine the position from noisy position measurements really undercuts the capability of the filter to use system physics to estimate the true state.

In one of the most common applications of Kalman filters, autonomous robots (e.g., a robot vacuum or a commercial drone), the filters are around 9 to 12 dimensions.

em500•4mo ago

Right, in addition to the position you usually want the velocity, and sometimes also the acceleration, in all dimensions. More ambitious (or optimistic) practitioners could add more sensor measurements, like gyroscopes.
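A minimal sketch of what adding velocity to the state looks like, for one axis only. It assumes a constant-velocity model with position-only measurements and made-up noise values (`q_pos`, `q_vel`, `r`); the symmetric 2x2 covariance is stored as three plain numbers, so no matrix library is needed.

```python
from dataclasses import dataclass


@dataclass
class CVState:
    pos: float  # position estimate
    vel: float  # velocity estimate
    p11: float  # var(pos)
    p12: float  # cov(pos, vel)
    p22: float  # var(vel)


def cv_step(s, z, dt, q_pos, q_vel, r):
    """One predict + update step; z is a noisy position measurement."""
    # Predict with F = [[1, dt], [0, 1]]: P_pred = F P F^T + Q.
    pos_p = s.pos + dt * s.vel
    vel_p = s.vel
    a11 = s.p11 + 2.0 * dt * s.p12 + dt * dt * s.p22 + q_pos
    a12 = s.p12 + dt * s.p22
    a22 = s.p22 + q_vel
    # Update with H = [1, 0]: only the position is observed.
    innov = z - pos_p
    s_var = a11 + r
    k1 = a11 / s_var
    k2 = a12 / s_var
    return CVState(
        pos=pos_p + k1 * innov,
        vel=vel_p + k2 * innov,  # velocity is inferred from position errors
        p11=(1.0 - k1) * a11,
        p12=(1.0 - k1) * a12,
        p22=a22 - k2 * a12,
    )


# An object moving 2 units per step; the filter is never told the speed.
state = CVState(pos=0.0, vel=0.0, p11=100.0, p12=0.0, p22=100.0)
for t in range(1, 21):
    state = cv_step(state, z=2.0 * t, dt=1.0, q_pos=0.01, q_vel=0.01, r=1.0)
```

Running the same step independently for a second axis gives a four-number state; adding acceleration or more sensors is where general matrix operations start to become necessary.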

fifilura•4mo ago

You are right of course and I was out of my depth. I wonder if the vector types now being added to databases for ML/AI stuff could help with this.

tech_ken•4mo ago

Wow this is extremely cool/impressive, but if my manager asked me to implement this I'd quit lol. The "state" headaches alone seem like a nightmare, never mind all the wacky linear algebra you're going to have to hand-roll (Like does Postgres even have a matrix type?? Did you have to implement matrix inversion in SQL from scratch?? I get nauseous just thinking about it.)

edit: I guess in 2D a lot of this becomes simpler than in general high-dimensions.
