A short introduction to optimal transport and Wasserstein distance (2020)

https://alexhwilliams.info/itsneuronalblog/2020/10/09/optimal-transport/

40•sebg•5mo ago

Comments

smokel•5mo ago

This is very helpful for understanding generative AI. See for example the amazing lectures of Stefano Ermon for Stanford's CS236 Deep Generative Models [1]. All lectures are available on YouTube [2].

[1] https://deepgenerativemodels.github.io/

[2] https://youtube.com/playlist?list=PLoROMvodv4rPOWA-omMM6STXa...

jethkl•5mo ago

Wasserstein distance (Earth Mover’s Distance) measures how far apart two distributions are — the ‘work’ needed to reshape one pile of dirt into another. The concept extends to multiple distributions via a linear program, which under mild conditions can be solved with a linear-time greedy algorithm [1]. It’s an active research area with applications in clustering, computing Wasserstein barycenters (averaging distributions), and large-scale machine learning.

[1] https://en.wikipedia.org/wiki/Earth_mover's_distance#More_th...

ForceBru•5mo ago

Is the Wasserstein distance useful for parameter estimation instead of maximum likelihood? BTW, maximum likelihood is basically minimum-KL-divergence estimation. All I see online and in papers is how to _compute_ the Wasserstein distance, which seems to be pretty hard in itself. In 1D, this requires computing a nasty integral of inverse CDFs when p != 1. Does that mean "minimum Wasserstein estimation" is prohibitively expensive?
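For concreteness, here is a minimal sketch of the 1D case between two empirical samples of equal size. In that setting the inverse-CDF integral reduces to pairing the sorted values, so the cost is dominated by the sort. The function name `wasserstein_1d` is made up for illustration, and the equal-size restriction is an assumption made to keep the sketch short.

```python
def wasserstein_1d(x, y, p=2):
    """Empirical p-Wasserstein distance between two equal-sized 1D samples.

    Assumes equally weighted points; sorting both samples pairs up the
    matching quantiles, which is the optimal coupling in one dimension
    for p >= 1.
    """
    if len(x) != len(y):
        raise ValueError("this sketch only handles samples of equal size")
    xs = sorted(float(v) for v in x)
    ys = sorted(float(v) for v in y)
    # Mean of |x_(i) - y_(i)|^p over matched order statistics.
    mean_cost = sum(abs(a - b) ** p for a, b in zip(xs, ys)) / len(xs)
    return mean_cost ** (1.0 / p)
```

Shifting every point of a sample by the same constant c gives a distance of |c| for any p, which makes a handy sanity check.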

317070•5mo ago

It is.

But!

Wasserstein distances are used instead of a KL inside all kinds of VAEs and diffusion models, because while the Wasserstein distance is hard to compute, it is easy to make distributions whose expectation is the gradient with respect to the Wasserstein distance. So you can easily get unbiased gradients, and that is all you need to train big neural networks. [0] Pretty much any time you sample from your current distribution and the target distribution and take the gradient of the distance between the points, you will be minimizing a Wasserstein distance.

[0] https://arxiv.org/abs/1711.01558
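A toy sketch of the sample-and-differentiate idea, under heavy assumptions: the data are 1D, the model is a unit Gaussian shifted by a single parameter `theta`, and each step pairs sorted model samples with sorted target samples and takes the gradient of the mean squared gap. The names `fit_shift` and `target_sampler` are hypothetical, and the sketch makes no claim about whether such minibatch gradients are unbiased in general.

```python
import random

def fit_shift(target_sampler, n_steps=200, batch=256, lr=0.1, seed=0):
    """Fit a location parameter theta by gradient steps on a sample-based
    squared 2-Wasserstein loss (1D, sorted pairing)."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(n_steps):
        # Fresh samples from the current model and from the target.
        model = sorted(theta + rng.gauss(0.0, 1.0) for _ in range(batch))
        target = sorted(target_sampler(rng) for _ in range(batch))
        # Loss = mean((model_i - target_i)^2); each model_i moves one-for-one
        # with theta, so d(loss)/d(theta) = 2 * mean(model_i - target_i).
        grad = 2.0 * sum(m - t for m, t in zip(model, target)) / batch
        theta -= lr * grad
    return theta

# Hypothetical target: a unit Gaussian centred at 3.
theta_hat = fit_shift(lambda rng: rng.gauss(3.0, 1.0))
```

For a pure location model the estimate simply tracks the difference of sample means, so this is only meant to show the mechanics, not a realistic training setup.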

JustFinishedBSG•5mo ago

Wasserstein itself is expensive, but you can instead optimize entropic regularizations of it that are arbitrarily close (via the Sinkhorn algorithm) and that are both easy to optimize and differentiable.
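A minimal sketch of the Sinkhorn iteration for two discrete histograms, assuming strictly positive masses and a small dense cost matrix. The names `sinkhorn`, `eps`, and `n_iters` are illustrative choices, not any particular library's API.

```python
import math

def sinkhorn(a, b, cost, eps=0.1, n_iters=500):
    """Entropy-regularized optimal transport between histograms a and b.

    a, b : lists of positive masses, each summing to 1
    cost : cost[i][j] is the cost of moving mass from bin i of a to bin j of b
    eps  : regularization strength (smaller is closer to unregularized OT)
    Returns the transport plan and its transport cost.
    """
    n, m = len(a), len(b)
    # Gibbs kernel K = exp(-cost / eps).
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    total = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return plan, total

# Two identical two-bin histograms: almost all mass should stay in place.
plan, total = sinkhorn([0.5, 0.5], [0.5, 0.5], [[0.0, 1.0], [1.0, 0.0]])
```

With very small `eps` the kernel entries underflow in this plain form; log-domain variants of the iteration exist for that case.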
