frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

What Is a Collation, and Why Is My Data Corrupt? – PG Phridays with Shaun Thomas

2•pgedge_postgres•1h ago
Postgres has relied on the OS to handle text sorting for most of its history. When glibc 2.28 shipped in 2018 with a major Unicode collation overhaul, every existing text index built under the old rules became invalid... but silently. No warnings, no errors. Just wrong query results and missed rows.

Postgres 17 added a builtin locale provider that removes the external dependency entirely:

initdb --locale-provider=builtin --locale=C.UTF-8

This change helps sorting to become stable across OS upgrades. glibc is still the default in Postgres 18, so this must be specified when creating a new cluster.

For clusters already running: Postgres 13+ will log a warning when a collation version changes. That warning is an instruction to rebuild affected indexes.

Get more details here in this week's PG Phriday blog post from Shaun Thomas: https://www.pgedge.com/blog/what-is-a-collation-and-why-is-my-data-corrupt