[0] https://javadoc.io/doc/org.apache.iceberg/iceberg-api/latest...
(Would be genuinely excited if the answer is yes.)
A “standard” getting semi-monthly updates via random Databricks-affiliated GitHub accounts doesn’t really fit that bill.
Look at something like this:
https://github.com/delta-io/delta/blob/master/PROTOCOL.md#wr...
Ouch.
I’ve always disliked this approach. It conflates two things: the value to put in preexisting rows and the default going forward. I often want to add a column, backfill it, and not have a default.
Fortunately, the Iceberg spec at least got this right under the hood. There’s “initial-default”, which is the value implicitly inserted in rows that predate the addition of the column, and there’s “write-default”, which is the default for new rows.
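For anyone curious, those defaults live on the individual field entries in the schema JSON. A rough sketch of the backfill-but-no-default case (field id, name, and value are invented here, so check the actual spec for the exact layout):

    {
      "id": 7,
      "name": "region",
      "required": true,
      "type": "string",
      "initial-default": "unknown"
    }

As I read the spec, rows written before the column existed read back "unknown", and because there is no "write-default", new writes still have to supply a value explicitly, which is exactly the add-a-column-and-backfill case described above.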
It seems quite possible that there will be only three or four libraries that can write to Iceberg (Java, Python, Rust, maybe Go), while the rest offer read access at best. And those language choices will both shape and be shaped by the languages that developers use to write applications that manage Iceberg data.
Of course I haven't seen any implementations supporting these yet.
So far only Variant is supported in Spark, and with 1.10 Spark will support the nanosecond timestamp and unknown types, I believe.
https://lists.apache.org/thread/gd5smyln3v6k4b790t5d1vy4483m...
The way they implemented this seems really useful for any database.
https://cloud.google.com/bigquery/docs/iceberg-tables#limita...
You're right — our current implementation in BigLake doesn't have full feature parity with the V3 spec yet. We're actively working on it.
The key context is that the V3 spec is brand new, having been finalized only about two months ago. The official Apache Iceberg release that incorporates all these V3 features isn't even out yet. So, you'll find that the entire ecosystem, including major vendors, is in a similar position of implementing the new spec.
The purpose of our blog post was to celebrate this huge milestone for the open-source community and to share a technical deep-dive on why these new capabilities are so important.
The entire concept of data lakes seems odd to me, as a DBRE. If you want performant OLAP, then get an OLAP DB. If you want temporality, have a created_at column and filter. If the problem is that you need to ingest petabytes of data, fix your source: your OLTP schema probably sucks and is causing massive storage amplification.
[0]: https://database-doctor.com/posts/iceberg-is-wrong-2.html