Optimizing writes to OLAP using buffers (ClickHouse, Redpanda, MooseStack)

https://www.fiveonefour.com/blog/optimizing-writes-to-olap-using-buffers

19 points • oatsandsugar • 5d ago

Comments

flexiflex•5d ago

Weird, I always think real-time when I think OLAP, but I guess that’s on the “consumption reactivity” side, not the “batch inserts are good” side.

boomskats•3h ago

See, it's the exact opposite for me, although my experience is mostly a) building giant cubes in giant enterprise orgs with hourly data volumes you couldn't fit in memory, and b) 10-15 years old (so the hardware sucked and we didn't have DuckDB). But yeah, I don't think the O in OLAP standing for 'online' ever really made sense.

I'm curious to know how much of this article is OLAP-specific vs. just generic good practice for tuning batch insert chunk size. The whole "batch your writes, use 100k rows or 1s worth of data" thing applies equally to pretty much any database; they're just ignoring the availability of built-in bulk-load methods so they can argue that INSERTs are slow and then fix that by adding Kafka, for reasons? Maybe I'm missing something.
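The "100k rows or 1s" rule of thumb is a size-or-age flush policy and, as noted, it is database-agnostic. Below is a minimal sketch of that policy, assuming a hypothetical `flush_fn` callback that would perform one bulk INSERT (or bulk-load call) per batch; `InsertBatcher` is an illustrative name, not part of ClickHouse, Redpanda, or MooseStack.

```python
import time
from typing import Any, Callable, List, Optional


class InsertBatcher:
    """Buffer rows and flush when either a row-count or an age threshold is hit.

    Defaults mirror the "100k rows or 1s" rule of thumb. `flush_fn` is a
    hypothetical callback that would send one batch to the database.
    """

    def __init__(
        self,
        flush_fn: Callable[[List[Any]], None],
        max_rows: int = 100_000,
        max_age_s: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.flush_fn = flush_fn
        self.max_rows = max_rows
        self.max_age_s = max_age_s
        self.clock = clock  # injectable so the policy can be tested without sleeping
        self.rows: List[Any] = []
        self.first_row_at: Optional[float] = None

    def add(self, row: Any) -> None:
        if not self.rows:
            self.first_row_at = self.clock()  # age is measured from the oldest buffered row
        self.rows.append(row)
        if len(self.rows) >= self.max_rows:
            self.flush()  # size threshold reached
        else:
            self.tick()  # otherwise check the age threshold

    def tick(self) -> None:
        """Call periodically (e.g. from a timer) so quiet periods still flush."""
        if self.rows and self.clock() - self.first_row_at >= self.max_age_s:
            self.flush()

    def flush(self) -> None:
        if self.rows:
            batch, self.rows = self.rows, []
            self.first_row_at = None
            self.flush_fn(batch)  # one bulk write per batch
```

With a fake clock, seven rows and `max_rows=3` produce two full batches, and the leftover row is flushed by `tick()` once it is older than `max_age_s`. Nothing here needs Kafka; a broker only becomes relevant if you want the buffer to be durable or shared.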

schmidtleonard•2h ago

Well yeah that's the sales pitch :)

It's a tradeoff. Analytics databases are often filled with periodic dumps of transactional databases, and this feels so dirty that it's easy to forget it isn't just a hack; it's actually a structural workaround for the poor random-write performance of analytics DBs:

OLTP = more read amplification on analytics workflows, less write amplification of random insert

OLAP = less read amplification on analytics workflows, more write amplification of random insert

If that's too theoretical, the other day I saw 1-row updates of about 10 KB of data lead to 1 GB of writes in Redshift: 1 MB block size times 300 columns times a log+shuffle factor of about 3. That's a write amplification factor of about 100,000. Crazy stuff.
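The arithmetic behind that figure can be checked directly. This sketch uses only the commenter's rough inputs, assuming decimal units (1 MB = 10^6 bytes) and that one block per column is rewritten for the update:

```python
# Back-of-envelope check of the Redshift write-amplification figures.
# All inputs are the commenter's rough numbers, not measurements.
BLOCK_SIZE_BYTES = 1_000_000   # ~1 MB block (decimal units assumed)
COLUMNS = 300                  # assumption: one block per column is rewritten
LOG_SHUFFLE_FACTOR = 3         # rough multiplier for logging + redistribution
LOGICAL_UPDATE_BYTES = 10_000  # ~10 KB of data actually changed

physical_bytes = BLOCK_SIZE_BYTES * COLUMNS * LOG_SHUFFLE_FACTOR
write_amplification = physical_bytes / LOGICAL_UPDATE_BYTES

print(f"physical writes ~ {physical_bytes / 1e9:.1f} GB")
print(f"write amplification ~ {write_amplification:,.0f}x")
```

The product is 0.9 GB and a factor of 90,000, which rounds to the "1 GB" and "100,000" quoted: the same order of magnitude.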

coxley•2h ago

Off-topic rant: I hate when websites hide the scrollbar. By all means, apply minimal styling to make it cohesive with the website background and foreground. But don't completely hide it.

This is included on that page's stylesheet:

    ::-webkit-scrollbar {
        width: 0;
        height: 0;
        display: none;
    }

doix•1h ago

Another reason to use Firefox, it doesn't respect that CSS :)

bonobocop•1h ago

Why add Redpanda/Kafka instead of using async inserts? https://clickhouse.com/docs/optimize/asynchronous-inserts

It’s recommended in the docs over the Buffer table, and is pretty much invisible to the end user.

At ClickHouse Inc itself, this scaled far beyond millions of rows per second: https://clickhouse.com/blog/building-a-logging-platform-with...
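Async inserts move the buffering into the ClickHouse server: the client sends small inserts and the server batches them before writing parts. A minimal sketch of building such a statement, assuming the `async_insert` and `wait_for_async_insert` settings from the linked docs and a `SETTINGS` clause placed ahead of `FORMAT` (verify the syntax against your ClickHouse version). `build_async_insert` is a hypothetical helper, and the table name is not escaped.

```python
import json
from typing import Any, Dict, Iterable


def build_async_insert(table: str, rows: Iterable[Dict[str, Any]], wait: bool = True) -> str:
    """Build an INSERT that asks the server to buffer rows (hypothetical helper).

    wait=True sets wait_for_async_insert=1, so the client is acknowledged only
    after the server-side buffer has been flushed; wait=False returns sooner
    but without that confirmation.
    """
    settings = f"async_insert=1, wait_for_async_insert={1 if wait else 0}"
    body = "\n".join(json.dumps(row) for row in rows)  # one JSON object per line
    # NOTE: `table` is interpolated as-is; only pass trusted identifiers.
    return f"INSERT INTO {table} SETTINGS {settings} FORMAT JSONEachRow\n{body}"
```

The resulting text could be sent as the body of a POST to the HTTP interface. The design point bonobocop raises is that this keeps the "batch your writes" behavior without operating a separate broker.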

olavgg•15m ago

The biggest reason is that you may also have consumers other than ClickHouse.
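The fan-out argument can be illustrated with a toy model of a log with per-group offsets, roughly how Kafka/Redpanda consumer groups behave conceptually. This is a simplified in-memory sketch, not either system's API; `FanOutLog` and the group names are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List


class FanOutLog:
    """Toy append-only log: each consumer group reads every record at its own pace."""

    def __init__(self) -> None:
        self.records: List[Any] = []
        self.offsets: Dict[str, int] = defaultdict(int)  # next offset to read, per group

    def produce(self, record: Any) -> int:
        self.records.append(record)
        return len(self.records) - 1  # offset of the new record

    def poll(self, group: str, max_records: int = 100) -> List[Any]:
        start = self.offsets[group]
        batch = self.records[start:start + max_records]
        self.offsets[group] = start + len(batch)  # "commit" immediately, for simplicity
        return batch


log = FanOutLog()
for event in ["a", "b", "c"]:
    log.produce(event)
print(log.poll("clickhouse-sink"))          # hypothetical OLAP loader group
print(log.poll("alerting", max_records=2))  # hypothetical second consumer
```

Each group sees the full stream independently, which async inserts into ClickHouse alone do not give you; the cost is running and operating the broker.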
