Scaling request logging with ClickHouse, Kafka, and Vector

https://www.geocod.io/code-and-coordinates/2025-10-02-from-millions-to-billions/

55•mjwhansen•5d ago

Comments

rozenmd•1h ago

Great write-up!

I had a similar project back in August when I realised my DB's performance (Postgres) was blocking me from implementing features users commonly ask for (querying out to 30 days of historical uptime data).

I was already blown away by the performance (200ms to query what Postgres was doing in 500-600ms), but then I realized I hadn't put an index on the ClickHouse table. Now the query returns in 50-70ms, and that includes network time.

nasretdinov•54m ago

BTW you could've used e.g. kittenhouse (https://github.com/YuriyNasretdinov/kittenhouse, my fork) or just a simpler buffer table, with 2 layers and a larger aggregation period than in the example.
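For reference, the Buffer table engine mentioned here decides when to flush by using min and max thresholds on time, rows, and bytes. According to the ClickHouse docs, the flush goes to the destination table when all of the min conditions hold or when any one max condition holds. That rule is evaluated separately for each of the num_layers independent buffers. The Python below is a simplified model of that rule, not ClickHouse code. The names `BufferThresholds` and `should_flush` are hypothetical, and the threshold values are only illustrative. Raising min_time and max_time corresponds to the "larger aggregation period" suggested above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BufferThresholds:
    # Same order as the Buffer engine's parameters after num_layers:
    # min_time, max_time (seconds), min_rows, max_rows, min_bytes, max_bytes
    min_time: float
    max_time: float
    min_rows: int
    max_rows: int
    min_bytes: int
    max_bytes: int


def should_flush(t: BufferThresholds, age_s: float, rows: int, nbytes: int) -> bool:
    """Simplified model: flush one buffer layer if ALL min_* conditions
    hold, or if ANY max_* condition holds."""
    all_min = age_s >= t.min_time and rows >= t.min_rows and nbytes >= t.min_bytes
    any_max = age_s >= t.max_time or rows >= t.max_rows or nbytes >= t.max_bytes
    return all_min or any_max


# Illustrative thresholds: 10-100 s, 10k-1M rows, 10-100 MB.
example = BufferThresholds(10, 100, 10_000, 1_000_000, 10_000_000, 100_000_000)
print(should_flush(example, age_s=5, rows=20_000, nbytes=20_000_000))   # False: too young
print(should_flush(example, age_s=15, rows=20_000, nbytes=20_000_000))  # True: all mins met
print(should_flush(example, age_s=120, rows=1, nbytes=1))               # True: max_time hit
```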

Alternatively, you could've used async insert functionality built into ClickHouse: https://clickhouse.com/docs/optimize/asynchronous-inserts . All of these solutions are operationally simpler than Kafka + Vector, although obviously it's all tradeoffs.
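Here is a minimal sketch of the async-insert route. It assumes you talk to ClickHouse over its HTTP interface (default port 8123), which accepts settings as URL query parameters. The `async_insert=1` setting tells the server to collect small inserts in its own buffer and write them in batches. The `wait_for_async_insert=1` setting makes each request return only after its data has been flushed. The helper names and the `logs.requests` table are hypothetical.

```python
import json
from typing import Iterable
from urllib.parse import quote, urlencode


def async_insert_url(host: str, table: str, fmt: str = "JSONEachRow",
                     wait: bool = True, port: int = 8123) -> str:
    """Build an HTTP-interface URL for an INSERT with async inserts enabled."""
    params = {
        "query": f"INSERT INTO {table} FORMAT {fmt}",
        "async_insert": 1,  # let the server batch small inserts
        "wait_for_async_insert": 1 if wait else 0,  # 1 = reply after the flush
    }
    # quote_via=quote encodes spaces as %20 rather than '+'
    return f"http://{host}:{port}/?{urlencode(params, quote_via=quote)}"


def to_json_each_row(rows: Iterable[dict]) -> str:
    """Serialize rows as newline-delimited JSON, the JSONEachRow input format."""
    return "".join(json.dumps(row) + "\n" for row in rows)


print(async_insert_url("localhost", "logs.requests"))
# http://localhost:8123/?query=INSERT%20INTO%20logs.requests%20FORMAT%20JSONEachRow&async_insert=1&wait_for_async_insert=1
```

Each backend request would then POST the output of `to_json_each_row(...)` to that URL with any HTTP client. Passing `wait=False` returns sooner, but the client then no longer learns whether the eventual flush succeeded.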

devmor•51m ago

There were a lot of simpler options that came to mind while reading through this, frankly.

But I imagine the writeup eschews myriad future concerns and does not entirely illustrate the pressure and stress of trying to solve such a high-scale problem.

Ultimately, going with a somewhat more complex solution that involves additional architecture but has been tried and tested by a 3rd party that you trust can sometimes be the more fitting end result. Assurance often weighs more than simplicity, I think.

nasretdinov•43m ago

While kittenhouse is, unfortunately, abandonware (even though you can still use it and it works), you can't say the same about e.g. async inserts in ClickHouse: they're a very simple and robust solution to exactly the problem that PHP (and some other languages') backends often face when trying to use ClickHouse.

frenchmajesty•42m ago

Thanks for sharing I enjoyed reading this.

tlaverdure•36m ago

Thanks for sharing. I really enjoyed the breakdown, and great to see small tech companies helping each other out!

mperham•7m ago

Seems weird not to use Redis as the buffering layer + minutely cron job. Seems a lot simpler than installing Kafka + Vector.
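Here is a minimal sketch of the Redis-plus-cron approach. It assumes Redis 6.2 or later, which is needed for LPOP with a count argument. Each request does one RPUSH, and a cron job runs once a minute to pop entries in batches and bulk-insert them.

To keep the sketch runnable without a server, `FakeRedis` is a hypothetical in-memory stand-in. It mimics only the two redis-py calls used here: `rpush(name, *values)` and `lpop(name, count)`. If you swap in `redis.Redis(decode_responses=True)`, the other functions should work unchanged. The queue key and the `insert_batch` callback are also hypothetical.

```python
import json
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Optional


class FakeRedis:
    """Hypothetical in-memory stand-in for redis.Redis(decode_responses=True),
    implementing only RPUSH and LPOP-with-count."""

    def __init__(self) -> None:
        self._lists: Dict[str, Deque[str]] = defaultdict(deque)

    def rpush(self, name: str, *values: str) -> int:
        self._lists[name].extend(values)
        return len(self._lists[name])

    def lpop(self, name: str, count: int) -> Optional[List[str]]:
        items = self._lists[name]
        if not items:
            return None  # Redis replies nil for an empty or missing list
        return [items.popleft() for _ in range(min(count, len(items)))]


QUEUE = "request_log"  # hypothetical key name


def log_request(r, entry: dict) -> None:
    """Called on each API request: one O(1) append, no ClickHouse round trip."""
    r.rpush(QUEUE, json.dumps(entry))


def flush(r, insert_batch: Callable[[List[dict]], None], batch_size: int = 10_000) -> int:
    """Body of the once-a-minute cron job: drain the queue in bulk batches."""
    flushed = 0
    while True:
        raw = r.lpop(QUEUE, batch_size)
        if not raw:
            break
        insert_batch([json.loads(item) for item in raw])
        flushed += len(raw)
    return flushed


r = FakeRedis()
for i in range(25):
    log_request(r, {"id": i, "path": "/v1/geocode"})
batches: List[List[dict]] = []
print(flush(r, batches.append, batch_size=10))  # 25
print([len(b) for b in batches])                # [10, 10, 5]
```

This design has one tradeoff. If a job crashes after popping entries but before `insert_batch` finishes, those entries are lost. Avoiding that kind of loss is part of what a durable, replayable log like Kafka buys you.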
