Scaling request logging with ClickHouse, Kafka, and Vector

https://www.geocod.io/code-and-coordinates/2025-10-02-from-millions-to-billions/
136•mjwhansen•4mo ago

Comments

rozenmd•3mo ago
Great write-up!

I had a similar project back in August when I realised my DB's performance (Postgres) was blocking me from implementing features users commonly ask for (querying out to 30 days of historical uptime data).

I was already blown away by the performance (200ms for a query that took Postgres 500-600ms), but then I realized I hadn't put an index on the ClickHouse table. Now the query returns in 50-70ms, and that includes network time.

fermuch•3mo ago
Materialized views are a great tool for aggregating data in CH since they are automatically updated on insert from the original table. I recommend taking a look and trying them out, maybe it'll go down to single digit milliseconds!
ansgri•3mo ago
And there are 2 kinds of those: the other is refreshable materialized views, which run on a schedule and can have dependencies between them, so they can implement quite complex data transformation pipelines.
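
For readers who have not used the insert-triggered kind, here is a toy model of the idea in plain Python: every batch inserted into the source table is also handed to the view, which keeps a running aggregate. This is only a sketch of the behavior, not ClickHouse code; the `Table` and `DailyCountView` classes and the `user_id`/`day` fields are hypothetical, and in ClickHouse itself the view would be defined in SQL.

```python
from collections import defaultdict


class Table:
    """Toy source table that forwards each inserted batch to attached views."""

    def __init__(self):
        self.rows = []
        self.views = []

    def insert(self, batch):
        self.rows.extend(batch)
        # The view only sees the newly inserted batch, never the whole table.
        for view in self.views:
            view.on_insert(batch)


class DailyCountView:
    """Toy aggregate: number of requests per (user_id, day)."""

    def __init__(self):
        self.counts = defaultdict(int)

    def on_insert(self, batch):
        for row in batch:
            self.counts[(row["user_id"], row["day"])] += 1


requests = Table()
per_user_per_day = DailyCountView()
requests.views.append(per_user_per_day)

requests.insert([
    {"user_id": "u1", "day": "2025-10-01"},
    {"user_id": "u1", "day": "2025-10-01"},
    {"user_id": "u2", "day": "2025-10-01"},
])
requests.insert([{"user_id": "u1", "day": "2025-10-02"}])
```

A dashboard query then reads the small `counts` aggregate instead of scanning every raw row, which is where the speedup would come from.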
nasretdinov•3mo ago
BTW you could've used e.g. kittenhouse (https://github.com/YuriyNasretdinov/kittenhouse, my fork) or just a simpler buffer table, with 2 layers and a larger aggregation period than in the example.

Alternatively, you could've used async insert functionality built into ClickHouse: https://clickhouse.com/docs/optimize/asynchronous-inserts . All of these solutions are operationally simpler than Kafka + Vector, although obviously it's all tradeoffs.
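
As a rough illustration of the async insert route, the sketch below builds (but does not send) an insert request for ClickHouse's HTTP interface with `async_insert` and `wait_for_async_insert` passed as URL parameters. The host, the `request_log` table, and the row fields are assumptions made up for the example; check the linked docs for the settings that fit a given workload.

```python
import json
from urllib.parse import urlencode


def build_async_insert(base_url, table, rows):
    """Return (url, body) for an HTTP insert with async inserts enabled.

    Nothing is sent here; the caller would POST `body` to `url`.
    """
    params = {
        "query": f"INSERT INTO {table} FORMAT JSONEachRow",
        "async_insert": 1,           # server buffers and batches small inserts
        "wait_for_async_insert": 1,  # acknowledge only after the buffer is flushed
    }
    # JSONEachRow expects one JSON object per line.
    body = "\n".join(json.dumps(row) for row in rows).encode("utf-8")
    return f"{base_url}/?{urlencode(params)}", body


# Hypothetical host, table, and columns.
url, body = build_async_insert(
    "http://localhost:8123",
    "request_log",
    [{"user_id": 1, "endpoint": "/geocode"}],
)
```

With this approach the application keeps issuing small inserts and the server does the batching, so no extra queue has to be operated.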

devmor•3mo ago
There were a lot of simpler options that came to mind while reading through this, frankly.

But I imagine the writeup eschews myriad future concerns and does not entirely illustrate the pressure and stress of trying to solve such a high-scale problem.

Ultimately, going with a somewhat more complex solution that involves additional architecture but has been tried and tested by a 3rd party that you trust can sometimes be the more fitting end result. Assurance often weighs more than simplicity, I think.

nasretdinov•3mo ago
While kittenhouse is, unfortunately, abandonware (even though you can still use it and it works), you can't say the same about e.g. async inserts in ClickHouse: it's a very simple and robust solution to tackle exactly the problem the PHP (and some other languages') backends often face when trying to use ClickHouse
ajayvk•3mo ago
Yes, had similar questions. Wouldn't tuning the settings for the buffer table have helped avoid the TOO_MANY_LINKS error?
frenchmajesty•3mo ago
Thanks for sharing, I enjoyed reading this.
tlaverdure•3mo ago
Thanks for sharing. I really enjoyed the breakdown, and great to see small tech companies helping each other out!
mperham•3mo ago
Seems weird not to use Redis as the buffering layer + minutely cron job. Seems a lot simpler than installing Kafka + Vector.
SteveNuts•3mo ago
Vector is very simple to operate and (mostly) stateless, and can handle buffering if you choose.

Kafka vs. Redis is a "pick your poison" choice IMO; scaling and operating either one comes with its own headaches.

otterley•3mo ago
Redis isn’t a good durable message queue.
albertgoeswoof•3mo ago
Currently at the millions stage with https://mailpace.com relying mostly on Postgres

Tbh this terrifies me! We don’t just have to log the requests but also store the full emails for a few days, and they can be up to 50 MiB in total size.

But it will be exciting when we get there!

fnord77•3mo ago
How does Clickhouse compare to Druid, Pinot or Star Tree?
jamesblonde•3mo ago
Here's a good performance study by OneHouse comparing Clickhouse, StarRocks, Trino:

https://www.onehouse.ai/blog/apache-spark-vs-clickhouse-vs-p...

Druid is real-time analytics, similar to Clickhouse. StarRocks is best at Joins - Clickhouse is not good for joins.

manish_gill•3mo ago
> Clickhouse is not good for joins

This is less and less true as time goes on tbh. 25.9 introduced Join Reordering as well - https://clickhouse.com/blog/clickhouse-release-25-09

saisrirampur•3mo ago
Sai from ClickHouse here. Very compelling story! Really love your emphasis on using the right tool for the right job - power of row vs column stores.

We recently added a MySQL/MariaDB CDC connector in ClickPipes on ClickHouse Cloud. This would have simplified your migration from MariaDB.

https://clickhouse.com/docs/integrations/clickpipes/mysql

https://clickhouse.com/docs/integrations/clickpipes/mysql/so...

ch2026•3mo ago
1) clickhouse async_insert would have solved all your issues: https://clickhouse.com/docs/optimize/asynchronous-inserts

1a) If you’re still having too many files/parts, then fix your PARTITION BY and MergeTree primary key.

2) why are you writing to Kafka when vector.dev does buffering / batching?

3) if you insist on kafka, https://clickhouse.com/docs/engines/table-engines/integratio... consumes directly from kafka (or since you’re on CHC, use clickhouse pipes) — what’s the point of vector here?

Your current solution is unnecessarily complex. I’m guessing the core problem is your merge tree primary key is wrong.

momothereal•3mo ago
Writing to Kafka allowed them to continue their current ingestion process into MariaDB at the same time as ClickHouse. Kafka consumer groups allow the data to be consumed twice by different consumer pools that have different throughput without introducing bottlenecks.

From experience, the Kafka tables in ClickHouse are not stable at high volumes, and are harder to debug when things go sideways. It is also easier to mutate your data before ingestion using Vector's VRL scripting language vs. ClickHouse table views (SQL) when dealing with complex data that needs to be denormalized into a flat table.
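
To make the consumer-group point concrete, here is a toy, single-partition model in plain Python: each group keeps its own offset into the same log, so a slow reader and a fast reader both see every message without blocking each other. This is not the Kafka client API; `TopicLog` and the group names are hypothetical.

```python
class TopicLog:
    """Toy append-only log where every consumer group tracks its own offset."""

    def __init__(self):
        self.messages = []
        self.offsets = {}  # group name -> next index to read

    def produce(self, message):
        self.messages.append(message)

    def poll(self, group, max_records):
        start = self.offsets.get(group, 0)
        batch = self.messages[start:start + max_records]
        # Advancing one group's offset never affects any other group.
        self.offsets[group] = start + len(batch)
        return batch


log = TopicLog()
for i in range(5):
    log.produce({"request_id": i})

# Two hypothetical groups reading the same data at different speeds.
mariadb_batch = log.poll("mariadb-writer", max_records=2)
clickhouse_batch = log.poll("clickhouse-writer", max_records=5)
mariadb_rest = log.poll("mariadb-writer", max_records=10)
```

The slower group simply lags behind and catches up later, which is the property that lets two destinations be fed from one stream during a migration.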

ch2026•3mo ago
> Writing to Kafka allowed them to continue their current ingestion process into MariaDB at the same time as ClickHouse.

The one they're going to shut down as soon as this works? Yeah, great reason to make a permanent tech choice for a temporary need. Versus just keeping the MariaDB stuff exactly the same on the PHP side and writing to 2 destinations until cutover is achieved. Kafka is wholly unnecessary here. Vector is great tech but likely not needed. Kafka + Vector is absolutely the incorrect solution.

Their core problem is the destination table schema (which they did not provide) and a very poorly chosen primary key + partition.

est•3mo ago
can you just buffer some writes in Vector and eliminate Kafka?

I set up Vector to buffer Elasticsearch writes years ago, also for logs, and it ran so well, without any problems, that I almost forgot about it.
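
The buffering idea itself is small enough to sketch in plain Python: collect rows and flush them as one batch once the buffer is either large enough or old enough. This is a generic illustration, not Vector's implementation or configuration; `WriteBuffer`, its thresholds, and the `flush` callback are all hypothetical.

```python
import time


class WriteBuffer:
    """Collect rows and flush them in batches by size or by age."""

    def __init__(self, flush, max_rows=1000, max_age_s=5.0, clock=time.monotonic):
        self.flush = flush          # callback that receives a list of rows
        self.max_rows = max_rows
        self.max_age_s = max_age_s
        self.clock = clock          # injectable so the age rule can be tested
        self.rows = []
        self.first_at = None

    def add(self, row):
        if not self.rows:
            self.first_at = self.clock()  # age is measured from the oldest row
        self.rows.append(row)
        self.maybe_flush()

    def maybe_flush(self):
        if not self.rows:
            return
        too_big = len(self.rows) >= self.max_rows
        too_old = self.clock() - self.first_at >= self.max_age_s
        if too_big or too_old:
            batch, self.rows = self.rows, []
            self.flush(batch)


batches = []
buf = WriteBuffer(batches.append, max_rows=3, max_age_s=60)
for i in range(7):
    buf.add({"request_id": i})
# Two full batches have been flushed; one row is still waiting in the buffer.
```

A real deployment would also call `maybe_flush` on a timer and persist the buffer somewhere durable, which is the part a tool like Vector or Kafka takes off your hands.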

anticodon•3mo ago
Or vice versa: make ClickHouse ingest batches directly from Kafka. Messages are already buffered in Kafka, I don't get why Vector is necessary here.
ch2026•3mo ago
tbh the only thing they needed was a correct schema that didn’t constantly spawn new parts and async_insert enabled.
est•3mo ago
https://clickhouse.com/docs/knowledgebase/kafka-to-clickhous...

For anyone who's curious.

pachico•3mo ago
I shared this article internally and my peers were impressed by how similar it is to our final implementation. (It differs in that we use Redis as the queue.)

Happy to exchange notes about our journey too.

Cheers

solatic•3mo ago

  Geocodio offers a pay-as-you-go metered plan where users get 2,500 free geocoding lookups per day. This means we need to:
  Track the 2,500 free tier requests
  Continue tracking above that threshold for billing
  Let users view their usage in real-time on their dashboard
  Give admins the ability to query this data for support and debugging
  Store request details so we can replay customer requests when debugging issues
Just on the basis of what you wrote here, I'm not convinced ClickHouse is the right tool. ClickHouse would very much help you crunch statistics for latencies etc., but just for billing and getting individual query data?

1) Push the request to Kafka/Pub Sub/etc.

2) One consumer pushes to TigerBeetle for tracking request usage within the free tier and other billing.

3) One consumer pushes individual requests to object storage, which scales out infinitely-ish, lets you get full request details for an individual request, and has lifecycle rules that will automatically async delete old requests for you.

If request statistics are important for business analysis, then instead of (boring) object storage you could look at one of the newer Iceberg-based options on top of object storage, e.g. S3 Tables, as long as querying an individual request remains fast and statistics can be generated, say, for a nightly report.

Another cheap approach: hook up another consumer to the Pub/Sub and, for any request with latency above a reasonable threshold, dump it into a Slack channel with a reference to the request ID so someone can look into debugging it.
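
Whatever store ends up holding the counters, the free-tier arithmetic from the quoted requirements is simple: sum lookups per user per day and bill only what exceeds 2,500. A minimal sketch in plain Python, where the event tuple shape and the user IDs are assumptions for illustration:

```python
from collections import Counter

FREE_LOOKUPS_PER_DAY = 2500  # figure taken from the quoted plan description


def billable_lookups(events):
    """Return {(user_id, day): lookups above the daily free tier}.

    `events` is an iterable of (user_id, day, lookups) tuples.
    """
    totals = Counter()
    for user_id, day, lookups in events:
        totals[(user_id, day)] += lookups
    return {
        key: max(0, total - FREE_LOOKUPS_PER_DAY)
        for key, total in totals.items()
    }


# Hypothetical usage events.
usage = billable_lookups([
    ("u1", "2025-10-01", 2000),
    ("u1", "2025-10-01", 900),
    ("u2", "2025-10-01", 100),
])
```

Here `u1` made 2,900 lookups on the day, so 400 are billable, while `u2` stays inside the free tier.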
matthewaveryusa•3mo ago
I shimmed Vector into my log pipeline recently and it really is a wonderfully simple and powerful tool. It's where I transform logs from software I don't own into Prometheus metrics and drop useless logs before they make it to Loki.
enether•3mo ago
weird you have to adopt Kafka AND Vector just to batch a bit of writes into Clickhouse...