
The bottleneck might be the air in the room

https://blog.mikebowler.ca/2026/07/03/co2-and-decision-making/
192•gslin•2h ago•109 comments

Agentic coding notes from Galapagos Island

https://danluu.com/ai-coding/#appendix-agentic-loops-and-writing-this-post
86•gm678•4h ago•30 comments

Performance per dollar is getting faster and cheaper

https://www.wafer.ai/blog/glm52-amd
240•latchkey•11h ago•79 comments

Mir Books – Books from the Soviet Era

https://mirtitles.org
65•clmul•3d ago•25 comments

Leanstral 1.5: Proof abundance for all

https://mistral.ai/news/leanstral-1-5/
229•programLyrique•10h ago•60 comments

Giant trees have no trouble pumping water to top branches: new research

https://news.exeter.ac.uk/faculty-of-environment-science-and-economy/giant-trees-have-no-trouble-...
195•hhs•10h ago•90 comments

Synthesis is harder than analysis

https://surfingcomplexity.blog/2026/07/03/synthesis-is-harder-than-analysis/
79•azhenley•6h ago•20 comments

Postgres data stored in Parquet on S3: LTAP architecture explained

https://www.databricks.com/blog/lakebase-ltap-rethinking-database-storage
16•andrenotgiant•2d ago•5 comments

Maybe you should learn something

https://www.marginalia.nu/log/a_135_learn/
72•tylerdane•5h ago•32 comments

Steam Controller Auto-Charge – pilot to magnetic charging puck using CV

https://github.com/FossPrime/Steam-Controller-Auto-Charge
135•zdw•10h ago•28 comments

MSI Center – How to gain SYSTEM privileges in seconds

https://mrbruh.com/msicenter/
84•MrBruh•8h ago•28 comments

SearXNG: A free internet metasearch engine

https://github.com/searxng/searxng
210•theanonymousone•13h ago•57 comments

FreeBSD ate my RAM

https://crocidb.com/post/freebsd-ate-my-ram/
136•theanonymousone•14h ago•46 comments

The firefighting system of the Van der Heyden brothers in 17th century Amsterdam

https://worksinprogress.co/issue/how-amsterdam-invented-the-fire-department/
90•zdw•10h ago•16 comments

Jamesob's guide to running SOTA LLMs locally

https://github.com/jamesob/local-llm
341•livestyle•18h ago•151 comments

Odin, Wikipedia and engagement farming

https://katamari64.se/posts/2026/odin-wikipedia/
142•stock_toaster•10h ago•198 comments

Soatok's Informal Guide to Threat Models

https://soatok.blog/2026/06/30/soatoks-informal-guide-to-threat-models/
86•zdw•8h ago•10 comments

New serious vulnerabilities spiked around release of Claude Mythos Preview

https://epoch.ai/data-insights/cve-severity-spike
101•cubefox•12h ago•32 comments

The Scanline Sweeper: A Glyph Rendering Algorithm [pdf]

https://rookandpossum.com/papers/scanline_sweeper_preprint.pdf
3•kouosi•2d ago•1 comments

Show HN: Classify mechanical faults using Contrastive Language-Audio Pretraining

https://github.com/adam-s/car-diagnosis
14•dataviz1000•2d ago•0 comments

David Beazley – Programming Courses

https://www.dabeaz.com/courses.html
78•gregsadetsky•3h ago•25 comments

Reverse-engineering Codemasters' BIGF archive format in Ruby

https://davidslv.uk/2026/06/30/reading-binary-in-ruby.html
18•davidslv•3d ago•4 comments

Costco is the anti-Amazon

https://phenomenalworld.org/analysis/the-anti-amazon/
405•bookofjoe•18h ago•378 comments

Study reveals what people see when they read lips

https://news.ku.edu/news/article/study-reveals-what-people-really-see-when-they-read-lips
12•giuliomagnifico•3d ago•2 comments

Applied Category Theory Course (2018)

https://math.ucr.edu/home/baez/act_course/index.html
108•measurablefunc•12h ago•8 comments

Gone but Not Forgotten: Recovering the Dead Web

https://blog.archive.org/2026/04/23/gone-but-not-forgotten-recovering-the-dead-web/
64•wslh•3d ago•19 comments

Hunting a 16-year-old SQLite WAL bug with TLA+

https://ubuntu.com/blog/hunting-a-16-year-old-sqlite-bug-with-tla-is-dqlite-affected
212•peterparker204•3d ago•23 comments

Factories are just rooms

https://interconnected.org/home/2026/07/03/factories
242•arbesman•18h ago•101 comments

Espionage Against the European Parliament

https://citizenlab.ca/research/member-of-committee-investigating-spyware-hacked-with-pegasus/
373•ledoge•12h ago•91 comments

Infracost (YC W21) Is Hiring a Marketing Lead to Shift FinOps Left

https://www.ycombinator.com/companies/infracost/jobs/YTJcFwr-marketing-lead
1•akh•12h ago

Postgres data stored in Parquet on S3: LTAP architecture explained

https://www.databricks.com/blog/lakebase-ltap-rethinking-database-storage
16•andrenotgiant•2d ago

Comments

andrenotgiant•2d ago
Here's what I don't understand:

Part of the value of doing an ETL pipeline via streaming replication is that you get the full history of data in a table: an SCD type 2 table where each row also has valid_from and valid_to timestamp columns.

How would someone do the same thing with this architecture?
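For readers unfamiliar with the term, here is a rough sketch of how a stream of change events is typically folded into SCD type 2 rows. It is only an illustration: the `Scd2Row` type, the `build_scd2` helper, and the `(key, value, ts)` event shape are all hypothetical and say nothing about how any particular CDC tool or the LTAP architecture represents changes.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Scd2Row:
    key: str
    value: str
    valid_from: int
    valid_to: Optional[int]  # None marks the currently valid version


def build_scd2(changes: Iterable[Tuple[str, str, int]]) -> List[Scd2Row]:
    """Fold (key, value, ts) change events into SCD type 2 rows (hypothetical helper)."""
    rows: List[Scd2Row] = []
    open_row_index = {}  # key -> position in `rows` of that key's open row
    for key, value, ts in sorted(changes, key=lambda c: c[2]):
        if key in open_row_index:
            open_row = rows[open_row_index[key]]
            if open_row.value == value:
                continue  # no actual change, keep the open row as-is
            open_row.valid_to = ts  # close the previous version
        rows.append(Scd2Row(key, value, ts, None))
        open_row_index[key] = len(rows) - 1
    return rows


history = build_scd2([
    ("u1", "a@example.com", 10),
    ("u2", "c@example.com", 15),
    ("u1", "b@example.com", 20),
])
for row in history:
    print(row)
```

The point of the question above is that this folding step needs every individual change event as input, which is what streaming replication provides.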

eveningtree•27m ago
Rather than answering directly, I'm thinking about this problem from the other end altogether ever since I saw the dbricks rt demo. Apologies for the rambling response, as I haven't yet finished thinking about this problem...

We ended up with 'hot' data in oltp and 'cold/archival' data in olap because the storage size of oltp has always been limited.

(1) Limited by computation - there's only so much data that we can store on disks and nvme

(2) Limited by wallet - disks and nvme are EXPENSIVE

Also, the tight coupling of compute and data didn't help. It limited the size of databases on the individual expensive compute nodes.

So, another question will be -

What's currently stopping me from keeping the scd history tables right in my oltp db? What's forcing me to copy state into my etl/elt pipeline and then process it into scd in a dedicated olap db?

To some extent, the answer is still the same - the oltp cannot scale for the storage size required for keeping historical data. So, I've had to take out the 'cold' historical data and keep it in my olap freezer.

Now, if oltp itself is scaling, I'm not gonna bother with the copying step. I'll just prefer to store the history in oltp itself.

In my perspective (majorly from handling IoT systems), I need olap for 2 reasons - (1) storage scalability, and (2) analytical processing speed

I now consider (1) to be a solved problem

As for (2), I'm still not sure how this architecture ends up matching the query processing speeds of column-oriented storage. But again, I need to study more.

The SCD pipeline still remains in some form: either as (1) the scd rows that we currently keep (etl pipeline), or (2) as older lsn rows that simply don't get deleted (existing db engine).

I've done quite a lot of experimentation with (2), and it is a pretty solid concept to work with.

I've spent quite a lot of years hammering my brain at databases and datastores in general. And I've now got a feeling that this is it. Finally.

hasyimibhar•4m ago
It wouldn't be possible to do this with the LTAP architecture since (I'm assuming) the individual logical changes are not visible. But honestly I've always seen an SCD type 2 table as a workaround for a lack of data modeling experience in the source database. If you design your tables correctly, you shouldn't need SCD type 2 downstream.

For example, if you know your users can change emails, and there might be events from another source that are keyed by user email (e.g. marketing-related events), then naturally you will need some sort of email_history table that has a historical mapping of user id to email (you probably need it for audit purposes too). In this case there is no need to build an SCD type 2 table of users from CDC; it's already there.
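A minimal sketch of that idea, assuming a hypothetical `email_history` layout of `(user_id, email, valid_from, valid_to)` with half-open validity intervals. The table contents and the `user_for_email` helper are made up for illustration:

```python
from typing import List, Optional, Tuple

# Hypothetical email_history rows: (user_id, email, valid_from, valid_to).
# valid_to of None means the mapping is still current.
EMAIL_HISTORY: List[Tuple[int, str, int, Optional[int]]] = [
    (1, "old@example.com", 100, 200),
    (1, "new@example.com", 200, None),
    (2, "old@example.com", 250, None),  # address later reused by another user
]


def user_for_email(email: str, at_ts: int,
                   history: List[Tuple[int, str, int, Optional[int]]] = EMAIL_HISTORY
                   ) -> Optional[int]:
    """Return the user id that owned `email` at time `at_ts`, if any."""
    for user_id, hist_email, valid_from, valid_to in history:
        still_valid = valid_to is None or at_ts < valid_to
        if hist_email == email and valid_from <= at_ts and still_valid:
            return user_id
    return None


# Resolve marketing events keyed by email to the user who held it at event time.
print(user_for_email("old@example.com", 150))
print(user_for_email("old@example.com", 300))
```

With a table like this in the source database, an event keyed by email resolves to the right user without reconstructing history from CDC.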

PunchyHamster•35m ago
I don't wanna see that S3 bandwidth bill after running some big query

khurs•29m ago
There are self-hosted object stores which use the same protocol as S3. One example: https://github.com/minio/minio

Also, Databricks were kind enough to donate the underlying tech to Apache, so it's open source: https://github.com/delta-io/delta