Have your cake and decompress it too

https://spiraldb.com/post/cascading-compression-with-btrblocks
12•emschwartz•2d ago

Comments

gopalv•1h ago
In mid-2015, I spent months optimizing Apache ORC's compression models over TPC-H.

It was easier to beat Parquet's defaults - ORC+zlib seemed to top out around the same as the default in this paper (~178 GB for the 1 TB dataset, from the Hadoop conf slides).

We got a lot of good results, but the hard lesson we learned was that scan rate is more important than size. A 16 KB read and a 48 KB read took about the same time, and the CPU was being used by other parts of the SQL engine; IO wasn't the bottleneck we thought it was.

And scan rate is not always "how fast can you decode"; a lot of it was encouraging data skipping (see the Capacitor paper from the same era).

For example, when organized correctly, the entire l_shipdate column took ~90 bytes for millions of rows.

Similarly, the notes column was never read at all, so dictionaries etc. were useless.
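
To make the l_shipdate point concrete, here is a minimal sketch of why row order matters so much for a low-cardinality column. It assumes plain run-length encoding and a made-up column of three dates; it is not ORC's actual encoder, and `rle_encode` is a hypothetical helper written for this illustration.

```python
from itertools import groupby

def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]

# Hypothetical data: 3 distinct ship dates spread over 300,000 rows.
dates = ["1998-01-01", "1998-01-02", "1998-01-03"]
interleaved_col = dates * 100_000          # dates alternate row by row
clustered_col = sorted(interleaved_col)    # same rows, grouped by date

interleaved_runs = rle_encode(interleaved_col)
clustered_runs = rle_encode(clustered_col)

# Same values, very different encoded size: one run per row vs. one run per date.
print(len(interleaved_runs), len(clustered_runs))
```

With the rows clustered, the whole column is described by a handful of (value, count) pairs, which is the kind of effect that gets a date column down to tens of bytes.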

Then I learned the ins & outs of another SQL engine, which kicked the ass of every other format I'd ever worked with, without too much magical tech.

Most of what I can repeat is that SQL engines don't care what order the rows in a file are in, and neither should the format writer. Also, DBAs don't know which filters are the most useful to organize around, and often they are wrong.

Re-ordering at the row level beats any other trick with lossless columnar compression, because if you can skip a row (with, say, an FSST for LIKE or CONTAINS into index values[1] instead of bytes), that is a nearly infinite improvement in the scan rate and IO.

[1] - https://github.com/amplab/succinct-cpp
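
A rough sketch of the data-skipping argument above, assuming per-block min/max statistics (often called zone maps) and an equality filter. The block size, the data, and the helper names `build_blocks` and `scan_equal` are all made up for illustration; real formats keep these statistics in file or stripe metadata rather than in Python tuples.

```python
import random

def build_blocks(rows, block_size):
    """Split rows into fixed-size blocks and record (min, max) for each one."""
    blocks = [rows[i:i + block_size] for i in range(0, len(rows), block_size)]
    return [(min(block), max(block), block) for block in blocks]

def scan_equal(blocks, target):
    """Return (matches, blocks_read), skipping blocks whose range excludes target."""
    matches, blocks_read = [], 0
    for lo, hi, block in blocks:
        if target < lo or target > hi:
            continue  # skipped using stats alone, nothing is decoded
        blocks_read += 1
        matches.extend(v for v in block if v == target)
    return matches, blocks_read

random.seed(0)
rows = [random.randrange(1000) for _ in range(100_000)]

arrival_order = build_blocks(rows, 1000)          # rows in the order they arrived
reordered = build_blocks(sorted(rows), 1000)      # same rows, re-ordered by the writer

hits_a, read_a = scan_equal(arrival_order, 42)
hits_b, read_b = scan_equal(reordered, 42)
print(read_a, read_b)  # blocks actually read under each layout
```

Both layouts return the same matching rows, but in arrival order almost every block's min/max range covers the target, so nothing can be skipped, while the re-ordered layout touches only the block or two that actually hold it.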

pella•1h ago
Thanks!

Looks similar to OpenZL ( https://openzl.org/ ) "OpenZL takes a description of your data and builds from it a specialized compressor optimized for your specific format."
