
How TimescaleDB compresses time-series data

https://roszigit.com/en/blog/timescaledb-compression-hypercore
54•lkanwoqwp•2h ago

Comments

blackoil•1h ago
Gorilla by Facebook had this. Timestamps are stored as a delta of deltas, and values are stored as the XOR of each value with the previous one (a bitwise form of delta).
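For anyone who hasn't seen delta-of-delta before, here is a minimal sketch of the timestamp side. The function names are made up for illustration, and it stops at producing the integers: the real Gorilla format then packs each delta-of-delta into a variable-length bit field, which is omitted here.

```python
from typing import List

def delta_of_delta_encode(timestamps: List[int]) -> List[int]:
    """Encode as [first, first_delta, dod, dod, ...] (hypothetical helper).

    Regularly spaced samples produce long runs of 0, which a bit packer
    can then store in very few bits.
    """
    if len(timestamps) < 2:
        return list(timestamps)
    prev_delta = timestamps[1] - timestamps[0]
    out = [timestamps[0], prev_delta]
    for i in range(2, len(timestamps)):
        delta = timestamps[i] - timestamps[i - 1]
        out.append(delta - prev_delta)  # delta of the delta
        prev_delta = delta
    return out

def delta_of_delta_decode(encoded: List[int]) -> List[int]:
    """Invert delta_of_delta_encode by re-accumulating deltas."""
    if len(encoded) < 2:
        return list(encoded)
    delta = encoded[1]
    ts = [encoded[0], encoded[0] + delta]
    for dod in encoded[2:]:
        delta += dod
        ts.append(ts[-1] + delta)
    return ts

# 10-second sampling with one slightly late point
ts = [1000, 1010, 1020, 1030, 1041, 1050]
enc = delta_of_delta_encode(ts)
print(enc)  # [1000, 10, 0, 0, 1, -2]
```

The single late sample only costs two small non-zero entries (1 and -2); everything on schedule encodes as 0.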
lokar•1h ago
They say they are using “Gorilla compression”.

I’m still amazed every time I go back and read how the compression for floating point values works.
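The floating point trick is easier to see with the bits in front of you. A minimal sketch, assuming float64 values: XOR each value's bit pattern with the previous one and count the "meaningful" bits between the leading and trailing zeros. The helper names are hypothetical, and the real format also writes control bits plus leading-zero and length fields, which are left out here.

```python
import struct
from typing import List

def float_bits(x: float) -> int:
    """Reinterpret a float64 as its raw 64-bit pattern."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def xor_meaningful_bits(values: List[float]) -> List[int]:
    """For each value after the first, XOR with the previous value's bits and
    return how many bits sit between the leading and trailing zeros.
    0 means the value repeated exactly (hypothetical helper)."""
    if not values:
        return []
    out: List[int] = []
    prev = float_bits(values[0])
    for v in values[1:]:
        cur = float_bits(v)
        x = prev ^ cur
        if x == 0:
            out.append(0)  # identical value: nothing to store but a flag
        else:
            leading = 64 - x.bit_length()
            trailing = (x & -x).bit_length() - 1  # index of lowest set bit
            out.append(64 - leading - trailing)
        prev = cur
    return out

print(hex(float_bits(12.0)))                    # 0x4028000000000000
print(xor_meaningful_bits([12.0, 12.0, 24.0]))  # [0, 1]
```

Doubling 12.0 to 24.0 only bumps the exponent by one, so the XOR has a single set bit, which is why slowly changing gauges compress so well.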

f311a•1h ago
It's used in ClickHouse as well. CH supports all known compression algos and they are documented pretty well.
gopalv•1h ago
> What does compression do to query performance?

That section is the most relevant whenever compression in a DB is discussed.

The purpose of a database is to find, aggregate or update data - storage is where the trade-off gets expressed. There are no silver bullets here.

Any method of compression which speeds up either filter rejection or scan rate is better than something that only trades off IO for CPU usage.

For example, dictionary encoding can be slower to read (because you decompress the whole dictionary instead of doing a skip read after the filter), but not if you can squeeze out an IN clause by turning string comparisons into O(1) dictionary lookups followed by a simple integer filter. Remember, this can be arbitrarily complex (Druid is a great example of this), and then bitmaps can be used because the dictionary indices form a dense 0..N range.
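A minimal sketch of the IN-clause rewrite, assuming an in-memory column of strings. The function names are hypothetical, and a Python int stands in for a real bitmap structure such as the compressed bitmaps Druid uses.

```python
from typing import Dict, List, Sequence, Set, Tuple

def dictionary_encode(values: Sequence[str]) -> Tuple[List[str], List[int]]:
    """Replace each string with a dense integer code 0..N-1 into a dictionary."""
    dictionary: List[str] = []
    index: Dict[str, int] = {}
    codes: List[int] = []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes

def filter_in(dictionary: Sequence[str], codes: Sequence[int],
              wanted: Set[str]) -> List[int]:
    """Evaluate `col IN (wanted)` and return matching row numbers."""
    # String comparisons happen once per dictionary entry, not once per row.
    # Because codes are dense, a plain integer can serve as the bitmap of hits.
    bitmap = 0
    for code, entry in enumerate(dictionary):
        if entry in wanted:
            bitmap |= 1 << code
    # Per row, the predicate is now just an integer bit test.
    return [row for row, code in enumerate(codes) if (bitmap >> code) & 1]

dictionary, codes = dictionary_encode(["eu", "us", "eu", "ap", "us"])
print(dictionary, codes)                           # ['eu', 'us', 'ap'] [0, 1, 0, 2, 1]
print(filter_in(dictionary, codes, {"us", "ap"}))  # [1, 3, 4]
```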

Even better if that can feed a deterministic operation like UPPER(), so that you apply it once per dictionary hit instead of once per row. You can even reuse the same hash slot, instead of doing another dictionary collision check or hash computation.
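A small sketch of the "once per dictionary entry" idea, assuming the column is already dictionary encoded. The names are hypothetical, and Python's `str.upper` stands in for SQL `UPPER()`; the counter is only there to show how many times the function actually runs.

```python
from typing import Callable, List, Sequence

def map_over_dictionary(dictionary: Sequence[str], codes: Sequence[int],
                        fn: Callable[[str], str]) -> List[str]:
    """Apply a deterministic fn once per distinct value, then expand per row."""
    mapped = [fn(entry) for entry in dictionary]  # len(dictionary) calls
    return [mapped[code] for code in codes]       # rows only do an index lookup

calls = 0

def counting_upper(s: str) -> str:
    """str.upper plus a call counter, to make the saving visible."""
    global calls
    calls += 1
    return s.upper()

dictionary = ["eu", "us", "ap"]
codes = [0, 1, 0, 2, 1, 1, 0]
result = map_over_dictionary(dictionary, codes, counting_upper)
print(result, calls)  # ['EU', 'US', 'EU', 'AP', 'US', 'US', 'EU'] 3
```

Seven rows, three calls. This only works because the function is deterministic; something like RANDOM() would have to run per row.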

If anyone is looking at JSONB compression, go take a long look at the Variant encoding proposals from Databricks/Snowflake for Iceberg[1].

Turning a single-column "payload" JSONB field into chunks that are columnarized and strictly typed lets you apply all the tricks mentioned here to loosely typed data, chunk by chunk.
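To make "columnarized and strictly typed, chunk by chunk" concrete, here is a much-simplified sketch of the shredding idea. It is not the Parquet Variant binary encoding or its shredding spec; `shred_chunk` is a hypothetical helper that only looks at top-level keys and scalar types.

```python
from typing import Any, Dict, List, Set, Tuple

SCALARS = {int, float, str, bool}

def shred_chunk(rows: List[Dict[str, Any]]
                ) -> Tuple[Dict[str, Tuple[str, list]], List[Dict[str, Any]]]:
    """Split one chunk of JSON objects into typed columns plus a residual.

    A top-level key becomes a typed column when every row that has it stores
    the same scalar type; missing keys become None (null). Everything else
    stays in a per-row residual dict. Types are decided per chunk, so a key
    can be typed in one chunk and left loose in the next.
    """
    seen: Dict[str, Set[type]] = {}
    for row in rows:
        for key, value in row.items():
            seen.setdefault(key, set()).add(type(value))
    typed = {k for k, ts in seen.items() if len(ts) == 1 and next(iter(ts)) in SCALARS}
    columns = {k: (next(iter(seen[k])).__name__, [row.get(k) for row in rows])
               for k in sorted(typed)}
    residual = [{k: v for k, v in row.items() if k not in typed} for row in rows]
    return columns, residual

chunk_a = [{"user": "a", "ms": 12}, {"user": "b", "ms": 7, "tags": ["x"]}]
chunk_b = [{"user": "c", "ms": "slow"}, {"user": "d", "ms": 3}]
print(shred_chunk(chunk_a))
print(shred_chunk(chunk_b))  # "ms" mixes str and int here, so it stays loose
```

Once `ms` is a plain int column in `chunk_a`, the min/max metadata, integer codecs and filter tricks from this thread all apply to it, even though the source field was schemaless.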

[1] - https://github.com/apache/parquet-format/blob/master/Variant...

PaulWaldman•2m ago
There’s an issue tracking TimescaleDB JSONB compression: https://github.com/timescale/timescaledb/issues/2978
tudorg•24m ago
I have been working on another PG extension for timeseries (https://github.com/xataio/deltax) for a few months, trying to score as well as possible on ClickBench.

This is a project that is simply a lot of fun to work on. There are many tricks that can be used to speed up analytics, besides just type-aware compression:

* For each segment you keep things like max/min/sum, the number of distinct values, bloom filters, etc. For a good share of common queries, you can answer them from this metadata alone, so you don't need to decompress the columns at all.

* For text columns, you compress them differently based on cardinality. Low cardinality (think labels or similar) uses dictionary-based compression. High cardinality uses LZ4.

* Generally, the smaller the data on disk, the better the cold-run performance. This is because you need less IO to load it into memory. I have discovered that on top of the type-aware compression, it's worth doing another round of LZ4. There's also some research suggesting it's sometimes worth doing multiple passes of LZ4.

* Partition and segment pruning. If you can tell from the metadata or bloom filters that the filter doesn't match a partition or segment, you skip the whole thing.

* Push down of filters in the decompression layer. Depending on the compression algorithm, while you decompress you can also filter out the values that you don't need. This avoids passing data and allocating memory for elements that will be later discarded anyway.

* Organization of data on disk is more important than almost anything else. Of course, that's the main point of columnar storage, but there are many levels of detail in how you organize the data so that IO is minimized during queries. I have tried 3-4 different layouts before settling on one.

* For top N queries, which are really common in analytics, you want to stop reading from disk / decompressing as soon as you have enough data to guarantee a correct top N for the query.

* Parallelize everything: ClickBench, at least, runs on instances with a lot of CPU cores, so you need to parallelize every step of the way. This is done differently depending on the query type. For example, for top N, each worker can take a subset of the segments and get the top N from each of them. Then you combine those into a single result.
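Several of the bullets above (segment metadata, pruning, early-stopping top N, per-worker top N) fit in one toy example. This is a sketch under loose assumptions: a plain Python list stands in for a compressed column chunk, `Segment` and the query functions are hypothetical names, and it is not how deltax is implemented. The thread pool shows the split-then-merge shape only; CPython threads won't give real CPU parallelism for this work.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Segment:
    values: List[int]  # stands in for a compressed column chunk
    vmin: int = field(init=False)
    vmax: int = field(init=False)
    total: int = field(init=False)

    def __post_init__(self) -> None:
        # Metadata computed once at write time.
        self.vmin = min(self.values)
        self.vmax = max(self.values)
        self.total = sum(self.values)

def sum_where_between(segments: List[Segment], lo: int, hi: int) -> Tuple[int, int]:
    """SUM(v) WHERE lo <= v <= hi; returns (sum, segments decompressed)."""
    result, decompressed = 0, 0
    for seg in segments:
        if seg.vmax < lo or seg.vmin > hi:
            continue  # pruned: no row in this segment can match
        if lo <= seg.vmin and seg.vmax <= hi:
            result += seg.total  # every row matches: answer from metadata
            continue
        decompressed += 1  # partial overlap: must look at the values
        result += sum(v for v in seg.values if lo <= v <= hi)
    return result, decompressed

def top_n_early_stop(segments: List[Segment], n: int) -> Tuple[List[int], int]:
    """Top N values, visiting segments by descending max and stopping early."""
    if n <= 0:
        return [], 0
    heap: List[int] = []  # min-heap holding the current top N
    read = 0
    for seg in sorted(segments, key=lambda s: s.vmax, reverse=True):
        if len(heap) == n and seg.vmax <= heap[0]:
            break  # this and every later segment can't beat the Nth value
        read += 1
        for v in seg.values:
            if len(heap) < n:
                heapq.heappush(heap, v)
            elif v > heap[0]:
                heapq.heapreplace(heap, v)
    return sorted(heap, reverse=True), read

def top_n_parallel(segments: List[Segment], n: int, workers: int = 4) -> List[int]:
    """Each worker takes top N per segment; the partial results are merged."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda s: heapq.nlargest(n, s.values), segments))
    return heapq.nlargest(n, (v for part in partials for v in part))

segments = [Segment([1, 5, 3]), Segment([10, 12, 11]),
            Segment([20, 25, 22]), Segment([7, 8, 9])]
print(sum_where_between(segments, 8, 21))  # (70, 2)
print(top_n_early_stop(segments, 3))       # ([25, 22, 20], 1)
print(top_n_parallel(segments, 3))         # [25, 22, 20]
```

In the range sum, one segment is pruned and one is answered purely from its stored total, so only two of four are decompressed. The early-stop top N reads a single segment, because after it no other segment's max can beat the current third-largest value.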
