We keep a precomputed cityHash64 value for a few columns we know are going to be used for aggregations. In my experience, computing it explicitly like this is faster than relying on ClickHouse to do it internally.
Especially in a multi-tenant architecture, it helps to have the cityHash64 calculated from a combination of tenant ID and another column, so the overall amount of data scanned is lower too.
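
For anyone wanting to try this, here's a rough sketch of what it could look like. The table and column names (`events`, `tenant_id`, `user_id`, `tenant_user_hash`) are made up for illustration. Using a `MATERIALIZED` column and putting the hash in the sorting key are my assumptions about one way to do it, not necessarily the exact setup described above.

```sql
-- Hypothetical schema: all names here are illustrative only.
CREATE TABLE events
(
    tenant_id  UInt32,
    user_id    String,
    event_time DateTime,
    -- Hash computed once at insert time from tenant ID plus another column,
    -- so aggregations read a fixed-width UInt64 instead of hashing a String.
    tenant_user_hash UInt64 MATERIALIZED cityHash64(tenant_id, user_id)
)
ENGINE = MergeTree
-- Assumption: leading with tenant_id lets the primary index skip other tenants' rows.
ORDER BY (tenant_id, tenant_user_hash, event_time);

-- Example aggregation over the precomputed hash for a single tenant.
SELECT uniqExact(tenant_user_hash) AS distinct_users
FROM events
WHERE tenant_id = 42;
```

Whether this actually beats letting ClickHouse hash internally will depend on the data and queries, so it's worth benchmarking both versions on your own workload.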
arionmiles•2w ago