So it’s in the same spirit as ClickHouse. How does VictoriaLogs scale?
dengolius•1d ago
The answer to your question was deleted, so I'll post it again:
Vertically, on a single machine, the two are quite similar: both fan work out across all CPU cores.
The difference is in scaling out.
ClickHouse scales by making you describe the cluster yourself. You decide how many shards to split the data into, how many copies (replicas) each shard keeps, and which row goes to which shard.
The copies are kept in sync by a consensus system, ClickHouse Keeper. This is flexible, but it is also more work for operators.
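To make "describe the cluster yourself" a bit more concrete, here is a toy Python sketch of the decisions the operator owns: shard count, replicas per shard, and a sharding key that routes each row. The class and method names are hypothetical, the hash-mod routing is a simplification, and this is not ClickHouse's actual config syntax. Keeping the replicas in sync (Keeper's job) is left out entirely.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class ClusterLayout:
    """Hypothetical operator-declared layout; names are illustrative only."""
    num_shards: int
    replicas_per_shard: int

    def shard_for(self, sharding_key: str) -> int:
        # Stable hash of the operator-chosen sharding key, reduced modulo the shard count.
        digest = hashlib.sha256(sharding_key.encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.num_shards

    def replicas_for(self, sharding_key: str) -> list[str]:
        # Every replica of the chosen shard is expected to hold a copy of the row.
        shard = self.shard_for(sharding_key)
        return [f"shard{shard}-replica{r}" for r in range(self.replicas_per_shard)]


layout = ClusterLayout(num_shards=3, replicas_per_shard=2)
print(layout.replicas_for("tenant-42"))
```

Pick a bad key (say, one tenant that produces most of the traffic) and one shard runs hot, which is part of why this layout is more work to get right.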
VictoriaLogs takes the opposite bet. When logs come in, the inserter just spreads them across all storage nodes on its own, so there is no sharding key for you to design.
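A toy sketch of the ingestion side, assuming a simple round-robin spread (an assumption for illustration, not necessarily how the real inserter picks nodes). The point is that the caller hands over only the log line, never a key. `Inserter` is a hypothetical name.

```python
import itertools


class Inserter:
    """Hypothetical inserter that spreads incoming log lines over storage nodes.

    Round-robin is an assumed policy for illustration; what matters is that
    no sharding key is ever supplied by the caller.
    """

    def __init__(self, storage_nodes: list[list[str]]):
        self.storage_nodes = storage_nodes
        self._next = itertools.cycle(range(len(storage_nodes)))

    def insert(self, log_line: str) -> None:
        # Take the next node in turn; there is no key and no routing table to maintain.
        self.storage_nodes[next(self._next)].append(log_line)


nodes = [[], [], []]
ins = Inserter(nodes)
for i in range(7):
    ins.insert(f"log line {i}")
print([len(n) for n in nodes])  # [3, 2, 2]
```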
When a query runs, the selector asks every storage node in parallel and merges the results. There is no consensus system at all. If you want high availability, you run 2 independent clusters and send your logs to both, rather than having the database replicate data internally. So this is simpler, with less of a learning curve.
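And a toy sketch of the query and HA side: the select step asks every node in parallel and merges the partial results, and HA is just the client sending each line to two independent clusters. All names here are hypothetical, and "merge" is simplified to sorting the matched lines.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def query_node(node: list[str], needle: str) -> list[str]:
    # Each storage node filters only the log lines it holds locally.
    return [line for line in node if needle in line]


def select(nodes: list[list[str]], needle: str) -> list[str]:
    # Fan out: every node is queried in parallel, whether or not it has matches.
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        partials = list(pool.map(lambda node: query_node(node, needle), nodes))
    # Merge: combine the partial results into one sorted answer.
    return sorted(line for part in partials for line in part)


def send_to_both(senders: list[Callable[[str], None]], log_line: str) -> None:
    # HA without internal replication: the client ships each line to every cluster.
    for send in senders:
        send(log_line)


cluster_a = [["error: disk full", "info: ok"], ["error: timeout"], []]
cluster_b = [[], [], []]
# Toy senders that append straight to one node of each cluster.
send_to_both([cluster_a[2].append, cluster_b[0].append], "error: oom")
print(select(cluster_a, "error"))
print(select(cluster_b, "error"))
```

Every node runs the filter even if it holds nothing relevant, which is where the fan-out cost comes from.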
See more here https://victoriametrics.com/blog/victorialogs-architecture-b...
winrid•7m ago
So basically, if you have queries that are hard on the query planner, that constant fan-out has a higher CPU cost than the alternatives.
a012•4d ago