We also went from like 6 seconds to 50ms. Huge speedup.
Reference: https://dev.mysql.com/doc/refman/9.4/en/index-condition-push...
ICP in MySQL (which can be built on top of ref/eq_ref, but isn't part of the normal index lookup per se) is a fairly weird concept where the storage engine is told to evaluate certain predicates on its own, without returning the row to the executor. This is to a) reduce the number of round-trips (function calls) from the executor down into the storage engine, and b) because InnoDB's secondary indexes need an extra storage round-trip to return the row (secondary indexes don't point at the row you want; they contain the primary key, and then you have to look up the actual row via the PK), so if you can reject the row early, you can skip the main row lookup.
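The canonical illustration from the MySQL manual (table and column names follow the manual's example) looks roughly like this:

```sql
-- Index on (zipcode, lastname): only zipcode = '95054' can drive the
-- index lookup; lastname LIKE '%etrunia%' can't, since it has a leading
-- wildcard. With ICP, InnoDB evaluates the LIKE against the index entry
-- itself and skips the primary-key lookup for rows that fail it.
CREATE INDEX idx_zip_last ON people (zipcode, lastname);

EXPLAIN
SELECT * FROM people
WHERE zipcode = '95054'
  AND lastname LIKE '%etrunia%';
-- "Using index condition" in the Extra column means ICP is in use.
```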
Just an endless sequence of misbehavior, and we wave it off with "rows work well for specific lookups, columns for aggregations", yet here is all the other stuff that is unreasonably slow.
It's an example of old things being new again, maybe. Or of reinventing the wheel because the wheel wasn't known to them.
Yes, I know nobody wants to pay that tax or make that guy richer, but databases like Oracle have had JPPD (join predicate pushdown) for a long time. It's just something the database does, and the optimizer chooses whether to apply it depending on whether it's the best plan.
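Roughly the shape where JPPD pays off (hypothetical tables; not Oracle-specific syntax): without pushdown, the inline view aggregates payments for every customer; with the join predicate pushed inside the view, it aggregates only the customers the outer filter actually selects.

```sql
SELECT o.order_id, v.total_paid
FROM orders o
JOIN (SELECT customer_id, SUM(amount) AS total_paid
      FROM payments
      GROUP BY customer_id) v
  ON v.customer_id = o.customer_id
WHERE o.order_date = DATE '2024-01-01';
```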
“We filtered first instead of reading an entire table from disk and performing a lookup”
Which is where both OLAP and OLTP DBMSs would benefit.
To your point, it's clear certain workloads lend themselves much better to OLAP and columnar storage, but "an endless sequence of misbehavior" seems a bit harsh.
Naive example: let's say you had a query that did a table scan and computed the average age of all users in the users table. If you insert a new row into the users table and then rerun the query, you'd expect another table scan, so the cost grows as the table grows. In a traditional setup, you might cache this query and only refresh it "every once in a while." Instead, ReadySet can decompose this query into an "incremental program", run it, cache the result -- and then when it sees the insert into the table, it incrementally updates the underlying cache. That means the second run is actually fast, and the cost to update the cache is proportional only to the underlying change, not to the size of the table.
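What that incremental program amounts to for AVG, sketched by hand as a trigger (this shows the idea only, not how ReadySet is implemented; the table and trigger names are made up):

```sql
CREATE TABLE users_age_agg (total BIGINT NOT NULL, cnt BIGINT NOT NULL);

-- Maintain the aggregate in O(1) per write instead of rescanning users.
CREATE TRIGGER users_age_ai AFTER INSERT ON users
FOR EACH ROW
  UPDATE users_age_agg SET total = total + NEW.age, cnt = cnt + 1;

-- The cached "query result" is now a constant-time read:
SELECT total / cnt AS avg_age FROM users_age_agg;
```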
ReadySet is derived from research on Noria, whose paper is here: https://www.usenix.org/conference/osdi18/presentation/gjengs...
Push down filters to read the least data possible.
Or, know your data and tell the query engine which join strategy you'd like (hash join vs. pushed-down lookup), as sketched below.
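Hand-rolled versions of both (hypothetical tables; the hint shown is Oracle's USE_HASH, since Oracle came up earlier in the thread):

```sql
-- Manual predicate pushdown: filter inside the derived table so the
-- join only ever sees rows that can match.
SELECT b.id, d.region
FROM (SELECT id, region FROM dims WHERE region = 'EU') d
JOIN big_facts b ON b.dim_id = d.id;

-- Forcing a join strategy instead of trusting the optimizer:
SELECT /*+ USE_HASH(b) */ b.id, d.region
FROM dims d
JOIN big_facts b ON b.dim_id = d.id
WHERE d.region = 'EU';
```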
It could just be that they translated from their original language to English and got that as a byproduct. Many such cases.