For some DBs (SQL Server definitely), RAND() and similar are handled as if they are deterministic and so are called once per use. For instance:
SELECT TOP 10 RAND() FROM sys.objects
SELECT TOP 10 RAND() FROM sys.objects
just returned ten lots of 0.680862566387624 and ten lots of 0.157039657790194. SELECT TOP 10 RAND(), RAND(), RAND()-RAND() FROM sys.objects
returns a different value for each column (0.451758385842036 & 0.0652620609942665, -0.536618123021777), so the optimisation is per use not per statement or even per column (if it were per column that last value would be 0, or as close to as floating point arithmetic oddities allow).This surprises a lot of people when they try “… ORDER BY RAND()” and get the same order on each run.
One workaround for this is to use a non-deterministic function like NEWID(), though you need some extra jiggery-pokery to get a 0≤v<1 value to mimic rand:
SELECT TOP 10 CAST(CAST(CAST(NEWID() AS VARBINARY(4)) AS BIGINT) AS FLOAT)/(4.0*1024*1024*1024) FROM sys.objects
For the example of sorting, the outer cast is not needed. You might think just using “ORDER BY NEWID()” would be sufficient, but that is an undefined behaviour so you shouldn't rely upon it. It might work now, a quick test has just worked as expected here, but at any point the optimiser could decide it is more efficient to consider all UUIDs as having the same weight for sorting purposes.You know it gets wild when you read "... Here's the core of the raycasting algorithm in SQL"!
You can play it here: https://patricktrainer.github.io/duckdb-doom/
Pressing “L” enables (very) verbose logging in the dev console and prints much of the sql being executed.
It really is magic!
You can check it out here.
mritchie712•3h ago
Queries that normally take 1s to 2s can run in 25ms, so you get under the "100ms rule" which is very uncommon in analytics applications.
We DuckDB server side and have experimental support for DuckDB WASM on the client-side at https://www.definite.app/ and sometimes I don't trust that a query ran because of how fast it can happen (we need some UX work there).
esafak•3h ago
randomtoast•3h ago
jasonjmcghee•1h ago
GP comment is (seemingly) describing keeping an entirely client side instance (data stored locally / in memory) snapshot of the back-end database.
Parent comment is asking how the two are kept in sync.
It's hard to believe it would be the method you're describing and take 25ms.
If you're doing http range requests, that suggests you're reading from a file which means object storage or disk.
I have to assume there is something getting triggered when back end is updating to tell the client to update their instance. (Which very well could just be telling it to execute some sql to get the new / updated information it needs)
Or the data is entirely in memory on the back end in an in memory duckdb instance with the latest data and just needs to retrieve it / return it from memory.
immibis•10m ago