Very interesting! Nice work on your thesis. I am curious: if the data is not resident on the GPU (e.g. multi-TB datasets, line-rate packet inspection, etc.), is this approach bottlenecked by the PCIe bus?
(You may have addressed this in your thesis, feel free to tell me to go RTFD ;)
tdortman•1mo ago
I haven't tested this but I would be very surprised if the PCIe bus wasn't a severe bottleneck in that case, unless you can somehow amortize the cost of the memcpy.
That being said, with such massive datasets you'll already be bottlenecked by the necessary communication between GPUs (sadly even with NVLink), since the queried data always lives on the GPU.
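For a rough sense of the ceiling the bus imposes, here's a back-of-the-envelope sketch. The 8-byte key size and the theoretical PCIe 4.0 x16 link rate are assumptions for illustration, not measurements from the thesis, and real transfers will come in below the theoretical figure.

```python
# Upper bound on keys/s that can be streamed host -> device over PCIe.
# Assumptions (not from the thesis): 8-byte keys, theoretical link rate.

def pcie_bandwidth_bytes_per_s(gt_per_s: float, lanes: int) -> float:
    """Theoretical one-direction bandwidth for PCIe 3.0+ (128b/130b encoding)."""
    return gt_per_s * 1e9 * lanes * (128 / 130) / 8


def max_keys_per_s(bandwidth_bytes_per_s: float, key_bytes: int) -> float:
    """Keys per second if every key must cross the bus exactly once."""
    return bandwidth_bytes_per_s / key_bytes


gen4_x16 = pcie_bandwidth_bytes_per_s(16.0, 16)  # PCIe 4.0: 16 GT/s per lane
keys = max_keys_per_s(gen4_x16, key_bytes=8)     # hypothetical 8-byte keys
print(f"{gen4_x16 / 1e9:.1f} GB/s -> {keys / 1e9:.2f} G keys/s at most")
```

If the filter can answer more queries per second than this on resident data, the copy rather than the lookup sets the pace, unless the transfer is overlapped with compute or each batch is reused across many queries.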
dgacmu•1mo ago
Kudos!
It would be interesting if in your performance analysis on the readme you also showed the false positive rate, assuming the memory use between the data structures you're comparing is identical.
tdortman•1mo ago
Sure thing, I added them for the two cases in the readme. There's a section in the thesis about the FPR for more fixed sizes if you're curious (spoiler: it's pretty much exactly in the middle, though notably higher than the CPU Cuckoo Filter, because really small buckets are bad for performance)
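To see why bucket size matters for the FPR, here's a minimal sketch using the textbook bounds for a cuckoo filter and a Bloom filter. The fingerprint width and bucket sizes below are hypothetical values picked for illustration, not the configuration from the thesis, and the formulas are the standard approximations rather than anything measured.

```python
import math


def cuckoo_fpr(fingerprint_bits: int, bucket_size: int) -> float:
    """Textbook bound: a lookup compares against up to 2*b stored fingerprints."""
    return 1.0 - (1.0 - 2.0 ** -fingerprint_bits) ** (2 * bucket_size)


def bloom_fpr(bits_per_item: float) -> float:
    """Bloom filter FPR with the optimal (real-valued) number of hash functions."""
    k = bits_per_item * math.log(2)
    return (1.0 - math.exp(-k / bits_per_item)) ** k


# Hypothetical configurations: same 16-bit fingerprint, different bucket sizes.
for b in (4, 16):
    print(f"cuckoo f=16 b={b}: {cuckoo_fpr(16, b):.2e}")
print(f"bloom 16 bits/item: {bloom_fpr(16):.2e}")
```

At a fixed fingerprint width the bound grows roughly linearly with the bucket size (about 2b / 2^f), so larger buckets trade some FPR for the other benefits they bring.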