Show HN: Extending RocksDB to Deduplicate Values

https://github.com/demajh/prestige
2•demajh•5h ago
A few times now I've needed to remove duplicate values from my data. Usually the data are higher-level objects like images or text blobs, and I end up writing a custom deduplication pipeline every time.

I got sick of doing this over and over, so I wrote a wrapper around RocksDB that deduplicates values after a Put() operation. It currently performs only exact deduplication, but I want to extend it in several ways, including semantic (fuzzy) deduplication for things like images and text.
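
To make "exact deduplication" concrete, here is a minimal sketch of one common way to do it: content addressing with reference counts. This is an assumption for illustration, not necessarily how prestige works internally. The class name `DedupStore`, its methods, and the two in-memory dicts (standing in for on-disk indexes) are all hypothetical, and SHA-256 is just one reasonable choice of content hash.

```python
import hashlib


class DedupStore:
    """Toy in-memory key-value store that keeps each distinct value once.

    Hypothetical illustration only: two dicts stand in for the on-disk
    indexes that a RocksDB-backed wrapper might use.
    """

    def __init__(self):
        self._key_to_digest = {}  # user key -> SHA-256 digest of its value
        self._objects = {}        # digest -> [value bytes, reference count]

    def put(self, key: str, value: bytes) -> None:
        digest = hashlib.sha256(value).digest()
        if self._key_to_digest.get(key) == digest:
            return  # same key, same content: nothing to do
        self._release(key)  # drop the reference held by any previous value
        entry = self._objects.setdefault(digest, [value, 0])
        entry[1] += 1  # one more key points at this content
        self._key_to_digest[key] = digest

    def get(self, key: str) -> bytes:
        # Raises KeyError if the key was never stored.
        return self._objects[self._key_to_digest[key]][0]

    def delete(self, key: str) -> None:
        self._release(key)

    def _release(self, key: str) -> None:
        digest = self._key_to_digest.pop(key, None)
        if digest is None:
            return
        entry = self._objects[digest]
        entry[1] -= 1
        if entry[1] == 0:
            del self._objects[digest]  # last reference gone, free the value

    def unique_values(self) -> int:
        return len(self._objects)


store = DedupStore()
store.put("a.png", b"same bytes")
store.put("b.png", b"same bytes")       # stored once, referenced twice
store.put("c.txt", b"different bytes")
```

In this sketch, identical values written under different keys share a single stored copy, and that copy is only freed when the last key referencing it is deleted or overwritten. Fuzzy deduplication would need a different index (for example, a similarity hash or embedding lookup) in place of the exact digest.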

Any feedback on the project would be appreciated:

https://github.com/demajh/prestige