Splitting a file into chunks, hashing them in parallel, and then hashing the resulting hashes is certainly a valid method but not the same as hashing a file the traditional way.
Unless the world changes how they publish hashes of files available for download, I don’t see the point.
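For anyone skimming, the scheme being discussed is roughly the following (a minimal Python sketch of my own; the 64 MiB chunk size, SHA-256, and the final hash-over-digests step are assumptions for illustration, not the submitter's actual code):

```python
import hashlib
import os
import sys
from concurrent.futures import ProcessPoolExecutor

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MiB per chunk (arbitrary illustrative choice)

def hash_chunk(args):
    """Hash one slice of the file, identified by (path, offset, length)."""
    path, offset, length = args
    h = hashlib.sha256()
    with open(path, "rb") as f:
        f.seek(offset)
        h.update(f.read(length))
    return h.digest()

def chunked_hash(path, chunk_size=CHUNK_SIZE):
    """Hash each chunk in parallel, then hash the concatenated chunk digests.
    Note: the result is NOT equal to sha256 over the whole file."""
    size = os.path.getsize(path)
    tasks = [(path, off, min(chunk_size, size - off))
             for off in range(0, size, chunk_size)]
    with ProcessPoolExecutor() as pool:
        digests = list(pool.map(hash_chunk, tasks))  # map preserves chunk order
    top = hashlib.sha256()
    for d in digests:
        top.update(d)
    return top.hexdigest()

if __name__ == "__main__":
    print(chunked_hash(sys.argv[1]))
```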
I_like_tomato•2mo ago
The reasoning here is to speed up computing the hash of a large file (let's say size > 100GB). Reading the file content sequentially and hashing it as a single stream takes a lot longer.
BobbyTables2•2mo ago
I agree, but there is no way to compute the equivalent of the sequential hash using any parallel method.
This isn’t like gzip, which can be parallelized.
Without standardization of a parallelized hash computation, it’s just a toy exercise in an embarrassingly parallel problem.
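A tiny example makes the mismatch concrete (toy Python snippet; the data and the split point are arbitrary):

```python
import hashlib

data = b"x" * (1 << 20)  # 1 MiB of arbitrary bytes
half = len(data) // 2

# Traditional sequential hash of the whole stream.
sequential = hashlib.sha256(data).hexdigest()

# Chunked scheme: hash each half, then hash the two digests.
d1 = hashlib.sha256(data[:half]).digest()
d2 = hashlib.sha256(data[half:]).digest()
chunked = hashlib.sha256(d1 + d2).hexdigest()

print(sequential == chunked)  # False: the two constructions are different functions
```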
dabiged•2mo ago
Why not use a faster hashing algorithm like xxhash?
This code is using sha256 which, whilst cryptographically secure, is a massive computational burden.
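For comparison, a streaming xxhash pass would look something like this (a sketch using the third-party xxhash Python package, not the submitter's code; it is fast but offers no cryptographic guarantees):

```python
import sys
import xxhash  # third-party package: pip install xxhash

def fast_file_hash(path, block_size=1 << 20):
    """Stream a file through xxh64 -- fast, but NOT cryptographically secure."""
    h = xxhash.xxh64()
    with open(path, "rb") as f:
        while block := f.read(block_size):
            h.update(block)
    return h.hexdigest()

if __name__ == "__main__":
    print(fast_file_hash(sys.argv[1]))
```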
I_like_tomato•2mo ago
Yup, I agree. I should update it to use a faster hashing algorithm. That is the next step.