datasketch/datasketch/minhash.py, line 12 in ebe4ca4
In this implementation of MinHash, the hasher appears to produce 32-bit values (`sha1_hash32`). Why, then, is `_max_hash = np.uint64((1 << 32) - 1)` declared as `np.uint64`?
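One possible reason for the wider type is intermediate arithmetic, not storage. Assume each permutation has the form `(a * x + b) mod p` applied to a 32-bit hash value `x`. In that case the product `a * x` can exceed 32 bits even though the reduced result fits. The sketch below is a minimal illustration under that assumption. The helper `permute`, the values of `a` and `b`, and the use of the Mersenne prime 2^31 - 1 as `p` are hypothetical and not taken from datasketch.

```python
import numpy as np

# Illustrative assumption: one permutation h(x) = (a * x + b) mod P applied
# to 32-bit hash values x. Names and values are hypothetical, not datasketch's.
P = (1 << 31) - 1  # Mersenne prime 2^31 - 1


def permute(hashvalues, a, b, dtype):
    """Compute (a * x + b) mod P elementwise, doing all arithmetic in `dtype`."""
    x = np.asarray(hashvalues, dtype=dtype)
    a = np.asarray(a, dtype=dtype)
    b = np.asarray(b, dtype=dtype)
    # NumPy integer arrays wrap around silently on overflow.
    return (a * x + b) % np.asarray(P, dtype=dtype)


hv = [(1 << 32) - 1]  # largest possible 32-bit hash value
a, b = 2, 1

wide = permute(hv, a, b, np.uint64)    # a * x fits in 64 bits: exact
narrow = permute(hv, a, b, np.uint32)  # a * x wraps modulo 2^32: wrong

exact = (a * hv[0] + b) % P            # Python ints never overflow
print(wide[0], narrow[0], exact)       # 3 1 3

# The reduced value is < P < 2^32, so it can be *stored* as uint32 losslessly.
stored = wide.astype(np.uint32)
print(stored.dtype, stored[0])         # uint32 3
```

Under this assumption, the 64-bit type matters for computing the permutation, while the reduced values themselves would fit in 32 bits.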
I tried experiments with `np.uint32` and the Mersenne prime `np.uint64((1 << 31) - 1)`, and there doesn't seem to be much difference in the results. If I understand correctly, using `np.uint32` would also halve memory consumption.
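The following minimal sketch illustrates the experiment and the memory claim under stated assumptions. Permutations are computed in `uint64` with the prime 2^31 - 1, and only the final signature is narrowed to `uint32`. The helpers `hash32`, `signature` and `estimate_jaccard`, the parameter arrays `A` and `B`, and `NUM_PERM = 128` are all hypothetical. `hash32` is only a stand-in in the spirit of `sha1_hash32`, not datasketch's code.

```python
import hashlib
import struct

import numpy as np

P = (1 << 31) - 1  # Mersenne prime 2^31 - 1 from the experiment above
NUM_PERM = 128     # assumed number of permutations, for illustration


def hash32(token: str) -> int:
    """32-bit hash: first 4 bytes of SHA-1 (a stand-in, not datasketch's code)."""
    return struct.unpack("<I", hashlib.sha1(token.encode("utf8")).digest()[:4])[0]


rng = np.random.default_rng(0)
# Hypothetical permutation parameters, shared by every signature.
A = rng.integers(1, P, size=NUM_PERM, dtype=np.uint64)
B = rng.integers(0, P, size=NUM_PERM, dtype=np.uint64)


def signature(tokens, store_dtype) -> np.ndarray:
    """MinHash signature computed in uint64, then stored as `store_dtype`."""
    sig = np.full(NUM_PERM, P, dtype=np.uint64)  # P is above every reduced value
    for tok in tokens:
        x = np.uint64(hash32(tok))
        # A < 2^31 and x < 2^32, so A * x + B < 2^64: no uint64 overflow.
        sig = np.minimum(sig, (A * x + B) % np.uint64(P))
    # Every entry is < P < 2^32, so narrowing to uint32 loses nothing.
    return sig.astype(store_dtype)


def estimate_jaccard(s1: np.ndarray, s2: np.ndarray) -> float:
    """Fraction of permutations whose minimum hash values agree."""
    return float(np.mean(s1 == s2))


set1 = {f"token{i}" for i in range(0, 100)}
set2 = {f"token{i}" for i in range(50, 150)}  # true Jaccard = 50 / 150

s1_64, s2_64 = signature(set1, np.uint64), signature(set2, np.uint64)
s1_32, s2_32 = signature(set1, np.uint32), signature(set2, np.uint32)

est64 = estimate_jaccard(s1_64, s2_64)
est32 = estimate_jaccard(s1_32, s2_32)
print(est64 == est32)              # True
print(s1_64.nbytes, s1_32.nbytes)  # 1024 512
```

In this sketch the two estimates are identical because the stored values are identical. Only the signature array shrinks by half. Whether other arrays in the real implementation could also be narrowed is a separate question.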
Is there a reason to insist on `np.uint64`?