Choice of np.uint64? #212

Open
@chris-ha458

Description

_max_hash = np.uint64((1 << 32) - 1)

This MinHash implementation appears to use a 32-bit hash function (sha1_hash32).
Why, then, is _max_hash = np.uint64((1 << 32) - 1) declared as np.uint64?
I ran experiments using np.uint32 with the Mersenne prime np.uint64((1 << 31) - 1), and there isn't much difference in the results.
If I understand correctly, this would also halve the memory used by the stored hash values.
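
To make the memory claim concrete, here is a minimal sketch comparing the size of a signature array stored as np.uint64 versus np.uint32. The value `NUM_PERM = 128` and the helper `signature_nbytes` are hypothetical names chosen for illustration; they are not taken from the library. The sketch only measures storage and says nothing about whether intermediate arithmetic still needs 64 bits.

```python
import numpy as np

NUM_PERM = 128  # hypothetical number of permutations, for illustration only


def signature_nbytes(dtype, num_perm=NUM_PERM):
    """Return the number of bytes used by one signature array of `dtype`."""
    return np.zeros(num_perm, dtype=dtype).nbytes


bytes_64 = signature_nbytes(np.uint64)
bytes_32 = signature_nbytes(np.uint32)
print(bytes_64, bytes_32)  # 1024 512

# Any value bounded by (1 << 32) - 1 fits in an unsigned 32-bit integer.
max_hash = (1 << 32) - 1
fits_in_uint32 = max_hash <= np.iinfo(np.uint32).max
print(fits_in_uint32)  # True
```

Under these assumptions, the stored array shrinks by half, since each element takes 4 bytes instead of 8.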

Is there a reason to insist on np.uint64?
