
Choice of np.uint64? #212

Open
@chris-ha458

`_max_hash = np.uint64((1 << 32) - 1)`

In this implementation of MinHash, the hasher appears to produce 32-bit values (`sha1_hash32`).
Why, then, does `_max_hash = np.uint64((1 << 32) - 1)` use `np.uint64`?
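As a quick sanity check (a minimal sketch, not code from the library), the constant `(1 << 32) - 1` is exactly the largest `uint32` value, so the 32-bit hash values themselves would fit in `np.uint32`:

```python
import numpy as np

max_hash_64 = np.uint64((1 << 32) - 1)  # the constant as written in the issue
max_hash_32 = np.uint32((1 << 32) - 1)  # the same value stored in 32 bits

# (1 << 32) - 1 == 4294967295 is exactly the largest uint32 value.
print(np.iinfo(np.uint32).max)               # 4294967295
print(int(max_hash_64) == int(max_hash_32))  # True
```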
I experimented with `np.uint32` and the Mersenne prime `np.uint64((1 << 31) - 1)`, and there doesn't seem to be much difference in the results.
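One possible reason for the wider type is the intermediate arithmetic rather than the stored values. The sketch below assumes the permutation step has the universal-hashing form `(a * h + b) % p`; the coefficients `a`, `b` and the input `h` are hypothetical values chosen only for illustration. In `uint32`, the product `a * h` silently wraps modulo 2**32 before `% p` is applied, so the result drifts from the exact value, while `uint64` has enough headroom for this particular product:

```python
import numpy as np

# Hypothetical coefficients and input, for illustration only.
p = (1 << 31) - 1          # Mersenne prime 2**31 - 1
a, b = 1_000_003, 12_345   # illustrative coefficients below p
h = 4_000_000_000          # a 32-bit hash value (< 2**32)

# Exact result using Python's arbitrary-precision integers.
exact = (a * h + b) % p

# uint32: a * h exceeds 2**32 and silently wraps modulo 2**32.
hv = np.array([h], dtype=np.uint32)
wrapped = (np.uint32(a) * hv + np.uint32(b)) % np.uint32(p)

# uint64: a * h is about 4e15, well below 2**64, so no wraparound.
hv64 = hv.astype(np.uint64)
wide = (np.uint64(a) * hv64 + np.uint64(b)) % np.uint64(p)

print(exact)            # 1584927795
print(int(wrapped[0]))  # 1583065145
print(int(wide[0]))     # 1584927795
```

Whether this matters in practice depends on how large the real coefficients are: if they can be much larger than 2**32, even a `uint64` product could overflow.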
If I understand correctly, this would also halve memory consumption.
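The memory claim is easy to check: each `uint64` element takes 8 bytes and each `uint32` element takes 4. In the sketch below, `num_perm = 128` is an assumed signature length. It also shows that values already bounded by 2**32 - 1 can be narrowed to `uint32` without loss, so one option (an assumption, not something the implementation is known to do) would be to compute in `uint64` and store in `uint32`:

```python
import numpy as np

num_perm = 128  # assumed signature length (number of permutations)
max_hash = (1 << 32) - 1

# Example signatures filled with the largest 32-bit value.
sig64 = np.full(num_perm, max_hash, dtype=np.uint64)
sig32 = np.full(num_perm, max_hash, dtype=np.uint32)

print(sig64.nbytes, sig32.nbytes)  # 1024 512

# Values bounded by 2**32 - 1 survive a round trip through uint32 unchanged.
narrowed = sig64.astype(np.uint32)
print(bool((narrowed.astype(np.uint64) == sig64).all()))  # True
```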

Is there a reason to insist on `np.uint64`?
