Open
Description
opened on Nov 20, 2024
CAGRA has been observed to yield low recall when filtering is enabled, especially when a large fraction of the dataset is filtered out. This may be related in part to #208 and #472, but there may also be more fundamental reasons for the lower recall.
This feature request tracks progress and suggestions toward high-recall CAGRA search under strong filtering.
As an experiment, I suggest trying the following tweaks, enabled by a boolean search parameter:
- Disable the maximum search iterations limit to allow longer search
- Replace the hashmap used to track visited nodes with a bitset that has one bit per dataset row. Swapping the small hashmap for such a bitset eliminates hash collisions (and thus false positives) and prevents CAGRA from stopping early.
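
To illustrate the second point, here is a minimal Python sketch of how a small, lossy visited table can report false positives while a dataset-long bitset cannot. This is not CAGRA's actual hashmap: `LossyVisitedTable`, `VisitedBitset`, and `count_false_positives` are hypothetical names, and the one-flag-per-bucket table is an assumed stand-in for any visited structure that is smaller than the dataset and can confuse two node ids.

```python
class LossyVisitedTable:
    """Hypothetical small visited table: one flag per bucket, keys are not stored.

    Assumption for illustration only; this is not CAGRA's hashmap layout.
    """

    def __init__(self, num_buckets):
        self.flags = bytearray(num_buckets)

    def test_and_set(self, node_id):
        # Two different ids that share a bucket are indistinguishable.
        bucket = node_id % len(self.flags)
        was_set = self.flags[bucket] == 1
        self.flags[bucket] = 1
        return was_set


class VisitedBitset:
    """One bit per dataset row, so every node id has its own slot."""

    def __init__(self, dataset_size):
        self.bits = bytearray((dataset_size + 7) // 8)

    def test_and_set(self, node_id):
        byte_index = node_id >> 3
        mask = 1 << (node_id & 7)
        was_set = bool(self.bits[byte_index] & mask)
        self.bits[byte_index] |= mask
        return was_set


def count_false_positives(tracker, node_ids):
    """Count ids reported as visited although they were never inserted."""
    truly_seen = set()
    false_positives = 0
    for node_id in node_ids:
        reported_visited = tracker.test_and_set(node_id)
        if reported_visited and node_id not in truly_seen:
            false_positives += 1
        truly_seen.add(node_id)
    return false_positives


if __name__ == "__main__":
    dataset_size = 1000
    candidates = list(range(0, dataset_size, 7))  # 143 distinct node ids
    print(count_false_positives(LossyVisitedTable(64), candidates))
    print(count_false_positives(VisitedBitset(dataset_size), candidates))
```

In a graph search, each false positive is a candidate that gets skipped without ever being evaluated, which matters more when most candidates are already rejected by the filter. The cost of the bitset is memory proportional to the dataset: one bit per row, so roughly 125 MB per query for a billion rows.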
Status
Todo