
[BUG] Decreasing recall when increasing threads in ANN benchmark  #208

Open
@abc99lr

Description

Describe the bug

Throughput mode in the ANN benchmark is supposed to increase QPS for small-batch queries without impacting recall. However, I found that increasing the number of threads in throughput mode decreases the achieved recall. The drop is not huge, but it is noticeable.
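For context, here is a minimal conceptual sketch of what throughput mode does as I understand it (the names `run_throughput_mode` and `search_fn` are illustrative, not the benchmark's actual code): several host threads each issue batch-size-1 searches concurrently, so QPS scales with the thread count while each individual search, and therefore recall, should be unaffected.

```python
from concurrent.futures import ThreadPoolExecutor

def run_throughput_mode(search_fn, queries, n_threads):
    # search_fn(query) -> neighbor ids; assumed to be thread-safe.
    # Each thread pulls batch-size-1 queries off the shared iterator, so
    # throughput grows with n_threads, but every search is identical to
    # the single-threaded case -- recall should not change.
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(search_fn, queries))
```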

Steps/Code to reproduce bug

```bash
docker run --gpus all --rm -it -u $(id -u) --entrypoint /bin/bash --privileged rapidsai/raft-ann-bench:24.08a-cuda12.2-py3.10
python -m raft_ann_bench.run --dataset wiki_all_1M --algorithms raft_cagra --search -k 100 -bs 1 -m throughput --configuration <path_to_config> --dataset-path <path_to_data>
```

I also mounted volumes for the input data and the config file in the docker run command above.

When using throughput mode, ANN bench shmoos the number of search threads by default, sweeping powers of two between min=1 and max=<num hyperthreads>.
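A minimal sketch of that enumeration (my assumption about the default behavior, not the actual raft_ann_bench implementation):

```python
import os

def default_thread_counts(max_threads=None):
    # Powers of two from 1 up to the logical core (hyperthread) count.
    max_threads = max_threads or os.cpu_count()
    counts, t = [], 1
    while t <= max_threads:
        counts.append(t)
        t *= 2
    return counts

print(default_thread_counts(32))  # [1, 2, 4, 8, 16, 32]
```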

I am using the wiki-1M dataset (768 dimensions). Here is my configuration file for CAGRA:

```yaml
name: raft_cagra
constraints:
  build: raft_ann_bench.constraints.raft_cagra_build_constraints
  search: raft_ann_bench.constraints.raft_cagra_search_constraints
groups:
  base:
    build:
      graph_degree: [32]
      intermediate_graph_degree: [64]
      graph_build_algo: ["NN_DESCENT"]
      dataset_memory_type: ["mmap"]
    search:
      itopk: [512]
      search_width: [1]
      algo: ["multi_cta"]
```

I got the following results; you can see recall decreasing as the thread count grows. I also tried adding --benchmark_min_time=10000x to ensure each thread runs 10k iterations (the total number of queries), but it didn't fix the issue.

```
-- Using RAFT bench found in conda environment.
2024-07-01 22:59:50 [info] Using the query file '../data/wiki_all_1M/queries.fbin'
2024-07-01 22:59:50 [info] Using the ground truth file '../data/wiki_all_1M/groundtruth.1M.neighbors.ibin'
2024-07-01T22:59:50+00:00
Running /opt/conda/bin/ann/RAFT_CAGRA_ANN_BENCH
Run on (32 X 3200 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x16)
  L1 Instruction 32 KiB (x16)
  L2 Unified 256 KiB (x16)
  L3 Unified 20480 KiB (x2)
Load Average: 0.25, 0.20, 0.18
dataset: wiki_all_1M
dim: 768
distance: euclidean
gpu_driver_version: 12.4
gpu_mem_bus_width: 256
gpu_mem_freq: 5001000000.000000
gpu_mem_global_size: 15642329088
gpu_mem_shared_size: 65536
gpu_name: Tesla T4
gpu_runtime_version: 12.2
gpu_sm_count: 40
gpu_sm_freq: 1590000000.000000
max_k: 100
max_n_queries: 10000
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                                                              Time             CPU   Iterations        GPU    Latency     Recall end_to_end items_per_second      itopk          k  n_queries search_width total_queries
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
raft_cagra.graph_degree32.intermediate_graph_degree64.graph_build_algoNN_DESCENT.dataset_memory_typemmap/process_time/real_time/threads:1       0.285 ms        0.284 ms        10000   278.171u   284.522u   0.985367    2.84522       3.51467k/s        512        100          1            1           10k algo="multi_cta"
raft_cagra.graph_degree32.intermediate_graph_degree64.graph_build_algoNN_DESCENT.dataset_memory_typemmap/process_time/real_time/threads:2       0.187 ms        0.374 ms        20000   366.896u   375.096u   0.985735    3.75144       5.34108k/s        512        100          1            1           20k algo="multi_cta"
raft_cagra.graph_degree32.intermediate_graph_degree64.graph_build_algoNN_DESCENT.dataset_memory_typemmap/process_time/real_time/threads:4       0.158 ms        0.631 ms        40000   623.354u   634.679u   0.967445    6.34719       6.32244k/s        512        100          1            1           40k algo="multi_cta"
raft_cagra.graph_degree32.intermediate_graph_degree64.graph_build_algoNN_DESCENT.dataset_memory_typemmap/process_time/real_time/threads:8       0.157 ms         1.24 ms        80000   1.23527m   1.26796m   0.949066    12.6799       6.37292k/s        512        100          1            1           80k algo="multi_cta"
raft_cagra.graph_degree32.intermediate_graph_degree64.graph_build_algoNN_DESCENT.dataset_memory_typemmap/process_time/real_time/threads:16      0.159 ms         2.53 ms       160000   1.36368m   2.55572m   0.945477    25.5594       6.27904k/s        512        100          1            1          160k algo="multi_cta"
raft_cagra.graph_degree32.intermediate_graph_degree64.graph_build_algoNN_DESCENT.dataset_memory_typemmap/process_time/real_time/threads:32      0.162 ms         5.15 ms       320000   1.42218m   5.20363m   0.945503    52.0375       6.17188k/s        512        100          1            1          320k algo="multi_cta"
```
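For reference, this is how I understand the recall@k in the Recall column to be computed: the fraction of the true k nearest neighbors recovered, averaged over queries. Below is a hedged sketch, assuming the standard .ibin layout (an int32 rows/cols header followed by int32 data); `results.ibin` is a hypothetical file holding the returned neighbor ids, not something the benchmark writes under that name.

```python
import numpy as np

def load_ibin(path):
    # Assumed .ibin layout: two int32 header fields (rows, cols), then int32 data.
    with open(path, "rb") as f:
        rows, cols = np.fromfile(f, dtype=np.int32, count=2)
        return np.fromfile(f, dtype=np.int32).reshape(rows, cols)

def recall_at_k(found, truth, k):
    # Average over queries of |found top-k intersect true top-k| / k.
    hits = sum(len(np.intersect1d(found[i, :k], truth[i, :k]))
               for i in range(found.shape[0]))
    return hits / (found.shape[0] * k)

truth = load_ibin("../data/wiki_all_1M/groundtruth.1M.neighbors.ibin")
found = load_ibin("results.ibin")  # hypothetical: ids returned by the search
print(recall_at_k(found, truth, k=100))
```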

Expected behavior

Recall should not decrease as the number of search threads increases.

Environment details:

  • Environment location: Docker
  • Method of RAFT install: Docker
    • docker pull & docker run commands used: provided above
