
[BUG] Decreasing recall when increasing threads in ANN benchmark  #208

Open
@abc99lr

Description

Describe the bug

Throughput mode in the ANN benchmark is supposed to increase QPS for small-batch queries without impacting recall. However, I found that increasing the number of threads in throughput mode decreases the achieved recall. The drop is not huge, but it is noticeable.
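For context, here is a minimal conceptual sketch of what throughput mode does as I understand it (the names `run_throughput_mode` and `search_fn` are illustrative, not the benchmark's actual code): several host threads each issue batch-size-1 searches concurrently, so QPS scales with the thread count while each individual search, and therefore recall, should be unaffected.

```python
from concurrent.futures import ThreadPoolExecutor

def run_throughput_mode(search_fn, queries, n_threads):
    # search_fn(query) -> neighbor ids; assumed to be thread-safe.
    # Each thread pulls batch-size-1 queries off the shared iterator, so
    # throughput grows with n_threads, but every search is identical to
    # the single-threaded case -- recall should not change.
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(search_fn, queries))
```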

Steps/Code to reproduce bug

```bash
docker run --gpus all --rm -it -u $(id -u) --entrypoint /bin/bash --privileged rapidsai/raft-ann-bench:24.08a-cuda12.2-py3.10
python -m raft_ann_bench.run --dataset wiki_all_1M --algorithms raft_cagra --search -k 100 -bs 1 -m throughput --configuration <path_to_config> --dataset-path <path_to_data>
```

I also mounted volumes for the input data and the config file in the docker run command above.

When using throughput mode, ANN bench shmoos the number of search threads by default, sweeping powers of two between min=1 and max=<num hyperthreads>.
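A minimal sketch of that enumeration (my assumption about the default behavior, not the actual raft_ann_bench implementation):

```python
import os

def default_thread_counts(max_threads=None):
    # Powers of two from 1 up to the logical core (hyperthread) count.
    max_threads = max_threads or os.cpu_count()
    counts, t = [], 1
    while t <= max_threads:
        counts.append(t)
        t *= 2
    return counts

print(default_thread_counts(32))  # [1, 2, 4, 8, 16, 32]
```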

I am using the wiki-1M dataset (768 dimensions). Here is my configuration file for CAGRA:

```yaml
name: raft_cagra
constraints:
  build: raft_ann_bench.constraints.raft_cagra_build_constraints
  search: raft_ann_bench.constraints.raft_cagra_search_constraints
groups:
  base:
    build:
      graph_degree: [32]
      intermediate_graph_degree: [64]
      graph_build_algo: ["NN_DESCENT"]
      dataset_memory_type: ["mmap"]
    search:
      itopk: [512]
      search_width: [1]
      algo: ["multi_cta"]
```

I got the following results; you can see recall decreasing as the thread count grows. I also tried adding --benchmark_min_time=10000x to ensure each thread runs 10k iterations (the total number of queries), but it didn't fix the issue.

```
-- Using RAFT bench found in conda environment.
2024-07-01 22:59:50 [info] Using the query file '../data/wiki_all_1M/queries.fbin'
2024-07-01 22:59:50 [info] Using the ground truth file '../data/wiki_all_1M/groundtruth.1M.neighbors.ibin'
2024-07-01T22:59:50+00:00
Running /opt/conda/bin/ann/RAFT_CAGRA_ANN_BENCH
Run on (32 X 3200 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x16)
  L1 Instruction 32 KiB (x16)
  L2 Unified 256 KiB (x16)
  L3 Unified 20480 KiB (x2)
Load Average: 0.25, 0.20, 0.18
dataset: wiki_all_1M
dim: 768
distance: euclidean
gpu_driver_version: 12.4
gpu_mem_bus_width: 256
gpu_mem_freq: 5001000000.000000
gpu_mem_global_size: 15642329088
gpu_mem_shared_size: 65536
gpu_name: Tesla T4
gpu_runtime_version: 12.2
gpu_sm_count: 40
gpu_sm_freq: 1590000000.000000
max_k: 100
max_n_queries: 10000
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                                                              Time             CPU   Iterations        GPU    Latency     Recall end_to_end items_per_second      itopk          k  n_queries search_width total_queries
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
raft_cagra.graph_degree32.intermediate_graph_degree64.graph_build_algoNN_DESCENT.dataset_memory_typemmap/process_time/real_time/threads:1       0.285 ms        0.284 ms        10000   278.171u   284.522u   0.985367    2.84522       3.51467k/s        512        100          1            1           10k algo="multi_cta"
raft_cagra.graph_degree32.intermediate_graph_degree64.graph_build_algoNN_DESCENT.dataset_memory_typemmap/process_time/real_time/threads:2       0.187 ms        0.374 ms        20000   366.896u   375.096u   0.985735    3.75144       5.34108k/s        512        100          1            1           20k algo="multi_cta"
raft_cagra.graph_degree32.intermediate_graph_degree64.graph_build_algoNN_DESCENT.dataset_memory_typemmap/process_time/real_time/threads:4       0.158 ms        0.631 ms        40000   623.354u   634.679u   0.967445    6.34719       6.32244k/s        512        100          1            1           40k algo="multi_cta"
raft_cagra.graph_degree32.intermediate_graph_degree64.graph_build_algoNN_DESCENT.dataset_memory_typemmap/process_time/real_time/threads:8       0.157 ms         1.24 ms        80000   1.23527m   1.26796m   0.949066    12.6799       6.37292k/s        512        100          1            1           80k algo="multi_cta"
raft_cagra.graph_degree32.intermediate_graph_degree64.graph_build_algoNN_DESCENT.dataset_memory_typemmap/process_time/real_time/threads:16      0.159 ms         2.53 ms       160000   1.36368m   2.55572m   0.945477    25.5594       6.27904k/s        512        100          1            1          160k algo="multi_cta"
raft_cagra.graph_degree32.intermediate_graph_degree64.graph_build_algoNN_DESCENT.dataset_memory_typemmap/process_time/real_time/threads:32      0.162 ms         5.15 ms       320000   1.42218m   5.20363m   0.945503    52.0375       6.17188k/s        512        100          1            1          320k algo="multi_cta"
```
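For reference, this is how I understand the recall@k in the Recall column to be computed: the fraction of the true k nearest neighbors recovered, averaged over queries. Below is a hedged sketch, assuming the standard .ibin layout (an int32 rows/cols header followed by int32 data); `results.ibin` is a hypothetical file holding the returned neighbor ids, not something the benchmark writes under that name.

```python
import numpy as np

def load_ibin(path):
    # Assumed .ibin layout: two int32 header fields (rows, cols), then int32 data.
    with open(path, "rb") as f:
        rows, cols = np.fromfile(f, dtype=np.int32, count=2)
        return np.fromfile(f, dtype=np.int32).reshape(rows, cols)

def recall_at_k(found, truth, k):
    # Average over queries of |found top-k intersect true top-k| / k.
    hits = sum(len(np.intersect1d(found[i, :k], truth[i, :k]))
               for i in range(found.shape[0]))
    return hits / (found.shape[0] * k)

truth = load_ibin("../data/wiki_all_1M/groundtruth.1M.neighbors.ibin")
found = load_ibin("results.ibin")  # hypothetical: ids returned by the search
print(recall_at_k(found, truth, k=100))
```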

Expected behavior

Recall should not decrease as the number of search threads increases.

Environment details:

  • Environment location: Docker
  • Method of RAFT install: Docker
    • docker pull & docker run commands used: provided above
