Description
Describe the bug
The throughput mode in the ANN benchmark is supposed to increase QPS for small-batch queries without affecting the recall level. However, I found that increasing the number of threads in throughput mode decreases the achieved recall. The decrease is not huge but it is noticeable: in the results below, recall goes from about 0.985 with 1 or 2 threads to about 0.945 with 16 or 32 threads.
Steps/Code to reproduce bug
docker run --gpus all --rm -it -u $(id -u) --entrypoint /bin/bash --privileged rapidsai/raft-ann-bench:24.08a-cuda12.2-py3.10
python -m raft_ann_bench.run --dataset wiki_all_1M --algorithms raft_cagra --search -k 100 -bs 1 -m throughput --configuration <path_to_config> --dataset-path <path_to_data>
I also mounted volumes for the input data and the config file when doing docker run.
In throughput mode, ANN bench sweeps (shmoos) the number of search threads by default. The default sweep is the powers of two between min=1 and max=<num hyperthreads>.
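As an illustration of that default sweep, here is a minimal Python sketch that lists the powers of two between a minimum and a maximum thread count. The helper name thread_sweep is hypothetical and this is not the benchmark's own code; in particular, how the real sweep treats a maximum that is not a power of two is not modeled here.

```python
import os


def thread_sweep(min_threads=1, max_threads=None):
    """Return the powers of two in [min_threads, max_threads] (hypothetical helper)."""
    if max_threads is None:
        # Assumption: the logical CPU count stands in for <num hyperthreads>.
        max_threads = os.cpu_count() or 1
    counts = []
    t = 1
    while t <= max_threads:
        if t >= min_threads:
            counts.append(t)
        t *= 2
    return counts


print(thread_sweep(1, 32))  # [1, 2, 4, 8, 16, 32]
```

On the 32-CPU machine in the log below, this gives the same thread counts that appear in the results (threads:1 through threads:32).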
I am using the wiki_all_1M dataset with 768 dimensions. Here is my configuration file for CAGRA:
name: raft_cagra
constraints:
  build: raft_ann_bench.constraints.raft_cagra_build_constraints
  search: raft_ann_bench.constraints.raft_cagra_search_constraints
groups:
  base:
    build:
      graph_degree: [32]
      intermediate_graph_degree: [64]
      graph_build_algo: ["NN_DESCENT"]
      dataset_memory_type: ["mmap"]
    search:
      itopk: [512]
      search_width: [1]
      algo: ["multi_cta"]
I got the following results, which show the recall decreasing as the thread count grows. I also tried adding --benchmark_min_time=10000x to ensure each thread runs 10k iterations (the total number of queries), but it didn't fix the issue.
-- Using RAFT bench found in conda environment.
2024-07-01 22:59:50 [info] Using the query file '../data/wiki_all_1M/queries.fbin'
2024-07-01 22:59:50 [info] Using the ground truth file '../data/wiki_all_1M/groundtruth.1M.neighbors.ibin'
2024-07-01T22:59:50+00:00
Running /opt/conda/bin/ann/RAFT_CAGRA_ANN_BENCH
Run on (32 X 3200 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x16)
L1 Instruction 32 KiB (x16)
L2 Unified 256 KiB (x16)
L3 Unified 20480 KiB (x2)
Load Average: 0.25, 0.20, 0.18
dataset: wiki_all_1M
dim: 768
distance: euclidean
gpu_driver_version: 12.4
gpu_mem_bus_width: 256
gpu_mem_freq: 5001000000.000000
gpu_mem_global_size: 15642329088
gpu_mem_shared_size: 65536
gpu_name: Tesla T4
gpu_runtime_version: 12.2
gpu_sm_count: 40
gpu_sm_freq: 1590000000.000000
max_k: 100
max_n_queries: 10000
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations GPU Latency Recall end_to_end items_per_second itopk k n_queries search_width total_queries
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
raft_cagra.graph_degree32.intermediate_graph_degree64.graph_build_algoNN_DESCENT.dataset_memory_typemmap/process_time/real_time/threads:1 0.285 ms 0.284 ms 10000 278.171u 284.522u 0.985367 2.84522 3.51467k/s 512 100 1 1 10k algo="multi_cta"
raft_cagra.graph_degree32.intermediate_graph_degree64.graph_build_algoNN_DESCENT.dataset_memory_typemmap/process_time/real_time/threads:2 0.187 ms 0.374 ms 20000 366.896u 375.096u 0.985735 3.75144 5.34108k/s 512 100 1 1 20k algo="multi_cta"
raft_cagra.graph_degree32.intermediate_graph_degree64.graph_build_algoNN_DESCENT.dataset_memory_typemmap/process_time/real_time/threads:4 0.158 ms 0.631 ms 40000 623.354u 634.679u 0.967445 6.34719 6.32244k/s 512 100 1 1 40k algo="multi_cta"
raft_cagra.graph_degree32.intermediate_graph_degree64.graph_build_algoNN_DESCENT.dataset_memory_typemmap/process_time/real_time/threads:8 0.157 ms 1.24 ms 80000 1.23527m 1.26796m 0.949066 12.6799 6.37292k/s 512 100 1 1 80k algo="multi_cta"
raft_cagra.graph_degree32.intermediate_graph_degree64.graph_build_algoNN_DESCENT.dataset_memory_typemmap/process_time/real_time/threads:16 0.159 ms 2.53 ms 160000 1.36368m 2.55572m 0.945477 25.5594 6.27904k/s 512 100 1 1 160k algo="multi_cta"
raft_cagra.graph_degree32.intermediate_graph_degree64.graph_build_algoNN_DESCENT.dataset_memory_typemmap/process_time/real_time/threads:32 0.162 ms 5.15 ms 320000 1.42218m 5.20363m 0.945503 52.0375 6.17188k/s 512 100 1 1 320k algo="multi_cta"
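To make the trend easier to read, the following Python sketch computes the recall loss at each thread count relative to the single-thread run. The numbers are copied from the Recall column above; the names recall_by_threads and recall_drop are only illustrative.

```python
# Recall per thread count, copied from the Recall column of the results above.
recall_by_threads = {
    1: 0.985367,
    2: 0.985735,
    4: 0.967445,
    8: 0.949066,
    16: 0.945477,
    32: 0.945503,
}

baseline = recall_by_threads[1]


def recall_drop(threads):
    """Absolute recall loss relative to the single-thread run."""
    return baseline - recall_by_threads[threads]


for threads in sorted(recall_by_threads):
    print(
        f"threads={threads:>2}  recall={recall_by_threads[threads]:.6f}  "
        f"drop={recall_drop(threads):+.6f}"
    )
```

Recall is essentially unchanged from 1 to 2 threads, then falls by roughly 0.04 by 16 threads and stays at that level with 32 threads.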
Expected behavior
The recall level should not decrease.
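The reasoning behind this expectation: recall@k, in its usual definition, depends only on the neighbor ids returned for each query and on the ground truth, so the same index and search parameters would be expected to give the same recall no matter how many threads submit queries. Below is a minimal sketch of that definition with hypothetical names and toy data; the benchmark's own implementation may differ in details.

```python
def recall_at_k(found, truth, k):
    """Fraction of the true top-k neighbors that appear in the returned top-k."""
    hits = 0
    for found_row, truth_row in zip(found, truth):
        # Order within the top-k does not matter, only membership.
        hits += len(set(found_row[:k]) & set(truth_row[:k]))
    return hits / (len(truth) * k)


# Toy data (not from the benchmark): two queries, k = 3.
truth = [[0, 1, 2], [3, 4, 5]]
found = [[0, 2, 9], [5, 4, 3]]
print(recall_at_k(found, truth, k=3))  # 5 of the 6 true neighbors were found
```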
Environment details (please complete the following information):
- Environment location: [Bare-metal, Docker, Cloud(specify cloud provider)]: Docker
- Method of RAFT install: [conda, Docker, or from source]: Docker
- If method of install is [Docker], provide docker pull & docker run commands used: provided above