Description
Describe the bug
The throughput mode in the ANN benchmark is supposed to increase QPS for small-batch queries without impacting recall. However, I found that increasing the number of threads in throughput mode decreases the achieved recall. The decrease is not huge, but it is noticeable.
Steps/Code to reproduce bug
```shell
docker run --gpus all --rm -it -u $(id -u) --entrypoint /bin/bash --privileged rapidsai/raft-ann-bench:24.08a-cuda12.2-py3.10
python -m raft_ann_bench.run --dataset wiki_all_1M --algorithms raft_cagra --search -k 100 -bs 1 -m throughput --configuration <path_to_config> --dataset-path <path_to_data>
```
I also mounted volumes for the input data and config file in the `docker run` command.
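For completeness, the mounts look roughly like this (the host paths `/host/wiki_data` and `/host/cagra.yaml` are placeholders, not my exact paths):

```shell
# Example docker run with the dataset and config volumes mounted
# (host paths are placeholders; adjust to your local layout).
docker run --gpus all --rm -it -u $(id -u) --entrypoint /bin/bash --privileged \
  -v /host/wiki_data:/data \
  -v /host/cagra.yaml:/config/cagra.yaml \
  rapidsai/raft-ann-bench:24.08a-cuda12.2-py3.10
```

Inside the container, the mounted paths are then passed as `--dataset-path /data` and `--configuration /config/cagra.yaml`.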
In throughput mode, ANN bench sweeps (shmoos) the number of search threads by default, using powers of two between min=1 and max=<num hyperthreads>.
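The default thread sweep can be sketched as follows (this is my own minimal reproduction of the described default, not the benchmark's actual code; `thread_sweep` is a hypothetical name):

```python
import os

def thread_sweep(max_threads=None):
    """Powers of two from 1 up to the hyperthread count,
    mirroring the default thread shmoo described for throughput mode."""
    if max_threads is None:
        max_threads = os.cpu_count()  # logical CPUs (hyperthreads)
    counts = []
    t = 1
    while t <= max_threads:
        counts.append(t)
        t *= 2
    return counts

print(thread_sweep(32))  # [1, 2, 4, 8, 16, 32]
```

On my 32-hyperthread machine this matches the thread counts seen in the results below: 1, 2, 4, 8, 16, 32.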
I am using the wiki-1M dataset with 768 dimensions. Here is my configuration file for CAGRA:
```yaml
name: raft_cagra
constraints:
  build: raft_ann_bench.constraints.raft_cagra_build_constraints
  search: raft_ann_bench.constraints.raft_cagra_search_constraints
groups:
  base:
    build:
      graph_degree: [32]
      intermediate_graph_degree: [64]
      graph_build_algo: ["NN_DESCENT"]
      dataset_memory_type: ["mmap"]
    search:
      itopk: [512]
      search_width: [1]
      algo: ["multi_cta"]
```
I got the following results, where you can see the recall decreasing as the thread count grows. I also tried adding `--benchmark_min_time=10000x` to ensure each thread runs 10k iterations (the total number of queries), but it didn't fix the issue.
```
-- Using RAFT bench found in conda environment.
2024-07-01 22:59:50 [info] Using the query file '../data/wiki_all_1M/queries.fbin'
2024-07-01 22:59:50 [info] Using the ground truth file '../data/wiki_all_1M/groundtruth.1M.neighbors.ibin'
2024-07-01T22:59:50+00:00
Running /opt/conda/bin/ann/RAFT_CAGRA_ANN_BENCH
Run on (32 X 3200 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x16)
  L1 Instruction 32 KiB (x16)
  L2 Unified 256 KiB (x16)
  L3 Unified 20480 KiB (x2)
Load Average: 0.25, 0.20, 0.18
dataset: wiki_all_1M
dim: 768
distance: euclidean
gpu_driver_version: 12.4
gpu_mem_bus_width: 256
gpu_mem_freq: 5001000000.000000
gpu_mem_global_size: 15642329088
gpu_mem_shared_size: 65536
gpu_name: Tesla T4
gpu_runtime_version: 12.2
gpu_sm_count: 40
gpu_sm_freq: 1590000000.000000
max_k: 100
max_n_queries: 10000
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                                                                  Time             CPU   Iterations        GPU     Latency    Recall end_to_end items_per_second itopk   k n_queries search_width total_queries
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
raft_cagra.graph_degree32.intermediate_graph_degree64.graph_build_algoNN_DESCENT.dataset_memory_typemmap/process_time/real_time/threads:1  0.285 ms 0.284 ms 10000  278.171u 284.522u 0.985367  2.84522 3.51467k/s 512 100 1 1  10k algo="multi_cta"
raft_cagra.graph_degree32.intermediate_graph_degree64.graph_build_algoNN_DESCENT.dataset_memory_typemmap/process_time/real_time/threads:2  0.187 ms 0.374 ms 20000  366.896u 375.096u 0.985735  3.75144 5.34108k/s 512 100 1 1  20k algo="multi_cta"
raft_cagra.graph_degree32.intermediate_graph_degree64.graph_build_algoNN_DESCENT.dataset_memory_typemmap/process_time/real_time/threads:4  0.158 ms 0.631 ms 40000  623.354u 634.679u 0.967445  6.34719 6.32244k/s 512 100 1 1  40k algo="multi_cta"
raft_cagra.graph_degree32.intermediate_graph_degree64.graph_build_algoNN_DESCENT.dataset_memory_typemmap/process_time/real_time/threads:8  0.157 ms 1.24 ms  80000  1.23527m 1.26796m 0.949066 12.6799 6.37292k/s 512 100 1 1  80k algo="multi_cta"
raft_cagra.graph_degree32.intermediate_graph_degree64.graph_build_algoNN_DESCENT.dataset_memory_typemmap/process_time/real_time/threads:16 0.159 ms 2.53 ms 160000  1.36368m 2.55572m 0.945477 25.5594 6.27904k/s 512 100 1 1 160k algo="multi_cta"
raft_cagra.graph_degree32.intermediate_graph_degree64.graph_build_algoNN_DESCENT.dataset_memory_typemmap/process_time/real_time/threads:32 0.162 ms 5.15 ms 320000  1.42218m 5.20363m 0.945503 52.0375 6.17188k/s 512 100 1 1 320k algo="multi_cta"
```
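For reference, the Recall column above is the standard recall@k: the fraction of the true top-k neighbors that the search actually returned. A minimal sketch of that computation (function name and toy data are mine, not the benchmark's code):

```python
import numpy as np

def recall_at_k(neighbors, ground_truth, k):
    """Fraction of the true top-k neighbor ids recovered by the ANN search.
    neighbors, ground_truth: (n_queries, >=k) integer arrays of neighbor ids."""
    n_queries = neighbors.shape[0]
    hits = 0
    for q in range(n_queries):
        # order within the top-k does not matter, so compare as sets
        hits += len(set(neighbors[q, :k]) & set(ground_truth[q, :k]))
    return hits / (n_queries * k)

# Toy example: query 0 recovers 2 of 3 true neighbors, query 1 recovers all 3.
gt = np.array([[0, 1, 2], [3, 4, 5]])
ann = np.array([[0, 2, 9], [3, 4, 5]])
print(recall_at_k(ann, gt, 3))  # 5/6 ≈ 0.8333
```

With a fixed query file and ground-truth file, this quantity should depend only on what the search returns, which is why a drift with thread count is surprising.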
Expected behavior
The recall level should not decrease.
Environment details (please complete the following information):
- Environment location: [Bare-metal, Docker, Cloud (specify cloud provider)]: Docker
- Method of RAFT install: [conda, Docker, or from source]: Docker
- If the method of install is [Docker], provide the `docker pull` and `docker run` commands used: provided above