Hi there,
I'm facing an issue when trying to run the cudf_merge benchmark locally on a node that hosts four H100 GPUs:
        GPU0    GPU1    GPU2    GPU3    NIC0    NIC1    NIC2    CPU Affinity    NUMA Affinity    GPU NUMA ID
GPU0    X       NV6     NV6     NV6     SYS     SYS     SYS     24-31           3                N/A
GPU1    NV6     X       NV6     NV6     SYS     SYS     SYS     24-31           3                N/A
GPU2    NV6     NV6     X       NV6     SYS     SYS     SYS     40-47           5                N/A
GPU3    NV6     NV6     NV6     X       SYS     SYS     SYS     40-47           5                N/A
NIC0    SYS     SYS     SYS     SYS     X       PIX     SYS
NIC1    SYS     SYS     SYS     SYS     PIX     X       SYS
NIC2    SYS     SYS     SYS     SYS     SYS     SYS     X
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
NIC Legend:
NIC0: mlx5_2
NIC1: mlx5_3
NIC2: mlx5_bond_0
I can run the benchmark over any pair of GPUs with no issue:
python -m ucp.benchmarks.cudf_merge --devs 0,1 --chunk-size 200_000_000 --iter 10
ucx-py-cu12 0.40.0
[notice] A new release of pip is available: 23.2.1 -> 24.2
[notice] To update, run: pip install --upgrade pip
[1729696581.284731] [kh013:1596654:0] ucp_context.c:2190 UCX INFO Version 1.17.0 (loaded from /ssoft/spack/pinot-noir/kuma-h100/v1/spack/opt/spack/linux-rhel9-zen4/gcc-13.2.0/ucx-1.17.0-no2vdboyxq2falry3mus5kwwmpafamdy/lib/libucp.so.0)
[1729696584.749090] [kh013:1596654:0] parser.c:2314 UCX INFO UCX_* env variables: UCX_LOG_LEVEL=info UCX_MEMTYPE_CACHE=n UCX_RNDV_THRESH=8192 UCX_RNDV_FRAG_MEM_TYPE=cuda UCX_MAX_RNDV_RAILS=1 UCX_PROTO_ENABLE=n
[1729696586.301065] [kh013:1596677:0] ucp_context.c:2190 UCX INFO Version 1.17.0 (loaded from /ssoft/spack/pinot-noir/kuma-h100/v1/spack/opt/spack/linux-rhel9-zen4/gcc-13.2.0/ucx-1.17.0-no2vdboyxq2falry3mus5kwwmpafamdy/lib/libucp.so.0)
[1729696586.311799] [kh013:1596678:0] ucp_context.c:2190 UCX INFO Version 1.17.0 (loaded from /ssoft/spack/pinot-noir/kuma-h100/v1/spack/opt/spack/linux-rhel9-zen4/gcc-13.2.0/ucx-1.17.0-no2vdboyxq2falry3mus5kwwmpafamdy/lib/libucp.so.0)
[1729696586.928897] [kh013:1596677:0] parser.c:2314 UCX INFO UCX_* env variables: UCX_LOG_LEVEL=info UCX_MEMTYPE_CACHE=n UCX_RNDV_THRESH=8192 UCX_RNDV_FRAG_MEM_TYPE=cuda UCX_MAX_RNDV_RAILS=1 UCX_PROTO_ENABLE=n
[1729696586.928901] [kh013:1596678:0] parser.c:2314 UCX INFO UCX_* env variables: UCX_LOG_LEVEL=info UCX_MEMTYPE_CACHE=n UCX_RNDV_THRESH=8192 UCX_RNDV_FRAG_MEM_TYPE=cuda UCX_MAX_RNDV_RAILS=1 UCX_PROTO_ENABLE=n
[1729696586.949084] [kh013:1596677:0] ucp_worker.c:1888 UCX INFO ucp_context_0 intra-node cfg#3 tag(rc_mlx5/mlx5_bond_0:1) rma(rc_mlx5/mlx5_bond_0:1) am(rc_mlx5/mlx5_bond_0:1) stream(rc_mlx5/mlx5_bond_0:1) ka(rc_mlx5/mlx5_bond_0:1)
[1729696586.949104] [kh013:1596678:0] ucp_worker.c:1888 UCX INFO ucp_context_0 intra-node cfg#3 tag(rc_mlx5/mlx5_bond_0:1) rma(rc_mlx5/mlx5_bond_0:1) am(rc_mlx5/mlx5_bond_0:1) stream(rc_mlx5/mlx5_bond_0:1) ka(rc_mlx5/mlx5_bond_0:1)
[1729696586.971682] [kh013:1596654:0] ucp_worker.c:1888 UCX INFO ucp_context_0 intra-node cfg#2 tag(rc_mlx5/mlx5_bond_0:1) rma(rc_mlx5/mlx5_bond_0:1) am(rc_mlx5/mlx5_bond_0:1) stream(rc_mlx5/mlx5_bond_0:1) ka(rc_mlx5/mlx5_bond_0:1)
[1729696586.988770] [kh013:1596677:0] ucp_worker.c:1888 UCX INFO ucp_context_0 intra-node cfg#4 tag(rc_mlx5/mlx5_bond_0:1) rma(rc_mlx5/mlx5_bond_0:1) am(rc_mlx5/mlx5_bond_0:1) stream(rc_mlx5/mlx5_bond_0:1) ka(rc_mlx5/mlx5_bond_0:1)
[1729696586.992192] [kh013:1596678:0] ucp_worker.c:1888 UCX INFO ucp_context_0 intra-node cfg#4 tag(rc_mlx5/mlx5_bond_0:1) rma(rc_mlx5/mlx5_bond_0:1) am(rc_mlx5/mlx5_bond_0:1) stream(rc_mlx5/mlx5_bond_0:1) ka(rc_mlx5/mlx5_bond_0:1)
[1729696586.994004] [kh013:1596678:0] ucp_worker.c:1888 UCX INFO ucp_context_0 intra-node cfg#5 tag(rc_mlx5/mlx5_2:1 cuda_ipc/cuda) rma(rc_mlx5/mlx5_2:1) am(rc_mlx5/mlx5_2:1 cuda_ipc/cuda) stream(rc_mlx5/mlx5_2:1) ka(ud_mlx5/mlx5_2:1)
[1729696586.994007] [kh013:1596677:0] ucp_worker.c:1888 UCX INFO ucp_context_0 intra-node cfg#5 tag(rc_mlx5/mlx5_2:1 cuda_ipc/cuda) rma(rc_mlx5/mlx5_2:1) am(rc_mlx5/mlx5_2:1 cuda_ipc/cuda) stream(rc_mlx5/mlx5_2:1) ka(ud_mlx5/mlx5_2:1)
[1729696587.000277] [kh013:1596654:0] ucp_worker.c:1888 UCX INFO ucp_context_0 intra-node cfg#3 tag(rc_mlx5/mlx5_2:1 cuda_ipc/cuda) rma(rc_mlx5/mlx5_2:1) am(rc_mlx5/mlx5_2:1 cuda_ipc/cuda) stream(rc_mlx5/mlx5_2:1) ka(ud_mlx5/mlx5_2:1)
[1729696587.123809] [kh013:1596678:0] ucp_worker.c:1888 UCX INFO ucp_context_0 intra-node cfg#6 tag(rc_mlx5/mlx5_bond_0:1) rma(rc_mlx5/mlx5_bond_0:1) am(rc_mlx5/mlx5_bond_0:1) stream(rc_mlx5/mlx5_bond_0:1) ka(rc_mlx5/mlx5_bond_0:1)
[1729696587.135094] [kh013:1596677:0] ucp_worker.c:1888 UCX INFO ucp_context_0 intra-node cfg#6 tag(rc_mlx5/mlx5_bond_0:1) rma(rc_mlx5/mlx5_bond_0:1) am(rc_mlx5/mlx5_bond_0:1) stream(rc_mlx5/mlx5_bond_0:1) ka(rc_mlx5/mlx5_bond_0:1)
[1729696587.136850] [kh013:1596677:0] ucp_worker.c:1888 UCX INFO ucp_context_0 intra-node cfg#7 tag(rc_mlx5/mlx5_2:1 cuda_ipc/cuda) rma(rc_mlx5/mlx5_2:1) am(rc_mlx5/mlx5_2:1 cuda_ipc/cuda) stream(rc_mlx5/mlx5_2:1) ka(ud_mlx5/mlx5_2:1)
[1729696587.137598] [kh013:1596678:0] ucp_worker.c:1888 UCX INFO ucp_context_0 intra-node cfg#7 tag(rc_mlx5/mlx5_2:1 cuda_ipc/cuda) rma(rc_mlx5/mlx5_2:1) am(rc_mlx5/mlx5_2:1 cuda_ipc/cuda) stream(rc_mlx5/mlx5_2:1) ka(ud_mlx5/mlx5_2:1)
cuDF merge benchmark
--------------------------------------------------------------------------------------------------------------
Device(s)               | [0, 1]
Chunks per device       | 1
Rows per chunk          | 200000000
Total data processed    | 119.21 GiB
Data processed per iter | 11.92 GiB
Row matching fraction   | 0.3
==============================================================================================================
Wall-clock              | 3.00 s
Bandwidth               | 24.88 GiB/s
Throughput              | 39.68 GiB/s
==============================================================================================================
Run                     | Wall-clock | Bandwidth    | Throughput
0                       | 161.36 ms  | 108.39 GiB/s | 73.88 GiB/s
1                       | 360.91 ms  | 18.61 GiB/s  | 33.03 GiB/s
2                       | 455.87 ms  | 13.33 GiB/s  | 26.15 GiB/s
3                       | 383.55 ms  | 16.98 GiB/s  | 31.08 GiB/s
4                       | 169.31 ms  | 90.74 GiB/s  | 70.41 GiB/s
5                       | 474.04 ms  | 12.65 GiB/s  | 25.15 GiB/s
6                       | 293.38 ms  | 25.85 GiB/s  | 40.63 GiB/s
7                       | 370.52 ms  | 17.85 GiB/s  | 32.17 GiB/s
8                       | 161.21 ms  | 108.41 GiB/s | 73.95 GiB/s
9                       | 169.63 ms  | 90.10 GiB/s  | 70.28 GiB/s
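As a sanity check on the summary above, the reported data volumes can be reproduced from the command-line arguments. The sketch below assumes that each chunk consists of a left and a right frame with two 8-byte columns each (64 bytes per row). That layout is an assumption that happens to match the reported figures; it has not been checked against the benchmark source. The sketch also shows that the per-run throughput column is consistent with the data processed per iteration divided by that run's wall-clock time.

```python
GIB = 2**30

# Values taken from the command line and the benchmark report
rows_per_chunk = 200_000_000   # --chunk-size
n_devices = 2                  # --devs 0,1
chunks_per_device = 1          # "Chunks per device"
iterations = 10                # --iter

# ASSUMPTION (not confirmed from the benchmark source): each chunk holds
# a left and a right frame, each with two 8-byte columns.
frames_per_chunk = 2
columns_per_frame = 2
bytes_per_value = 8

bytes_per_iter = (
    rows_per_chunk
    * n_devices
    * chunks_per_device
    * frames_per_chunk
    * columns_per_frame
    * bytes_per_value
)
gib_per_iter = bytes_per_iter / GIB
total_gib = gib_per_iter * iterations

# Per-run throughput: data per iteration divided by the run's wall-clock time
run0_wall_s = 0.16136          # run 0 wall-clock, 161.36 ms
run0_throughput = gib_per_iter / run0_wall_s

print(f"Data processed per iter: {gib_per_iter:.2f} GiB")
print(f"Total data processed:    {total_gib:.2f} GiB")
print(f"Run 0 throughput:        {run0_throughput:.2f} GiB/s")
```

Under these assumptions the script reproduces the reported 11.92 GiB per iteration, the 119.21 GiB total, and the 73.88 GiB/s throughput of run 0.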
But it fails when run over all 4 devices:
python -m ucp.benchmarks.cudf_merge --devs 0,1,2,3 --chunk-size 200_000_000 --iter 10
[1729696635.679592] [kh013:1596934:0] ucp_context.c:2190 UCX INFO Version 1.17.0 (loaded from /ssoft/spack/pinot-noir/kuma-h100/v1/spack/opt/spack/linux-rhel9-zen4/gcc-13.2.0/ucx-1.17.0-no2vdboyxq2falry3mus5kwwmpafamdy/lib/libucp.so.0)
[1729696639.178149] [kh013:1596934:0] parser.c:2314 UCX INFO UCX_* env variables: UCX_LOG_LEVEL=info UCX_MEMTYPE_CACHE=n UCX_RNDV_THRESH=8192 UCX_RNDV_FRAG_MEM_TYPE=cuda UCX_MAX_RNDV_RAILS=1 UCX_PROTO_ENABLE=n
[1729696642.222481] [kh013:1596952:0] ucp_context.c:2190 UCX INFO Version 1.17.0 (loaded from /ssoft/spack/pinot-noir/kuma-h100/v1/spack/opt/spack/linux-rhel9-zen4/gcc-13.2.0/ucx-1.17.0-no2vdboyxq2falry3mus5kwwmpafamdy/lib/libucp.so.0)
[1729696642.243167] [kh013:1596955:0] ucp_context.c:2190 UCX INFO Version 1.17.0 (loaded from /ssoft/spack/pinot-noir/kuma-h100/v1/spack/opt/spack/linux-rhel9-zen4/gcc-13.2.0/ucx-1.17.0-no2vdboyxq2falry3mus5kwwmpafamdy/lib/libucp.so.0)
[1729696642.245405] [kh013:1596953:0] ucp_context.c:2190 UCX INFO Version 1.17.0 (loaded from /ssoft/spack/pinot-noir/kuma-h100/v1/spack/opt/spack/linux-rhel9-zen4/gcc-13.2.0/ucx-1.17.0-no2vdboyxq2falry3mus5kwwmpafamdy/lib/libucp.so.0)
[1729696642.247080] [kh013:1596954:0] ucp_context.c:2190 UCX INFO Version 1.17.0 (loaded from /ssoft/spack/pinot-noir/kuma-h100/v1/spack/opt/spack/linux-rhel9-zen4/gcc-13.2.0/ucx-1.17.0-no2vdboyxq2falry3mus5kwwmpafamdy/lib/libucp.so.0)
[1729696642.977422] [kh013:1596952:0] parser.c:2314 UCX INFO UCX_* env variables: UCX_LOG_LEVEL=info UCX_MEMTYPE_CACHE=n UCX_RNDV_THRESH=8192 UCX_RNDV_FRAG_MEM_TYPE=cuda UCX_MAX_RNDV_RAILS=1 UCX_PROTO_ENABLE=n
[1729696642.980930] [kh013:1596955:0] parser.c:2314 UCX INFO UCX_* env variables: UCX_LOG_LEVEL=info UCX_MEMTYPE_CACHE=n UCX_RNDV_THRESH=8192 UCX_RNDV_FRAG_MEM_TYPE=cuda UCX_MAX_RNDV_RAILS=1 UCX_PROTO_ENABLE=n
[1729696642.997896] [kh013:1596952:0] ucp_worker.c:1888 UCX INFO ucp_context_0 intra-node cfg#3 tag(rc_mlx5/mlx5_bond_0:1) rma(rc_mlx5/mlx5_bond_0:1) am(rc_mlx5/mlx5_bond_0:1) stream(rc_mlx5/mlx5_bond_0:1) ka(rc_mlx5/mlx5_bond_0:1)
[1729696643.000134] [kh013:1596955:0] ucp_worker.c:1888 UCX INFO ucp_context_0 intra-node cfg#3 tag(rc_mlx5/mlx5_bond_0:1) rma(rc_mlx5/mlx5_bond_0:1) am(rc_mlx5/mlx5_bond_0:1) stream(rc_mlx5/mlx5_bond_0:1) ka(rc_mlx5/mlx5_bond_0:1)
[1729696643.012295] [kh013:1596954:0] parser.c:2314 UCX INFO UCX_* env variables: UCX_LOG_LEVEL=info UCX_MEMTYPE_CACHE=n UCX_RNDV_THRESH=8192 UCX_RNDV_FRAG_MEM_TYPE=cuda UCX_MAX_RNDV_RAILS=1 UCX_PROTO_ENABLE=n
[1729696643.014948] [kh013:1596953:0] parser.c:2314 UCX INFO UCX_* env variables: UCX_LOG_LEVEL=info UCX_MEMTYPE_CACHE=n UCX_RNDV_THRESH=8192 UCX_RNDV_FRAG_MEM_TYPE=cuda UCX_MAX_RNDV_RAILS=1 UCX_PROTO_ENABLE=n
[1729696643.020409] [kh013:1596934:0] ucp_worker.c:1888 UCX INFO ucp_context_0 intra-node cfg#2 tag(rc_mlx5/mlx5_bond_0:1) rma(rc_mlx5/mlx5_bond_0:1) am(rc_mlx5/mlx5_bond_0:1) stream(rc_mlx5/mlx5_bond_0:1) ka(rc_mlx5/mlx5_bond_0:1)
[1729696643.035409] [kh013:1596953:0] ucp_worker.c:1888 UCX INFO ucp_context_0 intra-node cfg#3 tag(rc_mlx5/mlx5_bond_0:1) rma(rc_mlx5/mlx5_bond_0:1) am(rc_mlx5/mlx5_bond_0:1) stream(rc_mlx5/mlx5_bond_0:1) ka(rc_mlx5/mlx5_bond_0:1)
[1729696643.035662] [kh013:1596954:0] ucp_worker.c:1888 UCX INFO ucp_context_0 intra-node cfg#3 tag(rc_mlx5/mlx5_bond_0:1) rma(rc_mlx5/mlx5_bond_0:1) am(rc_mlx5/mlx5_bond_0:1) stream(rc_mlx5/mlx5_bond_0:1) ka(rc_mlx5/mlx5_bond_0:1)
[1729696643.039847] [kh013:1596952:0] ucp_worker.c:1888 UCX INFO ucp_context_0 intra-node cfg#4 tag(rc_mlx5/mlx5_bond_0:1) rma(rc_mlx5/mlx5_bond_0:1) am(rc_mlx5/mlx5_bond_0:1) stream(rc_mlx5/mlx5_bond_0:1) ka(rc_mlx5/mlx5_bond_0:1)
[1729696643.041863] [kh013:1596955:0] ucp_worker.c:1888 UCX INFO ucp_context_0 intra-node cfg#4 tag(rc_mlx5/mlx5_bond_0:1) rma(rc_mlx5/mlx5_bond_0:1) am(rc_mlx5/mlx5_bond_0:1) stream(rc_mlx5/mlx5_bond_0:1) ka(rc_mlx5/mlx5_bond_0:1)
[1729696643.043738] [kh013:1596952:0] ucp_worker.c:1888 UCX INFO ucp_context_0 intra-node cfg#5 tag(rc_mlx5/mlx5_2:1 cuda_ipc/cuda) rma(rc_mlx5/mlx5_2:1) am(rc_mlx5/mlx5_2:1 cuda_ipc/cuda) stream(rc_mlx5/mlx5_2:1) ka(ud_mlx5/mlx5_2:1)
[1729696643.043739] [kh013:1596955:0] ucp_worker.c:1888 UCX INFO ucp_context_0 intra-node cfg#5 tag(rc_mlx5/mlx5_2:1 cuda_ipc/cuda) rma(rc_mlx5/mlx5_2:1) am(rc_mlx5/mlx5_2:1 cuda_ipc/cuda) stream(rc_mlx5/mlx5_2:1) ka(ud_mlx5/mlx5_2:1)
[1729696643.046726] [kh013:1596954:0] ucp_worker.c:1888 UCX INFO ucp_context_0 intra-node cfg#4 tag(rc_mlx5/mlx5_bond_0:1) rma(rc_mlx5/mlx5_bond_0:1) am(rc_mlx5/mlx5_bond_0:1) stream(rc_mlx5/mlx5_bond_0:1) ka(rc_mlx5/mlx5_bond_0:1)
[1729696643.048626] [kh013:1596953:0] ucp_worker.c:1888 UCX INFO ucp_context_0 intra-node cfg#4 tag(rc_mlx5/mlx5_bond_0:1) rma(rc_mlx5/mlx5_bond_0:1) am(rc_mlx5/mlx5_bond_0:1) stream(rc_mlx5/mlx5_bond_0:1) ka(rc_mlx5/mlx5_bond_0:1)
[1729696643.049389] [kh013:1596954:0] ucp_worker.c:1888 UCX INFO ucp_context_0 intra-node cfg#5 tag(rc_mlx5/mlx5_2:1 cuda_ipc/cuda) rma(rc_mlx5/mlx5_2:1) am(rc_mlx5/mlx5_2:1 cuda_ipc/cuda) stream(rc_mlx5/mlx5_2:1) ka(ud_mlx5/mlx5_2:1)
[1729696643.050116] [kh013:1596934:0] ucp_worker.c:1888 UCX INFO ucp_context_0 intra-node cfg#3 tag(rc_mlx5/mlx5_2:1 cuda_ipc/cuda) rma(rc_mlx5/mlx5_2:1) am(rc_mlx5/mlx5_2:1 cuda_ipc/cuda) stream(rc_mlx5/mlx5_2:1) ka(ud_mlx5/mlx5_2:1)
[1729696643.050260] [kh013:1596953:0] ucp_worker.c:1888 UCX INFO ucp_context_0 intra-node cfg#5 tag(rc_mlx5/mlx5_2:1 cuda_ipc/cuda) rma(rc_mlx5/mlx5_2:1) am(rc_mlx5/mlx5_2:1 cuda_ipc/cuda) stream(rc_mlx5/mlx5_2:1) ka(ud_mlx5/mlx5_2:1)
[1729696643.076486] [kh013:1596954:0] ucp_worker.c:1888 UCX INFO ucp_context_0 intra-node cfg#6 tag(rc_mlx5/mlx5_bond_0:1) rma(rc_mlx5/mlx5_bond_0:1) am(rc_mlx5/mlx5_bond_0:1) stream(rc_mlx5/mlx5_bond_0:1) ka(rc_mlx5/mlx5_bond_0:1)
[1729696643.087261] [kh013:1596953:0] ucp_worker.c:1888 UCX INFO ucp_context_0 intra-node cfg#6 tag(rc_mlx5/mlx5_bond_0:1) rma(rc_mlx5/mlx5_bond_0:1) am(rc_mlx5/mlx5_bond_0:1) stream(rc_mlx5/mlx5_bond_0:1) ka(rc_mlx5/mlx5_bond_0:1)
[1729696643.089062] [kh013:1596953:0] ucp_worker.c:1888 UCX INFO ucp_context_0 intra-node cfg#7 tag(rc_mlx5/mlx5_2:1 cuda_ipc/cuda) rma(rc_mlx5/mlx5_2:1) am(rc_mlx5/mlx5_2:1 cuda_ipc/cuda) stream(rc_mlx5/mlx5_2:1) ka(ud_mlx5/mlx5_2:1)
[1729696643.089778] [kh013:1596954:0] ucp_worker.c:1888 UCX INFO ucp_context_0 intra-node cfg#7 tag(rc_mlx5/mlx5_2:1 cuda_ipc/cuda) rma(rc_mlx5/mlx5_2:1) am(rc_mlx5/mlx5_2:1 cuda_ipc/cuda) stream(rc_mlx5/mlx5_2:1) ka(ud_mlx5/mlx5_2:1)
[1729696643.103213] [kh013:1596955:0] ucp_worker.c:1888 UCX INFO ucp_context_0 intra-node cfg#6 tag(rc_mlx5/mlx5_bond_0:1) rma(rc_mlx5/mlx5_bond_0:1) am(rc_mlx5/mlx5_bond_0:1) stream(rc_mlx5/mlx5_bond_0:1) ka(rc_mlx5/mlx5_bond_0:1)
[1729696643.114753] [kh013:1596955:0] ucp_worker.c:1888 UCX INFO ucp_context_0 intra-node cfg#7 tag(rc_mlx5/mlx5_2:1 cuda_ipc/cuda) rma(rc_mlx5/mlx5_2:1) am(rc_mlx5/mlx5_2:1 cuda_ipc/cuda) stream(rc_mlx5/mlx5_2:1) ka(ud_mlx5/mlx5_2:1)
[1729696643.176847] [kh013:1596953:0] ucp_worker.c:1888 UCX INFO ucp_context_0 intra-node cfg#8 tag(rc_mlx5/mlx5_bond_0:1) rma(rc_mlx5/mlx5_bond_0:1) am(rc_mlx5/mlx5_bond_0:1) stream(rc_mlx5/mlx5_bond_0:1) ka(rc_mlx5/mlx5_bond_0:1)
[1729696643.184527] [kh013:1596952:0] ucp_worker.c:1888 UCX INFO ucp_context_0 intra-node cfg#6 tag(rc_mlx5/mlx5_bond_0:1) rma(rc_mlx5/mlx5_bond_0:1) am(rc_mlx5/mlx5_bond_0:1) stream(rc_mlx5/mlx5_bond_0:1) ka(rc_mlx5/mlx5_bond_0:1)
[1729696643.185616] [kh013:1596954:0] ucp_worker.c:1888 UCX INFO ucp_context_0 intra-node cfg#8 tag(rc_mlx5/mlx5_bond_0:1) rma(rc_mlx5/mlx5_bond_0:1) am(rc_mlx5/mlx5_bond_0:1) stream(rc_mlx5/mlx5_bond_0:1) ka(rc_mlx5/mlx5_bond_0:1)
[1729696643.186623] [kh013:1596952:0] ucp_worker.c:1888 UCX INFO ucp_context_0 intra-node cfg#7 tag(rc_mlx5/mlx5_2:1 cuda_ipc/cuda) rma(rc_mlx5/mlx5_2:1) am(rc_mlx5/mlx5_2:1 cuda_ipc/cuda) stream(rc_mlx5/mlx5_2:1) ka(ud_mlx5/mlx5_2:1)
[1729696643.187346] [kh013:1596954:0] ucp_worker.c:1888 UCX INFO ucp_context_0 intra-node cfg#9 tag(rc_mlx5/mlx5_2:1 cuda_ipc/cuda) rma(rc_mlx5/mlx5_2:1) am(rc_mlx5/mlx5_2:1 cuda_ipc/cuda) stream(rc_mlx5/mlx5_2:1) ka(ud_mlx5/mlx5_2:1)
[1729696643.187424] [kh013:1596953:0] ucp_worker.c:1888 UCX INFO ucp_context_0 intra-node cfg#9 tag(rc_mlx5/mlx5_2:1 cuda_ipc/cuda) rma(rc_mlx5/mlx5_2:1) am(rc_mlx5/mlx5_2:1 cuda_ipc/cuda) stream(rc_mlx5/mlx5_2:1) ka(ud_mlx5/mlx5_2:1)
Task exception was never retrieved
future: <Task finished name='Task-7' coro=<_listener_handler_coroutine() done, defined at /work/scitas-ge/orliac/KUMA_VENVS/UCX-PY-BENCH/lib/python3.11/site-packages/ucp/core.py:140> exception=UCXCanceled("<[Recv #002] ep: 0x7f5af660f140, tag: 0xedf8353cc3df7250, nbytes: 8, type: <class 'array.array'>>: ")>
Traceback (most recent call last):
File "/work/scitas-ge/orliac/KUMA_VENVS/UCX-PY-BENCH/lib/python3.11/site-packages/ucp/core.py", line 190, in _listener_handler_coroutine
await func(ep)
File "/work/scitas-ge/orliac/KUMA_VENVS/UCX-PY-BENCH/lib/python3.11/site-packages/ucp/benchmarks/utils.py", line 106, in server_handler
worker_results = await recv_pickled_msg(ep)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/scitas-ge/orliac/KUMA_VENVS/UCX-PY-BENCH/lib/python3.11/site-packages/ucp/benchmarks/utils.py", line 75, in recv_pickled_msg
msg = await ep.recv_obj()
^^^^^^^^^^^^^^^^^^^
File "/work/scitas-ge/orliac/KUMA_VENVS/UCX-PY-BENCH/lib/python3.11/site-packages/ucp/core.py", line 863, in recv_obj
await self.recv(nbytes, tag=tag)
File "/work/scitas-ge/orliac/KUMA_VENVS/UCX-PY-BENCH/lib/python3.11/site-packages/ucp/core.py", line 737, in recv
ret = await comm.tag_recv(self._ep, buffer, nbytes, tag, name=log)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ucp._libs.exceptions.UCXCanceled: <[Recv #002] ep: 0x7f5af660f140, tag: 0xedf8353cc3df7250, nbytes: 8, type: <class 'array.array'>>:
Task exception was never retrieved
future: <Task finished name='Task-2' coro=<_listener_handler_coroutine() done, defined at /work/scitas-ge/orliac/KUMA_VENVS/UCX-PY-BENCH/lib/python3.11/site-packages/ucp/core.py:140> exception=UCXCanceled("<[Recv #002] ep: 0x7f5af660f080, tag: 0xd49e6a08b8eeaedd, nbytes: 8, type: <class 'array.array'>>: ")>
Traceback (most recent call last):
File "/work/scitas-ge/orliac/KUMA_VENVS/UCX-PY-BENCH/lib/python3.11/site-packages/ucp/core.py", line 190, in _listener_handler_coroutine
await func(ep)
File "/work/scitas-ge/orliac/KUMA_VENVS/UCX-PY-BENCH/lib/python3.11/site-packages/ucp/benchmarks/utils.py", line 106, in server_handler
worker_results = await recv_pickled_msg(ep)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/scitas-ge/orliac/KUMA_VENVS/UCX-PY-BENCH/lib/python3.11/site-packages/ucp/benchmarks/utils.py", line 75, in recv_pickled_msg
msg = await ep.recv_obj()
^^^^^^^^^^^^^^^^^^^
File "/work/scitas-ge/orliac/KUMA_VENVS/UCX-PY-BENCH/lib/python3.11/site-packages/ucp/core.py", line 863, in recv_obj
await self.recv(nbytes, tag=tag)
File "/work/scitas-ge/orliac/KUMA_VENVS/UCX-PY-BENCH/lib/python3.11/site-packages/ucp/core.py", line 737, in recv
ret = await comm.tag_recv(self._ep, buffer, nbytes, tag, name=log)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ucp._libs.exceptions.UCXCanceled: <[Recv #002] ep: 0x7f5af660f080, tag: 0xd49e6a08b8eeaedd, nbytes: 8, type: <class 'array.array'>>:
Task exception was never retrieved
future: <Task finished name='Task-3' coro=<_listener_handler_coroutine() done, defined at /work/scitas-ge/orliac/KUMA_VENVS/UCX-PY-BENCH/lib/python3.11/site-packages/ucp/core.py:140> exception=UCXCanceled("<[Recv #002] ep: 0x7f5af660f0c0, tag: 0xbe680e4915f49a08, nbytes: 8, type: <class 'array.array'>>: ")>
Traceback (most recent call last):
File "/work/scitas-ge/orliac/KUMA_VENVS/UCX-PY-BENCH/lib/python3.11/site-packages/ucp/core.py", line 190, in _listener_handler_coroutine
await func(ep)
File "/work/scitas-ge/orliac/KUMA_VENVS/UCX-PY-BENCH/lib/python3.11/site-packages/ucp/benchmarks/utils.py", line 106, in server_handler
worker_results = await recv_pickled_msg(ep)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/scitas-ge/orliac/KUMA_VENVS/UCX-PY-BENCH/lib/python3.11/site-packages/ucp/benchmarks/utils.py", line 75, in recv_pickled_msg
msg = await ep.recv_obj()
^^^^^^^^^^^^^^^^^^^
File "/work/scitas-ge/orliac/KUMA_VENVS/UCX-PY-BENCH/lib/python3.11/site-packages/ucp/core.py", line 863, in recv_obj
await self.recv(nbytes, tag=tag)
File "/work/scitas-ge/orliac/KUMA_VENVS/UCX-PY-BENCH/lib/python3.11/site-packages/ucp/core.py", line 737, in recv
ret = await comm.tag_recv(self._ep, buffer, nbytes, tag, name=log)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ucp._libs.exceptions.UCXCanceled: <[Recv #002] ep: 0x7f5af660f0c0, tag: 0xbe680e4915f49a08, nbytes: 8, type: <class 'array.array'>>:
Task exception was never retrieved
future: <Task finished name='Task-6' coro=<_listener_handler_coroutine() done, defined at /work/scitas-ge/orliac/KUMA_VENVS/UCX-PY-BENCH/lib/python3.11/site-packages/ucp/core.py:140> exception=UCXCanceled("<[Recv #002] ep: 0x7f5af660f100, tag: 0xf1c3abccc2be7d01, nbytes: 8, type: <class 'array.array'>>: ")>
Traceback (most recent call last):
File "/work/scitas-ge/orliac/KUMA_VENVS/UCX-PY-BENCH/lib/python3.11/site-packages/ucp/core.py", line 190, in _listener_handler_coroutine
await func(ep)
File "/work/scitas-ge/orliac/KUMA_VENVS/UCX-PY-BENCH/lib/python3.11/site-packages/ucp/benchmarks/utils.py", line 106, in server_handler
worker_results = await recv_pickled_msg(ep)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/scitas-ge/orliac/KUMA_VENVS/UCX-PY-BENCH/lib/python3.11/site-packages/ucp/benchmarks/utils.py", line 75, in recv_pickled_msg
msg = await ep.recv_obj()
^^^^^^^^^^^^^^^^^^^
File "/work/scitas-ge/orliac/KUMA_VENVS/UCX-PY-BENCH/lib/python3.11/site-packages/ucp/core.py", line 863, in recv_obj
await self.recv(nbytes, tag=tag)
File "/work/scitas-ge/orliac/KUMA_VENVS/UCX-PY-BENCH/lib/python3.11/site-packages/ucp/core.py", line 737, in recv
ret = await comm.tag_recv(self._ep, buffer, nbytes, tag, name=log)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ucp._libs.exceptions.UCXCanceled: <[Recv #002] ep: 0x7f5af660f100, tag: 0xf1c3abccc2be7d01, nbytes: 8, type: <class 'array.array'>>:
^CProcess Process-1:
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
Process Process-3:
File "/work/scitas-ge/orliac/KUMA_VENVS/UCX-PY-BENCH/lib/python3.11/site-packages/ucp/benchmarks/cudf_merge.py", line 633, in <module>
Process Process-2:
main()
File "/work/scitas-ge/orliac/KUMA_VENVS/UCX-PY-BENCH/lib/python3.11/site-packages/ucp/benchmarks/cudf_merge.py", line 590, in main
stats = [server_queue.get() for i in range(args.n_chunks)]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/scitas-ge/orliac/KUMA_VENVS/UCX-PY-BENCH/lib/python3.11/site-packages/ucp/benchmarks/cudf_merge.py", line 590, in <listcomp>
stats = [server_queue.get() for i in range(args.n_chunks)]
^^^^^^^^^^^^^^^^^^
File "/ssoft/spack/pinot-noir/kuma-h100/v1/spack/opt/spack/linux-rhel9-zen4/gcc-13.2.0/python-3.11.7-wpgsyqek7spdydbmic66srcfb3v7kzoi/lib/python3.11/multiprocessing/queues.py", line 103, in get
res = self._recv_bytes()
^^^^^^^^^^^^^^^^^^
File "/ssoft/spack/pinot-noir/kuma-h100/v1/spack/opt/spack/linux-rhel9-zen4/gcc-13.2.0/python-3.11.7-wpgsyqek7spdydbmic66srcfb3v7kzoi/lib/python3.11/multiprocessing/connection.py", line 216, in recv_bytes
Traceback (most recent call last):
buf = self._recv_bytes(maxlength)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ssoft/spack/pinot-noir/kuma-h100/v1/spack/opt/spack/linux-rhel9-zen4/gcc-13.2.0/python-3.11.7-wpgsyqek7spdydbmic66srcfb3v7kzoi/lib/python3.11/multiprocessing/connection.py", line 430, in _recv_bytes
File "/ssoft/spack/pinot-noir/kuma-h100/v1/spack/opt/spack/linux-rhel9-zen4/gcc-13.2.0/python-3.11.7-wpgsyqek7spdydbmic66srcfb3v7kzoi/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
self.run()
buf = self._recv(4)
File "/ssoft/spack/pinot-noir/kuma-h100/v1/spack/opt/spack/linux-rhel9-zen4/gcc-13.2.0/python-3.11.7-wpgsyqek7spdydbmic66srcfb3v7kzoi/lib/python3.11/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/work/scitas-ge/orliac/KUMA_VENVS/UCX-PY-BENCH/lib/python3.11/site-packages/ucp/benchmarks/utils.py", line 125, in _server_process
ret = loop.run_until_complete(run())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^
File "/ssoft/spack/pinot-noir/kuma-h100/v1/spack/opt/spack/linux-rhel9-zen4/gcc-13.2.0/python-3.11.7-wpgsyqek7spdydbmic66srcfb3v7kzoi/lib/python3.11/asyncio/base_events.py", line 640, in run_until_complete
self.run_forever()
File "/ssoft/spack/pinot-noir/kuma-h100/v1/spack/opt/spack/linux-rhel9-zen4/gcc-13.2.0/python-3.11.7-wpgsyqek7spdydbmic66srcfb3v7kzoi/lib/python3.11/multiprocessing/connection.py", line 395, in _recv
File "/ssoft/spack/pinot-noir/kuma-h100/v1/spack/opt/spack/linux-rhel9-zen4/gcc-13.2.0/python-3.11.7-wpgsyqek7spdydbmic66srcfb3v7kzoi/lib/python3.11/asyncio/base_events.py", line 607, in run_forever
self._run_once()
File "/ssoft/spack/pinot-noir/kuma-h100/v1/spack/opt/spack/linux-rhel9-zen4/gcc-13.2.0/python-3.11.7-wpgsyqek7spdydbmic66srcfb3v7kzoi/lib/python3.11/asyncio/base_events.py", line 1884, in _run_once
event_list = self._selector.select(timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ssoft/spack/pinot-noir/kuma-h100/v1/spack/opt/spack/linux-rhel9-zen4/gcc-13.2.0/python-3.11.7-wpgsyqek7spdydbmic66srcfb3v7kzoi/lib/python3.11/selectors.py", line 468, in select
fd_event_list = self._selector.poll(timeout, max_ev)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
My environment:
Package Version
---------------------- -----------
cachetools 5.5.0
click 8.1.7
cloudpickle 3.1.0
cuda-python 12.6.0
cudf-cu12 24.10.1
cupy-cuda12x 13.3.0
dask 2024.9.0
dask-cudf-cu12 24.10.1
dask-expr 1.1.14
distributed 2024.9.0
fastrlock 0.8.2
fsspec 2024.10.0
importlib_metadata 8.5.0
Jinja2 3.1.4
libcudf-cu12 24.10.1
llvmlite 0.43.0
locket 1.0.0
markdown-it-py 3.0.0
MarkupSafe 3.0.2
mdurl 0.1.2
msgpack 1.1.0
numba 0.60.0
numpy 2.0.2
nvtx 0.2.10
packaging 24.1
pandas 2.2.2
partd 1.4.2
pip 23.2.1
psutil 6.1.0
pyarrow 17.0.0
Pygments 2.18.0
pylibcudf-cu12 24.10.1
pynvjitlink-cu12 0.3.0
python-dateutil 2.9.0.post0
pytz 2024.2
PyYAML 6.0.2
rapids-dask-dependency 24.10.0
rich 13.9.2
rmm-cu12 24.10.0
setuptools 65.5.0
six 1.16.0
sortedcontainers 2.4.0
tblib 3.0.0
toolz 1.0.0
tornado 6.4.1
typing_extensions 4.12.2
tzdata 2024.2
ucx-py-cu12 0.40.0
urllib3 2.2.3
zict 3.0.0
zipp 3.20.2
Any idea what could be going wrong?
Also, I'm surprised by the variability of the benchmark across the 10 successive iterations: in the 2-GPU run above, the per-iteration wall-clock time ranges from 161 ms to 474 ms.
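To put numbers on that spread, here is a small sketch that summarizes the per-run figures copied from the 2-GPU output above. It also compares the harmonic mean of the per-run bandwidths with the "Bandwidth" summary line. That the summary is a harmonic mean is an inference from the numbers, not something confirmed from the benchmark source.

```python
import statistics

# Per-run values copied from the 2-GPU benchmark output
wall_ms = [161.36, 360.91, 455.87, 383.55, 169.31,
           474.04, 293.38, 370.52, 161.21, 169.63]
bandwidth_gib_s = [108.39, 18.61, 13.33, 16.98, 90.74,
                   12.65, 25.85, 17.85, 108.41, 90.10]

mean_ms = statistics.mean(wall_ms)
stdev_ms = statistics.stdev(wall_ms)          # sample standard deviation
cv = stdev_ms / mean_ms                       # coefficient of variation
slow_fast_ratio = max(wall_ms) / min(wall_ms)

# Harmonic mean of the per-run bandwidths, for comparison with the
# "Bandwidth" summary line (24.88 GiB/s)
hmean_bw = statistics.harmonic_mean(bandwidth_gib_s)

print(f"wall-clock mean:    {mean_ms:.2f} ms")
print(f"wall-clock stdev:   {stdev_ms:.2f} ms (CV {cv:.0%})")
print(f"slowest / fastest:  {slow_fast_ratio:.2f}x")
print(f"harmonic-mean bandwidth: {hmean_bw:.2f} GiB/s")
```

With these numbers the slowest iteration takes roughly 2.9 times as long as the fastest, and the harmonic mean of the per-run bandwidths comes out at about 24.88 GiB/s, matching the summary line.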
And finally, is the benchmark expected to saturate the available bandwidth between the GPUs?
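For reference, here is a rough sketch of how the measured bandwidth compares with a nominal NVLink peak. The 6 links per GPU pair come from the NV6 entries in the topology matrix. The per-link rate of 25 GB/s per direction is an assumption (NVLink 4 on H100 SXM) that has not been verified for this node.

```python
GIB = 2**30

# From the topology matrix: every GPU pair is connected by NV6,
# i.e. a bonded set of 6 NVLinks.
links_per_pair = 6

# ASSUMPTION: NVLink 4 (H100 SXM), 25 GB/s per link per direction.
# Not verified for this particular node.
gb_per_link_per_direction = 25

peak_gib_s = links_per_pair * gb_per_link_per_direction * 1e9 / GIB

# Measured values copied from the 2-GPU benchmark output
best_run_gib_s = 108.41   # run 8, highest per-run bandwidth
summary_gib_s = 24.88     # "Bandwidth" summary line

best_fraction = best_run_gib_s / peak_gib_s
summary_fraction = summary_gib_s / peak_gib_s

print(f"nominal peak per direction: {peak_gib_s:.1f} GiB/s")
print(f"best run:  {best_fraction:.0%} of nominal peak")
print(f"summary:   {summary_fraction:.0%} of nominal peak")
```

Under that assumption the nominal peak is about 139.7 GiB/s per direction, so the fastest iterations reach roughly 78% of it and the summary figure about 18%.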