build.sh: Add option to log nvcc compile times #1262

ahendriksen · 2023-02-09T13:53:28Z

Add --time option to build.sh that enables compile time logging of nvcc.

Also, add a script cpp/scripts/analyze_nvcc_log.py to find the translation units that take the longest time.

Output looks like:

$ cpp/scripts/analyze_nvcc_log.py cpp/build/nvcc_compile_log.csv
-- loading data
-- analyzing data
-- Ten longest translation units:
phase  index                                               file        cicc   cudafe++  fatbinary  gcc (compiling)  gcc (preprocessing 1)  gcc (preprocessing 4)        ptxas   total time
0         10  ions/detail/canberra_double_double_double_int.cu    42.431063  10.601856   0.020979         6.747153               3.721194               2.093567  1618.390375  1684.006186
1         11  zations/detail/canberra_float_float_float_int.cu    36.928960   9.804138   0.011537         6.796088               3.481156               1.790703  1584.262875  1643.075457
2         85  ors/specializations/refine_d_uint64_t_uint8_t.cu   602.935531  14.980877   0.529673        36.300566               6.270717               2.889723   933.622969  1597.530056
3         84  bors/specializations/refine_d_uint64_t_int8_t.cu   606.513281  16.243960   0.729282        39.981113               5.608029               3.028493   897.241469  1569.345628
4         53  stance/neighbors/ivfpq_search_int8_t_uint64_t.cu   841.049750   8.233967   1.025554        24.248578               4.069022               1.747108   631.193734  1511.567713
5         52  istance/neighbors/ivfpq_search_float_uint64_t.cu   837.241437   8.145278   1.042313        24.400606               3.433528               1.882623   627.786672  1503.932457
6         54  tance/neighbors/ivfpq_search_uint8_t_uint64_t.cu   846.706656   8.371286   1.025517        24.094691               3.432749               1.645345   618.319234  1503.595479
7         76  izations/detail/ivfpq_search_uint8_t_uint64_t.cu   698.726266   7.086368   1.050021        39.727723               3.259101               1.333935   406.509937  1157.693351
8         74  alizations/detail/ivfpq_search_float_uint64_t.cu   706.702516   6.905794   1.049731        39.923895               2.814361               2.057154   395.604000  1155.057450
9         75  lizations/detail/ivfpq_search_int8_t_uint64_t.cu   689.390281   6.483386   1.025864        39.865668               3.121696               1.297788   409.099562  1150.284245
10        83  hbors/specializations/refine_d_uint64_t_float.cu   334.705594  15.466444   0.680270        36.551977               5.405133               2.947568   715.708781  1111.465767
-- Plotting absolute compile times
-- Wrote absolute compile time plot to cpp/build/nvcc_compile_log.csv.absolute.compile_times.png
-- Plotting relative compile times
-- Wrote relative compile time plot to cpp/build/nvcc_compile_log.csv.relative.compile_times.png
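
As a rough illustration of the ranking step shown above, here is a minimal standard-library sketch; it is not the actual analyze_nvcc_log.py. It assumes the log has the columns `source file name`, `phase name`, and `metric` (in milliseconds), possibly padded with spaces, and it skips the `nvcc (driver)` row the same way the analysis code later in this thread does. The function name is hypothetical.

```python
import csv
import io
from collections import defaultdict


def longest_translation_units(csv_text, n=10):
    """Return the n (file, total_seconds) pairs with the largest total time."""
    reader = csv.DictReader(io.StringIO(csv_text))
    totals = defaultdict(float)
    for row in reader:
        # Header names and values may carry padding, so strip both.
        row = {k.strip(): v.strip() for k, v in row.items()
               if k is not None and v is not None}
        if row["phase name"] == "nvcc (driver)":
            continue  # excluded here, mirroring the analysis code in this thread
        totals[row["source file name"]] += float(row["metric"]) / 1000.0
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Tiny made-up sample, only to show the expected input shape.
sample = """source file name , phase name , metric
a.cu , cicc , 2000.0
a.cu , ptxas , 3000.0
a.cu , nvcc (driver) , 5000.0
b.cu , cicc , 1000.0
"""
print(longest_translation_units(sample, n=2))
```

Summing per file and sorting is enough for a quick "top N" view; the per-phase breakdown in the table above needs the pivot that the real script performs.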

ahendriksen · 2023-02-09T13:54:34Z

An example of what the output looks like on PR #1254 :

ahendriksen · 2023-02-09T15:33:19Z

Okay.. This is causing segfaults in CI.. That is a disappointment..

cjnolet · 2023-02-09T15:44:36Z

In addition to the segfault, we have other challenges right now trying to extract the ninja_logs from the build for each package in CI (conda build wants to create its own working directory and then copy to a final directory when done, which overwrites the existing files each time).

Maybe for now we could add the --time flag as an option in build.sh (I'm thinking maybe build.sh -t to enable it? What do you think?).

I'd also like to start capturing these scripts in the codebase. Maybe we could create a new directory inside cpp/scripts specifically for these build analysis scripts? Something like cpp/scripts/stats? (I'm terrible at naming sometimes)

robertmaynard · 2023-02-09T15:58:44Z

Since we are specifying the same file for each compiler invocation we are hitting parallel clobber issues due to multiple compiler instances writing to the file at the same time.

If you need a high-level overview of which compile jobs have taken so long, and not where in nvcc the time was spent, you can scrape the ninja build database for that information. libcudf is doing this already: https://github.com/rapidsai/cudf/blob/branch-23.04/build.sh#L303
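
To make the ninja-based alternative concrete, here is a minimal sketch of scraping job durations from a `.ninja_log` file. It is not the cudf script linked above. It assumes the tab-separated log layout (start time in ms, end time in ms, mtime, output path, command hash) preceded by a `# ninja log vN` header line; the function name is hypothetical.

```python
def slowest_ninja_jobs(log_text, n=5):
    """Return the n (output, seconds) pairs with the longest build duration."""
    durations = {}
    for line in log_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and the "# ninja log vN" header
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # ignore lines that do not match the assumed layout
        start_ms, end_ms, _mtime, output = fields[:4]
        # A later entry for the same output supersedes an earlier one.
        durations[output] = (int(end_ms) - int(start_ms)) / 1000.0
    return sorted(durations.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Made-up sample log with two compile jobs.
sample = "# ninja log v5\n0\t1500\t0\tfoo.cu.o\tabc\n100\t4100\t0\tbar.cu.o\tdef\n"
print(slowest_ninja_jobs(sample))
```

This only gives wall-clock time per object file, so it cannot distinguish cicc from ptxas time the way the nvcc log does.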

ahendriksen · 2023-02-09T16:37:25Z

> Since we are specifying the same file for each compiler invocation we are hitting parallel clobber issues due to multiple compiler instances writing to the file at the same time.

I thought that could be an issue as well, but locally (12 cores) this has not been a problem. To rule it out, is there a way to instruct CMake to make the flag dependent on the output file, e.g., to get --time=CMakeFiles/raft_distance_lib.dir/src/distance/neighbors/ivfpq_search_uint8_t_uint64_t.cu.o.nvcc_log.csv?

robertmaynard · 2023-02-09T18:35:05Z

> > Since we are specifying the same file for each compiler invocation we are hitting parallel clobber issues due to multiple compiler instances writing to the file at the same time.
>
> I thought that could be an issue as well, but locally (12 cores) this has not been a problem. To rule it out, is there a way to instruct CMake to make the flag dependent on the output file, e.g., to get --time=CMakeFiles/raft_distance_lib.dir/src/distance/neighbors/ivfpq_search_uint8_t_uint64_t.cu.o.nvcc_log.csv?

There is no easy way to do this. CMake doesn't offer any way to get the object location on a per source basis.

You would need to do something like this:

get_target_property(sources <target_name> SOURCES)
foreach(source IN LISTS sources)
  cmake_path(GET source FILENAME source_name)
  set_source_files_properties(${source} PROPERTIES COMPILE_FLAGS "--time=${source_name}.csv")
endforeach()

ahendriksen · 2023-02-10T10:17:54Z

@robertmaynard Thanks for the snippet! It works locally.

Since --time is unlikely to work in general (it is now failing in CI; it may interfere with compile_commands.json), I have made it an optional flag to build.sh, as @cjnolet suggested. I have also added a script to analyze the results.

cjnolet · 2023-02-11T02:39:07Z

@ahendriksen while it may seem cumbersome, I’ve been thinking about @robertmaynard’s previous suggestion and I actually think it could be useful to dump individual csv files for each of the source files that get compiled.

Though, I wonder, should we stick them in a common directory (maybe something like cpp/build/cuda_time_logs)? I still like the idea of keeping it as an option in build.sh (which ultimately turns on an option in cmake), and I can’t imagine this would add all that much overhead to the build. What we can do is have the script read through all the csv files in whatever we decide to name that directory.

What do you think?
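
Reading every csv file in one directory could look roughly like the sketch below. It assumes each per-source log carries its own header row with the same columns as the single-file log; the directory layout, file names, and function name are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def merge_logs(log_dir):
    """Concatenate the rows of every *.csv in log_dir into one list of dicts."""
    rows = []
    for path in sorted(Path(log_dir).glob("*.csv")):
        with path.open(newline="") as f:
            for row in csv.DictReader(f):
                # Strip padding from both column names and values.
                rows.append({k.strip(): v.strip() for k, v in row.items()})
    return rows


# Build two made-up per-source logs in a temporary directory and merge them.
with tempfile.TemporaryDirectory() as tmp:
    header = "source file name , phase name , metric\n"
    Path(tmp, "a_cu.csv").write_text(header + "a.cu , cicc , 10.0\n")
    Path(tmp, "b_cu.csv").write_text(header + "b.cu , ptxas , 20.0\n")
    merged = merge_logs(tmp)

print(len(merged), [r["source file name"] for r in merged])
```

The merged rows have the same shape as a single shared log, so the same analysis could run on either layout.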

ahendriksen · 2023-02-20T09:11:46Z

@cjnolet, One thing that may not be immediately obvious from the suggestion of @robertmaynard is that:

- it creates file names that are equal to the source file name (and thus discards the directory structure)
- we have a dozen files that share the same name.

In summary, with the change you are suggesting, there would still be some translation units that result in writes to the same csv file and we would have to collect all csv files in a directory (as you suggest) to reduce clutter. As such, I think the current design is a local optimum.

robertmaynard · 2023-02-21T18:23:33Z

> @cjnolet, One thing that may not be immediately obvious from the suggestion of @robertmaynard is that:
>
> - it creates file names that are equal to the source file name (and thus discards the directory structure)
> - we have a dozen files that share the same name.
>
> In summary, with the change you are suggesting, there would still be some translation units that result in writes to the same csv file and we would have to collect all csv files in a directory (as you suggest) to reduce clutter. As such, I think the current design is a local optimum.

Those are both solvable by computing the relative path of the source file from the root of the project and encoding it into the file name / path.
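
The idea can be sketched in Python as follows: two sources with the same base name map to distinct log names once the path relative to the project root is encoded into the name. This is only an analogue of what a CMake implementation would do; the paths and the function name are hypothetical.

```python
import re
from pathlib import PurePosixPath


def log_name(source, project_root):
    """Map a source path to a flat CSV name that keeps the directory structure."""
    path = PurePosixPath(source)
    if path.is_absolute():
        path = path.relative_to(project_root)
    # Replace every character outside [A-Za-z0-9_] with an underscore.
    return re.sub(r"[^0-9A-Za-z_]", "_", str(path)) + ".csv"


root = "/repo"
first = "/repo/src/a/kernel.cu"   # hypothetical paths with the same base name
second = "/repo/src/b/kernel.cu"
assert PurePosixPath(first).name == PurePosixPath(second).name
print(log_name(first, root))
print(log_name(second, root))
```

Flattening the path into one name also means all logs can live in a single directory without recreating the source tree.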

ahendriksen · 2023-02-23T09:43:46Z

I thought this was going to be an easy change. The scope is slowly getting out of hand. Maybe we can close the PR and leave it as is for documentation purposes. What do you think @cjnolet?

cjnolet · 2023-03-09T02:59:25Z

@ahendriksen I'll leave that up to you. I do think it's a very useful change and it would really help debugging compile times. I agree with @robertmaynard, though, that we should probably capture these timings in individual files. It doesn't seem too hard to accomplish this by taking the path for each file and replacing the / for _. Maybe I'm oversimplifying this though?

@robertmaynard is there an easy way to get a hold of the relative path (from the repo root) for each source file? @ahendriksen I would honestly be happy even if we just dumped the renamed files to a single directory for now.

robertmaynard · 2023-03-09T16:29:42Z

> @ahendriksen I'll leave that up to you. I do think it's a very useful change and it would really help debugging compile times. I agree with @robertmaynard, though, that we should probably capture these timings in individual files. It doesn't seem too hard to accomplish this by taking the path for each file and replacing the / for _. Maybe I'm oversimplifying this though?
>
> @robertmaynard is there an easy way to get a hold of the relative path (from the repo root) for each source file? @ahendriksen I would honestly be happy even if we just dumped the renamed files to a single directory for now.

@cjnolet Here is how you can get the relative path for each source file and convert them into allowable file names:

get_target_property(sources <TARGET> SOURCES)
foreach(source IN LISTS sources)
  set(rel_source "${source}")
  cmake_path(IS_ABSOLUTE source is_abs)
  if(is_abs)
    # convert to a path relative to the project root; ${source} itself stays untouched
    cmake_path(RELATIVE_PATH source BASE_DIRECTORY "${PROJECT_SOURCE_DIR}" OUTPUT_VARIABLE rel_source)
  endif()
  string(MAKE_C_IDENTIFIER "${rel_source}" filename) # convert to a valid file name
  set_source_files_properties("${source}" PROPERTIES COMPILE_FLAGS "--time=${filename}.csv")
endforeach()

This will create a csv file in the cpp/build directory that records the compilation of each translation unit and how long each phase of the nvcc compilation took. There does not seem to be a downside to enabling this, and it will be very helpful to diagnose build issues.

To analyze the file, the following python code will help. It requires pandas, matplotlib, and seaborn:

------------------------------------------------------------
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib import colors

df = pd.read_csv("./nvcc_compile_log.csv")
df = df.rename(columns=str.strip)  # column names are padded with spaces
df["seconds"] = df["metric"] / 1000
df["file"] = df["source file name"]
df["phase"] = df["phase name"].str.strip()

def categorize_time(s):
    if s < 60:
        return "less than a minute"
    else:
        return "more than a minute"

# One row per file, one column per nvcc phase (the driver row is excluded).
dfp = df.query("phase!='nvcc (driver)'").pivot(index="file", columns="phase", values="seconds")
dfp_sum = dfp.sum(axis="columns")

df_fraction = dfp.divide(dfp_sum, axis="index")
df_fraction["total time"] = dfp_sum
df_fraction = df_fraction.melt(ignore_index=False, id_vars="total time", var_name="phase", value_name="fraction")

dfp["total time"] = dfp_sum
df_absolute = dfp.melt(ignore_index=False, id_vars="total time", var_name="phase", value_name="seconds")

df_fraction["time category"] = dfp["total time"].apply(categorize_time)
df_absolute["time category"] = dfp["total time"].apply(categorize_time)

palette = {
    "gcc (preprocessing 4)": colors.hsv_to_rgb((0, 1, 1)),
    "cudafe++": colors.hsv_to_rgb((0, 1, .75)),
    "gcc (compiling)": colors.hsv_to_rgb((0, 1, .4)),
    "gcc (preprocessing 1)": colors.hsv_to_rgb((.33, 1, 1)),
    "cicc": colors.hsv_to_rgb((.33, 1, 0.75)),
    "ptxas": colors.hsv_to_rgb((.33, 1, 0.4)),
    "fatbinary": "grey",
}
# Same order as the palette keys, reversed; a list rather than a one-shot iterator.
hue_order = list(reversed(list(palette)))

sns.displot(
    df_absolute.sort_values("total time"),
    y="file",
    hue="phase",
    hue_order=hue_order,
    palette=palette,
    weights="seconds",
    multiple="stack",
    kind="hist",
    height=20,
)
plt.xlabel("seconds")
plt.savefig("absolute_compile_times.png")

sns.displot(
    df_fraction.sort_values("total time"),
    y="file",
    hue="phase",
    hue_order=hue_order,
    palette=palette,
    weights="fraction",
    multiple="stack",
    kind="hist",
    height=15,
)
plt.xlabel("fraction")
plt.savefig("relative_compile_times.png")


ahendriksen · 2023-03-24T14:33:31Z

I think it might be more difficult than we thought to get this working in CI. I have tried the suggestion of writing to separate csv files, but the compilation continues to segfault in CI. A formatted example from the logs:

sccache /opt/conda/conda-bld/_build_env/bin/nvcc -forward-unknown-to-host-compiler [.. snip flags .. ]  \
--time=CMakeFiles/nvcc_log_src_neighbors_refine_d_int64_t_int8_t_cu.csv  \
-MD -MT CMakeFiles/raft_lib.dir/src/neighbors/refine_d_int64_t_int8_t.cu.o \
 -MF CMakeFiles/raft_lib.dir/src/neighbors/refine_d_int64_t_int8_t.cu.o.d -x cu \
-c /opt/conda/conda-bld/work/cpp/src/neighbors/refine_d_int64_t_int8_t.cu \
-o CMakeFiles/raft_lib.dir/src/neighbors/refine_d_int64_t_int8_t.cu.o
nvcc: Segmentation fault

I thought it could have to do with long file names that commonly occur in conda builds, but the paths passed to nvcc are all reasonably short.

If this cannot be made to work in CI, then maybe we should just document that -DCMAKE_CUDA_FLAGS='--time=nvcc_log.csv' will output the log file in one location and the script that is provided in the scripts directory can be used to analyze it (with all the risks for bitrot that that entails).

cjnolet · 2023-03-25T01:51:51Z

@robertmaynard do you have any other ideas as to why this might still be causing a segfault in CI?

I'm actually a little surprised myself that it's still failing.

cjnolet · 2023-03-28T16:19:31Z

@ahendriksen I'm actually on board w/ keeping this feature as-is for now (outputting to individual files that we can analyze separately) and just not having it run in CI. At least the feature exists that we can run locally to profile the build and we can turn it back on in CI once we figure out what's causing the segfault. What do you think? I'd love to have this in 23.04.

ahendriksen · 2023-03-28T17:16:06Z

That sounds good! I would prefer to revert the scattering of the output files though, as it does not fix CI. Previously, all the information was centralized in a single csv file. This makes analysis way easier.


cjnolet · 2023-03-29T02:32:12Z

@ahendriksen are you sure all the data is being written to the file in that case? I wouldn't normally expect multiple threads / processes to be able to independently write to the same file without some sort of locking mechanism. @robertmaynard do you know if this is the case with nvcc?

ahendriksen · 2023-03-29T10:46:18Z

@cjnolet : I haven't read the nvcc source code if that is what you mean. I just tested with make -j 200 and 300 single empty-kernel translation units. The resulting csv file still parses and seems to be correct.
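
One way to make "still parses and seems to be correct" a little more concrete is a structural check on the shared log. The sketch below flags rows whose field count differs from the header, which is what interleaved writes would typically produce. It is only a necessary condition, not proof that the writes are atomic; the function name and sample data are hypothetical.

```python
import csv
import io


def malformed_rows(csv_text):
    """Return (line_number, fields) for rows whose width differs from the header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    return [
        (reader.line_num, fields)
        for fields in reader
        if len(fields) != len(header)
    ]


good = "source file name , phase name , metric\na.cu , cicc , 1.0\nb.cu , ptxas , 2.0\n"
# Simulate two writers whose output got interleaved in the middle of a line.
torn = "source file name , phase name , metric\na.cu , cicb.cu , ptxas , 2.0\n"
print(malformed_rows(good))
print(len(malformed_rows(torn)))
```

A stricter check could also verify that every source file appears with the full set of expected phases.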

robertmaynard · 2023-03-29T14:20:11Z

I agree with @ahendriksen: in testing, this nvcc feature does seem to be multi-process safe. It most likely does come at some performance cost, as each process takes an exclusive lock on the file.

My only theory on why we are seeing segfaults in CI is that this tracking incurs some memory overhead, which is causing an OOM segfault.

cjnolet · 2023-03-29T14:30:48Z

@ahendriksen @robertmaynard sounds good to me. If we aren't losing any information, I do agree that it's easier for analysis just to keep everything in the same file.


cjnolet · 2023-03-29T16:36:15Z

/merge

ahendriksen requested a review from a team as a code owner February 9, 2023 13:53

github-actions bot added the CMake and cpp labels Feb 9, 2023

ahendriksen requested a review from cjnolet February 9, 2023 13:54

ahendriksen added the improvement (Improvement / enhancement to an existing function), non-breaking (Non-breaking change), Build Time Improvement, and 3 - Ready for Review labels Feb 9, 2023

ahendriksen requested a review from a team as a code owner February 10, 2023 10:12

ahendriksen changed the title from "Log nvcc compile time by default" to "build.sh: Add option to log nvcc compile times" Feb 10, 2023

ahendriksen force-pushed the enh-log-nvcc-compile-times branch from d2becd2 to 97d5f9e Compare February 10, 2023 10:39

cjnolet assigned ahendriksen Feb 14, 2023

ahendriksen added 5 commits March 21, 2023 17:51

Move default location of compile log

c6dff96

Hopefully this prevents segfaults in CI.

build.sh: Add --time option to log nvcc compile time

d175ed6

The time option is disabled by default. When enabled, writes a log of compilation times to cpp/build/nvcc_compile_log.csv. This is not supported in CI, as it leads to seg faults.

Add script to analyze nvcc compile time log

9acd1d0

Test nvcc log per source file in CI

2aec639

ahendriksen force-pushed the enh-log-nvcc-compile-times branch from f0d31b0 to 2aec639 Compare March 21, 2023 17:12

Try to fix segmentation faults (again)

713e7b8

Perhaps the file system outside the CMakeFiles/ directory is not writable.

divyegala reviewed Mar 24, 2023

cpp/CMakeLists.txt

cpp/scripts/analyze_nvcc_log.py

ahendriksen added 3 commits March 29, 2023 16:37

Revert "Try to fix segmentation faults (again)"

1aa4383

This reverts commit 713e7b8.

Revert "Test nvcc log per source file in CI"

1872afd

This reverts commit 2aec639.

Merge remote-tracking branch 'rapids/branch-23.04' into enh-log-nvcc-compile-times

9fc48d0

cjnolet approved these changes Mar 29, 2023


Implement review feedback

1b716ef

rapids-bot bot merged commit e963f5a into rapidsai:branch-23.04 Mar 29, 2023
