-
Notifications
You must be signed in to change notification settings - Fork 197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Options to Generate Build Metrics Report #1369
Add Options to Generate Build Metrics Report #1369
Conversation
cc @ahendriksen for review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Some small comments.
build.sh
Outdated
@@ -18,7 +18,7 @@ ARGS=$* | |||
# script, and that this script resides in the repo dir! | |||
REPODIR=$(cd $(dirname $0); pwd) | |||
|
|||
VALIDARGS="clean libraft pylibraft raft-dask docs tests bench clean --uninstall -v -g -n --compile-lib --allgpuarch --no-nvtx --show_depr_warn -h" | |||
VALIDARGS="clean libraft pylibraft raft-dask docs tests bench clean --uninstall -v -g -n --compile-lib --allgpuarch --no-nvtx --show_depr_warn --build_metrics --incl_cache_stats -h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Apart from --show_depr_warn
, most other options use -
. I would prefer this for consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I also change --show_depr_warn
then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am okay with changing --show_depr_warn
to --show-depr-warn
for consistency in the build.sh
flags of RAFT. @cjnolet : are you okay too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
I left some comments soliciting feedback from you and others.
Let's wait to make any of these changes until we get some consensus so that we can avoid wasteful iterations.
build.sh
Outdated
# get the current count before the compile starts | ||
CACHE_COMMAND="" | ||
if [[ -z "${CACHE_TOOL}" ]]; then | ||
# default is sccache for CI | ||
CACHE_COMMAND="sccache" | ||
else | ||
CACHE_COMMAND="${CACHE_TOOL}" | ||
fi | ||
if [[ "$BUILD_REPORT_INCL_CACHE_STATS" == "ON" && -x "$(command -v ${CACHE_COMMAND})" ]]; then | ||
# zero the cache statistics | ||
"${CACHE_COMMAND}" --zero-stats | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# get the current count before the compile starts | |
CACHE_COMMAND="" | |
if [[ -z "${CACHE_TOOL}" ]]; then | |
# default is sccache for CI | |
CACHE_COMMAND="sccache" | |
else | |
CACHE_COMMAND="${CACHE_TOOL}" | |
fi | |
if [[ "$BUILD_REPORT_INCL_CACHE_STATS" == "ON" && -x "$(command -v ${CACHE_COMMAND})" ]]; then | |
# zero the cache statistics | |
"${CACHE_COMMAND}" --zero-stats | |
fi | |
if [[ "$BUILD_REPORT_INCL_CACHE_STATS" == "ON" && -x "$(command -v ${CACHE_TOOL:-sccache})" ]]; then | |
"${CACHE_COMMAND}" --zero-stats | |
fi |
Two quick comments:
- Bash has the concept of default values that we can use to simplify this logic (src) to just
"$(command -v ${CACHE_TOOL:-sccache})"
- We can omit comments like
# zero the cache statistics
when using the long form of CLI flags like--zero-stats
since they're a bit redundant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay! I didn't know about default values in bash. But I think the correct expression here is:
CACHE_COMMAND=""
if [[ "$BUILD_REPORT_INCL_CACHE_STATS" == "ON" && -x "$(command -v ${CACHE_TOOL:-sccache})" ]]; then
CACHE_COMMAND=$CACHE_TOOL
"${CACHE_COMMAND}" --zero-stats
fi
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change doesn't work. Can you verify what the failure is about? My bash is really weak:
./build.sh: line 394: : command not found
build.sh
Outdated
if [[ "$BUILD_REPORT_METRICS" == "ON" && -f "${LIBRAFT_BUILD_DIR}/.ninja_log" ]]; then | ||
if ! rapids-build-metrics-reporter.py 2> /dev/null && [ ! -f rapids-build-metrics-reporter.py ]; then | ||
echo "Downloading rapids-build-metrics-reporter.py" | ||
curl -sO https://raw.githubusercontent.com/rapidsai/build-metrics-reporter/v1/rapids-build-metrics-reporter.py | ||
fi | ||
|
||
echo "Formatting build metrics" | ||
MSG="" | ||
# get some sccache/ccache stats after the compile | ||
if [[ "$BUILD_REPORT_INCL_CACHE_STATS" == "ON" ]]; then | ||
if [[ ${CACHE_COMMAND} == "sccache" && -x "$(command -v sccache)" ]]; then | ||
COMPILE_REQUESTS=$(sccache -s | grep "Compile requests \+ [0-9]\+$" | awk '{ print $NF }') | ||
CACHE_HITS=$(sccache -s | grep "Cache hits \+ [0-9]\+$" | awk '{ print $NF }') | ||
HIT_RATE=$(echo - | awk "{printf \"%.2f\n\", $CACHE_HITS / $COMPILE_REQUESTS * 100}") | ||
MSG="${MSG}<br/>cache hit rate ${HIT_RATE} %" | ||
elif [[ ${CACHE_COMMAND} == "ccache" && -x "$(command -v ccache)" ]]; then | ||
CACHE_STATS_LINE=$(ccache -s | grep "Hits: \+ [0-9]\+ / [0-9]\+" | tail -n1) | ||
if [[ ! -z "$CACHE_STATS_LINE" ]]; then | ||
CACHE_HITS=$(echo "$CACHE_STATS_LINE" - | awk '{ print $2 }') | ||
COMPILE_REQUESTS=$(echo "$CACHE_STATS_LINE" - | awk '{ print $4 }') | ||
HIT_RATE=$(echo - | awk "{printf \"%.2f\n\", $CACHE_HITS / $COMPILE_REQUESTS * 100}") | ||
MSG="${MSG}<br/>cache hit rate ${HIT_RATE} %" | ||
fi | ||
fi | ||
fi | ||
MSG="${MSG}<br/>parallel setting: $PARALLEL_LEVEL" | ||
MSG="${MSG}<br/>parallel build time: $compile_total seconds" | ||
if [[ -f "${LIBRAFT_BUILD_DIR}/libraft.so" ]]; then | ||
LIBRAFT_FS=$(ls -lh ${LIBRAFT_BUILD_DIR}/libraft.so | awk '{print $5}') | ||
MSG="${MSG}<br/>libraft.so size: $LIBRAFT_FS" | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to include @vyasr and @davidwendt in a discussion here about how we can eliminate some of this boilerplate code.
Presumably some of this functionality will be needed by other repositories, so we should strive to make rapids-build-metrics-reporter.py
handle as much of this as possible so that we have a single source of truth.
I have some thoughts and proposals here:
- I think
rapids-build-metrics-reporter.py
should handlesccache -s
/ccache -s
parsing. The lines of code in this PR are already duplicated from cudf's build.sh, so this is a prime candidate for deduping. Perhaps we can expose a--cache-tool
/-c
flag that will let consumers indicate whetherccache
orsccache
should be used. - Whenever possible, I'm in favor of avoiding the manual generation of HTML in bash variables (e.g.
<br/>
). In place of the--msg
flag (which appears to take in an HTML string), I think we should create the following flags:--stat
/-s
- this flag should be able to be specified multiple times and its value will be a key/value pair that can be added to a final summary table (e.g.--stat="Parallel Setting=$PARALLEL_LEVEL"
,--stat="Parallel Build Time=$compile_total seconds"
)--file-size
/-f
- this flag should be able to be specified multiple times and its value will be a path to a file whose name and size will be added to a final summary table (e.g.--file-size="${LIBRAFT_BUILD_DIR}/libraft.so"
)
The sum of these changes should allow us to dedupe quite a bit here and also eliminate the MSG_OUTFILE
lines in this PR.
Once we get a consensus on all of this, I will open an issue to track the rollout of this new script to other repositories as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather have the .py strictly handle the .ninja_log
file. It currently supports other formats besides html including csv and xml (unittest format?). Both of these are handy at times and and there is no place for the stats in these outputs.
Rather than including the ccache stats in the html output, maybe we generate a separate file with sccache states into the artifacts directory. The reason is what incorporated in the html output originally was to just minimize the burden on the Jenkins job at the time. Just piping the sccache -s > $RAPIDS_ARTIFACT_DIR/cache-stats.txt
seems a reasonable approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we entirely eliminate formating sccache/ccache statistics and just add it as a raw file for the user to parse?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we can expose a --cache-tool/-c flag
This already exists in RAFT!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you are calling the output of sccache -s
raw output then yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think centralizing more of the logic is the right approach as long as it is done in a way that the different pieces of the document (cache stats, ninja log, etc) are composable and extensible by packages that want to add new files etc. I would advocate for starting by building out the different pieces and then figuring out how to combine them afterwards. As long as there is a consistent HTML output the combining shouldn't be too difficult.
For building up the HTML in Python a reasonable low-level approach is building up the HTML tree with xml.etree. A nicer approach might be to define a Jinja template that can be filled in, leaving some arbitrary extra sections that could be appended to so that the output format is extensible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about
rapids-reporter.sh
which can build the html output like build.sh currently does and calls the .py to build the ninja html? We can look at different ways to combine the results into a single file.
Are you suggesting that we just move the existing logic to another shell script? That doesn't seem to solve the problem of reusability / keeping things DRY. All of the repositories will just have another rapids-reporter.sh
script that contains all of the duplicated lines of code still.
I think centralizing more of the logic is the right approach as long as it is done in a way that the different pieces of the document (cache stats, ninja log, etc) are composable and extensible by packages that want to add new files etc. I would advocate for starting by building out the different pieces and then figuring out how to combine them afterwards. As long as there is a consistent HTML output the combining shouldn't be too difficult.
+1.
For building up the HTML in Python a reasonable low-level approach is building up the HTML tree with xml.etree. A nicer approach might be to define a Jinja template that can be filled in, leaving some arbitrary extra sections that could be appended to so that the output format is extensible.
In order to keep this script curl
-able (to avoid the overhead of a pip
package), we should probably try to stick to Python standard libraries.
@davidwendt, can you point out where / how the other output formats that you mentioned will be used? As far as I can tell, build.sh
only outputs a single format type. Are you trying to preserve the multiple output formats for users who may run this tool manually outside of build.sh
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you suggesting that we just move the existing logic to another shell script? That doesn't seem to solve the problem of reusability / keeping things DRY. All of the repositories will just have another rapids-reporter.sh script that contains all of the duplicated lines of code still.
The existing shell script logic does not necessarily have move to into a new shell script. I just don't want to move this part of the existing shell script into the rapids-build-metrics-reporter.py
file. We could move the shell script logic into a new separate .py file if that helps. I'm suggesting there will be 2 files in this repository -- one to build/format the sccache/ccache results and one to build/format the ninja-log results. And maybe a 3rd file to combine them if necessary.
can you point out where / how the other output formats that you mentioned will be used? As far as I can tell, build.sh only outputs a single format type. Are you trying to preserve the multiple output formats for users who may run this tool manually outside of build.sh?
The other output formats are not being used by CI currently. I'm hoping the xml can be used in the future to fail a build if there is a problem with specific source file compile times. The CSV format has been helpful for me locally so I can load the results into Excel and sort them in different ways as well as compare them with other runs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to merge what we have in RAFT at this moment, and when there is consensus on how to share these scripts and how much logic to pluck out we can go ahead and update it. Is everyone okay with that? There are some major changes happening in RAFT right now where it would be very valuable to have these metrics for hindsight.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to merge what we have in RAFT at this moment, and when there is consensus on how to share these scripts and how much logic to pluck out we can go ahead and update it. Is everyone okay with that? There are some major changes happening in RAFT right now where it would be very valuable to have these metrics for hindsight.
That's fine.
Feel free to fix the merge conflicts and make any other changes necessary to this PR and I will approve.
I will open an issue in https://github.com/rapidsai/build-metrics-reporter to continue this discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like there are some errors that might need to be sorted out, but pre-approving in anticipation of those fixes.
/merge |
This PR will also automatically generate an HTML report in
conda-cpp-build
CI runs under the taskUpload Additional Artifacts