Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use registered strings for NVTX. Add more NVTX annotations. #518

Merged
merged 7 commits into from
Oct 31, 2024

Conversation

kingcrimsontianyu
Copy link
Contributor

@kingcrimsontianyu kingcrimsontianyu commented Oct 28, 2024

This PR makes some improvements on NVTX annotations in KvikIO, including:

  • Improve NVTX range annotation by registering the range name string (which reduces profiling-time overhead) and making the registered string static (which further reduces the overhead).
  • Separate the responsibility of "scoped range" from "function range" to improve clarity. Previously variadic macro KVIKIO_NVTX_FUNC_RANGE(...) is used for both the function range and arbitrary scoped range. However, the macro is somewhat a misnomer for the latter case, where __func__ is not used in the message. This PR introduces KVIKIO_NVTX_SCOPED_RANGE(msg, val) for the latter case.
  • Add NVTX marker macro KVIKIO_NVTX_MARKER(message, payload) to annotate an instantaneous event. One use case is to characterize asynchronous I/O operations (where the scoped range in a function does not apply): Once the profile is collected, the nsys-rep file can be exported, for instance, as a sqlite, and the marker data can be used to make a I/O-size histogram. As an example, for PARQUET_READER_NVBENCH---parquet_read_io_compression, the following I/O statistics can be obtained.
    This histogram is generated using the Python script python/kvikio/examples/kvikio_stat.py. PR link: TBD...
FileHandle::pread()
    Bins                 ...... Count
    [     0 B,      2 B) ...... 0
    [     2 B,      4 B) ...... 0
    [     4 B,      8 B) ...... 0
    [     8 B,     16 B) ...... 0
    [    16 B,     32 B) ...... 0
    [    32 B,     64 B) ...... 0
    [    64 B,    128 B) ...... 0
    [   128 B,    256 B) ...... 0
    [   256 B,    512 B) ...... 0
    [   512 B,   1024 B) ...... 0
    [  1024 B,     2 KB) ...... 0
    [    2 KB,     4 KB) ...... 0
    [    4 KB,     8 KB) ...... 0
    [    8 KB,    16 KB) ...... 0
    [   16 KB,    32 KB) ...... 0
    [   32 KB,    64 KB) ...... 0
    [   64 KB,   128 KB) ...... 0
    [  128 KB,   256 KB) ...... 45
    [  256 KB,   512 KB) ...... 30
    [  512 KB,  1024 KB) ...... 15
    [ 1024 KB,     2 MB) ...... 120
    [    2 MB,     4 MB) ...... 150
    [    4 MB,     8 MB) ...... 60
    [    8 MB,    16 MB) ...... 0
    [   16 MB,    32 MB) ...... 0
    [   32 MB,    64 MB) ...... 0
    [   64 MB,   128 MB) ...... 0
    [  128 MB,   256 MB] ...... 30



FileHandle::pwrite()
    Bins                 ...... Count
    [     0 B,      2 B) ...... 0
    [     2 B,      4 B) ...... 0
    [     4 B,      8 B) ...... 0
    [     8 B,     16 B) ...... 0
    [    16 B,     32 B) ...... 0
    [    32 B,     64 B) ...... 0
    [    64 B,    128 B) ...... 0
    [   128 B,    256 B) ...... 0
    [   256 B,    512 B) ...... 0
    [   512 B,   1024 B) ...... 0
    [  1024 B,     2 KB) ...... 0
    [    2 KB,     4 KB) ...... 0
    [    4 KB,     8 KB) ...... 0
    [    8 KB,    16 KB) ...... 0
    [   16 KB,    32 KB) ...... 0
    [   32 KB,    64 KB) ...... 0
    [   64 KB,   128 KB) ...... 0
    [  128 KB,   256 KB) ...... 8
    [  256 KB,   512 KB) ...... 8
    [  512 KB,  1024 KB) ...... 19
    [ 1024 KB,     2 MB) ...... 24
    [    2 MB,     4 MB) ...... 4
    [    4 MB,     8 MB) ...... 0
    [    8 MB,    16 MB) ...... 0
    [   16 MB,    32 MB) ...... 0
    [   32 MB,    64 MB) ...... 0
    [   64 MB,   128 MB) ...... 0
    [  128 MB,   256 MB] ...... 2



posix_device_read()
    Bins                 ...... Count
    [     0 B,      2 B) ...... 0
    [     2 B,      4 B) ...... 0
    [     4 B,      8 B) ...... 0
    [     8 B,     16 B) ...... 0
    [    16 B,     32 B) ...... 0
    [    32 B,     64 B) ...... 0
    [    64 B,    128 B) ...... 0
    [   128 B,    256 B) ...... 0
    [   256 B,    512 B) ...... 0
    [   512 B,   1024 B) ...... 0
    [  1024 B,     2 KB) ...... 0
    [    2 KB,     4 KB) ...... 0
    [    4 KB,     8 KB) ...... 0
    [    8 KB,    16 KB) ...... 0
    [   16 KB,    32 KB) ...... 0
    [   32 KB,    64 KB) ...... 0
    [   64 KB,   128 KB) ...... 0
    [  128 KB,   256 KB) ...... 75
    [  256 KB,   512 KB) ...... 60
    [  512 KB,  1024 KB) ...... 15
    [ 1024 KB,     2 MB) ...... 150
    [    2 MB,     4 MB] ...... 1710



posix_device_write()
    Bins                 ...... Count
    [     0 B,      2 B) ...... 0
    [     2 B,      4 B) ...... 0
    [     4 B,      8 B) ...... 0
    [     8 B,     16 B) ...... 0
    [    16 B,     32 B) ...... 0
    [    32 B,     64 B) ...... 0
    [    64 B,    128 B) ...... 0
    [   128 B,    256 B) ...... 0
    [   256 B,    512 B) ...... 0
    [   512 B,   1024 B) ...... 0
    [  1024 B,     2 KB) ...... 0
    [    2 KB,     4 KB) ...... 0
    [    4 KB,     8 KB) ...... 0
    [    8 KB,    16 KB) ...... 0
    [   16 KB,    32 KB) ...... 0
    [   32 KB,    64 KB) ...... 0
    [   64 KB,   128 KB) ...... 0
    [  128 KB,   256 KB) ...... 8
    [  256 KB,   512 KB) ...... 8
    [  512 KB,  1024 KB) ...... 19
    [ 1024 KB,     2 MB) ...... 24
    [    2 MB,     4 MB] ...... 104

@kingcrimsontianyu kingcrimsontianyu added improvement Improves an existing functionality non-breaking Introduces a non-breaking change labels Oct 28, 2024
@kingcrimsontianyu kingcrimsontianyu self-assigned this Oct 28, 2024
@kingcrimsontianyu kingcrimsontianyu marked this pull request as ready for review October 28, 2024 19:42
@kingcrimsontianyu kingcrimsontianyu requested a review from a team as a code owner October 28, 2024 19:42
Copy link
Member

@madsbk madsbk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kingcrimsontianyu, looks good to me.

PS: the current CI error in test_read_access(s3_base) is not related to your changes. I will work on a fix tomorrow,

@madsbk
Copy link
Member

madsbk commented Oct 28, 2024

@kingcrimsontianyu could you add a description in the docs on how to generate the histograms? They are very cool!

@kingcrimsontianyu
Copy link
Contributor Author

@madsbk Sure! The histogram is generated by a Python script. I can make a separate PR and update the doc. Is python/kvikio/examples the right place for that script?

@madsbk
Copy link
Member

madsbk commented Oct 29, 2024

@madsbk Sure! The histogram is generated by a Python script. I can make a separate PR and update the doc. Is python/kvikio/examples the right place for that script?

Yes, that would be great!

cpp/include/kvikio/utils.hpp Outdated Show resolved Hide resolved
Copy link
Member

@madsbk madsbk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work @kingcrimsontianyu

@madsbk
Copy link
Member

madsbk commented Oct 31, 2024

/merge

@rapids-bot rapids-bot bot merged commit fc56da6 into rapidsai:branch-24.12 Oct 31, 2024
57 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improves an existing functionality non-breaking Introduces a non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants