-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unify batch and stream API check #271
Unify batch and stream API check #271
Conversation
109835e
to
f4f71ab
Compare
f4f71ab
to
4dc2cee
Compare
0d5f440
to
e4a4622
Compare
e4a4622
to
5358b35
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the error handling for negative bytes written in the async write is incorrect. And, if I read the docs correctly, the pointer we're passing is also bad :(.
cpp/examples/basic_io.cpp
Outdated
check(cudaStreamSynchronize(stream) == cudaSuccess); | ||
// Note, `bytes_done` might be negative, which indicate an IO error thus we | ||
// use `CUFILE_CHECK_STREAM_IO` to check for errors. | ||
CUFILE_CHECK_STREAM_IO(bytes_done); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, this is a garbage error reporting api.
cpp/examples/basic_io.cpp
Outdated
cout << "Performing async I/O using file handle" << endl; | ||
off_t f_off{0}; | ||
off_t d_off{0}; | ||
ssize_t bytes_done{0}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is incorrect, per the documentation for cuFileWriteAsync
:
The bytes_written pointer should be allocated with
cuMemHostAlloc
or registered withcuMemHostRegister
, and the pointer to access that memory from the device can be obtained by usingcuMemHostGetDevicePointer
.
Which we are not doing here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, fixed
cpp/include/kvikio/error.hpp
Outdated
if (_nbytes < 0) { \ | ||
throw(_exception_type){std::string{"cuFile error at: "} + __FILE__ + ":" + \ | ||
KVIKIO_STRINGIFY(__LINE__) + ": " + \ | ||
cufileop_status_error((CUfileOpError)(CUFILEOP_BASE_ERR - _nbytes))}; \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the documentation says that this returns:
- The number of bytes successfully written. or;
- -1 on IO errors. or;
- All other errors will return a negative integer value of the
CUfileOpError
enum value.
But this code suggests that it returns -(some_cufile_op_error - CUFILEOP_BASE_ERR)
. Which can't be right given the above because it value-puns CU_FILE_DRIVER_NOT_INITIALIZED
with an IO error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the docs are correct. I now just print the raw value of _nbytes
. Let's fix this in a follow-up when we know the intended behavior :)
…o unify_batch_and_stream_api_check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some tiny fixes, LGTM!
* until execution time. | ||
* @param bytes_read Pointer to the bytes read from file. This pointer should be a non-NULL value | ||
* and *bytes_read set to 0. The bytes_read memory should be allocated with cuMemHostAlloc/malloc/ | ||
* mmap or registered with cuMemHostRegister. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know you have just copied the cufile docs here, but this sentence makes no sense to me. AIUI, just plain malloc
and mmap
are completely unknown to the driver. So does this mean use cuMemHostAlloc
or cuMemAlloc
or cuMemMap
? Or am I fully even more confused than I thought?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it means that it has to be allocated on the heap (not stack)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct.
Co-authored-by: Lawrence Mitchell <[email protected]>
Thanks @robertmaynard and @wence- |
/merge |
We now use the
cuFileReadAsync
symbol to check for availability of both the stream and batch API of cuFile.We used to look for the mangled symbol
_ZTS13CUfileOpError
to determine the availability of the batch API. However, since mangling is compiler specific, we prefer usingcuFileReadAsync
even though it means that the batch API isn't available until CUDA version 12.2.Additionally, this PR also implements
CUFILE_CHECK_STREAM_IO()
, which is used to check async IO errors.cc. @tell-rebanta