-
Notifications
You must be signed in to change notification settings - Fork 67
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
C++ bindings to the Batch API #220
Conversation
0aea697
to
0026f80
Compare
may be it will be clear to say this is "num_ops_in_batch". |
Good point, fixed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few places where we can do error checking I think.
Also, I don't like the name-mangled lookup, is there not a better way?!
file(READ "${cuFile_INCLUDE_DIRS}/cufile.h" CUFILE_H_STR) | ||
string(FIND "${CUFILE_H_STR}" "cuFileBatchIOSetUp" cuFileBatchIOSetUp_location) | ||
if(cuFileBatchIOSetUp_location EQUAL "-1") | ||
set(cuFile_BATCH_API_FOUND FALSE) | ||
else() | ||
set(cuFile_BATCH_API_FOUND TRUE) | ||
endif() | ||
message(STATUS "Found cuFile's Batch API: ${cuFile_BATCH_API_FOUND}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The right thing to do is to attempt to compile and link a call to this function, surely this is something that cmake supports? To ensure that both the header file and the library actually offer the function.
This is fine, I guess, but there must be a canonical way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried using check_symbol_exists()
with no luck. Got irritated and just rolled my own :)
This doesn't work for some reason:
include(CheckSymbolExists)
check_symbol_exists(
cuFileBatchIOSetUp
"${cuFile_INCLUDE_DIRS}/cufile.h"
cuFile_BATCH_API_FOUND
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check_symbol_exists
will try to compile and link a small executable that verifies that symbol exists. For this to work properly for libcufile you would also need to tell check_symbol_exists
all the extra include directories, and link requirements.
The examples in the docs are easy since they are testing for values found in headers provided by the compiler/runtime and therefore no extra information is required.
Using CMake's try_compile
would be easier in comparison but would still require setting up a code snippet and specifying the include directory for the libcufile and cuda rutime headers
cpp/include/kvikio/batch.hpp
Outdated
CUFILE_TRY( | ||
cuFileAPI::instance().BatchIOSubmit(_handle, io_batch_params.size(), &io_batch_params[0], 0)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will produce a generic error if io_batch_params.size()
is larger than the requested max batch size. Does it make more sense to check if operations.size() > _max_num_events
and provide a useful message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, done.
// both the batch and async API. | ||
try { | ||
void* s{}; | ||
get_symbol(s, lib, "_ZTS23cuFileBatchContextState"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we check the CTK version somehow? This is unstable wrt compilers (since name mangling is not standardised).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, not AFAIK :/
I think this will suffice for now. When CUDA v12.2 is released, I think we should make the Batch API require v12.2 and then use cuFileReadAsync
to determine the availability of both the batch and async API.
typedef enum CUfileOpcode { CUFILE_READ = 0, CUFILE_WRITE } CUfileOpcode_t; | ||
|
||
typedef enum CUFILEStatus_enum { | ||
CUFILE_WAITING = 0x000001, /* required value prior to submission */ | ||
CUFILE_PENDING = 0x000002, /* once enqueued */ | ||
CUFILE_INVALID = 0x000004, /* request was ill-formed or could not be enqueued */ | ||
CUFILE_CANCELED = 0x000008, /* request successfully canceled */ | ||
CUFILE_COMPLETE = 0x0000010, /* request successfully completed */ | ||
CUFILE_TIMEOUT = 0x0000020, /* request timed out */ | ||
CUFILE_FAILED = 0x0000040 /* unable to complete */ | ||
} CUfileStatus_t; | ||
|
||
typedef struct CUfileIOEvents { | ||
void* cookie; | ||
CUfileStatus_t status; /* status of the operation */ | ||
size_t ret; /* -ve error or amount of I/O done. */ | ||
} CUfileIOEvents_t; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are just copied from cufile.h I guess. Do you endeavour to be ABI compatible with the cufile definitions or is this just "anything, as long as it has the appopriate fields/names"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is "anything, as long as it has the appopriate fields/names". Added a comment.
Thanks @wence-, I think I have addressed all of your issues? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @madsbk
Awesome, thanks for the review @wence-. Just in time for the code freeze :) |
/merge |
An example use of the Batch API (requires CUDA v12.1+):