Legate Support #183
Conversation
Preemptively approving to allow folks to move quickly here.
legate/legate_kvikio/core.py

the Legate data interface.
"""
output = _get_legate_store(obj)
task = user_context.create_auto_task(TaskOpCode.READ)
I think this task would be more appropriate as an unbounded-output task, similar to how unique works in cuNumeric: https://github.com/nv-legate/cunumeric/blob/branch-23.05/cunumeric/deferred.py#L3426. This would also change the interface, such that the output store is created inside the function call and returned as the result (rather than having the user pre-allocate a store of appropriate size).
Alternatively, if possible you could inspect the file's metadata, and preallocate an appropriate store inside this function, before doing the actual read. You wouldn't need to use unbounded stores in that case.
I believe for safety you also want to make it clear to legate that this must be launched as a singleton task:
task = user_context.create_auto_task(TaskOpCode.READ)
task.add_scalar_arg(path, types.string)
task.add_output(output)
# the whole output Store must be accessible by all point tasks
# Stores accessed with write permissions cannot be shared
# therefore the launch can only contain one point task.
task.add_broadcast(output)
task.execute()
> I think this task would be more appropriate as an unbounded-output task, similar to how unique works in cuNumeric: https://github.com/nv-legate/cunumeric/blob/branch-23.05/cunumeric/deferred.py#L3426. This would also change the interface, such that the output store is created inside the function call and returned as the result (rather than having the user pre-allocate a store of appropriate size).
In this case, I would still have to know the total size of the file beforehand in order for each task to determine their file offset, right?
> Alternatively, if possible you could inspect the file's metadata, and preallocate an appropriate store inside this function, before doing the actual read. You wouldn't need to use unbounded stores in that case.
Agree, the user API could support a default that infers the buffer type. However, the plan is to incorporate this into the existing KvikIO API so that something like the following will just work, no matter if buf is a NumPy, CuPy, or cuNumeric array:
f = kvikio.CuFile("test-file", "r")
f.read(buf)
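A hypothetical sketch of how that dispatch could look; the helper and method names below are invented for illustration and are not from this PR (only the __legate_data_interface__ attribute, i.e. the Legate data interface mentioned above, is real):

# Hypothetical dispatch inside CuFile.read(); legate_read and
# _cufile_read are illustrative names, not actual KvikIO API.
def read(self, buf):
    if hasattr(buf, "__legate_data_interface__"):  # cuNumeric array / Legate store
        return legate_read(self._path, buf)        # new Legate task path (hypothetical)
    return self._cufile_read(buf)                  # existing NumPy/CuPy path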
> I believe for safety you also want to make it clear to legate that this must be launched as a singleton task:
I don't think this should be a singleton task; I want Legate to run multiple tasks in parallel. I am hoping that, by calculating the offset based on the input store, each task will read its non-overlapping part of the file?
auto shape = store.shape<1>();
auto acc = store.read_accessor<char, 1>();
size_t strides[1];
const char* data = acc.ptr(shape, strides);
size_t itemsize = sizeof_legate_type_code(store.code());
assert(strides[0] == itemsize); // Must be contiguous
size_t nbytes = shape.volume() * itemsize;
size_t offset = shape.lo.x * itemsize; // Offset in bytes
> In this case, I would still have to know the total size of the file beforehand in order for each task to determine their file offset, right?
Not necessarily. Each point task could query the size of the file independently and start reading at size * point_task_id / num_point_tasks. Each task would need to return to the caller how many elements it actually read, so the returned buffers can be virtually "stitched together". See the code in unique for how that's done (the term "weights" refers to how many elements each point task read).
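On the Python side, such an unbounded-output launch might look roughly like the sketch below. This is a hedged sketch, not code from the PR: user_context, TaskOpCode.READ, and types come from the snippet earlier in the thread, while the create_store call for an unbound store is an assumption about legate.core's API at the time:

# Hedged sketch: READ as an unbounded-output task. No shape is given for
# the output store; each point task binds a buffer for however many
# elements it actually read, and Legate stitches the pieces together.
def read(path: str):
    output = user_context.create_store(types.int8, shape=None, ndim=1)  # unbound (assumed API)
    task = user_context.create_auto_task(TaskOpCode.READ)
    task.add_scalar_arg(path, types.string)
    task.add_output(output)  # no add_broadcast: multiple point tasks allowed
    task.execute()
    return output  # created inside the call and returned to the user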
> Agree, the user API could support a default that infers the buffer type. However, the plan is to incorporate this into the existing KvikIO API
Then I assume there are calls so that the user code can query the size of the file, so it can pre-allocate an array of the right size?
> I want Legate to run multiple tasks in parallel. I am hoping that by calculating the offset based on the input store, each task will read its non-overlapping part of the file?
Yes, that should work.
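To make the offset arithmetic concrete, here is a small illustration (not from the PR) of how each point task's byte range falls out of its sub-store bounds, mirroring shape.lo.x and shape.volume() in the C++ snippet above:

# Hedged illustration: lo and volume stand in for shape.lo.x and
# shape.volume() of each task's sub-store.
def byte_range(lo: int, volume: int, itemsize: int):
    offset = lo * itemsize      # where this task starts reading
    nbytes = volume * itemsize  # how many bytes this task reads
    return offset, nbytes

# Example: 100 int8 elements split across 4 point tasks of 25 elements each.
for task_id in range(4):
    print(byte_range(task_id * 25, 25, 1))
# -> (0, 25), (25, 25), (50, 25), (75, 25): disjoint ranges covering the file.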
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the clarification, @manopapad.
legate/cpp/legate_kvikio.cpp

auto shape = store.shape<1>();
auto acc = store.read_accessor<char, 1>();
size_t strides[1];
const char* data = acc.ptr(shape, strides);
@manopapad, can I assume that data is contiguous?
If not, can I create two task variants: one to handle contiguous stores and one for non-contiguous stores?
> can I assume that data is contiguous?
You will certainly want to check this, since the API allows the user to pass in an arbitrary array (if the API were to create its own output array, then you could guarantee it internally). You could run into trouble if the user did something like:
x = np.zeros((4,5))
read_into(x[:,3])
where the user passes in a column, which is technically 1d, but that slice is not contiguous. Even in this case, you can force Legion to give you a "compacted" view of the slice by declaring that the task needs an "exact" instance, similar to how it's done here: https://github.com/nv-legate/cunumeric/blob/branch-23.05/src/cunumeric/mapper.cc#L111.
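For concreteness, a quick NumPy check (illustrative only, not from the PR) showing why that column slice is the problem case:

import numpy as np

x = np.zeros((4, 5))
col = x[:, 3]                     # the slice from the example above
print(col.ndim)                   # 1: technically one-dimensional
print(col.flags["C_CONTIGUOUS"])  # False: elements are a full row apart
print(col.strides)                # (40,) rather than the contiguous (8,)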
> If not, can I create two task variants: one to handle contiguous stores and one for non-contiguous stores?
Currently Legate's variant registration system is specialized for making 3 specific variants (CPU, OpenMP, GPU), so we'd need some work to extend it beyond that. The simpler alternative, if you want to support non-contiguous stores, would be to handle both cases inside the task body as written.
legate/legate_kvikio/_version.py
FWIW, the rest of RAPIDS has moved, or is in the process of moving, away from using versioneer to provide versions.
We should do that here as well, but let's do it once across all the repos.
Co-authored-by: Lawrence Mitchell <[email protected]>
/merge
Support for Legate stores and cuNumeric arrays.
Install
Make sure cuNumeric is installed and then build like:
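The original build snippet is not preserved in this extract; a plausible from-source install of the legate subpackage (the legate/ path is taken from the file layout in this PR, the command itself is an assumption) would be:

cd legate
pip install .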
Usage
The API is very similar to regular KvikIO:
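A hedged sketch of the intended usage, assuming a legate_kvikio.CuFile wrapper that mirrors the kvikio.CuFile calls quoted earlier in the thread (the exact class and method names are assumptions):

# Hedged sketch: the same CuFile-style API, but the buffer is a cuNumeric
# array backed by a Legate store. Names mirror the kvikio.CuFile example.
import cunumeric as num
import legate_kvikio

a = num.arange(100)
with legate_kvikio.CuFile("test-file", "w") as f:
    f.write(a)

b = num.empty_like(a)
with legate_kvikio.CuFile("test-file", "r") as f:
    f.read(b)
assert (a == b).all()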
Run using the Legate launcher (with 10 GB of device and host memory) like:
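The exact command is not preserved in this extract; the flags below are an assumption based on the description above (the legate launcher takes per-GPU framebuffer and system memory sizes in MB, and the script name is hypothetical):

legate --fbmem 10000 --sysmem 10000 ./example.py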