
Legate Support #183

Merged — 20 commits merged into rapidsai:branch-23.04 from the legate branch on Mar 21, 2023

Conversation

@madsbk (Member) commented on Mar 14, 2023:

Support for Legate stores and cuNumeric arrays.

Install

Make sure cuNumeric is installed and then build like:

```sh
cd legate
pip install .
```

Usage

The API is very similar to regular KvikIO:

```python
import cunumeric as num
from legate.core import get_legate_runtime
from legate_kvikio import CuFile

a = num.arange(100)
with CuFile("/tmp/my-file", "w") as f:
    f.write(a)

    # To make sure the file has been written before the following read
    # executes, we insert a fence between the write and the read.
    # Note that this call isn't blocking.
    get_legate_runtime().issue_execution_fence(block=False)

    b = num.empty_like(a)
    f.read(b)

# To make sure the file has been written before re-opening it
# for reading, we block execution here.
get_legate_runtime().issue_execution_fence(block=True)

c = num.empty_like(a)
with CuFile("/tmp/my-file", "r") as f:
    f.read(c)
print("sum: ", c.sum())
```

Run it with the Legate launcher, here giving 10 GB (10000 MB) each of host memory (--sysmem) and GPU framebuffer memory (--fbmem):

```sh
legate --sysmem 10000 --fbmem 10000 --cpus 2 --gpus 2 my_io_script.py
```

@madsbk added labels: improvement (improves an existing functionality), non-breaking (introduces a non-breaking change) — Mar 14, 2023
@quasiben (Member) left a review — preemptively approving to allow folks to move quickly here.

Review thread on the following code:

```python
    the Legate data interface.
    """
    output = _get_legate_store(obj)
    task = user_context.create_auto_task(TaskOpCode.READ)
```

@manopapad commented:

I think this task would be more appropriate as an unbounded-output task, similar to how unique works in cuNumeric https://github.com/nv-legate/cunumeric/blob/branch-23.05/cunumeric/deferred.py#L3426.

This would also change the interface, such that the output store is created inside the function call, and returned as the result (rather than having the user pre-allocate a store of appropriate size).

Alternatively, if possible you could inspect the file's metadata, and preallocate an appropriate store inside this function, before doing the actual read. You wouldn't need to use unbounded stores in that case.
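A minimal sketch of that metadata-based preallocation idea (hypothetical names throughout; it assumes a flat 1-D array of a known dtype, and a `read` helper standing in for dispatching the READ task):

```python
import os
import numpy as np
import cunumeric as num

# Sketch only, with hypothetical names: size the output from the file's
# metadata (assuming a known 1-D dtype), then run the existing READ task.
def read_into_new_array(path: str, dtype=np.float64):
    nbytes = os.stat(path).st_size
    n = nbytes // np.dtype(dtype).itemsize
    out = num.empty((n,), dtype=dtype)
    read(path, out)  # hypothetical: dispatches the READ task into out's store
    return out
```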

I believe for safety you also want to make it clear to Legate that this must be launched as a singleton task:

```python
task = user_context.create_auto_task(TaskOpCode.READ)
task.add_scalar_arg(path, types.string)
task.add_output(output)
# The whole output store must be accessible by all point tasks.
# Stores accessed with write permissions cannot be shared,
# therefore the launch can only contain one point task.
task.add_broadcast(output)
task.execute()
```

@madsbk (Member, author) replied:

> I think this task would be more appropriate as an unbounded-output task, similar to how unique works in cuNumeric https://github.com/nv-legate/cunumeric/blob/branch-23.05/cunumeric/deferred.py#L3426.
>
> This would also change the interface, such that the output store is created inside the function call, and returned as the result (rather than having the user pre-allocate a store of appropriate size).

In this case, I would still have to know the total size of the file beforehand in order for each task to determine its file offset, right?

> Alternatively, if possible you could inspect the file's metadata, and preallocate an appropriate store inside this function, before doing the actual read. You wouldn't need to use unbounded stores in that case.

Agreed, the user API could support a default that infers the buffer type. However, the plan is to incorporate this into the existing KvikIO API so that something like the following will just work, no matter whether buf is a NumPy, CuPy, or cuNumeric array:

```python
f = kvikio.CuFile("test-file", "r")
f.read(buf)
```
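One possible shape of that dispatch, sketched under assumptions (the `legate_read` helper and the `self._handle` attribute are illustrative, not KvikIO's actual implementation; the `__legate_data_interface__` attribute is the Legate data interface protocol):

```python
# Hypothetical sketch only: dispatch on the buffer type inside CuFile.read.
def read(self, buf, size=None, file_offset=0):
    if hasattr(buf, "__legate_data_interface__"):
        # Legate store / cuNumeric array: hand off to a Legate-backed path.
        return legate_read(self._path, buf)
    # NumPy/CuPy array: fall through to the regular KvikIO read.
    return self._handle.read(buf, size, file_offset)
```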

> I believe for safety you also want to make it clear to Legate that this must be launched as a singleton task:

I don't think this should be a singleton task; I want Legate to run multiple tasks in parallel. I am hoping that by calculating the offset based on the input store, each task will read its non-overlapping part of the file:

```cpp
auto shape = store.shape<1>();
auto acc   = store.read_accessor<char, 1>();
size_t strides[1];
const char* data = acc.ptr(shape, strides);
size_t itemsize  = sizeof_legate_type_code(store.code());
assert(strides[0] == itemsize);  // Must be contiguous
size_t nbytes = shape.volume() * itemsize;
size_t offset = shape.lo.x * itemsize;  // Offset in bytes
```

@manopapad replied:

> In this case, I would still have to know the total size of the file beforehand in order for each task to determine its file offset, right?

Not necessarily. Each point task could query the size of the file independently and start reading at size * point_task_id / num_point_tasks. Each task would need to return to the caller how many elements it actually read, so the returned buffers can be virtually "stitched together". See the code in unique for how that's done (the term "weights" refers to how many elements each point task read).
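As a rough illustration of that scheme (plain Python, with hypothetical `task_id` / `num_tasks` arguments standing in for Legate's point-task coordinates):

```python
import os

# Illustration only: how each point task could derive its own byte range
# from the file size, without the caller knowing the size up front.
def task_byte_range(path: str, task_id: int, num_tasks: int) -> tuple[int, int]:
    size = os.stat(path).st_size
    start = size * task_id // num_tasks
    end = size * (task_id + 1) // num_tasks
    # This task reads [start, end) and reports back how much it actually read.
    return start, end
```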

> Agreed, the user API could support a default that infers the buffer type. However, the plan is to incorporate this into the existing KvikIO API

Then I assume there are calls so that the user code can query the size of the file, so it can pre-allocate an array of the right size?

> I want Legate to run multiple tasks in parallel. I am hoping that by calculating the offset based on the input store, each task will read its non-overlapping part of the file?

Yes, that should work.

@madsbk (Member, author) replied:

Thanks for the clarification @manopapad.

Review thread on the following code:

```cpp
auto shape = store.shape<1>();
auto acc = store.read_accessor<char, 1>();
size_t strides[1];
const char* data = acc.ptr(shape, strides);
```

@madsbk (Member, author) commented:

@manopapad, can I assume that data is contiguous?
If not, can I create two task variants - one to handle contiguous stores and one for non-contiguous stores?

@manopapad replied:

> can I assume that data is contiguous?

You will certainly want to check this, since the API allows the user to pass in an arbitrary array (if the API were to create its own output array, then you could guarantee it internally). You could run into trouble if the user did something like:

```python
import numpy as np

x = np.zeros((4, 5))
read_into(x[:, 3])
```

where the user passes in a column, which is technically 1d, but that slice is not contiguous. Even in this case, you can force Legion to give you a "compacted" view of the slice, by declaring that the task needs an "exact" instance, similar to how it's done here: https://github.com/nv-legate/cunumeric/blob/branch-23.05/src/cunumeric/mapper.cc#L111.

> If not, can I create two task variants - one to handle contiguous stores and one for non-contiguous stores?

Currently Legate's variant registration system is specialized for making 3 specific variants (CPU, OpenMP, GPU), so we'd need some work to extend it beyond that. The simpler alternative, if you want to support non-contiguous stores, would be to handle both cases inside the task body.
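For illustration only, a Python analogue of that in-body case split (the real task body is C++; `f` is a CuFile-like handle and `write_task` is a hypothetical name):

```python
import numpy as np

# Illustration only: stage a contiguous copy when the incoming view is
# strided, then perform the I/O on the compacted buffer.
def write_task(f, buf: np.ndarray) -> None:
    if not buf.flags["C_CONTIGUOUS"]:
        buf = np.ascontiguousarray(buf)  # compact staging copy
    f.write(buf)
```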

Thread on legate/cpp/legate_kvikio.cpp (outdated, resolved)
A contributor commented:

FWIW, the rest of RAPIDS is in the process of moving (or has already moved) away from using versioneer to provide versions.

@madsbk (Member, author) replied:

We should do that here as well, but let's do it once for all of the repos.

@madsbk changed the title from [WIP] Legate Support to Legate Support on Mar 17, 2023
@madsbk marked this pull request as ready for review on Mar 17, 2023, 13:41
@madsbk (Member, author) commented on Mar 21, 2023:

/merge

@rapids-bot merged commit 325fd47 into rapidsai:branch-23.04 on Mar 21, 2023
@madsbk deleted the legate branch on March 21, 2023 07:28