
Legate Support #183

Merged — 20 commits merged into rapidsai:branch-23.04 from the legate branch on Mar 21, 2023

Conversation

@madsbk (Member) commented on Mar 14, 2023:

Support for Legate stores and cuNumeric arrays.

Install

Make sure cuNumeric is installed and then build like:

```sh
cd legate
pip install .
```

Usage

The API is very similar to regular KvikIO:

```python
import cunumeric as num
from legate.core import get_legate_runtime
from legate_kvikio import CuFile

a = num.arange(100)
with CuFile("/tmp/my-file", "w") as f:
    f.write(a)

    # To make sure the file has been written before the following read
    # executes, we insert a fence between the write and the read.
    # Note that this call isn't blocking.
    get_legate_runtime().issue_execution_fence(block=False)

    b = num.empty_like(a)
    f.read(b)

# To make sure the file has been written before re-opening it
# for reading, we block execution here.
get_legate_runtime().issue_execution_fence(block=True)

c = num.empty_like(a)
with CuFile("/tmp/my-file", "r") as f:
    f.read(c)
print("sum: ", c.sum())
```

Run it with the Legate launcher, here giving 10 GB (10000 MB) each of host memory (--sysmem) and GPU framebuffer memory (--fbmem):

```sh
legate --sysmem 10000 --fbmem 10000 --cpus 2 --gpus 2 my_io_script.py
```

@madsbk added labels: improvement (improves an existing functionality), non-breaking (introduces a non-breaking change) — Mar 14, 2023
@quasiben (Member) left a review — preemptively approving to allow folks to move quickly here.

Review thread on the following code:

```python
    the Legate data interface.
    """
    output = _get_legate_store(obj)
    task = user_context.create_auto_task(TaskOpCode.READ)
```

@manopapad commented:

I think this task would be more appropriate as an unbounded-output task, similar to how unique works in cuNumeric https://github.com/nv-legate/cunumeric/blob/branch-23.05/cunumeric/deferred.py#L3426.

This would also change the interface, such that the output store is created inside the function call, and returned as the result (rather than having the user pre-allocate a store of appropriate size).

Alternatively, if possible you could inspect the file's metadata, and preallocate an appropriate store inside this function, before doing the actual read. You wouldn't need to use unbounded stores in that case.
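A minimal sketch of that metadata-based preallocation idea (hypothetical names throughout; it assumes a flat 1-D array of a known dtype, and a `read` helper standing in for dispatching the READ task):

```python
import os
import numpy as np
import cunumeric as num

# Sketch only, with hypothetical names: size the output from the file's
# metadata (assuming a known 1-D dtype), then run the existing READ task.
def read_into_new_array(path: str, dtype=np.float64):
    nbytes = os.stat(path).st_size
    n = nbytes // np.dtype(dtype).itemsize
    out = num.empty((n,), dtype=dtype)
    read(path, out)  # hypothetical: dispatches the READ task into out's store
    return out
```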

I believe for safety you also want to make it clear to Legate that this must be launched as a singleton task:

```python
task = user_context.create_auto_task(TaskOpCode.READ)
task.add_scalar_arg(path, types.string)
task.add_output(output)
# The whole output store must be accessible by all point tasks.
# Stores accessed with write permissions cannot be shared,
# therefore the launch can only contain one point task.
task.add_broadcast(output)
task.execute()
```

@madsbk (Member, author) replied:

> I think this task would be more appropriate as an unbounded-output task, similar to how unique works in cuNumeric https://github.com/nv-legate/cunumeric/blob/branch-23.05/cunumeric/deferred.py#L3426.
>
> This would also change the interface, such that the output store is created inside the function call, and returned as the result (rather than having the user pre-allocate a store of appropriate size).

In this case, I would still have to know the total size of the file beforehand in order for each task to determine its file offset, right?

> Alternatively, if possible you could inspect the file's metadata, and preallocate an appropriate store inside this function, before doing the actual read. You wouldn't need to use unbounded stores in that case.

Agreed, the user API could support a default that infers the buffer type. However, the plan is to incorporate this into the existing KvikIO API so that something like the following will just work, no matter whether buf is a NumPy, CuPy, or cuNumeric array:

```python
f = kvikio.CuFile("test-file", "r")
f.read(buf)
```
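One possible shape of that dispatch, sketched under assumptions (the `legate_read` helper and the `self._handle` attribute are illustrative, not KvikIO's actual implementation; the `__legate_data_interface__` attribute is the Legate data interface protocol):

```python
# Hypothetical sketch only: dispatch on the buffer type inside CuFile.read.
def read(self, buf, size=None, file_offset=0):
    if hasattr(buf, "__legate_data_interface__"):
        # Legate store / cuNumeric array: hand off to a Legate-backed path.
        return legate_read(self._path, buf)
    # NumPy/CuPy array: fall through to the regular KvikIO read.
    return self._handle.read(buf, size, file_offset)
```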

> I believe for safety you also want to make it clear to Legate that this must be launched as a singleton task:

I don't think this should be a singleton task; I want Legate to run multiple tasks in parallel. I am hoping that by calculating the offset based on the input store, each task will read its non-overlapping part of the file:

```cpp
auto shape = store.shape<1>();
auto acc   = store.read_accessor<char, 1>();
size_t strides[1];
const char* data = acc.ptr(shape, strides);
size_t itemsize  = sizeof_legate_type_code(store.code());
assert(strides[0] == itemsize);  // Must be contiguous
size_t nbytes = shape.volume() * itemsize;
size_t offset = shape.lo.x * itemsize;  // Offset in bytes
```

@manopapad replied:

> In this case, I would still have to know the total size of the file beforehand in order for each task to determine its file offset, right?

Not necessarily. Each point task could query the size of the file independently and start reading at size * point_task_id / num_point_tasks. Each task would need to return to the caller how many elements it actually read, so the returned buffers can be virtually "stitched together". See the code in unique for how that's done (the term "weights" refers to how many elements each point task read).
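As a rough illustration of that scheme (plain Python, with hypothetical `task_id` / `num_tasks` arguments standing in for Legate's point-task coordinates):

```python
import os

# Illustration only: how each point task could derive its own byte range
# from the file size, without the caller knowing the size up front.
def task_byte_range(path: str, task_id: int, num_tasks: int) -> tuple[int, int]:
    size = os.stat(path).st_size
    start = size * task_id // num_tasks
    end = size * (task_id + 1) // num_tasks
    # This task reads [start, end) and reports back how much it actually read.
    return start, end
```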

> Agreed, the user API could support a default that infers the buffer type. However, the plan is to incorporate this into the existing KvikIO API

Then I assume there are calls so that the user code can query the size of the file, so it can pre-allocate an array of the right size?

> I want Legate to run multiple tasks in parallel. I am hoping that by calculating the offset based on the input store, each task will read its non-overlapping part of the file?

Yes, that should work.

@madsbk (Member, author) replied:

Thanks for the clarification @manopapad.

Review thread on the following code:

```cpp
auto shape = store.shape<1>();
auto acc = store.read_accessor<char, 1>();
size_t strides[1];
const char* data = acc.ptr(shape, strides);
```

@madsbk (Member, author) commented:

@manopapad, can I assume that data is contiguous?
If not, can I create two task variants - one to handle contiguous stores and one for non-contiguous stores?

@manopapad replied:

> can I assume that data is contiguous?

You will certainly want to check this, since the API allows the user to pass in an arbitrary array (if the API were to create its own output array, then you could guarantee it internally). You could run into trouble if the user did something like:

```python
import numpy as np

x = np.zeros((4, 5))
read_into(x[:, 3])
```

where the user passes in a column, which is technically 1d, but that slice is not contiguous. Even in this case, you can force Legion to give you a "compacted" view of the slice, by declaring that the task needs an "exact" instance, similar to how it's done here: https://github.com/nv-legate/cunumeric/blob/branch-23.05/src/cunumeric/mapper.cc#L111.

> If not, can I create two task variants - one to handle contiguous stores and one for non-contiguous stores?

Currently Legate's variant registration system is specialized for making 3 specific variants (CPU, OpenMP, GPU), so we'd need some work to extend it beyond that. The simpler alternative, if you want to support non-contiguous stores, would be to handle both cases inside the task body.
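For illustration only, a Python analogue of that in-body case split (the real task body is C++; `f` is a CuFile-like handle and `write_task` is a hypothetical name):

```python
import numpy as np

# Illustration only: stage a contiguous copy when the incoming view is
# strided, then perform the I/O on the compacted buffer.
def write_task(f, buf: np.ndarray) -> None:
    if not buf.flags["C_CONTIGUOUS"]:
        buf = np.ascontiguousarray(buf)  # compact staging copy
    f.write(buf)
```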

Thread on legate/cpp/legate_kvikio.cpp (outdated, resolved)
A contributor commented:

FWIW, the rest of RAPIDS is in the process of moving (or has already moved) away from using versioneer to provide versions.

@madsbk (Member, author) replied:

We should do that here as well, but let's do it once for all of the repos.

@madsbk changed the title from [WIP] Legate Support to Legate Support on Mar 17, 2023
@madsbk marked this pull request as ready for review on Mar 17, 2023, 13:41
@madsbk (Member, author) commented on Mar 21, 2023:

/merge

@rapids-bot merged commit 325fd47 into rapidsai:branch-23.04 on Mar 21, 2023
@madsbk deleted the legate branch on March 21, 2023 07:28