Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zarr notebook #261

Merged
merged 12 commits into from
Aug 14, 2023
Merged

Zarr notebook #261

merged 12 commits into from
Aug 14, 2023

Conversation

madsbk
Copy link
Member

@madsbk madsbk commented Aug 4, 2023

@madsbk madsbk added doc Improvements or additions to documentation non-breaking Introduces a non-breaking change labels Aug 4, 2023
@madsbk madsbk marked this pull request as ready for review August 4, 2023 13:40
@jakirkham
Copy link
Member

Thanks Mads! 🙏

Isaac do you want to give this a try? Would appreciate hearing any feedback you have 🙂

Copy link

@ivirshup ivirshup left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gave this a look, and I think it's great. Thanks for putting this together @madsbk!

One content addition that would be really good is a "CPU compression / GPU decompression" example like in nvcomp_batch_codec.ipynb. It would be good for there to be a clear example of how to work with interoperable data, ideally in as "blessed" of a way as possible while zarr-developers/numcodecs#316 is resolved.

Maybe an intermediate kerchunk store could work? I haven't actually been 100% sure if kerchunk can be used with direct GPU loading (things like #235 make me think I can, but maybe it's legate specific?), so an example of that could also be great.

I would maybe also consider making this a little more visible. Linking in the readme is good, but I would put documentation about usage above the instructions for a dev install.

@madsbk
Copy link
Member Author

madsbk commented Aug 8, 2023

Added an example of reading a GPU-written Zarr file into a NumPy array. However, AFAICT it isn't possible to specify a compressor when reading a Zarr array thus it will use kvikio.zarr.Snappy always.

@ivirshup
Copy link

ivirshup commented Aug 8, 2023

Thanks!

However, AFAICT it isn't possible to specify a compressor when reading a Zarr array thus it will use kvikio.zarr.Snappy always.

In nvcomp_batch_codec.ipynb, it looks like the zarr.LZ4 compressor is swapped out with the kvikio.zarr.LZ4 compressor by overwriting the .zarray file in the store. I assume the reverse would work, but it would be nice if we could avoid modifying the original store inplace.

I think it's important to show if data can be read both direct to GPU by this library, and in an environment without the NVIDIA libraries or a GPU.

@madsbk
Copy link
Member Author

madsbk commented Aug 9, 2023

Good point, I have updated the notebook to show CPU and GPU mixing. Also, it now use the new NvCompBatchCodec compressor.

@madsbk
Copy link
Member Author

madsbk commented Aug 9, 2023

@Alexey-Kamenev, could I get you to review the notebook. It makes use of your new NvCompBatchCodec.

Copy link
Contributor

@Alexey-Kamenev Alexey-Kamenev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great - thank you!

notebooks/zarr.ipynb Show resolved Hide resolved
notebooks/zarr.ipynb Outdated Show resolved Hide resolved
notebooks/zarr.ipynb Outdated Show resolved Hide resolved
@madsbk
Copy link
Member Author

madsbk commented Aug 10, 2023

Thanks @Alexey-Kamenev

@madsbk
Copy link
Member Author

madsbk commented Aug 14, 2023

/merge

@rapids-bot rapids-bot bot merged commit 622a525 into rapidsai:branch-23.10 Aug 14, 2023
@madsbk madsbk deleted the zarr_notebook branch August 14, 2023 12:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc Improvements or additions to documentation non-breaking Introduces a non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Include Zarr example/notebook
5 participants