forked from rapidsai/kvikio
Fixes rapidsai#143

cc. @ivirshup @jakirkham

Authors:
- Mads R. B. Kristensen (https://github.com/madsbk)

Approvers:
- Alexey Kamenev (https://github.com/Alexey-Kamenev)
- Benjamin Zaitlen (https://github.com/quasiben)

URL: rapidsai#261
Showing 3 changed files with 372 additions and 2 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -17,3 +17,4 @@ docs/build/ | |
cpp/doxygen/html/ | ||
.mypy_cache | ||
.hypothesis | ||
.ipynb_checkpoints |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,364 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 23, | ||
"id": "7a060f7d-9a0c-4763-98df-7dc82409c6ba", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"\"\"\"\n", | ||
"In this tutorial, we will show how to use KvikIO to read and write GPU memory directly to/from Zarr files.\n", | ||
"\"\"\"\n", | ||
"import json\n", | ||
"import shutil\n", | ||
"import numpy\n", | ||
"import cupy\n", | ||
"import zarr\n", | ||
"import kvikio\n", | ||
"import kvikio.zarr\n", | ||
"from kvikio.nvcomp_codec import NvCompBatchCodec\n", | ||
"from numcodecs import LZ4" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "99f4d25b-2006-4026-8629-1accafb338ef", | ||
"metadata": {}, | ||
"source": [ | ||
"We need to set three Zarr arguments: \n", | ||
" - `meta_array`: in order to make Zarr read into GPU memory (instead of CPU memory), we set the `meta_array` argument to an empty CuPy array. \n", | ||
" - `store`: we need to use a GPU compatible Zarr Store, which will be KvikIO’s GDS store in our case. \n", | ||
" - `compressor`: finally, we need to use a GPU compatible compressor (or `None`). KvikIO provides a nvCOMP compressor `kvikio.nvcomp_codec.NvCompBatchCodec` that we will use." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 24, | ||
"id": "c179c24a-766e-4e09-83c5-349868042576", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"(<zarr.core.Array (10,) int64>,\n", | ||
" NvCompBatchCodec(algorithm='lz4', options={}),\n", | ||
" <kvikio.zarr.GDSStore at 0x7fd42021ac20>)" | ||
] | ||
}, | ||
"execution_count": 24, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"# Let's create a new Zarr array using KvikIO's GDS store and LZ4 compression\n", | ||
"z = zarr.array(\n", | ||
" cupy.arange(10), \n", | ||
" chunks=2, \n", | ||
" store=kvikio.zarr.GDSStore(\"my-zarr-file.zarr\"), \n", | ||
" meta_array=cupy.empty(()),\n", | ||
" compressor=NvCompBatchCodec(\"lz4\"),\n", | ||
" overwrite=True,\n", | ||
")\n", | ||
"z, z.compressor, z.store" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 25, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"cupy.ndarray" | ||
] | ||
}, | ||
"execution_count": 25, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"# And because we set the `meta_array` argument, reading the Zarr array returns a CuPy array\n", | ||
"type(z[:])" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "549ded39-1053-4f82-a8a7-5a2ee999a4a1", | ||
"metadata": {}, | ||
"source": [ | ||
"From this point onwards, `z` can be used just like any other Zarr array." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 26, | ||
"id": "8221742d-f15c-450a-9701-dc8c05326126", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"array([1, 2, 3, 4, 5, 6, 7, 8])" | ||
] | ||
}, | ||
"execution_count": 26, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"z[1:9]" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 27, | ||
"id": "f0c451c1-a240-4b26-a5ef-6e70a5bbeb55", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"array([42, 43, 44, 45, 46, 47, 48, 49, 50, 51])" | ||
] | ||
}, | ||
"execution_count": 27, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"z[:] + 42" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "7797155f-40f4-4c50-b704-2356ca64cba3", | ||
"metadata": {}, | ||
"source": [ | ||
"### GPU compression / CPU decompression" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "a0029deb-19b9-4dbb-baf0-ce4b199605a5", | ||
"metadata": {}, | ||
"source": [ | ||
"In order to read GPU-written Zarr file into a NumPy array, we simply open that file **without** setting the `meta_array` argument:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 28, | ||
"id": "399f23f7-4475-496a-a537-a7163a35c888", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"(numpy.ndarray,\n", | ||
" kvikio.nvcomp_codec.NvCompBatchCodec,\n", | ||
" array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))" | ||
] | ||
}, | ||
"execution_count": 28, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"z = zarr.open_array(kvikio.zarr.GDSStore(\"my-zarr-file.zarr\"))\n", | ||
"type(z[:]), type(z.compressor), z[:]" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "8e9f31d5", | ||
"metadata": {}, | ||
"source": [ | ||
"And we don't need to use `kvikio.zarr.GDSStore` either:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 29, | ||
"id": "4b1f46b2", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"(numpy.ndarray,\n", | ||
" kvikio.nvcomp_codec.NvCompBatchCodec,\n", | ||
" array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))" | ||
] | ||
}, | ||
"execution_count": 29, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"z = zarr.open_array(\"my-zarr-file.zarr\")\n", | ||
"type(z[:]), type(z.compressor), z[:]" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "f10fd704-35f7-46b7-aabe-ea68fb2bf88d", | ||
"metadata": {}, | ||
"source": [ | ||
"However, the above use `NvCompBatchCodec(\"lz4\")` for decompression. In the following, we will show how to read Zarr file written and compressed using a GPU on the CPU.\n", | ||
"\n", | ||
"Some algorithms, such as LZ4, can be used interchangeably on CPU and GPU but Zarr will always use the compressor used to write the Zarr file. We are working with the Zarr team to fix this shortcoming but for now, we will use a workaround where we _patch_ the metadata manually." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 30, | ||
"id": "d980361a-e132-4f29-ab13-cbceec5bbbb5", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"(numpy.ndarray, numcodecs.lz4.LZ4, array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))" | ||
] | ||
}, | ||
"execution_count": 30, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"# Read the Zarr metadata and replace the compressor with a CPU implementation of LZ4\n", | ||
"store = zarr.DirectoryStore(\"my-zarr-file.zarr\") # We could also have used kvikio.zarr.GDSStore\n", | ||
"meta = json.loads(store[\".zarray\"])\n", | ||
"meta[\"compressor\"] = LZ4().get_config()\n", | ||
"store[\".zarray\"] = json.dumps(meta).encode() # NB: this changes the Zarr metadata on disk\n", | ||
"\n", | ||
"# And then open the file as usually\n", | ||
"z = zarr.open_array(store)\n", | ||
"type(z[:]), type(z.compressor), z[:]" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "8ea73705", | ||
"metadata": {}, | ||
"source": [ | ||
"### CPU compression / GPU decompression\n", | ||
"\n", | ||
"Now, let's try the otherway around." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 31, | ||
"id": "c9b2d56a", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"(<zarr.core.Array (10,) int64>,\n", | ||
" LZ4(acceleration=1),\n", | ||
" <zarr.storage.DirectoryStore at 0x7fd351e7a9b0>)" | ||
] | ||
}, | ||
"execution_count": 31, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"import numcodecs\n", | ||
"# Let's create a new Zarr array using the default compression.\n", | ||
"z = zarr.array(\n", | ||
" numpy.arange(10), \n", | ||
" chunks=2, \n", | ||
" store=\"my-zarr-file.zarr\", \n", | ||
" overwrite=True,\n", | ||
" # The default (CPU) implementation of LZ4 codec.\n", | ||
" compressor=numcodecs.registry.get_codec({\"id\": \"lz4\"})\n", | ||
")\n", | ||
"z, z.compressor, z.store" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "dedd4623", | ||
"metadata": {}, | ||
"source": [ | ||
"Again, we will use a workaround where we _patch_ the metadata manually." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 32, | ||
"id": "ac3f30b1", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"(cupy.ndarray,\n", | ||
" kvikio.nvcomp_codec.NvCompBatchCodec,\n", | ||
" array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))" | ||
] | ||
}, | ||
"execution_count": 32, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"# Read the Zarr metadata and replace the compressor with a GPU implementation of LZ4\n", | ||
"store = kvikio.zarr.GDSStore(\"my-zarr-file.zarr\") # We could also have used zarr.DirectoryStore\n", | ||
"meta = json.loads(store[\".zarray\"])\n", | ||
"meta[\"compressor\"] = NvCompBatchCodec(\"lz4\").get_config()\n", | ||
"store[\".zarray\"] = json.dumps(meta).encode() # NB: this changes the Zarr metadata on disk\n", | ||
"\n", | ||
"# And then open the file as usually\n", | ||
"z = zarr.open_array(store, meta_array=cupy.empty(()))\n", | ||
"type(z[:]), type(z.compressor), z[:]" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 33, | ||
"id": "80682922-b7b0-4b08-b595-228c2b446a78", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Clean up\n", | ||
"shutil.rmtree(\"my-zarr-file.zarr\", ignore_errors=True)" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3 (ipykernel)", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.10.11" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 5 | ||
} |
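
The notebook patches the `compressor` entry of `.zarray` twice, once in each direction. A minimal sketch of that step as a reusable helper follows, using only the standard library so that it runs without a GPU. The helper names `patch_compressor` and `read_meta` are hypothetical and not part of KvikIO or Zarr. The `GPU_LZ4` dictionary is an assumption about what `NvCompBatchCodec("lz4").get_config()` returns, and the stand-in metadata only approximates what Zarr would write for the `(10,) int64` array above. Returning the previous config makes it possible to undo the on-disk change that the notebook's `NB` comment warns about.

```python
import json
import tempfile
from pathlib import Path

# Compressor configs as plain dicts.
# CPU_LZ4 mirrors the `LZ4(acceleration=1)` codec shown in the notebook output.
# GPU_LZ4 is an ASSUMED shape for NvCompBatchCodec("lz4").get_config().
CPU_LZ4 = {"id": "lz4", "acceleration": 1}
GPU_LZ4 = {"id": "nvcomp_batch", "algorithm": "lz4", "options": {}}


def read_meta(array_dir):
    """Load the `.zarray` metadata of a Zarr (format 2) array directory."""
    return json.loads((Path(array_dir) / ".zarray").read_text())


def patch_compressor(array_dir, new_config):
    """Replace the "compressor" entry of `.zarray` on disk.

    Returns the previous compressor config so the caller can restore it.
    """
    meta_path = Path(array_dir) / ".zarray"
    meta = json.loads(meta_path.read_text())
    old_config = meta.get("compressor")
    meta["compressor"] = new_config
    meta_path.write_text(json.dumps(meta, indent=4))
    return old_config


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in metadata (an approximation, not output captured from Zarr).
    stand_in = {
        "chunks": [2],
        "compressor": GPU_LZ4,
        "dtype": "<i8",
        "fill_value": 0,
        "filters": None,
        "order": "C",
        "shape": [10],
        "zarr_format": 2,
    }
    (Path(tmp) / ".zarray").write_text(json.dumps(stand_in))

    # GPU-written file -> switch to the CPU codec before reading on the CPU.
    previous = patch_compressor(tmp, CPU_LZ4)
    patched = read_meta(tmp)

    # Undo the change so later GPU readers see the original codec again.
    patch_compressor(tmp, previous)
    restored = read_meta(tmp)

print(patched["compressor"]["id"], restored["compressor"]["id"])
```

With `numcodecs` installed, calling `patch_compressor("my-zarr-file.zarr", LZ4().get_config())` should have the same effect as the notebook cell that writes through `store[".zarray"]`, under the assumption that the store is a plain directory on disk.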