Enable roundtrip for nvCOMP batch codecs. #253

Alexey-Kamenev · 2023-07-19T18:39:50Z

This PR enables a roundtrip for codecs other than LZ4. The LZ4 codec already supports it.

Here, "roundtrip" means the ability to compress and store data using a default numcodecs codec (e.g. zstd) and decompress it using the nvCOMP batch codecs. The reverse also holds: data compressed using nvCOMP can be decompressed using a CPU numcodecs codec.

The original implementation of the nvCOMP codec was based on an assumption, taken from the numcodecs LZ4 codec, that for codecs to be compatible, each codec must write a small (4-byte) header in the compressed chunk containing the original, uncompressed size of the data. However, subsequent analysis of the other numcodecs codecs showed that LZ4 is actually an exception! No other codec writes a header or any additional data to the compressed chunk.

This PR:

- moves the header creation down to a specific algorithm, LZ4
- switches to the nvCOMP `nvcompBatched*GetDecompressSizeAsync` family of functions to get the original size of the data during decompression.
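
To illustrate the header difference described above, here is a minimal sketch rather than the code from this PR. It uses `zlib` from the standard library as a stand-in compressor, and the helper names (`encode_with_header`, `decode_with_header`, `encode_plain`, `decode_plain`) are hypothetical. The 4-byte little-endian layout is an assumption about how the numcodecs LZ4 codec stores the size.

```python
import struct
import zlib

# Assumed layout: 4-byte little-endian unsigned integer holding the
# uncompressed size, written in front of the compressed payload.
HEADER = struct.Struct("<I")


def encode_with_header(raw: bytes) -> bytes:
    """LZ4-style framing: size header followed by the compressed payload."""
    return HEADER.pack(len(raw)) + zlib.compress(raw)


def decode_with_header(chunk: bytes) -> bytes:
    """Read the size header, decompress the rest, and check the size."""
    (size,) = HEADER.unpack_from(chunk, 0)
    out = zlib.decompress(chunk[HEADER.size:])
    if len(out) != size:
        raise ValueError("uncompressed size does not match the header")
    return out


def encode_plain(raw: bytes) -> bytes:
    """Framing used by the other codecs: the compressed payload only."""
    return zlib.compress(raw)


def decode_plain(chunk: bytes) -> bytes:
    """No header to skip; the whole chunk is the compressed payload."""
    return zlib.decompress(chunk)
```

A chunk written by `encode_with_header` is four bytes longer than one written by `encode_plain`, so a reader that expects one framing generally cannot decode the other. This is why writing the header for every algorithm prevented a roundtrip with the headerless CPU codecs.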

rapids-bot · 2023-07-19T18:39:54Z

Pull requests from external contributors require approval from a rapidsai organization member with write permissions or greater before CI can begin.

wence- · 2023-07-25T10:16:50Z

/ok to test

wence-

Thanks, overall I think this looks good. One thing I wonder is whether we should just do all of the pointer manipulation in host buffers and then do a single round of H2D memcopies before calling into nvcomp. Since I imagine the number of batches is not that large, this is probably faster than lots of tiny (explicit and implicit) memcopies back and forth.

WDYT?
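
A rough sketch of the host-side approach suggested above, not the kvikio implementation: the per-chunk pointer and size tables are built entirely on the host, so each table could then be uploaded with one H2D copy. The function name `build_batch_tables` and the contiguous chunk layout are assumptions, `base_ptr` is a plain integer standing in for a device address, and the copy itself is omitted.

```python
from array import array
from itertools import accumulate


def build_batch_tables(base_ptr, chunk_sizes):
    """Build per-chunk pointer and size tables in host memory.

    Assumes the chunks are packed back to back starting at base_ptr
    (a hypothetical device address given as an integer). Alignment
    padding is ignored in this sketch.
    """
    sizes = list(chunk_sizes)
    # Exclusive prefix sum: the offset of each chunk from base_ptr.
    offsets = [0, *accumulate(sizes)][:-1]
    ptrs = array("Q", (base_ptr + off for off in offsets))
    return ptrs, array("Q", sizes)


# Each table is one contiguous host buffer, so it needs a single
# host-to-device copy instead of one small copy per chunk.
ptrs, sizes = build_batch_tables(1000, [10, 20, 30])
```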

python/kvikio/_lib/libnvcomp_ll.pyx

python/kvikio/nvcomp_codec.py

wence-

Thanks! Two minor typographic fixes, plus (apologies I didn't spot this first time round), I wonder if we want to use memcpy2DAsync to avoid the batch loop and kick everything off in one go. I don't think that's a necessary change though, so leave it up to you.

python/kvikio/_lib/libnvcomp_ll.pyx

wence- · 2023-07-27T16:30:12Z

/ok to test

wence- · 2023-07-28T08:38:47Z

/ok to test

wence- · 2023-07-28T09:00:31Z

Thanks very much for this! We've just gone into code freeze, so this will need to go to 23.10 rather than 23.08. Could you please retarget to branch-23.10?

Alexey-Kamenev · 2023-07-28T16:57:15Z

Thank you for the review! I've updated the PR's target branch to branch-23.10.

wence- · 2023-07-28T18:49:42Z

/ok to test

madsbk

LGTM, thanks @Alexey-Kamenev

wence-

Thanks for the updates, looks good to me too.

wence- · 2023-07-31T17:15:40Z

/ok to test

Alexey-Kamenev · 2023-07-31T17:19:38Z

Thank you for the review, @wence- and @madsbk !
Let me know if there is anything else I need to do to get it merged.

madsbk · 2023-08-01T06:41:39Z

/merge

madsbk · 2023-08-01T06:41:54Z

Thanks @Alexey-Kamenev

Enable roundtrip for nvCOMP batch codecs.

49c0177

Alexey-Kamenev requested a review from a team as a code owner July 19, 2023 18:39

wence- requested changes Jul 25, 2023

python/kvikio/_lib/libnvcomp_ll.pyx (outdated)

python/kvikio/nvcomp_codec.py

python/kvikio/nvcomp_codec.py (outdated)

wence- added the improvement and non-breaking labels Jul 26, 2023

Alexey-Kamenev added 2 commits July 26, 2023 13:32

Address review feedback.

d3f663a

Remove redundant variable.

92e74e5

wence- reviewed Jul 27, 2023

wence- and others added 2 commits July 27, 2023 10:31

Typographic changes in docstring

11c775c

Merge branch 'branch-23.08' into fea-nvcomp-roundtrip

88450f4

Address review feedback.

18663c5

Alexey-Kamenev changed the base branch from branch-23.08 to branch-23.10 July 28, 2023 16:53

Merge branch 'branch-23.10' into fea-nvcomp-roundtrip

694a63f

madsbk approved these changes Jul 31, 2023

wence- approved these changes Jul 31, 2023

Merge branch 'branch-23.10' into fea-nvcomp-roundtrip

ade06c4

rapids-bot bot merged commit 98fca77 into rapidsai:branch-23.10 Aug 1, 2023
