Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REVIEW] Serializer for mdspans #1173

Merged
merged 40 commits into from
Feb 2, 2023

Conversation

hcho3
Copy link
Contributor

@hcho3 hcho3 commented Jan 25, 2023

Closes #1017

Implement a serializer for mdspan using the NumPy format. Row- and column-major layouts are supported.

TODOs

  • Revise IVF to use the new mdspan serializer
  • Add gtest to cover serializing device mdspans
  • Implement serializer for scalars
  • Rename header to raft/core/serialize.hpp
  • Use device_resources instead of handle_t
  • Use 64-byte alignment
  • Serialize scalars as 0D NumPy array
  • Add version field to ANN serialization
  • Test mdspan serializer in the Python layer

@codecov-commenter
Copy link

codecov-commenter commented Jan 25, 2023

Codecov Report

Base: 87.99% // Head: 87.99% // No change to project coverage 👍

Coverage data is based on head (65fdf53) compared to base (afece4f).
Patch has no changes to coverable lines.

Additional details and impacted files
@@              Coverage Diff              @@
##           branch-23.02    #1173   +/-   ##
=============================================
  Coverage         87.99%   87.99%           
=============================================
  Files                21       21           
  Lines               483      483           
=============================================
  Hits                425      425           
  Misses               58       58           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@cjnolet cjnolet added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Jan 25, 2023
@hcho3 hcho3 changed the title [WIP] Serializer for mdspans [REVIEW] Serializer for mdspans Jan 26, 2023
@hcho3
Copy link
Contributor Author

hcho3 commented Jan 26, 2023

@cjnolet I introduced a few macros to replace global constant variables, as the latter caused a link error due to symbol duplication. Is this okay? Macros are not hygienic and will not be scoped by namespaces. The alternative is to add new source file(s). What's the policy like for adding source files? Is RAFT meant to be mostly header only?

@cjnolet
Copy link
Member

cjnolet commented Jan 26, 2023

Macros are not hygienic and will not be scoped by namespaces

I think what you've done should be okay since they are at least prefixed w/ RAFT_NUMPY thus lowering the potential for collision.

What's the policy like for adding source files? Is RAFT meant to be mostly header only?

RAFT's APIs are all header-only and the source files are used for 1) template specializations that users can link against to avoid recompiling expensive headers, and 2) runtime APIs that can be accessed on host (and by pylibraft). This means the serializer code needs to be header-only at its core but if we find that the functions are expensive to compile, we can always add an explicit template specialization for them.

@hcho3 hcho3 requested a review from a team as a code owner January 27, 2023 06:41
@hcho3
Copy link
Contributor Author

hcho3 commented Jan 27, 2023

Can I get another round of review?

Copy link
Contributor

@achirkin achirkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm excited finally to get this in raft! A few minor things in the in the comments.

Also, I'm concerned a little bit regarding the naming. Atm we use serialize_xx, deserialize_xx and also load,save. This looks a bit inconsistent. Would you consider changing all of these to serialize_xx/deserialize_xx? Or even just serialize/deserialize (and rely on overload machinery)?

cpp/include/raft/core/serialize.hpp Show resolved Hide resolved
cpp/include/raft/spatial/knn/detail/ivf_pq_build.cuh Outdated Show resolved Hide resolved
cpp/include/raft/core/mdspan.hpp Outdated Show resolved Hide resolved
@hcho3
Copy link
Contributor Author

hcho3 commented Jan 31, 2023

Would you consider changing all of these to serialize_xx/deserialize_xx? Or even just serialize/deserialize (and rely on overload machinery)?

I'm in favor of serialize_xx/deserialize_xx, because using the same serialize name for both scalars and mdspans may confuse the compiler and cause a surprising behavior in overload resolution. That is, when we declare

template <typename T>
void serialize(std::ostream& os, const T& value)

to serialize scalars, the function may be accidentally called with an mdspan (by setting T=mdspan<...>).

I have renamed all functions to serialize_xxx / deserialize_xxx.

Copy link
Contributor

@achirkin achirkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Member

@cjnolet cjnolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m pre-approving so we don’t need to hold up the release for these but just a couple small things we should probably open a follow-on for (in 23.04)

@pytest.mark.parametrize("dtype", ["float32", "float64", "int32", "uint32"])
def test_mdspan_serializer(dtype):
X = np.random.random_sample((2, 3)).astype(dtype)
run_roundtrip_test_for_mdspan(X)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also test this for fortran_order=True?

Py_buffer* PyMemoryView_GET_BUFFER(PyObject* mview)


def run_roundtrip_test_for_mdspan(X, fortran_order=False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we move this to the testing namespace? maybe something like test/serializer_helper.pyx?

Copy link
Member

@divyegala divyegala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pre-approving, the PR looks really great! Just one comment, could you please file an issue or add a TODO for my comment to be tackled later?


template <typename ElementType, typename Extents, typename LayoutPolicy, typename AccessorPolicy>
inline void serialize_mdspan(
const raft::device_resources&,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use the base-class raft::resources for all these functions? device_resources is not needed with host_mdspan overloads

@cjnolet cjnolet requested a review from a team as a code owner February 1, 2023 21:43
@cjnolet
Copy link
Member

cjnolet commented Feb 2, 2023

/merge

@rapids-bot rapids-bot bot merged commit aa14ed4 into rapidsai:branch-23.02 Feb 2, 2023
@hcho3 hcho3 deleted the mdspan_serializer branch February 2, 2023 17:03
ahendriksen pushed a commit to ahendriksen/raft that referenced this pull request Feb 2, 2023
Closes rapidsai#1017

Implement a serializer for mdspan using the NumPy format. Row- and column-major layouts are supported.

TODOs
- [x] Revise IVF to use the new mdspan serializer
- [x] Add gtest to cover serializing device mdspans
- [x] Implement serializer for scalars
- [x] Rename header to `raft/core/serialize.hpp`
- [x] Use `device_resources` instead of `handle_t`
- [x] Use 64-byte alignment
- [x] Serialize scalars as 0D NumPy array
- [x] Add version field to ANN serialization
- [x] Test mdspan serializer in the Python layer

Authors:
  - Philip Hyunsu Cho (https://github.com/hcho3)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Tamas Bela Feher (https://github.com/tfeher)
  - Artem M. Chirkin (https://github.com/achirkin)
  - Corey J. Nolet (https://github.com/cjnolet)
  - Divye Gala (https://github.com/divyegala)
  - Ray Douglass (https://github.com/raydouglass)

URL: rapidsai#1173
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CMake cpp improvement Improvement / enhancement to an existing function non-breaking Non-breaking change python
Projects
Development

Successfully merging this pull request may close these issues.

[FEA] Tracker for mdspan serialization and copy
7 participants