-
Notifications
You must be signed in to change notification settings - Fork 197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[REVIEW] Serializer for mdspans #1173
Conversation
Codecov ReportBase: 87.99% // Head: 87.99% // No change to project coverage 👍
Additional details and impacted files@@ Coverage Diff @@
## branch-23.02 #1173 +/- ##
=============================================
Coverage 87.99% 87.99%
=============================================
Files 21 21
Lines 483 483
=============================================
Hits 425 425
Misses 58 58 Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
@cjnolet I introduced a few macros to replace global constant variables, as the latter caused a link error due to symbol duplication. Is this okay? Macros are not hygienic and will not be scoped by namespaces. The alternative is to add new source file(s). What's the policy like for adding source files? Is RAFT meant to be mostly header only? |
I think what you've done should be okay since they are at least prefixed w/
RAFT's APIs are all header-only and the source files are used for 1) template specializations that users can link against to avoid recompiling expensive headers, and 2) runtime APIs that can be accessed on host (and by pylibraft). This means the serializer code needs to be header-only at its core but if we find that the functions are expensive to compile, we can always add an explicit template specialization for them. |
Can I get another round of review? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm excited finally to get this in raft! A few minor things in the in the comments.
Also, I'm concerned a little bit regarding the naming. Atm we use serialize_xx
, deserialize_xx
and also load
,save
. This looks a bit inconsistent. Would you consider changing all of these to serialize_xx
/deserialize_xx
? Or even just serialize
/deserialize
(and rely on overload machinery)?
I'm in favor of template <typename T>
void serialize(std::ostream& os, const T& value) to serialize scalars, the function may be accidentally called with an mdspan (by setting I have renamed all functions to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m pre-approving so we don’t need to hold up the release for these but just a couple small things we should probably open a follow-on for (in 23.04)
@pytest.mark.parametrize("dtype", ["float32", "float64", "int32", "uint32"]) | ||
def test_mdspan_serializer(dtype): | ||
X = np.random.random_sample((2, 3)).astype(dtype) | ||
run_roundtrip_test_for_mdspan(X) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also test this for fortran_order=True
?
Py_buffer* PyMemoryView_GET_BUFFER(PyObject* mview) | ||
|
||
|
||
def run_roundtrip_test_for_mdspan(X, fortran_order=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we move this to the testing namespace? maybe something like test/serializer_helper.pyx
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pre-approving, the PR looks really great! Just one comment, could you please file an issue or add a TODO for my comment to be tackled later?
|
||
template <typename ElementType, typename Extents, typename LayoutPolicy, typename AccessorPolicy> | ||
inline void serialize_mdspan( | ||
const raft::device_resources&, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use the base-class raft::resources
for all these functions? device_resources
is not needed with host_mdspan
overloads
/merge |
Closes rapidsai#1017 Implement a serializer for mdspan using the NumPy format. Row- and column-major layouts are supported. TODOs - [x] Revise IVF to use the new mdspan serializer - [x] Add gtest to cover serializing device mdspans - [x] Implement serializer for scalars - [x] Rename header to `raft/core/serialize.hpp` - [x] Use `device_resources` instead of `handle_t` - [x] Use 64-byte alignment - [x] Serialize scalars as 0D NumPy array - [x] Add version field to ANN serialization - [x] Test mdspan serializer in the Python layer Authors: - Philip Hyunsu Cho (https://github.com/hcho3) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Tamas Bela Feher (https://github.com/tfeher) - Artem M. Chirkin (https://github.com/achirkin) - Corey J. Nolet (https://github.com/cjnolet) - Divye Gala (https://github.com/divyegala) - Ray Douglass (https://github.com/raydouglass) URL: rapidsai#1173
Closes #1017
Implement a serializer for mdspan using the NumPy format. Row- and column-major layouts are supported.
TODOs
raft/core/serialize.hpp
device_resources
instead ofhandle_t