Adding padded layout 'layout_padded_general' #725

mfoerste4 · 2022-06-27T10:29:53Z

This is a different approach to, and a follow-up of, PR #663 for issue #497.

I implemented layout_padded_general within raft to statically enforce padding on mdspan accesses.

The layout has template parameters for ValueType, StorageOrder (default row_major_t), and ByteAlignment (default 128).
To avoid requiring changes upstream, I skipped submdspan functionality for now. I did test it on a branch of an mdspan fork, though (https://github.com/mfoerste4/mdspan/tree/layout_padded).
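
To make the padding rule concrete, here is a minimal two-dimensional sketch of a row-major mapping whose row stride is rounded up to a whole number of alignment blocks. The type `padded_row_major_mapping` and its members are hypothetical and only approximate what `layout_padded_general` does; they are not the PR's actual API.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical 2D sketch: row-major mapping whose row stride is padded up to a
// multiple of ByteAlignment bytes, expressed in whole elements.
template <typename ValueType, std::size_t ByteAlignment = 128>
struct padded_row_major_mapping {
  // Number of elements per alignment block (at least 1 for very large types).
  static constexpr std::size_t element_alignment =
    std::max<std::size_t>(ByteAlignment / sizeof(ValueType), 1);

  std::size_t rows;
  std::size_t cols;

  // Row stride: cols rounded up to the next multiple of element_alignment, and at
  // least one full block (mirroring the std::max expression quoted in the review).
  constexpr std::size_t row_stride() const
  {
    return std::max<std::size_t>(
      element_alignment, (cols + element_alignment - 1) / element_alignment * element_alignment);
  }

  // Offset of element (i, j); elements within a row stay contiguous (stride 1).
  constexpr std::size_t operator()(std::size_t i, std::size_t j) const
  {
    return i * row_stride() + j;
  }

  // Elements to allocate for the whole matrix, including trailing padding of the last row.
  constexpr std::size_t allocation_size() const { return rows * row_stride(); }
};
```

With `float` and the default 128 bytes, each row starts on a 32-element boundary, so a 100-column row occupies 128 slots.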

mfoerste4 · 2022-06-27T10:38:02Z

@achirkin, please have a look at this. It is now possible to retrieve the layout and padding width at compile time, which makes it possible to dispatch to the correct optimized kernel code.
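
As an illustration of what compile-time access to the padding width enables, the sketch below picks a code path purely from a layout type's static padding. The tag types, the `select_kernel` function, and the threshold of 4 elements are hypothetical choices for this example, not raft code.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical layout tags exposing their padding width (in elements) at compile time.
template <std::size_t ElementAlignment>
struct padded_layout_tag {
  static constexpr std::size_t element_alignment = ElementAlignment;
};

struct unpadded_layout_tag {
  static constexpr std::size_t element_alignment = 1;
};

// The branch is resolved from the type alone; no runtime check of the row length is needed.
template <typename Layout>
constexpr std::string_view select_kernel()
{
  if constexpr (Layout::element_alignment >= 4) {
    return "vectorized";  // every row starts on an aligned boundary, so wide loads are safe
  } else {
    return "scalar";  // generic element-wise fallback
  }
}
```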

achirkin

Looks good! I wonder if we can further simplify/optimize some parts of the linewiseOp for padded data?..

achirkin · 2022-06-28T13:00:30Z

cpp/include/raft/detail/layout_padded_general.hpp

+    strides[r] = stride;
+    if (stride == 1) {
+      stride *=
+        std::max<size_t>(alignment, (__exts.extent(r) + alignment - 1) / alignment * alignment);


A nitpick: perhaps we can use raft::ceildiv for readability?
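
For reference, the padded-extent expression from this excerpt could read as follows with a ceiling-division helper. The sketch defines its own `ceildiv` rather than including raft, so it does not assume the exact signature or header of `raft::ceildiv`; `padded_extent` is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>

// Local stand-in for an integer ceiling-division helper such as raft::ceildiv.
constexpr std::size_t ceildiv(std::size_t a, std::size_t b) { return (a + b - 1) / b; }

// The padded extent from the excerpt, written with ceildiv:
// round up to a multiple of `alignment`, but never below one full block.
constexpr std::size_t padded_extent(std::size_t extent, std::size_t alignment)
{
  return std::max<std::size_t>(alignment, ceildiv(extent, alignment) * alignment);
}

// Both spellings agree on a couple of spot checks.
static_assert(padded_extent(100, 32) == (100 + 32 - 1) / 32 * 32);
static_assert(padded_extent(0, 32) == 32);
```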

achirkin · 2022-06-28T13:05:38Z

cpp/include/raft/matrix/detail/linewise_op.cuh

+    for (int k = threadIdx.x; k < VecElems * BlockSize; k += BlockSize, j += BlockSize) {
+      while (j >= rowLenPadded)
+        j -= rowLenPadded;
+      shm[k] = j < rowLen ? p[j] : Type(1);


Out of curiosity: why ones and not zeroes? :)

I had division ops in mind and did not want to risk any division by zero. I was also unsure whether this might cause issues with tools like valgrind.

Hmm, good idea, I guess. I don't think it will cost anything anyway.
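
The fill logic in this excerpt can be emulated on the host to see what ends up in the buffer. This is a single-threaded sketch that advances `j` by 1 instead of by `BlockSize`; `fill_padded_row` is a hypothetical helper, not part of raft.

```cpp
#include <cstddef>
#include <vector>

// Host-side emulation of the shared-memory fill: the flat index j wraps around every
// rowLenPadded elements, and positions that land in the padding receive Type(1), so a
// later element-wise division by these values can never divide by zero.
template <typename Type>
std::vector<Type> fill_padded_row(const Type* p,
                                  std::size_t rowLen,
                                  std::size_t rowLenPadded,
                                  std::size_t start,
                                  std::size_t count)
{
  std::vector<Type> buf(count);
  std::size_t j = start;
  for (std::size_t k = 0; k < count; ++k, ++j) {
    while (j >= rowLenPadded)
      j -= rowLenPadded;  // wrap into the next (padded) row
    buf[k] = j < rowLen ? p[j] : Type(1);  // neutral value for padding slots
  }
  return buf;
}
```

For a row `{2, 4, 6}` padded to 4 elements, eight slots read `2, 4, 6, 1, 2, 4, 6, 1`.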

achirkin · 2022-06-28T13:28:06Z

cpp/include/raft/detail/layout_padded_general.hpp

+// similar to layout_strided, but contiguous with padding in second smallest stride dimension
+template <typename ValueType,
+          StorageOrderType StorageOrder = StorageOrderType::row_major_t,
+          size_t ByteAlignment          = 128>


I've just realized that the padding of the strides is done in terms of elements rather than bytes. I assume it wouldn't be possible to express the padding in bytes due to how the mapping works? If that's the case, what would you think about expressing the template parameter in elements as well?

The mapping redirects index access based on elements, so yes, I don't think that can be changed. But since the hardware constraint on data is byte-based, I thought it would be good to have a template parameter based on bytes (with a reasonable default) here. This way users do not have to think about the width of the data types they use, and the class computes the element alignment automatically.
It can also be retrieved statically from the layout via:
static constexpr size_t element_alignment = std::max<size_t>(ByteAlignment / sizeof(ValueType), 1);

Sounds good. Maybe you'd also want to constrain it here to always be a power of two (I've noticed you used Pow2<..> utils somewhere)? That would still cover our hardware-inspired use cases.

mfoerste4 · 2022-07-01T11:27:50Z

> Looks good! I wonder if we can further simplify/optimize some parts of the linewiseOp for padded data?..

The current implementation basically consists of the old main kernel, which was processing the aligned portion of the data. I don't think we can simplify it further without losing performance. Regarding optimizations, we could skip the operation on the padded portion of the data, but I guess that would have only a very limited effect on larger datasets.
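
To put a number on the expected benefit of skipping the padded portion, the fraction of processed elements that are padding can be computed directly. This is a back-of-the-envelope sketch with a hypothetical `padding_fraction` helper; it ignores the one-block lower bound used by the layout. With an alignment of 32 elements, a 33-element row is almost half padding, while a 1000-element row carries only 24 padding elements out of 1024.

```cpp
#include <cstddef>

// Fraction of the processed elements that are padding when a row of rowLen
// elements is padded to a multiple of `alignment` elements.
constexpr double padding_fraction(std::size_t rowLen, std::size_t alignment)
{
  const std::size_t padded = (rowLen + alignment - 1) / alignment * alignment;
  return padded == 0 ? 0.0 : double(padded - rowLen) / double(padded);
}
```

Since the padding per row is at most `alignment - 1` elements, the fraction shrinks roughly as `alignment / rowLen` for long rows.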

mfoerste4 · 2022-08-02T12:35:22Z

@achirkin, what are the next steps to proceed here?

achirkin

Overall, this looks good to me.

The only thing I'm not sure about is whether we'd want to define the padded layout synonyms for public use with or without the padding size as a template parameter. On the one hand, the parameterized version seems more logical for whatever use case. On the other hand, in any application I can imagine, only the specific padding of 128 bytes matters, for coalesced memory access. Or maybe we should have both?
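
One way to offer both options is a fully parameterized alias plus a convenience alias that fixes 128 bytes. All names here (`storage_order`, `padded_layout_sketch`, `padded_layout_t`, `padded_layout_128_t`) are hypothetical stand-ins, not raft's actual public aliases.

```cpp
#include <cstddef>

enum class storage_order { row_major, col_major };

// Stand-in for the padded layout template; only its parameters matter here.
template <typename ElementType, storage_order Order, std::size_t ByteAlignment>
struct padded_layout_sketch {
  static constexpr std::size_t byte_alignment = ByteAlignment;
};

// Option 1: fully parameterized public alias.
template <typename ElementType, storage_order Order, std::size_t ByteAlignment>
using padded_layout_t = padded_layout_sketch<ElementType, Order, ByteAlignment>;

// Option 2: convenience alias fixing the 128-byte padding used for coalesced access.
template <typename ElementType, storage_order Order = storage_order::row_major>
using padded_layout_128_t = padded_layout_t<ElementType, Order, 128>;
```

A default template argument (`std::size_t ByteAlignment = 128`) on the first alias would cover much the same ground with a single name; the separate alias mainly makes the common case explicit at call sites.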

mhoemmen

Thanks for soliciting my review! I have a few comments on the design.

mhoemmen · 2022-08-08T20:51:35Z

cpp/include/raft/core/mdarray.hpp

+
+template <typename ElementType, storage_order_type order>
+using padded_layout = detail::stdex::layout_padded_general<
+  detail::stdex::padding<std::remove_cv_t<std::remove_reference_t<ElementType>>>::value,


[OPTIONAL]

C++20 introduces remove_cvref_t, but alas, we're probably stuck on C++17 at the latest :-( You could always use the feature-test macro __cpp_lib_remove_cvref:

#include <type_traits>  // defines __cpp_lib_remove_cvref where supported

namespace detail::stdex {
#if defined(__cpp_lib_remove_cvref)
using ::std::remove_cvref;
using ::std::remove_cvref_t;
#else
template <class T>
struct remove_cvref {
  using type = ::std::remove_cv_t<::std::remove_reference_t<T>>;
};
template <class T>
using remove_cvref_t = typename remove_cvref<T>::type;
#endif
}  // namespace detail::stdex

mhoemmen · 2022-08-08T21:01:19Z

cpp/include/raft/core/mdarray.hpp

+  // that encodes alignment as a non-type template parameter.
+  assert(input_pointer == alignTo(input_pointer, alignment::value));
+
+  pointer aligned_pointer = input_pointer;  // assert_aligned(input_pointer,


[INFORMATIVE NOTE]

I've been thinking about how to express this idea. The issue is that std::assume_aligned (C++20) and equivalent compiler built-ins (e.g., GCC's __builtin_assume_aligned) don't affect the type. My current thinking is that aligned_accessor::access should do return std::assume_aligned<N>(p)[i]; (with N the accessor's byte alignment). Please see my comment below on aligned_accessor::access.

For my latest thoughts on assume_aligned etc., please see my aligned_accessor PR kokkos/mdspan#176.
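
For illustration, an accessor along the lines discussed here might look as follows. It is a sketch assuming GCC/Clang's `__builtin_assume_aligned` as the pre-C++20 fallback; `aligned_accessor_sketch` is a hypothetical name and is not the `aligned_accessor` proposed in kokkos/mdspan#176. The alignment is only a promise to the optimizer: passing a pointer that is not actually aligned is undefined behavior.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical mdspan-style accessor that tells the compiler the data handle is
// aligned to ByteAlignment bytes on every access, since the pointer type itself
// cannot carry that information.
template <typename ElementType, std::size_t ByteAlignment>
struct aligned_accessor_sketch {
  static_assert(ByteAlignment != 0 && (ByteAlignment & (ByteAlignment - 1)) == 0,
                "ByteAlignment must be a power of two");

  using element_type     = ElementType;
  using reference        = ElementType&;
  using data_handle_type = ElementType*;

  reference access(data_handle_type p, std::size_t i) const noexcept
  {
#if defined(__cpp_lib_assume_aligned)
    return std::assume_aligned<ByteAlignment>(p)[i];  // C++20
#elif defined(__GNUC__)
    return static_cast<data_handle_type>(__builtin_assume_aligned(p, ByteAlignment))[i];
#else
    return p[i];  // no alignment hint available
#endif
  }
};
```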

mfoerste4 · 2022-10-17T11:15:14Z

@cjnolet, thanks for reviewing. I rebased the branch onto 22.12 and tried to align the naming of template classes and the API as closely as possible with the existing pattern.

@mhoemmen

Premature approval. Hoping for @mhoemmen's blessing before we merge this over.

mhoemmen · 2022-10-26T21:23:27Z

@cjnolet @mfoerste4 I've been completely overwhelmed with my current project and haven't had time to review this. Please don't feel like you have to wait on me, though I appreciate that you asked me for feedback!

cjnolet · 2022-10-27T02:42:01Z

@gpucibot merge

@Nyrio

This should fix the failures @Nyrio found in #911 (comment). This is a test issue within a new test case that was introduced by #725. @cjnolet, FYI.

Authors:
- Malte Förster (https://github.com/mfoerste4)

Approvers:
- Corey J. Nolet (https://github.com/cjnolet)

URL: #964

mfoerste4 added 5 commits June 22, 2022 14:17

added testcases for layout_padded_general

c0b6eb3

adjustments due to changes in mdspan

34620a0

moved layout_padded_general to raft repo

2567b8b

some minor adjustments

9094c1d

added linewise-op for padded mdspan

7d57828

mfoerste4 requested a review from a team as a code owner June 27, 2022 10:29

mfoerste4 marked this pull request as draft June 27, 2022 10:30

github-actions bot added the cpp label Jun 27, 2022

This was referenced Jun 27, 2022

[FEA] Padding support for mdspan and mdarray. #497

Open

[WIP] [HELP-REQ] added convenience layout mapper for to enable padding via layout_stride #663

Closed

achirkin reviewed Jun 28, 2022

moved some conexpr utilities to c++ header

aa22b8c

cjnolet assigned mfoerste4 Jul 14, 2022

cjnolet added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Jul 14, 2022

switch to alignTo utility

0ae0f03

achirkin requested changes Aug 4, 2022

cpp/include/raft/matrix/detail/linewise_op.cuh (outdated)

cpp/include/raft/matrix/detail/linewise_op.cuh (outdated)

cpp/include/raft/detail/layout_padded_general.hpp (outdated)

mhoemmen reviewed Aug 4, 2022

cpp/include/raft/core/cudart_utils.hpp (outdated)

cpp/include/raft/detail/layout_padded_general.hpp (outdated)

cpp/include/raft/detail/layout_padded_general.hpp (outdated)

mfoerste4 added 3 commits August 5, 2022 17:56

moved template definition from detail to core/mdarray

cbc3e65

padding layout based on element alignment

7a17f80

aligned accessor review suggestion

b9b7121

mhoemmen reviewed Aug 8, 2022

fix merge conflicts

59dada4

mfoerste4 added 4 commits October 11, 2022 05:20

merge after rebase to 22.12

4eaf092

resolved merge conflicts

9b78a33

move interface of linewiseOp

c41c21e

align interfacr and template names with existing pattern

9bd2100

mfoerste4 requested review from a team as code owners October 17, 2022 11:08

github-actions bot added CMake gpuCI python labels Oct 17, 2022

mfoerste4 changed the base branch from branch-22.10 to branch-22.12 October 17, 2022 11:13

mfoerste4 mentioned this pull request Oct 17, 2022

[FEA] update to official padded layouts and test submdspan functionality #922

Open

fixed API, modified comment to match mdspan index order

0cd5f65

github-actions bot removed CMake python gpuCI labels Oct 17, 2022

cjnolet previously approved these changes Oct 19, 2022

cjnolet removed the request for review from a team October 19, 2022 22:29

cjnolet approved these changes Oct 27, 2022

rapids-bot bot merged commit af05bcc into rapidsai:branch-22.12 Oct 27, 2022

Nyrio mentioned this pull request Oct 28, 2022

Replace map_along_rows with matrixVectorOp #911

Merged

mfoerste4 mentioned this pull request Oct 28, 2022

[Hotfix] linewiseop padded span test #964

Merged
