-
Notifications
You must be signed in to change notification settings - Fork 203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding padded layout 'layout_padded_general' #725
Adding padded layout 'layout_padded_general' #725
Conversation
@achirkin, please have a look at this. It is now possible to retrieve the layout / padding width at compile time which makes it possible to target optimized kernel code correctly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! I wonder if we can further simplify/optimize some parts of the linewiseOp
for padded data?..
strides[r] = stride; | ||
if (stride == 1) { | ||
stride *= | ||
std::max<size_t>(alignment, (__exts.extent(r) + alignment - 1) / alignment * alignment); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A nitpick: perhaps, we can use raft::ceildiv
for readability?
strides[r] = stride; | ||
if (stride == 1) { | ||
stride *= | ||
std::max<size_t>(alignment, (__exts.extent(r) + alignment - 1) / alignment * alignment); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A nitpick: perhaps, we can use raft::ceildiv
for readability?
for (int k = threadIdx.x; k < VecElems * BlockSize; k += BlockSize, j += BlockSize) { | ||
while (j >= rowLenPadded) | ||
j -= rowLenPadded; | ||
shm[k] = j < rowLen ? p[j] : Type(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity: why ones and not zeroes? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had division ops in mind and did not want to risk any division by zero. I was unsure whether this might cause issues with tools like valgrind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, good idea, I guess. I don't think it will cost anything anyway.
// similar to layout_strided, but contiguous with padding in second smallest stride dimension | ||
template <typename ValueType, | ||
StorageOrderType StorageOrder = StorageOrderType::row_major_t, | ||
size_t ByteAlignment = 128> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just realized that the padding of the strides is done in terms of elements rather than bytes. I assume, it wouldn't be possible to express the padding in bytes due to how mapping
works? If no, what would you think about having the template parameter expressed also in elements?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The mapping redirects index-access based on elements, so yes - I don't think that can be changed. But as the hardware constraint for data is byte-based I thought it would be good to have a template based on the bytes (with reasonable default) here. This way the user does not have to think about the width of the datatypes he uses and the class computes the element-alignment automatically.
It can also be retrieved statically from the layout via
static constexpr size_t element_alignment = std::max(ByteAlignment / sizeof(ValueType), 1ul);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Maybe you'd also want to constrain it here to always be power-of-two (I've noticed you used Pow2<..> utils somewhere)? That would still cover our hardware-inspired use cases.
The current implementation basically consists of the old main kernel which was running the aligned data portion. I don't think we can simplify it further without loosing performance. Regarding optimizations - we could skip the operation on the padded portion of data, but I guess that would only have very limited effect on larger datasets. |
@achirkin , what are the next steps here to proceed? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, this looks good to me.
The only thing I'm not sure is whether we'd want to define the padded layout synonyms for public use with or without the template parameter padding size
. On the one hand, the parameterized version seems to be more logical for whatever use case. On the other hand, any application I could imagine, it only matters to have the specific padding of 128 bytes for the coalesced memory access. Or, maybe, we should have both?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for soliciting my review! I have a few comments on the design.
cpp/include/raft/core/mdarray.hpp
Outdated
|
||
template <typename ElementType, storage_order_type order> | ||
using padded_layout = detail::stdex::layout_padded_general< | ||
detail::stdex::padding<std::remove_cv_t<std::remove_reference_t<ElementType>>>::value, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[OPTIONAL]
C++20 introduces remove_cvref_t
, but alas, we're probably stuck on C++17 at the latest : - ( You could always use the feature test macro __cpp_lib_remove_cvref
:
namespace detail::stdex {
#if defined(__cpp_lib_remove_cvref)
using ::std::remove_cvref;
using ::std::remove_cvref_t;
#else
template<class T>
struct remove_cvref {
using type = ::std::remove_cv_t<::std::remove_reference_t<T>>;
};
template<class T>
using remove_cvref_t = typename remove_cvref<T>::type;
#endif
}
cpp/include/raft/core/mdarray.hpp
Outdated
// that encodes alignment as a non-type template parameter. | ||
assert(input_pointer == alignTo(input_pointer, alignment::value)); | ||
|
||
pointer aligned_pointer = input_pointer; // assert_aligned(input_pointer, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[INFORMATIVE NOTE]
I've been thinking about how to express this idea. The issue is that std::assume_aligned
(C++20) or equivalent compiler built-ins (e.g., GCC's __builtin_assume_aligned
) don't affect the type. My current thinking is that aligned_accessor::access
should do return (std::assume_aligned(p))[i];
. Please see my comment below on aligned_accessor::access
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my latest thoughts on assume_aligned
etc., please see my aligned_accessor
PR kokkos/mdspan#176 .
@cjnolet , thanks for reviewing. I re-based the branch to 22.12 and tried to align naming of template classes and API as close to the existing pattern as possible. |
Premature approval. Hoping for @mhoemmen's blessing before we merge this over.
@cjnolet @mfoerste4 I've been completely overwhelmed with my current project and haven't had time to review this. Please don't feel like you have to wait on me, though I appreciate that you asked me for feedback! |
@gpucibot merge |
This should fix the failures @Nyrio found in [#911](#911 (comment)). This is a test issue within a new testcase that was introduced by [#725](#725). @cjnolet , FYI. Authors: - Malte Förster (https://github.com/mfoerste4) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #964
This is a different approach / followup PR of #663 for issue #497.
I implemented a
layout_padded_general
within raft to statically enforce padding on mdpsan accesses.ValueType
,StorageOrder
(defaultrow_major_t
), andByteAlignment
(default 128)submdspan
functionality right now. I have a branch on a mdspan fork where I tested this though (https://github.com/mfoerste4/mdspan/tree/layout_padded).