[FEA] Support for half-float mixed precision in brute-force #2382
Conversation
rhdong
commented
Jul 17, 2024
- distance supports half-float
- SDDMM support half-float
- gemm supports multi-type compose
- transpose & copy support half
- random supports half
cpp/test/CMakeLists.txt
@@ -219,38 +219,7 @@ if(BUILD_TESTS)
  NAME
    LINALG_TEST
  PATH
    linalg/add.cu
I think this file was accidentally checked in - we should probably keep the linalg/sparse tests here
OMG, I forgot to restore the test cases a second time... T_T
 * @param a need to convert
 */
template <typename T>
HDI auto half2float(T& a)
I think this API should be renamed - since it not only converts from half to float, but also handles conversions when T=float.
I also think we probably want both input and output template parameters for the conversion - rather than having a template parameter only for the input and assuming the output is always a float.
The faiss codebase has a generic 'convert' operator that does this (https://github.com/facebookresearch/faiss/blob/dd72e4121dc6684c6fbf289949ba4526a54d9f3b/faiss/gpu/utils/ConversionOperators.cuh#L39-L45), which seems to work pretty well.
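For illustration, a generic conversion functor along those lines could look like the host-side C++ sketch below. The name `Convert` and its shape are hypothetical, not the actual FAISS or RAFT API; a device version would additionally carry `__host__ __device__` qualifiers and route `__half` through `__half2float`/`__float2half`:

```cpp
#include <type_traits>

// Illustrative generic conversion functor: both the input and output
// types are template parameters, so the caller never assumes float.
template <typename To>
struct Convert {
  template <typename From>
  constexpr To operator()(From v) const
  {
    if constexpr (std::is_same_v<From, To>) {
      return v;  // identity conversion compiles away
    } else {
      return static_cast<To>(v);  // e.g. half -> float, float -> half
    }
  }
};
```

Making the output type explicit (`Convert<float>{}(x)`) keeps call sites honest about which precision they produce, instead of baking float in.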
@@ -102,7 +102,13 @@ template <typename T>
 RAFT_INLINE_FUNCTION auto asin(T x)
 {
 #ifdef __CUDA_ARCH__
   return ::asin(x);
   if constexpr (std::is_same<T, __half>::value) {
Do we need half support for the `asin` function?
I'm wondering if we should either remove the half support for this function, or add half support for all the other trigonometric functions in this file.
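For reference, the `if constexpr` dispatch pattern the diff introduces can be sketched host-side as below. `LowPrec` is a made-up stand-in for `__half` (which is device-only); on the device the real code would typically promote to float, call the float math function, and narrow back. This illustrates the pattern, not the RAFT implementation:

```cpp
#include <cmath>
#include <type_traits>

using LowPrec = float;  // stand-in for __half in this host-side sketch

template <typename T>
auto my_asin(T x)
{
  if constexpr (std::is_same_v<T, LowPrec>) {
    // promote to a wider type, compute, then narrow back to the input type
    return static_cast<T>(std::asin(static_cast<double>(x)));
  } else {
    return std::asin(x);
  }
}
```

The same compile-time branch could be stamped out for `acos`, `atan`, etc., which is why extending half support to the whole file is mostly mechanical but multiplies build time.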
Yeah, maybe we could. For now it's a trade-off: I understand your point, but some distance algorithms need `asin` to support half, so removing it would cause a compilation error. Supporting half everywhere would be ideal, but the workload could get out of control...
I'm a little concerned by the transpose kernel, and I think this PR needs some additional benchmarks - both on the perf w/ half precision and on the increase in build time and binary size.
Given that we're butting right up against code freeze, I'd like to retarget this PR to 24.10 so we can make sure we're happy w/ the perf and changes to the build.
  const half* __restrict__ in,
  half* __restrict__ out)
{
  __shared__ half tile[TILE_DIM][TILE_DIM + 1];
Transpose of a non-symmetric non-square matrix is non-trivial to parallelize. This seems to be doing it all in a single kernel. This makes me wonder if we might be lacking test coverage here.
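To make the coverage concern concrete: for an M x N row-major input, element (r, c) must land at (c, r) of an N x M output. A plain host-side reference like the hypothetical helper below (not part of this PR) is what the tiled kernel's output could be compared against for non-square shapes:

```cpp
#include <cstddef>
#include <vector>

// Reference transpose of an m x n row-major matrix. A CUDA kernel using
// shared-memory tiles should match this for every shape, including the
// non-square cases where tile-boundary indexing bugs tend to hide.
template <typename T>
std::vector<T> transpose_ref(const std::vector<T>& in, std::size_t m, std::size_t n)
{
  std::vector<T> out(n * m);
  for (std::size_t r = 0; r < m; ++r)
    for (std::size_t c = 0; c < n; ++c)
      out[c * m + r] = in[r * n + c];
  return out;
}
```

Parameterizing the tests over shapes like 2x3, 33x65, and 1xN would exercise the partial tiles that a single-kernel shared-memory transpose has to handle.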
/ok to test
@rhdong before we merge this, we need to make sure it doesn't break cuml. Can you build cuML with this RAFT branch and just verify it doesn't need to be updated?
OK, I will verify it.
/merge