Add lds and sts inline PTX instructions to force vector instruction generation (#273)

Adds inline PTX assembly for the lds and sts instructions for float, float2, float4, double, and double2. This ensures that the compiler does not mistakenly generate non-vectorized instructions where the vectorized version is needed. It also ensures that non-generic ld/st instructions are always generated, so the compiler cannot fall back to generic ld/st instructions.

These functions now require the given shmem pointer to be aligned to the vector length. For example, for float4 lds/sts the shmem pointer must be aligned to 16 bytes; otherwise the access might fail silently or raise a runtime error.

Authors:
- Mahesh Doijade (https://github.com/mdoijade)

Approvers:
- Thejaswi. N. S (https://github.com/teju85)

URL: #273
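To illustrate the idea described in the message, here is a minimal sketch of what a float4-wide shared-memory store and load written as inline PTX might look like. The names `sts_f32x4`, `lds_f32x4`, `roundtrip_kernel`, and `roundtrip_ok` are hypothetical and chosen for this example only; the commit's actual signatures are not shown on this page. The sketch assumes a 64-bit build and a CUDA toolkit that provides `__cvta_generic_to_shared`.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper (not the commit's code): store four floats to shared
// memory with a single vectorized, non-generic st.shared.v4.f32.
// Precondition: addr points into shared memory and is 16-byte aligned.
__device__ __forceinline__ void sts_f32x4(float* addr, const float (&x)[4])
{
  // Convert the generic pointer into a shared-state-space address.
  auto s = __cvta_generic_to_shared(addr);
  asm volatile("st.shared.v4.f32 [%0], {%1, %2, %3, %4};"
               :
               : "l"(s), "f"(x[0]), "f"(x[1]), "f"(x[2]), "f"(x[3])
               : "memory");
}

// Hypothetical helper: load four floats from shared memory with a single
// ld.shared.v4.f32. The same 16-byte alignment precondition applies.
__device__ __forceinline__ void lds_f32x4(float (&x)[4], const float* addr)
{
  auto s = __cvta_generic_to_shared(addr);
  asm volatile("ld.shared.v4.f32 {%0, %1, %2, %3}, [%4];"
               : "=f"(x[0]), "=f"(x[1]), "=f"(x[2]), "=f"(x[3])
               : "l"(s)
               : "memory");
}

constexpr int kThreads = 32;
constexpr int kN       = 4 * kThreads;

// Each thread stores its four values, then loads its neighbour's four values.
__global__ void roundtrip_kernel(const float* in, float* out)
{
  // __align__(16) guarantees the base address satisfies the float4 requirement;
  // every per-thread offset is a multiple of 4 floats, so it stays aligned.
  __shared__ __align__(16) float smem[kN];
  const int t = threadIdx.x;

  float v[4];
  for (int i = 0; i < 4; ++i) v[i] = in[4 * t + i];
  sts_f32x4(smem + 4 * t, v);
  __syncthreads();

  const int src = (t + 1) % kThreads;
  float w[4];
  lds_f32x4(w, smem + 4 * src);
  for (int i = 0; i < 4; ++i) out[4 * t + i] = w[i];
}

// Host-side check: returns true if every thread read back its neighbour's data.
bool roundtrip_ok()
{
  float h_in[kN], h_out[kN];
  for (int i = 0; i < kN; ++i) { h_in[i] = static_cast<float>(i); h_out[i] = -1.0f; }

  float* d_in  = nullptr;
  float* d_out = nullptr;
  if (cudaMalloc(&d_in, sizeof(h_in)) != cudaSuccess) return false;
  if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess) { cudaFree(d_in); return false; }

  cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
  roundtrip_kernel<<<1, kThreads>>>(d_in, d_out);
  // The blocking copy also surfaces any error raised by the kernel.
  cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
  cudaFree(d_in);
  cudaFree(d_out);
  if (err != cudaSuccess) return false;

  for (int t = 0; t < kThreads; ++t)
    for (int i = 0; i < 4; ++i)
      if (h_out[4 * t + i] != h_in[4 * ((t + 1) % kThreads) + i]) return false;
  return true;
}
```

Calling `roundtrip_ok()` from a host `main` exercises both helpers. Passing an address such as `smem + 4 * t + 1` would break the 16-byte alignment precondition, which is the failure mode the message warns about.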
Showing 1 changed file with 68 additions and 33 deletions.