Add extension subgroup-size-control#5578
Conversation
This patch adds a new WebGPU and WGSL extension `subgroup-size-control` based on `proposals/subgroup-size-control.md`.
|
Previews, as seen when this build job started (f7741c6): |
|
Should this PR also update the correspondence reference? |
Done |
|
PTAL, thanks! |
|
Hi @jimblandy and @mwyrzykowski , I've updated this PR. PTAL, thanks! |
|
I'll take a look at this today. |
jimblandy
left a comment
There was a problem hiding this comment.
I found a few editorial issues. Please look through the generated HTML to make sure the formatting is coming through the way it should.
jimblandy
left a comment
There was a problem hiding this comment.
Editorial issues aside, this looks good to me.
|
|
||
| These are not surfaced in WebGPU because: | ||
| - The D3D12 `waveLaneCountMax` is not reliable according to [DirectXShaderCompiler Wiki](https://github.com/microsoft/DirectXShaderCompiler/wiki/Wave-Intrinsics/#caps-flags). | ||
| - The D3D12 `waveLaneCountMin` may differ from the actual minimum subgroup sizes used in fragment shaders on some Intel GPUs. |
There was a problem hiding this comment.
Continuing a conversation from gpuweb/cts#4641 (comment) @Jiawei-Shao @jrprice:
I understand that if we set subgroupMinSize to 8 (because of fragment) then it's not going to be valid for compute on this device. I guess what I'm saying is if this AdapterInfo isn't going to provide enough information about the hardware to actually use the API, it seems bad.
-
IF it would otherwise be the case that all values between
subgroupMinSizeandsubgroupMaxSizeare valid for trivial pipelines, I really would like to try to maintain that invariant and test it. The test (now removed) suggested this would be true, by arbitrarily choosingsubgroupMaxSizeas the value to test, for all devices except these Intel ones.The options I see for doing that are:
- Redefine
subgroupMinSizeso it only applies to compute - assuming this is a non-starter since it's used for more than justsubgroup-size-control - Separate the compute and fragment
subgroupMinSize/subgroupMaxSize - Implementation artificially increases
subgroupMinSizeto 16 even for fragment by using subgroup size control to prevent 8 from actually being used in fragment - no clue if this is possible
- Redefine
-
IF NOT, it doesn't really matter and we can leave the spec as is. The test (when we re-add it) should be changed to keep trying different subgroup sizes until it find one that works. We can use the
vendor/architectureto choose the order in which we try different subgroup sizes.
There was a problem hiding this comment.
The test (now removed) suggested this would be true, by arbitrarily choosing subgroupMaxSize as the value to test, for all devices except these Intel ones.
I've put this test in another PR 4643.
There was a problem hiding this comment.
I guess what I'm saying is if this AdapterInfo isn't going to provide enough information about the hardware to actually use the API, it seems bad.
Part of the argument I made in this comment (discussed more in these meeting minutes) was that the value of this limit alone often isn't enough to use the feature either. Knowing that the minimum possible subgroup size is 8 vs 16 vs ... doesn't tell you enough about the memory hierarchy or other hardware properties to decide how your shader should operate. Middleware will want to know if it's Intel vs NVIDIA vs AMD vs ..., and potentially different generations of these architectures, in order to decide how to tile data through memory and registers (for example). At this point they have likely already decided what subgroup size they want based on the architecture, and they won't be considering using 8 on Intel regardless of what the limit says.
I agree that it'd be nice if every value between the limits was reliable usable for a trivial pipeline. My understanding is that over time that will increasingly be true, as it is just this one generation of Intel GPUs that has this issue.
There was a problem hiding this comment.
Sure, real applications have more complex requirements. But the issue with this Intel generation seems very specific, just that there's one value that's valid for fragment that's not valid for compute? That seems like something to try to paper over, not something to expose as a wart.
For CTS, there's two reasons I would like to aspirationally test that all values in the range work for a trivial pipeline: (1) so applications can (at least mostly) not worry about that particular detail, (2) so that we can check that the values the implementation exposes are actually the correct ones for the device. The latter is quite valuable IMO.
There was a problem hiding this comment.
- Implementation artificially increases
subgroupMinSizeto 16 even for fragment by using subgroup size control to prevent 8 from actually being used in fragment - no clue if this is possible
James clarified to me offline that my idea to paper over the difference by using the backend's subgroup size control to prevent fragment shaders from ever selecting 8 (even if it would be allowed) won't work, because subgroup size control is compute-only. So seems like papering over is probably not feasible and the only two reasonable options are internal error (the current proposal) or adding a new limit.
The current option does have me somewhat questioning the value of subgroupMinSize/subgroupMaxSize in the first place if sometimes there will be values in that range that just don't work at all. But I guess that ship has sailed.
There was a problem hiding this comment.
- Redefine
subgroupMinSizeso it only applies to compute - assuming this is a non-starter since it's used for more than justsubgroup-size-control
Maybe this is still feasible? If subgroupMinSize/subgroupMaxSize is going to have very limited use anyway.
There was a problem hiding this comment.
This would be a breaking change to the API. I don't know how frequently these properties are used for fragment shaders but I'm not sure that we could change them at this stage.
There was a problem hiding this comment.
How about subgroupMaxSize ?
Actually the D3D12 document mentions WaveLaneCountMax is not reliable so currently we have to always choose 128 for subgroupMaxSize on D3D12, but 128 is not acceptable on lots of GPUs.
Hi @kainino0x, I've added all the tests to CTS (in gpuweb/cts#4640). PTAL, thanks! |
|
The CTS PR has been merged. @kainino0x, is this ready to land? |
I was pushing to get this resolved, but I want to withdraw my pressure here. Based on the discussion above, it looks to me like this question is still very much unresolved. The web will have to deal with whatever we put in the spec forever, so it is very much worthwhile for us to spend a few weeks ensuring that we're doing the best we can do. |
This patch adds a new WebGPU and WGSL extension
subgroup-size-controlbased on subgroup-size-control.md.
Fixed: #5545