Skip to content

Add extension subgroup-size-control#5578

Open
Jiawei-Shao wants to merge 30 commits into
gpuweb:mainfrom
Jiawei-Shao:add-subgroup-size-control
Open

Add extension subgroup-size-control#5578
Jiawei-Shao wants to merge 30 commits into
gpuweb:mainfrom
Jiawei-Shao:add-subgroup-size-control

Conversation

@Jiawei-Shao

@Jiawei-Shao Jiawei-Shao commented Feb 27, 2026

Copy link
Copy Markdown
Contributor

This patch adds a new WebGPU and WGSL extension subgroup-size-control
based on subgroup-size-control.md.

Fixed: #5545

@Jiawei-Shao Jiawei-Shao marked this pull request as draft February 27, 2026 07:19
@Jiawei-Shao Jiawei-Shao marked this pull request as ready for review February 27, 2026 07:58
@github-actions

github-actions Bot commented Feb 27, 2026

Copy link
Copy Markdown
Contributor

Previews, as seen when this build job started (f7741c6):
WebGPU webgpu.idl | Explainer | Correspondence Reference
WGSL grammar.js | wgsl.lalr.txt

@dneto0 dneto0 requested review from dneto0 February 27, 2026 14:40
Comment thread spec/index.bs Outdated
Comment thread spec/index.bs Outdated
Comment thread spec/index.bs Outdated
Comment thread wgsl/index.bs Outdated
Comment thread wgsl/index.bs Outdated
Comment thread wgsl/index.bs Outdated
@Jiawei-Shao Jiawei-Shao marked this pull request as draft March 4, 2026 07:59
@Jiawei-Shao Jiawei-Shao marked this pull request as ready for review March 4, 2026 08:02
@jimblandy

Copy link
Copy Markdown
Contributor

Should this PR also update the correspondence reference?

@Jiawei-Shao Jiawei-Shao marked this pull request as draft March 12, 2026 03:06
@Jiawei-Shao Jiawei-Shao marked this pull request as ready for review March 12, 2026 03:17
@Jiawei-Shao

Copy link
Copy Markdown
Contributor Author

Should this PR also update the correspondence reference?

Done

@Jiawei-Shao

Copy link
Copy Markdown
Contributor Author

PTAL, thanks!

@Jiawei-Shao Jiawei-Shao requested a review from jrprice April 17, 2026 02:35
@Jiawei-Shao

Copy link
Copy Markdown
Contributor Author

Hi @jimblandy and @mwyrzykowski ,

I've updated this PR. PTAL, thanks!

Comment thread spec/index.bs Outdated
Comment thread spec/index.bs Outdated
Comment thread spec/index.bs Outdated
Comment thread spec/index.bs
@kainino0x kainino0x requested a review from alan-baker May 13, 2026 22:01

@kainino0x kainino0x left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

API LGTM.
The last WGSL changes should still be reviewed by a WGSL person.
And we still need tests before landing (that's tracked by the open comment, which will block the PR from landing).

@jimblandy

Copy link
Copy Markdown
Contributor

I'll take a look at this today.

@jimblandy jimblandy left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found a few editorial issues. Please look through the generated HTML to make sure the formatting is coming through the way it should.

Comment thread wgsl/index.bs
Comment thread wgsl/index.bs Outdated
Comment thread wgsl/index.bs

@jimblandy jimblandy left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Editorial issues aside, this looks good to me.

Comment thread correspondence/index.bs

These are not surfaced in WebGPU because:
- The D3D12 `waveLaneCountMax` is not reliable according to [DirectXShaderCompiler Wiki](https://github.com/microsoft/DirectXShaderCompiler/wiki/Wave-Intrinsics/#caps-flags).
- The D3D12 `waveLaneCountMin` may differ from the actual minimum subgroup sizes used in fragment shaders on some Intel GPUs.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Continuing a conversation from gpuweb/cts#4641 (comment) @Jiawei-Shao @jrprice:

I understand that if we set subgroupMinSize to 8 (because of fragment) then it's not going to be valid for compute on this device. I guess what I'm saying is if this AdapterInfo isn't going to provide enough information about the hardware to actually use the API, it seems bad.

  • IF it would otherwise be the case that all values between subgroupMinSize and subgroupMaxSize are valid for trivial pipelines, I really would like to try to maintain that invariant and test it. The test (now removed) suggested this would be true, by arbitrarily choosing subgroupMaxSize as the value to test, for all devices except these Intel ones.

    The options I see for doing that are:

    • Redefine subgroupMinSize so it only applies to compute - assuming this is a non-starter since it's used for more than just subgroup-size-control
    • Separate the compute and fragment subgroupMinSize/subgroupMaxSize
    • Implementation artificially increases subgroupMinSize to 16 even for fragment by using subgroup size control to prevent 8 from actually being used in fragment - no clue if this is possible
  • IF NOT, it doesn't really matter and we can leave the spec as is. The test (when we re-add it) should be changed to keep trying different subgroup sizes until it find one that works. We can use the vendor/architecture to choose the order in which we try different subgroup sizes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test (now removed) suggested this would be true, by arbitrarily choosing subgroupMaxSize as the value to test, for all devices except these Intel ones.

I've put this test in another PR 4643.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess what I'm saying is if this AdapterInfo isn't going to provide enough information about the hardware to actually use the API, it seems bad.

Part of the argument I made in this comment (discussed more in these meeting minutes) was that the value of this limit alone often isn't enough to use the feature either. Knowing that the minimum possible subgroup size is 8 vs 16 vs ... doesn't tell you enough about the memory hierarchy or other hardware properties to decide how your shader should operate. Middleware will want to know if it's Intel vs NVIDIA vs AMD vs ..., and potentially different generations of these architectures, in order to decide how to tile data through memory and registers (for example). At this point they have likely already decided what subgroup size they want based on the architecture, and they won't be considering using 8 on Intel regardless of what the limit says.

I agree that it'd be nice if every value between the limits was reliable usable for a trivial pipeline. My understanding is that over time that will increasingly be true, as it is just this one generation of Intel GPUs that has this issue.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, real applications have more complex requirements. But the issue with this Intel generation seems very specific, just that there's one value that's valid for fragment that's not valid for compute? That seems like something to try to paper over, not something to expose as a wart.

For CTS, there's two reasons I would like to aspirationally test that all values in the range work for a trivial pipeline: (1) so applications can (at least mostly) not worry about that particular detail, (2) so that we can check that the values the implementation exposes are actually the correct ones for the device. The latter is quite valuable IMO.

@kainino0x kainino0x Jun 1, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Implementation artificially increases subgroupMinSize to 16 even for fragment by using subgroup size control to prevent 8 from actually being used in fragment - no clue if this is possible

James clarified to me offline that my idea to paper over the difference by using the backend's subgroup size control to prevent fragment shaders from ever selecting 8 (even if it would be allowed) won't work, because subgroup size control is compute-only. So seems like papering over is probably not feasible and the only two reasonable options are internal error (the current proposal) or adding a new limit.

The current option does have me somewhat questioning the value of subgroupMinSize/subgroupMaxSize in the first place if sometimes there will be values in that range that just don't work at all. But I guess that ship has sailed.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Redefine subgroupMinSize so it only applies to compute - assuming this is a non-starter since it's used for more than just subgroup-size-control

Maybe this is still feasible? If subgroupMinSize/subgroupMaxSize is going to have very limited use anyway.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jrprice @dneto0 What do you think about Kai's comment?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be a breaking change to the API. I don't know how frequently these properties are used for fragment shaders but I'm not sure that we could change them at this stage.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about subgroupMaxSize ?

Actually the D3D12 document mentions WaveLaneCountMax is not reliable so currently we have to always choose 128 for subgroupMaxSize on D3D12, but 128 is not acceptable on lots of GPUs.

@Jiawei-Shao

Copy link
Copy Markdown
Contributor Author

API LGTM. The last WGSL changes should still be reviewed by a WGSL person. And we still need tests before landing (that's tracked by the open comment, which will block the PR from landing).

Hi @kainino0x,

I've added all the tests to CTS (in gpuweb/cts#4640).

PTAL, thanks!

@jimblandy

Copy link
Copy Markdown
Contributor

The CTS PR has been merged. @kainino0x, is this ready to land?

@jimblandy

Copy link
Copy Markdown
Contributor

The CTS PR has been merged. @kainino0x, is this ready to land?

I was pushing to get this resolved, but I want to withdraw my pressure here. Based on the discussion above, it looks to me like this question is still very much unresolved. The web will have to deal with whatever we put in the spec forever, so it is very much worthwhile for us to spend a few weeks ensuring that we're doing the best we can do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Subgroup Size Control

7 participants