Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Add subgroups feature #4963

Draft
wants to merge 13 commits into
base: main
Choose a base branch
from
Draft

WIP: Add subgroups feature #4963

wants to merge 13 commits into from

Conversation

alan-baker
Copy link
Contributor

  • API
    • new feature: subgroups
    • new properties for adapter info:
      • minSubgroupSize
      • maxSubgroupSize
  • WGSL
    • new built-in values
      • subgroup_invocation_id
      • subgroup_size
    • subgroup and quad built-in functions
    • new uniformity diagnostic subgroup_uniformity

Missing subgroup_branching diagnostic and uniformity updates.

* API
  * new feature: subgroups
  * new properties for adapter info:
    * minSubgroupSize
    * maxSubgroupSize
* WGSL
  * new built-in values
    * subgroup_invocation_id
    * subgroup_size
  * subgroup and quad built-in functions
  * new uniformity diagnostic subgroup_uniformity
@alan-baker alan-baker added this to the Milestone 1 milestone Nov 8, 2024
spec/index.bs Outdated Show resolved Hide resolved
Copy link
Contributor

github-actions bot commented Nov 8, 2024

Previews, as seen when this build job started (95fceec):
WebGPU webgpu.idl | Explainer | Correspondence Reference
WGSL grammar.js | wgsl.lalr.txt

Copy link
Contributor

@dneto0 dneto0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Main question is whether we want a new GPUAdapterProperties object or not

spec/index.bs Outdated Show resolved Hide resolved
wgsl/index.bs Outdated Show resolved Hide resolved
wgsl/index.bs Outdated Show resolved Hide resolved
spec/index.bs Outdated Show resolved Hide resolved
wgsl/index.bs Outdated Show resolved Hide resolved
@dneto0 dneto0 self-requested a review November 8, 2024 20:04
Copy link
Contributor

@Kangz Kangz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Added a bunch of comments but they are relatively minor.

spec/index.bs Outdated Show resolved Hide resolved
spec/index.bs Outdated Show resolved Hide resolved
spec/index.bs Show resolved Hide resolved
spec/index.bs Outdated Show resolved Hide resolved
wgsl/index.bs Outdated Show resolved Hide resolved
operating in non-uniform control flow and device compilers often aggressively
optimize such code.
The result is that the subgroup may contain a different set of active
invocations than the shader author expects.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe link in to the new uniformity analysis again, telling authors that there is more portability if they abide by them?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is noted below, but IDK that shader authors would go read ### Subgroup Operations ### {#subgroup-ops} itself instead of the intro + definition of builtins.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The surrounding text was rewritten, please take another look.


A [=trigger/subgroup_uniformity=] [=diagnostic=] is [=triggered=] if
`delta` is not a [=uniform value=].
An [=indeterminate value=] is returned if `subgroup_invocation_id + delta`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to understand how one would use this and what's the benefit compared to a subgroupShuffle. It look like a uniform delta would be useful to do a shift, but then the highest active invocation would get an indeterminate value (what we'd want would be a rotation). Should we instead expose some form of shuffleRotate that is easy to use correctly when in subgroup-uniform control flow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is likely much cheaper on some implementations than a shuffle. I guess it depends on use case. I don't know all the best use cases for this, but some sort of iterative compaction wouldn't care about indeterminate values.

There is a rotate primitive in MSL and SPIR-V, though support in SPIR-V seems rather sparse (VK_KHR_shader_subgroup_rotate is ~8.3% supported). It could be polyfilled, but it would likely also be more expensive than a pure rotate on some hardware.

wgsl/index.bs Show resolved Hide resolved
wgsl/index.bs Outdated Show resolved Hide resolved
wgsl/index.bs Show resolved Hide resolved
Copy link
Contributor

@kainino0x kainino0x left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No further comments on the API side

spec/index.bs Outdated Show resolved Hide resolved
* move minSubgroupSize and maxSubgroupSize into GPUAdapterInfo
  * provide default values
* Fix link to wgsl spec
* Fix uniformity rules in wgsl related to parameters they are required
  to be uniform
spec/index.bs Outdated Show resolved Hide resolved
@@ -1888,6 +1889,8 @@ interface GPUAdapterInfo {
readonly attribute DOMString architecture;
readonly attribute DOMString device;
readonly attribute DOMString description;
readonly attribute unsigned long minSubgroupSize;
readonly attribute unsigned long maxSubgroupSize;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we can come up with better names, but even with this is in GPUAdapterInfo, I find it really confusing that these sound like limits on what the application can do, rather than bounds on what the implementation can do.

subgroupSizeLowerBound
subgroupSizeUpperBound
??

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally don't find it problematic, but I'm open to suggestions. The bounds don't seem that different to me. What about minSupportedSubgroupSize and maxSupportedSubgroupSize?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maxSupportedSubgroupSize still doesn't solve the problem. It still sounds like the programmer tries something and the system accepts or denies. Also "supported" is used in the name of GPUSupportedLimits to bound the things the programmer can do.

Second, in future the causality will be flipped around.
We're likely to add a future feature to let the programmer control the subgroup size. There are good reasons to do so, it's been requested (#3950), and more recent D3D and Vulkan can support it. It's not in this round because Metal can't. See the TODO at https://github.com/gpuweb/gpuweb/blob/main/proposals/subgroups.md#gpu-feature

I think the current names are good. (Also, it helps that Vulkan uses the same names. No reason to make a new name for the same concep.t)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah... interesting. I wonder if we would want to add real (configurable, enforced) limits when that happens, but I'm guessing that would not actually work, if I'm correct in my understanding that the subgroup size ranges of different devices are disjoint.

Is it possible that the limits on a user-specified subgroup size would ever be different from the bounds on what automatic subgroup size an implementation will produce? If that's possible then we would need to expose both (regardless of whether the new one is a limit or an info property), and they shouldn't have the same name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Vulkan feature is accessed at pipeline creation and the D3D feature is a shader attribute.

Vulkan has 2 pipeline creation flags:

  • one allows varying subgroup sizes
  • one requires full pipelines

Vulkan also has a pipeline creation structure to specify a specific subgroup size. It is not compatible with the varying subgroup size flag, but the full pipeline bit is optional.

D3D can specify a required subgroup size via an attribute in SM6.6.

The usual case is that the shader author has specialized versions of the shader for specific subgroup sizes.

So this additional functionality isn't really limits. It would be more like a shader/pipeline value to specify a specific subgroup size. That value could be validated against the adapter info being added here (otherwise you'd expect failed compiles).

Copy link
Contributor

@dneto0 dneto0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WGSL side looking very good. Thanks!

wgsl/index.bs Show resolved Hide resolved
wgsl/index.bs Outdated Show resolved Hide resolved
wgsl/index.bs Outdated Show resolved Hide resolved
wgsl/index.bs Outdated Show resolved Hide resolved
wgsl/index.bs Outdated Show resolved Hide resolved
wgsl/index.bs Outdated Show resolved Hide resolved
wgsl/index.bs Outdated Show resolved Hide resolved
wgsl/index.bs Outdated Show resolved Hide resolved
wgsl/index.bs Outdated Show resolved Hide resolved
wgsl/index.bs Show resolved Hide resolved
alan-baker and others added 4 commits November 15, 2024 09:41
Co-authored-by: Kai Ninomiya <[email protected]>
* Mention subgroup_size is uniform in compute shaders in the uniformity
  analysis description of built-in values
* Updates to the subgroups section
* Fix typos and links
wgsl/index.bs Outdated Show resolved Hide resolved
wgsl/index.bs Show resolved Hide resolved
@kainino0x kainino0x added wgsl WebGPU Shading Language Issues api WebGPU API labels Nov 18, 2024
* add notes about uniform control flow scope
* add prefix to inclusive and exclusive subgroup op descriptions
@kdashg
Copy link
Contributor

kdashg commented Nov 19, 2024

WGSL 2024-11-19 Minutes
  • (FYI)
  • DN: AB’s been speccing this, dropped the subgroup_branching diagnostic.
  • AB: We were looking at this diagnostic, and looking at the subtest. Any time you wanted to [very common patterns] you had to turn this off. Many shaders immediately had problems when this was enabled, so we think almost everyone would turn this off. So [looking at?] portability instead of this diagnostic.
  • (no resolution today)

Copy link
Contributor

@dneto0 dneto0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with wording nit.

Thanks!

wgsl/index.bs Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api WebGPU API wgsl WebGPU Shading Language Issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants