Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Static Samplers Proposal #4687

Open
Sirtsu55 opened this issue May 31, 2024 · 9 comments
Open

Static Samplers Proposal #4687

Sirtsu55 opened this issue May 31, 2024 · 9 comments
Labels
api WebGPU API proposal wgsl WebGPU Shading Language Issues
Milestone

Comments

@Sirtsu55
Copy link

Overview

Static samplers are immutable sampler objects that can not be modified after they are defined. They exist in all three backends: Vulkan, D3D12, and Metal. They are called immutable samplers in Vulkan, static samplers in D3D12, and for Metal they are constexpr samplers in Metal Shading Language (MSL). This proposal for static samplers in WebGPU would allow for samplers to be defined in WGSL, thus simplifying the workflow since sampler objects are rarely changed after initializing them once. The proposal also allows defining them through the WebGPU API in bind group layouts (BGL).

Vulkan

Vulkan allows defining immutable samplers in VkDescriptorSetLayoutBinding with a binding. It asks for VkSamplers in VkDescriptorSetLayoutBinding::pImmutableSamplers. Immutable samplers are tied toVkDescriptorSetLayout and can be used across with different pipeline layouts. This means that WebGPU needs the binding slot and the sampler at BGL creation.

If static samplers are defined in WGSL, the implementation can automatically generate the BGL using shader reflection. If they are defined in the WebGPU API, a sampler object has to be passed. The WebGPU API is discussed later.

Direct3D 12

Static samplers in Direct3D 12 are similar to Vulkan. Static samplers are defined when creating the root signature: D3D12_ROOT_SIGNATURE_DESC::pStaticSamplers. Similar to Vulkan, they also need a binding slot for the sampler and can be used across pipelines with the same root signature. D3D12 requires a sampler description rather than a sampler object.

Unfortunately, D3D12 defines a hard upper limit on static samplers in the specification: “The total number of unique static samplers that can be declared across all root signatures live on a D3D device at a time is limited to 2032.” Note the usage of “unique static samplers”. This implies that if there are duplicate static samplers then the drivers should deduplicate it. There are a total of 1728 unique combinations of samplers that can be created in WebGPU, disregarding LOD and anisotropy fields. The upper limit can be hit, but is unlikely.

Metal

Static samplers in metal are defined in MSL as a constexpr sampler.

constexpr sampler textureSampler (mag_filter::linear, min_filter::linear);

Metal has the most straightforward static sampler implementation, which does not require any API calls. A WGSL static sampler can be translated into the Metal equivalent when parsing WGSL.

This simplicity poses a limitation for the static samplers when defining them through the WebGPU API. Consider a situation where a sampler is only declared in WGSL but the sampler description is defined via the API.

@group(X) @binding(Y) const mySampler;

If the description is provided at runtime, the implementation has to modify the Metal source or Metal-IR in order to complete the sampler definition. However, we can not require implementations to modify Metal source or Metal-IR because precompiling shaders would not be possible and Metal-IR is not documented. As a consequence, we must require the full sampler description in WGSL and in the API.

WGSL Syntax

This is an open topic and feedback is appreciated. There are many options for the syntax, but here is the most sensible.

There could be a sampler struct that corresponds to GPUSamplerDescriptor with the same defaults:

@group(X) @binding(Y) const mySampler = sampler {
addressModeU: mirror_repeat, // enumerant
magFilter: linear,
lodMinClamp: 1, // abstract float
...
};

However, this design needs struct literal syntax which does not exist in WebGPU.

WebGPU API

The static samplers are bound to the BGL, so they need to be defined in the BGL before using them in pipeline layouts. There needs to be a way to declare them in the BGL. There has been discussion on this topic: webgpu-native/webgpu-headers#284.
A new structure is introduced that will be chained to WGPUBindGroupLayoutEntry:

struct WGPUStaticSamplerBindingLayout {
	WGPUChainedStruct chain;
	WGPUSampler sampler; 
};

This structure expects a sampler object with the same description as the static sampler in WGSL shader. The sampler object can be converted into a D3D12 static sampler description and a VkSampler object in Vulkan.

Auto Layouts

The user can request to create a pipeline with an auto pipeline layout and this would take care of the static sampler definitions in the BGL for the user. This approach allow the user to only specify the static samplers in WGSL and forget about them in the API.

Improvements

Partial Auto Layouts

Currently if the user wants to override the auto layout for just one binding point (let’s say a static sampler), you have to explicitly write out every BGL for the entry point. This is cumbersome and can be avoided by introducing partial auto layouts. Issue: #2637. That proposal only allows mixing explicit and auto BGLs but this partial auto layout proposed here would allow specifying the binding slots to override.

This will allow the user to use auto pipeline generation and only override certain binding slots. The overrides act as an input for the auto pipeline layout algorithm. If the auto pipeline layout is incorrect in some circumstances, partial auto layouts avoid writing every BGL by hand. To achieve partial auto layouts, pipeline creation would take an array of (group index, binding index, bind group layout entry) and replace the bind group layout entry at the specified slot. These layouts would still be non-shareable.

Shader Reflection

Instead of redundant typing of the same sampler description in the BGL and WGSL, we can leverage shader reflection. New method for retrieving the sampler description from the shader module can simplify the use of static samplers.

WGPUSamplerDescriptor wgpuGetStaticSamplerDescriptor(WGPUShaderModule module, uint32_t group, uint32_t binding); 

The method can be used to create a new sampler and fill in WGPUStaticSamplerBindingLayout.

Validation

Static sampler validation would happen when creating the BGL. The validation layer should check if the sampler descriptor defined in the BGL is equal to the one in the shader and report it. If this happens for Vulkan or D3D12 the implementation can continue to use the static samplers as defined in the BGL and for Metal, it can continue to use the static sampler in the shader and ignore the BGL definition. This is not valid usage, but a method of keeping the program running.

D3D12 Limit

Limit of 2032 samplers in D3D12 is abundant for every application. As previously mentioned, the maximum number of combinations achievable in WebGPU is 1728 without considering LOD and anisotropy. However, if the limit is reached, implementations can lose the WebGPU device because the application is probably allocating static samplers unintendedly.

@Sirtsu55 Sirtsu55 added the api WebGPU API label May 31, 2024
@alan-baker alan-baker added proposal wgsl WebGPU Shading Language Issues labels May 31, 2024
@magcius
Copy link

magcius commented Jun 1, 2024

Static samplers are somewhat seldomly used in game engines and games because it's common to have a configurable "MaxAnisotropy" parameter tied to a user preference, and recreating all pipelines is seen as expensive, compared to recreating all samplers instead.

Personally, I don't see much value in static samplers. It's not a major performance consideration or win, and I think it complicates a topic (samplers) which I don't think benefits from more functionality. Especially considering I expect to see bindless samplers for when bindless comes in in the future. If static samplers were supported natively by SPIR-V and HLSL, I would be more accepting of the proposal.

@Kangz
Copy link
Contributor

Kangz commented Jun 4, 2024

IMHO there can be some nice ergonomics wins when using static samplers and automatic layouts, though I agree it complicates the topic of samplers a bit.

@Sirtsu55
Copy link
Author

Sirtsu55 commented Jun 4, 2024

I agree, it's more of a ergonomic feature. It allows you to just define the sampler in WGSL and let WebGPU handle the rest. This simplifies texture sampling and offers some flexibility to the user. I'd say web projects would find this attractive since it's less code and functions the same.

@austinEng
Copy link
Contributor

We did receive some feedback from one game engine that they use immutable samplers in Vulkan for convenience as it is simpler to set up, and they don't have to bind descriptor sets. Yes, samplers which change min/max lod, anisotropy need to be bound in Vulkan descriptor sets. However, there are still many of the common samplers for bilinear/trilinear filtering which they use Vulkan immutable samplers for.

@Sirtsu55
Copy link
Author

Sirtsu55 commented Jun 21, 2024

We ran some benchmarks comparing static and normal samplers.

Metal

Deferred Rendering Sample from Apple was used to benchmark for Metal. The benchmark ran on M2 Macbook air.
Fragment shader profiling timings:

  • Static Samplers max duration varied between 1.67-2.3ms across multiple runs
  • Normal Sampler max duration varied between 1.78-2.37ms across multiple runs
    Additionally normal samplers need +1 API call per draw.

Vulkan & D3D12

The following benchmarks were conducted by sampling a 2048x2048 texture to a 4096x4096 render target.
The code is here: https://dawn-review.googlesource.com/c/dawn/+/194102
Timings were retrieved via timestamp queries.

Android Vulkan

Normal samplers: 2.47 ms
Static samplers: 2.45 ms

This is a negligible improvement.

Desktop Vulkan & D3D12

AMD (integrated) hardware on Vulkan saw an improvement of around 0.1 ms for static samplers, but negligible improvement on D3D12. Nvidia hardware did not show any measurable difference on both backends.

Binding time benchmark

This is a minor benchmark that we did to see the CPU performance when binding a large number of bind groups with normal vs static samplers. This was done by binding 216 different bind groups with the same texture but a different sampler.

On Android Vulkan we saw that binding static samplers bind groups are more expensive.

Normal samplers: 68338 ns
Static samplers: 111491 ns

These are measured in nanoseconds and this benchmark is unrealistic in real-world applications. It’s unrealistic to assume 216 different samplers across bind groups in an application.

We also observed a difference in desktop Vulkan of static samplers being 10 000 - 20 000 ns slower. However, D3D12 was consistent and there was negligible difference (~2000 ns). This could be due to its different binding model.

Conclusion

Static samplers do not have a significant advantage over normal samplers. They can be sometimes faster on some hardware, but it is unlikely to be a bottleneck in applications. This concludes that this proposal isn't a performance feature but rather an ergonomic one.

@magcius
Copy link

magcius commented Jun 26, 2024

On the specific proposal itself, it requires authors to put the sampler descriptor in both the WGSL text and the binding group layout. Ergonomically, this does not seem great to me. This is caused by the issue where the sampler descriptor is effectively embedded in the binding group layout on D3D12 and Vulkan (the root signature and descriptor set layout, respectively), while it's part of the shader pipeline on Metal.

re: Metal compilation, you say:

If the description is provided at runtime, the implementation has to modify the Metal source or Metal-IR in order to complete the sampler definition. However, we can not require implementations to modify Metal source or Metal-IR because precompiling shaders would not be possible and Metal-IR is not documented.

This is not true. Compilation to MSL code happens during createGraphicsPipeline, at which point the BGL is known. So it would absolutely be possible to put the static sampler in BGL. Apple wanted to opportunistically compile the shader to Metal in createShaderModule if the BGL was known; in this case, the GPUPipelineLayout is already passed as a hint, so this would include static samplers.

If we go down the route of including static samplers, I would encourage us to put them exclusively inside the BGL, and remove the WGSL syntax for specifying samplers. It would be nice to specify static samplers entirely in WGSL, but I can't see any way to make that work for Vulkan or D3D12 unless we give each PSO its own root signature / descriptor set layout, which doesn't feel worth it for such a small ergonomics feature.

@kainino0x
Copy link
Contributor

This is not true. Compilation to MSL code happens during createGraphicsPipeline, at which point the BGL is known. So it would absolutely be possible to put the static sampler in BGL. Apple wanted to opportunistically compile the shader to Metal in createShaderModule if the BGL was known; in this case, the GPUPipelineLayout is already passed as a hint, so this would include static samplers.

Fair point, I forgot that the hints were needed to compile metal shaders early.

If we go down the route of including static samplers, I would encourage us to put them exclusively inside the BGL, and remove the WGSL syntax for specifying samplers. It would be nice to specify static samplers entirely in WGSL, but I can't see any way to make that work for Vulkan or D3D12 unless we give each PSO its own root signature / descriptor set layout, which doesn't feel worth it for such a small ergonomics feature.

The big benefit of being able to define them in the shader is that "auto" layouts can be generated from the shader so you only have to describe the sampler once. But given the point above, that Metal actually does not require us to have the sampler definition at createShaderModule, we wouldn't have to describe it twice anyway so it's not so bad. Partial-auto layouts would still be very helpful for the opposite reason as before, though - then you could define the static sampler explicitly and have the rest of the bindings be auto.

(I would love if we also had WGSL syntax to be able to use "auto" with the few things that can currently only be done with explicit layouts, and even better if there were a way to write an entire explicit layout in WGSL such that it's not sensitive to static usage so we don't have to set [[exclusivePipeline]].)

@magcius
Copy link

magcius commented Jun 27, 2024

Oh, yeah! I suppose you could put them in the shader for auto layout exclusive stuff. Not sure if I think that's worth adding as a feature (it has some drawbacks, like if someone is using static samplers with auto layouts and wants to port away to explicit layouts later, they're in a world of hurt), but it's a possibility.

(I would love if we also had WGSL syntax to be able to use "auto" with the few things that can currently only be done with explicit layouts, and even better if there were a way to write an entire explicit layout in WGSL such that it's not sensitive to static usage so we don't have to set [[exclusivePipeline]].)

Admittedly I stopped using auto layouts early on, but I haven't really enjoyed the ergonomics of them. I have some ideas for how to improve things, but it's probably worth opening a bikeshed "V2" issue for feedback wrt. making the experience of WGSL and auto layouts better.

@kainino0x kainino0x added this to the Milestone 3+ milestone Jul 2, 2024
@kdashg
Copy link
Contributor

kdashg commented Jul 29, 2024

WGSL 2024-06-25 Minutes
  • (milestone?)
  • KG: Summary of the post: perf is not significantly faster. Converted to an ergonomics suggestion.
  • KG: For milestone, can wait to M3+. Native APIs do have something for this. The Metal design sounds useful.
  • AB: The Metal design feels nice. Work in Vulkan to make it work seems unappealing. They are experimenting in Dawn to gather information.
  • Move to M3+

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api WebGPU API proposal wgsl WebGPU Shading Language Issues
Projects
None yet
Development

No branches or pull requests

7 participants