-
Notifications
You must be signed in to change notification settings - Fork 321
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Static Samplers Proposal #4687
Comments
Static samplers are somewhat seldomly used in game engines and games because it's common to have a configurable "MaxAnisotropy" parameter tied to a user preference, and recreating all pipelines is seen as expensive, compared to recreating all samplers instead. Personally, I don't see much value in static samplers. It's not a major performance consideration or win, and I think it complicates a topic (samplers) which I don't think benefits from more functionality. Especially considering I expect to see bindless samplers for when bindless comes in in the future. If static samplers were supported natively by SPIR-V and HLSL, I would be more accepting of the proposal. |
IMHO there can be some nice ergonomics wins when using static samplers and automatic layouts, though I agree it complicates the topic of samplers a bit. |
I agree, it's more of a ergonomic feature. It allows you to just define the sampler in WGSL and let WebGPU handle the rest. This simplifies texture sampling and offers some flexibility to the user. I'd say web projects would find this attractive since it's less code and functions the same. |
We did receive some feedback from one game engine that they use immutable samplers in Vulkan for convenience as it is simpler to set up, and they don't have to bind descriptor sets. Yes, samplers which change min/max lod, anisotropy need to be bound in Vulkan descriptor sets. However, there are still many of the common samplers for bilinear/trilinear filtering which they use Vulkan immutable samplers for. |
We ran some benchmarks comparing static and normal samplers. MetalDeferred Rendering Sample from Apple was used to benchmark for Metal. The benchmark ran on M2 Macbook air.
Vulkan & D3D12The following benchmarks were conducted by sampling a 2048x2048 texture to a 4096x4096 render target. Android VulkanNormal samplers: 2.47 ms This is a negligible improvement. Desktop Vulkan & D3D12AMD (integrated) hardware on Vulkan saw an improvement of around 0.1 ms for static samplers, but negligible improvement on D3D12. Nvidia hardware did not show any measurable difference on both backends. Binding time benchmarkThis is a minor benchmark that we did to see the CPU performance when binding a large number of bind groups with normal vs static samplers. This was done by binding 216 different bind groups with the same texture but a different sampler. On Android Vulkan we saw that binding static samplers bind groups are more expensive. Normal samplers: 68338 ns These are measured in nanoseconds and this benchmark is unrealistic in real-world applications. It’s unrealistic to assume 216 different samplers across bind groups in an application. We also observed a difference in desktop Vulkan of static samplers being 10 000 - 20 000 ns slower. However, D3D12 was consistent and there was negligible difference (~2000 ns). This could be due to its different binding model. ConclusionStatic samplers do not have a significant advantage over normal samplers. They can be sometimes faster on some hardware, but it is unlikely to be a bottleneck in applications. This concludes that this proposal isn't a performance feature but rather an ergonomic one. |
On the specific proposal itself, it requires authors to put the sampler descriptor in both the WGSL text and the binding group layout. Ergonomically, this does not seem great to me. This is caused by the issue where the sampler descriptor is effectively embedded in the binding group layout on D3D12 and Vulkan (the root signature and descriptor set layout, respectively), while it's part of the shader pipeline on Metal. re: Metal compilation, you say:
This is not true. Compilation to MSL code happens during If we go down the route of including static samplers, I would encourage us to put them exclusively inside the BGL, and remove the WGSL syntax for specifying samplers. It would be nice to specify static samplers entirely in WGSL, but I can't see any way to make that work for Vulkan or D3D12 unless we give each PSO its own root signature / descriptor set layout, which doesn't feel worth it for such a small ergonomics feature. |
Fair point, I forgot that the
The big benefit of being able to define them in the shader is that (I would love if we also had WGSL syntax to be able to use |
Oh, yeah! I suppose you could put them in the shader for auto layout exclusive stuff. Not sure if I think that's worth adding as a feature (it has some drawbacks, like if someone is using static samplers with auto layouts and wants to port away to explicit layouts later, they're in a world of hurt), but it's a possibility.
Admittedly I stopped using auto layouts early on, but I haven't really enjoyed the ergonomics of them. I have some ideas for how to improve things, but it's probably worth opening a bikeshed "V2" issue for feedback wrt. making the experience of WGSL and auto layouts better. |
WGSL 2024-06-25 Minutes
|
Overview
Static samplers are immutable sampler objects that can not be modified after they are defined. They exist in all three backends: Vulkan, D3D12, and Metal. They are called immutable samplers in Vulkan, static samplers in D3D12, and for Metal they are constexpr samplers in Metal Shading Language (MSL). This proposal for static samplers in WebGPU would allow for samplers to be defined in WGSL, thus simplifying the workflow since sampler objects are rarely changed after initializing them once. The proposal also allows defining them through the WebGPU API in bind group layouts (BGL).
Vulkan
Vulkan allows defining immutable samplers in
VkDescriptorSetLayoutBinding
with a binding. It asks forVkSampler
s inVkDescriptorSetLayoutBinding::pImmutableSamplers
. Immutable samplers are tied toVkDescriptorSetLayout
and can be used across with different pipeline layouts. This means that WebGPU needs the binding slot and the sampler at BGL creation.If static samplers are defined in WGSL, the implementation can automatically generate the BGL using shader reflection. If they are defined in the WebGPU API, a sampler object has to be passed. The WebGPU API is discussed later.
Direct3D 12
Static samplers in Direct3D 12 are similar to Vulkan. Static samplers are defined when creating the root signature:
D3D12_ROOT_SIGNATURE_DESC::pStaticSamplers
. Similar to Vulkan, they also need a binding slot for the sampler and can be used across pipelines with the same root signature. D3D12 requires a sampler description rather than a sampler object.Unfortunately, D3D12 defines a hard upper limit on static samplers in the specification: “The total number of unique static samplers that can be declared across all root signatures live on a D3D device at a time is limited to 2032.” Note the usage of “unique static samplers”. This implies that if there are duplicate static samplers then the drivers should deduplicate it. There are a total of 1728 unique combinations of samplers that can be created in WebGPU, disregarding LOD and anisotropy fields. The upper limit can be hit, but is unlikely.
Metal
Static samplers in metal are defined in MSL as a constexpr sampler.
Metal has the most straightforward static sampler implementation, which does not require any API calls. A WGSL static sampler can be translated into the Metal equivalent when parsing WGSL.
This simplicity poses a limitation for the static samplers when defining them through the WebGPU API. Consider a situation where a sampler is only declared in WGSL but the sampler description is defined via the API.
If the description is provided at runtime, the implementation has to modify the Metal source or Metal-IR in order to complete the sampler definition. However, we can not require implementations to modify Metal source or Metal-IR because precompiling shaders would not be possible and Metal-IR is not documented. As a consequence, we must require the full sampler description in WGSL and in the API.
WGSL Syntax
This is an open topic and feedback is appreciated. There are many options for the syntax, but here is the most sensible.
There could be a
sampler
struct that corresponds toGPUSamplerDescriptor
with the same defaults:However, this design needs struct literal syntax which does not exist in WebGPU.
WebGPU API
The static samplers are bound to the BGL, so they need to be defined in the BGL before using them in pipeline layouts. There needs to be a way to declare them in the BGL. There has been discussion on this topic: webgpu-native/webgpu-headers#284.
A new structure is introduced that will be chained to
WGPUBindGroupLayoutEntry
:This structure expects a sampler object with the same description as the static sampler in WGSL shader. The sampler object can be converted into a D3D12 static sampler description and a
VkSampler
object in Vulkan.Auto Layouts
The user can request to create a pipeline with an auto pipeline layout and this would take care of the static sampler definitions in the BGL for the user. This approach allow the user to only specify the static samplers in WGSL and forget about them in the API.
Improvements
Partial Auto Layouts
Currently if the user wants to override the auto layout for just one binding point (let’s say a static sampler), you have to explicitly write out every BGL for the entry point. This is cumbersome and can be avoided by introducing partial auto layouts. Issue: #2637. That proposal only allows mixing explicit and auto BGLs but this partial auto layout proposed here would allow specifying the binding slots to override.
This will allow the user to use auto pipeline generation and only override certain binding slots. The overrides act as an input for the auto pipeline layout algorithm. If the auto pipeline layout is incorrect in some circumstances, partial auto layouts avoid writing every BGL by hand. To achieve partial auto layouts, pipeline creation would take an array of (group index, binding index, bind group layout entry) and replace the bind group layout entry at the specified slot. These layouts would still be non-shareable.
Shader Reflection
Instead of redundant typing of the same sampler description in the BGL and WGSL, we can leverage shader reflection. New method for retrieving the sampler description from the shader module can simplify the use of static samplers.
The method can be used to create a new sampler and fill in
WGPUStaticSamplerBindingLayout
.Validation
Static sampler validation would happen when creating the BGL. The validation layer should check if the sampler descriptor defined in the BGL is equal to the one in the shader and report it. If this happens for Vulkan or D3D12 the implementation can continue to use the static samplers as defined in the BGL and for Metal, it can continue to use the static sampler in the shader and ignore the BGL definition. This is not valid usage, but a method of keeping the program running.
D3D12 Limit
Limit of 2032 samplers in D3D12 is abundant for every application. As previously mentioned, the maximum number of combinations achievable in WebGPU is 1728 without considering LOD and anisotropy. However, if the limit is reached, implementations can lose the WebGPU device because the application is probably allocating static samplers unintendedly.
The text was updated successfully, but these errors were encountered: