
WGSL 2020 09 22

Chair
: Dan (Dean is out)

⌨️ Scribe
: Dan Sinclair and everyone

Location
: Google Meet

Specification
: https://webgpu.dev/wgsl

Open Issues
: WGSL Issues

Meeting Issues
: Marked Issues

Tentative Agenda

  • Question: Should we ask to take over next week's WebGPU meeting with WGSL topics?
  • Announcement: There are lots of Open PRs, which is excellent. Thanks to everyone.
  • Are entryPoints namespaces per module or per stage per module? (#1023)
  • Disallow fragment atomic ops on texture storages (#728)
  • Invariant qualifier (#893)
  • WGSL should use thread instead of invocation (#1031)
  • read-only image storage type declarations: examples inconsistent with grammar (#1056)
  • Method of ensuring GPUShaderModules can contain MTLLibraries (#1064)

📋 Attendance

WIP: the list of all the people invited to the meeting. In bold, the people who attended the meeting:

  • Apple
    • Dean Jackson
    • Fil Pizlo
    • Myles C. Maxfield
    • Robin Morisset
  • Google
    • Dan Sinclair
    • David Neto
    • Kai Ninomiya
    • Ryan Harrison
    • Sarah Mashayekhi
  • Intel
    • Yunchao He
    • Narifumi Iwamoto
  • Microsoft
    • Damyan Pepper
    • Greg Roth
    • Michael Dougherty
    • Rafael Cintron
    • Tex Riddell
  • Mozilla
    • Dzmitry Malyshau
    • Jeff Gilbert
  • Kings Distributed Systems
    • Daniel Desjardins
    • Dominic Cerisano
    • Hamada Gasmallah
    • Wes Garland
  • Joshua Groves
  • Kris & Paul Leathers
  • Lukasz Pasek
  • Matijs Toonen
  • Mehmet Oguz Derin
  • Pelle Johnsen
  • Timo de Kort
  • Tyler Larson

⚖️ Discussion

Question: Should we ask to take over next week's WebGPU meeting with WGSL topics?

  • MM: Consensus in WebGPU meeting was they have enough topics for next meeting.

Announcement: There are lots of Open PRs, which is excellent. Thanks to everyone.

  • MM: WebGPU does a periodic PR burndown, which we should consider. It's a forcing function to get PRs reviewed.
  • DS: Will bring up with Dean to add to the schedule

Are entryPoints namespaces per module or per stage per module? (#1023)

  • DS: Q: Do entry point names have to be unique in the entire file? Or just unique per stage?
  • DN: I’m in favour, makes the HLSL compilation simpler. SPIR-V should reject duplicate names
  • DM: I think that’s harsh. The reason to disallow names overlapping is ambiguity in calls. Entry points can’t be called, so it doesn’t apply.
  • MM: Other languages exist with duplicate names and no ambiguity
  • DN: Entry points are interface. Different APIs will be calling them. Mangling the name means you have a process to re-lineup with the API side. That’s where the concern comes from. Adds complexity I don’t think we need.
  • MM: Mangling isn’t necessary for WebGPU. The only WebGPU function that accepts the name also accepts the type (see the sketch after this list).
  • DN: True.
  • DM: Can be fully internal to the D3D backend.
  • DN: Simpler for me to accept more complex thing.
  • MM: So, names can conflict for entry points. Names can conflict in the stdlib.
  • DN: Those are separate.
  • DS: Entry point names can’t conflict with non-entry points, fair?
  • MM: Think that’s a good rule.
  • DN: Sure.
  • DS: Keeps it consistent with the fact that …
  • DN: Is it a totally separate namespace from everything else? (Global variables, builtins)
  • MM: Whatever the rule is for functions, can’t we have it for entry points?
  • MM: Conflict with input variables?
  • DN: Entry point names can conflict with any other kind of name, but an entry point name conflicts with another entry point name only if the stages are the same.
  • KN: Don’t like that we half-separate the namespaces.
  • DM: Why are we trying so hard to disallow conflicts with non-entrypoints? Entry point names are totally invisible to the rest of the shader. Should have no naming conflicts except with each other.
  • MM: So you have 4 namespaces: vert shader, frag shader, compute shader, and one bucket for all the rest of the names.
  • DN: Think this should be part of WGSL spec. A standalone tool handling WGSL can see this.
  • GR: Is the amount of work to describe worth the benefit?
  • MM: Benefit isn’t in the options, just having a rule.
  • GR: Simplest rule is 1 namespace, entry point names unique.
  • KN: I prefer either one namespace or four separate namespaces.
  • DS: More restricted means we can relax it later, which is usually my preference.
  • MM: Fine with us, no preference.
  • DN: Post MVP?
  • DN: Sure
  • DM: We’ll probably allow some collisions but no harm in spec’ing as more restrictive.
  • Resolution: Entry point names must be unique.
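A minimal sketch of the API side referenced above, for context. It uses present-day WebGPU/WGSL syntax (which postdates this meeting, e.g. `@vertex` rather than the 2020 attribute syntax); the entry point names and formats are hypothetical, chosen only to show that a pipeline descriptor always pairs an entry point name with its stage.

```ts
// Hedged sketch, not part of the minutes: present-day WebGPU API shape.
const adapter = (await navigator.gpu.requestAdapter())!;
const device = await adapter.requestDevice();

const module = device.createShaderModule({
  code: /* wgsl */ `
    @vertex   fn vsMain() -> @builtin(position) vec4f { return vec4f(0.0); }
    @fragment fn fsMain() -> @location(0) vec4f { return vec4f(1.0); }
  `,
});

// The entry point name is only ever supplied together with a stage, so the API could
// disambiguate per stage; the group still resolved to require unique names per module.
const pipeline = device.createRenderPipeline({
  layout: "auto",
  vertex:   { module, entryPoint: "vsMain" },
  fragment: { module, entryPoint: "fsMain", targets: [{ format: "bgra8unorm" }] },
});
```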

Disallow fragment atomic ops on texture storages (#728)

  • MM: Metal doesn’t have atomic operations on textures.
  • DS: Is this commonly used? Can we push this to post-MVP?
  • MM: Metal has atomic ops on a buffer, just not on textures.
  • DN: Don’t want to force emulation, so better to drop the feature
  • KN: Issue says fragment shader, do atomics not exist in vertex shader? Or does Metal allow atomics on textures in vertex shader?
  • MM: There are no methods on those classes for atomic ops on textures. Can’t pull out a variable that references a texel.
  • DM: So, MM is saying there are no atomics on textures at all?
  • MM: Yes. Textures are opaque. Only way to interact is methods. None of the methods do atomics.
  • Resolution: Follow this restriction for now.

Invariant qualifier (#893)

  • MM: Last time, if memory serves, we came to the consensus that it’s probably a good feature, but questions remained: core or extension, which devices, which APIs, etc. Last time we were leaning towards core, but evidence came up saying that’s not possible.
  • KN: Yea, requires macOS 10.15, iOS 13.0.

Extensions

  • MM: So, it has to be an extension, which is fine. Probably the first WGSL extension, so may need a discussion on how to do that. How to discover, enable, what’s the mechanism, etc.
  • KN: Is there value in discussion without a proposal?
  • MM: Can make a verbal proposal: we have an enum in the API (a JS enum) for when compiling a shader; createShaderModule takes a new arg which is a set of enum values (the shapes discussed here are sketched after this list).
  • DS: and those are the requested extensions?
  • MM: Yes
  • DS: Is the extension implicit or explicit in the shader
  • MM: Implicit. If the machine doesn’t have the ext then it fails.
  • KN: Why createShaderModule and not GPUDeviceDescriptor?
  • MM: Does it have to be?
  • KN: No, but already have ext mechanism there.
  • MM: Restating, Myles proposed 2 sets of extensions, one for API one for shading language. You’re saying, why not just one set of extensions?
  • KN: Essentially
  • MM: That works, could do that.
  • KN: The question for that is are we comfortable with saying extensions are 1) implicitly enabled and 2) always enabled for the device. Could put require extension at top of shader which requires it to be enabled.
  • MM: Even doing that, it may not be good. Doesn’t actually solve the problem. A shader with 1 ext and a shader with a 2nd ext require 2 different shader modules. Probably not fine-grained enough if they want to batch up their source. Not sure if folks want to do that. In Metal we don’t really have this concept. Different Metal versions have different features which aren’t turned on, they just exist.
  • RM: One possible issue with the idea behind implicit extensions is developers may not realize they’re using an extension, so they could write what they think is portable WGSL to run on all machines, but it only runs on machines with the ext enabled. Making it explicit makes the programmer aware how portable/non-portable the code they’re writing is.
  • DC: What would the error look like if the ext isn’t available? Would it report the extension or set of extensions?
  • KN: Would be implementation specific. I think we can create good error messages.
  • MM: Have opinions on listing the set of extensions for fingerprinting. Wouldn’t want to just list the set; it would be more request/response, where the developer asks and we let them know if they exist.
  • KN: Could go to the dev console without going to the application.
  • MM: Error messages are not good at being parsed by the app. For extensions dev doesn’t know the ext doesn’t exist. For fallback they have to parse the error message.
  • KN: Error messages are not to expose availability.
  • DM: The point Robin made about portability explicitness is a strong point. If we do this, we would probably want it as an entry point decoration because, as discussed, if you have a giant module with many entry points and an extension you want to use, you can add an entry point with just that extension to the existing code base instead of creating a new module and copying code into it.
  • KN: Can imagine that, or it would be that you change the implementation of helper functions. Still need pre-processing to select the right entry point.
  • MM: There are SPIR-V extensions that are required which are turned on for Vulkan extensions at device time, can we do that?
  • KN: Can turn on all extensions on the device and validate at module creation time if they’re available, but think there should be something at device creation.
  • DS: Sounds like this needs more in-depth proposal and discussion. Good candidate for WGSL/API meeting slot.
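The two shapes floated above, sketched for concreteness. Neither is real API surface: the `wgslExtensions` argument and the `"shader-invariant"` name are hypothetical illustrations of the verbal proposals, and the device-level option reuses today's `requiredFeatures` field (the mechanism KN referred to, called extensions in the 2020 API).

```ts
// Hedged sketch, not part of the minutes: hypothetical shapes only.
const adapter = (await navigator.gpu.requestAdapter())!;
const wgslSource = `/* shader text that uses the invariant qualifier */`;

// Option discussed by MM: request WGSL extensions per shader module.
const deviceA = await adapter.requestDevice();
const moduleA = deviceA.createShaderModule({
  code: wgslSource,
  wgslExtensions: ["shader-invariant"], // hypothetical new argument
});

// Option discussed by KN: reuse the device-level extension/feature mechanism, so
// every module compiled on the device may use the enabled extensions implicitly.
const deviceB = await adapter.requestDevice({
  requiredFeatures: ["shader-invariant"], // hypothetical feature name
});
const moduleB = deviceB.createShaderModule({ code: wgslSource });
```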

WGSL should use thread instead of invocation (#1031)

  • RM: Will bikeshed. It probably doesn’t matter much which we use for people familiar with GPUs. For people starting with GPUs, it’s clearer that they should not carry over assumptions based on previous use of threads.
  • MD: Might be true, but the original reason is the host spec uses thread, and the issue was made to be consistent; I am fine with the group’s decision. For clarity, hardware ISAs use thread to refer to these constructs, so it shouldn’t be too ambiguous.
  • DN: Amplifying what Robin said. Thread on the CPU side, invocation on the GPU side. A different word makes you redo your assumptions.
  • KN: Not about CPU threads, it’s about what we named invocations on the WebGPU side. The word thread should only be in reference to a GPU thread.
  • DN: Should use the same word, whatever it is.
  • DS: Is there a reason for thread vs invocation on the api side
  • KN: Don’t think so
  • DS: Can we rename that one?
  • KN: We could. Not sure if we use it anywhere in the API, just in the docs. What we say on dispatch uses thread words.
  • MM: Two thoughts: if going with invocation, this is not in the lexicon of developers so it needs to be defined. Other thought: if calling them invocations, are we calling thread groups thread groups or invocation_groups?
  • DN: So, last week we landed a patch talking about the compute shader with workgroup_grid and workgroup. Should use consistent terminology, like a workgroup is a set of invocations. Thread group appears nowhere and the plan is not to write thread_group.
  • KN: APIs mention thread groups so should rename.
  • MM: If we get SIMD groups or subgroups we’d be using one of those words.
  • DN: Yes, subgroup.
  • MM: Don’t necessarily agree, but can discuss later.
  • DM: So, consensus is to define invocation but call them work items in the spec? Workgroup and related terms?
  • DN: Workgroup, invocation, derivative group, eventually subgroup. Want to stay away from thread.
  • DC: What about instance or task.
  • MM: If worried about developers taking the notion of an existing term of art and applying it to the GPU, task is already an existing term of art with a sufficiently different meaning.
  • GR: Agree
  • KN: Also has web meanings
  • DC: What about instances
  • MM: Instancing for draw calls
  • DS: Consensus is to use invocation; we’ll update the API side docs to workgroup (see the sketch below for where the terms appear in WGSL).
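For reference, a minimal compute shader in present-day WGSL syntax (which postdates this meeting) showing where the resolved terminology surfaces: "workgroup" in the attribute and "invocation" in the builtin names, with "thread" appearing nowhere. The binding and sizes are illustrative only.

```ts
// Hedged sketch, not part of the minutes.
const computeShaderWGSL = /* wgsl */ `
  @group(0) @binding(0) var<storage, read_write> data : array<u32>;

  @compute @workgroup_size(64)                      // one workgroup = 64 invocations
  fn main(@builtin(global_invocation_id) gid : vec3<u32>,
          @builtin(local_invocation_id)  lid : vec3<u32>) {
    data[gid.x] = lid.x;                            // illustrative only
  }
`;
```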

read-only image storage type declarations: examples inconsistent with grammar (#1056)

  • DN: Let’s take this offline. It’s old and maybe superseded.
  • DM: We have the problem that for storage textures the access is part of the type: read-only and write-only types. In SPIR-V it isn’t part of the type, it’s a decoration on the global variable. On the WGSL buffer side, not sure what we’re going to have: whether it’s part of the buffer type or the buffer variable. 2 problems: should be consistent between textures and buffers, and when doing the correspondence to SPIR-V, how we link the types and global variables (see the sketch after this list).
  • MM: Why is the 2nd a problem
  • DM: Just so the spec defines the SPIR-V op for everything. The read-only texture types don’t have a single SPIR-V opcode.
  • MM: That’s fine
  • DM: Sure
  • MM: Trying to make a point about defining the spec in terms of SPIR-V and making every construct in the spec equal to a single SPIR-V construct.
  • DN: Which it isn’t, and that’s fine. One of the things that crept in is ReadOnly and WriteOnly on the image type, which is Kernel, not Shader. The read-only/write-only is an attribute on the variable. Question is how/where to explain. Can solve with English text somewhere.
  • MM: In Metal we have different read-only and read-write textures, but not for buffers; read-only and read-write buffers use the C++ const keyword. There’s a chance that in the Metal we emit we never use const, as we’re doing that validation on the WGSL src. So the two might collapse to the same MSL type. Wonder if that’s true in other languages.
  • DN: I think it’s not true in HLSL; there is a ByteAddressableBuffer. In Vulkan SPIR-V the read-only buffer is an attribute on the variable, a decoration. It’s a bit all over the place. Have not been troubled by the WGSL choices so far. There is an outstanding bug for the read-only buffer case which needs a proposal to be reviewed.
  • MM: Sounds like, if any API offers this distinction in the type and we want to use those facilities, we have to have it in the type too so it can transform. If there are any of these things that don’t correspond to a backend type, we can make it an attribute. Can only decay in one direction.
  • DN: Can decay in the other direction if you want to do reconstructive analysis. There is a complexity cost, but there is a bonafide choice to make
  • DM: They are different types in HLSL.
  • MM: In HLSL can you turn an RW buffer into a regular buffer. Is there an op to do that? I think it’s no?
  • DN: I think it’s no.
  • MM: If no way to convert then have to have two distinct types.
  • GR: No, I don’t think it can be done in HLSL
  • MM: Answers the question and we need 2 distinct types.
  • DS: Can you summarize for me.
  • DM: Can deal with it in spec text, and we need a proposal for read-only buffers making it part of the type.
  • DN: Issue #935 for this already.
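A small illustration of DM's point, using present-day WGSL syntax (which postdates this meeting and the proposal requested above): the access mode sits in the declaration, whereas, per the discussion, SPIR-V expresses read-only buffers with a NonWritable decoration rather than a distinct type. Names and bindings are illustrative.

```ts
// Hedged sketch, not part of the minutes.
const storageAccessWGSL = /* wgsl */ `
  @group(0) @binding(0) var<storage, read>       inputData  : array<f32>; // read-only
  @group(0) @binding(1) var<storage, read_write> outputData : array<f32>; // read-write

  @compute @workgroup_size(1)
  fn main(@builtin(global_invocation_id) gid : vec3<u32>) {
    outputData[gid.x] = inputData[gid.x] * 2.0;
  }
`;
```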

Method of ensuring GPUShaderModules can contain MTLLibraries (#1064)

  • MM: Talked about this once in the WebGPU group; it is relevant to the shading language as one proposal is a change to WGSL. For background: right now, with no WGSL changes and no WebGPU changes, it’s impossible to generate MSL for all of a WGSL module. In order to generate MSL you need to know the pipeline layout and some other information. That info is only specified at pipeline state creation time and only specified for 1 entry point. So, today there is no way to turn a whole WGSL shader module into MSL for every line inside the WGSL shader module. This is an issue for MSL in particular because running the frontend of the compiler is expensive. In particular there is a startup cost. Waking up the frontend many times and compiling a bunch of small programs is worse than waking it up once and compiling the concatenation. The desire is to make some change such that we can generate MSL that represents all of the lines of a WGSL module, so we can run that through the frontend of the compiler once for the entire module, giving a significant perf improvement.
  • KN: One thing to separate out: the goal is to reduce the number of times that the Metal compiler is invoked. If you take all pipeline creation info and require it at shader module creation, then you’ve removed the difference between shader module and pipeline creation, so it would collapse to 1 stage instead of 2 stages. So it doesn’t make sense to move all of it; the only way this can be helpful and do something meaningful is to move only some of the state/info to shader module creation. Is that right?
  • MM: Yes. Haven’t enumerated everything necessary. One thing necessary is the pipeline layout and maybe a few other things. You’re right, it isn’t everything.
  • KN: Pipeline layout is per entry point so you need it for every entry point
  • MM: That is already present in all Metal programs; it’s in the src in the form of structures. Two methods to include this info, no preference on which one: 1) add extra args to shader module creation out of band, or 2) add more WGSL expressiveness such that the author provides the information inside the WGSL shader. The 2nd is what MSL does; we don’t have to do that for WGSL.
  • DM: Would that change be backwards compatible? Seems like a tough call to make at this point and not strictly necessary. Can we follow up to add those args?
  • MM: When you say backwards compat, you mean make this optional?
  • DM: After MVP add argument in the descriptor that can be used
  • MM: Don’t think that makes sense. The reason is because of the charts in the issue: the potential perf improvement. In our view, the performance slowdown is so big that we would like all authors to be providing this.
  • KN: So, don’t think we should continue this too much longer as CW isn’t here. Part of his argument that this doesn’t work is that we don’t know what information we’re going to have to carry from the pipeline layout or pipeline creation into the Metal Shading Language code gen. We know now that we have to take the sample mask into consideration; this is true because MSL doesn’t have it as a runtime feature, it has to be done in the shader. Don’t know what we’ll need in the long run to work around driver and compiler issues. Don’t know what information is needed. In the long run we’ll likely end up not having enough info at shader module creation and will need to change the impl to do the compilation later at pipeline creation time.
  • MM: Disagree with much of that. A world in which most Metal compilation happens during shaderModuleCreation but some has to be delayed is still a success. Trying to enable authors to get on the fast path to perform the best practices for most/all of their src code. If there is some case that doesn’t work that’s fine.
  • KN: Another issue: the idea here is that providing more info at createShaderModule means we can generate the MSL earlier. That appears not to be the point; the point isn’t to generate it earlier, but to generate it fewer times. Those are different things. The more info we require at shader module creation, the more calls to createShaderModule. At some point it defeats the purpose of having createShaderModule. Have to be clear that having more done isn’t to do it earlier, but to do it less, and authors want to control how many times each is done. There doesn’t have to be 1 MSL compilation per shader module or pipeline; can have an internal mapping in the browser from shader module + state -> MSL library (a sketch of this mapping follows the list). It doesn’t matter that the cache lookup happens later during pipeline creation. What matters is how many times the creation happens. Developers should expect pipeline creation to be expensive. This is true on other systems. To create pipelines quickly (/ have them usable quickly) they should create them early. So, moving state from pipeline creation to shader module creation just means at the beginning of time they have to call createShaderModule more times than before. They’re issuing more work to the implementation when the implementation could have figured out what more work needed to be done.
  • MM: Unrealistic to expect applications to create PSOs ahead of time.
  • KN: That’s true; however, regardless of how early they create them, they have to expect that they’re potentially slow. In the common cached case it won’t be slow if they change state in a pipeline that doesn’t affect the cache key. Can pre-load the cache on shader module creation to avoid the pipeline missing the cache too often. Preloading that info probably works in a limited set of situations but also probably helps with common use cases.
  • MM: The cache is a good optimization but I don’t think it solves the problem. Suppose the developer has a shader module with 100s of entry points, some share code and some don’t, and they want to use roughly all of the entry points throughout the program. Today that means lots of small Metal libraries. The cache would help if they want to use the same entry points with the same PSO twice; same inputs, cache would help. It wouldn’t help for having 100 entry points and wanting to use all of them. That’s the case we want to speed up. From the charts, most of the cost is in Metal library creation.
  • KN: Sounds like the problem is MSL compiler is slow. If that’s the issue, it isn’t our issue.
  • MM: It is our issue. We are working on top of three APIs and have to operate within their constraints.
  • KN: I agree, but ...
  • GR: Pipeline creation is expensive. With the latest graphics APIs we allow the user to amortize that cost. How they do that is up to them. Whatever systems they’re using for their targets should help them.
  • DM: This exists on the DX12 backend as well.
  • GR: Don’t know which suffers more, but that’s what I’m getting at.
  • KN: One of the problems here is that the Metal compiler is just slow. And that isn’t necessarily our problem.
  • MM: It is absolutely our problem! We need WebGPU to be fast for all of our users. We are building on top of the 3 backend APIs.
  • MM: Let’s discuss during joint call.
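A sketch of KN's "internal mapping in the browser from shader module + state -> MSL library" mentioned above. It is purely illustrative of the caching idea, not any browser's actual implementation; `BackendLibrary`, `compileToBackendLibrary`, and `ShaderLibraryCache` are all hypothetical names.

```ts
// Hedged sketch, not part of the minutes: hypothetical internal cache.
interface BackendLibrary { id: number }               // opaque handle to a compiled library

declare function compileToBackendLibrary(wgsl: string, stateKey: string): BackendLibrary;

class ShaderLibraryCache {
  private cache = new Map<string, BackendLibrary>();

  // Key on the WGSL source plus whatever pipeline-creation state affects codegen
  // (layout, sample mask, ...); exactly which state that is was the open question above.
  getOrCompile(wgsl: string, stateKey: string): BackendLibrary {
    const key = stateKey + "\u0000" + wgsl;
    let lib = this.cache.get(key);
    if (lib === undefined) {
      lib = compileToBackendLibrary(wgsl, stateKey);  // the expensive frontend run
      this.cache.set(key, lib);
    }
    return lib;
  }
}
```

Whether enough of that state can be supplied at createShaderModule time, so the expensive run happens once per module rather than once per pipeline, is the question deferred to the joint call.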
