
Minutes 2018 03 28


GPU Web 2018-03-28

Chair: Corentin

Scribe: Ken

Location: Google Hangout

TL;DR

  • Consensus around JG’s buffer mapping proposal
    • https://github.com/gpuweb/gpuweb/pull/50/files
    • There is an explicit enqueued “map” operation, expressed for example as a transition to MAP_READ (see the sketch after this list).
    • There is a WebGPUFence object that’s created from a queue and goes from unsignaled to signaled once when prior submits are done being executed.
    • The pointer becoming queryable from the buffer 1) is observed through fences and 2) doesn’t happen until a fence is observed to have signaled. We don’t want buffers to redundantly carry their own fences.
    • For multiprocess implementations, we need an explicit, infallible unmap.
    • Need to figure out pollable vs. Promise-based fences, and whether one can wait on a fence on the main thread or in a worker.
    • For MAP_WRITE, the mapped memory would have to be zero-filled.
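
For illustration, a minimal sketch of the readback flow summarized above and discussed in the buffer mapping section below. All names (createCommandEncoder, transitionBuffer, WebGPUBufferUsage.MAP_READ, queue.fence(), buffer.mapping, unmap()) are placeholders for the still-evolving proposal, not settled API.

```js
// Hypothetical sketch of the MAP_READ readback flow; every name here is a
// placeholder for the proposal under discussion, not a settled API.
function readBack(device, queue, readbackBuffer, consumeBytes) {
  const encoder = device.createCommandEncoder();
  encoder.transitionBuffer(readbackBuffer, WebGPUBufferUsage.MAP_READ); // the enqueued "map"
  queue.submit([encoder.finish()]);

  const fence = queue.fence(); // one-shot: unsignaled -> signaled once prior submits complete
  fence.then(() => {
    // The mapping is only observed to be non-null after the fence has signaled.
    consumeBytes(new Uint8Array(readbackBuffer.mapping));
    readbackBuffer.unmap(); // explicit, infallible unmap; neuters the ArrayBuffer
  });
}
```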

Tentative agenda

  • Buffer mapping (reach resolution ;) )
  • Agenda for next meeting

Attendance

  • Apple
    • Dean Jackson
    • Myles C. Maxfield
  • Google
    • Corentin Wallez
    • James Darpinian
    • Kai Ninomiya
    • Ken Russell
    • Victor Miura
  • Microsoft
    • Chas Boyd
    • Rafael Cintron
  • Mozilla
    • Dzmitry Malyshau
    • Jeff Gilbert
    • Markus Siglreithmaier

Status updates

  • Apple
    • About the CLA, Google’s lawyers wanted some minor changes.
    • Myles started to come up with a Vulkan case where explicit barriers would be a performance win over implicit ones.
      • Corentin to talk with Jesse Hall about this today
  • Google
    • Met with Epic Games at GDC; they didn’t hate the idea of WebGPU being a 4th API like they did ~6 months ago
  • Microsoft
    • No updates
  • Mozilla
    • Nothing major

Buffer mapping

  • JG: put up another PR. Met with Corentin and Kai yesterday to solidify things more.
    • Not a huge update. Didn’t have time to write example code. Transitions don’t exist in the sketch API yet, so it was impossible to add that.
    • https://github.com/gpuweb/gpuweb/pull/50/files
    • Asking for the buffer to be transitioned to a mappable form. That request for the transition is the real mapping call. It’s implicit.
    • To read back a buffer: record into a command encoder a transition of the buffer to MAP_READ usage. Submit that into a queue. Then, insert a fence into that queue. That fence tells you when the transition is complete, and the buffer is ready to be mapped (actually, has been mapped).
    • At the impl level: at first glance it looks like you could just keep querying the mapping until it comes back, but that’s racy. What actually happens: you don’t change the mapping from null until the app has knowledge that the transition is complete.
    • So: you can enqueue the transition, but if you don’t do anything else, you can never actually map it. Instead, you have to insert a fence and do .then() on the fence to get a Promise back. When the Promise resolves, you can read the mapping.
  • CW: how does unmapping work?
  • JG: when you enqueue a transition away from MAP_READ, you neuter the ArrayBuffer, and unmap the actual buffer. Transitioning away from mapping is the implicit unmap.
  • MM: when you enqueue an object it can muck with the state of other objects?
  • JG: yes
  • CW: that’s scary. What if the command buffer you are submitting fails submission for some reason or another?
  • JG: then you don’t remove this stuff.
  • CW: that’s an issue for people who are going to offload validation to another process.
  • JG: at some point, something has to fail, and something has to be synchronous.
  • CW: in Chrome’s renderer process we can say “I don’t want to touch this again” and then transition away from MAP_READ.
  • JG: if you haven’t unmapped it and try to enqueue a transition, and submission of the CB to the queue fails, it isn’t unmapped.
  • KN: problem is that we don’t know that the real unmap will succeed until later.
  • KN: can have infallible unmap.
  • CW: that could be done before the submission of the command buffer. If the submission doesn’t see the “infallible unmap” beforehand, then it will reject. That’s better than something that does it implicitly and might fail.
  • MM: was asking whether the enqueue would change the state. But now it sounds like the submit changes it.
  • CW: yes. JG, your proposal is that there’s a transition to something other than MAP_READ in the command buffer, and when it’s submitted it transitions?
    • This is an issue for apps submitting on other threads, or other processes.
    • When you see submission of the command buffer,
  • JG: Can unmap regardless of whether the submit fails.
  • MM: It’s very confusing because then a failed operation has an effect on state (CW: agree)
  • JG: What does unmap do?
    • CW: Neuters ArrayBuffer, can free regions of shmem
  • JG: Ok, see the problem.
  • RC: What happens if a buffer is used without unmap?
    • CW: Any uses or transitions away from Map usage would fail. CPU (client-side) has ownership of the entire buffer.
  • JG: What if you neuter regardless of whether the submit works? I.e. if you try to submit a CB that unmaps a buffer, it gets unmapped even if the CB doesn’t succeed.
  • KN: What if the submission fails because you do a wrong transition of the buffer in the command buffer? You’d have to compute the net transition and do the whole validation computation on the client side.
  • JG: That would unmap on any transition of the buffer.
  • RC+JG: But what if any transition of the buffer unmaps it?
  • CW: Essentially it depends on how you handle submission failure; does it have no side effect or does it have only that unmap side effect?
  • KN: Can make it so transitions happen at device rather than CB level
  • CW: could make that work
  • DM: fallible submission - is it a valid use case for a user to submit something that provokes a failure and then submit something good? If we handle this in a portable way then we’re good. Doesn’t have to be a completely clean, no-side-effect submission.
  • CW: as long as it’s portable between browsers, yes.
  • DM: is it a valid application which submits something, fails, and then keeps going?
  • JG: one idea is lose the device if you fail to submit something valid.
  • CW: we’ll have to talk about the error handling.
  • KN: sounds fragile.
  • JG: what do you get out of not being fragile?
  • KN: there are plenty of WebGL apps that throw errors and keep going. Not saying it’s good, but it’s the case.
  • JG: as long as the validation is the same seems fine to me.
  • CW: if submission errors cause device loss, seems plausible, but we need to discuss error handling separately soon. Impacts Chrome because most of the validation is in a different process.
  • CW: we would like an explicit unmap, unless we don’t need one because a validation error causes device loss.
  • JG: for the time being I can add an explicit unmap. Strange because implicit map, explicit unmap.
  • CW: other cool part about your ClientBuffer pull request is that it defines what a fence is.
  • JG: talked yesterday about the form Fences should take. Most appealing: fences are one-shot FenceSyncs like in OpenGL. Insert into a queue; that’s what gives you a fence. That fence tells you when that point in the queue is reached. Gives API synchronization between CPU and GPU. Polls, timeouts, etc. Gives happens-after relationships and dependency graphs in queue submissions.
  • CW: more context: Vulkan-style, where you create a (binary) fence. This model is the same, but you don’t have to create the fence separately. Tried hard to make them numerical fences but failed because of questions like: can you enqueue the same fence on different queues? Though it was nice from an API standpoint, it was surprisingly subtle and complex.
  • MM: inside IDL, there’s a function called Wait(). If it does what I think it does it’s not acceptable.
  • JG: classic situation: can give it 0, which is a poll. More useful for more async things in web workers: just wait on it. OK to block that worker, though not the main thread. Promise “.then()” gives you the most Webby approach. Can’t implement one on top of the other, therefore both exist.
  • MM: are you saying wait should work in a web worker?
  • JG: proposal here is that wait supports a timeout of 0. Later, decide where we support a non-zero timeout.
  • MM: so let’s get rid of the argument. What would wait() return then?
  • JG: would have to return a bool.
  • MM: for now, how about removing argument and having wait() return bool?
  • CW: there are applications that want to keep their main loop without relinquishing control to the browser; isn’t WebGPUFence::Wait useful in that case?
  • KR: This discussion was centered around swapchains; not sure about fences. On Mac we need to call glFinish sometimes to get GPU backpressure.
  • JG: for the time being, if I split this off and add explicit unmap(), might be palatable as a pull request.
  • RC: if we do have a polling thing, I worry about people polling it in a loop and then having it suddenly return true, and start using it for timing attacks. Think we need to make it so that it can only resolve on the next frame, or microtask boundary, etc.
  • MM: but if you want it to advance then you have to return to the browser. So there’s no benefit over a .then() block.
  • JG: not true. It’s a design choice whether you want .then() blocks.
  • CW: it’s the application’s choice.
  • KR: If you can’t spin-loop in any way, then poll isn’t needed because it doesn’t give you a latency improvement. You’d still have to break up your app’s main loop to return control, defeating that use case.
  • KN: latency reduction. Spin loop won’t start returning true any earlier than your callback would be called. Would need a Promise somewhere in your WebAssembly shim so that WASM can poll something. Polyfilling the polling API using the Promise API. Maybe no latency issue on the main thread, but still a pain and induces GC pressure. May want more functionality on worker threads.
  • MM: that’s a discussion we should have, about never giving control back indefinitely on any thread, including on a worker. Can get into it now, or not.
  • CW: let’s do it on another meeting.
  • MM: if we’re going to have to relinquish control of the PC, and you can implement the polling interface on top of the then() block, then we might as well just have one interface.
  • KN: might want to take out the wait() function until we’ve had that other discussion.
  • CW: ok. Also it’s orthogonal to buffer mapping whether we have the polling API or not. Main design points: they’re enqueued on a queue, and go from unsignaled to signaled exactly once.
  • JG: but the Promise API is contingent on other APIs too.
  • KN: assuming we don’t allow spin-loops on the main thread, aside from a little extra garbage, think that Promises are strictly more powerful.
  • CW: they’re also webby.
  • JG: more awkward to deal with them from WASM.
  • KN: a polyfill is simple – just 2 or 3 lines, to turn a Promise into a polling API (see the sketch at the end of this section).
  • MM: this is something that a library can solve.™
  • CW: when does the app know that the buffer is done mapping? In light of this new proposal, what about if Buffer gives back a Fence that is signaled when the buffer’s ready to map?
  • JG: this sounds like something that a library can solve.™
  • CW: yes, but like the polling interface, may be something we want in the API.
  • JG: just implementing something you can already do
  • RC: this gives you a new instance of a brand new fence. How are these going to be memory managed? Browser responsible for keeping round-robin list of these? Tying Fence to the WebGPUBuffer ties their lifetimes together.
  • CW: the way we see this after implementing NXT: WebGPU impls will have to look at GPU’s forward progress only for their own implementation. Waiting for this serial to complete on that queue. It’s not a GPU object; just a tiny thing. Can create a ton of them. Only drawback is garbage collection. The object itself is 16 bytes.
  • RC: the way spec’ed here: supposed to make a new WebGPUFence instance every time. Could reuse Vulkan objects under the hood
  • CW: don’t need to. Already have real, native fences under the hood. Doesn’t need to be backed by a D3D12/Vulkan fence.
  • JG: don’t think inserting fences is or would be an automatic thing for Firefox (when submitting command buffers), anyway.
  • JG: there’s an alternative version of this proposal: resettable / reusable fences.
  • MM: the browser can expose a new object but it could hold one of a small pool of Vulkan fences.
  • JG: correct.
  • RC: was just asking about lifetime of WebGPU fences. Mainly asking whether .fence() would ever return multiple instances of WebGPUFence.
  • JG: disadvantage of fence-per-upload or fence-per-buffer: can’t (easily) coalesce them.
  • RC: if we had this on the WebGPU Buffer, we’d know “this is the last one, so use this one”.
  • CW: so everyone happy with WebGPUQueue.fence() being the only way to insert a fence?
  • RC: when will the mapping thing ever return an ArrayBuffer? Do you have to insert a fence?
  • JG: couple of ways to determine whether the buffer is mappable. (Ask Queue if empty, insert fence, etc.). App has to ask, so at those points we can – first time it’s ready for use – allow the app to get a non-null ArrayBuffer.
  • JG: well-formed app has to check a fence. If app assumes things happening in the background, that’s not OK.
  • RC: you mean doing .then() since we took away wait()?
  • MM: why not just .then() on the command buffer?
  • JG: can’t coalesce. Doing many uploads/downloads in the same frame. Making N promises.
  • KN: wait, inserting it on command buffer, not WebGPUBuffer?
  • CW: it’s basically Metal’s OnCompletedHandler.
  • JG: you’d then need a fence per command buffer or queue submission. Seems weird.
  • CW: to recap:
    • Jeff’s proposal looks good
    • Like queue submission to mappable state
    • Chrome would like infallible explicit unmap
    • WebGPUFence looks good
    • Will talk about pollable vs. Promise afterward
    • Don’t want Buffers to be able to give you a fence of when they’re ready to be mapped, because that’s redundant
  • CW: most of the time we’ve been talking about MAP_READ. Would like to talk about MAP_WRITE_DISCARD.
    • Otherwise need BufferSubData and that’s an extra copy.
    • MM: need both reading and writing to be one-way transfers.
  • CW: discussed buffer read-after-write rules.
  • MM: if it’s an actual map you have to zero-fill it?
    • CW: think so.
    • JG: only other option: make a write-only ArrayBufferView.
    • KR: would strongly advise against this based on experience with Java.
    • JG: disagree but won’t push back.
    • MM: zero-filling could happen either on CPU or GPU. Want to be able to insert the compute / blit / etc. kernel to clear it somewhere.
    • RC: in current proposal when you make a GPU buffer how do you know read vs. write?
      • JG: Different usages.
      • Part of the WebIDL that’s already there.
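
As a concrete reading of KN’s “two or three line” polyfill remark above, a hypothetical sketch that layers a polling API on top of the Promise-style fence. It assumes the fence is thenable, per the discussion; the exact surface is not settled.

```js
// Hypothetical polyfill of a polling API on top of a thenable fence.
function makePollable(fence) {
  let signaled = false;
  fence.then(() => { signaled = true; });
  return () => signaled; // call from the app's main loop to poll
}
```

Note that the flag can only flip once the browser runs the fence’s callback, so a spin loop that never yields will never observe it become true; this matches the point above that polling offers no latency win over .then() on the main thread.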

Agenda for next meeting

  • Error handling
  • Wait vs. poll vs. Promise, including in workers.
