Minutes 2018 04 04
Corentin Wallez edited this page Jan 4, 2022
Chair: Corentin
Scribe: Ken
Location: Google Hangout
- Multiprocess implementations would like to have asynchronous validation.
- At least for pipeline and command buffer creation and queue submission.
- Discussion of using Promises for deferred object creation validation. Pro: JS-ness, Con: introduces latency.
- Discussion of a GL-like model where commands are pipelined and errors are retrieved asynchronously. Makes the code less complex assuming every command is correct.
- Concern about using promises as it doesn’t mesh well with WASM
- Would still require a “synchronous validation” mode for development, for example in devtools.
- Desire to allow application to handle OOM on resource creation.
- AI: Google to make a proposal in IDL form.
- Error Handling
- Pollable vs. promisable fences
- Agenda for next meeting
- Apple
- Myles C. Maxfield
- Theresa O'Connor
- Google
- Corentin Wallez
- Kai Ninomiya
- Ken Russell
- Microsoft
- Chas Boyd
- Rafael Cintron
- Mozilla
- Dzmitry Malyshau
- Jeff Gilbert
- Markus Siglreithmaier
- Elviss Strazdiņš
- CW: JG how should we go about the changes to the IDL
- JG: Will do some fixes.
- CW: Then the change about the shape of the API.
- KN: we talked about how error handling is going to work in this API. Bunch of constraints.
- Pipeline validation needs to be asynchronous because doing pipeline construction can take a long time.
- Command buffer and submit validation need to be async because they take a while
- Ideally, buffer/texture allocations should be fallible, because we can always make a smaller version of the resource.
- DM: nobody is going to create pipelines on the fly. So is it really a requirement that they be async?
- MM: anybody creating shaders at runtime will run into this.
- CW: Don’t want to stall the main thread if JS is going ahead and creating 100 pipelines in a row without yielding.
- KN: More constraints:
- Anything CPU-side that doesn’t have a GPU-side object can be validated immediately.
- Context loss should always be handled asynchronously.
- Would like the option of synchronous errors for debug/development. This has been a problem in WebGL.
- KR: this can be done with a library.
- KN: OK, but there has to be some way for it to happen.
- CW: from Chrome’s standpoint, the more asynchronous things are, the better. But we understand that maybe that might not be that practical for everyone.
- CW: the solution we came up with in NXT, we put everything inside the Maybe monad. Everything you get back from the API is either a real object, or an Error object. These objects wrap the errors. Everything’s completely asynchronous.
- CW: the things we think are important to be async:
- Pipeline validation.
- Command buffer validation. This and the previous one are expensive validations.
- Submitting command buffers to a queue.
- CW: would like most of the device-level, non-buffer validations, to have async validation if possible.
- Example: in NXT, we chose to have transitions at the device level. Ideally would be validated GPU-process-side. Don’t have an example right now of something that’s in the WebGPU proposal.
- MM: similarly, could you give an example of what could go wrong in cmd buffer validation?
- CW: for example, in a command buffer, doing a draw without a pipeline bound. Validating it requires going through every command. Would rather not do it twice, once in the renderer (submitting) process and once in the GPU (consuming) process.
- RC: why do you have to do it twice?
- CW: because the GPU process can’t trust the renderer process.
- MM: is the discussion you’re hoping to get, which parts of error handling should be sync and which should be async?
- CW: more, how errors work in the API. How are errors returned and propagated. Is there context loss. One of hardest questions is synchrony vs. asynchrony.
- JG: this is all comparing to existing APIs like D3D12 and Vulkan. In D3D12, most operations return error codes. Want to avoid this as much as possible.
- RC: This is true for object creation in D3D, but commands like draw etc. return void. The goal is to keep things simple for people and let the validation layer catch issues.
- MM: programmer wants to create a buffer. Call CreateBuffer. Could fail. Return error code, or return an object you can query later, or like WebGL it doesn’t return anything and you can query the state later.
- KR: In Chrome actually when you create a buffer that causes an OOM in the GPU process, Blink still returns a WebGLBuffer and the application has to check for OOM by itself.
- JG: that’s because there’s a distinction between handles and the objects backing them. It’s the pattern in the OpenGL API where everything returns void and you have to query for errors asynchronously
- MM: seems like a bad design.
- JG: unergonomic, but it lets you do things asynchronously. Coalesces things into a single error flag. Surface warnings to JS.
- CW: think the OpenGL error model is not precise enough. Would be great if errors can be attached to objects, for example out-of-memory.
- RC: if we want to have the correct, HTML spec conformant way of doing error handling, then everything should be wrapped in Promises. If there are things that take a long time, wrap it in a Promise.
- CW: that would work well for JavaScript APIs that don’t try to do many things in a single frame. But with a model like this, you can’t create a Buffer, BufferView, then use it in a command buffer. This breaks a lot of assumptions in apps that do graphics. For parts with async errors, need to have a way for apps to pipeline commands so it doesn’t have to wait for end of microtask to do commands.
- JD: maybe WebGPU commands could take Promises, so you don’t have to wait for the Promise resolution?
- KN: if you go down that route, I see value in exposing things as Promises rather than the Maybe monad from NXT.
- DM: can you elaborate on the microtask requirement? To me, it’s fine if there’s a microtask between buffer creation and command buffer submission.
- MM: think we should make a distinction between things which can fail multiple times per frame, vs. things you do once.
- CW: disagree that creating a buffer isn’t something that happens multiple times per frame. It might. For example, if you’re Google Maps and you spend your time uploading vertices. Streaming data down from the server. If you find you need more space you will create more resources right now. Might not be most efficient, but 100% sure that apps will do it.
- MM: Is that bad if the buffer arrives one frame later?
- CW / MM: discussion (Kai explains it better).
- KN: you’re “un-pipelining” something. If it fails, it fails. If it doesn’t fail, the upload is already in flight before you find out whether the buffer has been created. If you require the buffer to be created before you can upload anything at all, you’re introducing huge latency for when the data can get onto the GPU.
- CW: that’s why we were thinking either the Maybe monad or Promise pipelining would be a good direction. The reason Kai discussed is the reason we think WebGPU should have some Promise pipelining, Maybe monads, etc.
- MM: a little scary because it sounds like we’re describing a task graph to run in the GPU process, which isn’t idiomatic JS. To decide, think we’ll have to look at specific cases where one command would be pipelined with execution of another one.
- CW: OpenGL is already doing Promise pipelining. It’s at the core of OpenGL 1.0. You query errors, and it incurs a flush when you do it. It’s not new to the web platform, though we can say that OpenGL isn’t idiomatic w.r.t the web platform.
- MM: Would be great if we could do things better. Would like to have examples of what operations would require this.
- JG: would be great if we could get an example of wanting to put data on the GPU.
- KN: from a high level: everything is pipelined. If you don’t synchronously wait on anything, you’re just pushing commands to the GPU process. Every command might depend on earlier ones. Disagree that OpenGL/Maybe monad is non-ergonomic. Instead of just doing tons of calls and later checking if everything was OK, if it’s synchronous you have to have tons of “.then()”s. That makes the code much more complex, and you incur the cost of the JS scheduler all over the place in your code. Think it’s a lot less ergonomic.
- CW: makes the code less complex if you assume everything was correct.
- JG: my proposal is that if you produce valid calls, it’s asynchronous, but if you had something that had to be fallible, like memory allocation, you’d have to have some way of getting the errors back. Think we should do something similar to WebGL, where as long as you do valid calls, you can ignore the results.
- CW: not sure what the difference is between this, and Promise pipelining. Do you suggest that outside specific cases, validation errors produce context loss?
- JG: one of my proposals was to produce e.g. context loss upon out-of-memory.
- KN: we’re talking about slightly tweaked OpenGL error handling, right?
- JG: yes.
- KN: and you can get handles to things that might be invalid.
- JG: yes.
- JG: if we want to pipeline and let things run ahead of allocation, we have to define the behavior when allocation fails, and we run ahead assuming it succeeded.
- CW: what we suggested (in Vancouver?): when an object is created with parameters where Objects are Errors, that new Object is an error. Submitting an Error is a no-op. You can query errors asynchronously.
- MM: if you try to compile a state but there’s a bug in the shader, and then you try to bind that pipeline to a command buffer, and then you try to draw, none of that would happen?
- CW: your shader is wrong. So your Pipeline is an error. So submitting is a no-op.
- MM: how does the programmer know they’ve hit an error condition?
- CW: some async mechanism. Most production applications don’t need to check for individual errors. What most apps want to know in production is: did I hit any error?
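The error-object model CW describes above can be sketched roughly as follows. This is only an illustration under stated assumptions: `SketchDevice`, `Handle`, `popErrors` and every method name are hypothetical and do not come from NXT or from any WebGPU proposal. It shows an invalid shader turning into an invalid pipeline and command buffer, a submit that becomes a no-op, and errors collected for later retrieval through a Promise.

```typescript
// Hypothetical types: nothing here is a real NXT or WebGPU API.
type Handle = { kind: string; error: string | null };

class SketchDevice {
  // Number of command buffers that actually reached the (simulated) queue.
  submitted = 0;
  // Errors recorded so far, waiting to be retrieved asynchronously.
  private pendingErrors: string[] = [];

  createShader(source: string): Handle {
    // Stand-in for real shader validation: an empty source counts as invalid.
    const error = source.trim().length === 0 ? "shader: empty source" : null;
    return this.track({ kind: "shader", error });
  }

  createPipeline(shader: Handle): Handle {
    return this.derive("pipeline", shader);
  }

  createCommandBuffer(pipeline: Handle): Handle {
    return this.derive("commandBuffer", pipeline);
  }

  submit(commands: Handle): void {
    // Submitting an error object is a no-op.
    if (commands.error !== null) return;
    this.submitted += 1;
  }

  // Errors are only observable asynchronously; reading them clears the list.
  popErrors(): Promise<string[]> {
    const errors = this.pendingErrors;
    this.pendingErrors = [];
    return Promise.resolve(errors);
  }

  // An object created from an error object is itself an error object.
  private derive(kind: string, input: Handle): Handle {
    const error =
      input.error !== null ? `${kind}: created from an invalid ${input.kind}` : null;
    return this.track({ kind, error });
  }

  private track(handle: Handle): Handle {
    if (handle.error !== null) this.pendingErrors.push(handle.error);
    return handle;
  }
}

// Usage: the calling code never branches on errors while recording work.
const device = new SketchDevice();
const broken = device.createCommandBuffer(device.createPipeline(device.createShader("")));
device.submit(broken); // silently ignored because the shader was invalid
```

The application code stays linear; whether it ever calls `popErrors()` is up to the application.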
- KR: The only case where you might want to know about errors is OOM. We got a lot of feedback from WebGL devs that they allocated one too many textures and got an OOM. Dunno if we can have such applications on the Web because the browser would slow down. Dunno if we can accurately track available memory, but it would help a lot. Somehow Firefox is better than Chrome at this. We told customers that they could use the heuristic of “bytes per pixel” so that mobile phones with less screen space use less GPU memory than desktops with large monitors and GPUs. Would like a graceful fail mechanism. Talking about the “streaming” use case where the application tries to allocate as much GPU memory as it can.
- CW: that’s taken into account in our initial Vancouver proposal. If you’re streaming and you create a texture, then once you know it’s valid, you can use it.
- MM: if the app has to do this round-trip – creates a texture, and then needs to know it worked – then the Maybe monad seems pointless?
- CW: that’s only needed in the streaming case. We could add queries of VRAM. Most non-fancy applications will just create textures and go ahead.
- JG: even if the allocation failed, you might want to do things like upload data. Don’t want to have to wait to upload data to find out whether it failed. But if you have to delay uploading data until you know the allocation succeeded, that’s another frame of delay.
- MM: seems a few ways app could know it did something wrong.
- Most functions don’t return anything. But some functions do return things synchronously. Could return “history” of past errors.
- No return values at all. Call “glGetError” to find out what went wrong.
- Async queries. Every operation has a tag, and a frame or two later, you can say “tell me about the operation with this tag. Errored or not?”
- CW: that’s basically what we got to as well.
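As an illustration of the third option MM lists (tagged operations queried later), here is a minimal sketch. `TaggedQueue`, `enqueue`, `processAll` and `query` are hypothetical names invented for this example, and `processAll` merely stands in for the GPU process catching up a frame or two later.

```typescript
// Hypothetical sketch: not part of any proposal discussed in the meeting.
type Status = "pending" | "ok" | "error";

class TaggedQueue {
  private nextTag = 1;
  private status = new Map<number, Status>();
  private work: Array<{ tag: number; run: () => boolean }> = [];

  // Enqueue an operation and hand back its tag immediately, before validation.
  enqueue(run: () => boolean): number {
    const tag = this.nextTag++;
    this.status.set(tag, "pending");
    this.work.push({ tag, run });
    return tag;
  }

  // Stand-in for the GPU process executing and validating the queued work.
  processAll(): void {
    for (const { tag, run } of this.work) {
      this.status.set(tag, run() ? "ok" : "error");
    }
    this.work = [];
  }

  // "Tell me about the operation with this tag."
  query(tag: number): Status | undefined {
    return this.status.get(tag);
  }
}

// Usage: the caller keeps going and only looks at tags it cares about.
const queue = new TaggedQueue();
const uploadTag = queue.enqueue(() => true);   // an operation that succeeds
const drawTag = queue.enqueue(() => false);    // an operation that fails validation
queue.processAll();
console.log(queue.query(uploadTag), queue.query(drawTag));
```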
- DM: Myles did not list the Promise approach.
- MM: everyone seems to say the Promise approach is no good. Sounds like people don’t like round-trips between creation of a resource and first-use of a resource.
- DM: Promises bad because they’re interrupting JS execution?
- JG: either async/await, or non-contiguous procedural programming. One worrisome thing: in FF, we treat some of these as sync calls. Worried about over-specializing for them to be remoted, and going too far down the route of making everything async, because that’s not what we do. Even when we have multithreaded OpenGL contexts on macOS.
- JG: want to know exactly when things are happening. Passing things off to async/await is concerning.
- DM: what stops you from writing promises in C++?
- JG: there’s no easy concept of async/await.
- DM: it’s a solvable problem in terms of the language.
- JG: it’s not how the code is written today. Want the API to be structured in a way that it’s comprehensible to both JS and C++ programmers.
- DM: so current native code is written in synchronous manner.
- JG: thinking about compile to WASM case.
- CW: it’s bad for Chrome if you create a command buffer and then say, “Resolve now”. Would cause huge numbers of pipeline stalls.
- DM: OK.
- CW: OpenGL pipelining lets you write synchronous code, and get errors async.
- JG: what about “error flags”?
- RC: “getError approach”? You ask every now and then for the errors?
- CW: “Maybe Monad”?
- JG: errors could be associated with objects. Get errors if you do operations with an object “In Error”.
- CW: maybe glGetError returns to you a Promise. :)
- KN: think we’re talking about OpenGL style. Not Maybe Monad.
- MM: before we discount Promises: which specific calls are we talking about?
- CW: for example, in current IDL: if you want to use a Texture: create a TextureView, put it in a BindGroup, then can use it in the command buffer. If there’s a Promise every step of the way, then you’re introducing a one-microtask latency inside your rendering loop.
- MM: you said it’s unclear which of these operations would require Promises. Need clarity.
- CW: CreateCommandBuffer returns a Promise. EndCommandBuffer requires validation, returns a Promise. Again, want to be able to create resources during frames (even if it’s not the “right thing” to do – it’s necessary).
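To make the latency concern concrete, the sketch below compares the two control-flow shapes. All function names are hypothetical stand-ins and the "objects" are plain strings; the only point is that an `async` function stops at its first `await`, so a Promise-per-step API cannot record a whole frame in one synchronous run, while handle-returning calls can.

```typescript
// Hypothetical API shapes, used only to compare control flow.

// Shape 1: every creation call returns a Promise.
async function createAsync(log: string[], name: string): Promise<string> {
  log.push(`create ${name}`);
  return name;
}

async function recordFrameWithPromises(log: string[]): Promise<void> {
  // Each await yields to the microtask queue before the next step can run.
  const texture = await createAsync(log, "texture");
  const view = await createAsync(log, `view(${texture})`);
  const bindGroup = await createAsync(log, `bindGroup(${view})`);
  log.push(`draw with ${bindGroup}`);
}

// Shape 2: creation calls return handles immediately; validation is deferred.
function createNow(log: string[], name: string): string {
  log.push(`create ${name}`);
  return name;
}

function recordFramePipelined(log: string[]): void {
  const texture = createNow(log, "texture");
  const view = createNow(log, `view(${texture})`);
  const bindGroup = createNow(log, `bindGroup(${view})`);
  log.push(`draw with ${bindGroup}`);
}

// Usage: compare how much work has been recorded when each call returns.
const promiseLog: string[] = [];
void recordFrameWithPromises(promiseLog); // returns at the first await
const pipelinedLog: string[] = [];
recordFramePipelined(pipelinedLog);       // records the whole frame
console.log(promiseLog.length, pipelinedLog.length);
```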
- RC: seems either we care about resource allocation failing, or don’t. If we do, we should have a Promise based API where you call CreateBuffer and get a Promise back receiving the buffer. But for others, where it’s more structural (i.e., correctness of the submitted commands), think it should be some sort of error callback.
- MM: think that’s a good compromise. Allocation be a Promise, other things use Maybe monad style. Even if we do this, need to decide what app should do in unlikely case of errors, in the Maybe monad style.
- CW: could have WebGPUGetError(), returning Promise taking error code. …
- RC: or OnError callback, called synchronously. Not a Promise, just regular JS callback. Need to see if it would be allowed for regular programming. Seems more friendly than using a library which inserts glGetError everywhere. Set a breakpoint in OnError, then you can see the stack trace in the debugger.
- CW: doesn’t have to be core in the API. Could be an extension like “Enable Developer mode”.
- RC: agreed. But if there’s something fallible, like allocation, think that should use a regular, JS-friendly concept, like Promises.
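A rough sketch of the synchronous error callback RC suggests, assuming a hypothetical `DebugDevice` with an `onError` property and a toy `draw` check (none of these names come from a proposal). Because the callback runs inside the failing call, a breakpoint in it would show the offending call site on the stack.

```typescript
// Hypothetical developer-mode device; names are illustrative only.
type ErrorCallback = (message: string) => void;

class DebugDevice {
  // Optional callback, invoked synchronously at the point of failure.
  onError: ErrorCallback | null = null;
  // Count of draws that passed validation.
  drawsRecorded = 0;

  draw(pipelineBound: boolean): void {
    if (!pipelineBound) {
      this.report("draw called without a bound pipeline");
      return;
    }
    this.drawsRecorded += 1;
  }

  private report(message: string): void {
    // Runs before draw() returns, so the caller's frames are still on the stack.
    if (this.onError !== null) this.onError(message);
  }
}

// Usage: collect messages, or set a breakpoint inside the callback.
const debugDevice = new DebugDevice();
const seen: string[] = [];
debugDevice.onError = (message) => { seen.push(message); };
debugDevice.draw(false); // callback fires here, synchronously
debugDevice.draw(true);  // valid call, no callback
```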
- CW: Google can make a proposal in IDL form.
- KR: Would like to see how the WASM side of things works. The “GL-style” would work well for WASM, not sure how higher-level JS objects would work in the WASM bindings.
- Pollable vs. Promiseable (vs. both)
- If more time: error handling, based on the pull request and the discussion that happens there.
- Remember, if you are coming to the F2F, Canada does require electronic travel authorization if you’re not a Canadian or US citizen!
- MM: are we ever going to talk about how presents work, and workers?