Generally I would recommend […]; however, under the following conditions you may get better performance by doing a "staging belt" yourself:
Edit: if you go this path, there is also an extra benefit that you can overwrite sections of your target buffers multiple times in the same submission. With […]
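For example (an illustrative snippet; `stagingA`, `stagingB`, and `dst` are assumed buffers, not names from this thread), two copies recorded into one command encoder may target the same destination range and execute in submission order, so the later copy wins:

```ts
// Both copies target dst[256..320); they run in order within this one
// submission, so the contents of stagingB end up in the buffer.
const encoder = device.createCommandEncoder();
encoder.copyBufferToBuffer(stagingA, 0, dst, 256, 64);
encoder.copyBufferToBuffer(stagingB, 0, dst, 256, 64);
device.queue.submit([encoder.finish()]);
```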
The actual native buffer can be persistently mapped for all you know, so don't worry about this.
Not sure I understand that difficulty/guessing. You maintain a pool of staging buffers that are mapped; once you use one of them, you request it for mapping again, and when the promise resolves you return it to the pool. There shouldn't be any guesswork here.
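A minimal sketch of that pool, assuming a `device: GPUDevice` and a fixed chunk size; the class and method names are illustrative, not an existing API:

```ts
// Illustrative "staging belt": a pool of mapped staging buffers, recycled
// via mapAsync once the GPU has consumed them.
class StagingBelt {
  private free: GPUBuffer[] = [];     // currently mapped, ready to write into
  private inFlight: GPUBuffer[] = []; // used this frame, awaiting re-map

  constructor(private device: GPUDevice, private chunkSize: number) {}

  // Copy `data` into `dst` at `dstOffset` through a mapped staging buffer.
  // Assumes data.byteLength <= chunkSize and is 4-byte aligned.
  write(encoder: GPUCommandEncoder, dst: GPUBuffer, dstOffset: number, data: ArrayBuffer): void {
    const staging =
      this.free.pop() ??
      this.device.createBuffer({
        size: this.chunkSize,
        usage: GPUBufferUsage.MAP_WRITE | GPUBufferUsage.COPY_SRC,
        mappedAtCreation: true, // freshly created buffers start out mapped
      });

    new Uint8Array(staging.getMappedRange(0, data.byteLength)).set(new Uint8Array(data));
    staging.unmap();
    encoder.copyBufferToBuffer(staging, 0, dst, dstOffset, data.byteLength);
    this.inFlight.push(staging);
  }

  // Call after queue.submit(): request mapping again and return each buffer
  // to the pool once its promise resolves.
  recall(): void {
    for (const buf of this.inFlight) {
      buf.mapAsync(GPUMapMode.WRITE).then(() => this.free.push(buf));
    }
    this.inFlight = [];
  }
}
```

Per frame: call `belt.write(...)` while encoding, then `device.queue.submit([...])`, then `belt.recall()` so the used buffers get re-mapped for a later frame.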
This is strictly inferior to […].
Year 2024, macOS on an M1, wgpu-rs implementation, Metal backend.
So I have been digging through the old issues trying to determine what the optimal/recommended path is for per-frame buffer uploads (i.e. for MVP/normal matrices and dynamic vertex arrays). I can identify three paths to take. NOTE: I am currently assuming non-unified memory, as there does not seem to be a method of determining when staging buffers are unnecessary; therefore I am assuming `MAP_WRITE` and `VERTEX`/`UNIFORM` etc. are mutually exclusive.

1. Create the staging buffers every frame with `createBuffer` and `mappedAtCreation`, then destroy each buffer after the copy to the GPU-private buffer is complete (sketched below). Unless there is some management of allocation behind the scenes (which is not mentioned in the spec), this seems like a bad idea because of the constant allocation/deallocation. As far as I can tell it's not like OpenGL, where buffer orphaning is a fast path.
2. Issue a direct write into the GPU-private buffer with `writeBuffer` (sketched below). This seems very similar to `glBufferSubData`, which is controversial since it is a fast path on NVIDIA but a slow path on Intel. Not sure whether that is the case with WebGPU, as its actual behavior is not in the spec.
3. Map (using `mapAsync`) and unmap round-robin staging buffers every frame. The difficulty of this is guessing which ones are required for the frame, since it's an async operation and thus cannot be performed inside the main render function. It's not impossible, just difficult in some circumstances, though the ability to map a sub-range synchronously once the `Promise` is resolved partially negates this. The bigger issue is that mapping and unmapping is traditionally slow. Going back to OpenGL, NVIDIA does not recommend this approach, though Intel seems to prefer it over direct writes like `glBufferSubData`.

In other APIs the recommendation from all vendors is to use persistent buffer mapping, as it is the fast path for all IHVs. This is neither currently supported nor does there seem to be any plan to support it in WebGPU (am I wrong?). Therefore, are there any recommendations on what the recommended path on WebGPU should be, especially since there is no way to determine which IHV it's running on and adjust accordingly?
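For concreteness, here is a minimal sketch of paths 1 and 2, assuming a `device: GPUDevice`, a destination buffer `dst` created with `COPY_DST` usage, and per-frame data in a `Float32Array`; the function names are illustrative only. Path 3 is essentially the pooled `mapAsync` approach sketched earlier in the thread.

```ts
// Path 2: direct write with writeBuffer; the implementation decides how to stage it.
function uploadWithWriteBuffer(device: GPUDevice, dst: GPUBuffer, data: Float32Array): void {
  device.queue.writeBuffer(dst, 0, data);
}

// Path 1: throwaway staging buffer created mapped, destroyed after the copy is submitted.
function uploadWithThrowawayStaging(device: GPUDevice, dst: GPUBuffer, data: Float32Array): void {
  const staging = device.createBuffer({
    size: data.byteLength,
    usage: GPUBufferUsage.COPY_SRC,
    mappedAtCreation: true,
  });
  new Float32Array(staging.getMappedRange()).set(data);
  staging.unmap();

  const encoder = device.createCommandEncoder();
  encoder.copyBufferToBuffer(staging, 0, dst, 0, data.byteLength);
  device.queue.submit([encoder.finish()]);

  // destroy() after submit is fine: the allocation stays alive until the
  // in-flight copy has completed.
  staging.destroy();
}
```

Both variants upload the same bytes; they differ only in whether the staging allocation is explicit or left to the implementation.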