Hi,

I think there should be a benchmark that compares how the libraries handle parallel iteration. Currently, the closest test for this would be `heavy_compute`, but its task (inverting a matrix 100 times) is not fine-grained enough to measure the parallel overhead (there is too much work per item).
I propose either:
- reducing the task in the parallel loop of `heavy_compute` (e.g., to inverting the matrix once, or multiplying a single float value, something very small), or
- introducing a new `parallel_light_compute` benchmark.

An example of option two is here: https://github.com/ElliotB256/ecs_bench_suite/tree/parallel_light_compute
Further discussion can be found here: bevyengine/bevy#2173
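For illustration, here is a minimal sketch of what the body of such a light-compute benchmark might look like in specs (the component name, entity count, and per-item work are assumptions for illustration, not the actual code from the branch above):

```rust
use specs::prelude::*;

// A single small component. The per-item work is one multiplication, so the
// benchmark measures parallel-iteration overhead rather than compute time.
struct Value(f32);

impl Component for Value {
    type Storage = VecStorage<Self>;
}

struct LightCompute;

impl<'a> System<'a> for LightCompute {
    type SystemData = WriteStorage<'a, Value>;

    fn run(&mut self, mut values: Self::SystemData) {
        // rayon is a direct dependency here for ParallelIterator::for_each.
        use rayon::prelude::*;
        // par_join hands the iteration to rayon, which splits the work
        // across its thread pool adaptively; no batch size is required.
        (&mut values).par_join().for_each(|v| {
            v.0 *= 2.0;
        });
    }
}

fn main() {
    let mut world = World::new();
    world.register::<Value>();
    for i in 0..10_000 {
        world.create_entity().with(Value(i as f32)).build();
    }
    let mut system = LightCompute;
    system.run_now(&world);
}
```

Something similarly minimal could be written for each library in the suite.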
The current `heavy_compute` shows bevy as roughly 2x slower than specs. However, `parallel_light_compute` (see the linked discussion) shows bevy is very sensitive to batch size and can be anywhere up to 10x slower than, e.g., specs.
> The current `heavy_compute` shows bevy as roughly 2x slower than specs. However, `parallel_light_compute` (see the linked discussion) shows bevy is very sensitive to batch size and can be anywhere up to 10x slower than, e.g., specs.
In my results (where I merged your fork and others, updated all the libraries, and made some adjustments), bevy is only 2x slower than specs in `parallel_light_compute`, and is actually faster than the other libraries. It might be sensitive to thread count as well (I ran it on a 16c/32t system), or the situation may have improved drastically between bevy 0.5 and 0.6.
However, a note: bevy is extremely sensitive to batch size, while the other libraries don't need a batch size to be set at all. Your file shows a batch size of 1024. In the discussion I linked above, you'll find the following table, which shows how bevy scales with batch size:
| Batch Size | Time     |
|-----------:|---------:|
| 8          | 1.177ms  |
| 64         | 234.13us |
| 256        | 149.48us |
| 1024       | 130.48us |
| 4096       | 207.13us |
| 10,000     | 485.55us |
On my PC, 1024 was the optimum batch size for bevy. For comparison, specs took 108.00us, so bevy was about 2x slower than specs. However, in the worst case of an unoptimised batch size, bevy is >10x slower (hence my numbers in the first post). I expect the 'ideal' batch size is both hardware- and System-dependent, and the optimum will rarely be achieved.
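For context on why bevy is the only library that needs this tuning: the batch size is an explicit argument to bevy's parallel query methods, roughly as in the following sketch (assuming the bevy 0.6 API; the component and system names are illustrative assumptions):

```rust
use bevy::prelude::*;
use bevy::tasks::ComputeTaskPool;

#[derive(Component)]
struct Value(f32);

// The caller must pick the batch size; specs and the other libraries
// decide their own work splitting internally.
const BATCH_SIZE: usize = 1024;

fn light_compute(pool: Res<ComputeTaskPool>, mut query: Query<&mut Value>) {
    // Entities are processed in chunks of BATCH_SIZE on the compute task
    // pool. Too small a batch adds scheduling overhead, too large a batch
    // limits parallelism, matching the U-shaped timings in the table above.
    query.par_for_each_mut(&pool, BATCH_SIZE, |mut v| {
        v.0 *= 2.0;
    });
}
```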
(Disclaimer: my tests are still for bevy 0.5 and I haven't had time to run comparisons for 0.6 yet, but from other discussions my understanding is that the parallel performance did not change.)