Description
Tracking failures for CUDA: #1732 (review)
This is a list of unit tests that are failing, disabled, and/or have a workaround in place on matrix:
core_flatmap_serial
This unit test failure breaks down into two separate errors, (1) and (2).
(1) Error Description: One of the tested configurations uses std::string, which I think fails when it is used on device.
Status: Disabled for now. Configurations involving std::string are disabled for CUDA: https://github.com/LLNL/axom/blob/2a7af8675710293b8c26d293aae51f17c99323c0/src/axom/core/tests/core_flatmap.hpp#L750-L757
(2) Error Description: With a pinned memory policy, the batched insertion test on the flat map seems to result in either fewer insertions than expected or a deadlock.
Status: Disabled for now:
https://github.com/LLNL/axom/blob/2a7af8675710293b8c26d293aae51f17c99323c0/src/axom/core/tests/core_flatmap_for_all.hpp#L84-L92
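For error (1), one possible alternative to a preprocessor guard is a compile-time check that decides whether a given key/value configuration should run on device. The following is only a sketch and not what core_flatmap.hpp does: `is_device_friendly` and `shouldRunOnDevice` are hypothetical names, and using "trivially copyable" as the criterion is an assumption, chosen as a conservative proxy that happens to exclude std::string.

```cpp
#include <string>
#include <type_traits>

// Hypothetical trait: treat a type as device-friendly only if it is
// trivially copyable. This is a conservative proxy, not a CUDA rule.
template <typename T>
struct is_device_friendly : std::is_trivially_copyable<T>
{ };

// Hypothetical helper a test could branch on before running a
// device configuration for a given key/value pair.
template <typename KeyType, typename ValueType>
constexpr bool shouldRunOnDevice()
{
  return is_device_friendly<KeyType>::value &&
    is_device_friendly<ValueType>::value;
}
```

With this, a configuration such as `<int, double>` would be kept for device runs, while any configuration involving std::string would be filtered out without listing it by hand.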
numerics_quadrature_serial
Error Description: I think this error results from quadrature.cpp not being completely ported for a device-only policy. For example, in the compute_gauss_legendre_data function, axom::Arrays are allocated on device but then accessed on the host:
https://github.com/LLNL/axom/blob/df7fef005ffb2c40284ef22d0be789304ab51935/src/axom/core/numerics/quadrature.cpp#L44-L52
Status: Workaround for now. Use unified memory for testing: https://github.com/LLNL/axom/blob/2a7af8675710293b8c26d293aae51f17c99323c0/src/axom/core/tests/numerics_quadrature.hpp#L100-L111
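To illustrate the suspected failure mode and why unified memory works around it, here is a toy model in plain C++. It does not use Axom, Umpire, or CUDA: `MemorySpace`, `TaggedArray`, and `fillOnHost` are hypothetical names, and the thrown exception stands in for the fault that a real host dereference of a device pointer would cause.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical memory-space tag; stands in for a real allocator ID.
enum class MemorySpace { Host, Device, Unified };

// Toy array that remembers where it was "allocated". In this model, host
// code may only touch Host or Unified memory.
class TaggedArray
{
public:
  TaggedArray(std::size_t n, MemorySpace space)
    : m_data(n, 0.0)
    , m_space(space)
  { }

  // Host-side element access; throws where a real device pointer would fault.
  double& hostAt(std::size_t i)
  {
    if(m_space == MemorySpace::Device)
    {
      throw std::runtime_error("host access to device-only memory");
    }
    return m_data.at(i);
  }

private:
  std::vector<double> m_data;
  MemorySpace m_space;
};

// Mimics the suspected pattern: fill the array from a host loop.
// Returns true if the loop completed, false if it hit device-only memory.
bool fillOnHost(TaggedArray& arr, std::size_t n)
{
  try
  {
    for(std::size_t i = 0; i < n; ++i)
    {
      arr.hostAt(i) = static_cast<double>(i);
    }
  }
  catch(const std::runtime_error&)
  {
    return false;
  }
  return true;
}
```

In this model, `fillOnHost` fails for an array tagged `MemorySpace::Device` and succeeds for one tagged `MemorySpace::Unified`, which mirrors why the test passes once unified memory is used. A fuller fix would presumably either run the fill on device or stage the data through a host buffer.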
bump_cutfield
bump_topology_mapper
mir_coupled3d
mir_equiz2d (passing with workaround)
mir_equiz3d
mir_concentric_circles_cuda (passing with workaround)
mir_tutorial_simple_cuda_2 (passing with workaround)
mir_tutorial_simple_cuda_5 (passing with workaround)
Error Description: Notably, a subset of these tests (mir_equiz*, bump_topology_mapper) also fails intermittently with HIP, so the whole set of failures may be related.
Status: Some are still failing. Those marked "passing with workaround" pass after a workaround using unified memory was added to conduit_memory.hpp: https://github.com/LLNL/axom/blob/2a7af8675710293b8c26d293aae51f17c99323c0/src/axom/bump/utilities/conduit_memory.hpp#L101-L108
The others fail with a message suggesting an issue with reading the baseline files from conduit: #1732 (comment)