Brief
Add another workspace memory resource that does not have an explicit memory limit. That is, after the change we have the following:

- `rmm::mr::get_current_device_resource()` is the default for all allocations, as before. It is used for allocations with unlimited lifetime, e.g. those returned to the user.
- `raft::get_workspace_resource()` is for temporary allocations and is forced to have a fixed size, as before. However, it becomes smaller and should be used only for allocations that do not scale with the problem size. It defaults to a thin layer on top of the `current_device_resource`.
- `raft::get_large_workspace_resource()` (new) is for temporary allocations that can scale with the problem size. Unlike the `workspace_resource`, its size is not fixed. By default, it points to the `current_device_resource`, but the user can set it to something backed by host memory (e.g. managed memory) to avoid OOM exceptions when there is not enough device memory left (see the sketch after this list).
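As a minimal sketch of that last point: the snippet below backs the large workspace with managed memory. It assumes a setter `raft::resource::set_large_workspace_resource` paired with the getter named above; the exact namespace, header, and signature may differ.

```cpp
#include <raft/core/device_resources.hpp>

#include <rmm/mr/device/managed_memory_resource.hpp>

#include <memory>

int main()
{
  raft::device_resources res;

  // Managed (unified) memory can page between device and host, so large
  // temporary allocations degrade gracefully instead of throwing OOM.
  auto managed_mr = std::make_shared<rmm::mr::managed_memory_resource>();

  // Assumed setter corresponding to raft::get_large_workspace_resource().
  raft::resource::set_large_workspace_resource(res, managed_mr);

  // ... run an algorithm whose scalable temporaries now use managed memory ...
}
```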
Problem

We have a list of issues/preferences/requirements, some of which contradict each other:

- Temporary allocations should go through `rmm::mr::pool_memory_resource` for performance reasons (to avoid lots of `cudaMalloc` calls in the loops); see the pooling sketch after this list.
- Not all temporary allocations can go through the fixed-size `workspace_resource`, because some of them scale with the problem size and would inevitably fail with OOM at some point.
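For context on the first point, the standard RMM pooling pattern looks roughly like this; the initial pool size here is an arbitrary placeholder.

```cpp
#include <rmm/mr/device/cuda_memory_resource.hpp>
#include <rmm/mr/device/per_device_resource.hpp>
#include <rmm/mr/device/pool_memory_resource.hpp>

int main()
{
  // Reserve a chunk of device memory once; later allocations are carved out
  // of the pool instead of calling cudaMalloc in every loop iteration.
  rmm::mr::cuda_memory_resource cuda_mr;
  rmm::mr::pool_memory_resource<rmm::mr::cuda_memory_resource> pool_mr{
    &cuda_mr, 1ull << 30 /* initial pool size: 1 GiB (placeholder) */};
  rmm::mr::set_current_device_resource(&pool_mr);

  // ... all RMM-based device allocations now come from the pool ...
}
```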
Solution

I propose to split the workspace memory into two: the existing `workspace_resource` stays small and fixed-size, serving temporary allocations that do not scale with the problem size, while the new `large_workspace_resource` has no size limit and takes the temporary allocations that do. A hypothetical routing helper is sketched below.
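This sketch shows how an algorithm could route each temporary buffer by whether its size scales with the input. The helper `alloc_temp`, its boolean flag, and the exact header paths are illustrative assumptions, not part of the RAFT API; the getters are the ones named in the Brief.

```cpp
#include <raft/core/device_resources.hpp>
#include <raft/core/resource/cuda_stream.hpp>
#include <raft/core/resource/device_memory_resource.hpp>

#include <rmm/device_buffer.hpp>

// Hypothetical helper: route a temporary buffer to the right workspace.
rmm::device_buffer alloc_temp(raft::device_resources const& res,
                              std::size_t bytes,
                              bool scales_with_problem_size)
{
  auto* mr = scales_with_problem_size
               ? raft::resource::get_large_workspace_resource(res)  // unlimited
               : raft::resource::get_workspace_resource(res);       // fixed size
  return rmm::device_buffer{bytes, raft::resource::get_cuda_stream(res), mr};
}
```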
Notes: