New Sparse Matrix APIs #1279

Merged Mar 15, 2023 (62 commits)

@cjnolet cjnolet commented Feb 14, 2023

Closes #348.

This design addresses problems we've had in the past when modeling sparse data: our objects were neither flexible nor composable enough, which led to APIs that were hard to maintain and state that was hard to track.

This design starts by decomposing sparse formats into two components, which are ultimately combined to compose the full sparse object:

  1. a structural component manages the sparsity of the object, indexing, and data-specific metadata, such as the total number of rows and columns.
  2. a valued or matrix component combines with a structure and manages the nonzero elements.
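As a rough illustration of this split, here is a minimal host-side sketch in plain C++. The names `csr_structure_sketch`, `csr_matrix_sketch`, and `make_example` are hypothetical and are not the types added in this PR; the sketch only shows a structural component (sparsity plus shape metadata) being combined with a valued component (the nonzero elements).

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical names for illustration only; these are not RAFT types.

// Structural component: sparsity pattern, indexing, and shape metadata.
struct csr_structure_sketch {
  int n_rows;
  int n_cols;
  std::vector<int> indptr;   // row offsets, size n_rows + 1
  std::vector<int> indices;  // column index of each stored element

  std::size_t nnz() const { return indices.size(); }
};

// Valued component: combines with a structure and manages the nonzeros.
template <typename T>
struct csr_matrix_sketch {
  csr_structure_sketch structure;
  std::vector<T> values;  // one value per structural nonzero

  csr_matrix_sketch(csr_structure_sketch s, std::vector<T> v)
    : structure(std::move(s)), values(std::move(v))
  {
    assert(structure.indptr.size() == static_cast<std::size_t>(structure.n_rows) + 1);
    assert(values.size() == structure.nnz());
  }

  // Returns the stored value at (row, col), or T{} if nothing is stored there.
  T at(int row, int col) const
  {
    for (int k = structure.indptr[row]; k < structure.indptr[row + 1]; ++k) {
      if (structure.indices[k] == col) { return values[k]; }
    }
    return T{};
  }
};

// Builds the 2x3 matrix [[1, 0, 2], [0, 3, 0]].
inline csr_matrix_sketch<double> make_example()
{
  csr_structure_sketch s{2, 3, {0, 2, 3}, {0, 2, 1}};
  return csr_matrix_sketch<double>(std::move(s), {1.0, 2.0, 3.0});
}
```

Keeping the structure as its own object means the same sparsity pattern could, in principle, be reused with different value arrays.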

Note that this design also leaves room to model a sparse tensor in the future, since a new tensor format could compose multiple structural and multiple valued components. This could enable our algorithms to support things like decompositions of higher-order structures and/or associated values.

In addition to being flexible and composable, this design also needs to satisfy three different levels of immutability:

  1. Read-only: immutable structure, immutable nonzero elements
  2. Fixed-sparsity and value-mutable: immutable structure, mutable nonzero elements
  3. Mutable-sparsity and value-mutable: mutable structure, mutable nonzero elements
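One way to picture the first two levels is to encode them in the const-ness of a view's element types. The sketch below assumes a hypothetical `csr_view_sketch` template, not the PR's actual types. The third level is not shown, since changing the sparsity generally needs an owning container that can reallocate.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical non-owning CSR view; not a RAFT type.
// The const-ness of ValueT and IndexT selects the immutability level.
template <typename ValueT, typename IndexT>
struct csr_view_sketch {
  IndexT* indptr;   // row offsets, n_rows + 1 entries
  IndexT* indices;  // column indices, nnz entries
  ValueT* values;   // nonzero elements, nnz entries
  int n_rows;
  std::size_t nnz;
};

// Level 1, read-only: immutable structure, immutable nonzero elements.
using read_only_view = csr_view_sketch<const float, const int>;

// Level 2, fixed sparsity: immutable structure, mutable nonzero elements.
using value_mutable_view = csr_view_sketch<float, const int>;

// Only reads, so the read-only view is sufficient.
inline float sum_values(read_only_view m)
{
  float total = 0.f;
  for (std::size_t k = 0; k < m.nnz; ++k) { total += m.values[k]; }
  return total;
}

// Writes values but cannot modify indptr or indices through this view.
inline void scale_values(value_mutable_view m, float alpha)
{
  assert(m.values != nullptr || m.nnz == 0);
  for (std::size_t k = 0; k < m.nnz; ++k) { m.values[k] *= alpha; }
}
```

With this encoding, an attempt to write `m.values[k]` inside `sum_values` would fail to compile, which is the kind of guarantee the levels above describe.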

Two concepts introduced in this design are pretty core to the 3 states above:

  1. structure-preserving formats are views and require the sparsity to be known at creation time. The actual structural components may or may not be mutable.
  2. structure-owning formats house owning containers and don't require the sparsity to be known at creation time; instead, they provide a way to initialize() the sparsity once it is known. These formats will have mutable structure and nonzero elements.

Both the structure and matrix formats can be structure-preserving or structure-owning. While this PR only includes csr_matrix and coo_matrix (I'm considering dropping the r from csr since it doesn't really matter if it's csr or csc), the design will further allow for other formats, such as dcsr and bcsr, in the future.

  1. csr_matrix_view - this is a structure-preserving matrix view. Sparsity must be known up front and the underlying arrays may or may not be const.
  2. csr_matrix - this can be structure-owning or structure-preserving depending on whether its underlying structural component is structure-preserving (view) or structure-owning. Calling view() on this object produces the csr_matrix_view above.
  3. coo_matrix_view - this is a structure-preserving matrix view. Sparsity must be known up front and the underlying arrays may or may not be const.
  4. coo_matrix - this can be structure-owning or structure-preserving depending on whether its underlying structural component is structure-preserving (view) or structure-owning. Calling view() on this object produces the coo_matrix_view above.
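The relationship between an owning matrix and its view can be sketched as follows. This is a simplified host-side illustration with hypothetical names (`csr_view_sketch`, `owning_csr_sketch`); the real types in this PR are device-side and templated differently.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical structure-preserving view: the sparsity is fixed and the
// values may or may not be const, depending on T. Not a RAFT type.
template <typename T>
struct csr_view_sketch {
  const int* indptr;
  const int* indices;
  T* values;
  int n_rows;
  int n_cols;
  std::size_t nnz;
};

// Hypothetical owning matrix: holds the arrays and hands out views.
template <typename T>
class owning_csr_sketch {
 public:
  owning_csr_sketch(int n_rows, int n_cols, std::vector<int> indptr, std::vector<int> indices)
    : n_rows_(n_rows),
      n_cols_(n_cols),
      indptr_(std::move(indptr)),
      indices_(std::move(indices)),
      values_(indices_.size(), T{})  // one zero-initialized value per nonzero
  {
    assert(indptr_.size() == static_cast<std::size_t>(n_rows_) + 1);
  }

  // View with mutable values over the owned arrays.
  csr_view_sketch<T> view()
  {
    return {indptr_.data(), indices_.data(), values_.data(), n_rows_, n_cols_, values_.size()};
  }

  // A const matrix yields a fully read-only view.
  csr_view_sketch<const T> view() const
  {
    return {indptr_.data(), indices_.data(), values_.data(), n_rows_, n_cols_, values_.size()};
  }

  const std::vector<T>& values() const { return values_; }

 private:
  int n_rows_;
  int n_cols_;
  std::vector<int> indptr_;
  std::vector<int> indices_;
  std::vector<T> values_;
};
```

A view produced this way aliases the owner's storage, so it must not outlive the owning object.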

Similar to mdarray and mdspan, a bunch of factory functions are provided in raft/core/device_sparse_matrix.hpp to ease the construction process for users. The owning matrix types can be constructed either to own the underlying structure or a view of the structure.

These new formats will allow us to model our sparse APIs so they are much more concise: a function can explicitly require a structure-owning matrix, which signals to the user that the function itself will compute the sparsity and at least fill in the initial structure. This will allow us to continue to provide an API which is easier to use, and ultimately feels more like our dense API, while still considering the design differences in sparse computations.
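To make that signal concrete, here is a small sketch of a function that takes a structure-owning output because it has to discover the sparsity itself. The names `owning_coo_sketch`, `initialize_sparsity`, and `dense_to_coo` are assumptions made for illustration and are not the API introduced by this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical structure-owning COO container; not a RAFT type.
// The sparsity is unknown at construction and is set later.
template <typename T>
struct owning_coo_sketch {
  int n_rows;
  int n_cols;
  std::vector<int> rows;
  std::vector<int> cols;
  std::vector<T> vals;

  owning_coo_sketch(int r, int c) : n_rows(r), n_cols(c) {}

  // Allocates storage once the number of nonzeros is known.
  void initialize_sparsity(std::size_t nnz)
  {
    rows.assign(nnz, 0);
    cols.assign(nnz, 0);
    vals.assign(nnz, T{});
  }

  std::size_t nnz() const { return vals.size(); }
};

// Taking the owning type by non-const reference tells the caller that this
// function computes the sparsity and fills in the structure.
template <typename T>
void dense_to_coo(const std::vector<T>& dense, owning_coo_sketch<T>& out)
{
  const std::size_t n_cols = static_cast<std::size_t>(out.n_cols);
  assert(dense.size() == static_cast<std::size_t>(out.n_rows) * n_cols);

  // First pass: count nonzeros so the structure can be sized.
  std::size_t nnz = 0;
  for (const T& x : dense) {
    if (x != T{}) { ++nnz; }
  }
  out.initialize_sparsity(nnz);

  // Second pass: fill row indices, column indices, and values.
  std::size_t k = 0;
  for (int r = 0; r < out.n_rows; ++r) {
    for (int c = 0; c < out.n_cols; ++c) {
      const T& x = dense[static_cast<std::size_t>(r) * n_cols + static_cast<std::size_t>(c)];
      if (x != T{}) {
        out.rows[k] = r;
        out.cols[k] = c;
        out.vals[k] = x;
        ++k;
      }
    }
  }
}
```

A function that only needed to read or update existing nonzeros would instead accept a view, which leaves the structure untouched.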

I also want to thank @divyegala for his help making the template metaprogramming layer flexible, reusable, and generally pleasant to use.

  • more comprehensive googletests
  • split into different files (types, compressed, coordinate, etc...)
  • add host APIs

@cjnolet cjnolet added the improvement (Improvement / enhancement to an existing function) and non-breaking (Non-breaking change) labels Feb 14, 2023
@cjnolet cjnolet self-assigned this Feb 14, 2023
@github-actions github-actions bot added the cpp label Feb 14, 2023
@github-actions github-actions bot added the CMake label Feb 15, 2023
@cjnolet cjnolet marked this pull request as ready for review February 16, 2023 00:23
@cjnolet cjnolet requested review from a team as code owners February 16, 2023 00:23
@cjnolet
Member Author

cjnolet commented Feb 16, 2023

cc @mhoemmen for thoughts

@mhoemmen
Contributor

Hi @cjnolet ! I do like the idea of distinguishing between structure-preserving formats and structure-owning formats. The latter show up, for example, when wrapping third-party libraries that ingest 3-array CSR and produce an optimized opaque format.

Sometimes those libraries give users a way to modify values or even structure. Regardless, users still may want to take some structure-preserving or opaque format, and "dissolve" it to get a transparent, modifiable format.

I'll take a look at the PR; thanks for tagging me!

@cjnolet
Member Author

cjnolet commented Feb 17, 2023

@mhoemmen, thanks so much! I'm looking forward to your feedback. I tried to find a way to generalize the different states w/ nomenclature that we could apply directly to the different options.

@github-actions github-actions bot removed the python label Mar 14, 2023
@cjnolet
Member Author

cjnolet commented Mar 14, 2023

Opened up docs issue as follow-on #1342.

@cjnolet
Member Author

cjnolet commented Mar 15, 2023

/merge

@rapids-bot rapids-bot bot merged commit fb84190 into rapidsai:branch-23.04 Mar 15, 2023
Successfully merging this pull request may close these issues.

[FEA] Consolidate sparse/dense matrix descriptor implementations
4 participants