-
Notifications
You must be signed in to change notification settings - Fork 197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cagra index construction without copying device mdarrays #1494
Conversation
Linking the original discussion: #1375 (comment) and tagging @divyegala and @enp1s0 for review. |
Currently API change is non breaking. Behavior is changed: if device arrays go out of scope while the index is alive, then index would be invalid. Therefore labelled as breaking change. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple fairly minor things. It would help if we could capture an issue to revisit the sometimes owning
semantics. I think it's another great case for the mdbuffer
that @wphicks has been working on.
private: | ||
raft::distance::DistanceType metric_; | ||
raft::device_matrix<T, IdxT, row_major> dataset_; | ||
raft::device_matrix<IdxT, IdxT, row_major> graph_; | ||
raft::device_matrix_view<const T, IdxT, row_major> dataset_view_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have some types to help us w/ the sometimes owning
semantics that I think can be useful here. I'm okay merging this PR in for release the way it is, but I think we should consider using something like the mdbuffer
or TemporaryDeviceBuffer
in the future. Though I think the TemporaryDeviceBuffer
will require some adjustments to make it useful for this specific case (tagging @divyegala).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good point, we shall return to this once we have a better way to express this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to what @cjnolet mentioned here. I think it will be pretty straightforward to transition to mdbuffer once it's available (hopefully very soon).
40ba206
to
1227797
Compare
1227797
to
1934c59
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated the PR according to the comments.
private: | ||
raft::distance::DistanceType metric_; | ||
raft::device_matrix<T, IdxT, row_major> dataset_; | ||
raft::device_matrix<IdxT, IdxT, row_major> graph_; | ||
raft::device_matrix_view<const T, IdxT, row_major> dataset_view_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good point, we shall return to this once we have a better way to express this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @tfeher!
/merge |
This PR aims to improve the workflow when dealing with large datasets. When experimenting with different versions of the knn-graph, we might want to construct indices with the same dataset (see #1435 for further discussion).
If the dataset is already in device memory (and rows are properly aligned / padded), then we only store a reference to the dataset, therefore multiple indices can refer to the same dataset. Similarly, when
knn_graph
is a device array, then store only a reference.Additionally, this PR adds
update_dataset
andupdate_graph
methods to the index.Closes #1479