Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use doctest for testing python example docstrings #1073

Merged
merged 2 commits into from
Dec 7, 2022

Conversation

benfred
Copy link
Member

@benfred benfred commented Dec 7, 2022

Similar to rapidsai/cudf#9815, this change uses doctest to test that the pylibraft example docstrings run without issue.

This caught several errors in the example docstrings, that are also fixed in this PR:

  • a missing ‘device_ndarray’ import in kmeans fit when the centroids weren’t explicitly passed in
  • an error in the fused_l2_nn_argmin docstring where output wasn’t defined
  • An AttributeError: module 'pylibraft.neighbors.ivf_pq' has no attribute 'np' error in ivf_pq

Closes #981

Similar to rapidsai/cudf#9815, this change uses doctest
to test that the pylibraft example docstrings run without issue.

This caught several errors in the example docstrings, that are also fixed in this PR:
 *  a missing ‘device_ndarray’ import in kmeans fit when the centroids weren’t explicitly passed in
 *  an error in the fused_l2_nn_argmin docstring where output wasn’t defined
 *  An `AttributeError: module 'pylibraft.neighbors.ivf_pq' has no attribute 'np'` error in ivf_pq

Closes rapidsai#981
@benfred benfred requested a review from a team as a code owner December 7, 2022 03:49
@github-actions github-actions bot added the python label Dec 7, 2022
@benfred benfred added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Dec 7, 2022
>>>
... n_probes=20,
... lut_dtype=np.float16,
... internal_distance_dtype=np.float32
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be cp.float16 and cp.float32? The numpy import might not be necessary

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cupy has been purposefully left out of the list of dependencies for RAFT because cupy actually depends on RAFT. This is why we just assume numpy, since it's likely to already be installed in the user's environment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, we do use cupy in the usage examples, but we could just as easily use PyTorch or Numba in the examples. We provide a "device_ndarray" object to keep RAFT itself dependency agnostic but interoperable. I think we should build a notebook for this very topic at some point soon.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cupy is already being used in this example snippet - including for dtypes (on line 621 for instance). I've updated in the last commit to use cupy here to be consistent.

I think putting together a notebook showing interop with other frameworks like pytorch/tensorflow/jax is a great idea (or even interop with just numpy without using cupy) - but is out of scope of this PR

Copy link
Member

@cjnolet cjnolet Dec 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cupy is already being used in this example snippet - including for dtypes (on line 621 for instance). I've updated in the last commit to use cupy here to be consistent.

I missed the detail that this comment was made on a usage example in my initial reply to @lowener. However, it got me thinking more about it and now I'm now confused how these tests are passing without cupy installed. I don't see it being installed in anywhere explicitly in our ci scripts or conda environments but I do see it in the CI logs for the gpu test tasks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it must already be packaged in the docker container CI is using.

Copy link
Member

@cjnolet cjnolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great to me! I'm excited to be testing these.

Once we add some notebooks to the repo, we should also look at how cuml and cugraph (and I believe cudf?) are converting those to python scripts and testing them in CI.

>>> X = cp.random.random_sample((n_samples, n_features),
>>> dtype=cp.float32)
>>>
... dtype=cp.float32)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this is good to know. So this is how we represent continuation of previous line in the pydocs.

>>>
... n_probes=20,
... lut_dtype=np.float16,
... internal_distance_dtype=np.float32
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, we do use cupy in the usage examples, but we could just as easily use PyTorch or Numba in the examples. We provide a "device_ndarray" object to keep RAFT itself dependency agnostic but interoperable. I think we should build a notebook for this very topic at some point soon.

@benfred
Copy link
Member Author

benfred commented Dec 7, 2022

Once we add some notebooks to the repo, we should also look at how cuml and cugraph (and I believe cudf?) are converting those to python scripts and testing them in CI.

For testing notebooks, I think that cudf uses some bash script's to run notebooks using ipython https://github.com/rapidsai/cudf/blob/branch-23.02/ci/utils/nbtest.sh .

Alternatively- the Merlin team is currently using testbook to test out their notebooks - and it seems to work pretty well.

@cjnolet
Copy link
Member

cjnolet commented Dec 7, 2022

Alternatively- the Merlin team is currently using testbook to test out their notebooks - and it seems to work pretty well.

That looks perfect!

@cjnolet
Copy link
Member

cjnolet commented Dec 7, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 2dd4860 into rapidsai:branch-23.02 Dec 7, 2022
@benfred benfred deleted the doctest branch December 7, 2022 20:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function non-breaking Non-breaking change python
Projects
Development

Successfully merging this pull request may close these issues.

[FEA] Use doctest for testing python docstrings
3 participants