Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH add from_cv_results in RocCurveDisplay (single RocCurveDisplay) #30399

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

lucyleeow
Copy link
Member

Reference Issues/PRs

Reference Issues/PRs

Supercedes #25939

This is part of a group of draft PRs to determine best API for adding plots for cv results to our displays.

For all 3 options we take the output of cross_validate, and use the fitted estimator and test indicies. No fitting is done in the display.

We do recalculate the predictions (which would have already been done in cross_validate), which could be avoided if we decided to change cross_validate to optionally return the predictions as well (note this would make cross_val_predict redundant).
See more thread: #25939 (comment)). I think should be outside of the scope of this body of work though.

What does this implement/fix? Explain your changes.

Not 100% I've implemented this optimally.

  • RocCurveDisplay object may contain data (fpr/tpf etc) for single or multi curves
  • RocCurveDisplay returns single mpl Artist object, or list of objects for multi curves
  • RocCurveDisplay.plot handles both single and multi-curve plotting, this has meant a lot more checking is required (c.f. the other 2 implementations, as this is the only case where you can use plot directly to plot multi-curves)

More specific concerns detailed in review comments

Plot looks like:

image

Copy link

github-actions bot commented Dec 3, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: f4f9a98. Link to the linter CI: here

Comment on lines +701 to +703
fpr_all.append(fpr)
tpr_all.append(tpr)
auc_all.append(roc_auc)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There may be a better way to do this?

Copy link
Member

@jeremiedbb jeremiedbb Dec 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could do something like

fpr_tpr_auc = []

for ...:
    fpr_tpr_auc.append((fpr, tpr, auc))

fpr_all, tpr_all, auc_all = zip(*fpr_tpr_auc)

but I don't know it's really a "better" way :)

Comment on lines +30 to +33
if n_multi is None:
name = self.estimator_name if name is None else name
else:
name = [f"{curve_type} fold {curve_idx}:" for curve_idx in range(n_multi)]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we go with this implementation, I thought this change could be used for other multi cv displays. Not 100% sure on this change though.

@jeremiedbb
Copy link
Member

As discussed in today's meeting, this is my favorite solution because it's the simplest and least surprising one from a user point of view, even though it adds a bit more internal complexity than the others. And I think we can mitigate some of it by extracting parts of the plot code into dedicated _plot_single and _plot_multiple methods. Or just into small helpers, that would already help readability.

It also looks like a good portion of the added complexity will be exactly the same for other displays like PRCurveDisplay, so there might be a chance that we'll be able to factorize some parts to be used by several displays.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants