
ENH add from_cv_results in RocCurveDisplay (single RocCurveDisplay) #30399

Draft
wants to merge 34 commits into base: main
Conversation

@lucyleeow (Member) commented Dec 3, 2024

Reference Issues/PRs

Supersedes #25939

This is part of a group of draft PRs to determine the best API for adding plots of cv results to our displays.

For all 3 options we take the output of cross_validate and use the fitted estimators and test indices. No fitting is done in the display.

We do recalculate the predictions (which would have already been done in cross_validate), which could be avoided if we decided to change cross_validate to optionally return the predictions as well (note this would make cross_val_predict redundant).
See this thread for more: #25939 (comment). I think that should be outside the scope of this body of work though.
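For orientation, a minimal usage sketch of the approach described above (the exact `from_cv_results` signature is still under discussion in this PR, so treat the last call as illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import RocCurveDisplay
from sklearn.model_selection import cross_validate

X, y = make_classification(n_samples=500, random_state=0)

# cross_validate provides the fitted estimators and the test indices;
# the display itself does no fitting.
cv_results = cross_validate(
    LogisticRegression(), X, y, cv=5, return_estimator=True, return_indices=True
)

# Proposed in this PR: a single display holding one ROC curve per CV fold.
display = RocCurveDisplay.from_cv_results(cv_results, X, y)
```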

What does this implement/fix? Explain your changes.

Not 100% sure I've implemented this optimally.

  • RocCurveDisplay object may contain data (fpr/tpr etc.) for a single curve or multiple curves
  • RocCurveDisplay returns a single mpl Artist object, or a list of objects for multiple curves
  • RocCurveDisplay.plot handles both single and multi-curve plotting; this has meant a lot more checking is required (cf. the other 2 implementations, as this is the only option where you can use plot directly to plot multiple curves)

More specific concerns are detailed in the review comments.

Plot looks like:

[image: example ROC curve plot]

TODO

We should update visualization.rst after this PR is in to add a section about from_cv_results.

github-actions bot commented Dec 3, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 62c3f32. Link to the linter CI: here

Comment on lines 30 to 33
if n_multi is None:
    name = self.estimator_name if name is None else name
else:
    name = [f"{curve_type} fold {curve_idx}:" for curve_idx in range(n_multi)]
@lucyleeow (Member, Author):

If we go with this implementation, I thought this change could be used for other multi cv displays. Not 100% sure on this change though.

@jeremiedbb (Member):

As discussed in today's meeting, this is my favorite solution because it's the simplest and least surprising one from a user point of view, even though it adds a bit more internal complexity than the others. And I think we can mitigate some of it by extracting parts of the plot code into dedicated _plot_single and _plot_multiple methods. Or just into small helpers; that would already help readability.

It also looks like a good portion of the added complexity will be exactly the same for other displays like PRCurveDisplay, so there might be a chance that we'll be able to factorize some parts to be used by several displays.
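For illustration, a rough structural sketch of that suggestion (the method names `_plot_single` and `_plot_multiple` are only the ones floated above, not actual scikit-learn API):

```python
class RocCurveDisplay:
    def plot(self, ax=None, *, name=None, **kwargs):
        # Dispatch to a single- or multi-curve helper so `plot` stays readable.
        if isinstance(self.fpr, list):
            self.line_ = self._plot_multiple(ax=ax, name=name, **kwargs)
        else:
            self.line_ = self._plot_single(ax=ax, name=name, **kwargs)
        return self

    def _plot_single(self, *, ax, name, **kwargs):
        """Plot one ROC curve and return a single matplotlib Line2D."""

    def _plot_multiple(self, *, ax, name, **kwargs):
        """Plot one ROC curve per fold and return a list of Line2D objects."""
```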

@lucyleeow (Member, Author):

The changes in f0908e1 and 7e77d4c factorize out common code (compared to #30508), adding helper functions to either _BinaryClassifierCurveDisplayMixin (if the function is relevant to other binary displays) or sklearn/utils/_plotting.py (if the function is more generally applicable to more displays - these could potentially become parent class methods?)

)
# Add separate test for displays that have converted to names?,
# add note to remove this one in 1.9
@pytest.mark.parametrize("Display", [DetCurveDisplay, PrecisionRecallDisplay])
@lucyleeow (Member, Author):

I was thinking that I could add a separate test that uses names (instead of estimator_name), so we can delete this test once we've switched all these displays across to names?
This avoids complicated if/else inside this test.
WDYT?

Member:

Yes I agree.

@@ -302,7 +304,7 @@ def test_plot_roc_curve_pos_label(pyplot, response_method, constructor_name):
classifier = LogisticRegression()
@lucyleeow (Member, Author):

I thought about adding from_cv_results to this test (test_plot_roc_curve_pos_label) BUT I see we create an imbalanced training set, and pass a separate test set to from_predictions and from_estimator, which we could not do with from_cv_results, so it feels like it would miss the spirit of this test. I think we should add a separate test for pos_label with from_cv_results, WDYT?

Member:

In general, I don't mind having a separate test for from_cv_results because we do much more stuff inside and the test might end up with an if/else. So let's define a dedicated test instead.

Comment on lines +307 to 308
# sanity check to be sure the positive class is `classes_[0]` and that we
# are betrayed by the class imbalance
@lucyleeow (Member, Author):

@glemaitre I don't quite follow this comment?

Member:

So here, we have string labels and the positive class is "cancer", which corresponds to classes_[0].
So we are in a non-default scenario (classes_[1] is usually the positive class) and thus we can verify that everything works as expected: get_response_values returns a probability of 1 - p(classes_[1]) (or returns the probability of classes_[0]), or negates the decision function.

@lucyleeow (Member, Author):

But what does "are betrayed by the class imbalance" mean? How does class imbalance affect pos_label?

@DeaMariaLeon (Contributor):

As a (not so educated) user, I wonder if it wouldn't be better to add different colours to each plot line? (They all look blue now.) I see that it was decided to do that here: #30508 (comment).

@@ -41,10 +55,44 @@ class RocCurveDisplay(_BinaryClassifierCurveDisplayMixin):

.. versionadded:: 0.24

fpr : ndarray or list of ndarray
Contributor:

Does fpr also take a list of ndarray? I think that it gets converted to a list in _deprecate_singular. As a "user" I would be a bit confused there, I think.

@lucyleeow (Member, Author):

That was a copy pasta! Fixed now.

These are deprecated now - they'll have the red deprecated box (via the .. deprecated), which I am hoping will be enough.

If the user does use it, they will get a warning and what they input will be put into a list and passed on. This is all stated in the warning. I thought about adding it to the docstring here as well, but thought it wasn't necessary and would make the parameter descriptions unnecessarily long and complicated. Open to other opinions though!

@jeremiedbb (Member):

I checked the choices that were made in terms of parameter naming in the code base when we accept a single value or a list of values, and in most cases (not all though) the singular name was kept. So I don't think that we need to make the parameter names plural and go through a deprecation cycle. I don't remember where this discussion was happening and might have missed something though.

(more comments regarding the rest of the PR soon 😄)

@lucyleeow (Member, Author):

I'm happy to keep the singular names, it prevents a deprecation!

You would allow both single ndarray and list of ndarray input, right?

@lucyleeow (Member, Author):

(FYI I'll wait for the rest of your comments and make changes, and then add tests, thanks!)

@glemaitre (Member):

I'll give another review on this one. @jeremiedbb do you have additional feedback for @lucyleeow?

@lucyleeow (Member, Author) commented Feb 13, 2025

I've amended the PR to keep all singular parameter names, which can each take a single ndarray/str etc. or a list of such.

The attributes are processed such that they are always a list or None, and are named with a trailing underscore, e.g., self.tpr_.

This has avoided a lot of deprecation, which is nice!
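For illustration, a plausible shape for the small helper doing this normalization (the name `_convert_to_list_leaving_none` is the one used later in this thread; the body below is an assumption based on how it is described here, not the PR's actual code):

```python
def _convert_to_list_leaving_none(param):
    """Wrap a single value in a list; pass lists and None through unchanged."""
    if param is None:
        return None
    if isinstance(param, list):
        return param
    return [param]
```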

Edit - I've realised that maybe I should not have renamed these attributes, or at least added a getter so that self.tpr returns self.tpr_ ...?

@@ -368,7 +423,7 @@ def from_predictions(
error will be raised.

name : str, default=None
Name of ROC curve for labeling. If `None`, name will be set to
Name of ROC curve for legend labeling. If `None`, name will be set to
`"Classifier"`.
@lucyleeow (Member, Author):

I am wondering if this is still a good default, to name things "Classifier". Not sure if it provides more info; the legend label could just have the AUC value (or no legend).

Comment on lines 186 to 188
# TODO: Not sure about this, as ideally we would check params are correct
# first??
self.ax_, self.figure_, name_ = self._validate_plot_params(ax=ax, name=name)
@lucyleeow (Member, Author):

Ideally we do parameter checking first, but I think this is okay...?

Comment on lines 115 to 118
self.fpr_ = _convert_to_list_leaving_none(fpr)
self.tpr_ = _convert_to_list_leaving_none(tpr)
self.roc_auc_ = _convert_to_list_leaving_none(roc_auc)
self.name_ = _convert_to_list_leaving_none(name)
Member:

Since we only need those in plot, we can avoid the _ and delay the conversion to plot itself.

@lucyleeow (Member, Author):

Thanks @glemaitre, changes made. We process in plot now and have both e.g., self.tpr and self.tpr_

Comment on lines +28 to +29
For more about the ROC metric, see :ref:`roc_metrics`.
For more about scikit-learn visualization classes, see :ref:`visualizations`.
@lucyleeow (Member, Author):

This adds references to the user guide sections on ROC metrics and visualization classes.

I initially wanted to follow the 'Read more in our user guide' line that we use everywhere else, but since we are referencing 2 places in the user guide, I thought it would be okay to change it for this case, but I am not 100% sure. Happy to amend.

@glemaitre self-requested a review February 24, 2025 13:18
and `tpr`. If `None`, no area under ROC curve score is shown. If `name`
is also `None` no legend is added.

name : str or list of str, default=None
Member:

Thinking again, it is a breaking change. So better to deprecate it. However, we don't have the same constraint as for the estimator so we can do it in __init__. I think that we can create a small function because the same deprecation might happen in plot.

@lucyleeow (Member, Author):

I have forgotten where we were up to in our discussions on this. Are we happy with the term 'name'? I think it is reasonable.

And yes, it is a breaking change, so we should deprecate estimator_name and add name, right? And I can add a function that errors if both params are given and warns if the deprecated one is.
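Something along these lines, sketched here only to make the idea concrete (the helper name, warning category and version string are assumptions, not the PR's actual code):

```python
import warnings


def _deprecate_estimator_name(estimator_name, name, version="1.7"):
    """Resolve the deprecated `estimator_name` parameter against `name`."""
    if estimator_name is not None and name is not None:
        raise ValueError(
            "Cannot provide both `estimator_name` and `name`; "
            "`estimator_name` is deprecated, use `name` only."
        )
    if estimator_name is not None:
        warnings.warn(
            f"`estimator_name` is deprecated in {version} and will be removed; "
            "use `name` instead.",
            FutureWarning,
        )
        return estimator_name
    return name
```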

fpr,
tpr,
roc_auc=None,
name=None,
Member:

Let's be gentle and deprecate it as well.

@glemaitre (Member) left a comment:

OK here is a bunch of comments.

@@ -129,25 +164,48 @@ def plot(

.. versionadded:: 1.6

fold_line_kwargs : dict or list of dict, default=None
Member:

In the long term, I'm thinking that we could come up with a new API to specify those. For instance:

display = Display.from_estimator().set_style(chance_level_kw={"color": "tab:blue"})
display.plot()
display.plot(chance_level_kw={"color": "tab:blue"})  # overwrite the one set by `set_style`

It could be handy because you would not need to pass the style each time. But let's discuss in another PR.

@lucyleeow (Member, Author):

Sorry I do not follow this.

In the example above, do you mean fold_line_kwargs instead of chance_level_kw?

If you mean chance_level_kw, there is only one per plot, so you only need to pass this once?

If you mean fold_line_kwargs, you can just pass one set of kwargs and fold_line_kwargs will apply it to all curves. Unless you mean we want to pass some default kwargs that apply to all curves, and then also some specific ones (e.g., colour) that should be applied to each curve?
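To make the two patterns concrete, a hedged sketch (`fold_line_kwargs` is the parameter proposed in this PR, and the accepted forms may still change):

```python
import matplotlib.pyplot as plt

# One dict: the same style is applied to every per-fold curve.
fold_line_kwargs = {"alpha": 0.4, "linestyle": "--"}

# One dict per fold: per-curve styling, e.g. a different colour for each fold.
n_folds = 5
colors = plt.cm.tab10.colors[:n_folds]
fold_line_kwargs = [{"color": color, "alpha": 0.8} for color in colors]
```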



# TODO(1.9): remove
# Should this be a parent class method?
Member:

I think it is fine to have it here because we will need it across different displays.

self.ax_, self.figure_, self.name_ = self._validate_plot_params(
    ax=ax, name=name
)
self.name_ = _convert_to_list_leaving_none(self.name_)
Member:

I would do that in the _validate_plot_params

@lucyleeow (Member, Author):

I think this is easier but will require a 'not so pretty':

if self.__class__.__name__ in (<classes requiring name_>):
    self.name_ = _convert_to_list_leaving_none(name)

Comment on lines +187 to +189
self.fpr_ = _convert_to_list_leaving_none(self.fpr)
self.tpr_ = _convert_to_list_leaving_none(self.tpr)
self.roc_auc_ = _convert_to_list_leaving_none(self.roc_auc)
Member:

Potentially, I would think that we need to do that in _validate_plot_params, which we could modify in each class, e.g.:

def _validate_plot_params(self, ax=ax, name=name, fpr=self.fpr, ...):
    ax, fig, name = super()._validate_plot_params(ax=ax, name=name)
    fpr = ...
    return ax, fig, name, fpr, ...

@lucyleeow (Member, Author):

The problem is that _BinaryClassifierCurveDisplayMixin is used for many display classes, not just the ROC curve.

We could have a **kwargs at the end, where all these parameters are validated with _convert_to_list_leaving_none, BUT we have to set e.g., self.tpr_ and I don't know how you would do this...?
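One way it could be done, as a self-contained sketch (the class and helper names below are placeholders, not the PR's code; the helper mimics the assumed behaviour of _convert_to_list_leaving_none):

```python
class _CurveDisplayMixinSketch:
    @staticmethod
    def _to_list_or_none(param):
        # assumed behaviour of _convert_to_list_leaving_none
        if param is None:
            return None
        return param if isinstance(param, list) else [param]

    def _validate_plot_params(self, **curve_params):
        # Each display passes its own curve parameters (fpr, tpr, roc_auc, ...);
        # the mixin stores the normalized values as trailing-underscore
        # attributes, e.g. fpr -> self.fpr_, tpr -> self.tpr_.
        for param_name, value in curve_params.items():
            setattr(self, f"{param_name}_", self._to_list_or_none(value))
```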

ax=ax, name=name
)
self.name_ = _convert_to_list_leaving_none(self.name_)
_check_param_lengths(
Member:

I would also include this check in the _validate_plot_params

y_true,
y_pred,
pos_label=pos_label,
sample_weight=sample_weight,
Member:

I would say that there is a bug here (but we don't have the test yet). We should split sample_weight to get the testing portion as well. So the above `_validate_from_predictions_params` is indeed useful :)

@lucyleeow (Member, Author):

_validate_from_predictions_params only checks that sample_weight is the same length as y_true and y_pred.
But you are right, this is a bug and I should be using _safe_indexing.

I think the problem is that _validate_from_predictions_params and _validate_and_get_response_values are very specific to the from_estimator and from_predictions methods.

Maybe it is a good idea to add a _validate_cv_result_params method?
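For reference, a sketch of the fix being discussed (placeholder data; _safe_indexing is a private scikit-learn utility, and where exactly this lands inside from_cv_results is still to be decided):

```python
import numpy as np
from sklearn.utils import _safe_indexing

sample_weight = np.ones(100)       # placeholder values for the sketch
test_indices = np.arange(50, 100)  # e.g. cv_results["indices"]["test"][fold]

# Take only the test portion of sample_weight, mirroring what is done for X/y
# inside the per-fold loop of from_cv_results.
sample_weight_fold = (
    None if sample_weight is None else _safe_indexing(sample_weight, test_indices)
)
```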

drop_intermediate=drop_intermediate,
)
roc_auc = auc(fpr, tpr)
# Append all
Member:

Suggested change: remove the `# Append all` comment.


@glemaitre (Member):

But overall it looks quite good, I think. @jeremiedbb do you have additional remarks?
