ENH add `from_cv_results` in `RocCurveDisplay` (single `RocCurveDisplay`) #30399
base: main
Conversation
sklearn/utils/_plotting.py
Outdated
```python
if n_multi is None:
    name = self.estimator_name if name is None else name
else:
    name = [f"{curve_type} fold {curve_idx}:" for curve_idx in range(n_multi)]
```
If we go with this implementation, I thought this change could be used for other multi cv displays. Not 100% sure on this change though.
As discussed in today's meeting, this is my favorite solution because it's the simplest and least surprising one from a user point of view, even though it adds a bit more internal complexity than the others. And I think we can mitigate some of it by extracting parts into helper functions. It also looks like a good portion of the added complexity will be exactly the same for other displays like PRCurveDisplay, so there might be a chance that we'll be able to factorize some parts to be used by several displays.
The changes in f0908e1 and 7e77d4c factorize out common code (compared to #30508), adding helper functions.
```python
)
# Add separate test for displays that have converted to names?,
# add note to remove this one in 1.9
@pytest.mark.parametrize("Display", [DetCurveDisplay, PrecisionRecallDisplay])
```
I was thinking that I could add a separate test that uses `names` (instead of `estimator_name`), so we can delete this test once we've switched all these estimators across to `names`? This avoids complicated if/else inside this test. WDYT?
Yes I agree.
```
@@ -302,7 +304,7 @@ def test_plot_roc_curve_pos_label(pyplot, response_method, constructor_name):
    classifier = LogisticRegression()
```
I thought about adding `from_cv_results` to this (`test_plot_roc_curve_pos_label`) test BUT I see we create an imbalanced training set, and pass a separate test set to `from_predictions` and `from_estimator`, which we could not do with `from_cv_results`, so it feels like it would miss the spirit of this test. I think we should add a separate test for `pos_label` with `from_cv_results`, WDYT?
In general, I don't mind having a separate test for `from_cv_results` because we do much more stuff inside and the test might end up with an if/else. So instead let's define a dedicated test.
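To make that concrete, here is a hedged sketch of what the dedicated test could look like (the `from_cv_results` signature, the `roc_auc_` attribute, and the label mapping are assumptions based on this PR, not settled API):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import RocCurveDisplay
from sklearn.model_selection import cross_validate


def test_roc_curve_from_cv_results_pos_label(pyplot):
    X, y = load_breast_cancer(return_X_y=True)
    # string labels so that the positive class is not the default `classes_[1]`
    y = np.array(["cancer" if label == 0 else "not cancer" for label in y])
    cv_results = cross_validate(
        LogisticRegression(max_iter=5000), X, y, cv=3,
        return_estimator=True, return_indices=True,
    )
    # hypothetical signature under discussion in this PR
    display = RocCurveDisplay.from_cv_results(cv_results, X, y, pos_label="cancer")
    # with the correct pos_label, every fold's curve should beat chance
    for roc_auc in display.roc_auc_:
        assert roc_auc > 0.5
```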
```python
# sanity check to be sure the positive class is `classes_[0]` and that we
# are betrayed by the class imbalance
```
@glemaitre I don't quite follow this comment?
So here, we have string labels and the positive class is `"cancer"`, which corresponds to `classes_[0]`. So we are in a non-default scenario (`classes_[1]` is usually the positive class) and thus we can verify that everything works as expected: `get_response_value` returns a probability of `1 - p(classes_[1])` (or returns the probability of `classes_[0]`), or negates the decision function.
But what does "are betrayed by the class imbalance" mean? How does class imbalance affect `pos_label`?
As a (not so educated) user, I wonder if it wouldn't be better to add different colours to each plot line? (they all look blue now). I see that it was decided to do that here: #30508 (comment).
sklearn/metrics/_plot/roc_curve.py
Outdated
```
@@ -41,10 +55,44 @@ class RocCurveDisplay(_BinaryClassifierCurveDisplayMixin):

        .. versionadded:: 0.24

    fpr : ndarray or list of ndarray
```
Does `fpr` also take a list of ndarray? I think that it gets converted to a list in `_deprecate_singular`. As a "user" I would be a bit confused there, I think.
That was a copy pasta! Fixed now.
These are deprecated now - they'll have the red deprecated box (via the `.. deprecated::` directive), which I am hoping will be enough. If the user does use it, they will get a warning and what they input will be put into a list and passed. This is all stated in the warning. I thought about adding it to the docstring here as well, but thought it wasn't necessary and would make the parameter descriptions unnecessarily long and complicated. Open to other opinions though!
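For illustration, a minimal sketch of the behaviour described above (only the name `_deprecate_singular` comes from this thread; the body is my assumption):

```python
import warnings


def _deprecate_singular(singular_value, plural_value, singular_name):
    """Warn about a deprecated singular parameter and wrap its value in a list."""
    if singular_value is not None:
        warnings.warn(
            f"`{singular_name}` is deprecated; its value was wrapped in a list "
            "and passed on. Pass a list directly instead.",
            FutureWarning,
        )
        return [singular_value]
    return plural_value
```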
I checked the choices that were made in terms of parameter naming in the code base when we accept a single value or a list of values, and in most cases (not all though) the singular name was kept. So I don't think that we need to make the parameter names plural and go through a deprecation cycle. I don't remember where this discussion was happening and might have missed something though. (more comments regarding the rest of the PR soon 😄)
I'm happy to keep the singular names; it prevents deprecation! You would allow both single ndarray and list of ndarray input, right?
(FYI I'll wait for the rest of your comments and make changes, and then add tests, thanks!)
I'll give another review on this one. @jeremiedbb do you have additional feedback for @lucyleeow?
I've amended the PR to keep all singular parameter names, which can take a single ndarray/str etc or a list of such. The attributes are processed such that they are always a list or `None`. This has avoided a lot of deprecation, which is nice! Edit - I've realised that maybe I should not have renamed these attributes, or at least added a getter so that the old attribute names still work.
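A minimal sketch of the conversion helper this implies (the name `_convert_to_list_leaving_none` appears in the diffs below; the body here is my assumption):

```python
def _convert_to_list_leaving_none(param):
    """Wrap a single value in a list; pass lists and `None` through unchanged."""
    if param is None or isinstance(param, list):
        return param
    return [param]
```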
```
@@ -368,7 +423,7 @@ def from_predictions(
        error will be raised.

    name : str, default=None
-       Name of ROC curve for labeling. If `None`, name will be set to
+       Name of ROC curve for legend labeling. If `None`, name will be set to
        `"Classifier"`.
```
I am wondering if this is still a good default, to name things "Classifier". Not sure if it provides more info; the legend label could just have the AUC value (or no legend).
sklearn/metrics/_plot/roc_curve.py
Outdated
```python
# TODO: Not sure about this, as ideally we would check params are correct
# first??
self.ax_, self.figure_, name_ = self._validate_plot_params(ax=ax, name=name)
```
Ideally we do parameter checking first, but I think this is okay...?
sklearn/metrics/_plot/roc_curve.py
Outdated
```python
self.fpr_ = _convert_to_list_leaving_none(fpr)
self.tpr_ = _convert_to_list_leaving_none(tpr)
self.roc_auc_ = _convert_to_list_leaving_none(roc_auc)
self.name_ = _convert_to_list_leaving_none(name)
```
Since we only need those in `plot`, we can avoid the `_` and delay the conversion to `plot` itself.
Thanks @glemaitre, changes made. We now process these in `plot`.
```
For more about the ROC metric, see :ref:`roc_metrics`.
For more about scikit-learn visualization classes, see :ref:`visualizations`.
```
This adds references to the user guide sections on ROC metrics and visualization classes. I initially wanted to follow the 'Read more in our user guide' line that we use everywhere else, but since we are referencing two places in the user guide, I thought it would be okay to change for this case, but I am not 100% sure. Happy to amend.
```
    and `tpr`. If `None`, no area under ROC curve score is shown. If `name`
    is also `None`, no legend is added.

name : str or list of str, default=None
```
Thinking again, it is a breaking change. So better to deprecate it. However, we don't have the same constraint as for the estimator, so we can do it in `__init__`. I think that we can create a small function because the same deprecation might happen in `plot`.
I have forgotten where we were up to in our discussions with this. Are we happy with the term 'name'? I think it is reasonable. And yes, it is a breaking change, so we should deprecate `estimator_name` and add `name`, right? And I can add a function that errors if both params are given and warns if the deprecated one is.
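A hedged sketch of such a function (the name, message wording, and warning class are my assumptions, not necessarily what this PR implements):

```python
import warnings


def _deprecate_estimator_name(estimator_name, name):
    """Resolve the deprecated `estimator_name` against the new `name` parameter."""
    if estimator_name is not None:
        if name is not None:
            raise ValueError(
                "Cannot provide both `estimator_name` and `name`; "
                "`estimator_name` is deprecated, use `name` only."
            )
        warnings.warn(
            "`estimator_name` is deprecated and will be removed; "
            "use `name` instead.",
            FutureWarning,
        )
        return estimator_name
    return name
```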
```python
fpr,
tpr,
roc_auc=None,
name=None,
```
Let's be gentle and deprecate it as well.
OK here is a bunch of comments.
```
@@ -129,25 +164,48 @@ def plot(

        .. versionadded:: 1.6

    fold_line_kwargs : dict or list of dict, default=None
```
In the long term, I'm thinking that we could come up with a new API to specify those. For instance:

```python
display = Display.from_estimator(...).set_style(chance_level_kw={"color": "tab:blue"})
display.plot()
display.plot(chance_level_kw={"color": "tab:blue"})  # overwrite the one set by `set_style`
```

It could be handy because you would not need to pass the style each time. But let's discuss in another PR.
Sorry, I do not follow this. In the example above, do you mean `fold_line_kwargs` instead of `chance_level_kw`? If you mean `chance_level_kw`, there is only one per plot, so you only need to pass this once? If you mean `fold_line_kwargs`, you can just pass one set of kwargs and `fold_line_kwargs` will apply it to all curves. Unless you mean we want to pass some default kwargs that apply to all curves, and then also some specific ones (e.g., colour) that should be applied to each curve?
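For concreteness, a sketch of the broadcasting behaviour I describe (the helper name and error message are mine, not from the PR):

```python
def _broadcast_fold_line_kwargs(fold_line_kwargs, n_curves):
    """Return one kwargs dict per curve from a single dict or a list of dicts."""
    if fold_line_kwargs is None:
        return [{}] * n_curves
    if isinstance(fold_line_kwargs, dict):
        # a single dict is applied to every curve
        return [fold_line_kwargs] * n_curves
    if len(fold_line_kwargs) != n_curves:
        raise ValueError(
            f"Got {len(fold_line_kwargs)} dicts for {n_curves} curves; "
            "pass a single dict or one dict per curve."
        )
    return list(fold_line_kwargs)
```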
```python
# TODO(1.9): remove
# Should this be a parent class method?
```
I think it is fine to have it here because we will need it between different displays.
```python
self.ax_, self.figure_, self.name_ = self._validate_plot_params(
    ax=ax, name=name
)
self.name_ = _convert_to_list_leaving_none(self.name_)
```
I would do that in `_validate_plot_params`.
I think this is easier but will require a 'not so pretty':

```python
if self.__class__.__name__ in (<classes requiring name_>):
    self.name_ = _convert_to_list_leaving_none(self.name_)
```
```python
self.fpr_ = _convert_to_list_leaving_none(self.fpr)
self.tpr_ = _convert_to_list_leaving_none(self.tpr)
self.roc_auc_ = _convert_to_list_leaving_none(self.roc_auc)
```
Potentially, I would think that we need to do that in `_validate_plot_params`, where in each class we could modify it, e.g.:

```python
def _validate_plot_params(self, *, ax=None, name=None, fpr=None, ...):
    ax, fig, name = super()._validate_plot_params(ax=ax, name=name)
    fpr = ...
    return ax, fig, name, fpr, ...
```
The problem is that `_BinaryClassifierCurveDisplayMixin` is used for many display classes, not just the ROC curve. We could have a `**kwargs` at the end, where all these parameters are validated with `_convert_to_list_leaving_none`, BUT we would have to set e.g. `self.tpr_`, and I don't know how you would do this...?
```python
    ax=ax, name=name
)
self.name_ = _convert_to_list_leaving_none(self.name_)
_check_param_lengths(
```
I would also include this check in `_validate_plot_params`.
```python
y_true,
y_pred,
pos_label=pos_label,
sample_weight=sample_weight,
```
I would say that there is a bug here (but we don't have the test yet). We should split `sample_weight` to get the testing portion as well. So the above `_validate_from_predictions_params` is indeed useful :)
`_validate_from_predictions_params` only checks that `sample_weight` is the same length as `y_true` and `y_pred`. But you are right, this is a bug and I should be using `_safe_indexing`. I think the problem is that `_validate_from_predictions_params` and `_validate_and_get_response_values` are very specific to the `from_estimator` and `from_predictions` methods. Maybe it is a good idea to add a `_validate_cv_result_params` method?
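A sketch of the fix being discussed, assuming `cv_results` carries the test indices as returned by `cross_validate(..., return_indices=True)` (the helper name `_fold_test_data` is hypothetical):

```python
from sklearn.utils import _safe_indexing


def _fold_test_data(cv_results, y, sample_weight=None):
    """Yield (estimator, y_true, sample_weight) restricted to each fold's test split."""
    for estimator, test_indices in zip(
        cv_results["estimator"], cv_results["indices"]["test"]
    ):
        fold_sample_weight = (
            None
            if sample_weight is None
            else _safe_indexing(sample_weight, test_indices)
        )
        yield estimator, _safe_indexing(y, test_indices), fold_sample_weight
```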
```python
    drop_intermediate=drop_intermediate,
)
roc_auc = auc(fpr, tpr)
# Append all
```
Suggested change: drop the `# Append all` comment.
But overall it looks quite good I think. @jeremiedbb do you have additional remarks?
Reference Issues/PRs
Supersedes #25939

This is part of a group of draft PRs to determine the best API for adding plots for cv results to our displays.

- `from_cv_results` in `RocCurveDisplay` (Multi-display) #30359

For all 3 options we take the output of `cross_validate`, and use the fitted estimator and test indices. No fitting is done in the display. We do recalculate the predictions (which would have already been done in `cross_validate`), which could be avoided if we decided to change `cross_validate` to optionally return the predictions as well (note this would make `cross_val_predict` redundant). See more in this thread: #25939 (comment). I think this should be outside the scope of this body of work though.
What does this implement/fix? Explain your changes.
Not 100% sure I've implemented this optimally.

- A `RocCurveDisplay` object may contain data (fpr/tpr etc) for single or multi curves
- `RocCurveDisplay` returns a single mpl Artist object, or a list of objects for multi curves
- `RocCurveDisplay.plot` handles both single and multi-curve plotting; this has meant a lot more checking is required (c.f. the other 2 implementations, as this is the only case where you can use `plot` directly to plot multi curves)

More specific concerns are detailed in the review comments.
Plot looks like: *(screenshot not reproduced here)*
TODO

We should update `visualization.rst` after this PR is in, to add a section about `from_cv_results`.