
ENH add from_cv_results in RocCurveDisplay (single RocCurveDisplay) #30399

Draft
wants to merge 34 commits into base: main
Conversation

@lucyleeow (Member) commented Dec 3, 2024

Reference Issues/PRs

Supersedes #25939

This is part of a group of draft PRs to determine the best API for adding plots of cv results to our displays.

For all 3 options we take the output of cross_validate and use the fitted estimators and test indices. No fitting is done in the display.

We do recalculate the predictions (which would have already been done in cross_validate), which could be avoided if we decided to change cross_validate to optionally return the predictions as well (note this would make cross_val_predict redundant).
See this thread for more: #25939 (comment). I think that should be outside the scope of this body of work though.
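For orientation, a minimal usage sketch of the approach described above (the exact `from_cv_results` signature is still under discussion in this PR, so treat the last call as illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import RocCurveDisplay
from sklearn.model_selection import cross_validate

X, y = make_classification(n_samples=500, random_state=0)

# cross_validate provides the fitted estimators and the test indices;
# the display itself does no fitting.
cv_results = cross_validate(
    LogisticRegression(), X, y, cv=5, return_estimator=True, return_indices=True
)

# Proposed in this PR: a single display holding one ROC curve per CV fold.
display = RocCurveDisplay.from_cv_results(cv_results, X, y)
```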

What does this implement/fix? Explain your changes.

Not 100% sure I've implemented this optimally.

  • RocCurveDisplay object may contain data (fpr/tpr etc.) for a single curve or multiple curves
  • RocCurveDisplay returns a single mpl Artist object, or a list of objects for multiple curves
  • RocCurveDisplay.plot handles both single and multi-curve plotting; this has meant a lot more checking is required (cf. the other 2 implementations, as this is the only option where you can use plot directly to plot multiple curves)

More specific concerns are detailed in the review comments.

Plot looks like:

[image: example ROC curve plot]

TODO

We should update visualization.rst after this PR is in to add a section about from_cv_results.

github-actions bot commented Dec 3, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 62c3f32. Link to the linter CI: here

Comment on lines 30 to 33
if n_multi is None:
    name = self.estimator_name if name is None else name
else:
    name = [f"{curve_type} fold {curve_idx}:" for curve_idx in range(n_multi)]
@lucyleeow (Member, Author):

If we go with this implementation, I thought this change could be used for other multi cv displays. Not 100% sure on this change though.

@jeremiedbb (Member):

As discussed in today's meeting, this is my favorite solution because it's the simplest and least surprising one from a user point of view, even though it adds a bit more internal complexity than the others. And I think we can mitigate some of it by extracting parts of the plot code into dedicated _plot_single and _plot_multiple methods. Or just into small helpers; that would already help readability.

It also looks like a good portion of the added complexity will be exactly the same for other displays like PRCurveDisplay, so there might be a chance that we'll be able to factorize some parts to be used by several displays.
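For illustration, a rough structural sketch of that suggestion (the method names `_plot_single` and `_plot_multiple` are only the ones floated above, not actual scikit-learn API):

```python
class RocCurveDisplay:
    def plot(self, ax=None, *, name=None, **kwargs):
        # Dispatch to a single- or multi-curve helper so `plot` stays readable.
        if isinstance(self.fpr, list):
            self.line_ = self._plot_multiple(ax=ax, name=name, **kwargs)
        else:
            self.line_ = self._plot_single(ax=ax, name=name, **kwargs)
        return self

    def _plot_single(self, *, ax, name, **kwargs):
        """Plot one ROC curve and return a single matplotlib Line2D."""

    def _plot_multiple(self, *, ax, name, **kwargs):
        """Plot one ROC curve per fold and return a list of Line2D objects."""
```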

@lucyleeow (Member, Author):

The changes in f0908e1 and 7e77d4c factorize out common code (compared to #30508), adding helper functions to either _BinaryClassifierCurveDisplayMixin (if the function is relevant to other binary displays) or sklearn/utils/_plotting.py (if the function is more generally applicable to more displays - these could potentially become parent class methods?)

)
# Add separate test for displays that have converted to names?,
# add note to remove this one in 1.9
@pytest.mark.parametrize("Display", [DetCurveDisplay, PrecisionRecallDisplay])
@lucyleeow (Member, Author):

I was thinking that I could add a separate test that uses names (instead of estimator_name), so we can delete this test once we've switched all these displays across to names?
This avoids complicated if/else inside this test.
WDYT?

Member:

Yes I agree.

@@ -302,7 +304,7 @@ def test_plot_roc_curve_pos_label(pyplot, response_method, constructor_name):
classifier = LogisticRegression()
@lucyleeow (Member, Author):

I thought about adding from_cv_results to this test (test_plot_roc_curve_pos_label) BUT I see we create an imbalanced training set, and pass a separate test set to from_predictions and from_estimator, which we could not do with from_cv_results, so it feels like it would miss the spirit of this test. I think we should add a separate test for pos_label with from_cv_results, WDYT?

Member:

In general, I don't mind having a separate test for from_cv_results because we do much more stuff inside and the test might end up with an if/else. So let's define a dedicated test instead.

Comment on lines +307 to 308
# sanity check to be sure the positive class is `classes_[0]` and that we
# are betrayed by the class imbalance
@lucyleeow (Member, Author):

@glemaitre I don't quite follow this comment?

Member:

So here, we have string labels and the positive class is "cancer", which corresponds to classes_[0].
So we are in a non-default scenario (classes_[1] is usually the positive class) and thus we can verify that everything works as expected: get_response_values returns a probability of 1 - p(classes_[1]) (or returns the probability of classes_[0]), or negates the decision function.

@lucyleeow (Member, Author):

But what does "are betrayed by the class imbalance" mean? How does class imbalance affect pos_label?

@DeaMariaLeon (Contributor):

As a (not so educated) user, I wonder if it wouldn't be better to add different colours to each plot line? (They all look blue now.) I see that it was decided to do that here: #30508 (comment).

@@ -41,10 +55,44 @@ class RocCurveDisplay(_BinaryClassifierCurveDisplayMixin):

.. versionadded:: 0.24

fpr : ndarray or list of ndarray
Contributor:

Does fpr also take a list of ndarray? I think that it gets converted to a list in _deprecate_singular. As a "user" I would be a bit confused there, I think.

@lucyleeow (Member, Author):

That was a copy pasta! Fixed now.

These are deprecated now - they'll have the red deprecated box (via the .. deprecated), which I am hoping will be enough.

If the user does use it, they will get a warning and what they input will be put into a list and passed on. This is all stated in the warning. I thought about adding it to the docstring here as well, but thought it wasn't necessary and would make the parameter descriptions unnecessarily long and complicated. Open to other opinions though!

@jeremiedbb (Member):

I checked the choices that were made in terms of parameter naming in the code base when we accept a single value or a list of values, and in most cases (not all though) the singular name was kept. So I don't think that we need to make the parameter names plural and go through a deprecation cycle. I don't remember where this discussion was happening and might have missed something though.

(more comments regarding the rest of the PR soon 😄)

@lucyleeow (Member, Author):

I'm happy to keep the singular names, it prevents a deprecation!

You would allow both single ndarray and list of ndarray input, right?

@lucyleeow (Member, Author):

(FYI I'll wait for the rest of your comments and make changes, and then add tests, thanks!)

@glemaitre (Member):

I'll give another review on this one. @jeremiedbb do you have additional feedback for @lucyleeow?

@lucyleeow (Member, Author) commented Feb 13, 2025

I've amended the PR to keep all singular parameter names, which can each take a single ndarray/str etc. or a list of such.

The attributes are processed such that they are always a list or None, and are named with a trailing underscore, e.g., self.tpr_.

This has avoided a lot of deprecation, which is nice!
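For illustration, a plausible shape for the small helper doing this normalization (the name `_convert_to_list_leaving_none` is the one used later in this thread; the body below is an assumption based on how it is described here, not the PR's actual code):

```python
def _convert_to_list_leaving_none(param):
    """Wrap a single value in a list; pass lists and None through unchanged."""
    if param is None:
        return None
    if isinstance(param, list):
        return param
    return [param]
```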

Edit - I've realised that maybe I should not have renamed these attributes, or at least added a getter so that self.tpr returns self.tpr_ ...?

@@ -368,7 +423,7 @@ def from_predictions(
error will be raised.

name : str, default=None
Name of ROC curve for labeling. If `None`, name will be set to
Name of ROC curve for legend labeling. If `None`, name will be set to
`"Classifier"`.
@lucyleeow (Member, Author):

I am wondering if this is still a good default, to name things "Classifier". Not sure if it provides more info; the legend label could just have the AUC value (or no legend).

Comment on lines 186 to 188
# TODO: Not sure about this, as ideally we would check params are correct
# first??
self.ax_, self.figure_, name_ = self._validate_plot_params(ax=ax, name=name)
@lucyleeow (Member, Author):

Ideally we do parameter checking first, but I think this is okay...?

Comment on lines 115 to 118
self.fpr_ = _convert_to_list_leaving_none(fpr)
self.tpr_ = _convert_to_list_leaving_none(tpr)
self.roc_auc_ = _convert_to_list_leaving_none(roc_auc)
self.name_ = _convert_to_list_leaving_none(name)
Member:

Since we only need those in plot, we can avoid the _ and delay the conversion to plot itself.

@lucyleeow (Member, Author):

Thanks @glemaitre, changes made. We process in plot now and have both e.g., self.tpr and self.tpr_

Comment on lines +28 to +29
For more about the ROC metric, see :ref:`roc_metrics`.
For more about scikit-learn visualization classes, see :ref:`visualizations`.
@lucyleeow (Member, Author):

This adds references to the user guide sections on ROC metrics and visualization classes.

I initially wanted to follow the 'Read more in our user guide' line that we use everywhere else, but since we are referencing 2 places in the user guide, I thought it would be okay to change it for this case, but I am not 100% sure. Happy to amend.

@glemaitre self-requested a review February 24, 2025 13:18
and `tpr`. If `None`, no area under ROC curve score is shown. If `name`
is also `None` no legend is added.

name : str or list of str, default=None
Member:

Thinking again, it is a breaking change. So better to deprecate it. However, we don't have the same constraint as for the estimator so we can do it in __init__. I think that we can create a small function because the same deprecation might happen in plot.

@lucyleeow (Member, Author):

I have forgotten where we were up to in our discussions on this. Are we happy with the term 'name'? I think it is reasonable.

And yes, it is a breaking change, so we should deprecate estimator_name and add name, right? And I can add a function that errors if both params are given and warns if the deprecated one is.
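Something along these lines, sketched here only to make the idea concrete (the helper name, warning category and version string are assumptions, not the PR's actual code):

```python
import warnings


def _deprecate_estimator_name(estimator_name, name, version="1.7"):
    """Resolve the deprecated `estimator_name` parameter against `name`."""
    if estimator_name is not None and name is not None:
        raise ValueError(
            "Cannot provide both `estimator_name` and `name`; "
            "`estimator_name` is deprecated, use `name` only."
        )
    if estimator_name is not None:
        warnings.warn(
            f"`estimator_name` is deprecated in {version} and will be removed; "
            "use `name` instead.",
            FutureWarning,
        )
        return estimator_name
    return name
```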

fpr,
tpr,
roc_auc=None,
name=None,
Member:

Let's be gentle and deprecate it as well.

@glemaitre (Member) left a comment:

OK here is a bunch of comments.

@@ -129,25 +164,48 @@ def plot(

.. versionadded:: 1.6

fold_line_kwargs : dict or list of dict, default=None
Member:

In the long term, I'm thinking that we could come up with a new API to specify those. For instance:

display = Display.from_estimator().set_style(chance_level_kw={"color": "tab:blue"})
display.plot()
display.plot(chance_level_kw={"color": "tab:blue"})  # overwrite the one set by `set_style`

It could be handy because you would not need to pass the style each time. But let's discuss in another PR.

@lucyleeow (Member, Author):

Sorry I do not follow this.

In the example above, do you mean fold_line_kwargs instead of chance_level_kw?

If you mean chance_level_kw, there is only one per plot, so you only need to pass this once?

If you mean fold_line_kwargs, you can just pass one set of kwargs and fold_line_kwargs will apply it to all curves. Unless you mean we want to pass some default kwargs that apply to all curves, and then also some specific ones (e.g., colour) that should be applied to each curve?
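To make the two patterns concrete, a hedged sketch (`fold_line_kwargs` is the parameter proposed in this PR, and the accepted forms may still change):

```python
import matplotlib.pyplot as plt

# One dict: the same style is applied to every per-fold curve.
fold_line_kwargs = {"alpha": 0.4, "linestyle": "--"}

# One dict per fold: per-curve styling, e.g. a different colour for each fold.
n_folds = 5
colors = plt.cm.tab10.colors[:n_folds]
fold_line_kwargs = [{"color": color, "alpha": 0.8} for color in colors]
```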



# TODO(1.9): remove
# Should this be a parent class method?
Member:

I think it is fine to have it here because we will need it across different displays.

self.ax_, self.figure_, self.name_ = self._validate_plot_params(
    ax=ax, name=name
)
self.name_ = _convert_to_list_leaving_none(self.name_)
Member:

I would do that in the _validate_plot_params

@lucyleeow (Member, Author):

I think this is easier but will require a 'not so pretty':

if self.__class__.__name__ in (<classes requiring name_>):
    self.name_ = _convert_to_list_leaving_none(name)

Comment on lines +187 to +189
self.fpr_ = _convert_to_list_leaving_none(self.fpr)
self.tpr_ = _convert_to_list_leaving_none(self.tpr)
self.roc_auc_ = _convert_to_list_leaving_none(self.roc_auc)
Member:

Potentially, I would think that we need to do that in _validate_plot_params, which we could modify in each class, e.g.:

def _validate_plot_params(self, ax=ax, name=name, fpr=self.fpr, ...):
    ax, fig, name = super()._validate_plot_params(ax=ax, name=name)
    fpr = ...
    return ax, fig, name, fpr, ...

@lucyleeow (Member, Author):

The problem is that _BinaryClassifierCurveDisplayMixin is used for many display classes, not just the ROC curve.

We could have a **kwargs at the end, where all these parameters are validated with _convert_to_list_leaving_none, BUT we have to set e.g., self.tpr_ and I don't know how you would do this...?
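One way it could be done, as a self-contained sketch (the class and helper names below are placeholders, not the PR's code; the helper mimics the assumed behaviour of _convert_to_list_leaving_none):

```python
class _CurveDisplayMixinSketch:
    @staticmethod
    def _to_list_or_none(param):
        # assumed behaviour of _convert_to_list_leaving_none
        if param is None:
            return None
        return param if isinstance(param, list) else [param]

    def _validate_plot_params(self, **curve_params):
        # Each display passes its own curve parameters (fpr, tpr, roc_auc, ...);
        # the mixin stores the normalized values as trailing-underscore
        # attributes, e.g. fpr -> self.fpr_, tpr -> self.tpr_.
        for param_name, value in curve_params.items():
            setattr(self, f"{param_name}_", self._to_list_or_none(value))
```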

ax=ax, name=name
)
self.name_ = _convert_to_list_leaving_none(self.name_)
_check_param_lengths(
Member:

I would also include this check in the _validate_plot_params

y_true,
y_pred,
pos_label=pos_label,
sample_weight=sample_weight,
Member:

I would say that there is a bug here (but we don't have the test yet). We should split sample_weight to get the testing portion as well. So the above `_validate_from_predictions_params` is indeed useful :)

@lucyleeow (Member, Author):

_validate_from_predictions_params only checks that sample_weight is the same length as y_true and y_pred.
But you are right, this is a bug and I should be using _safe_indexing.

I think the problem is that _validate_from_predictions_params and _validate_and_get_response_values are very specific to the from_estimator and from_predictions methods.

Maybe it is a good idea to add a _validate_cv_result_params method?
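For reference, a sketch of the fix being discussed (placeholder data; _safe_indexing is a private scikit-learn utility, and where exactly this lands inside from_cv_results is still to be decided):

```python
import numpy as np
from sklearn.utils import _safe_indexing

sample_weight = np.ones(100)       # placeholder values for the sketch
test_indices = np.arange(50, 100)  # e.g. cv_results["indices"]["test"][fold]

# Take only the test portion of sample_weight, mirroring what is done for X/y
# inside the per-fold loop of from_cv_results.
sample_weight_fold = (
    None if sample_weight is None else _safe_indexing(sample_weight, test_indices)
)
```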

drop_intermediate=drop_intermediate,
)
roc_auc = auc(fpr, tpr)
# Append all
Member:

Suggested change: remove the `# Append all` comment.


@glemaitre (Member):

But overall it looks quite good, I think. @jeremiedbb do you have additional remarks?
