DOC Improve and make consistent scoring parameter docstrings #30319
Conversation
sklearn/linear_model/_ridge.py
Outdated
- `None`: negative :ref:`mean squared error <mean_squared_error>` if cv is
  None (i.e. when using leave-one-out cross-validation), and
  :ref:`coefficient of determination <r2_score>` (:math:`R^2`) otherwise.
@StefanieSenger I get what you meant when you said Ridge is confusing. `cv` can no longer be 'auto' (I assume that's been deprecated) so removing that helps.
Essentially when using `cv=None`, we use the default efficient LOO cv, in which case negative mean squared error is used. R2 is used for all other values of `cv`. Now that I've figured out what this means, it makes sense as it is and I can't think of a better way to put it without making it too long. Suggestions welcome.
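To make that concrete, here is a minimal illustrative sketch (not part of the PR; the data and alpha grid are made up) contrasting the two code paths:

from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

X, y = make_regression(n_samples=50, n_features=5, random_state=0)
alphas = [0.1, 1.0, 10.0]

# cv=None: efficient leave-one-out CV; alphas are compared by negative mean squared error
loo_ridge = RidgeCV(alphas=alphas, cv=None).fit(X, y)

# any other cv: ordinary cross-validation; alphas are compared by R^2
kfold_ridge = RidgeCV(alphas=alphas, cv=5).fit(X, y)

print(loo_ridge.alpha_, kfold_ridge.alpha_)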
That reads much clearer. :)
I am wondering, since there is no deprecation notification on `cv` for 'auto', whether the `gcv_mode` is meant with it. In this case it would have to be mentioned here.
Edit: just checked: the reference to 'auto' first appeared in the 0.21 release (link), but `cv` didn't have the option 'auto' then.
Yeah, no idea where 'auto' came from.
if the gcv_mode is meant with it.
I don't think so, because I think mean squared error is the default no matter what the gcv mode is:
scikit-learn/sklearn/linear_model/_ridge.py
Lines 2174 to 2175 in caaa1f5
squared_errors = (c / G_inverse_diag) ** 2
alpha_score = self._score_without_scorer(squared_errors=squared_errors)
Also, 'auto' could be either 'svd' or 'eigen', so it doesn't make sense to say the score is only a specific thing when `gcv_mode` is 'auto'.
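A small sketch of that point (purely illustrative, on a made-up dataset): `gcv_mode` only picks the decomposition used for the efficient LOO CV, while alpha selection is still based on the squared errors either way.

from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

X, y = make_regression(n_samples=100, n_features=10, random_state=0)

# 'svd' and 'eigen' are just two ways of computing the same LOO residuals,
# so both are expected to select the same alpha on this toy data
svd_ridge = RidgeCV(alphas=[0.1, 1.0, 10.0], gcv_mode="svd").fit(X, y)
eigen_ridge = RidgeCV(alphas=[0.1, 1.0, 10.0], gcv_mode="eigen").fit(X, y)

print(svd_ridge.alpha_, eigen_ridge.alpha_)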
sklearn/feature_selection/_rfe.py
Outdated
@@ -553,7 +553,7 @@ class RFECV(RFE):

     The number of features selected is tuned automatically by fitting an :class:`RFE`
     selector on the different cross-validation splits (provided by the `cv` parameter).
-    The performance of the :class:`RFE` selector are evaluated using `scorer` for
+    The performance of the :class:`RFE` selectors are evaluated using `scoring` for
There is a selector for each cv fold, so I think plural is right here.
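For illustration (made-up dataset, not from the PR): one RFE selector is fit and scored per CV split, and the per-split scores are then aggregated per candidate number of features:

from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=20, n_informative=5, random_state=0)

selector = RFECV(LogisticRegression(max_iter=1000), cv=5, scoring="accuracy").fit(X, y)

# one mean score (aggregated over the 5 per-fold selectors) per candidate
# number of selected features
print(selector.cv_results_["mean_test_score"])
print(selector.n_features_)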
sklearn/base.py
Outdated
@@ -545,7 +545,7 @@ def __sklearn_tags__(self):

     def score(self, X, y, sample_weight=None):
         """
-        Return the mean accuracy on the given test data and labels.
+        Return mean :ref:`accuracy <accuracy_score>` on test data and labels.
Thinking again about the use of the word 'mean', maybe it is used to indicate it is not the count (which you get with `normalize=False`) but the fraction/'mean' accuracy across the test set?
Following on from our conversation in #30316 (comment), I do still agree that 'mean' is confusing. Not sure what a better solution would be though, 'fraction accuracy'?
Yes, it seems so to me as well.
But accuracy is defined (acc = (TP+TN)/all) as a normalized fraction anyway, and I think we don't need to provide the implementation details to the users here.
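For reference, a tiny sketch of the distinction being discussed (the labels are made up):

from sklearn.metrics import accuracy_score

y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

print(accuracy_score(y_true, y_pred))                   # 0.8 -> fraction / 'mean' accuracy
print(accuracy_score(y_true, y_pred, normalize=False))  # 4   -> raw count of correct predictions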
I've removed 'mean', but happy to change
This is ready for review now that #30316 is merged; maybe @StefanieSenger and @ArturoAmorQ may be interested, if you are not tired of scoring! Thank you
Thank you for the PR, @lucyleeow. It will help our users find out about the options for scoring.
I left some suggestions.
  ``scorer(estimator, X, y)``. See :ref:`scoring_callable` for details.
- `None`: the :ref:`coefficient of determination <r2_score>`
  (:math:`R^2`) is used.
- 'loss': early stopping is checked w.r.t the loss value.
- 'loss': early stopping is checked w.r.t the loss value.
- 'loss': efficiently compute score with respect to the loss value
  at each iteration.
Maybe like this?
Hmm, maybe I don't understand this parameter. I thought this was the metric used to see if we should stop early, not a score to return...? In which case 'compute score' seems misleading?
I'm not sure, but what I was thinking was that the `scoring` to be used for the predictions is computed from the loss somehow during fit time, for efficiency.
But true: the code does early stopping based on the scores archived, and it doesn't seem to have anything to do with predict(). I am more confused about this than before. Please, feel free to disregard my suggestion.
I am more confused about this than before.
The meat of the early stopping code seems to be here:
scikit-learn/sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py
Lines 730 to 756 in ebfda90
if self.scoring == "loss":
    # we're going to compute scoring w.r.t the loss. As losses
    # take raw predictions as input (unlike the scorers), we
    # can optimize a bit and avoid repeating computing the
    # predictions of the previous trees. We'll reuse
    # raw_predictions (as it's needed for training anyway) for
    # evaluating the training loss.
    self._check_early_stopping_loss(
        raw_predictions=raw_predictions,
        y_train=y_train,
        sample_weight_train=sample_weight_train,
        raw_predictions_val=raw_predictions_val,
        y_val=y_val,
        sample_weight_val=sample_weight_val,
        n_threads=n_threads,
    )
else:
    self._scorer = check_scoring(self, self.scoring)
    # _scorer is a callable with signature (est, X, y) and
    # calls est.predict() or est.predict_proba() depending on
    # its nature.
    # Unfortunately, each call to _scorer() will compute
    # the predictions of all the trees. So we use a subset of
    # the training set to compute train scores.
    # Compute the subsample set
AFAICT, we want to check if there has been improvement in the score of the predictions, and if not we stop early (before we reach max leaves or something). For 'loss' we just use the loss, otherwise a scorer function is used. Both calculate on predictions, but scorers have the signature `(est, X, y)`, so we do it on a subset of train to avoid calling predict constantly.
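A hedged sketch of that reading (toy data, parameter values chosen only for illustration):

from sklearn.datasets import make_regression
from sklearn.ensemble import HistGradientBoostingRegressor

X, y = make_regression(n_samples=500, random_state=0)

# scoring='loss': early stopping monitors the loss directly, reusing the raw
# predictions that are already computed during fitting
est_loss = HistGradientBoostingRegressor(
    early_stopping=True, scoring="loss", n_iter_no_change=5, random_state=0
).fit(X, y)

# any other scoring: a scorer with signature (est, X, y) is built, so predictions
# are recomputed (on a subsample of the training set) at each check
est_r2 = HistGradientBoostingRegressor(
    early_stopping=True, scoring="r2", n_iter_no_change=5, random_state=0
).fit(X, y)

print(est_loss.n_iter_, est_r2.n_iter_)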
Yes, after looking at it again I would agree.
The `scoring` param here is used differently than in the CV models or in cross validation: it is only for determining how we calculate early stopping (if it is enabled). It's the parameter name that is confusing. But this cannot be changed so easily, so we need to live with it.
Going back to the documentation I see that it doesn't promise anything more than that the scoring is used in early stopping: "Scoring parameter to use for early stopping."
Still, I wonder if this misunderstanding could be prevented for other people by rewording into "Scoring to use for calculating early stopping."
Or alternatively add as a last sentence that this `scoring` param differs from the way scoring is defined in the Glossary. But it's also good as it is.
I see your point. I've altered it slightly to "Scoring method to use ..." because the 'parameter' part seemed redundant and possibly confusing.
I think it is clear enough in the parameter docstring that this is used only for early stopping. What we could do though is add an extra sentence in the glossary about how `scoring` is sometimes used to specify the scoring method to use for early stopping as well?
What we could do though is add an extra sentence in the glossary about how scoring is sometimes used to specify scoring method to use for early stopping as well?
Oh yes, that's good. 👍
sklearn/feature_selection/_rfe.py
Outdated
The performance of the :class:`RFE` selectors are evaluated using `scoring` for
different number of selected features and aggregated together. Finally, the scores
The performance of the :class:`RFE` selectors are evaluated using `scoring` for
different number of selected features and aggregated together. Finally, the scores
The performance of the :class:`RFE` selectors is evaluated using `scoring` for
different numbers of selected features and aggregated together. Finally, the scores
"is" sounds wrong, what about just:
The performance of the :class:`RFE` selectors are evaluated using `scoring` for
different number of selected features and aggregated together. Finally, the scores
The performance of each :class:`RFE` selector is evaluated using `scoring` for
different numbers of selected features and aggregated together. Finally, the scores
That's perfect! :)
NOTE that when using a custom scorer, it should return a single
value.
This constraint could be integrated with the description of callable a few lines above:
"callable with [signature ...], that returns a single value"
sklearn/linear_model/_logistic.py
Outdated
a scorer callable object / function with signature
``scorer(estimator, X, y)``. For a list of scoring functions
that can be used, look at :mod:`sklearn.metrics`.
scoring : str or callable or None
scoring : str or callable or None
scoring : str or callable, default=None
Should we also mention the default type here, similar as in the other parts of the docs?
There is no default for this one (private func), but I have looked at the rest and made them consistent, thanks!
Thanks for your further improvements, @lucyleeow!
I went over the PR again and found some minor things that could be improved. Otherwise I am very happy. I had been thinking for some time about how we could document the scoring better, because it was a bit obscure. But now it looks very helpful.
Thanks for the PR and sorry for taking so long to review @lucyleeow! Here are a couple of tweaks but the general status is already a big improvement :)
the test set. If `None`, the
:ref:`default evaluation criterion <scoring_api_overview>` of the estimator
is used.
the test set.
The term "test set" here makes me feel as if the user had to provided the test set coming from a train-test split. I guess "validation set" or just "left out data" would be more appropriate, but maybe it's only me. Alternatively, we could use a wording similar to
Strategy to evaluate the performance of the model across cross-validation splits.
If you agree, this comment also applies to `cross_val_score` and maybe slightly adapt it for `permutation_test_score`.
I like:
Strategy to evaluate the performance of the model across cross-validation splits.
I've altered it to use `estimator`, to reference the `estimator` parameter:
Strategy to evaluate the performance of the `estimator` across cross-validation splits.
Am happy to change though.
For `permutation_test_score` I have used "validation set", as we use CV to generate the null hypothesis distribution as well, so I thought "across cross-validation splits" was not as fitting. Happy to change.
Thanks for the review @ArturoAmorQ! I've made the changes but happy to change anything.
Thanks @lucyleeow! Merging!
Reference Issues/PRs
Follow on from #30303
Builds (essentially branches) from #30316 so keeping this as draft until that one goes in.
What does this implement/fix? Explain your changes.
Lists the `scoring` options as bullet points and states what `None` does.
Any other comments?