
ENH Add Multiclass Brier Score Loss #22046

Open
wants to merge 44 commits into base: main

Conversation

ogrisel
Member

@ogrisel ogrisel commented Dec 21, 2021

Resolves #16055.
This PR updates #18699 by @aggvarun01 after a merge with main, resolving the merge conflicts. I do not have permission to push directly to the original branch, and opening a sub-PR pointing to #18699 would lead to an unreadable diff because of the year of changes merged from main.

I also added a changelog entry and demonstrated the new function in the multiclass calibration example.

@aggvarun01 if you want, feel free to pull the last commits from this branch into your branch. Alternatively, we can finalize the review here.

@lorentzenchr
Member

@ogrisel Do you intend to finish this one? It would be nice to have it prior to #28971.

@ogrisel
Member Author

ogrisel commented May 21, 2024

> Two strings in the scorer string "binary_brier_score", "multiclass_brier_score"

I guess the scorer strings would instead be "neg_binary_brier_score" and "neg_multiclass_brier_score" (to enforce the greater-is-better model selection convention common to all our scorer string names), while we would also introduce two functions, binary_brier_score and multiclass_brier_score, that would probably share some common private helper code under the hood. We would ensure that the docstrings of those functions cross-reference each other and explain the difference in normalization constant.

The binary_brier_score would raise if passed a y_observed value (or y_proba_pred) with n_classes > 2, with a message pointing to the multiclass_ variant instead.

The multiclass_ variants on the other hand could accept binary data without raising or warning I think.

Then we would deprecate the "neg_brier_score" name and the brier_score function in favor of the binary_-prefixed counterparts.

If that's right I can start working on reviving this PR to implement this plan.
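For concreteness, the two-function split sketched above could look roughly like this (a minimal illustration of the proposal, not the PR's actual code; the signatures are simplified and the helper name `_brier_components` is made up):

```python
import numpy as np


def _brier_components(y_true, y_proba, labels):
    # Hypothetical shared helper: one-hot encode y_true against `labels`
    # and return the per-sample, per-class squared differences.
    labels = np.asarray(labels)
    y_onehot = (np.asarray(y_true)[:, None] == labels[None, :]).astype(float)
    return (y_onehot - np.asarray(y_proba, dtype=float)) ** 2


def multiclass_brier_score(y_true, y_proba, labels):
    # Sum over classes, average over samples -> range [0, 2].
    return _brier_components(y_true, y_proba, labels).sum(axis=1).mean()


def binary_brier_score(y_true, y_proba, labels):
    # The binary variant raises on more than two classes, pointing the
    # user to the multiclass_ variant instead.
    labels = np.asarray(labels)
    if labels.shape[0] > 2:
        raise ValueError(
            "binary_brier_score only supports two classes; "
            "use the multiclass_ variant instead."
        )
    # Binary convention: half the multiclass sum -> range [0, 1].
    return multiclass_brier_score(y_true, y_proba, labels) / 2
```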

@lorentzenchr
Member

The first step, if I recall correctly, would be to add a keyword to brier_score_loss to switch between normalizations.
A second step is about the different strings to select a score. I'm still not too happy about a proliferation of names, because a user can always use the score directly with the normalization parameter.


github-actions bot commented Nov 6, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 4f00c63.

@antoinebaker
Contributor

antoinebaker commented Nov 6, 2024

For the changelog, I first tried two entries 22046.feature.rst and 18699.feature.rst with the same content. That indeed works for towncrier, it merges them into one single entry and credits both PR #22046 and #18699. But then unfortunately the "Check Changelog" complains, because I guess #18699 has not been merged, so I only keep this PR entry 22046.feature.rst.

Comment on lines 3546 to 3549


def multiclass_brier_score_loss(y_true, y_prob, sample_weight=None, labels=None):
    r"""Compute the Brier score loss.
Contributor

@antoinebaker antoinebaker Nov 6, 2024


@ogrisel Should we add a @validate_params decorator here? It seems that all metrics have this decorator now. Also, should we rename y_prob -> y_proba? It seems to be the new convention (e.g. in brier_score_loss).

@antoinebaker
Contributor

Should we add array API support for multiclass_brier_score_loss and brier_score_loss, or do this in a follow-up PR?

@lorentzenchr
Member

Follow-up PR. Let's keep different features well separated by PRs.

@antoinebaker
Contributor

antoinebaker commented Nov 12, 2024

@ogrisel @lorentzenchr have you settled the scorer strings / normalization keyword debate? From the comments above, it's not clear to me whether you prefer:

  1. deprecate brier_score_loss in favor of binary_brier_score_loss and corresponding scoring strings
  2. introduce a normalization keyword to brier_score_loss

@lorentzenchr
Member

I am -1000 for 2 different functions for binary and multiclass. I want a single function dealing with both, as does log_loss.

@ogrisel
Member Author

ogrisel commented Nov 14, 2024

Let's keep y_prob, it's still used a lot elsewhere and y_proba sounds like a weird Frenchism.

@antoinebaker
Contributor

antoinebaker commented Dec 4, 2024

The normalize keyword was added to brier_score_loss with the meaning "divides by the number of classes".

Then b60cb3b results in the following for brier_score_loss:

  • the unnormalized Brier score ranges from 0 to 2
  • the normalized Brier score ranges from 0 to 2/n_classes
  • labels was added for the multi-class support (required when not all classes appear in y_true)
  • pos_label was kept for binary classification (required when y_true contains strings)

There is some ambiguity on binary vs multiclass when n_classes=2, especially about the usage of pos_label vs labels. Currently the following choice has been made:

  • targets y_true must be passed as a 1D array (n_samples,)
  • pos_label is only used when y_prob.shape=(n_samples,), which is treated as a binary classification task
  • labels is only used when y_prob.shape=(n_samples, n_classes), which is treated as a multiclass task (even when n_classes=2)
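The shape-based dispatch described in the bullets above can be illustrated with a small helper (a sketch of the rule only, not the PR's code; `infer_task` is a hypothetical name):

```python
import numpy as np


def infer_task(y_prob):
    # Dispatch rule described above: 1D probabilities of shape (n_samples,)
    # are treated as a binary task (pos_label applies); 2D probabilities of
    # shape (n_samples, n_classes) are treated as a multiclass task
    # (labels applies), even when n_classes == 2.
    y_prob = np.asarray(y_prob)
    if y_prob.ndim == 1:
        return "binary"
    if y_prob.ndim == 2:
        return "multiclass"
    raise ValueError("y_prob must be 1D (n_samples,) or 2D (n_samples, n_classes)")
```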

@lorentzenchr @ogrisel are you happy with the current API?
If so, I can continue on cleaning up the tests, docs and scoring strings.

@ogrisel
Member Author

ogrisel commented Dec 6, 2024

I think the API makes sense, apart from the meaning of normalize=True.

However, I am not sure about the normalize=True parameter with 2/n_classes. For three class problems, that would mean Brier score values between 0 and 2/3? Who would want to compute this?

Shall we not just error when normalize=True and n_classes > 2? Then we issue a FutureWarning to state that in the future normalize=False will become the default, meaning that Brier score will always be between 0 and 2, including for binary classification problems.

Alternatively, normalize=True could mean to divide by 2 instead of by n_classes so that normalize=True means that Brier score is always scaled between 0 (best) and 1 (worst), whatever the number of classes.

This parameter could be named scaled_by_half=True instead or something else more explicit than normalize.

@antoinebaker
Contributor

> I think the API makes sense, apart from the meaning of normalize=True.
>
> However, I am not sure about the normalize=True parameter with 2/n_classes. For three class problems, that would mean Brier score values between 0 and 2/3? Who would want to compute this?

I guess only for the correspondence with the MSE: if we divide by n_classes, the Brier score is indeed the MSE. But it doesn't seem to be common practice, and I prefer your scaled_by_half parameter.

What about scaled_by_half with a default "auto" option, meaning scaled_by_half=(n_classes == 2)?
The Brier score then follows the convention by default and remains backward compatible.
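For concreteness, the "auto" behavior proposed above could look like this (a hypothetical sketch; the name `brier_score` and its signature are illustrative only, not the PR's actual implementation):

```python
import numpy as np


def brier_score(y_onehot, y_proba, scale_by_half="auto"):
    # Per-sample sum of squared differences between one-hot targets and
    # predicted probabilities, averaged over samples -> range [0, 2].
    y_onehot = np.asarray(y_onehot, dtype=float)
    y_proba = np.asarray(y_proba, dtype=float)
    score = ((y_onehot - y_proba) ** 2).sum(axis=1).mean()
    if scale_by_half == "auto":
        # Backward-compatible default: halve only in the binary case,
        # so binary scores stay in [0, 1] and multiclass stays in [0, 2].
        scale_by_half = y_onehot.shape[1] == 2
    return score / 2 if scale_by_half else score
```

Dividing the unscaled score by n_classes instead of by 2 would recover the MSE correspondence mentioned above.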

@ogrisel
Member Author

ogrisel commented Dec 9, 2024

+1 for scaled_by_half="auto" that divides by 2 whenever n_classes == 2.

This way we can automatically switch between the two formulas of the Wikipedia page:

https://en.wikipedia.org/wiki/Brier_score

It's also backward compatible with the current implementation, while also giving control to the user if they prefer to use a different convention.

What do you think of this proposal @lorentzenchr?

@lorentzenchr
Member

In my opinion, it's important to keep the goal in sight: one Brier scoring function, ideally with no scaling (always range [0, 2]).

  • So yes, I'm ok with scaled_by_half.
  • I would rename it to scale_by_half.
  • I would start with the default scale_by_half="deprecated" and later set the default to True.
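The deprecation cycle suggested in the last bullet might look like this in code (a sketch only, assuming the eventual semantics discussed above; the warning wording and the binary-only body are illustrative):

```python
import warnings

import numpy as np


def brier_score_loss(y_true, y_proba, scale_by_half="deprecated"):
    # Hypothetical deprecation path: warn while the default is the
    # "deprecated" sentinel, then flip the default in a later release.
    if scale_by_half == "deprecated":
        warnings.warn(
            "The default of scale_by_half will change in a future version; "
            "set it explicitly to silence this warning.",
            FutureWarning,
        )
        scale_by_half = True  # current backward-compatible binary behavior

    # Binary Brier score in the [0, 1] convention; doubled for [0, 2].
    score = np.mean((np.asarray(y_true) - np.asarray(y_proba)) ** 2)
    return score if scale_by_half else 2 * score
```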

@ogrisel
Member Author

ogrisel commented Dec 9, 2024

> I would rename it to scale_by_half.

+1

> I would start with the default scale_by_half="deprecated" and later set the default to True.

To me, scale_by_half=True means a BS range of [0, 1] (instead of [0, 2]) and would typically be used for the binary classification setting where it's common to use the [0, 1] range.

I am ok with raising a FutureWarning to change the scale_by_half default value to False in the future (to always use the [0, 2] range): in the long term this will make our code simpler to maintain by only implementing the convention introduced in the original definition by Brier.

Development

Successfully merging this pull request may close these issues.

Add multiclass support for brier_score_loss
6 participants