FEA add zero_division to matthews_corrcoef #28509
Conversation
Thanks for the contribution! We still need some discussion, so I'm not approving yet. Either way, I think you need maintainer approval to merge :)
@glemaitre WDYT?
I'll add this PR to the new milestone. I'll provide a review.
So, for consistency with other metrics, and because having only a single class present is an extreme edge case in practice, I want us to add the zero_division
parameter as previously decided: #23183 (comment)
@Redjest Are you able to address the comments?
I pushed the modifications directly to this branch. I added the zero_division
parameter as discussed before and modified the tests accordingly.
@adrinjalali you might want to have a look at this one. I think we can include it in 1.6.
Also, @lucyleeow is aware of this topic ;)
Reference Issues/PRs
Partially address: #29048
Fixes #25258
See also #19977
What does this implement/fix? Explain your changes.
The Matthews correlation coefficient (MCC) is ill-defined (due to a zero division) when only one class is present in either the true or the predicted labels.
If exactly one of the true or predicted label sets contains a single class, the limit value of the coefficient is 0 (this can be shown using polar coordinates). That is sensible: it means the model either produced constant predictions on non-constant data, or variable predictions on single-class data. In such cases, the metric should return 0.
However, if both the true and predicted labels contain only a single class, the limit does not exist, rendering the metric undefined: the model succeeded at a trivial task, and we genuinely cannot tell whether the correlation is good or poor. Consequently, in this scenario the metric should return nan. This behavior was chosen to avoid returning 0 for perfect predictions on single-class data.
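To make the proposed behavior concrete, here is a minimal, self-contained sketch (the function name `mcc_with_zero_division` is hypothetical, and this is not the scikit-learn implementation): the MCC denominator is zero exactly when `y_true` and/or `y_pred` contains a single class, and the `zero_division` argument controls what is returned in that case.

```python
import numpy as np

def mcc_with_zero_division(y_true, y_pred, zero_division=0.0):
    # Hypothetical sketch of the proposed `zero_division` behavior;
    # NOT the scikit-learn implementation.
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    classes = np.union1d(y_true, y_pred)
    n = len(classes)
    # Confusion matrix C[i, j]: count of true class i predicted as class j.
    C = np.zeros((n, n), dtype=float)
    for t, p in zip(y_true, y_pred):
        C[np.searchsorted(classes, t), np.searchsorted(classes, p)] += 1
    t_sum = C.sum(axis=1)       # per-class counts in y_true
    p_sum = C.sum(axis=0)       # per-class counts in y_pred
    n_samples = C.sum()
    cov_ytyp = C.trace() * n_samples - t_sum @ p_sum
    cov_ypyp = n_samples**2 - p_sum @ p_sum  # zero iff y_pred is single-class
    cov_ytyt = n_samples**2 - t_sum @ t_sum  # zero iff y_true is single-class
    denom = np.sqrt(cov_ytyt * cov_ypyp)
    if denom == 0:
        # Ill-defined case: return the user-chosen value.
        return zero_division
    return cov_ytyp / denom

# Constant predictions on two-class data: ill-defined, returns 0.0 here.
print(mcc_with_zero_division([0, 1, 0, 1], [0, 0, 0, 0], zero_division=0.0))
# Both sides single-class: also ill-defined; nan can be requested explicitly.
print(mcc_with_zero_division([1, 1, 1], [1, 1, 1], zero_division=float("nan")))
```

Under this sketch, a single `zero_division` value covers both degenerate cases, whereas the PR argues the "one side single-class" case has a natural limit of 0 while the "both sides single-class" case does not; the actual parameter semantics are what this PR discussion is deciding.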
Any other comments?
I noticed that some other metrics handle zero division via
_prf_divide()
, but it did not seem a good fit for this case: its default behavior is to return 0 in both cases, and IMHO directly returning nan or 0 yields more readable code and is simpler for users. @marctorsoc @glemaitre WDYT?