Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FEA add zero_division to matthews_corrcoef #28509

Merged
merged 21 commits into from
Oct 30, 2024

Conversation

Redjest
Copy link
Contributor

@Redjest Redjest commented Feb 22, 2024

Reference Issues/PRs

Partially address: #29048
Fixes #25258
See also #19977

What does this implement/fix? Explain your changes.

The Matthews correlation coefficient is ill-defined (due to zero division) when only one class is present in either the true or predicted labels.

If only one of the true or predicted labels contains a single class, the limit value is 0 (can be shown using polar coordinates). This is sensible as it suggests that the model either provided constant predictions on non-constant data, or variable predictions on single-class data. In such cases, the metric should return a value of 0.

However, if both the true and predicted labels contain only a single class, the limit does not exist, rendering the metric undefined. Also, if both true and model labels are single class, the model succeeded in trivial task and we genuinely can't tell if the correlation is good or poor. Consequently, in this scenario, the metric should return a nan value. This behavior was chosen to avoid returning 0 for perfect predictions on single-class data.

Any other comments?

I noticed that for some other metrics the handling of zero division was based on _prf_divide(), but it didn't seem a good fit for this case as its default behavior is to return 0 on both cases and IMHO directly returning nan or 0 yields more readable code and is simpler for users.

@marctorsoc @glemaitre WDYT?

Copy link

github-actions bot commented Feb 22, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: a8a4ca0. Link to the linter CI: here

Copy link
Contributor

@marctorsoc marctorsoc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution! We still need some discussion, so not approving yet. Either way, I think you need maintainer approval to merge :)

sklearn/metrics/tests/test_classification.py Outdated Show resolved Hide resolved
sklearn/metrics/tests/test_classification.py Outdated Show resolved Hide resolved
sklearn/metrics/tests/test_classification.py Outdated Show resolved Hide resolved
sklearn/metrics/_classification.py Outdated Show resolved Hide resolved
@glemaitre glemaitre self-requested a review February 28, 2024 16:25
@Redjest
Copy link
Contributor Author

Redjest commented Mar 26, 2024

@glemaitre WDUT?

@Redjest
Copy link
Contributor Author

Redjest commented May 2, 2024

@glemaitre?

@glemaitre
Copy link
Member

I'll had this PR for the new milestones. I'll provide a review.

Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So for consistency with other metrics and the fact that to have a single class is kind of an extreme "weird" use case in practice, I want that we add the zero_division parameter as it was previously decided: #23183 (comment)

doc/whats_new/v1.6.rst Outdated Show resolved Hide resolved
@glemaitre
Copy link
Member

@Redjest Are you able to address the comments?

@Redjest
Copy link
Contributor Author

Redjest commented Jun 15, 2024

@Redjest Are you able to address the comments?
Sure, had some very busy days, hope to get to it soon.

@glemaitre glemaitre self-requested a review October 29, 2024 16:27
@glemaitre glemaitre changed the title FIX matthews_corrcoef returns zero for perfect prediction FEA add zero_division to matthews_corrcoef Oct 29, 2024
Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I push the modification directly in this branch. I added the zero_division parameter as discussed before and modify the test accordingly.

@adrinjalali you might want to have a look at this one. I think that we can include it in 1.6

@glemaitre
Copy link
Member

Also @lucyleeow is aware of this topic ;)

@adrinjalali adrinjalali merged commit ba2dd5d into scikit-learn:main Oct 30, 2024
30 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

Matthews correlation coefficient gives wrong results for single-class result
4 participants