Conversation

@adrinjalali
Member

Towards #29893

This deprecates CalibratedClassifierCV(..., cv="prefit") in favor of CalibratedClassifierCV(FrozenEstimator(est)).

Also adds ensemble="auto" option for automatically handling FrozenEstimator correctly.

cc @glemaitre @ogrisel

@github-actions

github-actions bot commented Oct 29, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 4c17d22.

@adrinjalali adrinjalali added this to the 1.6 milestone Oct 29, 2024
Member

@adam2392 adam2392 left a comment


Looks simple enough to me, but I'm by no means an expert in this area of the codebase :p

Member

@ogrisel ogrisel left a comment


I have mixed feelings about this: asking the user to import a class and wrap an argument feels like a small usability regression.

What about not deprecating cv="prefit" and automatically do the wrapping with FrozenEstimator on the fly in _get_estimator:

    def _get_estimator(self):
        """Resolve which estimator to return (default is LinearSVC)."""
        if self.cv == "prefit" and self.estimator is None:
            raise ValueError(
                'estimator cannot be left to None when cv="prefit"'
            )
        if self.estimator is None:
            # We want all classifiers that don't expose a random_state
            # to be deterministic (and we don't want to expose this one).
            estimator = LinearSVC(random_state=0)
            if _routing_enabled():
                estimator.set_fit_request(sample_weight=True)
        else:
            estimator = self.estimator
            if self.cv == "prefit":
                # Wrap the prefitted estimator on the fly so it is
                # never refit downstream.
                estimator = FrozenEstimator(estimator)
        return estimator

But then we would have two equivalent ways to do the same thing:

from sklearn.frozen import FrozenEstimator

CalibratedClassifierCV(FrozenEstimator(model)).fit(X_cal, y_cal)

and:

CalibratedClassifierCV(model, cv="prefit").fit(X_cal, y_cal)

But maybe that's ok?

@adrinjalali
Member Author

I wouldn't say it's a usability regression. cv="prefit" doesn't make any sense semantically. It's just that we didn't have any other place to put "please don't fit my estimator", so we put it under cv.

@ogrisel
Member

ogrisel commented Oct 30, 2024

cv="prefit" doesn't make any sense semantically

I partially agree... But at the same time, calling cross_val_predict on a frozen estimator is just a complex way to do estimator.predict(X_cal) when cv is a CV splitter with non-overlapping and full-coverage validation folds as with the default cv=5.

In that respect, cv="prefit" is a usability improvement in the sense that it's simpler to understand compared to inferring the meaning and statistical implications of a call such as CalibratedClassifierCV(FrozenEstimator(model), cv=5).

Member

@ogrisel ogrisel left a comment


If @thomasjpfan and others think it's better, and I am in the minority, then so be it. It's true that this will also allow a small code simplification once the deprecation period is over.

    X, y = dict_data
    clf = dict_data_pipeline
    calib_clf = CalibratedClassifierCV(clf, cv="prefit")
    calib_clf = CalibratedClassifierCV(FrozenEstimator(clf), cv=2)
Member


This change highlights another small usability regression: it's difficult to understand why we would need to adjust the CV strategy when calibrating a frozen estimator on a dataset with fewer than 5 data points per class. One could expect that cross-fitting would be entirely disabled in that case.

Again, I can live with it.

Member Author


I'd agree with you, but in reality, people with 5 data points shouldn't be doing statistical ML 😁

Member

@ogrisel ogrisel Oct 30, 2024


In particular, meaningful calibration with 5 data points per class is quite challenging. Still, it reveals a rough edge in my opinion. But I believe we can fix it later if needed.

@ogrisel ogrisel merged commit b4eef25 into scikit-learn:main Oct 30, 2024
25 checks passed
@adrinjalali adrinjalali deleted the calibration/cv branch October 31, 2024 10:03