ENH remove _xfail_checks, pass directly to check runners, return structured output from check_estimator #30149

Merged: 25 commits merged into scikit-learn:main on Nov 8, 2024

Conversation

Member

@adrinjalali adrinjalali commented Oct 25, 2024

Fixes #29951, Fixes #30133

This moves _xfail_checks out of the estimator tags.

  • Deprecation: We don't need to deprecate _xfail_checks since we haven't even released the new tags anyway.
  • Behavior change: check_estimator no longer raises a SkipTestWarning for xfailed checks, since they're not skipped; it doesn't skip them at all and instead runs all the checks.
  • Structured output: check_estimator now returns a report of the xfailed and skipped checks if it passes; if it fails, the raised exception includes that same report (see the sketch after this list).
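
A rough sketch of how the reworked check_estimator could be used, based on the description above. The keyword name expected_failed_checks and the shape of the returned report (a list of dict entries with check_name / status keys) are taken from the discussion in this PR and should be treated as assumptions, not settled API:

from sklearn.linear_model import LogisticRegression
from sklearn.utils.estimator_checks import check_estimator

# Checks listed in `expected_failed_checks` are still run, but their failures
# are reported as expected ("xfail") instead of raising.
report = check_estimator(
    LogisticRegression(),
    expected_failed_checks={
        # check name -> reason; the reason string is a placeholder for illustration
        "check_sample_weight_equivalence": "known limitation, tracked separately",
    },
)

# On success the structured report lists every check with its status; on
# failure, the raised exception carries the same report.
for entry in report:
    print(entry["check_name"], entry["status"])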


github-actions bot commented Oct 25, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 239abb6.

Comment on lines 929 to 930
# This doesn't seem to fail anymore?
# "check_methods_subset_invariance": ("fails for the decision_function method"),
Member Author

@adrinjalali adrinjalali Oct 25, 2024

By writing a test, I noticed this test doesn't actually fail. We could, in principle, now easily do this same test for all our estimators, but it would mean running the common tests twice, or switching from parametrize_with_checks to check_estimator, which I don't think is a good idea.

Member

by writing a test, I noticed this test doesn't actually fail.

Maybe we can remove it from the dict then.

I don't think we should run the common tests twice or switch to check_estimator but we could have a maintainer tool to help run this kind of analysis locally from time to time, no?


should_be_marked, reason = _should_be_skipped_or_marked(estimator, check)
if not should_be_marked:
def _maybe_mark(
Member Author

We had two functions: one marked checks as xfail with pytest, the other wrapped the check in something that would simply raise SkipTest. Those two are now merged into this function.
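
For illustration, here is a sketch of what such a merged helper could look like. This is not the PR's actual code: the names _maybe_mark and _should_be_skipped_or_marked come from the diff above, but the exact signatures and the stub below are assumptions:

from functools import wraps
from unittest import SkipTest

import pytest


def _should_be_skipped_or_marked(estimator, check, expected_failed_checks):
    # Illustrative stub: look up the check's name in the expected failures.
    check_name = getattr(check, "__name__", None) or check.func.__name__
    expected_failed_checks = expected_failed_checks or {}
    if check_name in expected_failed_checks:
        return True, expected_failed_checks[check_name]
    return False, ""


def _maybe_mark(estimator, check, expected_failed_checks=None, mark=None):
    """Return the (estimator, check) pair, marked as xfail or wrapped to skip.

    mark="xfail" covers the pytest path (parametrize_with_checks), mark="skip"
    covers the non-pytest path (check_estimator); mark=None leaves the check as is.
    """
    should_be_marked, reason = _should_be_skipped_or_marked(
        estimator, check, expected_failed_checks
    )
    if not should_be_marked or mark is None:
        return estimator, check

    if mark == "xfail":
        # pytest path: keep the check as-is but mark the parameter set as xfail
        return pytest.param(estimator, check, marks=pytest.mark.xfail(reason=reason))

    # non-pytest path: wrap the check so it raises SkipTest with the reason
    @wraps(check)
    def wrapped(*args, **kwargs):
        raise SkipTest(reason)

    return estimator, wrapped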

@@ -0,0 +1,8 @@
- :func:`~utils.estimator_checks.check_estimator` now accepts the list of tests that are
Member Author

This really doesn't feel right as a single changelog entry @lesteve

Member

@lesteve lesteve Oct 25, 2024

😓 🤷 😓

If we think this is crucial, here are the workarounds I can think of:

  • do nested bullet points for each aspect
  • use multiple bullet points and mention :pr:`30149` in each bullet point except the last one, since towncrier will add it there

To double-check the generated rst file:

towncrier build --version 1.6.0 --draft

Member Author

We also need to add this to our doc build, so that we can see the rendered version of the changelog on the CI

@adrinjalali
Member Author

@adam2392 @Charlie-XIAO @ogrisel @glemaitre this is mostly moving _xfail_checks around, and fixing the infrastructure for that.

@thomasjpfan @Micky774 this ended up diverging from the initial proposal a tiny bit: I feel this information is best suited outside the estimator itself, since it's very test specific. It also has the benefit of giving us a nice overview of what's failing in a single spot.

@adrinjalali adrinjalali added this to the 1.6 milestone Oct 25, 2024
@adrinjalali adrinjalali added the Developer API Third party developer API related label Oct 25, 2024
@glemaitre glemaitre self-requested a review October 29, 2024 09:42
@adrinjalali
Member Author

Ping since this is a big PR and we kinda need it for the release.

@adam2392 adam2392 self-requested a review October 29, 2024 21:36
Member

@adam2392 adam2392 left a comment

Most of the code is removing the tags._xfail_checks, so it wasn't that bad to go through!

I had a few questions, but mostly it LGTM as a minor refactoring of the testing framework.

Member

@ogrisel ogrisel left a comment

This is a partial review with a first pass of feedback. I need to pause for lunch :)

@adrinjalali
Member Author

@ogrisel (#30149 (comment))

I added a script to check for that. I think in a separate PR we can try to find the ones that are actual failures across the whole CI (a rough sketch of the idea follows the output below). Right now, on my system / env, this is what the script says:

/home/adrin/Projects/gh/me/scikit-learn/sklearn/utils/_test_common/instance_generator.py:657: SkipTestWarning: Can't instantiate estimator FrozenEstimator
  warnings.warn(msg, SkipTestWarning)
GridSearchCV did not fail expected failures:
  check_supervised_y_2d
GridSearchCV did not fail expected failures:
  check_requires_y_none
HalvingGridSearchCV did not fail expected failures:
  check_estimators_nan_inf
  check_fit2d_1feature
  check_classifiers_one_label_sample_weights
HalvingGridSearchCV did not fail expected failures:
  check_classifiers_one_label_sample_weights
  check_estimators_nan_inf
  check_requires_y_none
  check_supervised_y_2d
  check_fit2d_1feature
HalvingGridSearchCV did not fail expected failures:
  check_requires_y_none
  check_estimators_nan_inf
  check_fit2d_1feature
  check_classifiers_one_label_sample_weights
HalvingGridSearchCV did not fail expected failures:
  check_classifiers_one_label_sample_weights
  check_estimators_nan_inf
  check_requires_y_none
  check_supervised_y_2d
  check_fit2d_1feature
HalvingRandomSearchCV did not fail expected failures:
  check_estimators_nan_inf
  check_fit2d_1feature
  check_classifiers_one_label_sample_weights
HalvingRandomSearchCV did not fail expected failures:
  check_classifiers_one_label_sample_weights
  check_estimators_nan_inf
  check_requires_y_none
  check_supervised_y_2d
  check_fit2d_1feature
HalvingRandomSearchCV did not fail expected failures:
  check_requires_y_none
  check_estimators_nan_inf
  check_fit2d_1feature
  check_classifiers_one_label_sample_weights
HalvingRandomSearchCV did not fail expected failures:
  check_classifiers_one_label_sample_weights
  check_estimators_nan_inf
  check_requires_y_none
  check_supervised_y_2d
  check_fit2d_1feature
KNeighborsClassifier did not fail expected failures:
  check_dataframe_column_names_consistency
KNeighborsRegressor did not fail expected failures:
  check_dataframe_column_names_consistency
WARNING: class label 0 specified in weight is not found
LinearSVC did not fail expected failures:
  check_non_transformer_estimators_n_iter
RandomizedSearchCV did not fail expected failures:
  check_supervised_y_2d
RandomizedSearchCV did not fail expected failures:
  check_requires_y_none
/home/adrin/Projects/gh/me/scikit-learn/sklearn/utils/_test_common/instance_generator.py:657: SkipTestWarning: Can't instantiate estimator SparseCoder
  warnings.warn(msg, SkipTestWarning)
SpectralBiclustering did not fail expected failures:
  check_dont_overwrite_parameters
  check_fit2d_predict1d
  check_estimator_sparse_array
  check_methods_subset_invariance
  check_estimator_sparse_matrix
TunedThresholdClassifierCV did not fail expected failures:
  check_sample_weight_equivalence

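For reference, a rough sketch of the kind of script this is about, written against the reworked public API from this PR. It is not the script added in the PR; the on_fail keyword, the report keys (check_name, status, expected_to_fail) and the hand-picked expected failures below are assumptions for illustration:

from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC
from sklearn.utils.estimator_checks import check_estimator

# Hand-picked instances and (assumed) expected failures, for illustration only.
expected = {
    "KNeighborsClassifier": {
        "check_dataframe_column_names_consistency": "placeholder reason",
    },
    "LinearSVC": {
        "check_non_transformer_estimators_n_iter": "placeholder reason",
    },
}

for estimator in [KNeighborsClassifier(), LinearSVC()]:
    name = estimator.__class__.__name__
    report = check_estimator(
        estimator,
        expected_failed_checks=expected.get(name, {}),
        on_fail=None,  # assumption: collect results instead of raising
    )
    # Expected failures that actually passed, mirroring the output above.
    not_failed = [
        entry["check_name"]
        for entry in report
        if entry["expected_to_fail"] and entry["status"] == "passed"
    ]
    if not_failed:
        print(f"{name} did not fail expected failures:")
        for check_name in not_failed:
            print(f"  {check_name}")
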
Member

@ogrisel ogrisel left a comment

More feedback:

@adrinjalali
Member Author

@ogrisel @adam2392 @glemaitre this is now green with the new API design.

@adam2392
Member

adam2392 commented Nov 7, 2024

For my understanding of the changes, is the following what you would do in a unit-test suite?

@parametrize_with_checks(
    [
        A(random_state=12345, n_estimators=10),
        B(random_state=12345, n_estimators=10),
        C(random_state=12345, n_estimators=10),
    ]
)
def test_sklearn_compatible_estimator(estimator, check):
    expected_failed_checks = {}
    if check.func.__name__ in [
        # sample weights do not necessarily imply a sample is not used in clustering
        "check_sample_weight_equivalence",
    ]:
        expected_failed_checks = {
            "check_sample_weight_equivalence": "sample weights do not need to be checked"
        }

    check(estimator, expected_failed_checks=expected_failed_checks)

@adrinjalali
Member Author

@adam2392 not really, you pass a callable to parametrize_with_checks, as this PR is doing in our test_common.py. Here is a modified version of your code:

def get_expected_failed_checks(estimator):
    if estimator.__class__.__name__ == "A":
        return {"check_sample_weight_equivalence": "sample weights not need to be checked"}
    return {}

@parametrize_with_checks(
    [
        A(random_state=12345, n_estimators=10),
        B(random_state=12345, n_estimators=10),
        C(random_state=12345, n_estimators=10),
    ],
    expected_failed_checks=get_expected_failed_checks,
)
def test_sklearn_compatible_estimator(estimator, check):
    check(estimator)

@adam2392 adam2392 self-requested a review November 7, 2024 18:09
Member

@glemaitre glemaitre left a comment

It looks good to me apart from a couple of nitpicks.

I'm wondering where we discussed the need for a callback in the past? I can see that this is useful but I don't recall the discussion.

@@ -14,6 +14,7 @@
"UndefinedMetricWarning",
"PositiveSpectrumWarning",
"UnsetMetadataPassedError",
"EstimatorCheckFailedWarning",
Member

Do we want to add this exception to the API doc?
To me it belongs more in the developer API, whenever we have one, doesn't it?

Member Author

Added to our API doc; I had forgotten to add it. I don't know how to add it to a separate section with the current setup, but I don't think that's necessary. We would need a proper developer API documentation / examples / user guide anyway.

Member Author

@adrinjalali adrinjalali left a comment

I'm wondering where we discussed the need for a callback in the past? I can see that this is useful but I don't recall the discussion.

@glemaitre it came up during the development of this PR: since I ended up allowing check_estimator to not actually fail, it seemed nice to have the callback. The design then changed after talking to @ogrisel, who had the idea of taking the callback even further. We settled on what you see here as a middle ground; people who want more control can use the test generator and have full control over what happens anyway.
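
For illustration, a minimal sketch of the callback idea discussed here, assuming check_estimator grows a callback keyword along these lines; the exact way the callback is invoked (argument names and order) is not pinned down here, so the sketch accepts anything it is given:

from sklearn.linear_model import LogisticRegression
from sklearn.utils.estimator_checks import check_estimator

collected = []


def on_check_done(*args, **kwargs):
    # Record whatever information the runner passes for each finished check,
    # without assuming the exact signature.
    collected.append((args, kwargs))


# Assumption: a `callback` keyword on check_estimator, as discussed in this thread.
check_estimator(LogisticRegression(), callback=on_check_done)
print(f"{len(collected)} check results collected")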

Member

@ogrisel ogrisel left a comment

LGTM!

Comment on lines 787 to 789
This return value is only present when all tests pass, or the ones failing
are expected to fail.

Member

This is no longer true.

Suggested change
This return value is only present when all tests pass, or the ones failing
are expected to fail.

Member

@adam2392 adam2392 left a comment

LGTM! It's a big PR, so should I let someone else take a look? Or we can merge if you're happy with it :p

@glemaitre
Member

The failing test comes from something we merged this morning that was using the previous _xfail_checks. The test needs to be amended.

@adrinjalali adrinjalali enabled auto-merge (squash) November 8, 2024 15:29
@adrinjalali adrinjalali merged commit 9012b78 into scikit-learn:main Nov 8, 2024
30 checks passed
@glemaitre
Member

Let me backport it now so we don't forget.

Successfully merging this pull request may close these issues.

  • check_estimator to return structured info
  • RFC Expose xfail_checks with a more flexible API