Duplicate check_finite when calling scipy.linalg functions #18837

@ogrisel

Most functions in scipy.linalg (e.g. svd, qr, eig, eigh, pinv, pinv2 ...) have a keyword argument check_finite that defaults to True, and we typically leave it at that default in scikit-learn.

As we already validate the input data for most estimators in scikit-learn, this check is redundant and can cause significant overhead, especially at predict / transform time. We should probably always call those functions with an explicit check_finite=False in scikit-learn.
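To illustrate the pattern, here is a minimal sketch, not scikit-learn code. The helper `validate_once` is a hypothetical stand-in for the input validation that estimators already perform, and its error message is a placeholder. `scipy.linalg.svd` and its `check_finite` keyword are the real SciPy API.

```python
import numpy as np
from scipy import linalg


def validate_once(X):
    """Hypothetical stand-in for the validation an estimator already does."""
    X = np.asarray(X, dtype=np.float64)
    if not np.isfinite(X).all():
        # Placeholder message, not the exact scikit-learn wording.
        raise ValueError("Input contains NaN or infinity.")
    return X


def singular_values(X):
    """Return the singular values of X, scanning for non-finite values once."""
    X = validate_once(X)
    # X is known to be finite here, so SciPy's own scan is skipped explicitly.
    return linalg.svd(X, compute_uv=False, check_finite=False)
```

With the default check_finite=True, SciPy would scan the whole array a second time on every call. With check_finite=False that scan is skipped, so the up-front validation becomes the only guard against non-finite input.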

This issue will probably need to be addressed across several PRs, likely one per module that imports scipy.linalg.

We should still make sure that the estimators raise a ValueError with the expected error message when fed NumPy arrays containing non-finite values (-np.inf, np.inf or np.nan). This can be done manually by calling sklearn.utils.estimator_checks.check_estimators_nan_inf on the estimator. That check should already be called automatically by sklearn.tests.test_common, but we need to verify that this is actually the case when reviewing such PRs.
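The kind of regression check described above could look roughly like the sketch below. It deliberately avoids scikit-learn: `ToyPCA` and `rejects_non_finite` are hypothetical names invented for illustration, and the error message is a placeholder. In scikit-learn itself this role is played by check_estimators_nan_inf.

```python
import numpy as np
from scipy import linalg


def _validate(X):
    # Hypothetical validation step; the message is a placeholder.
    X = np.asarray(X, dtype=np.float64)
    if not np.isfinite(X).all():
        raise ValueError("Input contains NaN or infinity.")
    return X


class ToyPCA:
    """Hypothetical estimator that calls SciPy with check_finite=False."""

    def fit(self, X):
        X = _validate(X)
        self.mean_ = X.mean(axis=0)
        _, _, Vt = linalg.svd(
            X - self.mean_, full_matrices=False, check_finite=False
        )
        self.components_ = Vt
        return self

    def transform(self, X):
        X = _validate(X)
        return (X - self.mean_) @ self.components_.T


def rejects_non_finite(estimator, n_samples=10, n_features=3):
    """Return True if fit and transform raise ValueError on NaN / inf input."""
    rng = np.random.RandomState(0)
    X_ok = rng.uniform(size=(n_samples, n_features))
    estimator.fit(X_ok)  # fit on clean data so transform can be exercised
    for bad in (np.nan, np.inf, -np.inf):
        X_bad = X_ok.copy()
        X_bad[0, 0] = bad
        for method in (estimator.fit, estimator.transform):
            try:
                method(X_bad)
            except ValueError:
                continue
            return False  # the bad value slipped through silently
    return True
```

Calling `rejects_non_finite(ToyPCA())` exercises both the fit path and the transform path. The point is that removing SciPy's check must not remove the only place where non-finite input is rejected.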

Labels: Bug, Moderate, Performance
