Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allows disabling refitting of CV estimators #30463

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

AhmedThahir
Copy link

@AhmedThahir AhmedThahir commented Dec 11, 2024

Reference Issues/PRs

What does this implement/fix? Explain your changes.

What

  • Allows disable refitting of cross-validation estimators (such as LassoCV, RidgeCV) on the full training set after finding the best hyperparameters.
  • User may use a keyword argument refit to toggle this behavior.

Why
User does not want to waste resources on refitting when user only wants one/more of the following, which do not involve refitting:

  • optimal hyperparameter
  • cv_results_ for the different hyperparameters
  • best_score_ of all the hyperparameters

This is especially impactful for large datasets.

How

  • Added a refit argument which disables refitting if refit=False.
  • By default, refit=True is set to prevent breaking existing usage

Any other comments?

This is my first (hopefully first of many) PR for scikit-learn. If you have any feedback on my implementation/PR documentation/etc, feel free to share - I'd really appreciate it.

Thanks to @paulAdrienMarie for all the support!

Copy link

github-actions bot commented Dec 11, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 59712ad. Link to the linter CI: here

@AhmedThahir
Copy link
Author

I think I'm done from my side with the code changes and documentation.

Code changes for ElasticNetCV and MultitaskElasticNetCV will be done by @paulAdrienMarie

paulAdrienMarie added a commit to paulAdrienMarie/scikit-learn that referenced this pull request Dec 11, 2024
paulAdrienMarie added a commit to paulAdrienMarie/scikit-learn that referenced this pull request Dec 11, 2024
@AhmedThahir
Copy link
Author

AhmedThahir commented Dec 11, 2024

@kayo09, please do not make such reviews - it confuses us.

Only @paulAdrienMarie and I are assigned to work on this PR.

Edit: @kayo09, could you try to remove the review.

Copy link
Author

@AhmedThahir AhmedThahir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@paulAdrienMarie The refit parameter was supposed to appear after cv parameter. This is not a programming issue, but a conflict with the sklearn docs automation, as I followed the scheme followed by GridSearchCV and accordingly made the documentation in the same order, ie refit after cv.

Just sharing this as feedback so that it may be useful for you in the future - not as criticism.

This was the error message from which I understood:

AssertionError: Docstring Error:
In function: sklearn.linear_model._coordinate_descent.ElasticNetCV.__init__
There's a parameter name mismatch in function docstring w.r.t. function signature, at index 8 diff: 'cv' != 'refit'
Full diff:
['l1_ratio',
'eps',
'n_alphas',
'alphas',
'fit_intercept',
'precompute',
'max_iter',
'tol',
+  'refit',
'cv',
'copy_X',
-  'refit',
'verbose',
'n_jobs',
'positive',
'random_state',
'selection']
In function: sklearn.linear_model._coordinate_descent.MultiTaskElasticNetCV.__init__
There's a parameter name mismatch in function docstring w.r.t. function signature, at index 7 diff: 'cv' != 'refit'
Full diff:
['l1_ratio',
'eps',
'n_alphas',
'alphas',
'fit_intercept',
'max_iter',
'tol',
+  'refit',
'cv',
'copy_X',
-  'refit',
'verbose',
'n_jobs',
'random_state',
'selection']

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants