LogisticRegression's regularization is scaled by the dataset size #30308
Comments
I don't think that we are ready to pay the cost of a behaviour change here. However, I wouldn't mind a note in the docstring of the estimator. Just pinging @lorentzenchr in case he has additional thoughts.
I'm not speaking on @lorentzenchr's behalf, but I came across a similar discussion on Ridge/Lasso regression where he shared an incredibly helpful comment. I believe it's relevant to this case as well. That said, I still think @lorentzenchr is the best person to weigh in on this.
Thanks, @virchan, for your nice words and the link(s), so I don't need to repeat those arguments again.
- Documentation:
- Model change:
I see it a bit differently than @glemaitre. I'm open to changing the penalty parameter of all linear models to be consistent. I find the differing conventions across estimators quite annoying and error-prone.
I wouldn't mind the change to make them consistent.
To be backward compatible, can we add a new parameter first, like …
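As a purely hypothetical sketch of what such a transition could look like (the name `alpha`, its per-sample convention, and the deprecation mechanics below are assumptions, not an agreed-upon design):

```python
import numbers
import warnings

# Hypothetical sketch only: `alpha` (a per-sample penalty strength that does
# not depend on n_samples) and this deprecation dance are illustrative, not
# scikit-learn API.
class LogisticRegressionSketch:
    def __init__(self, C="warn", alpha=None):
        self.C = C
        self.alpha = alpha

    def _effective_C(self, n_samples):
        """Map either parametrization onto the current internal C."""
        if self.alpha is not None:
            # New convention: objective is  mean loss + alpha * penalty,
            # equivalent to the current sum-based objective
            # C * sum(loss) + penalty  with  C = 1 / (alpha * n_samples).
            return 1.0 / (self.alpha * n_samples)
        if isinstance(self.C, numbers.Real):
            warnings.warn(
                "`C` is deprecated; use `alpha = 1 / (C * n_samples)` instead.",
                FutureWarning,
            )
            return float(self.C)
        return 1.0  # neither parameter set explicitly: current default C=1.0
```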
Backward-incompatible changes are very taxing to our users. I really want to stress this: users and companies are afraid of upgrades because of these, and it's a classic to see people stuck on old versions, and developers complaining about that.
That said, I agree that properly scaling penalties would be a huge improvement to scikit-learn, here and in Ridge/RidgeCV. In RidgeCV, the right scaling is the trace of the Gram matrix, which also has the benefit of making the penalty parameter invariant to a global rescaling of X (I would so much like this to happen in Ridge/RidgeCV; it would make RidgeCV much more usable).
In logistic regression too, I think that a similar scaling would be relevant. I haven't had a look at the PR here, but it seems that this would go further than the proposed change, probably scaling by a Frobenius norm of X.
So the question is: what's the best move, and what's our strategy for a way forward that allows slow change with temporary backward compatibility?
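A small numerical check of the scale-invariance argument (a sketch; tying `alpha` to the Gram trace this way is not an existing `Ridge` option): Ridge minimizes `||y - Xw||² + alpha ||w||²`, and `trace(XᵀX)` scales as `s²` when X is rescaled by `s`, so choosing `alpha` proportional to the trace makes the fitted model invariant up to the corresponding `1/s` rescaling of the coefficients.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.randn(50, 4)
y = rng.randn(50)
s = 10.0  # global rescaling of X

# Tie alpha to the trace of the Gram matrix X.T @ X
# (equal to the squared Frobenius norm of X).
alpha0 = 0.1
ridge_a = Ridge(alpha=alpha0 * np.trace(X.T @ X)).fit(X, y)
ridge_b = Ridge(alpha=alpha0 * np.trace((s * X).T @ (s * X))).fit(s * X, y)

# Rescaling X by s rescales the coefficients by 1/s; predictions are unchanged.
print(np.allclose(ridge_a.coef_, s * ridge_b.coef_))  # True
```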
Thank you to all for paying so much attention to this problem! (And congrats on the creation of probabl.) If I may add something: what should the default be? I feel like the answer should be whichever is more stable when using CV to select the final …
Describe the workflow you want to enable
Other linear models on https://scikit-learn.org/1.5/modules/linear_model.html have regularization that doesn't depend on the dataset size, whereas LogisticRegression's effective regularization strength scales with the number of samples (see the sketch below).
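A minimal sketch of the behaviour in question (synthetic data; the exact numbers are illustrative): because LogisticRegression minimizes `C * sum_i(loss_i) + penalty(w)`, duplicating every sample weakens the effective regularization, and dividing `C` by the duplication factor restores the original fit.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, n_features=5, random_state=0)
X2, y2 = np.tile(X, (2, 1)), np.tile(y, 2)  # every sample duplicated

# Same C on the duplicated data: the data term doubles but the penalty
# does not, so the model is effectively regularized half as strongly.
coef_once = LogisticRegression(C=1.0, tol=1e-8).fit(X, y).coef_
coef_twice = LogisticRegression(C=1.0, tol=1e-8).fit(X2, y2).coef_

# Halving C on the duplicated data restores the original objective.
coef_rescaled = LogisticRegression(C=0.5, tol=1e-8).fit(X2, y2).coef_

print(np.allclose(coef_once, coef_twice, atol=1e-4))     # False (in general)
print(np.allclose(coef_once, coef_rescaled, atol=1e-4))  # True, up to solver tolerance
```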
Describe your proposed solution
It would be good to either change the behavior or document it very clearly: not only in the user guide, as it is now, but also in the model's documentation.
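For illustration, a note along these lines could go in the description of the `C` parameter (the wording below is a suggestion, not an approved docs change):

```
C : float, default=1.0
    Inverse of regularization strength; must be a positive float.
    Note that the penalty is applied against the *sum* of per-sample
    losses, so the effective regularization strength decreases as the
    number of samples grows; to keep it comparable across dataset
    sizes, scale C inversely with n_samples.
```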