LogisticRegression's regularization is scaled by the dataset size #30308
Comments
I don't think that we are ready to pay the cost of a behaviour change here. However, I wouldn't mind a note in the docstring of the estimator. Just pinging @lorentzenchr in case he has additional thoughts.
I'm not speaking on @lorentzenchr's behalf, but I came across a similar discussion on Ridge/Lasso regression where he shared an incredibly helpful comment. I believe it's relevant to this case as well. That said, I still think @lorentzenchr is the best person to weigh in on this.
Thanks, @virchan, for your nice words and the link(s), so I don't need to repeat those arguments again.
- Documentation:
- Model change:
I see it a bit differently than @glemaitre. I'm open to changing the penalty parameter of all linear models to be consistent. I find the differing conventions across estimators quite annoying and error-prone.
I wouldn't mind the change to make them consistent.
To be backward compatible, can we add a new parameter first, like …
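As a purely hypothetical sketch of what such a transition could look like (the name `alpha`, its per-sample convention, and the deprecation mechanics below are assumptions, not an agreed-upon design):

```python
import numbers
import warnings

# Hypothetical sketch only: `alpha` (a per-sample penalty strength that does
# not depend on n_samples) and this deprecation dance are illustrative, not
# scikit-learn API.
class LogisticRegressionSketch:
    def __init__(self, C="warn", alpha=None):
        self.C = C
        self.alpha = alpha

    def _effective_C(self, n_samples):
        """Map either parametrization onto the current internal C."""
        if self.alpha is not None:
            # New convention: objective is  mean loss + alpha * penalty,
            # equivalent to the current sum-based objective
            # C * sum(loss) + penalty  with  C = 1 / (alpha * n_samples).
            return 1.0 / (self.alpha * n_samples)
        if isinstance(self.C, numbers.Real):
            warnings.warn(
                "`C` is deprecated; use `alpha = 1 / (C * n_samples)` instead.",
                FutureWarning,
            )
            return float(self.C)
        return 1.0  # neither parameter set explicitly: current default C=1.0
```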
Backward-incompatible changes are very taxing to our users. I really want to stress this: users and companies are afraid of upgrades because of these, and it's a classic to see people stuck on old versions, and developers complaining about that.
That said, I agree that properly scaling penalties would be a huge improvement to scikit-learn, here and in Ridge/RidgeCV. In RidgeCV, the right scaling is the trace of the Gram matrix, which also has the benefit of making the penalty parameter invariant to a global rescaling of X (I would so much like this to happen in Ridge/RidgeCV; it would make RidgeCV much more usable).
In logistic regression too, I think that a similar scaling would be relevant. I haven't had a look at the PR here, but it seems that this would go further than the proposed change, probably scaling by a Frobenius norm of X.
So the question is: what's the best move, and what's our strategy for a way forward that allows slow change with temporary backward compatibility?
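A small numerical check of the scale-invariance argument (a sketch; tying `alpha` to the Gram trace this way is not an existing `Ridge` option): Ridge minimizes `||y - Xw||² + alpha ||w||²`, and `trace(XᵀX)` scales as `s²` when X is rescaled by `s`, so choosing `alpha` proportional to the trace makes the fitted model invariant up to the corresponding `1/s` rescaling of the coefficients.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.randn(50, 4)
y = rng.randn(50)
s = 10.0  # global rescaling of X

# Tie alpha to the trace of the Gram matrix X.T @ X
# (equal to the squared Frobenius norm of X).
alpha0 = 0.1
ridge_a = Ridge(alpha=alpha0 * np.trace(X.T @ X)).fit(X, y)
ridge_b = Ridge(alpha=alpha0 * np.trace((s * X).T @ (s * X))).fit(s * X, y)

# Rescaling X by s rescales the coefficients by 1/s; predictions are unchanged.
print(np.allclose(ridge_a.coef_, s * ridge_b.coef_))  # True
```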
Thank you to all for paying so much attention to this problem! (And congrats on the creation of probabl.) If I may add something: what should the default be? I feel like the answer should be whichever is more stable when using CV to select the final …
Describe the workflow you want to enable
Other linear models on https://scikit-learn.org/1.5/modules/linear_model.html have regularization that doesn't depend on the dataset size, whereas LogisticRegression's effective regularization strength scales with the number of samples (see the sketch below).
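A minimal sketch of the behaviour in question (synthetic data; the exact numbers are illustrative): because LogisticRegression minimizes `C * sum_i(loss_i) + penalty(w)`, duplicating every sample weakens the effective regularization, and dividing `C` by the duplication factor restores the original fit.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, n_features=5, random_state=0)
X2, y2 = np.tile(X, (2, 1)), np.tile(y, 2)  # every sample duplicated

# Same C on the duplicated data: the data term doubles but the penalty
# does not, so the model is effectively regularized half as strongly.
coef_once = LogisticRegression(C=1.0, tol=1e-8).fit(X, y).coef_
coef_twice = LogisticRegression(C=1.0, tol=1e-8).fit(X2, y2).coef_

# Halving C on the duplicated data restores the original objective.
coef_rescaled = LogisticRegression(C=0.5, tol=1e-8).fit(X2, y2).coef_

print(np.allclose(coef_once, coef_twice, atol=1e-4))     # False (in general)
print(np.allclose(coef_once, coef_rescaled, atol=1e-4))  # True, up to solver tolerance
```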
Describe your proposed solution
It would be good to either change the behavior or document it very clearly: not only in the user guide, as it is now, but also in the model's documentation.
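For illustration, a note along these lines could go in the description of the `C` parameter (the wording below is a suggestion, not an approved docs change):

```
C : float, default=1.0
    Inverse of regularization strength; must be a positive float.
    Note that the penalty is applied against the *sum* of per-sample
    losses, so the effective regularization strength decreases as the
    number of samples grows; to keep it comparable across dataset
    sizes, scale C inversely with n_samples.
```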