I am using scikit-learn to train some regression models on data and noticed that the cost function for Lasso regression is defined like this:

(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1

whereas the cost function for e.g. Ridge regression is shown as:

||y - Xw||^2_2 + alpha * ||w||^2_2

I had a look in the code (Lasso & Ridge) as well, and the implementations of the cost functions match the formulas above. I am confused why the 1/n_samples factor is only present in the Lasso regression case. From my perspective it makes sense to scale the residuals inversely proportional to the number of samples, so that when the algorithm is used on a dataset with more training samples the appropriate value of alpha is roughly invariant to the sample size. In the ElasticNet class, which can be understood as a combination of Lasso and Ridge regression, we also see that factor of 1/n_samples. Can someone explain why this factor is not present in the cost function of Ridge regression?

My related stackexchange question: here
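To make the invariance argument concrete, here is a minimal sketch (toy random data, arbitrary alpha values) that duplicates every training sample: because of the 1/n_samples factor, the Lasso objective is unchanged and the fitted coefficients stay the same, while for Ridge the penalty term does not grow with the data, so alpha has to be scaled along with n_samples to recover the same fit.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.RandomState(0)
X = rng.randn(50, 3)
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.randn(50)

# Duplicate every sample: n_samples doubles, the data distribution is unchanged.
X2, y2 = np.vstack([X, X]), np.concatenate([y, y])

# Lasso: the 1/(2 * n_samples) factor makes the objective, and hence the
# fitted coefficients, invariant to duplicating the data.
l1 = Lasso(alpha=0.1).fit(X, y)
l2 = Lasso(alpha=0.1).fit(X2, y2)
print(np.allclose(l1.coef_, l2.coef_, atol=1e-6))  # True (up to solver tolerance)

# Ridge: no 1/n_samples factor, so alpha must be scaled with n_samples
# (here doubled) to obtain the same coefficients on the duplicated data.
r1 = Ridge(alpha=1.0).fit(X, y)
r2 = Ridge(alpha=2.0).fit(X2, y2)
print(np.allclose(r1.coef_, r2.coef_))  # True
```

Fitting Ridge(alpha=1.0) on the duplicated data instead would give different coefficients, which is exactly the sample-size dependence the question is about.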