New example about how to implement the SuperLearner in Python #30398
Comments
The main paper was published in 2010 and has been cited more than 400 times, so it would meet our inclusion criteria: https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=Super+Learner+In+Prediction&btnG= The algorithm is presented in section 2 of the paper. It seems very close to our StackingClassifier/Regressor model where the second-stage model is a Ridge classifier or regressor with positivity constraints on the coefficients (and an extra constraint that they should sum to one): https://scikit-learn.org/stable/modules/ensemble.html#stacked-generalization The main difference from what we currently have in scikit-learn is the preconfigured list of base estimators used to populate the first stage. Things that we could explore:
Our only example is:
Totally agreed that it is not to be implemented in scikit-learn. The idea would be to include it in the documentation through an example, either in the existing stacking example or in a new one, with an explicit mention of the SuperLearner paper and/or package (so that it is findable when searching for "super learner python" on your favorite search engine). wdyt? Your other suggestions are really interesting too, maybe as a second step if needed, depending on how widely the example code ends up being used (not sure how to measure that directly, but communication metrics can help).
Providing a list of good base linear pipelines might also be useful for #6329 (greedy ensemble), which is closely related.
I also agree that mentioning "SuperLearner" in the docstring of the stacking meta-estimators, in the user guide, in the example, or in all of the above might be helpful for googleability.
Describe the issue linked to the documentation
The SuperLearner is a stacking strategy that is widely used in fields like statistics (for instance in causal inference, survival analysis, etc.) to obtain a good machine learning model fitted to your data without caring too much about model selection. It is implemented as an R package with good documentation, but it is not available off-the-shelf in Python, even though it is not very difficult to do with scikit-learn.
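As a rough illustration of what such an example could look like, here is a minimal sketch for the regression case. It relies on `StackingRegressor` to produce cross-validated first-stage predictions, and on a small custom final estimator that fits non-negative least squares and then rescales the weights so they sum to one. The class name `ConvexCombinationRegressor`, the helper `build_super_learner_sketch`, the choice of base estimators, and the rescaling step are all assumptions made for this sketch; they are not taken from the paper, the R package, or scikit-learn.

```python
import numpy as np
from scipy.optimize import nnls
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.neighbors import KNeighborsRegressor


class ConvexCombinationRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical second-stage model: non-negative weights that sum to one.

    Assumption of this sketch: the weights come from non-negative least
    squares (no intercept) and are rescaled afterwards.
    """

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        weights, _ = nnls(X, y)  # solves min ||Xw - y|| subject to w >= 0
        total = weights.sum()
        if total > 0:
            self.coef_ = weights / total
        else:
            # Degenerate case: fall back to a plain average of the base models.
            self.coef_ = np.full(X.shape[1], 1.0 / X.shape[1])
        return self

    def predict(self, X):
        return np.asarray(X, dtype=float) @ self.coef_


def build_super_learner_sketch():
    # Illustrative library of base estimators, not a recommended default list.
    base_estimators = [
        ("ols", LinearRegression()),
        ("ridge", Ridge(alpha=1.0)),
        ("knn", KNeighborsRegressor()),
        ("rf", RandomForestRegressor(n_estimators=20, random_state=0)),
    ]
    # cv=5: the second stage is fitted on out-of-fold predictions.
    return StackingRegressor(
        estimators=base_estimators,
        final_estimator=ConvexCombinationRegressor(),
        cv=5,
    )


X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
model = build_super_learner_sketch().fit(X, y)
weights = model.final_estimator_.coef_
print(dict(zip(model.named_estimators_, weights)))
```

The printed dictionary maps each base estimator name to its weight in the final combination, which is the part users coming from the R package would probably look for first. A classification variant would need a different second-stage loss and is left out here.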
Suggest a potential alternative/fix
It is probably not in the spirit of scikit-learn to implement it, but a good example briefly explaining what it is and how to do it nicely in scikit-learn could be super helpful!
Happy to help (writing, reviewing, etc.) if needed.