Enhance "Choosing the Right Estimator" Graphic (scikit-learn algorithm cheat sheet) #30354

Open
sylvaincom opened this issue Nov 27, 2024 · 4 comments


@sylvaincom
Contributor

sylvaincom commented Nov 27, 2024

Describe the issue linked to the documentation

In its user guide, scikit-learn offers a "Choosing the right estimator" page, which is a great interactive scikit-learn algorithm cheat sheet.

When thinking about new features for skore, I thought of enhancing the user guide with a pedagogical table that says, for each estimator:

  • whether its input data needs to be scaled,
  • whether it can handle categorical features,
  • whether it can handle missing data,
  • whether it involves some randomness (and where / why),
  • whether it can be parallelized,
  • etc. (full, proper list to be determined).

EDIT:

  • The scikit-learn graph / map is great, but not sufficient IMHO, because I would like to know, for each estimator, whether I need to normalize the data, etc. -> guidelines for each estimator
  • I would like a table that is separate from the map: it is also a cheat sheet, but it should not appear on the map itself, maybe below the map on the same user guide page

When discussing this with @jeromedockes and @Vincent-Maladiere, they told me about scikit-learn's estimator tags and helpers such as is_regressor. It seems that some of this knowledge is already captured in the tags.

Suggest a potential alternative/fix

  • Maybe scikit-learn could have a table in the user guide with guidelines for each estimator?
  • Maybe scikit-learn could hold more tags? And the table could be built from those tags?
@sylvaincom added the Documentation and Needs Triage labels on Nov 27, 2024
@virchan
Contributor

virchan commented Nov 27, 2024

Pinging @lesteve, @Charlie-XIAO, and @thomasjpfan, as they are more qualified than I am to comment on this. Apologies for the spam!

@lesteve
Member

lesteve commented Nov 27, 2024

IMO, the first thing to do is narrow down and clarify the scope, then try to improve the situation through incremental PRs.

For example, on improving the "Choosing the right estimator" map in a minimal way, see some quick recent thoughts in #30283 (comment). Better suggestions are more than welcome!

See #7686 for some attempts at improving the map. There is also #28314.

@lesteve
Member

lesteve commented Nov 27, 2024

Also, about having docs that use estimator tags: we kind of already do this in some places, e.g. "Estimators handling NaNs" is actually generated by a Sphinx directive that uses estimator tags.

@sylvaincom
Contributor Author

sylvaincom commented Nov 27, 2024

Sorry, let me try to clarify my point (I also edited my initial issue):

  • The scikit-learn graph / map is great, but not sufficient IMHO, because I would like to know, for each estimator, whether I need to normalize the data, etc. -> guidelines for each estimator
  • I would like a table that is separate from the map: it is also a cheat sheet, but it should not appear on the map itself, maybe below the map on the same user guide page
  • This table could be generated automatically from tags

Thanks for your feedback! Understood, I will try to break this down into more digestible tasks and will look into the links you provided.

@ArturoAmorQ removed the Needs Triage label on Nov 29, 2024