[MRG] Adds Permutation Importance #13146

thomasjpfan · 2019-02-12T16:33:12Z

Reference Issues/PRs

Resolves #11187

What does this implement/fix? Explain your changes.

Adds permutation importance to a model_inspection module.

TODO

Initial implementation.
Add example demonstrating the differences between permutation importance and feature_importances_ when using trees.
Add to user guide.
Support pandas dataframes.

jnothman

Do we want to provide a meta-estimator exposing feature_importances_, for use in places where that attribute is expected?

Please also consider looking at eli5 for feature parity, and perhaps for testing ideas.

jnothman

Hmmmm... By conducting cross validation over multiple splits, this determines feature importance for a class of model, rather than a specific model. If we are trying to inspect a specific model, surely we should not be fitting cv-many different models, but merely assessing the importance of features to prediction accuracy for the given model.
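The following minimal NumPy sketch makes the single-model reading concrete. It scores one already-fitted model on fixed data and never refits anything. It assumes the model is exposed as a `predict` callable and that higher scores are better. `permutation_importance_single_model` and `r2` are hypothetical helper names, not this PR's API.

```python
import numpy as np


def r2(y_true, y_pred):
    # Coefficient of determination; higher is better.
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def permutation_importance_single_model(predict, X, y, score, n_repeats=5, seed=0):
    """Importance of each column of X for ONE already-fitted model.

    `predict` is the fitted model's prediction function; nothing is refitted.
    Returns an array of shape (n_features, n_repeats) holding
    baseline_score - permuted_score for each shuffle.
    """
    rng = np.random.default_rng(seed)
    baseline = score(y, predict(X))
    n_features = X.shape[1]
    importances = np.empty((n_features, n_repeats))
    for j in range(n_features):
        for r in range(n_repeats):
            X_perm = X.copy()
            # Shuffle only column j, breaking its link with y.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            importances[j, r] = baseline - score(y, predict(X_perm))
    return importances
```

Because nothing is refitted, a feature that the model ignores gets an importance of exactly zero. This is the per-model behaviour described above. A CV-based variant would additionally mix in the variability of refitting.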

jnothman · 2019-02-13T22:47:35Z

sklearn/inspect/permutation_importance.py

+    for column in columns:
+        with _permute_column(X_test, column, random_state) as X_perm:
+            feature_score = scoring(estimator, X_perm, y_test)
+            permutation_importance_scores.append(baseline_score -


What does it mean when this value is negative? Do we need to clip it in that case?

Negative means that the model performed better with the feature permuted. This could mean that the feature should be dropped.

There is a paragraph about this in https://explained.ai/rf-importance/index.html, near Figure 3(a).
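The following contrived NumPy example (not taken from the PR) shows how a negative importance can arise. The hypothetical "fitted" model below relies on `x1` with the wrong sign for the evaluation data. Shuffling `x1` therefore removes a harmful signal, and the score improves. `neg_mse` is used so that higher is better and importance is `baseline - permuted`, as in the diff above.

```python
import numpy as np


def neg_mse(y_true, y_pred):
    # Negated MSE so that higher is better, like a scikit-learn scorer.
    return -np.mean((y_true - y_pred) ** 2)


def predict(A):
    # Hypothetical fitted model: it learned a weight of +1 for x1.
    return A[:, 0] + A[:, 1]


rng = np.random.default_rng(0)
x0 = rng.normal(size=500)
x1 = rng.normal(size=500)
X = np.column_stack([x0, x1])
# In the evaluation data the true relationship with x1 is -1,
# so the model's use of x1 actively hurts its predictions.
y = x0 - x1

baseline = neg_mse(y, predict(X))
X_perm = X.copy()
X_perm[:, 1] = rng.permutation(X_perm[:, 1])
importance_x1 = baseline - neg_mse(y, predict(X_perm))
print(importance_x1)  # negative: the model scores better with x1 shuffled
```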

Interesting. I think both the docstring and the user guide should explain the meaning of negative importance.

thomasjpfan · 2019-02-14T15:29:12Z

Hmmmm... By conducting cross validation over multiple splits, this determines feature importance for a class of model, rather than a specific model.

This is correct. I will add a prefit option to inspect a specific model (turning off the cross validation).

The CV mode isn't inspecting the model; it uses multiple models to find the importance of the features, so it is "inspecting the data". If the scope of the inspect module covers both data and model inspection, then this CV feature could be kept.

jnothman · 2019-02-14T22:08:03Z

Hmmmm I had indeed thought of inspect as being about model inspection.

ogrisel · 2019-02-18T09:59:27Z

+1 for focusing first on a tool used for the single (fitted) model inspection use case. Here are alternative implementations:

http://rasbt.github.io/mlxtend/user_guide/evaluate/feature_importance_permutation/
https://eli5.readthedocs.io/en/latest/blackbox/permutation_importance.html (although I am not a fan of the fit API)

Then we could think of a tool for automated feature selection using a nested cross-validation loop that can be used in Pipeline as the SelectFromModel does. However, to me, it's less of a priority.

examples/inspect/plot_permutation_importance.py

ogrisel

Because it's so cheap to resample the individual predictions (on the permuted validation set), we should take advantage of this to recompute the mean score on many resampled predictions (bootstrap estimates of the importance). I think it's very important that the default behavior of this tool makes it natural to get bootstrap confidence intervals on the feature importance (e.g. a 2.5%-97.5% percentile interval in addition to the median importance across resampled importances).
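Here is a minimal sketch of the bootstrap idea. It assumes squared error as the per-sample loss, since the comment does not fix a metric. It also assumes that the baseline and permuted predictions have already been computed once. Only the per-sample errors are resampled, so no extra model evaluations are needed. `bootstrap_importance` is a hypothetical name.

```python
import numpy as np


def bootstrap_importance(y, pred_baseline, pred_permuted, n_bootstrap=1000, seed=0):
    """Bootstrap the importance of one feature from fixed predictions.

    Importance here is the increase in mean squared error caused by
    permuting the feature. The model is never re-run: only the per-sample
    errors are resampled with replacement.
    Returns (median, 2.5th percentile, 97.5th percentile).
    """
    rng = np.random.default_rng(seed)
    err_base = (y - pred_baseline) ** 2
    err_perm = (y - pred_permuted) ** 2
    n = len(y)
    # Each row of idx is one bootstrap resample of the evaluation samples.
    idx = rng.integers(0, n, size=(n_bootstrap, n))
    # The same resampled indices are used for both errors (paired comparison).
    diffs = err_perm[idx].mean(axis=1) - err_base[idx].mean(axis=1)
    low, median, high = np.percentile(diffs, [2.5, 50, 97.5])
    return median, low, high
```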

Also, the feature importance plot in the example should use horizontal box-and-whisker plots to highlight the uncertainty of these feature importance estimates:

https://matplotlib.org/gallery/pyplots/boxplot_demo_pyplot.html#sphx-glr-gallery-pyplots-boxplot-demo-pyplot-py
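Here is a sketch of the horizontal box plot with Matplotlib. It assumes the importances are stored as an array of shape `(n_features, n_repeats)`, with one row per feature and one column per shuffle or bootstrap resample. The helper names are hypothetical. Matplotlib is imported inside the plotting function so that the ordering helper can be used on its own.

```python
import numpy as np


def order_by_median(importances, feature_names):
    """Sort features by median importance, ascending.

    In a horizontal box plot position 1 is at the bottom, so ascending
    order puts the most important feature at the top.
    `importances` has shape (n_features, n_repeats).
    """
    order = np.argsort(np.median(importances, axis=1))
    return importances[order], [feature_names[i] for i in order]


def plot_importance_boxplot(importances, feature_names, ax=None):
    import matplotlib.pyplot as plt  # only needed when actually plotting

    data, names = order_by_median(importances, feature_names)
    if ax is None:
        _, ax = plt.subplots()
    # boxplot draws one box per column of a 2-D array, hence the transpose.
    ax.boxplot(data.T, vert=False)
    ax.set_yticks(np.arange(1, len(names) + 1))
    ax.set_yticklabels(names)
    ax.axvline(0, color="grey", linestyle="--")  # zero-importance reference
    ax.set_xlabel("Permutation importance")
    return ax
```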

ogrisel · 2019-02-18T10:20:34Z

We could even reduce the opacity of the boxplots of features whose 2.5%-97.5% range contains 0, to highlight that those features are not predictive (given the others).
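Here is a NumPy sketch of the opacity rule. It assumes the same `(n_features, n_repeats)` layout, with the interval taken across the stored repeats or bootstrap resamples of each feature. A feature whose 2.5%-97.5% interval contains 0 gets a faded box. `box_alphas` and the 0.3 fade level are arbitrary choices for illustration.

```python
import numpy as np


def box_alphas(importances, low_q=2.5, high_q=97.5, faded=0.3):
    """Opacity for each feature's box: faded when 0 lies inside the interval.

    `importances` has shape (n_features, n_repeats). A feature whose
    [low_q, high_q] percentile interval contains 0 cannot be distinguished
    from "not predictive", so its box gets reduced opacity.
    """
    low, high = np.percentile(importances, [low_q, high_q], axis=1)
    contains_zero = (low <= 0) & (high >= 0)
    return np.where(contains_zero, faded, 1.0)
```

With `patch_artist=True`, Matplotlib's `ax.boxplot` returns a dict whose `"boxes"` entry holds one patch per box. The alphas can then be applied by calling `patch.set_alpha(alpha)` for each pair in `zip(result["boxes"], alphas)`. The boxes must be plotted in the same feature order used to compute the alphas.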

ogrisel · 2019-02-18T10:23:12Z

Here are other interesting references that I have not carefully read yet:

thomasjpfan · 2019-02-18T14:40:59Z

@ogrisel Thank you for all the suggestions! I will focus this PR on inspecting a single fitted model and tune the API to make it easy to get bootstrap results.


jnothman · 2019-02-19T22:00:13Z

Can I ask what your intention is in using the commit prefix RFC?

thomasjpfan · 2019-02-20T01:32:05Z

It’s a prefix I use to mean “REFACTOR”.

jnothman · 2019-02-20T02:08:31Z

Oh! RFC means "request for comment" to me. Try CLN for clean?


amueller · 2019-03-05T18:04:19Z

sklearn/inspect/permutation_importance.py

+
+    scores : array, shape (n_features, bootstrap_samples)
+        Permutation importance scores
+    """


Needs a reference - and a user guide!


amueller · 2019-07-09T15:20:54Z

I like returning bunches, though it means we need to support both __getattr__ and __getitem__ during any deprecation cycle if we decide to change it. Though I guess that shouldn't be that big an issue.
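The following standard-library sketch shows what supporting both access paths during a deprecation could look like. It is a `dict` subclass with attribute access, similar in spirit to `sklearn.utils.Bunch` but not its implementation. The rename from `scores` to `importances` is hypothetical. Routing `__getattr__` through `__getitem__` keeps the deprecation logic in one place.

```python
import warnings


class ResultBunch(dict):
    """dict with attribute access and a deprecated key alias.

    `_renamed` maps a deprecated key to its new name; both obj["old"] and
    obj.old keep working but emit a FutureWarning.
    """

    _renamed = {"scores": "importances"}  # hypothetical rename for illustration

    def __getitem__(self, key):
        if key in self._renamed:
            new = self._renamed[key]
            warnings.warn(f"'{key}' is deprecated, use '{new}' instead", FutureWarning)
            key = new
        return super().__getitem__(key)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails; reuse __getitem__
        # so attribute access gets the same deprecation handling.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None
```

Plain `dict` methods such as `get()` and the `in` operator do not go through an overridden `__getitem__`, so `"scores" in result` is `False` here. A deprecation cycle would have to cover edge cases like this.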

What I like about cross_validate and grid search resulting in dicts is that they can easily be turned into dataframes. I'm not sure that makes sense here.
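The following small pandas sketch illustrates the contrast using toy values rather than real results. A `cross_validate`-style dict maps directly onto DataFrame columns because every entry is a 1-D array of the same length. A `(n_features, n_repeats)` importance array, by contrast, needs an explicit choice of layout. The feature names are hypothetical.

```python
import numpy as np
import pandas as pd

# cross_validate-style output (toy values): every entry is a 1-D array with
# one value per split, so the dict maps directly onto DataFrame columns.
cv_like = {
    "fit_time": np.array([0.11, 0.10, 0.12]),
    "test_score": np.array([0.81, 0.79, 0.84]),
}
cv_df = pd.DataFrame(cv_like)  # one row per split

# Permutation importances (toy values) have shape (n_features, n_repeats),
# so a layout has to be chosen: here one row per feature, one column per repeat.
importances = np.array([[0.30, 0.28, 0.33],
                        [0.01, -0.02, 0.00]])
feature_names = ["age", "income"]  # hypothetical feature names
imp_df = pd.DataFrame(
    importances,
    index=feature_names,
    columns=[f"repeat_{r}" for r in range(importances.shape[1])],
)

# Per-feature summary, one row per feature.
summary = pd.DataFrame({"mean": imp_df.mean(axis=1), "std": imp_df.std(axis=1)})
```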

Previously we had been very careful not to add classes, even though things like cv_results_ could benefit from more logic. The plotting infrastructure that @thomasjpfan is working on would add lots of classes, so if we go that route (which I hope we do), this might be a bit of a change of policy?


amueller

Can you please check my and guillaume's suggestions and address the remaining comments? I'd really like to merge this.

amueller · 2019-07-16T18:38:30Z