EstimatorReport.feature_importance.permutation
- EstimatorReport.feature_importance.permutation(*, data_source='test', X=None, y=None, aggregate=None, scoring=None, n_repeats=5, max_samples=1.0, n_jobs=None, seed=None, flat_index=False)
Report the permutation feature importance.
This computes the permutation importance using sklearn's permutation_importance() function, which consists of permuting the values of one feature and comparing the value of scoring with and without the permutation, which gives an indication of the impact of the feature.
By default, seed is set to None, which means the function will return a different result at every call. In that case, the results are not cached. If you wish to take advantage of skore's caching capabilities, make sure you set the seed parameter.
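As a rough illustration of the computation this method wraps, the sketch below calls sklearn's permutation_importance() directly. It is not skore's exact internals; model, X_test and y_test are assumptions standing in for a fitted estimator and a held-out set like the ones built in the Examples section.

from sklearn.inspection import permutation_importance

# Illustrative sketch: `model` is assumed to be a fitted estimator and
# (X_test, y_test) a held-out dataset, as in the Examples below.
result = permutation_importance(
    model,
    X_test,
    y_test,
    scoring="r2",
    n_repeats=5,
    random_state=0,   # plays the role of the `seed` parameter
)
result.importances        # shape (n_features, n_repeats): per-repeat score drops
result.importances_mean   # average drop per feature, akin to aggregate="mean"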
- Parameters:
- data_source : {"test", "train", "X_y"}, default="test"
The data source to use.
"test" : use the test set provided when creating the report.
"train" : use the train set provided when creating the report.
"X_y" : use the provided X and y to compute the metric.
- X : array-like of shape (n_samples, n_features), default=None
New data on which to compute the metric. By default, we use the test set provided when creating the report.
- y : array-like of shape (n_samples,), default=None
New target on which to compute the metric. By default, we use the test target provided when creating the report.
- aggregate : {"mean", "std"} or list of such str, default=None
Function to aggregate the scores across the repeats.
- scoring : str, callable, list, tuple, or dict, default=None
The scorer to pass to permutation_importance().
If scoring represents a single score, one can use:
a single string, which must be one of the supported metrics;
a callable that returns a single value.
If scoring represents multiple scores, one can use:
a list or tuple of unique strings, each of which must be one of the supported metrics;
a callable returning a dictionary where the keys are the metric names and the values are the metric scores;
a dictionary with metric names as keys and callables as values.
(See the sketch after this parameter list for examples of each form.)
- n_repeats : int, default=5
Number of times to permute a feature.
- max_samples : int or float, default=1.0
The number of samples to draw from X to compute feature importance in each repeat (without replacement).
If int, then draw max_samples samples.
If float, then draw max_samples * X.shape[0] samples.
If max_samples is equal to 1.0 or X.shape[0], all samples will be used.
While using this option may provide less accurate importance estimates, it keeps the method tractable when evaluating feature importance on large datasets. In combination with n_repeats, this allows controlling the trade-off between computational speed and statistical accuracy of this method.
- n_jobs : int or None, default=None
Number of jobs to run in parallel. -1 means using all processors.
- seed : int or None, default=None
The seed used to initialize the random number generator used for the permutations.
- flat_index : bool, default=False
Whether to flatten the multi-index columns. The flat index is always lower case, contains no spaces, and drops the hash symbol to ease indexing.
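As a hedged illustration of the scoring forms listed above, the sketch below reuses the report built in the Examples section. The metric strings mirror the ones used there, and the callable is an ordinary scikit-learn scorer built with make_scorer, shown as one assumption of what a single-value callable can look like.

from sklearn.metrics import make_scorer, mean_absolute_error

# A single supported metric name.
report.feature_importance.permutation(scoring="r2", n_repeats=2, seed=0)

# Several metrics as a list of unique strings.
report.feature_importance.permutation(scoring=["r2", "rmse"], n_repeats=2, seed=0)

# A callable returning a single value (a scikit-learn scorer).
neg_mae = make_scorer(mean_absolute_error, greater_is_better=False)
report.feature_importance.permutation(scoring=neg_mae, n_repeats=2, seed=0)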
- Returns:
- pandas.DataFrame
The permutation importance.
Examples
>>> from sklearn.datasets import make_regression
>>> from sklearn.linear_model import Ridge
>>> from sklearn.model_selection import train_test_split
>>> from skore import EstimatorReport
>>> X_train, X_test, y_train, y_test = train_test_split(
...     *make_regression(n_features=3, random_state=0), random_state=0
... )
>>> regressor = Ridge()
>>> report = EstimatorReport(
...     regressor,
...     X_train=X_train,
...     y_train=y_train,
...     X_test=X_test,
...     y_test=y_test,
... )
>>> report.feature_importance.permutation(
...     n_repeats=2,
...     seed=0,
... )
Repeat              Repeat #0  Repeat #1
Metric Feature
r2     Feature #0    0.699...   0.885...
       Feature #1    2.320...   2.636...
       Feature #2    0.028...   0.022...
>>> report.feature_importance.permutation(
...     scoring=["r2", "rmse"],
...     n_repeats=2,
...     seed=0,
... )
Repeat                Repeat #0   Repeat #1
Metric Feature
r2     Feature #0      0.699...    0.885...
       Feature #1      2.320...    2.636...
       Feature #2      0.028...    0.022...
rmse   Feature #0    -47.222...  -53.231...
       Feature #1    -86.608...  -92.366...
       Feature #2     -8.930...   -7.916...
>>> report.feature_importance.permutation(
...     n_repeats=2,
...     aggregate=["mean", "std"],
...     seed=0,
... )
                        mean      std
Metric Feature
r2     Feature #0   0.792...  0.131...
       Feature #1   2.478...  0.223...
       Feature #2   0.025...  0.003...
>>> report.feature_importance.permutation(
...     n_repeats=2,
...     aggregate=["mean", "std"],
...     flat_index=True,
...     seed=0,
... )
                  mean      std
r2_feature_0  0.792...  0.131...
r2_feature_1  2.478...  0.223...
r2_feature_2  0.025...  0.003...
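Finally, since results are only cached when seed is set (see the note at the top of this page), fixing the seed lets repeated calls reuse skore's cache. A minimal sketch using the report from the Examples:

# With a fixed seed the computation is reproducible and, per the note above,
# eligible for skore's cache, so the second call is expected to be cheap.
first = report.feature_importance.permutation(n_repeats=2, seed=0)
second = report.feature_importance.permutation(n_repeats=2, seed=0)
assert first.equals(second)   # identical results, same permutations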