EstimatorReport.feature_importance.permutation#
- EstimatorReport.feature_importance.permutation(*, data_source='test', X=None, y=None, aggregate=None, scoring=None, n_repeats=5, max_samples=1.0, n_jobs=None, seed=None, flat_index=False)[source]#
- Report the permutation feature importance.
  This computes the permutation importance using sklearn's permutation_importance() function, which consists of permuting the values of one feature and comparing the value of scoring with and without the permutation, which gives an indication of the impact of the feature.
  By default, seed is set to None, which means the function will return a different result at every call. In that case, the results are not cached. If you wish to take advantage of skore's caching capabilities, make sure you set the seed parameter (a sketch at the end of the Examples illustrates this).
- Parameters:
- data_source : {"test", "train", "X_y"}, default="test"
  The data source to use.
  - "test" : use the test set provided when creating the report.
  - "train" : use the train set provided when creating the report.
  - "X_y" : use the provided X and y to compute the metric.
 
- X : array-like of shape (n_samples, n_features), default=None
  New data on which to compute the metric. By default, we use the test set provided when creating the report.
- y : array-like of shape (n_samples,), default=None
  New target on which to compute the metric. By default, we use the test target provided when creating the report.
- aggregate : {"mean", "std"} or list of such str, default=None
  Function to aggregate the scores across the repeats.
- scoring : str, callable, list, tuple, or dict, default=None
  The scorer to pass to permutation_importance().
  If scoring represents a single score, one can use:
  - a single string, which must be one of the supported metrics;
  - a callable that returns a single value (see the sketch after this parameter list).
  If scoring represents multiple scores, one can use:
  - a list or tuple of unique strings, which must be one of the supported metrics;
  - a callable returning a dictionary where the keys are the metric names and the values are the metric scores;
  - a dictionary with metric names as keys and callables as values.
 
- n_repeats : int, default=5
  Number of times to permute a feature.
- max_samples : int or float, default=1.0
  The number of samples to draw from X to compute feature importance in each repeat (without replacement).
  - If int, then draw max_samples samples.
  - If float, then draw max_samples * X.shape[0] samples.
  - If max_samples is equal to 1.0 or X.shape[0], all samples will be used.
  While using this option may provide less accurate importance estimates, it keeps the method tractable when evaluating feature importance on large datasets. In combination with n_repeats, this allows one to control the trade-off between computational speed and statistical accuracy of this method.
- n_jobs : int or None, default=None
  Number of jobs to run in parallel. -1 means using all processors.
- seed : int or None, default=None
  The seed used to initialize the random number generator used for the permutations.
- flat_index : bool, default=False
  Whether to flatten the multi-index columns. The flat index is always lowercase, does not include spaces, and has the hash symbol removed, to ease indexing.
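
As an illustration of the callable form of scoring, the sketch below builds a scorer with sklearn.metrics.make_scorer and passes it through. This is a minimal sketch, assuming the callable is forwarded unchanged to sklearn's permutation_importance(); the data and estimator mirror the Examples section below.

>>> from sklearn.datasets import make_regression
>>> from sklearn.linear_model import Ridge
>>> from sklearn.metrics import make_scorer, mean_absolute_error
>>> from skore import EstimatorReport, train_test_split
>>> X, y = make_regression(n_features=3, random_state=0)
>>> split_data = train_test_split(X=X, y=y, random_state=0, as_dict=True)
>>> report = EstimatorReport(Ridge(), **split_data)
>>> # A scorer callable returning a single value; greater_is_better=False
>>> # flips the sign so that larger values mean a better model.
>>> mae_scorer = make_scorer(mean_absolute_error, greater_is_better=False)
>>> importance = report.feature_importance.permutation(
...     scoring=mae_scorer,
...     n_repeats=2,
...     seed=0,
... )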
 
- Returns:
- pandas.DataFrame
  The permutation importance (see the post-processing sketch below).
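
Because the returned object is a regular pandas.DataFrame, it can be post-processed with the usual pandas tooling. A minimal sketch, assuming a report built as in the Examples below and the "mean"/"std" columns produced by aggregate=["mean", "std"]:

>>> from sklearn.datasets import make_regression
>>> from sklearn.linear_model import Ridge
>>> from skore import EstimatorReport, train_test_split
>>> X, y = make_regression(n_features=3, random_state=0)
>>> split_data = train_test_split(X=X, y=y, random_state=0, as_dict=True)
>>> report = EstimatorReport(Ridge(), **split_data)
>>> importance = report.feature_importance.permutation(
...     n_repeats=2,
...     aggregate=["mean", "std"],
...     seed=0,
... )
>>> # Rank the features by their mean permutation importance, largest first.
>>> ranked = importance.sort_values(by="mean", ascending=False)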
 
- Examples

  >>> from sklearn.datasets import make_regression
  >>> from sklearn.linear_model import Ridge
  >>> from skore import train_test_split
  >>> from skore import EstimatorReport
  >>> X, y = make_regression(n_features=3, random_state=0)
  >>> split_data = train_test_split(X=X, y=y, random_state=0, as_dict=True)
  >>> regressor = Ridge()
  >>> report = EstimatorReport(regressor, **split_data)

  >>> report.feature_importance.permutation(
  ...     n_repeats=2,
  ...     seed=0,
  ... )
  Repeat              Repeat #0  Repeat #1
  Metric Feature
  r2     Feature #0    0.699...   0.885...
         Feature #1    2.320...   2.636...
         Feature #2    0.028...   0.022...

  >>> report.feature_importance.permutation(
  ...     scoring=["r2", "rmse"],
  ...     n_repeats=2,
  ...     seed=0,
  ... )
  Repeat              Repeat #0  Repeat #1
  Metric Feature
  r2     Feature #0    0.699...   0.885...
         Feature #1    2.320...   2.636...
         Feature #2    0.028...   0.022...
  rmse   Feature #0  -47.222... -53.231...
         Feature #1  -86.608... -92.366...
         Feature #2   -8.930...  -7.916...

  >>> report.feature_importance.permutation(
  ...     n_repeats=2,
  ...     aggregate=["mean", "std"],
  ...     seed=0,
  ... )
                         mean       std
  Metric Feature
  r2     Feature #0  0.792...  0.131...
         Feature #1  2.478...  0.223...
         Feature #2  0.025...  0.003...

  >>> report.feature_importance.permutation(
  ...     n_repeats=2,
  ...     aggregate=["mean", "std"],
  ...     flat_index=True,
  ...     seed=0,
  ... )
                    mean       std
  r2_feature_0  0.792...  0.131...
  r2_feature_1  2.478...  0.223...
  r2_feature_2  0.025...  0.003...
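
Finally, as noted in the description, fixing seed is what lets skore cache the result. A minimal sketch, reusing the report built above and assuming that a repeated call with identical arguments and the same seed is served from the cache:

  >>> # With a fixed seed, the result can be cached by skore.
  >>> importance = report.feature_importance.permutation(n_repeats=2, seed=0)
  >>> # Same arguments, same seed: this call can reuse the cached result
  >>> # instead of recomputing the permutations. With seed=None, the result
  >>> # would differ at every call and would not be cached.
  >>> importance_again = report.feature_importance.permutation(n_repeats=2, seed=0)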