Automatic detection of modeling issues#
skore can automatically detect common modeling pitfalls such as overfitting
and underfitting. This example walks through the checks accessor:
how to run checks, how to read the detected issues, and how to mute specific checks.
We use a purely non-linear regression target and deliberately pick models that fail in known ways:
- a linear model that cannot capture the non-linearity → underfitting,
- a single deep decision tree that memorizes the training set perfectly and fails to generalize → overfitting.
Setup#
The target is a product of trigonometric functions of the first two features: completely invisible to a linear model, yet easy to memorize for an unconstrained tree.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
rng = np.random.default_rng(42)
n_samples = 500
X = rng.uniform(0, 1, (n_samples, 5))
y = np.sin(2 * np.pi * X[:, 0]) * np.cos(2 * np.pi * X[:, 1]) + rng.normal(
    0, 0.1, n_samples
)
linear = LinearRegression()
deep_tree = DecisionTreeRegressor(random_state=42)
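To see why a linear model has nothing to work with here, the following sketch checks numerically that the noise-free part of the target has zero mean and zero covariance with each of the first two features when those features are uniform on [0, 1]. It uses only the standard library and a midpoint-rule average over the unit square; the helper names `mean_over_grid` and `target` are illustrative and not part of skore or scikit-learn.

```python
import math


def mean_over_grid(f, n=100):
    """Midpoint-rule average of f(x0, x1) over the unit square."""
    points = [(i + 0.5) / n for i in range(n)]
    return sum(f(a, b) for a in points for b in points) / (n * n)


def target(x0, x1):
    """Noise-free part of the target defined in the setup code."""
    return math.sin(2 * math.pi * x0) * math.cos(2 * math.pi * x1)


# Mean of the target, and its covariance with each centered feature
# (0.5 is the mean of a uniform variable on [0, 1]).
mean_y = mean_over_grid(target)
cov_x0 = mean_over_grid(lambda a, b: (a - 0.5) * target(a, b))
cov_x1 = mean_over_grid(lambda a, b: (b - 0.5) * target(a, b))

# All three values are zero up to floating-point error.
print(mean_y, cov_x0, cov_x1)
```

Because the features are independent and their covariance with the target vanishes, the best linear fit is a constant close to zero, which is what a dummy mean predictor would output as well.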
Calling summarize() explicitly#
Every report exposes a checks accessor which provides
access to several methods:
- summarize() to run checks and get a summary of the findings
- add() to add custom checks
- remove() to remove checks
- available() to list the available checks
Let’s use summarize() to see what issues can be
found for the linear model.
from skore import evaluate
linear_report = evaluate(linear, X, y)
linear_report
| Metric | LinearRegression |
|---|---|
| R² | -0.015818 |
| RMSE | 0.504156 |
| MAE | 0.406739 |
| MAPE | 1.032250 |
| Fit time (s) | 0.001114 |
| Predict time (s) | 0.000391 |
| | Feature 0 | Feature 1 | Feature 2 | Feature 3 | Feature 4 | Target |
|---|---|---|---|---|---|---|
| 0 | 0.209 | 0.525 | 0.164 | 0.166 | 0.836 | -1.16 |
| 1 | 0.745 | 0.821 | 0.749 | 0.288 | 0.118 | -0.378 |
| 2 | 0.523 | 0.764 | 0.799 | 0.492 | 0.600 | 0.131 |
| 3 | 0.462 | 0.327 | 0.305 | 0.251 | 0.365 | -0.181 |
| 4 | 0.745 | 0.968 | 0.326 | 0.370 | 0.470 | -0.925 |
| ... | ... | ... | ... | ... | ... | ... |
| 495 | 0.582 | 0.994 | 0.990 | 0.527 | 0.639 | -0.419 |
| 496 | 0.0435 | 0.181 | 0.237 | 0.249 | 0.571 | 0.213 |
| 497 | 0.119 | 0.937 | 0.895 | 0.186 | 0.323 | 0.677 |
| 498 | 0.779 | 0.135 | 0.536 | 0.514 | 0.858 | -0.500 |
| 499 | 0.0917 | 0.667 | 0.656 | 0.663 | 0.0198 | -0.272 |
| Column | Column name | dtype | Is sorted | Null values | Unique values | Mean | Std | Min | Median | IQR | Max |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Feature 0 | Float64DType | False | 0 (0.0%) | 500 (100.0%) | 0.503 | 0.295 | 0.00107 | 0.505 | 0.518 | 1.00 |
| 1 | Feature 1 | Float64DType | False | 0 (0.0%) | 500 (100.0%) | 0.504 | 0.287 | 0.000568 | 0.489 | 0.502 | 0.994 |
| 2 | Feature 2 | Float64DType | False | 0 (0.0%) | 500 (100.0%) | 0.498 | 0.285 | 0.000519 | 0.495 | 0.500 | 0.999 |
| 3 | Feature 3 | Float64DType | False | 0 (0.0%) | 500 (100.0%) | 0.489 | 0.292 | 0.00123 | 0.491 | 0.529 | 0.999 |
| 4 | Feature 4 | Float64DType | False | 0 (0.0%) | 500 (100.0%) | 0.502 | 0.293 | 0.00474 | 0.498 | 0.494 | 0.999 |
| 5 | Target | Float64DType | False | 0 (0.0%) | 500 (100.0%) | 0.00464 | 0.519 | -1.16 | 0.0352 | 0.711 | 1.21 |
All six columns have a high cardinality (> 40).
linear_report.checks.summarize()
- [SKD002] Potential underfitting. Train/test scores are on par and not significantly better than the dummy baseline for 4/4 comparable metrics. Read more about this here.
- [SKD009] Model worse than baseline. Test scores are not significantly better than a HistGradientBoosting baseline for 4/4 default predictive metrics. Read more about this here.
- [SKD006] Coefficient interpretation. Features appear to be standardized: coefficients are comparable but no longer interpretable in the original feature units. Read more about this here.
- [SKD011] Golden feature. A model trained on feature(s) ['Feature 0', 'Feature 1', 'Feature 2', 'Feature 3', 'Feature 4'] alone has similar performance to a model trained on all the features, on the default predictive metrics. This may signal data leakage or excessive reliance on a single feature. Read more about this here.
- [SKD012] Useless features. Feature(s) ['Feature #0', 'Feature #1', 'Feature #3', 'Feature #4'] have permutation importance overlapping with zero and could likely be dropped without degrading performance. Dropping redundant features may also improve model performance. Read more about this here.
- [SKD004] High class imbalance. Read more about this here.
- [SKD005] Underrepresented classes. Read more about this here.
- [SKD007] MDI biased for high-cardinality features. Read more about this here.
- [SKD013] Train-test overlap in time series. Read more about this here.
- [SKD014] Hyperparameters at search edge. Read more about this here.
- [SKD015] Hyperparameters worth tuning. Read more about this here.
- [SKD016] Estimator not tuned. Read more about this here.
Mute a check by passing its code to ignore,
e.g. .checks.summarize(ignore=['SKD001']).
linear_report.metrics.summarize(data_source="both").frame()
| Metric | LinearRegression (train) | LinearRegression (test) |
|---|---|---|
| R² | 0.001906 | -0.015818 |
| RMSE | 0.522214 | 0.504156 |
| MAE | 0.423723 | 0.406739 |
| MAPE | 1.426344 | 1.032250 |
| Fit time (s) | 0.001114 | 0.001114 |
| Predict time (s) | 0.000263 | 0.000391 |
The linear model is flagged for underfitting: its scores are on par between train and test, and not significantly better than a dummy baseline.
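The two conditions in that sentence can be pictured with a small sketch. This is not skore's implementation: the function `looks_underfit`, the fixed tolerance `tol`, and the baseline R² of 0.0 (the score of a mean predictor on the data it was fit on) are assumptions made for illustration, whereas the check message speaks of scores being "significantly" better, which suggests a more careful comparison.

```python
def looks_underfit(train_score, test_score, baseline_score, tol=0.05):
    """Hypothetical heuristic for a higher-is-better score such as R².

    Flags a model whose train and test scores are close to each other
    and whose test score does not beat the baseline by more than `tol`.
    """
    on_par = abs(train_score - test_score) <= tol
    no_better_than_baseline = (test_score - baseline_score) <= tol
    return on_par and no_better_than_baseline


# R² values copied from the LinearRegression metrics table above.
linear_flag = looks_underfit(0.001906, -0.015818, baseline_score=0.0)

# A model with a perfect train score and a much lower test score is
# not "on par", so this heuristic does not call it underfit.
other_flag = looks_underfit(1.0, 0.783887, baseline_score=0.0)

print(linear_flag, other_flag)
```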
Let’s now inspect the deep tree model.
tree_report = evaluate(deep_tree, X, y)
tree_report.checks.summarize()
- [SKD001] Potential overfitting. Significant train/test gaps were found for 4/4 default predictive metrics. Read more about this here.
- [SKD009] Model worse than baseline. Test scores are not significantly better than a HistGradientBoosting baseline for 4/4 default predictive metrics. Read more about this here.
- [SKD007] MDI biased for high-cardinality features. High-cardinality features detected: Feature 0, Feature 1, Feature 2 (and 2 more). Mean Decrease in Impurity (MDI) importance is biased toward such features. Consider using permutation importance for a more robust alternative. Read more about this here.
- [SKD012] Useless features. Feature(s) ['Feature #2', 'Feature #3', 'Feature #4'] have permutation importance overlapping with zero and could likely be dropped without degrading performance. Dropping redundant features may also improve model performance. Read more about this here.
- [SKD016] Estimator not tuned. Estimator(s) left at default settings; consider tuning: ['ccp_alpha', 'max_features', 'min_samples_leaf'] for DecisionTreeRegressor. Read more about this here.
- [SKD004] High class imbalance. Read more about this here.
- [SKD005] Underrepresented classes. Read more about this here.
- [SKD006] Coefficient interpretation. Read more about this here.
- [SKD013] Train-test overlap in time series. Read more about this here.
- [SKD014] Hyperparameters at search edge. Read more about this here.
- [SKD015] Hyperparameters worth tuning. Read more about this here.
Mute a check by passing its code to ignore,
e.g. .checks.summarize(ignore=['SKD001']).
tree_report.metrics.summarize(data_source="both").frame()
| Metric | DecisionTreeRegressor (train) | DecisionTreeRegressor (test) |
|---|---|---|
| R² | 1.000000 | 0.783887 |
| RMSE | 0.000000 | 0.232540 |
| MAE | 0.000000 | 0.180261 |
| MAPE | 0.000000 | 1.052768 |
| Fit time (s) | 0.003189 | 0.003189 |
| Predict time (s) | 0.000308 | 0.000245 |
The deep tree is flagged for overfitting: it achieves a perfect score on train but degrades on test. For this model, the coefficient interpretation check (SKD006) is not applicable and appears under the Not Applicable section of the checks summary.
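A rough way to read the "4/4 default predictive metrics" wording is to count, metric by metric, how often the train score beats the test score by a clear margin. The sketch below does this with the numbers from the two metrics tables; the function `count_train_test_gaps` and the absolute threshold `tol=0.1` are hypothetical and only meant to illustrate the idea, not to reproduce how skore decides that a gap is significant.

```python
# (train, test, higher_is_better), copied from the metrics tables above.
TREE = {
    "R2": (1.000000, 0.783887, True),
    "RMSE": (0.000000, 0.232540, False),
    "MAE": (0.000000, 0.180261, False),
    "MAPE": (0.000000, 1.052768, False),
}
LINEAR = {
    "R2": (0.001906, -0.015818, True),
    "RMSE": (0.522214, 0.504156, False),
    "MAE": (0.423723, 0.406739, False),
    "MAPE": (1.426344, 1.032250, False),
}


def count_train_test_gaps(metrics, tol=0.1):
    """Count metrics where train is better than test by more than `tol`."""
    flagged = 0
    for train, test, higher_is_better in metrics.values():
        # Positive advantage means the model looks better on train.
        advantage = train - test if higher_is_better else test - train
        if advantage > tol:
            flagged += 1
    return flagged, len(metrics)


tree_gaps = count_train_test_gaps(TREE)
linear_gaps = count_train_test_gaps(LINEAR)
print(tree_gaps, linear_gaps)
```

With this threshold the tree shows a gap on every metric while the linear model shows none, which matches the direction of the two reports.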
Ignoring specific checks#
Each check has a stable code (e.g. SKD001, SKD002). You can
mute individual checks per call:
tree_report.checks.summarize(ignore=["SKD001"])
- [SKD009] Model worse than baseline. Test scores are not significantly better than a HistGradientBoosting baseline for 4/4 default predictive metrics. Read more about this here.
- [SKD007] MDI biased for high-cardinality features. High-cardinality features detected: Feature 0, Feature 1, Feature 2 (and 2 more). Mean Decrease in Impurity (MDI) importance is biased toward such features. Consider using permutation importance for a more robust alternative. Read more about this here.
- [SKD012] Useless features. Feature(s) ['Feature #2', 'Feature #3', 'Feature #4'] have permutation importance overlapping with zero and could likely be dropped without degrading performance. Dropping redundant features may also improve model performance. Read more about this here.
- [SKD016] Estimator not tuned. Estimator(s) left at default settings; consider tuning: ['ccp_alpha', 'max_features', 'min_samples_leaf'] for DecisionTreeRegressor. Read more about this here.
- [SKD004] High class imbalance. Read more about this here.
- [SKD005] Underrepresented classes. Read more about this here.
- [SKD006] Coefficient interpretation. Read more about this here.
- [SKD013] Train-test overlap in time series. Read more about this here.
- [SKD014] Hyperparameters at search edge. Read more about this here.
- [SKD015] Hyperparameters worth tuning. Read more about this here.
Mute a check by passing its code to ignore,
e.g. .checks.summarize(ignore=['SKD001']).
Or through the configuration, so that every
summarize() call made inside the with block skips them:
import skore
with skore.configuration(ignore_checks=["SKD001"]):
    checks_summary = tree_report.checks.summarize()
checks_summary
- [SKD009] Model worse than baseline. Test scores are not significantly better than a HistGradientBoosting baseline for 4/4 default predictive metrics. Read more about this here.
- [SKD007] MDI biased for high-cardinality features. High-cardinality features detected: Feature 0, Feature 1, Feature 2 (and 2 more). Mean Decrease in Impurity (MDI) importance is biased toward such features. Consider using permutation importance for a more robust alternative. Read more about this here.
- [SKD012] Useless features. Feature(s) ['Feature #2', 'Feature #3', 'Feature #4'] have permutation importance overlapping with zero and could likely be dropped without degrading performance. Dropping redundant features may also improve model performance. Read more about this here.
- [SKD016] Estimator not tuned. Estimator(s) left at default settings; consider tuning: ['ccp_alpha', 'max_features', 'min_samples_leaf'] for DecisionTreeRegressor. Read more about this here.
- [SKD004] High class imbalance. Read more about this here.
- [SKD005] Underrepresented classes. Read more about this here.
- [SKD006] Coefficient interpretation. Read more about this here.
- [SKD013] Train-test overlap in time series. Read more about this here.
- [SKD014] Hyperparameters at search edge. Read more about this here.
- [SKD015] Hyperparameters worth tuning. Read more about this here.
Mute a check by passing its code to ignore,
e.g. .checks.summarize(ignore=['SKD001']).
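The difference between muting per call and muting through a configuration block is one of scope. The following standard-library sketch mimics that behavior with a toy context manager; every name in it (`configuration`, `summarize`, `_config`) is a stand-in written for this illustration, and the choice to merge the per-call list with the configured one is an assumption rather than documented skore behavior.

```python
from contextlib import contextmanager

# Toy module-level configuration, restored when the block exits.
_config = {"ignore_checks": ()}


@contextmanager
def configuration(ignore_checks):
    previous = _config["ignore_checks"]
    _config["ignore_checks"] = tuple(ignore_checks)
    try:
        yield
    finally:
        _config["ignore_checks"] = previous


def summarize(findings, ignore=()):
    """Return the check codes that are neither muted per call nor by config."""
    muted = set(ignore) | set(_config["ignore_checks"])
    return [code for code in findings if code not in muted]


findings = ["SKD001", "SKD009", "SKD007", "SKD012", "SKD016"]

per_call = summarize(findings, ignore=["SKD001"])  # muted for this call only
with configuration(ignore_checks=["SKD001"]):
    inside_block = summarize(findings)  # muted without passing `ignore`
after_block = summarize(findings)  # previous configuration is restored

print(per_call, inside_block, after_block)
```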
Checks on a CrossValidationReport#
When splitter is an integer, evaluate() returns a
CrossValidationReport. Checks aggregate issues across folds.
- [SKD003] Inconsistent performance across splits. Read more about this here.
- [SKD008] Highly correlated input features. Read more about this here.
- [SKD002] Potential underfitting. Read more about this here.
- [SKD010] Model slower than baseline. Read more about this here.
- [SKD011] Golden feature. Read more about this here.
- [SKD004] High class imbalance. Read more about this here.
- [SKD014] Hyperparameters at search edge. Read more about this here.
- [SKD015] Hyperparameters worth tuning. Read more about this here.
- [SKD006] Coefficient interpretation. Read more about this here.
- [SKD005] Underrepresented classes. Read more about this here.
- [SKD013] Train-test overlap in time series. Read more about this here.
Mute a check by passing its code to ignore,
e.g. .checks.summarize(ignore=['SKD001']).
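One simple way to think about aggregation across folds is to count in how many folds each check fires and to report only those seen often enough. The sketch below is an illustration under stated assumptions: the per-fold findings are made up, and both the function `aggregate_across_folds` and the `min_fraction` rule are hypothetical, since the page does not describe how skore combines folds.

```python
from collections import Counter


def aggregate_across_folds(fold_findings, min_fraction=0.5):
    """Map each check code to 'k/n' when it fires in at least min_fraction of folds."""
    n_folds = len(fold_findings)
    # set() guards against a code being listed twice within one fold.
    counts = Counter(code for codes in fold_findings for code in set(codes))
    return {
        code: f"{count}/{n_folds}"
        for code, count in sorted(counts.items())
        if count / n_folds >= min_fraction
    }


# Made-up findings for five folds, for illustration only.
folds = [
    ["SKD002", "SKD009"],
    ["SKD002"],
    ["SKD002", "SKD009"],
    ["SKD002", "SKD012"],
    ["SKD002", "SKD009"],
]

summary = aggregate_across_folds(folds)
print(summary)
```

Here SKD012 fires in a single fold and is dropped, while the other two codes are kept together with the number of folds in which they were detected.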
Checks on a ComparisonReport#
Passing a list of estimators returns a ComparisonReport.
Issues are grouped by sub-report.
comparison_report = evaluate([linear, deep_tree], X, y)
comparison_report.checks.summarize()
- [SKD001] Potential overfitting. Detected in: [DecisionTreeRegressor]. Read more about this here.
- [SKD002] Potential underfitting. Detected in: [LinearRegression]. Read more about this here.
- [SKD009] Model worse than baseline. Detected in: [LinearRegression], [DecisionTreeRegressor]. Read more about this here.
- [SKD006] Coefficient interpretation. Detected in: [LinearRegression]. Read more about this here.
- [SKD007] MDI biased for high-cardinality features. Detected in: [DecisionTreeRegressor]. Read more about this here.
- [SKD011] Golden feature. Detected in: [LinearRegression]. Read more about this here.
- [SKD012] Useless features. Detected in: [LinearRegression], [DecisionTreeRegressor]. Read more about this here.
- [SKD016] Estimator not tuned. Detected in: [DecisionTreeRegressor]. Read more about this here.
- [SKD004] High class imbalance. Read more about this here.
- [SKD005] Underrepresented classes. Read more about this here.
- [SKD013] Train-test overlap in time series. Read more about this here.
- [SKD014] Hyperparameters at search edge. Read more about this here.
- [SKD015] Hyperparameters worth tuning. Read more about this here.
Mute a check by passing its code to ignore,
e.g. .checks.summarize(ignore=['SKD001']).
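The "Detected in" lines above can be reproduced from the two individual summaries by inverting a mapping from estimator to check codes. This is only a sketch of the grouping idea: the helpers `group_by_check` and `render` are hypothetical, and the codes are sorted alphabetically here rather than in the order skore displays them.

```python
def group_by_check(findings_per_estimator):
    """Invert {estimator: [codes]} into {code: [estimators]}."""
    grouped = {}
    for name, codes in findings_per_estimator.items():
        for code in codes:
            grouped.setdefault(code, []).append(name)
    return grouped


def render(grouped):
    """Format one line per check code, listing the estimators it was found in."""
    return [
        f"[{code}] Detected in: " + ", ".join(f"[{name}]" for name in names) + "."
        for code, names in sorted(grouped.items())
    ]


# Codes detected for each estimator in the individual summaries above.
per_estimator = {
    "LinearRegression": ["SKD002", "SKD009", "SKD006", "SKD011", "SKD012"],
    "DecisionTreeRegressor": ["SKD001", "SKD009", "SKD007", "SKD012", "SKD016"],
}

lines = render(group_by_check(per_estimator))
for line in lines:
    print(line)
```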