
Automatic detection of modelling issues#

skore can automatically detect common modeling pitfalls such as overfitting and underfitting. This example walks through the diagnose() method: how to run the checks, how to read the detected issues, and how to mute specific checks.

We use a purely non-linear regression target and deliberately pick models that fail in known ways:

a linear model that cannot capture the non-linearity → underfitting,
a single deep decision tree that memorizes the training set perfectly and fails to generalize → overfitting.

Setup#

The target is a product of trigonometric functions of the first two features: completely invisible to a linear model, yet easy to memorize for an unconstrained tree.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
n_samples = 500
X = rng.uniform(0, 1, (n_samples, 5))
y = np.sin(2 * np.pi * X[:, 0]) * np.cos(2 * np.pi * X[:, 1]) + rng.normal(
    0, 0.1, n_samples
)

linear = LinearRegression()
deep_tree = DecisionTreeRegressor(random_state=42)
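
As a quick sanity check of the claim that the target is invisible to a linear model, one can look at the Pearson correlation between each feature and the target. The sketch below regenerates the same data and uses np.corrcoef; the name `correlations` is introduced here for illustration only. Because the sine and cosine factors average out over [0, 1], every correlation is expected to be close to zero.

```python
import numpy as np

# Same data-generating process as in the setup above.
rng = np.random.default_rng(42)
n_samples = 500
X = rng.uniform(0, 1, (n_samples, 5))
y = np.sin(2 * np.pi * X[:, 0]) * np.cos(2 * np.pi * X[:, 1]) + rng.normal(
    0, 0.1, n_samples
)

# Pearson correlation between each feature column and the target.
correlations = np.array(
    [np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])]
)
print(np.round(correlations, 3))
```

A linear model can only exploit this kind of linear association, which is why it has nothing to learn from here.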

Calling `diagnose()` explicitly#

Every report exposes a diagnose() method. Checks are computed lazily and cached, so calling diagnose() is always cheap after the first call.

from skore import evaluate

linear_report = evaluate(linear, X, y)
linear_report


1 issue(s) detected, 2 check(s) ran.

linear_report.diagnose()

Diagnostic: 1 issue(s) detected, 2 check(s) ran, 0 ignored.
[SKD002] Potential underfitting. Train/test scores are on par and not significantly better than the dummy baseline for 2/2 comparable metrics. Read more about this here. Mute with ignore=['SKD002'].
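
The "computed lazily and cached" behavior mentioned earlier is a general pattern. The following is a minimal sketch of it using functools.cached_property. The `LazyDiagnostics` class, its attributes, and the 0.1 threshold are all hypothetical and are not taken from skore's implementation.

```python
from functools import cached_property


class LazyDiagnostics:
    """Hypothetical stand-in for a report; not skore's implementation."""

    def __init__(self, train_score, test_score):
        self.train_score = train_score
        self.test_score = test_score
        self.n_computations = 0  # counts how often the checks actually run

    @cached_property
    def _issues(self):
        # Runs only on first access; the result is then stored on the instance.
        self.n_computations += 1
        issues = []
        if self.train_score - self.test_score > 0.1:  # arbitrary threshold
            issues.append("large train/test gap")
        return tuple(issues)

    def diagnose(self):
        return self._issues


report = LazyDiagnostics(train_score=1.0, test_score=0.78)
first = report.diagnose()   # triggers the computation
second = report.diagnose()  # served from the cache
print(first, report.n_computations)
```

Nothing is computed when the object is created, and the second call returns the stored result without recomputing.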

linear_report.metrics.summarize(data_source="both").frame()

Metric              LinearRegression (train)    LinearRegression (test)
R²                  0.001906                    -0.015818
RMSE                0.522214                    0.504156
Fit time (s)        0.001152                    0.001152
Predict time (s)    0.000152                    0.000292

The linear model is flagged for underfitting: its R² is close to zero on both the train and test sets, so the scores are on par and not significantly better than a dummy baseline.
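
To make the wording of the message concrete, here is a toy version of an "on par and no better than the dummy" rule applied to the R² values reported above. The function `looks_underfit` and the tolerance `tol` are assumptions made for this illustration; the criterion skore actually applies may differ.

```python
def looks_underfit(train_score, test_score, dummy_score, tol=0.05):
    """Toy rule: train and test agree, and neither clearly beats the dummy.

    `tol` is an arbitrary tolerance chosen for this illustration.
    Scores are "higher is better" (for example R²).
    """
    on_par = abs(train_score - test_score) <= tol
    beats_dummy = min(train_score, test_score) > dummy_score + tol
    return on_par and not beats_dummy


# R² values reported above for LinearRegression. A mean-predicting dummy
# has an R² of 0 on the data it was fitted on, so 0.0 is used as baseline.
linear_flag = looks_underfit(0.001906, -0.015818, dummy_score=0.0)
# R² values of the deep tree, shown later in this example, for contrast.
tree_flag = looks_underfit(1.000000, 0.783887, dummy_score=0.0)
print(linear_flag, tree_flag)
```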

tree_report = evaluate(deep_tree, X, y)
tree_report.diagnose()

Diagnostic: 1 issue(s) detected, 2 check(s) ran, 0 ignored.
[SKD001] Potential overfitting. Significant train/test gaps were found for 2/2 default predictive metrics. Read more about this here. Mute with ignore=['SKD001'].

tree_report.metrics.summarize(data_source="both").frame()

Metric              DecisionTreeRegressor (train)    DecisionTreeRegressor (test)
R²                  1.000000                         0.783887
RMSE                0.000000                         0.232540
Fit time (s)        0.003070                         0.003070
Predict time (s)    0.000221                         0.000195

The deep tree is flagged for overfitting: it reaches a perfect R² of 1.0 on the training set but only about 0.78 on the test set.
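
The "2/2 default predictive metrics" in the message can be pictured as a per-metric gap test that accounts for whether higher or lower is better. The sketch below applies an absolute threshold to the values from the tables above. The function `metrics_with_gap` and the threshold `abs_tol` are hypothetical, and an absolute threshold is only reasonable here because both metrics are on a scale of order one.

```python
def metrics_with_gap(metrics, abs_tol=0.1):
    """Return the metrics whose test value is worse than train by > abs_tol.

    `metrics` maps a name to (train, test, greater_is_better).
    `abs_tol` is an arbitrary threshold chosen for this illustration.
    """
    flagged = []
    for name, (train, test, greater_is_better) in metrics.items():
        # Positive degradation means the model does worse on the test set.
        degradation = (train - test) if greater_is_better else (test - train)
        if degradation > abs_tol:
            flagged.append(name)
    return flagged


tree_metrics = {
    "R²": (1.000000, 0.783887, True),
    "RMSE": (0.000000, 0.232540, False),
}
linear_metrics = {
    "R²": (0.001906, -0.015818, True),
    "RMSE": (0.522214, 0.504156, False),
}
tree_flagged = metrics_with_gap(tree_metrics)
linear_flagged = metrics_with_gap(linear_metrics)
print(f"tree: {len(tree_flagged)}/{len(tree_metrics)}, linear: {len(linear_flagged)}/{len(linear_metrics)}")
```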

Ignoring specific checks#

Each check has a stable code (e.g. SKD001, SKD002). You can mute individual checks per call:

tree_report.diagnose(ignore=["SKD001"])

Diagnostic: 0 issue(s) detected, 1 check(s) ran, 1 ignored.
No issues were detected in your report!
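
The counts in the summary line follow directly from which codes are muted. As an illustration, the sketch below filters a toy registry of checks by code and rebuilds the same kind of summary. The registry, the `run_checks` function, and the lambdas standing in for real checks are all hypothetical.

```python
def run_checks(checks, ignore=()):
    """Run every check whose code is not in `ignore` and summarize the result.

    `checks` maps a code to a zero-argument callable returning True when an
    issue is detected. Both the registry and this function are hypothetical.
    """
    ignored = [code for code in checks if code in ignore]
    ran = [code for code in checks if code not in ignore]
    issues = [code for code in ran if checks[code]()]
    summary = (
        f"{len(issues)} issue(s) detected, {len(ran)} check(s) ran, "
        f"{len(ignored)} ignored."
    )
    return issues, summary


# Toy registry mimicking the deep tree: overfitting fires, underfitting does not.
checks = {"SKD001": lambda: True, "SKD002": lambda: False}

_, summary_all = run_checks(checks)
_, summary_muted = run_checks(checks, ignore=["SKD001"])
print(summary_all)
print(summary_muted)
```

A muted check is skipped entirely rather than run and hidden, which is why the number of checks that ran drops from 2 to 1.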

Or through the global configuration, so that every diagnose() call made while the setting is active skips them. Here the setting is applied with a context manager:

import skore

with skore.configuration(ignore_checks=["SKD001"]):
    diagnosis = tree_report.diagnose()
diagnosis

Diagnostic: 0 issue(s) detected, 1 check(s) ran, 1 ignored.
No issues were detected in your report!

Diagnostics on a `CrossValidationReport`#

When splitter is an integer, evaluate() returns a CrossValidationReport. Checks aggregate issues across folds, and each message reports in how many splits the issue was detected.

cv_report = evaluate(deep_tree, X, y, splitter=5)
cv_report.diagnose()

Diagnostic: 1 issue(s) detected, 2 check(s) ran, 0 ignored.
[SKD001] Potential overfitting. Detected in 5/5 evaluated splits. Read more about this here. Mute with ignore=['SKD001'].
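
The "5/5 evaluated splits" count can be thought of as a tally of per-split detections. The sketch below is one possible aggregation; the function `aggregate_over_splits` is hypothetical, and reporting an issue as soon as a single split is flagged is an assumption of this sketch, not a documented rule.

```python
def aggregate_over_splits(code, detected_per_split):
    """Summarize per-split detections; return None when no split is flagged.

    Reporting as soon as one split is flagged is an assumption of this
    sketch; the actual aggregation rule may differ.
    """
    n_detected = sum(detected_per_split)
    if n_detected == 0:
        return None
    n_splits = len(detected_per_split)
    return f"[{code}] Detected in {n_detected}/{n_splits} evaluated splits."


all_flagged = aggregate_over_splits("SKD001", [True] * 5)
none_flagged = aggregate_over_splits("SKD002", [False] * 5)
print(all_flagged)
```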

Diagnostics on a `ComparisonReport`#

Passing a list of estimators returns a ComparisonReport. Issues are grouped by sub-report.

comparison_report = evaluate([linear, deep_tree], X, y)
comparison_report.diagnose()

Diagnostic: 2 issue(s) detected, 2 check(s) ran, 0 ignored.
[SKD002] Potential underfitting. [LinearRegression] Train/test scores are on par and not significantly better than the dummy baseline for 2/2 comparable metrics. Read more about this here. Mute with ignore=['SKD002'].
[SKD001] Potential overfitting. [DecisionTreeRegressor] Significant train/test gaps were found for 2/2 default predictive metrics. Read more about this here. Mute with ignore=['SKD001'].
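
Grouping by sub-report amounts to keying each issue on the estimator it came from. A minimal sketch with a plain dictionary follows; the list of (estimator label, check code) pairs is hand-written from the output above rather than obtained from the report object.

```python
from collections import defaultdict

# (estimator label, check code) pairs, copied by hand from the output above.
issues = [
    ("LinearRegression", "SKD002"),
    ("DecisionTreeRegressor", "SKD001"),
]

by_estimator = defaultdict(list)
for estimator_name, code in issues:
    by_estimator[estimator_name].append(code)

for estimator_name, codes in by_estimator.items():
    print(f"[{estimator_name}] {', '.join(codes)}")
```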

