Diagnostics#

skore diagnostics provide quick checks for common model quality pitfalls. Use diagnose() to get concise findings about your model’s quality. Each finding has:

a short explanation,
a stable diagnostic code,
and a link to this page.

Diagnostics can be muted per call with ignore=...:

report.diagnose(ignore=["SKD001"])

You can also set a global ignore list with configuration.ignore_diagnostics = ...:

from skore import configuration
configuration.ignore_diagnostics = ["SKD001"]

For cross-validation reports, diagnostics are computed per split and then aggregated at the report level through skore.CrossValidationReport.diagnose. A diagnostic is reported as an issue only when it appears in a strict majority (more than half) of the evaluated splits.
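The strict-majority rule can be illustrated with a short sketch. This is not skore's implementation; the helper name `is_reported` and the list-of-booleans input are assumptions made for illustration only.

```python
def is_reported(split_flags):
    """Return True when a diagnostic fired in a strict majority of splits.

    `split_flags` is a hypothetical list with one boolean per evaluated
    split: True if the diagnostic fired on that split.
    """
    flags = list(split_flags)
    # Strict majority means more than half, so a tie is not enough.
    return sum(flags) > len(flags) / 2


# 2 of 4 splits is a tie, so the diagnostic is not reported.
print(is_reported([True, True, False, False]))  # False
# 3 of 5 splits is a strict majority.
print(is_reported([True, True, True, False, False]))  # True
```

With an even number of splits, exactly half is not enough to report the issue.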

For comparison reports, skore.ComparisonReport.diagnose builds a global diagnostic from each component report in the comparison. Diagnostics are grouped by component report and emitted as a single message.

SKD001 - Potential overfitting#

How it is detected#

skore compares train and test scores across the report’s default predictive metrics (timing metrics are excluded). A metric votes for overfitting when the train-favored gap exceeds an adaptive threshold:

higher-is-better metrics: train - test >= threshold
lower-is-better metrics: test - train >= threshold

The threshold adapts to the scale of the scores: max(0.03, 0.10 * |reference|) where the reference is the train score for higher-is-better metrics and the test score for lower-is-better metrics. The floor of 0.03 prevents the threshold from vanishing on near-zero scores.

The diagnostic is raised when a strict majority of metrics vote for overfitting.
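The detection rule above can be sketched in plain Python. This is a minimal illustration of the documented formulas, not skore's actual code; the function names and the `(train, test, higher_is_better)` tuple layout are hypothetical.

```python
def overfitting_threshold(reference):
    """Adaptive threshold: 10% of the reference score, floored at 0.03."""
    return max(0.03, 0.10 * abs(reference))


def votes_for_overfitting(train, test, higher_is_better):
    """Return True when the train-favored gap reaches the threshold."""
    if higher_is_better:
        # Reference is the train score for higher-is-better metrics.
        gap, reference = train - test, train
    else:
        # Reference is the test score for lower-is-better metrics.
        gap, reference = test - train, test
    return gap >= overfitting_threshold(reference)


def is_overfitting(metrics):
    """Raise the diagnostic when a strict majority of metrics vote for it."""
    votes = [votes_for_overfitting(*m) for m in metrics]
    return sum(votes) > len(votes) / 2


# Hypothetical scores: (train, test, higher_is_better)
metrics = [
    (0.99, 0.85, True),   # accuracy: gap 0.14 vs threshold 0.099, votes
    (0.05, 0.40, False),  # log loss: gap 0.35 vs threshold 0.04, votes
    (0.97, 0.95, True),   # ROC AUC: gap 0.02 vs threshold 0.097, no vote
]
print(is_overfitting(metrics))  # True (2 of 3 metrics vote)
```

Note how the 0.03 floor takes over for small scores: with a train score of 0.1, ten percent of the reference would be 0.01, so the floor sets the threshold instead.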

Why it matters#

A persistent train/test gap suggests the model has captured patterns specific to the training data and may generalize poorly.

How to reduce the risk#

simplify the model,
regularize more strongly,
improve feature engineering,
use better validation protocols or more data.

SKD002 - Potential underfitting#

How it is detected#

skore checks two conditions together across the report’s default predictive metrics. A metric votes for underfitting when both hold:

Train and test scores are on par: the absolute difference is within max(0.03, 0.05 * max(|train|, |test|)).
Neither score significantly outperforms a dummy baseline: a score is considered significantly better than the baseline only when it beats the baseline by more than max(0.01, 0.03 * |baseline|). The baseline is a DummyClassifier(strategy="prior") for classification and a DummyRegressor(strategy="mean") for regression.

The diagnostic is raised when a strict majority of comparable metrics (those present in both the estimator and dummy reports) vote for underfitting.
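The two conditions can be sketched as follows. This is an illustration under stated assumptions rather than skore's implementation: the function names are hypothetical, each metric is compared against a single baseline value, and "significantly better" is read as beating the baseline by more than the margin in the metric's favorable direction.

```python
def on_par(train, test):
    """Condition 1: train and test scores are within the tolerance."""
    tolerance = max(0.03, 0.05 * max(abs(train), abs(test)))
    return abs(train - test) <= tolerance


def beats_baseline(score, baseline, higher_is_better):
    """Assumed reading: improvement over the baseline exceeds the margin."""
    margin = max(0.01, 0.03 * abs(baseline))
    improvement = score - baseline if higher_is_better else baseline - score
    return improvement > margin


def votes_for_underfitting(train, test, baseline, higher_is_better):
    """A metric votes when scores are on par and neither beats the baseline."""
    neither_beats = not (
        beats_baseline(train, baseline, higher_is_better)
        or beats_baseline(test, baseline, higher_is_better)
    )
    return on_par(train, test) and neither_beats


# Hypothetical accuracy scores against a dummy baseline of 0.60.
print(votes_for_underfitting(0.61, 0.605, 0.60, True))  # True
print(votes_for_underfitting(0.90, 0.88, 0.60, True))   # False
```

In the second call the scores are on par, but both clearly beat the baseline, so the metric does not vote for underfitting.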

Why it matters#

When model performance is close to a naive baseline, the model is likely too simple, under-trained, or using features that do not capture enough signal.

How to reduce the risk#

increase model capacity,
improve data representation and features,
tune hyperparameters,
collect richer data if possible.
