.. _diagnostics: =========== Diagnostics =========== `skore` diagnostics provide quick checks for common model quality pitfalls. Use :meth:`~skore.EstimatorReport.diagnose` to get concise findings about your model's quality. Each finding has: - a short explanation, - a stable diagnostic code, - and a link to this page. Diagnostics can be muted per call with `ignore=...`: .. code-block:: python report.diagnose(ignore=["SKD001"]) You can also set a global ignore list with `configuration.ignore_diagnostics = ...`: .. code-block:: python from skore import configuration configuration.ignore_diagnostics = ["SKD001"] For cross-validation reports, diagnostics are computed per split and then aggregated at report level, trough `~skore.CrossValidationReport.diagnose`. A diagnostic is reported as an issue only when it appears in a strict majority of evaluated splits. For comparison reports, `~skore.ComparisonReport.diagnose` builds a global diagnostic from each component report in the comparison. Diagnostics are grouped by component report and emitted as a single message. .. _skd001-overfitting: SKD001 - Potential overfitting ------------------------------ How it is detected ^^^^^^^^^^^^^^^^^^ `skore` compares train and test scores across the report's default predictive metrics (timing metrics are excluded). A metric votes for overfitting when the train-favored gap exceeds an adaptive threshold: - **higher-is-better** metrics: ``train - test >= threshold`` - **lower-is-better** metrics: ``test - train >= threshold`` The threshold adapts to the scale of the scores: ``max(0.03, 0.10 * |reference|)`` where the reference is the train score for higher-is-better metrics and the test score for lower-is-better metrics. The floor of 0.03 prevents the threshold from vanishing on near-zero scores. The diagnostic is raised when a **strict majority** of metrics vote for overfitting. Why it matters ^^^^^^^^^^^^^^ A persistent train/test gap suggests the model has captured patterns specific to the training data and may generalize poorly. How to reduce the risk ^^^^^^^^^^^^^^^^^^^^^^ - simplify the model, - regularize more strongly, - improve feature engineering, - use better validation protocols or more data. .. _skd002-underfitting: SKD002 - Potential underfitting ------------------------------- How it is detected ^^^^^^^^^^^^^^^^^^ `skore` checks two conditions together across the report's default predictive metrics. A metric votes for underfitting when **both** hold: 1. **Train and test scores are on par**: the absolute difference is within ``max(0.03, 0.05 * max(|train|, |test|))``. 2. **Neither score significantly outperforms a dummy baseline**: a score is considered significantly better than the baseline only when it exceeds ``max(0.01, 0.03 * |baseline|)``. The baseline is a ``DummyClassifier(strategy="prior")`` for classification and a ``DummyRegressor(strategy="mean")`` for regression. The diagnostic is raised when a **strict majority** of comparable metrics (those present in both the estimator and dummy reports) vote for underfitting. Why it matters ^^^^^^^^^^^^^^ When model performance is close to a naive baseline, the model is likely too simple, under-trained, or using features that do not capture enough signal. How to reduce the risk ^^^^^^^^^^^^^^^^^^^^^^ - increase model capacity, - improve data representation and features, - tune hyperparameters, - collect richer data if possible.