{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n\n# Automatic detection of modelling issues\n\n`skore` can automatically detect common modeling pitfalls such as overfitting\nand underfitting. This example walks through the ``.diagnose`` method: how to\nrun checks, how to read the detected issues, and how to mute specific checks.\n\nWe use a purely non-linear regression target and deliberately pick models that\nfail in known ways:\n\n- a **linear model** that cannot capture the non-linearity \u2192 underfitting,\n- a **single deep decision tree** that memorizes the training set perfectly\n  and fails to generalize \u2192 overfitting.\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Setup\n\nThe target is a product of trigonometric functions of the first two features:\ncompletely invisible to a linear model, yet easy to memorize for an\nunconstrained tree.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import numpy as np\nfrom sklearn.linear_model import LinearRegression\nfrom sklearn.tree import DecisionTreeRegressor\n\nrng = np.random.default_rng(42)\nn_samples = 500\nX = rng.uniform(0, 1, (n_samples, 5))\ny = np.sin(2 * np.pi * X[:, 0]) * np.cos(2 * np.pi * X[:, 1]) + rng.normal(\n    0, 0.1, n_samples\n)\n\nlinear = LinearRegression()\ndeep_tree = DecisionTreeRegressor(random_state=42)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Calling :meth:`~skore.EstimatorReport.diagnose` explicitly\n\nEvery report exposes a :meth:`~skore.EstimatorReport.diagnose` method.\nChecks are computed lazily and cached, so calling\n:meth:`~skore.EstimatorReport.diagnose` is always cheap after the first call.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from skore import evaluate\n\nlinear_report = evaluate(linear, X, y)\nlinear_report"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "linear_report.diagnose()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "linear_report.metrics.summarize(data_source=\"both\").frame()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The linear model is flagged for underfitting: its scores are on par between\ntrain and test, and not significantly better than a dummy baseline.\n\n"
      ]
    },
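    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The comparison with a dummy baseline can also be reproduced by hand. The sketch\nbelow is only an illustration built with scikit-learn: it does not reproduce\nthe logic or thresholds of the skore check, and the 75/25 split, the ``*_demo``\nnames and the choice of R\u00b2 as metric are assumptions made for this example.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import numpy as np\n",
        "from sklearn.dummy import DummyRegressor\n",
        "from sklearn.linear_model import LinearRegression\n",
        "from sklearn.metrics import r2_score\n",
        "from sklearn.model_selection import train_test_split\n",
        "\n",
        "# Same data-generating process as in the setup, under separate names so that\n",
        "# the notebook variables are left untouched.\n",
        "rng_demo = np.random.default_rng(42)\n",
        "X_demo = rng_demo.uniform(0, 1, (500, 5))\n",
        "y_demo = np.sin(2 * np.pi * X_demo[:, 0]) * np.cos(\n",
        "    2 * np.pi * X_demo[:, 1]\n",
        ") + rng_demo.normal(0, 0.1, 500)\n",
        "\n",
        "# Hypothetical 75/25 split, chosen for this illustration only.\n",
        "X_tr, X_te, y_tr, y_te = train_test_split(\n",
        "    X_demo, y_demo, test_size=0.25, random_state=0\n",
        ")\n",
        "\n",
        "# The dummy baseline always predicts the mean of the training target.\n",
        "dummy_demo = DummyRegressor(strategy='mean').fit(X_tr, y_tr)\n",
        "linear_demo = LinearRegression().fit(X_tr, y_tr)\n",
        "\n",
        "dummy_train_r2 = r2_score(y_tr, dummy_demo.predict(X_tr))\n",
        "dummy_test_r2 = r2_score(y_te, dummy_demo.predict(X_te))\n",
        "linear_train_r2 = r2_score(y_tr, linear_demo.predict(X_tr))\n",
        "linear_test_r2 = r2_score(y_te, linear_demo.predict(X_te))\n",
        "\n",
        "print(f'dummy  R2: train={dummy_train_r2:.3f}, test={dummy_test_r2:.3f}')\n",
        "print(f'linear R2: train={linear_train_r2:.3f}, test={linear_test_r2:.3f}')"
      ]
    },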
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "tree_report = evaluate(deep_tree, X, y)\ntree_report.diagnose()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "tree_report.metrics.summarize(data_source=\"both\").frame()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The deep tree is flagged for overfitting: it achieves a perfect score on\ntrain but degrades on test.\n\n"
      ]
    },
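    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The train/test gap behind this diagnosis can be measured by hand. The sketch\nbelow is only an illustration built with scikit-learn: it is not how skore\nimplements the check, and the 75/25 split, the ``*_demo`` names and the choice\nof R\u00b2 as metric are assumptions made for this example.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import numpy as np\n",
        "from sklearn.metrics import r2_score\n",
        "from sklearn.model_selection import train_test_split\n",
        "from sklearn.tree import DecisionTreeRegressor\n",
        "\n",
        "# Same data-generating process as in the setup, under separate names so that\n",
        "# the notebook variables are left untouched.\n",
        "rng_demo = np.random.default_rng(42)\n",
        "X_demo = rng_demo.uniform(0, 1, (500, 5))\n",
        "y_demo = np.sin(2 * np.pi * X_demo[:, 0]) * np.cos(\n",
        "    2 * np.pi * X_demo[:, 1]\n",
        ") + rng_demo.normal(0, 0.1, 500)\n",
        "\n",
        "# Hypothetical 75/25 split, chosen for this illustration only.\n",
        "X_tr, X_te, y_tr, y_te = train_test_split(\n",
        "    X_demo, y_demo, test_size=0.25, random_state=0\n",
        ")\n",
        "\n",
        "# An unconstrained tree keeps splitting until every leaf is pure, so it can\n",
        "# reproduce the noisy training target exactly.\n",
        "tree_demo = DecisionTreeRegressor(random_state=42).fit(X_tr, y_tr)\n",
        "\n",
        "tree_train_r2 = r2_score(y_tr, tree_demo.predict(X_tr))\n",
        "tree_test_r2 = r2_score(y_te, tree_demo.predict(X_te))\n",
        "\n",
        "print(f'tree R2: train={tree_train_r2:.3f}, test={tree_test_r2:.3f}')\n",
        "print(f'train-test gap: {tree_train_r2 - tree_test_r2:.3f}')"
      ]
    },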
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Ignoring specific checks\n\nEach check has a stable code (e.g. ``SKD001``, ``SKD002``). You can\nmute individual checks per call:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "tree_report.diagnose(ignore=[\"SKD001\"])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Or globally, so that every subsequent :meth:`~skore.EstimatorReport.diagnose` call\nskips them:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import skore\n\nwith skore.configuration(ignore_checks=[\"SKD001\"]):\n    diagnosis = tree_report.diagnose()\ndiagnosis"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Diagnostics on a :class:`~skore.CrossValidationReport`\n\nWhen ``splitter`` is an integer, :func:`~skore.evaluate` returns a\n:class:`~skore.CrossValidationReport`. Checks aggregate issues across folds.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "cv_report = evaluate(deep_tree, X, y, splitter=5)\ncv_report.diagnose()"
      ]
    },
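    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "To see what a per-fold view of overfitting can look like, the sketch below\ncalls scikit-learn's ``cross_validate`` directly. It is only an illustration:\nit says nothing about how skore aggregates issues across folds, and the\n``*_demo`` names and the choice of R\u00b2 as metric are assumptions made for this\nexample.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import numpy as np\n",
        "from sklearn.model_selection import cross_validate\n",
        "from sklearn.tree import DecisionTreeRegressor\n",
        "\n",
        "# Same data-generating process as in the setup, under separate names so that\n",
        "# the notebook variables are left untouched.\n",
        "rng_demo = np.random.default_rng(42)\n",
        "X_demo = rng_demo.uniform(0, 1, (500, 5))\n",
        "y_demo = np.sin(2 * np.pi * X_demo[:, 0]) * np.cos(\n",
        "    2 * np.pi * X_demo[:, 1]\n",
        ") + rng_demo.normal(0, 0.1, 500)\n",
        "\n",
        "# 5-fold cross-validation, keeping the train scores to compare with test.\n",
        "cv_demo = cross_validate(\n",
        "    DecisionTreeRegressor(random_state=42),\n",
        "    X_demo,\n",
        "    y_demo,\n",
        "    cv=5,\n",
        "    scoring='r2',\n",
        "    return_train_score=True,\n",
        ")\n",
        "\n",
        "# One train-test gap per fold.\n",
        "fold_gaps = cv_demo['train_score'] - cv_demo['test_score']\n",
        "for fold, gap in enumerate(fold_gaps):\n",
        "    print(f'fold {fold}: train-test R2 gap = {gap:.3f}')"
      ]
    },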
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Diagnostics on a :class:`~skore.ComparisonReport`\n\nPassing a list of estimators returns a :class:`~skore.ComparisonReport`.\nIssues are grouped by sub-report.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "comparison_report = evaluate([linear, deep_tree], X, y)\ncomparison_report.diagnose()"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.14.4"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}