{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n\n# `EstimatorReport`: Get insights from any scikit-learn estimator\n\nThis example shows how the :class:`skore.EstimatorReport` class can be used to\nquickly get insights from any scikit-learn estimator.\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Loading our dataset and defining our estimator\n\nFirst, we load a dataset from skrub. Our goal is to predict if a healthcare\nmanufacturing companies paid a medical doctors or hospitals, in order to detect\npotential conflict of interest.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from skrub.datasets import fetch_open_payments\n\ndataset = fetch_open_payments()\ndf = dataset.X\ny = dataset.y"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from skrub import TableReport\n\nTableReport(df)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "TableReport(y.to_frame())"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Looking at the distributions of the target, we observe that this classification\ntask is quite imbalanced. It means that we have to be careful when selecting a set\nof statistical metrics to evaluate the classification performance of our predictive\nmodel. In addition, we see that the class labels are not specified by an integer\n0 or 1 but instead by a string \"allowed\" or \"disallowed\".\n\nFor our application, the label of interest is \"allowed\".\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "pos_label, neg_label = \"allowed\", \"disallowed\""
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Before training a predictive model, we need to split our dataset into a training\nand a validation set.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from skore import train_test_split\n\n# If you have many dataframes to split on, you can always ask train_test_split to return\n# a dictionary. Remember, it needs to be passed as a keyword argument!\nsplit_data = train_test_split(X=df, y=y, random_state=42, as_dict=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "By the way, notice how skore's :func:`~skore.train_test_split` automatically warns us\nfor a class imbalance.\n\nNow, we need to define a predictive model. Hopefully, `skrub` provides a convenient\nfunction (:func:`skrub.tabular_pipeline`) when it comes to getting strong baseline\npredictive models with a single line of code. As its feature engineering is generic,\nit does not provide some handcrafted and tailored feature engineering but still\nprovides a good starting point.\n\nSo let's create a classifier for our task.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from skrub import tabular_pipeline\n\nestimator = tabular_pipeline(\"classifier\")\nestimator"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Getting insights from our estimator\n\n### Introducing the :class:`skore.EstimatorReport` class\n\nNow, we would be interested in getting some insights from our predictive model.\nOne way is to use the :class:`skore.EstimatorReport` class. This constructor will\ndetect that our estimator is unfitted and will fit it for us on the training data.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from skore import EstimatorReport\n\nreport = EstimatorReport(estimator, **split_data, pos_label=pos_label)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Once the report is created, we get some information regarding the available tools\nallowing us to get some insights from our specific model on our specific task by\ncalling the :meth:`~skore.EstimatorReport.help` method.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "report.help()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Be aware that we can access the help for each individual sub-accessor. For instance:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "report.metrics.help()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Metrics computation with aggressive caching\n\nAt this point, we might be interested to have a first look at the statistical\nperformance of our model on the validation set that we provided. We can access it\nby calling any of the metrics displayed above. Since we are greedy, we want to get\nseveral metrics at once and we will use the\n:meth:`~skore.EstimatorReport.metrics.summarize` method.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import time\n\nstart = time.time()\nmetric_report = report.metrics.summarize().frame()\nend = time.time()\nmetric_report"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "print(f\"Time taken to compute the metrics: {end - start:.2f} seconds\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "An interesting feature provided by the :class:`skore.EstimatorReport` is the\nthe caching mechanism. Indeed, when we have a large enough dataset, computing the\npredictions for a model is not cheap anymore. For instance, on our smallish dataset,\nit took a couple of seconds to compute the metrics. The report will cache the\npredictions and if we are interested in computing a metric again or an alternative\nmetric that requires the same predictions, it will be faster. Let's check by\nrequesting the same metrics report again.\n\n"
      ]
    },
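    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "To build intuition, here is a minimal, hypothetical sketch of such prediction\ncaching (an illustration of the idea only, not skore's actual implementation):\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Hypothetical sketch of prediction caching; not skore's actual implementation\nclass DummyEstimator:\n    \"\"\"Stand-in model whose predictions are expensive to compute.\"\"\"\n\n    def predict(self, X):\n        return [x * 2 for x in X]\n\n\nclass CachingPredictor:\n    def __init__(self, estimator):\n        self.estimator = estimator\n        self._cache = {}          # predictions keyed by data source\n        self.n_compute_calls = 0  # how many times we actually predicted\n\n    def predict(self, key, X):\n        # later calls with the same key are served from the cache\n        if key not in self._cache:\n            self.n_compute_calls += 1\n            self._cache[key] = self.estimator.predict(X)\n        return self._cache[key]\n\n\npredictor = CachingPredictor(DummyEstimator())\npredictor.predict(\"test\", [1, 2, 3])\npredictor.predict(\"test\", [1, 2, 3])  # cache hit: no new computation\npredictor.n_compute_calls"
      ]
    },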
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "start = time.time()\nmetric_report = report.metrics.summarize().frame()\nend = time.time()\nmetric_report"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "print(f\"Time taken to compute the metrics: {end - start:.2f} seconds\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Note that when the model is fitted or the predictions are computed,\nwe additionally store the time the operation took:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "report.metrics.timings()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Since we obtain a pandas dataframe, we can also use the plotting interface of\npandas.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import matplotlib.pyplot as plt\n\nax = metric_report.plot.barh()\nax.set_title(\"Metrics report\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Whenever computing a metric, we check if the predictions are available in the cache\nand reload them if available. So for instance, let's compute the log loss.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "start = time.time()\nlog_loss = report.metrics.log_loss()\nend = time.time()\nlog_loss"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "print(f\"Time taken to compute the log loss: {end - start:.2f} seconds\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We can show that without initial cache, it would have taken more time to compute\nthe log loss.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "report.clear_cache()\n\nstart = time.time()\nlog_loss = report.metrics.log_loss()\nend = time.time()\nlog_loss"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "print(f\"Time taken to compute the log loss: {end - start:.2f} seconds\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "By default, the metrics are computed on the test set only. However, if a training set\nis provided, we can also compute the metrics by specifying the `data_source`\nparameter.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "report.metrics.log_loss(data_source=\"train\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Be aware that we can also benefit from the caching mechanism with our own custom\nmetrics. Skore only expects that we define our own metric function to take `y_true`\nand `y_pred` as the first two positional arguments. It can take any other arguments.\nLet's see an example.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "def operational_decision_cost(y_true, y_pred, amount):\n    mask_true_positive = (y_true == pos_label) & (y_pred == pos_label)\n    mask_true_negative = (y_true == neg_label) & (y_pred == neg_label)\n    mask_false_positive = (y_true == neg_label) & (y_pred == pos_label)\n    mask_false_negative = (y_true == pos_label) & (y_pred == neg_label)\n    fraudulent_refuse = mask_true_positive.sum() * 50\n    fraudulent_accept = -amount[mask_false_negative].sum()\n    legitimate_refuse = mask_false_positive.sum() * -5\n    legitimate_accept = (amount[mask_true_negative] * 0.02).sum()\n    return fraudulent_refuse + fraudulent_accept + legitimate_refuse + legitimate_accept"
      ]
    },
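    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Before registering this metric on the report, we can sanity-check the cost\narithmetic on a tiny, hand-made example (toy labels and amounts, for illustration\nonly):\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import numpy as np\n\npos, neg = \"allowed\", \"disallowed\"\n\n\n# compact, standalone restatement of the cost function above,\n# so that this sanity check can run on its own\ndef toy_cost(y_true, y_pred, amount):\n    tp = (y_true == pos) & (y_pred == pos)\n    tn = (y_true == neg) & (y_pred == neg)\n    fp = (y_true == neg) & (y_pred == pos)\n    fn = (y_true == pos) & (y_pred == neg)\n    return tp.sum() * 50 - amount[fn].sum() - fp.sum() * 5 + (amount[tn] * 0.02).sum()\n\n\n# toy data: one true positive, one false negative, one false positive,\n# and one true negative, in that order\ny_toy = np.array([pos, pos, neg, neg])\npred_toy = np.array([pos, neg, pos, neg])\namount_toy = np.array([100.0, 200.0, 300.0, 400.0])\n\n# expected: +50 (TP) - 200 (FN amount) - 5 (FP) + 0.02 * 400 (TN) = -147.0\ntoy_cost(y_toy, pred_toy, amount_toy)"
      ]
    },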
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "In our use case, we have a operational decision to make that translate the\nclassification outcome into a cost. It translate the confusion matrix into a cost\nmatrix based on some amount linked to each sample in the dataset that are provided to\nus. Here, we randomly generate some amount as an illustration.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import numpy as np\n\nrng = np.random.default_rng(42)\namount = rng.integers(low=100, high=1000, size=len(split_data[\"y_test\"]))\n\nreport.metrics.add(\n    metric=operational_decision_cost,\n    response_method=\"predict\",\n    amount=amount,\n)\n\ncost = report.metrics.summarize(metric=\"operational_decision_cost\")\ncost"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "By the way, skore caches the model predictions. It is really handy because it means\nthat we can compute some additional metrics without having to recompute the\nthe predictions.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "report.metrics.summarize(\n    metric=[\"precision\", \"recall\", \"operational_decision_cost\"],\n).frame()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Effortless one-liner plotting\n\nThe :class:`skore.EstimatorReport` class also provides a plotting interface that\nallows to plot *defacto* the most common plots. As for the metrics, we only\nprovide the meaningful set of plots for the provided estimator.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "report.metrics.help()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Let's start by plotting the ROC curve for our binary classification task.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "display = report.metrics.roc()\ndisplay.plot()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The plot functionality is built upon the scikit-learn display objects. We return\nthose display (slightly modified to improve the UI) in case we want to tweak some\nof the plot properties. We can have quick look at the available attributes and\nmethods by calling the ``help`` method or simply by printing the display.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "display"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "display.help()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "fig = display.plot()\n_ = fig.axes[0].set_title(\"Example of a ROC curve\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Similarly to the metrics, we aggressively use the caching to avoid recomputing the\npredictions of the model. We also cache the plot display object by detection if the\ninput parameters are the same as the previous call. Let's demonstrate the kind of\nperformance gain we can get.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "start = time.time()\n# we already trigger the computation of the predictions in a previous call\ndisplay = report.metrics.roc()\ndisplay.plot()\nend = time.time()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "print(f\"Time taken to compute the ROC curve: {end - start:.2f} seconds\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Now, let's clean the cache and check if we get a slowdown.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "report.clear_cache()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "start = time.time()\ndisplay = report.metrics.roc()\ndisplay.plot()\nend = time.time()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "print(f\"Time taken to compute the ROC curve: {end - start:.2f} seconds\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As expected, since we need to recompute the predictions, it takes more time.\n\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Visualizing the confusion matrix\n\nAnother useful visualization for classification tasks is the confusion matrix,\nwhich shows the counts of correct and incorrect predictions for each class.\n\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Let's first start with a basic confusion matrix:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "cm_display = report.metrics.confusion_matrix()\ncm_display.plot()\nplt.show(block=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "In binary classification, a confusion matrix depends on the decision threshold used\nto convert predicted probabilities into class labels. By default, skore uses a\nthreshold of 0.5, but confusion matrices are actually computed at every threshold\ninternally.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# To visualize the confusion matrix at a different threshold, use the ``threshold_value``\n# parameter. For example, a threshold of 0.3 will classify more samples as positive:\ncm_display.plot(threshold_value=0.3)\nplt.show(block=True)"
      ]
    },
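    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The effect of the threshold can be reproduced with plain numpy on toy predicted\nprobabilities (hypothetical values, for illustration): lowering the threshold moves\nsamples from the predicted-negative column to the predicted-positive column.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import numpy as np\n\n# toy predicted probabilities for the positive class, and the true labels\nproba_toy = np.array([0.9, 0.6, 0.4, 0.2])\ny_true_toy = np.array([1, 1, 1, 0])\n\n\ndef confusion_counts(threshold):\n    \"\"\"Return (tp, fn, fp, tn) at the given decision threshold.\"\"\"\n    y_pred = (proba_toy >= threshold).astype(int)\n    tp = int(((y_true_toy == 1) & (y_pred == 1)).sum())\n    fn = int(((y_true_toy == 1) & (y_pred == 0)).sum())\n    fp = int(((y_true_toy == 0) & (y_pred == 1)).sum())\n    tn = int(((y_true_toy == 0) & (y_pred == 0)).sum())\n    return tp, fn, fp, tn\n\n\n# at 0.5 one positive is missed; at 0.3 all positives are recovered\nconfusion_counts(0.5), confusion_counts(0.3)"
      ]
    },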
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We can normalize the confusion matrix to get percentages instead of raw counts.\nHere we normalize by true labels (rows):\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "cm_display.plot(normalize=\"true\")\nplt.show(block=True)"
      ]
    },
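    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Normalizing by true labels divides each row of the raw counts by its row total, so\neach row sums to 1. A quick numpy sketch on an illustrative matrix:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import numpy as np\n\n# illustrative raw confusion matrix: rows are true labels, columns are predictions\ncm_toy = np.array([[90, 10], [30, 70]])\n\n# the equivalent of normalize=\"true\": divide each row by its total\ncm_toy / cm_toy.sum(axis=1, keepdims=True)"
      ]
    },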
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "More plotting options are available via ``heatmap_kwargs``, which are passed to\nseaborn's heatmap. For example, we can customize the colormap and number format:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "cm_display.set_style(heatmap_kwargs={\"cmap\": \"Greens\", \"fmt\": \".2e\"})\ncm_display.plot()\nplt.show(block=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Finally, the confusion matrix can also be exported as a pandas DataFrame for further\nanalysis:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "cm_frame = cm_display.frame()\ncm_frame"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        ".. seealso::\n\n  For using the :class:`~skore.EstimatorReport` to inspect your models,\n  see `example_feature_importance`.\n\n"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.14.4"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}