{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n\n# Using skrub DataOp cross-validation\n\nWhen a skrub :class:`~skrub.DataOp` defines a cross-validation splitter through\nthe ``cv`` argument of :meth:`~skrub.DataOp.skb.mark_as_X`, :func:`~skore.evaluate`\ncan reuse that configuration, including ``split_kwargs`` such as ``groups``,\ninstead of falling back to skore's default 80/20 holdout split.\n\nThis example builds a small grouped cross-validation setup with skrub and\nevaluates it with skore.\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Configure cross-validation on the DataOp\n\nWe use the toy products dataset and group products by seller. The goal is to\nestimate how well the model generalizes to sellers it has never seen, using\n:class:`~sklearn.model_selection.LeaveOneGroupOut`, which holds out one seller\nper fold.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
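        "import skrub\nfrom sklearn.dummy import DummyClassifier\nfrom sklearn.model_selection import LeaveOneGroupOut\n\ndf = skrub.datasets.toy_products()\n\n# Declare the input table as a skrub variable; its value is supplied at evaluation time.\ndata = skrub.var(\"df\")\n# The seller column defines the groups used by the splitter.\ngroups = data[\"seller\"]\n# Mark the features as X and attach the cross-validation configuration to them.\nX = data[[\"description\", \"price\"]].skb.mark_as_X(\n    cv=LeaveOneGroupOut(), split_kwargs={\"groups\": groups}\n)\ny = data[\"category\"].skb.mark_as_y()\n# A baseline classifier is enough to demonstrate the evaluation workflow.\npred = X.skb.apply(DummyClassifier(), y=y)\nlearner = pred.skb.make_learner()"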
      ]
    },
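    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "For readers familiar with scikit-learn, ``split_kwargs={\"groups\": groups}`` plays a\nrole similar to the ``groups`` argument of\n:func:`~sklearn.model_selection.cross_validate`. The following sketch is a plain\nscikit-learn analogy on a small hypothetical table (``toy``, with made-up prices,\nsellers and categories); it does not show how skrub or skore run the splits\ninternally. Each hypothetical seller sells a single category, so the held-out\nseller's category never appears in training, and the grouped estimate exposes\nthat the baseline cannot generalize to an unseen seller.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import pandas as pd\nfrom sklearn.dummy import DummyClassifier\nfrom sklearn.model_selection import LeaveOneGroupOut, cross_validate\n\n# Hypothetical stand-in for the products table (not the real toy_products data).\ntoy = pd.DataFrame(\n    {\n        \"price\": [10.0, 12.0, 30.0, 35.0, 8.0, 40.0],\n        \"seller\": [\"A\", \"A\", \"B\", \"B\", \"A\", \"B\"],\n        \"category\": [\"food\", \"food\", \"tech\", \"tech\", \"food\", \"tech\"],\n    }\n)\n\n# In plain scikit-learn, groups are passed alongside the splitter;\n# this plays the role of split_kwargs={\"groups\": ...} in the DataOp.\nscores = cross_validate(\n    DummyClassifier(),\n    toy[[\"price\"]],\n    toy[\"category\"],\n    cv=LeaveOneGroupOut(),\n    groups=toy[\"seller\"],\n)\nscores[\"test_score\"]"
      ]
    },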
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Evaluate with skore (no explicit splitter)\n\nBecause ``mark_as_X`` was called with an explicit ``cv`` argument, calling\n:func:`~skore.evaluate` without a ``splitter`` returns a\n:class:`~skore.CrossValidationReport` that respects the DataOp grouping.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from skore import evaluate\n\nreport = evaluate(learner, data={\"df\": df})\nreport"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "``LeaveOneGroupOut`` creates one fold per seller. The dataset has two sellers,\nso the cross-validation report contains two per-fold reports:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "len(report.reports_)"
      ]
    },
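    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The fold count follows directly from\n:class:`~sklearn.model_selection.LeaveOneGroupOut`: it creates one fold per\ndistinct group and uses that group as the test set. A quick check on a\nhypothetical ``sellers`` column (made-up values, not taken from ``df``):\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import pandas as pd\nfrom sklearn.model_selection import LeaveOneGroupOut\n\n# Hypothetical seller column standing in for df[\"seller\"].\nsellers = pd.Series([\"A\", \"A\", \"B\", \"B\", \"B\"])\nlogo = LeaveOneGroupOut()\n\n# LeaveOneGroupOut creates one fold per distinct group value.\nn_folds = logo.get_n_splits(groups=sellers)\n\n# Each test fold contains exactly one seller.\nheld_out = [\n    sorted(set(sellers.iloc[test_idx]))\n    for _, test_idx in logo.split(sellers, groups=sellers)\n]\nn_folds, held_out"
      ]
    },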
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Inspect aggregated metrics with the same API as other skore reports:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "report.metrics.summarize().frame()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
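        "## Default behavior without an explicit DataOp cv\n\nIf ``mark_as_X`` is called without an explicit ``cv`` argument,\n:func:`~skore.evaluate` still defaults to a single 80/20 holdout split and\nreturns an :class:`~skore.EstimatorReport`. Below, :func:`skrub.X` and\n:func:`skrub.y` create variables that are already marked as ``X`` and ``y``,\nwith no ``cv`` argument.\n\n"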
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "simple_pred = skrub.X().skb.apply(DummyClassifier(), y=skrub.y())\nsimple_learner = simple_pred.skb.make_learner()\nholdout_report = evaluate(\n    simple_learner,\n    data={\"X\": df[[\"description\", \"price\"]], \"y\": df[\"category\"]},\n)\nholdout_report"
      ]
    },
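    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "For intuition about the default, the following sketch shows what an 80/20\nproportion means on a hypothetical 10-row array, using scikit-learn's\n:func:`~sklearn.model_selection.train_test_split`. It only illustrates the\nproportions and makes no claim about how skore itself shuffles, stratifies or\nseeds its holdout split.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import numpy as np\nfrom sklearn.model_selection import train_test_split\n\n# Hypothetical data: 10 samples, 2 features, alternating labels.\nX_demo = np.arange(20).reshape(10, 2)\ny_demo = np.array([0, 1] * 5)\n\n# test_size=0.2 keeps 20% of the rows for testing.\nX_train, X_test, y_train, y_test = train_test_split(\n    X_demo, y_demo, test_size=0.2, random_state=0\n)\nlen(X_train), len(X_test)"
      ]
    },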
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Passing an explicit ``splitter`` always takes precedence over the DataOp\nconfiguration: here ``splitter=2`` is used instead of ``LeaveOneGroupOut``.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "override_report = evaluate(learner, data={\"df\": df}, splitter=2)\noverride_report"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.14.6"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}