.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/technical_details/plot_skrub_data_op_cv.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_technical_details_plot_skrub_data_op_cv.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_technical_details_plot_skrub_data_op_cv.py:

.. _example_skrub_data_op_cv:

===================================
Using skrub DataOp cross-validation
===================================

When a skrub :class:`~skrub.DataOp` defines a cross-validation splitter on
:meth:`~skrub.DataOp.skb.mark_as_X`, :func:`~skore.evaluate` can reuse that
configuration, including ``split_kwargs`` such as ``groups``, instead of
skore's default 80/20 holdout.

This example builds a small grouped cross-validation setup with skrub and
evaluates it with skore.

.. GENERATED FROM PYTHON SOURCE LINES 18-24

Configure cross-validation on the DataOp
========================================

We use the toy products dataset and group products by seller. The goal is to
assess generalization to new sellers with
:class:`~sklearn.model_selection.LeaveOneGroupOut`.

.. GENERATED FROM PYTHON SOURCE LINES 24-38

.. code-block:: Python

    import skrub
    from sklearn.dummy import DummyClassifier
    from sklearn.model_selection import LeaveOneGroupOut

    df = skrub.datasets.toy_products()
    data = skrub.var("df")
    groups = data["seller"]
    X = data[["description", "price"]].skb.mark_as_X(
        cv=LeaveOneGroupOut(), split_kwargs={"groups": groups}
    )
    y = data["category"].skb.mark_as_y()
    pred = X.skb.apply(DummyClassifier(), y=y)
    learner = pred.skb.make_learner()

.. GENERATED FROM PYTHON SOURCE LINES 39-45

Evaluate with skore (no explicit splitter)
==========================================

Because ``mark_as_X`` was called with an explicit ``cv`` argument, calling
:func:`~skore.evaluate` without a ``splitter`` returns a
:class:`~skore.CrossValidationReport` that respects the DataOp grouping.

.. GENERATED FROM PYTHON SOURCE LINES 45-50

.. code-block:: Python

    from skore import evaluate

    report = evaluate(learner, data={"df": df})
    report

.. raw:: html

    <pre>SkrubLearner(data_op=&lt;Apply DummyClassifier&gt;)</pre>



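To make the grouping concrete, the following sketch runs
:class:`~sklearn.model_selection.LeaveOneGroupOut` directly with scikit-learn,
outside of skrub and skore. The seller labels are hypothetical stand-ins, not
the actual contents of ``toy_products``. The point is only that each fold
holds out every row of exactly one group, so the number of folds equals the
number of distinct groups.

.. code-block:: Python

    import numpy as np
    from sklearn.model_selection import LeaveOneGroupOut

    # Hypothetical seller labels, one per row (assumed for illustration only).
    sellers = np.array(["alice", "alice", "alice", "bob", "bob", "bob"])
    X_demo = np.arange(len(sellers)).reshape(-1, 1)

    logo = LeaveOneGroupOut()
    n_folds = logo.get_n_splits(groups=sellers)

    held_out = []
    for train_idx, test_idx in logo.split(X_demo, groups=sellers):
        test_sellers = {str(s) for s in sellers[test_idx]}
        train_sellers = {str(s) for s in sellers[train_idx]}
        # The held-out seller never appears among the training rows.
        assert test_sellers.isdisjoint(train_sellers)
        held_out.append(sorted(test_sellers))

    print(n_folds)
    print(held_out)

With two distinct sellers, ``n_folds`` is 2 and each entry of ``held_out``
names a single seller.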
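.. GENERATED FROM PYTHON SOURCE LINES 51-52

There are two sellers, so cross-validation runs in two folds:

.. GENERATED FROM PYTHON SOURCE LINES 52-54

.. code-block:: Python

    len(report.reports_)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    2

.. GENERATED FROM PYTHON SOURCE LINES 55-56

Inspect aggregated metrics with the same API as other skore reports:

.. GENERATED FROM PYTHON SOURCE LINES 56-58

.. code-block:: Python

    report.metrics.summarize().frame()

.. raw:: html

    <table>
      <thead>
        <tr><th></th><th></th><th colspan="2">SkrubLearner</th></tr>
        <tr><th></th><th></th><th>mean</th><th>std</th></tr>
        <tr><th>Metric</th><th>Label</th><th></th><th></th></tr>
      </thead>
      <tbody>
        <tr><th>Score</th><th></th><td>0.666667</td><td>0.000000e+00</td></tr>
        <tr><th>Accuracy</th><th></th><td>0.666667</td><td>0.000000e+00</td></tr>
        <tr><th rowspan="2">Precision</th><th>electronics</th><td>0.666667</td><td>0.000000e+00</td></tr>
        <tr><th>tools</th><td>0.000000</td><td>0.000000e+00</td></tr>
        <tr><th rowspan="2">Recall</th><th>electronics</th><td>1.000000</td><td>0.000000e+00</td></tr>
        <tr><th>tools</th><td>0.000000</td><td>0.000000e+00</td></tr>
        <tr><th>ROC AUC</th><th></th><td>0.500000</td><td>0.000000e+00</td></tr>
        <tr><th>Log loss</th><th></th><td>0.636514</td><td>0.000000e+00</td></tr>
        <tr><th>Brier score</th><th></th><td>0.222222</td><td>2.775558e-17</td></tr>
        <tr><th>Fit time (s)</th><th></th><td>0.003148</td><td>2.069348e-05</td></tr>
        <tr><th>Predict time (s)</th><th></th><td>0.001972</td><td>8.291251e-05</td></tr>
      </tbody>
    </table>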


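The values in the table above are what a constant predictor produces. As a
back-of-the-envelope check (not how skore computes them), assume that two
thirds of the rows in every training and test fold are ``electronics``. That
balance is inferred from the reported accuracy and is not stated by the
example. :class:`~sklearn.dummy.DummyClassifier` defaults to the ``"prior"``
strategy, which always predicts the majority class and outputs the training
class frequencies as probabilities.

.. code-block:: Python

    import math

    # Assumed class balance in each fold (inferred from the table, not from the data).
    p_electronics = 2 / 3
    p_tools = 1 - p_electronics

    # Always predicting the majority class is right for the majority share of rows.
    accuracy = p_electronics

    # Log loss when the predicted probabilities equal the class frequencies.
    log_loss = -(
        p_electronics * math.log(p_electronics) + p_tools * math.log(p_tools)
    )

    # Brier score with "tools" as the positive class and constant probability p_tools.
    brier = p_tools * (p_tools - 1) ** 2 + p_electronics * (p_tools - 0) ** 2

    print(round(accuracy, 6), round(log_loss, 6), round(brier, 6))

Under that assumption the three numbers agree with the ``mean`` column for
Accuracy, Log loss and Brier score. The ROC AUC of 0.5 follows from the
predicted scores being constant.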
.. GENERATED FROM PYTHON SOURCE LINES 59-65

Default behavior without an explicit DataOp cv
==============================================

If ``mark_as_X`` is called without an explicit ``cv`` argument,
:func:`~skore.evaluate` still defaults to a single 80/20 holdout and returns an
:class:`~skore.EstimatorReport`.

.. GENERATED FROM PYTHON SOURCE LINES 65-72

.. code-block:: Python

    simple_learner = skrub.X().skb.apply(DummyClassifier(), y=skrub.y()).skb.make_learner()
    holdout_report = evaluate(
        simple_learner,
        data={"X": df[["description", "price"]], "y": df["category"]},
    )
    holdout_report

.. raw:: html

    <pre>SkrubLearner(data_op=&lt;Apply DummyClassifier&gt;)</pre>



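For intuition about what a single 80/20 holdout looks like, the sketch below
uses scikit-learn's :func:`~sklearn.model_selection.train_test_split` on
hypothetical data. Whether skore shuffles or stratifies in the same way is not
stated in this example, so treat this as an illustration of the split sizes
only.

.. code-block:: Python

    from sklearn.model_selection import train_test_split

    # Ten hypothetical rows and labels (assumed for illustration only).
    X_demo = [[i] for i in range(10)]
    y_demo = ["electronics", "tools"] * 5

    X_train, X_test, y_train, y_test = train_test_split(
        X_demo, y_demo, test_size=0.2, random_state=0
    )

    # One train/test pair: 80% of the rows for fitting, 20% for scoring.
    print(len(X_train), len(X_test))

A single split yields one set of metrics rather than one per fold, which is
consistent with the result being an :class:`~skore.EstimatorReport`.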
.. GENERATED FROM PYTHON SOURCE LINES 73-74

Explicitly passing a ``splitter`` always overrides the DataOp configuration.

.. GENERATED FROM PYTHON SOURCE LINES 74-76

.. code-block:: Python

    override_report = evaluate(learner, data={"df": df}, splitter=2)
    override_report

.. raw:: html

    <pre>SkrubLearner(data_op=&lt;Apply DummyClassifier&gt;)</pre>



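One consequence of overriding is worth spelling out: a splitter that is not
group-aware can place the same seller on both sides of a split. The sketch
below assumes, by analogy with scikit-learn's ``cv`` convention, that an
integer such as ``2`` stands for a two-fold K-fold splitter. How skore
interprets the integer exactly is an assumption here, and the seller labels
are hypothetical.

.. code-block:: Python

    import numpy as np
    from sklearn.model_selection import KFold

    # Hypothetical seller labels with interleaved rows (assumed for illustration only).
    sellers = np.array(["alice", "bob", "alice", "bob", "alice", "bob"])
    X_demo = np.arange(len(sellers)).reshape(-1, 1)

    leaky_folds = 0
    for train_idx, test_idx in KFold(n_splits=2).split(X_demo):
        shared = set(sellers[train_idx]) & set(sellers[test_idx])
        # K-fold splits by row position only, so sellers can appear on both sides.
        if shared:
            leaky_folds += 1

    print(leaky_folds)

Here both folds share sellers between training and test rows, so the scores no
longer measure generalization to unseen sellers the way
:class:`~sklearn.model_selection.LeaveOneGroupOut` does.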
.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 2.128 seconds)


.. _sphx_glr_download_auto_examples_technical_details_plot_skrub_data_op_cv.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_skrub_data_op_cv.ipynb <plot_skrub_data_op_cv.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_skrub_data_op_cv.py <plot_skrub_data_op_cv.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_skrub_data_op_cv.zip <plot_skrub_data_op_cv.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_