.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/model_evaluation/plot_cross_validate.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_model_evaluation_plot_cross_validate.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_model_evaluation_plot_cross_validate.py:

.. _example_cross_validate:

================
Cross-validation
================

This example illustrates the motivation for, and the use of, skore's
:class:`skore.CrossValidationReporter` to get assistance when developing ML/DS
projects.

.. warning::

    **Deprecation notice**: :class:`skore.CrossValidationReporter` is
    deprecated in favor of :class:`skore.CrossValidationReport`.

.. GENERATED FROM PYTHON SOURCE LINES 18-20

Creating and loading the skore project
======================================

.. GENERATED FROM PYTHON SOURCE LINES 22-23

We create and load the skore project from the current directory:

.. GENERATED FROM PYTHON SOURCE LINES 25-29

.. code-block:: Python

    import skore

    my_project = skore.open("my_project", create=True)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    ──────────────────────────────────────── skore ─────────────────────────────────────────
    Project file 'my_project.skore' was successfully created.

.. GENERATED FROM PYTHON SOURCE LINES 30-48

Cross-validation in scikit-learn
================================

Scikit-learn provides two functions for cross-validation:

* :func:`sklearn.model_selection.cross_val_score`
* :func:`sklearn.model_selection.cross_validate`

Essentially, :func:`sklearn.model_selection.cross_val_score` runs
cross-validation for single-metric evaluation, while
:func:`sklearn.model_selection.cross_validate` runs cross-validation with
multiple metrics and can also return extra information such as train scores,
fit times, and score times.
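To make that difference concrete, here is a minimal sketch (ours, not part of the generated example) that runs both helpers on the same estimator; the iris/SVC setup mirrors the one used just below:

```python
# Contrast scikit-learn's two cross-validation helpers on the same estimator.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, cross_validate
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC(kernel="linear", C=1, random_state=0)

# cross_val_score returns only an array of test scores, one per split:
scores = cross_val_score(clf, X, y, cv=5)

# cross_validate returns a dict that also holds timings and, on request,
# train scores:
results = cross_validate(clf, X, y, cv=5, return_train_score=True)
print(sorted(results))  # ['fit_time', 'score_time', 'test_score', 'train_score']
```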
Hence, in skore, we are more interested in the
:func:`sklearn.model_selection.cross_validate` function, as it allows us to do
more than the historical :func:`sklearn.model_selection.cross_val_score`.

Let us illustrate cross-validation on a multi-class classification task.

.. GENERATED FROM PYTHON SOURCE LINES 50-56

.. code-block:: Python

    from sklearn.datasets import load_iris
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    clf = SVC(kernel="linear", C=1, random_state=0)

.. GENERATED FROM PYTHON SOURCE LINES 57-58

Single-metric evaluation using :func:`sklearn.model_selection.cross_validate`:

.. GENERATED FROM PYTHON SOURCE LINES 60-65

.. code-block:: Python

    from sklearn.model_selection import cross_validate as sklearn_cross_validate

    cv_results = sklearn_cross_validate(clf, X, y, cv=5)
    print(f"test_score: {cv_results['test_score']}")

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    test_score: [0.96666667 1.         0.96666667 0.96666667 1.        ]

.. GENERATED FROM PYTHON SOURCE LINES 66-67

Multiple-metric evaluation using :func:`sklearn.model_selection.cross_validate`:

.. GENERATED FROM PYTHON SOURCE LINES 69-81

.. code-block:: Python

    import pandas as pd

    cv_results = sklearn_cross_validate(
        clf,
        X,
        y,
        cv=5,
        scoring=["accuracy", "precision_macro"],
    )
    test_scores = pd.DataFrame(cv_results)[["test_accuracy", "test_precision_macro"]]
    test_scores

.. rst-class:: sphx-glr-script-out

.. code-block:: none

       test_accuracy  test_precision_macro
    0       0.966667              0.969697
    1       1.000000              1.000000
    2       0.966667              0.969697
    3       0.966667              0.969697
    4       1.000000              1.000000


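Beyond inspecting the raw per-split values, a common next step (not shown in the generated example) is to aggregate them across splits; a short sketch using pandas on the same setup:

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC(kernel="linear", C=1, random_state=0)

cv_results = cross_validate(clf, X, y, cv=5, scoring=["accuracy", "precision_macro"])
test_scores = pd.DataFrame(cv_results)[["test_accuracy", "test_precision_macro"]]

# Summarize the 5 splits with the mean and standard deviation of each metric.
summary = test_scores.agg(["mean", "std"])
print(summary.round(3))
```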
.. GENERATED FROM PYTHON SOURCE LINES 82-100

In scikit-learn, why do we recommend using ``cross_validate`` over ``cross_val_score``?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Here, for the :class:`~sklearn.svm.SVC`, the default score is the accuracy.
If users want other scores to better understand their model, such as the
precision and the recall, they can conveniently specify them in a single call.
Otherwise, they would have to run
:func:`sklearn.model_selection.cross_val_score` several times with a different
``scoring`` parameter each time, which leads to unnecessary extra compute.

Why do we recommend using skore's ``CrossValidationReporter`` over scikit-learn's ``cross_validate``?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In the example above, what if the users ran scikit-learn's
:func:`sklearn.model_selection.cross_validate` but forgot to manually add a
crucial score for their use case, such as the recall? They would have to
re-run the whole cross-validation experiment with that score added, which
requires more compute.

.. GENERATED FROM PYTHON SOURCE LINES 102-114

Cross-validation in skore
=========================

To assist its users when programming, skore implements a
:class:`skore.CrossValidationReporter` class that wraps scikit-learn's
:func:`sklearn.model_selection.cross_validate` to provide more context and
facilitate the analysis.

Classification task
^^^^^^^^^^^^^^^^^^^

Let us continue with the same use case.

.. GENERATED FROM PYTHON SOURCE LINES 116-119

.. code-block:: Python

    reporter = skore.CrossValidationReporter(clf, X, y, cv=5)
    reporter.plots.scores

.. The interactive Plotly plot of the cross-validation scores appears here in the rendered example.


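For comparison, reproducing a multi-metric view with plain scikit-learn requires enumerating every scorer up front; if one is forgotten, the whole experiment must be re-run. This snippet is our illustration, and the metric list is an assumption for the sketch, not necessarily what the reporter computes internally:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC(kernel="linear", C=1, random_state=0)

# Every scorer must be listed explicitly at call time.
scoring = ["accuracy", "precision_macro", "recall_macro"]
cv_results = cross_validate(clf, X, y, cv=5, scoring=scoring)
for name in scoring:
    print(name, cv_results[f"test_{name}"].mean().round(3))
```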
.. GENERATED FROM PYTHON SOURCE LINES 120-136

Skore's :class:`~skore.CrossValidationReporter` advantages are the following:

* By default, it computes several useful scores without the need to manually
  specify them. For classification, one can observe that it computed the
  accuracy, the precision, and the recall.

* We automatically get some interactive Plotly graphs to better understand how
  our model behaves depending on the split. For example:

  * We can compare the fitting and scoring times together for each split.

  * Moreover, we can focus on the times per data point, as the train and test
    splits usually have a different number of samples.

  * We can compare the accuracy, precision, and recall scores together for
    each split.

.. GENERATED FROM PYTHON SOURCE LINES 138-140

Regression task
^^^^^^^^^^^^^^^

.. GENERATED FROM PYTHON SOURCE LINES 142-151

.. code-block:: Python

    from sklearn.datasets import load_diabetes
    from sklearn.linear_model import Lasso

    X, y = load_diabetes(return_X_y=True)
    lasso = Lasso()

    reporter = skore.CrossValidationReporter(lasso, X, y, cv=5)
    reporter.plots.scores

.. The interactive Plotly plot of the cross-validation scores appears here in the rendered example.


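As a plain-scikit-learn counterpart to this regression report (again our illustration, not part of the generated example): for regressors the default score is R², and error metrics are exposed in negated, greater-is-better form, so they need to be flipped back for reading:

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_validate

X, y = load_diabetes(return_X_y=True)
lasso = Lasso()

# "neg_root_mean_squared_error" is negated so that greater is always better;
# negate it again to report the usual RMSE.
cv_results = cross_validate(
    lasso, X, y, cv=5, scoring=["r2", "neg_root_mean_squared_error"]
)
print("mean R2:  ", cv_results["test_r2"].mean().round(3))
print("mean RMSE:", (-cv_results["test_neg_root_mean_squared_error"]).mean().round(1))
```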
.. GENERATED FROM PYTHON SOURCE LINES 152-153

We can put the reporter in the project and retrieve it as is:

.. GENERATED FROM PYTHON SOURCE LINES 153-158

.. code-block:: Python

    my_project.put("cross_validation_reporter", reporter)

    reporter = my_project.get("cross_validation_reporter")
    reporter.plots.scores

.. The interactive Plotly plot of the cross-validation scores appears here in the rendered example.


.. GENERATED FROM PYTHON SOURCE LINES 159-164

Cleanup the project
-------------------

Let's clear the skore project (to avoid any conflict with other documentation
examples).

.. GENERATED FROM PYTHON SOURCE LINES 166-167

.. code-block:: Python

    my_project.clear()

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.201 seconds)

.. _sphx_glr_download_auto_examples_model_evaluation_plot_cross_validate.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_cross_validate.ipynb <plot_cross_validate.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_cross_validate.py <plot_cross_validate.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_cross_validate.zip <plot_cross_validate.zip>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_