Cross-validation#

This example illustrates the motivation for and the use of skore's skore.CrossValidationReporter, which assists you when developing ML/DS projects.

Warning

Deprecation Notice: skore.CrossValidationReporter is deprecated in favor of skore.CrossValidationReport.

Creating and loading the skore project#

We create and load the skore project from the current directory:

import skore

my_project = skore.open("my_project", create=True)
──────────────────────────────────────── skore ─────────────────────────────────────────
Project file 'my_project.skore' was successfully created.

Cross-validation in scikit-learn#

Scikit-learn provides two functions for cross-validation: sklearn.model_selection.cross_val_score() and sklearn.model_selection.cross_validate().

Essentially, sklearn.model_selection.cross_val_score() runs cross-validation for single-metric evaluation, while sklearn.model_selection.cross_validate() runs cross-validation with multiple metrics and can also return extra information such as train scores, fit times, and score times.
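As a point of comparison, here is a minimal, self-contained sketch of single-metric evaluation with sklearn.model_selection.cross_val_score(), using the same dataset and estimator as the rest of this example:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC(kernel="linear", C=1, random_state=0)

# cross_val_score() returns a single array: one test score
# (here, the accuracy) per cross-validation split
scores = cross_val_score(clf, X, y, cv=5)
print(scores)
```

With the default 5-fold splitter this produces the same per-split accuracies as the cross_validate() call shown below.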

Hence, in skore, we are more interested in sklearn.model_selection.cross_validate(), as it can do more than the historical sklearn.model_selection.cross_val_score().

Let us illustrate cross-validation on a multi-class classification task.

from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC(kernel="linear", C=1, random_state=0)

Single-metric evaluation using sklearn.model_selection.cross_validate():

from sklearn.model_selection import cross_validate as sklearn_cross_validate

cv_results = sklearn_cross_validate(clf, X, y, cv=5)
print(f"test_score: {cv_results['test_score']}")
test_score: [0.96666667 1.         0.96666667 0.96666667 1.        ]

Multiple-metric evaluation using sklearn.model_selection.cross_validate():

import pandas as pd

cv_results = sklearn_cross_validate(
    clf,
    X,
    y,
    cv=5,
    scoring=["accuracy", "precision_macro"],
)
test_scores = pd.DataFrame(cv_results)[["test_accuracy", "test_precision_macro"]]
test_scores
   test_accuracy  test_precision_macro
0       0.966667              0.969697
1       1.000000              1.000000
2       0.966667              0.969697
3       0.966667              0.969697
4       1.000000              1.000000


In scikit-learn, why do we recommend using cross_validate over cross_val_score?#

Here, for the SVC, the default score is the accuracy. If users want other scores to better understand their model, such as the precision and the recall, they can conveniently specify them all in a single call. Otherwise, they would have to run sklearn.model_selection.cross_val_score() several times, with a different scoring parameter each time, which leads to unnecessary extra compute.

Why do we recommend using skore’s CrossValidationReporter over scikit-learn’s cross_validate?#

In the example above, what if users ran scikit-learn's sklearn.model_selection.cross_validate() but forgot to include a score that is crucial for their use case, such as the recall? They would have to re-run the whole cross-validation experiment with this score added, which costs more compute.

Cross-validation in skore#

To assist its users when programming, skore implements a skore.CrossValidationReporter class that wraps scikit-learn's sklearn.model_selection.cross_validate() to provide more context and facilitate the analysis.

Classification task#

Let us continue with the same use case.



Skore’s CrossValidationReporter advantages are the following:

  • By default, it computes several useful scores without requiring them to be specified manually. For this classification task, it computes the accuracy, the precision, and the recall.

  • We automatically get some interactive Plotly graphs to better understand how our model behaves depending on the split. For example:

    • We can compare the fitting and scoring times together for each split.

    • Moreover, we can focus on the time per data point, as the train and test splits usually contain different numbers of samples.

    • We can compare the accuracy, precision, and recall scores together for each split.
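The timing information mentioned in the bullets above is also available from plain scikit-learn; here is a small sketch extracting the fit and score times per split with cross_validate() (fit_time and score_time are scikit-learn's standard result keys):

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC(kernel="linear", C=1, random_state=0)

cv_results = cross_validate(clf, X, y, cv=5)
# fit_time and score_time are reported in seconds, one value per split
times = pd.DataFrame(cv_results)[["fit_time", "score_time"]]
print(times)
```

The reporter's advantage is that it turns this raw timing data into interactive plots, including the per-data-point view.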

Regression task#

from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso

X, y = load_diabetes(return_X_y=True)
lasso = Lasso()

reporter = skore.CrossValidationReporter(lasso, X, y, cv=5)
reporter.plots.scores


We can put the reporter in the project, and retrieve it as is:

my_project.put("cross_validation_reporter", reporter)

reporter = my_project.get("cross_validation_reporter")
reporter.plots.scores


Cleanup the project#

Let’s clear the skore project (to avoid any conflict with other documentation examples).

Total running time of the script: (0 minutes 0.201 seconds)

Gallery generated by Sphinx-Gallery