Cross-validation#
This example illustrates the motivation for and the use of skore's skore.CrossValidationReporter to get assistance when developing ML/DS projects.

Warning

Deprecation Notice: skore.CrossValidationReporter is deprecated in favor of skore.CrossValidationReport.
Creating and loading the skore project#
We create and load the skore project from the current directory:
import skore
my_project = skore.open("my_project", create=True)
──────────────────────────────────────── skore ─────────────────────────────────────────
Project file 'my_project.skore' was successfully created.
Cross-validation in scikit-learn#
Scikit-learn provides two functions for cross-validation: sklearn.model_selection.cross_val_score() and sklearn.model_selection.cross_validate().

Essentially, sklearn.model_selection.cross_val_score() runs cross-validation for single-metric evaluation, while sklearn.model_selection.cross_validate() runs cross-validation with multiple metrics and can also return extra information such as train scores, fit times, and score times.

Hence, in skore, we are more interested in the sklearn.model_selection.cross_validate() function, as it allows one to do more than the historical sklearn.model_selection.cross_val_score().
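The difference in return types can be seen with a minimal sketch (the iris dataset and a linear SVC are illustrative assumptions made here, not fixed by the text above):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, cross_validate
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC(kernel="linear", random_state=0)

# cross_val_score returns a bare array of test scores for a single metric...
scores = cross_val_score(clf, X, y, cv=5)
print(type(scores).__name__)  # -> ndarray

# ...while cross_validate returns a dict that also holds fit and score times,
# and can include train scores and multiple metrics.
results = cross_validate(clf, X, y, cv=5, return_train_score=True)
print(sorted(results.keys()))
```

The dict keys (`fit_time`, `score_time`, `test_score`, and optionally `train_score`) are what make cross_validate the richer of the two.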
Let us illustrate cross-validation on a multi-class classification task.

Single metric evaluation using sklearn.model_selection.cross_validate():

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate as sklearn_cross_validate
from sklearn.svm import SVC

# Multi-class task: the iris dataset with a linear SVC (a setup consistent
# with the SVC mentioned below and with the scores printed here).
X, y = load_iris(return_X_y=True)
clf = SVC(kernel="linear", random_state=0)

cv_results = sklearn_cross_validate(clf, X, y, cv=5)
print(f"test_score: {cv_results['test_score']}")
test_score: [0.96666667 1.         0.96666667 0.96666667 1.        ]
Multiple metric evaluation using sklearn.model_selection.cross_validate():
import pandas as pd
cv_results = sklearn_cross_validate(
clf,
X,
y,
cv=5,
scoring=["accuracy", "precision_macro"],
)
test_scores = pd.DataFrame(cv_results)[["test_accuracy", "test_precision_macro"]]
test_scores
In scikit-learn, why do we recommend using cross_validate over cross_val_score?#
Here, for the SVC, the default score is the accuracy. If the users want other scores to better understand their model, such as the precision and the recall, they can specify them, which is very convenient. Otherwise, they would have to run several sklearn.model_selection.cross_val_score() calls with a different scoring parameter each time, which leads to unnecessary compute.
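The compute argument can be made concrete with a short sketch: three separate cross_val_score calls refit the model for every metric, while one cross_validate call with a list of scorers fits each split only once. (The iris dataset and linear SVC are illustrative assumptions.)

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, cross_validate
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC(kernel="linear", random_state=0)

# Wasteful: one full cross-validation (5 fits) per metric, so 15 fits total.
per_metric = {
    s: cross_val_score(clf, X, y, cv=5, scoring=s)
    for s in ("accuracy", "precision_macro", "recall_macro")
}

# Economical: all three metrics from a single cross-validation (5 fits).
results = cross_validate(
    clf, X, y, cv=5, scoring=["accuracy", "precision_macro", "recall_macro"]
)
print(sorted(k for k in results if k.startswith("test_")))
```

Both approaches yield identical scores; only the number of model fits differs.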
Why do we recommend using skore's CrossValidationReporter over scikit-learn's cross_validate?#
In the example above, what if the users ran scikit-learn's sklearn.model_selection.cross_validate() but forgot to manually add a crucial score for their use case, such as the recall? They would have to re-run the whole cross-validation experiment to add this crucial score, which leads to more compute.
Cross-validation in skore#
In order to assist its users when programming, skore has implemented a
skore.CrossValidationReporter
class that wraps scikit-learn’s
sklearn.model_selection.cross_validate()
, to provide more
context and facilitate the analysis.
Classification task#
Let us continue with the same use case.
Skore's CrossValidationReporter advantages are the following:

- By default, it computes several useful scores without the need to manually specify them. For classification, one can observe that it computed the accuracy, the precision, and the recall.
- We automatically get some interactive Plotly graphs to better understand how our model behaves depending on the split. For example:
  - We can compare the fitting and scoring times together for each split. Moreover, we can focus on the times per data point, as the train and test splits usually have a different number of samples.
  - We can compare the accuracy, precision, and recall scores together for each split.
Regression task#
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
X, y = load_diabetes(return_X_y=True)
lasso = Lasso()
reporter = skore.CrossValidationReporter(lasso, X, y, cv=5)
reporter.plots.scores
We can put the reporter in the project, and retrieve it as is:
my_project.put("cross_validation_reporter", reporter)
reporter = my_project.get("cross_validation_reporter")
reporter.plots.scores
Cleanup the project#
Let's clear the skore project (to avoid any conflict with other documentation examples):

# Assumption: this skore version exposes Project.clear() to empty a project.
my_project.clear()
Total running time of the script: (0 minutes 0.201 seconds)