Skore: getting started#
This getting started guide illustrates how to use skore and why:
- Get assistance when developing your machine learning projects to avoid common pitfalls and follow recommended practices:
  - skore.EstimatorReport: get an insightful report on your estimator, for evaluation and inspection
  - skore.CrossValidationReport: get an insightful report on your cross-validation results
  - skore.ComparisonReport: benchmark your skore estimator reports
  - skore.train_test_split(): get diagnostics when splitting your data
- Track your machine learning results using skore’s Project (for storage).
Machine learning evaluation and diagnostics#
Skore implements new tools and wraps some key scikit-learn classes and functions to automatically provide insights and diagnostics, helping you follow good practices and avoid common pitfalls.
Model evaluation with skore#
To assist its users when programming, skore implements a skore.EstimatorReport class.
Let us load a binary classification dataset and get the estimator report for a RandomForestClassifier:
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from skore import EstimatorReport
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(random_state=0)
rf_report = EstimatorReport(
    rf, X_train=X_train, X_test=X_test, y_train=y_train, y_test=y_test
)
Now, we can display the helper to see all the insights that are available to us (skore detected that we are doing binary classification):
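A minimal sketch of the call, assuming the report exposes a help() method (as in recent skore versions):
# Print the summary of available accessors and attributes (assumes report.help() exists)
rf_report.help()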
╭───────────────── Tools to diagnose estimator RandomForestClassifier ─────────────────╮
│ EstimatorReport │
│ ├── .metrics │
│ │ ├── .accuracy(...) (↗︎) - Compute the accuracy score. │
│ │ ├── .brier_score(...) (↘︎) - Compute the Brier score. │
│ │ ├── .log_loss(...) (↘︎) - Compute the log loss. │
│ │ ├── .precision(...) (↗︎) - Compute the precision score. │
│ │ ├── .precision_recall(...) - Plot the precision-recall curve. │
│ │ ├── .recall(...) (↗︎) - Compute the recall score. │
│ │ ├── .roc(...) - Plot the ROC curve. │
│ │ ├── .roc_auc(...) (↗︎) - Compute the ROC AUC score. │
│ │ ├── .timings(...) - Get all measured processing times related │
│ │ │ to the estimator. │
│ │ ├── .custom_metric(...) - Compute a custom metric. │
│ │ └── .report_metrics(...) - Report a set of metrics for our estimator. │
│ ├── .feature_importance │
│ │ ├── .mean_decrease_impurity(...) - Retrieve the mean decrease impurity (MDI) │
│ │ │ of a tree-based model. │
│ │ └── .permutation(...) - Report the permutation feature importance. │
│ ├── .cache_predictions(...) - Cache estimator's predictions. │
│ ├── .clear_cache(...) - Clear the cache. │
│ ├── .get_predictions(...) - Get estimator's predictions. │
│ └── Attributes │
│ ├── .X_test - Testing data │
│ ├── .X_train - Training data │
│ ├── .y_test - Testing target │
│ ├── .y_train - Training target │
│ ├── .estimator_ - The cloned or copied estimator │
│ ├── .estimator_name_ - The name of the estimator │
│ ├── .fit_time_ - The time taken to fit the estimator, in │
│ │ seconds │
│ └── .ml_task - No description available │
│ │
│ │
│ Legend: │
│ (↗︎) higher is better (↘︎) lower is better │
╰──────────────────────────────────────────────────────────────────────────────────────╯
Note
This helper is great because:
- it lets users get a glimpse of the API of the different available accessors without having to look up the online documentation,
- it provides methodological guidance: for example, several metrics are reported together to encourage users to look into all of them.
We can evaluate our model using the metrics() accessor.
In particular, we can get the report metrics that are computed for us (including the fit and prediction times):
rf_report.metrics.report_metrics(pos_label=1)
For inspection, we can also retrieve the predictions, for example on the train set (here we display only the first 10 predictions for conciseness):
rf_report.get_predictions(data_source="train", response_method="predict")[0:10]
array([1, 1, 0, 1, 0, 1, 1, 1, 1, 1])
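If we need the predicted probabilities instead, the same accessor should accept the usual scikit-learn response methods; a sketch, assuming response_method="predict_proba" is supported:
# Hypothetical: predicted probabilities on the train set (first 5 rows)
rf_report.get_predictions(data_source="train", response_method="predict_proba")[0:5]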
We can also plot the ROC curve that is generated for us:
roc_plot = rf_report.metrics.roc()
roc_plot.plot()

Furthermore, we can inspect our model using the feature_importance() accessor.
In particular, we can compute the permutation feature importance:
import matplotlib.pyplot as plt
rf_report.feature_importance.permutation(seed=0).T.boxplot(vert=False)
plt.tight_layout()
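The help output above also lists a mean_decrease_impurity() accessor for tree-based models; as a sketch based on that listing, we could retrieve the MDI of our random forest directly:
# Mean decrease in impurity (MDI), as listed in the help tree above
rf_report.feature_importance.mean_decrease_impurity()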

See also
For more information about the motivation and usage of skore.EstimatorReport, see the following use cases:
- EstimatorReport: Get insights from any scikit-learn estimator, for model evaluation,
- EstimatorReport: Inspecting your models with the feature importance, for model inspection.
Cross-validation with skore#
skore has also (re-)implemented a skore.CrossValidationReport class that contains several skore.EstimatorReport instances, one for each fold.
from skore import CrossValidationReport
cv_report = CrossValidationReport(rf, X, y, cv_splitter=5)
We display the cross-validation report helper:
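As before, a sketch assuming the report exposes a help() method:
# Print the cross-validation report's summary (assumes cv_report.help() exists)
cv_report.help()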
╭───────────────── Tools to diagnose estimator RandomForestClassifier ─────────────────╮
│ CrossValidationReport │
│ ├── .metrics │
│ │ ├── .accuracy(...) (↗︎) - Compute the accuracy score. │
│ │ ├── .brier_score(...) (↘︎) - Compute the Brier score. │
│ │ ├── .log_loss(...) (↘︎) - Compute the log loss. │
│ │ ├── .precision(...) (↗︎) - Compute the precision score. │
│ │ ├── .precision_recall(...) - Plot the precision-recall curve. │
│ │ ├── .recall(...) (↗︎) - Compute the recall score. │
│ │ ├── .roc(...) - Plot the ROC curve. │
│ │ ├── .roc_auc(...) (↗︎) - Compute the ROC AUC score. │
│ │ ├── .timings(...) - Get all measured processing times related │
│ │ │ to the estimator. │
│ │ ├── .custom_metric(...) - Compute a custom metric. │
│ │ └── .report_metrics(...) - Report a set of metrics for our estimator. │
│ ├── .cache_predictions(...) - Cache the predictions for sub-estimators │
│ │ reports. │
│ ├── .clear_cache(...) - Clear the cache. │
│ ├── .get_predictions(...) - Get estimator's predictions. │
│ └── Attributes │
│ ├── .X - The data to fit │
│ ├── .y - The target variable to try to predict in │
│ │ the case of supervised learning │
│ ├── .estimator_ - The cloned or copied estimator │
│ ├── .estimator_name_ - The name of the estimator │
│ ├── .estimator_reports_ - The estimator reports for each split │
│ └── .n_jobs - Number of jobs to run in parallel │
│ │
│ │
│ Legend: │
│ (↗︎) higher is better (↘︎) lower is better │
╰──────────────────────────────────────────────────────────────────────────────────────╯
We display the mean and standard deviation for each metric:
cv_report.metrics.report_metrics(pos_label=1)
or by individual fold:
cv_report.metrics.report_metrics(aggregate=None, pos_label=1)
We display the ROC curves for each fold:
roc_plot_cv = cv_report.metrics.roc()
roc_plot_cv.plot()

We can retrieve the estimator report of a specific fold to investigate further, for example getting the report metrics for the first fold only:
cv_report.estimator_reports_[0].metrics.report_metrics(pos_label=1)
See also
For more information about the motivation and usage of skore.CrossValidationReport, see Simplified experiment reporting.
Comparing estimator reports#
skore.ComparisonReport enables users to compare several estimator reports (corresponding to several estimators) on the same test set, as in a benchmark of estimators.
In addition to the previous rf_report, let us define another estimator report:
from sklearn.ensemble import GradientBoostingClassifier
gb = GradientBoostingClassifier(random_state=0)
gb_report = EstimatorReport(
    gb, X_train=X_train, X_test=X_test, y_train=y_train, y_test=y_test
)
We can conveniently compare our two estimator reports, which were applied to the exact same test set:
from skore import ComparisonReport
comparator = ComparisonReport(reports=[rf_report, gb_report])
As for the EstimatorReport and the CrossValidationReport, we have a helper:
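Again, a sketch assuming a help() method:
# Print the comparison report's summary (assumes comparator.help() exists)
comparator.help()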
╭──────────────────────────── Tools to compare estimators ─────────────────────────────╮
│ ComparisonReport │
│ ├── .metrics │
│ │ ├── .accuracy(...) (↗︎) - Compute the accuracy score. │
│ │ ├── .brier_score(...) (↘︎) - Compute the Brier score. │
│ │ ├── .log_loss(...) (↘︎) - Compute the log loss. │
│ │ ├── .precision(...) (↗︎) - Compute the precision score. │
│ │ ├── .precision_recall(...) - Plot the precision-recall curve. │
│ │ ├── .recall(...) (↗︎) - Compute the recall score. │
│ │ ├── .roc(...) - Plot the ROC curve. │
│ │ ├── .roc_auc(...) (↗︎) - Compute the ROC AUC score. │
│ │ ├── .timings(...) - Get all measured processing times related │
│ │ │ to the different estimators. │
│ │ ├── .custom_metric(...) - Compute a custom metric. │
│ │ └── .report_metrics(...) - Report a set of metrics for the estimators. │
│ ├── .cache_predictions(...) - Cache the predictions for sub-estimators │
│ │ reports. │
│ ├── .clear_cache(...) - Clear the cache. │
│ ├── .get_predictions(...) - Get estimator's predictions. │
│ └── Attributes │
│ ├── .estimator_reports_ - The compared estimator reports │
│ ├── .n_jobs - Number of jobs to run in parallel │
│ └── .report_names_ - The names of the compared estimator reports │
│ │
│ │
│ Legend: │
│ (↗︎) higher is better (↘︎) lower is better │
╰──────────────────────────────────────────────────────────────────────────────────────╯
Let us display the result of our benchmark:
comparator.metrics.report_metrics(pos_label=1)
Thus, we easily get the results of our benchmark for several recommended metrics.
Moreover, we can display the ROC curves of the two estimator reports we want to compare, superimposed on the same figure:
comparator.metrics.roc().plot()

Train-test split with skore#
Skore has implemented a skore.train_test_split() function that wraps scikit-learn’s sklearn.model_selection.train_test_split().
Let us load a dataset containing some time series data:
import pandas as pd
from skrub.datasets import fetch_employee_salaries
dataset_employee = fetch_employee_salaries()
X_employee, y_employee = dataset_employee.X, dataset_employee.y
X_employee["date_first_hired"] = pd.to_datetime(
X_employee["date_first_hired"], format="%m/%d/%Y"
)
X_employee.head(2)
Downloading 'employee_salaries' from https://github.com/skrub-data/skrub-data-files/raw/refs/heads/main/employee_salaries.zip (attempt 1/3)
We can observe that there is a date_first_hired column, which is time-based.
Now, let us apply skore.train_test_split() on this data:
import skore
_ = skore.train_test_split(
    X=X_employee, y=y_employee, random_state=0, shuffle=False, as_dict=True
)
╭─────────────────────────────── TimeBasedColumnWarning ───────────────────────────────╮
│ We detected some time-based columns (column "date_first_hired") in your data. We │
│ recommend using scikit-learn's TimeSeriesSplit instead of train_test_split. │
│ Otherwise you might train on future data to predict the past, or get inflated model │
│ performance evaluation because natural drift will not be taken into account. │
╰──────────────────────────────────────────────────────────────────────────────────────╯
We get a TimeBasedColumnWarning advising us to use sklearn.model_selection.TimeSeriesSplit instead!
Indeed, we should not shuffle time-ordered data!
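For time-ordered data like this, a time-aware split keeps every training fold strictly before its test fold. A minimal sketch that sorts the rows by the hire date and then splits chronologically with scikit-learn's TimeSeriesSplit:
from sklearn.model_selection import TimeSeriesSplit

# Sketch: order the rows by the time-based column, then split chronologically
sorted_idx = X_employee["date_first_hired"].sort_values().index
X_sorted, y_sorted = X_employee.loc[sorted_idx], y_employee.loc[sorted_idx]

tscv = TimeSeriesSplit(n_splits=5)
for train_idx, test_idx in tscv.split(X_sorted):
    X_tr, X_te = X_sorted.iloc[train_idx], X_sorted.iloc[test_idx]
    y_tr, y_te = y_sorted.iloc[train_idx], y_sorted.iloc[test_idx]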
See also
More methodological advice is available.
For more information about the motivation and usage of skore.train_test_split(), see train_test_split: get diagnostics when splitting your data.
Tracking: skore project#
Another key feature of skore is its Project that allows us to store and retrieve items of many types.
Setup: creating and loading a skore project#
Let us start by creating a skore project directory named my_project.skore in our current directory:
my_project = skore.Project("my_project")
Skore project: storing and retrieving some items#
Now that the project exists, we can store some useful items in it (in the same directory) using put(), with a “universal” key-value convention, along with some annotations.
Let us store the accuracy and the estimator report of the random forest using put(), along with an annotation to help us track our experiments:
my_project.put("accuracy", rf_report.metrics.accuracy(), note="random forest, float")
my_project.put(
    "estimator_report", rf_report, note="random forest, skore estimator report"
)
Note
With the skore put(), there is no need to remember the API for saving or exporting each type of object: df.to_csv(...), plt.savefig(...), np.save(...), etc.
There is also the unified get() for loading items.
We can retrieve the value of an item using get():
my_project.get("accuracy")
0.972027972027972
We can also retrieve the storage date and our annotation:
from pprint import pprint
accuracies = my_project.get("accuracy", metadata="all")
pprint(accuracies)
{'date': '2025-04-24T15:26:56.702053+00:00',
'note': 'random forest, float',
'value': 0.972027972027972}
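Since the printed metadata looks like a regular dictionary, its fields can presumably be accessed directly:
# Access individual metadata fields of the stored item (assuming dict-like behavior)
accuracies["value"], accuracies["note"]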
See also
For more information about the functionalities and the different types of items that we can store in a skore Project, see Working with projects.
Tracking the history of items#
Now, for the gradient boosting model, let us store the same kinds of items using the exact same keys, namely accuracy and estimator_report:
my_project.put(
    "accuracy", gb_report.metrics.accuracy(), note="gradient boosting, float"
)
my_project.put(
    "estimator_report", gb_report, note="gradient boosting, skore estimator report"
)
Skore does not overwrite items with the same name (key): instead, it stores their history so that nothing is lost:
accuracies_history = my_project.get("accuracy", metadata="all", version="all")
pprint(accuracies_history)
[{'date': '2025-04-24T15:26:56.702053+00:00',
'note': 'random forest, float',
'value': 0.972027972027972},
{'date': '2025-04-24T15:26:56.831418+00:00',
'note': 'gradient boosting, float',
'value': 0.965034965034965}]
Note
These tracking functionalities are very useful to:
- never lose key machine learning metrics,
- observe their evolution over time and across runs.
See also
For more functionalities about the tracking of items using their history, see Tracking items.
Stay tuned!
These are only the initial features: skore is a work in progress and aims to be an end-to-end library for data scientists.
Feedback is welcome: please feel free to join our Discord or create an issue.