Note
Go to the end to download the full example code.
Store and retrieve Skore reports in MLflow#
The primilarly goal of skore is to create data science artifacts in the form of
structured reports. Those reports can easily be used programmatically via the Python
API. A subsequent aim is to store those reports that you create during your experiment
cycle in a way that it is easy to retrieve them later on.
Skore provides two natives ways to store reports: locally or on Skore Hub. Skore Hub provides additional interactivity features for you to explore, compare and share visual insights.
In addition, Skore also provides an MLflow integration to store the content of
reports directly as MLflow artifacts. This example shows how to persist reports in
MLflow using Project in mode="mlflow": log reports as MLflow runs
and inspect them.
To run this example against your own MLflow tracking server, use:
TRACKING_URI=<tracking_uri> PROJECT=<project> python plot_skore_mlflow_project.py
To try it locally, start an MLflow server with uvx mlflow server and set
TRACKING_URI=http://127.0.0.1:5000. For more setup details, see the
MLflow quickstart.
Create a Skore report#
First, we start by creating a Skore report by evaluating a logistic regression model on the iris dataset using some cross-validation.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from skore import evaluate
X, y = load_iris(return_X_y=True, as_frame=True)
estimator = make_pipeline(StandardScaler(), LogisticRegression())
report = evaluate(estimator, X, y, splitter=5)
Store the Skore reports as MLflow artifacts#
Now, we will store the different items of the Skore report as MLflow artifacts.
For this matter, you need to create a Project in mode="mlflow"
and pass the information regarding the MLflow tracking server.
import io
# MLflow/Alembic emits verbose DB initialization logs; silence them so the
# example page focuses on skore usage rather than backend startup details.
with redirect_stdout(io.StringIO()), redirect_stderr(io.StringIO()):
# This creates an MLflow experiment with name `PROJECT`:
project = Project(
PROJECT,
mode="mlflow",
tracking_uri=TRACKING_URI,
)
Once the project created, the same API used to store a report locally or on Skore Hub applies.
project.put("logistic-regression", report)
Once that you stored the report, the artifacts will be available on the MLflow
tracking server at the following URL:
http://<TRACKING_URI>/#/<PROJECT>/1/runs/<RUN_ID>/artifacts
To find the run ID attributed by MLflow, you can check the section below.
Retrieve the Skore report from MLflow tracking server#
Like for the other modes (local and Skore Hub), you can access what is stored in the
project via the summarize() method.
import pandas as pd
summary = project.summarize()
pandas_summary = pd.DataFrame(summary).reset_index()
pandas_summary[["id", "key", "report_type", "learner", "ml_task", "dataset"]]
Then, you can retrieve a Skore report using the "id" column:
(run_id,) = pandas_summary["id"]
loaded_report = project.get(run_id)
loaded_report.metrics.summarize().frame()
You can directly use MLflow to access information stored in the MLflow tracking server.
import mlflow
mlflow_run = mlflow.get_run(run_id)
mlflow_run.data.metrics
{'accuracy': 0.96, 'accuracy_std': 0.043461349368017654, 'log_loss': 0.15035720436851405, 'log_loss_std': 0.040372774208441146, 'recall': 0.96, 'recall_std': 0.043461349368017654, 'precision': 0.96, 'precision_std': 0.043461349368017654, 'roc_auc': 0.9977777777777778, 'roc_auc_std': 0.0025153847605937046, 'fit_time': 0.006280393800011552, 'predict_time': 0.001042809599971406}
Conclusion#
Skore offers native integrations locally and with Skore Hub. However, if you are
already using MLflow, you can use the mode="mlflow" option to store reports as
MLflow artifacts directly inside the tracking server.
However, you will not benefit from the interactive user interface provided by Skore Hub.
Total running time of the script: (0 minutes 24.562 seconds)