Store and retrieve reports on Skore Hub#

This example shows how to use Project in hub mode: store reports remotely and inspect them. A key point is that summarize() returns a Summary object that holds the metadata and metrics of every report. In Jupyter it renders as an interactive table with different views where you can filter and pick rows to build a query string; outside of Jupyter you can work with the underlying pandas.DataFrame via its frame() method.

Examples#

To run this example and push the reports to your own Skore Hub workspace and project, use the following command:

WORKSPACE=<workspace> PROJECT=<project> python plot_skore_hub_project.py

In this gallery, we are going to push the different reports into a public workspace.
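
The command above sets two environment variables, so the script has to read them before it can build the project name. A minimal sketch of one way to do that, assuming the variables are read with os.environ; the fallback values and the project_name variable are hypothetical placeholders, not names used by skore:

```python
import os

# Assumption: WORKSPACE and PROJECT come from the environment, as in the
# command line shown above. The fallbacks are hypothetical placeholders.
WORKSPACE = os.environ.get("WORKSPACE", "my-workspace")
PROJECT = os.environ.get("PROJECT", "my-project")

# Name in the "<workspace>/<project>" form that is later passed to Project.
project_name = f"{WORKSPACE}/{PROJECT}"
print(project_name)
```

Reading the values once at the top of the script keeps the rest of the example independent of where the reports are pushed.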

skore can communicate with Skore Hub, which serves two main purposes: storing and retrieving the reports that you create, and providing a user-friendly interface to explore and compare models.

First, we need to log in to Skore Hub so that we can push our reports to it later.

from skore import login

login(mode="hub")
╭───────────────────────────────── Login to Skore Hub ─────────────────────────────────╮
│                                                                                      │
│                        Successfully logged in, using API key.                        │
│                                                                                      │
╰──────────────────────────────────────────────────────────────────────────────────────╯

To illustrate the integration with Skore Hub, we use a binary classification task where the goal is to predict whether a patient has a tumor or not.

import numpy as np
import skrub
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
labels = np.array(["no tumor", "tumor"], dtype=object)
y = labels[y]
skrub.TableReport(X)
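
The line y = labels[y] relies on NumPy integer array indexing: each integer target selects the string at that position in labels. A small sketch on a hypothetical target vector (y_int is made up for illustration and is not taken from the dataset):

```python
import numpy as np

labels = np.array(["no tumor", "tumor"], dtype=object)

# Hypothetical 0/1 targets standing in for the dataset's integer column.
y_int = np.array([0, 1, 1, 0])

# Indexing with an integer array picks labels[0] or labels[1] per element.
y_str = labels[y_int]
print(y_str.tolist())
```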


Store reports on Skore Hub#

For this problem, we use a logistic regression classifier wrapped in skrub’s tabular_pipeline(), which preprocesses the data if needed.

To have several reports to send to Skore Hub, we train models with different values of the regularization parameter C.

from numpy import logspace
from sklearn.linear_model import LogisticRegression
from skore import Project, evaluate

project = Project(f"{WORKSPACE}/{PROJECT}", mode="hub")

for regularization in logspace(-3, 3, 5):
    project.put(
        f"lr-regularization-{regularization:.1e}",
        evaluate(
            skrub.tabular_pipeline(LogisticRegression(C=regularization)),
            X,
            y,
            splitter=0.2,
            pos_label="tumor",
        ),
    )
  Putting lr-regularization-1.0e-03 0:00:20
Consult your report at
https://skore.probabl.ai/skore/example-skore-hub-project-dev/estimators/27269


  Putting lr-regularization-3.2e-02 0:00:19
Consult your report at
https://skore.probabl.ai/skore/example-skore-hub-project-dev/estimators/27270


  Putting lr-regularization-1.0e+00 0:00:19
Consult your report at
https://skore.probabl.ai/skore/example-skore-hub-project-dev/estimators/27271


  Putting lr-regularization-3.2e+01 0:00:19
Consult your report at
https://skore.probabl.ai/skore/example-skore-hub-project-dev/estimators/27272


  Putting lr-regularization-1.0e+03 0:00:19
Consult your report at
https://skore.probabl.ai/skore/example-skore-hub-project-dev/estimators/27273
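
The report keys shown above follow directly from the values produced by logspace and the .1e format specifier. The sketch below reproduces only the key generation, without contacting Skore Hub; the variable names values and keys are chosen for illustration:

```python
import numpy as np

# Five values of C, evenly spaced on a log scale between 1e-3 and 1e3.
values = np.logspace(-3, 3, 5)

# Same f-string as in the loop above: one decimal, scientific notation.
keys = [f"lr-regularization-{c:.1e}" for c in values]

for key in keys:
    print(key)
```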

Retrieve report stored on Skore Hub#

Retrieving a report from Skore Hub is similar to retrieving a report in local mode.

summarize() returns a Summary object. In a Jupyter environment it renders as an interactive table where you can filter rows and pick reports across the different views; the selection produces a query string ready to pass to query().

summary = project.summarize()
summary

To work with the underlying table (for example in scripts, or when you prefer a pandas.DataFrame), use the frame() method:

summary.frame()

   id                            key                        date                              learner             report_type  dataset                           log_loss  roc_auc   fit_time  predict_time
0  skore:report:estimator:27269  lr-regularization-1.0e-03  2026-06-16 08:11:34.030291+00:00  LogisticRegression  estimator    7887e234e3f622242e475e3da0cb5837  0.406397  0.987298  0.085422  0.039191
1  skore:report:estimator:27270  lr-regularization-3.2e-02  2026-06-16 08:11:53.762335+00:00  LogisticRegression  estimator    7887e234e3f622242e475e3da0cb5837  0.137499  0.995237  0.072827  0.039488
2  skore:report:estimator:27271  lr-regularization-1.0e+00  2026-06-16 08:12:13.392676+00:00  LogisticRegression  estimator    7887e234e3f622242e475e3da0cb5837  0.080457  0.995554  0.073452  0.038586
3  skore:report:estimator:27272  lr-regularization-3.2e+01  2026-06-16 08:12:33.052789+00:00  LogisticRegression  estimator    7887e234e3f622242e475e3da0cb5837  0.127250  0.992061  0.073937  0.039779
4  skore:report:estimator:27273  lr-regularization-1.0e+03  2026-06-16 08:12:52.743605+00:00  LogisticRegression  estimator    7887e234e3f622242e475e3da0cb5837  0.245632  0.990314  0.082952  0.039972


The summary holds the metadata and metrics of each report, which is what we need to filter the reports quickly.

summary.frame().info()
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 5 entries, (0, 'skore:report:estimator:27269') to (4, 'skore:report:estimator:27273')
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   key           5 non-null      string
 1   date          5 non-null      datetime64[ns, UTC]
 2   learner       5 non-null      category
 3   report_type   5 non-null      string
 4   dataset       5 non-null      string
 5   log_loss      5 non-null      float64
 6   roc_auc       5 non-null      float64
 7   fit_time      5 non-null      float64
 8   predict_time  5 non-null      float64
dtypes: category(1), datetime64[ns, UTC](1), float64(4), string(3)
memory usage: 854.0+ bytes

Filter reports by metric (e.g. keep only those with a log loss below a given threshold) and work with the result as a table.

summary.query("log_loss < 0.2").frame()["key"].tolist()
['lr-regularization-3.2e-02', 'lr-regularization-1.0e+00', 'lr-regularization-3.2e+01']
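
The query string above has the same shape as a pandas.DataFrame.query expression. The sketch below applies that expression with plain pandas to the key and log_loss values copied from the summary table; it assumes, without confirming it from the skore source, that Summary.query accepts this kind of expression, and the frame variable is a stand-in for summary.frame():

```python
import pandas as pd

# Values copied from the summary table; "frame" stands in for summary.frame().
frame = pd.DataFrame(
    {
        "key": [
            "lr-regularization-1.0e-03",
            "lr-regularization-3.2e-02",
            "lr-regularization-1.0e+00",
            "lr-regularization-3.2e+01",
            "lr-regularization-1.0e+03",
        ],
        "log_loss": [0.406397, 0.137499, 0.080457, 0.127250, 0.245632],
    }
)

# Keep only the rows whose log loss is below the threshold.
kept = frame.query("log_loss < 0.2")["key"].tolist()
print(kept)
```

The two extreme regularization values are dropped because their log loss exceeds 0.2.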

Use compare() to load the corresponding reports from the project (optionally after filtering the summary). Passing return_as="report" returns a ComparisonReport built from those reports.

reports = summary.query("log_loss < 0.2").compare(return_as="report")
len(reports.reports_)
3

Since we got a ComparisonReport, we can use the metrics accessor to summarize the metrics across the reports.

reports.metrics.summarize().frame()
Estimator         LogisticRegression_1  LogisticRegression_2  LogisticRegression_3
Metric
Score             0.956140              0.964912              0.947368
Accuracy          0.956140              0.964912              0.947368
Precision         0.930556              0.970149              0.955224
Recall            1.000000              0.970149              0.955224
ROC AUC           0.995237              0.995554              0.992061
Log loss          0.137499              0.080457              0.127250
Brier score       0.035253              0.025149              0.029948
Fit time (s)      0.072827              0.073452              0.073937
Predict time (s)  0.040208              0.039981              0.040043
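
Because the comparison is an ordinary table with one column per estimator, picking a winner for a given metric is a one-line pandas operation. A sketch on three rows copied from the table above; the metrics DataFrame is rebuilt by hand here and is only assumed to mirror the layout returned by frame():

```python
import pandas as pd

# Three rows copied from the comparison table, one column per estimator.
metrics = pd.DataFrame(
    {
        "LogisticRegression_1": [0.956140, 0.995237, 0.137499],
        "LogisticRegression_2": [0.964912, 0.995554, 0.080457],
        "LogisticRegression_3": [0.947368, 0.992061, 0.127250],
    },
    index=["Accuracy", "ROC AUC", "Log loss"],
)

# Lower is better for log loss, higher is better for accuracy.
best_by_log_loss = metrics.loc["Log loss"].idxmin()
best_by_accuracy = metrics.loc["Accuracy"].idxmax()
print(best_by_log_loss, best_by_accuracy)
```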


_ = reports.metrics.roc().plot(subplot_by=None)
[Figure: ROC curve. Positive label: tumor. Data source: test set.]

Conclusion#

Skore Hub lets you store the reports created with skore and retrieve them later, and it provides a user-friendly interface to explore and compare models.

Total running time of the script: (1 minutes 50.426 seconds)
