Adapt skore to your use-case by adding your own metrics#

By default, summarize() reports a curated set of metrics for your ML task. In practice you often need domain-specific scores: a business cost function, a custom fairness measure, an F-beta with a particular beta, etc.

This example walks through how to register such metrics with add() so they are computed and displayed alongside the built-in ones.

Setting up a classification problem#

import skore
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

We create an EstimatorReport through evaluate() using a simple train/test split. pos_label=1 marks the benign class (label 1 in this dataset) as the positive class.

report = skore.evaluate(
    LogisticRegression(max_iter=10_000), X, y, pos_label=1, splitter=0.2
)

Let’s look at the default metrics:

report.metrics.summarize().frame()
                  LogisticRegression
Metric
Accuracy                    0.947368
Precision                   0.984127
Recall                      0.925373
ROC AUC                     0.993649
Log loss                    0.110247
Brier score                 0.036154
Fit time (s)                0.166261
Predict time (s)            0.000093


Adding a plain callable#

Any function with the signature (estimator, X, y, **kwargs) -> score can be registered with add(). The function name is used as the metric name by default. If your metric can be expressed as a callable with the signature (y_true, y_pred, **kwargs) -> score, then you can use sklearn’s make_scorer utility function to convert it.
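To make the (estimator, X, y) signature concrete, here is a sketch of specificity written as a plain callable and evaluated directly on a held-out split. The helper name specificity_callable and the split setup are illustrative, not part of skore:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def specificity_callable(estimator, X, y):
    """Plain (estimator, X, y) callable: proportion of true negatives
    among actual negatives."""
    y_pred = estimator.predict(X)
    tn = ((y == 0) & (y_pred == 0)).sum()
    fp = ((y == 0) & (y_pred == 1)).sum()
    return tn / (tn + fp)


X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
est = LogisticRegression(max_iter=10_000).fit(X_train, y_train)
score = specificity_callable(est, X_test, y_test)
print(score)
```

Such a callable computes its own predictions from the estimator; the make_scorer route below saves you that step when the metric only needs y_true and y_pred.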

from sklearn.metrics import make_scorer


def specificity(y_true, y_pred):
    """Proportion of true negatives among actual negatives."""
    tn = ((y_true == 0) & (y_pred == 0)).sum()
    fp = ((y_true == 0) & (y_pred == 1)).sum()
    return tn / (tn + fp)


report.metrics.add(make_scorer(specificity))
report.metrics.summarize().frame()
                  LogisticRegression
Metric
Specificity                 0.978723
Accuracy                    0.947368
Precision                   0.984127
Recall                      0.925373
ROC AUC                     0.993649
Log loss                    0.110247
Brier score                 0.036154
Fit time (s)                0.166261
Predict time (s)            0.000093


specificity now appears alongside the built-in metrics.
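The hand-rolled true-negative and false-positive counts can be cross-checked against sklearn's confusion_matrix (toy arrays for illustration):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 0, 1, 1])
y_pred = np.array([0, 1, 0, 1, 1])

# For binary labels, ravel() yields the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn / (tn + fp))  # 2 true negatives out of 3 actual negatives -> 0.666...
```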

Passing extra keyword arguments#

If your metric needs extra data at scoring time (e.g. sample-level amounts, a cost matrix, …), pass it as keyword arguments to add(); they are forwarded to the metric function when it is computed. Alternatively, if the metric takes y_true and y_pred, the keyword arguments can be passed to make_scorer:

from sklearn.metrics import fbeta_score, make_scorer

f2_scorer = make_scorer(fbeta_score, beta=2, pos_label=1)
report.metrics.add(f2_scorer, name="f2")

report.metrics.summarize().frame()
                  LogisticRegression
Metric
F2                          0.936556
Specificity                 0.978723
Accuracy                    0.947368
Precision                   0.984127
Recall                      0.925373
ROC AUC                     0.993649
Log loss                    0.110247
Brier score                 0.036154
Fit time (s)                0.166261
Predict time (s)            0.000093
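As another sketch of keyword-argument forwarding, here is a hypothetical business-cost metric with per-error costs. The function name and cost values are illustrative, not part of skore; make_scorer forwards the keyword arguments to the metric each time it is scored:

```python
import numpy as np
from sklearn.metrics import make_scorer


def cost_score(y_true, y_pred, fp_cost=1.0, fn_cost=5.0):
    """Negative total misclassification cost (higher is better)."""
    fp = ((y_true == 0) & (y_pred == 1)).sum()
    fn = ((y_true == 1) & (y_pred == 0)).sum()
    return -(fp * fp_cost + fn * fn_cost)


y_true = np.array([0, 0, 1, 1])
y_pred = np.array([0, 1, 1, 0])
print(cost_score(y_true, y_pred))                # -(1*1.0 + 1*5.0) = -6.0
print(cost_score(y_true, y_pred, fn_cost=10.0))  # -(1*1.0 + 1*10.0) = -11.0

# The keyword argument is baked into the scorer:
cost_scorer = make_scorer(cost_score, fn_cost=10.0)
```

cost_scorer can then be registered like any other scorer, e.g. with a readable name via the name argument to add().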


Cherry-picking metrics to display#

Once registered, custom metrics can be selected by name in summarize():

report.metrics.summarize(metric=["specificity", "f2"]).frame()
             LogisticRegression
Metric
Specificity            0.978723
F2                     0.936556


Selecting data_source="both" lets you compare train vs. test in one call:

report.metrics.summarize(metric=["specificity", "f2"], data_source="both").frame()
             LogisticRegression (train)  LogisticRegression (test)
Metric
Specificity                    0.933333                   0.978723
F2                             0.975945                   0.936556


Using a different response method#

By default, a scorer built with make_scorer receives the output of estimator.predict(X). If your metric needs probabilities instead, pass response_method="predict_proba" to make_scorer.
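Before writing such a metric, it helps to recall what predict_proba hands to the callable: an array of shape (n_samples, n_classes) whose columns follow estimator.classes_. A quick sklearn-only check on toy data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X_toy = np.array([[0.0], [1.0], [2.0], [3.0]])
y_toy = np.array([0, 0, 1, 1])

est = LogisticRegression().fit(X_toy, y_toy)
proba = est.predict_proba(X_toy)

print(proba.shape)   # (4, 2): one column per class
print(est.classes_)  # [0 1] -- the column order of proba
```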

import numpy as np


def mean_confidence(y_true, y_proba):
    """Average predicted probability assigned to the true class."""
    return np.where(y_true == 1, y_proba[:, 1], y_proba[:, 0]).mean()


report.metrics.add(make_scorer(mean_confidence, response_method="predict_proba"))

report.metrics.summarize(metric="mean_confidence").frame()
                 LogisticRegression
Metric
Mean Confidence            0.931087
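Restating the function so the check is self-contained, a toy verification of its logic: for a true positive it reads the class-1 column, for a true negative the class-0 column, then averages.

```python
import numpy as np


def mean_confidence(y_true, y_proba):
    """Average predicted probability assigned to the true class."""
    return np.where(y_true == 1, y_proba[:, 1], y_proba[:, 0]).mean()


y_true = np.array([1, 0])
y_proba = np.array([[0.2, 0.8], [0.9, 0.1]])
print(mean_confidence(y_true, y_proba))  # (0.8 + 0.9) / 2 = 0.85
```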


Total running time of the script: (0 minutes 0.209 seconds)
