Structuring data science experiments

When experimenting with data science, many tasks are repetitive across experiments or projects. While data exploration, transformation, and model architecture design may require innovation, the evaluation, inspection, and comparison of predictive models are usually repetitive. These tasks nevertheless require writing a substantial amount of code and storing useful information, which is a challenge to get right in itself.
skore provides a set of reporters with the following features:

- expose only the methods applicable to the task at hand;
- cache intermediate results to speed up the exploration of predictive models;
- produce data science artifacts with the least amount of code.
Below, we present the different types of reporters that skore provides.
Reporter for a single estimator

EstimatorReport is the core reporter in skore. It takes a scikit-learn compatible estimator together with training and test data. The training data is optional if the estimator is already fitted, and the fit parameter of the constructor gives full control over the fitting process. Omitting part of the data reduces the number of methods available when inspecting the model: for instance, you cannot compute metrics on the test data if you do not provide the test data.
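As a minimal sketch, a report can be built as follows (assuming a toy classification dataset; the keyword arguments shown should be checked against the version of skore you are using):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    from skore import EstimatorReport

    X, y = make_classification(n_samples=1_000, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # The estimator is fitted on the training data (the fit parameter of the
    # constructor controls this behaviour) and the test data is kept aside
    # for later evaluation.
    report = EstimatorReport(
        LogisticRegression(),
        X_train=X_train,
        y_train=y_train,
        X_test=X_test,
        y_test=y_test,
    )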
Model evaluation

EstimatorReport.metrics is the entry point for evaluating the statistical and performance metrics of the predictive model. This accessor provides two types of methods: (i) methods that return metric values and (ii) methods that return a skore Display object.

Before diving into the details of these methods, we first discuss the parameters they share. data_source specifies the data used to compute the metrics. Set it to train or test to rely on the data provided to the constructor. Alternatively, set data_source to X_y to pass a new dataset through the X and y parameters, which is useful when you want to compare different models on a new left-out dataset.
Individual methods compute each metric specific to the problem at hand and return usual Python objects such as floats, integers, or dictionaries.
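For instance, a metric can be computed on the data provided to the constructor or on a new dataset (accuracy is used here as an illustrative metric name; the available methods depend on the task):

    # Metric computed on the data passed to the constructor ...
    report.metrics.accuracy(data_source="test")
    report.metrics.accuracy(data_source="train")

    # ... or on a dataset passed through X and y (here, reusing the held-out
    # test split purely for illustration).
    report.metrics.accuracy(data_source="X_y", X=X_test, y=y_test)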
The second type of methods provided by EstimatorReport.metrics returns a Display object. Displays share a common API and expose three methods: (i) plot, which graphically plots the information contained in the display; (ii) set_style, which sets graphical settings once instead of passing them to plot at each call; and (iii) frame, which returns a pandas.DataFrame with the information contained in the display.
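As a sketch, assuming a classification task for which a ROC curve display is available:

    # A display computed on the test data.
    display = report.metrics.roc(data_source="test")

    display.plot()        # render the curve graphically
    df = display.frame()  # the same information as a pandas.DataFrame
    # display.set_style(...)  # persist display-specific styling options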
The EstimatorReport.metrics.summarize method aggregates several metrics in a single dataframe, available through a Display. By default, a set of metrics is computed based on the type of target variable (e.g. classification or regression). Nevertheless, you can specify the metrics to compute thanks to the scoring parameter, which accepts different types: (i) strings corresponding to scikit-learn scorer names or built-in skore metric names, (ii) callables, or (iii) scikit-learn scorers constructed with sklearn.metrics.make_scorer().
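A minimal sketch, assuming that the scoring parameter accepts a list mixing the supported types:

    from sklearn.metrics import fbeta_score, make_scorer

    # Default metrics inferred from the type of target variable.
    report.metrics.summarize().frame()

    # Explicit metrics: a scorer name and a scikit-learn scorer.
    report.metrics.summarize(
        scoring=["accuracy", make_scorer(fbeta_score, beta=2)],
    ).frame()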
Refer to the Visualization via the skore display API section for more details regarding the skore display API. Refer to the Metrics section for more details on all the available metrics in skore.
Caching mechanism

EstimatorReport comes with a caching mechanism that stores intermediate information that is expensive to compute, such as predictions. It efficiently re-uses this information when recomputing the same metric or a metric requiring the same intermediate information.
We expose three methods to interact with the cache:
- EstimatorReport.cache_predictions() to cache the predictions of the estimator up front, so that they do not have to be computed when calling the evaluation metrics;
- EstimatorReport.clear_cache() to clear the cache;
- EstimatorReport.get_predictions() to get the predictions from the cache, or compute them if they are not in the cache.
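A sketch of how these methods fit together:

    # Pre-compute and cache the predictions so that subsequent metric calls
    # do not have to wait for them.
    report.cache_predictions()

    # Fetch predictions from the cache (or compute and cache them if missing).
    y_pred = report.get_predictions(data_source="test")

    # Empty the cache.
    report.clear_cache()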
Note

The current implementation of the caching mechanism is in-memory. It is therefore not persisted between sessions, apart from loading an EstimatorReport from a Project. Refer to the section Storing data science artifacts for more details.
Refer to the example entitled Cache mechanism to get a detailed view of the caching mechanism.
Cross-validation estimator

CrossValidationReport has an API similar to EstimatorReport. The main difference lies in the initialization: it accepts an estimator, a dataset (i.e. X and y), and a cross-validation strategy. Internally, the dataset is split according to the cross-validation strategy and an estimator report is created for each split. A CrossValidationReport is therefore a collection of EstimatorReport instances, available through the CrossValidationReport.estimator_reports_ attribute.
For metrics and displays, the same API is exposed, with an extra parameter, aggregate, to aggregate the metrics across the splits.
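A minimal sketch, reusing the dataset from the earlier example and assuming that aggregate accepts a list of aggregation names; the default cross-validation strategy is used here, but a custom one can be passed at construction:

    from sklearn.linear_model import LogisticRegression

    from skore import CrossValidationReport

    # One EstimatorReport is created per cross-validation split.
    cv_report = CrossValidationReport(LogisticRegression(), X=X, y=y)
    cv_report.estimator_reports_

    # Metrics aggregated across the splits.
    cv_report.metrics.summarize(aggregate=["mean", "std"]).frame()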
The CrossValidationReport also comes with a caching mechanism, which leverages the EstimatorReport caching mechanism and exposes the same methods.
Refer to the Metrics section for more details on the metrics available in skore for cross-validation.
Comparison report

CrossValidationReport is a great tool to compare the performance of the same predictive model architecture across variations of the dataset. However, it is not intended to compare different families of predictive models. For this purpose, use ComparisonReport.
ComparisonReport takes a list (or a dictionary) of EstimatorReport or CrossValidationReport instances and provides methods to compare the performance of the different models.
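A minimal sketch, building a second estimator report on the same train/test split as the earlier example and assuming that the dictionary keys are used to label the models in the comparison:

    from sklearn.ensemble import HistGradientBoostingClassifier

    from skore import ComparisonReport, EstimatorReport

    # A second report evaluated on the same test data.
    other_report = EstimatorReport(
        HistGradientBoostingClassifier(),
        X_train=X_train,
        y_train=y_train,
        X_test=X_test,
        y_test=y_test,
    )

    comparison = ComparisonReport({"logistic": report, "hgbt": other_report})
    comparison.metrics.summarize().frame()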
The caching mechanism is also available and exposes the same methods.
Refer to the Metrics section for more details on the metrics available in skore for comparison.