Structuring data science experiments#

In data science work, many tasks recur across experiments and projects. While data exploration, transformation, and model architecture design may call for innovation, evaluating, inspecting, and comparing predictive models is largely repetitive. Yet these tasks still require a substantial amount of code and careful storage of useful information, which is a challenge to get right in itself.

skore provides a set of reporters with the following features:

  • Expose only the methods applicable to the task at hand

  • Cache intermediate results to speed up the exploration of predictive models

  • Produce data science artifacts with a minimal amount of code

Below, we present the different types of reporters that skore provides.

Reporter for a single estimator#

EstimatorReport is the core reporter in skore. It takes a scikit-learn compatible estimator together with training and test data. The training data is optional if the estimator is already fitted, and the fit parameter of the constructor gives full control over the fitting process. Omitting part of the data reduces the number of methods available when inspecting the model: for instance, you cannot inspect metrics computed on the test data if you do not provide the test data.
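
For illustration, here is a minimal sketch of how such a report could be built. The exact keyword names (X_train, y_train, X_test, y_test) and the behavior of the default fit mode are assumptions to verify against the API reference; the dataset and estimator are arbitrary.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

from skore import EstimatorReport

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The report fits the estimator on the provided training data.
report = EstimatorReport(
    LogisticRegression(max_iter=10_000),
    X_train=X_train, y_train=y_train,
    X_test=X_test, y_test=y_test,
)

# With an already fitted estimator, the training data can be omitted, at the
# cost of fewer available inspection methods; the `fit` constructor parameter
# controls whether the estimator is (re)fitted.
prefit_model = LogisticRegression(max_iter=10_000).fit(X_train, y_train)
report_prefit = EstimatorReport(prefit_model, X_test=X_test, y_test=y_test)
```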

Model evaluation#

EstimatorReport.metrics is the entry point for evaluating the statistical and performance metrics of the predictive model. This accessor exposes two types of methods: (i) methods that return metric values directly and (ii) methods that return a skore Display object.

Before diving into the details of these methods, we first discuss the parameters they share. data_source specifies the data used to compute the metrics: set it to train or test to rely on the data provided to the constructor, or to X_y to pass a new dataset through the X and y parameters. The latter is useful when comparing different models on a new held-out dataset.

Individual methods compute each metric relevant to the problem at hand. They return standard Python objects such as floats, integers, or dictionaries.
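
For example, continuing with the classification report built above, a per-metric method (here accuracy(), whose name is illustrative and should be checked against the Metrics reference) can be combined with the shared data_source parameter:

```python
# Metric computed on the test set passed to the constructor (assumed default).
report.metrics.accuracy()

# The same metric on the training set.
report.metrics.accuracy(data_source="train")

# The same metric on a new left-out dataset passed explicitly.
X_holdout, y_holdout = X_test[:100], y_test[:100]  # stands in for new data
report.metrics.accuracy(data_source="X_y", X=X_holdout, y=y_holdout)
```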

The second type of methods provided by EstimatorReport.metrics returns a Display object. Displays share a common API and expose three methods: (i) plot, which plots the information contained in the display; (ii) set_style, which sets graphical settings once instead of passing them to plot at each call; and (iii) frame, which returns a pandas.DataFrame with the information contained in the display.
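
As an illustration, a display-returning method such as roc() (shown here under the assumption of a classification task) would be used as follows:

```python
# Obtain a Display for the ROC curve of the classifier.
display = report.metrics.roc()

# Plot the curve (matplotlib-based rendering).
display.plot()

# Retrieve the underlying data as a pandas.DataFrame.
roc_frame = display.frame()

# Persistent styling can be set once with `display.set_style(...)` instead of
# being passed to every `plot()` call; the accepted arguments depend on the
# specific display.
```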

We provide the EstimatorReport.metrics.summarize method, which aggregates metrics in a single dataframe, available through a Display. By default, a set of metrics is computed based on the type of target variable (e.g. classification or regression). Nevertheless, you can specify the metrics to compute through the scoring parameter, which accepts several types: (i) strings corresponding to scikit-learn scorer names or built-in skore metric names, (ii) callables, or (iii) scikit-learn scorers constructed with sklearn.metrics.make_scorer().
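
A short sketch of both usages follows; it assumes that scoring accepts a list of such specifications, which should be verified against the Metrics reference.

```python
from sklearn.metrics import f1_score, make_scorer

# Default metrics, chosen based on the type of target variable.
report.metrics.summarize().frame()

# Explicit metrics passed by name (scikit-learn scorer or skore metric names).
report.metrics.summarize(scoring=["accuracy", "roc_auc"]).frame()

# A custom metric wrapped in a scikit-learn scorer.
report.metrics.summarize(scoring=[make_scorer(f1_score, average="macro")]).frame()
```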

Refer to the Visualization via the skore display API section for more details regarding the skore display API. Refer to the Metrics section for more details on all the available metrics in skore.

Caching mechanism#

EstimatorReport comes with a caching mechanism that stores intermediate results that are expensive to compute, such as predictions. This information is efficiently reused when recomputing the same metric or another metric requiring the same intermediate results.

Three methods are exposed to interact with the cache.
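
As an illustration of the typical workflow, here is a minimal sketch assuming the report exposes cache_predictions() and clear_cache() methods (these names should be checked against the API reference):

```python
# Pre-compute and cache the predictions so that later calls can reuse them.
report.cache_predictions()

# The first computation of a metric populates the cache ...
report.metrics.summarize().frame()
# ... and recomputing it, or any metric sharing the same predictions, is fast.
report.metrics.summarize().frame()

# Empty the cache to force recomputation.
report.clear_cache()
```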

Note

The caching mechanism currently operates in memory. It is therefore not persisted between sessions, except when loading an EstimatorReport from a Project. Refer to the Storing data science artifacts section for more details.

Refer to the example entitled Cache mechanism for a detailed walkthrough of the caching mechanism.

Cross-validation estimator#

CrossValidationReport has an API similar to EstimatorReport. The main difference lies in the initialization: it accepts an estimator, a dataset (i.e. X and y), and a cross-validation strategy. Internally, the dataset is split according to the cross-validation strategy and an estimator report is created for each split. A CrossValidationReport is therefore a collection of EstimatorReport instances, available through the CrossValidationReport.estimator_reports_ attribute.

For metrics and displays, the same API is exposed with an extra parameter, aggregate, to aggregate the metrics across the splits.
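
A sketch follows, reusing the dataset and imports from the examples above; the name of the cross-validation parameter (cv_splitter) and the values accepted by aggregate are assumptions to verify against the reference.

```python
from skore import CrossValidationReport

# 5-fold cross-validation of a single estimator.
cv_report = CrossValidationReport(
    LogisticRegression(max_iter=10_000), X, y, cv_splitter=5
)

# One EstimatorReport per split.
print(len(cv_report.estimator_reports_))

# Metrics aggregated across splits, e.g. mean and standard deviation.
cv_report.metrics.summarize(aggregate=["mean", "std"]).frame()
```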

CrossValidationReport also comes with a caching mechanism, which leverages the EstimatorReport caching mechanism and exposes the same methods.

Refer to the Metrics section for more details on the metrics available in skore for cross-validation.

Comparison report#

CrossValidationReport is a great tool to compare the performance of the same predictive model architecture across variations of the dataset. However, it is not intended for comparing different families of predictive models. For this purpose, use ComparisonReport.

ComparisonReport takes a list (or a dictionary) of EstimatorReport or CrossValidationReport instances. It then provides methods to compare the performance of the different models.
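
For instance, the logistic regression report from above can be compared with a random forest trained on the same split; passing a dictionary gives each model an explicit name in the resulting tables.

```python
from sklearn.ensemble import RandomForestClassifier

from skore import ComparisonReport

# A second report for a different model family, on the same train/test split.
report_rf = EstimatorReport(
    RandomForestClassifier(random_state=0),
    X_train=X_train, y_train=y_train,
    X_test=X_test, y_test=y_test,
)

comparison = ComparisonReport(
    {"logistic_regression": report, "random_forest": report_rf}
)

# Same metrics API as the other reporters.
comparison.metrics.summarize().frame()
```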

The caching mechanism is also available and exposes the same methods.

Refer to the Metrics section for more details on the metrics available in skore for comparison.