ComparisonReport.create_estimator_report

ComparisonReport.create_estimator_report(*, report_key, X_test=None, y_test=None, test_data=None, concatenate_train_and_test=False)

Create an estimator report from one of the reports in the comparison.

This method creates a new EstimatorReport with the same estimator and the same data as the chosen report. It is useful for evaluating and deploying a model that was deemed optimal during the comparison. Provide a held-out test set to properly evaluate the model's performance.

Parameters:
report_key : str

The key associated with the estimator to create a report for, as stored in the reports_ attribute of the ComparisonReport. List the available keys with reports_.keys().

X_test : {array-like, sparse matrix} of shape (n_samples, n_features) or None

Testing data when the chosen report uses tabular scikit-learn X/y. Must be provided together with y_test; leave both as None and pass test_data instead when the chosen report is skrub-backed.

y_test : array-like of shape (n_samples,) or (n_samples, n_outputs) or None

Testing target for tabular scikit-learn data.

test_data : dict or None

When the chosen report is skrub-backed, bindings for variables contained in the DataOp (e.g. {"X": X_df, ...}) for the held-out evaluation set. Required in that case; X_test and y_test must then be omitted.

concatenate_train_and_test : bool, default=False

When the chosen entry is an EstimatorReport backed by tabular scikit-learn data, controls whether to concatenate that report's train and test splits into a single training set on which the new report's estimator is fit; the held-out X_test / y_test you provide is then used only for evaluation. If False (default), the new report is fit on the report's original X_train only.

This option must be False if the chosen report is skrub-backed (i.e. when test_data is provided).
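Conceptually, the concatenation behaves like the following sketch. The NumPy arrays here are placeholders standing in for the chosen report's stored splits, not skore internals:

```python
import numpy as np

# Placeholder splits standing in for the chosen report's stored data.
X_train_old, X_test_old = np.ones((75, 4)), np.ones((25, 4))
y_train_old, y_test_old = np.zeros(75), np.zeros(25)

# With concatenate_train_and_test=True, both splits form the training set;
# the held-out X_test / y_test you pass is used only for evaluation.
X_fit = np.concatenate([X_train_old, X_test_old])
y_fit = np.concatenate([y_train_old, y_test_old])
```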

Returns:
report : EstimatorReport

The estimator report.

Examples

>>> from sklearn.datasets import make_classification
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.linear_model import LogisticRegression
>>> from skore import train_test_split
>>> from skore import ComparisonReport, CrossValidationReport
>>> X, y = make_classification(random_state=42)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
>>> linear_report = CrossValidationReport(
...     LogisticRegression(random_state=42), X_train, y_train
... )
>>> forest_report = CrossValidationReport(
...     RandomForestClassifier(random_state=42), X_train, y_train
... )
>>> comparison_report = ComparisonReport([linear_report, forest_report])
>>> summary = comparison_report.metrics.summarize().frame()
>>> # Notice that e.g. the RandomForestClassifier performs best
>>> final_report = comparison_report.create_estimator_report(
...     report_key="RandomForestClassifier", X_test=X_test, y_test=y_test
... )
>>> final_report.metrics.summarize().frame()