ComparisonReport
- class skore.ComparisonReport(reports, *, n_jobs=None)
Report for comparing reports.

This object can be used to compare several skore.EstimatorReport instances, or several skore.CrossValidationReport instances.

Refer to the Comparison report section of the user guide for more details.

Caution

Reports passed to ComparisonReport are not copied. If you pass a report to ComparisonReport and later modify that report elsewhere, the report stored inside the ComparisonReport changes as well, which can lead to inconsistent results. For this reason, modifying reports after creation is strongly discouraged.
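Because the reports are stored by reference, a defensive copy sidesteps this pitfall entirely. A minimal sketch, assuming the reports support copy.deepcopy (standard-library copying, not a skore API); my_report and other_report are illustrative names, not objects defined on this page:

>>> import copy
>>> # Hypothetical names: any EstimatorReport instances would do. Deep-copying
>>> # before comparison keeps later mutations of my_report from leaking into
>>> # the ComparisonReport.
>>> frozen = copy.deepcopy(my_report)
>>> report = ComparisonReport([frozen, other_report])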
Parameters:

- reports : list of reports or dict
- Reports to compare. If a dict, keys will be used to label the estimators; if a list, the labels are computed from the estimator class names. Expects at least two reports to compare, with the same test target. 
- n_jobs : int, default=None
- Number of jobs to run in parallel. Training the estimators and computing the scores are parallelized. When accessing some methods of the ComparisonReport, the n_jobs parameter is used to parallelize the computation (a short sketch follows this parameter list).
  - None means 1 unless in a joblib.parallel_backend context.
  - -1 means using all processors.
 
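For instance, spreading the computation over all processors only requires passing n_jobs=-1 at construction time. A minimal sketch reusing the report names built in the Examples section below:

>>> # estimator_report_1 and estimator_report_2 are built as in the Examples
>>> # section below; n_jobs=-1 uses all processors.
>>> report = ComparisonReport(
...     [estimator_report_1, estimator_report_2], n_jobs=-1
... )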
- Attributes:
- reports_ : dict mapping names to reports
- The compared reports. 
 
See also

- skore.EstimatorReport
- Report for a fitted estimator. 
- skore.CrossValidationReport
- Report for the cross-validation of an estimator. 
Examples

>>> from sklearn.datasets import make_classification
>>> from skore import train_test_split
>>> from sklearn.linear_model import LogisticRegression
>>> from skore import ComparisonReport, EstimatorReport
>>> X, y = make_classification(random_state=42)
>>> split_data = train_test_split(X=X, y=y, random_state=42, as_dict=True)
>>> estimator_1 = LogisticRegression()
>>> estimator_report_1 = EstimatorReport(estimator_1, **split_data)
>>> estimator_2 = LogisticRegression(C=2)  # Different regularization
>>> estimator_report_2 = EstimatorReport(estimator_2, **split_data)
>>> report = ComparisonReport([estimator_report_1, estimator_report_2])
>>> report.reports_
{'LogisticRegression_1': ..., 'LogisticRegression_2': ...}
>>> report = ComparisonReport(
...     {"model1": estimator_report_1, "model2": estimator_report_2}
... )
>>> report.reports_
{'model1': ..., 'model2': ...}

>>> from sklearn.datasets import make_classification
>>> from sklearn.linear_model import LogisticRegression
>>> from skore import ComparisonReport, CrossValidationReport
>>> X, y = make_classification(random_state=42)
>>> estimator_1 = LogisticRegression()
>>> estimator_2 = LogisticRegression(C=2)  # Different regularization
>>> report_1 = CrossValidationReport(estimator_1, X, y)
>>> report_2 = CrossValidationReport(estimator_2, X, y)
>>> report = ComparisonReport([report_1, report_2])
>>> report = ComparisonReport({"model1": report_1, "model2": report_2})

cache_predictions(response_methods='auto', n_jobs=None)
Cache the predictions for the sub-estimator reports.

Parameters:
- response_methods : {"auto", "predict", "predict_proba", "decision_function"}, default="auto"
- The methods to use to compute the predictions. 
- n_jobs : int, default=None
- The number of jobs to run in parallel. If None, the n_jobs value provided when initializing the report is used.
 
Examples

>>> from sklearn.datasets import make_classification
>>> from sklearn.linear_model import LogisticRegression
>>> from skore import train_test_split
>>> from skore import ComparisonReport, EstimatorReport
>>> X, y = make_classification(random_state=42)
>>> split_data = train_test_split(X=X, y=y, random_state=42, as_dict=True)
>>> estimator_1 = LogisticRegression()
>>> estimator_report_1 = EstimatorReport(estimator_1, **split_data)
>>> estimator_2 = LogisticRegression(C=2)  # Different regularization
>>> estimator_report_2 = EstimatorReport(estimator_2, **split_data)
>>> report = ComparisonReport([estimator_report_1, estimator_report_2])
>>> report.cache_predictions()
>>> report._cache
{...}
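The default response_methods="auto" lets the report select the relevant methods for the estimator type; the cache can also be restricted to a single method. A minimal sketch using only the parameters documented above:

>>> # Warm the cache for probability predictions only, on two parallel jobs.
>>> report.cache_predictions(response_methods="predict_proba", n_jobs=2)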
clear_cache()
Clear the cache.

Examples

>>> from sklearn.datasets import make_classification
>>> from sklearn.linear_model import LogisticRegression
>>> from skore import train_test_split
>>> from skore import ComparisonReport, EstimatorReport
>>> X, y = make_classification(random_state=42)
>>> split_data = train_test_split(X=X, y=y, random_state=42, as_dict=True)
>>> estimator_1 = LogisticRegression()
>>> estimator_report_1 = EstimatorReport(estimator_1, **split_data)
>>> estimator_2 = LogisticRegression(C=2)  # Different regularization
>>> estimator_report_2 = EstimatorReport(estimator_2, **split_data)
>>> report = ComparisonReport([estimator_report_1, estimator_report_2])
>>> report.cache_predictions()
>>> report.clear_cache()
>>> report._cache
{}
get_predictions(*, data_source, response_method='predict', X=None, pos_label=<DEFAULT>)
Get predictions from the underlying reports.

This method has the advantage of reloading predictions from the cache when they were already computed in a previous call.

Parameters:
- data_source : {"test", "train", "X_y"}, default="test"
- The data source to use.
  - "test" : use the test set provided when creating the report.
  - "train" : use the train set provided when creating the report.
  - "X_y" : use the provided X and y to compute the predictions.
 
- response_method : {"predict", "predict_proba", "decision_function"}, default="predict"
- The response method to use to get the predictions. 
- X : array-like of shape (n_samples, n_features), optional
- When data_source is "X_y", the input features on which to compute the response method.
- pos_label : int, float, bool, str or None, default=_DEFAULT
- The label to consider as the positive class when computing predictions in binary classification cases. By default, the positive class is set to the one provided when creating the report. If None, estimator_.classes_[1] is used as the positive label.

  When pos_label is equal to estimator_.classes_[0], the output is equivalent to estimator_.predict_proba(X)[:, 0] for response_method="predict_proba" and to -estimator_.decision_function(X) for response_method="decision_function" (a sketch follows the Examples below).
 
- Returns:
- list of np.ndarray of shape (n_samples,) or (n_samples, n_classes) or list of such lists
- The predictions for each EstimatorReport or CrossValidationReport.
 
- Raises:
- ValueError
- If the data source is invalid. 
 
Examples

>>> from sklearn.datasets import make_classification
>>> from skore import train_test_split
>>> from sklearn.linear_model import LogisticRegression
>>> from skore import ComparisonReport, EstimatorReport
>>> X, y = make_classification(random_state=42)
>>> split_data = train_test_split(X=X, y=y, random_state=42, as_dict=True)
>>> estimator_1 = LogisticRegression()
>>> estimator_report_1 = EstimatorReport(estimator_1, **split_data)
>>> estimator_2 = LogisticRegression(C=2)  # Different regularization
>>> estimator_report_2 = EstimatorReport(estimator_2, **split_data)
>>> report = ComparisonReport([estimator_report_1, estimator_report_2])
>>> report.cache_predictions()
>>> predictions = report.get_predictions(data_source="test")
>>> print([split_predictions.shape for split_predictions in predictions])
[(25,), (25,)]
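As a complement, the sketch below requests probability predictions and flips the positive class, continuing from the example above. It uses only the parameters documented for this method; the exact output shapes depend on the response method, so none are asserted here:

>>> # One array of probabilities per compared report.
>>> proba = report.get_predictions(
...     data_source="test", response_method="predict_proba"
... )
>>> # Treat class 0 as the positive class; per the note on pos_label above,
>>> # this corresponds to estimator_.predict_proba(X)[:, 0] for each report.
>>> proba_flipped = report.get_predictions(
...     data_source="test",
...     response_method="predict_proba",
...     pos_label=0,
... )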
 
Gallery examples
 
EstimatorReport: Inspecting your models with the feature importance
 
