CrossValidationReport
- class skore.CrossValidationReport(estimator, X=None, y=None, data=None, pos_label=None, splitter=None, n_jobs=None)
Report for cross-validation results.
Upon initialization, `CrossValidationReport` will clone `estimator` according to `splitter` and fit the generated estimators. The fitting is done in parallel. Refer to the Cross-validation estimator section of the user guide for more details.
- Parameters:
- estimator : estimator object
Estimator to make the cross-validation report from.
- X : {array-like, sparse matrix} of shape (n_samples, n_features)
The data to fit. Can be, for example, a list or an array.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs)
The target variable to try to predict in the case of supervised learning.
- pos_label : int, float, bool or str, default=None
For binary classification, the positive class to use for metrics and displays that need one. If `None`, skore does not infer a default positive class. Binary metrics and displays that support it will expose all classes instead. This parameter is rejected for non-binary tasks.
- splitter : int, cross-validation generator or an iterable, default=5
Determines the cross-validation splitting strategy. Possible inputs for `splitter` are:
- int, to specify the number of splits in a (Stratified)KFold,
- a scikit-learn CV splitter,
- an iterable yielding (train, test) splits as arrays of indices.
For int/None inputs, if the estimator is a classifier and `y` is either binary or multiclass, `StratifiedKFold` is used. In all other cases, `KFold` is used. These splitters are instantiated with `shuffle=False` so the splits will be the same across calls. Refer to scikit-learn's User Guide for the various cross-validation strategies that can be used here.
- n_jobs : int, default=None
Number of jobs to run in parallel. Training the estimator and computing the score are parallelized over the cross-validation splits. When accessing some methods of the `CrossValidationReport`, the `n_jobs` parameter is used to parallelize the computation. `None` means 1 unless in a `joblib.parallel_backend` context. `-1` means using all processors.
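To make the last `splitter` option above concrete, an iterable of (train, test) index splits can be built in pure Python. The `simple_kfold` helper below is illustrative only (not part of skore); it mimics an unshuffled `KFold` with contiguous test blocks:

```python
def simple_kfold(n_samples, n_splits):
    """Yield (train_indices, test_indices) pairs, mimicking an
    unshuffled KFold: contiguous test blocks, the rest as train."""
    # Distribute samples as evenly as possible across folds.
    fold_sizes = [
        n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
        for i in range(n_splits)
    ]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

# Three folds over ten samples; each sample appears in exactly one test fold.
splits = list(simple_kfold(10, 3))
```

Any such iterable of index pairs satisfies the splitter contract, which is useful for custom splits (e.g. grouped or time-ordered data).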
- Attributes:
- estimator_ : estimator object
The cloned or copied estimator.
- estimator_name_ : str
The name of the estimator.
- estimator_reports_ : list of EstimatorReport
The estimator reports for each split.
See also
skore.EstimatorReport : Report for a fitted estimator.
skore.ComparisonReport : Report of comparison between estimators.
Examples
>>> from sklearn.datasets import make_classification
>>> from sklearn.linear_model import LogisticRegression
>>> X, y = make_classification(random_state=42)
>>> estimator = LogisticRegression()
>>> from skore import CrossValidationReport
>>> report = CrossValidationReport(estimator, X=X, y=y, splitter=2)
- cache_predictions()
Cache the predictions for the sub-estimator reports.
Examples
>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.linear_model import LogisticRegression
>>> from skore import CrossValidationReport
>>> X, y = load_breast_cancer(return_X_y=True)
>>> classifier = LogisticRegression(max_iter=10_000)
>>> report = CrossValidationReport(classifier, X=X, y=y, splitter=2)
>>> report.cache_predictions()
>>> report.estimator_reports_[0]._cache
{...}
- clear_cache()
Clear the cache.
Examples
>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.linear_model import LogisticRegression
>>> from skore import CrossValidationReport
>>> X, y = load_breast_cancer(return_X_y=True)
>>> classifier = LogisticRegression(max_iter=10_000)
>>> report = CrossValidationReport(classifier, X=X, y=y, splitter=2)
>>> report.cache_predictions()
>>> report.clear_cache()
>>> report.estimator_reports_[0]._cache
{}
- create_estimator_report(*, X_test=None, y_test=None, test_data=None)
Create an estimator report from the cross-validation report.
This method creates a new `EstimatorReport` with the same estimator and the same data as the cross-validation report. It is useful to evaluate and deploy a model that was deemed optimal with cross-validation. Provide a held-out test set to properly evaluate the performance of the model.
- Parameters:
- X_test : {array-like, sparse matrix} of shape (n_samples, n_features)
Testing data. It should have the same structure as the training data.
- y_test : array-like of shape (n_samples,) or (n_samples, n_outputs)
Testing target.
- Returns:
- report : EstimatorReport
The estimator report.
Examples
>>> from sklearn.datasets import make_classification
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.linear_model import LogisticRegression
>>> from skore import train_test_split
>>> from skore import ComparisonReport, CrossValidationReport
>>> X, y = make_classification(random_state=42)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
>>> linear_report = CrossValidationReport(
...     LogisticRegression(random_state=42), X_train, y_train
... )
>>> forest_report = CrossValidationReport(
...     RandomForestClassifier(random_state=42), X_train, y_train
... )
>>> comparison_report = ComparisonReport([linear_report, forest_report])
>>> summary = comparison_report.metrics.summarize().frame()
>>> # Notice that e.g. the RandomForestClassifier performs best
>>> final_report = forest_report.create_estimator_report(
...     X_test=X_test, y_test=y_test
... )
>>> final_report.metrics.summarize().frame()
- diagnose(*, ignore=None)
Run diagnostics and return a summary of detected issues.
Diagnostics check for common modeling problems such as overfitting and underfitting. Codes can be muted per call via `ignore`, or globally.
- Parameters:
- ignore : list of str or tuple of str or None, default=None
Diagnostic codes to exclude from the results, e.g. `["SKD001"]`.
- Returns:
- DiagnosticsDisplay
A display object with an HTML representation; the full diagnostic results are accessible via the `frame()` method.
Examples
>>> from skore import evaluate
>>> from sklearn.dummy import DummyClassifier
>>> from sklearn.datasets import make_classification
>>> X, y = make_classification(random_state=42)
>>> report = evaluate(DummyClassifier(), X, y, splitter=0.2)
>>> report.diagnose()
Diagnostics: 1 issue(s) detected, 2 check(s) ran, 0 ignored.
- [SKD002] Potential underfitting. Train/test scores are on par and not significantly better than the dummy baseline for 8/8 comparable metrics. Read our documentation for more details: https://docs.skore.probabl.ai/dev/user_guide/diagnostics.html#skd002-underfitting. Mute with `ignore=['SKD002']`.
>>> report.diagnose(ignore=["SKD002"])
Diagnostics: 0 issue(s) detected, 1 check(s) ran, 1 ignored.
- No issues were detected in your report!
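Conceptually, `ignore` simply filters detected issues by their diagnostic code. The sketch below uses a hypothetical list-of-dicts structure (not skore's internals), with the SKD002 code taken from the example above:

```python
def filter_issues(issues, ignore=None):
    """Drop issues whose diagnostic code appears in `ignore`."""
    muted = set(ignore or [])
    return [issue for issue in issues if issue["code"] not in muted]

issues = [{"code": "SKD002", "message": "Potential underfitting."}]

kept = filter_issues(issues, ignore=["SKD002"])  # SKD002 is muted
unfiltered = filter_issues(issues)               # nothing muted
```

Muted codes are still counted as "ignored" in the summary line, so the filtering is transparent rather than silent.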
- get_predictions(*, data_source, response_method='predict')
Get estimator’s predictions.
If the predictions were already computed in a previous call, this method reloads them from the cache instead of recomputing them.
- Parameters:
- data_source : {"test", "train"}, default="test"
The data source to use.
- "test" : use the test set provided when creating the report.
- "train" : use the train set provided when creating the report.
- response_method : {"predict", "predict_proba", "decision_function"}, default="predict"
The response method to use to get the predictions.
- Returns:
- list of np.ndarray of shape (n_samples,) or (n_samples, n_classes)
The predictions for each cross-validation split.
- Raises:
- ValueError
If the data source is invalid.
Examples
>>> from sklearn.datasets import make_classification
>>> from sklearn.linear_model import LogisticRegression
>>> X, y = make_classification(random_state=42)
>>> estimator = LogisticRegression()
>>> from skore import CrossValidationReport
>>> report = CrossValidationReport(estimator, X=X, y=y, splitter=2)
>>> predictions = report.get_predictions(data_source="test")
>>> print([split_predictions.shape for split_predictions in predictions])
[(50,), (50,)]
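Because each split's test fold covers a distinct set of sample indices, the per-split prediction arrays returned above can be stitched back into a single out-of-fold vector. A pure-Python sketch (the `stitch_out_of_fold` helper is illustrative, not a skore function):

```python
def stitch_out_of_fold(split_predictions, test_index_lists, n_samples):
    """Place each split's test-fold predictions back at their
    original sample positions, yielding one prediction per sample."""
    out = [None] * n_samples
    for preds, test_idx in zip(split_predictions, test_index_lists):
        for value, i in zip(preds, test_idx):
            out[i] = value
    return out

# Two splits over six samples: test folds [0, 1, 2] and [3, 4, 5].
oof = stitch_out_of_fold(
    split_predictions=[[1, 0, 1], [0, 0, 1]],
    test_index_lists=[[0, 1, 2], [3, 4, 5]],
    n_samples=6,
)
```

This is the usual way to turn per-split test predictions into a full-length out-of-fold prediction vector for downstream analysis.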