analyze
- CrossValidationReport.data.analyze(*, with_y=True, subsample=None, subsample_strategy='head', seed=None)
Plot dataset statistics.
- Parameters:
- with_y : bool, default=True
Whether to include the target variable in the analysis. If True, the target variable is concatenated horizontally to the features.
- subsample : int, default=None
The number of points to keep when subsampling the dataframe held by the display, using the strategy set by subsample_strategy. It must be a strictly positive integer. If None, no subsampling is applied.
- subsample_strategy : {'head', 'random'}, default='head'
The strategy used to subsample the dataframe held by the display. It only has an effect when subsample is not None.
If 'head': subsample by taking the first subsample points of the dataframe, similar to pandas' df.head(n).
If 'random': randomly subsample the dataframe using a uniform distribution. The random seed is controlled by seed.
- seed : int, default=None
The random seed to use when randomly subsampling. It only has an effect when subsample is not None and subsample_strategy='random'.
- Returns:
TableReportDisplay
A display object containing the dataset statistics and plots.
Examples
>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.linear_model import LogisticRegression
>>> from skore import evaluate
>>> X, y = load_breast_cancer(return_X_y=True)
>>> classifier = LogisticRegression()
>>> report = evaluate(classifier, X, y, splitter=2)
>>> report.data.analyze().frame()
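The with_y, subsample, subsample_strategy and seed options can be illustrated with plain pandas. The sketch below uses a hypothetical helper, prepare_frame, written only to mirror the documented behavior. It is not skore's implementation, and details such as naming the target column "target" and sampling rows without replacement are assumptions.

```python
import numpy as np
import pandas as pd


def prepare_frame(X, y=None, *, with_y=True, subsample=None,
                  subsample_strategy="head", seed=None):
    """Hypothetical helper mimicking the documented parameter semantics.

    Illustration only; this is not how skore implements analyze().
    """
    frame = pd.DataFrame(X)
    if with_y and y is not None:
        # with_y=True: concatenate the target horizontally as an extra column
        # (the column name "target" is an assumption of this sketch)
        target = pd.Series(y, name="target", index=frame.index)
        frame = pd.concat([frame, target], axis=1)

    if subsample is None:
        return frame  # no subsampling requested
    if not isinstance(subsample, (int, np.integer)) or subsample <= 0:
        raise ValueError("subsample must be a strictly positive integer")

    if subsample_strategy == "head":
        # keep the first `subsample` rows, like df.head(n)
        return frame.head(subsample)
    if subsample_strategy == "random":
        # uniform sampling without replacement (assumed); `seed` makes it reproducible
        return frame.sample(n=subsample, random_state=seed)
    raise ValueError("subsample_strategy must be 'head' or 'random'")


X = np.arange(20).reshape(10, 2)  # 10 rows, 2 features
y = np.arange(10) % 2             # binary target

full = prepare_frame(X, y)                 # all rows, features plus target
head = prepare_frame(X, y, subsample=3)    # the first 3 rows
rand_a = prepare_frame(X, y, subsample=3, subsample_strategy="random", seed=0)
rand_b = prepare_frame(X, y, subsample=3, subsample_strategy="random", seed=0)
no_y = prepare_frame(X, y, with_y=False)   # features only
```

With the real method, the same options are passed as keywords, for example report.data.analyze(subsample=100, subsample_strategy='random', seed=0). Fixing seed is what makes two random subsamples identical, as rand_a and rand_b are above.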