CrossValidationReport.data.analyze
- CrossValidationReport.data.analyze(with_y=True, subsample=None, subsample_strategy='head', seed=None)
- Plot dataset statistics.
- Parameters:
- with_y : bool, default=True
- Whether to include the target variable in the analysis. If True, the target variable is concatenated horizontally to the features.
- subsample : int, default=None
- The number of rows to subsample from the dataframe held by the display, using the strategy set by subsample_strategy. It must be a strictly positive integer. If None, no subsampling is applied.
- subsample_strategy : {'head', 'random'}, default='head'
- The strategy used to subsample the dataframe held by the display. It only has an effect when subsample is not None.
- If 'head': subsample by taking the first subsample rows of the dataframe, similar to pandas df.head(n).
- If 'random': randomly subsample the dataframe using a uniform distribution. The random seed is controlled by seed.

- seed : int, default=None
- The random seed to use when randomly subsampling. It only has an effect when subsample is not None and subsample_strategy='random'.
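The interplay of with_y, subsample, subsample_strategy and seed can be illustrated with plain pandas. The sketch below uses a hypothetical helper, prepare_frame, written only to mirror the behavior described in the parameter list; it is not skore's implementation. It also assumes two details this reference does not state: the appended target column is named "target", and random subsampling draws rows without replacement, capping the sample size at the number of available rows.

```python
import numpy as np
import pandas as pd


def prepare_frame(X, y=None, with_y=True, subsample=None,
                  subsample_strategy="head", seed=None):
    """Hypothetical helper mirroring the documented parameter semantics.

    Not skore's code: the "target" column name and the capping of the
    random sample size are illustrative assumptions.
    """
    df = pd.DataFrame(X)
    if with_y and y is not None:
        # Concatenate the target horizontally, as one extra column.
        target = pd.Series(np.asarray(y), name="target", index=df.index)
        df = pd.concat([df, target], axis=1)

    if subsample is None:
        # No subsampling requested: keep every row.
        return df
    if not isinstance(subsample, int) or subsample <= 0:
        raise ValueError("subsample must be a strictly positive integer")

    if subsample_strategy == "head":
        # Take the first `subsample` rows, like df.head(n).
        return df.head(subsample)
    if subsample_strategy == "random":
        # Uniform sampling without replacement; `seed` makes it reproducible.
        n = min(subsample, len(df))
        return df.sample(n=n, random_state=seed)
    raise ValueError("subsample_strategy must be 'head' or 'random'")


if __name__ == "__main__":
    X = np.arange(20).reshape(10, 2)
    y = np.arange(10) % 2
    print(prepare_frame(X, y, subsample=3))
    print(prepare_frame(X, y, subsample=3, subsample_strategy="random", seed=0))
```

Calling the helper twice with subsample_strategy="random" and the same seed returns identical rows, while seed has no effect under the 'head' strategy, matching the description of seed above.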
 
- Returns:
- TableReportDisplay
- A display object containing the dataset statistics and plots. 
 
- Examples
>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.linear_model import LogisticRegression
>>> from skore import CrossValidationReport
>>> X, y = load_breast_cancer(return_X_y=True)
>>> classifier = LogisticRegression(max_iter=10_000)
>>> report = CrossValidationReport(classifier, X=X, y=y, pos_label=1)
>>> report.data.analyze().frame()