TrainTestSplit

class skore.TrainTestSplit(test_size=0.2, train_size=None, random_state=0, shuffle=True, stratify=None)

Single train-test split implementing the cross-validation protocol.

This splitter wraps sklearn.model_selection.train_test_split() and exposes the split and get_n_splits methods, so it can be passed as the splitter argument of any skore or scikit-learn function that accepts a cross-validation splitter.
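
The protocol referred to here comes down to two methods. As a minimal sketch in plain Python (the class OneShotSplit and the helper count_folds are hypothetical stand-ins, not part of skore or scikit-learn, and the shuffling uses the standard library rather than train_test_split), any object with this shape can be iterated like a cross-validation splitter:

```python
import random


class OneShotSplit:
    """Hypothetical stand-in illustrating the split / get_n_splits protocol."""

    def __init__(self, n_test, seed=0):
        self.n_test = n_test
        self.seed = seed

    def get_n_splits(self, X=None, y=None, groups=None):
        # A single train-test split behaves like a cross-validation with one fold.
        return 1

    def split(self, X, y=None, groups=None):
        # Only len(X) matters here; y and groups are accepted for compatibility.
        indices = list(range(len(X)))
        random.Random(self.seed).shuffle(indices)
        # The first n_test shuffled indices form the test set, the rest the training set.
        yield indices[self.n_test:], indices[:self.n_test]


def count_folds(splitter, X):
    """Consume a splitter the way a cross-validation loop would."""
    return sum(1 for _train, _test in splitter.split(X))


X = [[i, i + 1] for i in range(10)]
splitter = OneShotSplit(n_test=3)
folds = list(splitter.split(X))
```

A caller written against this protocol does not need to know whether it received a k-fold splitter or a single split: it loops over split and, in the single-split case, the loop body runs once.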

Parameters:
test_size : float or int or None, default=0.2

Proportion (float) or absolute number (int) of samples for the test set. When None, the complement of train_size is used.

train_size : float or int or None, default=None

Proportion (float) or absolute number (int) of samples for the training set. When None, the complement of test_size is used.

random_state : int, RandomState instance or None, default=0

Controls the shuffling applied before splitting. Pass an int for reproducible output across multiple calls.

shuffle : bool, default=True

Whether to shuffle the data before splitting.

stratify : array-like or None, default=None

If not None, data is split in a stratified fashion using this as the class labels.
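
The interplay between test_size and train_size can be sketched as a small function. The helper resolve_sizes below is hypothetical and is not skore's implementation. It assumes the rounding used by scikit-learn for proportions (test count rounded up, train count rounded down) and, as a further assumption, rejects the case where both sizes are None.

```python
import math


def resolve_sizes(n_samples, test_size=0.2, train_size=None):
    """Hypothetical helper: turn test_size / train_size into sample counts."""

    def to_count(size, rounder):
        if size is None:
            return None
        if isinstance(size, float):
            # A float is a proportion of the dataset.
            return int(rounder(size * n_samples))
        # An int is an absolute number of samples.
        return int(size)

    # Assumption: proportions round up for the test set and down for the
    # training set, as scikit-learn does.
    n_test = to_count(test_size, math.ceil)
    n_train = to_count(train_size, math.floor)

    if n_test is None and n_train is None:
        # Assumption of this sketch only: require at least one size.
        raise ValueError("set at least one of test_size and train_size")
    if n_train is None:
        n_train = n_samples - n_test  # complement of the test set
    elif n_test is None:
        n_test = n_samples - n_train  # complement of the training set
    if n_train + n_test > n_samples:
        raise ValueError("train and test sizes exceed the number of samples")
    return n_train, n_test


sizes_from_fraction = resolve_sizes(10, test_size=0.25)
sizes_from_count = resolve_sizes(10, test_size=4)
sizes_from_train = resolve_sizes(10, test_size=None, train_size=0.25)
sizes_from_both = resolve_sizes(10, test_size=0.25, train_size=0.5)
```

When both sizes are given and sum to less than the dataset, the remaining samples end up in neither set, which is why sizes_from_both covers only 8 of the 10 samples.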

Examples

>>> import numpy as np
>>> from skore import TrainTestSplit
>>> splitter = TrainTestSplit(test_size=0.3, random_state=0)
>>> X = np.arange(20).reshape(10, 2)
>>> for train, test in splitter.split(X):
...     train, test
(array([9, 1, 6, 7, 3, 0, 5]), array([2, 8, 4]))

get_n_splits(X=None, y=None, groups=None)

Return the number of splits (always 1).

split(X, y=None, groups=None)

Generate a single train-test split of indices.

Parameters:
X : array-like

Training data used to determine the number of samples.

y : array-like or None, default=None

Ignored, present for API compatibility.

groups : array-like or None, default=None

Ignored, present for API compatibility.

Yields:
train : ndarray

The training set indices.

test : ndarray

The testing set indices.
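
Because split is a generator that yields exactly one pair, the indices can be retrieved with next() and then used to select rows. The sketch below uses a hypothetical stand-in generator, single_split, in place of TrainTestSplit.split so that it runs with the standard library alone. Only the selection pattern is being illustrated, and the indices it produces will differ from those of the real splitter.

```python
import random


def single_split(X, n_test=3, seed=0):
    """Hypothetical stand-in for a splitter's split method."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    yield indices[n_test:], indices[:n_test]


X = [[i, i * i] for i in range(10)]
y = [i % 2 for i in range(10)]

# The generator yields one (train, test) pair, so next() retrieves it.
train_idx, test_idx = next(single_split(X))

# The index positions select the matching rows of X and entries of y.
X_train = [X[i] for i in train_idx]
y_train = [y[i] for i in train_idx]
X_test = [X[i] for i in test_idx]
y_test = [y[i] for i in test_idx]
```

With NumPy arrays the same selection is usually written as X[train_idx] and X[test_idx].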