evaluate

skore.evaluate(estimator, X=None, y=None, data=None, *, splitter=0.2, pos_label=None, n_jobs=None)

Evaluate one or more estimators on the given data.

Passing several estimators returns a report that compares them. The splitter parameter controls whether a train-test split, cross-validation, or an already fitted estimator is used.

Parameters:

estimator : estimator object, list of estimators, or dict of estimators

The estimator to evaluate, or several estimators to compare. An estimator can be one of the following:

a scikit-learn compatible estimator as a BaseEstimator;
a skrub DataOp to preprocess the data;
a skrub SkrubLearner extracted from a DataOp by calling make_learner().

X : array-like, list of array-like, dict of array-like, or None

Feature matrix. When estimator is a list, X can be a list of feature matrices (one per estimator) to compare models with different preprocessing pipelines. When estimator is a dict, X can be a dict with the same keys, mapping each name to its feature matrix, or a single matrix broadcast to every estimator. When comparing prefit estimators and no test features are needed, pass X=None. A list of X is not supported when estimator is a dict; use a dict aligned on names or a single matrix.

y : array-like of shape (n_samples,), or None

Target vector.

data : dict or None

When estimator is a skrub SkrubLearner, bindings for variables contained in the DataOp that was used to create this learner (e.g. {"X": X_df, "other_table": df, ...}).

splitter : float, int, str, or cross-validation object, default=0.2

Determines how the data is split:

float: proportion of the data held out as the test set in a single train-test split. The data is shuffled before splitting with a fixed seed (random_state=0) for reproducibility. Pass a TrainTestSplit instance for more control over the splitting parameters.
"prefit": the estimator is assumed to be already fitted; X and y are used as the test set.
int: number of folds for cross-validation (passed to CrossValidationReport).
cross-validation splitter (e.g. KFold, StratifiedKFold): passed directly to CrossValidationReport.
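
As a rough illustration of the float case, the sketch below reproduces the described behaviour (shuffle, fixed seed of 0, 20% held out) with scikit-learn's train_test_split. It is an assumed equivalent written for this page, not skore's actual code path, and the resulting indices are not guaranteed to match those chosen by evaluate.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# make_classification generates 100 samples by default.
X, y = make_classification(random_state=42)

# Assumption: splitter=0.2 behaves like a shuffled hold-out split with
# test_size=0.2 and random_state=0, as described in the list above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=0
)

split_sizes = (len(X_train), len(X_test))
print(split_sizes)  # (80, 20)
```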

pos_label : int, float, bool or str, default=None

The positive class label for binary classification metrics. Forwarded to the underlying report.
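
To show why the choice of positive class matters, here is a small sketch using scikit-learn's recall_score directly. The labels "spam" and "ham" and the toy predictions are made up for illustration; the point is only that the same predictions give different binary metrics depending on which class is treated as positive.

```python
from sklearn.metrics import recall_score

# Hypothetical toy labels, not taken from any skore example.
y_true = ["spam", "ham", "spam", "ham"]
y_pred = ["spam", "spam", "spam", "ham"]

# Both "spam" samples are recovered, but only one of the two "ham" samples.
recall_spam = recall_score(y_true, y_pred, pos_label="spam")
recall_ham = recall_score(y_true, y_pred, pos_label="ham")

print(recall_spam, recall_ham)  # 1.0 0.5
```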

n_jobs : int or None, default=None

Number of jobs for parallel execution. Forwarded to CrossValidationReport or ComparisonReport.

Returns:

report : EstimatorReport, CrossValidationReport or ComparisonReport

The report corresponding to the evaluation strategy.

See also

compare(): Compare already evaluated reports.
EstimatorReport: Report for a fitted estimator on a test set.
CrossValidationReport: Report for cross-validation of an estimator.
ComparisonReport: Report comparing several evaluated models.

Examples

>>> from sklearn.datasets import make_classification
>>> from sklearn.linear_model import LogisticRegression
>>> from skore import evaluate
>>> X, y = make_classification(random_state=42)

Default 80/20 train-test split:

>>> report = evaluate(LogisticRegression(), X, y)

Cross-validation with 5 folds:

>>> report = evaluate(LogisticRegression(), X, y, splitter=5)
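
For readers more familiar with plain scikit-learn, the sketch below shows the analogous 5-fold evaluation with cross_val_score. It only illustrates what "5 folds" means (one fit and one score per fold); how CrossValidationReport builds its folds and which metrics it reports may differ, so treat the correspondence as an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(random_state=42)

# cv=5 asks scikit-learn for 5 folds; the estimator is refit on each
# training portion and scored (accuracy by default) on the held-out fold.
scores = cross_val_score(LogisticRegression(), X, y, cv=5)

print(len(scores))  # 5
```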

Evaluate a pre-fitted estimator:

>>> from sklearn.model_selection import train_test_split
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
>>> fitted_model = LogisticRegression().fit(X_train, y_train)
>>> report = evaluate(fitted_model, X_test, y_test, splitter="prefit")

Compare several named estimators:

>>> report = evaluate(
...     {"m1": LogisticRegression(), "m2": LogisticRegression(C=2.0)},
...     X,
...     y,
...     splitter=0.2,
... )
>>> list(report.reports_)
['m1', 'm2']

Gallery examples

Skore: getting started

Using skore with scikit-learn compatible estimators

Store and retrieve reports on Skore Hub

Store and retrieve Skore reports in MLflow

Adapt skore to your use-case by adding your own metrics

EstimatorReport: Get insights from any scikit-learn estimator

Automatic detection of modelling issues

Adding custom checks

The skore API

Local skore Project

Simplified and structured experiment reporting

EstimatorReport: Inspecting your models with the feature importance