.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/technical_details/plot_sklearn_api.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_technical_details_plot_sklearn_api.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_technical_details_plot_sklearn_api.py:

.. _example_sklearn_api:

===================================================
Using skore with scikit-learn compatible estimators
===================================================

This example shows how to use skore with scikit-learn compatible estimators:
any model that follows the scikit-learn API can be used with skore.

Use :func:`~skore.evaluate` to create a report from any estimator that has
``fit`` and ``predict`` methods (or only ``predict`` if it is already fitted).

.. note::

    When computing the ROC AUC or the ROC curve for a classification task, the
    estimator must have a ``predict_proba`` method.

In this example, we showcase a gradient boosting model (`XGBoost `_) and a
custom estimator. Note that this example is not exhaustive; many other
scikit-learn compatible models can be used with skore, for example:

- other gradient boosting libraries such as `LightGBM `_ and `CatBoost `_,
- deep learning frameworks such as `Keras `_ and `skorch `_ (a wrapper for
  `PyTorch `_),
- tabular foundation models such as `TabICL `_ and `TabPFN `_.

.. GENERATED FROM PYTHON SOURCE LINES 42-47

Loading a binary classification dataset
=======================================

We generate a synthetic binary classification dataset with only 1,000 samples
to keep the computation time reasonable:

.. GENERATED FROM PYTHON SOURCE LINES 49-54

.. code-block:: Python

    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=1_000, random_state=42)
    print(f"{X.shape = }")

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    X.shape = (1000, 20)

.. GENERATED FROM PYTHON SOURCE LINES 55-61

Gradient-boosted decision trees with XGBoost
============================================

For this binary classification task, we consider a gradient-boosted decision
trees model from a library external to scikit-learn. One of the most popular
is `XGBoost `_.

.. GENERATED FROM PYTHON SOURCE LINES 63-71

.. code-block:: Python

    from skore import evaluate
    from xgboost import XGBClassifier

    xgb = XGBClassifier(n_estimators=50, max_depth=3, learning_rate=0.1, random_state=42)
    xgb_report = evaluate(xgb, X, y, splitter=0.2, pos_label=1)
    xgb_report.metrics.summarize().frame()

.. rst-class:: sphx-glr-script-out

.. code-block:: none

                     XGBClassifier
    Metric
    Accuracy              0.900000
    Precision             0.989899
    Recall                0.837607
    ROC AUC               0.980126
    Brier score           0.064364
    Fit time (s)          0.059164
    Predict time (s)      0.000817
.. GENERATED FROM PYTHON SOURCE LINES 72-73

Beyond the summary of metrics, we can, for example, plot the ROC curve:

.. GENERATED FROM PYTHON SOURCE LINES 75-77

.. code-block:: Python

    xgb_report.metrics.roc().plot()

.. image-sg:: /auto_examples/technical_details/images/sphx_glr_plot_sklearn_api_001.png
   :alt: ROC Curve for XGBClassifier Positive label: 1 Data source: Test set
   :srcset: /auto_examples/technical_details/images/sphx_glr_plot_sklearn_api_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 78-79

We can also inspect our model:

.. GENERATED FROM PYTHON SOURCE LINES 81-83

.. code-block:: Python

    xgb_report.inspection.permutation_importance().frame()

.. rst-class:: sphx-glr-script-out

.. code-block:: none

       data_source    metric      feature  value_mean  value_std
    0         test  accuracy   Feature #0       0.002   0.005701
    1         test  accuracy   Feature #1       0.004   0.004183
    2         test  accuracy   Feature #2      -0.005   0.003536
    3         test  accuracy   Feature #3       0.000   0.000000
    4         test  accuracy   Feature #4      -0.003   0.002739
    5         test  accuracy   Feature #5       0.388   0.026599
    6         test  accuracy   Feature #6      -0.001   0.002236
    7         test  accuracy   Feature #7       0.000   0.000000
    8         test  accuracy   Feature #8       0.000   0.000000
    9         test  accuracy   Feature #9       0.000   0.000000
    10        test  accuracy  Feature #10       0.000   0.000000
    11        test  accuracy  Feature #11      -0.004   0.007416
    12        test  accuracy  Feature #12       0.000   0.000000
    13        test  accuracy  Feature #13      -0.002   0.002739
    14        test  accuracy  Feature #14       0.019   0.011937
    15        test  accuracy  Feature #15      -0.001   0.002236
    16        test  accuracy  Feature #16       0.000   0.000000
    17        test  accuracy  Feature #17      -0.003   0.004472
    18        test  accuracy  Feature #18      -0.006   0.002236
    19        test  accuracy  Feature #19      -0.001   0.004183
.. GENERATED FROM PYTHON SOURCE LINES 84-86

Custom model
------------

.. GENERATED FROM PYTHON SOURCE LINES 88-91

Let us use a custom estimator inspired by the `scikit-learn documentation `_,
a nearest-neighbor classifier:

.. GENERATED FROM PYTHON SOURCE LINES 93-118

.. code-block:: Python

    import numpy as np

    from sklearn.base import BaseEstimator, ClassifierMixin
    from sklearn.metrics import euclidean_distances
    from sklearn.utils.multiclass import unique_labels
    from sklearn.utils.validation import check_is_fitted, validate_data


    class CustomClassifier(ClassifierMixin, BaseEstimator):
        def __init__(self):
            pass

        def fit(self, X, y):
            X, y = validate_data(self, X, y)
            self.classes_ = unique_labels(y)
            self.X_ = X
            self.y_ = y
            return self

        def predict(self, X):
            check_is_fitted(self)
            X = validate_data(self, X, reset=False)
            # Predict the label of the closest training sample.
            closest = np.argmin(euclidean_distances(X, self.X_), axis=1)
            return self.y_[closest]

.. GENERATED FROM PYTHON SOURCE LINES 119-123

.. note::

    The estimator above does not have a ``predict_proba`` method, therefore we
    cannot display its ROC curve as done previously.

.. GENERATED FROM PYTHON SOURCE LINES 125-126

We can now use this model with skore:

.. GENERATED FROM PYTHON SOURCE LINES 128-131

.. code-block:: Python

    custom_report = evaluate(CustomClassifier(), X, y, splitter=0.2, pos_label=1)
    custom_report.metrics.precision()

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    0.7864077669902912

.. GENERATED FROM PYTHON SOURCE LINES 132-152

Conclusion
==========

This example demonstrates how skore can be used with scikit-learn compatible
estimators, allowing practitioners to use consistent reporting and
visualization tools across different estimators.

.. seealso::

    For a practical example of using language models within scikit-learn
    pipelines, see :ref:`example_use_case_employee_salaries`, which
    demonstrates how to use skrub's :class:`~skrub.TextEncoder` (a language
    model-based encoder) in a scikit-learn pipeline for feature engineering.

.. seealso::

    For an example of wrapping Large Language Models (LLMs) to be compatible
    with scikit-learn APIs, see the tutorial on `Quantifying LLMs Uncertainty
    with Conformal Predictions `_. The article demonstrates how to wrap models
    like Mistral-7B-Instruct in a scikit-learn-compatible interface.

.. rst-class:: sphx-glr-timing

**Total running time of the script:** (0 minutes 0.500 seconds)

.. _sphx_glr_download_auto_examples_technical_details_plot_sklearn_api.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_sklearn_api.ipynb <plot_sklearn_api.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_sklearn_api.py <plot_sklearn_api.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_sklearn_api.zip <plot_sklearn_api.zip>`

.. only:: html

  .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_