.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/technical_details/plot_sklearn_api.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_technical_details_plot_sklearn_api.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_technical_details_plot_sklearn_api.py:

.. _example_sklearn_api:

===================================================
Using skore with scikit-learn compatible estimators
===================================================

This example shows how to use skore with scikit-learn compatible estimators:
any model that follows the scikit-learn API can be used with skore.

Use :func:`~skore.evaluate` to create a report from any estimator that has
``fit`` and ``predict`` methods (or only ``predict`` if it is already fitted).

.. note::

    When computing the ROC AUC or the ROC curve for a classification task, the
    estimator must have a ``predict_proba`` method.

In this example, we showcase a gradient boosting model (`XGBoost `_) and a
custom estimator. Note that this example is not exhaustive; many other
scikit-learn compatible models can be used with skore, for example:

- other gradient boosting libraries such as `LightGBM `_ and `CatBoost `_,
- deep learning frameworks such as `Keras `_ and `skorch `_ (a wrapper for
  `PyTorch `_),
- tabular foundation models such as `TabICL `_ and `TabPFN `_.

.. GENERATED FROM PYTHON SOURCE LINES 42-47

Loading a binary classification dataset
=======================================

We generate a synthetic binary classification dataset with only 1,000 samples
to keep the computation time reasonable:

.. GENERATED FROM PYTHON SOURCE LINES 49-54

.. code-block:: Python

    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=1_000, random_state=42)
    print(f"{X.shape = }")

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    X.shape = (1000, 20)

.. GENERATED FROM PYTHON SOURCE LINES 55-61

Gradient-boosted decision trees with XGBoost
============================================

For this binary classification task, we consider a gradient-boosted decision
trees model from a library external to scikit-learn. One of the most popular
is `XGBoost `_.

.. GENERATED FROM PYTHON SOURCE LINES 63-71

.. code-block:: Python

    from skore import evaluate
    from xgboost import XGBClassifier

    xgb = XGBClassifier(n_estimators=50, max_depth=3, learning_rate=0.1, random_state=42)
    xgb_report = evaluate(xgb, X, y, splitter=0.2, pos_label=1)
    xgb_report.metrics.summarize().frame()

.. rst-class:: sphx-glr-script-out

.. code-block:: none

                     XGBClassifier
    Metric
    Accuracy              0.900000
    Precision             0.989899
    Recall                0.837607
    ROC AUC               0.980126
    Brier score           0.064364
    Fit time (s)          0.059164
    Predict time (s)      0.000817
.. GENERATED FROM PYTHON SOURCE LINES 72-73

Beyond the summary of metrics, we can, for example, plot the ROC curve:

.. GENERATED FROM PYTHON SOURCE LINES 75-77

.. code-block:: Python

    xgb_report.metrics.roc().plot()

.. image-sg:: /auto_examples/technical_details/images/sphx_glr_plot_sklearn_api_001.png
   :alt: ROC Curve for XGBClassifier Positive label: 1 Data source: Test set
   :srcset: /auto_examples/technical_details/images/sphx_glr_plot_sklearn_api_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 78-79

We can also inspect our model:

.. GENERATED FROM PYTHON SOURCE LINES 81-83

.. code-block:: Python

    xgb_report.inspection.permutation_importance().frame()

.. rst-class:: sphx-glr-script-out

.. code-block:: none

       data_source    metric      feature  value_mean  value_std
    0         test  accuracy   Feature #0       0.002   0.005701
    1         test  accuracy   Feature #1       0.004   0.004183
    2         test  accuracy   Feature #2      -0.005   0.003536
    3         test  accuracy   Feature #3       0.000   0.000000
    4         test  accuracy   Feature #4      -0.003   0.002739
    5         test  accuracy   Feature #5       0.388   0.026599
    6         test  accuracy   Feature #6      -0.001   0.002236
    7         test  accuracy   Feature #7       0.000   0.000000
    8         test  accuracy   Feature #8       0.000   0.000000
    9         test  accuracy   Feature #9       0.000   0.000000
    10        test  accuracy  Feature #10       0.000   0.000000
    11        test  accuracy  Feature #11      -0.004   0.007416
    12        test  accuracy  Feature #12       0.000   0.000000
    13        test  accuracy  Feature #13      -0.002   0.002739
    14        test  accuracy  Feature #14       0.019   0.011937
    15        test  accuracy  Feature #15      -0.001   0.002236
    16        test  accuracy  Feature #16       0.000   0.000000
    17        test  accuracy  Feature #17      -0.003   0.004472
    18        test  accuracy  Feature #18      -0.006   0.002236
    19        test  accuracy  Feature #19      -0.001   0.004183
.. GENERATED FROM PYTHON SOURCE LINES 84-86

Custom model
------------

.. GENERATED FROM PYTHON SOURCE LINES 88-91

Let us use a custom estimator inspired by the `scikit-learn documentation `_,
a nearest-neighbor classifier:

.. GENERATED FROM PYTHON SOURCE LINES 93-118

.. code-block:: Python

    import numpy as np

    from sklearn.base import BaseEstimator, ClassifierMixin
    from sklearn.metrics import euclidean_distances
    from sklearn.utils.multiclass import unique_labels
    from sklearn.utils.validation import check_is_fitted, validate_data


    class CustomClassifier(ClassifierMixin, BaseEstimator):
        def __init__(self):
            pass

        def fit(self, X, y):
            X, y = validate_data(self, X, y)
            self.classes_ = unique_labels(y)
            self.X_ = X
            self.y_ = y
            return self

        def predict(self, X):
            check_is_fitted(self)
            X = validate_data(self, X, reset=False)
            # Predict the label of the closest training sample.
            closest = np.argmin(euclidean_distances(X, self.X_), axis=1)
            return self.y_[closest]

.. GENERATED FROM PYTHON SOURCE LINES 119-123

.. note::

    The estimator above does not have a ``predict_proba`` method, therefore we
    cannot display its ROC curve as done previously.

.. GENERATED FROM PYTHON SOURCE LINES 125-126

We can now use this model with skore:

.. GENERATED FROM PYTHON SOURCE LINES 128-131

.. code-block:: Python

    custom_report = evaluate(CustomClassifier(), X, y, splitter=0.2, pos_label=1)
    custom_report.metrics.precision()

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    0.7864077669902912

.. GENERATED FROM PYTHON SOURCE LINES 132-152

Conclusion
==========

This example demonstrates how skore can be used with scikit-learn compatible
estimators, allowing practitioners to use consistent reporting and
visualization tools across different estimators.

.. seealso::

    For a practical example of using language models within scikit-learn
    pipelines, see :ref:`example_use_case_employee_salaries`, which
    demonstrates how to use skrub's :class:`~skrub.TextEncoder` (a language
    model-based encoder) in a scikit-learn pipeline for feature engineering.

.. seealso::

    For an example of wrapping Large Language Models (LLMs) to be compatible
    with scikit-learn APIs, see the tutorial on `Quantifying LLMs Uncertainty
    with Conformal Predictions `_. The article demonstrates how to wrap models
    like Mistral-7B-Instruct in a scikit-learn-compatible interface.

.. rst-class:: sphx-glr-timing

**Total running time of the script:** (0 minutes 0.500 seconds)

.. _sphx_glr_download_auto_examples_technical_details_plot_sklearn_api.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_sklearn_api.ipynb <plot_sklearn_api.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_sklearn_api.py <plot_sklearn_api.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_sklearn_api.zip <plot_sklearn_api.zip>`

.. only:: html

  .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_