.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/technical_details/plot_sklearn_api.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end `
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_technical_details_plot_sklearn_api.py:


.. _example_sklearn_api:

=====================================================================================
Using skore with scikit-learn compatible estimators
=====================================================================================

This example shows how to use skore with scikit-learn compatible estimators.

Any model that follows the scikit-learn API can be used with skore.
Skore's :class:`~skore.EstimatorReport` can report on any estimator that has
``fit`` and ``predict`` methods.
If the estimator has already been fitted, skore only requires the ``predict``
method.

.. note::

    When computing the ROC AUC or ROC curve for a classification task, the
    estimator must have a ``predict_proba`` method.

In this example, we showcase a gradient boosting model
(`XGBoost `_) and a custom estimator.

Note that this example is not exhaustive; many other scikit-learn compatible
models can be used with skore:

- More gradient boosting libraries such as `LightGBM `_ and `CatBoost `_,
- Deep learning frameworks such as `Keras `_ and `skorch `_
  (a wrapper for `PyTorch `_),
- Tabular foundation models such as `TabICL `_ and `TabPFN `_,
- etc.

.. GENERATED FROM PYTHON SOURCE LINES 44-49

Loading a binary classification dataset
=======================================

We generate a synthetic binary classification dataset with only 1,000 samples to
keep the computation time reasonable:

.. GENERATED FROM PYTHON SOURCE LINES 51-56

.. code-block:: Python

    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=1_000, random_state=42)
    print(f"{X.shape = }")

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    X.shape = (1000, 20)

.. GENERATED FROM PYTHON SOURCE LINES 57-58

We split the data into training and test sets:

.. GENERATED FROM PYTHON SOURCE LINES 60-64

.. code-block:: Python

    from skore import train_test_split

    split_data = train_test_split(X, y, random_state=42, as_dict=True)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    ╭───────────────────────────────── ShuffleTrueWarning ─────────────────────────────────╮
    │ We detected that the `shuffle` parameter is set to `True` either explicitly or from  │
    │ its default value. In case of time-ordered events (even if they are independent),    │
    │ this will result in inflated model performance evaluation because natural drift will │
    │ not be taken into account. We recommend setting the shuffle parameter to `False` in  │
    │ order to ensure the evaluation process is really representative of your production   │
    │ release process.                                                                     │
    ╰──────────────────────────────────────────────────────────────────────────────────────╯

.. GENERATED FROM PYTHON SOURCE LINES 65-71

Gradient-boosted decision trees with XGBoost
============================================

For this binary classification task, we consider a gradient-boosted decision tree
model from a library external to scikit-learn.
One of the most popular such libraries is `XGBoost `_.

.. GENERATED FROM PYTHON SOURCE LINES 73-81

.. code-block:: Python

    from skore import EstimatorReport
    from xgboost import XGBClassifier

    xgb = XGBClassifier(n_estimators=50, max_depth=3, learning_rate=0.1, random_state=42)

    xgb_report = EstimatorReport(xgb, pos_label=1, **split_data)
    xgb_report.metrics.summarize().frame()

.. raw:: html

    <table border="1" class="dataframe">
      <thead>
        <tr><th></th><th>XGBClassifier</th></tr>
        <tr><th>Metric</th><th></th></tr>
      </thead>
      <tbody>
        <tr><th>Precision</th><td>0.943089</td></tr>
        <tr><th>Recall</th><td>0.859259</td></tr>
        <tr><th>ROC AUC</th><td>0.942931</td></tr>
        <tr><th>Brier score</th><td>0.086748</td></tr>
        <tr><th>Fit time (s)</th><td>0.026103</td></tr>
        <tr><th>Predict time (s)</th><td>0.000539</td></tr>
      </tbody>
    </table>
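
To make the four predictive metrics in the summary above more concrete, here is a
minimal sketch that computes metrics of the same name with scikit-learn alone.
It rests on two assumptions that this example does not confirm: that the summary's
Precision, Recall, ROC AUC, and Brier score follow the usual scikit-learn
definitions, and that scikit-learn's ``HistGradientBoostingClassifier`` is an
acceptable stand-in for XGBoost. The values it produces are therefore not expected
to match the table.

.. code-block:: Python

    from sklearn.datasets import make_classification
    from sklearn.ensemble import HistGradientBoostingClassifier
    from sklearn.metrics import (
        brier_score_loss,
        precision_score,
        recall_score,
        roc_auc_score,
    )
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1_000, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # Stand-in model (an assumption, not the XGBoost model used in this example).
    model = HistGradientBoostingClassifier(
        max_iter=50, max_depth=3, learning_rate=0.1, random_state=42
    )
    model.fit(X_train, y_train)

    # Hard labels for precision and recall; positive-class probabilities
    # for ROC AUC and the Brier score.
    y_pred = model.predict(X_test)
    y_proba = model.predict_proba(X_test)[:, 1]

    scores = {
        "Precision": precision_score(y_test, y_pred, pos_label=1),
        "Recall": recall_score(y_test, y_pred, pos_label=1),
        "ROC AUC": roc_auc_score(y_test, y_proba),
        "Brier score": brier_score_loss(y_test, y_proba),
    }
    for name, value in scores.items():
        print(f"{name}: {value:.6f}")

Precision and recall only need ``predict``, whereas ROC AUC and the Brier score are
computed here from ``predict_proba``, which is consistent with the note at the top
of this example.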


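.. GENERATED FROM PYTHON SOURCE LINES 82-83

Besides the summary of metrics, the report can also plot a ROC curve, for example:

.. GENERATED FROM PYTHON SOURCE LINES 85-87

.. code-block:: Python

    xgb_report.metrics.roc().plot()

.. image-sg:: /auto_examples/technical_details/images/sphx_glr_plot_sklearn_api_001.png
   :alt: ROC Curve for XGBClassifier
   :srcset: /auto_examples/technical_details/images/sphx_glr_plot_sklearn_api_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 88-89

We can also inspect our model, here through permutation feature importance:

.. GENERATED FROM PYTHON SOURCE LINES 91-93

.. code-block:: Python

    xgb_report.feature_importance.permutation()

.. raw:: html

    <table border="1" class="dataframe">
      <thead>
        <tr><th></th><th>Repeat</th><th>Repeat #0</th><th>Repeat #1</th><th>Repeat #2</th><th>Repeat #3</th><th>Repeat #4</th></tr>
        <tr><th>Metric</th><th>Feature</th><th></th><th></th><th></th><th></th><th></th></tr>
      </thead>
      <tbody>
        <tr><th rowspan="20">accuracy</th><th>Feature #0</th><td>0.000</td><td>-0.004</td><td>0.004</td><td>-0.008</td><td>-0.004</td></tr>
        <tr><th>Feature #1</th><td>-0.008</td><td>-0.004</td><td>-0.004</td><td>-0.008</td><td>-0.008</td></tr>
        <tr><th>Feature #2</th><td>0.000</td><td>0.004</td><td>0.000</td><td>0.008</td><td>-0.004</td></tr>
        <tr><th>Feature #3</th><td>0.000</td><td>0.000</td><td>0.000</td><td>0.000</td><td>0.000</td></tr>
        <tr><th>Feature #4</th><td>0.000</td><td>0.000</td><td>0.000</td><td>0.000</td><td>0.000</td></tr>
        <tr><th>Feature #5</th><td>0.372</td><td>0.336</td><td>0.328</td><td>0.292</td><td>0.304</td></tr>
        <tr><th>Feature #6</th><td>0.000</td><td>0.008</td><td>0.004</td><td>0.004</td><td>0.004</td></tr>
        <tr><th>Feature #7</th><td>0.000</td><td>0.000</td><td>0.000</td><td>0.000</td><td>0.000</td></tr>
        <tr><th>Feature #8</th><td>0.000</td><td>0.000</td><td>0.000</td><td>0.000</td><td>0.000</td></tr>
        <tr><th>Feature #9</th><td>-0.004</td><td>-0.004</td><td>-0.004</td><td>-0.004</td><td>-0.004</td></tr>
        <tr><th>Feature #10</th><td>-0.004</td><td>-0.004</td><td>-0.004</td><td>-0.004</td><td>-0.004</td></tr>
        <tr><th>Feature #11</th><td>0.004</td><td>0.004</td><td>0.000</td><td>0.008</td><td>0.020</td></tr>
        <tr><th>Feature #12</th><td>0.004</td><td>0.008</td><td>0.008</td><td>0.004</td><td>0.004</td></tr>
        <tr><th>Feature #13</th><td>0.004</td><td>0.008</td><td>0.000</td><td>0.004</td><td>0.004</td></tr>
        <tr><th>Feature #14</th><td>0.084</td><td>0.076</td><td>0.064</td><td>0.064</td><td>0.064</td></tr>
        <tr><th>Feature #15</th><td>0.000</td><td>0.000</td><td>0.000</td><td>0.000</td><td>0.000</td></tr>
        <tr><th>Feature #16</th><td>-0.004</td><td>-0.004</td><td>-0.004</td><td>-0.004</td><td>0.000</td></tr>
        <tr><th>Feature #17</th><td>0.000</td><td>0.000</td><td>0.000</td><td>0.000</td><td>-0.004</td></tr>
        <tr><th>Feature #18</th><td>-0.004</td><td>0.008</td><td>0.008</td><td>0.004</td><td>0.012</td></tr>
        <tr><th>Feature #19</th><td>0.000</td><td>0.000</td><td>0.000</td><td>0.000</td><td>0.000</td></tr>
      </tbody>
    </table>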
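
To illustrate what a table like the one above measures, here is a minimal sketch of
permutation importance using scikit-learn's
``sklearn.inspection.permutation_importance``. It assumes that skore's
``permutation`` is conceptually similar (this example does not state how skore
computes it), and it uses ``HistGradientBoostingClassifier`` as a stand-in for
XGBoost, so the numbers will differ from the table.

.. code-block:: Python

    from sklearn.datasets import make_classification
    from sklearn.ensemble import HistGradientBoostingClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1_000, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # Stand-in model (an assumption, not the XGBoost model used in this example).
    model = HistGradientBoostingClassifier(
        max_iter=50, max_depth=3, learning_rate=0.1, random_state=42
    )
    model.fit(X_train, y_train)

    # Shuffle one feature at a time, five times each, and record the drop in
    # accuracy on the held-out data relative to the unshuffled baseline.
    result = permutation_importance(
        model, X_test, y_test, scoring="accuracy", n_repeats=5, random_state=42
    )

    # result.importances has one row per feature and one column per repeat.
    top = result.importances_mean.argsort()[::-1][:3]
    for i in top:
        print(
            f"Feature #{i}: "
            f"{result.importances_mean[i]:.3f} +/- {result.importances_std[i]:.3f}"
        )

Each value is a drop in accuracy after shuffling one feature, so values near zero
(or slightly negative) suggest the model barely relies on that feature. In the
table above, Feature #5 stands out with drops between 0.292 and 0.372, followed by
Feature #14 with drops between 0.064 and 0.084.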


.. GENERATED FROM PYTHON SOURCE LINES 94-96

Custom model
------------

.. GENERATED FROM PYTHON SOURCE LINES 98-101

Let us define a custom estimator, a nearest-neighbor classifier inspired by the
`scikit-learn documentation `_:

.. GENERATED FROM PYTHON SOURCE LINES 103-128

.. code-block:: Python

    from sklearn.base import BaseEstimator, ClassifierMixin
    from sklearn.utils.validation import validate_data, check_is_fitted
    from sklearn.utils.multiclass import unique_labels
    from sklearn.metrics import euclidean_distances
    import numpy as np


    class CustomClassifier(ClassifierMixin, BaseEstimator):
        def __init__(self):
            pass

        def fit(self, X, y):
            X, y = validate_data(self, X, y)
            self.classes_ = unique_labels(y)
            # Memorize the training set; prediction looks up the nearest sample.
            self.X_ = X
            self.y_ = y
            return self

        def predict(self, X):
            check_is_fitted(self)
            X = validate_data(self, X, reset=False)
            # Index of the closest training sample for each row of X.
            closest = np.argmin(euclidean_distances(X, self.X_), axis=1)
            return self.y_[closest]

.. GENERATED FROM PYTHON SOURCE LINES 129-133

.. note::

    The estimator above does not have a ``predict_proba`` method, so we cannot
    display its ROC curve as we did for the XGBoost model.

.. GENERATED FROM PYTHON SOURCE LINES 135-136

We can now use this model with skore:

.. GENERATED FROM PYTHON SOURCE LINES 138-141

.. code-block:: Python

    custom_report = EstimatorReport(CustomClassifier(), pos_label=1, **split_data)
    custom_report.metrics.precision()

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    0.831858407079646

.. GENERATED FROM PYTHON SOURCE LINES 142-162

Conclusion
==========

This example demonstrates how skore can be used with scikit-learn compatible
estimators, which lets practitioners use consistent reporting and visualization
tools across different estimators.

.. seealso::

    For a practical example of using language models within scikit-learn pipelines,
    see :ref:`example_use_case_employee_salaries`, which demonstrates how to use
    skrub's :class:`~skrub.TextEncoder` (a language model-based encoder) in a
    scikit-learn pipeline for feature engineering.

.. seealso::

    For an example of wrapping Large Language Models (LLMs) to be compatible with
    scikit-learn APIs, see the tutorial on
    `Quantifying LLMs Uncertainty with Conformal Predictions `_.
    The article demonstrates how to wrap models like Mistral-7B-Instruct in a
    scikit-learn-compatible interface.

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.265 seconds)

.. _sphx_glr_download_auto_examples_technical_details_plot_sklearn_api.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_sklearn_api.ipynb `

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_sklearn_api.py `

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_sklearn_api.zip `

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_