.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/technical_details/plot_sklearn_api.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_technical_details_plot_sklearn_api.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_technical_details_plot_sklearn_api.py:

.. _example_sklearn_api:

===================================================
Using skore with scikit-learn compatible estimators
===================================================

This example shows how to use skore with scikit-learn compatible estimators.
Any model that can be used with the scikit-learn API can be used with skore.
Use :func:`~skore.evaluate` to create a report from any estimator that has a
``fit`` and ``predict`` method (or only ``predict`` if already fitted).

.. note::

    When computing the ROC AUC or ROC curve for a classification task, the
    estimator must have a ``predict_proba`` method.

In this example, we showcase a gradient boosting model (`XGBoost `_) and a
custom estimator. Note that this example is not exhaustive; many other
scikit-learn compatible models can be used with skore:

- more gradient boosting libraries such as `LightGBM `_ and `CatBoost `_,
- deep learning frameworks such as `Keras `_ and `skorch `_ (a wrapper for
  `PyTorch `_),
- tabular foundation models such as `TabICL `_ and `TabPFN `_,
- etc.

.. GENERATED FROM PYTHON SOURCE LINES 42-47

Loading a binary classification dataset
=======================================

We generate a synthetic binary classification dataset with only 1,000 samples
to keep the computation time reasonable:

.. GENERATED FROM PYTHON SOURCE LINES 49-54

.. code-block:: Python

    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=1_000, random_state=42)
    print(f"{X.shape = }")
.. rst-class:: sphx-glr-script-out

.. code-block:: none

    X.shape = (1000, 20)

.. GENERATED FROM PYTHON SOURCE LINES 55-61

Gradient-boosted decision trees with XGBoost
============================================

For this binary classification task, we consider a gradient-boosted decision
trees model from a library external to scikit-learn. One of the most popular
is `XGBoost `_.

.. GENERATED FROM PYTHON SOURCE LINES 63-71

.. code-block:: Python

    from skore import evaluate
    from xgboost import XGBClassifier

    xgb = XGBClassifier(n_estimators=50, max_depth=3, learning_rate=0.1, random_state=42)
    xgb_report = evaluate(xgb, X, y, splitter=0.2, pos_label=1)
    xgb_report.metrics.summarize().frame()

.. rst-class:: sphx-glr-script-out
.. code-block:: none

                     XGBClassifier
    Metric
    Accuracy                0.900000
    Precision               0.989899
    Recall                  0.837607
    ROC AUC                 0.980126
    Log loss                0.218888
    Brier score             0.064364
    Fit time (s)            0.029269
    Predict time (s)        0.000341


.. GENERATED FROM PYTHON SOURCE LINES 72-73

Beyond the summary of metrics, we can also plot the ROC curve:

.. GENERATED FROM PYTHON SOURCE LINES 75-77

.. code-block:: Python

    xgb_report.metrics.roc().plot()
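As the note in the introduction says, ROC-based displays need ``predict_proba``: the curve is built by thresholding the positive class's predicted probabilities. A minimal sketch of that underlying computation with plain scikit-learn (the :class:`~sklearn.linear_model.LogisticRegression` model and manual split are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1_000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

clf = LogisticRegression().fit(X_train, y_train)

# Scores for the positive class: this is where ``predict_proba`` is required.
scores = clf.predict_proba(X_test)[:, 1]

# One (false positive rate, true positive rate) point per threshold.
fpr, tpr, thresholds = roc_curve(y_test, scores, pos_label=1)
```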
.. GENERATED FROM PYTHON SOURCE LINES 78-79

We can also inspect our model:

.. GENERATED FROM PYTHON SOURCE LINES 81-83

.. code-block:: Python

    xgb_report.inspection.permutation_importance().frame()

.. rst-class:: sphx-glr-script-out
.. code-block:: none

       data_source    metric      feature  value_mean  value_std
    0         test  accuracy   Feature #0       0.001   0.002236
    1         test  accuracy   Feature #1       0.003   0.006708
    2         test  accuracy   Feature #2      -0.012   0.005701
    3         test  accuracy   Feature #3       0.000   0.000000
    4         test  accuracy   Feature #4      -0.002   0.002739
    5         test  accuracy   Feature #5       0.348   0.044102
    6         test  accuracy   Feature #6      -0.001   0.002236
    7         test  accuracy   Feature #7       0.000   0.000000
    8         test  accuracy   Feature #8       0.000   0.000000
    9         test  accuracy   Feature #9       0.000   0.000000
    10        test  accuracy  Feature #10       0.000   0.000000
    11        test  accuracy  Feature #11      -0.004   0.010840
    12        test  accuracy  Feature #12       0.000   0.000000
    13        test  accuracy  Feature #13      -0.003   0.002739
    14        test  accuracy  Feature #14       0.022   0.010954
    15        test  accuracy  Feature #15       0.000   0.000000
    16        test  accuracy  Feature #16       0.000   0.000000
    17        test  accuracy  Feature #17      -0.002   0.002739
    18        test  accuracy  Feature #18      -0.006   0.002236
    19        test  accuracy  Feature #19       0.001   0.002236


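Permutation importance is a scikit-learn technique: each feature column is shuffled in turn and the resulting drop in the score is recorded, so a large mean drop marks an important feature. A hedged sketch calling :func:`~sklearn.inspection.permutation_importance` directly — the logistic-regression stand-in, the manual split, and the parameter choices are illustrative assumptions, not what skore does internally:

```python
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1_000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
clf = LogisticRegression().fit(X_train, y_train)

# Shuffle each feature on the held-out set and record the drop in accuracy,
# repeated 5 times per feature to estimate the variability.
result = permutation_importance(
    clf, X_test, y_test, scoring="accuracy", n_repeats=5, random_state=42
)
print(result.importances_mean.shape)  # one mean importance per feature
```

Values near zero (or slightly negative, from shuffling noise) indicate features the model does not rely on, which matches the table above where only a few features carry weight.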
.. GENERATED FROM PYTHON SOURCE LINES 84-86

Custom model
------------

.. GENERATED FROM PYTHON SOURCE LINES 88-91

Let us use a custom estimator inspired by the `scikit-learn documentation `_:
a nearest-neighbor classifier.

.. GENERATED FROM PYTHON SOURCE LINES 93-118

.. code-block:: Python

    import numpy as np
    from sklearn.base import BaseEstimator, ClassifierMixin
    from sklearn.metrics import euclidean_distances
    from sklearn.utils.multiclass import unique_labels
    from sklearn.utils.validation import check_is_fitted, validate_data


    class CustomClassifier(ClassifierMixin, BaseEstimator):
        def __init__(self):
            pass

        def fit(self, X, y):
            X, y = validate_data(self, X, y)
            self.classes_ = unique_labels(y)
            self.X_ = X
            self.y_ = y
            return self

        def predict(self, X):
            check_is_fitted(self)
            X = validate_data(self, X, reset=False)
            closest = np.argmin(euclidean_distances(X, self.X_), axis=1)
            return self.y_[closest]

.. GENERATED FROM PYTHON SOURCE LINES 119-123

.. note::

    The estimator above does not have a ``predict_proba`` method, therefore we
    cannot display its ROC curve as done previously.

.. GENERATED FROM PYTHON SOURCE LINES 125-126

We can now use this model with skore:

.. GENERATED FROM PYTHON SOURCE LINES 128-131

.. code-block:: Python

    custom_report = evaluate(CustomClassifier(), X, y, splitter=0.2, pos_label=1)
    custom_report.metrics.precision()

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    0.7864077669902912

.. GENERATED FROM PYTHON SOURCE LINES 132-152

Conclusion
==========

This example demonstrates how skore can be used with scikit-learn compatible
estimators. This allows practitioners to use consistent reporting and
visualization tools across different estimators.

.. seealso::

    For a practical example of using language models within scikit-learn
    pipelines, see :ref:`example_use_case_employee_salaries`, which demonstrates
    how to use skrub's :class:`~skrub.TextEncoder` (a language model-based
    encoder) in a scikit-learn pipeline for feature engineering.
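As a closing aside: the custom model section noted that ``CustomClassifier`` lacks ``predict_proba``, which rules out ROC-based metrics. One way to close that gap is to add such a method; a minimal sketch, where the ``CustomProbaClassifier`` name and the degenerate one-hot pseudo-probabilities are illustrative assumptions, not part of the original example:

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.metrics import euclidean_distances
from sklearn.utils.multiclass import unique_labels


class CustomProbaClassifier(ClassifierMixin, BaseEstimator):
    """Nearest-neighbor classifier with a naive one-hot ``predict_proba``."""

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = unique_labels(y)
        self.X_, self.y_ = X, y
        return self

    def predict(self, X):
        # Label of the single closest training sample.
        distances = euclidean_distances(np.asarray(X, dtype=float), self.X_)
        return self.y_[np.argmin(distances, axis=1)]

    def predict_proba(self, X):
        # Degenerate probabilities: all mass on the nearest neighbor's class.
        predictions = self.predict(X)
        proba = np.zeros((len(predictions), len(self.classes_)))
        proba[np.arange(len(predictions)),
              np.searchsorted(self.classes_, predictions)] = 1.0
        return proba
```

With hard 0/1 probabilities the resulting ROC curve is a single step, so a smoother scheme (e.g. averaging over several neighbors) would be preferable in practice.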
.. seealso::

    For an example of wrapping Large Language Models (LLMs) to be compatible
    with scikit-learn APIs, see the tutorial on `Quantifying LLMs Uncertainty
    with Conformal Predictions `_. The article demonstrates how to wrap models
    like Mistral-7B-Instruct in a scikit-learn-compatible interface.

.. rst-class:: sphx-glr-timing

**Total running time of the script:** (0 minutes 0.294 seconds)

.. _sphx_glr_download_auto_examples_technical_details_plot_sklearn_api.py:

.. only:: html

    .. container:: sphx-glr-footer sphx-glr-footer-example

        .. container:: sphx-glr-download sphx-glr-download-jupyter

            :download:`Download Jupyter notebook: plot_sklearn_api.ipynb <plot_sklearn_api.ipynb>`

        .. container:: sphx-glr-download sphx-glr-download-python

            :download:`Download Python source code: plot_sklearn_api.py <plot_sklearn_api.py>`

        .. container:: sphx-glr-download sphx-glr-download-zip

            :download:`Download zipped: plot_sklearn_api.zip <plot_sklearn_api.zip>`

.. only:: html

    .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_