.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/getting_started/plot_skore_getting_started.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_getting_started_plot_skore_getting_started.py:

.. _example_skore_getting_started:

======================
Skore: getting started
======================

.. GENERATED FROM PYTHON SOURCE LINES 10-24

This getting started guide illustrates how to use skore and why:

#.  Track your ML/DS results using skore's :class:`~skore.Project`
    (for storage).

#.  Machine learning diagnostics: get assistance when developing your ML/DS
    projects to avoid common pitfalls and follow recommended practices.

    *   :class:`skore.EstimatorReport`: get an insightful report for your
        estimator

    *   :class:`skore.CrossValidationReport`: get an insightful report for
        your cross-validation results

    *   :func:`skore.train_test_split`: get diagnostics when splitting your
        data

.. GENERATED FROM PYTHON SOURCE LINES 26-31

Tracking: skore project
=======================

A key feature of skore is its :class:`~skore.Project`, which can store items
of many types.

.. GENERATED FROM PYTHON SOURCE LINES 33-35

Setup: creating and loading a skore project
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. GENERATED FROM PYTHON SOURCE LINES 37-39

Let's start by creating a skore project directory named ``my_project.skore``
in our current directory.

.. GENERATED FROM PYTHON SOURCE LINES 41-45

.. code-block:: Python

    import skore

    my_project = skore.open("my_project", create=True)

.. GENERATED FROM PYTHON SOURCE LINES 46-49

Now that the project exists, we can write some Python code (in the same
directory) that adds useful items to it with :func:`~skore.Project.put`,
following a key-value convention:

.. GENERATED FROM PYTHON SOURCE LINES 51-53

.. code-block:: Python

    my_project.put("my_int", 3)

.. GENERATED FROM PYTHON SOURCE LINES 54-55

We can retrieve the value of an item:

.. GENERATED FROM PYTHON SOURCE LINES 57-59

.. code-block:: Python

    my_project.get("my_int")

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    3

.. GENERATED FROM PYTHON SOURCE LINES 60-66

Skore project: storing some items
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To illustrate the skore project in a machine learning setting, let us perform
a hyperparameter sweep and store the relevant information in the project.

.. GENERATED FROM PYTHON SOURCE LINES 68-70

We search for the best value of the ``alpha`` hyperparameter of a Ridge
regression on the Diabetes dataset:

.. GENERATED FROM PYTHON SOURCE LINES 72-86

.. code-block:: Python

    import numpy as np
    from sklearn.datasets import load_diabetes
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import GridSearchCV

    X, y = load_diabetes(return_X_y=True)

    gs_cv = GridSearchCV(
        Ridge(),
        param_grid={"alpha": np.logspace(-3, 5, 50)},
        scoring="neg_root_mean_squared_error",
    )
    gs_cv.fit(X, y)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    GridSearchCV(estimator=Ridge(),
                 param_grid={'alpha': array([1.00000000e-03, 1.45634848e-03, 2.12095089e-03, 3.08884360e-03,
           4.49843267e-03, 6.55128557e-03, 9.54095476e-03, 1.38949549e-02,
           2.02358965e-02, 2.94705170e-02, 4.29193426e-02, 6.25055193e-02,
           9.10298178e-02, 1.32571137e-01, 1.93069773e-01, 2.81176870e-01,
           4.09491506e-01, 5.96362332e-01, 8.68511374e-01, 1.26485...
           3.72759372e+01, 5.42867544e+01, 7.90604321e+01, 1.15139540e+02,
           1.67683294e+02, 2.44205309e+02, 3.55648031e+02, 5.17947468e+02,
           7.54312006e+02, 1.09854114e+03, 1.59985872e+03, 2.32995181e+03,
           3.39322177e+03, 4.94171336e+03, 7.19685673e+03, 1.04811313e+04,
           1.52641797e+04, 2.22299648e+04, 3.23745754e+04, 4.71486636e+04,
           6.86648845e+04, 1.00000000e+05])},
                 scoring='neg_root_mean_squared_error')
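
The ``alpha`` grid used in the search above comes from
``np.logspace(-3, 5, 50)``, that is, 50 values evenly spaced on a log scale
between ``1e-3`` and ``1e5``. As a minimal sketch of what this means, the same
values can be rebuilt with plain Python. The helper name ``log_grid`` is
hypothetical and only used here for illustration:

.. code-block:: Python

    def log_grid(start_exp, stop_exp, num):
        """Return ``num`` values evenly spaced in log10 between two exponents.

        Hypothetical helper mirroring ``np.logspace(start_exp, stop_exp, num)``.
        """
        # Constant step between consecutive exponents
        step = (stop_exp - start_exp) / (num - 1)
        return [10 ** (start_exp + i * step) for i in range(num)]


    alphas = log_grid(-3, 5, 50)
    print(len(alphas))
    print(f"{alphas[1]:.8e}")

Each value is the previous one multiplied by the same factor, which is why the
grid covers eight orders of magnitude with only 50 candidates.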

.. GENERATED FROM PYTHON SOURCE LINES 87-89

Now, we store the metrics obtained for each hyperparameter value in a
dataframe and make a custom plot:

.. GENERATED FROM PYTHON SOURCE LINES 91-97

.. code-block:: Python

    import pandas as pd

    df = pd.DataFrame(gs_cv.cv_results_)
    df.insert(len(df.columns), "rmse", -df["mean_test_score"].values)
    df[["param_alpha", "rmse"]].head()

.. list-table::
   :header-rows: 1

   * -
     - param_alpha
     - rmse
   * - 0
     - 0.001000
     - 54.692670
   * - 1
     - 0.001456
     - 54.694527
   * - 2
     - 0.002121
     - 54.698033
   * - 3
     - 0.003089
     - 54.703919
   * - 4
     - 0.004498
     - 54.712676
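
The ``rmse`` column is the negated ``mean_test_score`` because the
``neg_root_mean_squared_error`` scorer reports the RMSE with a minus sign, so
that a higher score always means a better model. A small sketch of this sign
convention, using only the five rows displayed above (so the selected ``alpha``
is the best among these rows, not necessarily over the whole grid):

.. code-block:: Python

    # (alpha, mean_test_score) pairs, where the score is the negated RMSE.
    # The numbers are the five rows of the table above, with the sign restored.
    scores = [
        (0.001000, -54.692670),
        (0.001456, -54.694527),
        (0.002121, -54.698033),
        (0.003089, -54.703919),
        (0.004498, -54.712676),
    ]

    # Maximizing the negated RMSE is the same as minimizing the RMSE.
    best_alpha, best_score = max(scores, key=lambda pair: pair[1])
    best_rmse = -best_score
    print(best_alpha, best_rmse)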

.. GENERATED FROM PYTHON SOURCE LINES 98-108

.. code-block:: Python

    import matplotlib.pyplot as plt

    fig = plt.figure(layout="constrained")
    plt.plot(df["param_alpha"], df["rmse"])
    plt.xscale("log")
    plt.xlabel("Alpha hyperparameter")
    plt.ylabel("RMSE")
    plt.title("Ridge regression")
    plt.show()

.. image-sg:: /auto_examples/getting_started/images/sphx_glr_plot_skore_getting_started_001.png
   :alt: Ridge regression
   :srcset: /auto_examples/getting_started/images/sphx_glr_plot_skore_getting_started_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 109-111

Finally, we store these relevant items in our skore project, so that we can
visualize them later:

.. GENERATED FROM PYTHON SOURCE LINES 114-118

.. code-block:: Python

    my_project.put("my_gs_cv", gs_cv)
    my_project.put("my_df", df)
    my_project.put("my_fig", fig)

.. GENERATED FROM PYTHON SOURCE LINES 119-124

.. seealso::

   For more information about the functionalities and the different types
   of items that we can store in a skore :class:`~skore.Project`,
   see :ref:`example_working_with_projects`.

.. GENERATED FROM PYTHON SOURCE LINES 126-128

Tracking the history of items
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. GENERATED FROM PYTHON SOURCE LINES 130-147

Suppose we store several values for the same item called ``my_key_metric``:

.. code-block:: python

    my_project.put("my_key_metric", 4)

    my_project.put("my_key_metric", 9)

    my_project.put("my_key_metric", 16)

Skore does not overwrite items with the same name (key): instead, it stores
their history so that nothing is lost.

These tracking functionalities are very useful to:

* never lose key machine learning metrics,
* observe their evolution over time and across runs.

.. GENERATED FROM PYTHON SOURCE LINES 149-153

.. seealso::

   For more information about the tracking of items using their history,
   see :ref:`example_tracking_items`.

.. GENERATED FROM PYTHON SOURCE LINES 155-161

Machine learning diagnostics and evaluation
===========================================

Skore re-implements or wraps some key scikit-learn classes and functions to
automatically provide diagnostics and checks when using them, as a way to
facilitate good practices and avoid common pitfalls.

.. GENERATED FROM PYTHON SOURCE LINES 163-171

Model evaluation with skore
^^^^^^^^^^^^^^^^^^^^^^^^^^^

To assist users during development, skore provides the
:class:`skore.EstimatorReport` class.

Let us load some synthetic data and get the estimator report for a
:class:`~sklearn.linear_model.LogisticRegression`:

.. GENERATED FROM PYTHON SOURCE LINES 173-188

.. code-block:: Python

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    from skore import EstimatorReport

    X, y = make_classification(n_classes=2, n_samples=100_000, n_informative=4)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    clf = LogisticRegression()

    est_report = EstimatorReport(
        clf, X_train=X_train, X_test=X_test, y_train=y_train, y_test=y_test
    )

.. GENERATED FROM PYTHON SOURCE LINES 189-191

Now, we can display the help tree to see all the insights that are available
to us, given that we are doing binary classification:

.. GENERATED FROM PYTHON SOURCE LINES 193-195

.. code-block:: Python

    est_report.help()

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    ╭────────────────── Tools to diagnose estimator LogisticRegression ───────────────────╮
    │ report                                                                              │
    │ ├── .metrics                                                                        │
    │ │   ├── .accuracy(...)          (↗︎)  - Compute the accuracy score.                  │
    │ │   ├── .brier_score(...)       (↘︎)  - Compute the Brier score.                     │
    │ │   ├── .log_loss(...)          (↘︎)  - Compute the log loss.                        │
    │ │   ├── .precision(...)         (↗︎)  - Compute the precision score.                 │
    │ │   ├── .recall(...)            (↗︎)  - Compute the recall score.                    │
    │ │   ├── .roc_auc(...)           (↗︎)  - Compute the ROC AUC score.                   │
    │ │   ├── .custom_metric(...)          - Compute a custom metric.                     │
    │ │   ├── .report_metrics(...)         - Report a set of metrics for our estimator.   │
    │ │   └── .plot                                                                       │
    │ │       ├── .precision_recall(...)   - Plot the precision-recall curve.             │
    │ │       └── .roc(...)                - Plot the ROC curve.                          │
    │ ├── .cache_predictions(...)          - Cache estimator's predictions.               │
    │ ├── .clear_cache(...)                - Clear the cache.                             │
    │ └── Attributes                                                                      │
    │     ├── .X_test                                                                     │
    │     ├── .X_train                                                                    │
    │     ├── .y_test                                                                     │
    │     ├── .y_train                                                                    │
    │     ├── .estimator_                                                                 │
    │     └── .estimator_name_                                                            │
    │                                                                                     │
    │                                                                                     │
    │ Legend:                                                                             │
    │ (↗︎) higher is better (↘︎) lower is better                                            │
    ╰─────────────────────────────────────────────────────────────────────────────────────╯

.. GENERATED FROM PYTHON SOURCE LINES 196-197

We can get the report metrics that were computed for us:

.. GENERATED FROM PYTHON SOURCE LINES 199-202

.. code-block:: Python

    df_est_report_metrics = est_report.metrics.report_metrics()
    df_est_report_metrics

.. list-table::
   :header-rows: 2

   * - Metric
     - Precision (↗︎)
     -
     - Recall (↗︎)
     -
     - ROC AUC (↗︎)
     - Brier score (↘︎)
   * - Class label
     - 0
     - 1
     - 0
     - 1
     -
     -
   * - LogisticRegression
     - 0.654552
     - 0.669305
     - 0.686292
     - 0.636757
     - 0.720915
     - 0.213467
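
The arrows in the column headers indicate the direction in which each metric
improves. As an illustration of a "lower is better" metric, here is a minimal
sketch of the Brier score for binary classification, that is, the mean squared
difference between the predicted probability of the positive class and the
observed outcome. The function ``brier_score`` and the toy numbers are made up
for this example and are not part of skore:

.. code-block:: Python

    def brier_score(y_true, y_prob):
        """Mean squared difference between predicted probabilities and outcomes."""
        if len(y_true) != len(y_prob):
            raise ValueError("y_true and y_prob must have the same length")
        return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


    y_true = [0, 1, 1, 0]
    confident = [0.1, 0.9, 0.8, 0.2]  # probabilities close to the true labels
    uninformative = [0.5, 0.5, 0.5, 0.5]  # always predicts 0.5

    score_confident = brier_score(y_true, confident)
    score_uninformative = brier_score(y_true, uninformative)
    print(score_confident, score_uninformative)

A model that always predicts a probability of 0.5 gets a Brier score of 0.25,
which gives a rough reference point when reading the value reported above.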

.. GENERATED FROM PYTHON SOURCE LINES 203-204

We can also plot the ROC curve that was generated for us:

.. GENERATED FROM PYTHON SOURCE LINES 206-212

.. code-block:: Python

    import matplotlib.pyplot as plt

    roc_plot = est_report.metrics.plot.roc()
    roc_plot
    plt.tight_layout()

.. image-sg:: /auto_examples/getting_started/images/sphx_glr_plot_skore_getting_started_002.png
   :alt: plot skore getting started
   :srcset: /auto_examples/getting_started/images/sphx_glr_plot_skore_getting_started_002.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 213-217

.. seealso::

   For more information about the motivation and usage of
   :class:`skore.EstimatorReport`, see :ref:`example_estimator_report`.

.. GENERATED FROM PYTHON SOURCE LINES 220-225

Cross-validation with skore
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Skore also provides a :class:`skore.CrossValidationReport` class that contains
one :class:`skore.EstimatorReport` per fold.

.. GENERATED FROM PYTHON SOURCE LINES 227-231

.. code-block:: Python

    from skore import CrossValidationReport

    cv_report = CrossValidationReport(clf, X, y, cv_splitter=5)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Processing cross-validation ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% for LogisticRegression

.. GENERATED FROM PYTHON SOURCE LINES 232-233

We display the cross-validation report helper:

.. GENERATED FROM PYTHON SOURCE LINES 235-237

.. code-block:: Python

    cv_report.help()

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    ╭─────────────────── Tools to diagnose estimator LogisticRegression ───────────────────╮
    │ report                                                                               │
    │ ├── .metrics                                                                         │
    │ │   ├── .accuracy(...)          (↗︎)  - Compute the accuracy score.                   │
    │ │   ├── .brier_score(...)       (↘︎)  - Compute the Brier score.                      │
    │ │   ├── .log_loss(...)          (↘︎)  - Compute the log loss.                         │
    │ │   ├── .precision(...)         (↗︎)  - Compute the precision score.                  │
    │ │   ├── .recall(...)            (↗︎)  - Compute the recall score.                     │
    │ │   ├── .roc_auc(...)           (↗︎)  - Compute the ROC AUC score.                    │
    │ │   ├── .custom_metric(...)          - Compute a custom metric.                      │
    │ │   ├── .report_metrics(...)         - Report a set of metrics for our estimator.    │
    │ │   └── .plot                                                                        │
    │ │       ├── .precision_recall(...)   - Plot the precision-recall curve.              │
    │ │       └── .roc(...)                - Plot the ROC curve.                           │
    │ ├── .cache_predictions(...)          - Cache the predictions for sub-estimators      │
    │ │                                      reports.                                      │
    │ ├── .clear_cache(...)                - Clear the cache.                              │
    │ └── Attributes                                                                       │
    │     ├── .X                                                                           │
    │     ├── .y                                                                           │
    │     ├── .estimator_                                                                  │
    │     ├── .estimator_name_                                                             │
    │     ├── .estimator_reports_                                                          │
    │     └── .n_jobs                                                                      │
    │                                                                                      │
    │                                                                                      │
    │ Legend:                                                                              │
    │ (↗︎) higher is better (↘︎) lower is better                                             │
    ╰──────────────────────────────────────────────────────────────────────────────────────╯

.. GENERATED FROM PYTHON SOURCE LINES 238-239

We display the metrics for each fold:

.. GENERATED FROM PYTHON SOURCE LINES 241-244

.. code-block:: Python

    df_cv_report_metrics = cv_report.metrics.report_metrics()
    df_cv_report_metrics

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Compute metric for each split ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100%

.. list-table::
   :header-rows: 2

   * -
     - Metric
     - Precision (↗︎)
     -
     - Recall (↗︎)
     -
     - ROC AUC (↗︎)
     - Brier score (↘︎)
   * -
     - Class label
     - 0
     - 1
     - 0
     - 1
     -
     -
   * - LogisticRegression
     - Split #0
     - 0.660040
     - 0.673203
     - 0.686888
     - 0.645787
     - 0.727659
     - 0.211502
   * -
     - Split #1
     - 0.655163
     - 0.676556
     - 0.698181
     - 0.632079
     - 0.722143
     - 0.213528
   * -
     - Split #2
     - 0.656199
     - 0.672321
     - 0.689186
     - 0.638483
     - 0.722865
     - 0.212927
   * -
     - Split #3
     - 0.657407
     - 0.676296
     - 0.695383
     - 0.637182
     - 0.726331
     - 0.211556
   * -
     - Split #4
     - 0.657320
     - 0.672727
     - 0.688787
     - 0.640484
     - 0.724441
     - 0.212437
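
Per-fold values are often summarized by their mean and spread across splits.
As a minimal sketch using only the standard library, with the ROC AUC values
copied by hand from the table above (this is not a skore API):

.. code-block:: Python

    from statistics import mean, stdev

    # ROC AUC for Split #0 to Split #4, copied from the table above
    roc_auc_per_split = [0.727659, 0.722143, 0.722865, 0.726331, 0.724441]

    roc_auc_mean = mean(roc_auc_per_split)
    roc_auc_std = stdev(roc_auc_per_split)  # sample standard deviation
    print(f"ROC AUC: {roc_auc_mean:.4f} +/- {roc_auc_std:.4f}")

A spread that is small relative to the mean, as here, suggests that the score
does not depend much on which fold is held out.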

.. GENERATED FROM PYTHON SOURCE LINES 245-246

We display the ROC curves for each fold:

.. GENERATED FROM PYTHON SOURCE LINES 248-252

.. code-block:: Python

    roc_plot = cv_report.metrics.plot.roc()
    roc_plot
    plt.tight_layout()

.. image-sg:: /auto_examples/getting_started/images/sphx_glr_plot_skore_getting_started_003.png
   :alt: plot skore getting started
   :srcset: /auto_examples/getting_started/images/sphx_glr_plot_skore_getting_started_003.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Computing predictions for display ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100%

.. GENERATED FROM PYTHON SOURCE LINES 253-257

.. seealso::

   For more information about the motivation and usage of
   :class:`skore.CrossValidationReport`, see
   :ref:`example_use_case_employee_salaries`.

.. GENERATED FROM PYTHON SOURCE LINES 259-266

Train-test split with skore
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Skore provides a :func:`skore.train_test_split` function that wraps
scikit-learn's :func:`sklearn.model_selection.train_test_split`.

Let us load a dataset containing some time series data:

.. GENERATED FROM PYTHON SOURCE LINES 268-275

.. code-block:: Python

    from skrub.datasets import fetch_employee_salaries

    dataset = fetch_employee_salaries()
    X, y = dataset.X, dataset.y
    X["date_first_hired"] = pd.to_datetime(X["date_first_hired"])
    X.head(2)

.. list-table::
   :header-rows: 1

   * -
     - gender
     - department
     - department_name
     - division
     - assignment_category
     - employee_position_title
     - date_first_hired
     - year_first_hired
   * - 0
     - F
     - POL
     - Department of Police
     - MSB Information Mgmt and Tech Division Records...
     - Fulltime-Regular
     - Office Services Coordinator
     - 1986-09-22
     - 1986
   * - 1
     - M
     - POL
     - Department of Police
     - ISB Major Crimes Division Fugitive Section
     - Fulltime-Regular
     - Master Police Officer
     - 1988-09-12
     - 1988

.. GENERATED FROM PYTHON SOURCE LINES 276-278

We can observe that there is a ``date_first_hired`` column, which is
time-based. Now, let us apply :func:`skore.train_test_split` on this data:

.. GENERATED FROM PYTHON SOURCE LINES 280-284

.. code-block:: Python

    X_train, X_test, y_train, y_test = skore.train_test_split(
        X, y, random_state=0, shuffle=False
    )

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    ╭─────────────────────────────── TimeBasedColumnWarning ───────────────────────────────╮
    │ We detected some time-based columns (column "date_first_hired") in your data. We     │
    │ recommend using scikit-learn's TimeSeriesSplit instead of train_test_split.          │
    │ Otherwise you might train on future data to predict the past, or get inflated model  │
    │ performance evaluation because natural drift will not be taken into account.         │
    ╰──────────────────────────────────────────────────────────────────────────────────────╯

.. GENERATED FROM PYTHON SOURCE LINES 285-288

We get a ``TimeBasedColumnWarning`` advising us to use
:class:`sklearn.model_selection.TimeSeriesSplit` instead! Indeed, we should
not shuffle time-ordered data!

.. GENERATED FROM PYTHON SOURCE LINES 291-296

.. seealso::

   More methodological advice is available.
   For more information about the motivation and usage of
   :func:`skore.train_test_split`, see :ref:`example_train_test_split`.

.. GENERATED FROM PYTHON SOURCE LINES 298-306

.. admonition:: Stay tuned!

   These are only the initial features: skore is a work in progress and aims
   to be an end-to-end library for data scientists.
   Feedback is welcome: please feel free to join our Discord or create an
   issue.

.. GENERATED FROM PYTHON SOURCE LINES 308-313

Clean up the project
--------------------

Let's clear the skore project (to avoid any conflict with other documentation
examples).

.. GENERATED FROM PYTHON SOURCE LINES 315-316

.. code-block:: Python

    my_project.clear()

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 4.846 seconds)