.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "auto_examples/technical_details/plot_cache_mechanism.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code. .. rst-class:: sphx-glr-example-title .. _sphx_glr_auto_examples_technical_details_plot_cache_mechanism.py: .. _example_cache_mechanism: =============== Cache mechanism =============== This example shows how :class:`~skore.EstimatorReport` and :class:`~skore.CrossValidationReport` use caching to speed up computations. .. GENERATED FROM PYTHON SOURCE LINES 13-19 Loading some data ================= First, we load a dataset from `skrub`. Our goal is to predict if a company paid a physician. The ultimate goal is to detect potential conflict of interest when it comes to the actual problem that we want to solve. .. GENERATED FROM PYTHON SOURCE LINES 19-25 .. code-block:: Python from skrub.datasets import fetch_open_payments dataset = fetch_open_payments() df = dataset.X y = dataset.y .. GENERATED FROM PYTHON SOURCE LINES 26-30 .. code-block:: Python from skrub import TableReport TableReport(df) .. raw:: html

	Applicable_Manufacturer_or_Applicable_GPO_Making_Payment_Name	Dispute_Status_for_Publication	Name_of_Associated_Covered_Device_or_Medical_Supply1	Name_of_Associated_Covered_Drug_or_Biological1	Physician_Specialty
0	ELI LILLY AND COMPANY	No			Allopathic & Osteopathic Physicians\|Pediatrics\|Pediatric Rheumatology
1	ELI LILLY AND COMPANY	No			Allopathic & Osteopathic Physicians\|Internal Medicine\|Nephrology
2	ELI LILLY AND COMPANY	No			Allopathic & Osteopathic Physicians\|Internal Medicine\|Rheumatology
3	ELI LILLY AND COMPANY	No			Allopathic & Osteopathic Physicians\|Internal Medicine\|Endocrinology, Diabetes & Metabolism
4	ELI LILLY AND COMPANY	No		EFFIENT	Allopathic & Osteopathic Physicians\|Pediatrics\|Pediatric Hematology-Oncology

73,553	GlaxoSmithKline, LLC.	No		ZIAGEN
73,554	ALERE SCARBOROUGH, INC.	No	Alere PBP2a
73,555	NovoCure Limited	No
73,556	Wright Medical Technology, Inc.	No		HIPS
73,557	Alcon Research Ltd	No		Express

Column	Column name	dtype	Is sorted	Null values	Unique values
0	Applicable_Manufacturer_or_Applicable_GPO_Making_Payment_Name	ObjectDType	False	0 (0.0%)	1466 (2.0%)
1	Dispute_Status_for_Publication	ObjectDType	False	0 (0.0%)	2 (< 0.1%)
2	Name_of_Associated_Covered_Device_or_Medical_Supply1	ObjectDType	False	43088 (58.6%)	4372 (5.9%)
3	Name_of_Associated_Covered_Drug_or_Biological1	ObjectDType	False	36233 (49.3%)	2262 (3.1%)
4	Physician_Specialty	ObjectDType	False	3996 (5.4%)	513 (0.7%)

Column 1	Column 2	Cramér's V
Name_of_Associated_Covered_Device_or_Medical_Supply1	Name_of_Associated_Covered_Drug_or_Biological1	0.261
Applicable_Manufacturer_or_Applicable_GPO_Making_Payment_Name	Name_of_Associated_Covered_Drug_or_Biological1	0.257
Applicable_Manufacturer_or_Applicable_GPO_Making_Payment_Name	Name_of_Associated_Covered_Device_or_Medical_Supply1	0.156
Dispute_Status_for_Publication	Name_of_Associated_Covered_Device_or_Medical_Supply1	0.125
Dispute_Status_for_Publication	Physician_Specialty	0.0894
Name_of_Associated_Covered_Device_or_Medical_Supply1	Physician_Specialty	0.0634
Applicable_Manufacturer_or_Applicable_GPO_Making_Payment_Name	Physician_Specialty	0.0588
Name_of_Associated_Covered_Drug_or_Biological1	Physician_Specialty	0.0583
Applicable_Manufacturer_or_Applicable_GPO_Making_Payment_Name	Dispute_Status_for_Publication	0.0508
Dispute_Status_for_Publication	Name_of_Associated_Covered_Drug_or_Biological1	0.0188

Please enable javascript

The skrub table reports need javascript to display correctly. If you are displaying a report in a Jupyter notebook and you see this message, you may need to re-execute the cell or to trust the notebook (button on the top right or "File > Trust notebook").

.. GENERATED FROM PYTHON SOURCE LINES 31-35 .. code-block:: Python import pandas as pd TableReport(pd.DataFrame(y)) .. raw:: html

	status
0	disallowed
1	disallowed
2	disallowed
3	disallowed
4	disallowed

73,553	allowed
73,554	allowed
73,555	allowed
73,556	allowed
73,557	allowed

Column	Column name	dtype	Is sorted	Null values	Unique values	Mean	Std	Min	Median	Max
0	status	ObjectDType	True	0 (0.0%)	2 (< 0.1%)

Please enable javascript

.. GENERATED FROM PYTHON SOURCE LINES 36-38 The dataset has over 70,000 records with only categorical features. Some categories are not well defined. .. GENERATED FROM PYTHON SOURCE LINES 41-46 Caching with :class:`~skore.EstimatorReport` and :class:`~skore.CrossValidationReport` ====================================================================================== We use `skrub` to create a simple predictive model that handles our dataset's challenges. .. GENERATED FROM PYTHON SOURCE LINES 46-52 .. code-block:: Python from skrub import tabular_learner model = tabular_learner("classifier") model .. rst-class:: sphx-glr-script-out .. code-block:: none /opt/hostedtoolcache/Python/3.12.11/x64/lib/python3.12/site-packages/skrub/_tabular_pipeline.py:75: FutureWarning: tabular_learner will be deprecated in the next release. Equivalent functionality is available in skrub.tabular_pipeline. .. raw:: html

Pipeline(steps=[('tablevectorizer',
                     TableVectorizer(low_cardinality=ToCategorical())),
                    ('histgradientboostingclassifier',
                     HistGradientBoostingClassifier())])

In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.

Pipeline

?Documentation for PipelineiNot fitted

Parameters

	steps	[('tablevectorizer', ...), ('histgradientboostingclassifier', ...)]
	transform_input	None
	memory	None
	verbose	False

tablevectorizer: TableVectorizer

Parameters

	cardinality_threshold	40
	low_cardinality	ToCategorical()
	high_cardinality	StringEncoder()
	numeric	PassThrough()
	datetime	DatetimeEncoder()
	specific_transformers	()
	drop_null_fraction	1.0
	drop_if_constant	False
	drop_if_unique	False
	datetime_format	None
	n_jobs	None

numeric

PassThrough

Parameters

datetime

DatetimeEncoder

Parameters

	resolution	'hour'
	add_weekday	False
	add_total_seconds	True
	add_day_of_year	False
	periodic_encoding	None

low_cardinality

ToCategorical

Parameters

high_cardinality

StringEncoder

Parameters

	n_components	30
	vectorizer	'tfidf'
	ngram_range	(3, ...)
	analyzer	'char_wb'
	stop_words	None
	random_state	None

HistGradientBoostingClassifier

?Documentation for HistGradientBoostingClassifier

Parameters

	loss	'log_loss'
	learning_rate	0.1
	max_iter	100
	max_leaf_nodes	31
	max_depth	None
	min_samples_leaf	20
	l2_regularization	0.0
	max_features	1.0
	max_bins	255
	categorical_features	'from_dtype'
	monotonic_cst	None
	interaction_cst	None
	warm_start	False
	early_stopping	'auto'
	scoring	'loss'
	validation_fraction	0.1
	n_iter_no_change	10
	tol	1e-07
	verbose	0
	random_state	None
	class_weight	None

.. GENERATED FROM PYTHON SOURCE LINES 53-55 This model handles all types of data: numbers, categories, dates, and missing values. Let's train it on part of our dataset. .. GENERATED FROM PYTHON SOURCE LINES 56-64 .. code-block:: Python from skore import train_test_split X_train, X_test, y_train, y_test = train_test_split(df, y, random_state=42) # Let's keep a completely separate dataset X_train, X_external, y_train, y_external = train_test_split( X_train, y_train, random_state=42 ) .. rst-class:: sphx-glr-script-out .. code-block:: none ╭───────────────────────────── HighClassImbalanceWarning ──────────────────────────────╮ │ It seems that you have a classification problem with a high class imbalance. In this │ │ case, using train_test_split may not be a good idea because of high variability in │ │ the scores obtained on the test set. To tackle this challenge we suggest to use │ │ skore's CrossValidationReport with the `splitter` parameter of your choice. │ ╰──────────────────────────────────────────────────────────────────────────────────────╯ ╭───────────────────────────────── ShuffleTrueWarning ─────────────────────────────────╮ │ We detected that the `shuffle` parameter is set to `True` either explicitly or from │ │ its default value. In case of time-ordered events (even if they are independent), │ │ this will result in inflated model performance evaluation because natural drift will │ │ not be taken into account. We recommend setting the shuffle parameter to `False` in │ │ order to ensure the evaluation process is really representative of your production │ │ release process. │ ╰──────────────────────────────────────────────────────────────────────────────────────╯ ╭───────────────────────────── HighClassImbalanceWarning ──────────────────────────────╮ │ It seems that you have a classification problem with a high class imbalance. In this │ │ case, using train_test_split may not be a good idea because of high variability in │ │ the scores obtained on the test set. To tackle this challenge we suggest to use │ │ skore's CrossValidationReport with the `splitter` parameter of your choice. │ ╰──────────────────────────────────────────────────────────────────────────────────────╯ ╭───────────────────────────────── ShuffleTrueWarning ─────────────────────────────────╮ │ We detected that the `shuffle` parameter is set to `True` either explicitly or from │ │ its default value. In case of time-ordered events (even if they are independent), │ │ this will result in inflated model performance evaluation because natural drift will │ │ not be taken into account. We recommend setting the shuffle parameter to `False` in │ │ order to ensure the evaluation process is really representative of your production │ │ release process. │ ╰──────────────────────────────────────────────────────────────────────────────────────╯ .. GENERATED FROM PYTHON SOURCE LINES 65-73 Caching the predictions for fast metric computation ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ First, we focus on :class:`~skore.EstimatorReport`, as the same philosophy will apply to :class:`~skore.CrossValidationReport`. Let's explore how :class:`~skore.EstimatorReport` uses caching to speed up predictions. We start by training the model: .. GENERATED FROM PYTHON SOURCE LINES 73-80 .. code-block:: Python from skore import EstimatorReport report = EstimatorReport( model, X_train=X_train, y_train=y_train, X_test=X_test, y_test=y_test ) report.help() .. rst-class:: sphx-glr-script-out .. code-block:: none ╭───────────── Tools to diagnose estimator HistGradientBoostingClassifier ─────────────╮ │ EstimatorReport │ │ ├── .metrics │ │ │ ├── .accuracy(...) (↗︎) - Compute the accuracy score. │ │ │ ├── .brier_score(...) (↘︎) - Compute the Brier score. │ │ │ ├── .confusion_matrix(...) - Plot the confusion matrix. │ │ │ ├── .log_loss(...) (↘︎) - Compute the log loss. │ │ │ ├── .precision(...) (↗︎) - Compute the precision score. │ │ │ ├── .precision_recall(...) - Plot the precision-recall curve. │ │ │ ├── .recall(...) (↗︎) - Compute the recall score. │ │ │ ├── .roc(...) - Plot the ROC curve. │ │ │ ├── .roc_auc(...) (↗︎) - Compute the ROC AUC score. │ │ │ ├── .timings(...) - Get all measured processing times related │ │ │ │ to the estimator. │ │ │ ├── .custom_metric(...) - Compute a custom metric. │ │ │ └── .summarize(...) - Report a set of metrics for our estimator. │ │ ├── .feature_importance │ │ │ └── .permutation(...) - Report the permutation feature importance. │ │ ├── .data │ │ │ └── .analyze(...) - Plot dataset statistics. │ │ ├── .cache_predictions(...) - Cache estimator's predictions. │ │ ├── .clear_cache(...) - Clear the cache. │ │ ├── .get_predictions(...) - Get estimator's predictions. │ │ └── Attributes │ │ ├── .X_test - Testing data │ │ ├── .X_train - Training data │ │ ├── .y_test - Testing target │ │ ├── .y_train - Training target │ │ ├── .estimator - Estimator to make the report from │ │ ├── .estimator_ - The cloned or copied estimator │ │ ├── .estimator_name_ - The name of the estimator │ │ ├── .fit - Whether to fit the estimator on the │ │ │ training data │ │ ├── .fit_time_ - The time taken to fit the estimator, in │ │ │ seconds │ │ ├── .ml_task - No description available │ │ └── .pos_label - For binary classification, the positive │ │ class │ │ │ │ │ │ Legend: │ │ (↗︎) higher is better (↘︎) lower is better │ ╰──────────────────────────────────────────────────────────────────────────────────────╯ .. GENERATED FROM PYTHON SOURCE LINES 81-82 We compute the accuracy on our test set and measure how long it takes: .. GENERATED FROM PYTHON SOURCE LINES 83-90 .. code-block:: Python import time start = time.time() result = report.metrics.accuracy() end = time.time() result .. rst-class:: sphx-glr-script-out .. code-block:: none 0.9511147362697118 .. GENERATED FROM PYTHON SOURCE LINES 91-93 .. code-block:: Python print(f"Time taken: {end - start:.2f} seconds") .. rst-class:: sphx-glr-script-out .. code-block:: none Time taken: 1.57 seconds .. GENERATED FROM PYTHON SOURCE LINES 94-95 For comparison, here's how scikit-learn computes the same accuracy score: .. GENERATED FROM PYTHON SOURCE LINES 96-103 .. code-block:: Python from sklearn.metrics import accuracy_score start = time.time() result = accuracy_score(report.y_test, report.estimator_.predict(report.X_test)) end = time.time() result .. rst-class:: sphx-glr-script-out .. code-block:: none 0.9511147362697118 .. GENERATED FROM PYTHON SOURCE LINES 104-106 .. code-block:: Python print(f"Time taken: {end - start:.2f} seconds") .. rst-class:: sphx-glr-script-out .. code-block:: none Time taken: 1.59 seconds .. GENERATED FROM PYTHON SOURCE LINES 107-111 Both approaches take similar time. Now, watch what happens when we compute the accuracy again with our skore estimator report: .. GENERATED FROM PYTHON SOURCE LINES 112-117 .. code-block:: Python start = time.time() result = report.metrics.accuracy() end = time.time() result .. rst-class:: sphx-glr-script-out .. code-block:: none 0.9511147362697118 .. GENERATED FROM PYTHON SOURCE LINES 118-120 .. code-block:: Python print(f"Time taken: {end - start:.2f} seconds") .. rst-class:: sphx-glr-script-out .. code-block:: none Time taken: 0.00 seconds .. GENERATED FROM PYTHON SOURCE LINES 121-123 The second calculation is instant! This happens because the report saves previous calculations in its cache. Let's look inside the cache: .. GENERATED FROM PYTHON SOURCE LINES 124-126 .. code-block:: Python report._cache .. rst-class:: sphx-glr-script-out .. code-block:: none {(np.int64(-4239121635448246093), None, 'predict', 'test', None): array(['disallowed', 'disallowed', 'disallowed', ..., 'disallowed', 'disallowed', 'disallowed'], shape=(18390,), dtype=object), (np.int64(-4239121635448246093), 'test', None, 'predict_time'): 1.5536823679999543, (np.int64(-4239121635448246093), 'accuracy_score', 'test'): 0.9511147362697118} .. GENERATED FROM PYTHON SOURCE LINES 127-130 The cache stores predictions by type and data source. This means that computing metrics that use the same type of predictions will be faster. Let's try the precision metric: .. GENERATED FROM PYTHON SOURCE LINES 130-135 .. code-block:: Python start = time.time() result = report.metrics.precision() end = time.time() result .. rst-class:: sphx-glr-script-out .. code-block:: none {'allowed': 0.6633785450061652, 'disallowed': 0.9643893281756641} .. GENERATED FROM PYTHON SOURCE LINES 136-138 .. code-block:: Python print(f"Time taken: {end - start:.2f} seconds") .. rst-class:: sphx-glr-script-out .. code-block:: none Time taken: 0.06 seconds .. GENERATED FROM PYTHON SOURCE LINES 139-144 We observe that it takes only a few milliseconds to compute the precision because we don't need to re-compute the predictions and only have to compute the precision metric itself. Since the predictions are the bottleneck in terms of computation time, we observe an interesting speedup. .. GENERATED FROM PYTHON SOURCE LINES 146-150 Caching all the possible predictions at once ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ We can pre-compute all predictions at once using parallel processing: .. GENERATED FROM PYTHON SOURCE LINES 150-152 .. code-block:: Python report.cache_predictions(n_jobs=4) .. GENERATED FROM PYTHON SOURCE LINES 153-155 Now, all possible predictions are stored. Any metric calculation will be much faster, even on different data (like the training set): .. GENERATED FROM PYTHON SOURCE LINES 156-161 .. code-block:: Python start = time.time() result = report.metrics.log_loss(data_source="train") end = time.time() result .. rst-class:: sphx-glr-script-out .. code-block:: none 0.10310973983150455 .. GENERATED FROM PYTHON SOURCE LINES 162-164 .. code-block:: Python print(f"Time taken: {end - start:.2f} seconds") .. rst-class:: sphx-glr-script-out .. code-block:: none Time taken: 0.08 seconds .. GENERATED FROM PYTHON SOURCE LINES 165-170 Caching external data ^^^^^^^^^^^^^^^^^^^^^ The report can also work with external data. We use `data_source="X_y"` to indicate that we want to pass those external data. .. GENERATED FROM PYTHON SOURCE LINES 170-175 .. code-block:: Python start = time.time() result = report.metrics.log_loss(data_source="X_y", X=X_external, y=y_external) end = time.time() result .. rst-class:: sphx-glr-script-out .. code-block:: none 0.12791143480330047 .. GENERATED FROM PYTHON SOURCE LINES 176-178 .. code-block:: Python print(f"Time taken: {end - start:.2f} seconds") .. rst-class:: sphx-glr-script-out .. code-block:: none Time taken: 1.35 seconds .. GENERATED FROM PYTHON SOURCE LINES 179-182 The first calculation of the above cell is slower than when using the internal train or test sets because it needs to compute a hash of the new data for later retrieval. Let's calculate it again: .. GENERATED FROM PYTHON SOURCE LINES 183-188 .. code-block:: Python start = time.time() result = report.metrics.log_loss(data_source="X_y", X=X_external, y=y_external) end = time.time() result .. rst-class:: sphx-glr-script-out .. code-block:: none 0.12791143480330047 .. GENERATED FROM PYTHON SOURCE LINES 189-191 .. code-block:: Python print(f"Time taken: {end - start:.2f} seconds") .. rst-class:: sphx-glr-script-out .. code-block:: none Time taken: 0.14 seconds .. GENERATED FROM PYTHON SOURCE LINES 192-195 It is much faster for the second time as the predictions are cached! The remaining time corresponds to the hash computation. Let's compute the ROC AUC on the same data: .. GENERATED FROM PYTHON SOURCE LINES 196-201 .. code-block:: Python start = time.time() result = report.metrics.roc_auc(data_source="X_y", X=X_external, y=y_external) end = time.time() result .. rst-class:: sphx-glr-script-out .. code-block:: none 0.9304604856924651 .. GENERATED FROM PYTHON SOURCE LINES 202-204 .. code-block:: Python print(f"Time taken: {end - start:.2f} seconds") .. rst-class:: sphx-glr-script-out .. code-block:: none Time taken: 0.16 seconds .. GENERATED FROM PYTHON SOURCE LINES 205-208 We observe that the computation is already efficient because it boils down to two computations: the hash of the data and the ROC-AUC metric. We save a lot of time because we don't need to re-compute the predictions. .. GENERATED FROM PYTHON SOURCE LINES 210-214 Caching for plotting ^^^^^^^^^^^^^^^^^^^^ The cache also speeds up plots. Let's create a ROC curve: .. GENERATED FROM PYTHON SOURCE LINES 214-220 .. code-block:: Python start = time.time() display = report.metrics.roc(pos_label="allowed") display.plot() end = time.time() .. image-sg:: /auto_examples/technical_details/images/sphx_glr_plot_cache_mechanism_001.png :alt: ROC Curve for HistGradientBoostingClassifier :srcset: /auto_examples/technical_details/images/sphx_glr_plot_cache_mechanism_001.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 221-223 .. code-block:: Python print(f"Time taken: {end - start:.2f} seconds") .. rst-class:: sphx-glr-script-out .. code-block:: none Time taken: 0.03 seconds .. GENERATED FROM PYTHON SOURCE LINES 224-225 The second plot is instant because it uses cached data: .. GENERATED FROM PYTHON SOURCE LINES 226-231 .. code-block:: Python start = time.time() display = report.metrics.roc(pos_label="allowed") display.plot() end = time.time() .. image-sg:: /auto_examples/technical_details/images/sphx_glr_plot_cache_mechanism_002.png :alt: ROC Curve for HistGradientBoostingClassifier :srcset: /auto_examples/technical_details/images/sphx_glr_plot_cache_mechanism_002.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 232-234 .. code-block:: Python print(f"Time taken: {end - start:.2f} seconds") .. rst-class:: sphx-glr-script-out .. code-block:: none Time taken: 0.01 seconds .. GENERATED FROM PYTHON SOURCE LINES 235-237 We only use the cache to retrieve the `display` object and not directly the matplotlib figure. It means that we can still customize the cached plot before displaying it: .. GENERATED FROM PYTHON SOURCE LINES 238-240 .. code-block:: Python display.plot(roc_curve_kwargs={"color": "tab:orange"}) .. image-sg:: /auto_examples/technical_details/images/sphx_glr_plot_cache_mechanism_003.png :alt: ROC Curve for HistGradientBoostingClassifier :srcset: /auto_examples/technical_details/images/sphx_glr_plot_cache_mechanism_003.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 241-242 Be aware that we can clear the cache if we want to: .. GENERATED FROM PYTHON SOURCE LINES 243-246 .. code-block:: Python report.clear_cache() report._cache .. rst-class:: sphx-glr-script-out .. code-block:: none {} .. GENERATED FROM PYTHON SOURCE LINES 247-254 It means that nothing is stored anymore in the cache. Caching with :class:`~skore.CrossValidationReport` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ :class:`~skore.CrossValidationReport` uses the same caching system for each split in cross-validation by leveraging the previous :class:`~skore.EstimatorReport`: .. GENERATED FROM PYTHON SOURCE LINES 255-260 .. code-block:: Python from skore import CrossValidationReport report = CrossValidationReport(model, X=df, y=y, splitter=5, n_jobs=4) report.help() .. rst-class:: sphx-glr-script-out .. code-block:: none ╭───────────── Tools to diagnose estimator HistGradientBoostingClassifier ─────────────╮ │ CrossValidationReport │ │ ├── .metrics │ │ │ ├── .accuracy(...) (↗︎) - Compute the accuracy score. │ │ │ ├── .brier_score(...) (↘︎) - Compute the Brier score. │ │ │ ├── .log_loss(...) (↘︎) - Compute the log loss. │ │ │ ├── .precision(...) (↗︎) - Compute the precision score. │ │ │ ├── .precision_recall(...) - Plot the precision-recall curve. │ │ │ ├── .recall(...) (↗︎) - Compute the recall score. │ │ │ ├── .roc(...) - Plot the ROC curve. │ │ │ ├── .roc_auc(...) (↗︎) - Compute the ROC AUC score. │ │ │ ├── .timings(...) - Get all measured processing times related │ │ │ │ to the estimator. │ │ │ ├── .custom_metric(...) - Compute a custom metric. │ │ │ └── .summarize(...) - Report a set of metrics for our estimator. │ │ ├── .cache_predictions(...) - Cache the predictions for sub-estimators │ │ │ reports. │ │ ├── .clear_cache(...) - Clear the cache. │ │ ├── .get_predictions(...) - Get estimator's predictions. │ │ └── Attributes │ │ ├── .X - The data to fit │ │ ├── .y - The target variable to try to predict in │ │ │ the case of supervised learning │ │ ├── .estimator - Estimator to make the cross-validation │ │ │ report from │ │ ├── .estimator_ - The cloned or copied estimator │ │ ├── .estimator_name_ - The name of the estimator │ │ ├── .estimator_reports_ - The estimator reports for each split │ │ ├── .ml_task - No description available │ │ ├── .n_jobs - Number of jobs to run in parallel │ │ ├── .pos_label - For binary classification, the positive │ │ │ class │ │ ├── .split_indices - No description available │ │ └── .splitter - Determines the cross-validation splitting │ │ strategy │ │ │ │ │ │ Legend: │ │ (↗︎) higher is better (↘︎) lower is better │ ╰──────────────────────────────────────────────────────────────────────────────────────╯ .. GENERATED FROM PYTHON SOURCE LINES 261-265 Since a :class:`~skore.CrossValidationReport` uses many :class:`~skore.EstimatorReport`, we will observe the same behaviour as we previously exposed. The first call will be slow because it computes the predictions for each split. .. GENERATED FROM PYTHON SOURCE LINES 266-271 .. code-block:: Python start = time.time() result = report.metrics.summarize().frame() end = time.time() result .. raw:: html

		HistGradientBoostingClassifier
		mean	std
Metric	Label / Average
Precision	allowed	0.437118	0.124285
Precision	disallowed	0.959834	0.005511
Recall	allowed	0.420696	0.100269
Recall	disallowed	0.951837	0.045050
ROC AUC		0.873744	0.033318
Brier score		0.064103	0.034532
Fit time (s)		16.164843	3.272836
Predict time (s)		2.113055	0.453458

.. GENERATED FROM PYTHON SOURCE LINES 272-274 .. code-block:: Python print(f"Time taken: {end - start:.2f} seconds") .. rst-class:: sphx-glr-script-out .. code-block:: none Time taken: 11.47 seconds .. GENERATED FROM PYTHON SOURCE LINES 275-276 But the subsequent calls are fast because the predictions are cached. .. GENERATED FROM PYTHON SOURCE LINES 277-282 .. code-block:: Python start = time.time() result = report.metrics.summarize().frame() end = time.time() result .. raw:: html

		HistGradientBoostingClassifier
		mean	std
Metric	Label / Average
Precision	allowed	0.437118	0.124285
Precision	disallowed	0.959834	0.005511
Recall	allowed	0.420696	0.100269
Recall	disallowed	0.951837	0.045050
ROC AUC		0.873744	0.033318
Brier score		0.064103	0.034532
Fit time (s)		16.164843	3.272836
Predict time (s)		2.113055	0.453458

.. GENERATED FROM PYTHON SOURCE LINES 283-285 .. code-block:: Python print(f"Time taken: {end - start:.2f} seconds") .. rst-class:: sphx-glr-script-out .. code-block:: none Time taken: 0.00 seconds .. GENERATED FROM PYTHON SOURCE LINES 286-287 Hence, we observe the same type of behaviour as we previously exposed. .. rst-class:: sphx-glr-timing **Total running time of the script:** (1 minutes 13.409 seconds) .. _sphx_glr_download_auto_examples_technical_details_plot_cache_mechanism.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: plot_cache_mechanism.ipynb ` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: plot_cache_mechanism.py ` .. container:: sphx-glr-download sphx-glr-download-zip :download:`Download zipped: plot_cache_mechanism.zip ` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_

Applicable_Manufacturer_or_Applicable_GPO_Making_Payment_Name

Dispute_Status_for_Publication

Name_of_Associated_Covered_Device_or_Medical_Supply1

Name_of_Associated_Covered_Drug_or_Biological1

Physician_Specialty

Applicable_Manufacturer_or_Applicable_GPO_Making_Payment_Name

Dispute_Status_for_Publication

Name_of_Associated_Covered_Device_or_Medical_Supply1

Name_of_Associated_Covered_Drug_or_Biological1

Physician_Specialty

Please enable javascript

status

status

Please enable javascript