.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "auto_examples/use_cases/plot_feature_importance.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code. .. rst-class:: sphx-glr-example-title .. _sphx_glr_auto_examples_use_cases_plot_feature_importance.py: .. _example_feature_importance: ===================================================================== `EstimatorReport`: Inspecting your models with the feature importance ===================================================================== In this example, we tackle the California housing dataset where the goal is to perform a regression task: predicting house prices based on features such as the number of bedrooms, the geolocation, etc. For that, we try out several families of models. We evaluate these methods using skore's :class:`~skore.EstimatorReport` and its report on metrics. .. seealso:: As shown in :ref:`example_estimator_report`, the :class:`~skore.EstimatorReport` has a :meth:`~skore.EstimatorReport.metrics` accessor that enables you to evaluate your models and look at some scores that are automatically computed for you. Here, we go beyond predictive performance, and inspect these models to better interpret their behavior, by using feature importance. Indeed, in practice, inspection can help spot some flaws in models: it is always recommended to look "under the hood". For that, we use the unified :meth:`~skore.EstimatorReport.feature_importance` accessor of the :class:`~skore.EstimatorReport`. For linear models, we look at their coefficients. For tree-based models, we inspect their mean decrease in impurity (MDI). We can also inspect the permutation feature importance, that is model-agnostic. .. GENERATED FROM PYTHON SOURCE LINES 32-34 Loading the dataset and performing some exploratory data analysis (EDA) ======================================================================= .. GENERATED FROM PYTHON SOURCE LINES 36-38 Let us load the California housing dataset, which will enable us to perform a regression task about predicting house prices: .. GENERATED FROM PYTHON SOURCE LINES 40-46 .. code-block:: Python from sklearn.datasets import fetch_california_housing california_housing = fetch_california_housing(as_frame=True) X, y = california_housing.data, california_housing.target california_housing.frame.head(2) .. raw:: html
   MedInc  HouseAge  AveRooms  AveBedrms  Population  AveOccup  Latitude  Longitude  MedHouseVal
0  8.3252      41.0  6.984127    1.02381       322.0  2.555556     37.88    -122.23        4.526
1  8.3014      21.0  6.238137    0.97188      2401.0  2.109842     37.86    -122.22        3.585


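As a quick, hedged sanity check on what we just loaded (not part of the original
example), we can confirm the dataset size and the range of the target, which is
expressed in units of $100,000:

.. code-block:: Python

    # Minimal sanity check on the data loaded above.
    print("Shape of X:", X.shape)  # 20,640 districts, 8 numerical features
    print("Target range:", y.min(), "to", y.max())  # capped at about 5, i.e. ~$500,000
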
.. GENERATED FROM PYTHON SOURCE LINES 47-52 The documentation of the California housing dataset explains that the dataset contains aggregated data regarding each district in California in 1990 and the target (``MedHouseVal``) is the median house value for California districts, expressed in hundreds of thousands of dollars ($100,000). Note that there are some vacation resorts, with a large number of rooms and bedrooms. .. GENERATED FROM PYTHON SOURCE LINES 54-59 .. seealso:: For more information about the California housing dataset, refer to `scikit-learn MOOC's page `_. Moreover, a more advanced modelling of this dataset is performed in `this skops example `_. .. GENERATED FROM PYTHON SOURCE LINES 61-63 Table report ------------ .. GENERATED FROM PYTHON SOURCE LINES 65-66 Let us perform some quick exploration on this dataset: .. GENERATED FROM PYTHON SOURCE LINES 68-72 .. code-block:: Python from skrub import TableReport TableReport(california_housing.frame) .. raw:: html




.. GENERATED FROM PYTHON SOURCE LINES 73-96 From the table report, we can draw some key observations: - Looking at the *Stats* tab, all features are numerical and there are no missing values. - Looking at the *Distributions* tab, we can notice that some features seem to have some outliers: ``MedInc``, ``AveRooms``, ``AveBedrms``, ``Population``, and ``AveOccup``. The feature with the largest number of potential outliers is ``AveBedrms``, probably corresponding to vacation resorts. - Looking at the *Associations* tab, we observe that: - The target feature ``MedHouseVal`` is mostly associated with ``MedInc``, ``Longitude``, and ``Latitude``. Indeed, intuitively, people with a large income would live in areas where the house prices are high. Moreover, we can expect some of these expensive areas to be close to one another. - The association power between the target and these features is not super high, which would indicate that each single feature can not correctly predict the target. Given that ``MedInc`` is associated with ``Longitude`` and also ``Latitude``, it might make sense to have some interactions between these features in our modelling: linear combinations might not be enough. .. GENERATED FROM PYTHON SOURCE LINES 98-102 Target feature -------------- The target distribution has a long tail: .. GENERATED FROM PYTHON SOURCE LINES 104-112 .. code-block:: Python import matplotlib.pyplot as plt import seaborn as sns sns.histplot( data=california_housing.frame, x=california_housing.target_names[0], bins=100 ) plt.show() .. image-sg:: /auto_examples/use_cases/images/sphx_glr_plot_feature_importance_001.png :alt: plot feature importance :srcset: /auto_examples/use_cases/images/sphx_glr_plot_feature_importance_001.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 113-117 There seems to be a threshold-effect for high-valued houses: all houses with a price above $500,000 are given the value $500,000. We keep these clipped values in our data and will inspect how our models deal with them. .. GENERATED FROM PYTHON SOURCE LINES 121-123 Now, as the median income ``MedInc`` is the feature with the highest association with our target, let us assess how ``MedInc`` relates to ``MedHouseVal``: .. GENERATED FROM PYTHON SOURCE LINES 125-139 .. code-block:: Python import pandas as pd import plotly.express as px X_y_plot = california_housing.frame.copy() X_y_plot["MedInc_bins"] = pd.qcut(X_y_plot["MedInc"], q=5) bin_order = X_y_plot["MedInc_bins"].cat.categories.sort_values() fig = px.histogram( X_y_plot, x=california_housing.target_names[0], color="MedInc_bins", category_orders={"MedInc_bins": bin_order}, ) fig .. raw:: html


.. GENERATED FROM PYTHON SOURCE LINES 140-142 As could have been expected, a high salary often comes with a more expensive house. We can also notice the clipping effect of house prices for very high salaries. .. GENERATED FROM PYTHON SOURCE LINES 144-146 Geospatial features ------------------- .. GENERATED FROM PYTHON SOURCE LINES 148-152 From the table report, we noticed that the geospatial features ``Latitude`` and ``Longitude`` were well associated with our target. Hence, let us look into the coordinates of the districts in California, with regards to the target feature, using a map: .. GENERATED FROM PYTHON SOURCE LINES 155-167 .. code-block:: Python def plot_map(df, color_feature): fig = px.scatter_mapbox( df, lat="Latitude", lon="Longitude", color=color_feature, zoom=5, height=600 ) fig.update_layout( mapbox_style="open-street-map", mapbox_center={"lat": df["Latitude"].mean(), "lon": df["Longitude"].mean()}, margin={"r": 0, "t": 0, "l": 0, "b": 0}, ) return fig .. GENERATED FROM PYTHON SOURCE LINES 168-171 .. code-block:: Python fig = plot_map(california_housing.frame, california_housing.target_names[0]) fig .. raw:: html


.. GENERATED FROM PYTHON SOURCE LINES 172-175 As could be expected, the price of the houses near the ocean is higher, especially around big cities like Los Angeles, San Francisco, and San Jose. Taking into account the coordinates in our modelling will be very important. .. GENERATED FROM PYTHON SOURCE LINES 177-179 Splitting the data ------------------ .. GENERATED FROM PYTHON SOURCE LINES 181-183 Just before diving into our first model, let us split our data into a train and a test split: .. GENERATED FROM PYTHON SOURCE LINES 185-189 .. code-block:: Python from sklearn.model_selection import train_test_split X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0) .. GENERATED FROM PYTHON SOURCE LINES 190-192 Linear models: coefficients =========================== .. GENERATED FROM PYTHON SOURCE LINES 194-196 For our regression task, we first use linear models. For feature importance, we inspect their coefficients. .. GENERATED FROM PYTHON SOURCE LINES 198-200 Simple model ------------ .. GENERATED FROM PYTHON SOURCE LINES 202-206 Before trying any complex feature engineering, we start with a simple pipeline to have a baseline of what a "good score" is (remember that all scores are relative). Here, we use a Ridge regression along with some scaling and evaluate it using :meth:`skore.EstimatorReport.metrics`: .. GENERATED FROM PYTHON SOURCE LINES 208-222 .. code-block:: Python from sklearn.linear_model import Ridge from sklearn.pipeline import make_pipeline from sklearn.preprocessing import StandardScaler from skore import EstimatorReport ridge_report = EstimatorReport( make_pipeline(StandardScaler(), Ridge()), X_train=X_train, X_test=X_test, y_train=y_train, y_test=y_test, ) ridge_report.metrics.report_metrics() .. raw:: html
                     Ridge
Metric
R²                0.591163
RMSE              0.735134
Fit time (s)      0.003606
Predict time (s)  0.001287


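The aggregated report above can also be queried metric by metric through the
:meth:`~skore.EstimatorReport.metrics` accessor (the available methods are listed by
``ridge_report.help()``, shown further down in this example). A minimal sketch:

.. code-block:: Python

    # Retrieve the two scores of the report individually; the values match the
    # table above.
    print(ridge_report.metrics.r2())
    print(ridge_report.metrics.rmse())
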
.. GENERATED FROM PYTHON SOURCE LINES 223-239 From the report metrics, let us first explain the scores we have access to: - The coefficient of determination (:func:`~sklearn.metrics.r2_score`), denoted as :math:`R^2`, which is a score. The best possible score is :math:`1` and a constant model that always predicts the average value of the target would get a score of :math:`0`. Note that the score can be negative, as it could be worse than the average. - The root mean squared error (:func:`~sklearn.metrics.root_mean_squared_error`), abbreviated as RMSE, which is an error. It takes the square root of the mean squared error (MSE) so it is expressed in the same units as the target variable. The MSE measures the average squared difference between the predicted values and the actual values. Here, the :math:`R^2` seems quite poor, so some further preprocessing would be needed. This is done further down in this example. .. GENERATED FROM PYTHON SOURCE LINES 241-250 .. warning:: Keep in mind that any observation drawn from inspecting the coefficients of this simple Ridge model is made on a model that performs quite poorly, hence must be treated with caution. Indeed, a poorly performing model does not capture the true underlying relationships in the data. A good practice would be to avoid inspecting models with poor performance. Here, we still inspect it, for demo purposes and because our model is not put into production! .. GENERATED FROM PYTHON SOURCE LINES 252-253 Let us plot the prediction error: .. GENERATED FROM PYTHON SOURCE LINES 255-257 .. code-block:: Python ridge_report.metrics.prediction_error().plot(kind="actual_vs_predicted") .. image-sg:: /auto_examples/use_cases/images/sphx_glr_plot_feature_importance_002.png :alt: plot feature importance :srcset: /auto_examples/use_cases/images/sphx_glr_plot_feature_importance_002.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 258-260 We can observe that the model has issues predicting large house prices, due to the clipping effect of the actual values. .. GENERATED FROM PYTHON SOURCE LINES 262-264 Now, to inspect our model, let us use the :meth:`skore.EstimatorReport.feature_importance` accessor: .. GENERATED FROM PYTHON SOURCE LINES 266-268 .. code-block:: Python ridge_report.feature_importance.coefficients() .. raw:: html
            Coefficient
Intercept      2.074373
MedInc         0.831864
HouseAge       0.121025
AveRooms      -0.261571
AveBedrms      0.303819
Population    -0.008702
AveOccup      -0.029855
Latitude      -0.891545
Longitude     -0.863022


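The coefficients of ``Latitude`` and ``Longitude`` are both large in absolute value;
as the note below recalls, coefficients of correlated features must be read with
care. A hedged sketch to check how correlated these two features are on the training
set:

.. code-block:: Python

    # Pearson correlation between the two geographic features; in California they
    # are expected to be strongly negatively correlated (roughly -0.9).
    print(X_train[["Latitude", "Longitude"]].corr())
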
.. GENERATED FROM PYTHON SOURCE LINES 269-274 .. note:: Beware that coefficients can be misleading when some features are correlated. For example, two coefficients can have large absolute values (so be considered important), but in the predictions, the sum of their contributions could cancel out (if they are highly correlated), so they would actually be unimportant. .. GENERATED FROM PYTHON SOURCE LINES 276-277 We can plot this pandas datafame: .. GENERATED FROM PYTHON SOURCE LINES 279-286 .. code-block:: Python ridge_report.feature_importance.coefficients().plot.barh( title="Model weights", xlabel="Coefficient", ylabel="Feature", ) plt.tight_layout() .. image-sg:: /auto_examples/use_cases/images/sphx_glr_plot_feature_importance_003.png :alt: Model weights :srcset: /auto_examples/use_cases/images/sphx_glr_plot_feature_importance_003.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 287-295 .. note:: More generally, :meth:`skore.EstimatorReport.feature_importance.coefficients` can help you inspect the coefficients of all linear models. We consider a linear model as defined in `scikit-learn's documentation `_. In short, we consider a "linear model" as a scikit-learn compatible estimator that holds a ``coef_`` attribute (after being fitted). .. GENERATED FROM PYTHON SOURCE LINES 297-319 Since we have included scaling in the pipeline, the resulting coefficients are all on the same scale, making them directly comparable to each other. Without this scaling step, the coefficients in a linear model would be influenced by the original scale of the feature values, which would prevent meaningful comparisons between them. .. seealso:: For more information about the importance of scaling, see scikit-learn's example on `Common pitfalls in the interpretation of coefficients of linear models `_. Here, it appears that the ``MedInc``, ``Latitude``, and ``Longitude`` features are the most important, with regards to the absolute value of other coefficients. This finding is consistent with our previous observations from the *Associations* tab of the table report. However, due to the scaling, we can not interpret the coefficient values with regards to the original unit of the feature. Let us unscale the coefficients, without forgetting the intercept, so that the coefficients can be interpreted using the original units: .. GENERATED FROM PYTHON SOURCE LINES 321-344 .. code-block:: Python import numpy as np # retrieve the mean and standard deviation used to standardize the feature values feature_mean = ridge_report.estimator_[0].mean_ feature_std = ridge_report.estimator_[0].scale_ def unscale_coefficients(df, feature_mean, feature_std): mask_intercept_column = df.index == "Intercept" df.loc["Intercept"] = df.loc["Intercept"] - np.sum( df.loc[~mask_intercept_column, "Coefficient"] * feature_mean / feature_std ) df.loc[~mask_intercept_column, "Coefficient"] = ( df.loc[~mask_intercept_column, "Coefficient"] / feature_std ) return df df_ridge_report_coef_unscaled = unscale_coefficients( ridge_report.feature_importance.coefficients(), feature_mean, feature_std ) df_ridge_report_coef_unscaled .. raw:: html
            Coefficient
Intercept    -36.573930
MedInc         0.439072
HouseAge       0.009606
AveRooms      -0.103240
AveBedrms      0.616257
Population    -0.000008
AveOccup      -0.004490
Latitude      -0.416969
Longitude     -0.430202


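Before reading these values in prose, here is a minimal, hedged sketch (the
``X_probe`` helpers below are purely illustrative) that probes the fitted pipeline
directly: since the scaler followed by the Ridge regressor is linear in the original
features, increasing ``AveBedrms`` by :math:`1` with everything else fixed should
shift the prediction by the corresponding unscaled coefficient:

.. code-block:: Python

    # Increase AveBedrms by 1 on a single test sample, all other features unchanged.
    X_probe = X_test.head(1).copy()
    X_probe_plus = X_probe.copy()
    X_probe_plus["AveBedrms"] += 1
    pred_base = ridge_report.estimator_.predict(X_probe)
    pred_plus = ridge_report.estimator_.predict(X_probe_plus)
    # Expected to be close to the unscaled AveBedrms coefficient, about 0.62
    # (i.e. roughly $62,000 on the predicted price).
    print(pred_plus[0] - pred_base[0])
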
.. GENERATED FROM PYTHON SOURCE LINES 345-351 Now, we can interpret each coefficient values with regards to the original units. We can interpret a coefficient as follows: according to our model, on average, having one additional bedroom (a increase of :math:`1` of ``AveBedrms``), with all other features being constant, increases the *predicted* house value of :math:`0.62` in $100,000, hence of $62,000. Note that we have not dealt with any potential outlier in this iteration. .. GENERATED FROM PYTHON SOURCE LINES 353-357 .. warning:: Recall that we are inspecting a model with poor performance, which is bad practice. Moreover, we must be cautious when trying to induce any causation effect (remember that correlation is not causation). .. GENERATED FROM PYTHON SOURCE LINES 359-361 More complex model ------------------ .. GENERATED FROM PYTHON SOURCE LINES 363-369 As previously mentioned, our simple Ridge model, although very easily interpretable with regards to the original units of the features, performs quite poorly. Now, we build a more complex model, with more feature engineering. We will see that this model will have a better score... but will be more difficult to interpret the coefficients with regards to the original features due to the complex feature engineering. .. GENERATED FROM PYTHON SOURCE LINES 371-380 In our previous EDA, when plotting the geospatial data with regards to the house prices, we noticed that it is important to take into account the latitude and longitude features. Moreover, we also observed that the median income is well associated with the house prices. Hence, we will try a feature engineering that takes into account the interactions of the geospatial features with features such as the income, using polynomial features. The interactions are no longer simply linear as previously. .. GENERATED FROM PYTHON SOURCE LINES 382-385 Let us build a model with some more complex feature engineering, and still use a Ridge regressor (linear model) at the end of the pipeline. In particular, we perform a K-means clustering on the geospatial features: .. GENERATED FROM PYTHON SOURCE LINES 387-405 .. code-block:: Python from sklearn.cluster import KMeans from sklearn.compose import make_column_transformer from sklearn.preprocessing import PolynomialFeatures, SplineTransformer geo_columns = ["Latitude", "Longitude"] preprocessor = make_column_transformer( (KMeans(n_clusters=10, random_state=0), geo_columns), remainder="passthrough", ) engineered_ridge = make_pipeline( preprocessor, SplineTransformer(sparse_output=True), PolynomialFeatures(degree=2, interaction_only=True, include_bias=False), Ridge(), ) engineered_ridge .. raw:: html
Pipeline(steps=[('columntransformer',
                     ColumnTransformer(remainder='passthrough',
                                       transformers=[('kmeans',
                                                      KMeans(n_clusters=10,
                                                             random_state=0),
                                                      ['Latitude', 'Longitude'])])),
                    ('splinetransformer', SplineTransformer(sparse_output=True)),
                    ('polynomialfeatures',
                     PolynomialFeatures(include_bias=False, interaction_only=True)),
                    ('ridge', Ridge())])


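A detail worth spelling out before scoring this pipeline (a hedged, standalone sketch
for illustration only): when :class:`~sklearn.cluster.KMeans` is used as a transformer
inside the column transformer, each sample is mapped to its distances to the 10
cluster centers, and the spline and polynomial steps are then built on top of these
distance features together with the passed-through columns:

.. code-block:: Python

    # Standalone illustration of what the KMeans step feeds downstream: one
    # distance per cluster center, i.e. 10 new features per sample.
    geo_kmeans = KMeans(n_clusters=10, random_state=0).fit(X_train[geo_columns])
    print(geo_kmeans.transform(X_train[geo_columns].head(3)).shape)  # (3, 10)
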
.. GENERATED FROM PYTHON SOURCE LINES 406-408 Now, let us compute the metrics and compare it to our previous model using a :class:`skore.ComparisonReport`: .. GENERATED FROM PYTHON SOURCE LINES 410-426 .. code-block:: Python from skore import ComparisonReport engineered_ridge_report = EstimatorReport( engineered_ridge, X_train=X_train, X_test=X_test, y_train=y_train, y_test=y_test, ) reports_to_compare = { "Vanilla Ridge": ridge_report, "Ridge w/ feature engineering": engineered_ridge_report, } comparator = ComparisonReport(reports=reports_to_compare) comparator.metrics.report_metrics() .. raw:: html
Estimator         Vanilla Ridge  Ridge w/ feature engineering
Metric
R²                     0.591163                      0.726869
RMSE                   0.735134                      0.600865
Fit time (s)           0.003606                      9.723627
Predict time (s)       0.001287                      0.221695


.. GENERATED FROM PYTHON SOURCE LINES 427-429 We get a much better score! Let us plot the prediction error: .. GENERATED FROM PYTHON SOURCE LINES 431-433 .. code-block:: Python engineered_ridge_report.metrics.prediction_error().plot(kind="actual_vs_predicted") .. image-sg:: /auto_examples/use_cases/images/sphx_glr_plot_feature_importance_004.png :alt: plot feature importance :srcset: /auto_examples/use_cases/images/sphx_glr_plot_feature_importance_004.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 434-438 About the clipping issue, compared to the prediction error of our previous model (``ridge_report``), our ``engineered_ridge_report`` model seems to produce predictions that are not as large, so it seems that some interactions between features have helped alleviate the clipping issue. .. GENERATED FROM PYTHON SOURCE LINES 440-442 However, interpreting the features is harder: indeed, our complex feature engineering introduced a *lot* of features: .. GENERATED FROM PYTHON SOURCE LINES 444-451 .. code-block:: Python print("Initial number of features:", X_train.shape[1]) # We slice the scikit-learn pipeline to extract the predictor, using -1 to access # the last step: n_features_engineered = engineered_ridge_report.estimator_[-1].n_features_in_ print("Number of features after feature engineering:", n_features_engineered) .. rst-class:: sphx-glr-script-out .. code-block:: none Initial number of features: 8 Number of features after feature engineering: 6328 .. GENERATED FROM PYTHON SOURCE LINES 452-453 Let us display the 15 largest absolute coefficients: .. GENERATED FROM PYTHON SOURCE LINES 455-464 .. code-block:: Python engineered_ridge_report.feature_importance.coefficients().sort_values( by="Coefficient", key=abs, ascending=True ).tail(15).plot.barh( title="Model weights", xlabel="Coefficient", ylabel="Feature", ) plt.tight_layout() .. image-sg:: /auto_examples/use_cases/images/sphx_glr_plot_feature_importance_005.png :alt: Model weights :srcset: /auto_examples/use_cases/images/sphx_glr_plot_feature_importance_005.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 465-471 We can observe that the most important features are interactions between features, mostly based on ``AveOccup``, that a simple linear model without feature engineering could not have captured. Indeed, the vanilla Ridge model did not consider ``AveOccup`` to be important. As the engineered Ridge has a better score, perhaps the vanilla Ridge missed something about ``AveOccup`` that seems to be key to predicting house prices. .. GENERATED FROM PYTHON SOURCE LINES 473-474 Let us visualize how ``AveOccup`` interacts with ``MedHouseVal``: .. GENERATED FROM PYTHON SOURCE LINES 476-487 .. code-block:: Python X_y_plot = california_housing.frame.copy() X_y_plot["AveOccup"] = pd.qcut(X_y_plot["AveOccup"], q=5) bin_order = X_y_plot["AveOccup"].cat.categories.sort_values() fig = px.histogram( X_y_plot, x=california_housing.target_names[0], color="AveOccup", category_orders={"AveOccup": bin_order}, ) fig .. raw:: html


.. GENERATED FROM PYTHON SOURCE LINES 488-489 Finally, we can visualize the results of our K-means clustering (on the training set): .. GENERATED FROM PYTHON SOURCE LINES 491-504 .. code-block:: Python # getting the cluster labels col_transformer = engineered_ridge_report.estimator_.named_steps["columntransformer"] kmeans = col_transformer.named_transformers_["kmeans"] clustering_labels = kmeans.labels_ # adding the cluster labels to our dataframe X_train_plot = X_train.copy() X_train_plot.insert(0, "clustering_labels", clustering_labels) # plotting the map plot_map(X_train_plot, "clustering_labels") .. raw:: html


.. GENERATED FROM PYTHON SOURCE LINES 505-507 Compromising on complexity -------------------------- .. GENERATED FROM PYTHON SOURCE LINES 509-514 Now, let us build a model with a more interpretable feature engineering, although it might not perform as well. For that, after the complex feature engineering, we perform some feature selection using a :class:`~sklearn.feature_selection.SelectKBest`, in order to reduce the number of features. .. GENERATED FROM PYTHON SOURCE LINES 516-532 .. code-block:: Python from sklearn.feature_selection import SelectKBest, VarianceThreshold from sklearn.linear_model import RidgeCV preprocessor = make_column_transformer( (KMeans(n_clusters=10, random_state=0), geo_columns), remainder="passthrough", ) selectkbest_ridge = make_pipeline( preprocessor, SplineTransformer(sparse_output=True), PolynomialFeatures(degree=2, interaction_only=True, include_bias=False), VarianceThreshold(), SelectKBest(k=150), RidgeCV(np.logspace(-5, 5, num=100)), ) .. GENERATED FROM PYTHON SOURCE LINES 533-539 .. note:: To keep the computation time of this example low, we did not tune the hyperparameters of the predictive model. However, on a real use case, it would be important to tune the model using :class:`~sklearn.model_selection.RandomizedSearchCV` and not just the :class:`~sklearn.linear_model.RidgeCV`. .. GENERATED FROM PYTHON SOURCE LINES 541-542 Let us get the metrics for our model and compare it with our previous iterations: .. GENERATED FROM PYTHON SOURCE LINES 544-555 .. code-block:: Python selectk_ridge_report = EstimatorReport( selectkbest_ridge, X_train=X_train, X_test=X_test, y_train=y_train, y_test=y_test, ) reports_to_compare["Ridge w/ feature engineering and selection"] = selectk_ridge_report comparator = ComparisonReport(reports=reports_to_compare) comparator.metrics.report_metrics() .. rst-class:: sphx-glr-script-out .. code-block:: none /opt/hostedtoolcache/Python/3.12.10/x64/lib/python3.12/site-packages/sklearn/feature_selection/_univariate_selection.py:112: RuntimeWarning: divide by zero encountered in divide .. raw:: html
Estimator         Vanilla Ridge  Ridge w/ feature engineering  Ridge w/ feature engineering and selection
Metric
R²                     0.591163                      0.726869                                    0.689991
RMSE                   0.735134                      0.600865                                    0.640145
Fit time (s)           0.003606                      9.723627                                    8.325749
Predict time (s)       0.001287                      0.221695                                    0.352263


.. GENERATED FROM PYTHON SOURCE LINES 556-557 We get a good score and much less features: .. GENERATED FROM PYTHON SOURCE LINES 559-568 .. code-block:: Python print("Initial number of features:", X_train.shape[1]) print("Number of features after feature engineering:", n_features_engineered) n_features_selectk = selectk_ridge_report.estimator_[-1].n_features_in_ print( "Number of features after feature engineering using `SelectKBest`:", n_features_selectk, ) .. rst-class:: sphx-glr-script-out .. code-block:: none Initial number of features: 8 Number of features after feature engineering: 6328 Number of features after feature engineering using `SelectKBest`: 150 .. GENERATED FROM PYTHON SOURCE LINES 569-571 According to the :class:`~sklearn.feature_selection.SelectKBest`, the most important features are the following: .. GENERATED FROM PYTHON SOURCE LINES 573-576 .. code-block:: Python selectk_features = selectk_ridge_report.estimator_[:-1].get_feature_names_out() print(selectk_features) .. rst-class:: sphx-glr-script-out .. code-block:: none ['remainder__MedInc_sp_1' 'remainder__MedInc_sp_2' 'remainder__MedInc_sp_3' 'remainder__MedInc_sp_4' 'kmeans__kmeans0_sp_0 remainder__MedInc_sp_3' 'kmeans__kmeans0_sp_1 remainder__MedInc_sp_3' 'kmeans__kmeans0_sp_2 remainder__MedInc_sp_0' 'kmeans__kmeans0_sp_2 remainder__MedInc_sp_1' 'kmeans__kmeans0_sp_2 remainder__MedInc_sp_4' 'kmeans__kmeans0_sp_3 remainder__MedInc_sp_3' 'kmeans__kmeans0_sp_3 remainder__AveRooms_sp_3' 'kmeans__kmeans1_sp_0 remainder__MedInc_sp_3' 'kmeans__kmeans1_sp_0 remainder__Population_sp_3' 'kmeans__kmeans1_sp_1 remainder__MedInc_sp_3' 'kmeans__kmeans1_sp_2 remainder__MedInc_sp_0' 'kmeans__kmeans1_sp_2 remainder__MedInc_sp_1' 'kmeans__kmeans1_sp_2 remainder__AveRooms_sp_3' 'kmeans__kmeans1_sp_3 remainder__MedInc_sp_3' 'kmeans__kmeans1_sp_4 remainder__MedInc_sp_3' 'kmeans__kmeans1_sp_6 kmeans__kmeans4_sp_4' 'kmeans__kmeans1_sp_6 remainder__MedInc_sp_5' 'kmeans__kmeans2_sp_0 remainder__MedInc_sp_0' 'kmeans__kmeans2_sp_0 remainder__AveRooms_sp_4' 'kmeans__kmeans2_sp_0 remainder__AveBedrms_sp_4' 'kmeans__kmeans2_sp_1 remainder__MedInc_sp_0' 'kmeans__kmeans2_sp_1 remainder__MedInc_sp_1' 'kmeans__kmeans2_sp_1 remainder__AveRooms_sp_3' 'kmeans__kmeans2_sp_2 remainder__MedInc_sp_3' 'kmeans__kmeans2_sp_2 remainder__MedInc_sp_4' 'kmeans__kmeans2_sp_2 remainder__AveRooms_sp_3' 'kmeans__kmeans2_sp_3 remainder__MedInc_sp_2' 'kmeans__kmeans2_sp_3 remainder__MedInc_sp_3' 'kmeans__kmeans2_sp_3 remainder__MedInc_sp_4' 'kmeans__kmeans2_sp_6 kmeans__kmeans3_sp_0' 'kmeans__kmeans2_sp_6 kmeans__kmeans6_sp_1' 'kmeans__kmeans3_sp_2 remainder__MedInc_sp_3' 'kmeans__kmeans3_sp_2 remainder__AveRooms_sp_3' 'kmeans__kmeans3_sp_3 remainder__MedInc_sp_3' 'kmeans__kmeans3_sp_4 remainder__MedInc_sp_3' 'kmeans__kmeans3_sp_5 kmeans__kmeans4_sp_4' 'kmeans__kmeans3_sp_6 kmeans__kmeans4_sp_4' 'kmeans__kmeans4_sp_1 remainder__MedInc_sp_3' 'kmeans__kmeans4_sp_2 remainder__MedInc_sp_3' 'kmeans__kmeans4_sp_3 remainder__MedInc_sp_3' 'kmeans__kmeans4_sp_3 remainder__AveRooms_sp_3' 'kmeans__kmeans4_sp_4 kmeans__kmeans5_sp_6' 'kmeans__kmeans4_sp_4 kmeans__kmeans7_sp_6' 'kmeans__kmeans4_sp_4 kmeans__kmeans9_sp_5' 'kmeans__kmeans4_sp_4 kmeans__kmeans9_sp_6' 'kmeans__kmeans5_sp_0 remainder__Population_sp_3' 'kmeans__kmeans5_sp_2 remainder__AveRooms_sp_3' 'kmeans__kmeans5_sp_3 remainder__MedInc_sp_0' 'kmeans__kmeans5_sp_3 remainder__MedInc_sp_1' 'kmeans__kmeans5_sp_3 remainder__MedInc_sp_3' 'kmeans__kmeans5_sp_3 remainder__MedInc_sp_4' 'kmeans__kmeans5_sp_3 
remainder__AveRooms_sp_3' 'kmeans__kmeans5_sp_4 remainder__MedInc_sp_3' 'kmeans__kmeans6_sp_0 remainder__Population_sp_3' 'kmeans__kmeans6_sp_0 remainder__Population_sp_4' 'kmeans__kmeans6_sp_1 remainder__MedInc_sp_3' 'kmeans__kmeans6_sp_1 remainder__AveRooms_sp_4' 'kmeans__kmeans6_sp_2 remainder__MedInc_sp_3' 'kmeans__kmeans6_sp_2 remainder__MedInc_sp_4' 'kmeans__kmeans6_sp_3 remainder__MedInc_sp_3' 'kmeans__kmeans7_sp_1 remainder__AveRooms_sp_3' 'kmeans__kmeans7_sp_2 remainder__MedInc_sp_0' 'kmeans__kmeans7_sp_2 remainder__MedInc_sp_1' 'kmeans__kmeans7_sp_2 remainder__MedInc_sp_3' 'kmeans__kmeans7_sp_3 remainder__MedInc_sp_3' 'kmeans__kmeans7_sp_4 remainder__MedInc_sp_3' 'kmeans__kmeans8_sp_1 remainder__MedInc_sp_3' 'kmeans__kmeans8_sp_2 remainder__MedInc_sp_3' 'kmeans__kmeans8_sp_3 remainder__AveRooms_sp_3' 'kmeans__kmeans9_sp_1 remainder__MedInc_sp_3' 'kmeans__kmeans9_sp_2 remainder__MedInc_sp_0' 'kmeans__kmeans9_sp_2 remainder__MedInc_sp_1' 'kmeans__kmeans9_sp_3 remainder__MedInc_sp_3' 'kmeans__kmeans9_sp_3 remainder__AveRooms_sp_3' 'kmeans__kmeans9_sp_4 remainder__MedInc_sp_3' 'remainder__MedInc_sp_0 remainder__MedInc_sp_2' 'remainder__MedInc_sp_0 remainder__MedInc_sp_3' 'remainder__MedInc_sp_0 remainder__AveRooms_sp_2' 'remainder__MedInc_sp_1 remainder__MedInc_sp_2' 'remainder__MedInc_sp_1 remainder__MedInc_sp_3' 'remainder__MedInc_sp_1 remainder__MedInc_sp_4' 'remainder__MedInc_sp_1 remainder__AveRooms_sp_0' 'remainder__MedInc_sp_1 remainder__AveRooms_sp_1' 'remainder__MedInc_sp_1 remainder__AveRooms_sp_2' 'remainder__MedInc_sp_1 remainder__AveBedrms_sp_0' 'remainder__MedInc_sp_1 remainder__AveBedrms_sp_1' 'remainder__MedInc_sp_1 remainder__AveBedrms_sp_2' 'remainder__MedInc_sp_1 remainder__Population_sp_0' 'remainder__MedInc_sp_1 remainder__Population_sp_1' 'remainder__MedInc_sp_1 remainder__Population_sp_2' 'remainder__MedInc_sp_1 remainder__AveOccup_sp_0' 'remainder__MedInc_sp_1 remainder__AveOccup_sp_1' 'remainder__MedInc_sp_1 remainder__AveOccup_sp_2' 'remainder__MedInc_sp_2 remainder__MedInc_sp_3' 'remainder__MedInc_sp_2 remainder__MedInc_sp_4' 'remainder__MedInc_sp_2 remainder__AveRooms_sp_0' 'remainder__MedInc_sp_2 remainder__AveRooms_sp_1' 'remainder__MedInc_sp_2 remainder__AveBedrms_sp_0' 'remainder__MedInc_sp_2 remainder__AveBedrms_sp_1' 'remainder__MedInc_sp_2 remainder__AveBedrms_sp_2' 'remainder__MedInc_sp_2 remainder__Population_sp_1' 'remainder__MedInc_sp_2 remainder__AveOccup_sp_0' 'remainder__MedInc_sp_2 remainder__AveOccup_sp_1' 'remainder__MedInc_sp_2 remainder__AveOccup_sp_2' 'remainder__MedInc_sp_3 remainder__MedInc_sp_4' 'remainder__MedInc_sp_3 remainder__HouseAge_sp_2' 'remainder__MedInc_sp_3 remainder__HouseAge_sp_3' 'remainder__MedInc_sp_3 remainder__HouseAge_sp_4' 'remainder__MedInc_sp_3 remainder__AveRooms_sp_0' 'remainder__MedInc_sp_3 remainder__AveRooms_sp_1' 'remainder__MedInc_sp_3 remainder__AveRooms_sp_2' 'remainder__MedInc_sp_3 remainder__AveRooms_sp_3' 'remainder__MedInc_sp_3 remainder__AveBedrms_sp_0' 'remainder__MedInc_sp_3 remainder__AveBedrms_sp_1' 'remainder__MedInc_sp_3 remainder__AveBedrms_sp_2' 'remainder__MedInc_sp_3 remainder__AveBedrms_sp_3' 'remainder__MedInc_sp_3 remainder__Population_sp_0' 'remainder__MedInc_sp_3 remainder__Population_sp_1' 'remainder__MedInc_sp_3 remainder__Population_sp_2' 'remainder__MedInc_sp_3 remainder__Population_sp_3' 'remainder__MedInc_sp_3 remainder__AveOccup_sp_0' 'remainder__MedInc_sp_3 remainder__AveOccup_sp_1' 'remainder__MedInc_sp_3 remainder__AveOccup_sp_2' 'remainder__MedInc_sp_4 
remainder__AveRooms_sp_0' 'remainder__MedInc_sp_4 remainder__AveRooms_sp_1' 'remainder__MedInc_sp_4 remainder__AveRooms_sp_2' 'remainder__MedInc_sp_4 remainder__AveBedrms_sp_0' 'remainder__MedInc_sp_4 remainder__AveBedrms_sp_1' 'remainder__MedInc_sp_4 remainder__AveBedrms_sp_2' 'remainder__MedInc_sp_4 remainder__AveBedrms_sp_3' 'remainder__MedInc_sp_4 remainder__Population_sp_0' 'remainder__MedInc_sp_4 remainder__Population_sp_1' 'remainder__MedInc_sp_4 remainder__Population_sp_2' 'remainder__MedInc_sp_4 remainder__Population_sp_3' 'remainder__MedInc_sp_4 remainder__Population_sp_4' 'remainder__MedInc_sp_4 remainder__AveOccup_sp_0' 'remainder__MedInc_sp_4 remainder__AveOccup_sp_1' 'remainder__MedInc_sp_4 remainder__AveOccup_sp_2' 'remainder__MedInc_sp_5 remainder__Population_sp_4' 'remainder__AveRooms_sp_3 remainder__AveBedrms_sp_2' 'remainder__AveRooms_sp_3 remainder__Population_sp_3' 'remainder__AveRooms_sp_4 remainder__AveBedrms_sp_1' 'remainder__AveRooms_sp_4 remainder__AveBedrms_sp_2' 'remainder__AveRooms_sp_4 remainder__Population_sp_3' 'remainder__AveBedrms_sp_4 remainder__Population_sp_3' 'remainder__Population_sp_1 remainder__Population_sp_4'] .. GENERATED FROM PYTHON SOURCE LINES 577-581 We can see that, in the best features, according to statistical tests, there are many interactions between geospatial features (derived from the K-means clustering) and the median income. Note that these features are not sorted. .. GENERATED FROM PYTHON SOURCE LINES 583-584 And here is the feature importance based on our model (sorted by absolute values): .. GENERATED FROM PYTHON SOURCE LINES 586-595 .. code-block:: Python selectk_ridge_report.feature_importance.coefficients().sort_values( by="Coefficient", key=abs, ascending=True ).tail(15).plot.barh( title="Model weights", xlabel="Coefficient", ylabel="Feature", ) plt.tight_layout() .. image-sg:: /auto_examples/use_cases/images/sphx_glr_plot_feature_importance_006.png :alt: Model weights :srcset: /auto_examples/use_cases/images/sphx_glr_plot_feature_importance_006.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 596-598 Tree-based models: mean decrease in impurity (MDI) ================================================== .. GENERATED FROM PYTHON SOURCE LINES 600-605 Now, let us look into tree-based models. For feature importance, we inspect their Mean Decrease in Impurity (MDI). The MDI of a feature is the normalized total reduction of the criterion (or loss) brought by that feature. The higher the MDI, the more important the feature. .. GENERATED FROM PYTHON SOURCE LINES 607-623 .. warning:: The MDI is limited and can be misleading: - When features have large differences in cardinality, the MDI tends to favor those with higher cardinality. Fortunately, in this example, we have only numerical features that share similar cardinality, mitigating this concern. - Since the MDI is typically calculated on the training set, it can reflect biases from overfitting. When a model overfits, the tree may partition less relevant regions of the feature space, artificially inflating MDI values and distorting the perceived importance of certain features. Soon, scikit-learn will enable the computing of the MDI on the test set, and we will make it available in skore. Hence, we would be able to draw conclusions on how predictive a feature is and not just how impactful it is on the training procedure. .. GENERATED FROM PYTHON SOURCE LINES 625-630 .. 
seealso:: For more information about the MDI, see scikit-learn's `Permutation Importance vs Random Forest Feature Importance (MDI) `_. .. GENERATED FROM PYTHON SOURCE LINES 633-635 Decision trees -------------- .. GENERATED FROM PYTHON SOURCE LINES 637-638 Let us start with a simple decision tree. .. GENERATED FROM PYTHON SOURCE LINES 640-643 .. seealso:: For more information about decision trees, see scikit-learn's example on `Understanding the decision tree structure `_. .. GENERATED FROM PYTHON SOURCE LINES 645-656 .. code-block:: Python from sklearn.tree import DecisionTreeRegressor tree_report = EstimatorReport( DecisionTreeRegressor(random_state=0), X_train=X_train, X_test=X_test, y_train=y_train, y_test=y_test, ) reports_to_compare["Decision tree"] = tree_report .. GENERATED FROM PYTHON SOURCE LINES 657-658 We compare its performance with the models in our benchmark: .. GENERATED FROM PYTHON SOURCE LINES 660-663 .. code-block:: Python comparator = ComparisonReport(reports=reports_to_compare) comparator.metrics.report_metrics() .. raw:: html
Estimator         Vanilla Ridge  Ridge w/ feature engineering  Ridge w/ feature engineering and selection  Decision tree
Metric
R²                     0.591163                      0.726869                                    0.689991       0.583785
RMSE                   0.735134                      0.600865                                    0.640145       0.741737
Fit time (s)           0.003606                      9.723627                                    8.325749       0.186716
Predict time (s)       0.001287                      0.221695                                    0.352263       0.002108


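Before inspecting this tree, a hedged sketch of its size helps put this score in
perspective: an unconstrained :class:`~sklearn.tree.DecisionTreeRegressor` typically
grows very deep on this dataset, which usually goes hand in hand with overfitting:

.. code-block:: Python

    # Size of the fully grown tree fitted by the report above.
    print("Tree depth:", tree_report.estimator_.get_depth())
    print("Number of leaves:", tree_report.estimator_.get_n_leaves())
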
.. GENERATED FROM PYTHON SOURCE LINES 664-666 We note that the performance is quite poor, so the derived feature importance is to be dealt with caution. .. GENERATED FROM PYTHON SOURCE LINES 668-669 We display which accessors are available to us: .. GENERATED FROM PYTHON SOURCE LINES 671-673 .. code-block:: Python tree_report.help() .. rst-class:: sphx-glr-script-out .. code-block:: none ╭───────────────── Tools to diagnose estimator DecisionTreeRegressor ──────────────────╮ │ EstimatorReport │ │ ├── .metrics │ │ │ ├── .prediction_error(...) - Plot the prediction error of a regression │ │ │ │ model. │ │ │ ├── .r2(...) (↗︎) - Compute the R² score. │ │ │ ├── .rmse(...) (↘︎) - Compute the root mean squared error. │ │ │ ├── .timings(...) - Get all measured processing times related │ │ │ │ to the estimator. │ │ │ ├── .custom_metric(...) - Compute a custom metric. │ │ │ └── .report_metrics(...) - Report a set of metrics for our estimator. │ │ ├── .feature_importance │ │ │ ├── .mean_decrease_impurity(...) - Retrieve the mean decrease impurity (MDI) │ │ │ │ of a tree-based model. │ │ │ └── .permutation(...) - Report the permutation feature importance. │ │ ├── .cache_predictions(...) - Cache estimator's predictions. │ │ ├── .clear_cache(...) - Clear the cache. │ │ ├── .get_predictions(...) - Get estimator's predictions. │ │ └── Attributes │ │ ├── .X_test - Testing data │ │ ├── .X_train - Training data │ │ ├── .y_test - Testing target │ │ ├── .y_train - Training target │ │ ├── .estimator_ - The cloned or copied estimator │ │ ├── .estimator_name_ - The name of the estimator │ │ ├── .fit_time_ - The time taken to fit the estimator, in │ │ │ seconds │ │ └── .ml_task - No description available │ │ │ │ │ │ Legend: │ │ (↗︎) higher is better (↘︎) lower is better │ ╰──────────────────────────────────────────────────────────────────────────────────────╯ .. GENERATED FROM PYTHON SOURCE LINES 674-677 We have a :meth:`~skore.EstimatorReport.feature_importance.mean_decrease_impurity` accessor. .. GENERATED FROM PYTHON SOURCE LINES 679-682 First, let us interpret our model with regards to the original features. For the visualization, we fix a very low ``max_depth`` so that it will be easy for the human eye to visualize the tree using :func:`sklearn.tree.plot_tree`: .. GENERATED FROM PYTHON SOURCE LINES 684-693 .. code-block:: Python from sklearn.tree import plot_tree plot_tree( tree_report.estimator_, feature_names=tree_report.estimator_.feature_names_in_, max_depth=2, ) plt.tight_layout() .. image-sg:: /auto_examples/use_cases/images/sphx_glr_plot_feature_importance_007.png :alt: plot feature importance :srcset: /auto_examples/use_cases/images/sphx_glr_plot_feature_importance_007.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 694-711 This tree explains how each sample is going to be predicted by our tree. A decision tree provides a decision path for each sample, where the sample traverses the tree based on feature thresholds, and the final prediction is made at the leaf node (not represented above for conciseness purposes). At each node: - ``samples`` is the number of samples that fall into that node, - ``value`` is the predicted output for the samples that fall into this particular node (it is the mean of the target values for the samples in that node). At the root node, the value is :math:`2.074`. 
This means that if you were to make a prediction for all :math:`15480` samples at this node (without any further splits), the predicted value would be :math:`2.074`, which is the mean of the target variable for those samples. - ``squared_error`` is the mean squared error associated with the ``value``, representing the average of the squared differences between the actual target values of the samples in the node and the node's predicted ``value`` (the mean), - the first element is how the split is defined. .. GENERATED FROM PYTHON SOURCE LINES 713-724 Let us explain how this works in practice. At each node, the tree splits the data based on a feature and a threshold. For the first node (the root node), ``MedInc <= 5.029`` means that, for each sample, our decision tree first looks at the ``MedInc`` feature (which is thus the most important one): if the ``MedInc`` value is lower than :math:`5.029` (the threshold), then the sample goes into the left node, otherwise it goes to the right, and so on for each node. As you move down the tree, the splits refine the predictions, leading to the leaf nodes where the final prediction for a sample is the ``value`` of the leaf it reaches. Note that for the second node layer, it is also the ``MedInc`` feature that is used for the threshold, indicating that our model heavily relies on ``MedInc``. .. GENERATED FROM PYTHON SOURCE LINES 726-732 .. seealso:: A richer display of decision trees is available in the `dtreeviz `_ python package. For example, it shows the distribution of feature values split at each node and tailors the visualization to the task at hand (whether classification or regression). .. GENERATED FROM PYTHON SOURCE LINES 734-735 Now, let us look at the feature importance based on the MDI: .. GENERATED FROM PYTHON SOURCE LINES 737-744 .. code-block:: Python tree_report.feature_importance.mean_decrease_impurity().plot.barh( title=f"Feature importance of {tree_report.estimator_name_}", xlabel="MDI", ylabel="Feature", ) plt.tight_layout() .. image-sg:: /auto_examples/use_cases/images/sphx_glr_plot_feature_importance_008.png :alt: Feature importance of DecisionTreeRegressor :srcset: /auto_examples/use_cases/images/sphx_glr_plot_feature_importance_008.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 745-753 For a decision tree, for each feature, the MDI is averaged across all splits in the tree. Here, the impurity is the mean squared error. As expected, ``MedInc`` is of great importance for our decision tree. Indeed, in the above tree visualization, ``MedInc`` is used multiple times for splits and contributes greatly to reducing the squared error at multiple nodes. At the root, it reduces the error from :math:`1.335` to :math:`0.832` and :math:`0.546` in the children. .. GENERATED FROM PYTHON SOURCE LINES 755-763 Random forest ------------- Now, let us apply a more elaborate model: a random forest. A random forest is an ensemble method that builds multiple decision trees, each trained on a random subset of data and features. For regression, it averages the trees' predictions. This reduces overfitting and improves accuracy compared to a single decision tree. .. GENERATED FROM PYTHON SOURCE LINES 765-780 .. 
code-block:: Python from sklearn.ensemble import RandomForestRegressor n_estimators = 100 rf_report = EstimatorReport( RandomForestRegressor(random_state=0, n_estimators=n_estimators), X_train=X_train, X_test=X_test, y_train=y_train, y_test=y_test, ) reports_to_compare["Random forest"] = rf_report comparator = ComparisonReport(reports=reports_to_compare) comparator.metrics.report_metrics() .. raw:: html
Estimator         Vanilla Ridge  Ridge w/ feature engineering  Ridge w/ feature engineering and selection  Decision tree  Random forest
Metric
R²                     0.591163                      0.726869                                    0.689991       0.583785       0.794168
RMSE                   0.735134                      0.600865                                    0.640145       0.741737       0.521612
Fit time (s)           0.003606                      9.723627                                    8.325749       0.186716      11.792969
Predict time (s)       0.001287                      0.221695                                    0.352263       0.002108       0.124773


.. GENERATED FROM PYTHON SOURCE LINES 781-783 Without any feature engineering and any grid search, the random forest beats all linear models! .. GENERATED FROM PYTHON SOURCE LINES 785-786 Let us recall the number of trees in our random forest: .. GENERATED FROM PYTHON SOURCE LINES 788-790 .. code-block:: Python print(f"Number of trees in the forest: {n_estimators}") .. rst-class:: sphx-glr-script-out .. code-block:: none Number of trees in the forest: 100 .. GENERATED FROM PYTHON SOURCE LINES 791-795 Given that we have many trees, it is hard to use :func:`sklearn.tree.plot_tree` as for the single decision tree. As for linear models (and the complex feature engineering), better performance often comes with less interpretability. .. GENERATED FROM PYTHON SOURCE LINES 797-798 Let us look into the MDI of our random forest: .. GENERATED FROM PYTHON SOURCE LINES 800-807 .. code-block:: Python rf_report.feature_importance.mean_decrease_impurity().plot.barh( title=f"Feature importance of {rf_report.estimator_name_}", xlabel="MDI", ylabel="Feature", ) plt.tight_layout() .. image-sg:: /auto_examples/use_cases/images/sphx_glr_plot_feature_importance_009.png :alt: Feature importance of RandomForestRegressor :srcset: /auto_examples/use_cases/images/sphx_glr_plot_feature_importance_009.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 808-814 In a random forest, the MDI is computed by averaging the MDI of each feature across all the decision trees in the forest. As for the decision tree, ``MecInc`` is the most important feature. As for linear models with some feature engineering, the random forest also attributes a high importance to ``Longitude``, ``Latitude``, and ``AveOccup``. .. GENERATED FROM PYTHON SOURCE LINES 816-818 Model-agnostic: permutation feature importance ============================================== .. GENERATED FROM PYTHON SOURCE LINES 820-826 In the previous sections, we have inspected coefficients that are specific to linear models and the MDI that is specific to tree-based models. In this section, we look into the `permutation importance `_ which is model agnostic, meaning that it can be applied to any fitted estimator. In particular, it works for linear models and tree-based ones. .. GENERATED FROM PYTHON SOURCE LINES 828-838 Permutation feature importance measures the contribution of each feature to a fitted model's performance. It randomly shuffles the values of a single feature and observes the resulting degradation of the model's score. Permuting a predictive feature makes the performance decrease, while permuting a non-predictive feature does not degrade the performance much. This permutation importance can be computed on the train and test sets, and by default skore computes it on the test set. Compared to the coefficients and the MDI, the permutation importance can be less misleading, but comes with a higher computation cost. .. GENERATED FROM PYTHON SOURCE LINES 840-845 Permutation feature importance can also help reduce overfitting. If a model overfits (high train score and low test score), and some features are important only on the train set and not on the test set, then these features might be the cause of the overfitting and it might be a good idea to drop them. .. GENERATED FROM PYTHON SOURCE LINES 847-852 .. warning:: The permutation feature importance can be misleading on strongly correlated features. For more information, see `scikit-learn's user guide `_. .. GENERATED FROM PYTHON SOURCE LINES 854-855 Now, let us look at our helper: .. 
GENERATED FROM PYTHON SOURCE LINES 857-859 .. code-block:: Python ridge_report.help() .. rst-class:: sphx-glr-script-out .. code-block:: none ╭───────────────────────── Tools to diagnose estimator Ridge ──────────────────────────╮ │ EstimatorReport │ │ ├── .metrics │ │ │ ├── .prediction_error(...) - Plot the prediction error of a regression │ │ │ │ model. │ │ │ ├── .r2(...) (↗︎) - Compute the R² score. │ │ │ ├── .rmse(...) (↘︎) - Compute the root mean squared error. │ │ │ ├── .timings(...) - Get all measured processing times related │ │ │ │ to the estimator. │ │ │ ├── .custom_metric(...) - Compute a custom metric. │ │ │ └── .report_metrics(...) - Report a set of metrics for our estimator. │ │ ├── .feature_importance │ │ │ ├── .coefficients(...) - Retrieve the coefficients of a linear │ │ │ │ model, including the intercept. │ │ │ └── .permutation(...) - Report the permutation feature importance. │ │ ├── .cache_predictions(...) - Cache estimator's predictions. │ │ ├── .clear_cache(...) - Clear the cache. │ │ ├── .get_predictions(...) - Get estimator's predictions. │ │ └── Attributes │ │ ├── .X_test - Testing data │ │ ├── .X_train - Training data │ │ ├── .y_test - Testing target │ │ ├── .y_train - Training target │ │ ├── .estimator_ - The cloned or copied estimator │ │ ├── .estimator_name_ - The name of the estimator │ │ ├── .fit_time_ - The time taken to fit the estimator, in │ │ │ seconds │ │ └── .ml_task - No description available │ │ │ │ │ │ Legend: │ │ (↗︎) higher is better (↘︎) lower is better │ ╰──────────────────────────────────────────────────────────────────────────────────────╯ .. GENERATED FROM PYTHON SOURCE LINES 860-862 We have a :meth:`~skore.EstimatorReport.feature_importance.permutation` accessor: .. GENERATED FROM PYTHON SOURCE LINES 864-866 .. code-block:: Python ridge_report.feature_importance.permutation(seed=0) .. raw:: html
Repeat                 Repeat #0  Repeat #1  Repeat #2  Repeat #3  Repeat #4
Metric  Feature
r2      MedInc          1.015997   1.065468   1.022420   1.021447   1.011064
        HouseAge        0.018828   0.024312   0.020991   0.020678   0.021016
        AveRooms        0.090388   0.089230   0.085344   0.086729   0.083356
        AveBedrms       0.099849   0.098794   0.102469   0.104776   0.107545
        Population     -0.000181  -0.000103  -0.000181  -0.000023  -0.000065
        AveOccup        0.003671   0.006451   0.007217   0.005760   0.005956
        Latitude        1.229644   1.155974   1.208524   1.193838   1.206576
        Longitude       1.097392   1.103421   1.136658   1.137157   1.116989


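The table above holds one column per permutation repeat. A minimal sketch to
summarize it as a mean and a standard deviation per feature:

.. code-block:: Python

    # Aggregate the permutation repeats into a mean and a standard deviation.
    permutation_scores = ridge_report.feature_importance.permutation(seed=0)
    print(permutation_scores.agg(["mean", "std"], axis=1))
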
.. GENERATED FROM PYTHON SOURCE LINES 867-871 The permutation importance is often calculated several times, each time with different permutations of the feature. Hence, we can have measure its variance (or standard deviation). Now, we plot the permutation feature importance on the train and test sets using boxplots: .. GENERATED FROM PYTHON SOURCE LINES 874-921 .. code-block:: Python def plot_permutation_train_test(est_report): _, ax = plt.subplots(figsize=(8, 6)) train_color = "blue" test_color = "orange" est_report.feature_importance.permutation(data_source="train", seed=0).T.boxplot( ax=ax, vert=False, widths=0.35, patch_artist=True, boxprops=dict(facecolor=train_color, alpha=0.7), medianprops=dict(color="black"), positions=[x + 0.4 for x in range(len(est_report.X_train.columns))], ) est_report.feature_importance.permutation(data_source="test", seed=0).T.boxplot( ax=ax, vert=False, widths=0.35, patch_artist=True, boxprops=dict(facecolor=test_color, alpha=0.7), medianprops=dict(color="black"), positions=range(len(est_report.X_test.columns)), ) ax.legend( handles=[ plt.Line2D([0], [0], color=train_color, lw=5, label="Train"), plt.Line2D([0], [0], color=test_color, lw=5, label="Test"), ], loc="best", title="Dataset", ) ax.set_title( f"Permutation feature importance of {est_report.estimator_name_} (Train vs Test)" ) ax.set_xlabel("$R^2$") ax.set_yticks([x + 0.2 for x in range(len(est_report.X_train.columns))]) ax.set_yticklabels(est_report.X_train.columns) plt.tight_layout() plt.show() .. GENERATED FROM PYTHON SOURCE LINES 922-924 .. code-block:: Python plot_permutation_train_test(ridge_report) .. image-sg:: /auto_examples/use_cases/images/sphx_glr_plot_feature_importance_010.png :alt: Permutation feature importance of Ridge (Train vs Test) :srcset: /auto_examples/use_cases/images/sphx_glr_plot_feature_importance_010.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 925-929 The standard deviation seems quite low. For both the train and test sets, the result of the inspection is the same as with the coefficients: the most important features are ``Latitude``, ``Longitude``, and ``MedInc``. .. GENERATED FROM PYTHON SOURCE LINES 931-936 For ``selectk_ridge_report``, we have a large pipeline that is fed to a :class:`~skore.EstimatorReport`. The pipeline contains a lot a preprocessing that creates many features. By default, the permutation importance is calculated at the entrance of the whole pipeline (with regards to the original features): .. GENERATED FROM PYTHON SOURCE LINES 938-940 .. code-block:: Python plot_permutation_train_test(selectk_ridge_report) .. image-sg:: /auto_examples/use_cases/images/sphx_glr_plot_feature_importance_011.png :alt: Permutation feature importance of RidgeCV (Train vs Test) :srcset: /auto_examples/use_cases/images/sphx_glr_plot_feature_importance_011.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 941-945 Hence, contrary to coefficients, although we have created many features in our preprocessing, the interpretability is easier. We notice that, due to our preprocessing using a clustering on the geospatial data, these features are of great importance to our model. .. GENERATED FROM PYTHON SOURCE LINES 947-948 For our decision tree, here is our permutation importance on the train and test sets: .. GENERATED FROM PYTHON SOURCE LINES 950-952 .. code-block:: Python plot_permutation_train_test(tree_report) .. 
image-sg:: /auto_examples/use_cases/images/sphx_glr_plot_feature_importance_012.png :alt: Permutation feature importance of DecisionTreeRegressor (Train vs Test) :srcset: /auto_examples/use_cases/images/sphx_glr_plot_feature_importance_012.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 953-956 The result of the inspection is the same as with the MDI: the most important features are ``MedInc``, ``Latitude``, ``Longitude``, and ``AveOccup``. .. GENERATED FROM PYTHON SOURCE LINES 958-979 Conclusion ========== In this example, we used the California housing dataset to predict house prices with skore's :class:`~skore.EstimatorReport`. By employing the :meth:`~skore.EstimatorReport.feature_importance` accessor, we gained valuable insights into model behavior beyond mere predictive performance. For linear models like Ridge regression, we inspected coefficients to understand feature contributions, revealing the prominence of ``MedInc``, ``Latitude``, and ``Longitude``. We explained the trade-off between performance (with complex feature engineering) and interpretability. Interactions between features highlighted the importance of ``AveOccup``. With tree-based models such as decision trees and random forests, we used the Mean Decrease in Impurity (MDI) to identify key features, notably ``AveOccup`` alongside ``MedInc``, ``Latitude``, and ``Longitude``. The random forest achieved the best score without any complex feature engineering, in contrast to the linear models. The model-agnostic permutation feature importance further enabled us to compare feature significance across diverse model types. .. rst-class:: sphx-glr-timing **Total running time of the script:** (1 minutes 14.754 seconds) .. _sphx_glr_download_auto_examples_use_cases_plot_feature_importance.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: plot_feature_importance.ipynb ` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: plot_feature_importance.py ` .. container:: sphx-glr-download sphx-glr-download-zip :download:`Download zipped: plot_feature_importance.zip ` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_