With cross-validation enabled, nobs of training and cross-validation metrics should be different unless nfold=2

Description

I ran the pyunit test pyunit_cv_cars_gbm.py with nfold = 5.

The training metrics look like this:

The cross-validation metrics look like this:

Note that the nobs (number of observations) should differ in this case, but they do not.
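For context, the expectation comes from how n-fold cross-validation partitions the data: each fold model trains on (nfold-1)/nfold of the rows, so its own training nobs is smaller than the full dataset. A plain-Python sketch of the arithmetic (not the H2O implementation; the row count of 100 is illustrative):

```python
# Per-fold training/holdout sizes for n-fold cross-validation.
# With nfold folds, each fold model trains on the other nfold-1 folds
# and is scored on the held-out fold.
def fold_sizes(n_rows, nfold):
    base, extra = divmod(n_rows, nfold)
    holdout = [base + (1 if i < extra else 0) for i in range(nfold)]
    train = [n_rows - h for h in holdout]
    return train, holdout

train, holdout = fold_sizes(100, 5)
print(train)    # [80, 80, 80, 80, 80] -- each fold model trains on ~4/5 of the rows
print(holdout)  # [20, 20, 20, 20, 20] -- each fold model is scored on the remaining ~1/5
```

This is why one might expect the per-fold (and hence cross-validation) nobs to differ from the training nobs of the final model, which is trained on all rows.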

Activity

Wendy
February 9, 2021, 12:00 AM

The implementation is correct; the Carbon interface interpretation is incorrect.

Wendy
February 8, 2021, 11:59 PM

Thanks to @michalk. The CV metric is an aggregate of all the folds; hence its nobs will be the same as for the training metric, which comes from the final model trained on all the data.
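To see why the aggregated CV nobs matches the training nobs: the fold holdout sets are disjoint and together cover every row, so metrics aggregated across folds are computed over all N observations. A plain-Python illustration (not the H2O implementation; round-robin fold assignment is assumed for simplicity):

```python
# The holdout folds of n-fold CV partition the full dataset, so metrics
# aggregated over all folds see every observation exactly once.
def holdout_folds(n_rows, nfold):
    rows = list(range(n_rows))
    return [rows[i::nfold] for i in range(nfold)]  # round-robin assignment

folds = holdout_folds(100, 5)
aggregated_nobs = sum(len(f) for f in folds)
print(aggregated_nobs)  # 100 -- same nobs as the final model trained on all rows
```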

I have dug into each CV model and noted the following from GAM (screenshots not preserved):
So everything works fine.

Won't Do

Assignee

New H2O Bugs

Fix versions

None

Reporter

Wendy

Support ticket URL

None

Labels

None

Affected Spark version

None

Customer Request Type

None

Task progress

None

ReleaseNotesHidden

None

CustomerVisible

No

Priority

Major