Error when trying to use a fold column when number of folds < official number of levels in that column
Description
I am trying to do a pretty standard thing in ML and I am getting an error.
Task:
There’s a “cv” categorical column, which has 5 values (5 folds).
I subset the frame by the cv column to make train (folds 1-4) and test (fold 5).
Now I try to train an h2o.glm on train, and I want to do 4-fold CV using the 4 folds I have left, via the fold_column argument.
However, h2o.glm throws an error because train$cv still reports 5 levels while only 4 are represented in the dataset. I’ve confirmed this is the cause, because the same call works if I use the original dataset with all 5 folds.
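
The steps above can be reproduced with something like the following. This is only a minimal sketch: the toy data and the names df, hf, x and y are made up for illustration and are not from my real dataset, and it assumes a local H2O cluster can be started with h2o.init().

```r
library(h2o)
h2o.init()

# Toy data (hypothetical): two numeric columns plus a 5-level fold column
set.seed(1)
df <- data.frame(
  x  = rnorm(100),
  y  = rnorm(100),
  cv = factor(rep(1:5, times = 20))
)
hf <- as.h2o(df)

# Hold out fold 5 as the test set and keep folds 1-4 for training
train <- hf[hf$cv != "5", ]
test  <- hf[hf$cv == "5", ]

# Compare the size of the column's domain with the levels actually present
h2o.nlevels(train$cv)
h2o.table(train$cv)

# 4-fold CV on the remaining folds; this is the call that errors for me
fit <- h2o.glm(x = "x", y = "y", training_frame = train, fold_column = "cv")
```

Swapping train for hf in the last call is the case that works for me, since all 5 folds are then present.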
I can’t find a way to re-level the frame to tell it that the cv column only has 4 levels. h2o.setLevels() is just a renaming tool; you can’t use it to change the cardinality of the domain.
Can we relax this restriction on fold_column in H2O algos?