Change HGLM interface according to Erin's suggestion

Description

hey @wendy, I noticed that we have some weird default values in R for rand_family and rand_link in GLM. I think you were trying to default to a list with one element, e.g. c("gaussian"), but instead, in R, that just became a string that looks like a Python list, c("[gaussian]"). I think this needs to be fixed… in the gen_R.py file.
currently:
rand_family = c("[gaussian]"),
rand_link = c("[identity]", "[family_default]"),
Is it always going to just be one value? if so we can simply use:
Option 1:
rand_family = c("gaussian"),
rand_link = c("identity", "family_default"),
somehow, this is already supported, so there’s nothing to do here except change the R bindings to specify the defaults differently:
> gg <- h2o.glm(y = 1, training_frame = as.h2o(iris), rand_family = "[gaussian]")
  |==========================================================================| 100%
  |==========================================================================| 100%
> gg <- h2o.glm(y = 1, training_frame = as.h2o(iris), rand_family = "gaussian")
  |==========================================================================| 100%
  |==========================================================================| 100%
Option 2: Or do we need to support passing multiple values (now or in the future)?
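
For the gen_R.py fix, the idea is to render each element of a list default separately instead of stringifying the whole list. The snippet below is only a rough sketch of that idea: `render_r_default` is a hypothetical helper, not the actual code in gen_R.py, and it assumes the default arrives in the generator as a Python list of strings. It does not escape quotes inside values.

```python
def render_r_default(value):
    """Render a Python default value as an R literal (hypothetical helper)."""
    if isinstance(value, (list, tuple)):
        # Render each element on its own so ["gaussian"] becomes c("gaussian"),
        # not a single string that looks like a list.
        return "c(%s)" % ", ".join(render_r_default(v) for v in value)
    if isinstance(value, bool):
        # bool must be checked before any numeric fallback (bool is an int).
        return "TRUE" if value else "FALSE"
    if isinstance(value, str):
        return '"%s"' % value
    return str(value)


if __name__ == "__main__":
    print("rand_family = %s," % render_r_default(["gaussian"]))
    print("rand_link = %s," % render_r_default(["identity", "family_default"]))
```

With the two defaults from this ticket, this renders the Option 1 form shown above.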

erin 10:27 PM
while it seems like Option 1 makes the most sense (assuming we only pass 1 value at a time), it doesn’t match up with Python, which expects a list. Passing a string fails with the error below, so I am curious why the list needs to be there:
H2OTypeError: Argument `rand_family` should be a ?list(Enum["gaussian"]), got string gaussian

erin 10:33 PM
currently must do this:
h2o_glm = H2OGeneralizedLinearEstimator(HGLM=True,
                                        family="gaussian",
                                        rand_family=["gaussian"],
                                        random_columns=z,
                                        rand_link=["identity"],
                                        calc_like=True)
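
If the Python side should accept a bare string as well as a list, one possible approach is to normalize the argument before it is type-checked. This is only a sketch under that assumption: `as_list` is a hypothetical helper and not part of the h2o package, and where it would be called in the estimator is left open.

```python
def as_list(value):
    """Wrap a bare string in a list; pass lists through (hypothetical helper)."""
    if value is None:
        # Leave "not set" alone so the backend default still applies.
        return None
    if isinstance(value, str):
        # A single value such as "gaussian" becomes ["gaussian"].
        return [value]
    # Lists and tuples are copied into a plain list.
    return list(value)


if __name__ == "__main__":
    print(as_list("gaussian"))    # single value given as a string
    print(as_list(["identity"]))  # already a list
```

This would keep the list-based interface for multiple values while letting callers pass rand_family="gaussian" directly, matching what R already accepts.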

Activity

Wendy
February 22, 2021, 5:40 PM

@erin: So is there only one value that can be passed at a time? Or do we need the input to be an array/list?

@wendy
I looked at GLMV3.java and it seems that rand_family and rand_link are both arrays.

Assignee: New H2O Bugs
Fix versions: None
Reporter: Erin LeDell
Support ticket URL: None
Labels: None
Affected Spark version: None
Customer Request Type: None
Task progress: None
ReleaseNotesHidden: None
CustomerVisible: No
Priority: Major