GLM raises ArrayIndexOutOfBoundsException with CV+lambda_search+alphas+categorical columns

Description

The exception is pretty rare but reproducible with the attached test file.

Activity

Wendy
7 days ago

First I would like to thank . His tests are what enabled me to locate the issues. I have added his tests to the PR.

Wendy
7 days ago

Here is my diagnosis of the problem:

During normal GLM model building, the coefficient vector can shrink when the coefficients/Gram matrix has zero rows or columns. Because betaCnd is allocated at the start of the iteration loop and the length change happens inside the loop, the two lengths can disagree. Normally this is not a problem, because betaCnd is reassigned by betaCnd = ADMM_solve() or by another solver. In this case, however, that call is skipped, so betaCnd ends up with one length and _state.beta() with another. My fix: when the lengths differ, I extract the matching coefficients from betaCnd so that it has the same length as _state.beta().
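
To illustrate the length mismatch and the kind of extraction described above, here is a minimal standalone sketch. It is not the code from the PR: the class BetaCndAlignment, the methods extractActive and alignToState, and the activeCols index array are hypothetical names, and the sketch assumes the surviving coefficient positions are available as a sorted list of indices into the longer betaCnd array.

```java
import java.util.Arrays;

// Illustrative sketch only; all names here are hypothetical, not H2O APIs.
final class BetaCndAlignment {

    // Copy the entries of betaCnd at the given positions into a new, shorter array.
    static double[] extractActive(double[] betaCnd, int[] activeCols) {
        double[] out = new double[activeCols.length];
        for (int i = 0; i < activeCols.length; i++) {
            out[i] = betaCnd[activeCols[i]];
        }
        return out;
    }

    // Return betaCnd unchanged when the lengths already agree; otherwise
    // shrink it to the active positions so it matches stateBeta in length.
    static double[] alignToState(double[] betaCnd, double[] stateBeta, int[] activeCols) {
        if (betaCnd.length == stateBeta.length) {
            return betaCnd;
        }
        if (activeCols.length != stateBeta.length) {
            throw new IllegalArgumentException(
                "activeCols has " + activeCols.length
                + " entries but stateBeta has " + stateBeta.length);
        }
        return extractActive(betaCnd, activeCols);
    }

    public static void main(String[] args) {
        // betaCnd was allocated before two coefficients were dropped (assumed values).
        double[] betaCnd = {0.5, 0.0, -1.2, 0.0, 2.0};
        // The state already holds the shorter coefficient vector.
        double[] stateBeta = {0.4, -1.0, 1.9};
        // Positions in betaCnd that survived (assumed to be known).
        int[] activeCols = {0, 2, 4};

        double[] aligned = alignToState(betaCnd, stateBeta, activeCols);
        System.out.println(Arrays.toString(aligned));
    }
}
```

Without such an alignment step, any loop that walks both arrays using the longer length would index past the end of the shorter one, which is consistent with the ArrayIndexOutOfBoundsException in the title.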

Assignee

Wendy

Fix versions

None

Reporter

Sebastien Poirier

Support ticket URL

None

Labels

None

Affected Spark version

None

Customer Request Type

None

Task progress

None

ReleaseNotesHidden

None

CustomerVisible

No

Priority

Major