Fix adaptFrameToTrain to support TE with non-AUTO encoders when using validation frame

Description

adaptFrameToTrain may encode the Frame based on conditions unknown to the method’s caller, so the caller cannot decide whether TE should be applied beforehand.
This is a problem only when using a non-AUTO categorical encoder (in this case, adaptFrameToTrain will never encode the frame), and AutoML currently supports only AUTO.

Suggestion

Inject the encoder into this method as a functional param instead of having it guessed internally. This would make it possible to inject an encoder that does both TE and categorical encoding.

Tried it, but it doesn’t work with all algos; it looks like the right approach though.
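
A minimal sketch of what the suggestion could look like, in plain Java. This is not the H2O API: the frame is a hypothetical stand-in (a map from column name to values), and `adaptFrameToTrain`, `targetEncoder`, `labelEncoder` and `compose` here are toy versions written only for illustration. The point is that the caller passes the encoder in, so it can chain TE and categorical encoding in a known order instead of relying on what the method decides internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

public class EncoderInjectionSketch {

    // Hypothetical stand-in for a frame: column name -> column values.
    static Map<String, List<String>> copyOf(Map<String, List<String>> frame) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        frame.forEach((name, values) -> out.put(name, new ArrayList<>(values)));
        return out;
    }

    // Toy adaptation step: the encoder is supplied by the caller rather than
    // chosen internally. The input frame is copied, so it is left untouched.
    static Map<String, List<String>> adaptFrameToTrain(
            Map<String, List<String>> validation,
            List<String> trainColumns,
            UnaryOperator<Map<String, List<String>>> encoder) {
        Map<String, List<String>> encoded = encoder.apply(copyOf(validation));
        Map<String, List<String>> adapted = new LinkedHashMap<>();
        for (String col : trainColumns) {
            List<String> values = encoded.get(col);
            if (values == null) {
                throw new IllegalArgumentException("Missing column after encoding: " + col);
            }
            adapted.put(col, values);
        }
        return adapted;
    }

    // Toy target encoder: adds "<column>_te" from a level -> mean lookup,
    // falling back to a prior for levels not seen in training.
    static UnaryOperator<Map<String, List<String>>> targetEncoder(
            String column, Map<String, Double> means, double prior) {
        return frame -> {
            List<String> encoded = new ArrayList<>();
            for (String level : frame.get(column)) {
                encoded.add(String.valueOf(means.getOrDefault(level, prior)));
            }
            frame.put(column + "_te", encoded);
            return frame;
        };
    }

    // Toy categorical encoder: replaces each level by its index in the
    // training domain (-1 for an unseen level).
    static UnaryOperator<Map<String, List<String>>> labelEncoder(
            String column, List<String> domain) {
        return frame -> {
            List<String> encoded = new ArrayList<>();
            for (String level : frame.get(column)) {
                encoded.add(String.valueOf(domain.indexOf(level)));
            }
            frame.put(column, encoded);
            return frame;
        };
    }

    // Chains two encoders; the first runs before the second.
    static <T> UnaryOperator<T> compose(UnaryOperator<T> first, UnaryOperator<T> second) {
        return x -> second.apply(first.apply(x));
    }

    public static void main(String[] args) {
        Map<String, List<String>> validation = new LinkedHashMap<>();
        validation.put("city", Arrays.asList("paris", "rome", "oslo"));

        Map<String, Double> means = new HashMap<>();
        means.put("paris", 0.8);
        means.put("rome", 0.2);

        // TE runs first because it needs the raw levels.
        UnaryOperator<Map<String, List<String>>> encoder = compose(
                targetEncoder("city", means, 0.5),
                labelEncoder("city", Arrays.asList("paris", "rome")));

        System.out.println(adaptFrameToTrain(
                validation, Arrays.asList("city", "city_te"), encoder));
        // prints {city=[0, 1, -1], city_te=[0.8, 0.2, 0.5]}
    }
}
```

With this shape the caller, not the adaptation step, owns the encoding order, which is what the non-AUTO case needs. It does not address why the approach fails with some algos.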

Activity

Erin LeDell
September 19, 2020, 6:59 AM

Is this a blocker for 3.32.0.1, or is it ok because AutoML currently supports AUTO?

Sebastien Poirier
September 19, 2020, 2:57 PM

It’s a follow-up ticket (cf. fix version set to 3.32.0.x); it doesn’t impact TE in AutoML as long as AutoML keeps using AUTO as categorical encoding.

Assignee

New H2O Bugs

Fix versions

Reporter

Sebastien Poirier

Support ticket URL

None

Labels

None

Affected Spark version

None

Customer Request Type

None

Task progress

None

ReleaseNotesHidden

None

CustomerVisible

No

Components

Priority

Major