Limit impact of high-cardinality features in Deep Learning in AutoML

Description

AutoML can have a hard time with datasets that contain high-cardinality columns, e.g., Albert [1]. One reason is that DeepLearning one-hot encodes the dataset, yielding over 1M columns.

[1] https://www.openml.org/d/41147
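
To illustrate why one-hot encoding inflates the input width, here is a minimal sketch in plain Python. It does not use the H2O API; the helper `one_hot_width` and the toy data are hypothetical and exist only for this illustration. It assumes that every string-valued column is categorical and contributes one input column per distinct level, while every other column contributes a single input column.

```python
from typing import Dict, Sequence


def one_hot_width(columns: Dict[str, Sequence[object]]) -> int:
    """Count the input columns a one-hot encoder would produce.

    Hypothetical helper: string columns are treated as categorical
    (one output column per distinct level); all other columns are
    treated as numeric (one output column each).
    """
    width = 0
    for values in columns.values():
        if all(isinstance(v, str) for v in values):
            width += len(set(values))  # one column per level
        else:
            width += 1  # numeric column is passed through
    return width


# Toy data (made up for this sketch, not taken from Albert).
toy = {
    "user_id": [f"u{i}" for i in range(1000)],  # 1000 distinct levels
    "country": ["CZ", "US", "DE", "US"] * 250,  # 3 distinct levels
    "age": list(range(1000)),                   # numeric
}

print(one_hot_width(toy))  # 1000 + 3 + 1 = 1004
```

Three raw columns become 1004 inputs here, almost all of them from the single high-cardinality column. The same effect at larger scale is what pushes the encoded dataset past 1M columns.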

Assignee

Tomas Fryda

Fix versions

Reporter

Tomas Fryda

Support ticket URL

None

Labels

Affected Spark version

None

Customer Request Type

None

Task progress

None

ReleaseNotesHidden

None

CustomerVisible

No

Priority

Major