Expose one-hot encoding to H2OFrame operations

Description

Requesting a method that takes an H2OFrame with one categorical column and outputs a multi-column frame: one column for each unique category, holding a 0/1 value for each row.
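As a rough illustration of the requested output shape, here is a minimal plain-Python sketch. It does not use the H2O API. The function name `one_hot`, the `prefix.level` column naming, and the sorted level order are all assumptions made for this example.

```python
from typing import Dict, List


def one_hot(values: List[str], prefix: str) -> Dict[str, List[int]]:
    """Expand one categorical column into one 0/1 column per unique category.

    Hypothetical helper for illustration only; not an H2O function.
    """
    # Assumption: levels are ordered alphabetically. A real implementation
    # may order them differently.
    levels = sorted(set(values))
    return {
        f"{prefix}.{level}": [1 if v == level else 0 for v in values]
        for level in levels
    }


column = ["red", "green", "red", "blue"]
encoded = one_hot(column, prefix="color")

# Print each indicator column with its 0/1 values, one per input row.
for name, indicators in encoded.items():
    print(name, indicators)
```

Each input row gets exactly one 1 across the new columns, in the column that matches its category.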

Activity

Michal Kurka
October 15, 2019, 11:24 PM

This function is GLM-specific. It might produce a different column order than the one XGBoost sees for the frame, which might be important in some cases.
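The column-order concern can be sketched in plain Python. The two level orderings below (alphabetical and first-appearance) are illustrative assumptions only; they are not claimed to be what GLM or XGBoost actually use, and `one_hot_with_levels` is a hypothetical helper, not an H2O function.

```python
from typing import Dict, List


def one_hot_with_levels(
    values: List[str], levels: List[str], prefix: str
) -> Dict[str, List[int]]:
    """One-hot encode `values`, emitting columns in the order given by `levels`."""
    return {
        f"{prefix}.{level}": [int(v == level) for v in values]
        for level in levels
    }


column = ["red", "green", "red", "blue"]

# Two plausible level orderings (assumptions for illustration).
sorted_levels = sorted(set(column))              # blue, green, red
appearance_levels = list(dict.fromkeys(column))  # red, green, blue

a = one_hot_with_levels(column, sorted_levels, "color")
b = one_hot_with_levels(column, appearance_levels, "color")

names_a = list(a)
names_b = list(b)

# Same set of columns, but in different positions, so the same row
# looks different when read positionally.
rows_a = [list(r) for r in zip(*a.values())]
rows_b = [list(r) for r in zip(*b.values())]

# Realigning by column name (rather than by position) removes the mismatch.
b_aligned = {name: b[name] for name in names_a}
rows_b_aligned = [list(r) for r in zip(*b_aligned.values())]
```

Any consumer that indexes encoded columns by position would read `rows_a` and `rows_b` differently, which is why matching columns by name is the safer choice when two components expand the same frame.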

Erin LeDell
October 15, 2019, 11:11 PM

FYI, it was just pointed out that R has a private (non-exported) function that already does this: .getExpanded

Erin LeDell
October 15, 2019, 10:59 PM

Let’s revisit this. There is a lot of interest on Stack Overflow, especially from people using interpretability methods.

Michal Malohlava
March 10, 2017, 12:55 AM

It will be closed when the fix for this issue lands in master.

Vlad Patryshev
February 17, 2017, 10:14 PM

It's in branch vlad_PUBDEV_3955, waiting for Michal's approval.

Assignee

Vlad Patryshev

Fix versions

None

Reporter

Mark Chan

Support ticket URL

None

Labels

None

Affected Spark version

None

Customer Request Type

None

Task progress

None

ReleaseNotesHidden

None

CustomerVisible

No

AffectedCustomers

Sprint

None

Priority

Critical