Sparkling Water pipelines add duplicate response column to the list of features

Description

When creating a GBM (or other algorithm) stage for a Spark pipeline, the prediction column should be ignored by default.

Activity

Jakub Hava
April 26, 2018, 3:05 PM

This is fixed as part of . H2O correctly ignores the response column if it is part of the features. However, in the pipelines we were adding the response column to the list of existing features in all cases. That led to H2O creating a new column (because 2 columns can exist with the same name), which was then used normally for training. This is unwanted behavior.
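
The difference between the old and the fixed behavior can be sketched roughly as follows. This is a minimal illustration in plain Python, not the actual Sparkling Water code: the function names `buggy_columns_for_training` and `columns_for_training` and the column names are hypothetical and chosen only for this example.

```python
from typing import List


def buggy_columns_for_training(feature_cols: List[str], response_col: str) -> List[str]:
    """Hypothetical model of the old behavior: append the response in all cases."""
    # If the response is already among the features, it now appears twice.
    return list(feature_cols) + [response_col]


def columns_for_training(feature_cols: List[str], response_col: str) -> List[str]:
    """Hypothetical model of the fix: the response column is listed exactly once."""
    # Drop repeated names while keeping the original column order.
    unique_cols = list(dict.fromkeys(feature_cols))
    # Append the response only when the caller has not already listed it.
    if response_col not in unique_cols:
        unique_cols.append(response_col)
    return unique_cols


features = ["age", "income", "label"]  # the response is already part of the features
print(buggy_columns_for_training(features, "label"))  # ['age', 'income', 'label', 'label']
print(columns_for_training(features, "label"))        # ['age', 'income', 'label']
```

With the duplicate gone, the algorithm sees the response column once and can ignore it as a feature, which is the default behavior the description asks for.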

Fixed

Assignee

Jakub Hava

Reporter

Stefan Pacinda

Labels

None

CustomerVisible

No

Fix versions

Priority

Major