Sparkling Water pipelines add duplicate response column to the list of features
Description
When creating a stage (GBM or another algorithm) for a Spark pipeline, the prediction col (the response column) should be excluded from the list of features by default.
Activity
Jakub Hava
April 26, 2018, 3:05 PM
This is fixed as part of . H2O correctly ignores the response column if it is part of the features. However, in the pipelines we were adding the response column to the list of existing features in all cases. That led to H2O creating a new column (because two columns can exist with the same name), which was then used as a regular feature for training. This is unwanted behavior.
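The problem described in this comment can be illustrated with a minimal Python sketch. It is not the Sparkling Water code; the function names `columns_before_fix` and `columns_after_fix` and the sample column names are hypothetical and only model how the list of columns was assembled before and after the fix.

```python
def columns_before_fix(feature_cols, response_col):
    # Models the behavior described above: the response column is
    # appended in all cases, even if it is already among the features.
    return list(feature_cols) + [response_col]


def columns_after_fix(feature_cols, response_col):
    # Assumed shape of the fix: drop the response column from the
    # features first, then add it exactly once.
    cols = [c for c in feature_cols if c != response_col]
    cols.append(response_col)
    return cols


features = ["age", "income", "label"]  # hypothetical input, response included
before = columns_before_fix(features, "label")
after = columns_after_fix(features, "label")

print(before.count("label"))  # 2: the duplicate that ended up used for training
print(after.count("label"))   # 1
```

With the duplicate removed, the response column appears only once, so there is no second copy that could be treated as a training feature.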
Fixed