getFeaturesCols() should not return the fold column or weight column

Description

Currently `getFeaturesCols` returns the fold column and the weights column. We should update this method so that it returns only the columns that are used for training.
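To illustrate the requested behavior, here is a minimal sketch in plain Python. It is not the Sparkling Water implementation: the helper name `training_feature_cols` and its parameters `fold_col` and `weight_col` are hypothetical and exist only for this example.

```python
from typing import List, Optional


def training_feature_cols(
    all_cols: List[str],
    fold_col: Optional[str] = None,
    weight_col: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: return only the columns used as training features."""
    # The fold and weight columns carry metadata, not predictive features.
    excluded = {c for c in (fold_col, weight_col) if c is not None}
    # Keep the original column order while dropping the excluded columns.
    return [c for c in all_cols if c not in excluded]


cols = ["age", "income", "fold", "weight"]
print(training_feature_cols(cols, fold_col="fold", weight_col="weight"))
# prints ['age', 'income']
```

If neither column is set, the sketch returns the input list unchanged, which matches the expectation that only explicitly configured fold and weight columns are dropped.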

Activity

Jakub Hava
April 26, 2019, 4:11 AM

It should probably print a warning, letting the user know about this behavior.

Also, from a data science point of view, can you verify that excluding these 2 columns for scoring makes sense? I don't have the DS background. Thanks!

Lauren DiPerna
April 26, 2019, 8:38 PM

Hi, yes, I think it makes sense to exclude these two columns from a DS perspective, because the fold column and weights column shouldn't be used as features during training, and the fold column in particular shouldn't be required during scoring. Hope this helps!

Jakub Hava
April 26, 2019, 8:44 PM

Cool! Thank you. This change will overlap with the big API cleanup introduced by ( will go into major release)

Marek Novotny
April 29, 2019, 6:35 PM
Fixed

Assignee

Marek Novotny

Reporter

Lauren DiPerna

Labels

None

CustomerVisible

No

Priority

Major