Integrate XGBoost in Sparkling Water

Description

We want H2O's XGBoost to be available in Sparkling Water (SW) as well. XGBoost should be exposed just like the other algorithms.

We will have to document how to set memory requirements and configure Spark. XGBoost allocates off-heap memory, which is an issue on Hadoop: the containers need a memory configuration that leaves room for it in addition to the JVM heap.
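As a rough illustration of the kind of configuration that would need documenting, here is a minimal sketch in Python that derives Spark settings reserving extra container memory for off-heap allocations. The helper names and the `overhead_fraction` parameter are hypothetical and not part of Sparkling Water; the right fraction for XGBoost is an assumption that would have to come out of the testing mentioned in this ticket. The property `spark.executor.memoryOverhead` is the Spark 2.3+ name (older releases on YARN used `spark.yarn.executor.memoryOverhead`).

```python
from typing import Dict, List


def xgboost_spark_conf(executor_memory_gb: int,
                       overhead_fraction: float = 1.0) -> Dict[str, str]:
    """Build Spark settings that leave off-heap room next to the JVM heap.

    Hypothetical helper: overhead_fraction is the share of the executor heap
    to request as additional container memory. The value is an assumption,
    not a recommendation from the Sparkling Water documentation.
    """
    if executor_memory_gb <= 0:
        raise ValueError("executor_memory_gb must be positive")
    if overhead_fraction < 0:
        raise ValueError("overhead_fraction must not be negative")

    # Spark reads memoryOverhead as MiB when no unit suffix is given.
    overhead_mib = int(executor_memory_gb * 1024 * overhead_fraction)
    return {
        "spark.executor.memory": f"{executor_memory_gb}g",
        "spark.executor.memoryOverhead": str(overhead_mib),
    }


def to_spark_submit_args(conf: Dict[str, str]) -> List[str]:
    """Turn the settings into '--conf key=value' arguments for spark-submit."""
    args: List[str] = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


if __name__ == "__main__":
    # Example: 8 GB heap plus half as much again for off-heap allocations.
    print(" ".join(to_spark_submit_args(xgboost_spark_conf(8, 0.5))))
```

With these settings YARN sizes each executor container as heap plus overhead, so native allocations made outside the JVM heap are less likely to push the container over its limit and get it killed.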

It would be good to have (real-life deployment) tests of this functionality because integrating XGBoost can be tricky.

Activity

Michal Kurka
August 7, 2018, 8:07 PM

cc:

Michal Kurka
August 7, 2018, 8:08 PM

Can you please plan this into a release?

Jakub Hava
August 7, 2018, 8:12 PM
Edited

Sure! At this point I think it just requires testing it out and documenting the memory config, as you mentioned. People can use it the same way they are used to in H2O.

We can also expose XGBoost as a Sparkling Water pipeline stage, but I would put that into another JIRA.

Fixed

Assignee

Jakub Hava

Reporter

Michal Kurka

Labels

None

CustomerVisible

No

Priority

Major