Verify & Document run of RSparkling on top of Databricks Azure cluster

Description

None

Activity

Jakub Hava
January 12, 2018, 2:12 PM

I wanted to try uploading the dependency manually; there is either a bug in Databricks or our Maven packages are somehow wrong. This also happens for different packages.

(Screenshots attached: the specified package, and the resulting error.)

Jakub Hava
January 12, 2018, 2:12 PM
Edited

This happens for all Sparkling Water packages, for most H2O packages, and for various other packages as well.

Jakub Hava
January 19, 2018, 11:47 AM
Edited

So, the current state and steps:
1) Log in to Azure
2) Create the Databricks Azure environment
3) Create the SW library
-> Go to Libraries, select Upload library as JAR, and upload the Sparkling Water assembly JAR - for example SW 2.2.7 (it can be downloaded from our page)


You can configure the library to be automatically attached to newly started clusters. If you attach the library to an already running cluster, it is better to restart the cluster to make sure it is in a clean and correct state.
4) Create a cluster in Databricks and make sure this library is attached to it
Also configure additional properties on the cluster; add this line to the Spark configuration field:
spark.ui.enabled false

Then create the cluster.
(RSparkling does not support passing configuration to H2OContext - see https://0xdata.atlassian.net/browse/SW-684 - so we need to configure it manually at the cluster level, as sketched below.) The reason why this configuration needs to be set at all is explained in https://0xdata.atlassian.net/browse/SW-683.
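For contrast, a minimal sketch of how the same flag would normally be passed through sparklyr's spark_config() on a non-Databricks connection (the master = "local" value is purely illustrative); on Databricks we attach to an already running cluster, so the flag has to go into the cluster's Spark configuration field instead:

    library(sparklyr)

    # Outside Databricks, Spark properties can be set on the connection itself.
    config <- spark_config()
    config[["spark.ui.enabled"]] <- "false"

    # Illustrative local master; on Databricks this step is replaced by the
    # cluster-level Spark configuration described above.
    sc <- spark_connect(master = "local", config = config)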


Important -> Make sure to create the cluster with Spark 2.2.0, as sparklyr probably still does not support 2.2.1.
If cluster creation fails, read the error description - it is most likely because we configured too many cores and there are core limits set up by Microsoft and Databricks at this point.
5) Create an R notebook and attach it to the cluster
6) The notebook content should be roughly as follows:
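A minimal sketch of the notebook content, assuming sparklyr's spark_connect(method = "databricks") and the h2o_context() entry point from the rsparkling of that era:

    library(sparklyr)
    library(h2o)
    library(rsparkling)

    # Attach to the running Databricks cluster; the Sparkling Water assembly
    # JAR is already attached to the cluster, so no Maven resolution happens.
    sc <- spark_connect(method = "databricks")

    # Start H2O on top of the Spark cluster.
    hc <- h2o_context(sc)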

And voilà - we should have Sparkling Water running via RSparkling on Databricks Azure.
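As a quick smoke test (a sketch, not prescribed anywhere above), converting a small Spark table to an H2OFrame confirms the round trip works; sdf_copy_to() comes from sparklyr and as_h2o_frame() from rsparkling:

    # Copy a small R data set into Spark, then hand it over to H2O.
    mtcars_tbl <- sdf_copy_to(sc, mtcars, overwrite = TRUE)
    mtcars_hf <- as_h2o_frame(sc, mtcars_tbl)

    # Sane dimensions (32 rows, 11 columns) mean H2O is up and reachable.
    dim(mtcars_hf)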

I used one node because of the core limitations, but that shouldn't change anything.

Jakub Hava
January 19, 2018, 1:52 PM

I think this issue can be considered resolved, since it works with a small workaround. The core issues are tracked separately:
-> we can't upload the dependency as a Maven package - https://0xdata.atlassian.net/browse/SW-672 - that is why we need to upload it as a JAR
-> when uploading the JAR, we need to disable the Spark UI; see why here -> https://0xdata.atlassian.net/browse/SW-683

Jakub Hava
January 19, 2018, 1:52 PM

Still needs to be documented

Fixed

Assignee

Jakub Hava

Reporter

Michal Malohlava

Labels

None

CustomerVisible

No

testcase 1

None

testcase 2

None

testcase 3

None

h2ostream link

None

Affected Spark version

None

AffectedContact

None

AffectedCustomers

None

AffectedPilots

None

AffectedOpenSource

None

Support Assessment

None

Customer Request Type

None

Support ticket URL

None

End date

None

Baseline start date

None

Baseline end date

None

Task progress

None

Task mode

None

ReleaseNotesHidden

None

Fix versions

Priority

Major