Intermittent failure in creating H2O cloud

Description

Although I eventually get a cloud, creating it still fails now and then.
I'll make a habit of reporting the stack traces so you know it still has rough edges.

The incantation code:

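from pysparkling import *

# `sc` is the existing SparkContext (e.g. the one the PySpark shell provides)
conf = (H2OConf(sc)
        .use_auto_cluster_start()            # start the external H2O cluster on YARN automatically
        .set_yarn_queue("spark-analytics")
        .set_num_of_external_h2o_nodes(8)
        .set_mapper_xmx("10G")
        )

context = H2OContext.getOrCreate(sc, conf)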

This works most of the time, but today I got:

Py4JJavaError: An error occurred while calling z:org.apache.spark.h2o.JavaH2OContext.getOrCreate.
: java.io.FileNotFoundException: notify_sparkling-water-bteeuwen_155098482 (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:146)
at scala.io.Source$.fromFile(Source.scala:91)
at scala.io.Source$.fromFile(Source.scala:76)
at scala.io.Source$.fromFile(Source.scala:54)
at org.apache.spark.h2o.backends.external.ExternalH2OBackend.launchH2OOnYarn(ExternalH2OBackend.scala:75)
at org.apache.spark.h2o.backends.external.ExternalH2OBackend.init(ExternalH2OBackend.scala:109)
at org.apache.spark.h2o.H2OContext.init(H2OContext.scala:102)
at org.apache.spark.h2o.H2OContext$.getOrCreate(H2OContext.scala:279)
at org.apache.spark.h2o.H2OContext.getOrCreate(H2OContext.scala)
at org.apache.spark.h2o.JavaH2OContext.getOrCreate(JavaH2OContext.java:195)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:745)

I reran the code, and it worked.
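Because simply rerunning the same code succeeded, one possible user-side stopgap is to retry cluster creation a few times before giving up. The sketch below is a generic retry helper, not part of pysparkling; the name `retry` and its parameters are hypothetical. It assumes that a failed attempt does not leave a half-started H2O cluster behind on YARN, which this issue does not confirm.

```python
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(create: Callable[[], T],
          attempts: int = 3,
          delay_s: float = 30.0,
          retry_on: Tuple[Type[BaseException], ...] = (Exception,)) -> T:
    """Call `create` until it succeeds or `attempts` calls have failed.

    Hypothetical helper: re-raises the last exception if every attempt fails.
    Exceptions not listed in `retry_on` propagate immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    last_exc: Optional[BaseException] = None
    for attempt in range(1, attempts + 1):
        try:
            return create()
        except retry_on as exc:
            last_exc = exc
            print(f"attempt {attempt}/{attempts} failed: {exc!r}")
            if attempt < attempts:
                # Give YARN some time before asking for containers again.
                time.sleep(delay_s)
    assert last_exc is not None  # every attempt failed, so an exception was recorded
    raise last_exc
```

In a PySpark session this could wrap the call from the description, for example `context = retry(lambda: H2OContext.getOrCreate(sc, conf), attempts=3, delay_s=30, retry_on=(Py4JJavaError,))` after `from py4j.protocol import Py4JJavaError`. Restricting `retry_on` to `Py4JJavaError` avoids masking unrelated Python errors.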

Activity

Jakub Hava
March 8, 2017, 11:09 AM

Marking this problem as fixed. If creating the H2O cloud in automatic mode on YARN ever fails again, the reason will now show up in the logs (which wasn't supported before), and a new issue should be reported in that case.

Jakub Hava
March 7, 2017, 9:01 AM

This problem probably means that, for some reason, the H2O cluster couldn't be started; the notify file was therefore never created, which is why we see this exception.

The solution, at least for now, is to ensure transparent logging of the H2O cluster startup in the Sparkling Water application. Then we can see why H2O failed to start and investigate further.
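The failure mode described above (the client reads a notify file that the cluster never wrote) can be made more informative by waiting for the file with a timeout and, if it never appears, reporting the end of the startup log instead of a bare file-not-found error. This is a minimal Python sketch of that idea only; the real launcher is Scala code (`ExternalH2OBackend.launchH2OOnYarn` in the trace), and the names `read_notify_file`, `tail`, and the `log_path` parameter are hypothetical. It also assumes the notify file is written in one step, so a partially written file is not handled.

```python
import os
import time
from collections import deque


def tail(path: str, n: int = 20) -> str:
    """Return the last `n` lines of a text file, or a placeholder if it is missing."""
    if not os.path.exists(path):
        return f"<no log file at {path}>"
    with open(path, "r", errors="replace") as f:
        return "".join(deque(f, maxlen=n))


def read_notify_file(notify_path: str,
                     log_path: str,
                     timeout_s: float = 120.0,
                     poll_s: float = 1.0) -> str:
    """Wait for `notify_path` to appear, then return its stripped contents.

    If the file never appears, raise an error that includes the end of the
    startup log, so the reason H2O failed to start is visible to the user.
    """
    deadline = time.monotonic() + timeout_s
    while not os.path.exists(notify_path):
        if time.monotonic() >= deadline:
            raise RuntimeError(
                f"H2O cluster did not start within {timeout_s}s: "
                f"notify file {notify_path!r} was never created.\n"
                f"Last lines of startup log {log_path!r}:\n{tail(log_path)}"
            )
        time.sleep(poll_s)
    with open(notify_path) as f:
        return f.read().strip()
```

The design choice is to turn an opaque `FileNotFoundException` into an error that already carries the relevant log excerpt, which matches the goal of making cluster startup failures visible.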

Avkash Chauhan
January 16, 2017, 8:48 PM

#90282 (https://support.h2o.ai/helpdesk/tickets/90282) - creating h2o cloud fails

Fixed

Assignee

Jakub Hava

Reporter

Avkash Chauhan

CustomerVisible

No

Support Assessment

Platform Issue

Customer Request Type

Support Incident

Priority

Major