GBM ModelMetrics, airlines_all (8 machines): *** Attempting to block on task (class water.TaskGetKey) with equal or lower priority. Can lead to deadlock! 122 <= 122

Description

Thought I'd try some multi-machine testing.

I did a git clone on mr-0xd10 and built it, so it's at the head of master.

You can run this from any machine, since it copies the jars to the machines (mr-0xd2 through mr-0xd10).

(One warning: since I use h2o.py, you have to uninstall any h2o Python package you installed. I probably need to rename my h2o.py.)

Using airlines_all from the usual /home/0xdiag/datasets on each machine.

It seems to get past the training: the progress advances to 1.0 while polling.

I ran it twice; it failed both times.

The last H2O request is ModelMetrics (it finished training, then did Models.json, then Frames.json, then ModelMetrics.json).

2015-02-25 01:37:53.805546 – Start http://172.16.2.189:54321/3/ModelMetrics.json/models/GBMModelKey/frames/airlines_all.hex # None;
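To check whether the failure is tied to the ModelMetrics call itself rather than to the preceding training, one could re-issue only that last request against the running cloud. This is a minimal Python 3 sketch: the URL path is copied from the log line above, while the helper names `model_metrics_url` and `replay` are hypothetical. The log line does not show the HTTP method, so the caller must pass it explicitly rather than rely on a guessed default.

```python
import urllib.request


def model_metrics_url(host, port, model_key, frame_key):
    # Path copied from the logged request; keys are inserted unquoted.
    return "http://%s:%d/3/ModelMetrics.json/models/%s/frames/%s" % (
        host, port, model_key, frame_key)


def replay(url, method, timeout=600):
    """Issue the single request. `method` must be supplied: the log omits it."""
    req = urllib.request.Request(url, method=method)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status, resp.read()
```

For example, `model_metrics_url("172.16.2.189", 54321, "GBMModelKey", "airlines_all.hex")` rebuilds the exact URL from the log; afterwards the sandbox stdout/stderr can be checked again for the same assertion.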

Not sure whether it does the same thing with fewer machines.

cd h2o-dev/py2/testdir_single_jvm
python test_GBM_airlines.py -cj ../testdir_hosts/pytest_config-182-190.json

======================================================================
ERROR: test_GBM_airlines (__main__.Basic)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_GBM_airlines.py", line 8, in tearDown
    h2o.check_sandbox_for_errors()
  File "../h2o_test.py", line 254, in check_sandbox_for_errors
    python_test_name=python_test_name)
  File "../h2o_sandbox.py", line 289, in check_sandbox_for_errors
    raise Exception(errorMessage)
Exception: check_sandbox_for_errors: Errors in sandbox stdout or stderr (or R stdout/stderr).
Could have occurred at any prior time

02-25 01:29:55.792 172.16.2.186:54321 27724 # Session WARN: Caught exception: water.DException$DistributedException: from /172.16.2.186:54321; by class water.KeySnapshot$GlobalUKeySetTask; class water.DException$DistributedException: from /172.16.2.187:54321; by class water.KeySnapshot$GlobalUKeySetTask; class java.lang.AssertionError: *** Attempting to block on task (class water.TaskGetKey) with equal or lower priority. Can lead to deadlock! 122 <= 122; Stacktrace: [water.MRTask.getResult(MRTask.java:265), water.MRTask.doAll(MRTask.java:295), water.MRTask.doAllNodes(MRTask.java:287), water.KeySnapshot.globalSnapshot(KeySnapshot.java:234), water.KeySnapshot.globalSnapshot(KeySnapshot.java:221), water.api.ModelMetricsHandler$ModelMetricsList.fetch(ModelMetricsHandler.java:22), water.api.ModelMetricsHandler.fetch(ModelMetricsHandler.java:142), water.api.ModelMetricsHandler.score(ModelMetricsHandler.java:155), sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method), sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57), sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43), java.lang.reflect.Method.invoke(Method.java:606), water.api.Handler.handle(Handler.java:57), water.api.RequestServer.handle(RequestServer.java:602), water.api.RequestServer.serve(RequestServer.java:560), water.NanoHTTPD$HTTPSession.run(NanoHTTPD.java:433), java.lang.Thread.run(Thread.java:745)]

water.DException$DistributedException: from /172.16.2.187:54321; by class water.KeySnapshot$GlobalUKeySetTask; class java.lang.AssertionError: *** Attempting to block on task (class water.TaskGetKey) with equal or lower priority. Can lead to deadlock! 122 <= 122
at water.RPC.get(RPC.java:252)
at water.TaskGetKey.get(TaskGetKey.java:28)
at water.DKV.get(DKV.java:210)
at water.DKV.get(DKV.java:168)
at water.Key.get(Key.java:84)
at water.fvec.Frame.vecs_impl(Frame.java:246)
at water.fvec.Frame.vecs(Frame.java:232)
at water.fvec.Frame.anyVec(Frame.java:208)
at water.KeySnapshot$KeyInfo.<init>(KeySnapshot.java:52)
at water.KeySnapshot.localSnapshot(KeySnapshot.java:212)
at water.KeySnapshot$GlobalUKeySetTask.setupLocal(KeySnapshot.java:249)
at water.MRTask.setupLocal0(MRTask.java:339)
at water.MRTask.dinvoke(MRTask.java:282)
at water.RPC$RPCCall.compute2(RPC.java:333)
at water.H2O$H2OCountedCompleter.compute(H2O.java:582)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:429)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
java.lang.AssertionError
at water.AutoBuffer.<init>(AutoBuffer.java:132)
at water.RPC.response(RPC.java:572)
at water.UDPAck.call(UDPAck.java:17)
at water.FJPacket.compute2(FJPacket.java:21)
at water.H2O$H2OCountedCompleter.compute(H2O.java:582)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:429)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

----------------------------------------------------------------------
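The assertion appears to enforce a rule that a thread handling a task at some priority may only block on a task of strictly higher priority; here both sides are 122. Per the trace, KeySnapshot$GlobalUKeySetTask, running inside an RPC on /172.16.2.187, reaches Frame.anyVec, which calls DKV.get and blocks on a TaskGetKey at the same priority. The Python model below illustrates why that is unsafe; it is not H2O code. It assumes, purely for illustration, one single-worker pool per priority level (the `POOLS` dict and all function names are hypothetical), so a blocking fetch queued at its own level waits behind the very task that is waiting for it. A timeout stands in for the hang.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError

# Hypothetical model (not H2O code): one single-worker pool per priority level.
POOLS = {p: ThreadPoolExecutor(max_workers=1) for p in (122, 123)}


def check_can_block(current, target):
    """The rule the assertion message suggests: block only on higher priority."""
    if target <= current:
        raise AssertionError("*** Attempting to block on task with equal or lower "
                             "priority. Can lead to deadlock! %d <= %d" % (target, current))


def fetch_key():
    # Stand-in for a remote key fetch (TaskGetKey in the trace).
    return "value"


def task(current, target, timeout=0.5):
    """Runs on the pool for `current` and blocks on a fetch queued at `target`."""
    pending = POOLS[target].submit(fetch_key)
    try:
        return pending.result(timeout=timeout)
    except TimeoutError:
        # A same-priority fetch sits queued behind this task on the only worker.
        return "deadlocked"


def run(current, target):
    return POOLS[current].submit(task, current, target).result()
```

`run(122, 123)` returns the fetched value because the fetch gets its own worker, while `run(122, 122)` gives up after the timeout and reports `"deadlocked"`; `check_can_block(122, 122)` rejects the same situation up front, which is what the `122 <= 122` assertion did.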


Assignee: New H2O Bugs
Reporter: Kevin Normoyle
CustomerVisible: No