java.lang.ArrayIndexOutOfBoundsException: Index 1684 out of bounds for length 1684 when using deeplearning in ensemble

Description

raise EnvironmentError("Job with key {} failed with an exception: {}\nstacktrace: "
OSError: Job with key $03017f00000132d4ffffffff$_90f979146e9d13e0fa230dc8b964786 failed with an exception: DistributedException from /127.0.0.1:54321: 'Index 1684 out of bounds for length 1684', caused by java.lang.ArrayIndexOutOfBoundsException: Index 1684 out of bounds for length 1684
stacktrace:
DistributedException from /127.0.0.1:54321: 'Index 1684 out of bounds for length 1684', caused by java.lang.ArrayIndexOutOfBoundsException: Index 1684 out of bounds for length 1684
at water.MRTask.getResult(MRTask.java:494)
at water.MRTask.getResult(MRTask.java:502)
at water.MRTask.doAll(MRTask.java:397)
at water.MRTask.doAll(MRTask.java:403)
at hex.Model.predictScoreImpl(Model.java:1784)
at hex.Model.score(Model.java:1618)
at water.api.ModelMetricsHandler$1.compute2(ModelMetricsHandler.java:403)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1575)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 1684 out of bounds for length 1684
at hex.genmodel.GenModel.setCats(GenModel.java:707)
at hex.genmodel.GenModel.setInput(GenModel.java:686)
at hex.genmodel.algos.deeplearning.DeeplearningMojoModel.score0(DeeplearningMojoModel.java:70)
at hex.genmodel.algos.deeplearning.DeeplearningMojoModel.score0(DeeplearningMojoModel.java:158)
at hex.genmodel.algos.ensemble.StackedEnsembleMojoModel.score0(StackedEnsembleMojoModel.java:39)
at hex.generic.GenericModel.score0(GenericModel.java:93)
at hex.Model.score0(Model.java:1992)
at hex.Model.score0(Model.java:1959)
at hex.Model$BigScore.score0(Model.java:1903)
at hex.Model$BigScore.map(Model.java:1881)
at water.MRTask.compute2(MRTask.java:675)
at water.H2O$H2OCountedCompleter.compute1(H2O.java:1578)
at hex.Model$BigScore$Icer.compute1(Model$BigScore$Icer.java)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1574)
... 5 more

Activity

Tomas Fryda
January 18, 2021, 3:37 PM

Thank you for your cooperation! I found the issue, and hopefully the fix will be in the next release. The problem was with fold column handling. Since the fold column is the last column of your dataset, I think you can work around it by modifying the mojo (if you didn't find any other way):
1) unpack the mojo
2) open the top-level model.ini
3) on line 11, change n_features = 1685 to n_features = 1684 and save
4) compress it again
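
The four steps above could be scripted roughly as follows. This is only a sketch, not an official H2O tool: the function name patch_n_features and the file paths are hypothetical, and it assumes the mojo is an ordinary zip archive whose top-level model.ini contains a line of the form "n_features = 1685". It looks the line up by key instead of by line number, and leaves any nested model.ini files (for example those of the base models) untouched.

```python
import zipfile


def patch_n_features(src_path, dst_path, old_value, new_value):
    """Copy a mojo zip, changing n_features in the top-level model.ini only.

    Hypothetical helper: assumes the entry is literally named "model.ini"
    and holds a line "n_features = <old_value>".
    """
    patched = False
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == "model.ini":
                lines = data.decode("utf-8").splitlines(keepends=True)
                for i, line in enumerate(lines):
                    key, sep, value = line.partition("=")
                    if sep and key.strip() == "n_features" \
                            and value.strip() == str(old_value):
                        # Keep the original line ending (\n or \r\n).
                        ending = line[len(line.rstrip("\r\n")):]
                        lines[i] = "n_features = %d%s" % (new_value, ending)
                        patched = True
                data = "".join(lines).encode("utf-8")
            # Every other entry is copied through unchanged.
            dst.writestr(info, data)
    if not patched:
        raise ValueError("n_features = %d not found in top-level model.ini"
                         % old_value)


# Example call (paths are placeholders):
# patch_n_features("ensemble.zip", "ensemble_patched.zip", 1685, 1684)
```

Writing to a new file keeps the original mojo intact, so it is easy to go back if the patched one misbehaves.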

This worked on the iris dataset, so hopefully it will work on yours too. If you use this workaround, please make sure the predictions are the same as those of the original model.
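
One possible way to run that check, as a sketch only: export the predictions of the original in-cluster model and of the patched mojo to two CSV files, then compare them cell by cell. The helper names load_predictions and predictions_match, the file names, and the tolerance are all assumptions made for illustration; how the two CSV files are produced is left open.

```python
import csv
import math


def load_predictions(path):
    """Read a predictions CSV (with a header row) into a list of dict rows."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def _is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False


def predictions_match(rows_a, rows_b, abs_tol=1e-8):
    """Return True if both prediction tables agree.

    Numeric cells (e.g. class probabilities) are compared within abs_tol;
    everything else (e.g. predicted labels) must be identical strings.
    """
    if len(rows_a) != len(rows_b):
        return False
    for a, b in zip(rows_a, rows_b):
        if a.keys() != b.keys():
            return False
        for col in a:
            x, y = a[col], b[col]
            if _is_number(x) and _is_number(y):
                if not math.isclose(float(x), float(y),
                                    rel_tol=0.0, abs_tol=abs_tol):
                    return False
            elif x != y:
                return False
    return True


# Example call (file names are placeholders):
# ok = predictions_match(load_predictions("preds_original.csv"),
#                        load_predictions("preds_patched_mojo.csv"))
```

A small absolute tolerance is used because the two scoring paths may differ in the last floating-point digits; set abs_tol=0.0 to require exact equality.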

Hassan Hawilo
January 15, 2021, 7:43 PM

Done

Many Thanks!

Tomas Fryda
January 15, 2021, 7:39 PM

That would be great! tomas.fryda@h2o.ai
Thanks!

Hassan Hawilo
January 15, 2021, 6:50 PM

If you can send me a link or an email address so I can share the model and a prediction row CSV file privately, that would be appreciated.

Hassan Hawilo
January 15, 2021, 6:48 PM

I can share with you the model and a prediction row that reproduces the error.

Fixed

Assignee

Tomas Fryda

Reporter

Hassan Hawilo

Priority

Critical