Exception when there is a column with BOOLEAN type in dataset during H2OMOJOModel transformation

Description

When I try to get predictions for a Spark dataset that contains a column of BOOLEAN type, an exception is thrown:
Caused by: hex.genmodel.easy.exception.PredictUnknownCategoricalLevelException: Unknown categorical level (IsArrDelayed,1)
at hex.genmodel.easy.EasyPredictModelWrapper.fillRawData(EasyPredictModelWrapper.java:868)
at hex.genmodel.easy.EasyPredictModelWrapper.predict(EasyPredictModelWrapper.java:890)
at hex.genmodel.easy.EasyPredictModelWrapper.preamble(EasyPredictModelWrapper.java:756)
at hex.genmodel.easy.EasyPredictModelWrapper.predictBinomial(EasyPredictModelWrapper.java:501)
at hex.genmodel.easy.EasyPredictModelWrapper.predictBinomial(EasyPredictModelWrapper.java:489)
at hex.genmodel.easy.EasyPredictModelWrapper.predict(EasyPredictModelWrapper.java:300)

I investigated this issue, and the problem appears to be in the following code blocks:

H2OMOJOModel.rowToRowData
case BooleanType =>
  if (row.getBoolean(idxRow)) put(f.name, 1.toString) else put(f.name, 0.toString)

Here the original boolean value is converted to the string "1" or "0".

But here

EasyPredictModelWrapper.fillRawData

else {
  // Column has categorical value.
  Object o = data.get(dataColumnName);
  double value;
  if (o instanceof String) {
    String levelName = (String) o;
    HashMap<String, Integer> columnDomainMap = domainMap.get(index);
    Integer levelIndex = columnDomainMap.get(levelName);
    if (levelIndex == null) {
      levelIndex = columnDomainMap.get(dataColumnName + "." + levelName);
    }
    ...

When this line of code is executed:
Integer levelIndex = columnDomainMap.get(levelName);
levelIndex becomes null, because the keys in columnDomainMap are the untransformed values from the original dataset ("true" and "false"), while levelName holds the converted value (true -> "1", false -> "0").

Some debug info:
levelName = "1"
columnDomainMap = {HashMap@10059} size = 2
0 = {HashMap$Node@10079} "false" -> "0"
1 = {HashMap$Node@10080} "true" -> "1"
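
The mismatch can be reproduced without Spark or H2O. The following is a minimal sketch, not the library code: the class and method names (BooleanLevelLookup, booleanDomain, lookupLevel, encodeAsDigit, encodeAsText) are hypothetical, and the domain map is built by hand from the debug info above. It mirrors the two lookups quoted from fillRawData and shows that the "1"/"0" encoding misses, while a "true"/"false" encoding would hit. The text encoding is only one possible remedy under these assumptions and is not necessarily how the issue was actually fixed.

```java
import java.util.HashMap;
import java.util.Map;

public class BooleanLevelLookup {

    // Hypothetical stand-in for one column's domain map (level name -> level index),
    // filled in by hand to match the debug info: "false" -> 0, "true" -> 1.
    static Map<String, Integer> booleanDomain() {
        Map<String, Integer> domain = new HashMap<>();
        domain.put("false", 0);
        domain.put("true", 1);
        return domain;
    }

    // Mirrors the two lookups quoted from fillRawData:
    // first the plain level name, then "<column>.<level>".
    static Integer lookupLevel(Map<String, Integer> domain, String columnName, String levelName) {
        Integer levelIndex = domain.get(levelName);
        if (levelIndex == null) {
            levelIndex = domain.get(columnName + "." + levelName);
        }
        return levelIndex;
    }

    // Encoding described in the report: a boolean becomes "1" or "0".
    static String encodeAsDigit(boolean value) {
        return value ? "1" : "0";
    }

    // Assumed alternative: keep the textual form, "true" or "false".
    static String encodeAsText(boolean value) {
        return String.valueOf(value);
    }

    public static void main(String[] args) {
        Map<String, Integer> domain = booleanDomain();
        // Digit encoding: neither "1" nor "IsArrDelayed.1" is a key, so the result is null.
        System.out.println(lookupLevel(domain, "IsArrDelayed", encodeAsDigit(true)));
        // Text encoding: "true" is a key, so the level index is found.
        System.out.println(lookupLevel(domain, "IsArrDelayed", encodeAsText(true)));
    }
}
```

In this sketch a null result corresponds to the point where the real wrapper reports an unknown categorical level.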

Activity

Alex Denisenko
March 13, 2019, 9:05 PM
Edited

Hi. Yes, sure. For example, I have a MOJO model and a CSV data set, and I run code like this.

Sorry for the Kotlin.
In this case I deliberately convert a field to boolean type, because a CSV file doesn't preserve types, whereas a Parquet file or a database table does (I originally reproduced this bug with a Parquet file). This is just an example.

Jakub Hava
March 23, 2019, 12:13 PM

Thank you for the details. Will have a look

Jakub Hava
March 26, 2019, 12:34 PM

Thanks, can reproduce.

For reference, code to reproduce in sparkling-shell:

Jakub Hava
March 26, 2019, 1:05 PM

Implemented a fix, waiting for tests to pass: https://github.com/h2oai/sparkling-water/pull/1109

Alex Denisenko
March 31, 2019, 7:24 PM

Thank you very much

Assignee

Jakub Hava

Reporter

Alex Denisenko

Labels

None

CustomerVisible

No

Fix versions

Priority

Critical