Improve memory efficiency of H2OMOJOPipelineModel

Description

Can we cache loaded MOJO models in memory to avoid duplication when the `H2OMOJOPipelineModel` transformer is instantiated multiple times?

Actions:
1) verify how many times the transformer is instantiated when there are n partitions on an executor
2) introduce a local cache (WeakReference) for loaded MOJO models (see the sketch after this list)
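
A minimal sketch of what such a WeakReference-based, executor-local cache could look like. The names (`MojoModelCache`, `getOrLoad`) and the by-name loader argument are hypothetical placeholders; the real loader would be whatever `H2OMOJOPipelineModel` already uses to instantiate the MOJO:

```scala
import java.lang.ref.WeakReference
import scala.collection.mutable

// Executor-local cache of loaded MOJO models, keyed e.g. by the transformer UID.
// Models are held via WeakReference so the GC can still reclaim unused ones.
object MojoModelCache {
  private val cache = mutable.Map.empty[String, WeakReference[AnyRef]]

  def getOrLoad(key: String)(loadModel: => AnyRef): AnyRef = synchronized {
    cache.get(key).flatMap(ref => Option(ref.get())) match {
      case Some(model) => model                     // already loaded in this JVM
      case None =>
        val model = loadModel                       // load at most once per executor JVM
        cache.update(key, new WeakReference(model)) // weakly held reference
        model
    }
  }
}
```

Usage would be along the lines of `MojoModelCache.getOrLoad(uid) { loadPipelineFromBytes(mojoBytes) }`, where `loadPipelineFromBytes` stands in for the existing MOJO deserialization code.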

CC:

Activity

Nidhi Mehta
April 12, 2019, 10:27 PM

#94138 (https://support.h2o.ai/a/tickets/94138) - Re: Deploying MOJO on Spark

Jakub Hava
April 15, 2019, 2:36 PM

We have this code, but after testing it today I can verify it does not work as expected: the reader backend is created every time a prediction is done. Will fix that as a first step.

Jakub Hava
April 23, 2019, 7:19 PM

If we put a print statement into getOrCreateModel, we can see the model is being created over and over again.

The first step is to create some sort of registry that is local to the executor and ensures the MOJO bytes do not have to be serialized, deserialized, and a new instance created for every prediction. A rough sketch of such a registry follows.
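
A rough, hypothetical sketch of the executor-local registry described above: the MOJO bytes are shipped to the executor once (e.g. via a broadcast variable), and the heavyweight model instance is built at most once per executor JVM rather than once per row or task. `MojoRegistry` and `loadFromBytes` are placeholder names, not the actual Sparkling Water API:

```scala
import scala.collection.mutable

// Executor-local registry of MOJO model instances, keyed by the transformer UID.
object MojoRegistry {
  private val models = mutable.Map.empty[String, AnyRef]

  def getOrCreateModel(uid: String, mojoBytes: Array[Byte])
                      (loadFromBytes: Array[Byte] => AnyRef): AnyRef = synchronized {
    models.getOrElseUpdate(uid, {
      // A temporary println like this is how the repeated creation was spotted;
      // with the registry in place it should fire only once per executor JVM.
      println(s"Loading MOJO model $uid on this executor")
      loadFromBytes(mojoBytes)
    })
  }
}
```

Unlike the WeakReference cache sketched earlier, this variant holds strong references, so a model loaded once stays resident for the lifetime of the executor JVM.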

Jakub Hava
April 25, 2019, 2:17 PM

Created a first implementation which avoids serializing the MOJO and creating a new instance for each row.

We should, however, investigate why this was happening in the first place. Putting this change into the release so users can try it as soon as possible.

Fixed

Assignee

Jakub Hava

Reporter

Michal Malohlava

Labels

None

CustomerVisible

No

testcase 1

None

testcase 2

None

testcase 3

None

h2ostream link

None

Affected Spark version

None

AffectedContact

None

AffectedCustomers

None

AffectedPilots

None

AffectedOpenSource

None

Support Assessment

None

Customer Request Type

None

Support ticket URL

None

End date

None

Baseline start date

None

Baseline end date

None

Task progress

None

Task mode

None

ReleaseNotesHidden

None

Fix versions

Priority

Critical