Add support for Spark Dynamic Allocation
Description
For us, the lack of Spark dynamic allocation support is a major obstacle to deploying Sparkling Water
in a multi-tenant environment.
It would be awesome to have this fixed.
We looked at Sparkling Water last year and hoped we could start using SW once this became available.
Could Sparkling Water just register for LiveListenerBus events and use the SparkListener callbacks
onExecutorAdded / onExecutorRemoved to scale SW's memory structures?
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-SparkListener.html
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-LiveListenerBus.html
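To make the listener idea concrete, here is a rough sketch in plain Python. It is not Spark or Sparkling Water code: `ExecutorTracker`, `assign_chunks` and `chunks_to_move` are hypothetical names made up for illustration, and the round-robin placement is an assumption, not how H2O actually distributes its data. In Spark itself the corresponding hooks are the JVM-side `SparkListener` methods `onExecutorAdded` and `onExecutorRemoved`. The sketch only shows the bookkeeping such a listener would need, and which pieces of data would have to be relocated when an executor goes away. The actual data migration, which is presumably the hard part, is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class ExecutorTracker:
    """Hypothetical stand-in for a listener that reacts to executor events."""

    executors: set = field(default_factory=set)
    events: list = field(default_factory=list)

    def on_executor_added(self, executor_id: str) -> None:
        # Mirrors the role of SparkListener.onExecutorAdded (name only).
        self.executors.add(executor_id)
        self.events.append(("added", executor_id))

    def on_executor_removed(self, executor_id: str) -> None:
        # Mirrors the role of SparkListener.onExecutorRemoved (name only).
        self.executors.discard(executor_id)
        self.events.append(("removed", executor_id))


def assign_chunks(num_chunks: int, executors: set) -> dict:
    """Assumed placement policy: round-robin over sorted executor ids."""
    ordered = sorted(executors)
    if not ordered:
        return {}
    return {chunk: ordered[chunk % len(ordered)] for chunk in range(num_chunks)}


def chunks_to_move(before: dict, after: dict) -> list:
    """Chunks whose owner differs between two placements."""
    return sorted(c for c in before if before[c] != after.get(c))


# Two executors join, data is placed, then one executor is released.
tracker = ExecutorTracker()
tracker.on_executor_added("exec-1")
tracker.on_executor_added("exec-2")
before = assign_chunks(4, tracker.executors)

tracker.on_executor_removed("exec-2")
after = assign_chunks(4, tracker.executors)

moved = chunks_to_move(before, after)
print(moved)  # chunks that lived on exec-2 and now need a new home
```

Even in this toy model, every chunk held by the removed executor has to be copied elsewhere before the executor disappears, which is why reacting to the event alone would not be enough.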
Hope to see this fixed soon - we're excited to become SW users/customers but can't have it without Dynamic Allocation working.
Activity
Just noticed that this got unassigned. Is this still planned for Sparkling Water? It may determine whether we can stay in the H2O ecosystem long-term.
Created for the latter part (Arrow serialization)
FYI - Spark 2.3 supports DataFrames based on Apache Arrow serialization https://issues.apache.org/jira/browse/SPARK-13534
This might be easier if H2O used Apache Arrow for its frames. Would that make it possible to support Spark Dynamic Allocation in the future?
Arrow was created to address this zero-copy need between different frameworks https://arrow.apache.org/
From the Arrow site: "All systems utilize the same memory format; no overhead for cross-system communication."
I don't know the H2O architecture and am probably oversimplifying here.
Thank you for the feedback. In the meantime, we designed an "external" cluster deployment of Sparkling Water. It separates the life cycles of the Spark driver and the H2O cluster. However, the H2O cluster still needs to be deployed in a non-elastic environment.