Support parallel training (e.g. spark_apply in rsparkling, or Python/R)


Add a feature to fit one model per group in H2O using some type of distributed apply function.

Here’s an example of what it’d look like using spark_apply:

The current workaround is to loop through the categories in Spark, pull back a Spark DataFrame for each category, and then fit a model on it.


Michal Kurka
March 2, 2020, 6:34 PM

Formulated for rsparkling, but it will be exposed in H2O-3 in both Python & R.

Joseph Granados
October 16, 2019, 3:25 PM

Instead of spark_apply, create a bulk algo execution function conceptually similar to the grid search API in H2O-3: it takes the algo name and parameters plus the grouping columns, and trains multiple models.
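
A rough sketch of what such a bulk function could look like, in plain Python with no H2O dependency. Everything here is an assumption for illustration: train_by_group, the ALGOS registry and fit_mean_model are hypothetical names, not part of any H2O-3 or rsparkling API, and the "model" is a stand-in that only predicts the group mean.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def fit_mean_model(rows, target):
    """Stand-in for a real algorithm: predicts the group's mean target."""
    return {"prediction": mean(row[target] for row in rows)}


# Hypothetical registry mapping an algo name to a training function.
ALGOS = {"mean": fit_mean_model}


def train_by_group(algo, params, rows, group_cols, max_workers=4):
    """Train one model per distinct combination of group_cols values."""
    # Partition the rows by their grouping key.
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in group_cols)
        groups[key].append(row)

    # Submit one training job per group and collect the fitted models.
    train = ALGOS[algo]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            key: pool.submit(train, group_rows, **params)
            for key, group_rows in groups.items()
        }
        return {key: future.result() for key, future in futures.items()}


rows = [
    {"store": "A", "sales": 10.0},
    {"store": "A", "sales": 14.0},
    {"store": "B", "sales": 3.0},
]
models = train_by_group("mean", {"target": "sales"}, rows, group_cols=["store"])
```

The result is a dict keyed by the tuple of grouping values, one entry per group, which mirrors how a grid search returns one model per parameter combination. A real implementation would dispatch the per-group jobs to the cluster rather than to local threads.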


