Support parallel training (e.g. spark_apply in rsparkling, or Python/R)
Feature request: fit one model per group in H2O using some type of distributed apply function.
Here’s an example of what it would look like using spark_apply:
The current workaround is to loop through the categories in Spark, pull back a Spark DataFrame for each category, and then fit a model on it.
Formulated for rsparkling, but it will be exposed in H2O-3 in both Python & R.
Instead of spark_apply, create a bulk algo execution function conceptually similar to the grid search API in H2O-3: it takes the algo name and parameters plus the grouping columns, and trains multiple models.
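To make the proposed shape concrete, here is a rough pure-Python sketch of a bulk, grouped training call. It is not H2O code and does not reflect any existing H2O-3 or rsparkling API: the names train_by_group, ALGOS and fit_mean_model, the argument names, and the toy "mean" algorithm are all assumptions made up for illustration. It only shows the idea of passing an algo name, parameters and grouping columns, and getting back one model per group.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def fit_mean_model(rows, y):
    """Stand-in 'algorithm' (hypothetical): predicts the mean of column y."""
    return {"algo": "mean", "prediction": mean(r[y] for r in rows), "nrows": len(rows)}


# Hypothetical registry mapping an algo name to a training function.
ALGOS = {"mean": fit_mean_model}


def train_by_group(rows, algo, group_by, params, max_workers=4):
    """Train one model per distinct combination of the grouping columns.

    rows     : list of dicts, one dict per observation
    algo     : key into ALGOS
    group_by : list of column names that define the groups
    params   : keyword arguments forwarded to the training function
    """
    train = ALGOS[algo]

    # Split the data into one partition per group key.
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in group_by)
        groups[key].append(row)

    # Train the per-group models concurrently and collect them by group key.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {key: pool.submit(train, part, **params) for key, part in groups.items()}
        return {key: fut.result() for key, fut in futures.items()}


rows = [
    {"store": "A", "sales": 10.0},
    {"store": "A", "sales": 14.0},
    {"store": "B", "sales": 3.0},
    {"store": "B", "sales": 5.0},
]
models = train_by_group(rows, algo="mean", group_by=["store"], params={"y": "sales"})
```

Here models is a dict keyed by group tuple, e.g. models[("A",)]. A real implementation would presumably run the per-group training on the cluster rather than in local threads, and return model handles instead of plain dicts.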