Anomaly of H2O AutoML grid-search


I made a few observations on the H2O grid search, checking how my models perform as I gradually increase the training sample size. I would like to discuss what I found.

Below are the graphs:

  • The first was produced by gradually increasing the sample size from 10,000 up to the maximum available (~200,000).

  • The second was produced using an H2OGradientBoostingEstimator with a fixed set of hyperparameters, plotting the performance metric against the same gradual increase in sample size.

From these observations, it seems the grid search is not performing as expected. I would be glad to explain further once I know how to proceed. Thanks.


Prabhu Subramanian
November 16, 2020, 12:31 AM

Hi @Erin and Team,

The script below runs through a given dataset and generates an array of observations, which I then plotted separately in a Jupyter notebook. I am providing it as a head start; apologies for the sparse comments. Let me know if you or anyone handling this needs me to explain, but for now it should be self-explanatory.
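Since the script itself is not included in this ticket, here is a minimal, hypothetical sketch of the kind of loop described above, assuming a dataset of ~200,000 rows and a scoring callback. The helper names (`sample_sizes`, `collect_metrics`, `train_and_score`) are illustrative, not from the original script.

```python
# Hypothetical sketch of the evaluation loop described above:
# train on progressively larger samples and record one performance
# metric per sample size, so the results can be plotted later.

def sample_sizes(start=10_000, stop=200_000, step=10_000):
    """Sample sizes from 10k up to the ~200k rows available."""
    return list(range(start, stop + 1, step))

def collect_metrics(train_and_score, sizes):
    """train_and_score(n) should train on an n-row sample and return
    a metric (e.g. AUC); returns a list of (n, metric) pairs."""
    return [(n, train_and_score(n)) for n in sizes]

# With H2O, train_and_score(n) might wrap something like:
#   frame = full_frame[:n, :]
#   model = H2OGradientBoostingEstimator(seed=42)
#   model.train(x=features, y=target, training_frame=frame)
#   return model.model_performance(valid_frame).auc()
```

The `(n, metric)` pairs can then be unzipped and plotted directly in a notebook.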


Prabhu Subramanian
October 18, 2020, 3:35 AM

Hi @Erin
Yes, I did see your comment. I am just taking a little while to generate those results in a well-explained notebook for you. Meanwhile, I remember you had replied to my professor about how this works, so we are running a few more tests, proportionally increasing the runtime with the size of the data we test with, to see how the results vary. Is it OK if I take a while to look into it and let you know what we observe?
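"Proportionally increasing the runtime with the size of the data" could be sketched as a simple linear scaling of the time budget; the baseline values here are illustrative assumptions, not numbers from this ticket.

```python
def scaled_runtime(n_rows, base_rows=10_000, base_secs=60):
    """Scale the search time budget linearly with the number of
    training rows, relative to an (assumed) baseline run.
    The result could be passed as max_runtime_secs to an
    H2O AutoML or grid-search run."""
    return base_secs * n_rows / base_rows
```

So under these assumed baselines, the ~200,000-row run would get 20x the budget of the 10,000-row run.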

Erin LeDell
October 15, 2020, 6:08 PM

Hey, I was wondering if you saw my questions above? Thanks!

Erin LeDell
September 19, 2020, 5:01 AM

Hi, can you share the code you used to generate and evaluate the different grid searches? I assume these are random grid searches? Here is what I would want to know from looking at the code:

  1. Are you looking at the grids produced inside AutoML, or using the grid search interface?

  2. Are you using a seed?

  3. Are you fixing the number of models for each grid search? Or is the number of models changing between the different grid searches (with different training sizes)?

  4. What type of problem is this (binary classification?)
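Questions 2 and 3 above correspond to settings on a random grid search. As a hedged sketch (the hyperparameter values below are illustrative placeholders, not from this ticket), both can be pinned so that grids trained on different sample sizes stay comparable:

```python
# Hypothetical sketch: fix the seed and the number of models so that
# random grid searches on different training sizes are comparable.

hyper_params = {
    "max_depth": [3, 5, 7],       # placeholder values
    "learn_rate": [0.05, 0.1],    # placeholder values
}

search_criteria = {
    "strategy": "RandomDiscrete",  # random (not Cartesian) grid search
    "max_models": 20,              # same model count for every grid (Q3)
    "seed": 1234,                  # fixed seed for reproducibility (Q2)
}

# With h2o installed and a cluster running, these would be used as:
#   from h2o.grid.grid_search import H2OGridSearch
#   from h2o.estimators import H2OGradientBoostingEstimator
#   grid = H2OGridSearch(model=H2OGradientBoostingEstimator(seed=1234),
#                        hyper_params=hyper_params,
#                        search_criteria=search_criteria)
#   grid.train(x=features, y=target, training_frame=train)
```

Without a fixed `seed` and `max_models`, differences between grids at different training sizes can simply reflect different random draws of hyperparameters rather than a real effect of sample size.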

