H2OFrame in Python is adding additional duplicate rows to the Pandas DataFrame
Description
When converting a pandas DataFrame to an H2OFrame using the h2o.H2OFrame() function, an error occurs.
Additional rows are created in the H2OFrame. When I looked into this, the new rows appear to be duplicates of existing rows. The number of duplicate rows added varies with the data size, but is typically around 2-10.
Code:
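import h2o
h2o.init()  # connect to, or start, a local H2O cluster
train_h2o = h2o.H2OFrame(python_obj=train_df_complete)  # train_df_complete is a pandas DataFrame
print(train_df_complete.shape[0])  # row count of the pandas DataFrame
print(train_h2o.nrow)  # row count of the resulting H2OFrame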
Output:
3871998
3872000
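To help pin down which rows are duplicated, here is a minimal sketch that compares row multiplicities before and after conversion. It is not part of H2O; the helper name extra_rows and the small stand-in data are hypothetical and used only for illustration. It assumes each row can be represented as a hashable tuple.

```python
from collections import Counter


def extra_rows(original_rows, converted_rows):
    """Return {row: surplus} for rows that occur more often after conversion.

    Hypothetical helper: both arguments are iterables of hashable row tuples.
    """
    before = Counter(original_rows)
    after = Counter(converted_rows)
    # Counter subtraction keeps only rows whose count increased.
    return dict(after - before)


# Stand-in data: two rows gained one extra copy each during "conversion".
original = [(1, "a"), (2, "b"), (3, "c")]
converted = [(1, "a"), (2, "b"), (2, "b"), (3, "c"), (3, "c")]

surplus = extra_rows(original, converted)
print(surplus)                # which rows were duplicated, and how many times
print(sum(surplus.values()))  # total number of extra rows
```

With the real frames, the row tuples could probably be taken from train_df_complete.itertuples(index=False, name=None) and from train_h2o.as_data_frame().itertuples(index=False, name=None). Note that NaN values and type changes during conversion (for example int to float) may prevent otherwise identical rows from matching, so some normalization may be needed first.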
Activity
Michal Raška
August 28, 2017, 7:23 AM
Can you please specify on which frame it occurs? I cannot reproduce it and I've tried many of the smalldata datasets. Tried Python 3.5 and 3.6. Thanks