XGBoost gets stuck with 50+ executors (instead of failing outright)
Description
See support ticket for details: https://support.h2o.ai/a/tickets/98339
Activity
Neema Mashayekhi
February 9, 2021, 4:52 AM
Neema Mashayekhi
February 9, 2021, 4:51 AM
The user resolved the issue by upgrading to 3.32.0.3, which includes the XGBoost upgrade to 1.2 and a fix ( )
Jan Sterba
January 6, 2021, 1:56 PM
I was not able to reproduce this on 3.30.1.x. I am suggesting an upgrade, since the code that caused the error was changed in XGBoost.
Jan Sterba
January 5, 2021, 6:49 PM
Investigation of the logs revealed:
There are two bugs: first, training did not stop when this error occurred; second, the error happened at all, although that could be an XGBoost bug.
This is bad because it prevents us from shutting down cleanly.