We need to advise users whether High Concurrency clusters are safe to use with Sparkling Water on Azure Databricks.
High concurrency can cause a job to be killed when multiple users are connected to the cluster: https://docs.microsoft.com/en-us/azure/databricks/spark/latest/spark-sql/preemption#preemption
It appears that Sparkling Water is robust against this, but we need tests to validate it.
Note: we tested it manually by simulating preemption on a High Concurrency DBC cluster in Azure and validated that the result of the transfer from DBC → H2O is correct (100000000 in 200 partitions, 2 nodes).
It would be great to have a test scenario that simulates task preemption during a call to as_h2o_frame, e.g. by killing the task based on a specific data pattern.
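One possible way to trigger the kill from a data pattern is sketched below. It is only a sketch under stated assumptions: the trigger value, the helper names fail_on_pattern and build_flaky_rdd, and the choice to raise an exception on the first task attempt are all hypothetical, and a raised exception only approximates a real preemption kill (Spark counts it as a task failure, so the cluster must allow at least one retry).

```python
from typing import Iterable, Iterator

# Hypothetical data pattern: the row value that triggers the simulated kill.
TRIGGER_VALUE = 42


def fail_on_pattern(rows: Iterable[int], attempt_number: int,
                    trigger: int = TRIGGER_VALUE) -> Iterator[int]:
    """Pass rows through unchanged, but fail mid-partition on the first attempt.

    Rows before the trigger are already emitted when the failure happens,
    which mimics a task being killed after a partial transfer.
    """
    for row in rows:
        if attempt_number == 0 and row == trigger:
            raise RuntimeError("simulated task preemption")
        yield row


def build_flaky_rdd(spark, n_rows: int, n_partitions: int):
    """Build an RDD whose task fails once on the partition holding the trigger.

    Assumes an existing SparkSession is passed in as `spark`. pyspark is
    imported lazily so that fail_on_pattern can be unit tested without Spark.
    """
    from pyspark import TaskContext

    def per_partition(rows):
        # attemptNumber() is 0 for the first run of a task and increases on retry.
        return fail_on_pattern(rows, TaskContext.get().attemptNumber())

    rdd = spark.sparkContext.range(0, n_rows, numSlices=n_partitions)
    return rdd.mapPartitions(per_partition)
```

The test would then convert the result of build_flaky_rdd to a DataFrame, pass it to as_h2o_frame, and check that the row count and a simple aggregate of the H2O frame match the input, which would show that the retried partition was neither lost nor duplicated.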
Once this test is in place and passing, remove the statement "High Concurrency clusters are not supported" from the doc.