Explore options for encrypted communication among H2O nodes on K8S

Description

Goal: Use TLS as a secure layer for communication among H2O nodes. This is a customer-driven requirement. The customer's environment is K8S, and H2O is deployed onto it by Steam.

How it is done now:
To secure traffic (and to provide some additional services such as traffic monitoring), a service mesh is used, namely linkerd. When linkerd is used, H2O is unable to form a cluster. See the log snippet below:

The log snippet above shows the following:

  • The headless service was contacted and the ClusterIPs of all underlying H2O pods were retrieved.

  • The ClusterIPs are used in NetworkInit to form a cluster.

  • Clustering fails and each H2O pod forms a separate cluster of size 1 instead of one cluster of size 3 (the desired state).
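The lookup step described in the first bullet can be reproduced outside H2O for debugging. The following is a minimal sketch, not H2O's own discovery code: it assumes only that the headless service's DNS name resolves to one address record per ready pod, which is how K8S headless services behave. The class name and the default host name are hypothetical.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

public class HeadlessServiceLookup {
    /** Returns every address the DNS name resolves to (one per pod for a headless service). */
    static List<String> resolvePodIps(String serviceDns) throws UnknownHostException {
        List<String> ips = new ArrayList<>();
        for (InetAddress addr : InetAddress.getAllByName(serviceDns)) {
            ips.add(addr.getHostAddress());
        }
        return ips;
    }

    public static void main(String[] args) throws Exception {
        // Pass the service DNS name as the first argument; "localhost" is only a placeholder.
        String dns = args.length > 0 ? args[0] : "localhost";
        System.out.println(resolvePodIps(dns));
    }
}
```

If this prints all three pod addresses from inside a pod, discovery works and the failure lies in the node-to-node traffic that follows.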

H2O uses UDP for communication among nodes; this is deeply wired into the `AutoBuffer.java` class. The service mesh in use does NOT support UDP, so clustering is not possible, and neither is any further communication among nodes (almost everything, from data ingest to MR tasks, is sent and received via AutoBuffer in H2O, which uses UDP).
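Whether UDP datagrams actually pass between two pods can be checked independently of H2O. The sketch below is a hypothetical diagnostic, not part of H2O: one side echoes a single datagram, the other sends "ping" and waits for the echo. The class name, method names, payload and timeout are all assumptions chosen for illustration.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class UdpProbe {
    /** Receives one datagram on the given socket and sends it straight back. */
    static void echoOnce(DatagramSocket server) throws IOException {
        byte[] buf = new byte[64];
        DatagramPacket in = new DatagramPacket(buf, buf.length);
        server.receive(in);
        server.send(new DatagramPacket(in.getData(), in.getLength(), in.getAddress(), in.getPort()));
    }

    /** Sends "ping" and returns true only if the same bytes come back before the timeout. */
    static boolean probe(InetAddress host, int port, int timeoutMs) throws IOException {
        byte[] payload = "ping".getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket client = new DatagramSocket()) {
            client.setSoTimeout(timeoutMs);
            client.send(new DatagramPacket(payload, payload.length, host, port));
            DatagramPacket reply = new DatagramPacket(new byte[64], 64);
            client.receive(reply);
            return Arrays.equals(payload, Arrays.copyOf(reply.getData(), reply.getLength()));
        } catch (SocketTimeoutException e) {
            return false; // no reply: the datagram or its echo was dropped
        }
    }

    public static void main(String[] args) throws Exception {
        // Loopback self-test; on K8S, run echoOnce in one pod and probe from another.
        try (DatagramSocket server = new DatagramSocket(0, InetAddress.getLoopbackAddress())) {
            Thread t = new Thread(() -> {
                try { echoOnce(server); } catch (IOException ignored) { }
            });
            t.setDaemon(true);
            t.start();
            System.out.println(probe(InetAddress.getLoopbackAddress(), server.getLocalPort(), 2000));
        }
    }
}
```

A probe that succeeds between pods without the mesh and times out with it would be consistent with the explanation above.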

Possible fixes
The list below is meant to be exhaustive; no opinion is given on the feasibility of the individual options.

  • Make H2O use TCP instead - this would require different connection handling and would introduce some overhead. H2O seems to have

 

in H2O.java, yet this option sets nothing and seems to be unimplemented. Code present in this ancient commit seems to be no longer there.

  • Use a service mesh that supports UDP (are there any?)

  • Start H2O with its built-in JKS-based TLS configured. As H2O is started by Steam, Steam would have to start H2O properly configured to use TLS. This option means H2O would use an end-to-end secure connection among nodes, yet would NOT be part of the service mesh, and the traffic could NOT be monitored.
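For the last option, the following sketch shows what JKS-based TLS amounts to at the JVM level: a Java keystore is loaded and turned into an `SSLContext` that can create TLS sockets. It uses only standard JDK APIs and is not H2O's implementation; the class name is hypothetical, and using a single keystore for both key material and trusted certificates is an assumption made to keep the example short.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.KeyStore;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

public class JksTlsContext {
    /** Builds a TLS context whose keys and trusted certificates both come from one JKS file. */
    static SSLContext fromKeyStore(Path jks, char[] password) throws Exception {
        KeyStore ks = KeyStore.getInstance("JKS");
        try (InputStream in = Files.newInputStream(jks)) {
            ks.load(in, password);
        }
        // Key managers present this node's certificate to its peers.
        KeyManagerFactory kmf = KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
        kmf.init(ks, password);
        // Trust managers decide which peer certificates are accepted.
        TrustManagerFactory tmf = TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(ks);
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(kmf.getKeyManagers(), tmf.getTrustManagers(), null);
        return ctx;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java JksTlsContext <keystore.jks> <password>
        SSLContext ctx = fromKeyStore(Paths.get(args[0]), args[1].toCharArray());
        System.out.println(ctx.getProtocol());
    }
}
```

With this approach, every node would need the keystore file and its password at start-up, which is why Steam would have to provision them when launching H2O.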

Won't Fix

Assignee

Pavel Pscheidl

Fix versions

None

Reporter

Pavel Pscheidl

CustomerVisible

No

Priority

Blocker