SW - API exception when calling H2OContext.getOrCreate(sc)

Description

Repro steps:
1. Have SW 1.6.7
2. $ export MASTER="local-cluster[3,2,1024]"
3. run bin/pysparkling
4. Try the following
>>> from pysparkling import *
>>> hc = H2OContext.getOrCreate(sc)

[Avkash] You will see the following failure:

Warning: if you don't want to start local H2O server, then use of `h2o.connect()` is preferred.
Checking whether there is an H2O instance running at http://10.0.0.30:54327.
================================ EXCEPTION INFO ================================

[LOCAL FRAMES]
Omitted: imported modules, class declarations, __future__ features, None-valued

Within <module>() line 1 in file <stdin>:
sqlCtx: <pyspark.sql.context.HiveContext object at 0x7fc6239622d0>
sqlContext: <pyspark.sql.context.HiveContext object at 0x7fc6239622d0>
sc: <pyspark.context.SparkContext object at 0x7fc624836d50>
_pythonstartup:

Within getOrCreate() line 133 in file /tmp/ec2-user/spark/work/spark-3ee89a0e-9f35-4dfa-89a0-9fd728d64018/userFiles-71eecdda-aa25-49fd-9980-67cd7949a961/h2o_pysparkling_1.6-1.6.7-py2.7.egg/pysparkling/context.py:
spark_context: <pyspark.context.SparkContext object at 0x7fc624836d50>
selected_conf: Sparkling Water configuration:... (+ 10 lines)
method_params: [Ljava.lang.Object;@3b7b310d
method_def: [Ljava.lang.Class;@14d16afd
method: public static org.apache.spark.h2o.JavaH2OContext org.apache.spark.h2o.JavaH2OContext.getOrCreate(org.apache.spark.api.java.JavaSparkContext,org.apache.spark.h2o.H2OConf)
jvm: <py4j.java_gateway.JVMView object at 0x7fc623c5ca90>
jsc: org.apache.spark.api.java.JavaSparkContext@40c712b8
jhc_klazz: class org.apache.spark.h2o.JavaH2OContext
jhc: ... (+ 12 lines)
h2o_context: H2OContext: ip=10.0.0.30, port=54327 (open UI at http://10.0.0.30:54327 )
gw: <py4j.java_gateway.JavaGateway object at 0x7fc623c5c8d0>
conf_klazz: class org.apache.spark.h2o.H2OConf

Within init() line 199 in file /tmp/ec2-user/spark/work/spark-3ee89a0e-9f35-4dfa-89a0-9fd728d64018/userFiles-71eecdda-aa25-49fd-9980-67cd7949a961/h2o_pysparkling_1.6-1.6.7-py2.7.egg/h2o/h2o.py:
strict_version_check: False
start_h2o: False
scheme: http
port: 54327
nthreads: -1
kwargs: {}
ip: 10.0.0.30
insecure: False
get_mem_size: <function get_mem_size at 0x7fc621fd7758>
enable_assertions: True

Within open() line 171 in file /tmp/ec2-user/spark/work/spark-3ee89a0e-9f35-4dfa-89a0-9fd728d64018/userFiles-71eecdda-aa25-49fd-9980-67cd7949a961/h2o_pysparkling_1.6-1.6.7-py2.7.egg/h2o/backend/connection.py:
verify_ssl_certificates: True
verbose: True
scheme: http
retries: 5
port: 54327
name: SPARK_WORKER_DIR
ip: 10.0.0.30
https: False
conn: <H2OConnection uninitialized>
_msgs: (u'Checking whether there is an H2O instance running at {url}', u'connected.', u'not found.')

Within _test_connection() line 395 in file /tmp/ec2-user/spark/work/spark-3ee89a0e-9f35-4dfa-89a0-9fd728d64018/userFiles-71eecdda-aa25-49fd-9980-67cd7949a961/h2o_pysparkling_1.6-1.6.7-py2.7.egg/h2o/backend/connection.py:
self: <H2OConnection uninitialized>
messages: (u'Checking whether there is an H2O instance running at {url}', u'connected.', u'not found.')
max_retries: 5
errors: []
_: 0

Within request() line 239 in file /tmp/ec2-user/spark/work/spark-3ee89a0e-9f35-4dfa-89a0-9fd728d64018/userFiles-71eecdda-aa25-49fd-9980-67cd7949a961/h2o_pysparkling_1.6-1.6.7-py2.7.egg/h2o/backend/connection.py:
urltail: /3/Cloud
url: http://10.0.0.30:54327/3/Cloud
start_time: 1474325213.65
self: <H2OConnection uninitialized>
method: GET
match: <_sre.SRE_Match object at 0x7fc6221ca4f8>
headers: {u'X-Cluster': None, u'User-Agent': u'H2O Python client/2.7.12 (default, Sep 1 2016, 22:14:00) [GCC 4.8.3 20140911 (Red Hat 4.8.3-9)]'}
endpoint: GET /3/Cloud

Within request() line 44 in file /usr/lib/python2.7/dist-packages/requests/api.py:
url: http://10.0.0.30:54327/3/Cloud
session: <requests.sessions.Session object at 0x7fc621fdc090>
method: GET
kwargs: {'files': None, 'verify': True, 'auth': None, 'headers': {u'X-Cluster': None, u'User-Agent': u'H2O Python client/2.7.12 (default, Sep 1 2016, 22:14:00) [GCC 4.8.3 20140911 (Red Hat 4.8.3-9)]'}, 'json': None, 'params': None, 'timeout': 3.0, 'proxies': None, 'data': None}

[STACKTRACE]

File <stdin>:
<module>() #0001

File /tmp/ec2-user/spark/work/spark-3ee89a0e-9f35-4dfa-89a0-9fd728d64018/userFiles-71eecdda-aa25-49fd-9980-67cd7949a961/h2o_pysparkling_1.6-1.6.7-py2.7.egg/pysparkling/context.py:
getOrCreate() #0133 h2o.init(ip=h2o_context._client_ip, port=h2o_context._client_port, start_h2o=False, strict_version_check=False)

File /tmp/ec2-user/spark/work/spark-3ee89a0e-9f35-4dfa-89a0-9fd728d64018/userFiles-71eecdda-aa25-49fd-9980-67cd7949a961/h2o_pysparkling_1.6-1.6.7-py2.7.egg/h2o/h2o.py:
init() #0199 "connected.", "not found."))

File /tmp/ec2-user/spark/work/spark-3ee89a0e-9f35-4dfa-89a0-9fd728d64018/userFiles-71eecdda-aa25-49fd-9980-67cd7949a961/h2o_pysparkling_1.6-1.6.7-py2.7.egg/h2o/backend/connection.py:
open() #0171 conn._cluster = conn._test_connection(retries, messages=_msgs)
_test_connection() #0395 cld = self.request("GET /3/Cloud")
request() #0239 auth=self._auth, verify=self._verify_ssl_cert, proxies=self._proxies)

File /usr/lib/python2.7/dist-packages/requests/api.py:
request() #0044 return session.request(method=method, url=url, **kwargs)

[EXCEPTION]
TypeError: request() got an unexpected keyword argument 'json'
at line 44 in /usr/lib/python2.7/dist-packages/requests/api.py

>>>
>>> hc = H2OContext.getOrCreate(sc)
Warning: if you don't want to start local H2O server, then use of `h2o.connect()` is preferred.
Checking whether there is an H2O instance running at http://10.0.0.30:54327.

Activity

Jakub Hava
March 23, 2017, 11:41 AM

This happens because h2o requires a newer version of the requests package. This commit in h2o adds a check for the correct version: https://github.com/h2oai/h2o-3/commit/c812669050a402eaf32ad74696e2d8f3d8f2af58. It is included in ueno 2, which is already used in the released Sparkling Water versions 2.0.6 and 2.1.2. There is a PR, https://github.com/h2oai/sparkling-water/pull/219, bringing the latest state from master to the rel-1.6 branch, which also brings the latest h2o with the desired check.

So once this is merged, users on Spark 2.1, 2.0, and 1.6 will be notified whether their requests library should be upgraded.

NikosT
November 11, 2016, 2:08 PM

Not sure if it helps, but I'm getting the above error with requests==2.2.1 and not with requests==2.11.1.
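This version dependence matches the exception type: on older requests, `request()` simply has no `json` parameter, so the call fails at the Python level before any HTTP traffic happens. A minimal sketch of that failure mode, using a hypothetical stand-in function rather than the actual requests API:

```python
# Hypothetical stand-in for the older requests.api.request signature,
# which (to my understanding) predates the json= parameter.
def old_style_request(method, url, params=None, data=None, headers=None):
    return (method, url)

# The H2O client passes json=None unconditionally, which an old
# signature rejects with exactly the TypeError seen in this report.
try:
    old_style_request("GET", "http://10.0.0.30:54327/3/Cloud", json=None)
except TypeError as e:
    print(e)  # ... got an unexpected keyword argument 'json'
```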

Fixed

Assignee

Jakub Hava

Reporter

Avkash Chauhan

Labels

None

CustomerVisible

No

testcase 1

None

testcase 2

None

testcase 3

None

h2ostream link

None

Affected Spark version

None

AffectedContact

None

AffectedCustomers

None

AffectedPilots

None

AffectedOpenSource

None

Support Assessment

None

Customer Request Type

None

Support ticket URL

None

End date

None

Baseline start date

None

Baseline end date

None

Task progress

None

Task mode

None

ReleaseNotesHidden

None

Fix versions

Affects versions

Priority

Major