Get rid of numpy and pyspark dependency

Description

None

Activity

Jakub Hava
March 31, 2020, 11:41 PM

CC:

Marek Novotny
April 2, 2020, 3:59 PM

error message:

 

Jakub Hava
April 3, 2020, 3:45 AM
Edited

I have replaced the _jvm() calls, but later I get:

The numpy dependency is used in several places in Spark; in this case it is imported in pyspark.ml.param's __init__ file. So changing the code on our side won't help.
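To illustrate the point (this is not Sparkling Water code, just a minimal sketch): because pyspark.ml.param imports numpy at module level, any attempt to import a Param-based class on an environment without numpy fails inside pyspark itself.

from pyspark.ml import param  # noqa: F401 -- importing this module alone
# triggers pyspark/ml/param/__init__.py, which does "import numpy" at the
# top level; on a numpy-less environment this raises ImportError from
# within pyspark, regardless of how Sparkling Water's own code is written.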

Jakub Hava
April 3, 2020, 3:52 AM

But the more I think about it, we might be able to remove the dependency on pyspark & numpy altogether.

In the vast majority of cases, Spark is already available, so pip just downloads the extra dependencies needlessly.
The only affected case would be when people have a fresh Python installation, use an external Spark, and would do:

pip install h2o_pysparkling_2.4. In this case the installation would succeed, but the user could not use Sparkling Water as there is no Spark.

So what I would do:

  1. Remove the numpy & pyspark dependencies from setup.py (see the sketch below).

  2. Document that if the Python environment does not have Spark, the user needs to install it, either via pip install pyspark or by downloading a Spark distribution and adding its Python libs to the PYTHONPATH.
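A minimal sketch of what step 1 could look like in the pysparkling setup.py; the package name, version, and remaining install_requires entries are placeholders, not the actual file contents:

# Hypothetical trimmed-down setup.py: pyspark and numpy are no longer
# listed in install_requires, so pip will not pull them in. Spark (and the
# numpy it needs) must already be available on the PYTHONPATH.
from setuptools import setup, find_packages

setup(
    name="h2o_pysparkling_2.4",   # placeholder name
    version="X.Y.Z",              # placeholder version
    packages=find_packages(),
    install_requires=[
        # other runtime dependencies stay here;
        # "pyspark" and "numpy" are intentionally omitted
    ],
)

With this change, users running against an external Spark distribution would follow the documentation from step 2 (pip install pyspark, or put the distribution's Python libs on the PYTHONPATH) instead of relying on pip to fetch Spark.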

 

Fixed

Assignee

Jakub Hava

Reporter

Jakub Hava

Labels

None

CustomerVisible

No

testcase 1

None

testcase 2

None

testcase 3

None

h2ostream link

None

Affected Spark version

None

AffectedContact

None

AffectedCustomers

None

AffectedPilots

None

AffectedOpenSource

None

Support Assessment

None

Customer Request Type

None

Support ticket URL

None

End date

None

Baseline start date

None

Baseline end date

None

Task progress

None

Task mode

None

ReleaseNotesHidden

None

Fix versions

Priority

Major