parquet import fails on HDP with Spark 2.0 (azure hdi cluster)

Description

Activity

Michal Kurka
February 8, 2018, 10:41 PM

This was fixed in H2O; no fix is needed on the SW side. I assigned the ticket to you because I think Parquet import should be automatically covered by end-to-end tests in Docker.

Jakub Hava
February 8, 2018, 10:42 PM

Once we have the Hadoop-in-Docker tests, Parquet import can be one of the tests, so we ensure it works across different Hadoop versions.

Michal Kurka
February 8, 2018, 11:33 PM

Especially across different Spark versions. The Hadoop version can matter too, but at this point Spark is more important because Spark bundles the Parquet libraries.

Jakub Hava
February 9, 2018, 7:01 AM

I see. So creating a simple unit test for Parquet import would help here. We cherry-pick changes to all release branches, which means the test would run on all supported Spark versions.

I would decouple this from the dockerization of Sparkling Water: close this issue and implement the test in a different PR. What do you think? The core issue here is actually solved; we just need to write a test.
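A minimal sketch of such a test, assuming a ScalaTest suite that already has a SparkSession (`spark`), a temporary directory (`tempDir`), and the Sparkling Water `H2OFrame` wrapper on the classpath; the names, path, and data are illustrative, not the actual test that was merged:

```scala
// Hypothetical end-to-end check: write Parquet with Spark's bundled
// Parquet libraries, then import it directly into H2O.
test("Parquet files written by Spark can be imported into H2O") {
  import spark.implicits._

  // Write a tiny DataFrame as Parquet using Spark's Parquet writer.
  val path = s"$tempDir/people.parquet"
  Seq(("alice", 1), ("bob", 2)).toDF("name", "id").write.parquet(path)

  // Import the Parquet data into H2O and verify the frame shape.
  val frame = new H2OFrame(new java.io.File(path))
  assert(frame.numRows() == 2)
  assert(frame.numCols() == 2)
}
```

Running this against each release branch would exercise the Parquet parser with whatever Parquet version that branch's Spark bundles, which is exactly the dimension the discussion above is worried about.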

Jakub Hava
February 9, 2018, 7:11 AM

Feel free to reopen this if you think it is still necessary to keep it open. I will work on it as soon as possible to ensure we test the Parquet import at least at a basic level.

Fixed

Assignee

Michal Raška

Reporter

Nidhi Mehta

Labels

None

CustomerVisible

No


Priority

Major