Escape quotes by default when writing CSV

Description

The Frame.export() method already has built-in parameters for quote escaping.
These should be exposed in the export_file methods (both Python and R). Python is crucial, as MLI would like to use it.

Escaping quotes should be the default, or at least that option should be considered. The following must be verified:

  • Whether standard quotes are currently left unescaped and/or escaped inconsistently (mixed).

  • That backward compatibility is not broken (it should NOT be).
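The first verification could be automated with a small checker. The sketch below uses a hypothetical helper, `has_valid_quoting` (not part of H2O), that tests one exported record against the RFC 4180 quoting rules: inside a quoted field, a quote may appear only doubled (`""`), and an unquoted field may contain no quote at all. It assumes a comma separator and one complete record per call. Running it over every record of an export would flag both unescaped quotes and inconsistently escaped ones, such as backslash-style `\"`.

```python
def has_valid_quoting(record: str, sep: str = ",") -> bool:
    """Return True if every double quote in `record` follows RFC 4180.

    Assumes `record` is one complete CSV record without its trailing line break.
    """
    i, n = 0, len(record)
    while True:
        if i < n and record[i] == '"':
            # Quoted field: scan to the closing quote, skipping doubled quotes.
            i += 1
            while True:
                if i >= n:
                    return False  # unterminated quoted field
                if record[i] == '"':
                    if i + 1 < n and record[i + 1] == '"':
                        i += 2  # escaped quote ("")
                        continue
                    i += 1  # closing quote
                    break
                i += 1
            # After the closing quote, only a separator or the end may follow.
            if i == n:
                return True
            if record[i] != sep:
                return False
            i += 1
        else:
            # Unquoted field: runs to the next separator and must contain no quote.
            j = record.find(sep, i)
            end = n if j == -1 else j
            if '"' in record[i:end]:
                return False
            if j == -1:
                return True
            i = j + 1


print(has_valid_quoting('a,"b ""c""",d'))   # True: quotes doubled inside a quoted field
print(has_valid_quoting('a,b "c" d'))       # False: bare quotes in an unquoted field
print(has_valid_quoting(r'a,"b \"c\"",d'))  # False: backslash escaping is not RFC 4180
```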

Ideally, we would quote every field by default (a correct way to write CSV according to https://tools.ietf.org/html/rfc4180). This is the safest option. Autodetection is also fine and could save some space (when a token containing a quote is encountered, that token is quoted); it is a nice little addition for relatively small extra effort.
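A minimal sketch of the two strategies, assuming fields are converted to plain Python strings and separated by commas. `quote_field`, `needs_quoting` and `format_record` are illustrative names, not H2O functions. RFC 4180 also requires quoting fields that contain the separator or a line break, so the autodetection check covers those cases in addition to embedded quotes.

```python
def quote_field(value: str) -> str:
    """Wrap a field in double quotes, doubling any embedded quote (RFC 4180)."""
    return '"' + value.replace('"', '""') + '"'


def needs_quoting(value: str, sep: str = ",") -> bool:
    """Autodetection: a field must be quoted if it holds a quote, separator or line break."""
    return '"' in value or sep in value or "\n" in value or "\r" in value


def format_record(values, sep: str = ",", quote_all: bool = True) -> str:
    """Format one CSV record, quoting every field or only the fields that need it."""
    out = []
    for v in values:
        s = str(v)
        out.append(quote_field(s) if quote_all or needs_quoting(s, sep) else s)
    return sep.join(out)


row = ["id", 'say "hi"', "a,b"]
print(format_record(row))                   # "id","say ""hi""","a,b"
print(format_record(row, quote_all=False))  # id,"say ""hi""","a,b"
```

Quoting everything adds two bytes per field but never has to inspect field contents; autodetection scans each field and pays the extra bytes only where the quotes are needed.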

It is highly debatable whether the USER should make this decision BY DEFAULT, as users typically do not inspect every cell of a dataset. The behavior should be user-overridable, but the point is to produce a valid CSV by default.
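One way to make the safe behavior the default while keeping it user-overridable, sketched with Python's standard `csv` module. `write_csv` and its `quote_all` flag are hypothetical and not part of the H2O API; they only illustrate the shape of such an option.

```python
import csv
import io


def write_csv(rows, out, quote_all: bool = True) -> None:
    """Hypothetical export helper: valid CSV by default, overridable by the user.

    quote_all=True  -> csv.QUOTE_ALL: every field is quoted (safest default).
    quote_all=False -> csv.QUOTE_MINIMAL: only fields that need it are quoted.
    In both modes embedded quotes are doubled (csv's default doublequote=True).
    """
    quoting = csv.QUOTE_ALL if quote_all else csv.QUOTE_MINIMAL
    writer = csv.writer(out, quoting=quoting, lineterminator="\n")
    writer.writerows(rows)


buf = io.StringIO()
write_csv([["name", "comment"], ["Ann", 'said "ok"']], buf)
print(buf.getvalue(), end="")
# "name","comment"
# "Ann","said ""ok"""
```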

The advised workaround is to convert the H2O frame into a pandas DataFrame and then let pandas write it to CSV (unusable for huge datasets).

Activity

Pavel Pscheidl
January 8, 2021, 2:22 PM

Mentioned the option of exporting in binary format, if you’d like to use that. This needs fixing anyway.

Fixed

Assignee

Pavel Pscheidl

Reporter

Pavel Pscheidl

CustomerVisible

No

Priority

Major