RuleFit treating integers as categorical

Description

RuleFit produces rules like the following (truncated):
(AirTime in {104.0, 105.0, 110.0, 111.0, 112.0, 113.0, 114.0, 120.0, 121.0, 124.0, 125.0, 126.0, 127.0, 128.0, 130.0, 131.0, 132.0, 133.0, 136.0, 138.0, 139.0, 14.0, 140.0, 142.0, 146.0, 147.0, 148.0, 15.0, 150.0, 151.0

AirTime is an integer column, so it should not be matched against a set of levels as if it were categorical.
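
The order of the values in the rule is a further hint: 14.0 sits between 139.0 and 140.0, which is what sorting string labels produces, not what sorting numbers produces. A minimal sketch in plain Python, using a handful of values copied from the rule above (the variable names are only for illustration, and this does not call RuleFit itself):

```python
# A subset of the values from the rule, in the order they appear there.
values = [138.0, 139.0, 14.0, 140.0, 142.0, 15.0, 150.0]

# Sorted as numbers, 14.0 and 15.0 come first.
numeric_order = sorted(values)

# Sorted as string labels, the order matches the rule, which is
# consistent with the values being handled as categorical levels.
label_order = sorted(str(v) for v in values)

print(numeric_order)
print(label_order)
```

The label-sorted list comes out in the same order as the values appear in the rule, while the numeric sort does not.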

This was observed with the allyears2k dataset.

The rules also appear to overfit to categorical columns. It would be nice to have an option to turn on one-hot encoding for them, which would also help simplify the rules.
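
To make the request concrete, here is a minimal sketch in plain Python of what one-hot encoding does to a categorical column. It is not H2O's implementation: the helper `one_hot`, the column names `carrier` and `air_time`, and the sample rows are all hypothetical.

```python
def one_hot(rows, column):
    """Replace `column` in each row dict with one 0/1 indicator per level."""
    levels = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Keep every other column unchanged.
        new_row = {k: v for k, v in row.items() if k != column}
        # Add one indicator column per observed level.
        for level in levels:
            new_row[f"{column}_{level}"] = 1 if row[column] == level else 0
        encoded.append(new_row)
    return encoded


# Hypothetical sample data, not taken from allyears2k.
rows = [
    {"carrier": "AA", "air_time": 104},
    {"carrier": "UA", "air_time": 14},
    {"carrier": "AA", "air_time": 150},
]
encoded = one_hot(rows, "carrier")
print(encoded[1])
```

With indicator columns, a tree split tests one level at a time (for example `carrier_UA == 1`) instead of membership in a long set of levels, so the resulting rule conditions are shorter.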

Assignee

Zuzana Olajcová

Fix versions

None

Reporter

Megan Kurka

Support ticket URL

None

Labels

None

Affected Spark version

None

Customer Request Type

None

Task progress

None

ReleaseNotesHidden

None

CustomerVisible

No

Priority

Major