Add multiclass support to Stacking

Description

Support multiclass response columns in the Stacked Ensemble method. We should support several types of multiclass stacking: Stacking, StackingC, and possibly sMM5.

StackingC differs from regular Stacking in these points: for each linear model associated with a specific class, only the partial class probability distribution which deals with this very class is used during training and testing. While Stacking uses probabilities for all classes and from all component classifiers for each linear model, StackingC uses only the class probabilities associated with the class which we want our linear model to predict.
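
To make the difference concrete, here is a minimal sketch of which level-one columns each per-class linear model would see under StackingC. It is illustrative only: the function name `stackingc_columns` and the learner-major column order (all class probabilities from learner 0, then learner 1, and so on) are assumptions, not anything taken from the paper or from existing ensemble code.

```python
import numpy as np

def stackingc_columns(n_learners, n_classes, k):
    """Column indices of class k's probability from each base learner.

    Assumes a learner-major layout: learner j occupies columns
    j * n_classes .. (j + 1) * n_classes - 1 (hypothetical convention).
    """
    return [j * n_classes + k for j in range(n_learners)]

n_learners, n_classes, n_rows = 3, 5, 4

# Placeholder level-one matrix; the values only stand in for CV predictions.
level_one = np.arange(n_rows * n_learners * n_classes, dtype=float)
level_one = level_one.reshape(n_rows, n_learners * n_classes)

# Regular Stacking: every per-class model sees all 15 columns.
stacking_inputs = level_one

# StackingC: the model for class k sees only the 3 columns for class k.
stackingc_inputs = {
    k: level_one[:, stackingc_columns(n_learners, n_classes, k)]
    for k in range(n_classes)
}
```

With three learners and five classes, each StackingC model is trained on 3 columns instead of 15.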

Džeroski and Ženko (2002) investigate Stacking in the extension proposed by Ting & Witten (1999). They introduce a new variant, sMM5, which they claim to be in a league of its own. Their new variant is quite competitive with StackingC but much slower, according to unpublished experiments on our twenty-six datasets. However, combining both ideas does not improve performance.

More info in the attached paper.

Activity

Erin LeDell
August 12, 2017, 1:31 AM
Edited

Let's start with traditional multiclass stacking – use all class probabilities (cv preds) from all the base learners to train the metalearner. That means that if there are three learners and five classes, the level-one matrix will be 15 columns + response column.
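
As a rough sketch of that shape arithmetic (not the actual implementation), the level-one matrix can be assembled by placing each base learner's cross-validated class probabilities side by side. The variable names and the randomly generated probabilities below are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_learners, n_classes = 8, 3, 5

# Hypothetical CV predictions: one (n_rows, n_classes) array of class
# probabilities per base learner. Random values stand in for real output.
cv_preds = [
    rng.dirichlet(np.ones(n_classes), size=n_rows)
    for _ in range(n_learners)
]

# Hypothetical response column holding class labels 0..4.
y = rng.integers(0, n_classes, size=n_rows)

# Concatenate the per-learner blocks column-wise: 3 learners * 5 classes.
level_one = np.hstack(cv_preds)

# Append the response as the final column of the metalearner training frame.
level_one_frame = np.column_stack([level_one, y])

print(level_one.shape)        # (8, 15)
print(level_one_frame.shape)  # (8, 16)
```

The metalearner is then trained on the 15 probability columns to predict the response.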

Navdeep
August 12, 2017, 1:45 AM

Yep, that's a good first approach. We can always benchmark against the references mentioned above afterward.

Darren Cook
August 14, 2017, 11:18 AM

+1: This is an annoying gap, and something, anything, is going to be much more useful than not working at all.

Erin LeDell
August 25, 2017, 12:20 AM

Coming soon! The PR will be merged into master tonight or tomorrow.

Lauren DiPerna
September 8, 2017, 11:24 PM

Please update the AutoML code to access the Stacked Ensemble multiclass abilities.

Fixed

Assignee

Navdeep

Fix versions

None

Reporter

Erin LeDell

Priority

Major