Support multiclass response columns in the Stacked Ensemble method. We should support several types of multiclass stacking: Stacking, StackingC, and possibly sMM5.
StackingC differs from regular Stacking in these points: for each linear model associated with a specific class, only the partial class probability distribution which deals with this very class is used during training and testing. While Stacking uses probabilities for all classes and from all component classifiers for each linear model, StackingC uses only the class probabilities associated with the class which we want our linear model to predict.
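To make the difference concrete, here is a minimal NumPy sketch of the per-class feature selection. It is an illustration only, not the Stacked Ensemble implementation: the helper names (`stacking_features`, `stackingc_features`, `fit_per_class_linear`) are hypothetical, the probabilities are random stand-ins for real cross-validated predictions, and plain least squares without an intercept stands in for whatever linear model is actually used per class.

```python
import numpy as np

def stacking_features(cv_probs):
    # Regular Stacking: every class probability from every base learner.
    return np.hstack(cv_probs)

def stackingc_features(cv_probs, k):
    # StackingC: only the probability of class k from each base learner.
    return np.column_stack([p[:, k] for p in cv_probs])

def fit_per_class_linear(cv_probs, y, n_classes, use_stackingc=True):
    # One linear model per class, fit against a 0/1 indicator of that class.
    # Assumption: ordinary least squares, no intercept, purely for illustration.
    weights = []
    for k in range(n_classes):
        X = stackingc_features(cv_probs, k) if use_stackingc else stacking_features(cv_probs)
        target = (y == k).astype(float)
        w, *_ = np.linalg.lstsq(X, target, rcond=None)
        weights.append(w)
    return weights

rng = np.random.default_rng(0)
n_rows, n_classes, n_learners = 20, 5, 3
# Stand-in CV predictions: one (n_rows, n_classes) array per base learner.
cv_probs = [rng.dirichlet(np.ones(n_classes), size=n_rows) for _ in range(n_learners)]
y = rng.integers(0, n_classes, size=n_rows)

w_c = fit_per_class_linear(cv_probs, y, n_classes, use_stackingc=True)
w_s = fit_per_class_linear(cv_probs, y, n_classes, use_stackingc=False)
print(w_c[0].shape)  # (3,)  one weight per base learner
print(w_s[0].shape)  # (15,) one weight per learner/class pair
```

With three learners and five classes, each StackingC model sees 3 inputs while each regular Stacking model sees 15, which is where the reduction in dimensionality comes from.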
Džeroski and Zenko (2002) investigate Stacking in the extension proposed by Ting & Witten (1999). They introduce a new variant sMM5 which they claim to be in a league of its own. Their new variant is quite competitive to StackingC but much slower, according to unpublished experiments on our twenty-six datasets. However, combining both ideas does not improve performance.
More info in the attached paper.
Let's start with traditional multiclass stacking – use all class probabilities (cv preds) from all the base learners to train the metalearner. That means that if there are three learners and five classes, the level-one matrix will be 15 columns + response column.
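As a rough sketch of that layout, assuming each base learner's cross-validated predictions are available as an array of shape (rows, classes); the function name `level_one_matrix` and the random probabilities are made up for illustration and do not reflect the actual Stacked Ensemble code:

```python
import numpy as np

def level_one_matrix(cv_probs):
    """Concatenate per-learner CV class probabilities column-wise.

    cv_probs: list with one array per base learner, each (n_rows, n_classes).
    """
    return np.hstack(cv_probs)

rng = np.random.default_rng(0)
n_rows, n_classes, n_learners = 8, 5, 3
# Stand-in CV predictions; each row of each array sums to 1.
cv_probs = [rng.dirichlet(np.ones(n_classes), size=n_rows) for _ in range(n_learners)]

Z = level_one_matrix(cv_probs)
print(Z.shape)  # (8, 15): 3 learners x 5 classes; the response column is kept separately
```

The metalearner is then trained on `Z` together with the response column.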
Yep, that's a good first approach. We can always benchmark against the references mentioned above afterwards.
+1: This is an annoying gap, and something, anything, is going to be much more useful than not working at all.
Coming soon! The PR will be merged into master tonight or tomorrow.
Please update the AutoML code to use the Stacked Ensemble's multiclass abilities.