Add metalearner_transform option to Stacked Ensemble
This ticket is now scoped to adding the metalearner_transform option, defaulting to "NONE" (same behavior as before); the non-default option, "Logit", applies the logit transform to the CV predictions before training the metalearner. This helps on most classification datasets.
Keeping other info below for historical purposes (we can move this to another ticket later).
Two transformation ideas for the cvpreds when the predictions come from a binary classification model (maybe let's start with that case first):
Take logit of cvpreds before training the metalearner
Take percentile rankings of the cvpreds before training the metalearner
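As a rough illustration of the two candidate transforms (this is a minimal sketch, not the ticket's implementation; the function names and the epsilon clipping are my own choices):

```python
import numpy as np
from scipy.stats import rankdata

def logit_transform(p, eps=1e-15):
    # Clip to avoid +/- infinity at probabilities of exactly 0 or 1.
    p = np.clip(p, eps, 1 - eps)
    return np.log(p / (1 - p))

def percentile_rank_transform(p):
    # Map each prediction to its percentile rank in (0, 1].
    return rankdata(p) / len(p)

cvpreds = np.array([0.1, 0.9, 0.5])
print(logit_transform(cvpreds))          # symmetric around 0 for p around 0.5
print(percentile_rank_transform(cvpreds))
```

Note that the percentile-rank transform discards the calibration of the base models and keeps only their ordering, which is one reason the two transforms can behave differently across datasets.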
Some potential design choices:
I am not sure yet whether we should do this for any arbitrary Stacked Ensemble (by default) or start with the AutoML SEs, since they're more predictable in terms of model diversity (at least in the default settings).
We could expose the transforms as an argument with a set of named options, e.g. metalearner_transform = c("logit", "percentile_rank", "none"), or
We could just do it automatically
Once we have a draft PR, we can run a benchmark to see if there's an improvement in Stacked Ensemble performance across a large number of datasets. These types of transformations would make sense for a GLM or DNN metalearner, but they would (or at least should) be inconsequential for any tree-based metalearner, so we could either limit the transformation to GLM/DNN metalearners, or always apply it (if that simplifies the code and the computational cost is negligible).
I have been trying this out using some custom GLM metalearning code in R and it has consistently given me better performance (though I have only been working with highly correlated XGBoost & LGBM base models). I have only tried this in binary classification, but it probably makes sense in the multiclass case as well.
Here's the transformation I'm making (code in R):
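The R snippet itself did not survive in this ticket. As a stand-in illustration only (not the author's code), here is a Python sketch of the idea: logit-transform the base models' CV holdout predictions, then fit a GLM (logistic regression) metalearner on the transformed columns. The simulated data and all names here are hypothetical:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, n)

# Simulate CV holdout predictions from two highly correlated base models.
noise = lambda: rng.normal(0, 0.15, n)
cvpreds = np.clip(np.column_stack([y + noise(), y + noise()]), 1e-6, 1 - 1e-6)

# Logit-transform the cvpreds before fitting the GLM metalearner.
X = np.log(cvpreds / (1 - cvpreds))
meta = LogisticRegression().fit(X, y)
print(meta.score(X, y))
```

The transform matters here because a GLM combines its inputs linearly on the link scale; feeding it logits rather than probabilities puts the base-model outputs on that scale directly.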
After seeing more results with percentile rank, that also looks like a strong candidate transformation (in some cases better than logit). It's probably dataset dependent, so a full run of the benchmark will help determine which would be the better default.