Add model summary for Stacked Ensembles in Python API

Description

We need to decide on a list of metadata to display in the `model_summary` for Stacked Ensembles, add it to the model object, and expose it via the R and Python APIs.
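As a starting point, the summary might carry metadata such as the number of base models and the metalearner algorithm and hyperparameters. The sketch below is illustrative only; the field names are assumptions, not a final H2O schema.

```python
# Illustrative sketch: candidate metadata rows for a Stacked Ensemble
# model summary. Field names are assumptions, not the final H2O schema.
def build_ensemble_summary(n_base_models, metalearner_algo, metalearner_params):
    """Return the summary as a list of (key, value) rows, ready to print."""
    return [
        ("Number of base models", n_base_models),
        ("Metalearner algorithm", metalearner_algo),
        ("Metalearner hyperparameters", metalearner_params or "default"),
    ]

def format_summary(rows):
    """Render rows as an aligned two-column text table."""
    width = max(len(key) for key, _ in rows)
    return "\n".join(f"{key.ljust(width)} : {value}" for key, value in rows)

summary = build_ensemble_summary(5, "glm", {"non_negative": True})
print(format_summary(summary))
```

The point is just that the summary should be a small, structured table, in the same spirit as the existing per-algorithm summaries shown below.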

Currently we have:

R Example (for Random Forest):

Python Example:

Activity

Erin LeDell
March 9, 2021, 6:25 PM

Duplicate of https://h2oai.atlassian.net/browse/PUBDEV-7807
We will do this in Java instead.

Erin LeDell
March 4, 2021, 7:30 PM
Edited

Here’s the current R ensemble summary output (I think we need to remove the extra metrics at the end and keep it simple by only including the first part, but we can address that in a separate JIRA):


Top part:



However, here’s what we are trying to replicate in Python (just the top part). There are a few things I would like to change, though. Right now we determine the metalearner type with an if statement on the `metalearner_algorithm` input, so if it is not set we report it as “glm”; the summary also claims no hyperparameters are set, because `metalearner_params` is NULL/None by default. However, our default metalearner does have some parameters set (e.g. `non_negative = True`), so what we should do instead is query the `metalearner_model` attribute and read the algorithm type and hyperparameters directly from the fitted model. Maybe we can print a bulleted list below “Metalearner hyperparameters:”, so something like:


Lastly, if we want to get creative, we could print out the weights of the metalearner if it’s a GLM (or at least the top 20 weights by default, in case there are a lot of base models).
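A rough sketch of the two ideas above (helper names are hypothetical, and the parameter and coefficient values are made up for illustration): read the hyperparameters from the fitted metalearner rather than the user inputs, format them as a bulleted list, and for a GLM metalearner keep only the top-k coefficients by magnitude.

```python
# Sketch of the proposed logic (names are hypothetical, not H2O API):
# format actual metalearner hyperparameters as bullets, and select the
# largest-magnitude GLM coefficients when there are many base models.
def metalearner_hyperparams_bullets(actual_params):
    """Format the metalearner's non-default hyperparameters as a bulleted list."""
    return "\n".join(f"  * {name} = {value}"
                     for name, value in sorted(actual_params.items()))

def top_glm_weights(coefs, k=20):
    """Return the k largest-magnitude metalearner coefficients."""
    return sorted(coefs.items(), key=lambda kv: abs(kv[1]), reverse=True)[:k]

# Assumed example values, not real H2O defaults or output.
params = {"non_negative": True, "family": "gaussian"}
print("Metalearner hyperparameters:")
print(metalearner_hyperparams_bullets(params))

coefs = {"GBM_1": 0.42, "DRF_1": -0.91, "GLM_1": 0.05}
print(top_glm_weights(coefs, k=2))
```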

Erin LeDell
February 3, 2021, 7:26 AM

Update in 3.32: When you print the ensemble object, it prints “No model summary for this model” but then also prints the training metrics object. (For regular H2O models we print the training metrics by default; I think we should not print them here, because it’s a huge amount of text and users can print the metrics explicitly if they want them.) So the training metrics should be removed from this output too. I’m not sure whether that needs to happen at the model level for all models, in a separate Jira ticket, or whether we can remove the auto-printing of the training metrics for the SE individually in this ticket.
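The proposed display behavior, as a minimal standalone sketch (class and method names are hypothetical, not the H2O implementation): the default display shows only the summary, and the long training metrics are opt-in.

```python
# Illustrative-only sketch of the proposed printing behavior: the
# ensemble's default display shows just the summary; training metrics
# are printed only when explicitly requested. Names are hypothetical.
class EnsembleDisplay:
    def __init__(self, summary_text, training_metrics_text):
        self.summary_text = summary_text
        self.training_metrics_text = training_metrics_text

    def show(self, include_metrics=False):
        """Return the display text; metrics are opt-in, not automatic."""
        parts = [self.summary_text]
        if include_metrics:
            parts.append(self.training_metrics_text)
        return "\n\n".join(parts)

display = EnsembleDisplay("Stacked Ensemble summary...", "<long training metrics>")
print(display.show())                      # summary only
print(display.show(include_metrics=True))  # summary plus metrics on request
```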

Here’s what we see:


It’s the same output if you use the `.summary()` method too:


Navdeep
June 12, 2018, 1:05 AM
Duplicate

Assignee

Unassigned

Fix versions

None

Reporter

Erin LeDell

Support ticket URL

None

Labels

Affected Spark version

None

Customer Request Type

None

Task progress

None

ReleaseNotesHidden

None

CustomerVisible

No

Components

Sprint

None

Priority

Major