GLRM returns different results for sample weights column with ones vs None

Description

With a fixed random seed and constant columns included in the data, passing weights_column = None gives different results than passing a weights_column that contains only ones.

Activity

Ada Palmisano
March 8, 2021, 1:55 PM

Thank you, Wendy, for your quick response.

Ada

Wendy
March 6, 2021, 12:07 AM

Instead of using x = traindata.names, specify the columns explicitly as x = ["c1","c2",...].

Wendy
March 5, 2021, 6:38 PM

Ada:

GLRM does not take a weights_column. So if the user adds a weight column to the frame and includes it in the x specification, GLRM is run on the dataset including the weight column. Hence the user is running GLRM on two different frames, one with the weight column and one without it.

Instead of using x = dataset.names when calling GLRM, use x = ['c1','c2','c3',…]. If you run GLRM again on the same dataset with and without the weight column, but use x = […] to specify the columns you want, you should get the same results when you call predict. Let me know if this helps.
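
A minimal sketch of the suggestion above, in case it helps. The column names ("c1", "c2", "c3", "weights"), the helper names predictor_columns and train_glrm, and the values k=2 and seed=1234 are assumptions for illustration only; they are not from the reporter's code. The idea is to build the x list explicitly so the all-ones column never reaches GLRM, which should make both runs train on the same columns.

```python
def predictor_columns(all_names, exclude):
    """Return all_names minus the excluded columns, preserving order."""
    excluded = set(exclude)
    return [name for name in all_names if name not in excluded]


def train_glrm(frame, x, k=2, seed=1234):
    """Fit GLRM on the listed columns only (needs h2o and a running cluster)."""
    # Imported lazily so the helper above can be used without h2o installed.
    from h2o.estimators import H2OGeneralizedLowRankEstimator

    model = H2OGeneralizedLowRankEstimator(k=k, seed=seed)
    model.train(x=x, training_frame=frame)
    return model


# Hypothetical column names: "c1".."c3" are features, "weights" is all ones.
names_without_weights = ["c1", "c2", "c3"]
names_with_weights = ["c1", "c2", "c3", "weights"]

# Both frames yield the same predictor list once "weights" is excluded.
x_a = predictor_columns(names_without_weights, exclude=["weights"])
x_b = predictor_columns(names_with_weights, exclude=["weights"])
print(x_a == x_b)
```

With the same x list and the same seed, calling train_glrm(frame_a, x_a) and train_glrm(frame_b, x_b) on the two frames would be expected to give matching predict output, since the extra column is no longer part of the factorization.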


Thanks, Wendy

Done

Assignee

Wendy

Fix versions

None

Reporter

Ada Palmisano

Support ticket URL

None

Labels

None

Affected Spark version

None

Customer Request Type

None

Task progress

None

ReleaseNotesHidden

None

CustomerVisible

No

Components

Priority

Major