NAs in metadata$Corr_matrix #4

@hopkinsjj9

Thank you for putting together a great package!

I'm getting an "infinite or missing values in 'x'" error when I run the following data through the package workflow:
https://www.kaggle.com/pradeeptripathi/predicting-house-prices-using-r/data

library(dplyr)  # provides %>% and mutate_if()

train <- data.frame(readr::read_csv('../data/train.csv'))
str(train)
# Convert character columns to factors
train <- train %>% mutate_if(is.character, as.factor)
str(train)

cleaned <- missCompare::clean(train,
                              var_removal_threshold = 0.5,
                              ind_removal_threshold = 0.8,
                              missingness_coding = -9)

# Run clean() a second time to make sure
cleaned <- missCompare::clean(cleaned,
                              var_removal_threshold = 0.5,
                              ind_removal_threshold = 0.8,
                              missingness_coding = -9)

metadata <- missCompare::get_data(cleaned,
                                  matrixplot_sort = TRUE,
                                  plot_transform = TRUE)
Warning message:
In stats::cor(X, use = "pairwise.complete.obs", method = "pearson") :
the standard deviation is zero

simulated <- missCompare::simulate(rownum = metadata$Rows,
                                   colnum = metadata$Columns,
                                   cormat = metadata$Corr_matrix,
                                   meanval = 0,
                                   sdval = 1)
Error in eigen(if (doDykstra) R else Y, symmetric = TRUE) :
infinite or missing values in 'x'

I found two NAs in metadata$Corr_matrix, at Utilities/LotFrontage.
Not knowing exactly how to handle this, I just set them to zero as a hack:

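# List the columns of the correlation matrix that contain NAs
colnames(metadata$Corr_matrix)[colSums(is.na(metadata$Corr_matrix)) > 0]
# Hack: overwrite the NA correlations with 0
metadata$Corr_matrix[is.na(metadata$Corr_matrix)] <- 0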
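
For anyone reproducing this, a minimal sketch of how such NAs can arise and how to list the affected pairs. It uses a made-up toy data frame (not the Kaggle data) and assumes the matrix comes from the same stats::cor call shown in the warning above. With pairwise deletion, a column that is constant on the rows where the other column is observed has zero standard deviation for that pair, so the correlation is undefined.

```r
# Toy data (hypothetical): `b` is constant on every row where `a` is observed.
toy <- data.frame(
  a = c(1, 2, 3, NA, NA),
  b = c(5, 5, 5, 6, 7),
  c = c(2, 1, 4, 3, 5)
)

# Same call the warning above points at; the zero-SD warning is expected here.
cm <- suppressWarnings(
  stats::cor(toy, use = "pairwise.complete.obs", method = "pearson")
)

# List each NA cell once (upper triangle only) with its variable names.
na_idx <- which(is.na(cm) & upper.tri(cm), arr.ind = TRUE)
na_pairs <- data.frame(
  var1 = rownames(cm)[na_idx[, "row"]],
  var2 = colnames(cm)[na_idx[, "col"]]
)
print(na_pairs)
```

Because the matrix is symmetric, one undefined pair shows up as two NA cells, which would be consistent with the two NAs reported here.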

With the NAs replaced, I can restart at the simulate step, but there has to be a better way.
Shouldn't clean or get_data take care of this somehow?
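
One possible alternative to zeroing the NAs, sketched under assumptions: drop variables from the correlation matrix until no undefined entries remain, then pass the reduced matrix on. The helper name drop_na_corr_vars is hypothetical and not part of missCompare, the toy matrix values are invented, and the sketch assumes the matrix has row and column names. If this were used before simulate, colnum would presumably also need to match the reduced matrix.

```r
# Hypothetical helper (not part of missCompare): remove variables from a
# named correlation matrix until it contains no NA entries.
drop_na_corr_vars <- function(cm) {
  dropped <- character(0)
  while (anyNA(cm)) {
    # Count NA cells per variable and remove the worst offender first.
    na_counts <- colSums(is.na(cm))
    worst <- names(na_counts)[which.max(na_counts)]
    dropped <- c(dropped, worst)
    keep <- setdiff(colnames(cm), worst)
    cm <- cm[keep, keep, drop = FALSE]
  }
  list(cormat = cm, dropped = dropped)
}

# Toy matrix (hypothetical values) with one undefined pair, x/y.
cm <- matrix(c(1,   NA,  0.2,
               NA,  1,   0.3,
               0.2, 0.3, 1),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("x", "y", "z"), c("x", "y", "z")))

res <- drop_na_corr_vars(cm)
res$dropped           # variables that were removed
colnames(res$cormat)  # variables that remain
```

When two variables tie on NA count, as with a single undefined pair, the first one in column order is dropped. That choice is arbitrary, so it may be worth checking which of the two is the near-constant column before deciding.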

Thanks again
Jack Hopkins
