
Resampling with store_models = TRUE is slow #1222

Open
@be-marc

Description

library(mlr3learners)

task = tsk("pima")
learner = lrn("classif.ranger", num.trees = 5000)
resampling = rsmp("cv", folds = 10)

system.time(resample(task, learner, resampling))
 
#    user  system elapsed 
#  25.037   1.413  24.330 

system.time(resample(task, learner, resampling, store_models = TRUE))

#    user  system elapsed 
#  27.560   1.701  27.138 

Storing the models takes about 3 seconds longer. When tuning with store_models = TRUE, it took almost 8 seconds longer. The ranger models get quite large (~60 MB). The 3 seconds are lost when creating this data.table:

mlr3/R/resample.R

Lines 121 to 131 in 2c8734a

data = data.table(
  task = list(task),
  learner = grid$learner,
  learner_state = map(res, "learner_state"),
  resampling = list(resampling),
  iteration = seq_len(n),
  prediction = map(res, "prediction"),
  uhash = UUIDgenerate(),
  param_values = map(res, "param_values"),
  learner_hash = map_chr(res, "learner_hash")
)

Profiling of this part:

[image]
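
For anyone who wants to check the model sizes on their own machine, here is a minimal sketch. It assumes the stored models can be read from `rr$learners` after resampling with `store_models = TRUE`. The number of trees and folds is reduced so it runs quickly, so the sizes will be well below the ~60 MB mentioned above; `rr` and `sizes` are just illustrative names.

```r
library(mlr3)
library(mlr3learners)

task = tsk("pima")
# Assumption: fewer trees and folds than in the report above, to keep this quick
learner = lrn("classif.ranger", num.trees = 500)
resampling = rsmp("cv", folds = 3)

rr = resample(task, learner, resampling, store_models = TRUE)

# Size in bytes of the ranger model stored for each resampling iteration
sizes = vapply(rr$learners, function(l) as.numeric(object.size(l$model)), numeric(1))

# Total size of all stored models, formatted in MB
format(structure(sum(sizes), class = "object_size"), units = "MB")
```

Setting `num.trees = 5000` and `folds = 10` again should give sizes comparable to the numbers in this report, at the cost of a longer runtime.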
