libsvm
2016/11/27
hskksk @ JapanR 2016
• Languages: R, Python, C++
xgboost kaggler
Bosch Production Line
Performance 15
xgboost
xgb.DMatrix
# Load the label and the feature sets
label = readRDS("label.rds")
feature_set_A = readRDS("feature_set_A.rds")
feature_set_B = readRDS("feature_set_B.rds")
# Combine the feature sets with cbind
mat = cbind(
feature_set_A,
feature_set_B
)
↑
# Build the DMatrix
dmat = xgb.DMatrix(mat, label=label)
cbind
※cbind rm(vars); gc()
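To make the memory cost of cbind concrete, here is a minimal sketch. The small random matrices are stand-ins for the real feature sets, and their sizes are assumptions, not figures from the talk. It shows that cbind builds a complete new matrix, so the inputs and the combined result are resident at the same time until the inputs are removed.

```r
# Hypothetical stand-ins for feature_set_A / feature_set_B (sizes are arbitrary).
set.seed(1)
feature_set_A <- matrix(rnorm(1000 * 20), nrow = 1000)
feature_set_B <- matrix(rnorm(1000 * 30), nrow = 1000)

size_inputs <- as.numeric(utils::object.size(feature_set_A)) +
  as.numeric(utils::object.size(feature_set_B))

# cbind allocates a brand-new matrix holding a copy of every column.
mat <- cbind(feature_set_A, feature_set_B)
size_combined <- as.numeric(utils::object.size(mat))

# While the inputs and `mat` are all alive, roughly twice the data is resident.
cat("inputs:  ", size_inputs, "bytes\n")
cat("combined:", size_combined, "bytes\n")

# Removing the inputs afterwards frees them, but the peak has already occurred.
rm(feature_set_A, feature_set_B)
invisible(gc())
```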
xgb.DMatrix
Python
libsvm
※R
1. Dump to libsvm format instead of calling cbind
2. Build the DMatrix from the file
Dumping to libsvm format without cbind
data.table::fwrite_libsvm(list_of_matrices, file)
Implemented in a fork of data.table, based on fwrite
# Load the label and the feature sets
label = readRDS("label.rds")
feature_set_A = fread("feature_set_A.csv")
feature_set_B = fread("feature_set_B.csv")
# Put the label and the feature sets into a list
# The first element is the label
matrices = list(label, feature_set_A, feature_set_B)
# Dump to libsvm format
fwrite_libsvm(matrices, "libsvm.txt")
# Build the DMatrix from the libsvm file
dmat = xgb.DMatrix("libsvm.txt")
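The idea of writing several feature blocks to one libsvm file without ever combining them can be sketched in plain base R. This is not the fork's fwrite_libsvm: `write_libsvm_sketch` and its `index_base` parameter are hypothetical names used only for illustration, and the loop is slow and single-threaded. Two further assumptions: zero entries are omitted (the usual sparse convention) and feature indices start at 1 (the LIBSVM convention).

```r
# Hypothetical helper: writes "<label> <index>:<value> ..." lines, one per row,
# shifting the column indices of each block by the widths of the blocks before it.
write_libsvm_sketch <- function(label, mats, file, index_base = 1L) {
  mats <- lapply(mats, as.matrix)
  stopifnot(all(vapply(mats, nrow, integer(1)) == length(label)))
  # Column offset of each block in the combined matrix that is never built.
  offsets <- cumsum(c(0L, vapply(mats, ncol, integer(1))))
  con <- file(file, open = "w")
  on.exit(close(con))
  for (i in seq_along(label)) {
    fields <- as.character(label[i])
    for (k in seq_along(mats)) {
      row <- mats[[k]][i, ]
      nz <- which(row != 0)  # assumption: zero entries are omitted
      if (length(nz) > 0) {
        idx <- offsets[k] + nz - 1L + index_base
        val <- formatC(row[nz], digits = 15, format = "g")
        fields <- c(fields, paste0(idx, ":", val))
      }
    }
    writeLines(paste(fields, collapse = " "), con)
  }
  invisible(file)
}

# Usage with two tiny blocks (made-up values).
label <- c(1, 0)
A <- matrix(c(1, 0, 0, 2), nrow = 2)          # rows: (1, 0) and (0, 2)
B <- matrix(c(0, 3, 0.5, 0, 0, 4), nrow = 2)  # rows: (0, 0.5, 0) and (3, 0, 4)
out <- tempfile(fileext = ".txt")
write_libsvm_sketch(label, list(A, B), out)
print(readLines(out))
# [1] "1 1:1 4:0.5"    "0 2:2 3:3 5:4"
```

Only one row of one block is held as text at a time, so peak memory stays close to the size of the inputs themselves.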
fwrite is parallelized with OpenMP
8.5 GB / 120 sec @ Xeon 2.5 GHz × 8
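As a quick reading of that figure, the quoted size and time can be turned into a throughput. The calculation assumes decimal units (1 GB = 1000 MB), which the slide does not specify.

```r
# Figures quoted on the slide.
size_gb <- 8.5
elapsed_sec <- 120

# Assumption: 1 GB = 1000 MB (decimal units).
throughput_mb_per_sec <- size_gb * 1000 / elapsed_sec
cat(sprintf("%.1f MB/sec\n", throughput_mb_per_sec))  # 70.8 MB/sec
```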
data.table PR
https://github.com/hskksk/data.table
kaggler !!
Enjoy Kaggling with R !!

A study of how to dump data to libsvm format quickly and with low memory use