I've been using R until now, but since I'm already working in Python, I figured I might as well move everything over, so I've been studying up on a bunch of things.
Reading a CSV file with numpy
It looks like the numpy.genfromtxt and numpy.loadtxt functions make this easy.
import numpy as np
data = np.genfromtxt('data.csv', delimiter=',')
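As a rough sketch of how the two functions differ in practice, the example below reads small in-memory CSV strings instead of a real file. The CSV contents, the header row, and the variable names are assumptions made up for illustration; a path such as 'data.csv' can be passed in the same way.

```python
import io
import numpy as np

# Hypothetical CSV content standing in for 'data.csv' (one header row, one empty cell).
csv_text = "x,y,z\n1.0,2.0,3.0\n4.0,,6.0\n"

# genfromtxt tolerates missing fields; with the default float dtype they become NaN.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)

# loadtxt is the stricter option, so this input has no missing cells.
clean_text = "x,y,z\n1.0,2.0,3.0\n4.0,5.0,6.0\n"
clean = np.loadtxt(io.StringIO(clean_text), delimiter=",", skiprows=1)

print(data.shape, clean.shape)
```

Note that the argument for skipping the header has a different name in each function: skip_header for genfromtxt and skiprows for loadtxt.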
Reading svmlight (libsvm) format files with scikit-learn
It looks like the load_svmlight_file function is the way to go.
To load several files at once, as in the example below, use the load_svmlight_files function (note the trailing s).
from sklearn.datasets import load_svmlight_files
feature_train, label_train, feature_test, label_test = load_svmlight_files(['train.txt', 'test.txt'])
There are two things to watch out for with these functions.
The first is that the features are loaded as a sparse matrix.
If you want to use functions that don't support sparse matrices, you need to convert the format first with the todense or toarray method.
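Here is a minimal sketch of that conversion. The two-row data set is invented for illustration and is read from an in-memory buffer rather than a file; features, labels and dense are just names chosen for this example.

```python
import io
import numpy as np
from sklearn.datasets import load_svmlight_file

# Hypothetical svmlight data: "<label> <index>:<value> ..." with one-based indices.
raw = b"1 1:0.5 3:1.2\n-1 2:2.0\n"

# A file-like object has to be in binary mode, hence BytesIO.
features, labels = load_svmlight_file(io.BytesIO(raw))

# features is a SciPy sparse matrix; toarray() gives a plain ndarray with zeros filled in.
dense = features.toarray()

print(type(features).__name__, dense.shape)
```

toarray returns an ordinary numpy.ndarray, while todense on a SciPy sparse matrix generally returns a numpy.matrix, so toarray is usually the more convenient choice when the result goes into ordinary numpy code.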
The second is that you sometimes have to specify the size of the feature space.
If you load the training data and the test data in a single call, as in the example above, there's no problem.
If you load the files one at a time with load_svmlight_file, though, and the largest feature index differs between the data sets, you need to pass the number of feature dimensions via n_features for the file you load later.
from sklearn.datasets import load_svmlight_file
feature_test, label_test = load_svmlight_file('test.txt', n_features=feature_train.shape[1])
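To make the mismatch concrete, here is a small sketch with made-up data held in in-memory buffers instead of train.txt and test.txt. The training data uses feature indices up to 3 while the test data only goes up to 2; the byte strings and the name inferred are assumptions for illustration.

```python
import io
from sklearn.datasets import load_svmlight_file

# Hypothetical data: the highest feature index is 3 in train and 2 in test.
train_raw = b"1 1:0.5 3:1.2\n-1 2:2.0\n"
test_raw = b"1 1:0.7 2:0.1\n"

feature_train, label_train = load_svmlight_file(io.BytesIO(train_raw))

# Without n_features, the width is inferred from the test data alone.
inferred, _ = load_svmlight_file(io.BytesIO(test_raw))

# Passing the training width makes the column counts agree.
feature_test, label_test = load_svmlight_file(
    io.BytesIO(test_raw), n_features=feature_train.shape[1]
)

print(feature_train.shape, inferred.shape, feature_test.shape)
```

The matrix loaded without n_features ends up one column narrower than the training matrix, which is the kind of mismatch that would make a model trained on feature_train reject the test data.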