Trying out the SVM in mlpy (Machine Learning Python)
mlpy : http://mlpy.sourceforge.net/
As an example problem, I use the "Breast Cancer Wisconsin (Original) Data Set" provided by the UCI Repository.
http://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Original)
- Number of samples: 683 (the original 699 samples minus the 16 that have a missing value)
- Number of dimensions: 9
First, read the data with numpy.
>>> import numpy as np
>>> f = np.loadtxt("breast-cancer-wisconsin.data",dtype='int',delimiter=',')
>>> print f
[[1000025 5 1 ..., 1 1 2]
 [1002945 5 4 ..., 2 1 2]
 [1015425 3 1 ..., 1 1 2]
 ...,
 [ 888820 5 10 ..., 10 2 4]
 [ 897471 4 8 ..., 6 1 4]
 [ 897471 4 8 ..., 4 1 4]]
>>> print f.shape
(683, 11)
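The load above works only once the 16 incomplete rows are gone from the file. As a minimal sketch, assuming the raw file marks a missing value with "?", those rows could be skipped while reading. The helper name load_complete_rows and the three sample rows are illustrative, not part of the original session.

```python
import io
import numpy as np

def load_complete_rows(fileobj):
    """Parse comma-separated integer rows, skipping any row that has a '?' field."""
    rows = []
    for line in fileobj:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        fields = line.split(",")
        if "?" in fields:
            continue  # assumed missing-value marker: drop the whole row
        rows.append([int(v) for v in fields])
    return np.array(rows, dtype=int)

# Illustrative rows in the 11-column layout (ID, 9 attributes, class).
# The second row has a "?" and should be dropped.
raw = io.StringIO(
    "1000025,5,1,1,1,2,1,3,1,1,2\n"
    "1057013,8,4,5,1,2,?,7,3,1,4\n"
    "1017122,8,10,10,8,7,10,9,7,1,4\n"
)
data = load_complete_rows(raw)
print(data.shape)
```

With a real file, the same helper could be called as load_complete_rows(open("breast-cancer-wisconsin.data")).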
The labels take two values, "2" and "4". I am a little worried about whether it is OK that they are not "-1" and "+1", but for now I will just push ahead as is.
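If a classifier did insist on -1/+1 labels, the remapping is short. This is a minimal sketch, assuming "4" is treated as the positive class. That choice and the sample label array are illustrative, not taken from the session above.

```python
import numpy as np

# Made-up label vector using the data set's two class codes.
labels = np.array([2, 4, 4, 2, 2])

# Map 4 -> +1 and everything else (here: 2) -> -1.
signed = np.where(labels == 4, 1, -1)
print(signed)
```

Predictions can be mapped back the same way, e.g. np.where(pred == 1, 4, 2).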
Looking over the whole data set, the label values seem to be arranged in random order, so I use the first 343 samples for training and the remaining 340 for testing.
The first column is an ID number and is not needed. The last column is the class label, so the training samples trainx, training labels trainy, test samples testx, and test labels testy are set up as follows.
>>> trainx = f[0:343,1:10]
>>> trainy = f[0:343,10]
>>> testx = f[343:,1:10]
>>> testy = f[343:,10]
>>> trainx.shape
(343, 9)
>>> testy.shape
(340,)
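The impression that the labels are randomly ordered can be checked roughly by comparing the class proportions in the two halves. The following is a sketch only. The helper positive_fraction and the example labels are made up for illustration and say nothing about the actual data.

```python
import numpy as np

def positive_fraction(labels, positive=4):
    """Fraction of entries in `labels` equal to `positive`."""
    labels = np.asarray(labels)
    return float(np.mean(labels == positive))

# Made-up labels for illustration only (not taken from the data set).
example = np.array([2, 4, 2, 2, 4, 2, 4, 2])
first, second = example[:4], example[4:]

first_frac = positive_fraction(first)    # 1 of 4 entries is "4"
second_frac = positive_fraction(second)  # 2 of 4 entries are "4"
print(first_frac)
print(second_frac)
```

With the arrays from this post, the comparison would be positive_fraction(trainy) against positive_fraction(testy). A large gap would suggest that the file is not randomly ordered and that a shuffled split would be safer.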
Set up a linear SVM, then train and classify. It is as easy as calling mlpy.LibSvm.learn and mlpy.LibSvm.pred.
>>> import mlpy
>>> svm = mlpy.LibSvm()
>>> svm.learn(trainx,trainy)
>>> result = svm.pred(testx)
Compare the classification results with the correct labels of the test data.
>>> rate = 0
>>> for i in range(340):
...     if result[i] == testy[i]:
...         rate = rate + 1
...
>>> print rate, rate / 340.0
335 0.985294117647
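The counting loop can also be written as a single numpy comparison. This is a minimal sketch. The helper name accuracy and the short example arrays are illustrative, not output from the session above.

```python
import numpy as np

def accuracy(predicted, expected):
    """Return (number of matches, fraction of matches) for two label arrays."""
    predicted = np.asarray(predicted)
    expected = np.asarray(expected)
    matches = int(np.sum(predicted == expected))
    return matches, matches / float(len(expected))

# Made-up predictions and labels for illustration.
hits, frac = accuracy([2, 4, 4, 2], [2, 4, 2, 2])
print(hits)
print(frac)
```

In the session above, this would be called as accuracy(result, testy).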
Perhaps because there is plenty of training data, the classification accuracy is high at 98.5%.
Then again, the "Past Usage" section of
http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.names
also reports accuracies of 93.5% and 95.9%, so I will call this result good enough.