I had been meaning to visualize a decision tree for a while but never got around to it, so here it is.
As usual, I'm using scikit-learn as the decision tree library.
Introduction to scikit-learn, a Python machine learning library - 唯物是真 @Scaled_Wurm
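What is a decision tree?
A decision tree is a model used in supervised learning that learns its rules in the form of a tree.
For example, when predicting sex from height and weight, it learns rules like "if height is 170 cm or more and weight is 60 kg or more, then male".
It is not a particularly accurate model, but it has the advantage of producing rules that people can read fairly easily (compared with other models, at least).
Put simply, it considers conditions of the form "is some variable at or above a certain value?" and builds the tree from the conditions that, when the data is split on them, skew the distribution of labels (male/female in the case of sex) toward one side or the other.
At prediction time, a sample follows the nodes whose conditions it satisfies down to a leaf node at the bottom of the tree, and is classified as the label that is most common among the training data assigned to that leaf.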
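The prediction step described above can be sketched in a few lines. This is only an illustration: the tree is written by hand, and the thresholds and label counts are made-up numbers, not anything learned from data.

```python
# Hand-written toy tree for the height/weight example (all numbers are made up).
# Internal nodes test "feature >= threshold"; leaves hold training label counts.
TOY_TREE = {
    "feature": "height_cm", "threshold": 170,
    "no": {"counts": {"male": 5, "female": 40}},
    "yes": {
        "feature": "weight_kg", "threshold": 60,
        "no": {"counts": {"male": 8, "female": 12}},
        "yes": {"counts": {"male": 30, "female": 5}},
    },
}


def predict(node, sample):
    """Walk from the root to a leaf, then return the leaf's majority label."""
    while "counts" not in node:
        branch = "yes" if sample[node["feature"]] >= node["threshold"] else "no"
        node = node[branch]
    return max(node["counts"], key=node["counts"].get)


print(predict(TOY_TREE, {"height_cm": 175, "weight_kg": 68}))  # male
print(predict(TOY_TREE, {"height_cm": 160, "weight_kg": 50}))  # female
```

A real tree learned by scikit-learn stores the same kind of information (a feature, a threshold, and per-leaf label counts), just in arrays rather than nested dictionaries.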
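Visualized, it looks like the following.
This is a simple example of the decision tree for the Sazae-san janken (rock-paper-scissors) prediction done further down in this article.
value is the number of training samples for rock, scissors, and paper assigned to that leaf node, and gini is a measure of how skewed the label distribution is (the closer to 0, the more skewed).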
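For reference, the gini value is the Gini impurity, one minus the sum of squared label proportions. A minimal sketch of computing it from a leaf's value counts follows; the counts used here are made-up examples, not numbers from the figure.

```python
def gini(counts):
    """Gini impurity of a node, given the number of samples per label."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


print(gini([10, 10, 10]))  # evenly spread over three labels: about 0.667
print(gini([30, 0, 0]))    # everything in one label: 0.0
```

With three labels the value ranges from 0 (all samples share one label) up to 2/3 (perfectly even).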
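Each feature is defined as 1 when its condition is true, so the tree represents the following prediction rule: if the hand one round ago was anything other than scissors, we end up at the left leaf node and the next hand is predicted to be scissors; if the hand one round ago was scissors, we end up at the right leaf node and the next hand is predicted to be paper.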
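Written out as code, the rule described above is tiny. The function name and the English hand labels are my own choices for illustration.

```python
def predict_next_hand(previous_hand):
    """Depth-1 rule as described: scissors is followed by paper, anything else by scissors."""
    return "paper" if previous_hand == "scissors" else "scissors"


for hand in ("rock", "scissors", "paper"):
    print(hand, "->", predict_next_hand(hand))
```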
Predicting Sazae-san's janken
To try this out on some handy data, I'll predict the next hand in Sazae-san's janken.
According to prior work, the next hand is known to be predictable with an accuracy of 50% or higher.
A survey of the Sazae-san janken prediction problem - 唯物是真 @Scaled_Wurm
For data, I'm using what I crawled a while back, since it was still lying around ↓
Janken_Classification/sazae.tri at master · mugenen/Janken_Classification · GitHub
Downloading the Sazae-san (and PreCure) janken data - 唯物是真 @Scaled_Wurm
The next hand is predicted from the hands of up to the previous three rounds.
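One way to turn the previous three hands into features is a block of three 0/1 indicators per round. The sketch below follows the order of the feature names passed to export_graphviz later in this post (rock, scissors, paper for 3, 2, and 1 rounds ago); whether sazae.tri stores its features in exactly this layout is an assumption, and the helper name is hypothetical.

```python
HANDS = ["rock", "scissors", "paper"]  # グー, チョキ, パー


def encode_history(last_three):
    """Encode three past hands (oldest first: 3, 2, 1 rounds ago) as nine 0/1 features."""
    features = []
    for hand in last_three:
        features.extend(1 if hand == h else 0 for h in HANDS)
    return features


print(encode_history(["rock", "paper", "scissors"]))
# [1, 0, 0, 0, 0, 1, 0, 1, 0]
```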
Accuracy is evaluated with 20-fold cross-validation (train on 19/20 of the data, evaluate on the remaining 1/20, and repeat this 20 times), and the (macro) average is printed.
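To make the procedure concrete, here is a minimal sketch of how the data can be cut into folds. It uses plain contiguous folds and a hypothetical sample count of 100; scikit-learn's own splitter may additionally balance the labels across folds, so this is an illustration of the idea rather than a reproduction of the library's behavior.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


folds = list(k_fold_indices(100, 20))
print(len(folds), len(folds[0][0]), len(folds[0][1]))  # 20 95 5
```

Each sample lands in the test set exactly once, and the reported figure is the mean of the 20 per-fold accuracies.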
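scikit-learn's decision tree lets you specify the maximum depth of the tree (roughly, the complexity of the rules), so I ran the experiment with several values.
Accuracy goes up until the tree reaches a certain depth, but drops once the depth is increased too far.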
Maximum tree depth | Mean accuracy |
---|---|
1 | 42.6% |
2 | 48.4% |
3 | 49.9% |
4 | 51.2% |
5 | 51.5% |
6 | 51.3% |
Code used to compute the accuracy

    import sklearn.datasets
    import sklearn.model_selection
    import sklearn.tree

    # Load the data (svmlight format)
    X, y = sklearn.datasets.load_svmlight_file('sazae.tri.txt')

    # Run the experiment with several maximum depths
    for d in range(1, 7):
        clf = sklearn.tree.DecisionTreeClassifier(max_depth=d)
        # 20-fold cross-validation; returns one accuracy per fold
        result = sklearn.model_selection.cross_val_score(clf, X.toarray(), y, cv=20)
        print('Max depth: {}, mean accuracy: {:.1%}'.format(d, result.mean()))
Visualizing the decision tree
Now for the main topic. scikit-learn has a function for visualization, so this is easy.
Pass a decision tree to the sklearn.tree.export_graphviz function and it writes the tree out in the format used by Graphviz, a graph visualization tool.
Run the script below, then run the command "dot -Tpng input_file -o output_file" on the file it produces to get the tree as a PNG (Graphviz must be installed).
Note that Japanese text cannot be displayed unless you specify a font.
    import io

    import sklearn.datasets
    import sklearn.tree

    X, y = sklearn.datasets.load_svmlight_file('sazae.tri.txt')
    clf = sklearn.tree.DecisionTreeClassifier(max_depth=2)
    clf.fit(X.toarray(), y)

    # Feature names: rock (グー), scissors (チョキ), paper (パー) at 3, 2, and 1 rounds ago (手前)
    feature_names = 'グー(3手前) チョキ(3手前) パー(3手前) グー(2手前) チョキ(2手前) パー(2手前) グー(1手前) チョキ(1手前) パー(1手前)'.split()

    # Export the tree to an in-memory buffer so the dot source can be edited
    with io.StringIO() as temp:
        sklearn.tree.export_graphviz(clf, out_file=temp, feature_names=feature_names)
        output = temp.getvalue().splitlines()

    # A font has to be specified for Japanese text to be displayed
    output.insert(1, 'node[fontname="meiryo"];')

    with open('tree.dot', 'w', encoding='utf-8') as f:
        f.write('\n'.join(output))
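If you would rather not type the dot command by hand, the conversion step can also be scripted. This is a minimal sketch that assumes Graphviz's dot executable is on the PATH; the output name tree.png and both helper names are my own choices.

```python
import shutil
import subprocess


def dot_command(dot_path, png_path):
    """Build the same command as above: dot -Tpng input_file -o output_file."""
    return ["dot", "-Tpng", dot_path, "-o", png_path]


def render_png(dot_path="tree.dot", png_path="tree.png"):
    """Run Graphviz to convert a .dot file to PNG; requires Graphviz to be installed."""
    if shutil.which("dot") is None:
        raise RuntimeError("Graphviz 'dot' was not found on PATH")
    subprocess.run(dot_command(dot_path, png_path), check=True)


print(" ".join(dot_command("tree.dot", "tree.png")))
# dot -Tpng tree.dot -o tree.png
```

Calling render_png() after the script above has written tree.dot should leave tree.png next to it.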
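What rules were learned
The visualization of the decision tree produces output like the following.
Since I figured it would be hard to read until you get used to it, I wrote the predicted hand in red at the upper left of each leaf node, and below it the past janken hands that lead to that leaf node.
With a maximum depth of 1, even a rule this simple is right more than 40% of the time.
With a maximum depth of 2, the tree learns a rule along the lines of choosing a hand other than the ones played in the past rounds.
For a depth of 3 or more the trees are large and hard to read, so I've left them out.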