dtreeviz, a decision tree visualization library, blew me away, so here is a summary
Hello.
When it comes to visualizing decision trees, I had honestly only ever used the scikit-learn plus graphviz approach, but the other day I read the article below and was stunned. So in this post I summarize dtreeviz, the library introduced in that explanatory article.
Overview of dtreeviz
What is dtreeviz?
It is a library built with the goal of better decision tree visualization.
- Explanatory article : How to visualize decision trees
- GitHub : GitHub - parrt/dtreeviz: A python machine learning library for structured data.
- Sample images : dtreeviz/testing/samples at master · parrt/dtreeviz · GitHub
A picture is probably worth a thousand words here, so I am including a sample image published on GitHub.
Anyway, it is seriously impressive.
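So what makes a good decision tree visualization in the first place?
The details are in the explanatory article, but here is the list it gives of the things you want to be able to see when you visualize a decision tree.
- The distribution of the feature and the target values at each decision node
- The name of the feature used at each decision node and its split value
- The purity of the leaf nodes
- The predicted value of the leaf nodes
- The number of samples at each decision node
- The number of samples at each leaf node
- How a given feature vector travels down the tree to a leaf (useful for explaining why a particular feature vector received its prediction)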
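To make the list above concrete, here is a minimal sketch that pulls most of those quantities out of a fitted scikit-learn tree as plain numbers. It is my own illustration and not part of dtreeviz: the iris data, `max_depth=2`, and the helper name `describe_nodes` are all assumptions chosen to keep it short.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)


def describe_nodes(tree, feature_names):
    """Return one dict per node (hypothetical helper, not a library function)."""
    rows = []
    for node_id in range(tree.node_count):
        # scikit-learn marks a leaf by setting its child index to -1
        is_leaf = tree.children_left[node_id] == -1
        rows.append({
            "node": node_id,
            "leaf": bool(is_leaf),
            "samples": int(tree.n_node_samples[node_id]),
            "impurity": float(tree.impurity[node_id]),  # Gini by default
            "feature": None if is_leaf else feature_names[tree.feature[node_id]],
            "threshold": None if is_leaf else float(tree.threshold[node_id]),
            "predicted_class": int(tree.value[node_id][0].argmax()),
        })
    return rows


nodes = describe_nodes(clf.tree_, iris.feature_names)
for row in nodes:
    print(row)
```

A table like this holds the same information a good picture should convey; the point of a library such as dtreeviz is that you no longer have to read it as raw numbers.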
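dtreeviz is very handy because it makes these important points easy to see on top of the decision tree itself.
I use decision trees fairly often at work, and the visualizations were never very appealing. I was wondering what to do about it when this library came along, so I was thrilled.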
Comparison with conventional decision tree visualization
Normally, visualizing a decision tree means using scikit-learn and graphviz, and the result is not exactly great looking.
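For reference, the conventional route looks roughly like the sketch below. It uses scikit-learn's `export_graphviz` to produce DOT source for a tree fitted on iris; the depth and styling flags are my own choices, and rendering the DOT text to an image still requires graphviz itself.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(iris.data, iris.target)

# With out_file=None, export_graphviz returns the DOT source as a string
dot_source = export_graphviz(
    clf,
    out_file=None,
    feature_names=iris.feature_names,
    class_names=list(iris.target_names),
    filled=True,   # color nodes by majority class
    rounded=True,  # rounded node boxes
)

# Show the beginning of the DOT text; pass it to graphviz to draw the tree
print(dot_source[:200])
```

Every node in the output is a text box listing the split, impurity, sample count and class counts, which is exactly the style being compared against here.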
It hardly needs a comparison, but if you build a decision tree on the iris data used in the explanatory article and put the two visualizations side by side, the difference is obvious at a glance.
It honestly gets me excited (lol). The color scheme feels modern, too.
The explanatory article also compares against the visualizations in R, SAS, SPSS and others, so if you are interested, please take a look at it. Below is the SPSS visualization. It looks somewhat nicer than scikit-learn's.
As the explanatory article also mentions, dtreeviz itself was apparently heavily influenced by "A visual introduction to machine learning" below. I remember seeing it quite a while ago and thinking how powerful visualization is, and how much of a difference it makes. That is exactly why I want to care more about it.
How to read dtreeviz visualizations
Let's look at the visualization results for both classification and regression, using sample data.
Visualizing feature-target space
Below is a figure visualizing a regression on the Boston data that uses only the AGE feature against the target, price.
Several things can be read from this figure.
- The split point of AGE at each node
- The mean, shown by the dashed line parallel to the x-axis in each node's plot
- For nodes other than leaf nodes, the x-axis range is always the same
From the visualization you can tell where the feature is split and how those splits affect the prediction for the target.
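The relationship between the split point and the per-node mean can be checked numerically. The sketch below is only an illustration under my own assumptions: the six data points are made up (they are not the Boston data), and the tree is limited to a single split.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Made-up one-feature data, used only to illustrate the mechanics
x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0]).reshape(-1, 1)
y = np.array([5.0, 6.0, 7.0, 20.0, 21.0, 22.0])

# max_depth=1 gives a single split: one decision node and two leaves
regr = DecisionTreeRegressor(max_depth=1).fit(x, y)

# Threshold stored at the root node (node 0)
split_point = regr.tree_.threshold[0]

# Targets that fall on each side of the split
left = y[x[:, 0] <= split_point]
right = y[x[:, 0] > split_point]

print("split point:", split_point)
print("left mean:", left.mean(), "prediction:", regr.predict([[2.0]])[0])
print("right mean:", right.mean(), "prediction:", regr.predict([[11.0]])[0])
```

Each leaf predicts the mean of the training targets that land in it, which is the quantity the dashed horizontal line marks in the figure.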
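Next, below is a figure visualizing a classification of the USER KNOWLEDGE data using a single feature, PEG.
Several things can be read from this figure as well.
- The stacked bar charts show how many samples of each target category there are
- At the leaf nodes, the size of the pie chart represents the number of samples, which is intuitive
According to the explanatory article, the authors chose pie charts despite knowing their bad reputation, with interpretability in mind. Indeed, comparing the sizes of the pie charts gives an immediate feel for how many samples each leaf holds, so I find it very easy to read.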
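The numbers behind the pie charts are simply the class counts of the samples that reach each leaf. Here is a rough sketch of how to compute them with scikit-learn. Since the USER KNOWLEDGE data does not ship with scikit-learn, I substitute iris with the petal length column as the single feature; that substitution is my assumption, not what the figure shows.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X = iris.data[:, [2]]  # petal length only, standing in for a single feature
y = iris.target

clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# apply() returns, for every sample, the id of the leaf it ends up in
leaf_ids = clf.apply(X)

# Class counts per leaf: the data a pie chart at that leaf would show
leaf_class_counts = {
    int(leaf): np.bincount(y[leaf_ids == leaf], minlength=3)
    for leaf in np.unique(leaf_ids)
}

for leaf, counts in leaf_class_counts.items():
    print(f"leaf {leaf}: n={counts.sum()}, class counts={counts.tolist()}")
```

The total `n` per leaf corresponds to the pie size, and the split of the counts corresponds to the slices.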
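Visualizing the interpretation of a single observation
It also seems possible to visualize what outcome results when the features take particular values. Apparently it can be done as in the figure below.
You could work this out by looking at the split points, but it makes things much easier to follow when explaining a prediction.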
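The same kind of explanation can be produced as text with scikit-learn's `decision_path`, which reports the nodes a sample passes through. This is a minimal sketch of the idea rather than what dtreeviz does internally; the iris data and the helper name `explain` are my own assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(iris.data, iris.target)


def explain(clf, x, feature_names):
    """List the split tests one sample passes on its way to a leaf (hypothetical helper)."""
    tree = clf.tree_
    # decision_path returns a sparse indicator matrix; for a single sample,
    # .indices holds the ids of the visited nodes from root to leaf
    node_ids = [int(i) for i in clf.decision_path([x]).indices]
    steps = []
    for node_id in node_ids:
        if tree.children_left[node_id] == -1:
            continue  # leaf node: no test to report
        feature = tree.feature[node_id]
        threshold = tree.threshold[node_id]
        op = "<=" if x[feature] <= threshold else ">"
        steps.append(f"{feature_names[feature]} = {x[feature]} {op} {threshold:.2f}")
    return node_ids, steps


sample = iris.data[0]
node_ids, steps = explain(clf, sample, iris.feature_names)
print("visited nodes:", node_ids)
for step in steps:
    print(step)
print("prediction:", iris.target_names[clf.predict([sample])[0]])
```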
Left-to-right visualization
Personally, this one made sense to me too. I had only ever looked at top-down trees, so I liked the left-to-right layout.
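As a point of comparison, scikit-learn's own exporter can also lay a tree out left to right through its `rotate` flag. The sketch below only inspects the generated DOT source, so it does not need graphviz to run; it illustrates the layout idea with the conventional tooling and is not a dtreeviz call.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)

# Default layout: root at the top, leaves at the bottom
top_down = export_graphviz(clf, out_file=None)

# rotate=True asks graphviz for a left-to-right layout instead
left_to_right = export_graphviz(clf, out_file=None, rotate=True)

print("rankdir=LR" in top_down)
print("rankdir=LR" in left_to_right)
```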
Other topics
The explanatory article also introduces a simpler version of the visualization, as well as designs that were tried but not adopted.
Future work
It looks like the following library for examining random forest features will also be added.
How to use dtreeviz
Installation
The library can be installed with pip.
$ pip install dtreeviz
In addition, it seems the following libraries need to be installed with brew.
$ brew install poppler
$ brew install pdf2svg
$ brew install graphviz --with-librsvg --with-app --with-pango
I stumbled a bit on the last command with the following error, but I was able to resolve it after looking at the following site.
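Before moving on, it can help to confirm that the command-line tools are actually on the PATH. The following is a small sketch using only the standard library. The executable names are my assumption about what each Homebrew package provides (`dot` from graphviz, `pdf2svg` from pdf2svg, `pdftocairo` from poppler), and `check_tools` is a hypothetical helper.

```python
import shutil

# Package name -> executable assumed to be installed by that package
REQUIRED_TOOLS = {
    "graphviz": "dot",
    "pdf2svg": "pdf2svg",
    "poppler": "pdftocairo",
}


def check_tools(tools=REQUIRED_TOOLS):
    """Return {package: True/False} depending on whether its executable is on PATH."""
    return {pkg: shutil.which(exe) is not None for pkg, exe in tools.items()}


for pkg, found in check_tools().items():
    print(f"{pkg}: {'found' if found else 'missing'}")
```

If something is reported as missing, the installation step for that package is the place to look first.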
Code
I played around with the sample code.
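# Note: load_boston was removed in scikit-learn 1.2, so this needs an older release
from sklearn.datasets import load_boston
from sklearn import tree
# Function-style API of the dtreeviz release available when this post was written
from dtreeviz.trees import dtreeviz
# Fit a shallow regression tree on the Boston housing data
regr = tree.DecisionTreeRegressor(max_depth=2)
boston = load_boston()
regr.fit(boston.data, boston.target)
# Build the visualization, labeling the target and the features
viz = dtreeviz(regr, boston.data, boston.target, target_name='price', feature_names=boston.feature_names)
# In a notebook, leaving the object as the last expression displays the tree
viz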
Easy, isn't it? The image it actually produced is shown below.
Whoa!
I plan to keep using it and enjoy making more nice-looking visualizations.