An overview of the explanation model SHAP, plus notes on using explanation models for factor analysis
I've started using Twitter. I'd be glad if you followed me!
I'm also still studying this topic, so I'd really appreciate feedback on anything that looks odd, wrong, or hard to follow, however minor.
The post is organized as follows.
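- Introduction
- What is SHAP?
- Computing Shapley values
- Visualization examples (available in the package)
- Things I noticed or kept in mind when doing factor analysis with an explanation model
- Bonus: thoughts on using explanation models for factor analysis
- References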
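Introduction
Among last year's news was a story about requiring companies to be accountable for decisions made by AI. Setting aside whether that is right or wrong, the explainability of AI seems to have become a topic of general public interest. From a more technical viewpoint, one of the machine learning model interpretation methods currently attracting attention is SHAP (SHapley Additive exPlanations). The paper was accepted at NIPS 2017, and SHAP is also covered in kaggle's course on the explainability of predictive models.
I also found a note that a Google service uses Shapley values, the idea SHAP is built on, to evaluate web pages. That feels like a seal of approval, which is a little reassuring (lol).
So SHAP seems to be well known, yet I felt there was little information about it in Japanese. That is why I planned to write an article, but then I came across these notes, an excellent write-up of the TreeSHAP paper, and lost my nerve. Since a discussion of the paper would only duplicate them, here I will instead give an overview of SHAP and summarize what I noticed and kept in mind when using it for factor analysis in practice.
Below, I call the values produced by the SHAP method SHAP values. My only experience with SHAP is with tree-based models, so the explanation leans toward them.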
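What is SHAP?
- It originates from the Shapley value in game theory
- The Shapley value is a method for allocating a contribution to each participant in a cooperative game
- SHAP adapts the Shapley value so that it can be applied to machine learning models
- In machine learning, it indicates how much each variable influenced the output
- There are two use cases
  - Explaining the output of a predictive model
  - Analyzing the factors behind some outcome
- The output has the shape (number of samples × number of explanatory variables)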
The relationship between the Shapley value and SHAP
The Shapley value is one proposed way, in game theory, of fairly distributing the payoff obtained through cooperation among the players. It is named after Lloyd Shapley, who introduced it in 1953. (quoted from Wikipedia)
This describes the Shapley value, the idea behind SHAP. Put simply, it is a method for dividing the score of a game played cooperatively by several people according to each player's contribution.
SHAP makes the Shapley value applicable to machine learning models. The paper argues that only one explanation model satisfies three desirable properties, and that its attributions are equal to the Shapley values.
In SHAP, the predicted value corresponds to the score of the cooperative game, the explanatory variables to the players, and the SHAP values to the players' contributions.
In other words, a SHAP value expresses how much each explanatory variable influenced the model's prediction.
Use cases
kaggle's course on interpretability lists use cases such as the following.
A model says a bank shouldn't loan someone money, and the bank is legally required to explain the basis for each loan rejection
A healthcare provider wants to identify what factors are driving each patient's risk of some disease so they can directly address those risk factors with targeted health interventions
In short, the first is a bank that is legally required to explain why a model rejected a loan, and the second is a healthcare provider that wants to identify the factors driving each patient's disease risk so that it can address them with targeted interventions.
The former is SHAP's original purpose: explaining the output of a predictive model. It corresponds to the explainability topic from the introduction. The latter is an analysis of the factors behind some outcome. In my work I have mainly used SHAP for the latter.
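Characteristics of SHAP
Next, I describe three practical characteristics of SHAP compared with other common variable importance measures (Gain, Split, Permutation, and so on).
- SHAP values are computed in the same shape as the input dataset
- They carry a positive or negative sign
- Their values depend on the combination with other explanatory variables
First, SHAP returns a matrix of numbers with the same shape as the input dataset. The other importance measures are reported at the granularity of how much each explanatory variable mattered to the model as a whole, so SHAP lets you see the influence of explanatory variables at a finer granularity. SHAP values can also be aggregated across samples to produce a per-variable importance comparable to Gain.
Second, SHAP values are signed, so you can tell in which direction an explanatory variable pushed the prediction.
Third, a SHAP value is determined in combination with the other explanatory variables. Even if two samples share the same value of explanatory variable A, the SHAP value of A can differ because of interactions with other variables. For example, if an advertisement is effective for young people, the variable "saw the ad / did not see the ad" may receive a large SHAP value for young samples and a small one for older samples.
These characteristics all stem from the fact that SHAP explains how each explanatory variable affected the output at the level of individual samples, rather than each variable's contribution to the model as a whole.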
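As a small illustration of the first two characteristics, the sketch below computes SHAP values by hand for a hypothetical linear model. It relies on a known closed form: for a linear model, when a feature that is left out is replaced by its mean, the SHAP value of feature j is w_j * (x_j - mean_j). The data, weights, and variable names are all made up for illustration, and this is not how the shap package computes values for tree models.

```python
# Toy data: rows are samples, columns are features (hypothetical values).
X = [
    [1.0, 10.0],
    [2.0, 30.0],
    [3.0, 20.0],
]
weights = [2.0, -0.5]  # hypothetical linear model coefficients
bias = 1.0


def predict(row):
    """Prediction of the hypothetical linear model."""
    return bias + sum(w * x for w, x in zip(weights, row))


# Column means act as the "feature not considered" reference.
n_samples = len(X)
means = [sum(row[j] for row in X) / n_samples for j in range(len(weights))]

# Base value: the prediction when no feature is considered.
base_value = predict(means)

# One SHAP value per (sample, feature): same shape as X, with a sign.
shap_values = [
    [w * (x - m) for w, x, m in zip(weights, row, means)]
    for row in X
]

for row, phi in zip(X, shap_values):
    print(row, phi, base_value + sum(phi), predict(row))
```

Each printed line shows that the signed per-feature values add up, together with the base value, to that sample's prediction.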
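Computing Shapley values
Here I explain how the Shapley value is computed in game theory.
If I find the time (and after studying it again) I would like to write a commentary on the SHAP paper as well, but I think knowing this calculation is enough to avoid any serious problems.
The Shapley value formula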
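Written as a formula, the Shapley value of player i is as follows.
$$\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n - |S| - 1)!}{n!} \bigl( v(S \cup \{i\}) - v(S) \bigr)$$
N: the set of all players
S: a subset of the players
v(S): the score when the players in subset S participate
n: the total number of players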
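To make the symbols concrete, here is a minimal sketch of the subset form of the Shapley value: for one player, sum the marginal gain v(S ∪ {i}) - v(S) over every subset S that excludes the player, weighted by |S|!(n - |S| - 1)!/n!. The function name `shapley_value` and the two-player game are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_value(player, players, v):
    """Shapley value of `player`, where v maps frozensets of players to scores."""
    n = len(players)
    others = [p for p in players if p != player]
    total = 0.0
    for size in range(n):  # |S| runs over 0 .. n-1
        weight = factorial(size) * factorial(n - size - 1) / factorial(n)
        for subset in combinations(others, size):
            s = frozenset(subset)
            # Marginal gain of adding `player` to the subset S.
            total += weight * (v[s | {player}] - v[s])
    return total


# Hypothetical two-player game (numbers are illustrative only).
players = ["p1", "p2"]
v = {
    frozenset(): 0,
    frozenset({"p1"}): 10,
    frozenset({"p2"}): 20,
    frozenset({"p1", "p2"}): 50,
}

phi = {p: shapley_value(p, players, v) for p in players}
print(phi)  # {'p1': 20.0, 'p2': 30.0}
```

The two values sum to 50, the score of the full coalition.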
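Put simply, the calculation takes the increase in score when a player newly joins the game (the contribution), evaluates it for every possible pattern (permutation) of joining order, and averages the results.
I explain this below with an example, which is based on an article by shiibass.
Example: compute the Shapley value (contribution) of player A in a game played by players A, B, and C.
Suppose the scores for each combination of A, B, and C participating or not are as follows.
S(participating players) = score (this S plays the role of v above; it is not the subset S)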
S(0) = 0
S(A) = 20
S(B) = 20
S(C) = 20
S(A,B) = 60
S(A,C) = 110
S(B,C) = 100
S(A,B,C) = 120
The ways in which players can join the game one after another can be enumerated as permutations. Here there are 3! = 6 of them.
Taking the permutation B -> A -> C as an example, before A joins the only player is B, and after A joins the players are A and B, so A's contribution in this permutation is
(score after A joins) - (score before A joins) = S(A,B) - S(B) = 60 - 20 = 40
Computing A's contribution for every pattern in this way gives the following table.
Player order | Score before A joins | Score after A joins | A's contribution |
---|---|---|---|
A -> B -> C | S(0) = 0 | S(A) = 20 | 20 - 0 = 20 |
A -> C -> B | S(0) = 0 | S(A) = 20 | 20 - 0 = 20 |
B -> A -> C | S(B) = 20 | S(A,B) = 60 | 60 - 20 = 40 |
B -> C -> A | S(B, C) = 100 | S(A, B, C) = 120 | 120 - 100 = 20 |
C -> A -> B | S(C) = 20 | S(A, C) = 110 | 110 - 20 = 90 |
C -> B -> A | S(B, C) = 100 | S(A, B, C) = 120 | 120 - 100 = 20 |
Finally, taking the average of A's contributions gives (20 + 20 + 40 + 20 + 90 + 20) / 6 = 35. Computing in the same way, B's contribution is 30 and C's is 55, and their sum, 35 + 30 + 55 = 120, matches S(A,B,C).
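The worked example can be checked with a short script. This is a minimal sketch that averages each player's marginal contribution over all joining orders, using the scores listed above; the function name `shapley_by_permutations` is hypothetical.

```python
from itertools import permutations

# Scores from the example: S(participating players) = score.
score = {
    frozenset(): 0,
    frozenset("A"): 20,
    frozenset("B"): 20,
    frozenset("C"): 20,
    frozenset("AB"): 60,
    frozenset("AC"): 110,
    frozenset("BC"): 100,
    frozenset("ABC"): 120,
}


def shapley_by_permutations(players, score):
    """Average each player's score increase over every joining order."""
    orders = list(permutations(players))
    totals = {p: 0 for p in players}
    for order in orders:
        joined = frozenset()
        for p in order:
            # Score after p joins minus score before p joins.
            totals[p] += score[joined | {p}] - score[joined]
            joined = joined | {p}
    return {p: totals[p] / len(orders) for p in players}


contrib = shapley_by_permutations("ABC", score)
print(contrib)  # {'A': 35.0, 'B': 30.0, 'C': 55.0}
```

The three contributions add up to 120, the score of the full game.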
Here there were only three players (explanatory variables), but in real machine learning models it is common to have hundreds or thousands of explanatory variables. If the amount of computation grows factorially, it quickly becomes impractical, so the TreeSHAP paper appears to use an algorithm that speeds this up. Its complexity is on the order of
O(number of trees × maximum number of leaves in any tree × tree depth^2).
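To get a feel for the difference in scale, the sketch below compares the number of joining orders (n!) with the TreeSHAP-style bound quoted above. The tree counts and sizes are hypothetical, and the bound is an order of magnitude, not an exact operation count.

```python
from math import factorial


def tree_shap_bound(n_trees, max_leaves, max_depth):
    """Order-of-magnitude bound: trees x leaves x depth squared."""
    return n_trees * max_leaves * max_depth ** 2


# Number of player orderings for a few feature counts.
for n_features in (3, 10, 20):
    print(n_features, factorial(n_features))

# Hypothetical model: 1000 trees, at most 32 leaves, depth 5.
bound = tree_shap_bound(n_trees=1000, max_leaves=32, max_depth=5)
print(bound)  # 800000
```

Already at 20 features the number of orderings exceeds 10^18, while the bound for this hypothetical model does not depend on the number of features at all.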
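As a supplement, my understanding is that SHAP builds on the game-theoretic Shapley value with the following additions:
- the predictive model is converted into (approximated by) an additive linear explanation model
- whether each explanatory variable is considered or not is encoded as a 0/1 flag, which becomes the input of the explanation model
- variables that are not considered are filled in using the distribution of the training data
I find it difficult to explain the details concisely, so if I find the time (and after studying it again) I would like to write a separate commentary.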
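The idea of filling in unconsidered variables from the training data can be sketched in a few lines. The example below is a brute-force illustration under stated assumptions, not the algorithm the shap package uses: the model, the background rows, and all function names are hypothetical. The model is deliberately built so that an ad only helps young users, which also shows how the same feature value can receive different SHAP values.

```python
from itertools import combinations
from math import factorial


# Hypothetical model: seeing the ad only matters for young users.
def model(young, saw_ad):
    return 1.0 + 4.0 * young * saw_ad


# Hypothetical background ("training") rows: (young, saw_ad).
background = [(0, 0), (0, 1), (1, 0), (1, 1)]


def value(x, considered):
    """Mean prediction with considered features taken from x, the rest from background rows."""
    total = 0.0
    for row in background:
        mixed = [x[j] if j in considered else row[j] for j in range(len(x))]
        total += model(*mixed)
    return total / len(background)


def shap_values(x):
    """Exact Shapley values of each feature for sample x (brute force)."""
    n = len(x)
    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(x, s | {i}) - value(x, s))
        phis.append(phi)
    return phis


base_value = value((0, 0), set())   # no feature considered
young_viewer = shap_values((1, 1))  # young user who saw the ad
older_viewer = shap_values((0, 1))  # older user who saw the ad
print(base_value, young_viewer, older_viewer)
```

Both users have saw_ad = 1, yet the SHAP value of saw_ad is larger for the young user, and in both cases the base value plus the SHAP values equals the model's prediction.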
Visualization examples (available in the package)
Except for the last visualization, these are examples of applying SHAP to boston, a dataset of house prices. A description of the dataset is available at the linked page.
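SHAP values for a single sample
This figure visualizes how much each explanatory variable moved the prediction for one sample.
First, the labels above the color bar. The base value is the prediction when no variable is taken into account, which is the model's average prediction over the training data (typically close to the mean of the target variable). The model output is the prediction for this sample.
Next, the color bar itself. Pink represents a positive contribution and blue a negative one. For example, in this sample, considering LSTAT (the share of low-income residents) raises the prediction compared with not considering it. Note that the 4.98 shown next to LSTAT in the plot is the value of the feature for this sample; the size of the contribution is shown by the width of its segment.
Finally, adding the base value to the SHAP values of all explanatory variables gives the model output (see footnote 1).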
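For classification models, the footnote on this point says the sum holds for the raw value before the sigmoid rather than for the probability. The sketch below illustrates that with made-up numbers; the base value and SHAP values are hypothetical and not taken from any model.

```python
from math import exp


def sigmoid(z):
    """Map a raw margin (log-odds) to a probability."""
    return 1.0 / (1.0 + exp(-z))


base_value = -1.0            # hypothetical base value on the raw-margin scale
shap_row = [1.5, 0.5, -0.2]  # hypothetical SHAP values for one sample

# Additivity holds on the raw-margin scale.
raw_output = base_value + sum(shap_row)
probability = sigmoid(raw_output)

# Adding the same SHAP values to the base probability does not work.
naive = sigmoid(base_value) + sum(shap_row)
print(raw_output, probability, naive)
```

The naive sum is not even a valid probability here, which is why the values should be read on the raw-margin scale.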
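SHAP values for all samples
The first figure plots the SHAP values of all samples for each explanatory variable. Color indicates how large or small the variable's value is, and the position of each point indicates the size of its SHAP value. For example, the figure shows that samples with a lower LSTAT (the share of low-income residents) tend to have a higher predicted house price.
Where the points pile up vertically, many samples take a SHAP value around that position.
The second figure shows the single-sample figure from the top of this section for all samples at once. The samples can additionally be sorted by the similarity of their SHAP values, which is why the figure appears to fall into several groups.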
Variable importance of each explanatory variable
This has the same format as the usual variable importance that tree-based models output by default.
The variables are sorted in descending order of the mean absolute SHAP value.
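The aggregation itself is simple. Here is a minimal sketch, assuming a SHAP matrix with one row per sample and one column per feature; the numbers are hypothetical and only the feature names are borrowed from the boston dataset.

```python
feature_names = ["LSTAT", "RM", "RAD"]

# Hypothetical SHAP matrix: rows are samples, columns are features.
shap_matrix = [
    [3.0, -1.0, 0.2],
    [-2.0, 4.0, -0.1],
    [-4.0, 1.0, 0.3],
]

n_samples = len(shap_matrix)

# Mean absolute SHAP value per feature (aggregating over samples).
importance = {
    name: sum(abs(row[j]) for row in shap_matrix) / n_samples
    for j, name in enumerate(feature_names)
}

# Features sorted from most to least important.
ranking = sorted(importance, key=importance.get, reverse=True)
print(ranking)  # ['LSTAT', 'RM', 'RAD']
```

Taking the absolute value first matters: LSTAT has both positive and negative SHAP values here, and a plain mean would let them cancel.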
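Relationship between two variables and SHAP values
This plots two variables and the SHAP value for all samples. In the figure, color represents RAD, the horizontal axis RM, and the vertical axis the SHAP value of RM. The SHAP value changes sharply around RM = 7.3. This is a guess, but the trees in the predictive model probably split around RM = 7.3, so the prediction changes a lot depending on which side of the split a sample falls.
Interaction effects between two explanatory variables
This plots the interaction effects between pairs of explanatory variables. Viewing the figure as a matrix, the plots on the diagonal seem to show each variable's main effect, and the off-diagonal plots the interaction effects. Color represents the value of the row variable, and the horizontal position the interaction effect.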
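Things I noticed or kept in mind when doing factor analysis with an explanation model
For this chapter in particular, I would appreciate feedback on anything that looks questionable.
I also realized that much of the content is about explanation models in general rather than SHAP specifically, so I changed the title. Please bear in mind that the discussion below is nevertheless written in terms of SHAP values.
- A highly accurate predictive model is desirable
- SHAP values can vary, so it is better to take steps to reduce their variance
- SHAP explains the output of a predictive model; it does not take causality into account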
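A highly accurate predictive model is desirable
SHAP values only measure how much each variable contributed to the output of the predictive model. So if the predictive model is inaccurate, the SHAP values produced by the explanation model are also unreliable as a description of the true model. It is therefore desirable for the predictive model to reach a reasonable level of accuracy.
SHAP values can vary, so it is better to take steps to reduce their variance
For reliability, I think it is better to reduce the variance of the output as much as possible. This is a problem of tree-based models in particular: the structure of the model changes considerably depending on the dataset it is given. When I ran an experiment that computed SHAP values over several rounds of CV (cross validation) with different seeds, the SHAP values varied quite a bit even for the same sample. So I think it is better to apply some variance-reducing step, such as running CV several times with different seeds and averaging the results.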
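The averaging step can be sketched as follows. This shows only the aggregation, assuming each matrix holds SHAP values for the same samples and features from a model trained with a different seed; the numbers are hypothetical and no model is trained here.

```python
from statistics import mean, pstdev

# Hypothetical SHAP matrices from three runs with different seeds.
# Each matrix: rows are the same samples, columns the same features.
runs = [
    [[1.0, -2.0], [0.5, 0.0]],
    [[1.4, -1.6], [0.3, 0.2]],
    [[0.6, -2.4], [0.7, -0.2]],
]

n_samples = len(runs[0])
n_features = len(runs[0][0])

# Element-wise mean across runs: the variance-reduced SHAP matrix.
averaged = [
    [mean([run[i][j] for run in runs]) for j in range(n_features)]
    for i in range(n_samples)
]

# Element-wise spread across runs: how unstable each value is.
spread = [
    [pstdev([run[i][j] for run in runs]) for j in range(n_features)]
    for i in range(n_samples)
]

print(averaged)
print(spread)
```

Keeping the spread alongside the average makes it easy to see which samples or features have SHAP values that depend heavily on the seed.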
SHAP explains the output of a predictive model; it does not take causality into account
To repeat, SHAP values are only a way of interpreting the output of a predictive model. Even if some tendency appears between an explanatory variable and the prediction, that does not by itself make the variable a cause. The following is a quotation.
Let us take the example of coral and its predator (this story is based on the content of this togetter, but in this article it is treated purely as a hypothetical example for explanation, ignoring the details).
Suppose a survey for coral conservation has found a correlation between the survival rate of coral and the population of a coral predator, O. Suppose also that field observation has shown that predator O really does prey on coral.
In that case, it may be natural to think of a causal relationship of the form "increase in predator O -> decrease in coral survival rate".
If such a causal relationship exists, it seems that reducing "predator O" would increase the "coral survival rate".
Well now, is that so?
Suppose a more detailed survey reveals that "predator O only eats coral that is already dying". Then predator O is actually playing the role of a scavenger in the ecosystem.
If so, the reverse causality, "coral survival rate falls -> predator O, a scavenger, increases", may be the true one. If causality of this form is true, the conservation measure of "increasing the coral survival rate by reducing predator O" would have no effect at all. (Rather, removing the scavenger might even hinder the healthy turnover of the coral.)
And which of these "directions of causality" is closer to the truth can basically only be clarified through observation and/or intervention in the field.
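As this shows, cases where you assume a causal relationship and the causality may actually run the other way also occur in business settings. There are various other relationships between variables that can be mistaken for causation, such as a confounder that is a common cause of both a variable and the target. So rather than taking the results at face value, a person has to judge what they are actually looking at by inferring the data generating process and sorting out the relationships behind the variables (see footnote 2). The relationships between variables surrounding causality are explained in detail on the page the quotation comes from.
As an aside, my manager introduced me to a book on systems thinking as a framework for organizing complex causal relationships (see footnote 3). I have not been able to use it well yet, but it is very easy to follow and I recommend it.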
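Bonus: thoughts on using explanation models for factor analysis
Below, for simplicity, I write TS (Train Score) for the evaluation metric on the training data and VS (Validation Score) for the evaluation metric on the validation data.
This is only my own hypothesis, but when using an explanation model to analyze how explanatory variables affect the target, I suspect it may be better not to train the predictive model all the way to the point where VS is maximized (early stopping). In general, when a model is trained until VS peaks, TS and VS have diverged, which suggests the model has picked up a fair amount of noise.
If you plot the metrics periodically during training, TS and VS improve together at first, then TS gradually pulls ahead, and eventually TS keeps improving while VS starts to deteriorate. In my head, I pictured this process as in the figure below.
[Figure: the author's sketch of TS and VS over the course of training]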
My concern is that training until VS is maximized may produce a model that also contains a corresponding amount of noise from the training data. So when the purpose of the explanation model is to analyze factors, I suspect that a model trained to maximize VS is not necessarily the appropriate one.
The argument above comes down to the question of which is closer to the true model: a model trained until VS has peaked, or a model from partway through training where the gap between TS and VS is still small.
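To make the two candidates concrete, the sketch below picks both stopping points from a pair of score curves. The scores and the gap tolerance are hypothetical, and the gap rule is only one possible way to express "the gap between TS and VS is still small"; it is not something this post has validated.

```python
# Hypothetical scores per training checkpoint (higher is better).
train_scores = [0.60, 0.70, 0.78, 0.84, 0.89, 0.93, 0.96]
valid_scores = [0.59, 0.68, 0.75, 0.79, 0.81, 0.80, 0.78]

# Candidate 1: the usual early-stopping choice, where VS peaks.
best_vs_iter = max(range(len(valid_scores)), key=valid_scores.__getitem__)

# Candidate 2: the last checkpoint whose TS - VS gap is within a tolerance.
tolerance = 0.04  # assumed threshold, to be tuned per problem
small_gap_iter = max(
    i
    for i, (ts, vs) in enumerate(zip(train_scores, valid_scores))
    if ts - vs <= tolerance
)

print(best_vs_iter, small_gap_iter)  # 4 2
```

Comparing the SHAP values of the models at these two checkpoints would be one way to start examining the hypothesis.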
I would like to verify this at some point. (It is hard to say what would count as verification, though. It probably depends on the data and the model, but possible approaches include examining how the variables used for splits and the SHAP values change in the early, middle, and late stages of training, or, given suitable simulated data, comparing the SHAP values with the contributions actually made to the target. If you know of a good method or a related paper, I would be grateful if you let me know.)
That said, the paper on DART (one of the modes of gradient boosting) points out, as a problem, that trees built late in training have less influence on the overall model than trees built early. So the trees built late in training may not have much influence on the SHAP values either.
Having written this far I have run out of energy, so actually running SHAP is left for a later post.
References
The main papers and the package related to SHAP are listed below. I have only managed to read the general SHAP paper and the TreeSHAP paper.
- 1: For classification models, the SHAP values of the explanatory variables apparently do not sum to the output on the probability scale; they sum to the tree leaf values (for lightgbm, the raw value before the sigmoid transformation). shap.readthedocs.io
- 2: As an aside, reasoning about causality is very hard. I once worked a little with causal inference and was made painfully aware of my lack of theoretical and domain knowledge.
- 3: This blog is not for profit, so I receive no money if you follow the links. I plan to keep introducing books I found good, so please read them if you are interested.