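Basics of Variational Approximation (1)
In the previous article I deliberately steered clear of variational approximation, but since I plan to use it often on this blog from now on, here is a first introduction.
Variational approximation is one way of obtaining a probability distribution approximately*1. In general, obtaining a probability distribution requires normalizing it (making it integrate to 1), but for complicated distributions (for example, the posterior of a latent variable model) that integral simply cannot be done analytically. Variational approximation approximates such a distribution, one too complicated to normalize, by factorizing it into a product of simpler distributions (that is, by assuming independence). Assuming the factorization simplifies the dependencies among the variables, and it lets us do for inference over probability distributions something similar to what gradient methods based on partial derivatives do in numerical optimization.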
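Written out as a formula, the factorization assumption might look like the following. The symbol $q(z)$ for the approximating distribution, the split of the unknown variable $z$ into $M$ groups $z_1, \dots, z_M$, and the factors $q_i$ are notation introduced here for illustration only; this article does not fix a particular partition.
\[ q(z) = \prod_{i=1}^{M} q_i(z_i) \]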
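Once you can use it, you can build a probabilistic model tailored to all sorts of data science problems*2 and estimate the distributions yourself. In practice it has been applied to nearly every kind of machine learning problem to date: images, audio, financial data, bioinformatics, natural language, various kinds of sensor data, and so on.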
[Prerequisites]
It is worth briefly reviewing the following.
- The sum rule and product rule of probability, and Bayes' theorem
- KL divergence
Suppose we have the following probabilistic model.
\[ p(x,z) \]
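$x$ is the observed data and $z$ is the unknown variable we want to estimate (missing values, parameters, future predictions, and so on); take both to be multidimensional vectors. Here we assume continuous-valued variables, but exactly the same argument holds for discrete ones.
The goal of machine learning in this setting is to estimate the posterior $p(z|x)$ of $z$ using Bayes' theorem, as follows.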
\[ p(z|x) = \frac{p(x|z)p(z)}{p(x)} = \frac{p(x|z)p(z)}{\int p(x,z) dz} \tag{1} \]
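For example, in an ordinary model where $x$ and $z$ both follow Gaussian distributions, the integral in the denominator of Eq. (1) can be done easily with standard formulas, so the posterior can be worked out by hand in one go. This is described as being solvable analytically, or in closed form.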
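To make the closed-form case concrete, here is a minimal sketch for one specific Gaussian model, chosen purely as an assumption for illustration: a scalar $z$ with prior $\mathcal{N}(0, 1)$ and a single observation $x$ with likelihood $\mathcal{N}(z, 0.5^2)$. The function names, the numbers, and the grid are all hypothetical. The sketch compares the textbook closed-form posterior with a brute-force evaluation of the integral in Eq. (1) on a grid.

```python
import numpy as np

def gaussian_pdf(v, mean, std):
    """Density of a univariate Gaussian N(mean, std^2) evaluated at v."""
    return np.exp(-0.5 * ((v - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def posterior_closed_form(x, prior_mean, prior_std, noise_std):
    """Posterior of z for z ~ N(prior_mean, prior_std^2), x | z ~ N(z, noise_std^2)."""
    precision = 1.0 / prior_std**2 + 1.0 / noise_std**2
    mean = (prior_mean / prior_std**2 + x / noise_std**2) / precision
    return mean, np.sqrt(1.0 / precision)

def posterior_on_grid(x, prior_mean, prior_std, noise_std, grid):
    """Evaluate Eq. (1) numerically: normalize p(x|z) p(z) by a Riemann sum over z."""
    joint = gaussian_pdf(x, grid, noise_std) * gaussian_pdf(grid, prior_mean, prior_std)
    dz = grid[1] - grid[0]
    evidence = np.sum(joint) * dz          # approximates the integral of p(x, z) over z
    posterior = joint / evidence
    mean = np.sum(grid * posterior) * dz
    var = np.sum((grid - mean) ** 2 * posterior) * dz
    return mean, np.sqrt(var), evidence

x_obs = 1.5                                # hypothetical observation
grid = np.linspace(-10.0, 10.0, 20001)
mean_cf, std_cf = posterior_closed_form(x_obs, 0.0, 1.0, 0.5)
mean_num, std_num, evidence = posterior_on_grid(x_obs, 0.0, 1.0, 0.5, grid)
print("closed form:", mean_cf, std_cf)
print("grid       :", mean_num, std_num)
```

The grid version only works here because $z$ is one-dimensional; for the multidimensional $z$ considered in this article, that kind of brute-force integration quickly becomes infeasible.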
This time, however, we assume that this is simply not possible: the integral in Eq. (1) is so complicated that no analytical solution can be obtained.
This is where approximate inference methods such as variational approximation come in. We approximate the posterior with a different function, as follows.
\[ p(z|x) \approx q(z) \]
Note that we have not assumed any specific functional form (a Gaussian, say) for $q(z)$.
What we want now is to make $q(z)$ resemble $p(z|x)$ as closely as possible.
One measure of how dissimilar two probability distributions are is the KL divergence. For example, temporarily using a random variable $w$ to avoid confusion, the KL divergence between the distributions $p(w)$ and $q(w)$ (as seen from $q(w)$*3) is defined as
\[ KL(q(w)||p(w)) = - \int q(w) \ln \frac{p(w)}{q(w)} dw \]
This quantity is never negative, and when $q(w)=p(w)$ holds it equals 0.
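As a quick numerical check of the definition above, here is a minimal sketch that approximates the integral by a Riemann sum on a grid. The two Gaussians, the grid range, and the helper names `gaussian_pdf` and `kl_divergence` are assumptions made for illustration; none of them come from the article.

```python
import numpy as np

def gaussian_pdf(w, mean, std):
    """Density of a univariate Gaussian N(mean, std^2) evaluated at w."""
    return np.exp(-0.5 * ((w - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def kl_divergence(q, p, dw):
    """Approximate KL(q||p) = -integral of q ln(p/q) dw from densities on a grid."""
    mask = q > 0                           # points where q is 0 contribute nothing
    return -np.sum(q[mask] * np.log(p[mask] / q[mask])) * dw

grid = np.linspace(-20.0, 20.0, 40001)
dw = grid[1] - grid[0]
q = gaussian_pdf(grid, 0.0, 1.0)           # hypothetical q(w)
p = gaussian_pdf(grid, 1.0, 2.0)           # hypothetical p(w)

kl_qp = kl_divergence(q, p, dw)            # divergence as seen from q
kl_pq = kl_divergence(p, q, dw)            # divergence as seen from p
kl_qq = kl_divergence(q, q, dw)            # identical distributions
print("KL(q||p) =", kl_qp)
print("KL(p||q) =", kl_pq)
print("KL(q||q) =", kl_qq)
```

Swapping the arguments gives a different value, which is the asymmetry mentioned in footnote *3, and passing the same distribution twice gives 0.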
Here we want the two distributions $p(z|x)$ and $q(z)$ to be as similar as possible, so we pursue that goal by minimizing the KL divergence between them. That is, the aim is to find the $q(z)$ that minimizes
\[ KL(q(z)||p(z|x)) = - \int q(z) \ln \frac{p(z|x)}{q(z)} dz \]
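To get a feel for what "finding the $q(z)$ that minimizes the KL divergence" means, here is a toy sketch under strong assumptions that the article itself does not make. It takes $z$ to be one-dimensional, so the target can still be normalized by brute force on a grid. It restricts $q(z)$ to a Gaussian family. It searches over the mean and standard deviation with a plain grid search. The two-bump target density and all names are hypothetical.

```python
import numpy as np

grid = np.linspace(-15.0, 15.0, 3001)
dz = grid[1] - grid[0]

# Toy target known only up to a constant: an unnormalized two-bump density (in logs).
log_p_unnorm = np.logaddexp(-0.5 * (grid + 1.5) ** 2,
                            np.log(0.6) - 0.5 * ((grid - 2.0) / 0.7) ** 2)
# Brute-force normalization; feasible only because z is one-dimensional here.
log_p = log_p_unnorm - np.log(np.sum(np.exp(log_p_unnorm)) * dz)

def log_gaussian(z, mean, std):
    """Log density of a univariate Gaussian N(mean, std^2)."""
    return -0.5 * ((z - mean) / std) ** 2 - np.log(std * np.sqrt(2.0 * np.pi))

def kl_q_p(mean, std):
    """Riemann-sum approximation of KL(q||p) for a Gaussian q with the given parameters."""
    log_q = log_gaussian(grid, mean, std)
    return np.sum(np.exp(log_q) * (log_q - log_p)) * dz

# Grid search over the (assumed) Gaussian family of candidates for q.
means = np.linspace(-4.0, 4.0, 81)
stds = np.linspace(0.2, 3.0, 57)
kl_table = np.array([[kl_q_p(m, s) for s in stds] for m in means])
i, j = np.unravel_index(np.argmin(kl_table), kl_table.shape)
best_mean, best_std, best_kl = means[i], stds[j], kl_table[i, j]
print("best q: mean =", best_mean, "std =", best_std, "KL =", best_kl)
```

This only works because the target could be normalized numerically in one dimension, which is not possible for the intractable, high-dimensional posterior assumed in this article.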
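But a question remains here.
Although the integral cannot be done, $p(z|x)$ is certainly a probability distribution with some definite shape. Yet, by our initial assumption, it cannot be obtained directly by hand calculation. So how on earth do we shrink the distance between a distribution we cannot compute directly and the approximating distribution $q(z)$?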
This has gotten a bit long, so I will stop here for now.
[Continued / Related]
Basics of Variational Approximation (2) - 作って遊ぶ機械学習。
*1: It is also called variational inference, or simply the variational method. When emphasizing that the model is Bayesian, it is sometimes called variational Bayes.
*2: For how to build probabilistic models, see the article on graphical models.
http://machine-learning.hatenablog.com/entry/2016/02/10/184755
*3: In general, $KL(q||p)$ and $KL(p||q)$ do not coincide.