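I learned a lot from an article on this topic, but its formulas were written in plain text and hard to read. I also got stuck on the derivation even for the simpler, non-multivariate 1-dimensional normal distribution, so I looked into it and wrote up the derivation here.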
注æ
æ°å¼ã¯MathJax(JavaScriptã®ã©ã¤ãã©ãª)ã使ã£ã¦è¡¨ç¤ºãã¦ãã¾ã
SVGãæç»ã§ããªãã¨è¡¨ç¤ºãããªãã®ã§ãæè¿ã®ãã©ã¦ã¶ã§é²è¦§ãã¦ãã ãã
KL divergence (Kullback–Leibler divergence)
A measure of how different two probability distributions are.
In machine learning, parameter optimization and similar problems often turn out to be equivalent to minimizing a KL divergence.
It comes up frequently in books and papers.
Formula
Consider two probability distributions \(P, Q\).
When both are continuous distributions with densities \(p(x)\) and \(q(x)\), the KL divergence is
$$D_{\mathrm{KL}}(P\|Q) = \int_{-\infty}^{\infty} p(x) \log \frac{p(x)}{q(x)} \; dx$$
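Properties
Note that it is always greater than or equal to 0, that it equals 0 only when the two distributions are identical, and that it is not symmetric: in general \(D_{\mathrm{KL}}(P\|Q) \ne D_{\mathrm{KL}}(Q\|P)\).
For other properties in more detail, see the following article.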
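The non-negativity and the asymmetry can be seen numerically. The sketch below is a minimal illustration: it approximates the defining integral with the midpoint rule on the finite interval \([-20, 20]\), which assumes the tails outside that range are negligible for the example distributions \(P = N(0, 1)\) and \(Q = N(1, 4)\) (mean, variance). `normal_pdf` and `kl_numeric` are hypothetical helper names.

```python
import math


def normal_pdf(x, mu, var):
    """Density of N(mu, var), where var is the variance sigma^2."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def kl_numeric(p, q, lo, hi, n=100_000):
    """Midpoint-rule approximation of D_KL(P||Q) = ∫ p(x) log(p(x)/q(x)) dx on [lo, hi]."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        px = p(x)
        if px > 0.0:  # where p(x) underflows to 0 the integrand is 0
            total += px * math.log(px / q(x))
    return total * h


def p(x):
    return normal_pdf(x, 0.0, 1.0)  # P = N(0, 1), example choice


def q(x):
    return normal_pdf(x, 1.0, 4.0)  # Q = N(1, 4), example choice


d_pq = kl_numeric(p, q, -20.0, 20.0)
d_qp = kl_numeric(q, p, -20.0, 20.0)
d_pp = kl_numeric(p, p, -20.0, 20.0)
print(f"D(P||Q) = {d_pq:.4f}, D(Q||P) = {d_qp:.4f}, D(P||P) = {d_pp:.4f}")
```

For these example distributions the two directions come out at roughly 0.443 and 1.307, so swapping the arguments changes the value, while \(D_{\mathrm{KL}}(P\|P)\) is exactly 0.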
KL divergence between 1-dimensional normal distributions
A 1-dimensional normal distribution with mean \(\mu\) and variance \(\sigma^2\) is given by
$$N(\mu, \sigma^2)=\frac{1}{\sqrt{2\pi\sigma^{2}}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) $$
Consider the KL divergence between two normal distributions \(p(x)=N(\mu_1, \sigma_1^2)\) and \(q(x)=N(\mu_2, \sigma_2^2)\):
$$D_{\mathrm{KL}}(P\|Q) = \int_{-\infty}^{\infty} p(x) \log \frac{p(x)}{q(x)} \; dx$$
First, expand the \(\log\) term.
$$
\begin{eqnarray*}
\log \frac{p(x)}{q(x)} &=& \log p(x) - \log q(x) \\
&=& \log \left[ \frac{1}{\sqrt{2\pi\sigma^{2}_1}} \exp\!\left(-\frac{(x-\mu_1)^2}{2\sigma^2_1}\right) \right] - \log \left[ \frac{1}{\sqrt{2\pi\sigma^{2}_2}} \exp\!\left(-\frac{(x-\mu_2)^2}{2\sigma^2_2}\right) \right] \\
&=& \left( -\frac{1}{2} \log 2\pi\sigma^{2}_1 - \frac{(x-\mu_1)^2}{2\sigma^2_1} \right) - \left( -\frac{1}{2} \log 2\pi\sigma^{2}_2 - \frac{(x-\mu_2)^2}{2\sigma^2_2} \right) \\
&=& \log \frac{\sigma_2}{\sigma_1} + \frac{(x-\mu_2)^2}{2\sigma^2_2} - \frac{(x-\mu_1)^2}{2\sigma^2_1}
\end{eqnarray*}
$$
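When expanding a log like this, a quick way to catch sign or factor mistakes is to compare the first and last lines at a few points. A minimal sketch with arbitrarily chosen example parameters (`log_normal_pdf` and `log_ratio_expanded` are hypothetical names; `sigma1`, `sigma2` are standard deviations, not variances):

```python
import math


def log_normal_pdf(x, mu, sigma):
    """log of the N(mu, sigma^2) density; sigma is the standard deviation."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2.0 * sigma ** 2)


def log_ratio_expanded(x, mu1, sigma1, mu2, sigma2):
    """Last line of the expansion: log(s2/s1) + (x-mu2)^2/(2 s2^2) - (x-mu1)^2/(2 s1^2)."""
    return (math.log(sigma2 / sigma1)
            + (x - mu2) ** 2 / (2.0 * sigma2 ** 2)
            - (x - mu1) ** 2 / (2.0 * sigma1 ** 2))


mu1, sigma1, mu2, sigma2 = 0.5, 1.2, -1.0, 0.7  # example values (assumed)
xs = [-3.0, -0.4, 0.0, 1.7, 4.2]
max_diff = max(
    abs((log_normal_pdf(x, mu1, sigma1) - log_normal_pdf(x, mu2, sigma2))
        - log_ratio_expanded(x, mu1, sigma1, mu2, sigma2))
    for x in xs
)
print(f"largest difference: {max_diff:.2e}")  # only floating-point rounding should remain
```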
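Writing the expectation with respect to the distribution \(p(x)\) as \(\mathrm{E}_p\left[\cdot\right]\), the KL divergence can be expressed with expectations as follows (the second line uses linearity of expectation):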
$$
\begin{eqnarray*}
\int_{-\infty}^{\infty} p(x) \log \frac{p(x)}{q(x)} \; dx
&=& \mathrm{E}_p\left[ \log \frac{p(x)}{q(x)} \right] \\
&=& \mathrm{E}_p\left[\log \frac{\sigma_2}{\sigma_1}\right] + \mathrm{E}_p\left[\frac{(x-\mu_2)^2}{2\sigma^2_2}\right] - \mathrm{E}_p\left[\frac{(x-\mu_1)^2}{2\sigma^2_1}\right]
\end{eqnarray*}
$$
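The expectation form also suggests a way to estimate the KL divergence without integrating: sample \(x\) from \(p\) and average \(\log p(x) - \log q(x)\). A minimal Monte Carlo sketch, assuming the example distributions \(N(0, 1)\) and \(N(1, 4)\) (mean, variance), 100,000 samples and a fixed seed (`kl_monte_carlo` is a hypothetical name; `var1`, `var2` are variances):

```python
import math
import random


def log_normal_pdf(x, mu, var):
    """log N(x | mu, var), with var the variance."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mu) ** 2 / (2.0 * var)


def kl_monte_carlo(mu1, var1, mu2, var2, n=100_000, seed=0):
    """Estimate E_p[log p(x) - log q(x)] with x drawn from p = N(mu1, var1)."""
    rng = random.Random(seed)
    sd1 = math.sqrt(var1)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(mu1, sd1)  # one sample from p
        total += log_normal_pdf(x, mu1, var1) - log_normal_pdf(x, mu2, var2)
    return total / n


estimate = kl_monte_carlo(0.0, 1.0, 1.0, 4.0)  # example parameters (assumed)
print(f"Monte Carlo estimate of D(P||Q): {estimate:.3f}")
```

The error shrinks roughly like \(1/\sqrt{n}\); for these parameters the estimate should land near 0.443, the value given by the closed form at the end of this section.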
The first expectation contains no term that depends on \(x\), so it stays as it is (because \(\int p(x)\,dx = 1\)):
$$\mathrm{E}_p\left[\log \frac{\sigma_2}{\sigma_1}\right]=\log \frac{\sigma_2}{\sigma_1}$$
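For the second expectation, the definition of variance, \(\sigma_1^2 = \mathrm{E}_p\left[x^2\right] - \mu_1^2\), gives the relation \(\mathrm{E}_p\left[x^2\right] = \sigma^2_1 + \mu^2_1\). Using this together with \(\mathrm{E}_p\left[x\right] = \mu_1\), it can be computed as follows: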
$$
\begin{eqnarray*}
\mathrm{E}_p\left[\frac{(x-\mu_2)^2}{2\sigma^2_2}\right]&=&\frac{\mathrm{E}_p\left[x^2\right] + \mathrm{E}_p\left[-2x\mu_2\right] + \mathrm{E}_p\left[\mu_2^2\right]}{2\sigma^2_2} \\
&=& \frac{(\sigma^2_1 + \mu_1^2 ) - 2\mu_1\mu_2 + \mu_2^2}{2\sigma^2_2} \\
&=& \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2\sigma^2_2}
\end{eqnarray*}
$$
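Both the relation \(\mathrm{E}_p[x^2] = \sigma_1^2 + \mu_1^2\) and the result for the second expectation can be confirmed by numerical integration. The sketch assumes example parameters and cuts the integral off at 12 standard deviations, which is an approximation; `expect_under_normal` is a hypothetical helper.

```python
import math


def normal_pdf(x, mu, var):
    """Density of N(mu, var), with var the variance."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def expect_under_normal(f, mu, var, n=100_000, width=12.0):
    """Approximate E[f(x)] for x ~ N(mu, var) with the midpoint rule on mu ± width*sd."""
    sd = math.sqrt(var)
    lo, hi = mu - width * sd, mu + width * sd
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        total += f(x) * normal_pdf(x, mu, var)
    return total * h


mu1, var1, mu2, var2 = 0.5, 1.44, -1.0, 0.49  # example values (assumed)

second_moment = expect_under_normal(lambda x: x * x, mu1, var1)
second_term = expect_under_normal(lambda x: (x - mu2) ** 2 / (2.0 * var2), mu1, var1)
closed_form = (var1 + (mu1 - mu2) ** 2) / (2.0 * var2)

print(second_moment, var1 + mu1 ** 2)  # E_p[x^2] vs sigma_1^2 + mu_1^2
print(second_term, closed_form)        # numerical vs derived second expectation
```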
The third expectation can be computed in the same way:
$$
\begin{eqnarray*}
\mathrm{E}_p\left[\frac{(x-\mu_1)^2}{2\sigma^2_1}\right] &=& \frac{\mathrm{E}_p\left[x^2\right] - \mathrm{E}_p\left[2x \mu_1\right] + \mathrm{E}_p\left[\mu_1^2\right]}{2\sigma^2_1} \\
&=& \frac{(\sigma^2_1 + \mu^2_1) - 2\mu_1^2 + \mu_1^2}{2\sigma^2_1} \\
&=&\frac{1}{2}
\end{eqnarray*}
$$
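The third expectation is \(1/2\) regardless of \(\mu_1\) and \(\sigma_1\), because \((x-\mu_1)^2/\sigma_1^2\) is the square of the standardized variable, whose mean is 1. The same midpoint-rule idea confirms this for a few example parameter pairs (the pairs and the truncation at 12 standard deviations are assumptions of the sketch; `third_term` is a hypothetical name):

```python
import math


def normal_pdf(x, mu, var):
    """Density of N(mu, var), with var the variance."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def third_term(mu1, var1, n=100_000, width=12.0):
    """Midpoint-rule approximation of E_p[(x - mu1)^2 / (2 var1)] for p = N(mu1, var1)."""
    sd = math.sqrt(var1)
    lo = mu1 - width * sd
    h = 2.0 * width * sd / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        total += (x - mu1) ** 2 / (2.0 * var1) * normal_pdf(x, mu1, var1)
    return total * h


# Example (mean, variance) pairs, chosen arbitrarily
results = {(mu, var): third_term(mu, var) for mu, var in [(0.0, 1.0), (3.0, 0.25), (-7.5, 9.0)]}
for params, value in results.items():
    print(params, round(value, 8))
```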
Putting these together:
$$
\begin{eqnarray*}
D_{\mathrm{KL}}(P\|Q) &=& \log \frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2\sigma^2_2} - \frac{1}{2}
\end{eqnarray*}
$$
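The final result can be written as a short function. A minimal sketch that takes variances as input (so \(\log(\sigma_2/\sigma_1)\) becomes \(\tfrac{1}{2}\log(\sigma_2^2/\sigma_1^2)\)) and cross-checks the closed form against numerical integration of the definition; the names and the example cases are assumptions for illustration:

```python
import math


def kl_normal_1d(mu1, var1, mu2, var2):
    """D_KL(N(mu1, var1) || N(mu2, var2)) from the closed form above (var = variance)."""
    return 0.5 * math.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / (2.0 * var2) - 0.5


def kl_by_quadrature(mu1, var1, mu2, var2, n=100_000, width=12.0):
    """Midpoint-rule approximation of ∫ p log(p/q) dx over mu1 ± width*sd1."""
    def log_pdf(x, mu, var):
        return -0.5 * math.log(2.0 * math.pi * var) - (x - mu) ** 2 / (2.0 * var)

    sd1 = math.sqrt(var1)
    lo = mu1 - width * sd1
    h = 2.0 * width * sd1 / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        lp = log_pdf(x, mu1, var1)
        total += math.exp(lp) * (lp - log_pdf(x, mu2, var2))
    return total * h


# (mu1, var1, mu2, var2) example cases, chosen arbitrarily
cases = [(0.0, 1.0, 1.0, 4.0), (1.0, 4.0, 0.0, 1.0), (2.0, 0.5, 2.0, 0.5)]
for c in cases:
    print(c, round(kl_normal_1d(*c), 6), round(kl_by_quadrature(*c), 6))
```

Working with log-densities inside the quadrature avoids dividing two tiny numbers far out in the tails.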
KL divergence between multivariate normal distributions
A \(d\)-dimensional multivariate normal distribution with mean vector \(\vec \mu\) and covariance matrix \(\Sigma\) is given by
$$N({\vec \mu}, \Sigma)=\frac{1}{\sqrt{(2\pi)^d|\Sigma|}} \exp\!\left(-\frac{1}{2}({\vec x}-{\vec \mu})^\mathrm{T} \Sigma^{-1} ({\vec x}-{\vec \mu}) \right) $$
Consider the KL divergence between two normal distributions \(p({\vec x})=N(\vec\mu_1, \Sigma_1)\) and \(q({\vec x})=N(\vec \mu_2, \Sigma_2)\):
$$D_{\mathrm{KL}}(P\|Q) = \int_{\mathbb{R}^d} p({\vec x}) \log \frac{p({\vec x})}{q({\vec x})} \; d{\vec x}$$
First, expand the \(\log\) term.
$$
\begin{eqnarray*}
\log \frac{p({\vec x})}{q({\vec x})} &=& \log p({\vec x}) - \log q({\vec x}) \\
&=& \log \left[ \frac{1}{\sqrt{(2\pi)^d|\Sigma_1|}} \exp\!\left(-\frac{1}{2}({\vec x}-{\vec \mu_1})^\mathrm{T} \Sigma_1^{-1} ({\vec x} -{\vec \mu_1}) \right) \right] \\
&&- \log \left[ \frac{1}{\sqrt{(2\pi)^d|\Sigma_2|}} \exp\!\left(-\frac{1}{2}({\vec x}-{\vec \mu_2})^\mathrm{T} \Sigma_2^{-1} ({\vec x}-{\vec \mu_2}) \right) \right] \\
&=& \left( -\frac{d}{2} \log 2\pi - \frac{1}{2} \log |\Sigma_1| - \frac{1}{2}({\vec x}-{\vec \mu_1})^\mathrm{T} \Sigma_1^{-1} ({\vec x} -{\vec \mu_1}) \right)\\
&& - \left( -\frac{d}{2} \log 2\pi - \frac{1}{2} \log |\Sigma_2| - \frac{1}{2}({\vec x}-{\vec \mu_2})^\mathrm{T} \Sigma_2^{-1} ({\vec x} -{\vec \mu_2}) \right)\\
&=& \frac{1}{2}\log \frac{|\Sigma_2|}{|\Sigma_1|} + \frac{1}{2}({\vec x}-{\vec \mu_2})^\mathrm{T} \Sigma_2^{-1} ({\vec x} -{\vec \mu_2}) - \frac{1}{2}({\vec x}-{\vec \mu_1})^\mathrm{T} \Sigma_1^{-1} ({\vec x} -{\vec \mu_1})
\end{eqnarray*}
$$
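As in the 1-dimensional case, the expansion can be spot-checked at a few points. This sketch uses plain Python without NumPy (an arbitrary choice to keep it dependency-free) and a Cholesky factorization \(\Sigma = LL^\mathrm{T}\), from which \(\log|\Sigma| = 2\sum_i \log L_{ii}\) and \(({\vec x}-{\vec \mu})^\mathrm{T}\Sigma^{-1}({\vec x}-{\vec \mu}) = |L^{-1}({\vec x}-{\vec \mu})|^2\). The helper names and the example \(\vec\mu_i, \Sigma_i\) are assumptions.

```python
import math


def cholesky(a):
    """Lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def quad_inv(L, v):
    """v^T Sigma^{-1} v computed as |L^{-1} v|^2 via forward substitution."""
    y = []
    for i in range(len(v)):
        y.append((v[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return sum(t * t for t in y)


def logdet(L):
    """log|Sigma| from its Cholesky factor."""
    return 2.0 * sum(math.log(L[i][i]) for i in range(len(L)))


def log_mvn_pdf(x, mu, L):
    """log N(x | mu, Sigma) with Sigma = L L^T."""
    d = len(x)
    diff = [xi - mi for xi, mi in zip(x, mu)]
    return -0.5 * d * math.log(2.0 * math.pi) - 0.5 * logdet(L) - 0.5 * quad_inv(L, diff)


# Example parameters (assumed)
mu1, S1 = [0.5, -1.0], [[1.0, 0.3], [0.3, 0.5]]
mu2, S2 = [0.0, 0.5], [[2.0, -0.4], [-0.4, 1.0]]
L1, L2 = cholesky(S1), cholesky(S2)

max_diff = 0.0
for x in ([0.0, 0.0], [1.5, -2.0], [-3.0, 2.5]):
    direct = log_mvn_pdf(x, mu1, L1) - log_mvn_pdf(x, mu2, L2)
    d1 = [xi - mi for xi, mi in zip(x, mu1)]
    d2 = [xi - mi for xi, mi in zip(x, mu2)]
    # last line of the expansion: the (d/2) log 2*pi terms have cancelled
    expanded = 0.5 * (logdet(L2) - logdet(L1)) + 0.5 * quad_inv(L2, d2) - 0.5 * quad_inv(L1, d1)
    max_diff = max(max_diff, abs(direct - expanded))
print(f"largest difference: {max_diff:.2e}")
```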
As in the 1-dimensional case, writing the KL divergence in terms of expectations gives
$$
\begin{eqnarray*}
D_{\mathrm{KL}}(P\|Q)
&=& \mathrm{E}_p\left[\frac{1}{2}\log \frac{|\Sigma_2|}{|\Sigma_1|}\right]\\
&& + \mathrm{E}_p\left[\frac{1}{2}({\vec x}-{\vec \mu_2})^\mathrm{T} \Sigma_2^{-1} ({\vec x} -{\vec \mu_2})\right]\\
&& - \mathrm{E}_p\left[\frac{1}{2}({\vec x}-{\vec \mu_1})^\mathrm{T} \Sigma_1^{-1} ({\vec x} -{\vec \mu_1})\right]
\end{eqnarray*}
$$
第1é
ã¯å®æ°ãªã®ã§
$$\mathrm{E}_p\left[\frac{1}{2}\log \frac{|\Sigma_2|}{|\Sigma_1|}\right] = \frac{1}{2}\log \frac{|\Sigma_2|}{|\Sigma_1|}$$
第2é
以éã¯è¡åã®äºæ¬¡å½¢å¼ã®æ§è³ª\(({\vec x}-{\vec \mu})^\mathrm{T} \Sigma^{-1} ({\vec x} -{\vec \mu})=\mathrm{tr}\left\{({\vec x}-{\vec \mu})^\mathrm{T} \Sigma^{-1} ({\vec x} -{\vec \mu})\right\}\)ã¨ãã¬ã¼ã¹ã®æ§è³ª\(\mathrm{tr}\left\{({\vec x}-{\vec \mu})^\mathrm{T} \Sigma^{-1} ({\vec x} -{\vec \mu})\right\}=\mathrm{tr}\left\{\Sigma^{-1}({\vec x}-{\vec \mu}) ({\vec x} -{\vec \mu})^\mathrm{T} \right\}\)ãç¨ãã
$$
\begin{eqnarray*}
\mathrm{E}_p\left[({\vec x}-{\vec \mu_2})^\mathrm{T} \Sigma_2^{-1} ({\vec x} -{\vec \mu_2})\right]
&=&\mathrm{E}_p\left[\mathrm{tr}\left\{\Sigma_2^{-1}({\vec x}-{\vec \mu_2}) ({\vec x} -{\vec \mu_2})^\mathrm{T} \right\}\right]\\
&=&\mathrm{tr}\left\{\Sigma_2^{-1} \mathrm{E}_p\left[({\vec x}-{\vec \mu_2}) ({\vec x} -{\vec \mu_2})^\mathrm{T}\right] \right\}\\
&=&\mathrm{tr}\left\{\Sigma_2^{-1} \left(\mathrm{E}_p\left[{\vec x}{\vec x}^\mathrm{T}\right] - \mathrm{E}_p\left[{\vec x}{\vec \mu_2}^\mathrm{T}\right] - \mathrm{E}_p\left[{\vec \mu_2}{\vec x}^\mathrm{T}\right] + \mathrm{E}_p\left[{\vec \mu_2}{\vec \mu_2}^\mathrm{T}\right]\right) \right\}\\
&=&\mathrm{tr}\left\{\Sigma_2^{-1} \left(\mathrm{E}_p\left[{\vec x}{\vec x}^\mathrm{T}\right] - 2\mathrm{E}_p\left[{\vec x}{\vec \mu_2}^\mathrm{T}\right] + \mathrm{E}_p\left[{\vec \mu_2}{\vec \mu_2}^\mathrm{T}\right]\right) \right\}
\end{eqnarray*}
$$
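In the last line, the two cross terms were combined. Because \(\Sigma_2^{-1}\) is symmetric, transposing inside the trace and then applying the cyclic property gives \(\mathrm{tr}\left\{\Sigma_2^{-1}{\vec \mu_2}{\vec x}^\mathrm{T}\right\}=\mathrm{tr}\left\{\left(\Sigma_2^{-1}{\vec \mu_2}{\vec x}^\mathrm{T}\right)^\mathrm{T}\right\}=\mathrm{tr}\left\{{\vec x}{\vec \mu_2}^\mathrm{T}\Sigma_2^{-1}\right\}=\mathrm{tr}\left\{\Sigma_2^{-1}{\vec x}{\vec \mu_2}^\mathrm{T}\right\}\), and the same holds after taking \(\mathrm{E}_p\).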
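The two trace facts used in this step can be checked on small integer examples. In this sketch (all matrices and vectors are arbitrary assumptions), `A_sym` plays the role of the symmetric matrix \(\Sigma_2^{-1}\), and the non-symmetric `A_asym` shows that exchanging the two vectors inside the trace really relies on symmetry:

```python
def outer(u, v):
    """Outer product u v^T as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]


def trace_of_product(A, B):
    """tr(A B) = sum_{i,j} A[i][j] * B[j][i]."""
    n = len(A)
    return sum(A[i][j] * B[j][i] for i in range(n) for j in range(n))


def quad_form(u, A, v):
    """u^T A v."""
    n = len(u)
    return sum(u[i] * A[i][j] * v[j] for i in range(n) for j in range(n))


A_sym = [[2, 1, 0], [1, 3, -1], [0, -1, 4]]   # symmetric, like Sigma_2^{-1}
A_asym = [[2, 5, 0], [1, 3, -1], [0, -1, 4]]  # not symmetric
u, v = [1, 2, 3], [-1, 0, 2]

# A quadratic form equals its own trace: u^T A u = tr(A u u^T)
qf, qf_trace = quad_form(u, A_sym, u), trace_of_product(A_sym, outer(u, u))
# tr(A u v^T) vs tr(A v u^T): equal for symmetric A, generally not otherwise
sym_pair = (trace_of_product(A_sym, outer(u, v)), trace_of_product(A_sym, outer(v, u)))
asym_pair = (trace_of_product(A_asym, outer(u, v)), trace_of_product(A_asym, outer(v, u)))
print(qf, qf_trace, sym_pair, asym_pair)
```

It prints `42 42 (16, 16) (8, 16)`: the quadratic form equals its trace, and the two orderings agree for `A_sym` but not for `A_asym`.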
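Next, the definition of the covariance matrix, \(\Sigma_1=\mathrm{E}_p\left[{({\vec x}-{\vec \mu_1})}{({\vec x}-{\vec \mu_1})}^\mathrm{T}\right]=\mathrm{E}_p\left[{\vec x}{\vec x}^\mathrm{T}\right]-\mathrm{E}_p\left[{\vec x}\right]\mathrm{E}_p\left[{\vec x}^\mathrm{T}\right]\), gives \(\mathrm{E}_p\left[{\vec x}{\vec x}^\mathrm{T}\right]=\Sigma_1+{\vec \mu_1}{\vec \mu_1}^\mathrm{T}\). Using this and \(\mathrm{E}_p\left[{\vec x}\right]={\vec \mu_1}\),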
$$
\begin{eqnarray*}
\mathrm{E}_p\left[({\vec x}-{\vec \mu_2})^\mathrm{T} \Sigma_2^{-1} ({\vec x} -{\vec \mu_2})\right]
&=&\mathrm{tr}\left\{\Sigma_2^{-1} \left(\mathrm{E}_p\left[{\vec x}{\vec x}^\mathrm{T}\right] - 2\mathrm{E}_p\left[{\vec x}{\vec \mu_2}^\mathrm{T}\right] + \mathrm{E}_p\left[{\vec \mu_2}{\vec \mu_2}^\mathrm{T}\right]\right) \right\}\\
&=&\mathrm{tr}\left\{\Sigma_2^{-1} \left(\left(\Sigma_1 + {\vec \mu_1}{\vec \mu_1}^\mathrm{T} \right) - 2{\vec \mu_1}{\vec \mu_2}^\mathrm{T} + {\vec \mu_2}{\vec \mu_2}^\mathrm{T}\right) \right\}\\
&=&\mathrm{tr}\left\{\Sigma_2^{-1} \left(\Sigma_1 + ({\vec \mu_1} - {\vec \mu_2})({\vec \mu_1} - {\vec \mu_2})^\mathrm{T}\right) \right\}\\
&=&\mathrm{tr}\left\{\Sigma_2^{-1}\Sigma_1\right\} + ({\vec \mu_1} - {\vec \mu_2})^\mathrm{T} \Sigma_2^{-1} ({\vec \mu_1} - {\vec \mu_2})
\end{eqnarray*}
$$
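This result can be checked by Monte Carlo. The sketch assumes a 2-dimensional example (so \(\Sigma_2^{-1}\) comes from the explicit \(2 \times 2\) inverse formula), draws \({\vec x} = {\vec \mu_1} + L_1{\vec z}\) with \(\Sigma_1 = L_1L_1^\mathrm{T}\) and \({\vec z} \sim N(\vec 0, I)\), and uses a fixed seed; the helper names are hypothetical.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def inverse_2x2(m):
    """Explicit inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def quad(v, A):
    """v^T A v."""
    return sum(v[i] * A[i][j] * v[j] for i in range(len(v)) for j in range(len(v)))


# Example parameters (assumed)
mu1, S1 = [0.5, -1.0], [[1.0, 0.3], [0.3, 0.5]]
mu2, S2 = [0.0, 0.5], [[2.0, -0.4], [-0.4, 1.0]]
S2_inv = inverse_2x2(S2)

# Closed form: tr(S2^{-1} S1) + (mu1 - mu2)^T S2^{-1} (mu1 - mu2)
trace_term = sum(S2_inv[i][k] * S1[k][i] for i in range(2) for k in range(2))
delta = [a - b for a, b in zip(mu1, mu2)]
closed_form = trace_term + quad(delta, S2_inv)

# Monte Carlo: x = mu1 + L1 z with z ~ N(0, I) gives x ~ N(mu1, S1)
rng = random.Random(0)
L1 = cholesky(S1)
n = 50_000
total = 0.0
for _ in range(n):
    z = [rng.gauss(0.0, 1.0) for _ in range(2)]
    x = [mu1[i] + sum(L1[i][k] * z[k] for k in range(2)) for i in range(2)]
    total += quad([x[i] - mu2[i] for i in range(2)], S2_inv)
estimate = total / n
print(f"closed form {closed_form:.4f}, Monte Carlo {estimate:.4f}")
```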
次ã«ç¬¬3é
ã«ã¤ãã¦ãåæ§ã«è¨ç®ãã
ãã ã\(\mathrm{E}_p\left[{({\vec x}-{\vec \mu_1})}{({\vec x}-{\vec \mu_1})}^\mathrm{T}\right]\)ã¯共分散行列の定義ãã®ãã®ã§ãã
$$
\begin{eqnarray*}
\mathrm{E}_p\left[({\vec x}-{\vec \mu_1})^\mathrm{T} \Sigma_1^{-1} ({\vec x} -{\vec \mu_1})\right]
&=&\mathrm{E}_p\left[\mathrm{tr}\left\{\Sigma_1^{-1}({\vec x}-{\vec \mu_1}) ({\vec x} -{\vec \mu_1})^\mathrm{T} \right\}\right]\\
&=&\mathrm{tr}\left\{\Sigma_1^{-1} \mathrm{E}_p\left[({\vec x}-{\vec \mu_1}) ({\vec x} -{\vec \mu_1})^\mathrm{T}\right] \right\}\\
&=&\mathrm{tr}\left\{\Sigma_1^{-1} \Sigma_1\right\}\\
&=&\mathrm{tr}\left\{I\right\}\\
&=&d
\end{eqnarray*}
$$
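A quick Monte Carlo check of the third term, with an assumed 3-dimensional example and a fixed seed. Because \({\vec x}-{\vec \mu_1} = L_1{\vec z}\) when sampling this way, the quadratic form reduces to \(|{\vec z}|^2\), a chi-square variable with \(d\) degrees of freedom, whose mean is \(d\); the helper names are hypothetical.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_lower(L, b):
    """Solve L y = b for lower-triangular L by forward substitution."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


# Example parameters (assumed); S1 is symmetric positive definite
mu1 = [1.0, -2.0, 0.5]
S1 = [[2.0, 0.6, -0.3], [0.6, 1.0, 0.2], [-0.3, 0.2, 0.8]]
L1 = cholesky(S1)
d = len(mu1)

rng = random.Random(1)
n = 50_000
total = 0.0
for _ in range(n):
    z = [rng.gauss(0.0, 1.0) for _ in range(d)]
    x = [mu1[i] + sum(L1[i][k] * z[k] for k in range(i + 1)) for i in range(d)]
    # (x - mu1)^T S1^{-1} (x - mu1) = |L1^{-1} (x - mu1)|^2
    y = solve_lower(L1, [x[i] - mu1[i] for i in range(d)])
    total += sum(t * t for t in y)
estimate = total / n
print(f"Monte Carlo {estimate:.3f} vs d = {d}")
```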
以ä¸ãã¾ã¨ããã¨ä»¥ä¸ã®ããã«ãªã
$$
\begin{eqnarray*}
D_{\mathrm{KL}}(P\|Q)
&=& \mathrm{E}_p\left[\frac{1}{2}\log \frac{|\Sigma_2|}{|\Sigma_1|}\right]\\
&& + \mathrm{E}_p\left[\frac{1}{2}({\vec x}-{\vec \mu_2})^\mathrm{T} \Sigma_2^{-1} ({\vec x} -{\vec \mu_2})\right]\\
&& - \mathrm{E}_p\left[\frac{1}{2}({\vec x}-{\vec \mu_1})^\mathrm{T} \Sigma_1^{-1} ({\vec x} -{\vec \mu_1})\right]\\
&=& \frac{1}{2}\left[\log \frac{|\Sigma_2|}{|\Sigma_1|} + \mathrm{tr}\left\{\Sigma_2^{-1}\Sigma_1\right\} + ({\vec \mu_1} - {\vec \mu_2})^\mathrm{T} \Sigma_2^{-1} ({\vec \mu_1} - {\vec \mu_2}) -d\right]
\end{eqnarray*}
$$
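The final formula can be implemented in a few lines. This plain-Python sketch (no NumPy, by choice) assumes both covariance matrices are symmetric positive definite, computes the log-determinants and the solves through Cholesky factors instead of forming \(\Sigma_2^{-1}\) explicitly, and checks one known special case: with diagonal covariances the divergence splits into a sum of 1-dimensional KL divergences. `kl_mvn` and the other names are hypothetical.

```python
import math


def cholesky(a):
    """Lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_chol(L, b):
    """Solve (L L^T) x = b: forward substitution, then back substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def kl_mvn(mu1, S1, mu2, S2):
    """D_KL(N(mu1, S1) || N(mu2, S2)) from the closed form above."""
    d = len(mu1)
    L1, L2 = cholesky(S1), cholesky(S2)
    logdet1 = 2.0 * sum(math.log(L1[i][i]) for i in range(d))
    logdet2 = 2.0 * sum(math.log(L2[i][i]) for i in range(d))
    # tr(S2^{-1} S1): solve S2 x = (column j of S1) and keep the j-th entry
    trace_term = sum(solve_chol(L2, [S1[i][j] for i in range(d)])[j] for j in range(d))
    delta = [a - b for a, b in zip(mu1, mu2)]
    maha = sum(di * si for di, si in zip(delta, solve_chol(L2, delta)))
    return 0.5 * (logdet2 - logdet1 + trace_term + maha - d)


def kl_normal_1d(mu1, var1, mu2, var2):
    """1-dimensional closed form, with var = variance."""
    return 0.5 * math.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / (2.0 * var2) - 0.5


def diag(v):
    """2x2 diagonal matrix from two variances."""
    return [[v[0], 0.0], [0.0, v[1]]]


# Example (assumed values): independent coordinates, so the KL should be a sum of 1-D terms
mu1, var1 = [0.0, 2.0], [1.0, 0.5]
mu2, var2 = [1.0, 1.5], [4.0, 2.0]
kl_joint = kl_mvn(mu1, diag(var1), mu2, diag(var2))
kl_sum = sum(kl_normal_1d(mu1[i], var1[i], mu2[i], var2[i]) for i in range(2))
print(kl_joint, kl_sum)
```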
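Summary
With this, we have derived the KL divergence between normal distributions.
As you can tell from the length of the derivation above, it took me quite a while.
Placing the KL divergences of the 1-dimensional and \(d\)-dimensional normal distributions side by side shows that the two formulas have the same form.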
KL divergence of the 1-dimensional normal distribution, rearranged for easier comparison
$$
\begin{eqnarray*}
D_{\mathrm{KL}}(P\|Q) &=& \frac{1}{2}\left[\log \frac{\sigma_2^2}{\sigma_1^2} + \frac{\sigma_1^2}{\sigma^2_2} + \frac{(\mu_1 - \mu_2)^2}{\sigma^2_2} - 1 \right]
\end{eqnarray*}
$$
KL divergence of the \(d\)-dimensional multivariate normal distribution
$$
\begin{eqnarray*}
D_{\mathrm{KL}}(P\|Q) &=& \frac{1}{2}\left[\log \frac{|\Sigma_2|}{|\Sigma_1|} + \mathrm{tr}\left\{\Sigma_2^{-1}\Sigma_1\right\} + ({\vec \mu_1} - {\vec \mu_2})^\mathrm{T} \Sigma_2^{-1} ({\vec \mu_1} - {\vec \mu_2}) -d\right]
\end{eqnarray*}
$$
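To see concretely that the two results have the same form, substitute \(d = 1\), \(\vec\mu_i = \mu_i\) and \(\Sigma_i = \sigma_i^2\) into the \(d\)-dimensional formula. For \(1 \times 1\) matrices the determinant, the inverse and the trace reduce to ordinary scalar operations, so
$$
\begin{eqnarray*}
D_{\mathrm{KL}}(P\|Q) &=& \frac{1}{2}\left[\log \frac{|\Sigma_2|}{|\Sigma_1|} + \mathrm{tr}\left\{\Sigma_2^{-1}\Sigma_1\right\} + ({\vec \mu_1} - {\vec \mu_2})^\mathrm{T} \Sigma_2^{-1} ({\vec \mu_1} - {\vec \mu_2}) -d\right] \\
&=& \frac{1}{2}\left[\log \frac{\sigma_2^2}{\sigma_1^2} + \frac{\sigma_1^2}{\sigma^2_2} + \frac{(\mu_1 - \mu_2)^2}{\sigma^2_2} - 1 \right]
\end{eqnarray*}
$$
Term by term, the log-determinant ratio becomes the log-variance ratio, the trace becomes \(\sigma_1^2/\sigma_2^2\), the quadratic form becomes \((\mu_1-\mu_2)^2/\sigma_2^2\), and \(d\) becomes 1.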
References
- 多変量(多次元)正規分布のKLダイバージェンスの求め方 (How to compute the KL divergence of multivariate normal distributions) - EchizenBlog-Zwei
- normal distribution - KL divergence between two univariate Gaussians - Cross Validated
- normal distribution - KL divergence between two multivariate Gaussians - Cross Validated
- The Matrix Cookbook(PDF)
You don't have to grind through expectations of quadratic forms involving the covariance matrix as was done above: they are listed as formulas in Section 8.2 of The Matrix Cookbook (PDF).
The Matrix Cookbook (PDF) collects a wide range of matrix identities and is highly recommended.
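For reference, the quadratic-form identity in question is the standard result that, for any random vector \(\vec y\) with mean \(\vec m\) and covariance \(\Sigma\) (Gaussian or not), \(\mathrm{E}[\vec y^\mathrm{T} A \vec y] = \mathrm{tr}\{A\Sigma\} + \vec m^\mathrm{T} A \vec m\). Under \(p\), both \(\vec x - \vec\mu_2\) (mean \(\vec\mu_1 - \vec\mu_2\)) and \(\vec x - \vec\mu_1\) (mean \(\vec 0\)) have covariance \(\Sigma_1\), so the second and third terms each follow in one line:
$$
\begin{eqnarray*}
\mathrm{E}_p\left[({\vec x}-{\vec \mu_2})^\mathrm{T} \Sigma_2^{-1} ({\vec x} -{\vec \mu_2})\right] &=& \mathrm{tr}\left\{\Sigma_2^{-1}\Sigma_1\right\} + ({\vec \mu_1} - {\vec \mu_2})^\mathrm{T} \Sigma_2^{-1} ({\vec \mu_1} - {\vec \mu_2}) \\
\mathrm{E}_p\left[({\vec x}-{\vec \mu_1})^\mathrm{T} \Sigma_1^{-1} ({\vec x} -{\vec \mu_1})\right] &=& \mathrm{tr}\left\{\Sigma_1^{-1}\Sigma_1\right\} + {\vec 0}^\mathrm{T} \Sigma_1^{-1} {\vec 0} = d
\end{eqnarray*}
$$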