Q. Why does the activation function in a neural network have to be nonlinear?
A. Because with a linear activation function the network cannot separate data that are not linearly separable.
In the hidden layers of a neural network, a nonlinear function such as the sigmoid is used as the activation function. The reason is that "if a linear function is used as the activation function, the network can do no more than a simple perceptron, no matter how many intermediate/hidden layers it has". In other words, a neural network with linear activation functions is equivalent to a neural network with no hidden layers (= a simple perceptron).
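As a quick sanity check of this claim, here is a minimal NumPy sketch (my own illustration, not part of the original argument; the XOR data and the hand-picked weights W1, b1, W2, b2 are assumptions chosen for the example). With a ReLU hidden layer, a 2-2-1 network reproduces XOR exactly, while the same network with an identity ("linear") activation collapses into a single affine map and cannot.

```python
import numpy as np

# XOR: the classic non-linearly-separable problem
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

# Hand-picked weights for a 2-2-1 network (illustrative, not trained)
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])   # input (2) -> hidden (2)
b1 = np.array([0.0, -1.0])
W2 = np.array([1.0, -2.0])    # hidden (2) -> output (1)
b2 = 0.0

relu = lambda z: np.maximum(z, 0.0)

# Nonlinear hidden activation: reproduces XOR exactly
out_relu = relu(X @ W1 + b1) @ W2 + b2
print(np.allclose(out_relu, y))  # True

# Identity ("linear") hidden activation: collapses to one affine map
out_linear = (X @ W1 + b1) @ W2 + b2
W_collapsed = W1 @ W2
b_collapsed = b1 @ W2 + b2
print(np.allclose(out_linear, X @ W_collapsed + b_collapsed))  # True
# No single affine map can output 0, 1, 1, 0 on these four points,
# so the linear-activation network cannot represent XOR.
```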
Proof
Apologies in advance: I wasn't sure how to typeset the equations, and I don't really know TeX, so please bear with me.
As a simple example, consider a "10-layer neural network".
Let a2, a3 and a4 be the output vectors of layers 2, 3 and 4.
Let W3 be the transformation matrix from layer 2 to layer 3 and b3 its bias, and let W4 be the transformation matrix from layer 3 to layer 4 and b4 its bias.
Also let the activation function f() take a vector as its argument and return a vector.
As a check, the transformation from layer 3 to layer 4 is
a4 = f(W4·a3 + b4)
Using these, we show that the three layers 2, 3 and 4 together are no different from two layers.
a4 = f(W4·a3 + b4)
Also,
a3 = f(W3·a2 + b3)
∴
a4 = f(W4·{f(W3·a2 + b3)} + b4)
Since f() was assumed to be a linear function, write f(x) = kx + l.
∴
a4 = f(W4·{k(W3·a2 + b3) + l} + b4)
= f(W4·k·W3·a2 + W4·k·b3 + W4·l + b4)
Collecting the matrices and vectors, if we set
W = W4·k·W3
b = W4·k·b3 + W4·l + b4
then
a4 = f(W·a2 + b)
Thus layer 3 has been eliminated. Repeating this for every layer shows that when a linear function is used as the activation function, the intermediate layers disappear. fin.
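For anyone who wants to check the algebra numerically, here is a small NumPy sketch (my own verification, with arbitrary layer widths and random values for W3, b3, W4, b4, k, l, none of which come from the original post). It confirms that computing a4 layer by layer with the linear activation f(x) = kx + l gives exactly the same result as the collapsed form f(W·a2 + b), with W = W4·k·W3 and b = W4·k·b3 + W4·l + b4.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary example widths for layers 2, 3 and 4
n2, n3, n4 = 5, 7, 4

a2 = rng.normal(size=n2)
W3, b3 = rng.normal(size=(n3, n2)), rng.normal(size=n3)
W4, b4 = rng.normal(size=(n4, n3)), rng.normal(size=n4)

k, l = 1.7, -0.3                 # linear activation f(x) = kx + l, applied elementwise
f = lambda x: k * x + l

# a4 computed layer by layer, as in the proof
a3 = f(W3 @ a2 + b3)
a4_direct = f(W4 @ a3 + b4)

# Collapsed form: W = W4·k·W3, b = W4·k·b3 + W4·l + b4
# ("W4·l" means W4 applied to the constant vector whose entries are all l)
W = k * W4 @ W3
b = k * W4 @ b3 + W4 @ (l * np.ones(n3)) + b4
a4_collapsed = f(W @ a2 + b)

print(np.allclose(a4_direct, a4_collapsed))  # True -> layer 3 can be eliminated
```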
However, if you're still not convinced (I wasn't at first), let me continue as follows.
The key point is the step where everything gets collected into W and b.
A neural network achieves goals such as classification by learning its weights and biases.
In other words, the weight (W) and bias (b) are free to take any values.
You might object that the right-hand side of W = W4·k·W3 contains three variables, so how can we lump them into a single one? But whatever you can obtain by varying W4, k and W3 independently is still, in the end, just one matrix, which is why it can be collected into W.
That's about it, I guess.
References
How do you prove that a neural net built from nothing but linear transformations has its expressive power severely impaired?
The activation function of an artificial neuron is chosen to have properties that strengthen or simplify the network. In fact, for a multilayer perceptron that uses linear transfer functions, there always exists an exactly equivalent single-layer network. Therefore, nonlinear functions are essential to exploit the advantages of a multilayer network.
We have noted before that if we have a regression problem with non-binary network outputs, then it is appropriate to have a linear output activation function. So why not simply use linear activation functions on the hidden layers as well? With activation functions f^(n)(x) at layer n, the outputs of a two-layer MLP are
out_k^(2) = f^(2)( Σ_j out_j^(1) · w_jk^(2) ) = f^(2)( Σ_j f^(1)( Σ_i in_i · w_ij^(1) ) · w_jk^(2) )
so if the hidden layer activations are linear, i.e. f^(1)(x) = x, this simplifies to
out_k^(2) = f^(2)( Σ_i in_i ( Σ_j w_ij^(1) · w_jk^(2) ) )
But this is equivalent to a single layer network with weights w_ik = Σ_j w_ij^(1) · w_jk^(2), and we know that such a network cannot deal with non-linearly separable problems.
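The collapse described in this quoted passage can also be checked numerically. The sketch below is my own (random weights, tanh as an arbitrary stand-in for the output activation f^(2), and no biases, matching the quote) and is not code from the cited source.

```python
import numpy as np

rng = np.random.default_rng(1)

n_in, n_hidden, n_out = 3, 6, 2
W1 = rng.normal(size=(n_in, n_hidden))   # w_ij^(1): input i -> hidden j
W2 = rng.normal(size=(n_hidden, n_out))  # w_jk^(2): hidden j -> output k
x = rng.normal(size=n_in)                # the inputs in_i

f2 = np.tanh                             # some output activation f^(2)

# Two-layer MLP with a linear hidden activation, f^(1)(x) = x
out_two_layer = f2((x @ W1) @ W2)

# Equivalent single-layer network: w_ik = Σ_j w_ij^(1) · w_jk^(2)
W_single = W1 @ W2
out_single = f2(x @ W_single)

print(np.allclose(out_two_layer, out_single))  # True
```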