A Summary of GANs That Generate Images from Text
This article aims to give a cross-cutting survey of GANs that generate images from text.
The task is called "text-to-image": given a text (caption) as the condition, the goal is to generate an image that matches that text.
StackGAN is one of the best-known works in this area.

Contents:
- References
- Why generate images from text?
- What makes a generative model good?
- What makes this task difficult?
- How has text-to-image synthesis research progressed?
- What kinds of images are generated?
- Which datasets are used?
- What resolutions can be generated?
- What do the network architectures look like?
- Are there tricks in the loss functions or discriminators?
- Miscellaneous thoughts
- Closing

References
[1] GAN-INT-CLS: Generative Adversarial Text to Image Synthesis, ICML 2016
[2] GAWWN: Learning What and Where to Draw, NIPS 2016
[3] StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks, ICCV 2017
[4] TAC-GAN: Text Conditioned Auxiliary Classifier Generative Adversarial Network, arXiv 2017
[5] StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks, arXiv 2017
[6] AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks, CVPR 2018
[7] FusedGAN: Semi-supervised FusedGAN for Conditional Image Generation, arXiv 2018
[8] HDGAN: Photographic Text-to-Image Synthesis with a Hierarchically-nested Adversarial Network, arXiv 2018

Why generate images from text?
[6] lists the following applications:
・Art generation
・Computer-aided design
・Advancing multimodal learning
・Relating vision and language

An ITmedia article gives more concrete examples, such as:
・Retouching photos by voice
・Generating animated films from scripts

What makes a generative model good?
Even when restricted to the "text-to-image" task, what kind of images a model should be able to generate is described differently in each paper, [7] among them. Collecting those properties gives:

・fidelity: generates images faithful to the caption
・diversity: generates diverse images
・discriminable: the generated images are recognizable (to a human)
・high resolution: generates at high resolution
・realistic: generates images close to natural images
・controllable: what kind of image is generated can be controlled

discriminable and high resolution together could arguably be called realistic. But just as there is no established way to evaluate GAN-generated images, what counts as "real" or "natural" remains ambiguous.
No model proposed so far has all of the above properties. Each method achieves a few of them, and each paper tends to present only the properties it achieved as if they were the ones this task requires.

What makes this task difficult?
The points that make generating images from text (captions) hard can be summarized as:

・The high-dimensional space of natural images
・The gap between text space and image space
・Preparing data

Each is explained in turn below.

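The high-dimensional space of natural images
This comes up almost every time GANs are discussed: natural images occupy only a small part of the high-dimensional pixel space (the manifold story). So generating an image that looks like it contains an object by picking pixels at random is practically impossible. For a 256x256 RGB image, that means hitting a meaningful combination of pixels in a 256*256*3-dimensional space where each dimension takes a value from 0 to 255.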
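
To get a feel for the size of that space, the arithmetic can be sketched as follows. This is only an illustrative calculation; the helper name `pixel_space_stats` is made up for this example.

```python
import math

def pixel_space_stats(height, width, channels=3, levels=256):
    """Return (number of dimensions, log10 of the number of possible images)."""
    dims = height * width * channels
    # levels ** dims is astronomically large, so work with its base-10 logarithm.
    log10_images = dims * math.log10(levels)
    return dims, log10_images

dims, log10_images = pixel_space_stats(256, 256)
print(dims)  # 196608
print(f"roughly 10^{log10_images:.0f} possible images")
```

Natural images are a vanishingly small subset of that count, which is why uniform random sampling of pixels is hopeless.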
Despite this, GANs made it possible to model such high-dimensional distributions, and sharper, more realistic images can now be produced. However, the dimensionality grows further as the target resolution goes up, so additional tricks are needed.
Those tricks include feeding in a condition, changing the loss function (the choice of divergence), splitting generation into two stages, and growing the layers progressively.
The "text-to-image" task has advanced considerably thanks to GANs (Conditional GANs in particular).
Of course, using GANs also brings along the problem of unstable training.

The gap between text space and image space
Handling words and images together makes this a multi-modal task, so text features and image features both have to be handled well.
Not even a person can perfectly describe in words everything that appears in a photo. Even if a model can generate an image faithful to the text, when the text does not say what is in the background or what the finer details look like, the generated image is unconstrained there and may come out a mess.
Currently, the parts that the text cannot specify (for example the background, the style, and various other factors lumped together) are delegated to a noise vector when generating the image.
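
A minimal sketch of that input, assuming the common conditional-GAN convention of concatenating the text embedding with a Gaussian noise vector z. The function name, the toy embedding, and the dimensions are illustrative assumptions, not taken from any of the papers.

```python
import random

def make_generator_input(text_embedding, noise_dim=100, rng=None):
    """Concatenate a text embedding with a fresh Gaussian noise vector z."""
    rng = rng or random.Random()
    z = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return list(text_embedding) + z

caption_embedding = [0.2, -0.1, 0.7]  # toy stand-in for a real sentence embedding
a = make_generator_input(caption_embedding, noise_dim=4, rng=random.Random(0))
b = make_generator_input(caption_embedding, noise_dim=4, rng=random.Random(1))
# The caption part is identical; only the noise part differs between a and b.
```

The same caption paired with different z is what lets one text map to many images.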

Preparing data
The issue here is that it is hard to prepare captioned data in large quantities.
Tricks for learning from little data have also been devised.
The text embedding space is high-dimensional, so when the number of samples is small relative to it, the Generator ends up learning a discontinuous manifold. To address this, [3] introduces Conditioning Augmentation, which smooths the learned manifold and helps avoid overfitting.
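
As a rough sketch of the idea: Conditioning Augmentation in [3] samples the conditioning vector from a Gaussian whose mean and diagonal covariance are computed from the text embedding, and adds a KL term towards N(0, I) as a regularizer. In the sketch below, `mu` and `log_var` are passed in directly as assumed inputs (in the paper they come from a learned layer), and the function name is illustrative.

```python
import math
import random

def conditioning_augmentation(mu, log_var, rng=None):
    """Sample c ~ N(mu, diag(exp(log_var))) and return it with the KL term.

    The KL term is KL(N(mu, sigma^2) || N(0, I)) for a diagonal Gaussian.
    """
    rng = rng or random.Random()
    # Reparameterization: c = mu + sigma * eps, with eps ~ N(0, I).
    c = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]
    kl = 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))
    return c, kl

# Toy values standing in for the outputs of a learned layer on the text embedding.
mu = [0.5, -0.2, 0.0]
log_var = [-1.0, 0.0, 0.3]
c1, kl = conditioning_augmentation(mu, log_var, rng=random.Random(0))
c2, _ = conditioning_augmentation(mu, log_var, rng=random.Random(1))
```

Because a fresh c is drawn each time, the same caption yields slightly different conditioning vectors, which is the source of the smoothing effect.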
In addition, [7] separates the model into a network that can be trained unsupervised and a network that is trained supervised. The unsupervised network is trained on the large supply of images without captions, which in turn helps improve the supervised side. Making use of images without ground-truth labels wherever they can be used is an approach also seen in research on line drawings, and I think it deserves attention.

Below, I go through each of the methods.

How has text-to-image synthesis research progressed?
First, following the Related Work section of [8], here is how research applying GANs to the text-to-image task has developed.
GAN-INT-CLS [1] was the first method to use a GAN for text-to-image and managed to generate 64x64 images. It proposed a new strategy, image-text matching-aware adversarial training. GAWWN [2] proposed a network for text-to-image synthesis that lets you specify what content to draw at which location. StackGAN [3] generated compelling 256x256 images and proposed two-stage training. TAC-GAN [4] used auxiliary classifiers to assist the training of text-to-image GANs. StackGAN++ [5], also called StackGAN-v2, extended StackGAN and proposed a tree-like network. AttnGAN [6] showed that an attention-driven approach can generate finer details. FusedGAN [7] proposed fusing the two-stage training into a single stage so that it can be trained end-to-end. Finally, HDGAN [8] showed that a novel network structure (hierarchically-nested) can generate high-resolution images with end-to-end training.

What kinds of images are generated?
Let's look at what results each method obtains and what its strengths are.

・GAN-INT-CLS
The first method to use a GAN for text-to-image. The image size is 64x64.

・GAWWN
Bounding boxes and keypoints are used to specify what to draw where. The additional results show, for example, that making the bounding box taller makes the bird taller. Its selling point is that the generated image can be controlled (controllable).

・StackGAN
Generating 256x256 high-resolution images in two stages made a big impact. Compared with the methods above, the images are more coherent. Stage 1 shows the blurry texture typical of GANs, while Stage 2 visibly smooths it out.

・TAC-GAN
Both are generated results. The quality does not reach that of StackGAN, but the method's selling point is diversity: it can generate various types of images from the same text.

・StackGAN++
This is StackGAN-v2, an improved StackGAN. It generates bird images of a quality that even a human viewer might take for real.

・AttnGAN
AttnGAN is co-authored with the StackGAN authors. The first row shows the image being refined to higher resolution from left to right. The two rows below show that word vectors act as attention on fine details.

・FusedGAN
The result images highlight diversity and controllability (controllable): the color or background can be changed while keeping the same pose (B), changes can be applied by altering the text while preserving the pose (C), and the changes can be made continuously (D).

・HDGAN
It generates high-resolution images while staying faithful to the corresponding fine-grained descriptions.

Which datasets are used?
The datasets are summarized below.

・CUB datasets

・Oxford-102
102 Category Flower Dataset

・MS COCO

・MHP
MPII Human Pose Database

Few kinds of datasets are used for the text-to-image task. Presumably the cost of writing captions that describe what features appear in each image is high.
The number of images in each dataset is not large either. CUB contains only 11,788 images.
This is why tricks are needed to train on little data while preventing overfitting.
That said, the data shortage could be resolved quite suddenly. High-accuracy pose estimation methods such as OpenPose and segmentation methods made it possible to produce large amounts of training data for pose information and segmentation maps. In the same way, if the reverse task of generating captions from images makes major progress, large numbers of caption labels could be generated for each image, and the caption-to-image direction would advance substantially as well.

What resolutions can be generated?
Progressive GAN is not a text-to-image method; it is included for reference.
Because StackGAN can generate 256x256 high-resolution images, it is known not only as a text-to-image method but also as a method for high-resolution image generation. The two-stage network structure that StackGAN adopts is used in other areas too, for example in the task of generating images of people from pose information. For details, see the following article.

What do the network architectures look like?
This section summarizes the network structure each method uses.

・GAN-INT-CLS
A DCGAN-based network with the sentence embedding vector added as a condition. It can be trained end-to-end.

・GAWWN
Conditions on the sentence embedding vector and keypoints. The network is split into Global and Local streams.

・StackGAN
Two-stage generation. Stage 1 generates a low-resolution image from the sentence vector, and Stage 2 generates a high-resolution image from that low-resolution image and the sentence vector.

・TAC-GAN
The same structure as AC-GAN, except that the input to G is a sentence vector instead of a class label. It can be trained end-to-end.

・StackGAN++
G has a tree-like structure, with a separate D for each of several resolution scales. Each D independently learns the image distribution at its own scale, while G is trained on the combined errors from all the Ds. The expected effect is that G approximates the distributions at multiple scales simultaneously. It can be trained end-to-end.
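
A minimal sketch of how G's training signal could be combined across scales, assuming the standard non-saturating GAN loss and treating each D's output on a generated image as a probability. The function name and the toy scores are illustrative assumptions.

```python
import math

def multi_scale_generator_loss(fake_scores_per_scale):
    """Sum -log D_i(G_i(z)) over the discriminators at every scale."""
    return sum(-math.log(score) for score in fake_scores_per_scale)

# Toy D outputs for generated images at three scales (e.g. 64x64, 128x128, 256x256).
scores = [0.6, 0.4, 0.2]
loss = multi_scale_generator_loss(scores)
```

The scale where D is least fooled (here 0.2) contributes the largest term, so G is pushed hardest where its output is weakest.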

・AttnGAN
Adds attention layers to the StackGAN++ structure. A model called DAMSM is also added. In addition to the sentence vector, it uses word vectors, which carry finer-grained features. It should be trainable end-to-end.

・FusedGAN
The upper stream is a network that can be trained unsupervised, and the lower stream is a network trained supervised. It can be trained end-to-end, but it can also be viewed as two stages: first train the upper stream unsupervised, then train the lower stream.

・HDGAN
Uses Ds at multiple scales for a single G. The authors call this a hierarchically-nested structure. It reaches a high resolution of 512x512 with end-to-end training.

[8] presents typical GAN frameworks as in the following figure.
A is the structure used by StackGAN and similar methods, C is Prog.GAN, and D is HDGAN. For high-resolution image generation, methods like A and C took a step-by-step approach: first learn to generate low resolution, then learn to generate high resolution. Learning to generate high resolution in one shot is hard, so the problem is split into several sub-tasks. HDGAN's strength, in contrast, is that it can be trained end-to-end. Generation methods adopting this structure may therefore become more common.

Are there tricks in the loss functions or discriminators?
Besides redesigning the network structure, another way to build a GAN that generates high-resolution images is to redesign the loss.
A characteristic of the losses used in text-to-image is a term that evaluates whether the generated image is faithful to the conditioning text (fidelity). If the condition is "generate a yellow bird", then no matter how high-resolution and coherent the generated image is, a "red bird" is not acceptable, and the loss should say so. An ordinary Discriminator produces no error as long as the image itself is coherent. So [1] proposes a matching-aware Discriminator.
The matching-aware Discriminator takes two inputs: an image and a text embedding vector. D then has to judge which combination the input pair is: a real image with matching text, a real image with mismatched text, or a generated image with matching text.
In particular, even when the input is a coherent real image, D must return Fake if the image and the text do not match.
Training D to optimize image-text matching in addition to image realism in this way lets D give G an additional signal.
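
The three cases can be written down as a small loss function. This is a sketch following the GAN-INT-CLS formulation in [1] as I understand it, with D's score for each pair treated as a probability; the function and argument names are illustrative.

```python
import math

def matching_aware_d_loss(s_real_match, s_real_mismatch, s_fake_match):
    """Discriminator loss (to minimize) over the three kinds of input pairs.

    s_real_match:    D(real image, matching text)      -> should be near 1
    s_real_mismatch: D(real image, mismatched text)    -> should be near 0
    s_fake_match:    D(generated image, matching text) -> should be near 0
    """
    return -(math.log(s_real_match)
             + 0.5 * (math.log(1.0 - s_real_mismatch) + math.log(1.0 - s_fake_match)))

good_d = matching_aware_d_loss(0.9, 0.1, 0.1)
blind_d = matching_aware_d_loss(0.9, 0.9, 0.1)  # accepts a real image with the wrong text
print(good_d < blind_d)  # True
```

A D that ignores the text is penalized through the mismatched-pair term, even though every image it accepted was real.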
Going further, instead of measuring realism and matching at once with a single output of D, some methods explicitly split D into separate streams that evaluate realism and matching separately.
For example, StackGAN++ judges Real/Fake with an Unconditional loss and judges how well the image and text match with a Conditional loss, as in the figure below.
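
A minimal sketch of a discriminator loss with the two terms kept separate, assuming standard cross-entropy terms and scores that are probabilities. The function and argument names are illustrative, and how the two heads share layers inside D is left out.

```python
import math

def bce_pair(score_real, score_fake):
    """Cross-entropy for one head: real should score 1, fake should score 0."""
    return -math.log(score_real) - math.log(1.0 - score_fake)

def two_term_d_loss(uncond_real, uncond_fake, cond_real, cond_fake):
    """Return (total, unconditional term, conditional term)."""
    unconditional = bce_pair(uncond_real, uncond_fake)  # is the image real?
    conditional = bce_pair(cond_real, cond_fake)        # do image and text match?
    return unconditional + conditional, unconditional, conditional

total, uncond, cond = two_term_d_loss(0.8, 0.3, 0.7, 0.4)
```

Keeping the terms separate makes it possible to see whether D is struggling with realism or with text matching.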

AttnGAN, on the other hand, feeds the image and sentence to D to compute a loss as before, and additionally proposes the Deep Attentional Multimodal Similarity Model (DAMSM), which judges how well the image matches the words. AttnGAN generates the overall composition conditioned on the sentence and the details conditioned on the words, so adding a component that measures whether the details are faithful to the words is a natural step.

HDGAN does not use words, so words cannot be used to evaluate details. Moreover, convolving a high-resolution generated image all the way down widens the receptive field, which makes it hard to evaluate coherence in the details. To evaluate detail-level coherence, the generated image is therefore divided into patches and Real/Fake is judged per patch. This is called a local adversarial loss and is also used in Apple's paper and CycleGAN.
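
A minimal sketch of a patch-wise Real/Fake loss, assuming D outputs a grid of probabilities with one entry per image patch. In practice the patches usually arise implicitly from the receptive field of a convolutional D rather than from explicit cropping; the names here are illustrative.

```python
import math

def local_adversarial_loss(patch_scores, is_real):
    """Average cross-entropy over a 2D grid of per-patch Real/Fake scores."""
    losses = [
        -math.log(p) if is_real else -math.log(1.0 - p)
        for row in patch_scores
        for p in row
    ]
    return sum(losses) / len(losses)

# Toy 2x2 grid for a generated image: D rejects three patches but is fooled by one.
scores_for_fake = [[0.1, 0.1],
                   [0.1, 0.8]]
loss = local_adversarial_loss(scores_for_fake, is_real=False)
```

Because every patch contributes its own term, a single badly judged region shows up in the loss instead of being averaged away by one global score.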
Coming up with the strongest network structure does not mean there is nothing left to do: the results also change with how the loss is defined, and I think that is one of the interesting things about Deep Learning.

Miscellaneous thoughts
Some personal impressions.
The emphasis on generating images with diversity presumably comes from the fact that the task maps (warps) from one space, text, to a different one, images. The two vector spaces do not overlap completely, so I think of it as a one-to-many transformation. By contrast, image-to-image translation to another domain, as in pix2pix, is treated as acceptably one-to-one. To realize one-to-many, a random vector z is added to the input as a noise vector; it would be even better if this vector were disentangled so that people could manipulate it by hand.
For StackGAN, mode collapse, where different inputs produce the same output, has been reported, as in the figure below [5].

Also, the text embedding vectors fed in as conditions were all prepared in advance with existing methods. Approaches that build the embedding with text-to-image in mind from the start seem like a reasonable next step.

When people judge whether a generated image looks natural or real, I suspect they often pick up on something in the background. Attention tends to go to the foreground object, but generating a clean background first may be a shortcut to making images look natural. That was my impression from the StackGAN++ results.

Closing
That concludes this summary of GANs that generate images from text.
Many tricks have gone into solving the hard task of generating high-resolution images that match a text. Similar tricks may be usable in other tasks and other areas.
Thank you for reading.

Retweets and follows would be much appreciated!
akmtn (@akmtn_twi), March 25, 2018