

Naoki Kato
August 04, 2020

Research Cases on Dense Tracking Using Unlabeled Data / Learning Dense Tracking from Unlabeled Videos

Slides from an internal study session.

This deck surveys research on learning pixel-level tracking (dense tracking) from unlabeled videos.

In recent years, unsupervised methods have been proposed that rival supervised methods on Video Object Segmentation, one of the main downstream applications.

Papers covered:

- Tracking Emerges by Colorizing Videos, ECCV'18
https://arxiv.org/abs/1806.09594

- Learning Correspondence from the Cycle-consistency of Time, CVPR’19
https://arxiv.org/abs/1903.07593

- Unsupervised Deep Tracking, CVPR’19
https://arxiv.org/abs/1904.01828

- Multigrid Predictive Filter Flow for Unsupervised Learning on Videos, '19
https://arxiv.org/abs/1904.01693

- Self-supervised Learning for Video Correspondence Flow, BMVC’19
https://arxiv.org/abs/1905.00875

- Joint-task Self-supervised Learning for Temporal Correspondence, NeurIPS’19
https://arxiv.org/abs/1909.11895

- MAST: A Memory-Augmented Self-Supervised Tracker, CVPR’20
https://arxiv.org/abs/2002.07793

- Learning Video Object Segmentation from Unlabeled Videos, CVPR’20
https://arxiv.org/abs/2003.05020

- Space-Time Correspondence as a Contrastive Random Walk, ’20
https://arxiv.org/abs/2006.14613


Transcript

1. Dense tracking, i.e. pixel-level tracking, and the self-supervised approaches to learning it that this deck surveys.
2. Applications of dense tracking: Video Object Segmentation, texture tracking, and pose tracking (closely related to semantic segmentation and pose estimation).
3. Task: Video Object Segmentation (VOS). S. Caelles, K.-K. Maninis, J. Pont-Tuset, L. Leal-Taixé, D. Cremers, and L. V. Gool. “One-shot video object segmentation,” In CVPR, 2017.
4. Benchmark: DAVIS-2017 (150 videos). Metrics: region overlapping (IoU between the predicted and ground-truth masks) and contour accuracy; a sketch of the region measure follows.
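A minimal sketch of the region-similarity measure (mask IoU) on binary masks; the function name and the handling of empty masks are illustrative assumptions, not the official DAVIS evaluation code.

    # Minimal sketch of the DAVIS region measure (mask IoU) on binary masks.
    import numpy as np

    def region_similarity(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
        """Jaccard index (IoU) between a predicted and a ground-truth mask."""
        pred = pred_mask.astype(bool)
        gt = gt_mask.astype(bool)
        union = np.logical_or(pred, gt).sum()
        if union == 0:          # both masks empty: define the score as 1
            return 1.0
        inter = np.logical_and(pred, gt).sum()
        return float(inter) / float(union)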
5. Supervised approaches to VOS. Propagation-based approaches [Hu+, ’17] [Voigtlaender+, ’19] propagate the first-frame mask frame to frame, using optical flow or metric learning over embeddings. Detection/segmentation-based approaches [Caelles+, ’17] [Luiten+, ’18] run detection/segmentation per frame, often fine-tuning on the annotated first frame. Y.-T. Hu, J.-B. Huang, and A. G. Schwing. “MaskRNN: Instance level video object segmentation,” In NIPS, 2017. P. Voigtlaender, Y. Chai, F. Schroff, H. Adam, B. Leibe, and L.-C. Chen. “FEELVOS: Fast end-to-end embedding learning for video object segmentation,” In CVPR, 2019. S. Caelles, K.-K. Maninis, J. Pont-Tuset, L. Leal-Taixé, D. Cremers, and L. V. Gool. “One-shot video object segmentation,” In CVPR, 2017. J. Luiten, P. Voigtlaender, and B. Leibe. “PReMVOS: Proposal-generation, refinement and merging for video object segmentation,” In ACCV, 2018.
6. Motivation: learning dense tracking without manual labels.
7. Video Colorization [Vondrick+, ECCV’18]: tracking emerges from a colorization proxy task trained on unlabeled videos. C. Vondrick, A. Shrivastava, A. Fathi, S. Guadarrama, K. Murphy, “Tracking Emerges by Colorizing Videos,” In ECCV, 2018.
8. Video Colorization [Vondrick+, ECCV’18], method: the colors of the input frame are predicted by copying colors from a reference frame, with copy weights given by the similarity of learned per-pixel embeddings; the pointing mechanism that emerges provides dense correspondences.
9. Video Colorization [Vondrick+, ECCV’18], loss: colors are quantized and predicted with a cross-entropy loss, since colorization is multimodal [Zhang+, ’16]; the color at target location i is an attention-weighted combination over reference locations j. R. Zhang, P. Isola, A. A. Efros, “Colorful Image Colorization,” In ECCV, 2016.
10. Video Colorization [Vondrick+, ECCV’18], setup and results: trained on Kinetics videos with a ResNet-18 + 3D conv backbone, using the quantized ab channels of Lab color space as targets; evaluated on DAVIS-2017 video object segmentation and compared against optical-flow-based propagation. A sketch of the objective follows.
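A minimal PyTorch sketch of the colorization-as-pointing objective described above, assuming pre-extracted per-frame embeddings (computed from grayscale frames) and pre-quantized color labels; the shapes, temperature, bin count, and function names are illustrative, not the authors' implementation.

    import torch
    import torch.nn.functional as F

    def colorization_loss(feat_ref, feat_tgt, color_ref, color_tgt,
                          num_bins=16, temperature=1.0):
        """
        feat_ref, feat_tgt : (B, C, H, W) embeddings of reference / target frame
        color_ref          : (B, H, W)    quantized color labels of the reference
        color_tgt          : (B, H, W)    quantized color labels of the target
        """
        B, C, H, W = feat_ref.shape
        f_ref = feat_ref.flatten(2)                        # (B, C, N)
        f_tgt = feat_tgt.flatten(2)                        # (B, C, N)

        # Affinity of every target location i with every reference location j,
        # normalized over j: a soft pointer from the target into the reference.
        affinity = torch.einsum('bci,bcj->bij', f_tgt, f_ref) / temperature
        attn = F.softmax(affinity, dim=2)                  # (B, N_tgt, N_ref)

        # Copy the reference color distribution through the attention weights.
        ref_onehot = F.one_hot(color_ref.flatten(1), num_bins).float()  # (B, N_ref, K)
        pred = torch.bmm(attn, ref_onehot).clamp_min(1e-8)              # (B, N_tgt, K)

        # Cross entropy over the quantized color classes of the target frame.
        return F.nll_loss(pred.log().transpose(1, 2), color_tgt.flatten(1))

At test time the same attention weights can propagate a first-frame segmentation mask or keypoints instead of colors.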
11. CycleTime [Wang+, CVPR’19]: uses cycle-consistency in time as the proxy task; tracking a patch forward and then backward through a video should return it to its starting point, and this signal alone trains features usable for dense tracking. X. Wang, A. Jabri, A. A. Efros, “Learning Correspondence from the Cycle-consistency of Time,” In CVPR, 2019.
12. CycleTime [Wang+, CVPR’19], architecture: a ResNet-50 encoder produces per-frame features, and a differentiable tracker localizes a query patch in the next frame from feature affinities; at test time the same affinities propagate dense labels.
13. CycleTime [Wang+, CVPR’19] vs. Vondrick et al.: rather than copying colors through the affinity matrix, a small head (conv × 2 + linear) on the affinities regresses the xy position of the tracked patch.
14. CycleTime [Wang+, CVPR’19], training: an MSE alignment loss is applied along the cycle (the patch tracked forward and then backward should match the original), and longer / skipped cycles make the task harder and training more robust. A toy sketch of the cycle-consistency loss follows.
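A toy PyTorch sketch of the cycle-consistency idea, assuming a list of per-frame feature maps; the soft-argmax point tracker below stands in for the paper's differentiable patch tracker and spatial transformer, so it illustrates the loss, not the full method.

    import torch
    import torch.nn.functional as F

    def soft_track(feat_src, feat_dst, pos, temperature=0.07):
        """Move a (row, col) position from feat_src to feat_dst by soft-argmax
        over the feature affinity. feat_*: (C, H, W); pos: float tensor (2,)."""
        C, H, W = feat_src.shape
        r, c = pos.round().long().clamp(min=0)
        r, c = r.clamp(max=H - 1), c.clamp(max=W - 1)
        query = feat_src[:, r, c]                               # (C,)
        sim = (feat_dst.flatten(1) * query[:, None]).sum(0)     # (H*W,)
        w = F.softmax(sim / temperature, dim=0).view(H, W)
        rows = torch.arange(H, dtype=torch.float32, device=w.device)
        cols = torch.arange(W, dtype=torch.float32, device=w.device)
        # Expected (row, col) under the attention map: a differentiable argmax.
        return torch.stack([(w.sum(1) * rows).sum(), (w.sum(0) * cols).sum()])

    def cycle_loss(feats, start_pos):
        """feats: list of (C, H, W) frame embeddings; start_pos: (2,) float."""
        pos = start_pos
        for a, b in zip(feats[:-1], feats[1:]):                 # track forward in time
            pos = soft_track(a, b, pos)
        for a, b in zip(feats[::-1][:-1], feats[::-1][1:]):     # track back to the start
            pos = soft_track(a, b, pos)
        return F.mse_loss(pos, start_pos)                       # penalize the drift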
15. CycleTime [Wang+, CVPR’19], results: trained on the VLOG dataset (roughly 344 hours of video); evaluated on video object segmentation and pose propagation, where it compares favorably to video colorization.
16. Unsupervised Deep Tracking [Wang+, CVPR’19]: self-supervised visual tracking at the box level; a correlation-filter tracker tracks a region forward and then backward, and the backward response map is compared against the initial label with an L2 cycle-consistency loss. Appeared on arXiv around the same time as CycleTime. A minimal forward-backward sketch follows. N. Wang, Y. Song, C. Ma, W. Zhou, W. Liu, H. Li, “Unsupervised Deep Tracking,” In CVPR, 2019.
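A minimal sketch of forward-backward consistency on response maps, with plain cross-correlation standing in for the paper's discriminative correlation filter; shapes, the Gaussian label, and the hard template crop are simplifying assumptions.

    import torch
    import torch.nn.functional as F

    def gaussian_label(h, w, center, sigma=2.0):
        """Gaussian-shaped pseudo label centred at (row, col) = center."""
        ys = torch.arange(h, dtype=torch.float32)[:, None]
        xs = torch.arange(w, dtype=torch.float32)[None, :]
        return torch.exp(-((ys - center[0]) ** 2 + (xs - center[1]) ** 2) / (2 * sigma ** 2))

    def response(feat_search, feat_template):
        """Cross-correlate a template feature map against a search feature map."""
        # feat_search: (1, C, H, W), feat_template: (1, C, h, w) -> (H, W)
        return F.conv2d(feat_search, feat_template, padding='same')[0, 0]

    def forward_backward_loss(feat_t0, feat_t1, template_t0, center_t0):
        """center_t0: (row, col) of the tracked region in frame t0."""
        H, W = feat_t0.shape[-2:]
        resp_fwd = response(feat_t1, template_t0)       # where the region went
        # Re-crop a template around the forward peak (hard crop for brevity;
        # the paper keeps this step differentiable via its correlation filter).
        peak = torch.nonzero(resp_fwd == resp_fwd.max())[0]
        h, w = template_t0.shape[-2:]
        top = int(peak[0].clamp(0, H - h)); left = int(peak[1].clamp(0, W - w))
        template_t1 = feat_t1[:, :, top:top + h, left:left + w]
        resp_bwd = response(feat_t0, template_t1)       # track back to frame t0
        label = gaussian_label(H, W, center_t0).to(resp_bwd.device)
        return F.mse_loss(resp_bwd, label)              # backward peak should match start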
17. mgPFF [Kong+, ’19]: models the transformation between frames as a filter flow, a per-pixel filter that reconstructs one frame from another; the learned filter flow can be interpreted as optical flow and used for dense tracking. S. Kong, C. Fowlkes, “Multigrid Predictive Filter Flow for Unsupervised Learning on Videos,” In arXiv:1904.01693, 2019.
18. mgPFF [Kong+, ’19], training: each pixel predicts a small reconstruction filter (e.g. 11×11); the losses are a Charbonnier reconstruction term, forward-backward flow consistency (also Charbonnier), an L1 smoothness constraint, and an L1 sparsity constraint. A sketch of these terms follows.
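A sketch of the loss terms listed on this slide, assuming the reconstruction and the forward/backward flows have already been produced by the filter-flow network; the loss weights and the unwarped consistency term are simplifying assumptions.

    import torch

    def charbonnier(x, eps=1e-3):
        """Smooth, robust L1-like penalty."""
        return torch.sqrt(x * x + eps * eps).mean()

    def mgpff_losses(recon, target, flow_fwd, flow_bwd, w=(1.0, 1.0, 0.1, 0.01)):
        """
        recon, target      : (B, 3, H, W)  reconstructed / ground-truth frame
        flow_fwd, flow_bwd : (B, 2, H, W)  flows t->t+1 and t+1->t
        """
        l_recon = charbonnier(recon - target)
        # Forward and backward flows should cancel; a faithful version would warp
        # the backward flow by the forward flow and mask occlusions.
        l_consist = charbonnier(flow_fwd + flow_bwd)
        # Total-variation style smoothness on the flow field (L1 on gradients).
        l_smooth = (flow_fwd[:, :, :, 1:] - flow_fwd[:, :, :, :-1]).abs().mean() \
                 + (flow_fwd[:, :, 1:, :] - flow_fwd[:, :, :-1, :]).abs().mean()
        l_sparse = flow_fwd.abs().mean()                 # L1 sparsity on the flow
        return w[0] * l_recon + w[1] * l_consist + w[2] * l_smooth + w[3] * l_sparse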
19. CorrFlow [Lai+, BMVC’19]: revisits Vondrick et al. with several changes: colour dropout (randomly dropping color channels so the model cannot rely on a trivial color copy), restricted attention (each target location attends only to a nearby window of the reference frame), scheduled sampling (gradually feeding the model’s own predictions back as references to reduce drift over long videos), and a cycle-consistency term. The loss remains cross entropy over quantized Lab colors, as in Vondrick et al.; a sketch of restricted attention follows. Z. Lai, W. Xie, “Self-supervised Learning for Video Correspondence Flow,” In BMVC, 2019.
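A sketch of the restricted-attention idea, assuming per-frame feature maps and a window radius R; the radius, temperature, and tensor layout are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def restricted_attention(feat_ref, feat_tgt, R=6, temperature=1.0):
        """feat_ref, feat_tgt: (B, C, H, W). Returns attention (B, H*W, (2R+1)**2)."""
        B, C, H, W = feat_ref.shape
        k = 2 * R + 1
        # For every target position, gather the k*k reference neighbourhood.
        ref_patches = F.unfold(feat_ref, kernel_size=k, padding=R)    # (B, C*k*k, H*W)
        ref_patches = ref_patches.view(B, C, k * k, H * W)
        tgt = feat_tgt.view(B, C, 1, H * W)
        sim = (ref_patches * tgt).sum(1) / temperature                # (B, k*k, H*W)
        return F.softmax(sim, dim=1).permute(0, 2, 1)                 # (B, H*W, k*k)

Unfolding the quantized reference colors in the same way lets these weights propagate colors during training, or segmentation masks at test time, while keeping the memory cost proportional to the window size instead of the full frame.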
20. CorrFlow [Lai+, BMVC’19], results: ablation study of the proposed components.
21. UVC [Li+, NeurIPS’19]: jointly learns region-level localization and fine-grained pixel-level matching so that the two tasks reinforce each other (CycleTime, in contrast, tracks patches only). X. Li, S. Liu, S. D. Mello, X. Wang, J. Kautz, M.-H. Yang, “Joint-task Self-supervised Learning for Temporal Correspondence,” In NeurIPS, 2019.
22. UVC [Li+, NeurIPS’19], fine-grained matching: a colorization-style reconstruction of the target from the reference through the learned affinity, computed on Lab colors; the slide also notes an auto-encoder-based reconstruction and an ablation study.
23. UVC [Li+, NeurIPS’19], regularizers: a concentration loss keeps the reference locations matched to a tracked region spatially compact (MSE to their centroid), and an orthogonal loss requires the forward and backward affinity matrices to invert each other, i.e. a cycle-consistency constraint enforced with MSE. Sketches of both follow.
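Simplified sketches of the two regularizers, assuming row-normalized affinity matrices are already available; the exact normalization and weighting in the paper differ, so treat these as illustrations of the constraints rather than the authors' formulation.

    import torch
    import torch.nn.functional as F

    def concentration_loss(attn, coords):
        """
        attn   : (N_tgt, N_ref) row-normalized affinity from target to reference
        coords : (N_ref, 2)     (row, col) coordinates of the reference locations
        """
        matched = attn @ coords                      # expected match position per target pixel
        center = matched.mean(dim=0, keepdim=True)   # centroid of all matched positions
        # Penalize spatial spread: matched pixels should stay concentrated.
        return ((matched - center) ** 2).sum(dim=1).mean()

    def orthogonal_loss(attn_fwd, attn_bwd):
        """attn_fwd: (N1, N2) target->reference, attn_bwd: (N2, N1) reference->target."""
        eye = torch.eye(attn_fwd.shape[0], device=attn_fwd.device)
        # Walking to the reference and back should be the identity mapping.
        return F.mse_loss(attn_fwd @ attn_bwd, eye)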
24. UVC [Li+, NeurIPS’19], results: DAVIS-2017 comparison and an ablation study over the localization module, the orthogonal loss, and the concentration loss.
25. UVC [Li+, NeurIPS’19]: qualitative dense tracking results.
26. MAST [Lai+, CVPR’20]: by the CorrFlow authors; re-examines the design choices of self-supervised dense tracking (reconstruction target, loss, attention) and adds a memory of past frames, substantially narrowing the gap to supervised video object segmentation. Z. Lai, E. Lu, W. Xie, “MAST: A Memory-Augmented Self-Supervised Tracker,” In CVPR, 2020.
27. MAST [Lai+, CVPR’20], reconstruction target: feeding raw RGB lets the CNN learn a trivial copy, so frames are converted to Lab and color channels are randomly dropped; colors are then regressed directly with a Huber loss instead of classified into quantized bins.
28. MAST [Lai+, CVPR’20], attention with ROI localization: CorrFlow restricts attention to a window around the same location, which fails under large motion, so MAST first matches the downsampled target against each reference frame to obtain a coarse response map, takes its soft-argmax as the ROI centre, crops the ROI with bilinear sampling, and then applies fine restricted attention inside the ROI. A sketch of the soft-argmax and ROI crop follows.
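A sketch of soft-argmax localization and bilinear ROI cropping as summarized on this slide, assuming a precomputed coarse response map; the ROI size, the grid scaling, and the function names are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def soft_argmax(response):
        """response: (B, H, W) -> expected (x, y) in normalized [-1, 1] coordinates."""
        B, H, W = response.shape
        prob = F.softmax(response.flatten(1), dim=1).view(B, H, W)
        xs = torch.linspace(-1, 1, W, device=response.device)
        ys = torch.linspace(-1, 1, H, device=response.device)
        x = (prob.sum(dim=1) * xs).sum(dim=1)     # marginal over rows, then expectation
        y = (prob.sum(dim=2) * ys).sum(dim=1)
        return torch.stack([x, y], dim=1)         # (B, 2), differentiable ROI centre

    def crop_roi(feat_ref, center, roi_size=24):
        """Bilinearly sample a roi_size x roi_size window of feat_ref around center."""
        B, C, H, W = feat_ref.shape
        lin = torch.linspace(-1, 1, roi_size, device=feat_ref.device)
        gy, gx = torch.meshgrid(lin, lin, indexing='ij')
        base = torch.stack([gx, gy], dim=-1)                       # (roi, roi, 2), (x, y) order
        # Scale the local grid to roughly roi_size pixels and shift to the centre.
        grid = base[None] * (roi_size / W) + center[:, None, None, :]
        return F.grid_sample(feat_ref, grid, align_corners=True)   # (B, C, roi, roi)

The Huber regression loss from the previous slide is then applied between the colors reconstructed through the fine attention inside this ROI and the true target colors.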
29. MAST [Lai+, CVPR’20], memory: frames I0 and I5 (long-term memory) and It-5, It-3, It-1 (short-term memory) all serve as reference frames; aggregating over several references is more robust than propagating from a single frame and requires no test-time fine-tuning.
30. MAST [Lai+, CVPR’20], ablation study: the Lab color space with channel dropout, the long-term memory, and the choice of reference frames each contribute to accuracy.
31. MAST [Lai+, CVPR’20], results: on YouTube-VOS it approaches supervised methods such as STM and shows a smaller generalization gap on unseen categories.
32. MuG [Lu+, CVPR’20]: learns video object segmentation from unlabeled videos using cues at multiple granularities; a saliency model and CAM supply coarse object evidence. It targets object/instance-level zero-shot VOS (Z-VOS), where no first-frame mask is given, and one-shot VOS (O-VOS), where the given first-frame mask is propagated. X. Lu, W. Wang, J. Shen, Y-W. Tai, D. Crandall, S. C. H. Hoi, “Learning Video Object Segmentation from Unlabeled Videos,” In CVPR, 2020.
33. MuG [Lu+, CVPR’20], training an FCN with four granularities of analysis: frame granularity (saliency-map and CAM pseudo-labels with a cross-entropy loss), short-term granularity (a forward-backward cycle-consistency loss in the spirit of Unsupervised Deep Tracking [Wang+, CVPR’19]), long-term granularity (correspondence across distant frames), and video granularity (consistency of predictions over the whole video).
34. MuG [Lu+, CVPR’20], inference and results: object-level zero-shot VOS comes directly from the learned model; instance-level zero-shot VOS combines Mask R-CNN proposals refined with GrabCut, associated across frames via IoU and optical flow; one-shot VOS propagates the given first-frame mask. Results are reported on DAVIS-2017 (O-VOS).
35. Space-Time Correspondence as a Contrastive Random Walk [Jabri+, ’20]: combines the cycle-consistency constraint with contrastive learning over a space-time graph of patches, reaching state-of-the-art self-supervised results on VOS, pose tracking, and video part segmentation. A. Jabri, A. Owens, A. A. Efros, “Space-Time Correspondence as a Contrastive Random Walk,” In arXiv:2006.14613, 2020.
36. Contrastive Random Walk [Jabri+, ’20], formulation: patches of adjacent frames form the nodes of a graph, and softmax-normalized pairwise feature similarities define the transition probabilities of a random walk from frame t to frame t+1.
37. Contrastive Random Walk [Jabri+, ’20], training: a walk forward from frame t to t+k and back to t should return each patch to itself, giving a cycle-consistency loss in the form of cross entropy against the identity; edge dropout on the graph and test-time training further improve results. A sketch of the objective follows.
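A sketch of the contrastive random walk objective, assuming L2-normalized patch embeddings per frame; the temperature, edge-dropout rate, and the exact way dropout is applied are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def transition(feat_a, feat_b, temperature=0.07, edge_dropout=0.1, training=True):
        """feat_a: (N, D), feat_b: (M, D) L2-normalized patch embeddings."""
        logits = feat_a @ feat_b.t() / temperature             # (N, M) affinities
        probs = F.softmax(logits, dim=1)                       # row-stochastic transitions
        if training and edge_dropout > 0:
            probs = F.dropout(probs, p=edge_dropout)           # randomly drop graph edges
            probs = probs / probs.sum(dim=1, keepdim=True).clamp_min(1e-8)
        return probs

    def random_walk_loss(frame_feats, temperature=0.07):
        """frame_feats: list of (N, D) patch embeddings, one entry per frame."""
        feats = frame_feats + frame_feats[-2::-1]              # palindrome: t .. t+k .. t
        walk = None
        for a, b in zip(feats[:-1], feats[1:]):
            step = transition(a, b, temperature)
            walk = step if walk is None else walk @ step       # chain the transitions
        # Each patch should walk back to itself: cross entropy against the identity.
        target = torch.arange(walk.shape[0], device=walk.device)
        return F.nll_loss(walk.clamp_min(1e-8).log(), target)

Test-time training, mentioned on the slide, amounts to continuing to optimize this same self-supervised objective on the test video before propagating labels.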
38. Contrastive Random Walk [Jabri+, ’20]: project page: https://ajabri.github.io/videowalk/
39. Summary: two main families of self-supervised dense tracking. Video colorization reconstructs the target frame’s colors from a reference frame through learned affinities, with later work adding color dropout, restricted attention, and memories of multiple reference frames. Cycle-consistency learning obtains supervision from forward-backward tracking, culminating in contrastive random walks over space-time graphs. Self-supervised methods are steadily closing the gap to supervised video object segmentation.