Slide 1

Dense Tracking
Mobility Technologies Co., Ltd. / AI Mobility Technologies

Slide 2

Scope of this talk
■ Pixel-level tracking (= dense tracking)
■ Focus on self-supervised methods that learn dense correspondences from unlabeled video

Slide 3

What counts as dense tracking?
■ Video Object Segmentation: propagate object masks through a video
■ Texture tracking
■ Pose tracking (closely related to semantic segmentation and pose estimation)
In this talk, dense tracking covers Video Object Segmentation, texture tracking, and pose tracking.

Slide 4

Video Object Segmentation (VOS): given the mask of the target object in the first frame, segment that object in every subsequent frame.
S. Caelles, K.-K. Maninis, J. Pont-Tuset, L. Leal-Taixé, D. Cremers, and L. V. Gool, “One-shot video object segmentation,” In CVPR, 2017.

Slide 5

Benchmark: DAVIS-2017
■ 150 video sequences with per-frame object masks
■ The first-frame mask is given at test time (semi-supervised setting)
■ Evaluation metrics (see the sketch below)
  ■ Region overlapping (J): IoU between the predicted mask and the ground truth
  ■ Contour accuracy (F): F-measure computed on the mask boundary
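A minimal sketch of the region-similarity metric J, assuming binary NumPy masks; the official DAVIS toolkit also computes the contour accuracy F through a boundary-matching procedure, which is omitted here:

```python
import numpy as np

def region_similarity(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """Region similarity J: IoU between a predicted and a ground-truth binary mask."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:           # both masks empty: define J = 1
        return 1.0
    inter = np.logical_and(pred, gt).sum()
    return inter / union

# toy usage
pred = np.zeros((4, 4), dtype=np.uint8); pred[1:3, 1:3] = 1
gt   = np.zeros((4, 4), dtype=np.uint8); gt[1:4, 1:4] = 1
print(region_similarity(pred, gt))  # 4 / 9 ~= 0.444
```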

Slide 6

Note on terminology: the DAVIS “unsupervised VOS” track, where no first-frame mask is given at test time, is a different notion from the self-supervised training discussed in this talk.

Slide 7

Conventional (supervised) approaches to VOS
■ Propagation-based approach [Hu+, ’17] [Voigtlaender+, ’19]
  ■ Propagate the first-frame mask from frame to frame
  ■ Correspondences are obtained with optical flow or metric learning
    → optical-flow-based propagation relies on supervised flow training, and propagation errors accumulate over long sequences
■ Detection/segmentation-based approach [Caelles+, ’17] [Luiten+, ’18]
  ■ Detect/segment the target object independently in every frame
  ■ Often requires fine-tuning the network on the first-frame mask of each test video, which is costly at inference time
Y.-T. Hu, J.-B. Huang, and A. G. Schwing, “MaskRNN: Instance level video object segmentation,” In NIPS, 2017.
P. Voigtlaender, Y. Chai, F. Schroff, H. Adam, B. Leibe, and L.-C. Chen, “FEELVOS: Fast end-to-end embedding learning for video object segmentation,” In CVPR, 2019.
S. Caelles, K.-K. Maninis, J. Pont-Tuset, L. Leal-Taixé, D. Cremers, and L. V. Gool, “One-shot video object segmentation,” In CVPR, 2017.
J. Luiten, P. Voigtlaender, and B. Leibe, “PReMVOS: Proposal-generation, refinement and merging for video object segmentation,” In ACCV, 2018.

Slide 8

Why self-supervised dense tracking?
■ Supervised methods need dense, per-frame mask annotations, which are extremely costly to collect at scale
■ Models trained on a fixed annotated vocabulary do not necessarily generalize to unseen objects or domains
→ learn dense correspondences directly from large amounts of unlabeled video

Slide 9

Video Colorization [Vondrick+, ECCV’18]
■ Learns dense correspondences from unlabeled video by using colorization as a proxy task
■ No manual annotation is required; the pointing mechanism learned for colorization is reused for tracking at test time
C. Vondrick, A. Shrivastava, A. Fathi, S. Guadarrama, K. Murphy, “Tracking Emerges by Colorizing Videos,” In ECCV, 2018.

Slide 10

Video Colorization [Vondrick+, ECCV’18]: method
■ Embed a gray-scale target frame (input frame) and a color reference frame (reference frame) with a shared CNN
■ For every pixel of the input frame, compute a softmax attention over the reference-frame pixels and copy the reference colors accordingly (see the sketch below)
■ Training minimizes the colorization error; since the input frame carries no color, the model must learn where each pixel came from
→ at test time the same attention propagates segmentation masks instead of colors
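The pointing mechanism above can be written in a few lines. The sketch below is a simplified rendering, not the authors' released code: the feature maps are assumed to come from some encoder, and the temperature value is illustrative.

```python
import torch

def propagate(ref_feat, tgt_feat, ref_values, temperature=0.07):
    """
    ref_feat, tgt_feat: (B, C, H, W) embeddings of the reference / target frames.
    ref_values: (B, D, H, W) quantities attached to the reference frame
                (quantized colors during training, segmentation masks at test time).
    Returns the predicted values for the target frame, shape (B, D, H, W).
    """
    B, C, H, W = ref_feat.shape
    f_ref = ref_feat.flatten(2)                      # (B, C, HW)
    f_tgt = tgt_feat.flatten(2)                      # (B, C, HW)
    # affinity between every target pixel and every reference pixel
    affinity = torch.einsum('bci,bcj->bij', f_tgt, f_ref) / temperature
    attn = affinity.softmax(dim=-1)                  # (B, HW_tgt, HW_ref)
    v_ref = ref_values.flatten(2)                    # (B, D, HW_ref)
    v_tgt = torch.einsum('bij,bdj->bdi', attn, v_ref)
    return v_tgt.view(B, -1, H, W)

# training step (sketch): predict the quantized Lab colors of the target frame
# and minimize a cross entropy loss against the true color bins.
```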

Slide 11

Video Colorization [Vondrick+, ECCV’18]: model overview figure (reference and target frames are embedded, and colors are copied through the attention weights)

Slide 12

Video Colorization [Vondrick+, ECCV’18]: training setup
■ Trained on the Kinetics dataset (roughly 300k unlabeled videos)
■ Backbone: ResNet-18 + 3D convolutions producing per-pixel embeddings
■ No optical-flow supervision is used; correspondences emerge purely from the colorization objective

Slide 13

Video Colorization [Vondrick+, ECCV’18]: results
■ Qualitative and quantitative tracking results on DAVIS
■ The emergent tracker outperforms optical-flow-based propagation baselines, though it degrades over long sequences and under occlusion

Slide 14

CycleTime [Wang+, CVPR’19]
■ Uses cycle-consistency in time as the proxy task: track a patch backward through time and then forward again (or vice versa), and require it to return to where it started
■ The features learned this way transfer to dense tracking tasks without any labels
X. Wang, A. Jabri, A. A. Efros, “Learning Correspondence from the Cycle-consistency of Time,” In CVPR, 2019.

Slide 15

CycleTime [Wang+, CVPR’19]: architecture
■ Feature encoder: ResNet-50
■ A differentiable tracker matches a query patch against the next frame’s feature map and localizes it there
■ Because tracking itself is differentiable, the error measured after completing a cycle can be backpropagated to the encoder

Slide 16

CycleTime [Wang+, CVPR’19]: differences from Vondrick et al.
■ Tracks image patches rather than copying per-pixel colors; a small localizer head (conv × 2 + linear) regresses the xy position of the patch
■ The supervisory signal is cycle-consistency instead of colorization error

Slide 17

CycleTime [Wang+, CVPR’19]: training losses (simplified sketch below)
■ Cycle-consistency loss: MSE between the starting patch and the patch obtained after tracking through the cycle, both in feature space and in localization
■ Cycles of several lengths are combined, including skip cycles that jump directly to a distant frame → robustness to occlusion and drift
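A simplified rendering of the cycle-consistency idea: a single point (rather than a patch) is tracked by soft-argmax matching through a forward-backward chain of feature maps, and the squared distance between the start and the returned location is penalized. The paper's learned localizer, affine alignment, feature-similarity term, and skip cycles are omitted; the temperature and helper names are illustrative.

```python
import torch
import torch.nn.functional as F

def soft_track(query_feat, frame_feat, coords_grid, temperature=0.07):
    """
    One soft tracking step: match a single query feature against a frame's
    feature map and return the expected (soft-argmax) xy location.
    query_feat: (B, C)   frame_feat: (B, C, H, W)
    coords_grid: (H*W, 2) pixel locations normalized to [0, 1].
    """
    sim = torch.einsum('bc,bcn->bn', query_feat, frame_feat.flatten(2)) / temperature
    attn = sim.softmax(dim=-1)                       # (B, H*W)
    return attn @ coords_grid                        # (B, 2) expected location

def cycle_consistency_loss(feats, start_xy, coords_grid):
    """
    feats: list of (B, C, H, W) feature maps for frames t, t+1, ..., t+k.
    Track the feature sampled at start_xy forward to the last frame and back,
    then penalize the distance between the start and the returned location.
    (The query feature is re-sampled at each step's soft location.)
    """
    def sample(feat, xy):                            # bilinear read at xy in [0, 1]
        grid = xy.view(-1, 1, 1, 2) * 2 - 1
        return F.grid_sample(feat, grid, align_corners=False).flatten(1)

    xy = start_xy
    path = feats[1:] + feats[-2::-1]                 # forward, then backward in time
    query = sample(feats[0], xy)
    for feat in path:
        xy = soft_track(query, feat, coords_grid)
        query = sample(feat, xy)
    return ((xy - start_xy) ** 2).sum(dim=-1).mean() # MSE between start and end positions
```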

Slide 18

CycleTime [Wang+, CVPR’19]: results
■ Trained on the unlabeled VLOG dataset (about 114k videos, 344 hours)
■ Evaluated on video object segmentation, pose propagation, and related propagation tasks
■ Outperforms video colorization [Vondrick+, ECCV’18] and optical-flow propagation baselines

Slide 19

(Reference) Self-supervised visual tracking at the bounding-box level, not dense tracking
Unsupervised Deep Tracking [Wang+, CVPR’19]
■ A Siamese correlation-filter tracker is trained without labels: track a region forward and then backward in time and compare the resulting response maps (core operation sketched below)
■ The forward-backward discrepancy is penalized with an L2 cycle-consistency loss
■ Appeared on arXiv around the same time as CycleTime
N. Wang, Y. Song, C. Ma, W. Zhou, W. Liu, H. Li, “Unsupervised Deep Tracking,” In CVPR, 2019.
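The core operation of such a tracker is a cross-correlation between a template and a search region. The sketch below shows only that operation; UDT's actual tracker is a discriminative correlation filter solved in the Fourier domain, and the forward-backward consistency loss is only indicated in the comments.

```python
import torch.nn.functional as F

def response_map(search_feat, template_feat):
    """
    Cross-correlation response of a template over a search region.
    search_feat:   (B, C, H, W)
    template_feat: (B, C, h, w)
    Returns (B, 1, H-h+1, W-w+1).
    """
    B, C, H, W = search_feat.shape
    h, w = template_feat.shape[-2:]
    x = search_feat.reshape(1, B * C, H, W)
    k = template_feat.reshape(B * C, 1, h, w)
    out = F.conv2d(x, k, groups=B * C)               # correlate each sample with its own template
    return out.view(B, C, *out.shape[-2:]).sum(dim=1, keepdim=True)

# training idea (sketch): track forward to frame t+1, crop a new template at the
# response peak, track back to frame t, and penalize the L2 distance between the
# backward response map and a Gaussian label centered at the original location.
```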

Slide 20

mgPFF [Kong+, ’19]: frame reconstruction with filter flow as the proxy task
■ Filter flow: every pixel of the target frame is reconstructed as a weighted combination of source-frame pixels, using a filter predicted per pixel
■ Optical flow is a special case of filter flow (a one-hot filter corresponds to pure displacement)
S. Kong, C. Fowlkes, “Multigrid Predictive Filter Flow for Unsupervised Learning on Videos,” In arXiv:1904.01693, 2019.

Slide 21

mgPFF [Kong+, ’19]: method (reconstruction step sketched below)
■ Predicting a full filter flow is expensive, so filters are restricted to a small window (11×11) and predicted coarse-to-fine on a multigrid pyramid
■ Training losses
  ■ Reconstruction loss: Charbonnier function
  ■ Forward-backward flow consistency: Charbonnier function between the forward and backward flows
  ■ Smoothness constraints: L1 on the flow gradients
  ■ Sparsity constraints: L1 on the filters
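A sketch of how a predicted filter flow reconstructs a frame, together with the Charbonnier penalty. The filters are assumed non-negative and normalized per pixel (e.g. by a softmax); the multigrid coarse-to-fine prediction and the remaining regularizers are not shown.

```python
import torch
import torch.nn.functional as F

def apply_filter_flow(src, filters, k=11):
    """
    src:     (B, C, H, W) source frame.
    filters: (B, k*k, H, W) per-pixel filters predicted by the network.
             Each output pixel is a weighted combination of a k x k
             neighborhood of the source frame; a one-hot filter reduces
             to optical-flow-style warping.
    """
    B, C, H, W = src.shape
    patches = F.unfold(src, kernel_size=k, padding=k // 2)      # (B, C*k*k, H*W)
    patches = patches.view(B, C, k * k, H * W)
    weights = filters.view(B, 1, k * k, H * W)
    out = (patches * weights).sum(dim=2)                        # (B, C, H*W)
    return out.view(B, C, H, W)

def charbonnier(x, eps=1e-3):
    """Charbonnier penalty, a differentiable approximation of the L1 loss."""
    return torch.sqrt(x * x + eps * eps).mean()

# reconstruction term of the training objective (sketch):
# rec_loss = charbonnier(apply_filter_flow(frame_t, predicted_filters) - frame_t1)
```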

Slide 22

mgPFF [Kong+, ’19]: results
■ Evaluated on VOS and other dense tracking tasks
■ Competitive with other self-supervised correspondence methods

Slide 23

mgPFF [Kong+, ’19]: qualitative results (figure)

Slide 24

CorrFlow [Lai+, BMVC’19]: improvements over Vondrick et al.’s video colorization
■ Colour dropout: randomly drop color channels during training so the network cannot rely on a single channel → avoids trivial colorization shortcuts
■ Restricted attention: restrict the matching between reference frame and target frame to a local window around each pixel → less memory and fewer spurious long-range matches (see the sketch below)
■ Scheduled sampling: progressively replace ground-truth reference frames with the model’s own predictions during training → reduces drift at test time
■ Cycle consistency: propagate forward and then backward in time and penalize the inconsistency → more stable long-term correspondence
Colors are quantized in Lab space and trained with a cross entropy loss (as in Vondrick et al.).
Z. Lai, W. Xie, “Self-supervised Learning for Video Correspondence Flow,” In BMVC, 2019.
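The restricted-attention variant of the copy mechanism can be sketched as below; the window radius and temperature are illustrative, and this is not the authors' implementation.

```python
import torch.nn.functional as F

def restricted_attention_propagate(ref_feat, tgt_feat, ref_values, radius=6, temperature=0.07):
    """
    Same copy mechanism as the global version, but each target pixel only
    attends to a (2*radius+1)^2 window around the same location in the
    reference frame, which cuts memory and removes far-away false matches.
    ref_feat, tgt_feat: (B, C, H, W); ref_values: (B, D, H, W)
    """
    B, C, H, W = ref_feat.shape
    k = 2 * radius + 1
    ref_patches = F.unfold(ref_feat, k, padding=radius).view(B, C, k * k, H * W)
    val_patches = F.unfold(ref_values, k, padding=radius).view(B, -1, k * k, H * W)
    tgt = tgt_feat.flatten(2).unsqueeze(2)                      # (B, C, 1, H*W)
    sim = (ref_patches * tgt).sum(dim=1) / temperature          # (B, k*k, H*W)
    attn = sim.softmax(dim=1)
    out = (val_patches * attn.unsqueeze(1)).sum(dim=2)          # (B, D, H*W)
    return out.view(B, -1, H, W)
```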

Slide 25

CorrFlow [Lai+, BMVC’19]: results
■ Ablation study: each of the proposed components (colour dropout, restricted attention, scheduled sampling, cycle consistency) contributes to accuracy
■ Outperforms earlier self-supervised correspondence methods on VOS

Slide 26

UVC [Li+, NeurIPS’19]: joint learning of region-level and pixel-level correspondence
■ Learns object-level (box) tracking and fine-grained pixel correspondence jointly, so that the two tasks regularize each other (conceptually, CycleTime-style tracking combined with colorization-style reconstruction)
X. Li, S. Liu, S. D. Mello, X. Wang, J. Kautz, M.-H. Yang, “Joint-task Self-supervised Learning for Temporal Correspondence,” In NeurIPS, 2019.

Slide 27

UVC [Li+, NeurIPS’19]: method (tracking combined with colorization-style reconstruction)
■ Region-level: the target region is localized in the next frame through the learned affinity (box-level tracking)
■ Pixel-level: colors/features inside the region are reconstructed from the reference frame through the same affinity
■ Ablation studies examine the contribution of each component

Slide 28

UVC [Li+, NeurIPS’19]: training losses (simplified sketches below)
■ Reconstruction loss on the propagated colors/features (MSE)
■ Concentration loss: pixels belonging to the tracked region should map to a compact region in the next frame → penalizes spatially scattered matches
■ Orthogonal loss: the forward and backward affinities should invert each other, i.e. a cycle-consistency constraint on the affinity matrices (MSE form)
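Simplified renderings of the two regularizers, assuming feature maps of shape (B, C, H, W). In the paper the concentration loss acts on the pixels of the tracked patch and the orthogonality constraint is formulated on the affinity used for color propagation, so the versions below only convey the idea; the temperature is illustrative.

```python
import torch

def affinity(f1, f2, temperature=0.07):
    """Row-stochastic affinity from every pixel of f1 to every pixel of f2."""
    a = torch.einsum('bci,bcj->bij', f1.flatten(2), f2.flatten(2)) / temperature
    return a.softmax(dim=-1)                       # (B, N1, N2)

def orthogonal_loss(f1, f2):
    """Forward-then-backward affinity should behave like the identity
    (a cycle-consistency constraint expressed on the affinity matrices)."""
    A12, A21 = affinity(f1, f2), affinity(f2, f1)
    AA = torch.bmm(A12, A21)                       # (B, N1, N1)
    eye = torch.eye(AA.size(1), device=AA.device).expand_as(AA)
    return ((AA - eye) ** 2).mean()

def concentration_loss(f1, f2, h, w):
    """Pixels matched from frame 1 should land on a compact region of frame 2:
    penalize the spatial variance of each pixel's attention distribution."""
    A = affinity(f1, f2)                           # (B, N1, N2)
    ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                            torch.arange(w, dtype=torch.float32), indexing='ij')
    coords = torch.stack([xs, ys], dim=-1).view(-1, 2).to(A.device)   # (N2, 2)
    mean = A @ coords                              # (B, N1, 2) expected location
    sq_mean = A @ (coords ** 2)                    # (B, N1, 2) expected squared location
    var = (sq_mean - mean ** 2).sum(dim=-1)        # per-pixel spatial variance
    return var.mean()
```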

Slide 29

UVC [Li+, NeurIPS’19]: results
■ Ablation study over the localization module, orthogonal loss, and concentration loss
■ Evaluated on DAVIS-2017; outperforms previous self-supervised methods

Slide 30

UVC [Li+, NeurIPS’19]: further ablation results (figure)

Slide 31

UVC [Li+, NeurIPS’19]: comparison with other self-supervised dense tracking methods (figure)

Slide 32

MAST [Lai+, CVPR’20]: a memory-augmented self-supervised tracker (by the CorrFlow authors)
■ Revisits the design choices of self-supervised dense tracking and closes much of the gap to supervised methods
■ Key ingredients: a reconstruction objective in Lab space with channel dropout, and a memory of past frames used as references during propagation
■ Substantially outperforms previous self-supervised methods on VOS
Z. Lai, E. Lu, W. Xie, “MAST: A Memory-Augmented Self-Supervised Tracker,” In CVPR, 2020.

Slide 33

MAST [Lai+, CVPR’20]: choice of reconstruction target
■ Reconstructing RGB directly lets the CNN fall back on trivial color copying → instead, the input color channels are partially dropped so that reconstruction requires real correspondences (sketch below)
■ Lab color space decorrelates the channels better than RGB, so dropout is applied to Lab channels
■ Regression with a Huber loss is used instead of classification over quantized colors
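A sketch of the channel-dropout-plus-Huber idea; the exact dropout schedule in MAST differs, and the helper name and probability here are mine.

```python
import torch
import torch.nn as nn

huber = nn.SmoothL1Loss()   # Huber loss used as the color-regression objective

def drop_lab_channels(lab_frames, p=0.5):
    """
    lab_frames: (B, T, 3, H, W) video clip already converted to Lab.
    Zero out each of the three Lab channels independently with probability p,
    consistently over the whole clip, so the network cannot simply copy the
    input color and must rely on learned correspondences.
    """
    B = lab_frames.size(0)
    keep = (torch.rand(B, 1, 3, 1, 1, device=lab_frames.device) > p).float()
    return lab_frames * keep

# training sketch: predict the (un-dropped) Lab colors of the target frame from a
# reference frame via the learned affinity, then loss = huber(predicted_lab, target_lab)
```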

Slide 34

MAST [Lai+, CVPR’20]: restricted attention with dynamic ROIs
■ CorrFlow restricts the matching window to a fixed location, which breaks down under large motion
■ MAST first localizes a region of interest (ROI): a coarse response map against the reference frame is reduced with a soft-argmax to obtain the ROI center (sketch below)
■ The reference features inside the ROI are then extracted with bilinear sampling, and fine matching/propagation is performed only within the ROI
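The two operations, soft-argmax localization and bilinear ROI sampling, can be sketched as follows; the resolutions and the way the coarse response map is produced are left out.

```python
import torch
import torch.nn.functional as F

def soft_argmax_2d(response):
    """response: (B, H, W) similarity map -> expected (x, y) in pixel coordinates."""
    B, H, W = response.shape
    prob = response.flatten(1).softmax(dim=-1).view(B, H, W)
    ys = torch.arange(H, dtype=torch.float32, device=response.device)
    xs = torch.arange(W, dtype=torch.float32, device=response.device)
    y = (prob.sum(dim=2) * ys).sum(dim=1)           # expectation over rows
    x = (prob.sum(dim=1) * xs).sum(dim=1)           # expectation over columns
    return torch.stack([x, y], dim=-1)              # (B, 2)

def crop_roi(feat, center_xy, roi_size):
    """Bilinearly sample a roi_size x roi_size window of feat around center_xy."""
    B, C, H, W = feat.shape
    r = (roi_size - 1) / 2.0
    offsets = torch.linspace(-r, r, roi_size, device=feat.device)
    dy, dx = torch.meshgrid(offsets, offsets, indexing='ij')
    grid_x = center_xy[:, 0].view(B, 1, 1) + dx     # (B, roi, roi)
    grid_y = center_xy[:, 1].view(B, 1, 1) + dy
    grid = torch.stack([grid_x / (W - 1) * 2 - 1,   # normalize to [-1, 1]
                        grid_y / (H - 1) * 2 - 1], dim=-1)
    return F.grid_sample(feat, grid, align_corners=True)   # (B, C, roi, roi)
```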

Slide 35

MAST [Lai+, CVPR’20]: memory-augmented propagation
■ Uses I_0 and I_5 (long term memory) plus I_{t-5}, I_{t-3}, I_{t-1} (short term memory) as reference frames (helper sketch below)
■ Propagating from multiple reference frames makes the tracker robust to occlusion and drift, without any per-video fine-tuning
■ Unlike single-reference propagation (as in UVC), errors from any one reference frame can be corrected by the others
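A small helper, with an invented name, showing which frame indices serve as references at time t; the features and values of these frames are then concatenated along the pixel axis and fed to the same attention-based copy as before.

```python
def memory_indices(t, long_term=(0, 5), short_term=(5, 3, 1)):
    """Frame indices used as references at time t: I_0 and I_5 as long-term
    memory plus I_{t-5}, I_{t-3}, I_{t-1} as short-term memory (duplicates
    and negative indices are removed for the first few frames)."""
    idx = [i for i in long_term if i < t] + [t - d for d in short_term if t - d >= 0]
    return sorted(set(idx))

# e.g. memory_indices(20) -> [0, 5, 15, 17, 19]
```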

Slide 36

MAST [Lai+, CVPR’20]: ablation study
■ Lab color space with channel dropout outperforms RGB reconstruction
■ Long term memory contributes a clear gain
■ Increasing the number of reference frames improves accuracy up to a point

Slide 37

MAST [Lai+, CVPR’20]: results
■ Large improvement over previous self-supervised methods, approaching supervised trackers such as STM
■ On Youtube-VOS, the drop on unseen categories is smaller than for supervised methods, suggesting a smaller generalization gap

Slide 38

MuG [Lu+, CVPR’20]: learning VOS from unlabeled videos with multi-granularity cues (coarse object evidence comes from a saliency model and CAM)
■ Object/instance-level zero-shot VOS (Z-VOS): segment the primary object(s) without any test-time mask
■ One-shot VOS (O-VOS): the first-frame mask is given and propagated
X. Lu, W. Wang, J. Shen, Y-W. Tai, D. Crandall, S. C. H. Hoi, “Learning Video Object Segmentation from Unlabeled Videos,” In CVPR, 2020.

Slide 39

MuG [Lu+, CVPR’20]: a fully convolutional network (FCN) trained with losses at four granularities
■ Frame granularity analysis: the foreground prediction is supervised with pseudo labels from a saliency map or CAM using a cross entropy loss
■ Short-term granularity analysis: a cycle-consistency loss in the spirit of Unsupervised Deep Tracking [Wang+, CVPR’19] enforces consistency between neighboring frames
■ Long-term granularity analysis: correspondences over longer temporal ranges are also encouraged to be consistent
■ Video granularity analysis: predictions across the whole video should agree on what the foreground object is

Slide 40

MuG [Lu+, CVPR’20]: inference
■ Object-level zero-shot VOS: the learned network directly predicts the foreground mask for each frame
■ Instance-level zero-shot VOS: instance proposals from Mask R-CNN are refined with GrabCut and linked across frames using the learned correspondences

Slide 41

Space-Time Correspondence as a Contrastive Random Walk [Jabri+, ’20]
■ Represents a video as a space-time graph whose nodes are image patches; correspondence becomes a random walk on this graph
■ Combines the cycle-consistency constraint with contrastive learning
■ Achieves SoTA among self-supervised methods on VOS, pose tracking, and video part segmentation
A. Jabri, A. Owens, A. A. Efros, “Space-Time Correspondence as a Contrastive Random Walk,” In arXiv:2006.14613, 2020.

Slide 42

Space-Time Correspondence as a Contrastive Random Walk [Jabri+, ’20]: the walker moves between patches of frames t, t+1, …, t+k; each step is a stochastic transition given by a softmax over patch affinities (figure)

Slide 43

Space-Time Correspondence as a Contrastive Random Walk [Jabri+, ’20]: training (loss sketched below)
■ Walk forward t → t+k and back to t; the cycle-consistency loss is the cross entropy between the chained transition probabilities and the identity (each node should return to itself)
■ Edge dropout: randomly drop edges (affinity entries) during the walk → encourages more diverse correspondences and acts as regularization
■ Test-time training: the model can be further adapted on the test video with the same self-supervised objective
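A sketch of the palindrome-walk loss: per-step stochastic transition matrices are chained forward and backward, and each node is required to return to itself. Edge dropout (randomly zeroing affinity entries before re-normalizing) and the test-time training procedure are omitted; the temperature is illustrative.

```python
import torch
import torch.nn.functional as F

def transition(a, b, temperature=0.07):
    """Row-stochastic transition matrix between two sets of node embeddings.
    a: (B, N, C), b: (B, M, C), assumed L2-normalized per node."""
    sim = torch.bmm(a, b.transpose(1, 2)) / temperature
    return sim.softmax(dim=-1)

def palindrome_walk_loss(node_feats):
    """
    node_feats: list over time of (B, N, C) patch embeddings for frames t..t+k.
    Walk t -> t+1 -> ... -> t+k -> ... -> t and require each node to return to
    itself: cross entropy between the chained transition matrix and the identity.
    """
    path = node_feats + node_feats[-2::-1]          # palindrome through time
    A = transition(path[0], path[1])
    for u, v in zip(path[1:-1], path[2:]):
        A = torch.bmm(A, transition(u, v))          # chain the per-step transitions
    B_, N, _ = A.shape
    target = torch.arange(N, device=A.device).expand(B_, N)
    return F.nll_loss(torch.log(A + 1e-8).flatten(0, 1), target.flatten())
```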

Slide 44

Space-Time Correspondence as a Contrastive Random Walk [Jabri+, ’20]: results on DAVIS-2017 (table)

Slide 45

Space-Time Correspondence as a Contrastive Random Walk [Jabri+, ’20]
■ Project page with qualitative results: https://ajabri.github.io/videowalk/

Slide 46

Summary: self-supervised approaches to dense tracking
■ Video colorization
  ■ Learn correspondences by copying colors from reference frames to a gray-scale target frame
  ■ Later work improves the color space, channel dropout, restricted attention, and the set of reference frames used for propagation
■ Cycle-consistency learning
  ■ Learn correspondences by requiring forward-backward tracking through time to return to the starting point
  ■ Applies at the patch level as well as densely, and combines naturally with contrastive learning
■ Outlook
  ■ Self-supervised dense tracking has improved rapidly and the gap to supervised methods keeps shrinking
  ■ In the long run, self-supervised correspondence learning may even overtake supervised training (personal impression)

Slide 47

Mobility Technologies Co., Ltd.