
Case Studies of Dense Tracking Using Unlabeled Data / Learning Dense Tracking from Unlabeled Videos

Naoki Kato
August 04, 2020

Slides presented at an internal company study session.

This deck surveys research on learning pixel-level tracking (dense tracking) from unlabeled videos.

In recent years, unsupervised methods whose performance rivals that of supervised ones have been proposed for Video Object Segmentation, a major downstream application.

Papers covered:

- Tracking Emerges by Colorizing Videos, ECCV'18
https://arxiv.org/abs/1806.09594

- Learning Correspondence from the Cycle-consistency of Time, CVPR'19
https://arxiv.org/abs/1903.07593

- Unsupervised Deep Tracking, CVPR'19
https://arxiv.org/abs/1904.01828

- Multigrid Predictive Filter Flow for Unsupervised Learning on Videos, '19
https://arxiv.org/abs/1904.01693

- Self-supervised Learning for Video Correspondence Flow, BMVC'19
https://arxiv.org/abs/1905.00875

- Joint-task Self-supervised Learning for Temporal Correspondence, NeurIPS'19
https://arxiv.org/abs/1909.11895

- MAST: A Memory-Augmented Self-Supervised Tracker, CVPR'20
https://arxiv.org/abs/2002.07793

- Learning Video Object Segmentation from Unlabeled Videos, CVPR'20
https://arxiv.org/abs/2003.05020

- Space-Time Correspondence as a Contrastive Random Walk, '20
https://arxiv.org/abs/2006.14613

Transcript

  1. Dense Tracking from Unlabeled Videos (title slide) — AI, Mobility Technologies Co., Ltd.
  2. Topic: pixel-level tracking (= dense tracking) learned with self-supervised methods from unlabeled videos.
  3. Applications of dense tracking: Video Object Segmentation, texture tracking, and pose tracking (closely related to semantic segmentation and pose estimation).
  4. Video Object Segmentation (VOS): segment the target object in every frame of a video. S. Caelles, K.-K. Maninis, J. Pont-Tuset, L. Leal-Taixé, D. Cremers, and L. V. Gool, "One-shot video object segmentation," In CVPR, 2017.
  5. Benchmark: DAVIS-2017 (150 videos); the ground-truth mask of the first frame is given. Metrics: region overlap (IoU between the predicted and ground-truth masks; a minimal sketch follows) and contour accuracy.
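For concreteness, here is a minimal sketch of the region-overlap metric mentioned above (the DAVIS J measure is this IoU averaged over frames). Boolean NumPy masks and per-frame evaluation are assumptions of this sketch, not the official DAVIS toolkit.

```python
import numpy as np

def region_overlap_iou(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """Region overlap (IoU) between a predicted and a ground-truth binary mask."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:                      # both masks empty: count as a perfect match
        return 1.0
    intersection = np.logical_and(pred, gt).sum()
    return float(intersection) / float(union)
```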
  6. Supervised approaches to VOS:
    ▪ Propagation-based approach [Hu+, '17] [Voigtlaender+, '19]: propagate the mask from frame to frame, e.g. using optical flow or metric-learned embeddings.
    ▪ Detection/segmentation-based approach [Caelles+, '17] [Luiten+, '18]: detect/segment the target object in each frame, often fine-tuning the network on the annotated first frame.
    Y.-T. Hu, J.-B. Huang, and A. G. Schwing, "MaskRNN: Instance level video object segmentation," In NIPS, 2017. P. Voigtlaender, Y. Chai, F. Schroff, H. Adam, B. Leibe, and L.-C. Chen, "FEELVOS: Fast end-to-end embedding learning for video object segmentation," In CVPR, 2019. S. Caelles, K.-K. Maninis, J. Pont-Tuset, L. Leal-Taixé, D. Cremers, and L. V. Gool, "One-shot video object segmentation," In CVPR, 2017. J. Luiten, P. Voigtlaender, and B. Leibe, "PReMVOS: Proposal-generation, refinement and merging for video object segmentation," In ACCV, 2018.
  7. Motivation: these supervised methods rely on densely annotated video, which is expensive to collect, so the rest of the talk focuses on learning dense tracking from unlabeled videos.
  8. Video Colorization [Vondrick+, ECCV'18]: video colorization is used as a proxy task; to colorize a grayscale frame by copying colors from another frame, the network must learn correspondences, so tracking emerges without labels. C. Vondrick, A. Shrivastava, A. Fathi, S. Guadarrama, K. Murphy, "Tracking Emerges by Colorizing Videos," In ECCV, 2018.
  9. Method: for every pixel of the grayscale input frame, compute feature similarity to the pixels of a reference frame and predict its color as the similarity-weighted (attention) combination of the reference colors; at test time the same attention propagates segmentation masks instead of colors.
  10. Loss: colors are quantized and predicted with a cross-entropy loss, since colorization is multimodal [Zhang+, '16]; each target pixel's predicted distribution is the attention-weighted sum of the reference pixels' color labels (sketched below). R. Zhang, P. Isola, A. A. Efros, "Colorful Image Colorization," In ECCV, 2016.
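A minimal sketch of the colorization-as-proxy mechanism described on slides 9-10 (not the authors' code; the embedding network, feature shapes and temperature are assumptions). Colors of the reference frame are copied through an attention map computed from per-pixel embeddings, and training uses cross-entropy against the quantized colors of the target frame; at test time `labels_ref` can be a one-hot segmentation mask instead.

```python
import torch
import torch.nn.functional as F

def propagate_labels(feat_ref, feat_tgt, labels_ref, temperature=0.1):
    """feat_*: (C, H, W) per-pixel embeddings; labels_ref: (K, H, W) one-hot
    quantized colors (or a one-hot segmentation mask at test time).
    Returns (K, H, W) per-pixel label distributions for the target frame."""
    C, H, W = feat_ref.shape
    f_ref = F.normalize(feat_ref.reshape(C, -1), dim=0)   # (C, N_ref)
    f_tgt = F.normalize(feat_tgt.reshape(C, -1), dim=0)   # (C, N_tgt)
    affinity = f_tgt.t() @ f_ref                          # (N_tgt, N_ref)
    attn = F.softmax(affinity / temperature, dim=1)       # target pixels attend to reference pixels
    lab = labels_ref.reshape(labels_ref.shape[0], -1)     # (K, N_ref)
    out = lab @ attn.t()                                  # (K, N_tgt)
    return out.reshape(-1, H, W)

def colorization_loss(feat_ref, feat_tgt, color_ref_onehot, color_tgt_index):
    """color_tgt_index: (H, W) long tensor of quantized target-frame colors."""
    pred = propagate_labels(feat_ref, feat_tgt, color_ref_onehot)   # (K, H, W)
    log_prob = torch.log(pred.clamp_min(1e-8)).unsqueeze(0)         # (1, K, H, W)
    return F.nll_loss(log_prob, color_tgt_index.unsqueeze(0))       # cross-entropy
```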
  11. Experimental setup: trained on Kinetics videos; the model is a ResNet-18 with 3D convolutions; colors are the quantized ab channels of Lab space. Evaluated on DAVIS-2017 video object segmentation against optical-flow-based propagation baselines.
  12. Qualitative results of Video Colorization.
  13. CycleTime [Wang+, CVPR'19]: uses the cycle-consistency of time as the proxy task: a patch is tracked backward through time and then forward again, and the model is penalized if it does not return to its starting point; the learned features transfer to dense tracking. X. Wang, A. Jabri, A. A. Efros, "Learning Correspondence from the Cycle-consistency of Time," In CVPR, 2019.
  14. Architecture: a ResNet-50 encoder produces patch and frame features; tracking between frames is implemented as a differentiable operation on their affinity, so the whole cycle can be trained end to end.
  15. Tracker: unlike Vondrick et al.'s pointwise attention, a small localizer head (conv ×2 + linear) regresses the xy position of the tracked patch from the affinity between patch and frame features.
  16. Losses: an MSE loss on the cycle (the tracked patch should come back to where it started; sketched below), applied over cycles of multiple lengths.
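A minimal sketch of the temporal cycle-consistency objective from slides 13 and 16. The differentiable `track` function returning matched coordinates is an assumption standing in for CycleTime's affinity-plus-localizer head, and the real method additionally mixes cycles of several lengths.

```python
import torch

def cycle_consistency_loss(track, frames_feat, start_xy):
    """track: differentiable function (feat_src, feat_dst, xy) -> xy in dst frame.
    frames_feat: list of per-frame feature maps [f_t, f_{t-1}, ..., f_{t-k}].
    start_xy: (B, 2) patch coordinates in frame t."""
    xy = start_xy
    # backward in time: t -> t-1 -> ... -> t-k
    for src, dst in zip(frames_feat[:-1], frames_feat[1:]):
        xy = track(src, dst, xy)
    # forward again: t-k -> ... -> t
    for src, dst in zip(reversed(frames_feat[1:]), reversed(frames_feat[:-1])):
        xy = track(src, dst, xy)
    # MSE between the starting point and the point after the round trip
    return torch.mean((xy - start_xy) ** 2)
```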
  17. Experiments: trained on the unlabeled VLOG dataset; evaluated on video object segmentation, pose propagation and related tasks, where it compares favorably with video colorization.
  18. Unsupervised Deep Tracking [Wang+, CVPR'19]: self-supervised visual tracking at the bounding-box level; a Siamese correlation-filter tracker produces a response map, and training minimizes a forward-backward cycle-consistency loss (L2) on it. Appeared on arXiv around the same time as CycleTime. N. Wang, Y. Song, C. Ma, W. Zhou, W. Liu, H. Li, "Unsupervised Deep Tracking," In CVPR, 2019.
  19. mgPFF [Kong+, '19]: models the motion between frames as a filter flow, i.e. a per-pixel filter that reconstructs one frame from the other; learning it from unlabeled video yields dense correspondences and can be seen as a generalization of optical flow. S. Kong, C. Fowlkes, "Multigrid Predictive Filter Flow for Unsupervised Learning on Videos," arXiv:1904.01693, 2019.
  20. To keep the per-pixel filters tractable, a multigrid (coarse-to-fine) scheme with small kernels (11×11) is used. Training losses (the reconstruction term is sketched below):
    ▪ Reconstruction error, measured with the Charbonnier function
    ▪ Forward-backward flow consistency (also Charbonnier)
    ▪ Smoothness constraints (L1)
    ▪ Sparsity constraints (L1)
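A minimal sketch of the Charbonnier penalty and of an unsupervised reconstruction loss built on it; the exponent, epsilon and mean reduction are assumptions rather than mgPFF's exact settings.

```python
import torch

def charbonnier(x, eps=1e-3, alpha=0.5):
    """Smooth, robust approximation of the L1 penalty: (x^2 + eps^2) ** alpha."""
    return (x * x + eps * eps) ** alpha

def reconstruction_loss(frame_tgt, frame_warped):
    """frame_warped: the target frame reconstructed from the source frame by
    applying the predicted per-pixel filters (filter flow)."""
    return charbonnier(frame_tgt - frame_warped).mean()
```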
  21. Results on VOS.
  22. CorrFlow [Lai+, BMVC'19]: revisits Vondrick et al.'s colorization framework and adds several improvements:
    ▪ Colour dropout: randomly drop input color channels so the model cannot rely on color shortcuts and must learn correspondences.
    ▪ Restricted attention: limit the attention between the reference and target frames to a local window, reducing computation and ambiguity.
    ▪ Scheduled sampling: during training, gradually replace ground-truth reference frames with the model's own predictions to reduce drift during recursive propagation.
    ▪ Cycle consistency: add a forward-backward consistency term.
    Colors are quantized in Lab space and trained with a cross-entropy loss, as in Vondrick et al. Z. Lai, W. Xie, "Self-supervised Learning for Video Correspondence Flow," In BMVC, 2019.
  23. Results and ablation study of the individual components.
  24. UVC [Li+, NeurIPS'19]: jointly learns region-level (box) localization and fine-grained pixel-level matching as mutually supporting self-supervised tasks (CycleTime, in contrast, tracks patches only). X. Li, S. Liu, S. D. Mello, X. Wang, J. Kautz, M.-H. Yang, "Joint-task Self-supervised Learning for Temporal Correspondence," In NeurIPS, 2019.
  25. Proxy task: a region is localized in the next frame and its contents are reconstructed from the reference frame through the pixel-level affinity, colorization-style, using Lab color.
  26. Additional regularizers (one is sketched below):
    ▪ Concentration loss: pixels of the tracked region, after matching into the next frame, should stay spatially concentrated (an MSE-style penalty on their spread).
    ▪ Orthogonal loss: the affinity should behave like an orthogonal transform, so that matching forward and then backward returns to the start (a cycle-consistency loss implemented with MSE).
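A minimal sketch of a concentration-style regularizer in the spirit of slide 26; the exact formulation in UVC may differ. This version penalizes the spread of matched pixel locations around their centroid.

```python
import torch

def concentration_loss(affinity, coords_ref):
    """affinity: (N_tgt, N_ref) row-stochastic matching weights;
    coords_ref: (N_ref, 2) pixel coordinates in the reference frame."""
    matched = affinity @ coords_ref             # (N_tgt, 2) expected match locations
    center = matched.mean(dim=0, keepdim=True)  # centroid of the matched region
    # penalize how far each matched pixel strays from the region centroid
    return ((matched - center) ** 2).sum(dim=1).mean()
```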
  27. Results: ablation study on DAVIS-2017 over the localization module, the orthogonal loss and the concentration loss.
  28. Further ablation study results.
  29. Qualitative dense tracking results of UVC.
  30. MAST [Lai+, CVPR'20]: by the CorrFlow authors; analyzes why self-supervised trackers still lag behind supervised ones and proposes a memory-augmented self-supervised tracker that substantially narrows the gap. Z. Lai, E. Lu, W. Xie, "MAST: A Memory-Augmented Self-Supervised Tracker," In CVPR, 2020.
  31. Reconstruction target: feeding RGB to the CNN and reconstructing RGB admits a trivial solution, so input and target must be decorrelated. MAST uses Lab color with input channel dropout and regresses real values with a Huber loss instead of classifying quantized colors.
  32. Attention: CorrFlow restricts attention to a fixed local ROI, which breaks under large motion. MAST first localizes the ROI: a coarse response map between the target pixel and the reference frame is computed, its soft-argmax (sketched below) gives the ROI center, and bilinear sampling extracts the ROI in which the fine attention is then computed.
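A minimal sketch of the soft-argmax localization step from slide 32; shapes, the temperature and the coordinate convention are assumptions, not MAST's exact implementation. A response map over the reference frame is turned into an expected (x, y) coordinate, around which an ROI can then be sampled, while staying differentiable.

```python
import torch
import torch.nn.functional as F

def soft_argmax_2d(response, temperature=1.0):
    """response: (B, H, W) similarity map. Returns (B, 2) expected (x, y) coords."""
    B, H, W = response.shape
    prob = F.softmax(response.reshape(B, -1) / temperature, dim=1).reshape(B, H, W)
    ys = torch.arange(H, dtype=prob.dtype, device=prob.device)
    xs = torch.arange(W, dtype=prob.dtype, device=prob.device)
    grid_y = ys.view(H, 1).expand(H, W)
    grid_x = xs.view(1, W).expand(H, W)
    x = (prob * grid_x).sum(dim=(1, 2))   # expected column index
    y = (prob * grid_y).sum(dim=(1, 2))   # expected row index
    return torch.stack([x, y], dim=1)     # differentiable, unlike a hard argmax
```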
  33. Memory: frames I0 and I5 serve as long-term memory and It-5, It-3, It-1 as short-term memory; all of them are used as reference frames when propagating labels, with no per-video fine-tuning at test time.
  34. Ablation study: Lab color space, long-term memory, hard vs. soft attention, and the choice of reference frames.
  35. Comparison with supervised methods: MAST greatly narrows the gap to supervised trackers such as STM, and on YouTube-VOS it shows a smaller generalization gap between seen and unseen categories.
  36. MuG [Lu+, CVPR'20]: learns video object segmentation itself from unlabeled videos, using a saliency model and CAM (class activation maps) as coarse sources of supervision. Two settings are addressed: object/instance-level zero-shot VOS (Z-VOS), where no mask is given at test time, and one-shot VOS (O-VOS), where the first-frame mask is given. X. Lu, W. Wang, J. Shen, Y-W. Tai, D. Crandall, S. C. H. Hoi, "Learning Video Object Segmentation from Unlabeled Videos," In CVPR, 2020.
  37. An FCN is trained with supervision at four granularities:
    ▪ Frame granularity analysis: saliency maps and CAM provide pixel-level pseudo labels, trained with a cross-entropy loss.
    ▪ Short-term granularity analysis: a cycle-consistency loss between nearby frames, as in Unsupervised Deep Tracking [Wang+, CVPR'19].
    ▪ Long-term granularity analysis: supervision from temporally distant frames of the same video.
    ▪ Video granularity analysis: video-level information as an additional constraint.
  38. Inference:
    ▪ Object-level zero-shot VOS: predicted directly by the trained network.
    ▪ Instance-level zero-shot VOS: instance proposals from Mask R-CNN and GrabCut, associated across frames using IoU and optical flow.
    ▪ One-shot VOS: the given first-frame mask is propagated.
    Results on DAVIS-2017 (O-VOS setting) are reported.
  39. Space-Time Correspondence as a Contrastive Random Walk [Jabri+, '20]: formulates correspondence learning as a random walk on a space-time graph of image patches, combining the cycle-consistency constraint with contrastive learning; achieves state-of-the-art self-supervised results on VOS, pose tracking and video part segmentation. A. Jabri, A. Owens, A. A. Efros, "Space-Time Correspondence as a Contrastive Random Walk," arXiv:2006.14613, 2020.
  40. Each frame is divided into patches, which form the nodes of the graph; transition probabilities between patches of consecutive frames are the softmax of their feature similarities.
  41. Training: the walker goes from frame t forward to t + k and back to t; the cycle-consistency loss is a cross-entropy that rewards returning to the starting patch (sketched below). Additional ingredients: edge dropout (dropout on graph edges) as a regularizer, and test-time training for further gains.
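A minimal sketch of the contrastive random walk objective from slides 40-41; the patch feature extractor, temperature and batch handling are assumptions. Transition matrices between consecutive frames are chained along the palindrome t → t+k → t, and a cross-entropy (NLL over the row-stochastic walk matrix) pulls each patch back to its own node.

```python
import torch
import torch.nn.functional as F

def transition(feat_a, feat_b, temperature=0.07):
    """feat_*: (N, D) L2-normalized patch embeddings of one frame.
    Returns the (N, N) row-stochastic matrix of walk probabilities a -> b."""
    return F.softmax(feat_a @ feat_b.t() / temperature, dim=1)

def contrastive_random_walk_loss(feats):
    """feats: list of (N, D) patch embeddings for frames t, t+1, ..., t+k."""
    feats = [F.normalize(f, dim=1) for f in feats]
    n = feats[0].shape[0]
    walk = torch.eye(n, device=feats[0].device)
    path = feats + feats[-2::-1]                      # palindrome: t ... t+k ... t
    for a, b in zip(path[:-1], path[1:]):
        walk = walk @ transition(a, b)                # chain transition matrices
    target = torch.arange(n, device=feats[0].device)  # each patch should return home
    return F.nll_loss(torch.log(walk.clamp_min(1e-8)), target)
```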
  42. Qualitative results; project page: https://ajabri.github.io/videowalk/
  43. Summary: two main families of self-supervised dense tracking were covered.
    ▪ Video colorization: learn correspondences by reconstructing the colors of a target frame from a reference frame; later work improved the reconstruction target, the attention and the use of reference frames (memory).
    ▪ Cycle-consistency learning: track forward and backward in time and require the tracker to return to its starting point.
    ▪ In terms of generalization, self-supervised methods can even surpass supervised ones, and the remaining accuracy gap keeps shrinking.