[ja] M2Det Description

Description of M2Det.
[Zhao et al. 2018] Qijie Zhao, Tao Sheng, Yongtao Wang, Zhi Tang, Ying Chen, Ling Cai
and Haibin Ling 2018. M2Det: A Single-Shot Object Detector based on Multi-Level
Feature Pyramid Network. In AAAI 2019.

Shunta Komatsu

April 17, 2019

Transcript

  1. Paper introduction: M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network

    Shunta Komatsu (Taniguchi Laboratory, B4), 2019/04/17
  2. Contents

    1 Introduction
    2 Overview
    3 Motivation
    4 Proposed method
    5 Results and discussion
    6 Conclusion
    7 Closing remarks
  3. Introduction
  4. Introduction: What kind of paper is this?

    • An object detection technique presented at AAAI-19.
    • Improves on SSD and YOLO in both processing speed and detection accuracy (SOTA).
    Figure: Deep learning object detection history. https://github.com/hoya012/deep_learning_object_detection
  5. Overview
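  6. Overview: Proposal of MLFPN

    • MLFPN: Multi-Level Feature Pyramid Network.
    • This is the basis of M2Det and the key point of the paper.
    • It obtained the benchmark results shown in the figure below (note 1).
    Figure: Speed (ms) vs. accuracy (mAP) on COCO test-dev.
    Note 1: The horizontal axis is inference time (= 1/fps). The vertical axis is the mAP obtained when the IoU threshold is varied from 0.5 to 0.95 in steps of 0.05.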
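The note on the plot axes can be made concrete with a little arithmetic. The following is a minimal sketch, assuming the per-threshold AP values are simply averaged; the AP numbers and the function names are hypothetical and are not taken from the paper or the slides.

```python
def iou_thresholds(start=0.5, stop=0.95, step=0.05):
    """IoU thresholds 0.50, 0.55, ..., 0.95 (10 values)."""
    n = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 2) for i in range(n)]


def averaged_map(ap_per_threshold):
    """Average the AP values obtained at each IoU threshold."""
    return sum(ap_per_threshold) / len(ap_per_threshold)


def fps_to_ms(fps):
    """Inference time per image in milliseconds (1/fps, expressed in ms)."""
    return 1000.0 / fps


thresholds = iou_thresholds()
# Hypothetical AP values, one per threshold (illustration only).
example_ap = [0.60, 0.57, 0.54, 0.50, 0.45, 0.40, 0.33, 0.25, 0.15, 0.05]
print(thresholds)
print(round(averaged_map(example_ap), 3))
print(fps_to_ms(25.0))
```

A detector running at 25 fps therefore sits at 40 ms on the horizontal axis.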
  7. Motivation
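  8. Motivation: Recent object detection architectures

    • Two approaches are used to build scale-invariant object detection models.
      1 Use an image pyramid (the same image resized to several scales).
      2 Use a feature pyramid (features taken from several intermediate layers of the model).
    • Image pyramids are memory-inefficient and lack real-time performance.
    • Feature pyramids have been widely used because they save memory and fit well with many kinds of DNNs (e.g. SSD, FPN, STDN).
    Figure: Illustrations of four kinds of feature pyramids.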
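  9. Motivation: Drawbacks of feature pyramids

    • Feature pyramids have two drawbacks.
      1 The backbone networks used so far (e.g. VGG16) were designed for classification, so using their feature pyramid (almost) as-is may not extract features that are sufficient for detection.
      2 Existing pyramids use only single-level (low-complexity) information; also using higher-complexity features (note 2) should help detect more complex objects.
    • The goal of this paper is to build a multi-scale, multi-level feature pyramid that compensates for these drawbacks of existing methods.
    Note 2: In general, deep layers yield high-level (complex) features and shallow layers yield low-level (simple) features.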
  10. Proposed method
  11. Proposed method: Architecture of M2Det

    Figure: An overview of the proposed M2Det (320 × 320).
  12. Proposed method: Components of M2Det

    1 Backbone network (e.g. VGG16, ResNet-101).
    2 MLFPN.
      1 FFM (Feature Fusion Module): fuses several input features and outputs the result. MLFPN uses two kinds, FFMv1 and FFMv2.
      2 TUM (Thinned U-shape Module): outputs feature maps at different scales.
      3 SFAM (Scale-wise Feature Aggregation Module): concatenates the features of each scale and aggregates them into multi-level (note 3), multi-scale (note 4) features.
    3 Prediction layers.
    Note 3: Having a variety of complexities.
    Note 4: Having features at a variety of scales.
  13. Proposed method: Backbone network

    • The paper implements and evaluates VGG16 and ResNet-101.
    • These are architectures used for image classification; here they extract features from the input image.
    Figure: [Backbone network] An overview of the proposed M2Det (320 × 320).
  14. Proposed method: MLFPN (1) FFMv1

    Figure: [FFMv1] An overview of the proposed M2Det (320 × 320).
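  15. Proposed method: MLFPN (1) FFMv1

    • Extracts the base feature by fusing feature maps obtained from the backbone.
    • Consists of convolution layers that compress the channels of the input features and an upsampling layer that brings the deeper feature to the same scale (figure below).
    • (Supplement)
    Figure: Structural details of FFMv1.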
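The FFMv1 slide describes three steps: channel compression by convolution, upsampling of the deeper feature, and fusion. The following is a minimal pure-Python sketch of that flow, assuming fusion means channel-wise concatenation, that 1×1 convolutions without bias stand in for the module's convolution layers, and that nearest-neighbour upsampling is used. All weights, sizes, and function names are hypothetical.

```python
def conv1x1(x, weight):
    """1x1 convolution without bias: mixes channels at every pixel.
    x: [C_in][H][W], weight: [C_out][C_in]."""
    h, w = len(x[0]), len(x[0][0])
    return [[[sum(wc * x[c][i][j] for c, wc in enumerate(row))
              for j in range(w)] for i in range(h)] for row in weight]


def upsample_nearest(x, factor=2):
    """Nearest-neighbour upsampling of a [C][H][W] feature map."""
    return [[[row[j // factor] for j in range(len(row) * factor)]
             for row in ch for _ in range(factor)] for ch in x]


def concat_channels(a, b):
    """Concatenate two feature maps of equal spatial size along channels."""
    assert len(a[0]) == len(b[0]) and len(a[0][0]) == len(b[0][0])
    return a + b


def ffmv1(shallow, deep, w_shallow, w_deep):
    """Compress both inputs, upsample the deep one, then fuse them."""
    s = conv1x1(shallow, w_shallow)
    d = upsample_nearest(conv1x1(deep, w_deep), factor=2)
    return concat_channels(s, d)


# Hypothetical inputs: a shallow 4-channel 4x4 map and a deep 6-channel 2x2 map.
shallow = [[[float(c)] * 4 for _ in range(4)] for c in range(4)]
deep = [[[1.0, 2.0], [3.0, 4.0]] for _ in range(6)]
w_shallow = [[0.25] * 4 for _ in range(2)]  # 4 -> 2 channels
w_deep = [[0.5] * 6 for _ in range(3)]      # 6 -> 3 channels

base = ffmv1(shallow, deep, w_shallow, w_deep)
print(len(base), len(base[0]), len(base[0][0]))  # channels, height, width
```

The fused base feature has the spatial size of the shallow input and the sum of the two compressed channel counts.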
  16. Proposed method: MLFPN (2) TUM

    Figure: [TUM] An overview of the proposed M2Det (320 × 320).
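  17. Proposed method: MLFPN (2) TUM

    • An encoder-decoder structure that generates a multi-scale feature pyramid from the output of FFMv2 (for the first TUM, from the base feature).
    • Each decoder layer adds the corresponding encoder layer to the upsampled output of the previous decoder layer, then applies a convolution to produce its output.
    Figure: Structural details of TUM.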
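The encoder-decoder flow of TUM can be sketched on single-channel maps. This is a simplified illustration under stated assumptions: 2×2 average pooling stands in for the encoder's stride-2 convolutions, the convolution after each addition is omitted, and the input size must be divisible by 2 at every level. The function names are hypothetical.

```python
def downsample(x):
    """2x2 average pooling; a stand-in for a stride-2 convolution."""
    return [[(x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 4.0
             for j in range(0, len(x[0]), 2)]
            for i in range(0, len(x), 2)]


def upsample(x):
    """Nearest-neighbour upsampling by a factor of 2."""
    return [[row[j // 2] for j in range(2 * len(row))]
            for row in x for _ in range(2)]


def add(a, b):
    """Element-wise sum of two maps of equal size."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def tum(x, depth=2):
    """Return a multi-scale pyramid, largest resolution first."""
    # Encoder: progressively smaller feature maps.
    encoder = [x]
    for _ in range(depth):
        encoder.append(downsample(encoder[-1]))
    # Decoder: start from the smallest map; at each step add the matching
    # encoder map to the upsampled previous decoder output.
    decoder = [encoder[-1]]
    for enc in reversed(encoder[:-1]):
        decoder.append(add(enc, upsample(decoder[-1])))
    return decoder[::-1]


x = [[1.0] * 4 for _ in range(4)]
pyramid = tum(x, depth=2)
print([len(m) for m in pyramid])
```

Every decoder output is kept, so one TUM yields one feature map per scale rather than a single map.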
  18. Proposed method: MLFPN (3) FFMv2

    Figure: [FFMv2] An overview of the proposed M2Det (320 × 320).
  19. Proposed method: MLFPN (3) FFMv2

    • Combines the largest-resolution output of the previous TUM with the base feature coming from FFMv1, and passes the result to the next TUM.
    • Each additional stage makes it possible to extract deeper features.
    Figure: Structural details of FFMv2.
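How FFMv2 chains the TUMs together is easiest to see at the level of tensor shapes. The sketch below tracks only (channels, spatial size) pairs. It assumes the base feature is compressed before concatenation and that the first TUM sees only the base feature. The channel counts, sizes, and number of levels are illustrative assumptions, not values from the slides.

```python
def ffmv2(base, prev_largest, compressed_channels):
    """Shape-level FFMv2: compress the base feature, then concatenate it
    (along channels) with the largest-resolution output of the previous TUM."""
    (_, base_size), (prev_ch, prev_size) = base, prev_largest
    assert base_size == prev_size, "both inputs must share one spatial scale"
    return (compressed_channels + prev_ch, base_size)


def tum(feature, out_channels, sizes):
    """Shape-level TUM: one output map per scale, largest first."""
    assert feature[1] == sizes[0]
    return [(out_channels, s) for s in sizes]


def mlfpn_shapes(base, num_levels, out_channels, sizes):
    """Alternate FFMv2 and TUM, collecting one pyramid per level."""
    pyramids = [tum(base, out_channels, sizes)]  # first TUM: base feature only
    for _ in range(num_levels - 1):
        fused = ffmv2(base, pyramids[-1][0], compressed_channels=out_channels)
        pyramids.append(tum(fused, out_channels, sizes))
    return pyramids


# Illustrative numbers only.
pyramids = mlfpn_shapes(base=(768, 40), num_levels=3,
                        out_channels=128, sizes=[40, 20, 10])
for level, pyramid in enumerate(pyramids, start=1):
    print(level, pyramid)
```

Each level produces a pyramid of the same shape; what changes from level to level is how many TUMs the features have passed through.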
  20. Proposed method: MLFPN (4) SFAM

    Figure: [SFAM] An overview of the proposed M2Det (320 × 320).
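  21. Proposed method: MLFPN (4) SFAM

    • Aggregates the multiple feature pyramids generated by the TUMs.
    • Concatenates feature maps of the same scale along the channel dimension, compresses the information with global average pooling, and adaptively recalibrates the features of each channel (note 5).
    Figure: Structural details of SFAM.
    Note 5: This operation is known as an SE block (Hu et al. 2017).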
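The aggregation at a single scale can be sketched as a channel concatenation followed by an SE-style reweighting (squeeze by global average pooling, two fully connected layers, sigmoid, then channel-wise scaling). This is a minimal pure-Python sketch; the weights are hypothetical, biases are omitted, and the function names are not from the paper.

```python
import math


def global_average_pool(x):
    """x: [C][H][W] -> one mean value per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]


def dense(v, weight):
    """Fully connected layer without bias. weight: [out][in]."""
    return [sum(w * a for w, a in zip(row, v)) for row in weight]


def se_reweight(x, w1, w2):
    """Squeeze-and-excitation style channel recalibration."""
    z = global_average_pool(x)                     # squeeze
    hidden = [max(0.0, h) for h in dense(z, w1)]   # FC + ReLU
    scale = [1.0 / (1.0 + math.exp(-s)) for s in dense(hidden, w2)]  # FC + sigmoid
    return [[[scale[c] * v for v in row] for row in ch]
            for c, ch in enumerate(x)]


def sfam_one_scale(maps_from_each_tum, w1, w2):
    """Concatenate same-scale maps along channels, then reweight channels."""
    concatenated = [ch for m in maps_from_each_tum for ch in m]
    return se_reweight(concatenated, w1, w2)


# Two hypothetical TUM outputs at the same scale, one channel each.
a = [[[1.0, 1.0], [1.0, 1.0]]]
b = [[[2.0, 2.0], [2.0, 2.0]]]
w1 = [[0.5, 0.5]]       # 2 channels -> 1 hidden unit
w2 = [[1.0], [-1.0]]    # 1 hidden unit -> 2 channel weights
out = sfam_one_scale([a, b], w1, w2)
print(len(out), len(out[0]), len(out[0][0]))  # channels, height, width
```

The spatial size is unchanged; only the relative weight of each channel is adjusted, which lets the network emphasise the levels that are most useful at that scale.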
  22. Proposed method: Prediction layers

    • Performs object class prediction and location regression on the feature pyramid aggregated by SFAM.
    • Applies NMS (note 6) to further improve accuracy.
    Figure: [Prediction layers] An overview of the proposed M2Det (320 × 320).
    Note 6: NMS (Girshick et al. 2013), Soft-NMS (Bodla et al. 2017): methods that suppress overlapping regions predicted as the same class.
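Suppressing overlapping regions of the same class is commonly done with greedy non-maximum suppression. The following is a minimal sketch of that standard procedure for the boxes of a single class, not the authors' implementation; the boxes, scores, and threshold are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score and keep a box only if it
    does not overlap an already kept box by more than the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical detections for one class.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # indices of the boxes that survive
```

Here the second box overlaps the first with an IoU of 81/119 (about 0.68), so it is suppressed, while the distant third box is kept.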
  23. Results and discussion
  24. Results and discussion: Performance evaluation

    • Inference speed (horizontal axis) is fast and mean average precision (vertical axis) is high.
    • Single-scale inference gives mAP 41.0 and multi-scale inference gives mAP 44.2, achieving SOTA.
    Figure: (Reappeared) Speed (ms) vs. accuracy (mAP) on COCO test-dev.
  25. Conclusion
  26. Closing remarks
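  27. Closing remarks: Suggestion and impressions

    Suggestion
    ▶ YOLOv3 replaces the softmax function with the logistic function for class prediction (note 7).
    ▶ According to the YOLOv3 paper, using the logistic function makes it possible to handle datasets more complex than COCO or VOC, such as Open Images (Google) (note 8). Introducing the logistic function here might therefore lead to a more general (multi-label) interpretation.
    Impressions
    ▶ I learned how useful multi-level features are.
    ▶ Now that I understand the current SOTA object detection architecture to some extent, I would like to read the papers on the other architectures introduced in this paper.
    Note 7: The loss function is changed from MSE to binary cross-entropy for each label.
    Note 8: Open Images contains overlapping labels such as "person" and "woman".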
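The difference between softmax and per-label logistic outputs shows up as soon as two labels can both be true. A minimal sketch with hypothetical logits and labels (not values from YOLOv3 or M2Det):

```python
import math


def softmax(logits):
    """Probabilities that compete with each other and sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    """Logistic function: an independent probability for one label."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(probs, targets, eps=1e-12):
    """Mean binary cross-entropy over independent labels."""
    return -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
                for p, t in zip(probs, targets)) / len(probs)


labels = ["person", "woman", "car"]
logits = [3.0, 2.5, -4.0]   # hypothetical scores for one box
targets = [1, 1, 0]         # "person" and "woman" are both correct

soft = softmax(logits)
logi = [sigmoid(z) for z in logits]
for name, s, l in zip(labels, soft, logi):
    print(name, round(s, 3), round(l, 3))
print(round(binary_cross_entropy(logi, targets), 4))
```

With softmax the two overlapping labels share a single unit of probability, so "woman" falls below 0.5 even though its logit is high. With the logistic function both labels can exceed 0.5 at the same time, which is the multi-label behaviour the suggestion refers to.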
  28. Closing remarks: References I

    [Zhao et al. 2018] Qijie Zhao, Tao Sheng, Yongtao Wang, Zhi Tang, Ying Chen, Ling Cai and Haibin Ling 2018. M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network. In AAAI 2019.
    [Lin et al. 2016] Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan and Serge Belongie 2016. Feature Pyramid Networks for Object Detection. In CVPR 2017.
    [Zhou et al. 2018] Peng Zhou, Bingbing Ni, Cong Geng, Jianguo Hu and Yi Xu 2018. Scale-Transferrable Object Detection. In CVPR 2018.
    [Hu et al. 2017] Jie Hu, Li Shen, Samuel Albanie, Gang Sun and Enhua Wu 2017. Squeeze-and-Excitation Networks. In CVPR 2018.
    [Girshick et al. 2013] Ross Girshick, Jeff Donahue, Trevor Darrell and Jitendra Malik 2013. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR 2014.
  29. Closing remarks: References II

    [Bodla et al. 2017] Navaneeth Bodla, Bharat Singh, Rama Chellappa and Larry S. Davis 2017. Soft-NMS – Improving Object Detection With One Line of Code. In ICCV 2017.
    [Liu et al. 2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu and Alexander C. Berg 2016. SSD: Single Shot MultiBox Detector. In ECCV 2016.
    [Fu et al. 2017] Cheng-Yang Fu, Wei Liu, Ananth Ranga, Ambrish Tyagi and Alexander C. Berg 2017. DSSD: Deconvolutional Single Shot Detector. arXiv preprint, 2017.
    [Redmon and Farhadi 2018] Joseph Redmon and Ali Farhadi 2018. YOLOv3: An Incremental Improvement. arXiv preprint, 2018.
  30. Appendix: FFMv1 detail

    Figure: Additional description of FFMv1.
  31. Appendix: Experimental results

    Figure: (More detailed) Speed (ms) vs. accuracy (mAP) on COCO test-dev.
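  32. Appendix: What is U-Net

    • In a typical CNN, pooling blurs the overall position information of an object, which makes the network tolerant of positional shifts and differences in size.
    • Region extraction, on the other hand, needs both the local features and the overall position information of an object to be extracted from the image, so the local features blurred by the pooling layers must be restored accurately.
    • Upsampling (Unpooling): increases the dimensions of the features.
    • Merge: to enlarge the image while retaining the features, features of the same size are merged stage by stage, which makes it possible to restore the overall position information while keeping the local features.
    Figure: U-Net architecture. https://lp-tech.net/articles/5MIeh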