DeepCluster 論文の紹介

A61adad507d7bc0ee5b52ebe333abed7?s=47 Yuki Ishikawa
August 08, 2018

DeepCluster 論文の紹介

Facebook AI Research による論文「Deep Clustering
 for Unsupervised Learning
 of Visual Features」の解説資料
https://arxiv.org/abs/1807.05520

A61adad507d7bc0ee5b52ebe333abed7?s=128

Yuki Ishikawa

August 08, 2018
Tweet

Transcript

  1. Deep Clustering
 for Unsupervised Learning
 of Visual Features 2018.08.08 (ԭೄ)

    AI ؔ࿈࿦จಡΈձ #4
  2. @hoto17296 • ͪΎΒσʔλגࣜձࣾ • Web ԰ / Πϯϑϥ԰ / σʔλ෼ੳ԰

  3. ࠓճͷ࿦จɿ
 Deep Clustering for Unsupervised Learning
 of Visual Features •

    https://arxiv.org/abs/1807.05520 • Facebook AI Research • Accepted at ECCV 2018
  4. ֓ཁ • CNN Ͱը૾ͷΫϥελϦϯάΛ͢Δख๏ • CNN ͷग़ྗΛ k-means ͰΫϥελϦϯάͨ݁͠ՌΛ
 “ِϥϕϧ”

    ͱͯ͠ѻ͍ɺωοτϫʔΫͷॏΈΛߋ৽͢Δ • ͱͯ΋ྑ͍ੑೳ͕ग़ͨ • Pascal VOC ʹΑΔධՁͰଞͷΞϧΰϦζϜΛ௒͑Δੑೳ • ؤ݈ੑ͕͋Δ • σʔληοτΛม͑ͯ΋େৎ෉ (ImageNet → YFCC100) • ωοτϫʔΫߏ଄Λม͑ͯ΋େৎ෉ (AlexNet → VGG16) • k-means Ҏ֎ͷΫϥελϦϯάΞϧΰϦζϜͰ΋େৎ෉
  5. 1. എܠ

  6. ImageNet • (௒༗໊ͳ) ը૾σʔληοτ • 100ສຕΛ௒͑Δը૾ • 1000Ϋϥεʹϥϕϧ෇͚͕͞Ε͍ͯΔ • ը૾෼ྨΞϧΰϦζϜͷධՁͳͲͷ༻్Ͱ


    Α͘༻͍ΒΕΔ
  7. ImageNet ͷ՝୊ • “ͨͬͨͷ” 100ສຕ͔͠ͳ͍ • ਓؒͷ෇͚ͨϥϕϧʹҰ෦ޡΓ͕͋Δ ↓ ۙ೥ɺը૾෼ྨϞσϧͷੑೳ͕಄ଧͪʹͳ͍ͬͯΔͷ͸
 σʔληοτʹཁҼ͕͋Δ΋ͷͱߟ͑ΒΕ͍ͯΔ

  8. ImageNet ͷ՝୊ͷղܾࡦ • Πϯλʔωοτن໛ͷը૾σʔληοτ • ਓ͕ؒϥϕϧ෇͚͠ͳ͍ ↓ ڭࢣͳֶ͠शʹΑͬͯ͜ΕΛ࣮ݱ͍ͨ͠

  9. 2. લఏ

  10. ڭࢣ (͋Γ|ͳ͠) ֶश • ڭࢣ͋Γֶश (Supervised Learning) • ֶशσʔλʹڭࢣϥϕϧ͕෇͍͍ͯΔ •

    ෼ྨ΍ճؼͳͲ • ڭࢣͳֶ͠श (Unsupervised Learning) • ֶशσʔλʹڭࢣϥϕϧ͕෇͍͍ͯͳ͍ • ΫϥελϦϯά • AutoEncoder
  11. ࣗݾڭࢣ͋Γֶश (Self-Supervised Learning) • ڭࢣͳֶ͠शͷҰछ (ཁग़య) • ԿΒ͔ͷํ๏Ͱ “ِͷϥϕϧ” Λ༻ҙ͠ɺ


    ͦΕΛڭࢣϥϕϧͱݟֶཱͯͯशΛߦ͏
  12. 3. ख๏

  13. epoch ͷྲྀΕ 3. k-means ͰΫϥελϦϯά 1. ೖྗΛ CNN ͰϑΥϫʔυ 4.

    ΫϥελϦϯά݁ՌΛ
 “ِϥϕϧ” ͱͯ͠ޡࠩΛܭࢉ 5. ωοτϫʔΫͷॏΈΛߋ৽ 2. CNN ͷग़ྗ݁ՌΛ PCA Ͱѹॖ
  14. ٙ໰ ͏·͘ΫϥελϦϯάͰ͖ΔΘ͚ͳ͘ͳ͍ʁʁʁ ॳظঢ়ଶͰ͸ωοτϫʔΫΛશֶ͘शͤͯ͞ͳ͍ͷʹɺ

  15. Ͱ͖ΔΒ͍͠ 5IFHPPEQFSGPSNBODFPGSBOEPNDPOWOFUTJTJOUJNBUFMZUJFEUPUIFJS DPOWPMVUJPOBMTUSVDUVSFXIJDIHJWFTBTUSPOHQSJPSPOUIFJOQVUTJHOBM (ֶश͍ͤͯ͞ͳ͍) ϥϯμϜͳ CNN Ͱ͋ͬͯ΋ྑ͍ੑೳ͕ग़ͤΔͷ͸ɺ ೖྗ৴߸ʹڧ͍ࣄલ෼෍Λ༩͑Δ৞ΈࠐΈߏ଄ ͕ີ઀ʹؔ܎͍ͯ͠Δɻ ❓❓❓

  16. ผͷݚڀ [26] • ύϥϝʔλ͕ϥϯμϜͳ AlexNet ʹ
 ImageNet σʔληοτͰ෼ྨΛߦͬͨ • ग़ྗ΋ϥϯμϜʹͳΔͱ͢Ε͹ɺ


    (ImageNet ͸1000Ϋϥε෼ྨͳͷͰ)
 ਫ਼౓ͷظ଴஋͸ 0.1 %ͱͳΔ • ͔࣮͠͠ࡍʹ͸ɺظ଴஋Λང͔ʹ௒͑Δ
 12 %ͷਫ਼౓Λग़ͨ͠
  17. ͨͿΜ͜͏͍͏͜ͱ ύϥϝʔλ͕ϥϯμϜͰ͋ͬͯ΋ɺCNN ͷߏ଄ͦͷ΋ͷ͕
 ʮͳΜ͔ͦΕͬΆ͍஋Λग़ྗ͢ΔʯྗΛ͍࣋ͬͯΔ

  18. 4. ͦͷଞ͍Ζ͍Ζ

  19. ͍͔ͭ͘ͷ໰୊ͷճආ • શ෦ͻͱͭͷΫϥελʹೖͬͯ͠·͏໰୊ • ۭͷΫϥελ͕͋ͬͨ৔߹͸ॏ৺ΛҠಈͤͯ͞
 ΫϥελΛ࠶ܭࢉ͢Δ͜ͱͰղܾ • ِϥϕϧ਺͕ภΔ໰୊ • ڭࢣ͋ΓֶशͰϥϕϧͷ਺͕ภ͍ͬͯΔͱ͖ʹ


    ى͖Δͷͱಉ͡໰୊ • ِϥϕϧͷத͔ΒҰ༷ʹαϯϓϦϯάͯ͠
 ֶशͤ͞Δ͜ͱͰղܾ
  20. ࣮૷ͷৄࡉ (1/2) • CNN ʹ͸ඪ४తͳ AlexNet Λ༻͍ͨ • Local Response

    Normalization ૚͸
 Batch Normalization ૚ʹೖΕସ͑ͨ • ৭৘ใΛͦͷ··ѻ͏ͷ͕೉͍͠ • Sobel filter (※ ྠֲநग़) ʹجͮ͘ઢܗม׵ʹΑͬͯ
 ৭Λ࡟আ͠ίϯτϥετΛڧௐ͍ͯ͠Δ • ImageNet ͷը૾͸ Data Augmentation ͯ͠ೖྗͨ͠ • mini batch size ͸ 256 ʹͨ͠
  21. ࣮૷ͷৄࡉ (2/2) • 500 epoch Λֶश͢Δͷʹ P100 GPU Λ࢖ͬͯ
 12

    ೔͔͔ͬͨ • ࣮ߦ࣌ؒશମͷ 1/3 ͸ k-means ͷॲཧ࣌ؒ • ΫϥελϦϯά͢ΔલʹશσʔλΛ Forward ͢Δඞཁ͕
 ͋ΔͷͰͲ͏ͯ͠΋͕͔͔࣌ؒΔ
  22. 5. ༷ʑͳ࣮ݧɾߟ࡯

  23. ิ଍ɿਖ਼نԽ૬ޓ৘ใྔ (NMI) • Normalized Mutual Information • ͋ΔΫϥελϦϯά݁Ռ A ͱ


    ผͷΫϥελϦϯά݁Ռ B ͕
 ͲΕ͚ͩࣅ௨͍ͬͯΔ͔ΛදݱͰ͖Δ
  24. ImageNet ϥϕϧͱͷൺֱ • DeepCluster ʹΑΔΫϥελϦϯά݁Ռͱ
 ImageNet ͷϥϕϧͷ NMI ͷਪҠ •

    epoch ͕ਐΉʹͭΕͯ
 ࣅ௨ͬͯ͘Δ
  25. Ϋϥελͷ҆ఆੑ • ͋Δը૾͕ɺ࣍ͷ epoch Ͱ΋ಉ͡Ϋϥελʹ
 ׂΓ౰ͯΒΕΔׂ߹ (= ҆ఆੑ) • epoch

    ͕ਐΉʹͭΕ
 ҆ఆੑ͕૿͢ • 0.8 ҎԼͰ๞࿨͢Δ • ͦΕҎ্ͷֶश͸
 ҙຯ͕ͳ͍
  26. Ϋϥελ਺ʹΑΔੑೳͷҧ͍ • mAP ͱ͍͏ํ๏ (ʁ) Ͱ෼ྨੑೳΛܭଌͨ͠ • k = 10,000

    Ͱ࠷΋ੑೳ͕ྑ͔ͬͨ • ImageNet Ͱ͋Ε͹ k = 1,000 ͕
 ྑ͍ͷͰ͸ͳ͍͔ͱߟ͕͕͑ͪͩɺ
 ա৒ͳηάϝϯςʔγϣϯͷ
 ΄͏͕͍͍݁ՌΛग़ͨ͠
  27. ৭ͷআڈʹΑΔࣝผೳྗͷҧ͍ • Լͷը૾͸ɺCNN ͷ࠷ॳͷ૚ΛՄࢹԽͨ͠΋ͷ • ৭৘ใΛͦͷ··ೖྗͨ͠৔߹ (ࠨ) ͸ɺ
 ৭ʹؔ͢Δ৘ใ͔ࣝ͠ผ͍ͯ͠ͳ͍ •

    Sobel filter Ͱ৭৘ใΛม׵ͨ͠৔߹ (ӈ) ͸ɺ
 ΤοδΛࣝผ͍ͯ͠Δ
  28. CNN ͷ֤૚͝ͱͷߟ࡯ • Լͷը૾͸ɺ֤૚Ͱ࠷΋൓Ԡͷྑ͔ͬͨը૾ TOP 9 • ਂ͍૚ʹͳΔ΄Ͳେ͖ͳύλʔϯΛೝ͍ࣝͯ͠Δ (༧૝௨Γ) •

    ࠷ޙͷ૚ (conv5) ͸ɺલͷ૚·ͰͰೝࣝͨ͜͠ͱΛ
 ࠶౓ೝࣝ͠௚͍ͯ͠ΔΑ͏ʹ΋ݟ͑Δ • (AlexNet ʹ͓͍ͯ) ࠷ޙͷ૚ (conv5) ͸ଞͷ૚ͱ͸
 ಛ௃͕ҟͳΔͱ͍͏ผͷݚڀ݁Ռ [43] Λཪ෇͚͍ͯΔ
  29. ֤૚ͷ෼ྨੑೳ (1/3) • ্Ґ n ૚·Ͱͷग़ྗ͔Βઢܗ෼ྨثΛ࡞Δ • ImageNet ͱ Place

    σʔληοτͰͷ෼ྨੑೳΛධՁ͢Δ
  30. ֤૚ͷ෼ྨੑೳ (2/3)

  31. ֤૚ͷ෼ྨੑೳ (3/3) • DeepCluster ͸ߴ͍ϨΠϠͰͷੑೳ͕ྑ͍ • conv3 ͷੑೳ͕ͱͯ΋ྑ͍ • ͳΜͱ

    conv5 ΑΓ΋ྑ͍ • ҰํͰ conv1 ͷੑೳ͕શ͘ྑ͘ͳ͍ • DeepCluster Ͱ͸ɺconv3-conv4 Ͱ ImageNet ͷ
 ϥϕϧʹ૬౰͢Δ΋ͷΛೝ͍ࣝͯ͠ΔͷͰ͸ͳ͍͔
  32. Pascal VOC ʹΑΔධՁ (1/3) • Pascal VOC: ෼ྨɾ෺ମݕग़ɾϥϕϧ෇͚ Λߦ͏ίϯϖ •

    DeepCluster Λ࢖ͬͯ໰୊Λղ͘͜ͱͰੑೳΛධՁ͢Δ • ෺ମݕग़ͷ࣮૷ʹ͸ Fast R-CNN Λ༻͍ͨ
  33. Pascal VOC ʹΑΔධՁ (2/3)

  34. Pascal VOC ʹΑΔධՁ (3/3) • ෼ྨɾ෺ମݕग़ɾϥϕϧ෇͚ ͢΂ͯʹ͓͍ͯੑೳ͕ྑ͍ • ڵຯਂ͍఺ͱͯ͠ɺfine-tuned (?)

    ͳϥϯμϜωοτϫʔΫ͸ ͦΕͳΓͷਫ਼౓Λग़͕͢ɺશ݁߹૚ 6-8 ͷΈΛֶशͨ͠৔߹ ͷੑೳ͸͔ͳΓ௿͘ͳΔ • ͜ΕΒͷλεΫ͸ fine-tuning Ͱ͖ͳ͍৔߹Ͱݱ࣮ͷ
 ΞϓϦέʔγϣϯͱۙ͘ͳΔ • ͦͷ৔߹ɺ࠷৽ͷख๏ͱͷࠩ͸ߋʹେ͖͘ͳΔͩΖ͏ (෼ྨͰ࠷େ 9%) ( ˘ω˘) .oO ( ͪΐͬͱԿݴͬͯΔ͔Θ͔ΒΜ͔ͬͨ )
  35. 6. ·ͱΊ

  36. ֓ཁ (࠶ܝ) • CNN Ͱը૾ͷΫϥελϦϯάΛ͢Δख๏ • CNN ͷग़ྗΛ k-means ͰΫϥελϦϯάͨ݁͠ՌΛ


    “ِϥϕϧ” ͱͯ͠ѻ͍ɺωοτϫʔΫͷॏΈΛߋ৽͢Δ • ͱͯ΋ྑ͍ੑೳ͕ग़ͨ • Pascal VOC ʹΑΔධՁͰଞͷΞϧΰϦζϜΛ௒͑Δੑೳ • ؤ݈ੑ͕͋Δ • σʔληοτΛม͑ͯ΋େৎ෉ (ImageNet → YFCC100) • ωοτϫʔΫߏ଄Λม͑ͯ΋େৎ෉ (AlexNet → VGG16) • k-means Ҏ֎ͷΫϥελϦϯάΞϧΰϦζϜͰ΋େৎ෉