Slide 1

Slide 1 text

Deep Clustering for Unsupervised Learning of Visual Features
2018.08.08 (Okinawa) AI Paper Reading Group #4

Slide 2

Slide 2 text

@hoto17296
• Chura Data Inc.
• Web developer / infrastructure engineer / data analyst

Slide 3

Slide 3 text

Paper covered this time:
Deep Clustering for Unsupervised Learning of Visual Features
• https://arxiv.org/abs/1807.05520
• Facebook AI Research
• Accepted at ECCV 2018

Slide 4

Slide 4 text

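Overview
• A method for clustering images with a CNN
• The CNN outputs are clustered with k-means, the resulting cluster assignments are treated as “pseudo-labels”, and the network weights are updated against them
• It achieves very good performance
  • Outperforms previous unsupervised methods in the Pascal VOC evaluation
• It is robust
  • Still works when the dataset is changed (ImageNet → YFCC100M)
  • Still works when the network architecture is changed (AlexNet → VGG16)
  • Still works with clustering algorithms other than k-means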

Slide 5

Slide 5 text

1. Background

Slide 6

Slide 6 text

ImageNet
• A (very famous) image dataset
• More than 1 million images
• Labeled with 1000 classes
• Widely used for purposes such as evaluating image classification algorithms

Slide 7

Slide 7 text

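Problems with ImageNet
• It has “only” about 1 million images
• Some of the human-assigned labels are wrong
↓
The recent plateau in the performance of image classification models is thought to be caused by the dataset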

Slide 8

Slide 8 text

Solving the problems with ImageNet
• Use an Internet-scale image dataset
• Do not rely on humans for labeling
↓
The goal is to achieve this with unsupervised learning

Slide 9

Slide 9 text

2. Prerequisites

Slide 10

Slide 10 text

Supervised and unsupervised learning
• Supervised learning
  • The training data comes with ground-truth labels
  • Classification, regression, etc.
• Unsupervised learning
  • The training data has no ground-truth labels
  • Clustering
  • AutoEncoder

Slide 11

Slide 11 text

Self-supervised learning
• A kind of unsupervised learning [citation needed]
• “Pseudo-labels” are prepared in some way and then used as if they were ground-truth labels for training

Slide 12

Slide 12 text

3. Method

Slide 13

Slide 13 text

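Flow of one epoch
1. Forward the inputs through the CNN
2. Compress the CNN outputs with PCA
3. Cluster them with k-means
4. Compute the loss using the clustering result as “pseudo-labels”
5. Update the network weights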
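The five steps can be sketched in a few lines of NumPy. This is only an illustrative toy under stated assumptions: the "CNN" is a single linear layer with ReLU, the classifier head is re-created every epoch, and the update is one full-batch gradient step. The names `forward`, `pca`, `kmeans` and `run_epoch` are hypothetical and are not taken from the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(x, w):
    """Stand-in for the CNN (assumption): one linear layer followed by ReLU."""
    return np.maximum(x @ w, 0.0)

def pca(feats, dim):
    """Project features onto their top `dim` principal components."""
    centered = feats - feats.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:dim].T

def kmeans(feats, k, iters=10):
    """Plain Lloyd's k-means; returns one cluster index per row."""
    centroids = feats[rng.choice(len(feats), size=k, replace=False)]
    for _ in range(iters):
        dists = ((feats[:, None, :] - centroids[None]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = feats[labels == j].mean(axis=0)
    return labels

def run_epoch(x, w, k, lr=0.1):
    n = len(x)
    feats = forward(x, w)                       # 1. forward pass
    reduced = pca(feats, dim=4)                 # 2. PCA compression
    pseudo = kmeans(reduced, k)                 # 3. k-means clustering
    # Fresh linear head that maps features to the k pseudo-classes (assumption).
    head = rng.normal(scale=0.01, size=(feats.shape[1], k))
    logits = feats @ head
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    loss = -np.log(probs[np.arange(n), pseudo]).mean()   # 4. cross-entropy vs. pseudo-labels
    grad_logits = probs.copy()
    grad_logits[np.arange(n), pseudo] -= 1.0
    grad_logits /= n
    grad_feats = grad_logits @ head.T
    grad_feats[feats <= 0] = 0.0                # back-propagate through the ReLU
    w_new = w - lr * (x.T @ grad_feats)         # 5. update the network weights
    return w_new, pseudo, loss

x = rng.normal(size=(60, 8))    # 60 toy "images" with 8 input features
w = rng.normal(size=(8, 16))    # randomly initialised "network"
for epoch in range(3):
    w, pseudo, loss = run_epoch(x, w, k=3)
```

The point of the sketch is the alternation: the labels used in step 4 come from the network's own features, and they are recomputed from scratch at the start of every epoch.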

Slide 14

Slide 14 text

Question
In the initial state the network has not been trained at all, so how could the clustering possibly work???

Slide 15

Slide 15 text

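Apparently it can
"The good performance of random convnets is intimately tied to their convolutional structure which gives a strong prior on the input signal"
In other words: even a random (untrained) CNN can perform well, and this is closely tied to its convolutional structure, which imposes a strong prior on the input signal.
❓❓❓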

Slide 16

Slide 16 text

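Another study (ref. [26] in the paper)
• ImageNet classification was run on top of an AlexNet whose parameters are random
• If the outputs were random as well, the expected accuracy would be 0.1 % (ImageNet is a 1000-class problem)
• In practice it reached 12 % accuracy, far above that chance level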
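The chance level quoted above is simple arithmetic, shown here as a quick check. The variable names are illustrative only.

```python
num_classes = 1000                 # ImageNet classes
chance = 1 / num_classes           # uniform random guessing
reported = 0.12                    # accuracy quoted on the slide

chance_percent = chance * 100      # 0.1 (%)
times_chance = reported / chance   # roughly 120x the chance level
```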

Slide 17

Slide 17 text

What this probably means
Even with random parameters, the structure of a CNN is itself capable of producing “reasonably meaningful” outputs

Slide 18

Slide 18 text

4. Other details

Slide 19

Slide 19 text

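Avoiding a few problems
• Everything can end up in a single cluster
  • Solved by moving the centroid whenever a cluster is empty and recomputing the clusters
• The pseudo-label counts can be imbalanced
  • The same problem that occurs in supervised learning when the class counts are imbalanced
  • Solved by sampling uniformly over the pseudo-labels during training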
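Both workarounds can be sketched in plain Python. The function names and the `noise` parameter are hypothetical. The empty-cluster rule shown here (copy a randomly chosen non-empty cluster's centroid with a small perturbation, then re-run the assignment) is one reading of "move the centroid"; treat it as an assumption rather than the exact procedure.

```python
import random
from collections import Counter, defaultdict

def sample_uniform_over_clusters(pseudo_labels, num_samples, seed=0):
    """Draw image indices so that every non-empty cluster is equally likely."""
    rng = random.Random(seed)
    members = defaultdict(list)
    for index, label in enumerate(pseudo_labels):
        members[label].append(index)
    clusters = sorted(members)
    picks = []
    for _ in range(num_samples):
        cluster = rng.choice(clusters)               # uniform over clusters
        picks.append(rng.choice(members[cluster]))   # then uniform within the cluster
    return picks

def reseed_empty_clusters(centroids, pseudo_labels, seed=0, noise=1e-3):
    """Give each empty cluster a slightly perturbed copy of a non-empty centroid."""
    rng = random.Random(seed)
    sizes = Counter(pseudo_labels)
    non_empty = [c for c in range(len(centroids)) if sizes[c] > 0]
    result = [list(c) for c in centroids]
    for c in range(len(centroids)):
        if sizes[c] == 0:
            donor = rng.choice(non_empty)
            result[c] = [v + rng.uniform(-noise, noise) for v in centroids[donor]]
    return result  # the caller is expected to re-run the assignment step afterwards

# 90 images in cluster 0 and only 10 in cluster 1: heavily imbalanced pseudo-labels.
labels = [0] * 90 + [1] * 10
picks = sample_uniform_over_clusters(labels, num_samples=10000)
share_of_small_cluster = sum(labels[i] == 1 for i in picks) / len(picks)

# Cluster 2 has no members, so its centroid is re-seeded near another one.
new_centroids = reseed_empty_clusters([[0.0, 0.0], [5.0, 5.0], [9.0, 9.0]], [0, 0, 1])
```

With uniform sampling the small cluster supplies about half of the training samples even though it holds only 10 % of the images.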

Slide 20

Slide 20 text

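Implementation details (1/2)
• A standard AlexNet was used as the CNN
  • The Local Response Normalization layers were replaced with Batch Normalization layers
• Handling raw color information directly is difficult
  • A linear transformation based on Sobel filters (note: edge extraction) removes color and emphasizes contrast
• The ImageNet images were fed in with data augmentation
• The mini-batch size was 256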
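A Sobel-based preprocessing step can be sketched as below. The slide does not spell out the exact transformation, so averaging the channels to obtain a grayscale image and returning the horizontal and vertical responses as two channels are assumptions of this sketch. `filter3x3` and `sobel_preprocess` are hypothetical names.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def filter3x3(image, kernel):
    """Valid-mode 3x3 cross-correlation of a 2-D array."""
    h, w = image.shape
    out = np.zeros((h - 2, w - 2))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * image[dy:dy + h - 2, dx:dx + w - 2]
    return out

def sobel_preprocess(rgb):
    """Drop colour by averaging the channels, then return the two Sobel responses."""
    gray = rgb.mean(axis=2)
    return np.stack([filter3x3(gray, SOBEL_X), filter3x3(gray, SOBEL_Y)])

# A 6x6 RGB image with a vertical edge: left half black, right half white.
edge = np.zeros((6, 6, 3))
edge[:, 3:, :] = 1.0
edge_response = sobel_preprocess(edge)

# A uniformly coloured image has no edges, so both responses vanish.
flat_response = sobel_preprocess(np.full((6, 6, 3), 0.5))
```

Because both kernels sum to zero, any region of constant intensity maps to zero, so the network only sees local contrast.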

Slide 21

Slide 21 text

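Implementation details (2/2)
• Training for 500 epochs took 12 days on a P100 GPU
• About 1/3 of the total run time is spent on k-means
  • All of the data has to be forwarded through the network before clustering, so this cost is unavoidable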

Slide 22

Slide 22 text

5. Experiments and discussion

Slide 23

Slide 23 text

Side note: Normalized Mutual Information (NMI)
• Expresses how similar one clustering result A is to another clustering result B
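As a concrete reference, NMI can be computed from two label lists as I(A;B) / sqrt(H(A) H(B)), the normalization the paper uses. The sketch below is plain Python. Returning 0.0 when either labeling has zero entropy is a convention chosen here, not something stated on the slide.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H of a labeling, in nats."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    """Mutual information I(A;B) between two labelings of the same items."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in count_ab.items()
    )

def nmi(a, b):
    """Normalized mutual information: I(A;B) / sqrt(H(A) * H(B))."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0 or h_b == 0:
        return 0.0  # convention chosen for this sketch
    return mutual_information(a, b) / math.sqrt(h_a * h_b)

same_up_to_renaming = nmi([0, 0, 1, 1], [1, 1, 0, 0])   # identical partitions
unrelated = nmi([0, 0, 1, 1], [0, 1, 0, 1])             # independent partitions
```

NMI depends only on how the items are grouped, not on the cluster IDs, which is why it suits k-means output whose cluster numbering is arbitrary.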

Slide 24

Slide 24 text

Comparison with the ImageNet labels
• Plot: NMI between the DeepCluster cluster assignments and the ImageNet labels over training
• The two become more similar as the epochs progress

Slide 25

Slide 25 text

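Cluster stability
• Stability = how consistently an image is assigned to the same cluster in the next epoch (measured in the paper as the NMI between the assignments of consecutive epochs)
• Stability increases as the epochs progress
• It saturates below 0.8
  • Training beyond that point does not help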

Slide 26

Slide 26 text

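Effect of the number of clusters
• Classification performance was measured with mAP (mean Average Precision)
• k = 10,000 gave the best performance
• For ImageNet one might expect k = 1,000 to be best, but over-segmentation gave better results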
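For readers unfamiliar with the metric, here is a minimal sketch of mean Average Precision. It uses the plain non-interpolated definition of AP, which is an assumption: the official Pascal VOC evaluation protocol has its own AP variant. Function names and the toy scores are illustrative.

```python
def average_precision(scores, relevant):
    """Non-interpolated AP: mean of precision@rank taken at each relevant item."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, precisions = 0, []
    for rank, i in enumerate(order, start=1):
        if relevant[i]:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(per_class):
    """Average the per-class AP values; `per_class` is a list of (scores, relevant)."""
    return sum(average_precision(s, r) for s, r in per_class) / len(per_class)

# Class 1: relevant images ranked 1st and 3rd -> AP = (1/1 + 2/3) / 2
ap_class1 = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
# Class 2: the only relevant image is ranked 2nd -> AP = 1/2
ap_class2 = average_precision([0.2, 0.9], [1, 0])
map_score = mean_average_precision([
    ([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]),
    ([0.2, 0.9], [1, 0]),
])
```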

Slide 27

Slide 27 text

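Effect of color removal on what the network discriminates
• The image below visualizes the first layer of the CNN
• When the raw color information is fed in (left), the filters only pick up color-related information
• When the color information is transformed with the Sobel filter (right), the filters pick up edges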

Slide 28

Slide 28 text

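Observations for each CNN layer
• The images below are the top 9 images with the strongest activation for each layer
• Deeper layers recognize larger patterns (as expected)
• The last layer (conv5) appears to re-recognize what the earlier layers have already recognized
  • This supports another study (ref. [43] in the paper) which found that, in AlexNet, the last layer (conv5) has different characteristics from the other layers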

Slide 29

Slide 29 text

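Classification performance of each layer (1/3)
• Train a linear classifier on the output of the network up to the n-th layer, with those layers frozen
• Evaluate classification performance on the ImageNet and Places datasets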
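The evaluation idea, a linear classifier trained on frozen features, can be sketched with NumPy. The features here are synthetic Gaussian blobs standing in for the output of some layer; a real probe would use the activations of the frozen network. The softmax-regression training loop, the function names and the hyperparameters are all assumptions of this sketch.

```python
import numpy as np

def train_linear_probe(features, labels, num_classes, lr=0.5, steps=200):
    """Softmax regression on fixed features; the features are never updated."""
    n, d = features.shape
    weights = np.zeros((d, num_classes))
    bias = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(steps):
        logits = features @ weights + bias
        logits -= logits.max(axis=1, keepdims=True)
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n           # gradient of the mean cross-entropy
        weights -= lr * (features.T @ grad)
        bias -= lr * grad.sum(axis=0)
    return weights, bias

def accuracy(features, labels, weights, bias):
    predictions = (features @ weights + bias).argmax(axis=1)
    return float((predictions == labels).mean())

rng = np.random.default_rng(0)
# Synthetic stand-in for frozen layer outputs: two well-separated classes.
feats = np.concatenate([rng.normal(-2.0, 1.0, (50, 5)), rng.normal(2.0, 1.0, (50, 5))])
labels = np.array([0] * 50 + [1] * 50)
weights, bias = train_linear_probe(feats, labels, num_classes=2)
probe_accuracy = accuracy(feats, labels, weights, bias)
```

Since only the linear layer is trained, the resulting accuracy reflects how linearly separable the frozen features already are, which is what makes it usable as a per-layer quality measure.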

Slide 30

Slide 30 text

Classification performance of each layer (2/3)

Slide 31

Slide 31 text

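Classification performance of each layer (3/3)
• DeepCluster performs well at the higher layers
  • conv3 performs very well
  • Surprisingly, it is even better than conv5
  • conv1, on the other hand, does not perform well at all
• In DeepCluster, conv3-conv4 may be recognizing something that corresponds to the ImageNet labels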

Slide 32

Slide 32 text

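Evaluation on Pascal VOC (1/3)
• Pascal VOC: a competition covering classification, object detection, and labeling (segmentation)
• Performance is evaluated by solving these tasks with DeepCluster
• Fast R-CNN was used for the object detection implementation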

Slide 33

Slide 33 text

Evaluation on Pascal VOC (2/3)

Slide 34

Slide 34 text

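Evaluation on Pascal VOC (3/3)
• Performance is good on all three tasks: classification, object detection, and labeling (segmentation)
• Interestingly, a fine-tuned (?) random network reaches reasonable accuracy, but performs much worse when only the fully connected layers 6-8 are trained
• These tasks are closer to real applications when fine-tuning is not possible
  • In that setting, the gap to the latest competing methods becomes even larger (up to 9% on classification)
( ˘ω˘) .oO ( I did not quite follow what this part was saying )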

Slide 35

Slide 35 text

6. Summary

Slide 36

Slide 36 text

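Overview (recap)
• A method for clustering images with a CNN
• The CNN outputs are clustered with k-means, the resulting cluster assignments are treated as “pseudo-labels”, and the network weights are updated against them
• It achieves very good performance
  • Outperforms previous unsupervised methods in the Pascal VOC evaluation
• It is robust
  • Still works when the dataset is changed (ImageNet → YFCC100M)
  • Still works when the network architecture is changed (AlexNet → VGG16)
  • Still works with clustering algorithms other than k-means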