Slide 1
Slide 1 text
Deep Clustering for Unsupervised Learning of Visual Features
2018.08.08 (Okinawa) AI-Related Paper Reading Group #4
Slide 2
Slide 2 text
@hoto17296
• Chura Data Inc.
• Web / Infrastructure / Data analysis
Slide 3
Slide 3 text
Today's paper: Deep Clustering for Unsupervised Learning of Visual Features
• https://arxiv.org/abs/1807.05520
• Facebook AI Research
• Accepted at ECCV 2018
Slide 4
Slide 4 text
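Overview
• A method for clustering images with a CNN
• The CNN outputs are clustered with k-means, the cluster assignments are treated as "pseudo-labels", and the network weights are updated against them
• It performs very well
  • Outperforms other algorithms in the Pascal VOC evaluation
• It is robust
  • Still works when the dataset is changed (ImageNet → YFCC100M)
  • Still works when the network architecture is changed (AlexNet → VGG16)
  • Still works with clustering algorithms other than k-means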
Slide 5
Slide 5 text
1. Background
Slide 6
Slide 6 text
ImageNet
• A (well-known) image dataset
• More than 1 million images
• Labeled into 1000 classes
• Widely used for purposes such as evaluating image classification algorithms
Slide 7
Slide 7 text
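Problems with ImageNet
• It has "only" 1 million images
• Some of the human-assigned labels are wrong
↓
The recent plateau in image classification performance is thought to be caused in part by the dataset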
Slide 8
Slide 8 text
A solution to ImageNet's problems
• An Internet-scale image dataset
• No human labeling
↓
The goal is to achieve this with unsupervised learning
Slide 9
Slide 9 text
2. Prerequisites
Slide 10
Slide 10 text
Supervised / unsupervised learning
• Supervised Learning
  • The training data comes with ground-truth labels
  • Classification, regression, etc.
• Unsupervised Learning
  • The training data has no ground-truth labels
  • Clustering
  • AutoEncoder
Slide 11
Slide 11 text
Self-Supervised Learning
• A kind of unsupervised learning (citation needed)
• Prepare "pseudo-labels" by some method, then train as if they were ground-truth labels
Slide 12
Slide 12 text
3. Method
Slide 13
Slide 13 text
Flow of one epoch
1. Forward the inputs through the CNN
2. Compress the CNN outputs with PCA
3. Cluster with k-means
4. Compute the loss using the clustering result as "pseudo-labels"
5. Update the network weights
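The five steps above can be sketched as a single function. This is only a structural outline, not the authors' implementation: `deepcluster_epoch` and its four callable parameters (`forward`, `reduce_dim`, `cluster`, `train_step`) are hypothetical names introduced here, and the toy lambdas stand in for a real CNN, PCA, k-means and optimizer step.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def deepcluster_epoch(
    images: Sequence[object],
    forward: Callable[[object], Vector],
    reduce_dim: Callable[[List[Vector]], List[Vector]],
    cluster: Callable[[List[Vector]], List[int]],
    train_step: Callable[[object, int], float],
) -> float:
    """Run one epoch in the order listed on the slide; return the mean loss."""
    # 1. Forward every input through the current network.
    features = [forward(img) for img in images]
    # 2. Compress the outputs (PCA on the slide).
    reduced = reduce_dim(features)
    # 3. Cluster the compressed features (k-means on the slide).
    pseudo_labels = cluster(reduced)
    # 4-5. Use cluster ids as pseudo-labels: compute the loss, update weights.
    losses = [train_step(img, lab) for img, lab in zip(images, pseudo_labels)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Toy stand-ins (assumptions) so the sketch runs without any ML library.
    data = [0.1, 0.2, 0.9, 1.0]
    mean_loss = deepcluster_epoch(
        images=data,
        forward=lambda x: [x, x * x],
        reduce_dim=lambda feats: [[f[0]] for f in feats],
        cluster=lambda feats: [0 if f[0] < 0.5 else 1 for f in feats],
        train_step=lambda img, label: abs(img - label),
    )
    print(mean_loss)
```

Note that the clustering is redone from scratch every epoch, so all data must be forwarded before any weight update in that epoch.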
Slide 14
Slide 14 text
Question
In the initial state the network has not been trained at all,
so surely the clustering cannot work well???
Slide 15
Slide 15 text
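Apparently it can
"The good performance of random convnets is intimately tied to their convolutional structure which gives a strong prior on the input signal."
The fact that even a random (untrained) CNN can perform well is closely tied to its convolutional structure, which gives a strong prior on the input signal.
❓❓❓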
Slide 16
Slide 16 text
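Another study [26]
• ImageNet classification was run with an AlexNet whose parameters were random
• If the output were purely random, the expected accuracy would be 0.1% (since ImageNet is a 1000-class problem)
• In practice it reached 12% accuracy, far above that expectation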
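The chance-level figure follows from assuming a uniformly random guess over K = 1000 equally likely classes (an assumption made here for the arithmetic); the 12% reported on the slide is then roughly 120 times chance:

```latex
\mathbb{E}[\text{accuracy}] = \frac{1}{K} = \frac{1}{1000} = 0.1\%,
\qquad
\frac{12\%}{0.1\%} = 120
```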
Slide 17
Slide 17 text
Probably what this means
Even with random parameters, the CNN structure itself has the ability to "output something plausible"
Slide 18
Slide 18 text
4. Miscellaneous
Slide 19
Slide 19 text
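Avoiding a few problems
• Everything ends up in a single cluster
  • Solved by moving a centroid and recomputing the clusters whenever an empty cluster appears
• The pseudo-labels are imbalanced
  • The same problem that occurs in supervised learning when the label counts are imbalanced
  • Solved by training on samples drawn uniformly over the pseudo-labels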
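The second fix (sampling uniformly over pseudo-labels) can be illustrated as below. This is a minimal sketch under the assumption that "uniform" means every non-empty cluster is equally likely to be chosen, followed by a uniform choice of image inside it; the function name `sample_uniform_over_pseudo_labels` is hypothetical and not taken from the paper's code.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def sample_uniform_over_pseudo_labels(
    pseudo_labels: Sequence[int], n_samples: int, rng: random.Random
) -> List[int]:
    """Draw image indices so that every non-empty cluster is equally likely."""
    members: Dict[int, List[int]] = defaultdict(list)
    for index, label in enumerate(pseudo_labels):
        members[label].append(index)
    clusters = sorted(members)  # only non-empty clusters appear here
    picks = []
    for _ in range(n_samples):
        label = rng.choice(clusters)              # uniform over clusters
        picks.append(rng.choice(members[label]))  # uniform inside the cluster
    return picks


if __name__ == "__main__":
    labels = [0] * 98 + [1] * 2  # a heavily imbalanced assignment
    picks = sample_uniform_over_pseudo_labels(labels, 10_000, random.Random(0))
    share_small = sum(labels[i] == 1 for i in picks) / len(picks)
    # The 2-image cluster is drawn about half the time instead of 2% of the time.
    print(round(share_small, 2))
```

The effect is that a huge cluster cannot dominate the gradient updates simply because it contains most of the images.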
Slide 20
Slide 20 text
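Implementation details (1/2)
• A standard AlexNet was used as the CNN
  • Local Response Normalization was replaced with Batch Normalization
• Raw color information is hard to work with
  • A linear transformation based on Sobel filters (※ edge extraction) removes color and emphasizes contrast
• ImageNet images were fed in after Data Augmentation
• The mini batch size was 256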
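To make the Sobel preprocessing concrete, here is a small pure-Python sketch. It assumes grayscale conversion by averaging the RGB channels and "valid" borders (no padding); both choices and all function names are illustrative assumptions, not details confirmed by the slides.

```python
from typing import List, Tuple

Gray = List[List[float]]

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]  # responds to vertical edges
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]  # responds to horizontal edges


def to_gray(rgb: List[List[List[float]]]) -> Gray:
    """Average the three channels of an H x W x 3 nested list (drops color)."""
    return [[sum(px) / 3.0 for px in row] for row in rgb]


def filter3x3(img: Gray, kernel: List[List[int]]) -> Gray:
    """3x3 cross-correlation without padding; output is (H-2) x (W-2)."""
    h, w = len(img), len(img[0])
    return [
        [
            sum(kernel[a][b] * img[i + a][j + b]
                for a in range(3) for b in range(3))
            for j in range(w - 2)
        ]
        for i in range(h - 2)
    ]


def sobel_channels(rgb: List[List[List[float]]]) -> Tuple[Gray, Gray]:
    """Return the two gradient maps that would replace the RGB input."""
    gray = to_gray(rgb)
    return filter3x3(gray, SOBEL_X), filter3x3(gray, SOBEL_Y)


if __name__ == "__main__":
    # 4x4 image: black left half, white right half (one vertical edge).
    black, white = [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]
    image = [[black, black, white, white] for _ in range(4)]
    gx, gy = sobel_channels(image)
    print(gx)  # strong response to the vertical edge
    print(gy)  # no response: there is no horizontal edge
```

Because both steps are linear and fixed, the transformation can be applied once as preprocessing rather than learned.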
Slide 21
Slide 21 text
Implementation details (2/2)
• Training for 500 epochs took 12 days on a P100 GPU
• One third of the total run time is k-means processing
• All the data has to be forwarded before clustering, so it inevitably takes time
Slide 22
Slide 22 text
5. Experiments and discussion
Slide 23
Slide 23 text
Supplement: Normalized Mutual Information (NMI)
• Normalized Mutual Information
• Expresses how similar one clustering result A is to another clustering result B
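As a concrete reference, NMI can be computed from label counts alone. The sketch below uses the normalization I(A;B) / sqrt(H(A) H(B)); other normalizations exist, and returning 0.0 when one labeling is constant is a convention chosen here, not something stated on the slides.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """I(A;B) computed from the joint and marginal label counts."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    return sum(
        (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def nmi(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Normalized mutual information: I(A;B) / sqrt(H(A) * H(B))."""
    if len(a) != len(b):
        raise ValueError("both labelings must cover the same items")
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 or h_b == 0.0:
        return 0.0  # convention chosen here for a constant labeling
    return mutual_information(a, b) / math.sqrt(h_a * h_b)


if __name__ == "__main__":
    # Same partition with swapped cluster ids: NMI ignores the id names.
    print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))
    # Two partitions that carry no information about each other.
    print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))
```

Invariance to the naming of cluster ids is what makes NMI suitable for comparing k-means runs, whose cluster numbering is arbitrary.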
Slide 24
Slide 24 text
Comparison with the ImageNet labels
• How the NMI between the DeepCluster clustering result and the ImageNet labels evolves
• They become more similar as the epochs progress
Slide 25
Slide 25 text
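Cluster stability
• The proportion of images that are assigned to the same cluster again in the next epoch (= stability)
• Stability increases as the epochs progress
• It saturates below 0.8
• Training beyond that point is pointless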
Slide 26
Slide 26 text
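Performance differences by number of clusters
• Classification performance was measured with a metric called mAP (?)
• k = 10,000 performed best
• For ImageNet one would tend to expect k = 1,000 to be best, but over-segmentation gave better results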
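For readers unsure about the metric, mAP stands for mean average precision: compute the average precision (AP) of the ranked predictions for each class, then average over classes. The sketch below uses the plain non-interpolated AP; benchmark suites such as Pascal VOC define their own interpolated variants, so this is only an illustration of the idea, and the function names are hypothetical.

```python
from typing import List, Sequence


def average_precision(scores: Sequence[float], relevant: Sequence[bool]) -> float:
    """Non-interpolated AP: mean of precision@rank over the relevant items."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions: List[float] = []
    for rank, i in enumerate(order, start=1):
        if relevant[i]:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(
    scores_per_class: Sequence[Sequence[float]],
    relevant_per_class: Sequence[Sequence[bool]],
) -> float:
    """Average the per-class AP values."""
    aps = [
        average_precision(s, r)
        for s, r in zip(scores_per_class, relevant_per_class)
    ]
    return sum(aps) / len(aps)


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6]
    # Relevant items ranked 1st and 3rd: AP = (1/1 + 2/3) / 2
    print(average_precision(scores, [True, False, True, False]))
    # Relevant items ranked 1st and 2nd: a perfect ranking
    print(average_precision(scores, [True, True, False, False]))
```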
Slide 27
Slide 27 text
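Effect of color removal on discriminative ability
• The image below visualizes the first layer of the CNN
• When color information is fed in as-is (left), the filters only pick up color-related information
• When color information is transformed with the Sobel filter (right), the filters pick up edges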
Slide 28
Slide 28 text
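Observations for each CNN layer
• The images below are the top 9 images that activated each layer most strongly
• Deeper layers recognize larger patterns (as expected)
• The last layer (conv5) appears to re-recognize what the earlier layers have already recognized
• This supports another study [43] reporting that (in AlexNet) the last layer (conv5) has different characteristics from the other layers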
Slide 29
Slide 29 text
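Classification performance of each layer (1/3)
• Build a linear classifier on top of the outputs of the network up to layer n
• Evaluate classification performance on the ImageNet and Places datasets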
Slide 30
Slide 30 text
Classification performance of each layer (2/3)
Slide 31
Slide 31 text
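Classification performance of each layer (3/3)
• DeepCluster performs well at the higher layers
  • conv3 performs very well
  • Surprisingly, better than conv5
  • On the other hand, conv1 performs poorly
• In DeepCluster, conv3-conv4 may be recognizing something equivalent to the ImageNet labels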
Slide 32
Slide 32 text
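Evaluation on Pascal VOC (1/3)
• Pascal VOC: a competition covering classification, object detection, and labeling (segmentation)
• Performance is evaluated by solving these tasks with DeepCluster
• Fast R-CNN was used for the object detection implementation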
Slide 33
Slide 33 text
Evaluation on Pascal VOC (2/3)
Slide 34
Slide 34 text
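Evaluation on Pascal VOC (3/3)
• Performs well on all of classification, object detection, and labeling
• Interestingly, a fine-tuned (?) random network reaches reasonable accuracy, but performance drops considerably when only the fully connected layers 6-8 are trained
• These tasks are closer to real applications in the setting where fine-tuning is not possible
• In that setting, the gap to the state-of-the-art methods becomes even larger (up to 9% on classification)
( ˘ω˘) .oO ( I did not quite follow what this part was saying )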
Slide 35
Slide 35 text
6. Summary
Slide 36
Slide 36 text
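Overview (recap)
• A method for clustering images with a CNN
• The CNN outputs are clustered with k-means, the cluster assignments are treated as "pseudo-labels", and the network weights are updated against them
• It performs very well
  • Outperforms other algorithms in the Pascal VOC evaluation
• It is robust
  • Still works when the dataset is changed (ImageNet → YFCC100M)
  • Still works when the network architecture is changed (AlexNet → VGG16)
  • Still works with clustering algorithms other than k-means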