Slide 1

Slide 1 text

Matryoshka Representation Learning
Aditya Kusupati, Gantavya Bhatt, Aniket Rege, Matthew Wallingford, Aditya Sinha, Vivek Ramanujan, William Howard-Snyder, Kaifeng Chen, Sham Kakade, Prateek Jain, Ali Farhadi
NeurIPS 2022
https://arxiv.org/abs/2205.13147
Presenter: Hayato Tsukagoshi, Graduate School of Informatics, Nagoya University, Japan

Slide 2

Slide 2 text

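Overview
•In representation learning, a separate model has to be trained for each desired embedding dimensionality
•This work proposes a method that lets a single model output embeddings at multiple dimensionalities
•Training method: losses on partial embeddings are computed hierarchically
  •The losses of the "mini" embeddings are summed
•Evaluation on classification and retrieval tasks confirms that the embedding dimensionality can be reduced while performance is largely maintained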

Slide 3

Slide 3 text

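Reasons for choosing this paper
•The technique has recently been gaining mainstream acceptance, for example through its adoption in OpenAI's Embedding API
  •Despite that, hardly any explanations of it exist
•The concept is easy to grasp, and it looks like a technique worth remembering
Disclaimer
•Figures and tables on the slides are quoted from the papers cited on each slide
•Some symbols may differ from those used in the equations of the paper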

Slide 4

Slide 4 text

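Representation learning and its problem
Representation Learning
•Methods and techniques for obtaining feature representations that are useful for a variety of machine learning tasks
  •Example: model training on a large labeled image dataset (ResNet)
  •Example: multimodal embeddings via contrastive learning on language-image pairs (CLIP)
  •Example: text embeddings via contrastive learning on text-text pairs (E5)
•What you get: a model that outputs a useful vector for a given input example
Problem
•The dimensionality of the resulting embedding vectors is fixed at training time and cannot be changed afterwards
  •High dimensionality means high storage cost, high retrieval cost, and high cost for downstream processing in general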

Slide 5

Slide 5 text

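Matryoshka Representation Learning: MRL🪆
Overview and intuition
•A technique that makes even just a part of an embedding vector work well on its own
•The main use is representation learning, but it can be applied to other frameworks too
Training method
•In addition to the usual training, also train with only a part of the embedding vector
Advantages
•Works well without doing anything particularly complicated
•Easy to implement: just cut the final-layer output embedding at a suitable dimensionality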

Slide 6

Slide 6 text

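How to make Matryoshka embeddings (the easy way)
1. Prepare the dimensionalities you want to cut out
2. Prepare the original full-size embedding vector
3. Cut out each prefix in order, and you are done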
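The three steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: the helper name `matryoshka_slices`, the 8-dimensional vector, and the chosen dimensionalities are all hypothetical.

```python
def matryoshka_slices(embedding, dims):
    """Return {d: first d components of `embedding`} for each requested d."""
    full = len(embedding)
    for d in dims:
        if not 0 < d <= full:
            raise ValueError(f"dimension {d} is outside 1..{full}")
    # Every smaller embedding is a prefix of the larger ones (nested like dolls).
    return {d: embedding[:d] for d in dims}


# Step 1: dimensionalities to cut out (hypothetical values).
nested_dims = [2, 4, 8]
# Step 2: the original full-size embedding (hypothetical values).
full_embedding = [0.9, -0.3, 0.5, 0.1, -0.7, 0.2, 0.0, 0.4]
# Step 3: cut out each prefix in order.
slices = matryoshka_slices(full_embedding, nested_dims)
```

Because each slice is a prefix, only the full-size vector needs to be stored; smaller embeddings are obtained by truncation at query time.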

Slide 7

Slide 7 text

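Classification with Matryoshka embeddings
1. Prepare the dimensionalities you want to cut out
2. Prepare the original full-size embedding vector
3. Slice the embedding and the classification matrix in order and compute the logits
Variant in which the linear classification layer is shared as well: Efficient MRL (MRL-E)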
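The difference between a separate linear head per dimensionality and the shared, sliced head of MRL-E can be sketched as follows. This is an illustrative sketch in plain Python; the function names and the toy weights are assumptions made for this example and do not come from the paper's code.

```python
def linear_logits(x, weight):
    """Logits for input x (length d) and weight (one row of length d per class)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def mrl_logits(embedding, weights_by_dim):
    """MRL-style: one separate classification matrix for each dimensionality d."""
    return {d: linear_logits(embedding[:d], w) for d, w in weights_by_dim.items()}


def mrl_e_logits(embedding, shared_weight, dims):
    """MRL-E-style: slice the first d columns of a single shared matrix."""
    return {
        d: linear_logits(embedding[:d], [row[:d] for row in shared_weight])
        for d in dims
    }


# Toy example: 4-dimensional embedding, 2 classes (all values hypothetical).
embedding = [1.0, 2.0, 3.0, 4.0]
shared_weight = [[1.0, 0.0, 0.0, 0.0],
                 [0.0, 1.0, 1.0, 1.0]]
efficient = mrl_e_logits(embedding, shared_weight, dims=[2, 4])
separate = mrl_logits(embedding, {2: [[1.0, 0.0], [0.0, 1.0]], 4: shared_weight})
```

The shared variant stores only one matrix of the full width, at the price of tying the classifiers for different dimensionalities together.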

Slide 8

Slide 8 text

Loss computation: the CrossEntropyLoss case
Just compute the loss of each partial embedding and take the sum
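As a rough sketch of "compute each partial loss and sum them", the snippet below implements a numerically stable cross-entropy in plain Python and adds it up over the nested dimensionalities. The names `cross_entropy` and `mrl_loss` and the toy logits are assumptions for illustration; a real training loop would use a deep learning framework's loss function instead.

```python
import math


def cross_entropy(logits, label):
    """Cross-entropy of one example: log-sum-exp of the logits minus the true-class logit."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def mrl_loss(logits_by_dim, label):
    """Sum of the cross-entropy losses of all partial embeddings."""
    return sum(cross_entropy(logits, label) for logits in logits_by_dim.values())


# Hypothetical logits produced from the 2- and 4-dimensional prefixes.
logits_by_dim = {2: [0.0, 0.0], 4: [0.0, 0.0]}
total = mrl_loss(logits_by_dim, label=0)
```

With uniform logits over two classes each partial loss equals ln 2, so the total here is 2 ln 2.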

Slide 9

Slide 9 text

Loss computation: the CrossEntropyLoss case
An importance weight can also be set for each dimensionality
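Written out, the weighted loss can be expressed as below. The notation is an assumption chosen for this note and may differ from both the slides and the paper: $\mathcal{M}$ is the set of nested dimensionalities, $c_m \ge 0$ the importance of dimensionality $m$, $z = F(x)$ the full embedding, $z_{1:m}$ its first $m$ components, and $W^{(m)}$ the classification matrix used for dimensionality $m$.

```latex
\mathcal{L}_{\mathrm{MRL}}(x, y)
  = \sum_{m \in \mathcal{M}} c_m \,
    \mathcal{L}_{\mathrm{CE}}\!\left( W^{(m)} z_{1:m},\; y \right),
  \qquad z = F(x)
```

Setting every $c_m = 1$ gives the plain sum of partial losses; in the shared-classifier variant (MRL-E), $W^{(m)}$ would be the first $m$ columns of a single matrix $W$.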

Slide 10

Slide 10 text

Loss computation: the CrossEntropyLoss case
An importance weight can also be set for each dimensionality

Slide 11

Slide 11 text

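Evaluation experiments
Approach
•Perform representation learning and evaluate the quality of the learned embeddings
Classification tasks
•Train a linear layer that predicts the class from the embedding (Linear Probing)
•Assign each example the class of its nearest-neighbor example (1-NN)
Retrieval tasks
•Evaluate the trade-off between accuracy and computational cost (FLOPS) in image retrieval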
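The 1-NN evaluation on truncated embeddings can be sketched as below. This is a toy illustration: the squared Euclidean distance, the helper names, and the data are assumptions for this example, not the exact evaluation protocol of the paper.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equally long vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def one_nn_predict(query, train_embeddings, train_labels, dim):
    """Predict the label of the training example nearest to `query`,
    using only the first `dim` components of every embedding."""
    q = query[:dim]
    nearest = min(
        range(len(train_embeddings)),
        key=lambda i: squared_distance(q, train_embeddings[i][:dim]),
    )
    return train_labels[nearest]


# Hypothetical 3-dimensional embeddings with two labeled training examples.
train_embeddings = [[0.0, 0.0, 5.0], [1.0, 1.0, -5.0]]
train_labels = ["cat", "dog"]
query = [0.9, 0.8, 5.0]

pred_2d = one_nn_predict(query, train_embeddings, train_labels, dim=2)
pred_3d = one_nn_predict(query, train_embeddings, train_labels, dim=3)
```

In this contrived example the prediction changes between 2 and 3 dimensions, which is exactly the kind of degradation the 1-NN accuracy curves measure as the dimensionality is reduced.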

Slide 12

Slide 12 text

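Related work: ResNet
•Deep learning models become harder to train as they get deeper
  •Due to effects such as vanishing and exploding gradients
•Intuitively, the deeper the model, the smaller the "job" of each individual layer becomes
  •Learning this small job from random initialization is difficult
•Proposes a model architecture that uses residual connections
  •If a layer's output is not needed it can simply be pushed to zero, so learning is easy even with nonlinearities
•Achieves high performance and high training stability
He et al., Deep Residual Learning for Image Recognition, CVPR 2016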
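The idea that an unneeded layer can be "pushed to zero" follows directly from the form y = x + F(x). The sketch below is a plain-Python illustration with hypothetical names, not an actual ResNet layer.

```python
def residual_block(x, layer):
    """Apply a residual connection: output = input + layer(input)."""
    return [x_i + f_i for x_i, f_i in zip(x, layer(x))]


def zero_layer(x):
    """A layer whose output has collapsed to zero."""
    return [0.0 for _ in x]


def relu_half(x):
    """A hypothetical nonlinear layer: ReLU followed by scaling by 0.5."""
    return [0.5 * max(0.0, v) for v in x]


x = [1.0, -2.0, 3.0]
identity_out = residual_block(x, zero_layer)  # the block reduces to the identity
nonlinear_out = residual_block(x, relu_half)
```

When the layer outputs zero the block passes its input through unchanged, so each layer only has to learn a small correction on top of the identity.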

Slide 13

Slide 13 text

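Related work: ViT (Vision Transformer)
•A study that brought the Transformer to the image domain
•The input image is split into a number of "patches", patch embeddings are created, and these are processed by a Transformer
•As in BERT, training to predict masked patches is also explored
  •Masked Patch Prediction
Dosovitskiy et al., An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, ICLR 2021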
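The patch-splitting step can be illustrated as follows. This sketch assumes a single-channel image stored as a list of rows and flattens each patch in row-major order; the function name is hypothetical, and the subsequent linear projection to patch embeddings is omitted.

```python
def split_into_patches(image, patch_size):
    """Split an H x W image (list of rows) into flattened
    patch_size x patch_size patches, in row-major patch order."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patches.append([
                image[r][c]
                for r in range(top, top + patch_size)
                for c in range(left, left + patch_size)
            ])
    return patches


# A 4 x 4 toy image with pixel values 0..15, split into 2 x 2 patches.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = split_into_patches(image, patch_size=2)
```

Each flattened patch plays the role of one "token" in the sequence fed to the Transformer.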

Slide 14

Slide 14 text

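Related work: ALIGN
•An embedding model for vision and language
  •Images and text are embedded into the same space
•Large-scale contrastive learning on a noisy image-text dataset
  •Occupies a similar position to CLIP, which appeared around the same time, but uses a larger dataset
•Enables, for example, retrieving images with text, or retrieving with an image plus added text
Jia et al., Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision, ICML 2021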

Slide 15

Slide 15 text

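Compared methods
•Models trained separately for each dimensionality (Fixed Feature: FF)
  •Plays the role of an oracle, although MRL may also outperform it
•Dimensionality reduction with SVD
•Slim. Net
  •A method similar to MRL that also shrinks the intermediate representations
•Rand. LP
  •Randomly selects dimensions from the original high-dimensional embedding and uses them
Author's note: "Fixed Feature" feels like an unintuitive and somewhat awkward name, but presumably it was chosen as a contrast to MRL being flexible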
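To make the contrast with the random-selection baseline concrete, the sketch below compares taking a random subset of dimensions with taking a Matryoshka-style prefix. The helper names, the seed, and the toy embedding are assumptions for this illustration; the linear probe that the baseline trains on top of the selected dimensions is not shown.

```python
import random


def random_dimension_indices(full_dim, d, seed=0):
    """Pick d distinct dimension indices uniformly at random (fixed seed for repeatability)."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(full_dim), d))


def select_dimensions(embedding, indices):
    """Keep only the listed dimensions of an embedding."""
    return [embedding[i] for i in indices]


embedding = [0.9, -0.3, 0.5, 0.1, -0.7, 0.2, 0.0, 0.4]  # hypothetical values
indices = random_dimension_indices(len(embedding), d=4)
random_subset = select_dimensions(embedding, indices)  # random-selection baseline
prefix = embedding[:4]                                  # Matryoshka-style prefix
```

The same index set has to be reused for every embedding in a collection; otherwise the reduced vectors would not be comparable with each other.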

Slide 16

Slide 16 text

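Related work: Slimmable Neural Networks (Slim. Net)
•An MRL-like method and architecture that also shrinks the outputs of intermediate layers
  •Conversely, MRL is a special case of this (only the final layer is made slim)
Yu et al., Slimmable Neural Networks, ICLR 2019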

Slide 17

Slide 17 text

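ImageNet-1K: ResNet 50 / 1-NN Accuracy
•Representation learning on a labeled image dataset
•ImageNet-1K: 1.28 million examples, 1,000 classes
•MRL performs on par with separately trained models (FF)
  •Better than SVD-based dimensionality reduction and random selection
•With MRL, performance barely drops when going from 2048 down to about 64 dimensions
  •For nearest-neighbor search, can the embedding be cut down this far and still work without problems?
•The comparison with Slim. Net feels somewhat unfair, since the computational costs presumably differ...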

Slide 18

Slide 18 text

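ImageNet-1K: ViT-B/16 / 1-NN Accuracy
•Representation learning on larger datasets
  •JFT-300M: 300 million examples, 18,000 classes
  •ALIGN: 1.8 billion image-text pairs
•MRL performs well even after dimensionality reduction
  •Separate training (FF) is hard to afford at this scale
•Here, random selection also seems to retain performance fairly well
  •Presenter's speculation: in a sufficiently trained model, the dimensions of the embedding may divide up roles among themselves on their own (although which dimensions correspond to the central concepts is unclear)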

Slide 19

Slide 19 text

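ImageNet-1K: ViT-B/16 / Retrieval
•Evaluates the trade-off between performance and speed on a retrieval task
•Color (Ds) shows the embedding dimensionality used at retrieval time, and circle size (Dr) the dimensionality used at reranking time
  •Top left: retrieve with 8 dimensions → rank with 2048 dimensions
  •Bottom left: retrieve with 8 dimensions → rank with 8 dimensions
•Takeaway: retrieving with only 8 dimensions and then ranking just the small set of top results with the full-size embedding gives both high accuracy and high throughput
  •Whether such situations actually come up in practice is unclear, though...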
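The two-stage procedure described above (cheap shortlist with a few dimensions, then reranking with more dimensions) can be sketched as follows. The function names, the squared Euclidean distance, and the toy database are assumptions for illustration; a real system would use an approximate nearest-neighbor index for the first stage.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equally long vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def adaptive_retrieval(query, database, shortlist_dim, rerank_dim,
                       shortlist_size, top_k):
    """Shortlist with the first `shortlist_dim` dimensions (Ds),
    then rerank the shortlist with the first `rerank_dim` dimensions (Dr)."""
    # Stage 1: cheap search over the whole database with low-dimensional prefixes.
    shortlist = sorted(
        range(len(database)),
        key=lambda i: squared_distance(query[:shortlist_dim],
                                       database[i][:shortlist_dim]),
    )[:shortlist_size]
    # Stage 2: rerank only the shortlisted items with higher-dimensional prefixes.
    reranked = sorted(
        shortlist,
        key=lambda i: squared_distance(query[:rerank_dim],
                                       database[i][:rerank_dim]),
    )
    return reranked[:top_k]


# Hypothetical 4-dimensional database and query.
database = [
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 3.0, 3.0],
    [2.0, 1.0, 0.0, 0.0],
    [9.0, 9.0, 0.0, 0.0],
]
query = [1.0, 0.0, 0.0, 0.0]

two_stage = adaptive_retrieval(query, database, shortlist_dim=2, rerank_dim=4,
                               shortlist_size=3, top_k=2)
cheap_only = adaptive_retrieval(query, database, shortlist_dim=2, rerank_dim=2,
                                shortlist_size=3, top_k=1)
```

In this toy data, item 1 looks closest when only two dimensions are used, but the full-dimensional reranking moves it behind items 0 and 2, while the expensive distance is computed for only three candidates.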

Slide 20

Slide 20 text

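MLM: BERT
•Evaluates MLM accuracy when the embedding dimensionality of the final output layer is reduced
•Even with reduced embedding dimensionality, MRL performs on par with training from scratch
•Situations where one would want Matryoshka-style BERT embeddings seem limited, though...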

Slide 21

Slide 21 text

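Summary
•Addresses the problem of embedding dimensionality in representation learning
•Makes it possible for a single model to output embeddings at multiple dimensionalities
•Confirms a favorable trade-off between dimensionality and performance
Impressions
•It is not really a hugely innovative method
  •What is commendable is that the usefulness of a simple concept is verified with a large number of experiments
  •In particular, verifying that MRL has almost no adverse effects is very commendable
•Note that, unlike Slim. Net, inference time is not reduced
•Behavior at dimensionalities that were not trained is unknown (so some care is still needed at training time)
•Since the implementation cost and adverse effects are small, training with MRL by default may be a reasonable choice from now on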

Slide 22

Slide 22 text

References
•https://github.com/huggingface/blog/blob/main/matryoshka.md#how-are--matryoshka-embedding-models-trained
•https://openai.com/index/new-embedding-models-and-api-updates/
•https://techblog.exawizards.com/entry/2023/05/10/055218