[Reading group material] Matryoshka Representation Learning

Hayato Tsukagoshi
August 06, 2024

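These slides explain Matryoshka Representation Learning (MRL), a representation learning method that makes it easy to shrink output embeddings to a more or less arbitrary number of dimensions without modifying a model trained with a specific embedding dimensionality.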

Original paper: https://arxiv.org/abs/2205.13147

Transcript

  1. Matryoshka Representation Learning
     Presenter: Hayato Tsukagoshi, Graduate School of Informatics, Nagoya University, Japan.
     Paper authors: Aditya Kusupati, Gantavya Bhatt, Aniket Rege, Matthew Wallingford, Aditya Sinha, Vivek Ramanujan, William Howard-Snyder, Kaifeng Chen, Sham Kakade, Prateek Jain, Ali Farhadi
     NeurIPS 2022
     https://arxiv.org/abs/2205.13147
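  2. Representation learning and its problems
     Representation Learning
     • Methods and techniques for obtaining feature representations that are useful for a variety of machine learning tasks
       • Example: training a model on a large-scale labeled image dataset (ResNet)
       • Example: multimodal embeddings via contrastive learning on language-image pairs (CLIP)
       • Example: text embeddings via contrastive learning on text-text pairs (E5)
     • What you get: a model that outputs a reasonably good vector for any given input example
     Problems
     • The vector dimensionality of the resulting embeddings cannot be changed from the one used when the model was trained
       • A large dimensionality means high storage cost, high retrieval cost, and high cost for various downstream processing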
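  3. Related work: ResNet
     • Deep learning models become harder to train as the number of layers grows
       • Due to vanishing gradients, exploding gradients, and similar effects
     • Intuitively, the deeper a model gets, the smaller the "job" of each individual layer becomes
       • Learning this small contribution from random initialization is difficult
     • Proposes a model architecture that uses residual connections
       • If a layer's output is not needed, it can simply be driven to zero, so training stays easy even with nonlinearities
     • Achieves high performance and high training stability
     He et al., Deep Residual Learning for Image Recognition, CVPR 2016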
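The remark that an unneeded output can be driven to zero can be illustrated with a few lines of Python. This is only a minimal sketch, not the ResNet architecture: the residual branch is assumed to be a single linear layer followed by ReLU, and `relu`, `linear`, and `residual_block` are hypothetical helper names.

```python
def relu(v):
    """Element-wise ReLU."""
    return [max(0.0, x) for x in v]


def linear(weights, bias, v):
    """Dense layer; weights is a list of rows, bias a list of floats."""
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, bias)]


def residual_block(weights, bias, v):
    """y = x + F(x), with F assumed to be linear followed by ReLU."""
    branch = relu(linear(weights, bias, v))
    return [x + f for x, f in zip(v, branch)]


x = [1.0, -2.0, 3.0]
zero_w = [[0.0] * 3 for _ in range(3)]
zero_b = [0.0] * 3

# With the branch weights at zero, the block reduces to the identity mapping.
identity_out = residual_block(zero_w, zero_b, x)
```

Because the skip path carries `x` unchanged, the block only has to learn the (possibly tiny) correction `F(x)`.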
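  4. Related work: ViT (Vision Transformer)
     • Work that brought the Transformer to the image domain
     • Splits the input image into a number of "patches", builds an embedding for each patch, and processes them with a Transformer
     • Trained to predict masked patches, in the same way as BERT
       • Masked Patch Prediction
     Dosovitskiy et al., An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, ICLR 2021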
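The patch-splitting step described above can be sketched as follows. This is a simplified illustration under stated assumptions: the image is a single-channel list of rows, `split_into_patches` is a hypothetical helper name, and the projection of each flattened patch to an embedding (plus everything inside the Transformer) is omitted.

```python
def split_into_patches(image, patch_size):
    """Split an H x W image (list of rows) into flattened, non-overlapping patches."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch in row-major order.
            patch = [image[r][c]
                     for r in range(top, top + patch_size)
                     for c in range(left, left + patch_size)]
            patches.append(patch)
    return patches


# Toy 4 x 4 "image" whose pixel value encodes its position.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = split_into_patches(image, patch_size=2)  # four patches of four pixels each
```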
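  5. Related work: ALIGN
     • A Vision & Language embedding model
       • Embeds both images and text into the same space
     • Large-scale contrastive learning on a noisy image-text dataset
       • Similar in positioning to CLIP, which appeared around the same time, but trained on a larger dataset
     • Makes it possible to search images with text, to search with an image plus added text, and so on
     Jia et al., Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision, ICML 2021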
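The contrastive objective behind models of this kind can be sketched as a symmetric cross-entropy over an image-text similarity matrix. The code below is an illustrative sketch, not ALIGN's training code: the function names are hypothetical, the embeddings are toy values, and the fixed `temperature=0.1` is an assumption.

```python
import math


def normalize(v):
    """Rescale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class of row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)


def contrastive_loss(image_embs, text_embs, temperature=0.1):
    """Symmetric image-to-text and text-to-image loss over one batch of pairs."""
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
              for i in imgs]
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))


images = [[1.0, 0.0], [0.0, 1.0]]
aligned = contrastive_loss(images, [[1.0, 0.1], [0.1, 1.0]])   # matching pairs
shuffled = contrastive_loss(images, [[0.1, 1.0], [1.0, 0.1]])  # mismatched pairs
```

Matching pairs give a much smaller loss than mismatched ones, which is the signal that pulls paired images and texts together in the shared space.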
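  6. Baselines
     • Models trained separately for each dimensionality (Fixed Feature: FF)
       • Plays the role of an oracle, although MRL may even outperform it
     • Dimensionality reduction with SVD
     • Slim. Net
       • A method similar to MRL that also shrinks the intermediate representations
     • Rand. LP
       • Randomly selects dimensions from the original high-dimensional embedding and uses those
     Slide author's note: "Fixed Feature" is an unintuitive and somewhat awkward name, but presumably it was chosen as a contrast to MRL being flexible.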
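To make the difference between random dimension selection and MRL-style shrinking concrete, here is a minimal sketch. The function names and the toy embedding are hypothetical, and the random baseline is an assumed simplification in which one fixed random set of indices is shared by all embeddings.

```python
import random


def select_random_dims(embedding, k, seed=0):
    """Random-selection baseline (assumed form): keep k randomly chosen dimensions.

    The same seed must be used for every embedding so that all vectors
    keep the same set of dimensions.
    """
    rng = random.Random(seed)
    idx = sorted(rng.sample(range(len(embedding)), k))
    return [embedding[i] for i in idx]


def truncate_prefix(embedding, k):
    """MRL-style shrinking: keep the first k dimensions."""
    return embedding[:k]


# Toy 8-dimensional embedding (made-up values).
full = [0.9, -0.4, 0.3, 0.1, -0.05, 0.02, 0.01, -0.03]
rand_4 = select_random_dims(full, 4)
prefix_4 = truncate_prefix(full, 4)
```

With MRL the prefix is trained to be a usable embedding on its own, whereas randomly selected dimensions of an ordinary model carry no such guarantee.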
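  7. ImageNet-1K: ResNet 50 / 1-NN Accuracy
     • Representation learning on a labeled image dataset
     • ImageNet-1K: 1.28 million examples, 1000 classes
     • MRL performs on par with separately trained models (FF)
       • Better than SVD-based dimensionality reduction or random selection
     • MRL shows almost no drop in performance from 2048 down to about 64 dimensions
       • Perhaps nearest-neighbor search works without problems even when cut down this far?
     • The comparison with Slim. Net feels somewhat unfair, since the compute costs are presumably not matched…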
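The 1-NN evaluation on shrunken embeddings can be sketched as below. This is a toy illustration, not the paper's evaluation code: the function names and all embedding values are made up, and cosine similarity on re-normalized prefixes is an assumed choice of distance.

```python
import math


def truncate_and_normalize(vec, k):
    """Keep the first k dimensions and rescale to unit L2 norm."""
    head = vec[:k]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head


def one_nn_predict(query, database, labels, k):
    """Label of the most similar database item, using only the first k dims."""
    q = truncate_and_normalize(query, k)
    best_label, best_sim = None, -math.inf
    for vec, label in zip(database, labels):
        d = truncate_and_normalize(vec, k)
        sim = sum(a * b for a, b in zip(q, d))
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label


def one_nn_accuracy(queries, query_labels, database, labels, k):
    """Fraction of queries whose nearest neighbor has the correct label."""
    hits = sum(one_nn_predict(q, database, labels, k) == y
               for q, y in zip(queries, query_labels))
    return hits / len(queries)


# Toy 4-dimensional embeddings (made-up values).
database = [[1.0, 0.1, 0.3, -0.2], [-1.0, 0.2, 0.1, 0.4], [0.9, -0.1, -0.3, 0.1]]
labels = ["cat", "dog", "cat"]
queries = [[0.8, 0.0, 0.2, 0.0], [-0.7, 0.3, 0.0, 0.1]]
query_labels = ["cat", "dog"]

acc_full = one_nn_accuracy(queries, query_labels, database, labels, k=4)
acc_small = one_nn_accuracy(queries, query_labels, database, labels, k=2)
```

Sweeping `k` over the nested sizes and plotting the resulting accuracy gives a curve of the kind discussed on this slide.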
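  8. ImageNet-1K: ViT-B/16 / 1-NN Accuracy
     • Representation learning on larger-scale datasets
       • JFT-300M: 300 million examples, 18,000 classes
       • ALIGN: 1.8 billion image-text pairs
     • MRL stays high-performing even after dimensionality reduction
       • Separate training (FF) is hard to afford at this scale
     • Here, random selection also seems to retain performance fairly well
       • Presenter's speculation: in a sufficiently trained model, the individual dimensions of the embedding may divide up roles on their own (although which dimensions correspond to the central concepts is unknown)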
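  9. ImageNet-1K: ViT-B/16 / Retrieval
     • Evaluates the trade-off between accuracy and speed on a retrieval task
     • Color (Ds) indicates the embedding dimensionality used at retrieval time; circle size (Dr) indicates the dimensionality used at reranking time
       • Top-left-most point: retrieve with 8 dims → rank with 2048 dims
       • Bottom-left-most point: retrieve with 8 dims → rank with 8 dims
     • Takeaway: retrieving with only 8 dimensions and then ranking just the small number of top retrieved examples with the full-size embedding gives both high accuracy and high throughput
       • Whether such a situation actually arises in practice is unclear, though…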
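The two-stage procedure in the takeaway above (shortlist with Ds dimensions, rerank with Dr dimensions) can be sketched as follows. This is a brute-force illustration under stated assumptions, not the paper's implementation: `adaptive_retrieve`, `shortlist_size`, and the toy vectors are hypothetical, and a real system would replace the first stage with an approximate nearest-neighbor index.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def adaptive_retrieve(query, database, d_s, d_r, shortlist_size, top_k):
    """Shortlist with the first d_s dims, then rerank the shortlist with the first d_r dims."""
    # Stage 1: cheap scoring of every item using only d_s dimensions.
    coarse = sorted(range(len(database)),
                    key=lambda i: cosine(query[:d_s], database[i][:d_s]),
                    reverse=True)
    shortlist = coarse[:shortlist_size]
    # Stage 2: more expensive scoring, applied to the shortlist only.
    reranked = sorted(shortlist,
                      key=lambda i: cosine(query[:d_r], database[i][:d_r]),
                      reverse=True)
    return reranked[:top_k]


# Toy 4-dimensional embeddings (made-up values).
database = [
    [1.0, 0.0, 0.9, 0.1],   # item 0
    [1.0, 0.1, -0.9, 0.0],  # item 1
    [-1.0, 0.0, 0.5, 0.5],  # item 2
    [0.9, 0.0, 0.8, 0.2],   # item 3
]
query = [1.0, 0.0, 1.0, 0.0]
result = adaptive_retrieve(query, database, d_s=2, d_r=4, shortlist_size=3, top_k=2)
```

The first stage touches every item but only a few dimensions, while the second stage touches all dimensions but only a few items, which is where the accuracy and throughput trade-off comes from.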