
[Reading group slides] Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space


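These slides explain Optimus, a large-scale Variational Auto-Encoder (VAE) model built by integrating pre-trained language models, and the paper that introduced it.
They start with a careful derivation of the VAE objective function that underpins Optimus.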

Hayato Tsukagoshi

October 18, 2022


Transcript

  1. Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space

    Presenter: Hayato Tsukagoshi, Graduate School of Informatics, Nagoya University, Japan.
    Paper: Chunyuan Li, Xiang Gao, Yuan Li, Baolin Peng, Xiujun Li, Yizhe Zhang, and Jianfeng Gao. EMNLP 2020.
    URL: https://aclanthology.org/2020.emnlp-main.378/
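  2. Paper overview

    • Proposes Optimus, a pre-trained language model based on a VAE (variational auto-encoder)
      • Note: it relies heavily on existing pre-trained language models
    • BERT is the encoder and GPT-2 is the decoder
      • The two models are integrated into a VAE, with some care taken over how they are combined
    • Strong results on language-modeling metrics, conditional generation, and low-resource tasks
      • Linear interpolation of latent representations yields semantically smooth sentence generation
      • Demonstrates the usefulness of VAE + pre-training in NLP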
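  3. Why this paper

    • VAEs are theoretically and technically interesting but get little attention (especially in NLP)
      • Pre-trained models such as BERT are easy to use and powerful
      • They do seem to be used in text generation tasks that value diversity, though...
    • I wanted to study VAEs and reorganize my background knowledge
      • I did not remember the equations very well...
    • I might use them in my own research and was curious
      • VAE-based sentence embedding models are rarely seen
      • BERT-flow seems close in spirit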
  4. Outline

    Introduction
    • What is a VAE?
    • Deriving the VAE objective
    Optimus
    • Model architecture
    • Loss function
    • Integrating BERT and GPT-2
    • Evaluation experiments

  5. Introduction


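  7. Generative models

    • Methods that model the distribution of the inputs as well as the outputs *
      • What we usually use are discriminative models (classification only)
    • The data are assumed to be generated from some probability distribution
      • The distribution that the observed data follow is estimated from the observed data
    • GANs are the well-known example in the image domain
      • Surprisingly uncommon in NLP?

    * See Pattern Recognition and Machine Learning (Japanese edition), vol. 1, p. 42.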
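  8. Auto-Encoder (AE)

    • A model trained so that the input can be reconstructed from an intermediate representation
      • A kind of generative model
      • Can be trained without supervision
    • The intermediate representation can be regarded as a compressed representation of the input
      • Allows non-linear, complex dimensionality reduction
      • Also used for clustering, anomaly detection, denoising, and so on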
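  9. Variational Auto-Encoder (VAE)

    • Can be seen as an Auto-Encoder with a constraint on the distribution of its latent representation
      • It has a different motivation and theoretical background from the AE, but can be interpreted as something similar
      • The constraint on the latent representation makes it easy to generate data
      • Proposed in Kingma et al., 2013. Auto-Encoding Variational Bayes
    • Any prior distribution can be chosen for the latent representation
      • In most cases it is the standard normal distribution
    • The loss is the sum of two terms
      • Reconstruction error
      • A loss on the distribution of the latent representation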
  10–14. VAE model structure (figure, built up over five slides)

    x → Encoder → (Wμ, Wσ) → μ, σ → latent representation z → Decoder → x′

    • The encoder converts the input into a vector representation.
    • From that vector, Wμ and Wσ output the mean and the variance-covariance matrix of a Gaussian distribution.
    • The latent representation z is obtained by sampling from the Gaussian with that mean and covariance.
    • The decoder reconstructs the output from the latent representation.
    • A full variance-covariance matrix is cumbersome, so it is usually treated as a diagonal matrix.
  15–16. Comparing the AE and VAE model structures (figure)

    AE:  x → Encoder → latent representation z → Decoder → x′
    VAE: x → Encoder → (Wμ, Wσ) → μ, σ → latent representation z → Decoder → x′

    • Compared with the AE, the VAE only adds the step that samples the latent representation and a loss on the distribution of the latent representation.
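  17. Advantages of the VAE

    • With an AE it is unknown how the latent representations are distributed
      • A VAE is trained so that they approach a known probability distribution
      • Sampling from that known distribution produces natural-looking data
    • It has a regularizing effect and is more robust than an AE
      • Similar to Denoising Auto-Encoders and the like
    • Unlike PCA or SVD, it can compress the input data with non-linear transformations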
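  18. Comparing the VAE with other generative models

    GAN
    • Trained so that the discriminator cannot classify the outputs of the generator
    VAE
    • Trained so that the latent distribution approaches the prior and so that the input is reconstructed
    Normalizing flow
    • Learns an invertible mapping and builds a complex latent distribution
    • Can be combined with a VAE
    Diffusion Models
    • The model is trained to add noise in the forward direction and remove noise in the reverse direction

    Reference: 「変分推論と Normalizing Flow」 (Variational Inference and Normalizing Flow)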
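  19. The VAE objective

    • The VAE loss is the sum of the following two terms
      • Reconstruction error
      • Regularization term (a loss on the distribution of the latent representation)
      • $\phi$ denotes the encoder parameters and $\theta$ the decoder parameters

    $$\mathcal{L} = \underbrace{\mathbb{E}_{q_\phi(z|X)}\left[\log p_\theta(X|z)\right]}_{\text{reconstruction error}} - \underbrace{D_{KL}\left(q_\phi(z|X) \,\|\, p_\theta(z)\right)}_{\text{regularization term}}$$

    Here $\mathcal{L}$ is the quantity to be maximized; the loss that is minimized in training is $-\mathcal{L}$.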
  20–21. Deriving the VAE objective

    • The basic idea behind the VAE (or variational Bayes)
      • We want to know the posterior distribution $p_\theta(Z|X)$, which expresses the hidden property $Z$ of the data $X$
    • In practice $p_\theta(X)$ and $p_\theta(Z|X)$ are almost never known
      • Settle for $q_\phi(Z|X)$, an approximation of $p_\theta(Z|X)$
      • How do we obtain $q_\phi(Z|X)$?
      • We do not know what this distribution looks like either
      • Start from $p_\theta(X)$ and manipulate the expression
  22–23. Deriving the VAE objective

    Rewrite the log evidence as follows:

    $$\begin{aligned}
    \log p_\theta(X) &= \log \int p_\theta(X, z)\, dz \\
    &= \log \int p_\theta(X, z)\, \frac{q_\phi(z|X)}{q_\phi(z|X)}\, dz \\
    &= \log \int q_\phi(z|X)\, \frac{p_\theta(X, z)}{q_\phi(z|X)}\, dz
    \end{aligned}$$

    • First line: $p_\theta(X)$ is regarded as the joint distribution marginalized over $z$.
    • Second line: multiplying by 1 changes nothing.
  24–25. Deriving the VAE objective

    By Jensen's inequality, noting that $f(x) = \log(x)$ is a concave function,

    $$\log \int q_\phi(z|X)\, \frac{p_\theta(X, z)}{q_\phi(z|X)}\, dz \;\ge\; \int q_\phi(z|X)\, \log \frac{p_\theta(X, z)}{q_\phi(z|X)}\, dz$$

    that is,

    $$\log p_\theta(X) \;\ge\; \int q_\phi(z|X)\, \log \frac{p_\theta(X, z)}{q_\phi(z|X)}\, dz$$
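
    The inequality above can be checked numerically. The following is a minimal sketch, assuming a discrete stand-in for the integral: the list `q` plays the role of $q_\phi(z|X)$ and `ratio` the role of $p_\theta(X, z) / q_\phi(z|X)$. The numbers and the function name `jensen_sides` are made up for illustration.

```python
import math


def jensen_sides(q, ratio):
    """Return (log E_q[r], E_q[log r]) for a discrete distribution q and positive values r."""
    log_of_mean = math.log(sum(qi * ri for qi, ri in zip(q, ratio)))
    mean_of_log = sum(qi * math.log(ri) for qi, ri in zip(q, ratio))
    return log_of_mean, mean_of_log


# Hypothetical values: q sums to 1, ratio is any list of positive numbers.
q = [0.2, 0.5, 0.3]
ratio = [0.5, 1.5, 3.0]
log_of_mean, mean_of_log = jensen_sides(q, ratio)

# Because log is concave, the log of the average is never below the average of the logs.
print(log_of_mean, mean_of_log, log_of_mean >= mean_of_log)
```

    The two sides coincide only when `ratio` is constant, which corresponds to $q_\phi(z|X)$ matching the true posterior.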
  26–27. Deriving the VAE objective

    Writing the right-hand side as $\mathcal{L}(\theta, \phi; X)$, we have

    $$\log p_\theta(X) \ge \mathcal{L}(\theta, \phi; X), \qquad \mathcal{L}(\theta, \phi; X) = \int q_\phi(z|X)\, \log \frac{p_\theta(X, z)}{q_\phi(z|X)}\, dz$$

    This $\mathcal{L}(\theta, \phi; X)$ is called the ELBO (Evidence Lower BOund), or variational lower bound.

    Note on Japanese terminology: the ELBO is sometimes written 変分下限 ("variational lower limit"), but since it is a lower bound rather than a lower limit, 変分下界 seems to be the more accurate term.
  28–30. Deriving the VAE objective

    Now consider the difference between the two sides of the inequality, $\log p_\theta(X) - \mathcal{L}(\theta, \phi; X)$. Using $p_\theta(X, z) = p_\theta(z|X)\, p_\theta(X)$ and $\int q_\phi(z|X)\, dz = 1$:

    $$\begin{aligned}
    \log p_\theta(X) - \mathcal{L}(\theta, \phi; X)
    &= \log p_\theta(X) - \int q_\phi(z|X)\, \log \frac{p_\theta(X, z)}{q_\phi(z|X)}\, dz \\
    &= \log p_\theta(X) \int q_\phi(z|X)\, dz - \int q_\phi(z|X)\, \log \frac{p_\theta(z|X)\, p_\theta(X)}{q_\phi(z|X)}\, dz \\
    &= \int q_\phi(z|X)\, \log p_\theta(X)\, dz - \int q_\phi(z|X)\, \log \frac{p_\theta(z|X)\, p_\theta(X)}{q_\phi(z|X)}\, dz \\
    &= \int q_\phi(z|X)\, \log \frac{p_\theta(X)\, q_\phi(z|X)}{p_\theta(z|X)\, p_\theta(X)}\, dz \\
    &= \int q_\phi(z|X)\, \log \frac{q_\phi(z|X)}{p_\theta(z|X)}\, dz \\
    &= D_{KL}\left(q_\phi(z|X) \,\|\, p_\theta(z|X)\right)
    \end{aligned}$$
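
    The identity $\log p_\theta(X) - \mathcal{L} = D_{KL}(q_\phi(z|X) \,\|\, p_\theta(z|X))$ can be verified on a toy model. This is a minimal sketch, assuming a latent variable with three discrete states so that every integral becomes a finite sum; the probabilities below are hypothetical.

```python
import math

# Hypothetical toy model: a latent z with three states and one fixed observation X.
prior = [0.5, 0.3, 0.2]          # p(z)
likelihood = [0.10, 0.40, 0.70]  # p(X | z) for the observed X
q = [0.2, 0.3, 0.5]              # q(z | X), an arbitrary approximate posterior

joint = [pz * px for pz, px in zip(prior, likelihood)]  # p(X, z)
evidence = sum(joint)                                   # p(X)
posterior = [j / evidence for j in joint]               # p(z | X)

# ELBO: sum_z q(z|X) log( p(X, z) / q(z|X) )
elbo = sum(qz * math.log(j / qz) for qz, j in zip(q, joint))
# KL divergence from q(z|X) to the true posterior p(z|X)
kl_to_posterior = sum(qz * math.log(qz / pz) for qz, pz in zip(q, posterior))
log_evidence = math.log(evidence)

print(log_evidence, elbo + kl_to_posterior)
```

    Setting `q` equal to `posterior` drives the KL term to zero, and the ELBO then equals the log evidence.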
  31–33. Deriving the VAE objective

    From the above,

    $$\log p_\theta(X) = \mathcal{L}(\theta, \phi; X) + D_{KL}\left(q_\phi(z|X) \,\|\, p_\theta(z|X)\right)$$

    • The original goal is to find $q_\phi(z|X)$ that approximates $p_\theta(z|X)$
      → it suffices to minimize $D_{KL}\left(q_\phi(z|X) \,\|\, p_\theta(z|X)\right)$.
    • $\log p_\theta(X)$ is constant for a fixed $\theta$ (it does not depend on $\phi$), so
      minimizing $D_{KL}\left(q_\phi(z|X) \,\|\, p_\theta(z|X)\right)$ ⇔ maximizing $\mathcal{L}(\theta, \phi; X)$.
    • The sum of the first and second terms on the right-hand side is fixed, so if the second term shrinks, the first term has to grow.
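  34–35. Deriving the VAE objective

    The variational lower bound can be decomposed further:

    $$\begin{aligned}
    \mathcal{L}(\theta, \phi; X)
    &= \int q_\phi(z|X)\, \log \frac{p_\theta(X, z)}{q_\phi(z|X)}\, dz \\
    &= \int q_\phi(z|X)\, \log \frac{p_\theta(X|z)\, p_\theta(z)}{q_\phi(z|X)}\, dz \\
    &= \int q_\phi(z|X)\, \log p_\theta(X|z)\, dz + \int q_\phi(z|X)\, \log \frac{p_\theta(z)}{q_\phi(z|X)}\, dz \\
    &= \int q_\phi(z|X)\, \log p_\theta(X|z)\, dz - \int q_\phi(z|X)\, \log \frac{q_\phi(z|X)}{p_\theta(z)}\, dz \\
    &= \underbrace{\int q_\phi(z|X)\, \log p_\theta(X|z)\, dz}_{\text{likelihood}} - \underbrace{D_{KL}\left(q_\phi(z|X) \,\|\, p_\theta(z)\right)}_{\text{regularization term}}
    \end{aligned}$$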
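  36–37. Deriving the VAE objective

    Maximizing $\mathcal{L}(\theta, \phi; X)$ ⇔ minimizing $-\mathcal{L}(\theta, \phi; X)$, so the loss function is defined as follows:

    $$\begin{aligned}
    -\mathcal{L}(\theta, \phi; X)
    &= -\int q_\phi(z|X)\, \log p_\theta(X|z)\, dz + D_{KL}\left(q_\phi(z|X) \,\|\, p_\theta(z)\right) \\
    &= \underbrace{-\,\mathbb{E}_{q_\phi(z|X)}\left[\log p_\theta(X|z)\right]}_{\text{reconstruction error}} + \underbrace{D_{KL}\left(q_\phi(z|X) \,\|\, p_\theta(z)\right)}_{\text{regularization term}}
    \end{aligned}$$

    • If a Gaussian distribution is assumed for $p_\theta(z)$, the loss can be obtained analytically (the KL term has a closed form when $q_\phi(z|X)$ is Gaussian as well).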
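
    As an illustration of the analytic form mentioned above: for a diagonal Gaussian $q_\phi(z|X) = \mathcal{N}(\mu, \mathrm{diag}(\sigma^2))$ and a standard normal prior, the KL term is $\frac{1}{2} \sum_i \left(\mu_i^2 + \sigma_i^2 - 1 - \log \sigma_i^2\right)$. The sketch below assumes the encoder outputs the log-variance $\log \sigma^2$, a common parameterization that the slides do not specify; the function name is made up.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions."""
    return 0.5 * sum(
        m * m + math.exp(lv) - 1.0 - lv
        for m, lv in zip(mu, log_var)
    )


# The divergence vanishes when q is exactly the prior.
print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))
# Shifting the mean or changing the variance makes it positive.
print(kl_to_standard_normal([1.0, 0.0], [0.0, math.log(4.0)]))
```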
  38–39. VAE model structure (revisited, figure)

    x → Encoder → (Wμ, Wσ) → μ, σ → latent representation z → Decoder → x′

    • In practice, a technique called the reparameterization trick is inserted at the sampling step between μ, σ and z.
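
    The reparameterization trick rewrites the sample as $z = \mu + \sigma \odot \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, I)$, so that the randomness sits in $\varepsilon$ and gradients can flow through $\mu$ and $\sigma$. The following is a minimal sketch in plain Python, assuming a log-variance parameterization; the function name and the numbers are illustrative only.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps, where eps ~ N(0, I) and sigma = exp(0.5 * log_var)."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mu, log_var)
    ]


rng = random.Random(0)
mu = [1.0, -2.0]
log_var = [math.log(0.25), math.log(4.0)]  # standard deviations 0.5 and 2.0

# Many samples should average out to mu.
samples = [reparameterize(mu, log_var, rng) for _ in range(20000)]
sample_mean = [sum(s[i] for s in samples) / len(samples) for i in range(len(mu))]
print(sample_mean)
```

    In an autodiff framework the same expression is used unchanged; only `eps` is treated as a constant input.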
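  40–41. VAE pseudocode: Encoder (the code on the slide is an image and is not part of this transcript)

    • In terms of implementation, the encoder output is simply passed through two linear layers in parallel.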
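
    Since the code on the slide is not available, here is a minimal sketch of what "two linear layers in parallel" can look like. It is not the author's code: the helper names, the random initialization, and the choice to predict the log-variance (rather than σ directly) are all assumptions.

```python
import random


def linear(weights, bias, x):
    """Apply y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def make_linear(out_dim, in_dim, rng):
    """Hypothetical initializer: small random weights, zero bias."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def encode(h, mu_head, log_var_head):
    """Map an encoder feature vector h to the mean and log-variance of q(z | x)."""
    mu = linear(*mu_head, h)            # plays the role of W_mu
    log_var = linear(*log_var_head, h)  # plays the role of W_sigma
    return mu, log_var


rng = random.Random(0)
hidden_dim, latent_dim = 8, 2
mu_head = make_linear(latent_dim, hidden_dim, rng)
log_var_head = make_linear(latent_dim, hidden_dim, rng)

h = [rng.gauss(0.0, 1.0) for _ in range(hidden_dim)]  # stand-in for the encoder output
mu, log_var = encode(h, mu_head, log_var_head)
print(mu, log_var)
```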

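  42–43. VAE pseudocode: whole model (the code on the slide is an image and is not part of this transcript)

    • The input is reconstructed from the latent representation obtained by sampling.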
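
    To make the whole pipeline concrete, the sketch below computes a single-sample estimate of the loss $-\mathcal{L}$ for a one-dimensional toy VAE: encode, sample, decode, then add the two loss terms. It is not the author's pseudocode. The scalar linear encoder and decoder, the parameter names, and the unit-variance Gaussian likelihood (which turns the reconstruction error into a squared error) are assumptions made for illustration.

```python
import math
import random


def vae_loss(x, params, rng):
    """Single-sample estimate of the negative ELBO for a 1-D toy VAE."""
    # Encoder: two linear heads give the mean and log-variance of q(z | x).
    mu = params["w_mu"] * x + params["b_mu"]
    log_var = params["w_lv"] * x + params["b_lv"]
    # Sample the latent representation via the reparameterization trick.
    z = mu + math.exp(0.5 * log_var) * rng.gauss(0.0, 1.0)
    # Decoder: reconstruct the input from the sampled latent representation.
    x_rec = params["w_dec"] * z + params["b_dec"]
    # Reconstruction error: -log p(x | z) up to an additive constant
    # under the assumed unit-variance Gaussian likelihood.
    reconstruction = 0.5 * (x - x_rec) ** 2
    # Regularization term: closed-form KL( q(z | x) || N(0, 1) ).
    kl = 0.5 * (mu * mu + math.exp(log_var) - 1.0 - log_var)
    return reconstruction + kl, reconstruction, kl


# Hypothetical parameters; in training these would be updated by gradient descent.
params = {"w_mu": 0.5, "b_mu": 0.0, "w_lv": 0.0, "b_lv": -1.0, "w_dec": 2.0, "b_dec": 0.0}
loss, reconstruction, kl = vae_loss(1.5, params, random.Random(0))
print(loss, reconstruction, kl)
```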

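  44. VAE summary

    • A generative model trained under the assumption that the latent representation follows a known probability distribution
      • Enables dimensionality reduction, extraction of meaningful representations, and generation by sampling
    • Its origins differ, but its architecture resembles that of the Auto-Encoder
      • It can be regarded as an Auto-Encoder with an added regularization term on the latent representation
      • The regularization term is said to make the VAE more robust than the AE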
  45. Optimus

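  48. Model architecture: simple version

    • BERT is used as the encoder, with [CLS] as the sentence representation
    • GPT-2 is used as the decoder and generates text conditioned on the latent representation
    • The whole model is trained, VAE-style, to reconstruct the input sentence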

  49–52. Model architecture: slightly more detailed version (figure, built up over four slides)

    [CLS] w1 w2 … → BERT → W_E → μ, σ → sampling (reparameterization trick) → z → W_M / W_D → GPT-2 → w1 w2 w3 …

    • The figure also shows the token sequence [CLS] w1 w2 … a second time, next to GPT-2.
  53–56. Loss function (the equation on the slide is an image and is not part of this transcript)

    • The standard VAE loss is used with additional hyperparameters β and λ
      • β adjusts the strength of the regularization
      • With β = 0 the model is almost the same as an Auto-Encoder (sampling is still performed)
      • λ prevents the latent representation from getting "excessively" close to the prior
    • That is a lot of hyperparameters 😇
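
    The exact equation is missing from this transcript, so the following is only a sketch of one form that matches the two roles described above: the reconstruction term plus β times a regularizer in which each latent dimension's KL is thresholded from below by λ (a "free bits" style hinge). The thresholded form, the function name, and the numbers are assumptions; check the paper for the precise definition.

```python
import math


def regularized_loss(reconstruction, mu, log_var, beta, lam):
    """reconstruction + beta * sum_i max(lam, KL_i), with KL_i the per-dimension
    KL between N(mu_i, exp(log_var_i)) and the standard normal prior."""
    per_dim_kl = [
        0.5 * (m * m + math.exp(lv) - 1.0 - lv)
        for m, lv in zip(mu, log_var)
    ]
    thresholded = sum(max(lam, kl) for kl in per_dim_kl)
    return reconstruction + beta * thresholded


# Hypothetical values: the first dimension already matches the prior (KL = 0),
# the second has KL = 2.
mu, log_var = [0.0, 2.0], [0.0, 0.0]
print(regularized_loss(1.0, mu, log_var, beta=1.0, lam=0.5))
print(regularized_loss(1.0, mu, log_var, beta=0.0, lam=0.5))
```

    With β = 0 only the reconstruction term remains, which is the Auto-Encoder-like case on the slide. A dimension whose KL is already below λ contributes a constant, so it is not pushed any closer to the prior.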
  57–59. Integrating BERT and GPT-2

    • Building a VAE out of BERT and GPT-2 raises roughly two problems
    1. Tokenization
      • BERT and GPT-2 have different vocabularies and different tokenization methods
      • Solved by using different tokenizers for the input and the output
    2. Conditional generation from the latent representation
      • GPT-2 has no mechanism for conditional text generation
      • How should text be generated from the latent representation obtained with BERT?
      • Two methods of connecting the latent representation to GPT-2's generation mechanism are tested

    Prompting is a separate topic.
  60–61. Integrating BERT and GPT-2: conditional generation from the latent representation

    Memory
    • The latent representation is converted into as many vectors as there are layers
    • During generation, each layer attends to its vector
    Embedding
    • The latent representation is transformed and added to the word embeddings
    • The latent representation is used in the same way as BERT's position embedding

    Prompting is a separate topic.
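
    The two injection schemes can be sketched with plain lists. This only illustrates the shapes involved, under the assumption that a projection (called `w_m` here) maps z to one vector per layer and another (`w_d`) maps z to a single vector of hidden size; how GPT-2 attends to the memory vectors is not shown, and all names and values are hypothetical.

```python
def project(weights, x):
    """Apply y = W x, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def latent_as_memory(z, w_m, num_layers, hidden_dim):
    """Memory scheme: project z to num_layers vectors, one extra vector per decoder layer."""
    flat = project(w_m, z)  # length num_layers * hidden_dim
    return [flat[i * hidden_dim:(i + 1) * hidden_dim] for i in range(num_layers)]


def latent_as_embedding(z, w_d, token_embeddings):
    """Embedding scheme: project z once and add the result to every token embedding."""
    shift = project(w_d, z)
    return [[e + s for e, s in zip(token, shift)] for token in token_embeddings]


# Tiny hypothetical sizes: latent dim 2, hidden dim 3, two decoder layers.
z = [1.0, -1.0]
w_m = [[float(i), 0.0] for i in range(6)]
w_d = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tokens = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]

memory = latent_as_memory(z, w_m, num_layers=2, hidden_dim=3)
shifted = latent_as_embedding(z, w_d, tokens)
print(memory)
print(shifted)
```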
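  62. Evaluation experiments

    Language Modeling
    • Evaluates whether Optimus can generate sentences correctly
    • Perplexity (PPL) and MI for sentence generation
    Guided Language Generation
    • Evaluates whether sentences that follow a given condition can be generated correctly
    • Dialogue response generation, response generation in a specific style, and label-conditioned sentence generation
    Low-resource Language Understanding
    • Examines the usefulness of Optimus in low-resource settings
    • Solves GLUE with sentence embeddings to measure performance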
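  63. Experimental setup

    • Latent dimension: 32
      • Inferred from the published implementation; the dimensionality does not seem to be stated explicitly in the paper
    • Training data for the VAE: 1.99 million sentences from English Wikipedia
    • For the generation tasks, the model is additionally trained for just 1 epoch on each dataset
    • Various training tricks are used
      • For example, β is increased during training
    • Low-resource Language Understanding uses the encoder (BERT) representation corresponding to [CLS]
      • So the vector has 768 dimensions, not 32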
  64–66. Evaluation: Language Modeling

    • Much higher performance than existing small VAEs
      • Pre-training with a huge model on a huge corpus is effective for VAEs as well
      • λ controls a trade-off between generation performance and the quality of the latent representation

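  67. Evaluation: Language Modeling

    • Achieves lower PPL than GPT-2 on three of the four datasets
      • Particularly strong on datasets such as SNLI that contain many characteristic, formulaic sentences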

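  68. Evaluation: Guided Language Generation

    • Optimus's latent representations allow arithmetic on sentence representations
      • Sentences are generated from $z_D = z_B - z_A + z_C$
    • How should this result be interpreted...?

    The demo site introduced in the paper is no longer accessible 😇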
  69–70. Evaluation: Guided Language Generation

    • Generation by linear interpolation between the latent representations of two sentences
    • A benefit of the VAE's smooth latent space
    • The takeaway is roughly that no completely unrelated sentences appear
      • The vocabulary of the interpolated sentences resembles that of the original sentences
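
    Linear interpolation between two latent vectors is simply $z_\tau = (1 - \tau)\, z_1 + \tau\, z_2$ for $\tau \in [0, 1]$. The sketch below only produces the intermediate latent vectors; decoding each of them into a sentence needs the trained decoder and is not shown. The function names and the example vectors are hypothetical.

```python
def interpolate(z1, z2, tau):
    """Return (1 - tau) * z1 + tau * z2, element-wise."""
    return [(1.0 - tau) * a + tau * b for a, b in zip(z1, z2)]


def interpolation_path(z1, z2, steps):
    """Evenly spaced latent vectors from z1 to z2, both endpoints included (steps >= 2)."""
    return [interpolate(z1, z2, i / (steps - 1)) for i in range(steps)]


# Hypothetical 2-D latent vectors standing in for two encoded sentences.
z_start = [0.0, 2.0]
z_end = [4.0, -2.0]
path = interpolation_path(z_start, z_end, steps=5)
for z in path:
    print(z)
```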
  71–72. Evaluation: Guided Language Generation

    • Experiments on three tasks, with strong performance
      • Dialogue response generation
      • Sentence generation in a specific style
      • Conditional generation
    • In conditional generation, text is generated according to a sentiment classification label
      • Strong results in the label classification probability of the generated sentences and in their diversity

    See the original paper for the detailed experimental settings and task descriptions.
  73–74. Evaluation: Low-resource Language Understanding

    • A linear classifier is trained on Optimus's encoder representations
      • Sentiment classification on the Yelp dataset
    • The change in performance with the number of training examples is observed
    • Optimus reaches relatively high classification performance even with few training examples
      • Performance rises slightly earlier
      • Without fine-tuning in particular, it outperforms the original BERT
      • This suggests that VAE training yields a good latent space
  75–76. Evaluation: Low-resource Language Understanding

    • Visualizes the distributions of Optimus and BERT sentence representations
      • The Yelp development set is converted into sentence representations
    • With Optimus the sentence representations are distributed more uniformly and the per-label clusters are clearer
      • In particular, the latent representations are more uniformly distributed than BERT's
      • Or so it seems, at least
  77–79. Evaluation: Low-resource Language Understanding

    • Evaluates Optimus on GLUE
      • How well does a linear classifier on top of the sentence embeddings perform?
    • Without fine-tuning, Optimus outperforms the original BERT
      • Does Optimus learn better sentence representations than BERT?
      • BERT's suspiciously low QQP score is a concern, though...
    • With fine-tuning the difference is small (unsurprising, since the encoder starts from BERT)

    It would have been nice to see experiments on SentEval as well...
  80–81. Summary

    • Proposes Optimus, a large-scale pre-trained language model based on a VAE
    • Builds the VAE by integrating BERT as the encoder and GPT-2 as the decoder
    • Strong performance on sentence generation, conditional generation, and low-resource tasks
      • In particular, it far outperforms existing small VAEs
      • Demonstrates the effectiveness of pre-training for VAEs

    Impressions
    • I am curious what would happen when training from scratch, without existing pre-trained language models such as BERT
      • That appears to have been too demanding in terms of compute (see Sec. 6, Discussion)
    • Interesting as a combination of pre-training and VAEs, but the range of applications may be limited