
[Reading-group slides] Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space

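These slides explain Optimus, a large-scale Variational Auto-Encoder (VAE) model built by integrating pre-trained language models, together with the paper that proposes it.
They walk through the material step by step, starting from the derivation of the VAE objective that underpins Optimus.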

Hayato Tsukagoshi

October 18, 2022

Transcript

  1. Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space
    Chunyuan Li, Xiang Gao, Yuan Li, Baolin Peng, Xiujun Li, Yizhe Zhang, and Jianfeng Gao
    EMNLP 2020
    URL: https://aclanthology.org/2020.emnlp-main.378/
    Presenter: Hayato Tsukagoshi
    Graduate School of Informatics, Nagoya University, Japan

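  2. Paper overview
    • Proposes Optimus, a pre-trained language model based on a VAE (variational auto-encoder)
      • Note: it makes full use of existing pre-trained language models
    • BERT as the encoder, GPT-2 as the decoder
      • The two models are carefully integrated into a VAE, and the integration method itself is also a contribution
    • High performance on language-generation metrics, conditional generation, and low-resource tasks
      • Linear interpolation of latent representations yields semantically smooth sentence generation
      • Demonstrates the usefulness of VAE + pre-training in NLP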

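  3. Why this paper was chosen
    • VAEs are interesting both theoretically and technically, but receive little attention (especially in NLP)
      • Pre-trained models such as BERT are simple and powerful
      • They do seem to be used in text-generation tasks that emphasize diversity, though
    • I wanted to study VAEs and reorganize the background knowledge for myself
      • I did not remember the equations very well
    • I may use them in my own research, so I was interested
      • VAE-based sentence-embedding models are rarely seen
      • BERT-flow seems close in spirit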

  4. Table of contents
    Introduction
    • What is a VAE?
    • Deriving the VAE objective
    Optimus
    • Model architecture
    • Loss function
    • Integrating BERT and GPT-2
    • Evaluation experiments

  5. Introduction


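  7. Generative models
    • Methods that model the distribution of the inputs as well as the outputs *
      • What we usually use are discriminative models (which only classify)
    • The data are assumed to be generated from some probability distribution
      • The distribution that the observed data follow is estimated from the observed data
    • GANs are well known in the image domain
      • Surprisingly rare in NLP?
    * See パターン認識と機械学習 (Pattern Recognition and Machine Learning), Vol. 1, p. 42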

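  8. Auto-Encoder (AE)
    • A model trained so that the input can be reconstructed from an intermediate representation
      • A kind of generative model
      • Can be trained without supervision
    • The intermediate representation can be viewed as a compressed representation of the input
      • Enables nonlinear, complex dimensionality reduction
      • Also used for clustering, anomaly detection, denoising, and so on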

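  9. Variational Auto-Encoder (VAE)
    • An Auto-Encoder with a constraint on the distribution of its latent representation (or can be regarded as such)
      • It has a different motivation and theoretical background from the AE, but can be interpreted as something similar
      • The constraint on the latent representation makes generating data easy
      • Proposed in Kingma et al., 2013. Auto-Encoding Variational Bayes
    • Any prior distribution can be chosen for the latent representation
      • In most cases, the standard normal distribution
    • The loss function is the sum of two losses
      • Reconstruction error
      • A loss on the distribution of the latent representation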

  10. Model structure of the VAE
    [Diagram, built up over five slides: x → Encoder → (μ, σ) → latent representation z → Decoder → x']
    • Encoder: converts the input into a vector representation
    • μ, σ: the mean and the covariance matrix of a Gaussian distribution are output from the vector representation
    • z: the latent representation is obtained by sampling from the Gaussian distribution defined by the mean and covariance matrix
    • Decoder: reconstructs the output from the latent representation
    Note: a full covariance matrix is cumbersome, so it is generally treated as a diagonal matrix.

  15. Comparing the model structures of the AE and the VAE
    [Diagram, AE: x → Encoder → latent representation z → Decoder → x']
    [Diagram, VAE: x → Encoder → (μ, σ) → latent representation z → Decoder → x']
    The only additions in the VAE are the step that samples the latent representation and the loss on the distribution of the latent representation.

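  17. Advantages of the VAE
    • In an AE, it is unknown how the latent representations are distributed
      • A VAE is trained so that they approach a known probability distribution
      • Natural-looking data can be generated by sampling from that known distribution
    • It has a regularizing effect and is more robust than an AE
      • Similar to the Denoising Auto-Encoder and related models
      • Unlike PCA or SVD, it can compress the input data with a nonlinear transformation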

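  18. Comparing the VAE with other generative models
    GAN
    • Trained so that the discriminator cannot classify the outputs of the generator
    VAE
    • Trained so that the distribution of the latent representation approaches the prior, and so that the input is reconstructed
    Normalizing flow
    • Learns an invertible mapping and builds a complex distribution for the latent representation
    • Can be combined with a VAE
    Diffusion Models
    • Noise is added in the forward direction, and the model is trained to remove the noise in the reverse direction
    See also: 変分推論と Normalizing Flow (variational inference and normalizing flows)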

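  19. The VAE objective
    • The VAE loss function is the sum of the following two terms
      • Reconstruction error
      • Regularization term (a loss on the distribution of the latent representation)
      • $\phi$ denotes the encoder parameters and $\theta$ the decoder parameters
    $$\mathcal{L} = \underbrace{D_{\mathrm{KL}}\left( q_\phi(z|X) \,\|\, p_\theta(z) \right)}_{\text{regularization term}} \; \underbrace{-\, \mathbb{E}_{q_\phi(z|X)}\left[ \log p_\theta(X|z) \right]}_{\text{reconstruction error}}$$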

  20. How the VAE objective is obtained
    • The basic intuition behind the VAE (or variational Bayes)
      • We want to know the posterior distribution $p_\theta(Z|X)$ that expresses the hidden property $Z$ of the data $X$
    • In practice, $p_\theta(X)$ and $p_\theta(Z|X)$ are almost always unknown
      • So we settle for $q_\phi(Z|X)$, an approximation of $p_\theta(Z|X)$
      • How do we obtain $q_\phi(Z|X)$?
        • We do not know what this distribution looks like either
      • Use $p_\theta(X)$ as a starting point and manipulate the expression

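  22. How the VAE objective is obtained
    Rewrite the expression as follows:
    $$\begin{aligned}
    \log p_\theta(X) &= \log \int p_\theta(X, z)\, dz \\
    &= \log \int p_\theta(X, z)\, \frac{q_\phi(z|X)}{q_\phi(z|X)}\, dz \\
    &= \log \int q_\phi(z|X)\, \frac{p_\theta(X, z)}{q_\phi(z|X)}\, dz
    \end{aligned}$$
    • First line: $p_\theta(X)$ is regarded as the joint distribution marginalized over $z$
    • Second line: multiplying by 1 changes nothing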

  24. How the VAE objective is obtained
    Noting that $f(x) = \log(x)$ is a concave function, Jensen's inequality gives
    $$\log \int q_\phi(z|X)\, \frac{p_\theta(X, z)}{q_\phi(z|X)}\, dz \;\geq\; \int q_\phi(z|X) \log \frac{p_\theta(X, z)}{q_\phi(z|X)}\, dz$$
    that is,
    $$\log p_\theta(X) \;\geq\; \int q_\phi(z|X) \log \frac{p_\theta(X, z)}{q_\phi(z|X)}\, dz$$
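The Jensen step can be checked numerically on a small discrete example. This is only an illustrative sketch: the function name `jensen_gap` and the numbers are made up for this note, and a discrete distribution stands in for the integral over z.

```python
import math

def jensen_gap(values, weights):
    """Return log(E[Y]) - E[log Y] for a positive discrete random variable Y.

    `weights` plays the role of q(z|X) and `values` the role of p(X, z) / q(z|X).
    Because log is concave, the result is never negative.
    """
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    log_of_mean = math.log(sum(w * y for w, y in zip(weights, values)))
    mean_of_log = sum(w * math.log(y) for w, y in zip(weights, values))
    return log_of_mean - mean_of_log

# Made-up numbers: the gap is positive unless Y is constant.
gap = jensen_gap([0.5, 2.0, 4.0], [0.2, 0.5, 0.3])
constant_gap = jensen_gap([3.0, 3.0, 3.0], [0.2, 0.5, 0.3])
```

The gap vanishes only when the ratio is constant, which is why the bound becomes tight when q matches the true posterior.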

  26. How the VAE objective is obtained
    Writing the right-hand side as
    $$\mathcal{L}(\theta, \phi; X) = \int q_\phi(z|X) \log \frac{p_\theta(X, z)}{q_\phi(z|X)}\, dz$$
    we can write
    $$\log p_\theta(X) \geq \mathcal{L}(\theta, \phi; X)$$
    This $\mathcal{L}(\theta, \phi; X)$ is called the ELBO (Evidence Lower BOund), or variational lower bound (変分下界).
    Note: the ELBO is sometimes written in Japanese as 変分下限 ("lower limit"), but since it is a lower bound rather than a lower limit, 変分下界 seems to be the correct term.

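  28. How the VAE objective is obtained
    Now consider the difference between the two sides of the inequality above:
    $$\begin{aligned}
    \log p_\theta(X) - \mathcal{L}(\theta, \phi; X)
    &= \log p_\theta(X) - \int q_\phi(z|X) \log \frac{p_\theta(X, z)}{q_\phi(z|X)}\, dz \\
    &= \log p_\theta(X) \int q_\phi(z|X)\, dz - \int q_\phi(z|X) \log \frac{p_\theta(z|X)\, p_\theta(X)}{q_\phi(z|X)}\, dz
    \end{aligned}$$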

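  29. How the VAE objective is obtained
    $$\begin{aligned}
    \log p_\theta(X) - \mathcal{L}(\theta, \phi; X)
    &= \log p_\theta(X) \int q_\phi(z|X)\, dz - \int q_\phi(z|X) \log \frac{p_\theta(z|X)\, p_\theta(X)}{q_\phi(z|X)}\, dz \\
    &= \int q_\phi(z|X) \log p_\theta(X)\, dz - \int q_\phi(z|X) \log \frac{p_\theta(z|X)\, p_\theta(X)}{q_\phi(z|X)}\, dz \\
    &= \int q_\phi(z|X) \log \frac{p_\theta(X)\, q_\phi(z|X)}{p_\theta(z|X)\, p_\theta(X)}\, dz
    \end{aligned}$$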


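  30. How the VAE objective is obtained
    $$\begin{aligned}
    \log p_\theta(X) - \mathcal{L}(\theta, \phi; X)
    &= \int q_\phi(z|X) \log \frac{p_\theta(X)\, q_\phi(z|X)}{p_\theta(z|X)\, p_\theta(X)}\, dz \\
    &= \int q_\phi(z|X) \log \frac{q_\phi(z|X)}{p_\theta(z|X)}\, dz \\
    &= D_{\mathrm{KL}}\left( q_\phi(z|X) \,\|\, p_\theta(z|X) \right)
    \end{aligned}$$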

  31. How the VAE objective is obtained
    From the above,
    $$\log p_\theta(X) = \mathcal{L}(\theta, \phi; X) + D_{\mathrm{KL}}\left( q_\phi(z|X) \,\|\, p_\theta(z|X) \right)$$
    The original goal is to find a $q_\phi(z|X)$ that approximates $p_\theta(z|X)$
    → it suffices to minimize $D_{\mathrm{KL}}\left( q_\phi(z|X) \,\|\, p_\theta(z|X) \right)$
    Since $\log p_\theta(X)$ is constant for a fixed $\theta$ (it does not depend on $\phi$),
    minimizing $D_{\mathrm{KL}}\left( q_\phi(z|X) \,\|\, p_\theta(z|X) \right)$ ⇔ maximizing $\mathcal{L}(\theta, \phi; X)$
    Annotation: the sum of the first and second terms on the right-hand side is fixed, so if the second term becomes smaller, the first term must become larger.
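The decomposition of log p(X) into the ELBO plus a KL term can be verified on a toy discrete model. The following is a minimal sketch under stated assumptions: z takes three values, X is fixed to one observation, and all the numbers and function names are hypothetical.

```python
import math

# Hypothetical toy model: joint[k] = p(X, z=k) for one fixed observation X.
joint = [0.10, 0.25, 0.05]
# An arbitrary approximate posterior q(z|X); it only needs to sum to 1.
q = [0.5, 0.3, 0.2]

def log_evidence(joint):
    """log p(X), obtained by marginalizing the joint over z."""
    return math.log(sum(joint))

def elbo(joint, q):
    """Sum over z of q(z|X) * log(p(X, z) / q(z|X))."""
    return sum(qk * math.log(pk / qk) for pk, qk in zip(joint, q))

def kl_to_true_posterior(joint, q):
    """KL(q(z|X) || p(z|X)), with p(z|X) = p(X, z) / p(X)."""
    evidence = sum(joint)
    posterior = [pk / evidence for pk in joint]
    return sum(qk * math.log(qk / pk) for qk, pk in zip(q, posterior))

lhs = log_evidence(joint)
rhs = elbo(joint, q) + kl_to_true_posterior(joint, q)
```

Because the KL term is non-negative, the ELBO never exceeds `lhs`; changing `q` moves value between the two terms but leaves their sum unchanged.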

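  34. How the VAE objective is obtained
    Decomposing the variational lower bound further:
    $$\begin{aligned}
    \mathcal{L}(\theta, \phi; X)
    &= \int q_\phi(z|X) \log \frac{p_\theta(X, z)}{q_\phi(z|X)}\, dz \\
    &= \int q_\phi(z|X) \log \frac{p_\theta(X|z)\, p_\theta(z)}{q_\phi(z|X)}\, dz \\
    &= \int q_\phi(z|X) \log p_\theta(X|z)\, dz + \int q_\phi(z|X) \log \frac{p_\theta(z)}{q_\phi(z|X)}\, dz
    \end{aligned}$$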

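  35. How the VAE objective is obtained
    $$\begin{aligned}
    \mathcal{L}(\theta, \phi; X)
    &= \int q_\phi(z|X) \log p_\theta(X|z)\, dz + \int q_\phi(z|X) \log \frac{p_\theta(z)}{q_\phi(z|X)}\, dz \\
    &= \int q_\phi(z|X) \log p_\theta(X|z)\, dz - \int q_\phi(z|X) \log \frac{q_\phi(z|X)}{p_\theta(z)}\, dz \\
    &= \underbrace{\int q_\phi(z|X) \log p_\theta(X|z)\, dz}_{\text{likelihood}} - \underbrace{D_{\mathrm{KL}}\left( q_\phi(z|X) \,\|\, p_\theta(z) \right)}_{\text{regularization term}}
    \end{aligned}$$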

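  36. How the VAE objective is obtained
    Maximizing $\mathcal{L}(\theta, \phi; X)$ ⇔ minimizing $-\mathcal{L}(\theta, \phi; X)$, so the loss function is defined as follows:
    $$\begin{aligned}
    -\mathcal{L}(\theta, \phi; X)
    &= -\int q_\phi(z|X) \log p_\theta(X|z)\, dz + D_{\mathrm{KL}}\left( q_\phi(z|X) \,\|\, p_\theta(z) \right) \\
    &= \underbrace{D_{\mathrm{KL}}\left( q_\phi(z|X) \,\|\, p_\theta(z) \right)}_{\text{regularization term}} \; \underbrace{-\, \mathbb{E}_{q_\phi(z|X)}\left[ \log p_\theta(X|z) \right]}_{\text{reconstruction error}}
    \end{aligned}$$
    Annotation: if a Gaussian distribution is assumed for $p_\theta(z)$, the loss function can be obtained analytically (the KL term has a closed form when $q_\phi(z|X)$ is also Gaussian).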
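As an illustration of the analytic form mentioned in the annotation, here is a sketch of the well-known closed-form KL divergence between a diagonal Gaussian and the standard normal prior. The diagonal-covariance assumption follows the earlier model-structure slides; the function name is made up for this note.

```python
import math

def kl_diag_gaussian_vs_standard_normal(mu, sigma):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ) for a diagonal Gaussian.

    Closed form: 0.5 * sum_i (mu_i^2 + sigma_i^2 - 1 - log sigma_i^2).
    `mu` and `sigma` are equal-length lists; every sigma must be positive.
    """
    return 0.5 * sum(
        m * m + s * s - 1.0 - math.log(s * s) for m, s in zip(mu, sigma)
    )

# The KL is zero exactly when q already equals the prior.
kl_zero = kl_diag_gaussian_vs_standard_normal([0.0, 0.0], [1.0, 1.0])
# Shifting the mean by 1 in one dimension costs 0.5 nats.
kl_shifted = kl_diag_gaussian_vs_standard_normal([1.0], [1.0])
```

The reconstruction term still has to be estimated by sampling z, so only the regularization term is fully analytic in this setting.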

  38. Model structure of the VAE (repeated)
    [Diagram: x → Encoder → (μ, σ) → latent representation z → Decoder → x']
    Annotation: strictly speaking, a technique called the reparameterization trick is inserted at the sampling step.
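The slides only name the reparameterization trick, so the following is a minimal sketch of the standard formulation, z = μ + σ ⊙ ε with ε ~ N(0, I). The function name and the plain-list representation are assumptions made for illustration.

```python
import random

def reparameterize(mu, sigma, rng=random):
    """Draw z = mu + sigma * eps element-wise, with eps ~ N(0, 1).

    The randomness is isolated in eps, so z is a deterministic, differentiable
    function of mu and sigma (in an autodiff framework, gradients can then flow
    back into the encoder).
    """
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    return [m + s * e for m, s, e in zip(mu, sigma, eps)]

# Usage with a seeded generator so the draw is repeatable.
z = reparameterize([0.0, 1.0], [1.0, 0.5], rng=random.Random(0))
```

Sampling z directly from N(μ, σ²) would give the same distribution, but the sampling node would then block gradient computation with respect to μ and σ.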

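  40. VAE pseudocode: Encoder
    [Code listing not captured in the transcript]
    Annotation: in terms of implementation, the vector is simply passed through two branches of linear layers.

  42. VAE pseudocode: full model
    [Code listing not captured in the transcript]
    Annotation: the input is reconstructed from the latent representation obtained by sampling.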
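The code listings on the pseudocode slides did not survive extraction. As a stand-in, here is a small pure-Python sketch of the forward pass the annotations describe: an encoder forked into two linear heads, sampling, and reconstruction. It is not the presenter's code; the class `TinyVAE`, the layer sizes, the log-variance parameterization, and the squared-error reconstruction term are all assumptions.

```python
import math
import random

def linear(weights, bias, x):
    """Affine map y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

class TinyVAE:
    """Toy VAE with hypothetical, randomly initialized linear layers (no training)."""

    def __init__(self, input_dim, hidden_dim, latent_dim, seed=0):
        self.rng = random.Random(seed)

        def init(rows, cols):
            return [[self.rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.w_enc, self.b_enc = init(hidden_dim, input_dim), [0.0] * hidden_dim
        # Two heads forked from the same hidden vector: mean and log-variance.
        self.w_mu, self.b_mu = init(latent_dim, hidden_dim), [0.0] * latent_dim
        self.w_logvar, self.b_logvar = init(latent_dim, hidden_dim), [0.0] * latent_dim
        self.w_dec, self.b_dec = init(input_dim, latent_dim), [0.0] * input_dim

    def encode(self, x):
        h = [math.tanh(v) for v in linear(self.w_enc, self.b_enc, x)]
        return linear(self.w_mu, self.b_mu, h), linear(self.w_logvar, self.b_logvar, h)

    def forward(self, x):
        mu, logvar = self.encode(x)
        sigma = [math.exp(0.5 * lv) for lv in logvar]
        # Reparameterized sample z = mu + sigma * eps.
        z = [m + s * self.rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]
        x_rec = linear(self.w_dec, self.b_dec, z)
        # Squared error stands in for -log p(X|z) (Gaussian decoder, up to constants).
        recon = 0.5 * sum((a - b) ** 2 for a, b in zip(x, x_rec))
        # Closed-form KL to the standard normal prior.
        kl = 0.5 * sum(m * m + s * s - 1.0 - lv for m, s, lv in zip(mu, sigma, logvar))
        return x_rec, recon + kl, kl

vae = TinyVAE(input_dim=4, hidden_dim=8, latent_dim=2)
x_rec, loss, kl = vae.forward([0.5, -1.0, 0.25, 2.0])
```

In practice this would be written with an autodiff framework so that the loss can be minimized; the sketch only shows the data flow and how the two loss terms are combined.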

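  44. Summary of the VAE
    • A generative model trained by assuming a known probability distribution for the latent representation
      • Enables dimensionality reduction, extraction of meaningful representations, and generation by sampling
    • Although its origins differ, it has an architecture similar to the Auto-Encoder
      • It can be regarded as an Auto-Encoder with an added regularization term on the latent representation
      • Thanks to the regularization term, the VAE is (said to be) more robust than the AE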

  45. Optimus




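  48. Model architecture: simple version
    • BERT is used as the encoder, and [CLS] is used as the sentence representation
    • GPT-2 is used as the decoder and generates text conditioned on the latent representation
    • The whole model is trained, VAE-style, to reconstruct the input sentence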

  49. Model architecture: a slightly more detailed version
    [Diagram, built up over four slides: the input tokens [CLS] w1 w2 … are fed to BERT; W_E sits between BERT and μ, σ; z is sampled using the reparameterization trick; z is passed to GPT-2 through W_M / W_D; GPT-2 receives [CLS] w1 w2 … and outputs w1 w2 w3 …]

  53. Loss function
    • Hyperparameters $\beta$ and $\lambda$ are added to the ordinary VAE loss function
      • $\beta$ adjusts the strength of the regularization
      • With $\beta = 0$ the model is almost the same as an Auto-Encoder (sampling is still performed)
      • $\lambda$ prevents the latent representation from getting "excessively" close to the prior
    [Equation not captured in the transcript]
    Annotation: that is a lot of hyperparameters 😇
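The loss equation on these slides was an image and is missing from the transcript. The sketch below assumes the form described in the Optimus paper as I understand it: the reconstruction loss plus β times a KL term in which each latent dimension is thresholded from below by λ. Treat the exact form, and the function and argument names, as assumptions rather than the authors' implementation.

```python
def optimus_style_loss(recon_nll, kl_per_dim, beta, lam):
    """Return recon_nll + beta * sum_i max(lam, KL_i).

    recon_nll  : reconstruction loss, i.e. -E_q[log p(X|z)]
    kl_per_dim : list with the KL divergence of each latent dimension
    beta       : weight of the regularization term
    lam        : per-dimension threshold; dimensions whose KL is already below
                 lam contribute a constant, so they are not pushed further
                 toward the prior (assumed thresholding scheme)
    """
    regularizer = sum(max(lam, kl_i) for kl_i in kl_per_dim)
    return recon_nll + beta * regularizer

# beta = 0 leaves only the reconstruction term (auto-encoder-like objective).
ae_like = optimus_style_loss(2.0, [0.1, 1.0], beta=0.0, lam=0.5)
# With beta = 1, the first dimension is clipped up to lam = 0.5.
regularized = optimus_style_loss(2.0, [0.1, 1.0], beta=1.0, lam=0.5)
```

Under this reading, β and λ pull in different directions: β scales how strongly the latent distribution is pushed toward the prior, while λ sets a floor below which that pressure is switched off.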

  57. Integrating BERT and GPT-2
    • Broadly, there are two problems in building a VAE by integrating BERT and GPT-2
    1. Tokenization
    • BERT and GPT-2 have different vocabularies and different tokenization schemes
    • Solved by using different tokenizers for the input and the output
    2. Conditional generation using the latent representation
    • GPT-2 has no built-in mechanism for conditional text generation (prompting is a separate topic)
    • How should text be generated from the latent representation obtained with BERT?
      • Two methods for integrating the latent representation into GPT-2's generation mechanism are tested

  60. Integrating BERT and GPT-2: conditional generation using the latent representation
    Memory
    • The latent representation is converted into as many vectors as there are layers
    • During generation, each layer attends to its vector
    Embedding
    • The latent representation is transformed and added to the word embeddings
    • The latent representation is used in the same way as BERT's position embeddings
    Note: prompting is a separate topic.
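The two injection schemes can be sketched with plain lists. This is a shape-level illustration only: the matrices `w_m` and `w_d` borrow the labels W_M and W_D from the architecture diagram, but the helper names, the tiny dimensions, and the way the vectors are later consumed by the decoder are assumptions.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def memory_vectors(w_m, z, num_layers, hidden_size):
    """Memory scheme: project z with W_M and split it into one vector per layer.

    Each decoder layer would then attend to its own vector as an extra position.
    """
    flat = matvec(w_m, z)
    assert len(flat) == num_layers * hidden_size
    return [flat[i * hidden_size:(i + 1) * hidden_size] for i in range(num_layers)]

def add_latent_to_embeddings(token_embeddings, w_d, z):
    """Embedding scheme: add the projected latent W_D z to every token embedding."""
    projected = matvec(w_d, z)
    return [[e + p for e, p in zip(token, projected)] for token in token_embeddings]

# Toy sizes: latent dim 2, hidden size 2, 2 decoder layers.
z = [1.0, 2.0]
w_m = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
w_d = [[1.0, 0.0], [0.0, 1.0]]
memory = memory_vectors(w_m, z, num_layers=2, hidden_size=2)
embedded = add_latent_to_embeddings([[0.0, 0.0], [10.0, 10.0]], w_d, z)
```

The Memory scheme exposes z to every layer, whereas the Embedding scheme only changes the decoder's input, much like a position embedding that is identical at every position.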

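  62. Evaluation experiments
    Language Modeling
    • Evaluates whether Optimus can generate sentences correctly
    • Perplexity (PPL) and MI for text generation
    Guided Language Generation
    • Evaluates whether sentences that follow a given condition can be generated correctly
    • Dialogue response generation, response generation in a specific style, and label-conditioned sentence generation
    Low-resource Language Understanding
    • Verifies the usefulness of Optimus in low-resource settings
    • Performance is checked by solving GLUE on top of sentence embeddings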
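For reference, perplexity in its generic form is the exponential of the average negative log-likelihood per token. The sketch below shows only that definition; for a VAE the exact likelihood is usually unavailable, so reported values are typically computed from a bound such as the ELBO, and how the paper does this is not reproduced here. The function name is made up.

```python
import math

def perplexity(token_log_probs):
    """exp of the mean negative log-probability per token (natural logarithm)."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# A model that assigns probability 1/4 to every token has perplexity 4.
ppl = perplexity([math.log(0.25)] * 4)
```

Lower values mean the model assigns higher probability to the evaluated text.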

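  63. Experimental setup
    • Latent dimension: 32
      • Judged from the publicly released implementation (the dimensionality does not appear to be stated explicitly in the paper)
    • Training data for the VAE: 1.99 million English Wikipedia sentences
    • For the generation tasks, the model is additionally trained for just 1 epoch on each dataset
    • Various training tricks are used
      • For example, increasing $\beta$ during training
    • For Low-resource Language Understanding, the representation corresponding to [CLS] of the encoder (BERT) is used
      • So the vector has 768 dimensions rather than 32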
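The slide says only that β is increased during training. One simple way to do that is a linear warm-up, sketched below. The schedule shape, the function name, and the parameters are hypothetical, and the paper's actual schedule may differ.

```python
def linear_beta_schedule(step, warmup_steps, beta_max=1.0):
    """Increase beta linearly from 0 to beta_max over warmup_steps, then hold it.

    Hypothetical schedule for illustration: a small beta early in training lets
    the model learn to reconstruct before the regularization term takes effect.
    """
    if warmup_steps <= 0:
        return beta_max
    return beta_max * min(1.0, step / warmup_steps)

betas = [linear_beta_schedule(s, warmup_steps=4) for s in range(7)]
```

Starting with a weak regularizer is a common way to reduce the risk that the decoder learns to ignore the latent representation.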

  64. Evaluation experiments: Language Modeling
    • Much higher performance than existing small VAEs
      • Pre-training a huge model on a huge corpus is effective for VAEs as well
      • $\lambda$ controls a trade-off between text-generation performance and the quality of the latent representation

  67. Evaluation experiments: Language Modeling
    • Achieves lower PPL than GPT-2 on 3 of the 4 datasets
      • Especially strong on datasets such as SNLI that contain many characteristic, formulaic sentences

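  68. Evaluation experiments: Guided Language Generation
    • Using Optimus's latent representations makes arithmetic on sentence representations possible
      • Sentences are generated from $z_D = z_B - z_A + z_C$
    • How should these results be interpreted...?
    Note: the demo site introduced in the paper is no longer accessible 😇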
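The analogy formula on this slide is ordinary vector arithmetic on latent codes. A minimal sketch, with plain lists standing in for the latent vectors and a made-up function name; decoding the result into text with GPT-2 is outside the scope of the sketch.

```python
def latent_analogy(z_a, z_b, z_c):
    """Return z_D = z_B - z_A + z_C, computed element-wise."""
    return [b - a + c for a, b, c in zip(z_a, z_b, z_c)]

# Toy 2-dimensional latent codes (made-up values).
z_d = latent_analogy([1.0, 2.0], [3.0, 5.0], [10.0, 20.0])
```

The idea is that the offset from sentence A to sentence B is transferred onto sentence C, and the decoder then generates a sentence from the resulting code.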

  69. Evaluation experiments: Guided Language Generation
    • Generation by linear interpolation between the latent representations of two sentences
    • A benefit of the VAE's latent space being smooth
    • Perhaps the right reading is simply that completely unrelated sentences do not appear
      • The vocabulary of the interpolated sentences resembles that of the original sentences
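Linear interpolation between two latent codes can be sketched as follows. The function name and the evenly spaced steps are assumptions made for illustration; each intermediate code would be passed to the decoder to generate a sentence.

```python
def interpolate(z_start, z_end, num_steps):
    """Return num_steps + 1 codes z_tau = (1 - tau) * z_start + tau * z_end.

    tau runs over 0, 1/num_steps, ..., 1, so the first and last codes are the
    two endpoints themselves.
    """
    path = []
    for i in range(num_steps + 1):
        tau = i / num_steps
        path.append([(1 - tau) * a + tau * b for a, b in zip(z_start, z_end)])
    return path

path = interpolate([0.0, 2.0], [4.0, 0.0], num_steps=4)
```

A smooth latent space means that neighboring points on this path should decode to sentences that change gradually rather than abruptly.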

  71. Evaluation experiments: Guided Language Generation
    • Experiments on three tasks, with high performance
      • Dialogue response generation
      • Sentence generation in a specific style
      • Conditional generation
    • In conditional generation, text is generated according to sentiment-classification labels
      • High performance in terms of the label-classification probability of the generated sentences and their diversity
    Note: see the original paper for the detailed experimental setup and task descriptions.

  73. Evaluation experiments: Low-resource Language Understanding
    • A linear classifier is trained on Optimus's encoder representations
      • Sentiment-classification task on the Yelp dataset
    • Observes how performance changes with the number of training examples
    • Optimus achieves comparatively high classification performance even with few training examples
      • Performance rises somewhat faster
      • In particular, without fine-tuning it outperforms the original BERT
      • Suggests that a good latent space is acquired through VAE training

  75. Evaluation experiments: Low-resource Language Understanding
    • Visualizes the distributions of the sentence representations of Optimus and BERT
      • The Yelp development set is converted into sentence representations
    • With Optimus, the sentence representations are distributed more uniformly and the clusters for each label are clearer
      • In particular, the latent representations are more uniformly distributed than with BERT
      • Or at least that seems to be the case

  77. Evaluation experiments: Low-resource Language Understanding
    • Evaluates Optimus on GLUE
      • How well does a linear classifier that takes sentence embeddings as input perform?
    • Without fine-tuning, it outperforms the original BERT
      • Does Optimus acquire better sentence representations than BERT?
      • BERT's performance on QQP is suspiciously low, though
    • With fine-tuning, the difference is small (unsurprising, since the encoder is BERT to begin with)
    Note: it would have been nice to see experiments on SentEval as well.

  80. Summary
    • Proposes Optimus, a large-scale pre-trained language model based on a VAE
    • BERT as the encoder and GPT-2 as the decoder are skillfully integrated into a VAE
    • High performance on text generation, conditional generation, and low-resource tasks
      • In particular, it greatly outperforms existing small VAEs
      • Demonstrates the effectiveness of pre-training for VAEs

    Impressions
    • I am curious what would happen if the model were trained from scratch, without using existing pre-trained language models such as BERT
      • This appears to have been difficult in terms of computational resources (see Sec. 6, Discussion)
    • Interesting as a story about pre-training + VAE, but the range of applications may be limited