Slide 1

Slide 1 text

Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space
Chunyuan Li, Xiang Gao, Yuan Li, Baolin Peng, Xiujun Li, Yizhe Zhang, and Jianfeng Gao
EMNLP 2020
URL: https://aclanthology.org/2020.emnlp-main.378/
Presenter: Hayato Tsukagoshi (Graduate School of Informatics, Nagoya University, Japan)

Slide 2

Slide 2 text

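Paper overview
• Proposes Optimus, a pre-trained language model based on the VAE (variational auto-encoder)
  • Note: it does build heavily on existing pre-trained language models
• BERT as the encoder, GPT-2 as the decoder
  • The two models are combined into a VAE, with some care taken over how they are joined
• Strong results on text-generation metrics, conditional generation, and low-resource tasks
  • Linear interpolation between latent representations yields semantically smooth sentence generation
  • Demonstrates the usefulness of VAE + pre-training in NLP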

Slide 3

Slide 3 text

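Why this paper
• VAEs are theoretically and technically interesting, yet get little attention (especially in NLP)
  • Pre-trained models such as BERT are simple to use and powerful
  • VAEs do seem to be used in text-generation tasks that value diversity...
• I wanted to study VAEs and reorganize the background knowledge for myself
  • I do not remember the equations very well...
• I may use VAEs in my own research and was curious
  • VAE-based sentence-embedding models are rarely seen
  • BERT-flow seems close in spirit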

Slide 4

Slide 4 text

Outline
Introduction
• What is a VAE?
• Deriving the VAE objective
Optimus
• Model architecture
• Loss function
• Integrating BERT and GPT-2
• Evaluation experiments

Slide 5

Slide 5 text

Introduction

Slide 7

Slide 7 text

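Generative models
• Methods that model the distribution of the inputs as well as the outputs *
  • What we usually use are discriminative models (classification only)
• The data are assumed to be generated from some probability distribution
  • The distribution that the observed data follow is estimated from the observed data
• GANs in the image domain are the famous example
  • Surprisingly rare in NLP?
* See Pattern Recognition and Machine Learning (Japanese edition), vol. 1, p. 42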

Slide 8

Slide 8 text

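Auto-Encoder (AE)
• A model trained to reconstruct its input from an intermediate representation
  • A kind of generative model
  • Can be trained without supervision
• The intermediate representation can be seen as a compressed representation of the input
  • Allows nonlinear, complex dimensionality reduction
  • Also used for clustering, anomaly detection, denoising, and so on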

Slide 9

Slide 9 text

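Variational Auto-Encoder (VAE)
• An Auto-Encoder with a constraint placed on the distribution of its latent representation (or at least it can be viewed that way)
  • Its motivation and theoretical background differ from the AE's, but it can be interpreted as something similar
  • The constraint on the latent representation makes generating data easy
  • Proposed in Kingma and Welling, 2013, "Auto-Encoding Variational Bayes"
• Any prior distribution can be chosen for the latent representation
  • In most cases the standard normal distribution
• The loss function is the sum of two losses
  • Reconstruction error
  • A loss on the distribution of the latent representation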

Slide 10

Slide 10 text

VAE model architecture
[Figure: x → Encoder → (W_μ, W_σ) → μ, σ → latent representation z → Decoder → x']

Slide 11

Slide 11 text

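VAE model architecture
[Figure: x → Encoder → (W_μ, W_σ) → μ, σ → latent representation z → Decoder → x']
Highlighted step: the Encoder converts the input into a vector representation.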

Slide 12

Slide 12 text

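VAE model architecture
[Figure: x → Encoder → (W_μ, W_σ) → μ, σ → latent representation z → Decoder → x']
Highlighted step: the vector representation is mapped to the mean and covariance matrix of a Gaussian distribution.
Note: a full covariance matrix is cumbersome, so it is basically treated as a diagonal matrix.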

Slide 13

Slide 13 text

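VAE model architecture
[Figure: x → Encoder → (W_μ, W_σ) → μ, σ → latent representation z → Decoder → x']
Highlighted step: the latent representation is obtained by sampling from the Gaussian distribution defined by the mean and covariance matrix.
Note: a full covariance matrix is cumbersome, so it is basically treated as a diagonal matrix.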

Slide 14

Slide 14 text

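VAE model architecture
[Figure: x → Encoder → (W_μ, W_σ) → μ, σ → latent representation z → Decoder → x']
Highlighted step: the output is reconstructed from the latent representation.
Note: a full covariance matrix is cumbersome, so it is basically treated as a diagonal matrix.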

Slide 15

Slide 15 text

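Comparing the AE and VAE architectures
[Figure, AE: x → Encoder → latent representation z → Decoder → x']
[Figure, VAE: x → Encoder → (W_μ, W_σ) → μ, σ → latent representation z → Decoder → x']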

Slide 16

Slide 16 text

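Comparing the AE and VAE architectures
[Figure, AE: x → Encoder → latent representation z → Decoder → x']
[Figure, VAE: x → Encoder → (W_μ, W_σ) → μ, σ → latent representation z → Decoder → x']
The VAE only adds the step that samples the latent representation and a loss on the distribution of the latent representation.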

Slide 17

Slide 17 text

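Advantages of the VAE
• In an AE, how the latent representations are distributed is unknown
  • A VAE is trained so that they approach a known probability distribution
  • Sampling from that known distribution generates natural-looking data
• It has a regularizing effect and is more robust than an AE
  • Similar to the Denoising Auto-Encoder and related models
  • Unlike PCA or SVD, it compresses the input data with a nonlinear transformation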

Slide 18

Slide 18 text

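Comparing the VAE with other generative models
GAN
• Trained so that the discriminator cannot classify the generator's outputs
VAE
• Trained so that the latent distribution approaches the prior and the input is reconstructed
Normalizing flow
• Learns an invertible mapping to build a complex distribution over latent representations
• Can be combined with a VAE
Diffusion models
• Noise is added in the forward direction, and the model is trained to remove it in the reverse direction
Related reading (link label on the slide): "Variational inference and Normalizing Flow"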

Slide 19

Slide 19 text

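The VAE objective
• The VAE loss is the sum of two terms
  • Reconstruction error
  • Regularization term (a loss on the distribution of the latent representation)
• $\phi$ denotes the encoder parameters and $\theta$ the decoder parameters

$$
\mathcal{L} = \underbrace{\mathbb{E}_{q_\phi(z|X)}\left[\log p_\theta(X|z)\right]}_{\text{reconstruction error}} - \underbrace{D_{KL}\left(q_\phi(z|X) \,\|\, p_\theta(z)\right)}_{\text{regularization term}}
$$

$\mathcal{L}$ is maximized during training; equivalently, $-\mathcal{L}$ is minimized as the loss.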

Slide 20

Slide 20 text

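Deriving the VAE objective
• The basic idea behind the VAE (or variational Bayes)
  • We want the posterior distribution $p_\theta(Z|X)$ that describes the property $Z$ hidden in the data $X$
• In practice, $p_\theta(X)$ and $p_\theta(Z|X)$ are almost never known
  • Settle for $q_\phi(Z|X)$, an approximation of $p_\theta(Z|X)$
• How do we obtain $q_\phi(Z|X)$?
  • We do not know what this distribution looks like either
  • Use $p_\theta(X)$ as a starting point and manipulate the expressions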

Slide 22

Slide 22 text

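Deriving the VAE objective
Rewrite $\log p_\theta(X)$ as follows:

$$
\begin{aligned}
\log p_\theta(X) &= \log \int p_\theta(X, z)\, dz && \text{(viewed as the joint marginalized over } z\text{)} \\
&= \log \int p_\theta(X, z)\, \frac{q_\phi(z|X)}{q_\phi(z|X)}\, dz \\
&= \log \int q_\phi(z|X)\, \frac{p_\theta(X, z)}{q_\phi(z|X)}\, dz
\end{aligned}
$$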

Slide 23

Slide 23 text

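Deriving the VAE objective
Rewrite $\log p_\theta(X)$ as follows:

$$
\begin{aligned}
\log p_\theta(X) &= \log \int p_\theta(X, z)\, dz \\
&= \log \int p_\theta(X, z)\, \frac{q_\phi(z|X)}{q_\phi(z|X)}\, dz && \text{(multiplying by 1 changes nothing)} \\
&= \log \int q_\phi(z|X)\, \frac{p_\theta(X, z)}{q_\phi(z|X)}\, dz
\end{aligned}
$$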

Slide 24

Slide 24 text

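Deriving the VAE objective
By Jensen's inequality, noting that $f(x) = \log(x)$ is a concave function,

$$
\log \int q_\phi(z|X)\, \frac{p_\theta(X, z)}{q_\phi(z|X)}\, dz \;\ge\; \int q_\phi(z|X) \log \frac{p_\theta(X, z)}{q_\phi(z|X)}\, dz
$$

that is,

$$
\log p_\theta(X) \;\ge\; \int q_\phi(z|X) \log \frac{p_\theta(X, z)}{q_\phi(z|X)}\, dz
$$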

Slide 26

Slide 26 text

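Deriving the VAE objective
Writing the right-hand side of the inequality as $\mathcal{L}(\theta, \phi; X)$,

$$
\mathcal{L}(\theta, \phi; X) = \int q_\phi(z|X) \log \frac{p_\theta(X, z)}{q_\phi(z|X)}\, dz
$$

we can write

$$
\log p_\theta(X) \ge \mathcal{L}(\theta, \phi; X)
$$

This $\mathcal{L}(\theta, \phi; X)$ is called the ELBO (Evidence Lower BOund), or variational lower bound.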

Slide 27

Slide 27 text

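Deriving the VAE objective
Writing the right-hand side of the inequality as $\mathcal{L}(\theta, \phi; X)$,

$$
\mathcal{L}(\theta, \phi; X) = \int q_\phi(z|X) \log \frac{p_\theta(X, z)}{q_\phi(z|X)}\, dz
$$

we can write

$$
\log p_\theta(X) \ge \mathcal{L}(\theta, \phi; X)
$$

This $\mathcal{L}(\theta, \phi; X)$ is called the ELBO (Evidence Lower BOund), or variational lower bound.
Note on terminology: in Japanese the ELBO is sometimes written 変分下限 ("variational lower limit"), but since it is a lower bound rather than a lower limit, I think 変分下界 ("variational lower bound") is the correct term.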

Slide 28

Slide 28 text

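Deriving the VAE objective
Now consider the difference between the two sides of the inequality, $\log p_\theta(X) - \mathcal{L}(\theta, \phi; X)$. Using $\int q_\phi(z|X)\, dz = 1$ and $p_\theta(X, z) = p_\theta(z|X)\, p_\theta(X)$:

$$
\begin{aligned}
\log p_\theta(X) - \mathcal{L}(\theta, \phi; X)
&= \log p_\theta(X) - \int q_\phi(z|X) \log \frac{p_\theta(X, z)}{q_\phi(z|X)}\, dz \\
&= \log p_\theta(X) \int q_\phi(z|X)\, dz - \int q_\phi(z|X) \log \frac{p_\theta(z|X)\, p_\theta(X)}{q_\phi(z|X)}\, dz
\end{aligned}
$$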

Slide 29

Slide 29 text

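Deriving the VAE objective

$$
\begin{aligned}
\log p_\theta(X) - \mathcal{L}(\theta, \phi; X)
&= \log p_\theta(X) \int q_\phi(z|X)\, dz - \int q_\phi(z|X) \log \frac{p_\theta(z|X)\, p_\theta(X)}{q_\phi(z|X)}\, dz \\
&= \int q_\phi(z|X) \log p_\theta(X)\, dz - \int q_\phi(z|X) \log \frac{p_\theta(z|X)\, p_\theta(X)}{q_\phi(z|X)}\, dz \\
&= \int q_\phi(z|X) \log \frac{p_\theta(X)\, q_\phi(z|X)}{p_\theta(z|X)\, p_\theta(X)}\, dz
\end{aligned}
$$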

Slide 30

Slide 30 text

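Deriving the VAE objective

$$
\begin{aligned}
\log p_\theta(X) - \mathcal{L}(\theta, \phi; X)
&= \int q_\phi(z|X) \log \frac{p_\theta(X)\, q_\phi(z|X)}{p_\theta(z|X)\, p_\theta(X)}\, dz \\
&= \int q_\phi(z|X) \log \frac{q_\phi(z|X)}{p_\theta(z|X)}\, dz \\
&= D_{KL}\left(q_\phi(z|X) \,\|\, p_\theta(z|X)\right)
\end{aligned}
$$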

Slide 31

Slide 31 text

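Deriving the VAE objective
From the above,

$$
\log p_\theta(X) = \mathcal{L}(\theta, \phi; X) + D_{KL}\left(q_\phi(z|X) \,\|\, p_\theta(z|X)\right)
$$

The original goal was to find a $q_\phi(z|X)$ that approximates $p_\theta(z|X)$
→ it suffices to minimize $D_{KL}\left(q_\phi(z|X) \,\|\, p_\theta(z|X)\right)$

$\log p_\theta(X)$ is constant for a given $\theta$ (it does not depend on $\phi$), so
minimizing $D_{KL}\left(q_\phi(z|X) \,\|\, p_\theta(z|X)\right)$ ⇔ maximizing $\mathcal{L}(\theta, \phi; X)$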

Slide 33

Slide 33 text

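Deriving the VAE objective
From the above,

$$
\log p_\theta(X) = \mathcal{L}(\theta, \phi; X) + D_{KL}\left(q_\phi(z|X) \,\|\, p_\theta(z|X)\right)
$$

The original goal was to find a $q_\phi(z|X)$ that approximates $p_\theta(z|X)$
→ it suffices to minimize $D_{KL}\left(q_\phi(z|X) \,\|\, p_\theta(z|X)\right)$

$\log p_\theta(X)$ is constant for a given $\theta$ (it does not depend on $\phi$), so
minimizing $D_{KL}\left(q_\phi(z|X) \,\|\, p_\theta(z|X)\right)$ ⇔ maximizing $\mathcal{L}(\theta, \phi; X)$

On the right-hand side of the equation, the sum of the first and second terms is fixed → if the second term gets smaller, the first term must get larger.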
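
The identity above can be checked numerically on a toy model. The sketch below assumes a hypothetical two-state latent variable; `prior`, `likelihood`, and `q` are made-up illustrative values, not taken from the paper or the slides. It computes $\log p(X)$, the ELBO, and the KL divergence to the true posterior so that the gap can be compared with the KL term.

```python
import math

# Hypothetical toy model (illustrative numbers only): z takes the values 0 or 1
# and X is a single fixed observation.
prior = [0.6, 0.4]        # p(z)
likelihood = [0.2, 0.7]   # p(X | z) evaluated at the observed X
q = [0.5, 0.5]            # an arbitrary approximate posterior q(z | X)

joint = [pz * px for pz, px in zip(prior, likelihood)]  # p(X, z)
evidence = sum(joint)                                   # p(X) = sum_z p(X, z)
posterior = [j / evidence for j in joint]               # p(z | X)

# ELBO: sum_z q(z|X) * log( p(X, z) / q(z|X) )
elbo = sum(qz * math.log(j / qz) for qz, j in zip(q, joint))
# KL( q(z|X) || p(z|X) )
kl_to_posterior = sum(qz * math.log(qz / pz) for qz, pz in zip(q, posterior))

gap = math.log(evidence) - elbo
print(f"log p(X) = {math.log(evidence):.6f}")
print(f"ELBO     = {elbo:.6f}")
print(f"gap      = {gap:.6f}, KL = {kl_to_posterior:.6f}")
```

Changing `q` moves the ELBO and the KL term in opposite directions while their sum stays at $\log p(X)$, which is the point the slide makes.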

Slide 34

Slide 34 text

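Deriving the VAE objective
Next, decompose the variational lower bound further:

$$
\begin{aligned}
\mathcal{L}(\theta, \phi; X)
&= \int q_\phi(z|X) \log \frac{p_\theta(X, z)}{q_\phi(z|X)}\, dz \\
&= \int q_\phi(z|X) \log \frac{p_\theta(X|z)\, p_\theta(z)}{q_\phi(z|X)}\, dz \\
&= \int q_\phi(z|X) \log p_\theta(X|z)\, dz + \int q_\phi(z|X) \log \frac{p_\theta(z)}{q_\phi(z|X)}\, dz
\end{aligned}
$$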

Slide 35

Slide 35 text

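Deriving the VAE objective

$$
\begin{aligned}
\mathcal{L}(\theta, \phi; X)
&= \int q_\phi(z|X) \log p_\theta(X|z)\, dz + \int q_\phi(z|X) \log \frac{p_\theta(z)}{q_\phi(z|X)}\, dz \\
&= \int q_\phi(z|X) \log p_\theta(X|z)\, dz - \int q_\phi(z|X) \log \frac{q_\phi(z|X)}{p_\theta(z)}\, dz \\
&= \underbrace{\int q_\phi(z|X) \log p_\theta(X|z)\, dz}_{\text{likelihood}} - \underbrace{D_{KL}\left(q_\phi(z|X) \,\|\, p_\theta(z)\right)}_{\text{regularization term}}
\end{aligned}
$$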
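
A small worked example may make the "likelihood minus regularization" form concrete. The sketch below assumes a hypothetical one-dimensional model that is not in the slides: $p(z) = \mathcal{N}(0, 1)$, $p(x|z) = \mathcal{N}(z, 1)$, and $q(z|x) = \mathcal{N}(m, s^2)$. For this model both ELBO terms have closed forms, the marginal is $x \sim \mathcal{N}(0, 2)$, and the exact posterior is $\mathcal{N}(x/2, 1/2)$.

```python
import math

def elbo_terms(x, m, s2):
    """ELBO terms for p(z)=N(0,1), p(x|z)=N(z,1), q(z|x)=N(m, s2)."""
    # E_q[log p(x|z)] = -0.5*log(2*pi) - 0.5*((x - m)^2 + s2)
    likelihood_term = -0.5 * math.log(2 * math.pi) - 0.5 * ((x - m) ** 2 + s2)
    # KL( N(m, s2) || N(0, 1) ) = 0.5*(m^2 + s2 - 1 - log s2)
    regularization = 0.5 * (m * m + s2 - 1.0 - math.log(s2))
    return likelihood_term, regularization

def log_evidence(x):
    """log p(x) for this toy model, where marginally x ~ N(0, 2)."""
    return -0.5 * math.log(2 * math.pi * 2.0) - x * x / 4.0

x = 1.0
lik, reg = elbo_terms(x, m=x / 2, s2=0.5)  # q set to the exact posterior
tight_elbo = lik - reg
lik, reg = elbo_terms(x, m=0.0, s2=1.0)    # q set to the prior instead
loose_elbo = lik - reg
print(tight_elbo, loose_elbo, log_evidence(x))
```

When `q` is the exact posterior the bound should coincide with `log_evidence(x)`; any other `q`, such as the prior, gives a strictly smaller value.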

Slide 36

Slide 36 text

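Deriving the VAE objective
Maximizing $\mathcal{L}(\theta, \phi; X)$ ⇔ minimizing $-\mathcal{L}(\theta, \phi; X)$, so the loss function is defined as follows:

$$
\begin{aligned}
-\mathcal{L}(\theta, \phi; X)
&= -\int q_\phi(z|X) \log p_\theta(X|z)\, dz + D_{KL}\left(q_\phi(z|X) \,\|\, p_\theta(z)\right) \\
&= \underbrace{-\mathbb{E}_{q_\phi(z|X)}\left[\log p_\theta(X|z)\right]}_{\text{reconstruction error}} + \underbrace{D_{KL}\left(q_\phi(z|X) \,\|\, p_\theta(z)\right)}_{\text{regularization term}}
\end{aligned}
$$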

Slide 37

Slide 37 text

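Deriving the VAE objective
Maximizing $\mathcal{L}(\theta, \phi; X)$ ⇔ minimizing $-\mathcal{L}(\theta, \phi; X)$, so the loss function is defined as follows:

$$
\begin{aligned}
-\mathcal{L}(\theta, \phi; X)
&= -\int q_\phi(z|X) \log p_\theta(X|z)\, dz + D_{KL}\left(q_\phi(z|X) \,\|\, p_\theta(z)\right) \\
&= \underbrace{-\mathbb{E}_{q_\phi(z|X)}\left[\log p_\theta(X|z)\right]}_{\text{reconstruction error}} + \underbrace{D_{KL}\left(q_\phi(z|X) \,\|\, p_\theta(z)\right)}_{\text{regularization term}}
\end{aligned}
$$

If a Gaussian distribution is assumed for $p_\theta(z)$, the loss function can be obtained analytically.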
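
As an illustration of the analytic form, the sketch below computes the regularization term under two assumptions that go slightly beyond the slide: the prior is the standard normal $\mathcal{N}(0, I)$ and $q_\phi(z|X)$ is a Gaussian with diagonal covariance, parameterized by `mu` and `log_var` (the function and argument names are illustrative). In that case $D_{KL} = \tfrac{1}{2} \sum_i \left(\mu_i^2 + \sigma_i^2 - 1 - \log \sigma_i^2\right)$.

```python
import math

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions."""
    return 0.5 * sum(
        m * m + math.exp(lv) - 1.0 - lv
        for m, lv in zip(mu, log_var)
    )

# The term vanishes when q already equals the prior ...
print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))
# ... and grows as the mean moves away from zero.
print(kl_to_standard_normal([1.0, -1.0], [0.0, 0.0]))
```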

Slide 38

Slide 38 text

VAE model architecture (revisited)
[Figure: x → Encoder → (W_μ, W_σ) → μ, σ → latent representation z → Decoder → x']

Slide 39

Slide 39 text

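VAE model architecture (revisited)
[Figure: x → Encoder → (W_μ, W_σ) → μ, σ → latent representation z → Decoder → x']
In practice, a technique called the reparameterization trick is inserted at the sampling step.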
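
A minimal sketch of the reparameterization trick, assuming a diagonal Gaussian whose encoder outputs are a mean vector `mu` and a log-variance vector `log_var` (these names, and the use of log-variance, are assumptions for illustration). Instead of sampling $z$ directly, noise $\epsilon \sim \mathcal{N}(0, I)$ is sampled and $z = \mu + \sigma \odot \epsilon$ is computed, so $z$ is a deterministic, differentiable function of $\mu$ and $\sigma$.

```python
import math
import random

def reparameterize(mu, log_var, rng=random):
    """Sample z = mu + sigma * eps with eps ~ N(0, I), element-wise."""
    z = []
    for m, lv in zip(mu, log_var):
        sigma = math.exp(0.5 * lv)   # log-variance -> standard deviation
        eps = rng.gauss(0.0, 1.0)    # noise is drawn independently of mu and sigma
        z.append(m + sigma * eps)
    return z

rng = random.Random(0)               # seeded generator for repeatable draws
print(reparameterize([0.0, 1.0], [0.0, -2.0], rng))
```

Pure Python has no automatic differentiation, so this only shows the sampling arithmetic; in a deep-learning framework the same expression lets gradients reach the encoder parameters.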

Slide 40

Slide 40 text

VAE pseudocode: Encoder

Slide 41

Slide 41 text

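VAE pseudocode: Encoder
Implementation-wise, the encoder output is simply passed through two linear layers in parallel, one for μ and one for σ.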
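
The slide's own listing did not survive in this transcript, so the following is a stand-in sketch rather than the presenter's code. It assumes the encoder body has already produced a feature vector `h`, and forks it through two independent linear layers; all names and the tiny example weights are hypothetical. The second head outputs a log-variance, a common choice that keeps σ positive.

```python
def linear(weight, bias, x):
    """Affine map y = W x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def encoder_heads(h, w_mu, b_mu, w_log_var, b_log_var):
    """Fork the encoder feature h into a mean head and a log-variance head."""
    mu = linear(w_mu, b_mu, h)
    log_var = linear(w_log_var, b_log_var, h)  # log-variance keeps sigma positive
    return mu, log_var

h = [1.0, 2.0, 3.0]                             # hypothetical 3-d encoder feature
w_mu = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]       # maps 3-d feature to 2-d mean
w_log_var = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # zero weights give log-variance 0
mu, log_var = encoder_heads(h, w_mu, [0.0, 0.0], w_log_var, [0.0, 0.0])
print(mu, log_var)
```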

Slide 42

Slide 42 text

VAE pseudocode: whole model

Slide 43

Slide 43 text

VAE pseudocode: whole model
The input is reconstructed from the latent representation obtained by sampling.
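
Since the slide's listing is not in this transcript, here is a stand-in sketch of one full forward pass, not the presenter's code. It assumes a one-layer tanh encoder, a linear decoder, a standard normal prior, and squared error as the reconstruction term (which corresponds to a unit-variance Gaussian decoder up to a constant). Layer sizes and all names are hypothetical.

```python
import math
import random

def linear(weight, bias, x):
    """Affine map y = W x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def init_linear(n_out, n_in, rng):
    """Small random weights and zero bias for an n_in -> n_out layer."""
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weight, [0.0] * n_out

def vae_forward(x, params, rng):
    """Encode, sample a latent with the reparameterization trick, decode, score."""
    h = [math.tanh(v) for v in linear(*params["enc"], x)]
    mu = linear(*params["mu"], h)
    log_var = linear(*params["log_var"], h)
    z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]
    x_rec = linear(*params["dec"], z)
    # Reconstruction error (squared error is an assumption of this sketch).
    recon = 0.5 * sum((a - b) ** 2 for a, b in zip(x, x_rec))
    # Regularization term: KL between q(z|x) and the standard normal prior.
    kl = 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))
    return x_rec, recon + kl

rng = random.Random(0)
params = {
    "enc": init_linear(8, 4, rng),
    "mu": init_linear(2, 8, rng),
    "log_var": init_linear(2, 8, rng),
    "dec": init_linear(4, 2, rng),
}
x_rec, loss = vae_forward([0.5, -1.0, 0.25, 2.0], params, rng)
print(len(x_rec), loss)
```

Training would minimize `loss` with respect to all four layers; that step needs gradients and is left out here.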

Slide 44

Slide 44 text

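VAE summary
• A generative model trained under the assumption that the latent representation follows a known probability distribution
  • Supports dimensionality reduction, extraction of meaningful representations, and generation by sampling
• Its origins differ, but its architecture resembles that of an Auto-Encoder
  • It can be seen as an Auto-Encoder with an added regularization term on the latent representation
  • The regularization term makes a VAE more robust than an AE (or so it is said)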

Slide 45

Slide 45 text

Optimus

Slide 47

Slide 47 text

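Paper overview (revisited)
• Proposes Optimus, a pre-trained language model based on the VAE (variational auto-encoder)
  • Note: it does build heavily on existing pre-trained language models
• BERT as the encoder, GPT-2 as the decoder
  • The two models are combined into a VAE, with some care taken over how they are joined
• Strong results on text-generation metrics, conditional generation, and low-resource tasks
  • Linear interpolation between latent representations yields semantically smooth sentence generation
  • Demonstrates the usefulness of VAE + pre-training in NLP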

Slide 48

Slide 48 text

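Model architecture: simple version
• BERT is used as the encoder, with [CLS] serving as the sentence representation
• GPT-2 is used as the decoder and generates a sentence conditioned on the latent representation
• The whole model is trained, VAE-style, to reconstruct the input sentence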

Slide 49

Slide 49 text

Model architecture: slightly more detailed version
[Figure: input tokens [CLS] w1 w2 … → BERT → W_E → μ, σ]

Slide 50

Slide 50 text

Model architecture: slightly more detailed version
[Figure: input tokens [CLS] w1 w2 … → BERT → W_E → μ, σ → sampling (reparameterization trick) → z]

Slide 51

Slide 51 text

Model architecture: slightly more detailed version
[Figure: input tokens [CLS] w1 w2 … → BERT → W_E → μ, σ → sampling (reparameterization trick) → z → W_M / W_D → GPT-2]

Slide 52

Slide 52 text

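Model architecture: slightly more detailed version
[Figure: input tokens [CLS] w1 w2 … → BERT → W_E → μ, σ → sampling (reparameterization trick) → z → W_M / W_D → GPT-2, with GPT-2 input [CLS] w1 w2 … and GPT-2 output w1 w2 w3 …]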

Slide 53

Slide 53 text

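Loss function
• Hyperparameters $\beta$ and $\lambda$ are added to the usual VAE loss
  • $\beta$ adjusts the strength of the regularization
    • With $\beta = 0$ the model is almost the same as an Auto-Encoder (sampling is still performed)
  • $\lambda$ prevents the latent representation from getting "too" close to the prior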

Slide 56

Slide 56 text

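Loss function
• Hyperparameters $\beta$ and $\lambda$ are added to the usual VAE loss
  • $\beta$ adjusts the strength of the regularization
    • With $\beta = 0$ the model is almost the same as an Auto-Encoder (sampling is still performed)
  • $\lambda$ prevents the latent representation from getting "too" close to the prior
Note: that is a lot of hyperparameters 😇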
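
The slide text names $\beta$ and $\lambda$ but the formula itself is not in this transcript. The sketch below is therefore one plausible reading, stated as an assumption: a $\beta$-weighted regularizer with per-dimension KL thresholding ("free bits"), i.e. $\text{loss} = \text{reconstruction} + \beta \sum_i \max(\lambda, D_{KL,i})$, with a diagonal Gaussian posterior and a standard normal prior. The function name and default values are hypothetical.

```python
import math

def optimus_style_loss(recon_nll, mu, log_var, beta=1.0, lam=0.5):
    """Reconstruction NLL plus a beta-weighted, lambda-thresholded KL term."""
    # Per-dimension KL between N(mu_i, exp(log_var_i)) and N(0, 1).
    kl_per_dim = [
        0.5 * (m * m + math.exp(lv) - 1.0 - lv)
        for m, lv in zip(mu, log_var)
    ]
    # Dimensions whose KL is already below lam contribute the constant lam,
    # so there is no gradient pushing them any closer to the prior.
    regularizer = sum(max(lam, kl) for kl in kl_per_dim)
    return recon_nll + beta * regularizer

# beta = 0 leaves only the reconstruction term (the Auto-Encoder-like case).
print(optimus_style_loss(2.0, [0.0, 0.0], [0.0, 0.0], beta=0.0))
print(optimus_style_loss(2.0, [0.0, 0.0], [0.0, 0.0], beta=1.0, lam=0.5))
```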

Slide 57

Slide 57 text

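Integrating BERT and GPT-2
• Broadly, two problems arise when combining BERT and GPT-2 into a VAE
1. Tokenization
  • BERT and GPT-2 have different vocabularies and different tokenization methods
  • Solved by using different tokenizers for the input and the output
2. Conditional generation from the latent representation
  • GPT-2 has no built-in mechanism for conditional text generation
  • How can text be generated from the latent representation obtained with BERT?
    • Two methods for feeding the latent representation into GPT-2's generation mechanism are tested
Note: prompting is a separate topic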

Slide 60

Slide 60 text

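Integrating BERT and GPT-2: conditional generation from the latent representation
Memory
• The latent representation is converted into one vector per layer
• During generation, each layer attends to its vector
Embedding
• The latent representation is transformed and added to the word embeddings
• The latent representation is used much like BERT's position embedding
Note: prompting is a separate topic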
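
The two injection schemes can be sketched as plain vector arithmetic. This is an illustration under assumptions, not the paper's implementation: `w_d` and `w_m` stand for the projection matrices labelled W_D and W_M in the architecture figure, and the way GPT-2's attention then consumes the per-layer memory vectors is not shown.

```python
def matvec(weight, v):
    """Matrix-vector product with the matrix given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in weight]

def embedding_injection(token_embeddings, w_d, z):
    """Embedding scheme: add the projected latent to every token embedding."""
    shift = matvec(w_d, z)  # project z to the decoder hidden size
    return [[e + s for e, s in zip(tok, shift)] for tok in token_embeddings]

def memory_vectors(w_m, z, num_layers, hidden_size):
    """Memory scheme: project z and split it into one vector per layer."""
    flat = matvec(w_m, z)   # length must be num_layers * hidden_size
    return [flat[l * hidden_size:(l + 1) * hidden_size] for l in range(num_layers)]

z = [1.0, 2.0]                                          # hypothetical 2-d latent
w_d = [[1.0, 0.0], [0.0, 1.0]]                          # 2-d latent -> hidden size 2
w_m = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 2.0]]  # 2 layers x hidden size 2
shifted = embedding_injection([[0.0, 0.0], [10.0, 10.0]], w_d, z)
memory = memory_vectors(w_m, z, num_layers=2, hidden_size=2)
print(shifted, memory)
```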

Slide 62

Slide 62 text

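Evaluation experiments
Language Modeling
• Evaluates whether Optimus can generate sentences correctly
• Perplexity (PPL) and MI for text generation
Guided Language Generation
• Evaluates whether sentences that follow a given condition are generated correctly
• Dialogue response generation, response generation in a specific style, and label-conditioned sentence generation
Low-resource Language Understanding
• Tests the usefulness of Optimus in low-resource settings
• Performance is checked by solving GLUE on top of sentence embeddings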

Slide 63

Slide 63 text

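Experimental setup
• Latent dimensionality: 32
  • Inferred from the released implementation
• Training data for the VAE: 1.99 million sentences from English Wikipedia
• For the text-generation tasks, one further epoch of training on each dataset
• Various training tricks
  • For example, increasing $\beta$ during training
• Low-resource Language Understanding uses the representation corresponding to [CLS] of the encoder (BERT)
  • So the vector has 768 dimensions, not 32
Note: the latent dimensionality does not seem to be stated explicitly in the paper...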
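
The slide only says that $\beta$ is increased during training. As a hedged illustration, the sketch below implements a cyclical schedule in which each period holds $\beta$ at 0, ramps it linearly, and then holds it at its maximum. The function name, the period, and the ratios 0.5 and 0.25 are assumptions for this example, not values confirmed by the slides.

```python
def beta_schedule(step, period, ratio_zero=0.5, ratio_increase=0.25, beta_max=1.0):
    """Cyclical beta: hold at 0, ramp up linearly, then hold at beta_max."""
    position = (step % period) / period  # position within the current cycle, in [0, 1)
    if position < ratio_zero:
        return 0.0                       # Auto-Encoder-like phase
    if position < ratio_zero + ratio_increase:
        return beta_max * (position - ratio_zero) / ratio_increase
    return beta_max                      # full regularization phase

for step in (0, 40, 60, 80, 100):
    print(step, beta_schedule(step, period=100))
```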

Slide 64

Slide 64 text

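Evaluation: Language Modeling
• Far better performance than existing small VAEs
  • Pre-training a huge model on a huge corpus is effective for VAEs too
  • $\lambda$ introduces a trade-off between generation performance and the quality of the latent representation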

Slide 67

Slide 67 text

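Evaluation: Language Modeling
• Achieves lower PPL than GPT-2 on 3 of the 4 datasets
  • Performs especially well on datasets such as SNLI that contain many characteristic, stereotyped sentences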
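
For reference, perplexity is the exponential of the average negative log-probability per token, so lower is better. The sketch below assumes per-token log-probabilities are already available; for a VAE these would typically come from a bound or an estimate of the likelihood rather than the exact value, and how the paper computes them is not covered in the slides.

```python
import math

def perplexity(token_log_probs):
    """exp of the mean negative natural-log probability per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# A model that assigns probability 1/4 to every token is as uncertain as a
# uniform choice among four options.
uniform_ppl = perplexity([math.log(0.25)] * 8)
print(uniform_ppl)
```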

Slide 68

Slide 68 text

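Evaluation: Guided Language Generation
• Optimus's latent representations allow arithmetic on sentence representations
  • A sentence is generated from $z_D = z_B - z_A + z_C$
• How should this result be interpreted...?
Note: the demo site mentioned in the paper is no longer accessible 😇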
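
The latent arithmetic itself is a one-line vector operation. The sketch below assumes latent vectors are plain lists of floats; the vectors are made up, and encoding sentences A, B, C and decoding $z_D$ would need the trained model, which is outside this example.

```python
def latent_analogy(z_a, z_b, z_c):
    """Return z_D = z_B - z_A + z_C, element-wise."""
    return [b - a + c for a, b, c in zip(z_a, z_b, z_c)]

z_a = [1.0, 0.0, 2.0]   # hypothetical latent of sentence A
z_b = [2.0, 1.0, 2.0]   # hypothetical latent of sentence B
z_c = [0.0, 0.0, 5.0]   # hypothetical latent of sentence C
z_d = latent_analogy(z_a, z_b, z_c)
print(z_d)
```

The intent is that the offset from A to B is applied to C, so the decoded sentence D should relate to C as B relates to A.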

Slide 69

Slide 69 text

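Evaluation: Guided Language Generation
• Generation by linear interpolation between the latent representations of two sentences
• A benefit of the VAE latent space being smooth
• Perhaps the takeaway is no stronger than "completely unrelated sentences do not appear"
  • The vocabulary of the interpolated sentences resembles that of the original sentences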
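
Linear interpolation between two latent vectors can be sketched as follows, assuming the form $z_\tau = (1 - \tau)\, z_1 + \tau\, z_2$ for evenly spaced $\tau \in [0, 1]$. The vectors are made up, and each intermediate $z_\tau$ would be passed to the decoder to produce a sentence.

```python
def interpolate(z1, z2, num_steps):
    """Return num_steps + 1 latent vectors going linearly from z1 to z2."""
    path = []
    for k in range(num_steps + 1):
        tau = k / num_steps
        path.append([(1.0 - tau) * a + tau * b for a, b in zip(z1, z2)])
    return path

path = interpolate([0.0, 4.0], [2.0, 0.0], num_steps=4)
for z in path:
    print(z)
```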

Slide 71

Slide 71 text

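Evaluation: Guided Language Generation
• Experiments on three tasks, with strong performance
  • Dialogue response generation
  • Sentence generation in a specific style
  • Conditional generation
• In conditional generation, text is generated according to a sentiment-classification label
  • Strong results on the label classification probability of the generated sentences and on their diversity
Note: see the original paper for the detailed experimental setup and task descriptions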

Slide 73

Slide 73 text

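Evaluation: Low-resource Language Understanding
• A linear classifier is trained on Optimus's encoder representations
  • Sentiment classification on the Yelp dataset
• Performance is observed as the number of training examples changes
• Optimus reaches relatively high classification performance even with few training examples
  • Performance rises slightly faster
  • Without fine-tuning in particular, it outperforms the original BERT
  • This suggests that a good latent space is acquired through VAE training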

Slide 75

Slide 75 text

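Evaluation: Low-resource Language Understanding
• Visualizes the distributions of Optimus and BERT sentence representations
  • The Yelp development set is converted into sentence representations
• With Optimus the sentence representations are more uniformly distributed and the per-label clusters are clearer
  • In particular, the latent representations are more uniformly distributed than BERT's
  • Or at least that is how it looks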

Slide 77

Slide 77 text

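Evaluation: Low-resource Language Understanding
• Evaluates Optimus on GLUE
  • How well does a linear classifier that takes sentence embeddings as input perform?
• Without fine-tuning, it outperforms the original BERT
  • Has Optimus learned better sentence representations than BERT?
  • BERT's QQP score looks suspiciously low, though...
• With fine-tuning, the difference is not that large (unsurprising, since it starts from BERT)
Note: I would have liked to see experiments on SentEval as well...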

Slide 80

Slide 80 text

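Summary
• Proposes Optimus, a VAE-based large-scale pre-trained language model
• Builds a VAE by skillfully combining BERT as the encoder and GPT-2 as the decoder
• Strong results on text generation, conditional generation, and low-resource tasks
  • In particular, far better than existing small VAEs
  • Shows that pre-training is effective for VAEs
Impressions
• I am curious what would happen if the model were trained from scratch, without existing pre-trained language models such as BERT
  • This was apparently too demanding in compute (see Sec. 6, Discussion)
• Interesting as a pre-training + VAE study, but the range of applications may be limited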
