Slide 1

Slide 1 text

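Pretraining with Artificial Languages: What Knowledge Does an Encoder Capable of Cross-lingual Transfer Possess?
Ryokan Ri (李 凌寒) (@ryoNLP0123), Tsuruoka Lab, The University of Tokyo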

Slide 2

Slide 2 text

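Self-introduction
- Ryokan Ri (李 凌寒)
- Tsuruoka Lab, The University of Tokyo, D3 (third-year doctoral student)
- Research interest: multilingual NLP
- Born in China, raised in Japan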

Slide 3

Slide 3 text

Early childhood… [illustration; surviving labels: "me", "China", "hold", "arithmetic"]

Slide 4

Slide 4 text

At school: arithmetic!

Slide 5

Slide 5 text

Learn in Chinese, take the test in Japanese: cross-lingual transfer learning

Slide 6

Slide 6 text

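Cross-lingual transfer learning
Build a model that can also handle data in other languages (e.g., Japanese) when training data exists in only one particular language (e.g., Chinese).
Why? ➡︎ The world has many languages that have no labeled data at all.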

Slide 7

Slide 7 text

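Languages of the world and the amount of available data
The State and Fate of Linguistic Diversity and Inclusion in the NLP World (Joshi et al., 2020)
[Figure: languages grouped by amount of available data: 7 languages (Japanese, English, etc.); 2191 languages; 222 languages]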

Slide 8

Slide 8 text

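How cross-lingual transfer learning is done
The current mainstream approach trains an encoder that works for multiple languages and reuses it across languages.
Why would one encoder work across multiple languages? Different languages have different grammars. What do they have in common?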

Slide 9

Slide 9 text

Surprising prior work
On the Cross-lingual Transferability of Monolingual Representations (Artetxe et al., 2020)
[Figure: (1) L1 pretraining 🇬🇧: encoder + L1 embeddings; (2) L2 pretraining 🇪🇸: new L2 embeddings, encoder frozen ❄; (3) L1 fine-tuning 🇬🇧: encoder + L1 embeddings, embeddings frozen ❄; (4) L2 evaluation 🇪🇸: encoder + L2 embeddings]
The encoder weights are updated only on English, yet the model can solve Spanish tasks.

Slide 10

Slide 10 text

Surprising prior work
Using Transfer to Study Linguistic Structure in Language Models (Papadimitriou and Jurafsky, 2020)
[Figure: (1) L1 pretraining ♪ (music): encoder + L1 embeddings; (2) L2 training 🇪🇸: L2 embeddings, encoder frozen ❄; (3) L2 evaluation 🇪🇸: encoder + L2 embeddings]
An encoder trained on music (score) data is useful, to some extent, for modeling Spanish.

Slide 11

Slide 11 text

The question for today's talk
What kind of knowledge must an encoder learn for that knowledge to help with other languages?

Slide 12

Slide 12 text

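Experimental method: transfer learning from artificial languages
[Figure: (1) L1 pretraining on an artificial language: encoder + L1 embeddings; (2) L2 training 🇬🇧: English embeddings, encoder frozen ❄; (3) L2 evaluation 🇬🇧: encoder + L2 embeddings]
We design artificial languages that have some kind of structure.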

Slide 13

Slide 13 text

Designing artificial languages

Slide 14

Slide 14 text

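Artificial languages
Example: '1539', '3283', '2412', '6587', '5401', '26', '9138', '3192', '904', '7458'
• Sentences consist of sequences of numbers and symbols instead of words.
• The language is not grounded in any semantics. It only has structure.
• Sentences of an artificial language are generated by sampling.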

Slide 15

Slide 15 text

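Sampling sentences of an artificial language
$l \sim p_{\mathrm{len}}(l)$
• First, sample the sentence length $l$ from some distribution.
• Then sample that many tokens. The algorithm used in this step is what characterizes each artificial language.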
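The two-stage procedure above can be sketched in Python. This is a minimal sketch, not the authors' code. The length distribution is passed in as a hypothetical list of candidate lengths with weights. `toy_token_sampler` is a hypothetical placeholder for whichever token-sampling algorithm defines a particular artificial language.

```python
import random
from typing import Callable, List, Sequence

# A token sampler takes a length and an RNG and returns that many tokens.
TokenSampler = Callable[[int, random.Random], List[str]]


def sample_sentence(
    lengths: Sequence[int],
    length_weights: Sequence[float],
    sample_tokens: TokenSampler,
    rng: random.Random,
) -> List[str]:
    """Draw l ~ p_len(l), then draw l tokens with the language-specific sampler."""
    # Step 1: sample the sentence length from the (hypothetical) length distribution.
    length = rng.choices(lengths, weights=length_weights, k=1)[0]
    # Step 2: sample exactly that many tokens; this step defines the language.
    return sample_tokens(length, rng)


def toy_token_sampler(n: int, rng: random.Random) -> List[str]:
    # Hypothetical placeholder: numeric token ids drawn uniformly from 0..9999.
    return [str(rng.randrange(10000)) for _ in range(n)]


rng = random.Random(0)
sentence = sample_sentence([5, 10, 20], [0.3, 0.5, 0.2], toy_token_sampler, rng)
print(sentence)
```

Keeping the token sampler as a parameter reflects the slide's point: the length step is shared, and only the second step changes from one artificial language to another.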

Slide 16

Slide 16 text

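Modeling the word distribution
Words in natural language do not appear at random.
• The frequency distribution is skewed…
• Words within a sentence are related to each other in some way, e.g. "A dog and cat are fighting over food."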

Slide 17

Slide 17 text

Uniform Language
$p(w) = \frac{1}{|\mathcal{V}|}$
Words are sampled from a uniform distribution over the vocabulary $\mathcal{V}$. This is only a baseline.
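A minimal sketch of the Uniform Language's token sampler follows. Every token id is equally likely, so $p(w) = 1/|\mathcal{V}|$. The vocabulary size of 32000 is a hypothetical value chosen for illustration.

```python
import random
from typing import List


def sample_uniform(length: int, vocab_size: int, rng: random.Random) -> List[int]:
    """Uniform Language: each token id is drawn with probability 1 / |V|."""
    return [rng.randrange(vocab_size) for _ in range(length)]


rng = random.Random(0)
# vocab_size is a hypothetical value chosen for illustration.
sentence = sample_uniform(length=10, vocab_size=32000, rng=rng)
print(sentence)
```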

Slide 18

Slide 18 text

Zipf Language
$p(w) \propto \frac{1}{\mathrm{rank}(w)}$
Words are sampled from a Zipf distribution.
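A minimal sketch of the Zipf Language's token sampler follows. It assumes token id $i$ is the word with rank $i+1$ and uses the unnormalized weights $1/\mathrm{rank}(w)$. `random.choices` normalizes them internally. The vocabulary size of 1000 is hypothetical.

```python
import random
from typing import List


def zipf_weights(vocab_size: int) -> List[float]:
    """Unnormalized weights 1 / rank(w), treating token id i as rank i + 1."""
    return [1.0 / rank for rank in range(1, vocab_size + 1)]


def sample_zipf(length: int, weights: List[float], rng: random.Random) -> List[int]:
    # random.choices normalizes the weights, giving p(w) ∝ 1 / rank(w).
    return rng.choices(range(len(weights)), weights=weights, k=length)


weights = zipf_weights(1000)  # hypothetical vocabulary size
rng = random.Random(0)
sentence = sample_zipf(10, weights, rng)
print(sentence)
```

The rank-1 word is twice as likely as the rank-2 word, so low ids dominate. This reproduces the skewed frequency distribution from the previous slide, but token choices within a sentence remain independent of each other.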

Slide 19

Slide 19 text

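Log-Linear Language
Words are sampled from a distribution that differs from sentence to sentence.
$p(w \mid s) \propto \exp(\vec{c}_s \cdot \vec{v}_w)$
• Discourse vector $\vec{c}_s$: sampled at random from a normal distribution, once for each sentence $s$.
• Word vectors $\vec{v}_w$: every word $w$ has its own vector, sampled at random from a normal distribution.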
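The log-linear sampling scheme can be sketched with only the standard library. The slide says only "normal distribution", so this sketch assumes a standard normal (mean 0, standard deviation 1) for both kinds of vector. The vocabulary size and vector dimension are hypothetical.

```python
import math
import random
from typing import List


def make_word_vectors(vocab_size: int, dim: int, rng: random.Random) -> List[List[float]]:
    # Each word w gets a vector v_w drawn from a standard normal distribution (assumed).
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]


def sample_log_linear(length: int, word_vectors: List[List[float]], rng: random.Random) -> List[int]:
    dim = len(word_vectors[0])
    # A new discourse vector c_s is drawn for every sentence s (standard normal, assumed).
    c_s = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    # Logits are the dot products c_s · v_w.
    logits = [sum(c * v for c, v in zip(c_s, v_w)) for v_w in word_vectors]
    # Subtract the max logit before exp() for numerical stability; p(w | s) is unchanged.
    max_logit = max(logits)
    weights = [math.exp(x - max_logit) for x in logits]
    # random.choices normalizes the weights, so p(w | s) ∝ exp(c_s · v_w).
    return rng.choices(range(len(word_vectors)), weights=weights, k=length)


rng = random.Random(0)
vectors = make_word_vectors(vocab_size=1000, dim=10, rng=rng)  # hypothetical sizes
sentence = sample_log_linear(10, vectors, rng)
print(sentence)
```

The word vectors are fixed for the whole language, and $\vec{c}_s$ is redrawn per sentence. Words whose vectors point in a similar direction therefore tend to co-occur within the same sentence, which loosely mimics the topical relatedness of words in natural text.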

Slide 20

Slide 20 text

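Modeling sentence structure
Words in a sentence are arranged according to certain rules.
[Figure: dependency tree of "I saw a dog": nsubj(saw, I), obj(saw, dog), det(dog, a); POS tags: I PRO, saw VERB, a DET, dog NOUN]
• Here we build structures that mimic dependency structure.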

Slide 21

Slide 21 text

Nesting Dependency Language
Example: <0 <248 <23 23> <567 567> 248> 0>
• Every word in a sentence always appears together with its specific partner.
• The dependencies between pairs are nested.
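One way to generate nested pairs like the example above is to keep a stack of open pairs and always close the most recent one. This is a minimal sketch under stated assumptions, not the authors' generator. The open-or-close probability `p_open` is a hypothetical parameter, and token ids are drawn uniformly only for simplicity. `is_well_nested` is a hypothetical helper for checking the output.

```python
import random
from typing import List


def sample_nesting(num_pairs: int, vocab_size: int, rng: random.Random, p_open: float = 0.5) -> List[str]:
    """Sample 2 * num_pairs tokens whose <w ... w> pairs are properly nested."""
    stack: List[int] = []  # ids of pairs that are open but not yet closed
    tokens: List[str] = []
    opened = 0
    while opened < num_pairs or stack:
        can_open = opened < num_pairs
        # Open a new pair if nothing is open, or otherwise with probability p_open (hypothetical).
        if can_open and (not stack or rng.random() < p_open):
            w = rng.randrange(vocab_size)  # token id drawn uniformly (assumption)
            stack.append(w)
            tokens.append(f"<{w}")
            opened += 1
        else:
            # Nesting: always close the most recently opened pair (last in, first out).
            tokens.append(f"{stack.pop()}>")
    return tokens


def is_well_nested(tokens: List[str]) -> bool:
    """Hypothetical checker: every w> must close the innermost open <w."""
    stack: List[str] = []
    for t in tokens:
        if t.startswith("<"):
            stack.append(t[1:])
        elif not stack or stack.pop() != t[:-1]:
            return False
    return not stack


rng = random.Random(0)
sentence = sample_nesting(num_pairs=4, vocab_size=1000, rng=rng)
print(" ".join(sentence))
```

The slide's example passes the same check: `is_well_nested("<0 <248 <23 23> <567 567> 248> 0>".split())` is `True`.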

Slide 22

Slide 22 text

Flat Dependency Language
Example: <0 <248 <23 23> <567 567> 0> 248>
• Every word in a sentence always appears together with its specific partner.
• The dependencies between pairs do not have to be nested.
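The flat variant can reuse the nesting sketch with a single change: instead of closing the most recent pair, close any open pair chosen at random, so arcs may cross. This is a minimal sketch under the same assumptions as before, not the authors' generator. `p_open` and the uniform token ids are hypothetical choices, and `every_pair_closed` is a hypothetical checker.

```python
import random
from typing import List


def sample_flat(num_pairs: int, vocab_size: int, rng: random.Random, p_open: float = 0.5) -> List[str]:
    """Sample 2 * num_pairs tokens where every <w has a later w>, but pairs may cross."""
    pending: List[int] = []  # ids of pairs that are open but not yet closed
    tokens: List[str] = []
    opened = 0
    while opened < num_pairs or pending:
        if opened < num_pairs and (not pending or rng.random() < p_open):
            w = rng.randrange(vocab_size)  # token id drawn uniformly (assumption)
            pending.append(w)
            tokens.append(f"<{w}")
            opened += 1
        else:
            # Flat: close ANY open pair chosen at random, not necessarily the latest,
            # so dependencies can cross.
            w = pending.pop(rng.randrange(len(pending)))
            tokens.append(f"{w}>")
    return tokens


def every_pair_closed(tokens: List[str]) -> bool:
    """Hypothetical checker: each w> closes some earlier open <w; nesting not required."""
    pending: List[str] = []
    for t in tokens:
        if t.startswith("<"):
            pending.append(t[1:])
        elif t[:-1] in pending:
            pending.remove(t[:-1])
        else:
            return False
    return not pending


rng = random.Random(0)
sentence = sample_flat(num_pairs=4, vocab_size=1000, rng=rng)
print(" ".join(sentence))
```

Every nested sequence also satisfies `every_pair_closed`, but not the other way round. The flat language relaxes only the ordering constraint and keeps the pairing constraint.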

Slide 23

Slide 23 text

Interim summary: our artificial languages
• Modeling the word distribution: uniform, zipf, log-linear
• Modeling sentence structure: flat, nesting

Slide 24

Slide 24 text

Experiments

Slide 25

Slide 25 text

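What we want to find out
[Figure: (1) L1 pretraining on an artificial language: encoder + L1 embeddings; (2) L2 training 🇬🇧: English embeddings, encoder frozen ❄; (3) L2 evaluation 🇬🇧: encoder + L2 embeddings]
What structure must the artificial language used here have to yield an encoder that helps solve English tasks?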

Slide 26

Slide 26 text

Task: language modeling
Predict the next word from its context.
[Figure: an encoder reading "A cat and dog are"]
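To make "predict the next word from its context" concrete, here is a minimal sketch that turns one sentence into (left context, next word) pairs. A left-to-right language model is trained and evaluated on pairs like these. The function name is hypothetical.

```python
from typing import List, Tuple


def next_word_examples(tokens: List[str]) -> List[Tuple[List[str], str]]:
    """Turn one sentence into (left context, next word) prediction targets."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


examples = next_word_examples("A cat and dog are".split())
for context, target in examples:
    print(" ".join(context), "->", target)
# A -> cat
# A cat -> and
# A cat and -> dog
# A cat and dog -> are
```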

Slide 27

Slide 27 text

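Baselines and other settings (each trained and evaluated on English 🇬🇧)
• Random Weights: a randomly initialized encoder is frozen ❄, and L2 embeddings are trained on top of it.
• From Scratch: the encoder and the L2 embeddings are trained from scratch on English.
• Pretrained Encoders: an encoder pretrained with L1 embeddings is frozen ❄, and new L2 embeddings are trained.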
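The three settings differ only in where the encoder weights come from and whether those weights are updated during English training. The sketch below records that difference as a small configuration table. `SETTINGS` and `trainable_parts` are hypothetical names for illustration and are not taken from the authors' code.

```python
from typing import Dict, List

# Hypothetical encoding of the three settings on the slide.
# "encoder_init": where the encoder weights come from.
# "encoder_frozen": whether they stay fixed during L2 (English) training.
SETTINGS: Dict[str, Dict[str, object]] = {
    "Random Weights": {"encoder_init": "random", "encoder_frozen": True},
    "From Scratch": {"encoder_init": "random", "encoder_frozen": False},
    "Pretrained Encoders": {"encoder_init": "pretrained on L1", "encoder_frozen": True},
}


def trainable_parts(setting: str) -> List[str]:
    """List what is updated during English training in the given setting."""
    parts = ["L2 embeddings"]  # the English embeddings are trained in every setting
    if not SETTINGS[setting]["encoder_frozen"]:
        parts.append("encoder")
    return parts


for name in SETTINGS:
    print(f"{name}: init={SETTINGS[name]['encoder_init']}, trains {trainable_parts(name)}")
```

Random Weights and Pretrained Encoders train the same parameters. Any gap between them can therefore be attributed to what the encoder learned during L1 pretraining.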

Slide 28

Slide 28 text

Experimental setup (we also tried an LSTM and saw roughly the same trends)
Model
• Transformer (300 dim, 3 layers)
Pretraining data: 12.8M sentences sampled for each language
• Artificial languages
• Natural languages (Wikipedia dumps of en, es, ja)
Evaluation task data (fine-tuning and test)
• The Penn Treebank corpus

Slide 29

Slide 29 text

Effect of the token distribution?
• A word distribution like that of the Log-linear Language produces a reasonably useful encoder.

Slide 30

Slide 30 text

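Effect of sentence structure?
• Of Flat and Nesting, Nesting works better.
• The gap between artificial and natural languages is smaller than expected (probably because the model is small and the task is simple, though…).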

Slide 31

Slide 31 text

A bit more analysis

Slide 32

Slide 32 text

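Does it matter that the encoder captures contextual information?
Hypothesis
• In essence, an encoder summarizes contextual information (the tokens in the input) into a single vector.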

Slide 33

Slide 33 text

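How much contextual information does the encoder output contain?
We investigate this with a probing task.
[Figure: the encoder reads the tokens 34 28 12 77]
From the output vector, predict the tokens that appeared earlier in the input: (34, 28, 12, 77).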
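The probing labels can be built by pairing each output position with the tokens that precede it. The slide does not specify how targets are indexed or which classifier is used. This sketch assumes one label per relative offset, with offset 0 as the current token. A separate classifier per offset (hypothetical design) would then try to recover that token from the encoder's output vector. `probing_targets` is a hypothetical name.

```python
from typing import List, Tuple


def probing_targets(tokens: List[int], position: int) -> List[Tuple[int, int]]:
    """(offset, token) labels for probing the encoder output at `position`.

    Offset 0 is the current token; offset k is the token k steps earlier.
    The one-classifier-per-offset setup is a hypothetical design choice.
    """
    return [(k, tokens[position - k]) for k in range(position + 1)]


tokens = [34, 28, 12, 77]
print(probing_targets(tokens, position=3))
# [(0, 77), (1, 12), (2, 28), (3, 34)]
```

If a simple classifier can recover tokens at larger offsets, the output vector has retained more of the preceding context.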

Slide 34

Slide 34 text

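How much contextual information does the encoder output contain?
[Figure: probing-task score vs. language-modeling performance]
The probing-task score and language-modeling performance are correlated.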
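One way to quantify "correlated" is Pearson's r between the per-language probing scores and the per-language language-modeling scores. The slide does not state which correlation measure was used, so treat this choice as an assumption. If language-modeling performance is measured as perplexity, where lower is better, a strong relationship shows up as a negative r.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

A call such as `pearson_r(probing_scores, lm_scores)`, with one entry per pretraining language, would give a single number summarizing the scatter plot.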

Slide 35

Slide 35 text

How much contextual information does the encoder output contain?
[Figure: probing-task score vs. language-modeling performance]
A loose correlation here as well?

Slide 36

Slide 36 text

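What these results suggest
• To transfer to natural-language tasks, the ability to encode contextual information is important.
• Could transferring the way contextual information is encoded also be the key to real cross-lingual transfer…?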

Slide 37

Slide 37 text

Open questions
• What happens with larger models and harder tasks?
• How can we analyze in more detail the patterns the encoder uses to summarize context?