Slide 1

Slide 1 text

Visualizing and Measuring the Geometry of BERT
NN論文を肴に酒を飲む会 #9 (a meetup for discussing neural-network papers over drinks)

Slide 2

Slide 2 text

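About me
• Asei Sugiyama (杉山 阿聖)
• Software Engineer @Repro
• Machine learning, statistics, software development
• TensorFlow Docs translation & review
• Co-author of 機械学習図鑑 (an illustrated guide to machine learning)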

Slide 3

Slide 3 text

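Abstract
• A paper featured by Google PAIR
• In natural language processing, networks with Transformer-like architectures are extremely promising
• The aim is to clarify how such networks internally hold the features relevant to natural language processing
• Quantitative and qualitative analyses of BERT were carried out
• The results suggest that BERT learns both semantic and syntactic information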

Slide 4

Slide 4 text

Contents
1. Context & related works <-
2. Geometry of syntax
3. Geometry of word senses
   • Measurement of word sense disambiguation capability
   • Embedding distance and context: a concatenation experiment
4. Conclusion

Slide 5

Slide 5 text

Context & related works
• This paper is a response to A Structural Probe for Finding Syntax in Word Representations (2019)
• It is structured so that almost nothing makes sense without that paper

Slide 6

Slide 6 text

!

Slide 7

Slide 7 text

A bargain of a paper: you get to read twice as much

Slide 8

Slide 8 text

A Structural Probe for Finding Syntax in Word Representations
NN論文を肴に酒を飲む会 #9

Slide 9

Slide 9 text

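About me
• Asei Sugiyama (杉山 阿聖)
• Software Engineer @Repro
• Machine learning, statistics, software development
• TensorFlow Docs translation & review
• Co-author of 機械学習図鑑 (an illustrated guide to machine learning)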

Slide 10

Slide 10 text

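Abstract
• A paper from Stanford University
• Word representations have been analyzed extensively, but whether a representation of the parse tree is learned had not been verified
• This work proposes a method called the structural probe
• It evaluates whether the parse tree is embedded in a space obtained by linearly transforming a neural network's word representations
• The results suggest that ELMo and BERT learn parse trees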

Slide 11

Slide 11 text

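Research goal
• Answer the question of whether deep models learn parse trees

What this paper explains
• How to find parse trees in word representations
• How to recover and evaluate parse-tree information from a low-dimensional projection of word representations, with concrete examples (ELMo, BERT)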

Slide 12

Slide 12 text

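Idea behind the method
• Consider embedding a graph into a vector space while preserving the distances between its nodes
• If that can be done, finding the nodes adjacent to a given node is the same as a nearest-neighbor search
• Also, if the model learns the tree structure correctly, it presumably uses only part of its representation space
• So it is enough to look for a subspace of the representation space that preserves tree distances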
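The distance-preserving idea can be sketched in Python. This is only an illustration under assumptions: the four-node chain, the helper names `tree_distances` and `neighbors_by_search`, and the hand-placed 1-D coordinates are hypothetical and do not come from the paper. It shows that when an embedding preserves tree distances, a radius search in the vector space returns exactly the adjacent nodes.

```python
import math
from collections import deque


def tree_distances(n_nodes, edges):
    """All-pairs path lengths in an undirected tree, using BFS from every node."""
    adj = {i: [] for i in range(n_nodes)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    dist = [[0] * n_nodes for _ in range(n_nodes)]
    for src in range(n_nodes):
        seen = {src}
        queue = deque([(src, 0)])
        while queue:
            node, d = queue.popleft()
            dist[src][node] = d
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, d + 1))
    return dist


def neighbors_by_search(points, query, radius):
    """Indices of points within `radius` of points[query], excluding the query itself."""
    return [
        j for j, p in enumerate(points)
        if j != query and math.dist(points[query], p) <= radius + 1e-9
    ]


# Hypothetical toy tree: a chain 0 - 1 - 2 - 3.
edges = [(0, 1), (1, 2), (2, 3)]
# Hand-placed embedding on a line; it preserves the chain's distances exactly.
points = [[0.0], [1.0], [2.0], [3.0]]

dist = tree_distances(4, edges)
tree_neighbors = [j for j in range(4) if dist[1][j] == 1]
found = neighbors_by_search(points, query=1, radius=1.0)
print(tree_neighbors, found)
```

A chain is the easy case because it fits on a line; general trees need more dimensions, which is what the later slides on embedding trees address.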

Slide 13

Slide 13 text

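In other words?
• The figure in the explanatory article [1] makes it easy to understand
• The space on the left is the word representation space
• The gray plane in the left figure is the subspace that represents the tree structure
• The right side is the recovered tree structure
[1] https://nlp.stanford.edu//~johnhew//structural-probe.html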

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

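The structural probe
• (symbol lost in extraction): the word at a given position in a given sentence, and its vector
• (symbol lost in extraction): the distance between nodes on the parse tree
• (symbol lost in extraction): the distance in the subspace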
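The three quantities listed on this slide can be tied together in a short sketch. The notation is an assumption, since the slide's symbols did not survive extraction: `B` is a linear map into the subspace, `vectors[i]` is the vector of the i-th word, `tree_dist[i][j]` is the parse-tree distance, and the probe's squared distance is taken to be the squared norm of `B` applied to the difference of two word vectors. The per-sentence loss below (mean absolute gap between tree distance and squared probe distance) follows the structural probe paper as commonly described; treat it as a sketch, not the authors' implementation.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def probe_sq_distance(B, h_i, h_j):
    """Squared distance between two word vectors after the linear map B."""
    diff = [a - b for a, b in zip(h_i, h_j)]
    projected = matvec(B, diff)
    return sum(p * p for p in projected)


def probe_loss(B, vectors, tree_dist):
    """Mean absolute gap between tree distance and squared probe distance."""
    n = len(vectors)
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += abs(tree_dist[i][j] - probe_sq_distance(B, vectors[i], vectors[j]))
    return total / (n * n)


# Hypothetical 3-word sentence whose parse tree is the chain 0 - 1 - 2.
tree_dist = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
# Toy word vectors: the third coordinate carries information unrelated to syntax.
vectors = [[0.0, 0.0, 5.0], [1.0, 0.0, -3.0], [1.0, 1.0, 7.0]]
# B keeps the first two coordinates, i.e. it selects a 2-D subspace.
B = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

loss = probe_loss(B, vectors, tree_dist)
print(probe_sq_distance(B, vectors[0], vectors[2]), loss)
```

In practice `B` is learned by minimizing this loss over many sentences; here it is fixed by hand so that the loss is exactly zero.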

Slide 16

Slide 16 text

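Results (Table 1)
• Models that take context into account (bottom four) reproduce the parse tree better than models that do not (top four) [2]
[2] Dependency structure is evaluated ignoring relation type and direction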
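An evaluation that ignores relation type and direction can be sketched as follows. The assumptions are loud ones: the slide does not say how a tree is decoded from distances, so this sketch takes the minimum spanning tree of the predicted distances and scores the fraction of gold edges recovered, a metric usually called the undirected unlabeled attachment score (UUAS). The distance matrix and gold edges are made-up toy values.

```python
def minimum_spanning_tree(dist):
    """Prim's algorithm on a dense distance matrix; returns a set of undirected edges."""
    n = len(dist)
    in_tree = {0}
    edges = set()
    while len(in_tree) < n:
        best = None
        for i in in_tree:
            for j in range(n):
                if j in in_tree:
                    continue
                if best is None or dist[i][j] < dist[best[0]][best[1]]:
                    best = (i, j)
        edges.add(frozenset(best))
        in_tree.add(best[1])
    return edges


def uuas(pred_dist, gold_edges):
    """Fraction of gold edges found in the predicted tree, ignoring label and direction."""
    predicted = minimum_spanning_tree(pred_dist)
    gold = {frozenset(e) for e in gold_edges}
    return len(predicted & gold) / len(gold)


# Hypothetical 4-word sentence with gold dependency edges (head/dependent order ignored).
gold_edges = [(0, 1), (1, 2), (1, 3)]
# Made-up predicted distances between word pairs.
pred_dist = [
    [0.0, 1.1, 2.0, 0.9],
    [1.1, 0.0, 0.8, 1.2],
    [2.0, 0.8, 0.0, 2.1],
    [0.9, 1.2, 2.1, 0.0],
]

score = uuas(pred_dist, gold_edges)
print(round(score, 3))
```

Here the predicted tree contains the edges 0-1 and 1-2 but attaches word 3 to word 0 instead of word 1, so two of the three gold edges are recovered.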

Slide 17

Slide 17 text

Results (Figure 2)

Slide 18

Slide 18 text

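Results (Figure 4)
• Left: word-to-word distances computed from the parse tree
• Right: word-to-word distances computed from layer 16 of BERT (large)
• The overall structure appears to be reproduced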

Slide 19

Slide 19 text

Future works
• Experiments showed that it is important to use the squared distance rather than the distance itself
• Why the squared distance works better was not well understood

Slide 20

Slide 20 text

Everything up to here was Context

Slide 21

Slide 21 text

Contents
1. Context & related works
2. Geometry of syntax <-
3. Geometry of word senses
   • Measurement of word sense disambiguation capability
   • Embedding distance and context: a concatenation experiment
4. Conclusion

Slide 22

Slide 22 text

Geometry of syntax
• What BERT has learned is examined from the following two viewpoints
1. Does it learn useful representations in the first place?
2. Does it learn parse trees?

Slide 23

Slide 23 text

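Attention probes and dependency representations
• A quantitative evaluation of what BERT has learned (preliminary experiment)
• The task is to determine the dependency relation between two words, using Penn Treebank data
• A weak model (linear classifier + L2 regularization) is trained on top of BERT's output
• Accuracy reached 85.8%, so the authors judged it reasonable to move on to the next step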

Slide 24

Slide 24 text

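Mathematics of embedding trees in Euclidean space
• It was proved mathematically that a tree made of nodes can be embedded in Euclidean space (the node count and the target space were given as symbols, lost in extraction) while preserving something like distance
• It was also shown that, if the distance itself is used, there are cases where no distance-preserving embedding exists
• The authors regard this as settling the open question ("homework") left by the structural probe paper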
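One construction with this property can be sketched as follows; it is offered as an illustration, not as the proof on the slide. The assumptions: node 0 is the root, every node's parent has a smaller index, and each edge gets its own coordinate axis, so a tree with n nodes lands in n - 1 dimensions. Each node sits at its parent's position plus one unit step along the edge's axis, which makes the squared Euclidean distance between any two nodes equal to the number of edges on the path between them. The function name and the toy tree are hypothetical.

```python
def pythagorean_embedding(parents):
    """Embed a rooted tree so that squared Euclidean distance equals path length.

    parents[i] is the parent index of node i; parents[0] must be None (the root),
    and every other node must come after its parent in the list.
    """
    n = len(parents)
    points = [None] * n
    points[0] = [0.0] * (n - 1)
    for node in range(1, n):
        vec = list(points[parents[node]])  # start from the parent's position
        vec[node - 1] = 1.0                # one unit step along this edge's own axis
        points[node] = vec
    return points


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


# Hypothetical tree: 0 is the root, 1 and 2 are its children, 3 and 4 are children of 1.
parents = [None, 0, 0, 1, 1]
points = pythagorean_embedding(parents)

print(squared_distance(points[3], points[4]))  # siblings: path length 2
print(squared_distance(points[3], points[2]))  # path 3 - 1 - 0 - 2: length 3
```

The plain (unsquared) distance does not work in general. For example, a root with three leaves would need three points pairwise 2 apart that all sit at distance 1 from a fourth point, which forces each leaf pair to be diametrically opposite around the root and cannot hold for all three pairs at once.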

Slide 25

Slide 25 text

In other words?
• The blog post explains this in detail, so it is a good entry point if you want the specifics
• https://pair-code.github.io/interpretability/bert-tree/

Slide 26

Slide 26 text

Visualization of parse tree embeddings
• The distance-preserving embedding of the parse tree and the result obtained from BERT are similar

Slide 27

Slide 27 text

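Visualization of parse tree embeddings
• Distances are compared between the embedded parse tree and what BERT has learned
• The ratio of the two is shown by color
• The value shown is BERT / true parse tree
• Red dotted lines mark pairs that are not connected in the parse tree but ended up close in BERT's learned result
• Pairs that seem best treated as a single unit, such as part/of and sale/of, are close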

Slide 28

Slide 28 text

No content

Slide 29

Slide 29 text

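Visualization of parse tree embeddings
• The distribution of the distance ratio between the embedded parse tree and BERT's learned result is examined
• The figure on the right shows the result aggregated per dependency relation
• The values are spread widely across relations, from 1.2 to 2.5
• This suggests that BERT adds a quantitative aspect to dependency relations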
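The per-relation aggregation can be sketched in a few lines. Everything concrete here is assumed for illustration: the relation labels, the distance values, and the function name `mean_ratio_by_relation` are made up, and the ratio is taken as BERT-derived distance divided by parse-tree distance, as stated on the previous slide.

```python
from collections import defaultdict


def mean_ratio_by_relation(records):
    """Average the BERT / parse-tree distance ratio for each dependency relation."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for relation, bert_distance, tree_distance in records:
        totals[relation] += bert_distance / tree_distance
        counts[relation] += 1
    return {relation: totals[relation] / counts[relation] for relation in totals}


# Made-up records: (dependency relation, distance from BERT, distance in the parse tree).
records = [
    ("det", 1.0, 1.0),
    ("det", 1.5, 1.0),
    ("nsubj", 2.0, 1.0),
    ("nsubj", 3.0, 1.0),
]

ratios = mean_ratio_by_relation(records)
print(ratios)
```

If every relation produced the same mean ratio, the embedding would carry no more than the tree itself; a spread across relations is what the slide reads as extra quantitative information.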

Slide 30

Slide 30 text

Contents
1. Context & related works
2. Geometry of syntax
3. Geometry of word senses <-
   • Measurement of word sense disambiguation capability
   • Embedding distance and context: a concatenation experiment
4. Conclusion

Slide 31

Slide 31 text

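Geometry of word senses
• Examine whether BERT captures word meaning, not just syntax
• Experiment: can a subspace that represents meaning be obtained?
   • Apparently it can
• Experiment: can the context be adjusted artificially?
   • Not only did this fail, it made the results worse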

Slide 32

Slide 32 text

Measurement of word sense disambiguation capability
• BERT's output is visualized with UMAP
• Even for the same word "die", clusters with different meanings form
• A word sense disambiguation task solved with kNN gave accuracy 71.1% (SOTA)
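A kNN sense classifier of the kind mentioned above can be sketched with the standard library alone. The 2-D points stand in for BERT embeddings of the word "die", and the sense labels "perish" and "dice" are hypothetical; real embeddings have hundreds of dimensions and the slide does not state the value of k, so `k=3` is an assumption.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a sense label by majority vote among the k nearest training embeddings."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up 2-D stand-ins for contextual embeddings of "die", labeled with a sense.
train = [
    ([0.0, 0.1], "perish"),
    ([0.2, 0.0], "perish"),
    ([0.1, 0.3], "perish"),
    ([2.0, 2.1], "dice"),
    ([2.2, 1.9], "dice"),
    ([1.9, 2.2], "dice"),
]

print(knn_predict(train, [0.1, 0.1]))
print(knn_predict(train, [2.1, 2.0]))
```

The classifier has no trainable parameters, so its accuracy depends entirely on whether the embedding space already places same-sense usages close together.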

Slide 33

Slide 33 text

No content

Slide 34

Slide 34 text

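Separating semantic information
• A subspace that represents meaning is extracted in the same way as the "structural probe"
• Instead of the difference from parse-tree distances, the cosine similarity between word senses is used (details unclear)
• Accuracy before dimensionality reduction is 71.1%
• Accuracy rises slightly after dimensionality reduction
• A subspace for meaning seems to exist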
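Since the slide marks the details as unclear, the following is only a guess at the general shape: compare two usages of a word by cosine similarity after applying a linear map `B` into a lower-dimensional subspace. The vectors, the matrix `B`, and the function names are all hypothetical; the point is that discarding a direction unrelated to meaning can make two same-sense usages look more alike.

```python
import math


def project(B, vector):
    """Apply the linear map B (list of rows) to a vector."""
    return [sum(b * v for b, v in zip(row, vector)) for row in B]


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


# Two made-up embeddings of the same word sense; the third coordinate is noise.
u = [1.0, 0.0, 5.0]
v = [1.0, 0.0, -5.0]
# Hypothetical probe matrix that keeps the first two coordinates only.
B = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

raw = cosine_similarity(u, v)
projected = cosine_similarity(project(B, u), project(B, v))
print(raw, projected)
```

In the probing setup `B` would be learned rather than fixed by hand, which is how the choice of subspace can raise accuracy slightly, as the slide reports.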

Slide 35

Slide 35 text

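Embedding distance and context: a concatenation experiment
• Experiment: can better results be obtained by deliberately manipulating the context?
• A representative sentence that uses a word in a particular sense is found and concatenated to sentences that use the same word in the same sense
• If "I went to Edo" is the representative sentence, it is appended to "He went to Edo" to form the sentence "He went to Edo and I went to Edo"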

Slide 36

Slide 36 text

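Embedding distance and context: a concatenation experiment
• Horizontal axis: BERT layer
• Vertical axis: roughly, a ratio of distances to the centers of clusters with a different meaning (larger is better)
• One might expect that appending a representative sentence would separate the word's senses better, but that was not the case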
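The slide describes the vertical axis only loosely, so the following is one possible reading and not the paper's definition: for an embedding of a word, divide its distance to the centroid of a different-sense cluster by its distance to the centroid of its own-sense cluster, so that larger values mean better separation. The clusters, the query point, and the function names are made up.

```python
import math


def centroid(vectors):
    """Coordinate-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def separation_ratio(embedding, own_cluster, other_cluster):
    """Distance to the other sense's centroid divided by distance to the own sense's centroid."""
    own = math.dist(embedding, centroid(own_cluster))
    other = math.dist(embedding, centroid(other_cluster))
    return other / own


# Made-up 2-D clusters for two senses of one word.
own_cluster = [[0.0, 0.0], [0.0, 2.0]]
other_cluster = [[4.0, 0.0], [4.0, 2.0]]
embedding = [1.0, 1.0]

ratio = separation_ratio(embedding, own_cluster, other_cluster)
print(ratio)
```

Under this reading, the experiment would compute such a ratio per layer with and without the appended representative sentence and compare the two curves.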

Slide 37

Slide 37 text

Contents
1. Context & related works
2. Geometry of syntax
3. Geometry of word senses
   • Measurement of word sense disambiguation capability
   • Embedding distance and context: a concatenation experiment
4. Conclusion <-

Slide 38

Slide 38 text

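Conclusion
• The paper gives the "structural probe" a mathematical justification
• Comparing parse-tree embeddings with what BERT has learned yields results suggesting that BERT learns syntax
• Separately from the space that learns syntax, there appears to be a space that learns meaning
• Whether there are other subspaces that matter for natural language is left for future research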

Slide 39

Slide 39 text

Finally
• A recap of today's event
• TensorFlow User Group Tokyo
• NN論文を肴に酒を飲む会

Slide 40

Slide 40 text

No content

Slide 41

Slide 41 text

No content

Slide 42

Slide 42 text

TensorFlow User Group Tokyo
NN論文を肴に酒を飲む会 #9