Visualizing and Measuring the Geometry of BERT

Asei Sugiyama
September 04, 2019

Slides presented at NN論文を肴に酒を飲む会 #9 (https://tfug-tokyo.connpass.com/event/143283/).

Transcript

  1. Self-introduction
    • Asei Sugiyama
    • Software Engineer @Repro
    • Machine learning, statistics, and software development
    • Translating and reviewing TensorFlow Docs
    • Co-author of 機械学習図鑑
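  2. Abstract
    • A paper featured by Google PAIR
    • Networks with Transformer-like architectures are extremely promising for natural language processing
    • The goal is to clarify how such networks internally represent linguistic features
    • The authors carried out quantitative and qualitative analyses of BERT
    • The results suggest that BERT learns both semantic and syntactic information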
  3. Contents
    1. Context & related works <-
    2. Geometry of syntax
    3. Geometry of word senses
      • Measurement of word sense disambiguation capability
      • Embedding distance and context: a concatenation experiment
    4. Conclusion
  4. Context & related works
    • This paper is a response to "A Structural Probe for Finding Syntax in Word Representations" (2019)
    • It is written so that almost nothing makes sense without that earlier paper
  5. !

  6. Self-introduction
    • Asei Sugiyama
    • Software Engineer @Repro
    • Machine learning, statistics, and software development
    • Translating and reviewing TensorFlow Docs
    • Co-author of 機械学習図鑑
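  7. Abstract
    • A paper from Stanford University
    • Word representations have been analyzed extensively, but whether representations of parse trees are learned had not been verified
    • This work proposes a method called the structural probe
    • It evaluates whether parse trees are embedded in a linearly transformed space of a neural network's word representations
    • For ELMo and BERT, the results suggest that parse trees are learned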
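  8. Idea behind the method
    • Consider embedding a graph into a vector space while preserving the distances between its nodes
    • If that is possible, finding the neighbor of a node is the same as a nearest-neighbor search
    • If the model has correctly learned the tree structure, it presumably uses only part of its representation space for it
    • So it suffices to look for a subspace of the representation space that preserves tree distances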
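The subspace idea above can be sketched in a few lines. This is a minimal illustration, assuming the probe is a linear map B and the distance is the squared L2 norm of the transformed difference between two word vectors; the matrix and vectors below are hypothetical toy values, not learned parameters or real model outputs.

```python
def probe_squared_distance(B, h_i, h_j):
    """Squared L2 distance between two word vectors after applying the linear map B."""
    diff = [a - b for a, b in zip(h_i, h_j)]
    projected = [sum(w * d for w, d in zip(row, diff)) for row in B]
    return sum(p * p for p in projected)


# Hypothetical 2x3 probe: keeps the first two coordinates and ignores the third.
B = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]

# Hypothetical word vectors that differ a lot in the ignored coordinate.
h_a = [1.0, 2.0, 9.0]
h_b = [2.0, 2.0, -4.0]

print(probe_squared_distance(B, h_a, h_b))  # 1.0
```

The two vectors are far apart in the full space, but only the part of the space selected by B contributes to the probe distance.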
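  9. Results (Figure 4)
    • Left: distances between words computed from the parse tree
    • Right: distances between words computed from layer 16 of BERT (large)
    • The overall structure appears to be reproduced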
  10. Contents
    1. Context & related works
    2. Geometry of syntax <-
    3. Geometry of word senses
      • Measurement of word sense disambiguation capability
      • Embedding distance and context: a concatenation experiment
    4. Conclusion
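  11. Attention probes and dependency representations
    • A quantitative evaluation of what BERT has learned (preliminary experiment)
    • Task: using Penn Treebank data, decide whether a dependency relation holds between two words
    • A weak model (a linear classifier with L2 regularization) is trained on BERT's outputs
    • Accuracy was 85.8%, so the authors judged it reasonable to proceed to the next step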
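A "weak model" of this kind can be sketched as logistic regression with an L2 penalty. The code below is only an illustration in plain Python: the two-dimensional features and labels are made-up stand-ins for features extracted from BERT, and the hyperparameters (`l2`, `lr`, `epochs`) are assumptions, not values from the paper.

```python
import math


def train_linear_probe(features, labels, l2=0.01, lr=0.1, epochs=200):
    """Fit logistic regression with an L2 penalty by batch gradient descent."""
    dim = len(features[0])
    n = len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            for i in range(dim):
                grad_w[i] += (p - y) * x[i] / n
            grad_b += (p - y) / n
        # The penalty (l2 / 2) * ||w||^2 contributes l2 * w to the gradient.
        w = [wi - lr * (gi + l2 * wi) for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return 1 if the linear score is positive, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


# Hypothetical features for word pairs; label 1 means "a dependency relation exists".
features = [[2.0, 1.0], [3.0, 2.0], [2.5, 1.5],
            [-2.0, -1.0], [-3.0, -2.0], [-2.5, -1.5]]
labels = [1, 1, 1, 0, 0, 0]

w, b = train_linear_probe(features, labels)
print([predict(w, b, x) for x in features])
```

The probe is kept deliberately weak so that a high score says something about the input features rather than about the classifier.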
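  12. Mathematics of embedding trees in Euclidean space
    • The authors prove mathematically that a tree with n nodes can be embedded in R^(n-1) while preserving a distance-like quantity
    • They also show that when the distance itself is used, there are cases where no distance-preserving embedding exists
    • The authors regard this as settling the question left open by the earlier work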
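One way to see why such an embedding exists is the following sketch, which assumes the "distance-like quantity" is the squared Euclidean distance. Each edge of the tree gets its own coordinate axis, and a node is mapped to the sum of the unit vectors of the edges on its path from the root. The toy tree and the function names are illustrative.

```python
def tree_embedding(parent):
    """Embed a tree given as a parent list (the root has parent None) into R^(n-1)."""
    n = len(parent)
    # One coordinate axis per edge; the edge (parent[v], v) is indexed by its child v.
    axis = {}
    for v, p in enumerate(parent):
        if p is not None:
            axis[v] = len(axis)

    def embed(v):
        vec = [0.0] * (n - 1)
        while parent[v] is not None:
            vec[axis[v]] = 1.0
            v = parent[v]
        return vec

    return [embed(v) for v in range(n)]


def squared_distance(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def tree_distance(parent, u, v):
    """Number of edges on the path between u and v."""
    def path_to_root(w):
        path = [w]
        while parent[w] is not None:
            w = parent[w]
            path.append(w)
        return path

    steps_from_u = {node: i for i, node in enumerate(path_to_root(u))}
    for j, node in enumerate(path_to_root(v)):
        if node in steps_from_u:
            return steps_from_u[node] + j
    raise ValueError("nodes are not in the same tree")


# Toy tree: 0 is the root, 1 and 2 are its children, 3 and 4 are children of 1.
parent = [None, 0, 0, 1, 1]
emb = tree_embedding(parent)
print(squared_distance(emb[3], emb[2]), tree_distance(parent, 3, 2))  # 3.0 3
```

Two nodes differ exactly in the coordinates of the edges on the path between them, so the squared Euclidean distance equals the tree distance.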
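  13. Visualization of parse tree embeddings
    • Compares distances in the embedded parse tree with distances in what BERT has learned
    • The ratio of the two (BERT / true parse tree) is shown as color
    • Red dotted lines mark word pairs that are not connected in the parse tree but ended up close in BERT's representation
    • Pairs that are naturally treated as a unit, such as part/of and sale/of, are close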
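  14. Visualization of parse tree embeddings
    • Examines the distribution of the ratio between distances in the embedded parse tree and distances in what BERT has learned
    • The figure on the right shows the results aggregated by dependency relation
    • The ratio varies widely by relation, from 1.2 to 2.5
    • This suggests that BERT adds a quantitative aspect to the dependency relations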
  15. Contents
    1. Context & related works
    2. Geometry of syntax
    3. Geometry of word senses <-
      • Measurement of word sense disambiguation capability
      • Embedding distance and context: a concatenation experiment
    4. Conclusion
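  16. Measurement of word sense disambiguation capability
    • BERT's outputs are visualized with UMAP
    • Even for the same word "die", separate clusters form for its different senses
    • A word sense disambiguation task solved with kNN reached an accuracy of 71.1% (SOTA)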
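The kNN step can be sketched as follows. This is a toy illustration in plain Python: the two-dimensional vectors stand in for context embeddings of "die", the sense labels are hypothetical, and Euclidean distance with a majority vote is an assumption rather than the paper's exact setup.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a sense label by majority vote among the k nearest training vectors."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled embeddings of the word "die" in different contexts.
train = [
    ([0.9, 0.1], "verb: stop living"),
    ([1.0, 0.2], "verb: stop living"),
    ([0.8, 0.0], "verb: stop living"),
    ([0.1, 0.9], "noun: game cube"),
    ([0.0, 1.0], "noun: game cube"),
    ([0.2, 0.8], "noun: game cube"),
]

print(knn_predict(train, [0.7, 0.2]))  # verb: stop living
```

Because the classifier has no trainable parameters, its score mostly reflects how well the embedding space already separates the senses.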
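  17. Embedding distance and context: a concatenation experiment
    • An experiment to see whether deliberately manipulating the context gives better results
    • A representative sentence that uses a word in a particular sense is found and concatenated to a sentence that uses the same word in the same sense
    • If "I went to Edo" is the representative sentence, it is appended to "He went to Edo" to form "He went to Edo and I went to Edo"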
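  18. Embedding distance and context: a concatenation experiment
    • Horizontal axis: BERT layer
    • Vertical axis: a ratio-like measure of the distance to the center of the cluster for a different sense (larger is better)
    • One might expect that appending a representative sentence would separate the word's sense better, but that turned out not to be the case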
  19. Contents
    1. Context & related works
    2. Geometry of syntax
    3. Geometry of word senses
      • Measurement of word sense disambiguation capability
      • Embedding distance and context: a concatenation experiment
    4. Conclusion <-