
Visualizing and Measuring the Geometry of BERT

Asei Sugiyama
September 04, 2019

Slides for a talk given at NN論文を肴に酒を飲む会 #9 (https://tfug-tokyo.connpass.com/event/143283/).


Transcript

  1. Self-introduction
     • Asei Sugiyama
     • Software Engineer @Repro
     • Works on machine learning, statistics, and software development
     • Translates and reviews the TensorFlow Docs
     • Co-author of 機械学習図鑑
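  2. Abstract
     • A paper introduced by Google PAIR
     • In natural language processing, networks with Transformer-like architectures are extremely promising
     • The goal is to clarify how such networks internally hold the features of natural language
     • The authors carried out quantitative and qualitative analyses of BERT
     • The results suggest that BERT learns both semantic and syntactic information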
  3. Contents
     1. Context & related works <-
     2. Geometry of syntax
     3. Geometry of word senses
        • Measurement of word sense disambiguation capability
        • Embedding distance and context: a concatenation experiment
     4. Conclusion
  4. Context & related works
     • This paper is a response to "A Structural Probe for Finding Syntax in Word Representations" (2019)
     • It is structured so that almost nothing can be followed without that paper
  5. !

  6. Self-introduction
     • Asei Sugiyama
     • Software Engineer @Repro
     • Works on machine learning, statistics, and software development
     • Translates and reviews the TensorFlow Docs
     • Co-author of 機械学習図鑑
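  7. Abstract
     • A paper from Stanford University
     • Word representations have been analyzed extensively, but whether a representation of the parse tree is learned had not been verified
     • This work proposes a method called the structural probe
     • It evaluates whether a parse tree is embedded in a linearly transformed space of a neural network's word representations
     • The results suggest that ELMo and BERT learn parse trees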
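  8. Idea behind the method
     • Consider embedding a graph into a vector space while preserving the distances between its nodes
     • If this works, finding the neighbors of a node is the same as nearest-neighbor search
     • If the model learns the tree structure correctly, it may use only part of its representation space for it
     • So it is enough to look for a subspace of the representation space that preserves the tree distances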
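The subspace idea on this slide can be illustrated with a small sketch. It assumes the probe is a matrix `B` applied to the difference of two word vectors, and that the squared length of the result is compared with the tree distance; the toy vectors, the matrices, and the loss averaging over unordered pairs are all illustrative assumptions, not values or code from either paper.

```python
def probe_squared_distance(B, h_i, h_j):
    """Squared L2 distance between two word vectors after the linear map B."""
    diff = [a - b for a, b in zip(h_i, h_j)]
    projected = [sum(w * d for w, d in zip(row, diff)) for row in B]
    return sum(p * p for p in projected)


def probe_loss(B, vectors, tree_dist):
    """Mean absolute gap between tree distance and probe distance over word pairs."""
    total, pairs = 0.0, 0
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            predicted = probe_squared_distance(B, vectors[i], vectors[j])
            total += abs(tree_dist[i][j] - predicted)
            pairs += 1
    return total / pairs


# Hypothetical 3-word sentence whose parse tree is the chain 0 - 1 - 2.
tree_dist = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
# Hypothetical word vectors: the first two coordinates carry the tree,
# the third coordinate is unrelated information.
vectors = [[0.0, 0.0, 5.0], [1.0, 0.0, 7.0], [1.0, 1.0, -3.0]]

B_subspace = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # keeps only the first two axes
B_identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

loss_subspace = probe_loss(B_subspace, vectors, tree_dist)
loss_identity = probe_loss(B_identity, vectors, tree_dist)
print(loss_subspace, loss_identity)
```

In this toy setup the projection onto the first two axes reproduces the tree distances exactly, while the full space does not. In practice `B` would be learned by minimizing a loss of this kind over a treebank rather than chosen by hand.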
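  9. Results (Figure 4)
     • Left: distances between words computed from the parse tree
     • Right: distances between words computed at layer 16 of BERT (large)
     • The overall structure appears to be reproduced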
  10. Contents
     1. Context & related works
     2. Geometry of syntax <-
     3. Geometry of word senses
        • Measurement of word sense disambiguation capability
        • Embedding distance and context: a concatenation experiment
     4. Conclusion
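  11. Attention probes and dependency representations
     • A quantitative evaluation of what BERT has learned (a preliminary experiment)
     • The task is to decide, using Penn Treebank data, whether a dependency relation holds between two words
     • A weak model (a linear classifier with L2 regularization) is trained on BERT's outputs
     • Accuracy reached 85.8%, so the authors judge it reasonable to proceed to the next step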
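As a rough illustration of what "a linear classifier with L2 regularization" means here, the following sketch trains a logistic-regression probe with plain gradient descent. The two-dimensional feature vectors stand in for whatever per-word-pair features are extracted from BERT; the data, hyperparameters, and function names are hypothetical and are not taken from the paper.

```python
import math


def score(w, b, x):
    """Linear score of one feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear_probe(features, labels, l2=0.01, lr=0.5, epochs=200):
    """Logistic regression with an L2 penalty on the weights, by gradient descent."""
    n, dim = len(features), len(features[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [l2 * wi for wi in w]          # gradient of the L2 penalty
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = 1.0 / (1.0 + math.exp(-score(w, b, x)))
            err = p - y                          # gradient of the log loss
            for k in range(dim):
                grad_w[k] += err * x[k] / n
            grad_b += err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Hypothetical features for six word pairs; label 1 means "dependency exists".
features = [[2.0, 0.1], [1.5, -0.2], [1.8, 0.3],
            [-1.7, 0.2], [-2.0, -0.1], [-1.6, 0.0]]
labels = [1, 1, 1, 0, 0, 0]

w, b = train_linear_probe(features, labels)
predictions = [1 if score(w, b, x) > 0 else 0 for x in features]
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print(accuracy)
```

The point of keeping the probe this weak is that a high accuracy then reflects information already present in the features rather than capacity in the classifier.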
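  12. Mathematics of embedding trees in Euclidean space
     • The authors prove mathematically that a tree with n nodes can be embedded in R^(n-1) while preserving a distance-like quantity (the squared Euclidean distance equals the tree distance)
     • They also show that if the distance itself is used, there are cases where no distance-preserving embedding exists
     • According to the authors, this settles the question left open by the earlier work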
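One way to see why such an embedding can exist is the following sketch: give every edge of the tree its own coordinate axis, put the root at the origin, and let each node inherit its parent's coordinates plus a 1 on the axis of the connecting edge. The squared Euclidean distance between two nodes then counts the edges on the path between them. The function names and the example tree are illustrative assumptions.

```python
from collections import deque


def build_adjacency(n, edges):
    """Adjacency lists that remember which edge index (axis) joins two nodes."""
    adjacency = {v: [] for v in range(n)}
    for axis, (u, v) in enumerate(edges):
        adjacency[u].append((v, axis))
        adjacency[v].append((u, axis))
    return adjacency


def embed_tree(n, edges, root=0):
    """Embed an n-node tree into R^(n-1), one axis per edge, root at the origin."""
    adjacency = build_adjacency(n, edges)
    coords = {root: [0.0] * (n - 1)}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v, axis in adjacency[u]:
            if v not in coords:
                coords[v] = list(coords[u])   # start from the parent's position
                coords[v][axis] = 1.0         # step along this edge's own axis
                queue.append(v)
    return coords


def tree_distances_from(source, n, edges):
    """Number of edges from source to every node, by breadth-first search."""
    adjacency = build_adjacency(n, edges)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v, _ in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def squared_distance(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


# Hypothetical 5-node tree: node 1 has children 2 and 3, node 3 has child 4.
n, edges = 5, [(0, 1), (1, 2), (1, 3), (3, 4)]
coords = embed_tree(n, edges)
matches = all(
    abs(squared_distance(coords[a], coords[b]) - d) < 1e-9
    for a in range(n)
    for b, d in tree_distances_from(a, n, edges).items()
)
print(matches)
```

The check at the end compares every pair of nodes, so `matches` is true only if the squared distance equals the tree distance everywhere in this example.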
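  13. Visualization of parse tree embeddings
     • Compares distances in the embedded parse tree with distances in BERT's learned representation
     • The ratio of the two is shown as color
     • The displayed value is BERT / true parse tree
     • Red dotted lines mark word pairs that are not connected in the parse tree but end up close in BERT's learned representation
     • Pairs that are better handled as a unit, such as part/of and sale/of, are close together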
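  14. Visualization of parse tree embeddings
     • Examines the distribution of the ratio between distances in the embedded parse tree and distances in BERT's learned representation
     • The figure on the right aggregates the results by dependency relation
     • The ratio varies widely by relation, from 1.2 to 2.5
     • This suggests that BERT adds a quantitative aspect to the dependency relations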
  15. Contents
     1. Context & related works
     2. Geometry of syntax
     3. Geometry of word senses <-
        • Measurement of word sense disambiguation capability
        • Embedding distance and context: a concatenation experiment
     4. Conclusion
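  16. Measurement of word sense disambiguation capability
     • BERT's outputs are visualized with UMAP
     • Even for the single word "die", separate clusters form for its different senses
     • Running a word sense disambiguation task with kNN gives an accuracy of 71.1% (SOTA)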
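A minimal sketch of kNN-based word sense disambiguation, assuming each occurrence of a word has a context embedding and a sense label. The two-dimensional vectors and the sense names below are hypothetical placeholders; real BERT embeddings have far more dimensions, and the paper's exact classifier setup may differ.

```python
import math
from collections import Counter


def knn_sense(query, labelled, k=3):
    """Predict a sense by majority vote among the k nearest labelled embeddings."""
    nearest = sorted(labelled, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(sense for _, sense in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical context embeddings for occurrences of "die" in two senses.
labelled = [
    ([0.9, 0.1], "to stop living"),
    ([1.0, 0.2], "to stop living"),
    ([0.8, 0.0], "to stop living"),
    ([0.1, 0.9], "game cube"),
    ([0.0, 1.0], "game cube"),
    ([0.2, 0.8], "game cube"),
]

sense_a = knn_sense([0.85, 0.15], labelled)
sense_b = knn_sense([0.10, 0.95], labelled)
print(sense_a, "|", sense_b)
```

A classifier this simple works only if occurrences with the same sense already sit close together in embedding space, which is what the cluster visualization on this slide suggests.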
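  17. Embedding distance and context: a concatenation experiment
     • An experiment to see whether deliberately manipulating the context gives better results
     • The authors pick a representative sentence that uses a word in a particular sense and concatenate it to a sentence that uses the same word in the same sense
     • If "I went to Edo" is the representative sentence, it is appended to "He went to Edo" to form "He went to Edo and I went to Edo"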
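  18. Embedding distance and context: a concatenation experiment
     • Horizontal axis: BERT layer
     • Vertical axis: roughly a ratio of distances to the centers of the clusters for the different senses (larger is better)
     • One might expect that appending a representative sentence would separate the word's senses better, but that turned out not to be the case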
  19. Contents
     1. Context & related works
     2. Geometry of syntax
     3. Geometry of word senses
        • Measurement of word sense disambiguation capability
        • Embedding distance and context: a concatenation experiment
     4. Conclusion <-