Visualizing and Measuring the Geometry of BERT

Asei Sugiyama
September 04, 2019

Slides for a presentation at NN論文を肴に酒を飲む会 #9 (https://tfug-tokyo.connpass.com/event/143283/).

Transcript

  1. Visualizing and Measuring the Geometry of BERT • NN論文を肴に酒を飲む会 #9

  2. Self-introduction • Asei Sugiyama • Software Engineer @Repro • Machine learning, statistics, software development, and so on • TensorFlow Docs translation & review • Co-author of 機械学習図鑑
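  3. Abstract • A paper featured by Google PAIR • In natural language processing, networks with Transformer-like architectures are extremely promising • The goal is to clarify how such networks internally hold the features relevant to natural language processing • Quantitative and qualitative analyses of BERT were carried out • The results suggest that BERT learns both semantic and syntactic information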
  4. Contents 1. Context & related works <- 2. Geometry of syntax 3. Geometry of word senses • Measurement of word sense disambiguation capability • Embedding distance and context: a concatenation experiment 4. Conclusion
  5. Context & related works • This paper is an answer to A Structural Probe for Finding Syntax in Word Representations (2019) • It is structured so that almost nothing makes sense without that paper
  6. !

  7. A good-value paper: you get to read twice as much

  8. A Structural Probe for Finding Syntax in Word Representations • NN論文を肴に酒を飲む会 #9
  9. Self-introduction • Asei Sugiyama • Software Engineer @Repro • Machine learning, statistics, software development, and so on • TensorFlow Docs translation & review • Co-author of 機械学習図鑑
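  10. Abstract • A paper from Stanford University • Analysis of word representations has been progressing, but whether representations of parse trees are learned had not been verified until now • This work proposes a method called the structural probe • It evaluates whether a parse tree is embedded in a linearly transformed space of a neural network's word representations • For ELMo and BERT, the results suggest that parse trees are learned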
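  11. Purpose of the research • Answer the question of whether deep models learn parse trees • What this paper explains: • How to find parse trees from word representations • How to recover and evaluate information about the parse tree from a low-dimensional projection of word representations, with concrete examples (ELMo, BERT)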
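  12. Idea behind the method • Consider embedding a graph into a vector space while preserving the distances between its nodes • If this can be done, finding the neighbor of a given node is the same as nearest-neighbor search • Moreover, if a model learns the tree structure correctly, it presumably uses only part of its representation space • So it suffices to look for a subspace of the representation space that preserves tree distances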
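As a minimal sketch of the "distances between nodes" that slide 12 wants an embedding to preserve, the following computes all-pairs path lengths in a small tree. The four-word sentence and its dependency edges are a hypothetical toy example, and `tree_distances` and `neighbors` are illustrative names, not taken from either paper.

```python
from collections import deque


def tree_distances(edges, n):
    """Return all-pairs path lengths for an undirected tree with nodes 0..n-1."""
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    dist = []
    for src in range(n):
        d = [None] * n
        d[src] = 0
        queue = deque([src])
        while queue:  # breadth-first search from src
            u = queue.popleft()
            for v in adj[u]:
                if d[v] is None:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist.append(d)
    return dist


def neighbors(dist, i):
    """Nodes adjacent to i are exactly the nodes at tree distance 1."""
    return [j for j, d in enumerate(dist[i]) if d == 1]


# Hypothetical toy dependency tree (assumption, for illustration only).
words = ["The", "chef", "made", "dinner"]
edges = [(1, 0), (2, 1), (2, 3)]  # chef-The, made-chef, made-dinner
dist = tree_distances(edges, len(words))

print(dist[0][3])                             # path The -> chef -> made -> dinner
print([words[j] for j in neighbors(dist, 2)])  # words attached to "made"
```

If an embedding preserved these distances, the `neighbors` lookup could be replaced by a nearest-neighbor search over vectors, which is the point the slide makes.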
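  13. In other words? • The figure in the explanatory article¹ is easy to understand • The space on the left is the representation space of words • The gray plane in the left figure is the subspace that represents the tree structure • The right side is the recovered tree structure ¹ https://nlp.stanford.edu//~johnhew//structural-probe.html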
  14. None
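  15. The structural probe • Each word in each sentence, indexed by sentence and by position, and its vector • The distance between nodes on the parse tree • The distance in the subspace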
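The three quantities on slide 15 can be written out as follows. This is a hedged reconstruction that assumes the notation of Hewitt and Manning's structural probe paper: the symbols $B$, $\mathbf{h}^\ell_i$, $w^\ell_i$, $T^\ell$ and $s^\ell$ are recalled from that paper and are not confirmed by this transcript.

```latex
% w^\ell_i   : i-th word of the \ell-th sentence s^\ell, with vector \mathbf{h}^\ell_i
% d_{T^\ell} : path length between two words in the parse tree T^\ell
% d_B        : distance measured after applying a learned linear map B
d_B(\mathbf{h}^\ell_i, \mathbf{h}^\ell_j)^2
  = \bigl(B(\mathbf{h}^\ell_i - \mathbf{h}^\ell_j)\bigr)^{\top}
    \bigl(B(\mathbf{h}^\ell_i - \mathbf{h}^\ell_j)\bigr)

% Training objective: match squared probe distances to tree distances.
\min_{B} \sum_{\ell} \frac{1}{|s^\ell|^2} \sum_{i,j}
  \Bigl| d_{T^\ell}(w^\ell_i, w^\ell_j)
       - d_B(\mathbf{h}^\ell_i, \mathbf{h}^\ell_j)^2 \Bigr|
```

Note that the objective compares the tree distance with the squared probe distance, which is the choice discussed later in the deck.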
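  16. Results (Table 1) • Models that take context into account (bottom four) reproduce the parse tree better than models that do not (top four)² ² Dependency structure is evaluated ignoring relation type and direction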

  17. Results (Figure 2)

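  18. Results (Figure 4) • Left: inter-word distances computed from the parse tree • Right: inter-word distances computed at layer 16 of BERT (large) • The overall structure appears to be reproduced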
  19. future works • Experiments showed that it is important to use the squared distance rather than the distance itself • Why the squared distance works better was not well understood
  20. Everything up to here was Context

  21. Contents 1. Context & related works 2. Geometry of syntax <- 3. Geometry of word senses • Measurement of word sense disambiguation capability • Embedding distance and context: a concatenation experiment 4. Conclusion
  22. Geometry of syntax • What BERT has learned was examined from the following two viewpoints 1. Has it learned useful representations in the first place? 2. Has it learned parse trees?

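  23. Attention probes and dependency representations • Quantitative evaluation of what BERT has learned (preliminary experiment) • The task is to judge the dependency relation between two words, using Penn Treebank data • A weak model (linear classifier + L2 regularization) is trained on top of BERT's output • The resulting accuracy was 85.8%, so the authors judge that it is reasonable to move on to the next step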
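To make the "weak model" on slide 23 concrete, here is a minimal sketch of a linear classifier with L2 regularization, written as plain logistic regression trained by gradient descent. The randomly generated four-dimensional features are a stand-in assumption for the BERT-derived inputs, and `make_toy_data`, `train_logreg_l2` and all hyperparameters are hypothetical; nothing here reproduces the 85.8% figure.

```python
import math
import random


def make_toy_data(n=200, dim=4, seed=0):
    """Toy stand-in for probe inputs: two Gaussian blobs, one per class."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        mean = 1.0 if label else -1.0
        X.append([rng.gauss(mean, 0.3) for _ in range(dim)])
        y.append(label)
    return X, y


def train_logreg_l2(X, y, l2=0.01, lr=0.5, epochs=200):
    """Batch gradient descent on logistic loss plus (l2 / 2) * ||w||^2."""
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted prob minus label
            for j in range(dim):
                grad_w[j] += err * xi[j]
            grad_b += err
        # The l2 * w term is the gradient of the regularizer.
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Class 1 when the linear score is non-negative."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else 0


X, y = make_toy_data()
w, b = train_logreg_l2(X, y)
accuracy = sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(y)
print(accuracy)
```

The probe is deliberately simple: if even a linear model can read the relation off the features, the information is plausibly present in those features rather than learned by the probe itself.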
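  24. Mathematics of embedding trees in Euclidean space • It was proved mathematically that a tree made up of nodes can be embedded in Euclidean space while preserving (something like) distance • It was also shown that, if the distance itself is used, there are cases where no distance-preserving embedding exists • The authors state that this resolves the open question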
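One way to see the claim on slide 24 is the following sketch of an embedding in which squared Euclidean distance equals tree distance. It assumes the construction described in the PAIR blog post (give every edge its own orthogonal unit axis and add that axis when stepping from parent to child); the function names and the five-node example tree are hypothetical.

```python
from collections import deque
from itertools import combinations


def build_adjacency(edges, n):
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    return adj


def tree_distance(adj, src, dst):
    """Number of edges on the path from src to dst (breadth-first search)."""
    seen = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return seen[u]
        for v in adj[u]:
            if v not in seen:
                seen[v] = seen[u] + 1
                queue.append(v)
    raise ValueError("nodes are not connected")


def pythagorean_embedding(edges, n, root=0):
    """Embed an n-node tree in R^(n-1): one fresh unit axis per edge."""
    adj = build_adjacency(edges, n)
    coords = [None] * n
    coords[root] = [0.0] * (n - 1)
    next_axis = 0
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if coords[v] is None:
                vec = list(coords[u])  # child = parent + new basis vector
                vec[next_axis] = 1.0
                next_axis += 1
                coords[v] = vec
                queue.append(v)
    return coords


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


# Hypothetical example tree: node 0 has children 1, 2, 3; node 3 has child 4.
edges = [(0, 1), (0, 2), (0, 3), (3, 4)]
n = 5
adj = build_adjacency(edges, n)
coords = pythagorean_embedding(edges, n)
for x, y in combinations(range(n), 2):
    assert squared_distance(coords[x], coords[y]) == tree_distance(adj, x, y)
```

Two vectors differ exactly on the axes of the edges along the path between their nodes, so the squared distance counts those edges. The unsquared distance does not match: siblings 1 and 2 are at tree distance 2 but Euclidean distance √2.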
  25. In other words? • The blog post explains this in detail, so it is a good place to start if you want the details • https://pair-code.github.io/interpretability/bert-tree/

  26. Visualization of parse tree embeddings • An embedding that preserves parse tree distances and the result from BERT look similar

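  27. Visualization of parse tree embeddings • Distances are compared between the embedded parse tree and what BERT has learned • The ratio is shown by color • The displayed ratio is BERT / true parse tree • Red dotted lines mark pairs that are not connected in the parse tree but ended up close in BERT • Pairs that are better treated as a unit, such as part/of and sale/of, are close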
  28. None
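  29. Visualization of parse tree embeddings • The distribution of the distance ratio between the embedded parse tree and BERT is examined • The figure on the right aggregates the results by dependency relation • The values are spread widely across relations, from 1.2 to 2.5 • The result suggests that BERT adds a quantitative aspect to the relations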
  30. Contents 1. Context & related works 2. Geometry of syntax 3. Geometry of word senses <- • Measurement of word sense disambiguation capability • Embedding distance and context: a concatenation experiment 4. Conclusion
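  31. Geometry of word senses • Examines whether BERT captures word meaning and not only syntax • Experiment: can a subspace that represents meaning be obtained? • Apparently yes! • Experiment: can the context be adjusted artificially? • Not only did it fail, the results got worse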
  32. Measurement of word sense disambiguation capability • BERT's output is visualized with UMAP • Even for the same word "die", clusters with different meanings are formed • A word sense disambiguation task using kNN gave an accuracy of 71.1% (SOTA)
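The kNN step on slide 32 can be sketched as a nearest-neighbor vote over labeled context embeddings. The two-dimensional vectors and the sense labels below are hypothetical toy values (real BERT embeddings have hundreds of dimensions), and using cosine similarity with `k=3` is an assumption made for illustration, not a detail stated on the slide.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def knn_predict(query, examples, k=3):
    """Majority sense label among the k most similar labeled embeddings."""
    ranked = sorted(examples, key=lambda ex: cosine(query, ex[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labeled embeddings of the word "die" in different contexts.
examples = [
    ([1.0, 0.1], "death"),
    ([0.9, 0.2], "death"),
    ([0.8, 0.0], "death"),
    ([0.1, 1.0], "dice"),
    ([0.2, 0.9], "dice"),
    ([0.0, 0.8], "dice"),
]

print(knn_predict([0.95, 0.15], examples))  # query close to the first cluster
print(knn_predict([0.05, 0.70], examples))  # query close to the second cluster
```

Such a classifier only works if embeddings of the same sense already cluster together, which is why its accuracy is read as a measure of how well the embedding space separates senses.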
  33. None
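  34. Separating the semantic information • A subspace representing meaning is extracted in the same way as the "structural probe" • Instead of the difference from parse tree distances, cosine similarity between word senses is used (details unknown) • Accuracy before dimensionality reduction is 71.1% • It rises slightly after dimensionality reduction • There seems to be such a thing as a semantic subspace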
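  35. Embedding distance and context: a concatenation experiment • Experiment: can better results be obtained by deliberately manipulating the context? • A representative sentence using a word in a particular sense is found and concatenated to a sentence that uses the same word in the same sense • If "I went to Edo" is the representative sentence, it is appended to "He went to Edo" to make "He went to Edo and I went to Edo"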
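The sentence construction on slide 35 amounts to simple string concatenation, sketched below with the slide's own example. The helper names `concatenate` and `target_positions` are hypothetical, and whitespace tokenization is a simplifying assumption (BERT itself uses subword tokenization).

```python
def concatenate(target_sentence, representative_sentence, joiner=" and "):
    """Append the representative sentence to the target sentence."""
    return f"{target_sentence}{joiner}{representative_sentence}"


def target_positions(sentence, word):
    """Whitespace-token indices at which the word of interest occurs."""
    return [i for i, token in enumerate(sentence.split()) if token == word]


combined = concatenate("He went to Edo", "I went to Edo")
print(combined)
print(target_positions(combined, "went"))
```

The experiment then compares the embedding of the word in the original sentence with the embedding of the same occurrence inside the concatenated sentence.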
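  36. Embedding distance and context: a concatenation experiment • Horizontal axis: BERT layer • Vertical axis: something like the ratio of distances to the centroid of the cluster with a different sense (larger is better) • One might expect that adding a representative sentence would separate the word's sense better, but that was not the case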
  37. Contents 1. Context & related works 2. Geometry of syntax 3. Geometry of word senses • Measurement of word sense disambiguation capability • Embedding distance and context: a concatenation experiment 4. Conclusion <-
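  38. Conclusion • A mathematical justification was given for the "structural probe" • Comparing parse tree embeddings with what BERT has learned gave results suggesting that BERT learns syntax • There appears to be a space that learns meaning, separate from the space that learns syntax • Whether there are other subspaces that matter linguistically is left for future research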
  39. Finally • A recap of today's event • TensorFlow User Group Tokyo • NN論文を肴に酒を飲む会

  40. None
  41. None
  42. TensorFlow User Group Tokyo • NN論文を肴に酒を飲む会 #9