Visualizing and Measuring the Geometry of BERT

Visualizing and Measuring the Geometry of BERT
NN論文を肴に酒を飲む会 #9 (a meetup for discussing neural network papers over drinks)

Self-introduction
• Asei Sugiyama (杉山 阿聖)
• Software Engineer @Repro
• Machine learning, statistics, development, and so on
• TensorFlow Docs translation & review
• Co-author of the book 機械学習図鑑

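Abstract
• A paper introduced by Google PAIR
• In natural language processing, networks with Transformer-like architectures are extremely promising
• The goal is to clarify how such networks internally hold the features used for natural language processing
• Quantitative and qualitative analyses of BERT were carried out
• The results suggest that BERT learns both semantic and syntactic information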

Contents
1. Context & related works <-
2. Geometry of syntax
3. Geometry of word senses
• Measurement of word sense disambiguation capability
• Embedding distance and context: a concatenation experiment
4. Conclusion

Context & related works
• This paper is a response to "A Structural Probe for Finding Syntax in Word Representations" (2019)
• It is structured so that almost nothing makes sense without that paper

A bargain of a paper: you get to read twice as much

A Structural Probe for Finding Syntax in Word Representations
NN論文を肴に酒を飲む会 #9

Self-introduction
• Asei Sugiyama (杉山 阿聖)
• Software Engineer @Repro
• Machine learning, statistics, development, and so on
• TensorFlow Docs translation & review
• Co-author of the book 機械学習図鑑

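Abstract
• A paper from Stanford University
• Word representations have been analyzed extensively, but whether a representation of the parse tree is learned had not been verified
• This work proposes a method called the structural probe
• It evaluates whether the parse tree is embedded in a linearly transformed space of a neural network's word representations
• For ELMo and BERT, the results suggest that the parse tree is learned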

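Research goal
• Answer the question of whether deep models learn parse trees

What the paper explains
• How to find parse trees in word representations
• How to recover and evaluate parse tree information from a low-dimensional projection of the word representations, with concrete examples (ELMo, BERT)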

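Idea of the method
• Consider embedding a graph into a vector space while preserving the distances between its nodes
• If that works, finding the neighbors of a node is the same as nearest-neighbor search
• Also, if a model learns the tree structure correctly, it probably uses only part of its representation space for it
• So it is enough to look for a subspace of the representation space that preserves the tree distances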

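In other words?
• The figure in the explanatory article [1] is easy to follow
• The space on the left is the word representation space
• The gray plane in the left figure is the subspace that represents the tree structure
• The right side is the recovered tree structure

[1] https://nlp.stanford.edu//~johnhew//structural-probe.html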

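The structural probe
• $w_i^\ell$, $h_i^\ell$: the $i$-th word of the $\ell$-th sentence and its vector
• $d_{T^\ell}(w_i^\ell, w_j^\ell)$: distance between nodes on the parse tree
• $d_B(h_i^\ell, h_j^\ell)$: distance in the subspace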
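As a rough illustration of the quantities above, the following sketch computes the squared probe distance $\|B(h_i - h_j)\|^2$ and the mean absolute gap to the tree distance for one sentence, which is the form of objective used in the structural probe paper. The function names, the toy vectors, and the matrix `B` are all invented for this example; a real probe would learn `B` by gradient descent over many sentences.

```python
from typing import List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def matvec(B: Matrix, v: Vector) -> List[float]:
    """Multiply matrix B (k x m) by vector v (length m)."""
    return [sum(b * x for b, x in zip(row, v)) for row in B]


def probe_sq_distance(B: Matrix, h_i: Vector, h_j: Vector) -> float:
    """Squared distance in the probed subspace: ||B (h_i - h_j)||^2."""
    diff = [a - b for a, b in zip(h_i, h_j)]
    return sum(x * x for x in matvec(B, diff))


def probe_loss(B: Matrix, H: Sequence[Vector], tree_dist: Matrix) -> float:
    """Mean absolute gap between tree distance and squared probe distance
    over all word pairs of one sentence (normalised by |s|^2)."""
    n = len(H)
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += abs(tree_dist[i][j] - probe_sq_distance(B, H[i], H[j]))
    return total / (n * n)


# Toy sentence of 3 words whose parse tree is the chain w0 - w1 - w2.
# The vectors are invented: the first two coordinates carry the tree,
# the third coordinate is unrelated noise.
H = [[0.0, 0.0, 5.0], [1.0, 0.0, -2.0], [1.0, 1.0, 7.0]]
tree_dist = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]

B = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # keeps only the first two axes
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

print(probe_loss(B, H, tree_dist))         # loss inside the subspace
print(probe_loss(identity, H, tree_dist))  # loss in the full space
```

In this toy case the tree distances are reproduced only after projecting into the subspace, which is the situation the probe is designed to detect.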

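Results (Table 1)
• Compared with the models that ignore context (top four), the models that take context into account (bottom four) reproduce the parse tree better [2]

[2] Dependency structures are evaluated ignoring the relation type and direction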
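One way to score a comparison that ignores type and direction is UUAS (undirected unlabeled attachment score): take the minimum spanning tree of the predicted distances and count how many of its edges appear in the gold tree. The sketch below assumes that procedure; the function names and the distance matrix are invented for illustration and are not taken from the paper's code.

```python
from typing import FrozenSet, List, Sequence, Set, Tuple


def minimum_spanning_tree(dist: Sequence[Sequence[float]]) -> Set[FrozenSet[int]]:
    """Prim's algorithm on a dense symmetric distance matrix.
    Returns the tree as a set of undirected edges."""
    n = len(dist)
    in_tree = {0}
    edges: Set[FrozenSet[int]] = set()
    while len(in_tree) < n:
        best = None
        for u in in_tree:
            for v in range(n):
                if v in in_tree:
                    continue
                if best is None or dist[u][v] < dist[best[0]][best[1]]:
                    best = (u, v)
        edges.add(frozenset(best))
        in_tree.add(best[1])
    return edges


def uuas(pred_dist: Sequence[Sequence[float]], gold_edges: List[Tuple[int, int]]) -> float:
    """Fraction of gold edges recovered, ignoring direction and label."""
    predicted = minimum_spanning_tree(pred_dist)
    gold = {frozenset(e) for e in gold_edges}
    return len(predicted & gold) / len(gold)


# Gold tree: word 1 is attached to words 0, 2 and 3.
gold_edges = [(0, 1), (1, 2), (1, 3)]

# Invented "predicted" distances: word 3 ends up slightly closer to 2 than to 1.
pred_dist = [
    [0.0, 1.1, 2.0, 1.9],
    [1.1, 0.0, 0.9, 1.2],
    [2.0, 0.9, 0.0, 1.0],
    [1.9, 1.2, 1.0, 0.0],
]

print(uuas(pred_dist, gold_edges))
```

Here two of the three gold edges are recovered; the edge (1, 3) is replaced by (2, 3) because of the noisy distance.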

Results (Figure 2)

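Results (Figure 4)
• Left: distances between words computed on the parse tree
• Right: distances between words computed from layer 16 of BERT (large)
• The overall structure appears to be reproduced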

Future work
• Experiments showed that it is important to use the squared distance rather than the distance itself
• Why the squared distance works better was not well understood

Everything up to here was the Context

Contents
1. Context & related works
2. Geometry of syntax <-
3. Geometry of word senses
• Measurement of word sense disambiguation capability
• Embedding distance and context: a concatenation experiment
4. Conclusion

Geometry of syntax
• What BERT has learned is examined from the following two viewpoints
1. Does it learn useful representations in the first place?
2. Does it learn parse trees?

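Attention probes and dependency representations
• A quantitative evaluation of what BERT has learned (a preliminary experiment)
• Task: using Penn Treebank data, decide the dependency relation between two words
• A weak model (a linear classifier with L2 regularization) is trained on top of BERT's outputs
• The accuracy was 85.8%, so the authors judge that it is reasonable to move on to the next step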
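To make "a weak model" concrete, here is a minimal sketch of a linear classifier with L2 regularization, trained by plain gradient descent. The two-dimensional features and labels are invented stand-ins for whatever per-pair features are extracted from BERT, and the hyperparameters (`l2`, `lr`, `epochs`) are arbitrary choices for this toy example, not values from the paper.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Linear score w . x + b."""
    return b + sum(wk * xk for wk, xk in zip(w, x))


def train_linear_probe(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    l2: float = 0.01,
    lr: float = 0.5,
    epochs: int = 500,
) -> Tuple[List[float], float]:
    """Logistic regression minimising mean log-loss + (l2 / 2) * ||w||^2."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [l2 * wk for wk in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = (sigmoid(score(w, b, xi)) - yi) / n
            grad_w = [g + err * xk for g, xk in zip(grad_w, xi)]
            grad_b += err
        w = [wk - lr * g for wk, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """1 if the pair is predicted to be in a dependency relation, else 0."""
    return int(score(w, b, x) > 0)


# Invented features for six word pairs; label 1 = "dependency relation exists".
X = [[0.9, 0.1], [0.8, 0.3], [0.7, 0.2], [0.2, 0.8], [0.1, 0.9], [0.3, 0.7]]
y = [1, 1, 1, 0, 0, 0]

w, b = train_linear_probe(X, y)
accuracy = sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(y)
print(accuracy)
```

The point of keeping the probe this simple is that a high score then says more about the information already present in the features than about the capacity of the classifier.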

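Mathematics of embedding trees in Euclidean space
• It was proven mathematically that a tree with $n$ nodes can be embedded in $\mathbb{R}^{n-1}$ while preserving (something like) distance
• It was also shown that if the distance itself is used, there are cases where no distance-preserving embedding exists
• The authors regard this as settling the question left open by the structural probe paper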
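The "something like distance" here is the squared Euclidean distance: the embedding $f$ satisfies $\|f(x) - f(y)\|^2 = d(x, y)$. A minimal sketch of one such construction follows, assuming the simple scheme of giving every edge its own orthogonal unit axis and summing the axes along the path from a root. The function names and the example tree are invented for illustration.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Edge = Tuple[int, int]


def pythagorean_embedding(n: int, edges: List[Edge], root: int = 0) -> List[List[float]]:
    """Embed an n-node tree in R^(n-1): edge number k owns axis k, and a node's
    vector is the sum of the axes of the edges on its path from the root."""
    adj: Dict[int, List[Tuple[int, int]]] = {v: [] for v in range(n)}
    for axis, (u, v) in enumerate(edges):
        adj[u].append((v, axis))
        adj[v].append((u, axis))
    emb: List[Optional[List[float]]] = [None] * n
    emb[root] = [0.0] * (n - 1)
    stack = [root]
    while stack:
        u = stack.pop()
        for v, axis in adj[u]:
            if emb[v] is None:
                emb[v] = list(emb[u])
                emb[v][axis] = 1.0
                stack.append(v)
    return emb


def tree_distances(n: int, edges: List[Edge], source: int) -> List[int]:
    """Number of edges from source to every node (breadth-first search)."""
    adj: Dict[int, List[int]] = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = [-1] * n
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] < 0:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def squared_euclidean(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Example tree with 5 nodes and 4 edges, so the embedding lives in R^4.
n = 5
edges = [(0, 1), (1, 2), (1, 3), (3, 4)]
emb = pythagorean_embedding(n, edges)

for i in range(n):
    d = tree_distances(n, edges, i)
    for j in range(n):
        # Squared Euclidean distance equals the tree distance for every pair.
        assert squared_euclidean(emb[i], emb[j]) == d[j]
```

Two nodes differ exactly on the axes of the edges that separate them, so the squared distance counts those edges.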

In other words?
• The blog post explains this in detail, so it is a good entry point if you want to dig deeper
• https://pair-code.github.io/interpretability/bert-tree/

Visualization of parse tree embeddings
• An embedding that preserves parse tree distances and the result from BERT look similar

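Visualization of parse tree embeddings
• Distances are compared between the embedded parse tree and what BERT has learned
• The ratio of the two is shown by color
• The displayed value is BERT / true parse tree
• Red dotted lines mark pairs that are not connected in the parse tree but ended up close in BERT
• Pairs that are naturally treated as a unit, such as part/of and sale/of, are close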

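Visualization of parse tree embeddings
• The distribution of the distance ratio between the embedded parse tree and what BERT has learned is examined
• The figure on the right aggregates the results by dependency relation
• Depending on the relation, the values are spread widely, from 1.2 to 2.5
• This suggests that BERT adds a quantitative aspect to the dependency relations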

Contents
1. Context & related works
2. Geometry of syntax
3. Geometry of word senses <-
• Measurement of word sense disambiguation capability
• Embedding distance and context: a concatenation experiment
4. Conclusion

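Geometry of word senses
• Examines whether BERT captures word meaning and not just syntax
• Experiment: can a subspace that represents meaning be obtained?
• Apparently it can!
• Experiment: can the context be adjusted artificially?
• Not only did this fail, it made things worse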

Measurement of word sense disambiguation capability
• BERT's outputs are visualized with UMAP
• Even for the same word "die", clusters form for its multiple meanings
• Running a word sense disambiguation task with kNN gave an accuracy of 71.1% (SOTA)
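The kNN step can be sketched as follows: each training occurrence of a word has a context embedding and a sense label, and a new occurrence takes the majority sense of its nearest neighbors. The two-dimensional vectors and the sense labels below are invented; real inputs would be BERT context embeddings, and the exact classifier setup used in the paper may differ.

```python
import math
from collections import Counter
from typing import List, Sequence


def knn_predict(
    query: Sequence[float],
    train_vecs: Sequence[Sequence[float]],
    train_senses: Sequence[str],
    k: int = 3,
) -> str:
    """Return the majority sense among the k nearest training embeddings."""
    ranked = sorted(
        zip(train_vecs, train_senses),
        key=lambda pair: math.dist(query, pair[0]),
    )
    votes = Counter(sense for _, sense in ranked[:k])
    return votes.most_common(1)[0][0]


# Invented embeddings for occurrences of "die" in two senses.
train_vecs: List[List[float]] = [
    [0.0, 0.1], [0.2, 0.0], [0.1, 0.2],  # sense: to stop living
    [3.0, 3.1], [3.2, 2.9], [2.9, 3.0],  # sense: a playing die
]
train_senses = ["death"] * 3 + ["dice"] * 3

print(knn_predict([0.1, 0.1], train_vecs, train_senses))
print(knn_predict([3.0, 3.0], train_vecs, train_senses))
```

A classifier this simple working well is itself evidence that senses form separable clusters in the embedding space, which matches the UMAP picture.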

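Isolating semantic information
• A subspace that represents meaning is extracted in the same way as with the "structural probe"
• Instead of the difference from parse tree distances, the cosine similarity between word senses is used (details unclear)
• Accuracy before dimensionality reduction is 71.1%
• It rises slightly after dimensionality reduction
• A subspace for meaning seems to exist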

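Embedding distance and context: a concatenation experiment
• Experiment: can better results be obtained by deliberately manipulating the context?
• A representative sentence that uses a word in a particular sense is found and concatenated to a sentence that uses the same word in the same sense
• If "I went to Edo" is the representative sentence, it is appended to "He went to Edo" to make the sentence "He went to Edo and I went to Edo"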

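Embedding distance and context: a concatenation experiment
• Horizontal axis: BERT layer
• Vertical axis: roughly the ratio of distances to the centers of clusters with different senses (larger is better)
• Adding the representative sentence was expected to separate the word's sense better, but it did not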

Contents
1. Context & related works
2. Geometry of syntax
3. Geometry of word senses
• Measurement of word sense disambiguation capability
• Embedding distance and context: a concatenation experiment
4. Conclusion <-

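Conclusion
• The paper gives the "structural probe" a mathematical grounding
• Comparing parse tree embeddings with what BERT has learned gave results suggesting that BERT learns syntax
• Separately from the space that learns syntax, there appears to be a space that learns meaning
• Whether there are other subspaces that matter for natural language is left for future research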

Finally
• Recap of today's event
• TensorFlow User Group Tokyo
• NN論文を肴に酒を飲む会

TensorFlow User Group Tokyo
NN論文を肴に酒を飲む会 #9

More Decks by Asei Sugiyama
