Visualizing and Measuring the Geometry of BERT
Asei Sugiyama
September 04, 2019
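Slides for a talk given at NN論文を肴に酒を飲む会 #9 (a TensorFlow User Group Tokyo meetup for reading neural network papers over drinks):
https://tfug-tokyo.connpass.com/event/143283/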
Transcript
Visualizing and Measuring the Geometry of BERT
NN論文を肴に酒を飲む会 #9
About me
• Asei Sugiyama
• Software Engineer @Repro
• Machine learning, statistics, development, and so on
• TensorFlow Docs translation & review
• Co-author of 機械学習図鑑
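Abstract
• A paper that was featured by Google PAIR
• In natural language processing, networks with Transformer-like architectures are extremely effective
• The goal is to clarify how such networks internally hold the features relevant to natural language processing
• The authors ran quantitative and qualitative analyses of BERT
• The results suggest that BERT learns both semantic and syntactic information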
Contents
1. Context & related works <-
2. Geometry of syntax
3. Geometry of word senses
• Measurement of word sense disambiguation capability
• Embedding distance and context: a concatenation experiment
4. Conclusion
Context & related works
• The paper is an answer to "A Structural Probe for Finding Syntax in Word Representations" (2019)
• It is structured so that almost nothing makes sense without that earlier paper
A bargain: one paper that gives you two papers' worth of reading
A Structural Probe for Finding Syntax in Word Representations
NN論文を肴に酒を飲む会 #9
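Abstract
• A paper from Stanford University
• Analysis of word representations has made progress, but whether a representation of syntax is learned had not been verified
• This work proposes a method called the structural probe
• It evaluates whether parse trees are embedded in a linearly transformed space of a neural network's word representations
• The results suggest that ELMo and BERT learn parse trees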
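Goal of the research
• Answer the question of whether deep models learn parse trees

What this paper explains
• How to find parse trees from word representations
• How to recover information about the parse tree from a low-dimensional projection of the word representations, how to evaluate it, and concrete examples (ELMo, BERT)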
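The idea behind the method
• Consider embedding a graph into a vector space while preserving the distances between its nodes
• If that can be done, finding the neighbors of a given node is the same as nearest-neighbor search
• Also, if the model learns the structure correctly, it may use only part of its representation space for it
• So it is enough to look for a subspace of the representation space in which the structural distances are preserved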
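The "distance between nodes" in the bullets above is the path length between two words in the parse tree. As a minimal sketch of how those gold distances could be computed, the snippet below runs a breadth-first search from every word. The head-index encoding (`heads[i]` is the head of word `i`, `-1` marks the root) and the example sentence are assumptions for illustration, not taken from the slides.

```python
from collections import deque


def tree_distances(heads):
    """Return the matrix of path lengths between all pairs of words.

    heads[i] is the index of the head of word i, or -1 for the root
    (a hypothetical encoding chosen for this sketch).
    """
    n = len(heads)
    # Build an undirected adjacency list from the head pointers.
    adjacency = [[] for _ in range(n)]
    for child, head in enumerate(heads):
        if head >= 0:
            adjacency[child].append(head)
            adjacency[head].append(child)

    dist = [[0] * n for _ in range(n)]
    for src in range(n):
        seen = {src}
        queue = deque([(src, 0)])
        while queue:
            node, d = queue.popleft()
            dist[src][node] = d
            for nxt in adjacency[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, d + 1))
    return dist


# Toy sentence "The chef cooked dinner": The -> chef -> cooked <- dinner.
heads = [1, 2, -1, 2]
dist = tree_distances(heads)
print(dist[0][3])  # path The - chef - cooked - dinner
```

A probe is then judged by how well distances in the learned subspace match this matrix.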
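In other words?
• The figure in the explanatory article [1] is easy to follow
• The space on the left is the word representation space
• The gray plane in the left figure is the subspace that represents the structure
• The right side is the recovered structure

[1] https://nlp.stanford.edu//~johnhew//structural-probe.html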
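The structural probe
• $w_i^\ell$, $h_i^\ell$: the $i$-th word of the $\ell$-th sentence and its vector
• $d_{T^\ell}(w_i^\ell, w_j^\ell)$: distance between nodes on the parse tree
• $d_B(h_i^\ell, h_j^\ell)$: distance in the subspace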
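To make the three quantities concrete, here is a small sketch assuming the formulation in the structural probe paper: a matrix `B` projects word vectors into a subspace, the squared distance there is $(B(h_i - h_j))^\top (B(h_i - h_j))$, and `B` is scored by the mean absolute gap between tree distance and that squared distance. The vectors and matrices below are made-up toy values, and no training loop is shown.

```python
def matvec(B, v):
    """Multiply matrix B (list of rows) by vector v."""
    return [sum(b * x for b, x in zip(row, v)) for row in B]


def probe_sq_distance(B, h_i, h_j):
    """Squared L2 distance between h_i and h_j after projecting with B."""
    diff = [a - b for a, b in zip(h_i, h_j)]
    projected = matvec(B, diff)
    return sum(p * p for p in projected)


def probe_loss(B, vectors, tree_dist):
    """Mean absolute gap between tree distance and squared probe distance."""
    n = len(vectors)
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += abs(tree_dist[i][j]
                         - probe_sq_distance(B, vectors[i], vectors[j]))
    return total / (n * n)


# Toy sentence of three words whose parse tree is the path 0 - 1 - 2.
tree_dist = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
# Hypothetical word vectors: the first two coordinates encode the tree,
# the third is unrelated information.
vectors = [[0.0, 0.0, 5.0], [1.0, 0.0, -3.0], [1.0, 1.0, 7.0]]

B_good = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # keeps only the syntax subspace
B_identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

loss_good = probe_loss(B_good, vectors, tree_dist)
loss_identity = probe_loss(B_identity, vectors, tree_dist)
print(loss_good, loss_identity)
```

In this toy case the projection that discards the third coordinate reproduces the tree distances exactly, while the full space does not, which is the intuition behind searching for a subspace.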
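Results (Table 1)
• Compared with the models that ignore context (top four), the models that take context into account (bottom four) reproduce the parse trees better [2]

[2] The dependency structure is evaluated ignoring both relation type and direction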
Results (Figure 2)
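Results (Figure 4)
• Left: distances between words computed from the parse tree
• Right: distances between words computed from layer 16 of BERT (large)
• The overall structure appears to be reproduced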
Future works
• Experiments showed that it is important to use the squared distance rather than the distance itself
• Why the square works better was not well understood
Everything up to here was the Context
Contents
1. Context & related works
2. Geometry of syntax <-
3. Geometry of word senses
• Measurement of word sense disambiguation capability
• Embedding distance and context: a concatenation experiment
4. Conclusion
Geometry of syntax
• What BERT has learned is analyzed from the following two viewpoints
1. Does it learn a useful representation in the first place?
2. Does it learn parse trees?
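Attention probes and dependency representations
• A quantitative evaluation of what BERT has learned (preliminary experiment)
• Using Penn Treebank data, the task is to decide the dependency relation between two words
• A weak model (linear classifier + L2 regularization) is trained on top of BERT's output
• The resulting accuracy was 85.8%, so the authors judged it reasonable to move on to the next step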
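The "weak model" in this slide is a linear classifier with L2 regularization. As a rough sketch of what such a probe looks like, the code below trains a logistic-regression classifier with an L2 penalty by plain gradient descent. The two-dimensional feature vectors are hypothetical stand-ins for features extracted from BERT, and the hyperparameters are arbitrary; none of this reproduces the paper's setup or its 85.8% figure.

```python
import math


def train_linear_probe(X, y, l2=0.01, lr=0.5, epochs=200):
    """Fit logistic regression with an L2 penalty using gradient descent."""
    dim = len(X[0])
    n = len(X)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        # Gradient of the penalty (l2 / 2) * ||w||^2.
        grad_w = [l2 * wk for wk in w]
        grad_b = 0.0
        for x, target in zip(X, y):
            z = sum(wk * xk for wk, xk in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            for k in range(dim):
                grad_w[k] += (p - target) * x[k] / n
            grad_b += (p - target) / n
        w = [wk - lr * g for wk, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return 1 if the probe predicts a dependency relation, else 0."""
    return 1 if sum(wk * xk for wk, xk in zip(w, x)) + b > 0 else 0


# Hypothetical features for word pairs; label 1 = "a dependency exists".
X = [[1.0, 0.0], [0.9, 0.2], [0.8, 0.1],
     [0.0, 1.0], [0.2, 0.9], [0.1, 0.8]]
y = [1, 1, 1, 0, 0, 0]

w, b = train_linear_probe(X, y)
accuracy = sum(predict(w, b, x) == t for x, t in zip(X, y)) / len(X)
print(accuracy)
```

Keeping the probe this simple is the point of the design: if a linear model can already read the relation off the features, the information must be present in the features rather than learned by the probe.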
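Mathematics of embedding trees in Euclidean space
• It can be proved mathematically that a tree with $n$ nodes can be embedded into $\mathbb{R}^{n-1}$ while preserving a (kind of) distance
• It was also shown that, if the distance itself is used, there are cases where no distance-preserving embedding exists
• The authors say this settles the homework left open by the earlier paper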
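One way to see why the squared distance works is the following construction, sketched here under the assumption that it matches the argument in the paper: give every edge of the tree its own coordinate axis and place each node at the sum of the unit vectors along its path from the root. Two nodes then differ in exactly the coordinates of the edges on the path between them, so the squared Euclidean distance equals the tree distance. The parent-index encoding and the example tree are illustrative choices.

```python
def pythagorean_embedding(parents):
    """Embed an n-node tree into R^(n-1), one axis per edge.

    parents[i] is the parent of node i, or -1 for the root
    (a hypothetical encoding chosen for this sketch).
    """
    n = len(parents)
    # Each non-root node owns the axis of the edge to its parent.
    axis = {}
    for node, parent in enumerate(parents):
        if parent >= 0:
            axis[node] = len(axis)

    coords = {}

    def place(node):
        if node not in coords:
            if parents[node] < 0:
                coords[node] = [0.0] * (n - 1)
            else:
                vec = list(place(parents[node]))
                vec[axis[node]] += 1.0
                coords[node] = vec
        return coords[node]

    return [place(i) for i in range(n)]


def sq_dist(u, v):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


# Tree with edges 0-1, 1-2, 3-2 and node 2 as root.
parents = [1, 2, -1, 2]
emb = pythagorean_embedding(parents)
print(sq_dist(emb[0], emb[3]))  # equals the tree distance between 0 and 3
```

The plain Euclidean distance between nodes 0 and 3 here is the square root of the tree distance, which is why an embedding of this kind preserves squared distances rather than distances.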
In other words?
• A blog post explains this in detail, so it is a good entry point if you want the specifics
• https://pair-code.github.io/interpretability/bert-tree/
Visualization of parse tree embeddings
• An embedding that preserves parse-tree distances and the result from BERT look similar
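Visualization of parse tree embeddings
• Distances are compared between the embedded parse tree and what BERT has learned
• The ratio of the two is shown as a color
• The ratio shown is BERT / true parse tree
• Some lines (dotted in the paper's figure) mark word pairs that are not connected in the parse tree but ended up close in BERT's learned representation
• Pairs that are naturally treated as a unit, such as part/of and sale/of, are close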
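Visualization of parse tree embeddings
• The average ratio of distances between the embedded parse tree and BERT's learned representation is examined
• The figure on the right aggregates the result per dependency relation
• The value varies widely by relation, from 1.2 to 2.5
• This suggests that BERT adds a quantitative aspect to the relations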
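The per-relation aggregation described above can be sketched as a simple group-by and mean. The tuple layout (relation, tree distance, squared embedding distance) and every number below are invented for illustration; the relation names `rel_a` and `rel_b` are placeholders, not labels from the paper.

```python
from collections import defaultdict


def mean_ratio_by_relation(edges):
    """Average (embedding distance / tree distance) per dependency relation."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for relation, tree_d, emb_sq_d in edges:
        sums[relation] += emb_sq_d / tree_d
        counts[relation] += 1
    return {relation: sums[relation] / counts[relation] for relation in sums}


# Made-up measurements for directly connected word pairs (tree distance 1).
edges = [
    ("rel_a", 1, 1.0),
    ("rel_a", 1, 1.5),
    ("rel_b", 1, 2.0),
    ("rel_b", 1, 3.0),
]
ratios = mean_ratio_by_relation(edges)
print(ratios)
```

If every relation had the same mean ratio, the embedding would encode tree distance only; a spread across relations is what the slide reads as extra quantitative information.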
Contents
1. Context & related works
2. Geometry of syntax
3. Geometry of word senses <-
• Measurement of word sense disambiguation capability
• Embedding distance and context: a concatenation experiment
4. Conclusion
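Geometry of word senses
• Examine whether BERT captures word meaning as well as syntax
• Experiment: can a subspace that represents meaning be found?
• Apparently yes
• Experiment: can the context be adjusted artificially?
• Not only did this fail, it made things worse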
Measurement of word sense disambiguation capability
• BERT's output is visualized with UMAP
• For the same word "die", clusters corresponding to several different senses appear
• A word sense disambiguation task solved with kNN gives accuracy 71.1% (SOTA)
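A kNN sense classifier of the kind mentioned here can be sketched in a few lines: embed the query word in context, find the labelled embeddings closest to it, and take a majority vote. Cosine similarity as the closeness measure, the two-dimensional vectors, and the sense labels are all assumptions made for this example.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def knn_sense(query, labelled, k=3):
    """Return the majority sense among the k most similar labelled vectors."""
    ranked = sorted(labelled,
                    key=lambda item: cosine(query, item[0]),
                    reverse=True)
    votes = Counter(sense for _, sense in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical context embeddings of the word "die" with sense labels.
labelled = [
    ([1.0, 0.1], "stop living"),
    ([0.9, 0.2], "stop living"),
    ([0.1, 1.0], "game piece"),
    ([0.2, 0.9], "game piece"),
]
predicted = knn_sense([0.8, 0.3], labelled)
print(predicted)
```

Because the classifier has no trainable parameters, good accuracy with it indicates that the senses are already well separated in the embedding space.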
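Word-sense information
• A subspace that represents meaning is extracted in the same way as with the "structural probe"
• Instead of the gap from parse-tree distance, it uses cosine similarity between word senses (details unclear)
• Accuracy before dimensionality reduction is 71.1%
• Accuracy rises slightly after dimensionality reduction
• It seems that a subspace for meaning exists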
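Embedding distance and context: a concatenation experiment
• Experiment: can deliberately manipulating the context give better results?
• A representative sentence that uses a word in a particular sense is found and concatenated to a sentence that uses the same word in the same sense
• If "I went to Edo" is the representative sentence, it is appended to "He went to Edo" to make "He went to Edo and I went to Edo"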
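The concatenation step itself is a plain string operation. The helper below is a hypothetical illustration that joins the two sentences with "and", as in the slide's example; how the representative sentence is selected is not shown.

```python
def concatenate_context(target_sentence, representative_sentence):
    """Append a representative sentence to the target, joined by 'and'."""
    return f"{target_sentence} and {representative_sentence}"


combined = concatenate_context("He went to Edo", "I went to Edo")
print(combined)
```

The embedding of the keyword in the target half is then compared before and after the concatenation.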
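Embedding distance and context: a concatenation experiment
• Horizontal axis: BERT layer
• Vertical axis: roughly a ratio of distances to the center of the cluster with a different sense (larger is better)
• One might expect that adding a representative sentence would help distinguish the word's sense better, but it did not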
Contents
1. Context & related works
2. Geometry of syntax
3. Geometry of word senses
• Measurement of word sense disambiguation capability
• Embedding distance and context: a concatenation experiment
4. Conclusion <-
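Conclusion
• The paper gives the "structural probe" a mathematical justification
• Comparing the parse-tree embedding with what BERT has learned suggests that BERT learns parse trees
• Separately from the space in which parse trees are learned, there appears to be a space in which meaning is learned
• Whether other subspaces that matter linguistically exist is left for future research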
Finally
• Recap of today's event
• TensorFlow User Group Tokyo
• NN論文を肴に酒を飲む会
TensorFlow User Group Tokyo
NN論文を肴に酒を飲む会 #9