Visualizing and Measuring the Geometry of BERT
Asei Sugiyama
September 04, 2019
Technology
Slides for my talk at NN論文を肴に酒を飲む会 #9 (https://tfug-tokyo.connpass.com/event/143283/).
Transcript
Visualizing and Measuring the Geometry of BERT NN論文を肴に酒を飲む会 #9
About me • Asei Sugiyama • Software Engineer @Repro • Machine learning, statistics, software development, and so on • Translation and review of TensorFlow Docs • Co-author of 機械学習図鑑
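Abstract • A paper featured by Google PAIR • Networks with Transformer-like architectures are extremely useful for natural language processing • The paper aims to clarify how such networks internally represent linguistic features • It presents quantitative and qualitative analyses of BERT • The results suggest that BERT learns both semantic and syntactic information
Contents 1.Context & related works <- 2.Geometry of syntax 3.Geometry of word senses • Measurement of word sense disambiguation capability • Embedding distance and context: a concatenation experiment 4.Conclusion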
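Context & related works • This paper is a response to A Structural Probe for Finding Syntax in Word Representations (2019) • It is structured so that almost nothing makes sense without that paper
A bargain: one paper that gets you to read two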
A Structural Probe for Finding Syntax in Word Representations NN論文を肴に酒を飲む会 #9
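Abstract • A paper from Stanford University • Analysis of word representations has been progressing, but whether they also encode syntactic structure had not been verified • This work proposes a method called the structural probe • It evaluates whether a parse tree is embedded in a linearly transformed version of a neural network's word representation space • The results suggest that ELMo and BERT learn parse trees
Goal of the research • To answer the question of whether deep models learn parse trees What the paper explains • How to find a parse tree in word representations • How to recover information about the parse tree from a low-dimensional projection of the word representations and evaluate it, with concrete examples (ELMo, BERT)
Idea behind the method • Consider embedding a graph into a vector space while preserving the distances between its nodes • If that works, finding the neighbours of a node is the same as a nearest-neighbour search • If a model learns the structure correctly, it probably uses only part of its representation space for it • So it is enough to look for a subspace of the representation space in which the distances of the structure are preserved
In other words? • The figure in the explainer article¹ makes this easy to see • The space on the left is the word representation space • The grey plane in the left figure is the subspace that represents the structure • The right side is the recovered structure
¹ https://nlp.stanford.edu//~johnhew//structural-probe.html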
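The structural probe • $w_i^\ell$, $h_i^\ell$: the $i$-th word of the $\ell$-th sentence and its vector • $d_{T^\ell}(w_i^\ell, w_j^\ell)$: distance between nodes in the parse tree • $d_B(h_i^\ell, h_j^\ell)$: distance in the subspace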
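The distance in the subspace can be illustrated with a short sketch. It assumes the formulation of the structural probe paper, in which a matrix B projects word vectors and the squared length of the projected difference is compared with the parse tree distance. The function names `probe_squared_distance` and `probe_loss` and all numbers are made up for this illustration.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def probe_squared_distance(B: Matrix, h_i: Vector, h_j: Vector) -> float:
    """Squared length of B(h_i - h_j), i.e. the squared distance in the subspace."""
    diff = [a - b for a, b in zip(h_i, h_j)]
    projected = [sum(b * d for b, d in zip(row, diff)) for row in B]
    return sum(p * p for p in projected)


def probe_loss(B: Matrix, vectors: List[Vector], tree_dist: List[List[float]]) -> float:
    """Mean absolute gap between tree distance and squared probe distance
    over all word pairs of one sentence."""
    n = len(vectors)
    total = 0.0
    for i in range(n):
        for j in range(n):
            gap = tree_dist[i][j] - probe_squared_distance(B, vectors[i], vectors[j])
            total += abs(gap)
    return total / (n * n)


# Toy example (hypothetical numbers): a 3-word chain w0 - w1 - w2.
B = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]  # assumption: the probe keeps the first two coordinates only
vectors = [[0.0, 0.0, 7.0], [1.0, 0.0, 3.0], [1.0, 1.0, -2.0]]
tree_dist = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
loss = probe_loss(B, vectors, tree_dist)
```

In this toy case the third coordinate carries no syntactic information, and the projection discards it, so the squared distances in the subspace match the tree distances.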
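Results (Table 1) • Models that take context into account (bottom four) reproduce the parse tree better than models that do not (top four)² ² Dependency structure is evaluated ignoring both relation type and direction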
Results (Figure 2)
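Results (Figure 4) • Left: distances between words computed from the parse tree • Right: distances between words computed from layer 16 of BERT (large) • The overall structure appears to be reproduced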
future works • Experiments showed that it is important to use the squared distance rather than the distance itself • Why the squared distance works better was not well understood
That concludes the Context part
Contents 1.Context & related works 2.Geometry of syntax <- 3.Geometry of word senses • Measurement of word sense disambiguation capability • Embedding distance and context: a concatenation experiment 4.Conclusion
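Geometry of syntax • BERT's learned representations were analysed from the following two perspectives 1.Does it learn useful representations in the first place? 2.Does it learn parse trees?
Attention probes and dependency representations • A quantitative evaluation of what BERT has learned (a preliminary experiment) • The task: using Penn Treebank data, predict the dependency relation between two words • A weak model (a linear classifier with L2 regularization) is trained on top of BERT's outputs • Accuracy reached 85.8%, so the authors judged it reasonable to proceed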
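Mathematics of embedding trees in Euclidean space • It was proved mathematically that a tree with $n$ nodes can be embedded into $\mathbb{R}^{n-1}$ while preserving (a notion of) distance • It was also shown that, if the distance itself is used, there are cases where no distance-preserving embedding exists • The authors state that this settles the question that had been left open
In other words? • The blog post explains this in detail, so it is a good starting point if you want the details • https://pair-code.github.io/interpretability/bert-tree/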
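One way to see why such an embedding can exist is a construction in which every edge of the tree gets its own coordinate axis. The following is a minimal sketch under that assumption, with unit edge lengths; the function names and the example tree are illustrative only. With this construction the squared Euclidean distance between two nodes equals the number of edges on the path between them.

```python
from collections import deque
from typing import Dict, List, Tuple


def pythagorean_embedding(n: int, edges: List[Tuple[int, int]], root: int = 0) -> List[List[float]]:
    """Embed a tree with n nodes (n - 1 edges) into R^(n-1).

    Edge k gets its own axis; a node's vector has a 1 on the axis of every
    edge on the path from the root to that node.
    """
    adjacency: Dict[int, List[Tuple[int, int]]] = {v: [] for v in range(n)}
    for k, (a, b) in enumerate(edges):
        adjacency[a].append((b, k))
        adjacency[b].append((a, k))
    vectors: Dict[int, List[float]] = {root: [0.0] * (n - 1)}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v, k in adjacency[u]:
            if v not in vectors:
                vec = list(vectors[u])  # copy the parent's vector
                vec[k] = 1.0            # step along the axis of edge k
                vectors[v] = vec
                queue.append(v)
    return [vectors[v] for v in range(n)]


def tree_distances(n: int, edges: List[Tuple[int, int]], source: int) -> List[int]:
    """Number of edges from source to every node (breadth-first search)."""
    neighbours: Dict[int, List[int]] = {v: [] for v in range(n)}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in neighbours[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return [dist[v] for v in range(n)]


def squared_distance(x: List[float], y: List[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(x, y))


# Hypothetical 5-node tree: 0 is the root, 1 and 2 its children, 3 and 4 children of 1.
edges = [(0, 1), (0, 2), (1, 3), (1, 4)]
points = pythagorean_embedding(5, edges)
```

The difference between two node vectors is nonzero exactly on the axes of the edges that separate the nodes, which is why the squared length counts those edges.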
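Visualization of parse tree embeddings • An embedding that preserves parse tree distances and BERT's result look similar
Visualization of parse tree embeddings • Distances are compared between the embedded parse tree and BERT's learned representation • The ratio is shown as colour • The displayed ratio is BERT / true parse tree • Some lines connect words that are not linked in the parse tree but end up close in BERT's learned representation • Pairs that are best treated as a unit, such as part/of and sale/of, are close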
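Visualization of parse tree embeddings • The ratio of distances between the embedded parse tree and BERT's learned representation is examined • The figure on the right aggregates it per dependency relation • The values spread widely across relations, from 1.2 to 2.5 • The result suggests that BERT adds a quantitative aspect to the relations
Contents 1.Context & related works 2.Geometry of syntax 3.Geometry of word senses <- • Measurement of word sense disambiguation capability • Embedding distance and context: a concatenation experiment 4.Conclusion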
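Geometry of word senses • Examines whether BERT captures word meaning as well as syntax • Experiment: can a subspace representing meaning be found? • Apparently yes • Experiment: can the context be adjusted artificially? • Not only did this fail, it made things worse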
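Measurement of word sense disambiguation capability • BERT's outputs are visualized with UMAP • For the same word "die", clusters form for its several meanings • A word sense disambiguation task solved with kNN reaches accuracy 71.1% (SOTA)
Semantic information • A subspace representing meaning is extracted in the same way as with the "structural probe" • Instead of the difference from parse tree distances, cosine similarity between word senses is used (details unclear) • Accuracy before dimensionality reduction is 71.1% • It rises slightly after dimensionality reduction • A subspace for meaning seems to exist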
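The nearest-neighbour approach to word sense disambiguation mentioned above can be sketched as follows. This is a toy illustration, not the paper's setup: the 2-D vectors stand in for contextual embeddings, the sense labels are hypothetical, and plain Euclidean distance with a majority vote is an assumption.

```python
from collections import Counter
from typing import List, Sequence


def squared_euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(x, y))


def knn_predict(query: Sequence[float],
                train_vectors: List[Sequence[float]],
                train_senses: List[str],
                k: int = 3) -> str:
    """Majority vote among the k training vectors closest to the query."""
    order = sorted(range(len(train_vectors)),
                   key=lambda i: squared_euclidean(query, train_vectors[i]))
    votes = Counter(train_senses[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 2-D stand-ins for contextual embeddings of the word "die".
train_vectors = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
                 [5.0, 5.1], [4.9, 5.0], [5.2, 4.8]]
train_senses = ["death", "death", "death", "dice", "dice", "dice"]

prediction = knn_predict([0.3, 0.2], train_vectors, train_senses)
```

If embeddings of the same sense form tight clusters, as the UMAP plot suggests, even a classifier this simple can separate the senses.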
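Embedding distance and context: a concatenation experiment • Experiment: can better results be obtained by deliberately manipulating the context? • A representative sentence using a word in a particular sense is found and concatenated to a sentence that uses the same word in the same sense • If "I went to Edo" is the representative sentence, it is appended to "He went to Edo" to form "He went to Edo and I went to Edo"
Embedding distance and context: a concatenation experiment • Horizontal axis: BERT layer • Vertical axis: something like a ratio of distances to the centre of the cluster with a different meaning (larger is better) • One might expect that adding the representative sentence would let the word's meaning be captured better, but that was not the case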
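The quantity on the vertical axis is described only loosely here. One plausible reading, which is an assumption and not necessarily the paper's exact definition, is the distance from an embedding to the centroid of the other sense divided by its distance to the centroid of its own sense. The sketch below uses Euclidean distance and hypothetical 2-D vectors; `sense_separation_ratio` is a made-up name.

```python
import math
from typing import List, Sequence


def centroid(vectors: List[Sequence[float]]) -> List[float]:
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def sense_separation_ratio(x: Sequence[float],
                           same_sense: List[Sequence[float]],
                           other_sense: List[Sequence[float]]) -> float:
    """Distance to the other-sense centroid divided by the distance to the
    same-sense centroid. Larger values mean the senses are better separated.
    Raises ZeroDivisionError if x lies exactly on the same-sense centroid."""
    return euclidean(x, centroid(other_sense)) / euclidean(x, centroid(same_sense))


# Hypothetical embeddings for two senses of one word.
same_sense = [[0.0, 0.0], [2.0, 0.0]]    # centroid (1, 0)
other_sense = [[3.0, 5.0], [5.0, 5.0]]   # centroid (4, 5)
ratio = sense_separation_ratio([1.0, 1.0], same_sense, other_sense)
```

Under this reading, a ratio above 1 means the embedding sits closer to its own sense than to the other one, and the experiment tracks how that value changes per layer once the extra sentence is concatenated.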
Contents 1.Context & related works 2.Geometry of syntax 3.Geometry of word senses • Measurement of word sense disambiguation capability • Embedding distance and context: a concatenation experiment 4.Conclusion <-
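Conclusion • A mathematical interpretation was given to the "structural probe" • Comparing the parse tree embedding with BERT's learned representation gave results suggesting that BERT learns parse trees • Separately from the space that encodes parse trees, there appears to be a space that encodes meaning • Whether there are other subspaces that matter linguistically is left for future research
Finally • A recap of today's event • TensorFlow User Group Tokyo • NN論文を肴に酒を飲む会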
TensorFlow User Group Tokyo NN論文を肴に酒を飲む会 #9