Visualizing and Measuring the Geometry of BERT
Asei Sugiyama
September 04, 2019
Technology
Slides for my talk at NN論文を肴に酒を飲む会 #9 (https://tfug-tokyo.connpass.com/event/143283/).
Transcript
Visualizing and Measuring the Geometry of BERT NN論文を肴に酒を飲む会 #9
About me • Asei Sugiyama • Software Engineer @Repro • Machine learning, statistics, software development, and so on • Translation & review for TensorFlow Docs • Co-author of 機械学習図鑑
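Abstract • A paper featured by Google PAIR • Networks with Transformer-like architectures are extremely effective in natural language processing • The goal is to clarify how such networks internally hold the features relevant to natural language processing • The authors ran quantitative and qualitative analyses of BERT • The results suggest that BERT learns both semantic and syntactic information
Contents 1.Context & related works <- 2.Geometry of syntax 3.Geometry of word senses • Measurement of word sense disambiguation capability • Embedding distance and context: a concatenation experiment 4.Conclusion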
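Context & related works • This paper is a response to A Structural Probe for Finding Syntax in Word Representations (2019) • It is structured so that almost nothing makes sense without that paper
A two-for-one paper: you get to read twice as much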
A Structural Probe for Finding Syntax in Word Representations NN論文を肴に酒を飲む会 #9
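Abstract • A paper from Stanford University • Analysis of word representations has been progressing, but whether representations of parse trees are learned had not been verified • This work proposes a method called the structural probe • It evaluates whether parse trees are embedded in a linearly transformed space of a neural network's word representations • The results suggest that ELMo and BERT learn parse trees
Research goal • Answer the question of whether deep models learn parse trees
What this paper explains • How to find parse trees in word representations • How to recover and evaluate parse-tree information from a low-dimensional projection of word representations, with concrete examples (ELMo, BERT)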
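Idea behind the method • Consider embedding a graph into a vector space while preserving the distances between its nodes • If that can be done, finding the neighbors of a node is the same as nearest-neighbor search • Also, if a model learns the tree structure correctly, it presumably uses only part of its representation space • So it is enough to search for a subspace of the representation space that preserves tree distances
In other words? • The figure in the explanatory article 1 is easy to follow • The space on the left is the word representation space • The gray plane in the left figure is the subspace that represents the tree structure • The right side is the recovered tree structure
1 https://nlp.stanford.edu//~johnhew//structural-probe.html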
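The structural probe • w_i^l, h_i^l: the i-th word in the l-th sentence and its vector • d_T(w_i^l, w_j^l): distance between nodes in the parse tree • d_B(h_i^l, h_j^l): distance in the subspace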
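As a rough illustration of the definitions above, the following Python sketch computes the squared probe distance d_B(h_i, h_j)^2 = ||B(h_i - h_j)||^2 and the average gap between it and the tree distance for one sentence, which is how the structural probe paper sets up its objective. The function names, the toy vectors, and the use of NumPy are assumptions made for this example; none of it is taken from the authors' code.

```python
import numpy as np

def probe_squared_distance(B, h_i, h_j):
    """Squared distance between two word vectors after the linear map B."""
    diff = B @ (np.asarray(h_i, dtype=float) - np.asarray(h_j, dtype=float))
    return float(diff @ diff)

def probe_loss(B, H, tree_dist):
    """Average |tree distance - squared probe distance| over all word pairs
    of one sentence. H holds one word vector per row."""
    n = len(H)
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += abs(tree_dist[i][j] - probe_squared_distance(B, H[i], H[j]))
    return total / (n * n)

# Hypothetical toy sentence of three words whose parse tree is the chain 0 - 1 - 2.
H = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
tree_dist = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]])
B = np.eye(2)  # identity map: the "subspace" is the whole toy space
loss = probe_loss(B, H, tree_dist)
```

In this toy case the squared distances reproduce the tree distances exactly, so the loss is zero. In the actual method B is learned by minimizing this kind of gap over a treebank.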
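Results (Table 1) • The models that take context into account (bottom four) reproduce parse trees better than the models that do not (top four) 2
2 Dependency structure is evaluated ignoring both relation type and direction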
Results (Figure 2)
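Results (Figure 4) • Left: word-to-word distances computed from the parse tree • Right: word-to-word distances computed from layer 16 of BERT(large) • The overall structure appears to be reproduced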
future works • Experiments showed that it is important to use the squared distance rather than the distance itself • Why the squared distance works better was not well understood
Everything up to here was Context
Contents 1.Context & related works 2.Geometry of syntax <- 3.Geometry of word senses • Measurement of word sense disambiguation capability • Embedding distance and context: a concatenation experiment 4.Conclusion
Geometry of syntax • What BERT learned is examined from two viewpoints 1.Does it learn useful representations in the first place? 2.Does it learn parse trees?
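Attention probes and dependency representations • A quantitative evaluation of what BERT learned (a preliminary experiment) • The task: using Penn Treebank data, decide the dependency relation between two words • A weak model (a linear classifier with L2 regularization) is trained on BERT's outputs • The accuracy was 85.8%, so the authors judge that it is reasonable to proceed to the next step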
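The "weak model" in the slide above is a linear classifier with L2 regularization. A minimal sketch of such a probe is shown below, written as logistic regression trained by plain gradient descent. Everything concrete here is an assumption for illustration: the random features merely stand in for features extracted from BERT, the labels are synthetic rather than Penn Treebank dependencies, and the hyperparameters are arbitrary.

```python
import numpy as np

def train_l2_logistic(X, y, l2=0.01, lr=0.1, steps=500):
    """Fit a linear classifier with L2 regularization by gradient descent."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted probabilities
        grad_w = X.T @ (p - y) / n + l2 * w      # data term + L2 penalty term
        grad_b = float(np.mean(p - y))
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def accuracy(X, y, w, b):
    """Fraction of examples whose predicted class matches the label."""
    pred = (X @ w + b > 0).astype(int)
    return float(np.mean(pred == y))

# Synthetic stand-in data: 200 "word pairs" with 8 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
true_w = rng.normal(size=8)
y = (X @ true_w > 0).astype(int)  # 1 = "has a dependency relation" (hypothetical)

w, b = train_l2_logistic(X, y)
acc = accuracy(X, y, w, b)
```

The point of keeping the probe this weak is that a high accuracy then has to come from information already present in the features, not from the capacity of the classifier.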
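Mathematics of embedding trees in Euclidean space • It was proved mathematically that a tree with n nodes can be embedded in R^(n-1) while preserving (a kind of) distance • It was also shown that, if the distance itself is used, there are cases where no distance-preserving embedding exists • The authors state that this settles the question left open as homework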
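One way to see why a tree with n nodes fits into R^(n-1) when squared distances are used: give each of the n-1 edges its own orthogonal unit axis, and place every node at the sum of the axes on its path from the root. The sketch below implements that construction in plain Python. The function names and the toy tree are assumptions for illustration, and the code is not taken from the paper or the blog article.

```python
from collections import deque
from itertools import combinations

def build_adjacency(n, edges):
    """Adjacency list that also remembers which axis belongs to each edge."""
    adj = {v: [] for v in range(n)}
    for axis, (u, v) in enumerate(edges):
        adj[u].append((v, axis))
        adj[v].append((u, axis))
    return adj

def pythagorean_embedding(n, edges):
    """Embed a tree with nodes 0..n-1 into R^(n-1); node 0 sits at the origin."""
    adj = build_adjacency(n, edges)
    emb = {0: [0.0] * (n - 1)}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v, axis in adj[u]:
            if v not in emb:
                vec = list(emb[u])
                vec[axis] = 1.0  # step one unit along this edge's own axis
                emb[v] = vec
                queue.append(v)
    return emb

def tree_distances_from(n, edges, src):
    """Number of edges between src and every other node (breadth-first search)."""
    adj = build_adjacency(n, edges)
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v, _ in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Star-shaped toy tree: node 0 in the center, leaves 1, 2, 3.
edges = [(0, 1), (0, 2), (0, 3)]
emb = pythagorean_embedding(4, edges)
ok = all(
    squared_distance(emb[a], emb[b]) == tree_distances_from(4, edges, a)[b]
    for a, b in combinations(range(4), 2)
)
```

For this star the squared distance between two leaves is 2, matching the tree distance, while the plain Euclidean distance is sqrt(2). The same star has no embedding that preserves plain distances: the center would have to be the midpoint of every pair of leaves at once.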
In other words? • The blog article explains this in detail, so it is a good entry point if you want the details • https://pair-code.github.io/interpretability/bert-tree/
Visualization of parse tree embeddings • An embedding that preserves parse-tree distances and the result obtained from BERT look similar
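Visualization of parse tree embeddings • Distances are compared between the embedded parse tree and what BERT learned • The ratio (BERT / true parse tree) is shown as color • Some lines mark word pairs that are not connected in the parse tree but ended up close in what BERT learned • Pairs that are naturally handled as a unit, such as part/of and sale/of, are close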
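Visualization of parse tree embeddings • The ratio of distances between the embedded parse tree and what BERT learned is examined • The figure on the right aggregates the results per dependency relation • Depending on the relation, the values range widely from 1.2 to 2.5 • The result suggests that BERT adds a quantitative aspect to the relations
Contents 1.Context & related works 2.Geometry of syntax 3.Geometry of word senses <- • Measurement of word sense disambiguation capability • Embedding distance and context: a concatenation experiment 4.Conclusion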
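Geometry of word senses • Examines whether BERT captures word meaning as well as syntax • Experiment: can a subspace that represents meaning be obtained? • Apparently it can • Experiment: can the context be adjusted artificially? • Not only did this fail, it made the results worse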
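Measurement of word sense disambiguation capability • BERT's outputs are visualized with UMAP • For the same word "die", clusters corresponding to its multiple senses form • Running a word sense disambiguation task with kNN gave an accuracy of 71.1% (SOTA)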
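The kNN step above can be pictured with a small sketch: a query embedding gets the sense label that is most common among its k nearest labeled embeddings. The two-dimensional vectors and the sense labels below are made up for illustration and merely stand in for BERT embeddings of "die". The exact classifier setup used in the paper is not reproduced here.

```python
import math
from collections import Counter

def knn_predict(train_vecs, train_labels, query, k=3):
    """Return the majority label among the k training vectors closest to query."""
    ranked = sorted(
        zip(train_vecs, train_labels),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical embeddings of the word "die" in two different senses.
train_vecs = [
    (0.0, 0.1), (0.2, -0.1), (-0.1, 0.0),   # sense: stop living
    (5.0, 5.1), (4.8, 5.0), (5.2, 4.9),     # sense: dice
]
train_labels = ["stop living"] * 3 + ["dice"] * 3

pred = knn_predict(train_vecs, train_labels, (0.3, 0.2))
```

If the senses really form separate clusters, as the UMAP plot suggests, even this simple neighbor vote is enough to tell them apart.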
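Semantic information • A subspace that represents meaning is extracted in the same way as with the "structural probe" • Instead of the difference from parse-tree distances, it uses cosine similarity between word senses (details unclear) • Accuracy before dimensionality reduction is 71.1% • Accuracy rises slightly after dimensionality reduction • A subspace for meaning seems to exist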
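Embedding distance and context: a concatenation experiment • Experiment: can better results be obtained by manipulating the context on purpose? • Find a representative sentence that uses a word in a particular sense, and concatenate it to a sentence that uses the same word in the same sense • If "I went to Edo" is the representative sentence, append it to "He went to Edo" to make the sentence "He went to Edo and I went to Edo"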
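Embedding distance and context: a concatenation experiment • Horizontal axis: BERT layer • Vertical axis: roughly a ratio based on the distance to the centroid of the cluster with a different sense (larger is better) • The expectation was that adding a representative sentence would make the word's sense easier to identify, but that was not the case
Contents 1.Context & related works 2.Geometry of syntax 3.Geometry of word senses • Measurement of word sense disambiguation capability • Embedding distance and context: a concatenation experiment 4.Conclusion <-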
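Conclusion • The "structural probe" was given a mathematical grounding • Comparing parse-tree embeddings with what BERT learned gave results suggesting that BERT learns parse trees • Apart from the space in which parse trees are learned, there appears to be a space in which meaning is learned • Whether other subspaces that matter linguistically exist is a topic for future research
Finally • Recap of this event • TensorFlow User Group Tokyo • NN論文を肴に酒を飲む会
TensorFlow User Group Tokyo NN論文を肴に酒を飲む会 #9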