Visualizing and Measuring the Geometry of BERT
Asei Sugiyama
September 04, 2019
Slides for a talk given at NN論文を肴に酒を飲む会 #9
https://tfug-tokyo.connpass.com/event/143283/
Transcript
Visualizing and Measuring the Geometry of BERT
NN論文を肴に酒を飲む会 #9
About me
• Asei Sugiyama
• Software Engineer @Repro
• Machine learning, statistics, software development, and so on
• TensorFlow Docs translation & review
• Co-author of 機械学習図鑑
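Abstract
• A paper introduced by Google PAIR
• Networks with Transformer-like architectures are extremely useful in natural language processing
• The goal is to clarify how such networks internally hold the features they use for natural language processing
• Quantitative and qualitative analyses were performed on BERT
• The results suggest that BERT learns both semantic and syntactic information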
Contents
1. Context & related works <-
2. Geometry of syntax
3. Geometry of word senses
  • Measurement of word sense disambiguation capability
  • Embedding distance and context: a concatenation experiment
4. Conclusion
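Context & related works
• This paper is a response to A Structural Probe for Finding Syntax in Word Representations (2019)
• It is organized so that almost nothing makes sense without that earlier paper
A good deal of a paper: you get to read twice as much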
A Structural Probe for Finding Syntax in Word Representations
NN論文を肴に酒を飲む会 #9
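Abstract
• A paper from Stanford University
• Word representations have been analyzed extensively, but whether they also encode syntactic structure had not been verified
• This work proposes a method called the structural probe
• The probe evaluates whether a parse tree is embedded in a linearly transformed space of a neural network's word representations
• The results suggest that ELMo and BERT learn syntax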
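Goal of the research
• Answer the question of whether deep models learn syntax
What this paper explains
• How to find syntax in word representations
• How to recover and evaluate syntactic information from a low-dimensional projection of word representations, with concrete examples (ELMo, BERT)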
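Idea behind the method
• Consider embedding a graph into a vector space while preserving the distances between its nodes
• If that is possible, finding the neighbors of a node is the same as nearest-neighbor search
• Also, if a model has learned the structure correctly, it presumably uses only part of its representation space for it
• So it is enough to look for a subspace of the representation space that preserves the distances of the structure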
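To make "distance between nodes" concrete, here is a minimal sketch that computes all-pairs path lengths on a small tree and reads off a node's neighbors as its nearest nodes. The sentence and its edges are hypothetical and chosen only for illustration; they are not taken from the slides or the paper.

```python
from collections import deque

def tree_distances(edges, n):
    """All-pairs path lengths in an undirected tree with nodes 0..n-1."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = [[0] * n for _ in range(n)]
    for src in range(n):
        # Breadth-first search from src; in a tree the first visit is the only path.
        seen = {src}
        queue = deque([(src, 0)])
        while queue:
            node, d = queue.popleft()
            dist[src][node] = d
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, d + 1))
    return dist

# Hypothetical dependency tree (illustrative only).
words = ["The", "chef", "cooked", "the", "meal"]
edges = [(2, 1), (1, 0), (2, 4), (4, 3)]
dist = tree_distances(edges, len(words))

# The tree neighbors of "cooked" are exactly its nearest nodes (distance 1).
neighbors = [j for j in range(len(words)) if dist[2][j] == 1]
```

If an embedding preserved these distances, the same neighbors could be found by nearest-neighbor search over the vectors instead of walking the tree.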
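In other words?
• The figure in the explanatory article [1] makes this easy to see
• The space on the left is the word representation space
• The gray plane in the left figure is the subspace that represents the structure
• The right side shows the recovered structure
[1] https://nlp.stanford.edu/~johnhew/structural-probe.html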
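The structural probe
• $w_i^\ell$, $h_i^\ell$: the $i$-th word in the $\ell$-th sentence and its vector
• $d_{T^\ell}(w_i^\ell, w_j^\ell)$: distance between nodes on the parse tree
• $d_B(h_i^\ell, h_j^\ell)$: distance in the subspace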
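As a rough illustration of the subspace distance, the sketch below assumes the formulation in the structural probe paper: a linear map B is applied to the difference of two word vectors, the squared norm is taken, and an L1 loss against tree distances is normalized by the squared sentence length. The function names, shapes, and random inputs are placeholders, not the authors' code.

```python
import numpy as np

def probe_squared_distances(H, B):
    """Squared probe distances ||B (h_i - h_j)||^2 for every word pair.

    H: (n_words, dim) word vectors of one sentence.
    B: (rank, dim) linear map defining the subspace.
    """
    P = H @ B.T                            # project every word: (n_words, rank)
    diff = P[:, None, :] - P[None, :, :]   # pairwise differences
    return (diff ** 2).sum(axis=-1)        # (n_words, n_words)

def probe_loss(H, B, tree_dist):
    """L1 gap between tree distances and squared probe distances for one sentence."""
    n = H.shape[0]
    pred = probe_squared_distances(H, B)
    return np.abs(tree_dist - pred).sum() / n ** 2

# Random stand-ins for real word vectors and a real probe matrix.
rng = np.random.default_rng(0)
H = rng.normal(size=(5, 16))
B = rng.normal(size=(4, 16))
D = probe_squared_distances(H, B)
```

In practice B would be fitted by minimizing `probe_loss` summed over a treebank; here it is random, so `D` only demonstrates the shapes and the symmetry of the distance matrix.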
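Results (Table 1)
• Models that take context into account (bottom four) reproduce the parse tree better than models that do not (top four) [2]
[2] Dependency structure is evaluated ignoring relation type and direction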
Results (Figure 2)
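Results (Figure 4)
• Left: word-to-word distances computed from the parse tree
• Right: word-to-word distances computed from layer 16 of BERT (large)
• The overall structure appears to be reproduced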
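Future work
• Experiments showed that it matters to use the squared distance rather than the distance itself
• Why the squared distance works better was not well understood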
Everything up to here was context
Contents
1. Context & related works
2. Geometry of syntax <-
3. Geometry of word senses
  • Measurement of word sense disambiguation capability
  • Embedding distance and context: a concatenation experiment
4. Conclusion
Geometry of syntax
• What BERT has learned is analyzed from two viewpoints
1. Does it learn useful representations in the first place?
2. Does it learn parse trees?
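Attention probes and dependency representations
• A quantitative evaluation of what BERT has learned (preliminary experiment)
• Task: using Penn Treebank data, determine the dependency relation between two words
• A weak model (linear classifier + L2 regularization) is trained on top of BERT's outputs
• Accuracy was 85.8%, so the authors judged it reasonable to move on to the next step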
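The "weak model" idea can be sketched as a logistic regression with an L2 penalty trained by plain gradient descent. Everything here is an assumption for illustration: the features are random stand-ins rather than BERT-derived vectors, the labels are synthetic, and the hyperparameters are arbitrary, so the resulting accuracy says nothing about the 85.8% reported above.

```python
import numpy as np

def train_l2_logistic(X, y, l2=0.01, lr=0.5, steps=500):
    """Binary logistic regression with an L2 penalty, fitted by gradient descent."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted probabilities
        grad_w = X.T @ (p - y) / n + l2 * w       # data gradient + L2 term
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Synthetic stand-in for "features of a word pair -> is there a dependency?".
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
true_w = rng.normal(size=8)
y = (X @ true_w > 0).astype(float)

w, b = train_l2_logistic(X, y)
accuracy = np.mean(((X @ w + b) > 0) == (y == 1))
```

The point of keeping the probe this simple is that a high score must then come from information already present in the input features, not from the capacity of the classifier.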
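Mathematics of embedding trees in Euclidean space
• It was proven mathematically that a tree with $n$ nodes can be embedded in $\mathbb{R}^{n-1}$ while preserving (something like) distance
• It was also shown that, if the distance itself is used, there are cases where no distance-preserving embedding exists
• The authors say this settles the question left open by the earlier paper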
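One way to see why squared distance fits trees is the following sketch of a construction in the spirit of the one described in the paper and blog post: give every edge its own orthogonal unit vector and place each node at the sum of the edge vectors on its path from the root. The tree below is a made-up example, and `pythagorean_embedding` is an illustrative name, not an existing API.

```python
import numpy as np

def pythagorean_embedding(parent):
    """Embed a rooted tree with n nodes into R^(n-1).

    parent[i] is the parent of node i, the root has parent -1, and nodes are
    ordered so that parent[i] < i. Each edge gets its own unit basis vector.
    """
    n = len(parent)
    emb = np.zeros((n, n - 1))
    edge = 0
    for node in range(n):
        if parent[node] < 0:
            continue                         # the root stays at the origin
        emb[node] = emb[parent[node]]        # start from the parent's position
        emb[node, edge] = 1.0                # step along this edge's own axis
        edge += 1
    return emb

# Made-up tree: 0 is the root, 1 and 2 are its children, 3 and 4 hang off 1.
parent = [-1, 0, 0, 1, 1]
emb = pythagorean_embedding(parent)

# Squared Euclidean distances between all embedded nodes.
diff = emb[:, None, :] - emb[None, :, :]
sq = (diff ** 2).sum(axis=-1)

# Path lengths in the same tree, written out by hand.
tree_dist = np.array([
    [0, 1, 1, 2, 2],
    [1, 0, 2, 1, 1],
    [1, 2, 0, 3, 3],
    [2, 1, 3, 0, 2],
    [2, 1, 3, 2, 0],
])
```

Because the edges on the path between two nodes contribute orthogonal unit steps, the squared distance counts those edges, so `sq` matches `tree_dist` while the plain Euclidean distance does not.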
In other words?
• The blog post explains this in detail, so it is a good place to start if you want the specifics
• https://pair-code.github.io/interpretability/bert-tree/
Visualization of parse tree embeddings
• The distance-preserving embedding of the parse tree and the result from BERT look similar
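Visualization of parse tree embeddings
• Distances are compared between the embedded parse tree and what BERT learned
• The ratio of the two is shown by color
• The ratio shown is BERT / true parse tree
• Dotted lines mark word pairs that are not connected in the parse tree but ended up close in BERT's representation
• Pairs that seem better treated as a single unit, such as part/of and sale/of, are close together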
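Visualization of parse tree embeddings
• The distribution of the distance ratio between the embedded parse tree and BERT's representation is examined
• The figure on the right aggregates the results by dependency relation
• The values range from 1.2 to 2.5 depending on the relation
• This suggests that BERT adds a quantitative aspect to the dependency relations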
Contents
1. Context & related works
2. Geometry of syntax
3. Geometry of word senses <-
  • Measurement of word sense disambiguation capability
  • Embedding distance and context: a concatenation experiment
4. Conclusion
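Geometry of word senses
• Examines whether BERT captures word meaning, not only syntax
• Experiment: can a subspace that represents meaning be found?
  • Apparently yes
• Experiment: can the context be adjusted artificially?
  • Not only did this fail, it made things worse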
Measurement of word sense disambiguation capability
• BERT's outputs are visualized with UMAP
• For the same word "die", separate clusters form for its different senses
• A word sense disambiguation task solved with kNN reaches accuracy 71.1% (SOTA)
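The kNN step can be sketched as a majority vote among the closest labeled embeddings. The two-dimensional vectors and the sense labels below are invented toy values standing in for real contextual embeddings of "die"; `knn_predict` is an illustrative helper, not the classifier used in the paper.

```python
import numpy as np
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Label a query vector by majority vote among its k nearest training vectors."""
    dists = np.linalg.norm(train_X - query, axis=1)   # Euclidean distances
    nearest = np.argsort(dists)[:k]                    # indices of the k closest
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy "embeddings" forming two clusters, one per sense (values are made up).
train_X = np.array([
    [0.0, 0.1], [0.1, 0.0], [0.2, 0.1],   # sense A
    [1.0, 0.9], [0.9, 1.0], [1.1, 1.0],   # sense B
])
train_y = ["die (verb)"] * 3 + ["die (noun)"] * 3

pred_noun = knn_predict(train_X, train_y, np.array([0.95, 0.95]))
pred_verb = knn_predict(train_X, train_y, np.array([0.05, 0.05]))
```

When senses form well-separated clusters, as the UMAP plot suggests, even this simple vote is enough to tell them apart.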
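Where semantic information lives
• A subspace that represents meaning is extracted in the same way as with the "structural probe"
• Instead of the difference from parse tree distances, cosine similarity between word senses is used (details unclear)
• Accuracy before dimensionality reduction is 71.1%
• It goes up slightly after dimensionality reduction
• There seems to be such a thing as a semantic subspace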
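Embedding distance and context: a concatenation experiment
• Experiment: can better results be obtained by deliberately manipulating the context?
• A representative sentence that uses a word in a particular sense is picked and concatenated to another sentence that uses the same word in the same sense
• If "I went to Edo" is the representative sentence, it is appended to "He went to Edo" to form "He went to Edo and I went to Edo"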
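Embedding distance and context: a concatenation experiment
• Horizontal axis: BERT layer
• Vertical axis: roughly a ratio of distances to the centers of clusters with different senses (larger is better)
• One might expect that appending a representative sentence would make the word's sense easier to tell apart, but that was not the case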
Contents
1. Context & related works
2. Geometry of syntax
3. Geometry of word senses
  • Measurement of word sense disambiguation capability
  • Embedding distance and context: a concatenation experiment
4. Conclusion <-
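Conclusion
• The paper gives the "structural probe" a mathematical justification
• Comparing parse tree embeddings with what BERT learned suggests that BERT learns parse trees
• There appears to be a space that learns meaning, separate from the space that learns syntax
• Whether there are other subspaces that matter linguistically is left for future research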
Finally
• A recap of today's event
• TensorFlow User Group Tokyo
• NN論文を肴に酒を飲む会
TensorFlow User Group Tokyo
NN論文を肴に酒を飲む会 #9