Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
文献紹介:Treat the Word As a Whole or Look Inside? Subword Embeddings Model Language Change and Typology
Search
Taichi Aida
September 16, 2019
Technology
0
320
文献紹介:Treat the Word As a Whole or Look Inside? Subword Embeddings Model Language Change and Typology
Taichi Aida
September 16, 2019
Tweet
Share
More Decks by Taichi Aida
See All by Taichi Aida
文献紹介:Isotropic Representation Can Improve Zero-Shot Cross-Lingual Transfer on Multilingual Language Models
a1da4
0
46
文献紹介:WhitenedCSE: Whitening-based Contrastive Learning of Sentence Embeddings
a1da4
1
59
文献紹介:On the Transformation of Latent Space in Fine-Tuned NLP Models
a1da4
0
22
新入生向けチュートリアル:文献のサーベイ
a1da4
0
100
文献紹介:Temporal Attention for Language Models
a1da4
0
180
文献紹介:Dynamic Contextualized Word Embeddings
a1da4
2
290
文献紹介:Learning Lexical Subspaces in a Distributional Vector Space
a1da4
0
230
文献紹介:Unsupervised Word Polysemy Quantification with Multiresolution Grids of Contextual Embeddings
a1da4
0
66
新入生向けチュートリアル:tmux
a1da4
0
140
Other Decks in Technology
See All in Technology
Google Cloud Next '24でブログを10本書いた方法と勉強会を沸かせた方法
yasumuusan
0
300
VS CodeでAWSを操作しよう
smt7174
8
1.7k
開発パフォーマンスを最大化するための開発体制
ham0215
2
410
ServiceNow Knowledge 24の歩き方 EYストラテジー・アンド・コンサルティング
manarobot
0
200
Cracking the KubeCon CfP
inductor
2
250
FrontDoorとWebAppsを組み合わせた際のリダイレクト処理の注意点
kenichirokimura
1
520
レガシーをぶっ壊せ。AEONで始めるDevRelの話 / Qiita Night 2024-2-22
aeonpeople
3
1.3k
プロンプトエンジニアリングでがんばらない-Agentic Workflow へ-近藤憲児
kenjikondobai
2
560
Google Cloud の AI を支える裏側のインフラを垣間見る!
maroon1st
0
350
20240418_Google ColabにLLMが搭載されたようなのでPython x データ分析の勉強方法を考えてみる
doradora09
0
130
Terraformあれやこれ/terraform-this-and-that
emiki
8
1.4k
【NW X Security JAWS#3】L3-4:AWS環境のIPv6移行に向けて知っておきたいこと
shotashiratori
0
160
Featured
See All Featured
VelocityConf: Rendering Performance Case Studies
addyosmani
320
23k
Practical Orchestrator
shlominoach
182
9.7k
No one is an island. Learnings from fostering a developers community.
thoeni
16
2.1k
Dealing with People You Can't Stand - Big Design 2015
cassininazir
357
22k
Building Your Own Lightsaber
phodgson
99
5.7k
What's new in Ruby 2.0
geeforr
337
31k
Build The Right Thing And Hit Your Dates
maggiecrowley
24
2k
For a Future-Friendly Web
brad_frost
172
9k
Fashionably flexible responsive web design (full day workshop)
malarkey
398
65k
Exploring the Power of Turbo Streams & Action Cable | RailsConf2023
kevinliebholz
2
3.4k
What’s in a name? Adding method to the madness
productmarketing
PRO
16
2.6k
A Modern Web Designer's Workflow
chriscoyier
689
190k
Transcript
จݙհʢʣ Treat the Word As a Whole or Look Inside?
Subword Embeddings Model Language Change and Typology Yang Xu, Jiasheng Zhang, David Reitter 1st International Workshop on Computational Approaches to Historical Language Change, ACL2019 Ԭٕज़Պֶେֶ ࣗવݴޠॲཧݚڀࣨɹ ૬ాɹଠҰ
Abstract • ݴޠֶతͳԾઆΛௐΔͨΊʹ subword Λߟྀͨ͠୯ޠࢄදݱΛఏҊ • Indo-European ͷݴޠ৽͍͠୯ޠ΄Ͳ subword ʹର͢ΔॏΈ͕૿͑ɺɹ
ٯʹதࠃޠ subword ʹର͢ΔॏΈ͕ݮΓɺ୯ޠʹର͢ΔॏΈ͕૿͑ͨ !2
Motivation w ݴޠֶతͳ݁ ʮதࠃޠʹ͓͍ͯɺ࣌ؒͱͱʹ༏Ґੑ͕୯ԻઅˠೋԻઅʹҠͬͨʯ w Ծઆ ʮݱͷதࠃޠʹ͓͍ͯɺ୯ޠʹؚ·ΕΔࣈจࣈʢTVCXPSEʣ ҙຯతͳׂ͕গͳ͍ʯ !3
Related Work w $#08ʢDPOUFYU ͔ΒUBSHFU Λ༧ଌʣ w $IBSBDUFSFOIBODFEXPSEFNCFEEJOH $8&
w 4LJQHSBNʢUBSHFU ͔ΒDPOUFYU Λ༧ଌʣ w GBTU5FYU vc ui vc ui !4 ୯ޠͱจࣈΛಉ͡ॏཁͰѻ͏
Method w %ZOBNJDTVCXPSEJODPSQPSBUFEFNCFEEJOHNPEFM %4& w %4&$#08 w %4&4( w
୯ޠʹ୯ޠͷॏΈ ͰɺTVCXPSEʹ ͰॏΈ͚͢Δ hw i 1 − hw i !5
Method !6
Experiment w %BUBTFUT w 5SBJOJOHXPSEFNCFEEJOH8JLJQFEJBEBUBCBTFEVNQT w $IJOFTF &OHMJTI 'SFODI (FSNBO
*UBMJBO 4QBOJTI w .PEFM w %4&$#08 %4&4(ʢఏҊख๏ʣ w $8& GBTU5FYU !7
Experiment w ࣮ݧ߲ ͱ୯ޠͷൃੜ࣌ظͱͷ૬ؔ w ൃੜ࣌ظɿ͋Δޠ͕(PPHMF#PPLT/HSBNʹॳΊͯొͨ͠ ޠͷҙຯλεΫ
w &NCFEEJOHͷੑೳΛଌΔ w 4JNJMBSJUZͱ"OBMPHZΛ༻ hw i !8
Result ୯ޠͷॏΈ ͱൃੜ࣌ظͱͷ૬ؔɿ*OEP&VSPQFBOͱதࠃͰਖ਼ର w hw i !9
Result ୯ޠͷॏΈ ͱൃੜ࣌ظͱͷ૬ؔɿ*OEP&VSPQFBOͱதࠃͰਖ਼ର w hw i !10 ͕࣌ਐΉͱ ୯ޠʹର͢ΔॏΈ͕ݮগ ˣ
୯ޠΑΓ 4VCXPSEΛॏࢹ
Result ୯ޠͷॏΈ ͱൃੜ࣌ظͱͷ૬ؔɿ*OEP&VSPQFBOͱதࠃͰਖ਼ର w hw i !11 ͕࣌ਐΉͱ ୯ޠʹର͢ΔॏΈ͕૿Ճ ˣ
4VCXPSEΑΓ ୯ޠΛॏࢹ ʢԾઆ͕͔֬ΊΒΕͨʣ
Result w ͦΕͧΕͷάϧʔϓͰൺֱ w $#08ܥʢ%4&$#08 $8&ʣ w 4LJQHSBNܥʢ%4&4( GBTUUFYUʣ w
%4&4(Ͱੑೳͷ্Λ֬ೝ !12
Conclusion w ԾઆΛݕূ͢ΔҝʹɺTVCXPSEΛߟྀ͢Δ୯ޠࢄදݱΛఏҊͨ͠ w *OEP&VSPQFBOͷݴޠͰ৽͘͠ੜ·ΕΔ୯ޠ΄ͲTVCXPSEʹҙຯͷ ॏΈ͕ॏࢹ͞ΕɺதࠃޠͰٯʹTVCXPSEͷॏΈ͕ݮΓɺ୯ޠͦͷ ͷʹରͯ͠ॏΈ͕ͭ͘Α͏ʹͳͬͨʢԾઆΛݕূͨ͠ʣ !13
None
Discussion w ࣮ݧʹରͯ͠۩ମతͳൺֱΛߦͬͨ w தࠃɿͷۙԽͰٕज़Պֶ͕ൃలͨ͜͠ͱʹΑΓɺ৽͍͠୯ ޠ͕ೖ͖ͬͯͨʁ !15