Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
犬でもわかる Minimal Acyclic Subsequential Transducer...
Search
Takuya Asano
June 27, 2019
Technology
2
1.3k
犬でもわかる Minimal Acyclic Subsequential Transducer / Introduction to Minimal Acyclic Subsequential Transducer
はてなの技術勉強会で LT 発表したときの資料です。
Takuya Asano
June 27, 2019
Tweet
Share
More Decks by Takuya Asano
See All by Takuya Asano
Research Paper Introduction in IR Reading 2022 Fall
takuyaa
0
3.2k
Introducing PTHash - Paper Reading Session (2021-11-19)
takuyaa
0
390
Research Paper Introduction in IR Reading 2021 Fall
takuyaa
0
260
Lucene Index Deep Dive
takuyaa
0
600
Introduction to Apache Lucene
takuyaa
5
1.2k
Research Paper Introduction in IR Reading 2020 Fall
takuyaa
2
4k
Research paper introduction (2019-11-12)
takuyaa
2
1k
Research paper introduction (2019-11-07)
takuyaa
1
420
IR Reading 2019秋 論文紹介 / IR Reading 2019Fall
takuyaa
2
1.2k
Other Decks in Technology
See All in Technology
Как мы автоматизировали интеграционное тестирование с Gonkey и не пожалели. Паша Егорычев, Кирилл Поляков
lamodatech
0
2k
Global Azure2025(GitHub Copilot ハンズオン)
tomokusaba
1
540
OPENLOGI Company Profile for engineer
hr01
1
26k
AI-in-the-Enterprise|OpenAIが公開した「AI導入7つの教訓」——ChatGPTで変わる企業の未来とは?
customercloud
PRO
0
150
意思決定を支える検索体験を目指してやってきたこと
hinatades
PRO
0
400
AI 코딩 에이전트 더 똑똑하게 쓰기
nacyot
0
520
企業が押さえるべきMCPの未来
takaakikakei
4
920
3D生成AIのための画像生成
kosukeito
2
600
Azure Maps Visual in PowerBIで分析しよう
nakasho
0
200
社会人力と研究力ー博士号をキャリアの武器にするー
kentaro
2
110
20 Years of Domain-Driven Design: What I’ve Learned About DDD
ewolff
0
110
2025年8月から始まるAWS Lambda INITフェーズ課金/AWS Lambda INIT phase billing changes
quiver
1
690
Featured
See All Featured
How to train your dragon (web standard)
notwaldorf
91
6k
Performance Is Good for Brains [We Love Speed 2024]
tammyeverts
10
780
The Power of CSS Pseudo Elements
geoffreycrofte
75
5.8k
Improving Core Web Vitals using Speculation Rules API
sergeychernyshev
13
840
Save Time (by Creating Custom Rails Generators)
garrettdimon
PRO
31
1.2k
ReactJS: Keep Simple. Everything can be a component!
pedronauck
667
120k
Building a Scalable Design System with Sketch
lauravandoore
462
33k
The Invisible Side of Design
smashingmag
299
50k
For a Future-Friendly Web
brad_frost
177
9.7k
Art, The Web, and Tiny UX
lynnandtonic
298
20k
Speed Design
sergeychernyshev
29
930
RailsConf & Balkan Ruby 2019: The Past, Present, and Future of Rails at GitHub
eileencodes
137
33k
Transcript
ݘͰΘ͔Δ Minimal Acyclic Subsequential Transducer 2019-06-27 ͯͳٕज़ษڧձ id:takuya-a
FSA ͱ FST • FSA (Finite State Automaton) • ༗ݶঢ়ଶΦʔτϚτϯ
• ೖྗྻΛडཧ͢Δ͔Ͳ͏͔ͷ bool Λฦ͢ • FST (Finite State Transducer) • ༗ݶঢ়ଶมث • FSA ͷҰछ • ೖྗྻΛडཧͨ͠ͱ͖ɺग़ྗྻΛฦ͢ • Minimal Acyclic Subsequential Transducer FST ͷҰछ { “onk” } { “onk” => “͓Μ͘” }
FST ͷ͍Έͪ • ͍ΘΏΔʮࣙॻҾ͖ʯʹ͑Δ • ΩʔͱͷϖΞΛอଘͰ͖ΔʢPerl Ͱ͍͏ͱϋογϡͱͯ͑͠Δʣ • ঢ়ଶΛͨͲΔ͚ͩͳͷͰݕࡧ͕ߴ •
ͱ͘ʹ ڞ௨಄ࣙݕࡧ (common prefix search) Ͱ༗ར • ͪΖΜ શҰகݕࡧ (exact match) Ͱ͖Δ • ಄ࣙඌ͕ࣙڞ༗͞ΕΔͷͰলϝϞϦ
FST ͷԠ༻ઌ • ݕࡧΤϯδϯͷࣙॻͱͯ͠ • Apache Lucene ͷίΞΞϧΰϦζϜͱͯ͠ɺ৭Μͳͱ͜ΖͰΘΕ͍ͯΔ • ओʹ୯ޠΛϧοΫΞοϓ͢ΔͨΊʹΘΕΔ
• ܗଶૉղੳثͷࣙॻͱͯ͠ • Janome (Python), Kuromoji (Java) Ͱ࠾༻͞Ε͍ͯΔ • ߴͳ common prefix search ͕ඞཁ • ԻೝࣝͷݴޠϞσϧͱͯ͠ • ॏΈ͖ FST (Weighted FST; WFST) ͕ΘΕΔ • https://www.slideshare.net/JiroNishitoba/wfst-61929888
Minimal Acyclic Subsequential Transducer Minimal ࠷খͷ Acyclic ϧʔϓͷͳ͍ Subsequential ෦(จࣈ)ྻͷ
Transducer มث “takuya” => “a” “takaya” => “n”
TRIE • ಄ࣙͷΈΛڞ༗͢Δσʔλߏ • πϦʔʹͳΔ • ඌࣙڞ༗Ͱ͖ͳ͍ • TAIL ྻͱ͍͏ςΫχοΫͰ
Ұ෦ڞ༗Ͱ͖Δ FST TRIE
Minimal Acyclic Subsequential Transducer ͷߏங • ཧ্࠷খͷ FST Λஞ࣍తʹߏஙͰ͖ΔΞϧΰϦζϜ͕͋Δ •
ৄ͘͠ҎԼͷจΛಡΜͰʂ • Mihov & Maurel (2001), Direct Construction of Minimal Acyclic Subsequential Transducers http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.3698 • จதͷٙࣅίʔυɺ46ߦ͕ؒҧ͑ͯΔ͔ΒؾΛ͚ͭͯͶ • ޡ: SET_OUTPUT • ਖ਼: SET_STATE_OUTPUT
Minimal Acyclic Subsequential Transducer ͷ࣮ • https://github.com/takuyaa/cdarts • Java Ͱॻ͍ͨ
• Lucene ͷ FST jdartsclone ͱൺֱ͢ΔͨΊ • ଞͷ࣮ • Java: https://github.com/apache/lucene-solr/tree/master/lucene/core/src/java/org/apache/lucene/util/fst • Go: https://github.com/ikawaha/mast • Python: https://github.com/mocobeta/janome/blob/master/janome/fst.py • Rust: https://github.com/BurntSushi/fst
࣮ݧʂ
සग़ӳ୯ޠͷ TRIE ͱ FST • Lucene ͷετοϓϫʔυΛΩʔɺ࿈൪Λͱͯ͠ߏங • શΩʔ: 33
• શจࣈ: 97 • TRIE • ঢ়ଶ: 58 • ભҠ: 57 • FST (Minimal Acyclic Subsequential Transducer) • ঢ়ଶ: 25 • ભҠ: 51 FST TRIE
ϙέϞϯӳมثͷ TRIE ͱ FST • ϙέϞϯͷӳޠ໊ΛΩʔɺຊޠ໊Λͱͯ͠ߏங • શΩʔ: 151 •
શจࣈ: 1103 • TRIE • ঢ়ଶ: 809 • ભҠ: 808 • FST (Minimal Acyclic Subsequential Transducer) • ঢ়ଶ: 459 • ભҠ: 604 FST TRIE
FST Λ֦େͨ͠ͷ ※ UTF-8 ͰΤϯίʔυ͍ͯͯ͠ 1όΠτ͚ͩڞ༗͞ΕͨΓ͢Δ ͷͰද্ࣔจࣈԽ͚ͯ͠·͢
ࢀߟ • Mihov & Maurel (2001), Direct Construction of Minimal
Acyclic Subsequential Transducers http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.3698 • Finite-state automata and directed acyclic graphs http://www.jandaciuk.pl/Fsm_algorithms/ • Changing Bits: Using Finite State Transducers in Lucene http://blog.mikemccandless.com/2010/12/using-finite-state-transducers-in.html • moco(beta)'s backup: [༁] Using Finite State Transducers in Lucene https://mocobeta-backup.tumblr.com/post/105777650158/using-finite-state-transducers-in-lucene • Index 1,600,000,000 Keys with Automata and Rust - Andrew Gallant's Blog https://blog.burntsushi.net/transducers/ • moco(beta)'s backup: Lucene FST ͷΞϧΰϦζϜ (1) ʙਤղฤʙ https://mocobeta-backup.tumblr.com/post/111076688132/lucene-fst-1 • moco(beta)'s backup: Lucene FST ͷΞϧΰϦζϜ (2) ʙ࣮ฤʙ https://mocobeta-backup.tumblr.com/post/113693778372/lucene-fst-2 • LuceneͰΘΕͯΔFSTΛ࣮ͯ͠Έͨʢਖ਼نදݱϚονɿVMΞϓϩʔνͷটʣ - Qiita https://qiita.com/ikawaha/items/be95304a803020e1b2d1 • Minimal Acyclic Subsequential TransducerͰ༡Ϳ - Negative/Positive Thinking https://jetbead.hatenablog.com/entry/20151014/1444756877