Slide 1

Even a Dog Can Understand: Minimal Acyclic Subsequential Transducer
2019-06-27 Hatena Tech Study Session, id:takuya-a

Slide 2

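FSA and FST
• FSA (Finite State Automaton)
  • Returns a bool indicating whether it accepts the input sequence
• FST (Finite State Transducer)
  • A kind of FSA whose transitions also emit output
  • When it accepts an input sequence, it returns an output sequence
• A Minimal Acyclic Subsequential Transducer is a kind of FST
FSA example: { “onk” }   FST example: { “onk” => “おんく” }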
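A minimal sketch of the difference, under illustrative assumptions: both automata are hand-built for the single key “onk”, the state tables and the names `fsa_accepts` and `fst_transduce` are hypothetical, and splitting the output as o→お, n→ん, k→く is just one possible way to place the output on the edges.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical FSA for the key set { "onk" }: state -> {input char: next state}
FSA_TRANSITIONS: Dict[int, Dict[str, int]] = {0: {"o": 1}, 1: {"n": 2}, 2: {"k": 3}}
FSA_FINAL: Set[int] = {3}


def fsa_accepts(word: str) -> bool:
    """True iff the whole of `word` leads from state 0 to a final state."""
    state = 0
    for ch in word:
        nxt = FSA_TRANSITIONS.get(state, {}).get(ch)
        if nxt is None:
            return False
        state = nxt
    return state in FSA_FINAL


# Hypothetical FST for { "onk" => "おんく" }: same shape, but every edge also outputs a string.
FST_TRANSITIONS: Dict[int, Dict[str, Tuple[int, str]]] = {
    0: {"o": (1, "お")},
    1: {"n": (2, "ん")},
    2: {"k": (3, "く")},
}
FST_FINAL: Set[int] = {3}


def fst_transduce(word: str) -> Optional[str]:
    """The concatenated edge outputs if `word` is accepted, else None."""
    state, out = 0, []
    for ch in word:
        edge = FST_TRANSITIONS.get(state, {}).get(ch)
        if edge is None:
            return None
        state, piece = edge
        out.append(piece)
    return "".join(out) if state in FST_FINAL else None


print(fsa_accepts("onk"), fsa_accepts("on"))  # True False
print(fst_transduce("onk"))                   # おんく
```

The traversal loop is identical in both; the FST only adds an output per edge and returns the concatenation instead of a bool.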

Slide 3

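Uses of FSTs
• Can be used for so-called "dictionary lookup"
  • Stores key-value pairs (in Perl terms, usable like a hash)
• Lookup is fast because it only follows state transitions
  • Particularly well suited to common prefix search
  • Exact match is of course also supported
• Memory-efficient, because prefixes and suffixes are shared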
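As an illustration (not code from the talk), the sketch below shows exact match and common prefix search as a single walk over states, one character at a time. To stay short it uses a plain trie without suffix sharing; a walk over a minimal FST follows the same pattern while collecting edge outputs. The `Node` class, the function names and the sample dictionary are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class Node:
    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.value: Optional[int] = None  # set only where a key ends


def build(pairs: Dict[str, int]) -> Node:
    root = Node()
    for key, value in pairs.items():
        node = root
        for ch in key:
            node = node.children.setdefault(ch, Node())
        node.value = value
    return root


def exact_match(root: Node, key: str) -> Optional[int]:
    node: Optional[Node] = root
    for ch in key:
        node = node.children.get(ch)
        if node is None:
            return None
    return node.value


def common_prefix_search(root: Node, text: str) -> List[Tuple[str, int]]:
    """Every stored key that is a prefix of `text`, shortest first, in one pass."""
    hits: List[Tuple[str, int]] = []
    node: Optional[Node] = root
    for i, ch in enumerate(text):
        node = node.children.get(ch)
        if node is None:
            break
        if node.value is not None:
            hits.append((text[: i + 1], node.value))
    return hits


root = build({"in": 1, "into": 2, "is": 3})
print(common_prefix_search(root, "into the"))              # [('in', 1), ('into', 2)]
print(exact_match(root, "is"), exact_match(root, "int"))  # 3 None
```

Common prefix search costs one pass over the input no matter how many keys match, which is why morphological analyzers, which must find every dictionary word starting at each position, rely on it.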

Slide 4

Applications of FSTs
• As a dictionary in search engines
  • Used in many places as a core algorithm of Apache Lucene
  • Mainly used to look up terms
• As a dictionary in morphological analyzers
  • Adopted by Janome (Python) and Kuromoji (Java)
  • Fast common prefix search is required
• As a language model in speech recognition
  • Weighted FSTs (WFST) are used
  • https://www.slideshare.net/JiroNishitoba/wfst-61929888

Slide 5

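Minimal Acyclic Subsequential Transducer
Minimal: the smallest (fewest states)
Acyclic: has no loops
Subsequential: deterministic on the input, and each final state may emit one extra output string
Transducer: a converter from input sequences to output sequences
Example: “takuya” => “a”, “takaya” => “n”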
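A hand-built sketch of a subsequential transducer for these two entries, assuming each output is emitted as early as possible: the outputs are placed on the edges where the keys diverge (the fourth character), so the common suffix “ya” can be shared. A trie for the same two keys needs 10 states. `DELTA`, `FINAL_OUTPUT`, `transduce` and the state numbering are illustrative names, not from the talk.

```python
from typing import Dict, Optional, Tuple

# Hypothetical hand-built transducer: state -> {input char: (next state, output)}.
DELTA: Dict[int, Dict[str, Tuple[int, str]]] = {
    0: {"t": (1, "")},
    1: {"a": (2, "")},
    2: {"k": (3, "")},
    3: {"u": (4, "a"), "a": (4, "n")},  # outputs are emitted where the keys diverge...
    4: {"y": (5, "")},                  # ...so both keys can share the suffix "ya"
    5: {"a": (6, "")},
}
# Subsequential: each final state may emit one more string on acceptance.
FINAL_OUTPUT: Dict[int, str] = {6: ""}


def transduce(word: str) -> Optional[str]:
    state, out = 0, ""
    for ch in word:
        edge = DELTA.get(state, {}).get(ch)
        if edge is None:
            return None  # no transition: the input is rejected
        state, piece = edge
        out += piece
    if state not in FINAL_OUTPUT:
        return None  # input consumed, but not at a final state
    return out + FINAL_OUTPUT[state]


states = set(DELTA) | set(FINAL_OUTPUT)
transitions = sum(len(edges) for edges in DELTA.values())
print(transduce("takuya"), transduce("takaya"), len(states), transitions)  # a n 7 7
```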

Slide 6

TRIE
• A data structure that shares prefixes only
  • It forms a tree
  • Suffixes cannot be shared
    • A technique called the TAIL array allows partial sharing
[Figure: FST vs. TRIE]
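To see that a trie cannot share suffixes, here is a minimal sketch with hypothetical helpers `build_trie` and `count` that count the states and transitions of a nested-dict trie. Final-state flags are omitted because only the shape is counted. For “takuya” and “takaya” the suffix “ya” is stored twice, and because the structure is a tree there is always exactly one transition fewer than states.

```python
from typing import List, Tuple


def build_trie(keys: List[str]) -> dict:
    """Nested dicts: each dict is a state, each of its keys a labelled transition.
    Final-state flags are omitted because only the shape is counted here."""
    root: dict = {}
    for key in keys:
        node = root
        for ch in key:
            node = node.setdefault(ch, {})
    return root


def count(trie: dict) -> Tuple[int, int]:
    """Return (states, transitions) of the trie."""
    states, transitions, stack = 1, 0, [trie]
    while stack:
        node = stack.pop()
        for child in node.values():
            states += 1
            transitions += 1
            stack.append(child)
    return states, transitions


print(count(build_trie(["takuya", "takaya"])))  # (10, 9)
```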

Slide 7

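Constructing a Minimal Acyclic Subsequential Transducer
• There is an algorithm that builds the theoretically minimal FST incrementally, adding the keys one at a time in lexicographic order
• For details, read the paper!
  • Mihov & Maurel (2001), Direct Construction of Minimal Acyclic Subsequential Transducers
    http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.3698
• Watch out: line 46 of the pseudocode in the paper is wrong
  • Wrong: SET_OUTPUT
  • Correct: SET_STATE_OUTPUT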
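A compact Python sketch of the incremental construction, written after the general scheme of Mihov & Maurel's algorithm rather than transcribed from the paper's pseudocode or from the author's Java implementation; all names (`build_mast`, `find_minimized`, `lookup`, `size`) are hypothetical. Assumptions: keys are unique, non-empty and sorted, outputs are strings, and each final state carries a single final output (so multiple values per key are not supported). When a new key arrives, the part of the previous key it does not share can never be extended again, so those states are frozen and merged with any equivalent state already in a registry (same final flag, final output and outgoing edges). Outputs along the shared prefix are trimmed to their common prefix and the remainder is pushed one state further.

```python
from typing import Dict, List, Optional, Tuple


class State:
    def __init__(self) -> None:
        self.final = False
        self.final_output = ""               # emitted when a key ends here
        self.trans: Dict[str, "State"] = {}  # input char -> next state
        self.out: Dict[str, str] = {}        # input char -> output on that edge

    def key(self) -> Tuple:
        # Children are already registered (unique objects), so id() identifies them.
        edges = sorted((c, self.out.get(c, ""), id(t)) for c, t in self.trans.items())
        return (self.final, self.final_output, tuple(edges))


def common_prefix(a: str, b: str) -> str:
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return a[:n]


def build_mast(pairs: List[Tuple[str, str]]) -> State:
    """pairs: unique, non-empty keys in sorted order, each with a string output."""
    registry: Dict[Tuple, State] = {}

    def find_minimized(s: State) -> State:
        # Reuse an equivalent frozen state, else freeze `s` itself. No copy is
        # needed because this sketch never mutates a state after freezing it.
        return registry.setdefault(s.key(), s)

    temp = [State() for _ in range(max((len(w) for w, _ in pairs), default=0) + 1)]
    prev = ""
    for word, output in pairs:
        if not word or word <= prev:
            raise ValueError("keys must be non-empty and strictly increasing")
        p = len(common_prefix(prev, word))
        for i in range(len(prev), p, -1):  # freeze the unshared tail of `prev`
            temp[i - 1].trans[prev[i - 1]] = find_minimized(temp[i])
        for i in range(p + 1, len(word) + 1):  # fresh states for the new tail
            temp[i] = State()
            temp[i - 1].trans[word[i - 1]] = temp[i]
        temp[len(word)].final = True
        # Along the shared prefix, keep only the common part of the outputs
        # and push the remainder one state further from the start.
        for j in range(1, p + 1):
            c = word[j - 1]
            old = temp[j - 1].out.get(c, "")
            common = common_prefix(old, output)
            rest = old[len(common):]
            temp[j - 1].out[c] = common
            if rest:
                for ch in temp[j].trans:
                    temp[j].out[ch] = rest + temp[j].out.get(ch, "")
                if temp[j].final:
                    temp[j].final_output = rest + temp[j].final_output
            output = output[len(common):]
        temp[p].out[word[p]] = output  # the remainder goes on the first new edge
        prev = word
    for i in range(len(prev), 0, -1):
        temp[i - 1].trans[prev[i - 1]] = find_minimized(temp[i])
    return find_minimized(temp[0])


def lookup(start: State, word: str) -> Optional[str]:
    state, pieces = start, []
    for ch in word:
        if ch not in state.trans:
            return None
        pieces.append(state.out.get(ch, ""))
        state = state.trans[ch]
    return "".join(pieces) + state.final_output if state.final else None


def size(start: State) -> Tuple[int, int]:
    """(states, transitions) reachable from `start`."""
    seen, stack, edges = {id(start)}, [start], 0
    while stack:
        s = stack.pop()
        edges += len(s.trans)
        for t in s.trans.values():
            if id(t) not in seen:
                seen.add(id(t))
                stack.append(t)
    return len(seen), edges


fst = build_mast([("takaya", "n"), ("takuya", "a")])
print(lookup(fst, "takuya"), lookup(fst, "takaya"), size(fst))  # a n (7, 7)
```

On the “takuya”/“takaya” example the sketch ends with 7 states and 7 transitions, sharing the suffix “ya”. Unsorted input raises `ValueError`: the sorted-order requirement is exactly what makes it safe to freeze a state as soon as the next key diverges from it.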

Slide 8

Implementing a Minimal Acyclic Subsequential Transducer
• https://github.com/takuyaa/cdarts
  • Written in Java
  • To compare it with Lucene's FST and jdartsclone
• Other implementations
  • Java: https://github.com/apache/lucene-solr/tree/master/lucene/core/src/java/org/apache/lucene/util/fst
  • Go: https://github.com/ikawaha/mast
  • Python: https://github.com/mocobeta/janome/blob/master/janome/fst.py
  • Rust: https://github.com/BurntSushi/fst

Slide 9

Experiments!

Slide 10

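TRIE vs. FST for frequent English words
• Built with Lucene's stop words as keys and sequential numbers as values
  • Total keys: 33
  • Total characters: 97
• TRIE
  • States: 58
  • Transitions: 57
• FST (Minimal Acyclic Subsequential Transducer)
  • States: 25
  • Transitions: 51
[Figure: FST vs. TRIE]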
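A quick sanity check of the TRIE figures, assuming the keys are Lucene's default English stop-word set (the 33 words below; the slide itself does not list them). A trie has exactly one state per distinct prefix, with the root as the empty prefix, and exactly one incoming transition per non-root state, so counting prefixes is enough. The FST figures depend on how the values are encoded, so they are not recomputed here.

```python
# Assumption: the 33 keys are Lucene's default English stop words.
STOP_WORDS = [
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
]

# One trie state per distinct non-empty prefix, plus the root.
prefixes = {w[:i] for w in STOP_WORDS for i in range(1, len(w) + 1)}
trie_states = len(prefixes) + 1
trie_transitions = len(prefixes)  # tree: one incoming edge per non-root state

print(len(STOP_WORDS), sum(len(w) for w in STOP_WORDS))  # 33 97
print(trie_states, trie_transitions)                     # 58 57
```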

Slide 11

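TRIE vs. FST for a Pokémon English-to-Japanese converter
• Built with Pokémon English names as keys and Japanese names as values
  • Total keys: 151
  • Total characters: 1103
• TRIE
  • States: 809
  • Transitions: 808
• FST (Minimal Acyclic Subsequential Transducer)
  • States: 459
  • Transitions: 604
[Figure: FST vs. TRIE]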

Slide 12

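A zoomed-in view of the FST
※ The strings are UTF-8 encoded, so sometimes only the first byte of a character is shared; such labels appear garbled in the figure.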
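A small illustration of the note above; the helper name `shared_utf8_prefix` and the two katakana sample strings are chosen for illustration only. Their first characters differ, yet both UTF-8 encodings start with the byte 0xE3, so when common prefixes are computed over bytes, a shared edge can carry an incomplete character that decodes to the replacement character.

```python
def shared_utf8_prefix(a: str, b: str) -> bytes:
    """Longest common prefix of the UTF-8 encodings of `a` and `b`."""
    x, y = a.encode("utf-8"), b.encode("utf-8")
    n = 0
    while n < min(len(x), len(y)) and x[n] == y[n]:
        n += 1
    return x[:n]


shared = shared_utf8_prefix("アーボ", "ニャース")  # ア = E3 82 A2, ニ = E3 83 8B
print(shared)                                    # b'\xe3'
print(shared.decode("utf-8", errors="replace"))  # one U+FFFD replacement character
```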

Slide 13

References
• Mihov & Maurel (2001), Direct Construction of Minimal Acyclic Subsequential Transducers
  http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.3698
• Finite-state automata and directed acyclic graphs
  http://www.jandaciuk.pl/Fsm_algorithms/
• Changing Bits: Using Finite State Transducers in Lucene
  http://blog.mikemccandless.com/2010/12/using-finite-state-transducers-in.html
• moco(beta)'s backup: [Translation] Using Finite State Transducers in Lucene
  https://mocobeta-backup.tumblr.com/post/105777650158/using-finite-state-transducers-in-lucene
• Index 1,600,000,000 Keys with Automata and Rust - Andrew Gallant's Blog
  https://blog.burntsushi.net/transducers/
• moco(beta)'s backup: The Lucene FST Algorithm (1): Illustrated
  https://mocobeta-backup.tumblr.com/post/111076688132/lucene-fst-1
• moco(beta)'s backup: The Lucene FST Algorithm (2): Implementation
  https://mocobeta-backup.tumblr.com/post/113693778372/lucene-fst-2
• Implementing the FST Used in Lucene (Regex Matching: An Invitation to the VM Approach) - Qiita
  https://qiita.com/ikawaha/items/be95304a803020e1b2d1
• Playing with Minimal Acyclic Subsequential Transducers - Negative/Positive Thinking
  https://jetbead.hatenablog.com/entry/20151014/1444756877