Minimal Acyclic Subsequential Transducer, Explained So Even a Dog Can Understand / Introduction to Minimal Acyclic Subsequential Transducer

These are the slides from a lightning talk (LT) I gave at a Hatena tech study session.

Takuya Asano

June 27, 2019

Transcript

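  1. FSA and FST
    • FSA (Finite State Automaton)
      • Returns a bool telling whether the input sequence is accepted
    • FST (Finite State Transducer)
      • A type of FSA
      • When it accepts an input sequence, it returns an output sequence
    • A Minimal Acyclic Subsequential Transducer is a type of FST
    • (Figure: an FSA accepting { "onk" } and an FST mapping { "onk" => "おんく" })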
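To make the FSA/FST distinction concrete, here is a toy Python sketch of both machines for the single key "onk". The transition tables and the names `fsa_accepts` / `fst_transduce` are hypothetical. Splitting "おんく" one character per transition is only one possible way to place the output; the minimal-transducer construction discussed later pushes output toward the start state instead.

```python
from typing import Optional

# Hypothetical transition tables for the single key "onk"; state 0 is the start.
FSA_TRANS = {(0, "o"): 1, (1, "n"): 2, (2, "k"): 3}
FST_TRANS = {(0, "o"): (1, "お"), (1, "n"): (2, "ん"), (2, "k"): (3, "く")}
FINAL = {3}


def fsa_accepts(word: str) -> bool:
    """FSA: only answers whether `word` is accepted."""
    state = 0
    for ch in word:
        if (state, ch) not in FSA_TRANS:
            return False
        state = FSA_TRANS[(state, ch)]
    return state in FINAL


def fst_transduce(word: str) -> Optional[str]:
    """FST: returns the concatenated output if `word` is accepted, else None."""
    state, out = 0, []
    for ch in word:
        if (state, ch) not in FST_TRANS:
            return None
        state, piece = FST_TRANS[(state, ch)]
        out.append(piece)
    return "".join(out) if state in FINAL else None


print(fsa_accepts("onk"), fsa_accepts("on"))  # True False
print(fst_transduce("onk"))                   # おんく
```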
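  2. What FSTs are good for
    • They can be used for so-called "dictionary lookup"
      • They store key-value pairs (in Perl terms, they can be used like a hash)
    • Lookup is fast because it only follows state transitions
      • Especially advantageous for common prefix search
      • Exact match search is of course possible too
    • Memory-efficient, because prefixes and suffixes are shared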
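The two lookup patterns above can be sketched in Python over a plain trie (no state sharing), which keeps the traversal easy to follow. A minimal FST is walked the same way, one transition per input character; the difference is that its value is combined from the outputs along the path rather than read off the final node. `Node`, `exact_match` and `common_prefix_search` are illustrative names, not the API of any library mentioned in these slides.

```python
from typing import Dict, List, Optional, Tuple


class Node:
    """One trie state; `children` maps a character to the next state."""

    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.is_final = False  # True if some key ends here
        self.value: Optional[int] = None


def build(pairs: Dict[str, int]) -> Node:
    root = Node()
    for key, value in pairs.items():
        node = root
        for ch in key:
            node = node.children.setdefault(ch, Node())
        node.is_final, node.value = True, value
    return root


def exact_match(root: Node, key: str) -> Optional[int]:
    """Follow one transition per character; succeed only at a final state."""
    node = root
    for ch in key:
        node = node.children.get(ch)
        if node is None:
            return None
    return node.value if node.is_final else None


def common_prefix_search(root: Node, text: str) -> List[Tuple[str, int]]:
    """Every stored key that is a prefix of `text`, found in one left-to-right pass."""
    hits = []
    node = root
    if node.is_final:  # the empty key, if stored
        hits.append(("", node.value))
    for i, ch in enumerate(text):
        node = node.children.get(ch)
        if node is None:
            break
        if node.is_final:
            hits.append((text[: i + 1], node.value))
    return hits


trie = build({"new": 1, "news": 2, "newspaper": 3, "paper": 4})
print(exact_match(trie, "news"))                 # 2
print(common_prefix_search(trie, "newspapers"))  # [('new', 1), ('news', 2), ('newspaper', 3)]
```

Common prefix search costs a single walk over the input, no matter how many keys match, which is why it suits automata so well.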
  3. Where FSTs are used
    • As the dictionary of a search engine
      • Used in many places in Apache Lucene as one of its core algorithms
      • Mainly used to look up terms
    • As the dictionary of a morphological analyzer
      • Adopted by Janome (Python) and Kuromoji (Java)
      • Fast common prefix search is essential
    • As the language model in speech recognition
      • Weighted FSTs (WFST) are used
      • https://www.slideshare.net/JiroNishitoba/wfst-61929888
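As a rough illustration of why a morphological analyzer needs fast common prefix search: it can run one common prefix search from every position of the unsegmented input to collect candidate words (the nodes of a lattice). This is a simplified Python sketch assuming a tiny hypothetical dictionary with readings as values. Real analyzers such as Janome or Kuromoji additionally handle unknown words and choose the best path using costs, which is omitted here.

```python
from typing import Dict, List, Tuple

_FINAL = object()  # marker key; never equal to any character of the input


def build_trie(entries: Dict[str, str]) -> dict:
    """Nested-dict trie; the value stored under _FINAL is the word's reading."""
    root: dict = {}
    for word, reading in entries.items():
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[_FINAL] = reading
    return root


def lattice_candidates(trie: dict, text: str) -> List[Tuple[int, str, str]]:
    """Return (start, word, reading) for every dictionary word found in `text`."""
    found = []
    for start in range(len(text)):
        node = trie
        # Common prefix search on text[start:]: walk until the trie has no edge.
        for end in range(start, len(text)):
            node = node.get(text[end])
            if node is None:
                break
            if _FINAL in node:
                found.append((start, text[start:end + 1], node[_FINAL]))
    return found


# Hypothetical toy dictionary: surface form -> reading.
DICT = {"東": "ひがし", "東京": "とうきょう", "京": "きょう", "京都": "きょうと", "都": "と"}
for cand in lattice_candidates(build_trie(DICT), "東京都"):
    print(cand)
# (0, '東', 'ひがし')
# (0, '東京', 'とうきょう')
# (1, '京', 'きょう')
# (1, '京都', 'きょうと')
# (2, '都', 'と')
```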
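  4. Constructing a Minimal Acyclic Subsequential Transducer
    • There is an algorithm that builds the theoretically minimal FST incrementally, adding the input words in lexicographic order
    • For details, read the following paper!
      • Mihov & Maurel (2001), Direct Construction of Minimal Acyclic Subsequential Transducers
        http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.3698
    • Watch out: line 46 of the pseudocode in the paper is wrong
      • Wrong: SET_OUTPUT
      • Correct: SET_STATE_OUTPUT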
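The incremental idea can be sketched in Python as follows, in the spirit of Mihov & Maurel (2001) and of builders such as Lucene's, but written from the general technique rather than transcribed from the paper's pseudocode (so the line 46 erratum does not map onto it). Assumptions: keys arrive sorted and unique, outputs are strings, and `build`, `lookup`, `count` and the months data are hypothetical. States off the path of the most recent key are frozen and deduplicated through a register, and each transition keeps only the output shared by every key below it, pushing the rest one state further. Final states carry their own output (the final output of a subsequential transducer). Signatures are recomputed by sorting, so this is not tuned for speed.

```python
from typing import Dict, Iterable, List, Optional, Tuple


class State:
    """Transducer state; `trans` maps a character to (target state, output)."""

    def __init__(self) -> None:
        self.trans: Dict[str, Tuple["State", str]] = {}
        self.final = False
        self.final_out = ""  # extra output emitted when the input ends here

    def signature(self) -> tuple:
        # Frozen states are equal iff these match (targets are already canonical).
        return (self.final, self.final_out,
                tuple(sorted((c, id(t), o) for c, (t, o) in self.trans.items())))


def lcp_len(a: str, b: str) -> int:
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def build(pairs: Iterable[Tuple[str, str]]) -> State:
    """Build from (key, output) pairs whose keys are unique and sorted."""
    register: Dict[tuple, State] = {}
    root = State()
    path: List[State] = [root]  # path[i]: state after i chars of previous key
    prev, first = "", True

    def freeze_below(depth: int) -> None:
        # Deeper states never change again: swap each for a registered twin.
        while len(path) - 1 > depth:
            child = path.pop()
            ch = prev[len(path) - 1]
            canon = register.setdefault(child.signature(), child)
            path[-1].trans[ch] = (canon, path[-1].trans[ch][1])

    for key, out in pairs:
        if not first and key <= prev:
            raise ValueError("keys must be sorted and unique")
        p = lcp_len(prev, key)
        freeze_below(p)
        # Shared prefix: keep the common output, push the old remainder onward.
        for i in range(p):
            child, t_out = path[i].trans[key[i]]
            common = t_out[:lcp_len(t_out, out)]
            rest = t_out[len(common):]
            if rest:
                for c, (t, o) in list(child.trans.items()):
                    child.trans[c] = (t, rest + o)
                if child.final:
                    child.final_out = rest + child.final_out
            path[i].trans[key[i]] = (child, common)
            out = out[len(common):]
        for ch in key[p:]:  # fresh, not-yet-shared states for the new suffix
            new = State()
            path[-1].trans[ch] = (new, "")
            path.append(new)
        path[-1].final = True
        if len(key) > p:
            path[p].trans[key[p]] = (path[p + 1], out)  # remaining output
        else:
            path[-1].final_out = out  # only happens for an empty first key
        prev, first = key, False

    freeze_below(0)
    return root


def lookup(root: State, key: str) -> Optional[str]:
    state, out = root, []
    for ch in key:
        if ch not in state.trans:
            return None
        state, o = state.trans[ch]
        out.append(o)
    return "".join(out) + state.final_out if state.final else None


def count(root: State) -> Tuple[int, int]:
    """Return (states, transitions) reachable from the root."""
    seen, stack, n_trans = {id(root)}, [root], 0
    while stack:
        for t, _ in stack.pop().trans.values():
            n_trans += 1
            if id(t) not in seen:
                seen.add(id(t))
                stack.append(t)
    return len(seen), n_trans


MONTHS = sorted(dict(jan="31", feb="28", mar="31", apr="30", may="31", jun="30",
                     jul="31", aug="31", sep="30", oct="31", nov="30", dec="31").items())
fst = build(MONTHS)
print(lookup(fst, "feb"), lookup(fst, "fe"))  # 28 None
```

For example, `build([("cat", "A"), ("hat", "A")])` ends up with 4 states and 4 transitions, because both words share the whole "at" tail, whereas a trie needs 7 states.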
  5. Implementing a Minimal Acyclic Subsequential Transducer
    • https://github.com/takuyaa/cdarts
      • Written in Java
      • To compare it with Lucene's FST and jdartsclone
    • Other implementations
      • Java: https://github.com/apache/lucene-solr/tree/master/lucene/core/src/java/org/apache/lucene/util/fst
      • Go: https://github.com/ikawaha/mast
      • Python: https://github.com/mocobeta/janome/blob/master/janome/fst.py
      • Rust: https://github.com/BurntSushi/fst
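  6. TRIE vs. FST for frequent English words
    • Built with Lucene's stop words as keys and sequential numbers as values
    • Total keys: 33
    • Total characters: 97
    • TRIE
      • States: 58
      • Transitions: 57
    • FST (Minimal Acyclic Subsequential Transducer)
      • States: 25
      • Transitions: 51
    • (Figure: diagrams of the FST and the TRIE)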
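The TRIE column of the slide above can be reproduced with a short Python count. It assumes the key set is the 33-word default English stop word list that ships with Lucene, hard-coded below; that this is exactly the slide's key set is an assumption, though its 33 keys and 97 characters match. `trie_size` is a hypothetical helper that counts one state per distinct prefix plus the root.

```python
# Lucene's default English stop words (assumed to be the slide's key set).
STOP_WORDS = ["a", "an", "and", "are", "as", "at", "be", "but", "by", "for",
              "if", "in", "into", "is", "it", "no", "not", "of", "on", "or",
              "such", "that", "the", "their", "then", "there", "these", "they",
              "this", "to", "was", "will", "with"]


def trie_size(words):
    """Count states and transitions of a plain (unminimized) trie."""
    root = {}
    states, transitions = 1, 0  # the root state exists even for no words
    for w in words:
        node = root
        for ch in w:
            if ch not in node:  # a new prefix adds one state and one transition
                node[ch] = {}
                states += 1
                transitions += 1
            node = node[ch]
    return states, transitions


print(len(STOP_WORDS), sum(map(len, STOP_WORDS)))  # 33 97
print(trie_size(STOP_WORDS))                       # (58, 57)
```

The FST column depends on how outputs are assigned and pushed, so it is not reproduced here.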
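  7. TRIE vs. FST for a Pokémon English-to-Japanese converter
    • Built with Pokémon English names as keys and Japanese names as values
    • Total keys: 151
    • Total characters: 1103
    • TRIE
      • States: 809
      • Transitions: 808
    • FST (Minimal Acyclic Subsequential Transducer)
      • States: 459
      • Transitions: 604
    • (Figure: diagrams of the FST and the TRIE)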
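Dividing each FST count by the corresponding TRIE count from slides 6 and 7 puts the two experiments on the same scale:

```latex
\begin{aligned}
\text{Stop words:}\quad & \frac{25}{58} \approx 0.43 \ (\text{states}), & \frac{51}{57} \approx 0.89 \ (\text{transitions})\\
\text{Pok\'emon:}\quad & \frac{459}{809} \approx 0.57 \ (\text{states}), & \frac{604}{808} \approx 0.75 \ (\text{transitions})
\end{aligned}
```

In both cases the number of states shrinks far more than the number of transitions. That is expected: a trie is a tree, so it always has exactly one fewer transition than states (57 = 58 - 1, 808 = 809 - 1), whereas the FST is a DAG in which a merged state keeps every incoming transition.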
  8. References
    • Mihov & Maurel (2001), Direct Construction of Minimal Acyclic Subsequential Transducers
      http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.3698
    • Finite-state automata and directed acyclic graphs
      http://www.jandaciuk.pl/Fsm_algorithms/
    • Changing Bits: Using Finite State Transducers in Lucene
      http://blog.mikemccandless.com/2010/12/using-finite-state-transducers-in.html
    • moco(beta)'s backup: [Translation] Using Finite State Transducers in Lucene
      https://mocobeta-backup.tumblr.com/post/105777650158/using-finite-state-transducers-in-lucene
    • Index 1,600,000,000 Keys with Automata and Rust - Andrew Gallant's Blog
      https://blog.burntsushi.net/transducers/
    • moco(beta)'s backup: The Lucene FST Algorithm (1): Illustrated
      https://mocobeta-backup.tumblr.com/post/111076688132/lucene-fst-1
    • moco(beta)'s backup: The Lucene FST Algorithm (2): Implementation
      https://mocobeta-backup.tumblr.com/post/113693778372/lucene-fst-2
    • I Implemented the FST Used in Lucene (Regular Expression Matching: An Invitation to the VM Approach) - Qiita
      https://qiita.com/ikawaha/items/be95304a803020e1b2d1
    • Playing with the Minimal Acyclic Subsequential Transducer - Negative/Positive Thinking
      https://jetbead.hatenablog.com/entry/20151014/1444756877