Minimal Acyclic Subsequential Transducer That Even a Dog Can Understand / Introduction to Minimal Acyclic Subsequential Transducer

Slides from a lightning talk (LT) given at a Hatena tech study session.

Takuya Asano

June 27, 2019

Transcript

  1. 2.

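    FSA and FST
    • FSA (Finite State Automaton; 有限状態オートマトン in Japanese)
        • Returns a bool indicating whether the input sequence is accepted
    • FST (Finite State Transducer; 有限状態変換器 in Japanese)
        • A variant of the FSA
        • Returns an output sequence when the input sequence is accepted
    • A Minimal Acyclic Subsequential Transducer is a kind of FST
    Examples: an FSA for { “onk” }, an FST for { “onk” => “おんく” }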
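The difference between the two machines on this slide can be sketched in a few lines of Python. This is only an illustration built by hand for the single key "onk": the transition table, the state numbers, the function names, and the choice to emit the whole output on the first arc are assumptions made for the example, not something taken from the slides.

```python
# Hand-built machine for the single key "onk" (hypothetical layout).
# (state, input symbol) -> (next state, output emitted on that arc)
TRANSITIONS = {
    (0, "o"): (1, "おんく"),  # the whole output is placed on the first arc
    (1, "n"): (2, ""),
    (2, "k"): (3, ""),
}
FINAL_STATES = {3}


def fsa_accepts(word):
    """FSA view: follow the arcs and report only accept / reject."""
    state = 0
    for ch in word:
        if (state, ch) not in TRANSITIONS:
            return False
        state, _ = TRANSITIONS[(state, ch)]
    return state in FINAL_STATES


def fst_transduce(word):
    """FST view: follow the same arcs, but also concatenate their outputs."""
    state, output = 0, ""
    for ch in word:
        if (state, ch) not in TRANSITIONS:
            return None
        state, arc_output = TRANSITIONS[(state, ch)]
        output += arc_output
    return output if state in FINAL_STATES else None
```

Both functions walk the same states; the FSA discards the arc outputs, while the FST returns them once the input is accepted.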
  2. 3.

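    Uses of FSTs
    • Usable for so-called "dictionary lookup"
    • Can store key-value pairs (usable like a hash in Perl)
    • Lookup is fast because it only follows states
        • Especially advantageous for common prefix search
        • Exact match search is of course also possible
    • Memory-efficient because prefixes and suffixes are shared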
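To make "common prefix search" concrete, here is a minimal sketch. It uses a plain nested-dict trie as a stand-in for the FST (so suffixes are not shared), because the traversal is the same: one pass over the input, reporting a hit every time a final state is reached. The function names and the sample keys are hypothetical.

```python
def build_trie(pairs):
    """Build a nested-dict trie; the None key marks a final state and holds the value."""
    root = {}
    for key, value in pairs:
        node = root
        for ch in key:
            node = node.setdefault(ch, {})
        node[None] = value
    return root


def common_prefix_search(root, text):
    """Return every (key, value) whose key is a prefix of text, in one traversal."""
    results = []
    node = root
    for i, ch in enumerate(text):
        if ch not in node:
            break
        node = node[ch]
        if None in node:
            results.append((text[: i + 1], node[None]))
    return results


def exact_match(root, key):
    """Return the value stored for key, or None if key is not in the dictionary."""
    node = root
    for ch in key:
        if ch not in node:
            return None
        node = node[ch]
    return node.get(None)
```

With the keys "a", "an", "and", and "are", searching the text "android" yields the three entries "a", "an", and "and" after reading only four characters, which is the access pattern a morphological analyzer needs at every position of a sentence.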
  3. 4.

    Applications of FSTs
    • As a search engine dictionary
        • Used in many places as a core algorithm of Apache Lucene
        • Mainly used to look up terms
    • As a dictionary for morphological analyzers
        • Adopted by Janome (Python) and Kuromoji (Java)
        • Fast common prefix search is required
    • As a language model for speech recognition
        • Weighted FSTs (WFST) are used
        • https://www.slideshare.net/JiroNishitoba/wfst-61929888
  4. 7.

    Building a Minimal Acyclic Subsequential Transducer
    • There is an algorithm that incrementally builds the theoretically minimal FST
    • For details, read the following paper:
        • Mihov & Maurel (2001), Direct Construction of Minimal Acyclic Subsequential Transducers
          http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.3698
    • Beware: line 46 of the pseudocode in the paper is wrong
        • Wrong: SET_OUTPUT
        • Correct: SET_STATE_OUTPUT
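The incremental construction can be sketched roughly as follows. This is a simplified reading of the Mihov & Maurel procedure, not the paper's pseudocode: it assumes keys are non-empty, unique, and given in sorted order, it keeps one output string per key instead of a set, and it allocates fresh states instead of clearing reused ones. The names `State`, `build_fst`, and `lookup` are hypothetical.

```python
class State:
    def __init__(self):
        self.final = False
        self.final_output = ""  # output emitted when the input ends in this state
        self.arcs = {}          # char -> [output string, target State]

    def signature(self):
        # All targets are already minimized here, so identity comparison suffices.
        arcs = tuple(sorted((ch, out, id(dst)) for ch, (out, dst) in self.arcs.items()))
        return (self.final, self.final_output, arcs)


def common_prefix_len(a, b):
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def build_fst(pairs):
    """pairs: (key, output) string tuples; keys non-empty, unique, sorted."""
    registry = {}  # signature -> minimized State

    def find_minimized(state):
        return registry.setdefault(state.signature(), state)

    temp = [State()]  # temp[i] is the state reached after i chars of the current key
    prev = ""
    for word, output in pairs:
        p = common_prefix_len(prev, word)
        # Minimize the tail of the previous key that the new key does not share.
        for i in range(len(prev), p, -1):
            temp[i - 1].arcs[prev[i - 1]][1] = find_minimized(temp[i])
        # Allocate fresh states for the suffix of the new key.
        del temp[p + 1:]
        for i in range(p + 1, len(word) + 1):
            temp.append(State())
            temp[i - 1].arcs[word[i - 1]] = ["", temp[i]]
        temp[len(word)].final = True
        temp[len(word)].final_output = ""  # the step the slide calls SET_STATE_OUTPUT
        # Along the shared prefix, keep only the common part of each arc output
        # and push the remainder one state further down.
        for j in range(1, p + 1):
            arc = temp[j - 1].arcs[word[j - 1]]
            k = common_prefix_len(arc[0], output)
            pushed = arc[0][k:]
            arc[0] = arc[0][:k]
            for next_arc in temp[j].arcs.values():
                next_arc[0] = pushed + next_arc[0]
            if temp[j].final:
                temp[j].final_output = pushed + temp[j].final_output
            output = output[k:]
        # What is left of the new output goes on the first non-shared arc.
        temp[p].arcs[word[p]][0] = output
        prev = word
    # Minimize the path of the last key, then the initial state.
    for i in range(len(prev), 0, -1):
        temp[i - 1].arcs[prev[i - 1]][1] = find_minimized(temp[i])
    return find_minimized(temp[0])


def lookup(initial, word):
    """Exact match: return the output for word, or None if it is not accepted."""
    state, out = initial, ""
    for ch in word:
        if ch not in state.arcs:
            return None
        arc_out, state = state.arcs[ch]
        out += arc_out
    return out + state.final_output if state.final else None
```

A call such as `build_fst([("apr", "30"), ("aug", "31"), ("dec", "31")])` returns the initial state, which `lookup` can then traverse. The registry is what makes equivalent suffix states collapse into one, and the output-pushing loop is what keeps shared prefixes carrying only the output they have in common.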
  5. 8.

    Implementations of Minimal Acyclic Subsequential Transducer
    • https://github.com/takuyaa/cdarts
        • Written in Java
        • To compare against Lucene's FST and jdartsclone
    • Other implementations
        • Java: https://github.com/apache/lucene-solr/tree/master/lucene/core/src/java/org/apache/lucene/util/fst
        • Go: https://github.com/ikawaha/mast
        • Python: https://github.com/mocobeta/janome/blob/master/janome/fst.py
        • Rust: https://github.com/BurntSushi/fst
  6. 9.
  7. 10.

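    TRIE and FST for frequent English words
    • Built with Lucene's stop words as keys and sequential numbers as values
    • Number of keys: 33
    • Total number of characters: 97
    • TRIE
        • States: 58
        • Transitions: 57
    • FST (Minimal Acyclic Subsequential Transducer)
        • States: 25
        • Transitions: 51
    [Figure: FST and TRIE diagrams]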
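The TRIE figures above follow a simple rule: a trie has one state per distinct key prefix (including the empty one), and every state except the root has exactly one incoming arc, so transitions = states - 1 (58 and 57 here). The sketch below counts that, and also counts the states left after merging identical subtrees. Note the assumptions: the second function ignores the values entirely, so it measures a minimal acyclic automaton over the keys rather than the FST from the slide, and both function names are hypothetical.

```python
def trie_size(keys):
    """Return (states, transitions) of the trie over keys."""
    prefixes = {key[:i] for key in keys for i in range(len(key) + 1)}
    states = len(prefixes)
    return states, states - 1  # each non-root state has exactly one incoming arc


def minimal_automaton_size(keys):
    """Return (states, transitions) after merging identical subtrees (values ignored)."""
    root = {"final": False, "arcs": {}}
    for key in keys:
        node = root
        for ch in key:
            node = node["arcs"].setdefault(ch, {"final": False, "arcs": {}})
        node["final"] = True

    seen = {}  # subtree signature -> small integer id of the merged state

    def canon(node):
        # Two nodes are equivalent when they agree on finality and on their
        # (char, merged child) pairs; children are canonicalized first.
        arcs = tuple(sorted((ch, canon(child)) for ch, child in node["arcs"].items()))
        sig = (node["final"], arcs)
        return seen.setdefault(sig, len(seen))

    canon(root)
    states = len(seen)
    transitions = sum(len(sig[1]) for sig in seen)
    return states, transitions
```

For the two keys "pop" and "top", the trie has 7 states and 6 transitions, while the merged automaton has 4 states and 4 transitions because the shared suffix "op" is stored once. The transition count drops less than the state count since merged states keep all of their outgoing arcs, which matches the pattern in the slide's numbers.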
  8. 11.

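    TRIE and FST for a Pokémon English-to-Japanese converter
    • Built with the English names of Pokémon as keys and the Japanese names as values
    • Number of keys: 151
    • Total number of characters: 1103
    • TRIE
        • States: 809
        • Transitions: 808
    • FST (Minimal Acyclic Subsequential Transducer)
        • States: 459
        • Transitions: 604
    [Figure: FST and TRIE diagrams]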
  9. 13.

    References
    • Mihov & Maurel (2001), Direct Construction of Minimal Acyclic Subsequential Transducers
      http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.3698
    • Finite-state automata and directed acyclic graphs
      http://www.jandaciuk.pl/Fsm_algorithms/
    • Changing Bits: Using Finite State Transducers in Lucene
      http://blog.mikemccandless.com/2010/12/using-finite-state-transducers-in.html
    • moco(beta)'s backup: [Translation] Using Finite State Transducers in Lucene
      https://mocobeta-backup.tumblr.com/post/105777650158/using-finite-state-transducers-in-lucene
    • Index 1,600,000,000 Keys with Automata and Rust - Andrew Gallant's Blog
      https://blog.burntsushi.net/transducers/
    • moco(beta)'s backup: Lucene FST のアルゴリズム (1) 〜図解編〜 (The Lucene FST algorithm (1): illustrated edition)
      https://mocobeta-backup.tumblr.com/post/111076688132/lucene-fst-1
    • moco(beta)'s backup: Lucene FST のアルゴリズム (2) 〜実装編〜 (The Lucene FST algorithm (2): implementation edition)
      https://mocobeta-backup.tumblr.com/post/113693778372/lucene-fst-2
    • Luceneで使われてるFSTを実装してみた（正規表現マッチ：VMアプローチへの招待） - Qiita (Implementing the FST used in Lucene; regular expression matching: an invitation to the VM approach)
      https://qiita.com/ikawaha/items/be95304a803020e1b2d1
    • Minimal Acyclic Subsequential Transducerで遊ぶ - Negative/Positive Thinking (Playing with Minimal Acyclic Subsequential Transducer)
      https://jetbead.hatenablog.com/entry/20151014/1444756877