Minimal Acyclic Subsequential Transducer, Explained So Even a Dog Can Understand / Introduction to Minimal Acyclic Subsequential Transducer

Slides from a lightning talk I gave at Hatena's tech study session.

Takuya Asano

June 27, 2019

Transcript

  1. Minimal Acyclic Subsequential Transducer, Explained So Even a Dog Can Understand
    2019-06-27 Hatena tech study session
    id:takuya-a

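  2. FSA and FST
    • FSA (Finite State Automaton)
      • Japanese term: 有限状態オートマトン
      • Returns a bool indicating whether the input sequence is accepted
    • FST (Finite State Transducer)
      • Japanese term: 有限状態変換器
      • A kind of FSA
      • Returns an output sequence when the input sequence is accepted
    • A Minimal Acyclic Subsequential Transducer is a kind of FST

    FSA: { “onk” }
    FST: { “onk” => “おんく” }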
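
To make the FSA/FST distinction concrete, here is a minimal sketch of a single machine for the “onk” example. The transition table, the state numbers, and the `run` function are hypothetical names chosen for illustration, and placing the whole output on the first transition is just one possible choice.

```python
# (state, input symbol) -> (next state, output emitted on that transition).
# Hypothetical hand-built machine for the single pair "onk" => "おんく".
TRANSITIONS = {
    (0, "o"): (1, "おんく"),
    (1, "n"): (2, ""),
    (2, "k"): (3, ""),
}
FINAL_STATES = {3}


def run(text):
    """Return (accepted, output); output is None when the input is rejected."""
    state = 0
    emitted = []
    for symbol in text:
        step = TRANSITIONS.get((state, symbol))
        if step is None:
            return False, None  # no transition: the input is rejected
        state, output = step
        emitted.append(output)
    if state not in FINAL_STATES:
        return False, None  # input ended in a non-final state
    return True, "".join(emitted)


accepted, output = run("onk")
```

Looking only at `accepted` gives the FSA view (a bool); looking at `output` as well gives the FST view.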

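  3. What FSTs are good for
    • They can be used for so-called "dictionary lookup"
      • They store key-value pairs (in Perl terms, they can be used as a hash)
    • Lookup only follows states, so search is fast
      • Especially advantageous for common prefix search
      • Exact match search is of course possible too
    • Prefixes and suffixes are shared, so memory usage is low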
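
The two lookup styles above can be sketched as follows. This is an illustration under stated assumptions: it uses a plain nested-dict trie rather than an FST, and `build_trie`, `exact_match`, `common_prefix_search`, and the sample keys are hypothetical. The point is only that both searches walk one state per input character.

```python
VALUE = ""  # sentinel key; it can never collide with a one-character edge label


def build_trie(pairs):
    """Build a nested-dict trie; node[VALUE] holds the value of a stored key."""
    root = {}
    for key, value in pairs.items():
        node = root
        for ch in key:
            node = node.setdefault(ch, {})
        node[VALUE] = value
    return root


def exact_match(trie, text):
    """Return the value stored for text, or None if text is not a key."""
    node = trie
    for ch in text:
        node = node.get(ch)
        if node is None:
            return None
    return node.get(VALUE)


def common_prefix_search(trie, text):
    """Return (prefix, value) for every stored key that is a prefix of text."""
    results = []
    node = trie
    for i, ch in enumerate(text):
        node = node.get(ch)
        if node is None:
            break
        if VALUE in node:
            results.append((text[: i + 1], node[VALUE]))
    return results


trie = build_trie({"a": 1, "an": 2, "and": 3, "the": 4})
hits = common_prefix_search(trie, "android")
```

A single left-to-right pass over "android" finds "a", "an", and "and", which is the access pattern a morphological analyzer's dictionary needs.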

  4. Where FSTs are applied
    • As the dictionary of a search engine
      • Used in many places as a core algorithm of Apache Lucene
      • Mainly used to look up terms
    • As the dictionary of a morphological analyzer
      • Adopted by Janome (Python) and Kuromoji (Java)
      • Fast common prefix search is required
    • As the language model in speech recognition
      • Weighted FSTs (WFST) are used
      • https://www.slideshare.net/JiroNishitoba/wfst-61929888

  5. Minimal Acyclic Subsequential Transducer
    Minimal: smallest
    Acyclic: has no loops
    Subsequential: of (character) subsequences
    Transducer: converter

    “takuya” => “a”
    “takaya” => “n”

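  6. TRIE
    • A data structure that shares only prefixes
      • It forms a tree
    • Suffixes cannot be shared
      • Partial sharing is possible with a technique called the TAIL array

    [Figure: an FST and a TRIE shown for comparison]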
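
The difference between prefix-only sharing and suffix sharing can be sketched by counting states. The sketch below makes two simplifying assumptions: it handles keys only (an acceptor with no outputs, so it is not a transducer), and it merges equivalent subtrees after the trie is fully built rather than constructing the minimal machine incrementally. All function names are hypothetical.

```python
def build_trie(keys):
    """Build a trie whose nodes are {'final': bool, 'next': {char: node}}."""
    root = {"final": False, "next": {}}
    for key in keys:
        node = root
        for ch in key:
            node = node["next"].setdefault(ch, {"final": False, "next": {}})
        node["final"] = True
    return root


def count_trie_states(node):
    """Count trie nodes: one state per distinct prefix, plus the root."""
    return 1 + sum(count_trie_states(child) for child in node["next"].values())


def count_merged_states(root):
    """Count states after merging subtrees that accept the same set of suffixes."""
    registry = {}  # signature -> id of the merged state

    def canonical(node):
        # Two nodes are equivalent when they agree on finality and on the
        # (label, merged target) pairs of their outgoing transitions.
        signature = (
            node["final"],
            tuple(sorted((ch, canonical(child)) for ch, child in node["next"].items())),
        )
        return registry.setdefault(signature, len(registry))

    canonical(root)
    return len(registry)


keys = ["takuya", "takaya"]
trie_states = count_trie_states(build_trie(keys))
merged_states = count_merged_states(build_trie(keys))
```

For these two keys the trie needs 10 states, while merging the shared suffix "ya" leaves 7. A transducer also has to place outputs on transitions, which can prevent some merges, so its counts can differ from this acceptor-only sketch.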

  7. Constructing a Minimal Acyclic Subsequential Transducer
    • There is an algorithm that can incrementally build a theoretically minimal FST
    • For details, read the following paper!
      • Mihov & Maurel (2001), Direct Construction of Minimal Acyclic Subsequential Transducers
        http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.3698
    • Watch out: line 46 of the pseudocode in the paper is wrong
      • Wrong: SET_OUTPUT
      • Correct: SET_STATE_OUTPUT

  8. Implementing a Minimal Acyclic Subsequential Transducer
    • https://github.com/takuyaa/cdarts
      • Written in Java
      • Built to compare against Lucene's FST and jdartsclone
    • Other implementations
      • Java: https://github.com/apache/lucene-solr/tree/master/lucene/core/src/java/org/apache/lucene/util/fst
      • Go: https://github.com/ikawaha/mast
      • Python: https://github.com/mocobeta/janome/blob/master/janome/fst.py
      • Rust: https://github.com/BurntSushi/fst

  9. Experiment!

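  10. TRIE and FST for frequent English words
    • Built with Lucene's stop words as keys and sequential numbers as values
      • Total keys: 33
      • Total characters: 97
    • TRIE
      • States: 58
      • Transitions: 57
    • FST (Minimal Acyclic Subsequential Transducer)
      • States: 25
      • Transitions: 51

    [Figure: the resulting FST and TRIE]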
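
The TRIE figures above can be cross-checked with a short sketch. It assumes the keys are Lucene's default English stop word set as listed below; the slide does not list the words, but this set matches the reported 33 keys and 97 characters. `trie_size` is a hypothetical helper. The FST counts are not reproduced here, since they depend on the construction algorithm and the outputs.

```python
# Assumption: Lucene's default English stop words (not listed in the slides).
STOP_WORDS = [
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such",
    "that", "the", "their", "then", "there", "these", "they", "this",
    "to", "was", "will", "with",
]


def trie_size(keys):
    """Return (states, transitions) of the trie that stores the given keys."""
    # Every distinct non-empty prefix corresponds to exactly one trie node.
    prefixes = {key[:i] for key in keys for i in range(1, len(key) + 1)}
    states = len(prefixes) + 1   # +1 for the root (the empty prefix)
    transitions = len(prefixes)  # each non-root node has one incoming edge
    return states, transitions


total_keys = len(STOP_WORDS)
total_chars = sum(len(word) for word in STOP_WORDS)
states, transitions = trie_size(STOP_WORDS)
```

Under that assumption the sketch yields 58 states and 57 transitions, in line with the TRIE numbers on the slide.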

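  11. TRIE and FST for a Pokémon English-to-Japanese transducer
    • Built with the English Pokémon names as keys and the Japanese names as values
      • Total keys: 151
      • Total characters: 1103
    • TRIE
      • States: 809
      • Transitions: 808
    • FST (Minimal Acyclic Subsequential Transducer)
      • States: 459
      • Transitions: 604

    [Figure: the resulting FST and TRIE]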

  12. Enlarged view of the FST
    Note: the strings are encoded in UTF-8, and in some places only the first byte is shared, so the characters appear garbled in the rendering.

  13. References
    • Mihov & Maurel (2001), Direct Construction of Minimal Acyclic Subsequential Transducers
      http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.3698
    • Finite-state automata and directed acyclic graphs
      http://www.jandaciuk.pl/Fsm_algorithms/
    • Changing Bits: Using Finite State Transducers in Lucene
      http://blog.mikemccandless.com/2010/12/using-finite-state-transducers-in.html
    • moco(beta)'s backup: [翻訳] Using Finite State Transducers in Lucene (Japanese translation of the post above)
      https://mocobeta-backup.tumblr.com/post/105777650158/using-finite-state-transducers-in-lucene
    • Index 1,600,000,000 Keys with Automata and Rust - Andrew Gallant's Blog
      https://blog.burntsushi.net/transducers/
    • moco(beta)'s backup: Lucene FST のアルゴリズム (1) 〜図解編〜 (The Lucene FST algorithm (1): illustrated edition)
      https://mocobeta-backup.tumblr.com/post/111076688132/lucene-fst-1
    • moco(beta)'s backup: Lucene FST のアルゴリズム (2) 〜実装編〜 (The Lucene FST algorithm (2): implementation edition)
      https://mocobeta-backup.tumblr.com/post/113693778372/lucene-fst-2
    • Luceneで使われてるFSTを実装してみた（正規表現マッチ：VMアプローチへの招待） - Qiita (I tried implementing the FST used in Lucene; regular expression matching: an invitation to the VM approach)
      https://qiita.com/ikawaha/items/be95304a803020e1b2d1
    • Minimal Acyclic Subsequential Transducerで遊ぶ - Negative/Positive Thinking (Playing with the Minimal Acyclic Subsequential Transducer)
      https://jetbead.hatenablog.com/entry/20151014/1444756877