音声認識におけるサーバサイド開発 / Server Side on Voice Recognition

Slides for the talk given at LINE Developer Meetup #56 in KYOTO, held on 2019/7/25.
https://line.connpass.com/event/139283/

LINE Developers

July 25, 2019

Transcript

  1. Engineering: Server-Side Development in Speech Recognition

    Shuta Ichimura, Clova Developer Team @ Kyoto, LINE Corp.
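  2. SELF-INTRO

    • Shuta Ichimura
    • Clova Developer Team @ Kyoto
    • Speech recognition engineer since Sep. 2018
      In charge of developing core speech recognition technologies: decoder development, acoustic model development
    • Hobby: shrine hopping (Kyoto, Nara, Kyushu, Izumo, etc.)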
  3. SCOPE OF TODAY

    Server: NSpeech (Decoder), Models (AM, LM), NLU, NVoice (Speech Synthesis)
    Clova Developer Team @ Kyoto
    Today's scope is the ASR.
  4. CONTENTS

    1. Overview of the ASR
    2. Developing Models (AM, LM)
    3. Developing Decoder
    4. Q & A
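  5. OVERVIEW OF THE ASR

    $\hat{W} = \arg\max_W p(W \mid X) = \arg\max_W p(X \mid W)\, p(W)$
    $X = x_1, x_2, \ldots, x_k$ are the features and $W = w_1, w_2, \ldots, w_n$ are the words; $\hat{W}$ is the recognition result.
    $p(X \mid W)$ is the AM and $p(W)$ is the LM. Both are built in advance; decoding (the search for the argmax) runs online.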
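
The decision rule on this slide can be sketched in a few lines: given candidate word sequences, choose the one with the largest sum of the AM and LM log-probabilities. This is only an illustration. The candidate list, the score values, and the `lm_weight` parameter are assumptions made for the example and do not come from the talk; a real decoder searches a graph instead of enumerating candidates.

```python
from typing import NamedTuple, Sequence, Tuple


class Hypothesis(NamedTuple):
    words: Tuple[str, ...]
    am_logprob: float  # log p(X|W); made-up value standing in for the acoustic model
    lm_logprob: float  # log p(W); made-up value standing in for the language model


def best_hypothesis(hyps: Sequence[Hypothesis], lm_weight: float = 1.0) -> Hypothesis:
    """Return the hypothesis maximizing log p(X|W) + lm_weight * log p(W)."""
    return max(hyps, key=lambda h: h.am_logprob + lm_weight * h.lm_logprob)


# Two acoustically similar candidates (雨 "rain" vs. 飴 "candy", both read "ame").
candidates = [
    Hypothesis(("今日", "は", "雨", "です"), am_logprob=-42.0, lm_logprob=-6.0),
    Hypothesis(("今日", "は", "飴", "です"), am_logprob=-41.5, lm_logprob=-9.0),
]

best = best_hypothesis(candidates)
print(" ".join(best.words))
```

With these assumed scores the LM term decides between the two candidates; setting `lm_weight=0.0` leaves only the acoustic score and flips the choice.
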
  6. FLOW OF THE ASR PROCESS

    Speech → Extract Feature → Features (FBank, MFCC, etc.) → Features to Phone → Phone Sequence → Phone Seq. to Word → Word → Word to Sentence → Text: 今日は雨です ("It is rainy today")
    Models used: AM (DNN, *.nnet); LM (HCLG.fst), composed from HMM (HC.fst), Lexicon (L.fst), and Grammar (G.fst)
    Developing Server Side in ASR: Training AM on Hadoop and GPU; Training LM on Hadoop; Developing Decoder
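  7. DEVELOPING MODELS: development cycle

    Acoustic Model (AM): as a rule, the model is not retrained on a regular schedule. It is retrained when acoustic factors (microphone, background noise, etc.) change.
    Language Model (LM): as a rule, the model is rebuilt and updated every week, because it has to keep up with trends such as newly coined words.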
  8. Developing An Acoustic Model

  9. Developing AM

    Data: Tens of Millions
    Pre-Process: G2P etc.
    On Hadoop (MapReduce, A Couple of Days): Feat. and Transcript → Training Mono-Phone → Training Tri-Phone → Force Alignment (ML, infer an alignment) → Feat. and Tri-Phone
    On GPUs (A Couple of Weeks): Feat. and Tri-Phone → Training Neural Net. (NN) → NN based AM
    Tri-Phone example: a m e → sil-a+m a-m+e e+sil
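
The tri-phone labels at the end of this slide follow the common left-center+right notation. A minimal sketch of that expansion is below. The function name `to_triphones` and the choice to pad both ends with `sil` are assumptions for illustration; the last label on the slide reads `e+sil`, while this sketch always emits the full three-part form.

```python
from typing import List, Sequence


def to_triphones(phones: Sequence[str], boundary: str = "sil") -> List[str]:
    """Expand a phone sequence into left-center+right context labels.

    The sequence is padded with `boundary` on both sides so that the
    first and last phones also get a left and a right context.
    """
    padded = [boundary] + list(phones) + [boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


print(to_triphones(["a", "m", "e"]))
# ['sil-a+m', 'a-m+e', 'm-e+sil']
```
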
  10. Developing A Language Model

  11. Developing LM

    Pre-Process: Filtering, G2P etc.
    On Hadoop (A Day): Transcripts (corpora, Tens of Millions) → Counting Words → Building N-Gram → Lex and N-Gram
    On CPU (can't use Hadoop; a day, using a couple of hundred GB of memory): Lex and N-Gram → Building WFST (lots of processes, fixed prob.) → WFST based LM
    Building the WFST is not suitable for distributed processing because of its graph structure.
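
The "Counting Words → Building N-Gram" stage fits the map/reduce pattern because counts from separate transcripts can simply be added. The sketch below mimics that pattern in plain Python rather than on Hadoop. The two-sentence corpus, the function names, and the unsmoothed maximum-likelihood estimate are assumptions for illustration; a production LM would normally apply smoothing.

```python
from collections import Counter
from functools import reduce


def map_ngrams(sentence: str, n: int) -> Counter:
    """Map step: count the n-grams of one whitespace-tokenized transcript."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def reduce_counts(left: Counter, right: Counter) -> Counter:
    """Reduce step: merge partial counts (Counter addition sums per key)."""
    return left + right


# Tiny made-up corpus; each line stands for one pre-tokenized transcript.
corpus = ["今日 は 雨 です", "今日 は 晴れ です"]

unigrams = reduce(reduce_counts, (map_ngrams(s, 1) for s in corpus), Counter())
bigrams = reduce(reduce_counts, (map_ngrams(s, 2) for s in corpus), Counter())


def mle_bigram_prob(prev: str, word: str) -> float:
    """Unsmoothed estimate p(word | prev) = count(prev, word) / count(prev)."""
    return bigrams[(prev, word)] / unigrams[(prev,)]


print(mle_bigram_prob("は", "雨"))
```

Because the reduce step is just an addition of counts, the work can be split across machines in any order, which is not the case for the later WFST construction.
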
  12. Developing Decoder

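  13. Low latency is required (TAT: within a few hundred msec)

    • Tuning recognition parameters (trade-off between speed and recognition performance)
    • Performance improvement (mainly for issues where retraining the model would likely cause large side effects)
    • Domain adaptation: merge a domain model into the base (large-vocabulary) model → uncommon words such as technical terms become recognizable
    • Corner-case handling: take measures for utterances that the current AM and LM tend to get wrong → reranking and pruning based on the AM score
      e.g.) error-prone cases
      24 → 24回: the counter 回 ("times") is output even though it was not spoken
      ねーずーみー (a drawn-out "nezumi") → メール12 ("mail 12"): a drawn-out speaking style yields a nonsensical result
    • Text Normalization
      乃木坂フォーティーエイト → 乃木坂46
      Head line new → ヘッドラインニュース ("headline news")
    • Misrecognition analysis: develop tools to determine whether an error comes from acoustic or linguistic factors, and feed the results back into model training and decoder development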
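
The slide mentions reranking and pruning by AM score without giving details, so the following is only one possible reading: rescore an N-best list with a larger weight on the acoustic score, then drop entries that fall too far behind the best one. The class name, the weights, the `beam` threshold, and every score value are assumptions for illustration and do not describe the decoder presented in the talk.

```python
from typing import List, NamedTuple, Sequence


class NBestEntry(NamedTuple):
    text: str
    am_score: float  # acoustic log-score (made-up value)
    lm_score: float  # language-model log-score (made-up value)


def rerank_and_prune(
    nbest: Sequence[NBestEntry],
    am_weight: float = 1.0,
    lm_weight: float = 1.0,
    beam: float = 5.0,
) -> List[NBestEntry]:
    """Sort by weighted score, then keep entries within `beam` of the best."""

    def total(entry: NBestEntry) -> float:
        return am_weight * entry.am_score + lm_weight * entry.lm_score

    ranked = sorted(nbest, key=total, reverse=True)
    if not ranked:
        return []
    best = total(ranked[0])
    return [entry for entry in ranked if best - total(entry) <= beam]


# The LM prefers "24回" although only "24" was spoken (case from the slide).
nbest = [
    NBestEntry("24回", am_score=-30.0, lm_score=-4.0),
    NBestEntry("24", am_score=-27.0, lm_score=-8.0),
]

print([e.text for e in rerank_and_prune(nbest)])
print([e.text for e in rerank_and_prune(nbest, am_weight=2.0, beam=1.0)])
```

With equal weights the LM-favored "24回" ranks first; doubling the assumed acoustic weight moves "24" to the top, and the tighter beam removes the other entry.
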
  14. LANGUAGES ETC.

    • C/C++, Python, Scala, etc.
    • Kaldi, Hadoop
  15. Thank you for your attention. Questions?

    END OF DOCUMENT