音声認識におけるサーバサイド開発 / Server Side on Voice Recognition

Slides from a talk given at LINE Developer Meetup #56 in KYOTO on July 25, 2019.
https://line.connpass.com/event/139283/

LINE Developers

July 25, 2019

Transcript

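  1. Engineering: SELF-INTRO

    • 市村 収太 (Ichimura Shuta)
    • Clova development team @ Kyoto
    • Speech recognition engineer since Sep. 2018; in charge of developing core speech recognition technology (Decoder development, acoustic model development)
    • Hobby: shrine hopping (Kyoto, Nara, Kyushu, Izumo, etc.)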
  2. SCOPE OF TODAY

    • Server components: NSpeech (Decoder), Models (AM, LM), NLU, NVoice (Speech Synthesis)
    • Clova Developer Team @ Kyoto
    • Today’s scope is the ASR
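  3. OVERVIEW OF THE ASR

    $\hat{W} = \arg\max_{W} p(W \mid X) \approx \arg\max_{W} p(X \mid W)\, p(W)$

    • $X_{\text{features}} = x_1, x_2, \ldots, x_k$; $W_{\text{words}} = w_1, w_2, \ldots, w_n$
    • $p(X \mid W)$: AM; $p(W)$: LM; $\arg\max$: Decoding; $\hat{W}$: Recog. Result
    • The AM and LM are built in advance; decoding runs online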
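The decoding rule on this slide, choosing the word sequence W that maximizes p(X|W) p(W), can be sketched in a few lines. This is only a toy illustration: the candidate sentences, their probabilities, and the `lm_weight` parameter are hypothetical values made up for the example, and a real decoder searches a graph rather than a fixed candidate list.

```python
import math

# Hypothetical toy scores; a real system obtains these from the AM and the LM.
AM_LOGPROB = {  # log p(X | W): how well each sentence matches the audio
    "recognize speech": math.log(0.020),
    "wreck a nice beach": math.log(0.025),
}
LM_LOGPROB = {  # log p(W): how plausible each sentence is as text
    "recognize speech": math.log(0.010),
    "wreck a nice beach": math.log(0.001),
}


def decode(candidates, lm_weight=1.0):
    """Return the candidate W maximizing log p(X|W) + lm_weight * log p(W)."""
    return max(
        candidates,
        key=lambda w: AM_LOGPROB[w] + lm_weight * LM_LOGPROB[w],
    )


if __name__ == "__main__":
    candidates = list(AM_LOGPROB)
    print(decode(candidates))                 # AM and LM combined
    print(decode(candidates, lm_weight=0.0))  # AM only
```

Working in the log domain turns the product p(X|W) p(W) into a sum, which avoids numerical underflow on long utterances.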
  4. FLOW OF THE ASR PROCESS

    • Steps: Extract Feature → Features to Phone → Phone Seq. to Word → Word to Sentence
    • Data: Speech → Features (FBank, MFCC, etc.) → Phone Sequence → Word → Text: 今日は雨です ("It is rainy today")
    • Models used: AM (DNN), HMM (HC.fst), Lexicon (L.fst), Grammar (G.fst); packaged as AM (*.nnet) and LM (HCLG.fst)
    • Developing Server Side in ASR: Training AM on Hadoop and GPU; Training LM on Hadoop; Developing Decoder
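As a rough illustration of the "Phone Seq. to Word" step, the sketch below segments a phone sequence into words with a pronunciation dictionary, which is the role a lexicon such as L.fst plays. The mini lexicon, the romanized words, and the phone symbols are all hypothetical; a real decoder does this inside a composed WFST with weights rather than with a plain dictionary.

```python
# Hypothetical mini lexicon: word -> phone sequence (romanized for illustration).
LEXICON = {
    "kyou": ("ky", "o", "u"),
    "wa": ("w", "a"),
    "ame": ("a", "m", "e"),
    "desu": ("d", "e", "s", "u"),
}


def phones_to_words(phones):
    """Segment a phone sequence into lexicon words; return None if impossible."""
    phones = tuple(phones)
    # best[i] holds one word sequence that covers phones[:i], or None.
    best = [None] * (len(phones) + 1)
    best[0] = []
    for i in range(len(phones)):
        if best[i] is None:
            continue
        for word, pron in LEXICON.items():
            j = i + len(pron)
            if phones[i:j] == pron and best[j] is None:
                best[j] = best[i] + [word]
    return best[-1]


if __name__ == "__main__":
    print(phones_to_words("ky o u w a a m e d e s u".split()))
```

This version keeps only the first segmentation it finds; choosing among competing segmentations is what the grammar and LM scores are for.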
  5. Developing AM

    • On Hadoop (MapReduce; a couple of days): Feat. and Transcript → Training Mono-Phone → Training Tri-Phone → Force Alignment
    • On GPUs (a couple of weeks): Feat. and Tri-Phone → Training Neural Net. (NN) → NN based AM
    • Pre-Process: Feat., G2P, etc.
    • MapReduce: ML; infer an alignment
    • Data: tens of millions
    • Tri-Phone example: a m e → sil-a+m a-m+e e+sil
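The mono-phone to tri-phone example on this slide can be reproduced with a short helper. This is a minimal sketch assuming the common left-center+right label notation and a `sil` boundary symbol; the function name is hypothetical. Note that it emits the full `m-e+sil` label for the last phone.

```python
def to_triphones(phones, boundary="sil"):
    """Expand a mono-phone sequence into left-center+right tri-phone labels."""
    # Pad both ends with the boundary symbol so every phone has two neighbours.
    padded = [boundary, *phones, boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


if __name__ == "__main__":
    print(to_triphones(["a", "m", "e"]))
```

Because each phone is modelled together with its neighbours, the number of distinct units grows quickly, which is one reason tri-phone training needs far more data than mono-phone training.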
  6. Developing LM

    • On Hadoop (a day): Transcripts (corpora) → Counting Words → Building N-Gram
    • On CPU (can't use Hadoop; a day, using a couple of hundred GB of memory): Lex and N-Gram → Building WFST → WFST based LM
    • Pre-Process: filtering, G2P, etc.; tens of millions; lots of processes
    • Building WFST: fixed prob.; WFST of Lex and N-Gram
    • Building the WFST is not suitable for distributed processing because of its graph structure
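The "Counting Words" and "Building N-Gram" steps can be illustrated with a small in-memory version. This is a sketch under simplifying assumptions: the two-sentence corpus is made up, the function names are hypothetical, and the probabilities are plain maximum-likelihood estimates without the smoothing a production LM would use.

```python
from collections import Counter


def count_ngrams(sentences, n=2):
    """Count n-grams over tokenized sentences, padding with <s> and </s>."""
    counts = Counter()
    for words in sentences:
        padded = ["<s>"] * (n - 1) + list(words) + ["</s>"]
        for i in range(len(padded) - n + 1):
            counts[tuple(padded[i:i + n])] += 1
    return counts


def bigram_probs(sentences):
    """Unsmoothed estimate p(word | prev) = c(prev, word) / c(prev, *)."""
    bigrams = count_ngrams(sentences, n=2)
    history = Counter()
    for (prev, _), c in bigrams.items():
        history[prev] += c
    return {bg: c / history[bg[0]] for bg, c in bigrams.items()}


if __name__ == "__main__":
    corpus = [["today", "is", "rainy"], ["today", "is", "sunny"]]
    for bigram, p in sorted(bigram_probs(corpus).items()):
        print(bigram, p)
```

Counting splits naturally into a map step (emit each n-gram) and a reduce step (sum the counts per n-gram), which is why this part fits Hadoop while the graph-structured WFST build does not.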
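  7. • Low latency is required (TAT: within a few hundred msec)
    • Recognition parameter tuning (trade-off between speed and recognition performance)
    • Performance improvement (mainly for issues where retraining the model would likely cause large side effects)
    • Domain support: merge a domain model into the base (large-vocabulary) model → so that uncommon words such as technical terms can be recognized
    • Corner-case handling: apply fixes for utterances that the current AM and LM tend to get wrong → reranking and pruning by AM score
      e.g.) error-prone cases
      24 -> 24回: 「回」 is output although it was not spoken
      ねーずーみー → メール12: a drawn-out speaking style produces a nonsense result
    • Text Normalization
      乃木坂フォーティーエイト → 乃木坂46
      Head line new → ヘッドラインニュース
    • Misrecognition analysis: developing tools to analyze whether an error has an acoustic or a linguistic cause; feedback to model training and Decoder development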
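The idea of pruning and reranking by AM score can be sketched on the "24" case from this slide, where the recognizer outputs an extra counter word that was never spoken. Everything here is hypothetical: the scores, the `beam` threshold, the weights, and the romanized text `24kai` standing in for 24回. It shows one plausible way such a fix could work, not the actual decoder logic.

```python
from typing import NamedTuple


class Hypothesis(NamedTuple):
    text: str
    am_score: float  # log-likelihood from the acoustic model
    lm_score: float  # log-probability from the language model


def prune_by_am(hyps, beam):
    """Drop hypotheses whose AM score is more than `beam` below the best one."""
    best = max(h.am_score for h in hyps)
    return [h for h in hyps if h.am_score >= best - beam]


def rerank(hyps, am_weight=1.0, lm_weight=1.0):
    """Sort hypotheses by the weighted sum of AM and LM scores, best first."""
    return sorted(
        hyps,
        key=lambda h: am_weight * h.am_score + lm_weight * h.lm_score,
        reverse=True,
    )


if __name__ == "__main__":
    # Made-up scores: the LM strongly prefers "24kai", the audio supports "24".
    hyps = [
        Hypothesis("24kai", am_score=-12.0, lm_score=-2.0),
        Hypothesis("24", am_score=-8.0, lm_score=-7.0),
    ]
    print(rerank(hyps)[0].text)                         # LM preference wins
    print(rerank(prune_by_am(hyps, beam=3.0))[0].text)  # acoustically weak one removed
```

Handling such cases at decode time avoids retraining the models, which matches the slide's concern about side effects.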