FLOW OF THE ASR PROCESS
  Speech -> Features (FBank, MFCC, etc.) -> AM (DNN) -> Phone Sequence -> Phone Seq. to Word -> Word -> Text (Japanese example sentence)
  Models used:
    AM (*.nnet)
    LM (HCLG.fst), which uses:
      HMM (HC.fst)
      Lexicon (L.fst)
      Grammar (G.fst)

In ASR
  - Training AM on Hadoop and GPU
  - Training LM on Hadoop
  - Developing Decoder
  - Developing Server Side
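
The file names in the flow (HC.fst, L.fst, G.fst, HCLG.fst) suggest the usual WFST decoding-graph construction. As a sketch, assuming the names follow the common convention (H: HMM topology, C: context dependency, L: lexicon, G: grammar), which the slide itself does not spell out, the decoding graph would be the composition:

```latex
\[
\mathrm{HCLG} \;=\; \mathrm{HC} \circ \mathrm{L} \circ \mathrm{G},
\qquad
\mathrm{HC} \;=\; \mathrm{H} \circ \mathrm{C}
\]
```

Here $\circ$ denotes WFST composition. Toolkits typically also determinize and minimize between composition steps; whether this pipeline does so is not stated on the slide.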

Developing AM
  Data and transcript: tens of millions
  Pre-Process (MapReduce): Feat., G2P, etc.
  Tri-Phone Training (ML): infer an alignment
  Training Neural Net. (NN) on GPUs: tri-phone, feat.
  Result: NN based AM
  Time: a couple of days; a couple of weeks
  Tri-phone example: a m e -> sil-a+m a-m+e m-e+sil
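
The tri-phone example on the slide (a m e expanded into context-dependent labels such as sil-a+m and a-m+e) can be sketched in a few lines. This is a minimal illustration, not the pipeline's actual code: the function name `to_triphones` and the `boundary` parameter are hypothetical, and it assumes the common `left-center+right` notation with `sil` padding at both ends, so the last label comes out with its full left context.

```python
def to_triphones(phones, boundary="sil"):
    """Expand a phone sequence into left-center+right context labels.

    Assumption: the utterance is padded with `boundary` on both sides.
    """
    padded = [boundary] + list(phones) + [boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


print(to_triphones(["a", "m", "e"]))
# -> ['sil-a+m', 'a-m+e', 'm-e+sil']
```

Each monophone becomes one label that records its neighbors, which is why the number of distinct units grows quickly and tri-phone training needs a large amount of data.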

Developing LM
  Data: tens of millions
  Pre-Process (use Hadoop): filtering, G2P, etc. -> Lex and N-Gram. Takes a day.
  Building WFST: Lex and N-Gram -> WFST with fixed prob. Lots of processes. Takes a day and uses a couple of hundred GB of memory. Not suitable for distributed processing because of the graph structure.
  Result: WFST based LM
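
The N-gram part of the pre-processing step distributes well because each sentence can be counted independently and the partial counts merged afterwards, unlike the WFST build. The sketch below shows that map/reduce shape in plain Python. It is not Hadoop code: `map_ngrams`, `reduce_counts`, and `count_ngrams` are hypothetical names, and the `<s>` / `</s>` sentence markers and whitespace tokenization are assumptions made for illustration.

```python
from collections import Counter
from functools import reduce


def map_ngrams(sentence, n=3):
    """Map step: count all 1..n-grams in one whitespace-tokenized sentence."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return Counter(
        tuple(tokens[i:i + k])
        for k in range(1, n + 1)
        for i in range(len(tokens) - k + 1)
    )


def reduce_counts(left, right):
    """Reduce step: merge two partial count tables."""
    return left + right


def count_ngrams(corpus, n=3):
    """Run the map step per sentence, then fold the partial counts together."""
    return reduce(reduce_counts, (map_ngrams(s, n) for s in corpus), Counter())


counts = count_ngrams(["a b", "a c"], n=2)
print(counts[("<s>", "a")])
# -> 2
```

Turning such counts into the fixed probabilities stored on the WFST arcs would additionally need smoothing and back-off, which this sketch leaves out.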