Slide 1

A NEW GENERATION OF SPEECH RECOGNITION SYSTEMS
N. V. Shmyrev, M. A. Pribyl
АЦ Технологии

Slide 2

АЦ Технологии
- 2003: a Russian voice for Festival
- 2009: Russian models for CMUSphinx, VoxForge
- 2011-2013: CMUSphinx on Android
- 2015-2019: Kaldi, Russian models for Kaldi

Slide 3

There is still room for improvement
- Ask Google/Yandex about "тропигабма" (a nonsense word)
- Walk around while dictating
- Interrupt each other
- Drop Chinese words into the conversation
- Shout
- Turn on music in the background

Slide 4

Open-source speech recognition
- The best accuracy and training speed
- Training on noisy data
- GPU decoding
- Good support
- Reproducible examples

Slide 5

Open-source speech recognition
The recipe for building "your own" system (a sketch follows the list):
- Download Kaldi
- Download ~2000 hours of speech
- Train
- Run it on a server
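A minimal sketch of that recipe as shell steps driven from Python. The mini_librispeech recipe here is a small illustrative stand-in: a real ~2000-hour system would need its own data-preparation stage and far more compute.

```python
import subprocess

def sh(cmd, cwd=None):
    """Run one shell step, stopping on the first failure."""
    subprocess.run(cmd, shell=True, check=True, cwd=cwd)

# 1. Download Kaldi and build it (see kaldi/INSTALL for prerequisites).
sh("git clone https://github.com/kaldi-asr/kaldi")
sh("make -j 4", cwd="kaldi/tools")
sh("./configure --shared && make depend -j 4 && make -j 4", cwd="kaldi/src")

# 2-4. An egs recipe bundles data download, training and decoding into
# one run.sh driver; swap in your own corpus for a production system.
sh("./run.sh", cwd="kaldi/egs/mini_librispeech/s5")
```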

Slide 6

Open-source speech recognition
Facebook wav2letter, Mozilla DeepSpeech, CTC Decode
- Good accuracy with 10,000 hours of data
- On 1,000 hours, accuracy is half that of Kaldi
- Training takes several weeks on 4 GPUs
- No confidence scores, no alternative decoding hypotheses
- Timestamps were only added in 03/2019
- A lot of marketing

Slide 7

Open-source speech synthesis
Nvidia Tacotron2 + WaveGlow
https://github.com/NVIDIA/waveglow
- Trains for 2 weeks on 2 x RTX 2080
- So-so quality
- Unnatural speech intonation

Slide 8

The current state of affairs
- Huge volumes of data
- Complex architectures
- Long, grueling training runs
- Megacorporations

Slide 9

Speech recognition at Google
A talk at ISCSLP 2018
- RNN-T transducer
- 27,000 hours of speech with copies repeated under distortions, plus 500,000 surnames: ~200,000 hours in total
- 64 TPUs
- A vocabulary of word pieces
- A 100 MB model for mobile phones

Slide 10

Google's BERT model
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
https://github.com/google-research/bert
- 16 TPUs (64 TPU chips)
- 4 days per training run (and ~50 attempts are needed)
- ~10 GB of text

Slide 11

Facebook's XLM model
Cross-lingual Language Model Pretraining
https://github.com/facebookresearch/XLM
- 64 Volta GPUs
- Trains for a week

Slide 12

Speech synthesis at Amazon
Robust Universal Neural Vocoding
https://arxiv.org/pdf/1811.06292.pdf
- Excellent quality
- 17 languages at once
- 74 speakers
- 140 hours of recordings
- ~20 GPUs so that one iteration takes a week

Slide 13

Data for speech recognition
- An average telecom generates 10,000 hours of data per day
- Voice assistant users: 2 million queries per day (2,000 hours, i.e. about 3.6 seconds per query)
- Television: 100 hours per day, 3,000 per month
- YouTube: 1 million hours of speech

Slide 14

Human-like qualities of AI
- Learning from noisy, unlabeled data
- Learning from a handful of examples
- Continuous learning
- Robust decision making
- Explainable decision making
- Applying life experience
- Transferring knowledge across similar situations (languages, styles)

Slide 15

Knowledge is memorization
Understanding deep learning requires rethinking generalization (2017)
https://arxiv.org/abs/1611.03530
Neural networks memorize random inputs; a toy version of that experiment is sketched below.
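A toy version of the paper's core experiment, sketched in PyTorch: a small network trained on labels that carry no signal at all still reaches near-perfect training accuracy. The sizes and optimizer settings here are arbitrary illustrative choices.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(512, 64)           # random "inputs"
y = torch.randint(0, 10, (512,))   # random labels: nothing to generalize from

model = nn.Sequential(nn.Linear(64, 512), nn.ReLU(), nn.Linear(512, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(2000):           # plain full-batch training
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

# With enough capacity, training accuracy approaches 100%: the network
# has memorized noise rather than learned any structure.
acc = (model(X).argmax(dim=1) == y).float().mean().item()
print(f"training accuracy on random labels: {acc:.2f}")
```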

Slide 16

A language model with trivial smoothing
Large Language Models in Machine Translation (2007)
https://www.aclweb.org/anthology/D07-1090.pdf
- No real smoothing for unseen n-grams (the paper's "stupid backoff"; a sketch follows below)
- 1.8 TB of data from the web
- BLEU 0.44 versus 0.43 for a model with proper smoothing
- An enormous model
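A minimal sketch of that scheme, which the paper calls "stupid backoff": raw relative frequencies, with a fixed factor (0.4 in the paper) applied each time the context is shortened for an unseen n-gram. The scores are deliberately unnormalized, so they are not probabilities; the toy corpus is just for illustration.

```python
from collections import Counter

def train_counts(tokens, max_order=3):
    """Count every n-gram up to max_order in one pass over the corpus."""
    counts = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

def stupid_backoff(counts, total, context, word, alpha=0.4):
    """Relative frequency if the n-gram was seen, otherwise back off."""
    ngram = context + (word,)
    if counts[ngram] > 0:
        return counts[ngram] / (counts[context] if context else total)
    if context:  # drop the oldest context word and retry, paying alpha
        return alpha * stupid_backoff(counts, total, context[1:], word, alpha)
    return 0.0   # the word was never seen at all

tokens = "the cat sat on the mat the cat ran".split()
counts = train_counts(tokens)
print(stupid_backoff(counts, len(tokens), ("the", "cat"), "sat"))  # seen trigram: 0.5
print(stupid_backoff(counts, len(tokens), ("the", "cat"), "mat"))  # backs off twice
```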

Slide 17

VOSK
- A speech database of 100,000 hours
- Many languages
- Fast search via smart hashing
- Fast addition of new examples
- Fast diagnostics of results

Slide 18

VOSK

Slide 19

Locality-sensitive hashing

Slide 20

Locality-sensitive hashing

Slide 21

Locality-sensitive hashing
1. Spectral representation
2. Wavelet transform, principal components
3. Binarization
4. MinHash
A sketch of these four steps follows below.
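A compact numpy sketch of the pipeline, in the spirit of Waveprint-style audio fingerprinting rather than VOSK's actual implementation. It takes the wavelet branch of step 2; the window sizes, the top-k threshold, and the number of hash functions are illustrative guesses.

```python
import numpy as np

def spectrogram(signal, win=256, hop=128):
    """1. Spectral representation: magnitude STFT with a Hann window."""
    frames = [signal[i:i + win] * np.hanning(win)
              for i in range(0, len(signal) - win, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1))

def haar_2d(img):
    """2. One level of a 2-D Haar wavelet transform (unnormalized)."""
    img = img[: img.shape[0] // 2 * 2, : img.shape[1] // 2 * 2]
    out = np.vstack(((img[0::2] + img[1::2]) / 2, (img[0::2] - img[1::2]) / 2))
    return np.hstack(((out[:, 0::2] + out[:, 1::2]) / 2,
                      (out[:, 0::2] - out[:, 1::2]) / 2))

def binarize(coeffs, k=200):
    """3. Binarization: keep only the k largest-magnitude coefficients."""
    flat = np.abs(coeffs).ravel()
    return (flat >= np.partition(flat, -k)[-k]).astype(np.uint8)

def minhash(bits, num_hashes=32, seed=0):
    """4. MinHash signature of the set of set-bit positions."""
    rng = np.random.default_rng(seed)
    positions = np.flatnonzero(bits)
    return [int(rng.permutation(len(bits))[positions].min())
            for _ in range(num_hashes)]

audio = np.random.randn(16000)  # stand-in for one second of speech
print(minhash(binarize(haar_2d(spectrogram(audio))))[:8])
```

Similar audio keeps similar top-magnitude coefficients, so the signatures of two close clips agree in many positions; bucketing segments by bands of the signature is what makes the lookup locality-sensitive and fast.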

Slide 22

Progress
http://github.com/alphacep/vosk
- A fast index over more than 1,000 hours of speech
- Up to 50% of segments are verified successfully
- Instant search and modification of the database

Slide 23

Future plans
http://github.com/alphacep/vosk
- Segmentation without Kaldi
- A distributed database
- Decoding on mobile devices
- Decoding of overlapping signals
- Terabyte computers for artificial intelligence

Slide 24

Contacts
Github: https://github.com/alphacep/vosk
Telegram: https://t.me/cmusphinx
Email: [email protected]