
All you need is Google's AI Tech: I/O Extended Seoul 2019

Junseong
June 30, 2019


I gave a talk titled "All you need is Google's AI Tech" at I/O Extended Seoul 2019. From the papers published between 2018 and 2019, I picked six representative ones by category and gave a brief review of each.


Transcript

  1. Google I/O Extended Seoul: All you need is Google's AI Tech

    Junseong Kim (김준성), Machine Learning Engineer at ScatterLab
  2. Junseong Kim (김준성), Machine Learning Engineer at ScatterLab, Pingpong AI Research Team

    Crazy at NLP! Open-Domain Dialog System / Chatbot, Sentence Representation Model, Neural Machine Translation, Machine Learning on Service. github.com/codertimo fb.com/codertimo linkedin.com/in/codertimo codertimo.github.io
  3. Before Get Started… Originally, I planned to give an overview of the ML sessions from Google I/O 2019…

    (The fact that I had already prepared an hour's worth of Google I/O overview material in advance is no secret.)
  4. Before Get Started… At this rate it's going to be a disaster!!! I need to find an alternative.

    What talk topic would actually fit "All you need is Google's AI"?
  5. Google I/O AI Topics: People + AI, AI Fairness, TensorFlow

    Families (Lite, TFX, TF.js …), Jeff Dean and Hinton presentation/fireside chat, AutoML on Google Cloud, TPU Pod / Coral Edge TPU, climate change, Federated Learning. These are products for AI practitioners rather than Google AI research results; at Google I/O, product talk dominated.
  6. Let's Extend Google I/O! Google AI published so many excellent

    research results between 2018 and 2019, but almost none of that came up.. Oh! Then let's do a Research Overview! Let's survey the great training methods and models Google has built!
  7. Let's Extend Google I/O! Google AI published so many excellent

    research results between 2018 and 2019, but almost none of that came up.. Oh! Then let's do a Research Overview! Let's survey the great training methods and models Google has built! And let's keep it approachable, so a general audience, not just ML engineers, can follow along!
  8. • Publications • NLP • BERT: Pre-training of Deep Bidirectional

    Transformers for Language Understanding • The Evolved Transformer: NAS for a better Transformer model • Vision • Self-Supervised Tracking via Video Colorization • Speech Generation • Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron • Direct speech-to-speech translation with a sequence-to-sequence model • Speech Recognition • Streaming End-to-End Speech Recognition for Mobile Devices
  9. Motivation: A wide range of recent research has shown that if you first raise a model's

    'understanding' of a domain, it then learns tasks in that domain much faster and more accurately. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Building base techniques (models) that many downstream models can reuse is an important research topic.
  10. Motivation: Transfer Learning History of NLP. 1. Word2Vec / Bag-of-Words 2. Statistical Language Modeling

    (2000~) 3. Bi-LSTM Language Modeling (2012~) 4. Multi-Layer Bi-LSTM Language Modeling (2014) 5. ELMo: Deep Contextualized Word Representations (2017) 6. Universal Language Model Fine-Tuning for Text Classification (2018) 7. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018) 8. GPT-2: Language Models are Unsupervised Multitask Learners (2019) 9. XLNet: Generalized Autoregressive Pretraining for Language Understanding (2019)
  11. Motivation. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

    So how do we train a model so that it can understand text? Autoregressive language modeling: given the previous words, predict the next word. ['와', '진짜', '배고프다', '..', '뭐'] -> ['먹지?'] (roughly: "Wow, I'm really hungry.. what" -> "should I eat?") Input / Target output. Used by LSTM-LM, ELMo, GPT (see the sketch below).
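To make the autoregressive objective concrete, here is a minimal Python sketch (illustrative only, not code from the talk) that turns the slide's example sentence into training pairs:

    # Every prefix of the sentence predicts the next token.
    tokens = ["와", "진짜", "배고프다", "..", "뭐", "먹지?"]
    for i in range(1, len(tokens)):
        context, target = tokens[:i], tokens[i]
        print(context, "->", target)
    # The final pair is the slide's example: [..., '뭐'] -> '먹지?'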
  12. "손흥민은 토트넘 최고의 공격수로 자리 잡았다" ("Son Heung-min has established himself as Tottenham's best striker.")

    BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Model
  13. "손흥민은 토트넘 최고의 공격수로 자리 잡았다", shown again with key words masked

    out: "Machine learning, practice the language by guessing blanks like these! Hey!" BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Model
  14. "손흥민은 토트넘 최고의 공격수로 자리 잡았다": the model fills the masked words

    back in. "Machine learning, practice the language by guessing blanks like these! Hey!" "Yes, well, it's something like that." BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Model. A sketch of building such masked pairs follows below.
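What BERT actually trains on is the masked variant of this guessing game. A hedged sketch of how a masked-LM training pair is built (the ~15% rate is from the BERT paper; the whitespace tokenization is purely for illustration):

    import random

    # Hide random tokens; train the model to recover each one from BOTH
    # its left and right context (hence "bidirectional").
    tokens = "손흥민 은 토트넘 최고 의 공격수 로 자리 잡았다".split()

    inputs, targets = [], []
    for tok in tokens:
        if random.random() < 0.15:      # BERT masks ~15% of tokens
            inputs.append("[MASK]")
            targets.append(tok)         # loss only on masked positions
        else:
            inputs.append(tok)
            targets.append(None)
    print(inputs)
    print(targets)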
  15. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Model

    Also the inspiration for this talk's title… Attention Is All You Need. BERT stacks multiple layers of the self-attention-based Transformer encoder.
  16. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Model

    Transformer Layer, stacked multi-layer (12 layers in the base model)
  17. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Train

    Pretraining uses massive text corpora such as Wikipedia and BookCorpus (and a massive amount of time): this is the stage where the model learns to understand language.
  18. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Train

    Pretraining uses massive text corpora such as Wikipedia and BookCorpus (and a massive amount of time): learning to understand language. Cloud TPU Pod, 512 cores ($384 USD/h).
  19. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Train

    Fine-tuning on a subtask, e.g. a sentiment-analysis text classification problem. For the actual target task, fine-tune on a labeled dataset (this part doesn't take long). A fine-tuning sketch follows below.
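A minimal PyTorch-style sketch of the fine-tuning idea: keep the pretrained encoder and bolt a small, randomly initialized classification head on top. DummyEncoder is a stand-in for the real pretrained BERT so the snippet stays self-contained:

    import torch
    import torch.nn as nn

    class DummyEncoder(nn.Module):
        """Stand-in for a pretrained BERT encoder (illustration only)."""
        def __init__(self, vocab=30522, hidden=768):
            super().__init__()
            self.emb = nn.Embedding(vocab, hidden)
        def forward(self, input_ids):
            return self.emb(input_ids)              # (batch, seq, hidden)

    class SentimentClassifier(nn.Module):
        """Fine-tuning = pretrained encoder + a small, new task head."""
        def __init__(self, pretrained_encoder, hidden=768, num_classes=2):
            super().__init__()
            self.encoder = pretrained_encoder       # weights from pretraining
            self.head = nn.Linear(hidden, num_classes)  # randomly initialized
        def forward(self, input_ids):
            states = self.encoder(input_ids)        # (batch, seq, hidden)
            cls = states[:, 0]                      # first ([CLS]) position
            return self.head(cls)                   # class logits

    model = SentimentClassifier(DummyEncoder())
    print(model(torch.randint(0, 30522, (2, 16))).shape)  # torch.Size([2, 2])

Both the head and (usually) the whole encoder are updated on the small labeled dataset, which is why this stage is fast.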
  20. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Train

    Source: a Facebook post by Han Seongmin. There is even a paper criticizing the amount of CO2 emitted when training BERT..
  21. The Evolved Transformer. Motivation: Recently in the vision field, Neural

    Architecture Search using AutoML has been studied very actively. Machine-designed models are beating existing hand-crafted models in both speed and accuracy. https://ai.googleblog.com/2017/11/automl-for-large-scale-image.html
  22. The Evolved Transformer. Motivation: The Transformer proposed in

    Attention Is All You Need showed that a feed-forward network can outperform an RNN. So Google is actively using the Transformer architecture in models that write fiction or compose music. https://arxiv.org/abs/1706.03762 https://openai.com/blog/better-language-models/#sample5 GPT-2: Better Language Modeling for Generation https://magenta.tensorflow.org/music-transformer Music Transformer: Generating Music With Long-Term Structure
  23. The Evolved Transformer. Motivation: Then, starting from the Transformer architecture,

    couldn't we use NAS to find an even better model? Applying AutoML to Transformer Architecture https://ai.googleblog.com/2019/06/applying-automl-to-transformer.html
  24. The Evolved Transformer. Novel Method: Developing the Techniques. "To begin

    the evolutionary NAS, it was necessary for us to develop new techniques, due to the fact that the task used to evaluate the 'fitness' of each architecture, WMT'14 English-German translation, is computationally expensive. This makes the searches more expensive than similar searches executed in the vision domain, which can leverage smaller datasets, like CIFAR-10. The first of these techniques is warm starting: seeding the initial evolution population with the Transformer architecture instead of random models. This helps ground the search in an area of the search space we know is strong, thereby allowing it to find better models faster. The second technique is a new method we developed called Progressive Dynamic Hurdles (PDH), an algorithm that augments the evolutionary search to allocate more resources to the strongest candidates, in contrast to previous works, where each candidate model of the NAS is allocated the same amount of resources when it is being evaluated. PDH allows us to terminate the evaluation of a model early if it is flagrantly bad, allowing promising architectures to be awarded more resources." Summary: 1. Because the target task is a translation task, every single training run takes a long time. 2. And if the search starts from random models, the search space is far too wide. 3. So seed the evolution with the Transformer model (warm starting) and search from there! (technique 1) 4. As evolution proceeds, concentrate resources on the models that are doing best (the newly proposed PDH, technique 2). A sketch of these two ideas follows below.
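A toy Python sketch of those two ideas: warm starting the population from the Transformer, and a PDH-style hurdle that gives larger training budgets only to the stronger half of the candidates. The mutation and fitness functions here are random stand-ins, not the paper's actual search space:

    import random

    def evolve(seed_arch, mutate, fitness_at, budgets=(1_000, 10_000, 100_000)):
        """Toy evolutionary NAS with warm starting + PDH-style hurdles."""
        # Warm start: seed the population with the known-good Transformer
        # instead of purely random architectures.
        population = [seed_arch] + [mutate(seed_arch) for _ in range(31)]
        for budget in budgets:            # progressively larger training budgets
            scored = sorted(population, key=lambda a: fitness_at(a, budget),
                            reverse=True)
            # Hurdle: only the stronger half earns the next, larger budget,
            # so flagrantly bad candidates are terminated early.
            population = scored[: max(1, len(scored) // 2)]
        return population[0]

    # Toy usage with random stand-ins for mutation and fitness:
    best = evolve(
        seed_arch=0.5,
        mutate=lambda a: min(1.0, max(0.0, a + random.uniform(-0.1, 0.1))),
        fitness_at=lambda a, budget: a + random.random() * 0.01,
    )
    print(best)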
  25. The Evolved Transformer. Model: On the right is what NAS/AutoML found, the Evolved

    Transformer encoder block. Unlike the original Transformer architecture, it uses a hybrid of self-attention and wide convolutions rather than self-attention alone. Compared with the original on the left, the structure is not drastically different.
  26. The Evolved Transformer. Result: On translation tasks it achieves small but real gains

    in perplexity and BLEU score over the original Transformer. On the LM1B language-modeling task, perplexity improved by about 2.2 points.
  27. Self-Supervised Tracking via Video Colorization. Motivation: Video object

    tracking is a critically important model: it determines the performance of tasks that deal with moving objects, such as pose estimation, object injection, and video stylization.
  28. Self-Supervised Tracking via Video Colorization. Motivation: Doing object tracking

    used to require massive amounts of pixel-level labeling. But labeling at pixel × time (video) granularity is genuinely painful and time-consuming: at 30 frames per second, one minute of video means pixel-level labels for 1,800 images.
  29. Motivation. Self-Supervised Tracking via Video Colorization: There is plenty of video out there,

    but labeling it all is impossible… Could we learn object tracking from massive amounts of video without any labels..? YouTube
  30. Motivation. Self-Supervised Tracking via Video Colorization: If, out of all the video

    frames, only one frame is given in color, can the model colorize the frames whose color was removed? Then, even without any direct labeling, wouldn't the model's understanding of moving objects improve? After all, it has to color the moving objects by referencing a single still image! A sketch of this pointing mechanism follows below.
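The mechanism described in the blog post can be sketched in a few lines: the model colorizes a gray frame by pointing into the color reference frame with attention over pixel embeddings, so getting the colors right forces the embeddings to follow objects. Shapes and names below are illustrative:

    import torch

    def colorize_by_pointing(ref_embed, tgt_embed, ref_colors):
        """ref_embed: (N, D) embeddings of reference-frame pixels
           tgt_embed: (M, D) embeddings of gray target-frame pixels
           ref_colors: (N, 3) colors of the reference frame."""
        # Similarity of every target pixel to every reference pixel.
        attn = torch.softmax(tgt_embed @ ref_embed.T, dim=-1)   # (M, N)
        # Each target pixel's color is a weighted copy ("pointer") into
        # the reference frame's colors.
        return attn @ ref_colors                                # (M, 3)

    pred = colorize_by_pointing(torch.randn(50, 64), torch.randn(50, 64),
                                torch.rand(50, 3))
    print(pred.shape)  # torch.Size([50, 3])
    # At test time the same attention weights can copy labels (e.g. a
    # segmentation mask) instead of colors: that copying IS the tracking.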
  31. Model. Self-Supervised Tracking via Video Colorization: The model is trained on massive

    amounts of unlabeled video, so the authors wanted insight into what it had actually learned. They projected the embedding layer that encodes the video at pixel level down to 3D with PCA (sketched below). Each object separated naturally in embedding space -> nearest neighbors in embedding space can be treated as the same object.
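A short sketch of that visualization step, assuming hypothetical per-pixel embeddings (random here, purely for illustration):

    import numpy as np
    from sklearn.decomposition import PCA

    # Hypothetical per-pixel embeddings, shape (num_pixels, embed_dim):
    embeddings = np.random.randn(10_000, 64)
    coords_3d = PCA(n_components=3).fit_transform(embeddings)  # (num_pixels, 3)
    print(coords_3d.shape)
    # Rendering each pixel by its 3-D coordinate produces the visualization
    # on the slide: pixels of the same object cluster together.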
  32. Result. Self-Supervised Tracking via Video Colorization: Even on the pose-tracking

    (JHMDB) task, without a single supervised label, it showed performance close to existing heavily supervised models. It did not, however, decisively beat them (contrary to expectations?).
  33. Future Works. Self-Supervised Tracking via Video Colorization: "Future

    Work: Our results show that video colorization provides a signal that can be used for learning to track objects in videos without supervision. Moreover, we found that the failures from our system are correlated with failures to colorize the video, which suggests that further improving the video colorization model can advance progress in self-supervised tracking." Looking more closely, most tracking failures were cases where colorization itself had failed. So the plan is to work hard on models that colorize better (rather than adding more labels).
  34. Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron

    Motivation. aka Tacotron 2 / Shen, Jonathan, et al. "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions." 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2018. Tacotron 2 (Oct 2017): feed it a sentence and it generates speech that sounds like a person talking.
  35. Motivation: But human language is spoken in completely different styles depending on

    situation, emotion, and personality, like a dad imitating each character's voice while reading a bedtime story to his kids. So instead of simply turning a sentence (str) into voice, let's generate the voice with reference to prosody. Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron
  36. Motivation. Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with

    Tacotron. (Reference Audio) Kang Ho-dong: "이거↗ 정말↗ 맛있습니다~!" ("This! Is! Really tasty~!", with lively prosody). Input Voice (Tacotron output) Son Suk-hee: "이거 정말 맛있습니다." (flat). Output Voice Son Suk-hee: "이거↗ 정말↗ 맛있습니다~!" (same words, now with the reference prosody).
  37. Model: Prosody is added as an embedding to make the generation conditional. Towards End-to-End Prosody

    Transfer for Expressive Speech Synthesis with Tacotron. An attention over each token is added so the model can reference which part of the audio's prosody to imitate. A sketch of this conditioning follows below.
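A hedged sketch of this kind of conditioning: squeeze the reference mel spectrogram into one prosody vector and broadcast it onto the text encoder states that the decoder attends over. Dimensions and module choices are assumptions, not the paper's exact architecture:

    import torch
    import torch.nn as nn

    class ProsodyConditioning(nn.Module):
        """Sketch: condition TTS decoding on a reference-audio embedding."""
        def __init__(self, mel_dim=80, text_dim=256, prosody_dim=128):
            super().__init__()
            # Reference encoder: compress the reference mel spectrogram
            # into one fixed-size prosody vector.
            self.ref_encoder = nn.GRU(mel_dim, prosody_dim, batch_first=True)
            self.proj = nn.Linear(text_dim + prosody_dim, text_dim)

        def forward(self, text_states, ref_mel):
            _, h = self.ref_encoder(ref_mel)        # (1, batch, prosody_dim)
            prosody = h[-1].unsqueeze(1)            # (batch, 1, prosody_dim)
            prosody = prosody.expand(-1, text_states.size(1), -1)
            # Broadcast the prosody vector onto every text encoder step;
            # the (unshown) decoder then attends over these states.
            return self.proj(torch.cat([text_states, prosody], dim=-1))

    m = ProsodyConditioning()
    out = m(torch.randn(2, 20, 256), torch.randn(2, 120, 80))
    print(out.shape)  # torch.Size([2, 20, 256])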
  38. Result. Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with

    Tacotron. https://ai.googleblog.com/2018/03/expressive-speech-synthesis-with.html Given a text, voice generation that reflects the prosody of a reference voice.
  39. Problem. Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with

    Tacotron. https://ai.googleblog.com/2018/03/expressive-speech-synthesis-with.html But this always needs a reference whose input sentence is identical. Couldn't we turn a person's speaking style itself into a single embedding? Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis. Apply a style by feeding in a conditional variable, with no need to train a Tacotron per style.
  40. Problem. Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with

    Tacotron. https://ai.googleblog.com/2018/03/expressive-speech-synthesis-with.html Rather than mirroring the user's audio directly, attention is used to compress it into a single style embedding. The model catches how this person talks, conditions on that, and then generates. A sketch follows below.
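A hedged sketch of the Style Tokens idea: a learned bank of embeddings plus an attention that compresses a reference encoding into one style vector. All sizes are illustrative:

    import torch
    import torch.nn as nn

    class StyleTokenLayer(nn.Module):
        """Sketch of Style Tokens: a learned bank of style embeddings; a
        reference encoding attends over the bank to yield ONE style vector."""
        def __init__(self, num_tokens=10, token_dim=128, ref_dim=128):
            super().__init__()
            self.tokens = nn.Parameter(torch.randn(num_tokens, token_dim))
            self.query = nn.Linear(ref_dim, token_dim)

        def forward(self, ref_encoding):                        # (batch, ref_dim)
            q = self.query(ref_encoding)                        # (batch, token_dim)
            weights = torch.softmax(q @ self.tokens.T, dim=-1)  # (batch, num_tokens)
            return weights @ self.tokens                        # (batch, token_dim)

    style = StyleTokenLayer()(torch.randn(4, 128))
    print(style.shape)  # torch.Size([4, 128])
    # At inference the attention weights can be set by hand, with no
    # reference audio at all: the "conditional variable" mentioned above.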
  41. Result. Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with

    Tacotron. https://ai.googleblog.com/2018/03/expressive-speech-synthesis-with.html Using different style embeddings, voice generation that reflects each style.
  42. Introducing Translatotron: An End-to-End Speech-to-Speech Translation Model. Motivation: The existing speech

    translation pipeline: speech recognition (ASR) -> text "오늘 날씨 어때" -> text "How's the weather today" -> speech generation (TTS).
  43. Motivation: Prosody and emotion that can only be felt in the voice cannot be reflected

    in the translation (e.g., an international relationship). Plus, the whole pipeline is just too big (ASR, translation, TTS -> it has no choice but to go through the cloud). Introducing Translatotron: An End-to-End Speech-to-Speech Translation Model
  44. Model: Build a model that converts speech directly to speech, with no text conversion in between.

    Introducing Translatotron: An End-to-End Speech-to-Speech Translation Model
  45. Model. Introducing Translatotron: An End-to-End Speech-to-Speech Translation Model: Basically, it learns

    to translate the input spectrogram (Spanish) into the target spectrogram (English).
  46. Model: Basically, it learns to translate the input spectrogram (Spanish) into the target spectrogram (English). In addition,

    it also learns to generate the translation of the input spectrogram as English and Spanish phonemes: multi-task objective training. Introducing Translatotron: An End-to-End Speech-to-Speech Translation Model. A sketch of the combined objective follows below.
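A minimal sketch of such a multi-task objective (the auxiliary weight is an assumption, not a number from the paper): the main decoder predicts the target spectrogram, while two auxiliary decoders predict the source- and target-language phoneme sequences from the same shared encoder:

    def multitask_loss(spectrogram_loss, src_phoneme_loss, tgt_phoneme_loss,
                       aux_weight=0.1):
        # Main spectrogram loss plus down-weighted auxiliary phoneme losses.
        return spectrogram_loss + aux_weight * (src_phoneme_loss + tgt_phoneme_loss)

    print(multitask_loss(2.3, 0.9, 1.1))  # 2.5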
  47. Model: By also generating the translation of the input spectrogram as English and Spanish phonemes

    (multi-task objective training), the model ends up jointly learning the direct and indirect correlations between voice and text (phonemes), for a better overall understanding. Example: the way BERT trains Next Sentence Prediction alongside its main objective. Introducing Translatotron: An End-to-End Speech-to-Speech Translation Model
  48. Train. Introducing Translatotron: An End-to-End Speech-to-Speech Translation Model: Voice-voice pair

    data is scarce, so the paper also proposes methods to overcome this (but that part would make this talk too long; if you're curious, please read the paper yourself).
  49. Result: It is still more awkward and stilted than conventional voice-text-voice translation. But succeeding with a voice-to-voice

    translation demo at this level is a first! (My own reaction was: machines are already this smart?!) I hope this research becomes the starting point for end-to-end speech translation! Introducing Translatotron: An End-to-End Speech-to-Speech Translation Model
  50. Streaming End-to-End Speech Recognition for Mobile Devices. Motivation: To develop speech recognition

    technology, Google has been running a wide range of research projects since 2012, announcing new, better-performing models every year (DNN, RNN, LSTM, CNN, etc.).
  51. Motivation: One of the biggest problems is the latency incurred every time audio is sent

    to a server. Speech recognition models were so large and compute-heavy that they could not run inference directly on a phone or other on-device hardware. Is there a way to make the model small and fast while keeping recognition accuracy? Streaming End-to-End Speech Recognition for Mobile Devices
  52. Motivation. Streaming End-to-End Speech Recognition for Mobile Devices: Acoustic Model

    (audio -> phonemes), e.g. .wav -> "o k ay go o g le ha os th e whe th er". Pronunciation Model (phonemes -> words), e.g. … -> okay / google / ha / os / the / whether. Language Model (words -> complete sentence), e.g. … -> "okay google, how's the weather". Junseong: "오케이 구글 오늘 날씨 어때" ("Okay Google, how's the weather today") -> server (human voice) -> Output: "오케이 구글 오늘 날씨 어때". The conventional speech recognizer is made of several separately trained components; with that many models it cannot help being heavy and slow.
  53. Motivation. Streaming End-to-End Speech Recognition for Mobile Devices. Chorowski, Jan

    K., et al. "Attention-Based Models for Speech Recognition." Advances in Neural Information Processing Systems. 2015. Chan, William, et al. "Listen, Attend and Spell: A Neural Network for Large Vocabulary Conversational Speech Recognition." 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2016. End-to-end recognizers with no intermediate models have also been studied intensively, but so far they had not caught up with the accuracy of conventional speech recognition.
  54. Model. Streaming End-to-End Speech Recognition for Mobile Devices: A sequence-to-sequence model that does not use

    attention. Unlike conventional seq2seq models, which need the whole input (audio) at once, it can predict the characters of the speech continuously as audio streams in: the Recurrent Neural Network Transducer (RNN-T). A decoding sketch follows below.
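A toy sketch of greedy streaming RNN-T decoding: for each incoming frame, a joint network fuses the acoustic encoding with the label context and emits characters until it outputs a blank. The three callables are random stand-ins, not a real model:

    import torch

    def rnnt_greedy_decode(encode_frame, predictor, joiner, frames,
                           blank=0, max_symbols=10):
        """Toy streaming RNN-T decode; callables are illustrative stand-ins."""
        hyp = []
        pred_out, state = predictor(blank, None)   # start-of-sentence context
        for frame in frames:                       # audio arrives frame by frame
            enc = encode_frame(frame)              # no future audio needed
            for _ in range(max_symbols):           # cap emissions per frame
                dist = joiner(enc, pred_out)       # fuse acoustic + label context
                token = int(dist.argmax())
                if token == blank:                 # blank => advance to next frame
                    break
                hyp.append(token)                  # emit a char immediately
                pred_out, state = predictor(token, state)
        return hyp

    # Toy usage with random stand-ins:
    out = rnnt_greedy_decode(
        encode_frame=lambda f: torch.randn(8),
        predictor=lambda tok, st: (torch.randn(8), st),
        joiner=lambda e, p: torch.softmax(e + p, dim=-1),
        frames=range(5),
    )
    print(out)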
  55. Result. Streaming End-to-End Speech Recognition for Mobile Devices: A char-level speech recognition model using

    RNN-T: 2 GB -> a single neural network using beam search: OMG, 450 MB.
  56. Result. Streaming End-to-End Speech Recognition for Mobile Devices: A char-level speech recognition model using

    RNN-T: 2 GB -> a single neural network using beam search: OMG, 450 MB -> one more time! Low precision / TensorFlow Lite compression: OMG, 80 MB. A sketch of the compression step follows below.
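For reference, this is what a generic post-training quantization step looks like with the TensorFlow Lite converter (not the talk's code; the model path is a placeholder):

    import tensorflow as tf

    # Post-training quantization: store weights in 8 bits instead of 32-bit
    # floats, cutting model size roughly 4x -- the kind of low-precision
    # compression step described above.
    converter = tf.lite.TFLiteConverter.from_saved_model("path/to/saved_model")
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    tflite_model = converter.convert()

    with open("model_quantized.tflite", "wb") as f:
        f.write(tflite_model)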
  57. Result. Streaming End-to-End Speech Recognition for Mobile Devices: A char-level speech recognition model using

    RNN-T: 2 GB -> a single neural network using beam search: OMG, 450 MB -> low precision / TensorFlow Lite compression: OMG, 80 MB. They went and built a mobile speech recognition model that is 4x faster than the previous model, with nearly the same accuracy as the existing server-side recognizer!
  58. Result. Streaming End-to-End Speech Recognition for Mobile Devices: They made it possible to run the

    speech recognition model on the phone alone, even offline with no internet connection.
  59. We are hiring frontend / backend / machine learning engineers across

    all positions (new grads and alternative-military-service researchers both welcome). Welcome to the ScatterLab / Pingpong team: we are building an AI that talks well, trained on no fewer than 10 billion KakaoTalk messages.