
All you need is Google's AI Tech: I/O Extended Seoul 2019

Junseong
June 30, 2019


I gave a talk titled "All you need is Google's AI Tech" at I/O Extended Seoul 2019. From the papers published between 2018 and 2019, I picked six representative ones by category and gave a brief review of each.


Transcript

  1. Google I/O Extended Seoul: All you need is Google's AI Tech

    Junseong Kim (김준성), Machine Learning Engineer at ScatterLab
  2. Junseong Kim (김준성), Machine Learning Engineer at ScatterLab, Pingpong AI Research Team

    Crazy at NLP! Open-Domain Dialog System / Chatbot, Sentence Representation Model, Neural Machine Translation, Machine Learning on Service. github.com/codertimo fb.com/codertimo linkedin.com/in/codertimo codertimo.github.io
  3. Before Get Started… Originally, I planned to give an overview of the ML sessions from Google I/O 2019…

    (The fact that I had already prepared an hour's worth of Google I/O overview material in advance is no secret.)
  4. Before Get Started… At this rate it's going to be a disaster!!! I need to find an alternative.

    What talk topic would actually fit "All you need is Google's AI"?
  5. Google I/O AI Topics: People + AI, AI Fairness, TensorFlow

    Families (Lite, TFX, TF.js …), Jeff Dean and Hinton presentation/fireside chat, AutoML on Google Cloud, TPU Pod / Coral Edge TPU, climate change, Federated Learning. These are products for AI practitioners rather than Google AI research results; at Google I/O, product talk dominated.
  6. Let's Extend Google I/O! Google AI published so many excellent

    research results between 2018 and 2019, but almost none of that came up.. Oh! Then let's do a Research Overview! Let's survey the great training methods and models Google has built!
  7. Let's Extend Google I/O! Google AI published so many excellent

    research results between 2018 and 2019, but almost none of that came up.. Oh! Then let's do a Research Overview! Let's survey the great training methods and models Google has built! And let's keep it approachable, so a general audience, not just ML engineers, can follow along!
  8. • Publications • NLP • BERT: Pre-training of Deep Bidirectional

    Transformers for Language Understanding • The Evolved Transformer: NAS for a better Transformer model • Vision • Self-Supervised Tracking via Video Colorization • Speech Generation • Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron • Direct speech-to-speech translation with a sequence-to-sequence model • Speech Recognition • Streaming End-to-End Speech Recognition for Mobile Devices
  9. Motivation: A wide range of recent research has shown that if you first raise a model's

    'understanding' of a domain, it then learns tasks in that domain much faster and more accurately. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Building base techniques (models) that many downstream models can reuse is an important research topic.
  10. Motivation: Transfer Learning History of NLP. 1. Word2Vec / Bag-of-Words 2. Statistical Language Modeling

    (2000~) 3. Bi-LSTM Language Modeling (2012~) 4. Multi-Layer Bi-LSTM Language Modeling (2014) 5. ELMo: Deep Contextualized Word Representations (2017) 6. Universal Language Model Fine-Tuning for Text Classification (2018) 7. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018) 8. GPT-2: Language Models are Unsupervised Multitask Learners (2019) 9. XLNet: Generalized Autoregressive Pretraining for Language Understanding (2019)
  11. Motivation. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

    So how do we train a model so that it can understand text? Autoregressive language modeling: given the previous words, predict the next word. ['와', '진짜', '배고프다', '..', '뭐'] -> ['먹지?'] (roughly: "Wow, I'm really hungry.. what" -> "should I eat?") Input / Target output. Used by LSTM-LM, ELMo, GPT (see the sketch below).
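To make the autoregressive objective concrete, here is a minimal Python sketch (illustrative only, not code from the talk) that turns the slide's example sentence into training pairs:

    # Every prefix of the sentence predicts the next token.
    tokens = ["와", "진짜", "배고프다", "..", "뭐", "먹지?"]
    for i in range(1, len(tokens)):
        context, target = tokens[:i], tokens[i]
        print(context, "->", target)
    # The final pair is the slide's example: [..., '뭐'] -> '먹지?'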
  12. "손흥민은 토트넘 최고의 공격수로 자리 잡았다" ("Son Heung-min has established himself as Tottenham's best striker.")

    BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Model
  13. "손흥민은 토트넘 최고의 공격수로 자리 잡았다", shown again with key words masked

    out: "Machine learning, practice the language by guessing blanks like these! Hey!" BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Model
  14. "손흥민은 토트넘 최고의 공격수로 자리 잡았다": the model fills the masked words

    back in. "Machine learning, practice the language by guessing blanks like these! Hey!" "Yes, well, it's something like that." BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Model. A sketch of building such masked pairs follows below.
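What BERT actually trains on is the masked variant of this guessing game. A hedged sketch of how a masked-LM training pair is built (the ~15% rate is from the BERT paper; the whitespace tokenization is purely for illustration):

    import random

    # Hide random tokens; train the model to recover each one from BOTH
    # its left and right context (hence "bidirectional").
    tokens = "손흥민 은 토트넘 최고 의 공격수 로 자리 잡았다".split()

    inputs, targets = [], []
    for tok in tokens:
        if random.random() < 0.15:      # BERT masks ~15% of tokens
            inputs.append("[MASK]")
            targets.append(tok)         # loss only on masked positions
        else:
            inputs.append(tok)
            targets.append(None)
    print(inputs)
    print(targets)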
  15. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Model

    Also the inspiration for this talk's title… Attention Is All You Need. BERT stacks multiple layers of the self-attention-based Transformer encoder.
  16. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Model

    Transformer Layer, stacked multi-layer (12 layers in the base model)
  17. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Train

    Pretraining uses massive text corpora such as Wikipedia and BookCorpus (and a massive amount of time): this is the stage where the model learns to understand language.
  18. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Train

    Pretraining uses massive text corpora such as Wikipedia and BookCorpus (and a massive amount of time): learning to understand language. Cloud TPU Pod, 512 cores ($384 USD/h).
  19. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Train

    Fine-tuning on a subtask, e.g. a sentiment-analysis text classification problem. For the actual target task, fine-tune on a labeled dataset (this part doesn't take long). A fine-tuning sketch follows below.
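A minimal PyTorch-style sketch of the fine-tuning idea: keep the pretrained encoder and bolt a small, randomly initialized classification head on top. DummyEncoder is a stand-in for the real pretrained BERT so the snippet stays self-contained:

    import torch
    import torch.nn as nn

    class DummyEncoder(nn.Module):
        """Stand-in for a pretrained BERT encoder (illustration only)."""
        def __init__(self, vocab=30522, hidden=768):
            super().__init__()
            self.emb = nn.Embedding(vocab, hidden)
        def forward(self, input_ids):
            return self.emb(input_ids)              # (batch, seq, hidden)

    class SentimentClassifier(nn.Module):
        """Fine-tuning = pretrained encoder + a small, new task head."""
        def __init__(self, pretrained_encoder, hidden=768, num_classes=2):
            super().__init__()
            self.encoder = pretrained_encoder       # weights from pretraining
            self.head = nn.Linear(hidden, num_classes)  # randomly initialized
        def forward(self, input_ids):
            states = self.encoder(input_ids)        # (batch, seq, hidden)
            cls = states[:, 0]                      # first ([CLS]) position
            return self.head(cls)                   # class logits

    model = SentimentClassifier(DummyEncoder())
    print(model(torch.randint(0, 30522, (2, 16))).shape)  # torch.Size([2, 2])

Both the head and (usually) the whole encoder are updated on the small labeled dataset, which is why this stage is fast.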
  20. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Train

    Source: a Facebook post by Han Seongmin. There is even a paper criticizing the amount of CO2 emitted when training BERT..
  21. The Evolved Transformer. Motivation: Recently in the vision field, Neural

    Architecture Search using AutoML has been studied very actively. Machine-designed models are beating existing hand-crafted models in both speed and accuracy. https://ai.googleblog.com/2017/11/automl-for-large-scale-image.html
  22. The Evolved Transformer. Motivation: The Transformer proposed in

    Attention Is All You Need showed that a feed-forward network can outperform an RNN. So Google is actively using the Transformer architecture in models that write fiction or compose music. https://arxiv.org/abs/1706.03762 https://openai.com/blog/better-language-models/#sample5 GPT-2: Better Language Modeling for Generation https://magenta.tensorflow.org/music-transformer Music Transformer: Generating Music With Long-Term Structure
  23. The Evolved Transformer. Motivation: Then, starting from the Transformer architecture,

    couldn't we use NAS to find an even better model? Applying AutoML to Transformer Architecture https://ai.googleblog.com/2019/06/applying-automl-to-transformer.html
  24. The Evolved Transformer. Novel Method: Developing the Techniques. "To begin

    the evolutionary NAS, it was necessary for us to develop new techniques, due to the fact that the task used to evaluate the 'fitness' of each architecture, WMT'14 English-German translation, is computationally expensive. This makes the searches more expensive than similar searches executed in the vision domain, which can leverage smaller datasets, like CIFAR-10. The first of these techniques is warm starting: seeding the initial evolution population with the Transformer architecture instead of random models. This helps ground the search in an area of the search space we know is strong, thereby allowing it to find better models faster. The second technique is a new method we developed called Progressive Dynamic Hurdles (PDH), an algorithm that augments the evolutionary search to allocate more resources to the strongest candidates, in contrast to previous works, where each candidate model of the NAS is allocated the same amount of resources when it is being evaluated. PDH allows us to terminate the evaluation of a model early if it is flagrantly bad, allowing promising architectures to be awarded more resources." Summary: 1. Because the target task is a translation task, every single training run takes a long time. 2. And if the search starts from random models, the search space is far too wide. 3. So seed the evolution with the Transformer model (warm starting) and search from there! (technique 1) 4. As evolution proceeds, concentrate resources on the models that are doing best (the newly proposed PDH, technique 2). A sketch of these two ideas follows below.
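A toy Python sketch of those two ideas: warm starting the population from the Transformer, and a PDH-style hurdle that gives larger training budgets only to the stronger half of the candidates. The mutation and fitness functions here are random stand-ins, not the paper's actual search space:

    import random

    def evolve(seed_arch, mutate, fitness_at, budgets=(1_000, 10_000, 100_000)):
        """Toy evolutionary NAS with warm starting + PDH-style hurdles."""
        # Warm start: seed the population with the known-good Transformer
        # instead of purely random architectures.
        population = [seed_arch] + [mutate(seed_arch) for _ in range(31)]
        for budget in budgets:            # progressively larger training budgets
            scored = sorted(population, key=lambda a: fitness_at(a, budget),
                            reverse=True)
            # Hurdle: only the stronger half earns the next, larger budget,
            # so flagrantly bad candidates are terminated early.
            population = scored[: max(1, len(scored) // 2)]
        return population[0]

    # Toy usage with random stand-ins for mutation and fitness:
    best = evolve(
        seed_arch=0.5,
        mutate=lambda a: min(1.0, max(0.0, a + random.uniform(-0.1, 0.1))),
        fitness_at=lambda a, budget: a + random.random() * 0.01,
    )
    print(best)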
  25. The Evolved Transformer. Model: On the right is what NAS/AutoML found, the Evolved

    Transformer encoder block. Unlike the original Transformer architecture, it uses a hybrid of self-attention and wide convolutions rather than self-attention alone. Compared with the original on the left, the structure is not drastically different.
  26. The Evolved Transformer. Result: On translation tasks it achieves small but real gains

    in perplexity and BLEU score over the original Transformer. On the LM1B language-modeling task, perplexity improved by about 2.2 points.
  27. Self-Supervised Tracking via Video Colorization. Motivation: Video object

    tracking is a critically important model: it determines the performance of tasks that deal with moving objects, such as pose estimation, object injection, and video stylization.
  28. Self-Supervised Tracking via Video Colorization. Motivation: Doing object tracking

    used to require massive amounts of pixel-level labeling. But labeling at pixel × time (video) granularity is genuinely painful and time-consuming: at 30 frames per second, one minute of video means pixel-level labels for 1,800 images.
  29. Motivation. Self-Supervised Tracking via Video Colorization: There is plenty of video out there,

    but labeling it all is impossible… Could we learn object tracking from massive amounts of video without any labels..? YouTube
  30. Motivation. Self-Supervised Tracking via Video Colorization: If, out of all the video

    frames, only one frame is given in color, can the model colorize the frames whose color was removed? Then, even without any direct labeling, wouldn't the model's understanding of moving objects improve? After all, it has to color the moving objects by referencing a single still image! A sketch of this pointing mechanism follows below.
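The mechanism described in the blog post can be sketched in a few lines: the model colorizes a gray frame by pointing into the color reference frame with attention over pixel embeddings, so getting the colors right forces the embeddings to follow objects. Shapes and names below are illustrative:

    import torch

    def colorize_by_pointing(ref_embed, tgt_embed, ref_colors):
        """ref_embed: (N, D) embeddings of reference-frame pixels
           tgt_embed: (M, D) embeddings of gray target-frame pixels
           ref_colors: (N, 3) colors of the reference frame."""
        # Similarity of every target pixel to every reference pixel.
        attn = torch.softmax(tgt_embed @ ref_embed.T, dim=-1)   # (M, N)
        # Each target pixel's color is a weighted copy ("pointer") into
        # the reference frame's colors.
        return attn @ ref_colors                                # (M, 3)

    pred = colorize_by_pointing(torch.randn(50, 64), torch.randn(50, 64),
                                torch.rand(50, 3))
    print(pred.shape)  # torch.Size([50, 3])
    # At test time the same attention weights can copy labels (e.g. a
    # segmentation mask) instead of colors: that copying IS the tracking.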
  31. Model. Self-Supervised Tracking via Video Colorization: The model is trained on massive

    amounts of unlabeled video, so the authors wanted insight into what it had actually learned. They projected the embedding layer that encodes the video at pixel level down to 3D with PCA (sketched below). Each object separated naturally in embedding space -> nearest neighbors in embedding space can be treated as the same object.
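A short sketch of that visualization step, assuming hypothetical per-pixel embeddings (random here, purely for illustration):

    import numpy as np
    from sklearn.decomposition import PCA

    # Hypothetical per-pixel embeddings, shape (num_pixels, embed_dim):
    embeddings = np.random.randn(10_000, 64)
    coords_3d = PCA(n_components=3).fit_transform(embeddings)  # (num_pixels, 3)
    print(coords_3d.shape)
    # Rendering each pixel by its 3-D coordinate produces the visualization
    # on the slide: pixels of the same object cluster together.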
  32. Result. Self-Supervised Tracking via Video Colorization: Even on the pose-tracking

    (JHMDB) task, without a single supervised label, it showed performance close to existing heavily supervised models. It did not, however, decisively beat them (contrary to expectations?).
  33. Future Works. Self-Supervised Tracking via Video Colorization: "Future

    Work: Our results show that video colorization provides a signal that can be used for learning to track objects in videos without supervision. Moreover, we found that the failures from our system are correlated with failures to colorize the video, which suggests that further improving the video colorization model can advance progress in self-supervised tracking." Looking more closely, most tracking failures were cases where colorization itself had failed. So the plan is to work hard on models that colorize better (rather than adding more labels).
  34. Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron

    Motivation. aka Tacotron 2 / Shen, Jonathan, et al. "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions." 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2018. Tacotron 2 (Oct 2017): feed it a sentence and it generates speech that sounds like a person talking.
  35. Motivation: But human language is spoken in completely different styles depending on

    situation, emotion, and personality, like a dad imitating each character's voice while reading a bedtime story to his kids. So instead of simply turning a sentence (str) into voice, let's generate the voice with reference to prosody. Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron
  36. Motivation. Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with

    Tacotron. (Reference Audio) Kang Ho-dong: "이거↗ 정말↗ 맛있습니다~!" ("This! Is! Really tasty~!", with lively prosody). Input Voice (Tacotron output) Son Suk-hee: "이거 정말 맛있습니다." (flat). Output Voice Son Suk-hee: "이거↗ 정말↗ 맛있습니다~!" (same words, now with the reference prosody).
  37. Model: Prosody is added as an embedding to make the generation conditional. Towards End-to-End Prosody

    Transfer for Expressive Speech Synthesis with Tacotron. An attention over each token is added so the model can reference which part of the audio's prosody to imitate. A sketch of this conditioning follows below.
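A hedged sketch of this kind of conditioning: squeeze the reference mel spectrogram into one prosody vector and broadcast it onto the text encoder states that the decoder attends over. Dimensions and module choices are assumptions, not the paper's exact architecture:

    import torch
    import torch.nn as nn

    class ProsodyConditioning(nn.Module):
        """Sketch: condition TTS decoding on a reference-audio embedding."""
        def __init__(self, mel_dim=80, text_dim=256, prosody_dim=128):
            super().__init__()
            # Reference encoder: compress the reference mel spectrogram
            # into one fixed-size prosody vector.
            self.ref_encoder = nn.GRU(mel_dim, prosody_dim, batch_first=True)
            self.proj = nn.Linear(text_dim + prosody_dim, text_dim)

        def forward(self, text_states, ref_mel):
            _, h = self.ref_encoder(ref_mel)        # (1, batch, prosody_dim)
            prosody = h[-1].unsqueeze(1)            # (batch, 1, prosody_dim)
            prosody = prosody.expand(-1, text_states.size(1), -1)
            # Broadcast the prosody vector onto every text encoder step;
            # the (unshown) decoder then attends over these states.
            return self.proj(torch.cat([text_states, prosody], dim=-1))

    m = ProsodyConditioning()
    out = m(torch.randn(2, 20, 256), torch.randn(2, 120, 80))
    print(out.shape)  # torch.Size([2, 20, 256])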
  38. Result. Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with

    Tacotron. https://ai.googleblog.com/2018/03/expressive-speech-synthesis-with.html Given a text, voice generation that reflects the prosody of a reference voice.
  39. Problem. Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with

    Tacotron. https://ai.googleblog.com/2018/03/expressive-speech-synthesis-with.html But this always needs a reference whose input sentence is identical. Couldn't we turn a person's speaking style itself into a single embedding? Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis. Apply a style by feeding in a conditional variable, with no need to train a Tacotron per style.
  40. Problem. Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with

    Tacotron. https://ai.googleblog.com/2018/03/expressive-speech-synthesis-with.html Rather than mirroring the user's audio directly, attention is used to compress it into a single style embedding. The model catches how this person talks, conditions on that, and then generates. A sketch follows below.
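A hedged sketch of the Style Tokens idea: a learned bank of embeddings plus an attention that compresses a reference encoding into one style vector. All sizes are illustrative:

    import torch
    import torch.nn as nn

    class StyleTokenLayer(nn.Module):
        """Sketch of Style Tokens: a learned bank of style embeddings; a
        reference encoding attends over the bank to yield ONE style vector."""
        def __init__(self, num_tokens=10, token_dim=128, ref_dim=128):
            super().__init__()
            self.tokens = nn.Parameter(torch.randn(num_tokens, token_dim))
            self.query = nn.Linear(ref_dim, token_dim)

        def forward(self, ref_encoding):                        # (batch, ref_dim)
            q = self.query(ref_encoding)                        # (batch, token_dim)
            weights = torch.softmax(q @ self.tokens.T, dim=-1)  # (batch, num_tokens)
            return weights @ self.tokens                        # (batch, token_dim)

    style = StyleTokenLayer()(torch.randn(4, 128))
    print(style.shape)  # torch.Size([4, 128])
    # At inference the attention weights can be set by hand, with no
    # reference audio at all: the "conditional variable" mentioned above.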
  41. Result. Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with

    Tacotron. https://ai.googleblog.com/2018/03/expressive-speech-synthesis-with.html Using different style embeddings, voice generation that reflects each style.
  42. Introducing Translatotron: An End-to-End Speech-to-Speech Translation Model. Motivation: The existing speech

    translation pipeline: speech recognition (ASR) -> text "오늘 날씨 어때" -> text "How's the weather today" -> speech generation (TTS).
  43. Motivation: Prosody and emotion that can only be felt in the voice cannot be reflected

    in the translation (e.g., an international relationship). Plus, the whole pipeline is just too big (ASR, translation, TTS -> it has no choice but to go through the cloud). Introducing Translatotron: An End-to-End Speech-to-Speech Translation Model
  44. Model: Build a model that converts speech directly to speech, with no text conversion in between.

    Introducing Translatotron: An End-to-End Speech-to-Speech Translation Model
  45. Model. Introducing Translatotron: An End-to-End Speech-to-Speech Translation Model: Basically, it learns

    to translate the input spectrogram (Spanish) into the target spectrogram (English).
  46. Model: Basically, it learns to translate the input spectrogram (Spanish) into the target spectrogram (English). In addition,

    it also learns to generate the translation of the input spectrogram as English and Spanish phonemes: multi-task objective training. Introducing Translatotron: An End-to-End Speech-to-Speech Translation Model. A sketch of the combined objective follows below.
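A minimal sketch of such a multi-task objective (the auxiliary weight is an assumption, not a number from the paper): the main decoder predicts the target spectrogram, while two auxiliary decoders predict the source- and target-language phoneme sequences from the same shared encoder:

    def multitask_loss(spectrogram_loss, src_phoneme_loss, tgt_phoneme_loss,
                       aux_weight=0.1):
        # Main spectrogram loss plus down-weighted auxiliary phoneme losses.
        return spectrogram_loss + aux_weight * (src_phoneme_loss + tgt_phoneme_loss)

    print(multitask_loss(2.3, 0.9, 1.1))  # 2.5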
  47. Model: By also generating the translation of the input spectrogram as English and Spanish phonemes

    (multi-task objective training), the model ends up jointly learning the direct and indirect correlations between voice and text (phonemes), for a better overall understanding. Example: the way BERT trains Next Sentence Prediction alongside its main objective. Introducing Translatotron: An End-to-End Speech-to-Speech Translation Model
  48. Train. Introducing Translatotron: An End-to-End Speech-to-Speech Translation Model: Voice-voice pair

    data is scarce, so the paper also proposes methods to overcome this (but that part would make this talk too long; if you're curious, please read the paper yourself).
  49. Result: It is still more awkward and stilted than conventional voice-text-voice translation. But succeeding with a voice-to-voice

    translation demo at this level is a first! (My own reaction was: machines are already this smart?!) I hope this research becomes the starting point for end-to-end speech translation! Introducing Translatotron: An End-to-End Speech-to-Speech Translation Model
  50. Streaming End-to-End Speech Recognition for Mobile Devices. Motivation: To develop speech recognition

    technology, Google has been running a wide range of research projects since 2012, announcing new, better-performing models every year (DNN, RNN, LSTM, CNN, etc.).
  51. Motivation: One of the biggest problems is the latency incurred every time audio is sent

    to a server. Speech recognition models were so large and compute-heavy that they could not run inference directly on a phone or other on-device hardware. Is there a way to make the model small and fast while keeping recognition accuracy? Streaming End-to-End Speech Recognition for Mobile Devices
  52. Motivation. Streaming End-to-End Speech Recognition for Mobile Devices: Acoustic Model

    (audio -> phonemes), e.g. .wav -> "o k ay go o g le ha os th e whe th er". Pronunciation Model (phonemes -> words), e.g. … -> okay / google / ha / os / the / whether. Language Model (words -> complete sentence), e.g. … -> "okay google, how's the weather". Junseong: "오케이 구글 오늘 날씨 어때" ("Okay Google, how's the weather today") -> server (human voice) -> Output: "오케이 구글 오늘 날씨 어때". The conventional speech recognizer is made of several separately trained components; with that many models it cannot help being heavy and slow.
  53. Motivation. Streaming End-to-End Speech Recognition for Mobile Devices. Chorowski, Jan

    K., et al. "Attention-Based Models for Speech Recognition." Advances in Neural Information Processing Systems. 2015. Chan, William, et al. "Listen, Attend and Spell: A Neural Network for Large Vocabulary Conversational Speech Recognition." 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2016. End-to-end recognizers with no intermediate models have also been studied intensively, but so far they had not caught up with the accuracy of conventional speech recognition.
  54. Model. Streaming End-to-End Speech Recognition for Mobile Devices: A sequence-to-sequence model that does not use

    attention. Unlike conventional seq2seq models, which need the whole input (audio) at once, it can predict the characters of the speech continuously as audio streams in: the Recurrent Neural Network Transducer (RNN-T). A decoding sketch follows below.
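A toy sketch of greedy streaming RNN-T decoding: for each incoming frame, a joint network fuses the acoustic encoding with the label context and emits characters until it outputs a blank. The three callables are random stand-ins, not a real model:

    import torch

    def rnnt_greedy_decode(encode_frame, predictor, joiner, frames,
                           blank=0, max_symbols=10):
        """Toy streaming RNN-T decode; callables are illustrative stand-ins."""
        hyp = []
        pred_out, state = predictor(blank, None)   # start-of-sentence context
        for frame in frames:                       # audio arrives frame by frame
            enc = encode_frame(frame)              # no future audio needed
            for _ in range(max_symbols):           # cap emissions per frame
                dist = joiner(enc, pred_out)       # fuse acoustic + label context
                token = int(dist.argmax())
                if token == blank:                 # blank => advance to next frame
                    break
                hyp.append(token)                  # emit a char immediately
                pred_out, state = predictor(token, state)
        return hyp

    # Toy usage with random stand-ins:
    out = rnnt_greedy_decode(
        encode_frame=lambda f: torch.randn(8),
        predictor=lambda tok, st: (torch.randn(8), st),
        joiner=lambda e, p: torch.softmax(e + p, dim=-1),
        frames=range(5),
    )
    print(out)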
  55. Result. Streaming End-to-End Speech Recognition for Mobile Devices: A char-level speech recognition model using

    RNN-T: 2 GB -> a single neural network using beam search: OMG, 450 MB.
  56. Result. Streaming End-to-End Speech Recognition for Mobile Devices: A char-level speech recognition model using

    RNN-T: 2 GB -> a single neural network using beam search: OMG, 450 MB -> one more time! Low precision / TensorFlow Lite compression: OMG, 80 MB. A sketch of the compression step follows below.
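For reference, this is what a generic post-training quantization step looks like with the TensorFlow Lite converter (not the talk's code; the model path is a placeholder):

    import tensorflow as tf

    # Post-training quantization: store weights in 8 bits instead of 32-bit
    # floats, cutting model size roughly 4x -- the kind of low-precision
    # compression step described above.
    converter = tf.lite.TFLiteConverter.from_saved_model("path/to/saved_model")
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    tflite_model = converter.convert()

    with open("model_quantized.tflite", "wb") as f:
        f.write(tflite_model)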
  57. Result. Streaming End-to-End Speech Recognition for Mobile Devices: A char-level speech recognition model using

    RNN-T: 2 GB -> a single neural network using beam search: OMG, 450 MB -> low precision / TensorFlow Lite compression: OMG, 80 MB. They went and built a mobile speech recognition model that is 4x faster than the previous model, with nearly the same accuracy as the existing server-side recognizer!
  58. Result. Streaming End-to-End Speech Recognition for Mobile Devices: They made it possible to run the

    speech recognition model on the phone alone, even offline with no internet connection.
  59. We are hiring frontend / backend / machine learning engineers across

    all positions (new grads and alternative-military-service researchers both welcome). Welcome to the ScatterLab / Pingpong team: we are building an AI that talks well, trained on no fewer than 10 billion KakaoTalk messages.