
Speech_2022 LINE Internship AI Course

April 28, 2022

Overview material for the "Speech" area, presented at the "LINE Technical Internship: AI Course Information Session" held on April 26, 2022.




    MULTI-CHANNEL WIENER FILTERING (ICASSP2021), M. Togami
    Disentangled speaker and language representations using mutual information minimization and domain adaptation for cross-lingual TTS (ICASSP2021), T. Komatsu
    REFINEMENT OF DIRECTION OF ARRIVAL ESTIMATORS BY MAJORIZATION-MINIMIZATION OPTIMIZATION ON THE ARRAY MANIFOLD (ICASSP2021), R. Scheibler
    SURROGATE SOURCE MODEL LEARNING FOR DETERMINED SOURCE SEPARATION (ICASSP2021), R. Scheibler
    JOINT DEREVERBERATION AND SEPARATION WITH ITERATIVE SOURCE STEERING (ICASSP2021), R. Scheibler, M. Togami
    PARALLEL WAVEFORM SYNTHESIS BASED ON GENERATIVE ADVERSARIAL NETWORKS WITH VOICING-AWARE CONDITIONAL DISCRIMINATORS (ICASSP2021), R. Yamamoto
    TTS-BY-TTS: TTS-DRIVEN DATA AUGMENTATION FOR FAST AND HIGH-QUALITY SPEECH SYNTHESIS (ICASSP2021), R. Yamamoto
    Independent Vector Analysis via Log-Quadratically Penalized Quadratic Minimization (IEEE TSP), R. Scheibler
    Multichannel Separation and Classification of Sound Events (EUSIPCO2021), R. Scheibler, T. Komatsu, M. Togami
    Multi-Source Domain Adaptation with Sinkhorn Barycenter (EUSIPCO2021), T. Komatsu
    Acoustic Event Detection with classifier chains (INTERSPEECH2021), T. Komatsu
    Relaxing the Conditional Independence Assumption of CTC-based ASR by Conditioning on Intermediate Predictions (INTERSPEECH2021), T. Komatsu
    Sound Source Localization with Majorization Minimization (INTERSPEECH2021), M. Togami
    Efficient and Stable Adversarial Learning Using Unpaired Data for Unsupervised Multichannel Speech Separation (INTERSPEECH2021), Y. Nakagome, M. Togami
    High-fidelity Parallel WaveGAN with Multi-band Harmonic-plus-Noise Model (INTERSPEECH2021), R. Yamamoto
    Phrase break prediction with bidirectional encoder representations in Japanese text-to-speech synthesis (INTERSPEECH2021), K. Futamata, B. Park, R. Yamamoto, K. Tachibana
    Over-Determined Semi-Blind Speech Source Separation (APSIPA2021), M. Togami
    COMPARISON OF LOW COMPLEXITY SELF-ATTENTION MECHANISMS FOR ACOUSTIC EVENT DETECTION (APSIPA2021), T. Komatsu, R. Scheibler
    A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text Generation (ASRU2021), T. Komatsu
    Computationally-Efficient Overdetermined Blind Source Separation Based on Iterative Source Steering (IEEE SPL), R. Scheibler
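  2. A gong is hit three times while cars drive in the background. A person walks inside a home as the wooden floor creaks.

    In the Speech team's internship…
    ① Operating and improving technology used in actual products
       e.g., speech recognition development using in-house systems
    ② Developing prototype apps using LINE's technology
       e.g., applications using LINE's speech recognition and environmental sound recognition
    ③ Replicating the latest techniques on in-house data
       Implementing papers from other institutions; verifying and evaluating their practicality on in-house data
    ④ Proposing new techniques, writing them up as papers, and submitting to international conferences
       Extending LINE's technology and developing new techniques such as self-supervised learning
    ⑤ Opening up new research fields such as multimodal processing
       e.g., audio captioning (audio + NLP), audio-visual ASR (audio + vision)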
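  3. You can work on cutting-edge technology for speech recognition and environmental sound recognition.
    You can work alongside engineers and researchers with extensive experience presenting at top conferences.

    Example themes (selected according to aptitude and preferences):
    ① Operating and improving technology for use in actual products
    ② Developing prototype apps using LINE's technology
    ③ Replicating the latest techniques on in-house data
    ④ Proposing new techniques, writing them up as papers, and submitting to international conferences
    ⑤ Opening up new research fields such as multimodal processing

    Required skills: development experience with Python and PyTorch (speech does not have to be your specialty)

    Nice-to-have skills:
       Basic knowledge of speech technology
       App development experience, server-side development experience
       Experience implementing international conference papers, experience submitting papers to international conferences

    Summary of the Speech team internship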