Slide 1

Slide 1 text

Introduction of LINE Speech Team Tatsuya Komatsu, Speech team, LINE

Slide 2

Slide 2 text

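Speech Team Initiatives
LINE AiCall: call answering
Speech transcription
Research and development on end-to-end speech recognition and related technologies
Research on new technologies such as environmental sound analysis and multimodal processing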

Slide 3

Slide 3 text

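Technologies of the Speech Team
Non-autoregressive speech recognition: developed a technique with world-class performance
  * Best performance in a benchmark paper [Higuchi ASRU]
  Self-conditioned CTC
Environmental sound recognition: won 1st place worldwide in a competition
  Took 1st place with a self-attention-based recognition method
  https://dcase.community/challenge…
  Miyazaki-san (intern)
  Task: classifying environmental sounds from YouTube
  Example clips:
    "Golden Retriever Dog Luke sings with piano music" (https://www.youtube.com/watch?v=…)
    "crying baby" (https://www.youtube.com/watch?v=…)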

Slide 4

Slide 4 text

Accepted Paper List in …

Title | Conf/Journal | Author
END TO END LEARNING FOR CONVOLUTIVE MULTI-CHANNEL WIENER FILTERING | ICASSP2021 | M. Togami
Disentangled speaker and language representations using mutual information minimization and domain adaptation for cross-lingual TTS | ICASSP2021 | T. Komatsu
REFINEMENT OF DIRECTION OF ARRIVAL ESTIMATORS BY MAJORIZATION-MINIMIZATION OPTIMIZATION ON THE ARRAY MANIFOLD | ICASSP2021 | R. Scheibler
SURROGATE SOURCE MODEL LEARNING FOR DETERMINED SOURCE SEPARATION | ICASSP2021 | R. Scheibler
JOINT DEREVERBERATION AND SEPARATION WITH ITERATIVE SOURCE STEERING | ICASSP2021 | R. Scheibler, M. Togami
PARALLEL WAVEFORM SYNTHESIS BASED ON GENERATIVE ADVERSARIAL NETWORKS WITH VOICING-AWARE CONDITIONAL DISCRIMINATORS | ICASSP2021 | R. Yamamoto
TTS-BY-TTS: TTS-DRIVEN DATA AUGMENTATION FOR FAST AND HIGH-QUALITY SPEECH SYNTHESIS | ICASSP2021 | R. Yamamoto
Independent Vector Analysis via Log-Quadratically Penalized Quadratic Minimization | IEEE TSP | R. Scheibler
Multichannel Separation and Classification of Sound Events | EUSIPCO2021 | R. Scheibler, T. Komatsu, M. Togami
Multi-Source Domain Adaptation with Sinkhorn Barycenter | EUSIPCO2021 | T. Komatsu
Acoustic Event Detection with classifier chains | INTERSPEECH2021 | T. Komatsu
Relaxing the Conditional Independence Assumption of CTC-based ASR by Conditioning on Intermediate Predictions | INTERSPEECH2021 | T. Komatsu
Sound Source Localization with Majorization Minimization | INTERSPEECH2021 | M. Togami
Efficient and Stable Adversarial Learning Using Unpaired Data for Unsupervised Multichannel Speech Separation | INTERSPEECH2021 | Y. Nakagome, M. Togami
High-fidelity Parallel WaveGAN with Multi-band Harmonic-plus-Noise Model | INTERSPEECH2021 | R. Yamamoto
Phrase break prediction with bidirectional encoder representations in Japanese text-to-speech synthesis | INTERSPEECH2021 | K. Futamata, B. Park, R. Yamamoto, K. Tachibana
Over-Determined Semi-Blind Speech Source Separation | APSIPA2021 | M. Togami
COMPARISON OF LOW COMPLEXITY SELF-ATTENTION MECHANISMS FOR ACOUSTIC EVENT DETECTION | APSIPA2021 | T. Komatsu, R. Scheibler
A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text Generation | ASRU2021 | T. Komatsu
Computationally-Efficient Overdetermined Blind Source Separation Based on Iterative Source Steering | IEEE SPL | R. Scheibler

Slide 5

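Slide 5 text

Examples of Past Internship Themes
Weakly labeled learning for environmental sound recognition
  Accepted at ICASSP
  Published on the LINE Engineering blog: https://engineering.linecorp.com/ja/blog
Speech separation and dereverberation
  Accepted at ICASSP and Interspeech
Disentangled speaker feature extraction
  Accepted at ICASSP
Self-supervised learning for general sound recognition
  Accepted at ICASSP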

Slide 6

Slide 6 text

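In the Speech team's [year] internship…
① Operating and improving technologies used in actual products
  For example, speech recognition development using in-house systems
② Developing prototype apps using LINE's technologies
  For example, applications built on LINE's speech recognition or environmental sound recognition
③ Replicating the latest techniques on in-house data
  Implementing papers from other institutions, then verifying and evaluating their practicality on in-house data
④ Proposing novel techniques, writing them up as papers, and submitting to international conferences
  Extending LINE's technologies, or developing novel techniques such as self-supervised learning
⑤ Pioneering new research areas such as multimodal processing
  For example, audio captioning (audio and NLP) and audio-visual ASR (audio and vision)
  Example captions:
    "A gong is hit three times while cars drive in the background."
    "A person walks inside a home as the wooden floor creaks."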

Slide 7

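Slide 7 text

Summary of the Speech Team Internship
You can work on cutting-edge technologies in speech recognition and environmental sound recognition.
You can work alongside engineers and researchers with extensive experience presenting at top conferences.
Example themes (selected according to your aptitude and preferences):
  ① Operating and improving technologies used in actual products
  ② Developing prototype apps using LINE's technologies
  ③ Replicating the latest techniques on in-house data
  ④ Proposing novel techniques, writing them up as papers, and submitting to international conferences
  ⑤ Pioneering new research areas such as multimodal processing
Required skills: development experience with Python and PyTorch
  A background in speech is not required!
Nice-to-have skills:
  Basic knowledge of speech technology
  App development experience, server-side development experience
  Experience implementing papers from international conferences, experience submitting papers to international conferences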