
Unsupervised Separation of Speech for Smooth Communications


LINE DevDay 2020

November 25, 2020

Transcript

  1. None
  2. About Myself: 1984 Switzerland, 2012-2017 PhD EPFL, Zurich, Stockholm, Tokyo, Safecast, 2017-2020 TMU, March 2020 → Speech Team @ LINE
  3. Agenda › What is Source Separation? › Recognizing Speech › Separation Algorithm › Fast Source Separation
  4. What is Source Separation?

  5. The Cocktail Party Effect

  6. How do we do it?

  7. How do we do it? (right ear, left ear)

  8. How do we do it? (right ear, left ear)

  9. Separation with Multiple Microphones: spatial mixing ? separation (mixing)⁻¹
  10. Separation with Multiple Microphones: spatial mixing ? separation (mixing)⁻¹ (hidden environment)
  11. Separation with Multiple Microphones: spatial mixing ? separation (mixing)⁻¹ (input data, hidden environment)
  12. Separation with Multiple Microphones: spatial mixing ? separation (mixing)⁻¹ (input data, hidden environment, algorithm)
  13. Separation with Multiple Microphones: spatial mixing ? separation (mixing)⁻¹ (input data, output, hidden environment, algorithm)
  14. Smart Voice Assistants

  15. Automatic Minutes Taking

  16. Augmented Hearing

  17. Multiple Sound Event Detection → Talk by Tatsuya Komatsu on Acoustic Event Detection
  18. How do we need it?

  19. How do we need it? Fast

  20. How do we need it? Fast Hands-off

  21. How do we need it? Fast Hands-off High-quality

  22. Recognizing Speech

  23. Spectrogram of a Speech Sample (axes: frequency × time)

  24. How to Recognize a Mixture? 1 source (→ more speakers)
  25. How to Recognize a Mixture? 1 source, 2 sources (→ more speakers)
  26. How to Recognize a Mixture? 1 source, 2 sources, 4 sources (→ more speakers)
  27. How to Recognize a Mixture? 1 source, 2 sources, 4 sources, 8 sources (→ more speakers)
  28. How to Recognize a Mixture? 1 source, 2 sources, 4 sources, 8 sources, crowd (→ more speakers)
  29. Sources with Sparse Time Activity: speech signal, model spectrogram (time × freq.)
  30. Separation Algorithm

  31. Source Separation is Hard! spatial mixing
  32. Source Separation is Hard! spatial mixing, separation (mixing)⁻¹ ?
  33. Source Separation is Hard! spatial mixing, separation (mixing)⁻¹ ? both unknown: the problem is ill-posed
  34. Source Separation is Hard! both unknown, ill-posed; analogy: x + y = 11
  35. Source Separation is Hard! analogy: x + y = 11, 2 + 9 = 11 ?
  36. Source Separation is Hard! analogy: x + y = 11, 2 + 9 = 11 ? 7 + 4 = 11 ?
  37. Source Separation is Hard! analogy: x + y = 11, 2 + 9 = 11 ? 7 + 4 = 11 ? Infinite number of solutions!
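The x + y = 11 analogy carries over to the full mixing model: any rescaling of a source can be absorbed into the mixing matrix, so infinitely many (mixing, sources) pairs explain the same microphone signals. A minimal sketch in plain Python; the 2×2 mixing matrix and toy source values are made up for illustration:

```python
# Two toy sources observed by two microphones through a mixing matrix A.
s = [[1.0, 0.0, -2.0], [0.5, -1.0, 0.0]]   # sources (2 x 3 samples)
A = [[1.0, 0.5], [0.3, 1.0]]               # hypothetical 2x2 mixing matrix

def mix(A, s):
    """Microphone signals m = A @ s (plain-Python matrix product)."""
    return [[sum(A[i][k] * s[k][t] for k in range(2)) for t in range(3)]
            for i in range(2)]

m = mix(A, s)

# An alternative explanation: double source 0 and halve column 0 of A.
s2 = [[2.0 * v for v in s[0]], s[1]]
A2 = [[A[i][0] / 2.0, A[i][1]] for i in range(2)]
m2 = mix(A2, s2)

# Both (A, s) and (A2, s2) produce exactly the same microphone signals.
same = all(abs(m[i][t] - m2[i][t]) < 1e-12 for i in range(2) for t in range(3))
print(same)  # → True: without extra assumptions, the problem is ill-posed
```

This is the scaling ambiguity of blind separation; breaking it requires extra knowledge about the sources, which is where the speech-likeness guide on the next slides comes in.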
  38. Algorithm using Speech-likeness as a Guide
  39. Algorithm using Speech-likeness as a Guide: separation (mixing)⁻¹, guess 1
  40. Algorithm using Speech-likeness as a Guide: guess 1, speech-likeness test (looks like this?)
  41. Algorithm using Speech-likeness as a Guide: the speech-likeness test compares the current source estimate to a model spectrogram (time × freq.): how similar?
  42. Algorithm using Speech-likeness as a Guide: guess 1, speech-likeness test (looks like this?)
  43. Algorithm using Speech-likeness as a Guide: no? then update the guess
  44. Algorithm using Speech-likeness as a Guide: no? then update the guess and repeat the speech-likeness test
  45. Algorithm using Speech-likeness as a Guide: guess 2, speech-likeness test (looks like this?)
  46. Algorithm using Speech-likeness as a Guide: guess 2, speech-likeness test, yes? done!
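The guess-test-update loop above can be sketched with a stand-in speech-likeness score. Here a kurtosis-like sparsity statistic plays that role, and a single demixing coefficient is searched over a grid of guesses; the two-microphone setup, the 0.5/0.3 mixing gains, and the spiky toy source are all invented for illustration, not the actual LINE algorithm:

```python
import random

random.seed(0)
n = 4000

# Toy sources: s1 is sparse/spiky (speech-like), s2 is dense noise.
s1 = [random.choice([-1.0, 1.0]) if random.random() < 0.1 else 0.0
      for _ in range(n)]
s2 = [random.uniform(-1.0, 1.0) for _ in range(n)]

# Two microphones with hypothetical mixing gains.
m1 = [s1[t] + 0.5 * s2[t] for t in range(n)]
m2 = [0.3 * s1[t] + s2[t] for t in range(n)]

def speech_likeness(y):
    """Kurtosis-like sparsity score: high for spiky, speech-like signals."""
    p2 = sum(v * v for v in y) / len(y)
    p4 = sum(v ** 4 for v in y) / len(y)
    return p4 / (p2 * p2)

# Guess a demixing coefficient, test speech-likeness, keep the best guess.
best_a, best_score = 0.0, -1.0
for k in range(21):
    a = k / 20.0                          # guesses 0.0, 0.05, ..., 1.0
    y = [m1[t] - a * m2[t] for t in range(n)]
    score = speech_likeness(y)
    if score > best_score:
        best_a, best_score = a, score

print(best_a)  # close to 0.5, the value that cancels s2 from m1
```

The winning guess is the one whose output looks most speech-like; real multichannel algorithms update a full demixing matrix with a proper optimizer instead of a grid search, which is exactly what the optimization slides below discuss.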
  47. Separation via Optimization: cost f(x), optimization landscape, starting point x₀, minimum; all we know is the value and slope at x₀! (speech-like ↔ mixture-like)
  48. Optimization is like Skiing

  49. Optimization with Gradient Descent, Optimal Step Size (cost plot, ← speech-like), iteration 1: x₀
  50. Optimization with Gradient Descent, Optimal Step Size, iteration 1: x₀ → x₁
  51. Optimization with Gradient Descent, Optimal Step Size, iteration 2: x₁
  52. Optimization with Gradient Descent, Optimal Step Size, iteration 2: x₁ → x₂
  53. Optimization with Gradient Descent, Optimal Step Size, iteration 3: x₂
  54. Optimization with Gradient Descent, Optimal Step Size, iteration 3: x₂ → x₃
  55. Optimization with Gradient Descent, Optimal Step Size, iteration 3: x₃ reached. NICE! ✌
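The descent shown on these slides can be reproduced on a stand-in cost function; f(x) = (x - 2)² + 0.5 below is made up for illustration, not the actual speech-likeness cost:

```python
# Gradient descent with a well-chosen step size on a toy cost function.
def f(x):
    return (x - 2.0) ** 2 + 0.5      # hypothetical cost, minimum at x = 2

def grad(x):
    return 2.0 * (x - 2.0)           # the slope: all we know locally

x = 0.0                              # starting point x0
step = 0.4                           # safe step size (stable below 1 here)
costs = [f(x)]
for _ in range(20):
    x -= step * grad(x)              # move downhill along the slope
    costs.append(f(x))

print(round(x, 4), round(costs[-1], 4))  # → 2.0 0.5
```

With this step size the error shrinks by a constant factor each iteration, so the cost decreases monotonically toward the minimum.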
  56. Don’t Go Too Fast!

  57. Gradient Descent Fails, Step Size is Too Big! (cost plot, ← speech-like), iteration 1: x₀
  58. Gradient Descent Fails, Step Size is Too Big! iteration 1: x₀ → x₁
  59. Gradient Descent Fails, Step Size is Too Big! iteration 2: x₁
  60. Gradient Descent Fails, Step Size is Too Big! iteration 2: x₁ → x₂
  61. Gradient Descent Fails, Step Size is Too Big! iteration 3: x₂
  62. Gradient Descent Fails, Step Size is Too Big! iteration 3: x₂ → x₃
  63. Gradient Descent Fails, Step Size is Too Big! iteration 3: x₂ → x₃, ↑ went up ↑
  64. Gradient Descent Fails, Step Size is Too Big! iteration 3: x₂ → x₃, ↑ went up ↑ FAIL! ☹
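The same toy cost f(x) = (x - 2)² + 0.5 shows the failure mode: its stable step sizes are those below 1 (where 1 - 2·step stays inside (-1, 1)), so a larger step overshoots the minimum further on every iteration and the cost goes up instead of down:

```python
# Gradient descent diverges when the step size is too big.
def f(x):
    return (x - 2.0) ** 2 + 0.5      # same hypothetical cost as before

def grad(x):
    return 2.0 * (x - 2.0)

x = 0.0
step = 1.1                           # too big: each step overshoots x = 2
costs = [f(x)]
for _ in range(5):
    x -= step * grad(x)
    costs.append(f(x))

print(costs)  # the cost grows every iteration: went up -> FAIL
```

The error is multiplied by |1 - 2·step| = 1.2 each iteration, so the iterates bounce from side to side of the minimum with growing amplitude.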
  65. Safe Descent with Majorization-Minimization

  66. Optimization with Majorization-Minimization: No Step Size Required! Guaranteed to Decrease! (cost plot, ← speech-like), optimization landscape, x₀
  67. Optimization with Majorization-Minimization, iteration 1: x₀, auxiliary function
  68. Optimization with Majorization-Minimization, iteration 1: auxiliary function (1) touches
  69. Optimization with Majorization-Minimization, iteration 1: auxiliary function (1) touches, (2) always above
  70. Optimization with Majorization-Minimization, iteration 1: auxiliary function (1) touches, (2) always above, (3) easy to minimize → x₁
  71. Optimization with Majorization-Minimization, iteration 2: x₁, auxiliary
  72. Optimization with Majorization-Minimization, iteration 2: x₁ → x₂
  73. Optimization with Majorization-Minimization, iteration 3: x₂
  74. Optimization with Majorization-Minimization, iteration 3: x₂ → x₃
  75. Optimization with Majorization-Minimization, iteration 4: x₃
  76. Optimization with Majorization-Minimization, iteration 4: x₃ → x₄
  77. Optimization with Majorization-Minimization, iteration 5: x₄
  78. Optimization with Majorization-Minimization, iteration 5: x₄ → x₅
  79. Optimization with Majorization-Minimization, iteration 6: x₅
  80. Optimization with Majorization-Minimization, iteration 6: x₅ → x₆
  81. Optimization with Majorization-Minimization: No Step Size Required! Guaranteed to Decrease! Nice, but…
  82. Optimization with Majorization-Minimization: No Step Size Required! Guaranteed to Decrease! Nice, but… kinda slow!
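A minimal majorization-minimization sketch on a made-up cost f(x) = sqrt((x - 2)² + 1) + 0.1x². The concavity of the square root yields a quadratic auxiliary function that touches f at the current point, lies above it everywhere, and is minimized in closed form, so no step size is needed and the cost can only decrease:

```python
import math

def f(x):
    """Hypothetical cost: smooth-abs term plus a quadratic penalty."""
    return math.sqrt((x - 2.0) ** 2 + 1.0) + 0.1 * x * x

def mm_step(xk):
    """Minimize the quadratic auxiliary function built at xk.

    Concavity of sqrt: sqrt(a) <= sqrt(b) + (a - b) / (2 sqrt(b)).
    With a = (x-2)^2 + 1 and b = (xk-2)^2 + 1 this majorizes f by
    g(x) = const + (x-2)^2 / (2 r) + 0.1 x^2,  r = sqrt((xk-2)^2 + 1),
    which (1) touches f at xk, (2) is always above f, (3) is quadratic:
    g'(x) = (x - 2) / r + 0.2 x = 0  =>  x = 2 / (1 + 0.2 r).
    """
    r = math.sqrt((xk - 2.0) ** 2 + 1.0)
    return 2.0 / (1.0 + 0.2 * r)

x = 0.0
costs = [f(x)]
for _ in range(30):
    x = mm_step(x)
    costs.append(f(x))

# Guaranteed descent: f(x_new) <= g(x_new) <= g(x_old) = f(x_old).
monotone = all(costs[i + 1] <= costs[i] + 1e-12 for i in range(len(costs) - 1))
print(round(x, 2), monotone)  # → 1.65 True
```

The trade-off the next slides address: the guarantee holds for any valid auxiliary function, but a loose-fitting one takes small, conservative steps, which is why plain MM can be slow.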
  83. Fast Separation @ LINE

  84. Find a Tighter Fitting Function: a tighter fitting function will converge faster! (cost plot, ← speech-like), optimization landscape, x₀
  85. Find a Tighter Fitting Function, iteration 1: x₀, new auxiliary vs. old auxiliary
  86. Find a Tighter Fitting Function, iteration 1: x₀ → x₁, new auxiliary vs. old auxiliary
  87. Find a Tighter Fitting Function, iteration 2: x₁, new auxiliary vs. old auxiliary
  88. Find a Tighter Fitting Function, iteration 2: x₁ → x₂, new auxiliary vs. old auxiliary
  89. Find a Tighter Fitting Function, iteration 3: x₂, new auxiliary
  90. Find a Tighter Fitting Function, iteration 3: x₂ → x₃, new auxiliary
  91. Find a Tighter Fitting Function, iteration 3: x₂ → x₃ converged. NICE! ✌
  92. New algorithm developed at LINE (plot: separation vs. runtime [s], better →) https://arxiv.org/abs/2008.10048 https://github.com/fakufaku/auxiva-ipa
  93. New algorithm developed at LINE vs. the old ways (plot: separation vs. runtime [s], better →) https://arxiv.org/abs/2008.10048 https://github.com/fakufaku/auxiva-ipa
  94. New algorithm developed at LINE vs. the old ways (two plots: separation vs. runtime [s], better →) https://arxiv.org/abs/2008.10048 https://github.com/fakufaku/auxiva-ipa
  95. New algorithm developed at LINE vs. the old ways: 4x faster! https://arxiv.org/abs/2008.10048 https://github.com/fakufaku/auxiva-ipa
  99. Summary: Source Separation @ LINE is Fast, Hands-off, High-quality

  100. Source Separation with pyroomacoustics https://github.com/LCAV/pyroomacoustics

  101. Thank you!