Unsupervised Separation of Speech for Smooth Communications

Slide 1

Slide 1 text

No content

Slide 2

Slide 2 text

About Myself: 1984, Switzerland; 2012-2017, PhD at EPFL; Zurich, Stockholm, Tokyo; Safecast; 2017-2020, TMU; March 2020 → Speech Team @ LINE

Slide 3

Slide 3 text

Agenda › What is Source Separation? › Recognizing Speech › Separation Algorithm › Fast Source Separation

Slide 4

Slide 4 text

What is Source Separation?

Slide 5

Slide 5 text

The Cocktail Party Effect

Slide 6

Slide 6 text

How do we do it?

Slide 7

Slide 7 text

How do we do it? [Figure: right ear, left ear]

Slide 8

Slide 8 text

How do we do it? [Figure: right ear, left ear]

Slide 9

Slide 9 text

Separation with Multiple Microphones [Figure: spatial mixing → ? → separation = (mixing)⁻¹]

Slide 10

Slide 10 text

Separation with Multiple Microphones [Figure: spatial mixing → ? → separation = (mixing)⁻¹; label: hidden environment]

Slide 11

Slide 11 text

Separation with Multiple Microphones [Figure: spatial mixing → ? → separation = (mixing)⁻¹; labels: input data, hidden environment]

Slide 12

Slide 12 text

Separation with Multiple Microphones [Figure: spatial mixing → ? → separation = (mixing)⁻¹; labels: input data, hidden environment, algorithm]

Slide 13

Slide 13 text

Separation with Multiple Microphones [Figure: spatial mixing → ? → separation = (mixing)⁻¹; labels: input data, output, hidden environment, algorithm]
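To make the diagram concrete, here is a minimal Python sketch under simplifying assumptions: two hypothetical sources mixed instantaneously by a made-up 2×2 matrix `A`. Real rooms add echoes, so practical systems usually solve one such problem per frequency of the STFT. If the hidden environment were known, separation would just be its inverse. The hard part, discussed later, is that it is not known.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical sources (rows), 1000 samples each
s = rng.laplace(size=(2, 1000))

# Hypothetical "hidden environment": each microphone hears a weighted sum of both sources
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
x = A @ s  # input data: the only thing the microphones record

# If A were known, separation would simply be (mixing)^-1
W = np.linalg.inv(A)
y = W @ x  # output

print(np.allclose(y, s))  # True
```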

Slide 14

Slide 14 text

Smart Voice Assistants

Slide 15

Slide 15 text

Automatic Minutes Taking

Slide 16

Slide 16 text

Augmented Hearing

Slide 17

Slide 17 text

Multiple Sound Event Detection → see the talk by Tatsuya Komatsu on Acoustic Event Detection

Slide 18

Slide 18 text

What do we need from it?

Slide 19

Slide 19 text

What do we need from it? › Fast

Slide 20

Slide 20 text

What do we need from it? › Fast › Hands-off

Slide 21

Slide 21 text

What do we need from it? › Fast › Hands-off › High-quality

Slide 22

Slide 22 text

Recognizing Speech

Slide 23

Slide 23 text

Spectrogram of a Speech Sample [Figure: frequency vs. time]
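A spectrogram is the magnitude of the Fourier transforms of short, overlapping, windowed frames of the signal. The sketch below computes one using only NumPy. The frame length `n_fft=512`, the hop `hop=128`, the Hann window and the on/off test tone are illustrative choices, not values from the talk.

```python
import numpy as np

def spectrogram(x, n_fft=512, hop=128):
    """Magnitude spectrogram: rows are frequency bins, columns are time frames."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[t * hop : t * hop + n_fft] * window for t in range(n_frames)])
    # rfft keeps the n_fft // 2 + 1 non-negative frequency bins
    return np.abs(np.fft.rfft(frames, axis=1)).T

fs = 16000
t = np.arange(fs) / fs  # 1 second of samples
# Hypothetical stand-in for speech: a 220 Hz tone switched on and off twice per second
x = np.sin(2 * np.pi * 220 * t) * (np.sin(2 * np.pi * 2 * t) > 0)
S = spectrogram(x)
print(S.shape)  # (257, 122): 257 frequency bins, 122 time frames
```

The tone appears as a horizontal line near bin 7 (220 Hz at 31.25 Hz per bin). The frames where the tone is switched off are exactly zero.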

Slide 24

Slide 24 text

How to Recognize a Mixture? [Figure: spectrograms as more speakers are added: 1 source]

Slide 25

Slide 25 text

How to Recognize a Mixture? [Figure: spectrograms as more speakers are added: 1 source, 2 sources]

Slide 26

Slide 26 text

How to Recognize a Mixture? [Figure: spectrograms as more speakers are added: 1 source, 2 sources, 4 sources]

Slide 27

Slide 27 text

How to Recognize a Mixture? [Figure: spectrograms as more speakers are added: 1 source, 2 sources, 4 sources, 8 sources]

Slide 28

Slide 28 text

How to Recognize a Mixture? [Figure: spectrograms as more speakers are added: 1 source, 2 sources, 4 sources, 8 sources, crowd]
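The effect of adding speakers can be quantified with a sparsity measure. As more sources are added, the mixture looks less like one voice and more like noise. This sketch uses excess kurtosis as an assumed stand-in, because the talk does not name a measure. The sources are hypothetical: Laplacian samples that are active only 20% of the time.

```python
import numpy as np

def excess_kurtosis(x):
    """About 0 for Gaussian noise; large and positive for sparse, peaky signals."""
    x = x - x.mean()
    return np.mean(x ** 4) / np.mean(x ** 2) ** 2 - 3.0

rng = np.random.default_rng(0)
n = 200_000

def sparse_source():
    # Hypothetical "speech-like" source: Laplacian samples, active 20% of the time
    return rng.laplace(size=n) * (rng.random(n) < 0.2)

counts = [1, 2, 4, 8]
kurt = [excess_kurtosis(sum(sparse_source() for _ in range(k))) for k in counts]
for k, value in zip(counts, kurt):
    print(f"{k} source(s): excess kurtosis {value:.1f}")
```

For independent, identically distributed sources, the excess kurtosis of a sum of k of them is 1/k times that of one. Theory therefore predicts about 27, 13.5, 6.8 and 3.4: the crowd drifts toward Gaussian noise.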

Slide 29

Slide 29 text

Sources with Sparse Time Activity [Figure: speech signal and model spectrogram, frequency vs. time]

Slide 30

Slide 30 text

Separation Algorithm

Slide 31

Slide 31 text

Source Separation is Hard! [Figure: spatial mixing]

Slide 32

Slide 32 text

Source Separation is Hard! [Figure: spatial mixing → separation = (mixing)⁻¹ ?]

Slide 33

Slide 33 text

Source Separation is Hard! [Figure: spatial mixing → separation = (mixing)⁻¹ ?] Both are unknown, so the problem is ill-posed.

Slide 34

Slide 34 text

Source Separation is Hard! [Figure: spatial mixing → separation = (mixing)⁻¹ ?] Both are unknown, so the problem is ill-posed. Analogy: x + y = 11.

Slide 35

Slide 35 text

Source Separation is Hard! [Figure: spatial mixing → separation = (mixing)⁻¹ ?] Both are unknown, so the problem is ill-posed. Analogy: x + y = 11. 2 + 9 = 11?

Slide 36

Slide 36 text

Source Separation is Hard! [Figure: spatial mixing → separation = (mixing)⁻¹ ?] Both are unknown, so the problem is ill-posed. Analogy: x + y = 11. 2 + 9 = 11? 7 + 4 = 11?

Slide 37

Slide 37 text

Source Separation is Hard! [Figure: spatial mixing → separation = (mixing)⁻¹ ?] Both are unknown, so the problem is ill-posed. Analogy: x + y = 11. 2 + 9 = 11? 7 + 4 = 11? Infinite number of solutions!
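The same ambiguity can be written for the mixing model. Here the notation is ours: bold x is the recordings, A is the mixing and bold s is the sources. If nothing is assumed about the sources, any invertible matrix M gives another "environment" and another set of "sources" that explain the recordings equally well, just as any pair with x + y = 11 does. For example, M = 2I doubles every source and halves the mixing. Prior knowledge about what speech looks like is what narrows the choice down.

```latex
\mathbf{x} \;=\; A\,\mathbf{s} \;=\; \bigl(A M^{-1}\bigr)\bigl(M\,\mathbf{s}\bigr)
\qquad \text{for every invertible matrix } M
```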

Slide 38

Slide 38 text

Algorithm using Speech-likeness as a Guide

Slide 39

Slide 39 text

Algorithm using Speech-likeness as a Guide [Figure: separation = (mixing)⁻¹, guess 1]

Slide 40

Slide 40 text

Algorithm using Speech-likeness as a Guide [Figure: separation = (mixing)⁻¹, guess 1 → speech-likeness test: looks like this?]

Slide 41

Slide 41 text

Algorithm using Speech-likeness as a Guide [Figure: separation = (mixing)⁻¹, guess 1 → speech-likeness test: looks like this? The test asks how similar the current source estimate is to a model spectrogram (frequency vs. time).]
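The slides leave the test abstract. One common formalization in the independent vector analysis literature gives each time frame t of a source a single activity level r_t that is shared by all frequencies. This is an assumption here, not necessarily the exact model behind these slides. The test fits r_t to the current estimate and scores the fit, which rewards estimates that are loud in a few frames and quiet elsewhere. `speech_likeness_cost` is a hypothetical name. The two envelopes are made-up test signals with equal average power, because this score also depends on overall loudness.

```python
import numpy as np

def speech_likeness_cost(Y, eps=1e-12):
    """Lower cost = more speech-like under the assumed model.

    Y: complex STFT of one source estimate, shape (n_freq, n_frames).
    Fits the model spectrogram r_t (one activity level per frame, shared by
    all frequencies) and returns the model's negative log-likelihood,
    up to additive constants.
    """
    n_freq = Y.shape[0]
    r = np.mean(np.abs(Y) ** 2, axis=0)  # best-fitting activity per frame
    return n_freq * np.sum(np.log(r + eps))

rng = np.random.default_rng(0)
n_freq, n_frames = 257, 100
noise = (rng.standard_normal((n_freq, n_frames))
         + 1j * rng.standard_normal((n_freq, n_frames))) / np.sqrt(2)

# Two hypothetical envelopes with the same average power (1.0)
busy = np.ones(n_frames)                   # active all the time (mixture-like)
sparse = np.full(n_frames, 0.01)           # mostly quiet...
sparse[:20] = (n_frames - 0.01 * 80) / 20  # ...with a few loud frames (speech-like)

cost_busy = speech_likeness_cost(noise * np.sqrt(busy))
cost_sparse = speech_likeness_cost(noise * np.sqrt(sparse))
print(cost_sparse < cost_busy)  # True
```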

Slide 42

Slide 42 text

Algorithm using Speech-likeness as a Guide [Figure: separation = (mixing)⁻¹, guess 1 → speech-likeness test: looks like this?]

Slide 43

Slide 43 text

Algorithm using Speech-likeness as a Guide [Figure: separation = (mixing)⁻¹, guess 1 → speech-likeness test: looks like this? no → update guess]

Slide 44

Slide 44 text

Algorithm using Speech-likeness as a Guide [Figure: speech-likeness test: looks like this? no → update guess]

Slide 45

Slide 45 text

Algorithm using Speech-likeness as a Guide [Figure: separation = (mixing)⁻¹, guess 2 → speech-likeness test: looks like this? no → update guess]

Slide 46

Slide 46 text

Algorithm using Speech-likeness as a Guide [Figure: separation = (mixing)⁻¹, guess 2 → speech-likeness test: looks like this? no → update guess; yes → done!]
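The guess / test / update loop can be sketched on a toy two-microphone problem. The sketch makes several assumptions. Mixing is instantaneous and the sparse sources are hypothetical. The data are whitened so that only a rotation angle is left to guess. Guesses are searched exhaustively instead of with a clever update rule. The "speech-likeness" test is mean absolute value, a standard sparsity proxy that is not necessarily the talk's. The names `rotation` and `speech_likeness` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
# Two hypothetical sparse ("speech-like") sources, each active about 20% of the time
s = rng.laplace(size=(2, n)) * (rng.random((2, n)) < 0.2)
A = np.array([[1.0, 0.6], [0.4, 1.0]])  # hidden mixing, never shown to the algorithm
x = A @ s

# Whitening: decorrelate and normalize; afterwards only a rotation is left to find
x = x - x.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(x))
z = (E / np.sqrt(d)).T @ x

def rotation(theta):
    c, si = np.cos(theta), np.sin(theta)
    return np.array([[c, -si], [si, c]])

def speech_likeness(y):
    # Hypothetical test: average |y| of unit-variance outputs (smaller = sparser)
    y = y / y.std(axis=1, keepdims=True)
    return np.mean(np.abs(y))

best_theta, best_score = 0.0, np.inf
for theta in np.linspace(0.0, np.pi / 2, 181):    # guess
    score = speech_likeness(rotation(theta) @ z)  # test: looks like speech?
    if score < best_score:                        # keep the more speech-like guess
        best_theta, best_score = theta, score

y = rotation(best_theta) @ z  # separated outputs (order and scale are arbitrary)

# Each output should match one true source almost perfectly
corr = np.abs(np.corrcoef(np.vstack([y, s]))[:2, 2:])
print(corr.round(2))
```

Searching angles in [0, π/2] is enough in this 2-D case, because the solutions repeat every quarter turn (swapped or sign-flipped outputs).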

Slide 47

Slide 47 text

Separation via Optimization [Figure: optimization landscape, cost f(x) against x from speech-like to mixture-like, with the starting point x₀ and the minimum marked] All we know is the value and slope at x₀!

Slide 48

Slide 48 text

Optimization is like Skiing

Slide 49

Slide 49 text

Optimization with Gradient Descent: Optimal Step Size [Figure: cost vs. x, speech-like to the left; iteration 1; iterate x₀]

Slide 50

Slide 50 text

Optimization with Gradient Descent: Optimal Step Size [Figure: cost vs. x, speech-like to the left; iteration 1; iterates x₀, x₁]

Slide 51

Slide 51 text

Optimization with Gradient Descent: Optimal Step Size [Figure: cost vs. x, speech-like to the left; iteration 2; iterates x₀, x₁]

Slide 52

Slide 52 text

Optimization with Gradient Descent: Optimal Step Size [Figure: cost vs. x, speech-like to the left; iteration 2; iterates x₀, x₁, x₂]

Slide 53

Slide 53 text

Optimization with Gradient Descent: Optimal Step Size [Figure: cost vs. x, speech-like to the left; iteration 3; iterates x₀, x₁, x₂]

Slide 54

Slide 54 text

Optimization with Gradient Descent: Optimal Step Size [Figure: cost vs. x, speech-like to the left; iteration 3; iterates x₀ to x₃]

Slide 55

Slide 55 text

Optimization with Gradient Descent: Optimal Step Size [Figure: cost vs. x, speech-like to the left; iteration 3; iterates x₀ to x₃] NICE! ✌
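Gradient descent only uses the value and slope at the current point, as the optimization slide says. This sketch uses a hypothetical 1-D cost f(x) = (x - 1)² as a stand-in for the real separation cost. For this cost, the ideal fixed step would be 0.5, which jumps to the minimum in one move. A step of 0.3 is used instead so that the iterates x₀ to x₃ stay visible.

```python
def cost(x):
    # Hypothetical 1-D landscape with its minimum at x = 1 (the speech-like side)
    return (x - 1.0) ** 2

def slope(x):
    return 2.0 * (x - 1.0)

def gradient_descent(x0, step, n_iter):
    """Each update uses only the slope at the current point."""
    xs = [x0]
    for _ in range(n_iter):
        xs.append(xs[-1] - step * slope(xs[-1]))
    return xs

xs = gradient_descent(x0=2.5, step=0.3, n_iter=3)
print([round(v, 3) for v in xs])  # [2.5, 1.6, 1.24, 1.096]
```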

Slide 56

Slide 56 text

Don’t Go Too Fast!

Slide 57

Slide 57 text

Gradient Descent Fails: Step Size is Too Big! [Figure: cost vs. x, speech-like to the left; iteration 1; iterate x₀]

Slide 58

Slide 58 text

Gradient Descent Fails: Step Size is Too Big! [Figure: cost vs. x, speech-like to the left; iteration 1; iterates x₀, x₁]

Slide 59

Slide 59 text

Gradient Descent Fails: Step Size is Too Big! [Figure: cost vs. x, speech-like to the left; iteration 2; iterates x₀, x₁]

Slide 60

Slide 60 text

Gradient Descent Fails: Step Size is Too Big! [Figure: cost vs. x, speech-like to the left; iteration 2; iterates x₀, x₁, x₂]

Slide 61

Slide 61 text

Gradient Descent Fails: Step Size is Too Big! [Figure: cost vs. x, speech-like to the left; iteration 3; iterates x₀, x₁, x₂]

Slide 62

Slide 62 text

Gradient Descent Fails: Step Size is Too Big! [Figure: cost vs. x, speech-like to the left; iteration 3; iterates x₀ to x₃]

Slide 63

Slide 63 text

Gradient Descent Fails: Step Size is Too Big! [Figure: cost vs. x, speech-like to the left; iteration 3; iterates x₀ to x₃; the cost went up]

Slide 64

Slide 64 text

Gradient Descent Fails: Step Size is Too Big! [Figure: cost vs. x, speech-like to the left; iteration 3; iterates x₀ to x₃; the cost went up] FAIL! ☹
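This sketch reuses the hypothetical cost f(x) = (x - 1)² with a step that is too big. Each update overshoots the minimum at x = 1 and lands higher up on the other side.

```python
def cost(x):
    # Hypothetical 1-D landscape with its minimum at x = 1
    return (x - 1.0) ** 2

def slope(x):
    return 2.0 * (x - 1.0)

x, step = 2.5, 1.2  # step too big for this cost: every move overshoots x = 1
history = [x]
for it in range(1, 4):
    x = x - step * slope(x)
    history.append(x)
    went = "went up" if cost(x) > cost(history[-2]) else "went down"
    print(f"iteration {it}: x = {x:.3f}, cost {went}")
```

For this particular cost the update reads x_{k+1} - 1 = (1 - 2·step)(x_k - 1), so it converges only for 0 < step < 1. With step = 1.2, the distance to the minimum grows by a factor of 1.4 per iteration (x = -1.1, 3.94, -3.116).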

Slide 65

Slide 65 text

Safe Descent with Majorization-Minimization

Slide 66

Slide 66 text

Optimization with Majorization-Minimization: No Step Size Required! Guaranteed to Decrease! [Figure: optimization landscape, cost vs. x, speech-like to the left; starting point x₀]

Slide 67

Slide 67 text

Optimization with Majorization-Minimization: No Step Size Required! Guaranteed to Decrease! [Figure: cost and auxiliary function vs. x, speech-like to the left; iteration 1; iterate x₀]

Slide 68

Slide 68 text

Optimization with Majorization-Minimization: No Step Size Required! Guaranteed to Decrease! [Figure: cost and auxiliary function vs. x, speech-like to the left; iteration 1; iterate x₀] The auxiliary function: 1. touches the cost at the current point

Slide 69

Slide 69 text

Optimization with Majorization-Minimization: No Step Size Required! Guaranteed to Decrease! [Figure: cost and auxiliary function vs. x, speech-like to the left; iteration 1; iterate x₀] The auxiliary function: 1. touches the cost at the current point; 2. is always above the cost

Slide 70

Slide 70 text

Optimization with Majorization-Minimization: No Step Size Required! Guaranteed to Decrease! [Figure: cost and auxiliary function vs. x, speech-like to the left; iteration 1; iterates x₀, x₁] The auxiliary function: 1. touches the cost at the current point; 2. is always above the cost; 3. is easy to minimize
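The three properties are enough to guarantee that each step can only go down. In the notation used here, f is the cost and Q(x | x_k) is the auxiliary function built at the current point x_k. This is our notation; the slides only show pictures. The number over each relation names the property that justifies it.

```latex
\begin{aligned}
&\text{1. touches:} && Q(x_k \mid x_k) = f(x_k)\\
&\text{2. always above:} && Q(x \mid x_k) \ge f(x) \quad \text{for all } x\\
&\text{3. easy to minimize:} && x_{k+1} = \operatorname*{arg\,min}_x \, Q(x \mid x_k)\\[4pt]
&\Longrightarrow && f(x_{k+1}) \overset{2}{\le} Q(x_{k+1} \mid x_k) \overset{3}{\le} Q(x_k \mid x_k) \overset{1}{=} f(x_k)
\end{aligned}
```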

Slide 71

Slide 71 text

Optimization with Majorization-Minimization: No Step Size Required! Guaranteed to Decrease! [Figure: cost and auxiliary function vs. x, speech-like to the left; iteration 2; iterates x₀, x₁]

Slide 72

Slide 72 text

Optimization with Majorization-Minimization: No Step Size Required! Guaranteed to Decrease! [Figure: cost and auxiliary function vs. x, speech-like to the left; iteration 2; iterates x₀, x₁, x₂]

Slide 73

Slide 73 text

Optimization with Majorization-Minimization: No Step Size Required! Guaranteed to Decrease! [Figure: cost and auxiliary function vs. x, speech-like to the left; iteration 3; iterates x₀, x₁, x₂]

Slide 74

Slide 74 text

Optimization with Majorization-Minimization: No Step Size Required! Guaranteed to Decrease! [Figure: cost and auxiliary function vs. x, speech-like to the left; iteration 3; iterates x₀ to x₃]

Slide 75

Slide 75 text

Optimization with Majorization-Minimization: No Step Size Required! Guaranteed to Decrease! [Figure: cost and auxiliary function vs. x, speech-like to the left; iteration 4; iterates x₀ to x₃]

Slide 76

Slide 76 text

Optimization with Majorization-Minimization: No Step Size Required! Guaranteed to Decrease! [Figure: cost and auxiliary function vs. x, speech-like to the left; iteration 4; iterates x₀ to x₄]

Slide 77

Slide 77 text

Optimization with Majorization-Minimization: No Step Size Required! Guaranteed to Decrease! [Figure: cost and auxiliary function vs. x, speech-like to the left; iteration 5; iterates x₀ to x₄]

Slide 78

Slide 78 text

Optimization with Majorization-Minimization: No Step Size Required! Guaranteed to Decrease! [Figure: cost and auxiliary function vs. x, speech-like to the left; iteration 5; iterates x₀ to x₅]

Slide 79

Slide 79 text

Optimization with Majorization-Minimization: No Step Size Required! Guaranteed to Decrease! [Figure: cost and auxiliary function vs. x, speech-like to the left; iteration 6; iterates x₀ to x₅]

Slide 80

Slide 80 text

Optimization with Majorization-Minimization: No Step Size Required! Guaranteed to Decrease! [Figure: cost and auxiliary function vs. x, speech-like to the left; iteration 6; iterates x₀ to x₆]

Slide 81

Slide 81 text

Optimization with Majorization-Minimization: No Step Size Required! Guaranteed to Decrease! [Figure: cost and auxiliary function vs. x, speech-like to the left; iteration 6; iterates x₀ to x₆] Nice, but…
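Here is a small runnable instance of majorization-minimization. It uses a hypothetical cost, not the separation cost: minimize f(x) = Σ_i |x - a_i|, whose minimum is the median of the data. The inequality |u| ≤ u²/(2c) + c/2 holds for every c > 0 and is tight when c = |u|. It yields an auxiliary function that touches the cost, stays above it and is a parabola. Each update is therefore a closed-form weighted mean, with no step size to tune. To our understanding, quadratic upper bounds of this kind also underlie auxiliary-function (AuxIVA-style) separation updates, but this example is only an illustration. The data `a` and the name `mm_update` are hypothetical.

```python
import numpy as np

a = np.array([0.2, 0.9, 1.1, 2.5, 2.9])  # hypothetical data; the median is 1.1

def f(x):
    return np.sum(np.abs(x - a))

def mm_update(x_k, eps=1e-12):
    # Auxiliary: Q(x | x_k) = sum_i (x - a_i)**2 / (2 c_i) + c_i / 2 with c_i = |x_k - a_i|
    # (eps only guards against division by zero when x_k hits a data point exactly)
    c = np.maximum(np.abs(x_k - a), eps)
    w = 1.0 / c
    # Q is a parabola in x, so its exact minimizer is a weighted mean of the data
    return np.sum(w * a) / np.sum(w)

x = 2.8
costs = [f(x)]
for _ in range(30):
    x = mm_update(x)
    costs.append(f(x))
print(round(x, 6), [round(v, 3) for v in costs[:4]])
```

The printed costs never increase, and x approaches the median 1.1.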

Slide 82

Slide 82 text

Slide 83

Slide 83 text

Fast Separation @ LINE

Slide 84

Slide 84 text

Find a Tighter-Fitting Function: a tighter-fitting auxiliary function converges faster! [Figure: optimization landscape, cost vs. x, speech-like to the left; starting point x₀]

Slide 85

Slide 85 text

Find a Tighter-Fitting Function: a tighter-fitting auxiliary function converges faster! [Figure: cost, new auxiliary and old auxiliary functions vs. x, speech-like to the left; iteration 1; iterate x₀]

Slide 86

Slide 86 text

Find a Tighter-Fitting Function: a tighter-fitting auxiliary function converges faster! [Figure: cost, new auxiliary and old auxiliary functions vs. x, speech-like to the left; iteration 1; iterates x₀, x₁]

Slide 87

Slide 87 text

Find a Tighter-Fitting Function: a tighter-fitting auxiliary function converges faster! [Figure: cost, new auxiliary and old auxiliary functions vs. x, speech-like to the left; iteration 2; iterates x₀, x₁]

Slide 88

Slide 88 text

Find a Tighter-Fitting Function: a tighter-fitting auxiliary function converges faster! [Figure: cost, new auxiliary and old auxiliary functions vs. x, speech-like to the left; iteration 2; iterates x₀, x₁, x₂]

Slide 89

Slide 89 text

Find a Tighter-Fitting Function: a tighter-fitting auxiliary function converges faster! [Figure: cost and new auxiliary function vs. x, speech-like to the left; iteration 3; iterates x₀, x₁, x₂]

Slide 90

Slide 90 text

Find a Tighter-Fitting Function: a tighter-fitting auxiliary function converges faster! [Figure: cost and new auxiliary function vs. x, speech-like to the left; iteration 3; iterates x₀ to x₃]

Slide 91

Slide 91 text

Find a Tighter-Fitting Function: a tighter-fitting auxiliary function converges faster! [Figure: cost and new auxiliary function vs. x, speech-like to the left; iteration 3; iterates x₀ to x₃] NICE! ✌
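This sketch illustrates why a tighter auxiliary function needs fewer iterations. It is not the LINE algorithm from the linked paper, only the general principle on a hypothetical 1-D cost whose curvature never exceeds 1.2. A parabola with curvature L stays above the cost whenever L ≥ 1.2, and L = 1.2 is the tightest valid choice. The auxiliary function itself fixes the move (1/L times the slope), so there is still no step size to tune. The function `mm_quadratic` and all numbers are assumptions for illustration.

```python
import math

def f(x):
    # Hypothetical 1-D cost with its minimum at x = 1; its curvature f'' never exceeds 1.2
    return math.log(math.cosh(x - 1.0)) + 0.1 * (x - 1.0) ** 2

def df(x):
    return math.tanh(x - 1.0) + 0.2 * (x - 1.0)

def mm_quadratic(x0, curvature, tol=1e-6, max_iter=1000):
    """MM with the parabola Q(x | x_k) = f(x_k) + f'(x_k)(x - x_k) + curvature / 2 * (x - x_k)**2.

    Q stays above f as long as curvature >= max f'' (= 1.2 here);
    the closer curvature is to that bound, the tighter Q fits.
    """
    x, costs = x0, [f(x0)]
    for it in range(1, max_iter + 1):
        x = x - df(x) / curvature  # exact minimizer of the parabola Q
        costs.append(f(x))
        if abs(df(x)) < tol:
            return x, it, costs
    return x, max_iter, costs

x_tight, n_tight, costs_tight = mm_quadratic(2.8, curvature=1.2)  # tight auxiliary
x_loose, n_loose, costs_loose = mm_quadratic(2.8, curvature=5.0)  # loose auxiliary
print(f"tight: {n_tight} iterations, loose: {n_loose} iterations")
```

Both runs decrease the cost at every step, but the tight parabola reaches the tolerance in far fewer iterations than the loose one.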

Slide 92

Slide 92 text

New algorithm developed at LINE [Figure: Separation vs. Runtime [s], 0 to 6 s; arrow labeled "better!"] https://arxiv.org/abs/2008.10048 https://github.com/fakufaku/auxiva-ipa

Slide 93

Slide 93 text

New algorithm developed at LINE; the old ways [Figure: Separation vs. Runtime [s], 0 to 6 s; arrow labeled "better!"] https://arxiv.org/abs/2008.10048 https://github.com/fakufaku/auxiva-ipa

Slide 94

Slide 94 text

New algorithm developed at LINE vs. the old ways [Figure: two panels, each Separation vs. Runtime [s], 0 to 6 s; arrow labeled "better!"] https://arxiv.org/abs/2008.10048 https://github.com/fakufaku/auxiva-ipa

Slide 95

Slide 95 text

New algorithm developed at LINE vs. the old ways [Figure: two panels, each Separation vs. Runtime [s], 0 to 6 s; arrow labeled "better!"] 4x faster! https://arxiv.org/abs/2008.10048 https://github.com/fakufaku/auxiva-ipa