Separation with Multiple Microphones
[Diagram: spatial mixing in the hidden environment produces the input data; the algorithm applies the separation, (mixing)⁻¹, to produce the output; "?" marks the unknown]
Slide 14
Slide 14 text
Smart Voice Assistants
Slide 15
Slide 15 text
Automatic Minutes Taking
Slide 16
Slide 16 text
Augmented Hearing
Slide 17
Slide 17 text
Multiple Sound Event Detection
See also: talk by Tatsuya Komatsu on Acoustic Event Detection
Slide 18
Slide 18 text
How do we need it?
Fast, hands-off, high-quality
Slide 22
Slide 22 text
Recognizing Speech
Slide 23
Slide 23 text
Spectrogram of Speech Sample
[Figure: spectrogram of a speech sample; axes: time, frequency]
Slide 24
Slide 24 text
How to Recognize a Mixture?
[Figure: panels labeled 1 source, 2 sources, 4 sources, 8 sources, and Crowd, ordered toward more speakers]
Slide 29
Slide 29 text
Sources with Sparse Time Activity
[Figure: speech signal and model spectrogram; axes: time, freq.]
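One way to read the progression from a single source to a crowd is that each added source fills in the quiet gaps of the others, so the sum looks less sparse. The sketch below illustrates this on synthetic data only. It assumes Laplace-distributed samples as a stand-in for sparse sources (not real speech) and uses excess kurtosis as a rough sparsity score; the function name `excess_kurtosis` and the sample count are hypothetical choices.

```python
import numpy as np

def excess_kurtosis(x):
    """Excess kurtosis: about 0 for a Gaussian, larger for sparser (peakier) signals."""
    x = x - x.mean()
    return np.mean(x**4) / np.mean(x**2) ** 2 - 3.0

rng = np.random.default_rng(0)
n_samples = 200_000

scores = {}
for n_sources in (1, 2, 4, 8):
    # Each row is one synthetic sparse source (Laplace samples, an assumption).
    sources = rng.laplace(size=(n_sources, n_samples))
    mixture = sources.sum(axis=0)
    scores[n_sources] = excess_kurtosis(mixture)

for n_sources, score in scores.items():
    print(f"{n_sources} source(s): excess kurtosis = {score:.2f}")
```

The score drops as sources are added, which is the property a separation algorithm can exploit: a well-separated output should look sparser than the mixture.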
Slide 30
Slide 30 text
Separation Algorithm
Slide 31
Slide 31 text
Source Separation is Hard!
[Diagram: spatial mixing → separation, (mixing)⁻¹, with "?" marking the unknown]
Both unknown → problem ill-posed
Analogy: x + y = 11
2 + 9 = 11?
7 + 4 = 11?
Infinite number of solutions!
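The x + y = 11 analogy carries over to mixing: when both the mixing and the sources are unknown, many different pairs explain the same microphone signals. A minimal sketch, assuming an instantaneous 2x2 mixing model (real rooms add reverberation, which is not modeled here); the matrices `A` and `M` and all variable names are hypothetical.

```python
import numpy as np

# Scalar analogy: many (x, y) pairs give the same observed sum.
pairs = [(2, 9), (7, 4), (0.5, 10.5)]
for x, y in pairs:
    print(f"{x} + {y} = {x + y}")

# Same ambiguity with matrices: observed = mixing @ sources.
rng = np.random.default_rng(0)
sources = rng.laplace(size=(2, 1000))       # unknown sources (synthetic)
A = np.array([[1.0, 0.5], [0.3, 1.0]])      # unknown mixing (hypothetical values)
observed = A @ sources                      # what the microphones record

# Any invertible M yields a different, equally valid explanation.
M = np.array([[2.0, 1.0], [-1.0, 1.0]])
alt_mixing = A @ np.linalg.inv(M)
alt_sources = M @ sources

same_observation = np.allclose(alt_mixing @ alt_sources, observed)
print("alternative explanation matches the data:", same_observation)
```

Nothing in the observation alone favors one explanation over the other, so extra knowledge about what the sources should look like is needed to pick a solution.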
Slide 38
Slide 38 text
Algorithm using Speech-likeness as a Guide
[Flowchart: guess 1 → separation, (mixing)⁻¹ → speech-likeness test ("looks like this?")]
Speech-likeness test: how similar is the current source estimate to the model spectrogram (time vs. freq)?
no → update guess (guess 2) and test again
yes → done!
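The guess, test, update loop can be sketched on toy data. This is not the algorithm from the talk: it assumes an instantaneous 2x2 mixture, Laplace samples standing in for speech, an L1 sparsity score as the "speech-likeness test", and a plain natural-gradient update. All names (`speech_likeness_cost`, `step`, `tol`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
sources = rng.laplace(size=(2, 5000))      # synthetic sparse sources
A = np.array([[1.0, 0.5], [0.3, 1.0]])     # hidden mixing (hypothetical values)
mixture = A @ sources                      # input data

def speech_likeness_cost(W, x):
    """Lower is more 'speech-like': sparse outputs, non-degenerate W."""
    y = W @ x
    return np.mean(np.abs(y).sum(axis=0)) - np.log(np.abs(np.linalg.det(W)))

W = np.eye(2)                              # guess 1: leave the mixture as it is
step, tol = 0.1, 1e-3
costs = [speech_likeness_cost(W, mixture)]

for it in range(500):
    y = W @ mixture                        # current source estimate
    # Test: at a stationary point of the cost, mean(sign(y) y^T) is the identity.
    mismatch = np.eye(2) - (np.sign(y) @ y.T) / y.shape[1]
    if np.abs(mismatch).max() < tol:       # yes: done!
        break
    W = W + step * mismatch @ W            # no: update guess
    costs.append(speech_likeness_cost(W, mixture))

print(f"cost: {costs[0]:.3f} -> {costs[-1]:.3f} after {len(costs) - 1} updates")
print("W @ A (ideally close to diagonal):")
print(np.round(W @ A, 2))
```

The product `W @ A` can only be inspected here because the toy mixing matrix is known; in practice only the cost is available to judge a guess.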
Slide 47
Slide 47 text
Separation via Optimization
[Plot: "Optimization Landscape", cost f(x) for x from 0.0 to 3.0, with the starting point x₀ and the minimum marked; the x axis runs between speech-like and mixture-like]
All we know is the value and slope at x₀!
Find a Tighter Fitting Function
A tighter fitting function will converge faster!
[Plot, built up over iterations 1 to 3: "Optimization Landscape", cost for x from 0.0 to 3.0 (← speech-like), showing the old and new auxiliary functions and the iterates x₀, x₁, x₂, x₃]
NICE! ✌
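The auxiliary-function idea can be illustrated in one dimension. The sketch below is a toy, not the update rules from the paper linked in this talk: the cost `f(x) = log cosh(x - 0.5)` and the two curvature values are assumptions, chosen so that both quadratics are valid upper bounds that touch f at the current point (the second derivative of f never exceeds 1). Each iteration jumps to the minimum of the current auxiliary function.

```python
import numpy as np

def f(x):
    """Toy cost with its minimum at x = 0.5 (hypothetical, not from the talk)."""
    return np.log(np.cosh(x - 0.5))

def slope(x):
    """Derivative of the toy cost."""
    return np.tanh(x - 0.5)

def minimize_with_auxiliary(x0, curvature, n_iter):
    """Repeatedly minimize the quadratic upper bound
    q(x) = f(xk) + slope(xk) * (x - xk) + 0.5 * curvature * (x - xk)**2,
    which touches f at xk and lies above it when curvature >= 1."""
    xs = [x0]
    for _ in range(n_iter):
        xk = xs[-1]
        xs.append(xk - slope(xk) / curvature)  # minimizer of q
    return np.array(xs)

x0 = 2.5                                       # starting point (arbitrary choice)
loose = minimize_with_auxiliary(x0, curvature=4.0, n_iter=3)
tight = minimize_with_auxiliary(x0, curvature=1.0, n_iter=3)

for name, xs in (("loose", loose), ("tight", tight)):
    print(name, "iterates:", np.round(xs, 3), "final cost:", round(float(f(xs[-1])), 4))
```

Both runs use only the value and slope at the current point and never increase the cost, but the tighter bound takes larger safe steps and gets much closer to the minimum in the same three iterations.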
Slide 92
Slide 92 text
New algorithm developed at LINE
[Plots of "Separation" against "Runtime [s]" (ticks 0 to 6), with an arrow marked "better! →": one plot for the new algorithm and one for "the old ways"]
4x faster!
https://arxiv.org/abs/2008.10048
https://github.com/fakufaku/auxiva-ipa
Slide 99
Slide 99 text
Summary: Source Separation @ LINE
Fast, hands-off, high-quality
Slide 100
Slide 100 text
Source Separation with pyroomacoustics
https://github.com/LCAV/pyroomacoustics