
Unsupervised Separation of Speech for Smooth Communications


LINE DevDay 2020

November 25, 2020

Transcript

  1. None
  2. About Myself: 1984 Switzerland, 2012-2017 PhD EPFL, Zurich, Stockholm, Tokyo, Safecast, 2017-2020 TMU, March 2020 → Speech Team @ LINE
  3. Agenda › What is Source Separation? › Recognizing Speech › Separation Algorithm › Fast Source Separation
  4. What is Source Separation?

  5. The Cocktail Party Effect

  6. How do we do it?

  7. How do we do it? (right ear, left ear)

  8. How do we do it? (right ear, left ear)

  9. Separation with Multiple Microphones: spatial mixing ? separation (mixing)⁻¹
  10. Separation with Multiple Microphones: spatial mixing ? separation (mixing)⁻¹ (hidden environment)
  11. Separation with Multiple Microphones: spatial mixing ? separation (mixing)⁻¹ (input data, hidden environment)
  12. Separation with Multiple Microphones: spatial mixing ? separation (mixing)⁻¹ (input data, hidden environment, algorithm)
  13. Separation with Multiple Microphones: spatial mixing ? separation (mixing)⁻¹ (input data, output, hidden environment, algorithm)
  14. Smart Voice Assistants

  15. Automatic Minutes Taking

  16. Augmented Hearing

  17. Multiple Sound Event Detection → Talk by Tatsuya Komatsu on Acoustic Event Detection
  18. How do we need it?

  19. How do we need it? Fast

  20. How do we need it? Fast Hands-off

  21. How do we need it? Fast Hands-off High-quality

  22. Recognizing Speech

  23. Spectrogram of a Speech Sample (axes: frequency × time)

  24. How to Recognize a Mixture? 1 source (→ more speakers)
  25. How to Recognize a Mixture? 1 source, 2 sources (→ more speakers)
  26. How to Recognize a Mixture? 1 source, 2 sources, 4 sources (→ more speakers)
  27. How to Recognize a Mixture? 1 source, 2 sources, 4 sources, 8 sources (→ more speakers)
  28. How to Recognize a Mixture? 1 source, 2 sources, 4 sources, 8 sources, crowd (→ more speakers)
  29. Sources with Sparse Time Activity: speech signal, model spectrogram (time × freq.)
  30. Separation Algorithm

  31. Source Separation is Hard! spatial mixing
  32. Source Separation is Hard! spatial mixing, separation (mixing)⁻¹ ?
  33. Source Separation is Hard! spatial mixing, separation (mixing)⁻¹ ? both unknown: the problem is ill-posed
  34. Source Separation is Hard! both unknown, ill-posed; analogy: x + y = 11
  35. Source Separation is Hard! analogy: x + y = 11, 2 + 9 = 11 ?
  36. Source Separation is Hard! analogy: x + y = 11, 2 + 9 = 11 ? 7 + 4 = 11 ?
  37. Source Separation is Hard! analogy: x + y = 11, 2 + 9 = 11 ? 7 + 4 = 11 ? Infinite number of solutions!
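The x + y = 11 analogy carries over to the full mixing model: any rescaling of a source can be absorbed into the mixing matrix, so infinitely many (mixing, sources) pairs explain the same microphone signals. A minimal sketch in plain Python; the 2×2 mixing matrix and toy source values are made up for illustration:

```python
# Two toy sources observed by two microphones through a mixing matrix A.
s = [[1.0, 0.0, -2.0], [0.5, -1.0, 0.0]]   # sources (2 x 3 samples)
A = [[1.0, 0.5], [0.3, 1.0]]               # hypothetical 2x2 mixing matrix

def mix(A, s):
    """Microphone signals m = A @ s (plain-Python matrix product)."""
    return [[sum(A[i][k] * s[k][t] for k in range(2)) for t in range(3)]
            for i in range(2)]

m = mix(A, s)

# An alternative explanation: double source 0 and halve column 0 of A.
s2 = [[2.0 * v for v in s[0]], s[1]]
A2 = [[A[i][0] / 2.0, A[i][1]] for i in range(2)]
m2 = mix(A2, s2)

# Both (A, s) and (A2, s2) produce exactly the same microphone signals.
same = all(abs(m[i][t] - m2[i][t]) < 1e-12 for i in range(2) for t in range(3))
print(same)  # → True: without extra assumptions, the problem is ill-posed
```

This is the scaling ambiguity of blind separation; breaking it requires extra knowledge about the sources, which is where the speech-likeness guide on the next slides comes in.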
  38. Algorithm using Speech-likeness as a Guide
  39. Algorithm using Speech-likeness as a Guide: separation (mixing)⁻¹, guess 1
  40. Algorithm using Speech-likeness as a Guide: guess 1, speech-likeness test (looks like this?)
  41. Algorithm using Speech-likeness as a Guide: the speech-likeness test compares the current source estimate to a model spectrogram (time × freq.): how similar?
  42. Algorithm using Speech-likeness as a Guide: guess 1, speech-likeness test (looks like this?)
  43. Algorithm using Speech-likeness as a Guide: no? then update the guess
  44. Algorithm using Speech-likeness as a Guide: no? then update the guess and repeat the speech-likeness test
  45. Algorithm using Speech-likeness as a Guide: guess 2, speech-likeness test (looks like this?)
  46. Algorithm using Speech-likeness as a Guide: guess 2, speech-likeness test, yes? done!
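The guess-test-update loop above can be sketched with a stand-in speech-likeness score. Here a kurtosis-like sparsity statistic plays that role, and a single demixing coefficient is searched over a grid of guesses; the two-microphone setup, the 0.5/0.3 mixing gains, and the spiky toy source are all invented for illustration, not the actual LINE algorithm:

```python
import random

random.seed(0)
n = 4000

# Toy sources: s1 is sparse/spiky (speech-like), s2 is dense noise.
s1 = [random.choice([-1.0, 1.0]) if random.random() < 0.1 else 0.0
      for _ in range(n)]
s2 = [random.uniform(-1.0, 1.0) for _ in range(n)]

# Two microphones with hypothetical mixing gains.
m1 = [s1[t] + 0.5 * s2[t] for t in range(n)]
m2 = [0.3 * s1[t] + s2[t] for t in range(n)]

def speech_likeness(y):
    """Kurtosis-like sparsity score: high for spiky, speech-like signals."""
    p2 = sum(v * v for v in y) / len(y)
    p4 = sum(v ** 4 for v in y) / len(y)
    return p4 / (p2 * p2)

# Guess a demixing coefficient, test speech-likeness, keep the best guess.
best_a, best_score = 0.0, -1.0
for k in range(21):
    a = k / 20.0                          # guesses 0.0, 0.05, ..., 1.0
    y = [m1[t] - a * m2[t] for t in range(n)]
    score = speech_likeness(y)
    if score > best_score:
        best_a, best_score = a, score

print(best_a)  # close to 0.5, the value that cancels s2 from m1
```

The winning guess is the one whose output looks most speech-like; real multichannel algorithms update a full demixing matrix with a proper optimizer instead of a grid search, which is exactly what the optimization slides below discuss.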
  47. Separation via Optimization: cost f(x), optimization landscape, starting point x₀, minimum; all we know is the value and slope at x₀! (speech-like ↔ mixture-like)
  48. Optimization is like Skiing

  49. Optimization with Gradient Descent, Optimal Step Size (cost plot, ← speech-like), iteration 1: x₀
  50. Optimization with Gradient Descent, Optimal Step Size, iteration 1: x₀ → x₁
  51. Optimization with Gradient Descent, Optimal Step Size, iteration 2: x₁
  52. Optimization with Gradient Descent, Optimal Step Size, iteration 2: x₁ → x₂
  53. Optimization with Gradient Descent, Optimal Step Size, iteration 3: x₂
  54. Optimization with Gradient Descent, Optimal Step Size, iteration 3: x₂ → x₃
  55. Optimization with Gradient Descent, Optimal Step Size, iteration 3: x₃ reached. NICE! ✌
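The descent shown on these slides can be reproduced on a stand-in cost function; f(x) = (x - 2)² + 0.5 below is made up for illustration, not the actual speech-likeness cost:

```python
# Gradient descent with a well-chosen step size on a toy cost function.
def f(x):
    return (x - 2.0) ** 2 + 0.5      # hypothetical cost, minimum at x = 2

def grad(x):
    return 2.0 * (x - 2.0)           # the slope: all we know locally

x = 0.0                              # starting point x0
step = 0.4                           # safe step size (stable below 1 here)
costs = [f(x)]
for _ in range(20):
    x -= step * grad(x)              # move downhill along the slope
    costs.append(f(x))

print(round(x, 4), round(costs[-1], 4))  # → 2.0 0.5
```

With this step size the error shrinks by a constant factor each iteration, so the cost decreases monotonically toward the minimum.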
  56. Don’t Go Too Fast!

  57. Gradient Descent Fails, Step Size is Too Big! (cost plot, ← speech-like), iteration 1: x₀
  58. Gradient Descent Fails, Step Size is Too Big! iteration 1: x₀ → x₁
  59. Gradient Descent Fails, Step Size is Too Big! iteration 2: x₁
  60. Gradient Descent Fails, Step Size is Too Big! iteration 2: x₁ → x₂
  61. Gradient Descent Fails, Step Size is Too Big! iteration 3: x₂
  62. Gradient Descent Fails, Step Size is Too Big! iteration 3: x₂ → x₃
  63. Gradient Descent Fails, Step Size is Too Big! iteration 3: x₂ → x₃, ↑ went up ↑
  64. Gradient Descent Fails, Step Size is Too Big! iteration 3: x₂ → x₃, ↑ went up ↑ FAIL! ☹
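The same toy cost f(x) = (x - 2)² + 0.5 shows the failure mode: its stable step sizes are those below 1 (where 1 - 2·step stays inside (-1, 1)), so a larger step overshoots the minimum further on every iteration and the cost goes up instead of down:

```python
# Gradient descent diverges when the step size is too big.
def f(x):
    return (x - 2.0) ** 2 + 0.5      # same hypothetical cost as before

def grad(x):
    return 2.0 * (x - 2.0)

x = 0.0
step = 1.1                           # too big: each step overshoots x = 2
costs = [f(x)]
for _ in range(5):
    x -= step * grad(x)
    costs.append(f(x))

print(costs)  # the cost grows every iteration: went up -> FAIL
```

The error is multiplied by |1 - 2·step| = 1.2 each iteration, so the iterates bounce from side to side of the minimum with growing amplitude.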
  65. Safe Descent with Majorization-Minimization

  66. Optimization with Majorization-Minimization: No Step Size Required! Guaranteed to Decrease! (cost plot, ← speech-like), optimization landscape, x₀
  67. Optimization with Majorization-Minimization, iteration 1: x₀, auxiliary function
  68. Optimization with Majorization-Minimization, iteration 1: auxiliary function (1) touches
  69. Optimization with Majorization-Minimization, iteration 1: auxiliary function (1) touches, (2) always above
  70. Optimization with Majorization-Minimization, iteration 1: auxiliary function (1) touches, (2) always above, (3) easy to minimize → x₁
  71. Optimization with Majorization-Minimization, iteration 2: x₁, auxiliary
  72. Optimization with Majorization-Minimization, iteration 2: x₁ → x₂
  73. Optimization with Majorization-Minimization, iteration 3: x₂
  74. Optimization with Majorization-Minimization, iteration 3: x₂ → x₃
  75. Optimization with Majorization-Minimization, iteration 4: x₃
  76. Optimization with Majorization-Minimization, iteration 4: x₃ → x₄
  77. Optimization with Majorization-Minimization, iteration 5: x₄
  78. Optimization with Majorization-Minimization, iteration 5: x₄ → x₅
  79. Optimization with Majorization-Minimization, iteration 6: x₅
  80. Optimization with Majorization-Minimization, iteration 6: x₅ → x₆
  81. Optimization with Majorization-Minimization: No Step Size Required! Guaranteed to Decrease! Nice, but…
  82. Optimization with Majorization-Minimization: No Step Size Required! Guaranteed to Decrease! Nice, but… kinda slow!
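A minimal majorization-minimization sketch on a made-up cost f(x) = sqrt((x - 2)² + 1) + 0.1x². The concavity of the square root yields a quadratic auxiliary function that touches f at the current point, lies above it everywhere, and is minimized in closed form, so no step size is needed and the cost can only decrease:

```python
import math

def f(x):
    """Hypothetical cost: smooth-abs term plus a quadratic penalty."""
    return math.sqrt((x - 2.0) ** 2 + 1.0) + 0.1 * x * x

def mm_step(xk):
    """Minimize the quadratic auxiliary function built at xk.

    Concavity of sqrt: sqrt(a) <= sqrt(b) + (a - b) / (2 sqrt(b)).
    With a = (x-2)^2 + 1 and b = (xk-2)^2 + 1 this majorizes f by
    g(x) = const + (x-2)^2 / (2 r) + 0.1 x^2,  r = sqrt((xk-2)^2 + 1),
    which (1) touches f at xk, (2) is always above f, (3) is quadratic:
    g'(x) = (x - 2) / r + 0.2 x = 0  =>  x = 2 / (1 + 0.2 r).
    """
    r = math.sqrt((xk - 2.0) ** 2 + 1.0)
    return 2.0 / (1.0 + 0.2 * r)

x = 0.0
costs = [f(x)]
for _ in range(30):
    x = mm_step(x)
    costs.append(f(x))

# Guaranteed descent: f(x_new) <= g(x_new) <= g(x_old) = f(x_old).
monotone = all(costs[i + 1] <= costs[i] + 1e-12 for i in range(len(costs) - 1))
print(round(x, 2), monotone)  # → 1.65 True
```

The trade-off the next slides address: the guarantee holds for any valid auxiliary function, but a loose-fitting one takes small, conservative steps, which is why plain MM can be slow.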
  83. Fast Separation @ LINE

  84. Find a Tighter Fitting Function: a tighter fitting function will converge faster! (cost plot, ← speech-like), optimization landscape, x₀
  85. Find a Tighter Fitting Function, iteration 1: x₀, new auxiliary vs. old auxiliary
  86. Find a Tighter Fitting Function, iteration 1: x₀ → x₁, new auxiliary vs. old auxiliary
  87. Find a Tighter Fitting Function, iteration 2: x₁, new auxiliary vs. old auxiliary
  88. Find a Tighter Fitting Function, iteration 2: x₁ → x₂, new auxiliary vs. old auxiliary
  89. Find a Tighter Fitting Function, iteration 3: x₂, new auxiliary
  90. Find a Tighter Fitting Function, iteration 3: x₂ → x₃, new auxiliary
  91. Find a Tighter Fitting Function, iteration 3: x₂ → x₃ converged. NICE! ✌
  92. New algorithm developed at LINE (plot: separation vs. runtime [s], better →) https://arxiv.org/abs/2008.10048 https://github.com/fakufaku/auxiva-ipa
  93. New algorithm developed at LINE vs. the old ways (plot: separation vs. runtime [s], better →) https://arxiv.org/abs/2008.10048 https://github.com/fakufaku/auxiva-ipa
  94. New algorithm developed at LINE vs. the old ways (two plots: separation vs. runtime [s], better →) https://arxiv.org/abs/2008.10048 https://github.com/fakufaku/auxiva-ipa
  95. New algorithm developed at LINE vs. the old ways: 4x faster! https://arxiv.org/abs/2008.10048 https://github.com/fakufaku/auxiva-ipa
  99. Summary: Source Separation @ LINE is Fast, Hands-off, High-quality

  100. Source Separation with pyroomacoustics https://github.com/LCAV/pyroomacoustics

  101. Thank you!