education
• Ph.D. in Signal Processing from EPFL (Switzerland)
previously
• Post Doc at Tokyo Metropolitan University
• Intern/Researcher at NEC, IBM
• Built mobile Geiger counters (Safecast)
• Since 2014, developer of pyroomacoustics
research
• Fast transforms (Fourier, Hadamard, sparse, etc.)
• Multi-channel audio processing
• Reproducible research
hobby
• Ski, DIY electronics, fermentation
homepage http://www.robinscheibler.org
github @fakufaku
twitter @fakufakurevenge
REASON and modify it
Development Loop
• Prototyping of multichannel algorithms
• Without pyroomacoustics: experiments → time-consuming
• With pyroomacoustics: simulation → fast → short cycle
Data Augmentation
• Without pyroomacoustics: few examples of RIRs, difficult to collect
• With pyroomacoustics: easy to generate lots of examples
3.2], fs=16000, absorption=0.25, max_order=17)
absorption
• Use Sabine’s formula $T_{60} = \frac{24 \ln 10}{c} \frac{V}{S a}$ (V: volume, S: surface, c: speed of sound, a: absorption coefficient) ⇒ solve for a
max_order
• Image sources are contained in a diamond
• Minimum integer such that a sphere of radius $T_{60} \cdot c$ is enclosed
Code ref: https://github.com/fakufaku/bss_speech_dataset/blob/master/room_builder.py#L12
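Both recipes on this slide fit in a few lines of standard-library Python. This is an illustration under stated assumptions, not the code behind the linked room_builder.py. The function name `inverse_sabine`, c = 343 m/s, and the concrete "diamond" rule for the order are all choices made here. The diamond is modeled as the region |x|/Lx + |y|/Ly + |z|/Lz ≤ N that image rooms up to order N roughly cover. Its inscribed sphere has radius N / sqrt(Lx⁻² + Ly⁻² + Lz⁻²).

```python
import math


def inverse_sabine(t60, room_dim, c=343.0):
    """Sketch: absorption coefficient and ISM order for a shoebox room (lengths in m)."""
    lx, ly, lz = room_dim
    volume = lx * ly * lz
    surface = 2.0 * (lx * ly + lx * lz + ly * lz)
    # Sabine: T60 = 24 ln(10) V / (c S a), solved for a
    absorption = 24.0 * math.log(10.0) * volume / (c * surface * t60)
    if absorption > 1.0:
        raise ValueError("T60 too short for this room: Sabine gives absorption > 1")
    # Assumed diamond: image rooms up to order N roughly cover
    # |x|/lx + |y|/ly + |z|/lz <= N, whose inscribed sphere has radius
    # N / sqrt(lx^-2 + ly^-2 + lz^-2). Take the smallest N for which that
    # radius reaches c * T60 (the distance sound travels during T60).
    inv_norm = math.sqrt(sum(length ** -2 for length in room_dim))
    max_order = math.ceil(c * t60 * inv_norm)
    return absorption, max_order


# Hypothetical 6 x 5 x 3.2 m room with a target T60 of 0.3 s
absorption, max_order = inverse_sabine(0.3, [6.0, 5.0, 3.2])
print(absorption, max_order)
```

Under this rule the hypothetical room gets an absorption of about 0.40 and an order of 42. Recent pyroomacoustics releases also provide `pra.inverse_sabine(rt60, room_dim)`, which returns an absorption coefficient and a maximum order. Its order rule may differ from the one assumed here.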
Bezzam, Snips (now part of Sonos)
Task: Keyword spotting, i.e., recognize "Hey Snips!"
Clean samples: Recordings of the keyword ("Hey Snips!")
Noise samples: MUSAN (sounds) and LibriSpeech (speech)
Test samples: Hold-out set of "Hey Snips!" re-recorded
Prior art [1]: ISM, T60 sampled randomly (ISM T60)
1. Chanwoo Kim et al., “Generation of large-scale simulated utterances in virtual rooms to train deep-neural networks for far-field speech recognition in Google Home,” Interspeech, 2017.
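A pipeline modeled on the randomized-T60 baseline (ISM T60) needs two small pieces: drawing random room configurations and adding noise at a chosen SNR. What follows is a minimal numpy sketch. Every range, both function names, and c = 343 m/s are hypothetical choices, not the settings used in the study. The sampled configuration would still have to be passed to a room simulator to produce RIRs.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed constant


def sabine_absorption(room_dim, t60, c=SPEED_OF_SOUND):
    """Absorption coefficient a from Sabine's formula T60 = 24 ln(10) V / (c S a)."""
    lx, ly, lz = room_dim
    volume = lx * ly * lz
    surface = 2.0 * (lx * ly + lx * lz + ly * lz)
    return 24.0 * np.log(10.0) * volume / (c * surface * t60)


def sample_room_config(rng, dim_ranges=((3.0, 10.0), (3.0, 8.0), (2.5, 4.0)),
                       t60_range=(0.2, 1.0), margin=0.5):
    """Hypothetical sampler: random shoebox size, T60, source and mic positions."""
    while True:
        room_dim = np.array([rng.uniform(lo, hi) for lo, hi in dim_ranges])
        t60 = rng.uniform(*t60_range)
        absorption = sabine_absorption(room_dim, t60)
        if absorption <= 1.0:  # reject (room, T60) pairs Sabine cannot realize
            break
    # keep source and microphone at least `margin` meters away from the walls
    source = rng.uniform(margin, room_dim - margin)
    mic = rng.uniform(margin, room_dim - margin)
    return {"room_dim": room_dim, "t60": t60, "absorption": absorption,
            "source": source, "mic": mic}


def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean`, scaled so the clean-to-noise power ratio is snr_db."""
    noise = noise[: len(clean)]  # assumes the noise clip is at least as long
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + gain * noise


rng = np.random.default_rng(1)
cfg = sample_room_config(rng)
clean = rng.standard_normal(16000)  # stand-in for a reverberant keyword recording
noise = rng.standard_normal(20000)  # stand-in for a MUSAN or LibriSpeech clip
noisy = mix_at_snr(clean, noise, snr_db=5.0)
```

Rejection sampling keeps (room, T60) pairs consistent with Sabine's formula rather than clipping the absorption, so the requested T60 is preserved.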
Simulation features compared: ISM only, hybrid, random material, scattering, multi-frequency, air absorption

SNR    Noise      ISM T60   ISM MAT   HYB MAT   HYB FREQ   HYB FREQ AIR
clean  -          0.92%     0.58%     0.53%     0.46%      0.42%
5 dB   sounds     9.42%     7.14%     7.25%     6.04%      5.42%
5 dB   speech     16.0%     13.1%     14.7%     12.5%      12.5%
2 dB   sounds     16.8%     14.6%     14.2%     12.3%      11.2%
2 dB   speech     30.4%     27.1%     29.9%     26.0%      26.6%
Avg. rel. improv. -         20.8%     18.2%     29.9%      33.0%

Table: False rejection rates (in percent) at a false alarm rate of 0.125 per hour (three false alarms per day).
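The last row is consistent with averaging, over the five conditions, the relative reduction in false rejection rate with respect to the ISM T60 baseline. That definition is an assumption here. A quick check under it, using the rounded values from the table, lands within about 0.2 percentage points of each reported average, which is consistent with the table entries being rounded.

```python
# False rejection rates (%) from the table, in row order:
# clean, 5 dB sounds, 5 dB speech, 2 dB sounds, 2 dB speech
baseline = [0.92, 9.42, 16.0, 16.8, 30.4]  # ISM T60
frr = {
    "ISM MAT": [0.58, 7.14, 13.1, 14.6, 27.1],
    "HYB MAT": [0.53, 7.25, 14.7, 14.2, 29.9],
    "HYB FREQ": [0.46, 6.04, 12.5, 12.3, 26.0],
    "HYB FREQ AIR": [0.42, 5.42, 12.5, 11.2, 26.6],
}
reported = {"ISM MAT": 20.8, "HYB MAT": 18.2, "HYB FREQ": 29.9, "HYB FREQ AIR": 33.0}


def avg_rel_improvement(base, new):
    """Mean over conditions of the relative FRR reduction w.r.t. the baseline, in %."""
    return 100.0 * sum((b - n) / b for b, n in zip(base, new)) / len(base)


recomputed = {name: avg_rel_improvement(baseline, rates) for name, rates in frr.items()}
for name, value in recomputed.items():
    print(f"{name:13s} recomputed {value:5.1f}%  reported {reported[name]:.1f}%")
```

A mean of relative reductions weights every condition equally. The clean condition therefore counts as much as the harder low-SNR ones, even though its absolute rates are much smaller.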
(M > K) Frequency Domain Blind Source Separation
[Figure: spectrograms at the microphones (time × frequency) → separated sources]
Advantages of BSS
• No prior information required, only the signals!
• Reliable enhancement via separation
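Frequency-domain BSS works bin by bin because each STFT bin is usually treated as an instantaneous mixture. This holds under the narrowband approximation, which assumes the STFT window is long compared to the room impulse responses. With M microphones, K sources, frequency index f, and frame index t, the standard model is:

$$\mathbf{x}_{ft} = \mathbf{A}_f \mathbf{s}_{ft}, \qquad \mathbf{y}_{ft} = \mathbf{W}_f \mathbf{x}_{ft}, \qquad \mathbf{A}_f \in \mathbb{C}^{M \times K},\ \mathbf{W}_f \in \mathbb{C}^{K \times M}$$

BSS estimates the demixing matrices $\mathbf{W}_f$ from the microphone signals $\mathbf{x}_{ft}$ alone. Each bin is recovered only up to a scaling and a permutation of the outputs. Methods such as IVA resolve the permutation by modeling all frequency bins of a source jointly. The case M > K is the overdetermined one.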
Algorithm                 Source model
AuxIVA [1] / OverIVA [2]  spherical
SparseAuxIVA [3]          spherical
ILRMA [4]                 low-rank
FastMNMF [5]              low-rank

1. N. Ono, “Stable and fast update rules for independent vector analysis based on auxiliary function technique,” WASPAA, 2011.
2. R. Scheibler and N. Ono, “Independent vector analysis with more microphones than sources,” WASPAA, 2019.
3. J. Janský et al., “A computationally cheaper method for blind speech separation based on AuxIVA and incomplete demixing transform,” IWAENC, 2016.
4. D. Kitamura et al., “Determined blind source separation unifying independent vector analysis and nonnegative matrix factorization,” IEEE/ACM Trans. ASLP, 2016.
5. K. Sekiguchi et al., “Fast multichannel source separation based on jointly diagonalizable spatial covariance matrices,” EUSIPCO, 2019.
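The spherical source model behind AuxIVA [1] leads to simple iterative-projection (IP) updates. What follows is a from-scratch numpy sketch for the determined case (as many mics as sources), assuming a Laplacian spherical prior. It is not the pyroomacoustics implementation, and it skips projection back and the M > K handling of OverIVA. The array layout (n_freq, n_chan, n_frames), the function name `auxiva_laplace`, and the synthetic test data are choices made here.

```python
import numpy as np


def auxiva_laplace(X, n_iter=20, eps=1e-10):
    """Determined AuxIVA with IP updates and a Laplacian spherical source prior.

    X: complex STFT of the mixtures, shape (n_freq, n_chan, n_frames).
    Returns demixing matrices W (n_freq, n_chan, n_chan) and the cost per iteration.
    """
    n_freq, n_chan, n_frames = X.shape
    W = np.tile(np.eye(n_chan, dtype=complex), (n_freq, 1, 1))

    def cost(W):
        # Negative log-likelihood up to constants: sum_k mean_t r_kt - 2 sum_f log|det W_f|
        r = np.linalg.norm(W @ X, axis=0)  # (n_chan, n_frames), norm across frequency
        return np.sum(r) / n_frames - 2.0 * np.sum(np.log(np.abs(np.linalg.det(W))))

    costs = [cost(W)]
    for _ in range(n_iter):
        # Auxiliary variables: per-source, per-frame norm over all frequency bins
        r = np.maximum(np.linalg.norm(W @ X, axis=0), eps)
        for k in range(n_chan):
            # Weighted covariance V_kf = mean_t x_ft x_ft^H / (2 r_kt), for all f at once
            V = np.einsum("fmt,fnt->fmn", X / (2.0 * r[k]), X.conj()) / n_frames
            # Direction: w_kf = (W_f V_kf)^{-1} e_k
            e_k = np.zeros((n_freq, n_chan, 1), dtype=complex)
            e_k[:, k, 0] = 1.0
            w = np.linalg.solve(W @ V, e_k)[..., 0]
            # Scale: w_kf^H V_kf w_kf = 1
            w /= np.sqrt(np.einsum("fm,fmn,fn->f", w.conj(), V, w).real)[:, None]
            W[:, k, :] = w.conj()  # row k of W_f is w_kf^H
        costs.append(cost(W))
    return W, costs


# Synthetic spherical sources: one random scale per (source, frame), shared by all bins
rng = np.random.default_rng(0)
n_freq, n_src, n_frames = 8, 2, 1000
scale = np.sqrt(rng.exponential(size=(n_src, n_frames)))
shape = (n_freq, n_src, n_frames)
S = scale * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
A = rng.standard_normal((n_freq, n_src, n_src)) + 1j * rng.standard_normal((n_freq, n_src, n_src))
X = A @ S  # per-frequency instantaneous mixing
W, costs = auxiva_laplace(X, n_iter=20)
```

Each IP step exactly minimizes the auxiliary function, so the values in `costs` should never increase, which makes a convenient sanity check. The outputs `W @ X` are determined only up to a per-bin scale. Projection back onto a reference microphone is the usual fix and is omitted here.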
Method    Time     SIR [dB]  SIR [dB]  SIR [dB]
Clean     -        ∞         ∞         ∞
Mix       -        -2.8      -2.89     -2.75
AuxIVA    6.33 s   10.13     15.95     11.56
ILRMA     8.84 s   10.48     16.08     12.03
FastMNMF  35.9 s   11.38     17.12     10.60
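A simplified definition helps when reading the SIR columns. Suppose a separated output can be split into the part coming from the target source and the part coming from the other sources. This is possible in simulation, where each source can be rendered on its own. SIR is then their energy ratio in dB. This sketch is not the projection-based BSS Eval definition used by evaluation toolkits, and the function name `simple_sir_db` is hypothetical. It does reproduce the limiting case in the table: zero interference gives ∞.

```python
import numpy as np


def simple_sir_db(target_part, interference_part):
    """Energy ratio (dB) between the target and interference contributions to one output."""
    p_target = np.sum(np.abs(target_part) ** 2)
    p_interf = np.sum(np.abs(interference_part) ** 2)
    if p_interf == 0:
        return np.inf  # perfectly separated, as in the "Clean" row
    return 10.0 * np.log10(p_target / p_interf)


# Toy example: the interference is known separately because the
# simulation renders each source on its own.
rng = np.random.default_rng(0)
target = rng.standard_normal(16000)
interference = 0.1 * rng.standard_normal(16000)
print(simple_sir_db(target, interference))     # roughly 20 dB
print(simple_sir_db(target, np.zeros(16000)))  # inf
```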
of multichannel processing algorithms
• Data augmentation is effective for ASR/KWS systems
• Rapid prototyping and faster experiment cycles
What’s next?
• Release next_gen_simulator (ray tracing, air absorption)
• Desired: directional microphones and sources
• Help is very welcome! https://github.com/LCAV/pyroomacoustics