Neural Spatial Audio Processing for Sound Field Analysis and Control

Keynote talk at Audio Analysis Workshop 2025

NII S. Koyama's Lab

August 22, 2025

Transcript

  1. Neural Spatial Audio Processing for Sound Field Analysis and Control

    Shoichi Koyama, National Institute of Informatics, Tokyo, Japan
  2. About NII

    - NII is a national research institute of informatics in Japan
      - Main lab is located in central Tokyo
      - Associated with a graduate university called SOKENDAI
    - Map labels: Kashiwa Annex, NII, 29 km, Imperial Palace, ICASSP 2028 venue
  3. Basic Technologies of Sound Field Estimation and Control

    Our research topics: sound field estimation/control and its applications
    - VR/AR audio
    - Active noise control
    - Local-field recording and reproduction
    - Signal enhancement
    - Visualization/auralization
    - Room acoustic analysis
  4. Sound field estimation

    How to estimate the distribution of a continuous physical quantity of sound from discrete sensor observations?
    - Fundamental problem, but very important in various applications
    - Figure labels: target region, microphone
  5. Sound field estimation

    How to estimate the distribution of a continuous physical quantity of sound from discrete sensor observations?
    - Estimate the pressure distribution from observations at a discrete set of mics in the frequency domain
    - Figure labels: target region, microphone
  6. Sound field estimation

    - Prior work on sound field estimation
      - Basis expansion-based methods [Colton+ 1992]
        - Plane wave expansion (or Herglotz wave function)
        - Spherical wave function expansion
        - Equivalent source distribution (or single-layer potential)
      - Infinite-dimensional expansion or kernel regression
        - Harmonic analysis of infinite order [Ueno+ 2018]
        - Directionally-weighted kernel regression [Ueno+ 2021]
    - Comprehensive review is available in Ueno and Koyama, "Sound Field Estimation: Theories and Applications," Foundations and Trends® in Signal Processing, 2025.
  7. Kernel regression for sound field estimation

    Kernel regression with constraint of Helmholtz equation
    - Function to be interpolated is represented by a weighted sum of kernel functions
    - Kernel function constrains the solution to satisfy the Helmholtz equation
      - With directional weighting by a von Mises-Fisher distribution (parameters: direction and sharpness)
      - With uniform weighting
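A minimal sketch of the uniform-weighting case, assuming the standard result that uniform weighting over plane-wave directions gives the kernel j0(k‖r1 − r2‖). The function names, the regularization value, the 200 Hz test frequency, and the random microphone layout are illustrative assumptions, not taken from the talk.

```python
import numpy as np

def helmholtz_kernel(r1, r2, k):
    """Uniform-weighting kernel j0(k * ||r1 - r2||) for every pair of rows."""
    dist = np.linalg.norm(r1[:, None, :] - r2[None, :, :], axis=-1)
    return np.sinc(k * dist / np.pi)  # np.sinc(x) = sin(pi x) / (pi x)

def krr_estimate(mic_pos, mic_obs, eval_pos, k, reg=1e-6):
    """Kernel ridge regression: u(r) = kappa(r)^T (K + reg I)^{-1} s."""
    K = helmholtz_kernel(mic_pos, mic_pos, k)
    alpha = np.linalg.solve(K + reg * np.eye(len(mic_pos)), mic_obs)
    return helmholtz_kernel(eval_pos, mic_pos, k) @ alpha

# Hypothetical test field: a single plane wave at 200 Hz (c = 343 m/s assumed).
rng = np.random.default_rng(0)
k = 2 * np.pi * 200.0 / 343.0
direction = np.array([1.0, 0.0, 0.0])

def plane_wave(pos):
    return np.exp(-1j * k * pos @ direction)

mic_pos = rng.uniform(-0.5, 0.5, size=(64, 3))    # random mics in a 1 m cube
eval_pos = rng.uniform(-0.3, 0.3, size=(200, 3))  # evaluation points inside
estimate = krr_estimate(mic_pos, plane_wave(mic_pos), eval_pos, k)
truth = plane_wave(eval_pos)
rel_err = np.linalg.norm(estimate - truth) / np.linalg.norm(truth)
```

Because every kernel function is itself a solution of the Helmholtz equation, any weighted sum of them is too, which is what distinguishes this estimator from one built on a Gaussian kernel.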
  8. Kernel regression for sound field estimation

    - Experimental results using real data from the MeshRIR dataset
      - Reconstructing a pulse signal from a single loudspeaker with 18 mics
    - Figure panels: ground truth; kernel regression w/ HE constraint; kernel regression w/ Gaussian kernel (black dots indicate mic positions) [Koyama+ 2021]
    - Applied to binaural rendering, spatial active noise control, etc. [Ueno+ 2025]
  9. Neural spatial audio processing

    - Why neural networks?
      - Adaptability to acoustic environments
        - Estimator is fixed regardless of environment in current methods
        - High representational power of NNs allows adaptation to environment
      - Data-driven prior information
        - Data obtained in advance gives rich prior information on environment
        - High accuracy can be maintained even with an extremely small number of mics
    - Physics-Constrained Neural Kernel [Ribeiro+ 2024]
    - Autoencoder conditioned on source and sensor positions [Koyama+ 2025]
  10. Related work: Physics-Informed Neural Network

    - Implicit neural representation (or neural field) [Sitzmann+ 2020]
      - NN implicitly representing a continuous function
    - Figure labels: input, output
    - Loss function: physical properties are not taken into consideration
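As a rough illustration of an implicit neural representation, the sketch below evaluates a small sine-activated MLP in the style of [Sitzmann+ 2020] that maps a position to the real and imaginary parts of pressure, together with a data-only loss. The layer sizes, the frequency scale `w0`, and the two-channel output are assumptions made for this example, and no training loop is shown.

```python
import numpy as np

def init_siren(layer_sizes, w0=30.0, seed=0):
    """Random weights using the sine-network initialization scheme."""
    rng = np.random.default_rng(seed)
    params = []
    for i, (n_in, n_out) in enumerate(zip(layer_sizes[:-1], layer_sizes[1:])):
        bound = 1.0 / n_in if i == 0 else np.sqrt(6.0 / n_in) / w0
        params.append((rng.uniform(-bound, bound, size=(n_in, n_out)),
                       np.zeros(n_out)))
    return params

def siren_forward(params, positions, w0=30.0):
    """Map positions (N, 3) to outputs (N, 2): real and imaginary pressure."""
    h = positions
    for W, b in params[:-1]:
        h = np.sin(w0 * (h @ W + b))
    W, b = params[-1]
    return h @ W + b

def data_loss(params, mic_pos, mic_obs):
    """Mean squared error at the microphones only; no physics is involved."""
    out = siren_forward(params, mic_pos)
    pred = out[:, 0] + 1j * out[:, 1]
    return np.mean(np.abs(pred - mic_obs) ** 2)

params = init_siren([3, 64, 64, 2])
mic_pos = np.random.default_rng(1).uniform(-0.5, 0.5, size=(5, 3))
mic_obs = np.ones(5, dtype=complex)  # placeholder observations
loss = data_loss(params, mic_pos, mic_obs)
```

Nothing in this loss ties the network output between microphones to the wave equation, which is the limitation the slide points out.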
  11. Related work: Physics-Informed Neural Network

    - Physics-informed neural network (PINN) [Raissi+ 2019]
      - Implicit neural representation incorporating a loss function that evaluates deviation from the governing PDE (PDE loss)
    - Figure labels: input, output
    - Loss function: deviation from the governing PDE is penalized
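The PDE loss can be sketched for the Helmholtz equation as follows. This example swaps in central finite differences for the automatic differentiation a PINN would normally use, so that it runs with NumPy alone; the step size `h`, the loss weight, and the function names are assumptions for illustration.

```python
import numpy as np

def helmholtz_residual(u, pos, k, h=1e-3):
    """Finite-difference estimate of (laplacian + k^2) u at each row of pos."""
    center = u(pos)
    lap = np.zeros(len(pos), dtype=complex)
    for axis in range(3):
        step = np.zeros(3)
        step[axis] = h
        lap += (u(pos + step) - 2.0 * center + u(pos - step)) / h**2
    return lap + k**2 * center

def pinn_loss(u, mic_pos, mic_obs, colloc_pos, k, weight=1.0):
    """Data loss at the mics plus a penalty on the Helmholtz residual."""
    data = np.mean(np.abs(u(mic_pos) - mic_obs) ** 2)
    pde = np.mean(np.abs(helmholtz_residual(u, colloc_pos, k)) ** 2)
    return data + weight * pde

# A plane wave with wavenumber k satisfies the equation; one with 2k does not.
k = 5.0
d = np.array([0.6, 0.0, 0.8])

def solution(p):
    return np.exp(-1j * k * p @ d)

def non_solution(p):
    return np.exp(-1j * 2.0 * k * p @ d)

points = np.random.default_rng(0).uniform(-1.0, 1.0, size=(50, 3))
res_ok = np.max(np.abs(helmholtz_residual(solution, points, k)))
res_bad = np.min(np.abs(helmholtz_residual(non_solution, points, k)))
```

The penalty only encourages the network to satisfy the PDE at the collocation points; it does not guarantee it, in contrast to the kernel-based constraint discussed in the following slides.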
  12. Physics-Constrained Neural Kernel

    Implicit neural representation of kernel function with constraint of Helmholtz equation [Ribeiro+ 2024]
    - Directional weighting function of the kernel function is adapted to the environment
    - Kernel function based on plane wave expansion
    - Figure labels: microphone, directed component, residual component
  13. Physics-Constrained Neural Kernel

    Implicit neural representation of kernel function with constraint of Helmholtz equation
    - Directed component
      - Weighted sum of (sparse) von Mises-Fisher distributions to represent direct sound and early reflections

    $$w_{\mathrm{dir}}(\eta; \gamma, \beta) = \sum_{n=1}^{N} \gamma_n \frac{e^{\beta_n \langle \eta, d_n \rangle}}{C(\beta_n)} \qquad (\|\gamma\|_1 = 1)$$

    $$\kappa_{\mathrm{dir}}(r_1, r_2) = \sum_{n=1}^{N} \gamma_n \frac{j_0\!\left(\sqrt{(j\beta_n d_n - k r_{12})^{\mathsf{T}} (j\beta_n d_n - k r_{12})}\right)}{C(\beta_n)}$$

    - $\|\gamma\|_1 = 1$: sparsity constraint
    - $C(\beta_n)$: normalization constant
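The closed form of a von Mises-Fisher-weighted kernel can be checked numerically. The sketch below assumes a kernel defined as (1/4π) times the integral over the unit sphere of w(η) exp(jk⟨η, r12⟩), with r12 = r1 − r2 and normalization C(β) = sinh(β)/β; these conventions (sign of the exponent, the 1/4π factor, the form of C) are assumptions chosen so that the closed form and the integral agree, and may differ from those used in the talk. All function names are hypothetical.

```python
import numpy as np

def j0_complex(x):
    """Spherical Bessel function of order 0, sin(x)/x, for complex x."""
    x = np.asarray(x, dtype=complex)
    safe = np.where(x == 0, 1.0, x)
    return np.where(x == 0, 1.0, np.sin(safe) / safe)

def directed_kernel(r12, k, gammas, betas, dirs):
    """Closed form: sum_n gamma_n * j0(sqrt(v^T v)) / C(beta_n), beta_n > 0."""
    total = 0j
    for g, b, d in zip(gammas, betas, dirs):
        v = 1j * b * d - k * r12
        arg = np.sqrt(np.sum(v * v))  # v^T v, without complex conjugation
        total += g * j0_complex(arg) / (np.sinh(b) / b)
    return total

def sphere_quadrature(n_polar=64, n_azimuth=128):
    """Gauss-Legendre in cos(theta) times uniform azimuth; weights sum to 4 pi."""
    x, w = np.polynomial.legendre.leggauss(n_polar)
    phi = 2.0 * np.pi * np.arange(n_azimuth) / n_azimuth
    sin_t = np.sqrt(1.0 - x**2)
    eta = np.stack([np.outer(sin_t, np.cos(phi)),
                    np.outer(sin_t, np.sin(phi)),
                    np.outer(x, np.ones(n_azimuth))], axis=-1).reshape(-1, 3)
    return eta, np.repeat(w, n_azimuth) * (2.0 * np.pi / n_azimuth)

def directed_kernel_numeric(r12, k, gammas, betas, dirs):
    """Same kernel by direct integration of the weighted plane waves."""
    eta, wq = sphere_quadrature()
    w_dir = sum(g * np.exp(b * eta @ d) / (np.sinh(b) / b)
                for g, b, d in zip(gammas, betas, dirs))
    return np.sum(wq * w_dir * np.exp(1j * k * eta @ r12)) / (4.0 * np.pi)

k = 2 * np.pi * 600.0 / 343.0
gammas = [0.7, 0.3]                       # nonnegative weights summing to one
betas = [3.0, 5.0]                        # sharpness of each component
dirs = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.6, 0.8])]
r12 = np.array([0.2, -0.1, 0.05])
closed = directed_kernel(r12, k, gammas, betas, dirs)
numeric = directed_kernel_numeric(r12, k, gammas, betas, dirs)
```

Having a closed form means the directed part of the kernel needs no numerical integration, and its parameters can be optimized directly.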
  14. Physics-Constrained Neural Kernel

    Implicit neural representation of kernel function with constraint of Helmholtz equation
    - Residual component
      - Implicit neural representation to represent late reverberation

    $$w_{\mathrm{res}}(\eta; \theta) = \mathrm{NN}(\eta; \theta)$$

    $$\kappa_{\mathrm{res}}(r_1, r_2) = \int_{\mathbb{S}^2} w_{\mathrm{res}}(\eta; \theta)\, e^{jk \langle \eta, r_{12} \rangle}\, \mathrm{d}\eta$$

    - $w_{\mathrm{res}}(\eta; \theta)$: implicit neural representation
    - $\kappa_{\mathrm{res}}$: computed by numerical integration
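A sketch of how a kernel with a network-defined directional weight could be computed by numerical integration over the sphere. The tiny MLP, the softplus output (used here to keep the weight nonnegative so that the Gram matrix is positive semidefinite), the quadrature rule, and the sign convention of the plane wave are all assumptions for this example rather than details from the talk.

```python
import numpy as np

def sphere_quadrature(n_polar=32, n_azimuth=64):
    """Gauss-Legendre in cos(theta) times uniform azimuth on the unit sphere."""
    x, w = np.polynomial.legendre.leggauss(n_polar)
    phi = 2.0 * np.pi * np.arange(n_azimuth) / n_azimuth
    sin_t = np.sqrt(1.0 - x**2)
    eta = np.stack([np.outer(sin_t, np.cos(phi)),
                    np.outer(sin_t, np.sin(phi)),
                    np.outer(x, np.ones(n_azimuth))], axis=-1).reshape(-1, 3)
    return eta, np.repeat(w, n_azimuth) * (2.0 * np.pi / n_azimuth)

def w_res(eta, theta):
    """Hypothetical weight network: one tanh layer, softplus output (>= 0)."""
    W1, b1, w2 = theta
    hidden = np.tanh(eta @ W1 + b1)
    return np.log1p(np.exp(hidden @ w2))

def residual_gram(positions, k, theta):
    """Gram matrix with entries int w_res(eta) exp(jk <eta, r_m - r_n>) d eta."""
    eta, wq = sphere_quadrature()
    planes = np.exp(1j * k * positions @ eta.T)        # (M, Q) plane waves
    return (planes * (wq * w_res(eta, theta))) @ planes.conj().T

rng = np.random.default_rng(0)
theta = (rng.normal(size=(3, 16)), np.zeros(16), rng.normal(size=16))
positions = rng.uniform(-0.5, 0.5, size=(10, 3))
k = 2 * np.pi * 600.0 / 343.0
G = residual_gram(positions, k, theta)
```

Since each quadrature term is a plane wave in the position variable, the resulting kernel functions remain superpositions of plane waves whatever the network outputs.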
  15. Physics-Constrained Neural Kernel

    Implicit neural representation of kernel function with constraint of Helmholtz equation
    - Kernel function is the sum of directed and residual kernels
      - Hyperparameters $\gamma, \beta, \theta$ are jointly optimized by a steepest descent-based algorithm
      - Solution still satisfies the Helmholtz equation
      - Inference by linear operation based on kernel ridge regression
    - Estimation is still achieved by an FIR filter in the time domain

    $$\kappa = \kappa_{\mathrm{dir}} + \kappa_{\mathrm{res}}$$

    - $\kappa_{\mathrm{dir}}$: directed kernel
    - $\kappa_{\mathrm{res}}$: residual kernel
  16. Physics-Constrained Neural Kernel

    - Numerical experiment: T60: 400 ms, # mics: 41, spherical shell array
    - Figure panels at 600 Hz: ground truth, NN, PINN, adaptive kernel, proposed PCNK [Koyama+ 2025]
    - NMSE values shown: -6.8 dB, -16.3 dB, -24.8 dB
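For reference, a short sketch of the NMSE metric, assuming the usual definition (squared error summed over evaluation points, normalized by the energy of the ground truth, in dB). The talk does not spell out its exact definition, so this is an assumption.

```python
import numpy as np

def nmse_db(estimate, truth):
    """Normalized mean square error in dB; lower is better."""
    err = np.sum(np.abs(estimate - truth) ** 2)
    return 10.0 * np.log10(err / np.sum(np.abs(truth) ** 2))

truth = np.ones(100, dtype=complex)
value = nmse_db(1.1 * truth, truth)  # 10% amplitude error gives about -20 dB
```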
  17. Magnitude field estimation

    Estimating "magnitude" distribution of acoustic transfer function (ATF)
    - Spatial distribution of ATF magnitude from a discrete set of measurements of ATF magnitudes, e.g.,
      - Estimating the sound field using signals that are not synchronized
      - Estimating the directivity of musical instruments or other vibrating bodies
    - Figure labels: microphone, target region, pressure, magnitude, z (m)
  18. Basis expansion revisited

    - Sound field is approximated as a linear combination of basis functions
    - Least-squares estimation of expansion coefficients
    - Estimation of the sound field at target positions
    - Basis expansion as linear autoencoder
      - Encoder: basis function matrix (spatially dependent)
      - Latent variable: expansion coefficients (spatially independent)
      - Decoder: basis function matrix (spatially dependent)
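The encoder/decoder reading of basis expansion can be sketched as below, using plane waves as the basis. The choice of basis, the number of directions, and the helper names are assumptions for illustration; the point is only that the coefficients are obtained by least squares from the microphone-side basis matrix and then multiplied by the target-side basis matrix.

```python
import numpy as np

def plane_wave_basis(pos, dirs, k):
    """Basis function matrix: one column per plane-wave direction."""
    return np.exp(-1j * k * pos @ dirs.T)

def encode(phi_mic, observations):
    """Least-squares expansion coefficients (the latent variable)."""
    return np.linalg.lstsq(phi_mic, observations, rcond=None)[0]

def decode(phi_target, coef):
    """Field at the target positions from the coefficients."""
    return phi_target @ coef

rng = np.random.default_rng(0)
k = 5.0
dirs = rng.normal(size=(12, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)   # unit direction vectors
mic_pos = rng.uniform(-0.5, 0.5, size=(30, 3))
target_pos = rng.uniform(-0.5, 0.5, size=(100, 3))

true_coef = rng.normal(size=12) + 1j * rng.normal(size=12)
observations = plane_wave_basis(mic_pos, dirs, k) @ true_coef
coef = encode(plane_wave_basis(mic_pos, dirs, k), observations)
estimate = decode(plane_wave_basis(target_pos, dirs, k), coef)
truth = plane_wave_basis(target_pos, dirs, k) @ true_coef
```

Only the two basis matrices depend on position; the coefficient vector in between does not, which is what makes it a spatially independent latent variable.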
  19. Autoencoder conditioned on source and mic positions

    Nonlinear autoencoder for ATF magnitude estimation [Koyama+ 2025]
    - Nonlinear extension of basis expansion-based method
      - ATF at arbitrary source positions, mic positions, and frequencies can be obtained owing to their conditioning
      - Combining multiple datasets with different measurement setups is possible
      - Retraining is unnecessary in inference; therefore, computationally efficient
    - Figure labels: encoder, decoder, prototypes, latent variables, average, log ATF magnitudes; spatially independent, spatially dependent, spatially dependent
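A heavily simplified forward pass can illustrate the idea of conditioning on positions and averaging per-measurement codes into a single latent variable. Everything here is a hypothetical stand-in: the layer sizes, the random untrained weights, the way positions are concatenated, and the omission of the "prototypes" that appear in the slide's diagram. It is not the architecture of [Koyama+ 2025].

```python
import numpy as np

def init_mlp(sizes, rng):
    """Random (untrained) weights for a small fully connected network."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(x, layers):
    for W, b in layers[:-1]:
        x = np.tanh(x @ W + b)
    W, b = layers[-1]
    return x @ W + b

def estimate_log_magnitude(enc, dec, src_pos, mic_pos, log_mag, target_pos):
    """Encode each measurement, average the codes, decode at target positions."""
    src = np.broadcast_to(src_pos, mic_pos.shape)
    codes = mlp(np.concatenate([src, mic_pos, log_mag], axis=1), enc)
    latent = codes.mean(axis=0)                 # independent of mic ordering
    src_t = np.broadcast_to(src_pos, target_pos.shape)
    lat_t = np.broadcast_to(latent, (len(target_pos), latent.size))
    return mlp(np.concatenate([lat_t, src_t, target_pos], axis=1), dec)

rng = np.random.default_rng(0)
n_freq, n_latent = 8, 16
enc = init_mlp([3 + 3 + n_freq, 32, n_latent], rng)
dec = init_mlp([n_latent + 3 + 3, 32, n_freq], rng)

src_pos = np.array([2.0, 1.0, 1.5])
mic_pos = rng.uniform(-0.5, 0.5, size=(5, 3))      # M = 5 measurements
log_mag = rng.normal(size=(5, n_freq))             # placeholder log magnitudes
target_pos = rng.uniform(-0.5, 0.5, size=(20, 3))
out = estimate_log_magnitude(enc, dec, src_pos, mic_pos, log_mag, target_pos)
```

Because the codes are averaged, the same trained networks can accept any number of measurements in any order, which is consistent with the slide's claim that no retraining is needed at inference time.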
  20. Experiments: Setting

    - Dataset preparation
      - ATF of a single room using the image source method
      - Room: shoebox-shaped, 4.0 m x 6.0 m x 3.0 m
      - Target region: cuboid, 1.0 m x 1.0 m x 1.0 m
      - Target pos: 1331 pos, every 0.1 m
      - Source pos: 1024 pos, random outside target region
      - Target freq: 0 - 1000 Hz
      - Training 820, validation 122, and test 122
      - # measurements: 5, 10, 20, and 100 (random)
    - Comparison
      - Proposed method based on autoencoder
      - Neural field (NF)
      - Kernel ridge regression with Gaussian kernel (KRR)
    - Figure labels: target region, room
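The target-position count follows from the grid spacing: a 1.0 m cube sampled every 0.1 m has 11 points per axis and 11^3 = 1331 points in total. The sketch below builds such a grid; centering it at the origin and drawing the random measurement positions from the grid itself are assumptions made for this example.

```python
import numpy as np

side, spacing = 1.0, 0.1
n_per_axis = int(round(side / spacing)) + 1          # 11 points per axis
axis = np.linspace(-side / 2, side / 2, n_per_axis)
grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"),
                axis=-1).reshape(-1, 3)              # all target positions

rng = np.random.default_rng(0)

def sample_measurement_indices(num_measurements):
    """Pick distinct grid points to act as the observed positions."""
    return rng.choice(len(grid), size=num_measurements, replace=False)

indices = sample_measurement_indices(5)
```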
  21. Experiments: Results

    - Average LSD w.r.t. # mics
    - Consistently low LSD is achieved by the proposed method
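A short sketch of the log-spectral distance (LSD), assuming the common definition: the root mean square over frequency of the difference between estimated and true magnitudes in dB. The exact definition used in the experiments is not given in the slides, and the small `eps` guard is an added assumption.

```python
import numpy as np

def lsd_db(est_mag, true_mag, eps=1e-12):
    """Log-spectral distance in dB, averaged over the last (frequency) axis."""
    diff = 20.0 * np.log10((est_mag + eps) / (true_mag + eps))
    return np.sqrt(np.mean(diff ** 2, axis=-1))

true_mag = np.ones((4, 16))                # 4 positions, 16 frequency bins
value = lsd_db(10.0 * true_mag, true_mag)  # a tenfold magnitude error: 20 dB
```

Because LSD compares magnitudes only, it suits magnitude field estimation where phase is unavailable.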
  22. Experiments: Results

    - Estimated ATF magnitude at center of target region when M=5
    - Figure panels: proposed, NF, KRR
  23. Experiments: Results

    - Magnitude distribution on horizontal plane at 250 Hz when M=5
    - Figure panels: proposed, NF, KRR, ground truth
  24. Conclusion

    - Neural spatial audio processing for sound field analysis and control
      - NNs will provide adaptability to acoustic environment and data-driven prior information
      - Physics-Constrained Neural Kernel
        - Implicit neural representation of kernel function
        - Constraint on Helmholtz equation by plane wave expansion-based representation
      - Autoencoder conditioned on source and mic positions
        - Nonlinear extension of basis expansion-based method
        - Magnitude field estimation by using a very small number of mics
      - Interested?
        - Join our special session "Neural spatial audio processing" at ICASSP 2026!

    Thank you for your attention!