Slide 1

Slide 1 text

Neural Spatial Audio Processing for Sound Field Analysis and Control
Shoichi Koyama
National Institute of Informatics, Tokyo, Japan

Slide 2

Slide 2 text

About NII
Ø NII is a national research institute of informatics in Japan
– Main lab is located in central Tokyo
– Associated with a graduate university called SOKENDAI
(Map labels: Kashiwa Annex, NII, 29 km, Imperial Palace, ICASSP 2028 venue)
August 22, 2025

Slide 3

Slide 3 text

Our research topics
Sound field estimation/control and its applications
Figure labels:
• Basic Technologies of Sound Field Estimation and Control
• VR/AR audio
• Active noise control
• Local-field recording and reproduction
• Signal enhancement
• Visualization/auralization
• Room acoustic analysis

Slide 4

Slide 4 text

Sound field estimation
How to estimate the distribution of a continuous physical quantity of sound from discrete sensor observations?
A fundamental problem, but very important in various applications
(Figure labels: Target region, Microphone)

Slide 5

Slide 5 text

Sound field estimation
How to estimate the distribution of a continuous physical quantity of sound from discrete sensor observations?
Estimate the pressure distribution from observations at a discrete set of microphones in the frequency domain
(Figure labels: Target region, Microphone)

Slide 6

Slide 6 text

Sound field estimation
Ø Prior work on sound field estimation
– Basis expansion-based methods [Colton+ 1992]
• Plane wave expansion (or Herglotz wave function)
• Spherical wave function expansion
• Equivalent source distribution (or single-layer potential)
– Infinite-dimensional expansion or kernel regression
• Harmonic analysis of infinite order [Ueno+ 2018]
• Directionally-weighted kernel regression [Ueno+ 2021]
A comprehensive review is available in:
• Ueno and Koyama, “Sound Field Estimation: Theories and Applications,” Foundations and Trends® in Signal Processing, 2025.

Slide 7

Slide 7 text

Kernel regression for sound field estimation
Kernel regression with constraint of Helmholtz equation
Ø Function to be interpolated is represented by a weighted sum of kernel functions
Ø Kernel function constrains the solution to satisfy the Helmholtz equation
– With directional weighting by a von Mises‒Fisher distribution
– With uniform weighting
(Equation labels: Kernel function, Direction, Sharpness)
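
The kernel regression on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the speaker's implementation: it uses only the uniform-weighting case, for which the Helmholtz-constrained kernel in 3D is j0(k‖r1 − r2‖), and the function names, the regularization value `lam`, and the toy plane-wave field are all hypothetical.

```python
import numpy as np

def helmholtz_kernel(r1, r2, k):
    """Uniform-weighting kernel j0(k * ||r1 - r2||) between two sets of 3D points."""
    dist = np.linalg.norm(r1[:, None, :] - r2[None, :, :], axis=-1)
    # np.sinc(x) = sin(pi x) / (pi x), hence j0(x) = np.sinc(x / pi).
    return np.sinc(k * dist / np.pi)

def krr_fit(r_mic, p_mic, k, lam):
    """Solve (K + lam I) alpha = p for the kernel weights alpha."""
    K = helmholtz_kernel(r_mic, r_mic, k)
    return np.linalg.solve(K + lam * np.eye(len(r_mic)), p_mic)

def krr_predict(r_eval, r_mic, alpha, k):
    """Evaluate the weighted sum of kernel functions at r_eval."""
    return helmholtz_kernel(r_eval, r_mic, k) @ alpha

# Toy check (assumed setup): one plane wave at 200 Hz, 40 random mics.
rng = np.random.default_rng(0)
k = 2 * np.pi * 200.0 / 343.0            # wavenumber for c = 343 m/s
direction = np.array([1.0, 0.0, 0.0])
plane_wave = lambda r: np.exp(1j * k * (r @ direction))

r_mic = rng.uniform(-0.3, 0.3, size=(40, 3))
r_eval = rng.uniform(-0.2, 0.2, size=(50, 3))
alpha = krr_fit(r_mic, plane_wave(r_mic), k, lam=1e-6)
estimate = krr_predict(r_eval, r_mic, alpha, k)
truth = plane_wave(r_eval)
nmse = np.sum(np.abs(estimate - truth) ** 2) / np.sum(np.abs(truth) ** 2)
```

Because every kernel function j0(k‖r − r_m‖) satisfies the Helmholtz equation, so does any weighted sum of them; the estimate is therefore physically valid by construction rather than by penalty.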

Slide 8

Slide 8 text

Kernel regression for sound field estimation
Ø Experimental results using real data from the MeshRIR dataset
– Reconstructing a pulse signal from a single loudspeaker with 18 microphones
Figure panels: Ground truth; Kernel regression w/ HE constraint; Kernel regression w/ Gaussian kernel (black dots indicate mic positions) [Koyama+ 2021]
Applied to binaural rendering, spatial active noise control, etc. [Ueno+ 2025]

Slide 9

Slide 9 text

Neural spatial audio processing
Ø Why neural networks?
– Adaptability to acoustic environments
• Estimator is fixed regardless of environment in current methods
• High representational power of NNs allows adaptation to the environment
– Data-driven prior information
• Data obtained in advance gives rich prior information on the environment
• High accuracy can be maintained even with an extremely small number of mics
Ø Methods
• Physics-Constrained Neural Kernel [Ribeiro+ 2024]
• Autoencoder conditioned on source and sensor positions [Koyama+ 2025]

Slide 10

Slide 10 text

Related work: Physics-Informed Neural Network
Ø Implicit neural representation (or neural field) [Sitzmann+ 2020]
– NN implicitly representing a continuous function
(Figure labels: Input, Output)
Loss function: physical properties are not taken into consideration

Slide 11

Slide 11 text

Related work: Physics-Informed Neural Network
Ø Physics-informed neural network (PINN) [Raissi+ 2019]
– Implicit neural representation incorporating a loss function that evaluates deviation from the governing PDE (PDE loss)
(Figure labels: Input, Output)
Loss function: deviation from the governing PDE is penalized
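
A PDE loss of the kind described here might look like the sketch below for the Helmholtz equation. Several things are assumptions made for illustration: the model `f` is any callable mapping positions to complex pressure, the Laplacian is approximated by central finite differences (a real PINN would normally use automatic differentiation), and the names `laplacian_fd`, `pinn_loss`, and `weight` are hypothetical.

```python
import numpy as np

def laplacian_fd(f, x, h=1e-3):
    """Central-difference Laplacian of f at points x with shape (N, 3)."""
    lap = -6.0 * f(x)
    for axis in range(3):
        step = np.zeros(3)
        step[axis] = h
        lap = lap + f(x + step) + f(x - step)
    return lap / h ** 2

def pinn_loss(f, x_obs, p_obs, x_col, k, weight=1.0):
    """Data loss at the mics plus a penalty on the Helmholtz residual at collocation points."""
    data = np.mean(np.abs(f(x_obs) - p_obs) ** 2)
    residual = laplacian_fd(f, x_col) + k ** 2 * f(x_col)
    pde = np.mean(np.abs(residual) ** 2)
    return data + weight * pde, data, pde

# Toy comparison: a plane wave satisfies the PDE, a Gaussian bump does not.
rng = np.random.default_rng(0)
k = 2 * np.pi * 200.0 / 343.0
x_obs = rng.uniform(-0.2, 0.2, size=(10, 3))
x_col = rng.uniform(-0.2, 0.2, size=(200, 3))
wave = lambda x: np.exp(1j * k * x[..., 0])
bump = lambda x: np.exp(-np.sum(x ** 2, axis=-1)).astype(complex)
p_obs = wave(x_obs)

_, data_wave, pde_wave = pinn_loss(wave, x_obs, p_obs, x_col, k)
_, data_bump, pde_bump = pinn_loss(bump, x_obs, p_obs, x_col, k)
```

The PDE term only penalizes deviation from the Helmholtz equation, so a trained network satisfies it approximately; this is the contrast with the kernel-based approach, where the constraint holds exactly.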

Slide 12

Slide 12 text

Physics-Constrained Neural Kernel [Ribeiro+ 2024]
Implicit neural representation of kernel function with constraint of Helmholtz equation
Ø Directional weighting function of the kernel function is adapted to the environment
Kernel function based on plane wave expansion
(Figure labels: Microphone, Directed component, Residual component)

Slide 13

Slide 13 text

Physics-Constrained Neural Kernel
Implicit neural representation of kernel function with constraint of Helmholtz equation
Ø Directed component
– Weighted sum of (sparse) von Mises‒Fisher distributions to represent direct sound and early reflections
$w_{\mathrm{dir}}(\eta; \beta, \gamma) = \sum_{n=1}^{N} \gamma_n \frac{e^{\beta_n \langle \eta, d_n \rangle}}{C(\beta_n)}$
Sparsity constraint: $\|\gamma\|_1 = 1$
Normalization constant: $C(\beta_n)$
$\kappa_{\mathrm{dir}}(r_1, r_2) = \sum_{n=1}^{N} \gamma_n \frac{j_0\left(\sqrt{(\mathrm{j}\beta_n d_n - k r_{12})^{\mathsf{T}} (\mathrm{j}\beta_n d_n - k r_{12})}\right)}{C(\beta_n)}$
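
The closed-form directed kernel, a weighted sum over N directions of j0 evaluated at a complex argument, can be evaluated numerically as in this sketch. It rests on assumptions not confirmed by the slide: r12 = r1 − r2, the normalization C(β) = sinh(β)/β, unit-norm directions d_n, and nonnegative weights γ_n summing to one. The function names are hypothetical.

```python
import numpy as np

def j0_complex(z):
    """Spherical Bessel function j0(z) = sin(z) / z for complex z, with j0(0) = 1."""
    z = np.asarray(z, dtype=complex)
    safe = np.where(z == 0, 1.0, z)
    return np.where(z == 0, 1.0, np.sin(safe) / safe)

def directed_kernel(r1, r2, k, gamma, beta, d):
    """Directed kernel for single points r1, r2 (shape (3,)).

    gamma: (N,) weights, beta: (N,) sharpness values, d: (N, 3) unit directions.
    """
    r12 = r1 - r2                                    # assumed definition of r12
    v = 1j * beta[:, None] * d - k * r12[None, :]    # (N, 3), complex
    z = np.sqrt(np.sum(v * v, axis=-1))              # v^T v without conjugation
    norm = np.sinh(beta) / beta                      # assumed C(beta)
    return np.sum(gamma * j0_complex(z) / norm)

# Toy values (hypothetical): two directed components at 600 Hz.
gamma = np.array([0.7, 0.3])
beta = np.array([5.0, 2.0])
d = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
k = 2 * np.pi * 600.0 / 343.0
r1 = np.array([0.1, 0.2, -0.05])
r2 = np.array([-0.1, 0.0, 0.15])

k11 = directed_kernel(r1, r1, k, gamma, beta, d)
k12 = directed_kernel(r1, r2, k, gamma, beta, d)
k21 = directed_kernel(r2, r1, k, gamma, beta, d)
```

Under these assumptions the kernel equals the sum of the weights on the diagonal (because j0(jβ) = sinh(β)/β cancels the normalization) and is Hermitian, which is what kernel ridge regression requires.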

Slide 14

Slide 14 text

Physics-Constrained Neural Kernel
Implicit neural representation of kernel function with constraint of Helmholtz equation
Ø Residual component
– Implicit neural representation to represent late reverberation
$w_{\mathrm{res}}(\eta; \theta) = \mathrm{NN}(\eta; \theta)$ (implicit neural representation)
$\kappa_{\mathrm{res}}(r_1, r_2) = \int_{\mathbb{S}^2} w_{\mathrm{res}}(\eta; \theta)\, e^{\mathrm{j} k \langle \eta, r_{12} \rangle}\, \mathrm{d}\eta$ (computed by numerical integration)
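
The numerical integration over the unit sphere could be carried out as sketched below. The quadrature rule (Gauss-Legendre in cos θ, uniform in φ), the 1/(4π) normalization, the sign of the exponent, and the tiny random-weight network standing in for the trained weighting function are all assumptions for illustration; `sphere_quadrature`, `residual_kernel`, and `make_weight_net` are hypothetical names.

```python
import numpy as np

def sphere_quadrature(n_theta=16, n_phi=32):
    """Directions eta (Q, 3) and weights (Q,) on the unit sphere; weights sum to 4*pi."""
    x, wx = np.polynomial.legendre.leggauss(n_theta)       # x = cos(theta)
    phi = 2 * np.pi * np.arange(n_phi) / n_phi
    s = np.sqrt(1.0 - x ** 2)
    eta = np.stack(
        [np.outer(s, np.cos(phi)), np.outer(s, np.sin(phi)), np.outer(x, np.ones(n_phi))],
        axis=-1,
    ).reshape(-1, 3)
    weights = np.repeat(wx, n_phi) * (2 * np.pi / n_phi)
    return eta, weights

def residual_kernel(r1, r2, k, w_res, eta, weights):
    """Quadrature approximation of the plane-wave integral, normalized by 4*pi (assumed)."""
    phase = np.exp(1j * k * (eta @ (r1 - r2)))
    return np.sum(weights * w_res(eta) * phase) / (4 * np.pi)

def make_weight_net(rng, hidden=16):
    """Stand-in for NN(eta; theta): a small random MLP with a positive (softplus) output."""
    W1 = rng.normal(size=(3, hidden))
    b1 = rng.normal(size=hidden)
    W2 = rng.normal(size=hidden) / hidden
    return lambda eta: np.log1p(np.exp(np.tanh(eta @ W1 + b1) @ W2))

eta, weights = sphere_quadrature()
k = 2 * np.pi * 200.0 / 343.0
r1 = np.array([0.2, -0.1, 0.3])
r2 = np.array([-0.1, 0.2, 0.0])

# With a constant weight the integral reduces to the uniform kernel j0(k ||r1 - r2||).
uniform = residual_kernel(r1, r2, k, lambda e: np.ones(len(e)), eta, weights)
expected = np.sinc(k * np.linalg.norm(r1 - r2) / np.pi)

w_res = make_weight_net(np.random.default_rng(0))
k12 = residual_kernel(r1, r2, k, w_res, eta, weights)
k21 = residual_kernel(r2, r1, k, w_res, eta, weights)
```

Since the kernel is a superposition of plane waves for any weighting function, the Helmholtz constraint holds regardless of the network parameters.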

Slide 15

Slide 15 text

Physics-Constrained Neural Kernel
Implicit neural representation of kernel function with constraint of Helmholtz equation
Ø Kernel function is the sum of directed and residual kernels
$\kappa = \kappa_{\mathrm{dir}} + \kappa_{\mathrm{res}}$ (directed kernel + residual kernel)
– Hyperparameters $\beta, \gamma, \theta$ are jointly optimized by a steepest descent-based algorithm
– Solution still satisfies the Helmholtz equation
– Inference is a linear operation based on kernel ridge regression
Estimation is still achieved by an FIR filter in the time domain

Slide 16

Slide 16 text

Physics-Constrained Neural Kernel
Ø Numerical experiment: T60: 400 ms, # mics: 41, spherical shell array
Figure panels: Ground truth (600 Hz); NN; PINN; Adaptive kernel; Proposed PCNK
NMSE values, in slide order: -6.8 dB, -16.3 dB, -24.8 dB [Koyama+ 2025]
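
For reference, NMSE in dB is commonly computed as below. The slide does not state the exact definition used in the experiment, so this is one standard form given as an assumption; `nmse_db` is a hypothetical name.

```python
import numpy as np

def nmse_db(true_field, est_field):
    """Normalized mean square error in dB between true and estimated (complex) fields."""
    err = np.sum(np.abs(true_field - est_field) ** 2)
    ref = np.sum(np.abs(true_field) ** 2)
    return 10.0 * np.log10(err / ref)

# An estimate that is 10 % off in amplitude everywhere gives -20 dB.
rng = np.random.default_rng(0)
true_field = rng.normal(size=100) + 1j * rng.normal(size=100)
value = nmse_db(true_field, 0.9 * true_field)
```

On this scale, every 10 dB reduction corresponds to a tenfold reduction in normalized squared error.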

Slide 17

Slide 17 text

Magnitude field estimation
Estimating “magnitude” distribution of acoustic transfer function (ATF)
Ø Spatial distribution of ATF magnitude is estimated from a discrete set of ATF magnitude measurements, e.g.,
– Estimating the sound field using signals that are not synchronized
– Estimating the directivity of musical instruments or other vibrating bodies
(Figure labels: Microphone, Target region, Pressure, Magnitude; axis label z (m))

Slide 18

Slide 18 text

Basis expansion revisited
Ø The function to be estimated is approximated as a linear combination of basis functions
Ø Least-squares estimation of the expansion coefficients
Ø Estimation of the function at target positions
(Equation labels: Basis function matrix, Expansion coef, Basis function matrix)
Basis expansion as linear autoencoder
– Encoder: spatially dependent
– Latent variable: spatially independent
– Decoder: spatially dependent
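
The encoder/decoder reading of basis expansion can be made concrete with a short sketch. The plane-wave basis, the six axis-aligned directions, and the toy field are assumptions chosen only to keep the example small; `plane_wave_basis`, `encode`, and `decode` are hypothetical names.

```python
import numpy as np

def plane_wave_basis(r, directions, k):
    """Basis function matrix: entry (m, l) is exp(j k <direction_l, r_m>)."""
    return np.exp(1j * k * (r @ directions.T))

def encode(Phi_mic, p_mic):
    """Encoder: least-squares expansion coefficients (position-independent latent variable)."""
    return np.linalg.lstsq(Phi_mic, p_mic, rcond=None)[0]

def decode(Phi_tgt, coef):
    """Decoder: evaluate the expansion at the target positions."""
    return Phi_tgt @ coef

rng = np.random.default_rng(0)
k = 2 * np.pi * 500.0 / 343.0
directions = np.array(
    [[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float
)
r_mic = rng.uniform(-0.3, 0.3, size=(20, 3))
r_tgt = rng.uniform(-0.3, 0.3, size=(100, 3))

# Toy field that lies exactly in the span of the basis.
coef_true = rng.normal(size=6) + 1j * rng.normal(size=6)
p_mic = plane_wave_basis(r_mic, directions, k) @ coef_true

coef = encode(plane_wave_basis(r_mic, directions, k), p_mic)
p_tgt = decode(plane_wave_basis(r_tgt, directions, k), coef)
p_tgt_true = plane_wave_basis(r_tgt, directions, k) @ coef_true
```

Both matrices depend on positions (mic positions for the encoder, target positions for the decoder), while the coefficient vector between them does not, which is the structure the nonlinear autoencoder generalizes.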

Slide 19

Slide 19 text

Autoencoder conditioned on source and mic positions [Koyama+ 2025]
Nonlinear autoencoder for ATF magnitude estimation
Ø Nonlinear extension of basis expansion-based method
– ATF at arbitrary source positions, mic positions, and frequencies can be obtained owing to this conditioning
– Combining multiple datasets with different measurement setups is possible
– Retraining is unnecessary at inference time; therefore, inference is computationally efficient
(Figure labels: Encoder, Decoder, Prototypes, Latent variables, Average, Log ATF magnitudes; Spatially independent, Spatially dependent, Spatially dependent)
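
A position-conditioned autoencoder of this general shape might be structured as in the sketch below. It is a loose illustration, not the architecture from the talk: the networks are untrained random MLPs, frequency conditioning and the "Prototypes" block from the figure are omitted because their roles are not described in the slide text, and all names and layer sizes are hypothetical.

```python
import numpy as np

def mlp(rng, sizes):
    """Small tanh MLP with random (untrained) weights; returns a forward function."""
    params = [
        (rng.normal(size=(m, n)) / np.sqrt(m), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]
    def forward(x):
        for i, (W, b) in enumerate(params):
            x = x @ W + b
            if i < len(params) - 1:
                x = np.tanh(x)
        return x
    return forward

rng = np.random.default_rng(0)
LATENT = 8
encoder = mlp(rng, [3 + 3 + 1, 32, LATENT])   # (source pos, mic pos, log magnitude) -> latent
decoder = mlp(rng, [LATENT + 3 + 3, 32, 1])   # (latent, source pos, target pos) -> log magnitude

def encode(src, mics, log_mag):
    """Encode each measurement with its positions, then average over measurements."""
    x = np.concatenate([np.broadcast_to(src, mics.shape), mics, log_mag[:, None]], axis=1)
    return encoder(x).mean(axis=0)            # position-independent latent variable

def decode(z, src, targets):
    """Predict log ATF magnitudes at arbitrary target positions from the latent variable."""
    x = np.concatenate(
        [np.broadcast_to(z, (len(targets), LATENT)), np.broadcast_to(src, targets.shape), targets],
        axis=1,
    )
    return decoder(x)[:, 0]

src = np.array([2.0, 1.0, 1.5])
mics = rng.uniform(-0.5, 0.5, size=(5, 3))
log_mag = rng.normal(size=5)
perm = rng.permutation(5)

z = encode(src, mics, log_mag)
z_perm = encode(src, mics[perm], log_mag[perm])
out = decode(z, src, rng.uniform(-0.5, 0.5, size=(7, 3)))
```

Averaging the per-measurement codes makes the latent variable independent of the number and order of measurements, so the same trained networks can be applied to any measurement set without retraining.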

Slide 20

Slide 20 text

Experiments: Setting
Ø Dataset preparation
– ATF of a single room using the image source method
– Room: shoebox-shaped, 4.0 m × 6.0 m × 3.0 m
– Target region: cuboid, 1.0 m × 1.0 m × 1.0 m
– Target pos: 1331 pos, every 0.1 m
– Source pos: 1024 pos, random outside target region
– Target freq: 0 - 1000 Hz
– Training 820, validation 122, and test 122
– # measurements: 5, 10, 20, and 100 (random)
Ø Comparison
– Proposed method based on autoencoder
– Neural field (NF)
– Kernel ridge regression with Gaussian kernel (KRR)
(Figure labels: Target region, Room)
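
The count of 1331 target positions follows from an 11 × 11 × 11 grid with 0.1 m spacing inside the 1.0 m cube, as this short sketch shows. Centering the grid at the origin is an assumption; the slide does not give the region's location within the room.

```python
import numpy as np

# 11 points per axis at 0.1 m spacing span the 1.0 m target region.
axis = np.linspace(-0.5, 0.5, 11)
grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1).reshape(-1, 3)
spacing = axis[1] - axis[0]
```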

Slide 21

Slide 21 text

Experiments: Results
Ø Average LSD w.r.t. # mics
Consistently low LSD is achieved by the proposed method
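
Assuming LSD here denotes log-spectral distortion, one common definition is the RMS difference between true and estimated magnitudes in dB, sketched below. The exact definition and averaging used in the experiment are not given on the slide, so `lsd_db` and its `eps` guard are hypothetical.

```python
import numpy as np

def lsd_db(mag_true, mag_est, eps=1e-12):
    """RMS difference in dB between true and estimated magnitudes."""
    diff = 20.0 * np.log10((np.abs(mag_true) + eps) / (np.abs(mag_est) + eps))
    return np.sqrt(np.mean(diff ** 2))

# An estimate that is 10x too large at every point has an LSD of 20 dB.
rng = np.random.default_rng(0)
mag_true = rng.uniform(0.5, 1.5, size=64)
value = lsd_db(mag_true, 10.0 * mag_true)
```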

Slide 22

Slide 22 text

Experiments: Results
Ø Estimated ATF magnitude at center of target region when M=5
Figure panels: Proposed; NF; KRR

Slide 23

Slide 23 text

Experiments: Results
Ø Magnitude distribution on horizontal plane at 250 Hz when M=5
Figure panels: Proposed; NF; KRR; Ground truth

Slide 24

Slide 24 text

Conclusion
Ø Neural spatial audio processing for sound field analysis and control
– NNs will provide adaptability to the acoustic environment and data-driven prior information
– Physics-Constrained Neural Kernel
• Implicit neural representation of kernel function
• Helmholtz equation constraint imposed by plane wave expansion-based representation
– Autoencoder conditioned on source and mic positions
• Nonlinear extension of basis expansion-based method
• Magnitude field estimation using a very small number of mics
– Interested?
• Join our special session “Neural spatial audio processing” at ICASSP 2026!
Thank you for your attention!