Neural Spatial Audio Processing for Sound Field Analysis and Control
Shoichi Koyama, National Institute of Informatics, Tokyo, Japan
August 22, 2025

About NII
Ø NII is a national research institute of informatics in Japan
– Main lab is located in central Tokyo
– Associated with a graduate university called SOKENDAI
(Map labels: Kashiwa Annex, NII, 29 km, Imperial Palace, ICASSP 2028 venue)

Basic Technologies of Sound Field Estimation and Control
Ø Our research topics: sound field estimation/control and its applications
– VR/AR audio
– Active noise control
– Local-field recording and reproduction
– Signal enhancement
– Visualization/auralization
– Room acoustic analysis

Sound field estimation
Ø How to estimate the distribution of a continuous physical quantity of sound from discrete sensor observations?
– Estimate the pressure distribution in the target region from observations at a discrete set of mics in the frequency domain
– A fundamental problem, but very important in various applications
(Figure labels: target region, microphone)

Sound field estimation
Ø Prior work on sound field estimation
– Basis expansion-based methods [Colton+ 1992]
  • Plane wave expansion (or Herglotz wave function)
  • Spherical wave function expansion
  • Equivalent source distribution (or single-layer potential)
– Infinite-dimensional expansion or kernel regression
  • Harmonic analysis of infinite order [Ueno+ 2018]
  • Directionally-weighted kernel regression [Ueno+ 2021]
Ø A comprehensive review is available:
– Ueno and Koyama, “Sound Field Estimation: Theories and Applications,” Foundations and Trends® in Signal Processing, 2025.

Kernel regression for sound field estimation
Ø The function to be interpolated is represented by a weighted sum of kernel functions
Ø The kernel function constrains the solution to satisfy the Helmholtz equation
– With directional weighting by a von Mises-Fisher distribution
– With uniform weighting
(Equation on slide: kernel regression with constraint of Helmholtz equation; annotated terms: kernel function, direction, sharpness)
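
The uniform-weighting case can be sketched in NumPy as follows. This is a minimal illustration, not the talk's implementation: it assumes that uniform weighting over plane-wave directions gives the kernel j0(k‖r1 − r2‖) (zeroth-order spherical Bessel function), and the function names, the regularization parameter `reg`, and the toy plane-wave data are all hypothetical.

```python
import numpy as np

def sinc_kernel(r1, r2, k):
    """Uniform-weighting kernel j0(k * ||r1 - r2||) for all position pairs."""
    dist = np.linalg.norm(r1[:, None, :] - r2[None, :, :], axis=-1)
    # np.sinc(x) = sin(pi x) / (pi x), so j0(x) = np.sinc(x / pi)
    return np.sinc(k * dist / np.pi)

def kernel_ridge_estimate(mic_pos, mic_sig, target_pos, k, reg=1e-3):
    """Estimate the pressure at target_pos from mic observations (single frequency)."""
    gram = sinc_kernel(mic_pos, mic_pos, k)
    alpha = np.linalg.solve(gram + reg * np.eye(len(mic_pos)), mic_sig)
    # The estimate is a weighted sum of kernel functions centered at the mics
    return sinc_kernel(target_pos, mic_pos, k) @ alpha

# Toy data (assumed): one plane wave at 200 Hz observed by 18 random mics
rng = np.random.default_rng(0)
k = 2 * np.pi * 200 / 343.0
direction = np.array([1.0, 0.0, 0.0])
mic_pos = rng.uniform(-0.5, 0.5, size=(18, 3))
target_pos = rng.uniform(-0.3, 0.3, size=(50, 3))

def plane_wave(pos):
    return np.exp(-1j * k * pos @ direction)

estimate = kernel_ridge_estimate(mic_pos, plane_wave(mic_pos), target_pos, k)
truth = plane_wave(target_pos)
nmse_db = 10 * np.log10(np.sum(np.abs(estimate - truth) ** 2)
                        / np.sum(np.abs(truth) ** 2))
print(f"NMSE: {nmse_db:.1f} dB")
```

Because every kernel function satisfies the Helmholtz equation, so does any weighted sum of them, which is why the constraint holds for the estimate regardless of the data.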

Kernel regression for sound field estimation
Ø Experimental results using real data from the MeshRIR dataset
– Reconstructing a pulse signal from a single loudspeaker with 18 mics
– Applied to binaural rendering, spatial active noise control, etc. [Ueno+ 2025]
(Figure panels: ground truth; kernel regression w/ HE constraint; kernel regression w/ Gaussian kernel. Black dots indicate mic positions.) [Koyama+ 2021]

Neural spatial audio processing
Ø Why neural networks?
– Adaptability to acoustic environments
  • In current methods, the estimator is fixed regardless of the environment
  • The high representational power of NNs allows adaptation to the environment
– Data-driven prior information
  • Data obtained in advance gives rich prior information on the environment
  • High accuracy can be maintained even with an extremely small number of mics
Ø Physics-Constrained Neural Kernel [Ribeiro+ 2024]
Ø Autoencoder conditioned on source and sensor positions [Koyama+ 2025]

Related work: Physics-Informed Neural Network
Ø Implicit neural representation (or neural field) [Sitzmann+ 2020]
– An NN implicitly represents a continuous function
– Loss function: physical properties are not taken into consideration
(Diagram labels: input, output)

Related work: Physics-Informed Neural Network
Ø Physics-informed neural network (PINN) [Raissi+ 2019]
– Implicit neural representation incorporating a loss function that evaluates the deviation from the governing PDE (PDE loss)
– Loss function: the deviation from the governing PDE is penalized
(Diagram labels: input, output)
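
The idea of a PDE loss can be sketched as follows, assuming the governing PDE is the Helmholtz equation and using finite differences in place of the automatic differentiation a real PINN would use. The candidate fields, the collocation points, and the weight `weight` are hypothetical stand-ins for a trained network and its training setup.

```python
import numpy as np

def helmholtz_residual(field, points, k, h=1e-3):
    """Central-difference estimate of (Laplacian + k^2) u at the given points."""
    laplacian = np.zeros(len(points), dtype=complex)
    for axis in range(3):
        step = np.zeros(3)
        step[axis] = h
        laplacian += (field(points + step) - 2 * field(points)
                      + field(points - step)) / h**2
    return laplacian + k**2 * field(points)

def pinn_style_loss(field, mic_pos, mic_sig, colloc_pos, k, weight=1.0):
    """Data-fit term plus penalized deviation from the Helmholtz equation."""
    data_loss = np.mean(np.abs(field(mic_pos) - mic_sig) ** 2)
    pde_loss = np.mean(np.abs(helmholtz_residual(field, colloc_pos, k)) ** 2)
    return data_loss + weight * pde_loss, data_loss, pde_loss

k = 2 * np.pi * 200 / 343.0
rng = np.random.default_rng(0)
mic_pos = rng.uniform(-0.5, 0.5, size=(10, 3))
colloc_pos = rng.uniform(-0.5, 0.5, size=(200, 3))

def plane_wave(pos):
    # Satisfies the Helmholtz equation exactly
    return np.exp(-1j * k * pos[:, 0])

def gaussian_bump(pos):
    # Smooth, but not a solution of the Helmholtz equation
    return np.exp(-np.sum(pos**2, axis=1)).astype(complex)

observed = plane_wave(mic_pos)
_, _, pde_ok = pinn_style_loss(plane_wave, mic_pos, observed, colloc_pos, k)
_, _, pde_bad = pinn_style_loss(gaussian_bump, mic_pos, observed, colloc_pos, k)
print(pde_ok, pde_bad)
```

The PDE term only penalizes the deviation, so a PINN solution satisfies the equation approximately rather than by construction.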

Physics-Constrained Neural Kernel [Ribeiro+ 2024]
Implicit neural representation of kernel function with constraint of Helmholtz equation
Ø The directional weighting function of the kernel function is adapted to the environment
(Figure labels: microphone, directed component, residual component; equation label: kernel function based on plane wave expansion)

Physics-Constrained Neural Kernel
Ø Directed component
– Weighted sum of (sparse) von Mises-Fisher distributions to represent direct sound and early reflections
\[ w_{\mathrm{dir}}(\eta; \gamma, \beta) = \sum_{n=1}^{N} \gamma_n \frac{e^{\beta_n \langle \eta, d_n \rangle}}{C(\beta_n)} \qquad (\|\gamma\|_1 = 1) \]
\[ \kappa_{\mathrm{dir}}(r_1, r_2) = \sum_{n=1}^{N} \gamma_n \frac{j_0\left( \sqrt{ (\mathrm{j} \beta_n d_n - k r_{12})^{\mathsf{T}} (\mathrm{j} \beta_n d_n - k r_{12}) } \right)}{C(\beta_n)} \]
– \|\gamma\|_1 = 1: sparsity constraint
– C(\beta_n): normalization constant
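
A closed-form directed kernel of this kind can be sketched in NumPy. The symbols are assumptions made for the sketch: `gamma` are the nonnegative mixture weights summing to one, `beta` the sharpness values, `d` the unit direction vectors, and r12 = r1 − r2. The normalization C(β) = sinh(β)/β = j0(jβ) is also an assumption, chosen so that the kernel equals the sum of the weights at r1 = r2.

```python
import numpy as np

def j0(z):
    """Zeroth-order spherical Bessel function sin(z)/z for complex arguments."""
    z = np.asarray(z, dtype=complex)
    safe = np.where(z == 0, 1.0, z)
    return np.where(z == 0, 1.0, np.sin(safe) / safe)

def directed_kernel(r1, r2, k, gamma, beta, d):
    """Sum over n of gamma_n * j0(sqrt(v_n^T v_n)) / C(beta_n),
    with v_n = j * beta_n * d_n - k * r12."""
    r12 = r1 - r2
    v = 1j * beta[:, None] * d - k * r12[None, :]     # shape (N, 3)
    arg = np.sqrt(np.sum(v * v, axis=1))              # plain transpose, no conjugate
    norm_const = j0(1j * beta)                        # assumed C(beta) = sinh(beta)/beta
    return np.sum(gamma * j0(arg) / norm_const)

k = 2 * np.pi * 200 / 343.0
gamma = np.array([0.7, 0.3])                          # hypothetical weights
beta = np.array([5.0, 2.0])                           # hypothetical sharpness values
d = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])      # hypothetical directions
r1 = np.array([0.3, -0.2, 0.1])
r2 = np.array([-0.1, 0.2, 0.0])

kappa_12 = directed_kernel(r1, r2, k, gamma, beta, d)
kappa_21 = directed_kernel(r2, r1, k, gamma, beta, d)
kappa_11 = directed_kernel(r1, r1, k, gamma, beta, d)
# With vanishing sharpness the kernel approaches the uniform-weighting kernel
kappa_flat = directed_kernel(r1, r2, k, np.array([1.0]), np.array([1e-6]), d[:1])
print(kappa_12, kappa_11, kappa_flat)
```

The kernel is Hermitian (swapping r1 and r2 conjugates it), and a larger sharpness value concentrates the weighting around the corresponding direction.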

Physics-Constrained Neural Kernel
Ø Residual component
– Implicit neural representation to represent late reverberation
\[ w_{\mathrm{res}}(\eta; \theta) = \mathrm{NN}(\eta; \theta) \]
\[ \kappa_{\mathrm{res}}(r_1, r_2) = \int_{\mathbb{S}_2} w_{\mathrm{res}}(\eta; \theta) \, e^{\mathrm{j} k \langle \eta, r_{12} \rangle} \, \mathrm{d}\eta \]
– w_{\mathrm{res}}: implicit neural representation
– \kappa_{\mathrm{res}}: computed by numerical integration
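
The numerical integration over the unit sphere can be sketched as below. The quadrature rule (Gauss-Legendre in the polar direction, uniform in azimuth), the sign convention of the plane wave, and the tiny random network standing in for NN(η; θ) are all assumptions made for illustration. With a constant weight 1/(4π), the result should reduce to j0(k‖r1 − r2‖), which gives a simple sanity check.

```python
import numpy as np

def sphere_quadrature(n_polar=32, n_azimuth=64):
    """Gauss-Legendre nodes in cos(polar angle) times uniform azimuth nodes."""
    cos_t, w_t = np.polynomial.legendre.leggauss(n_polar)
    phi = 2 * np.pi * np.arange(n_azimuth) / n_azimuth
    sin_t = np.sqrt(1 - cos_t**2)
    eta = np.stack([np.outer(sin_t, np.cos(phi)),
                    np.outer(sin_t, np.sin(phi)),
                    np.outer(cos_t, np.ones(n_azimuth))], axis=-1).reshape(-1, 3)
    weights = np.repeat(w_t, n_azimuth) * (2 * np.pi / n_azimuth)
    return eta, weights                      # weights sum to 4*pi

def residual_kernel(r1, r2, k, weight_fn, eta, weights):
    """Quadrature of weight_fn(eta) * exp(j k eta.r12) over the unit sphere."""
    r12 = r1 - r2
    return np.sum(weights * weight_fn(eta) * np.exp(1j * k * eta @ r12))

def tiny_mlp_weight(eta, params):
    """Hypothetical stand-in for NN(eta; theta): one hidden layer, positive output."""
    w1, b1, w2, b2 = params
    hidden = np.tanh(eta @ w1 + b1)
    return np.exp(hidden @ w2 + b2)

eta, weights = sphere_quadrature()
k = 2 * np.pi * 200 / 343.0
r1 = np.array([0.3, -0.2, 0.1])
r2 = np.zeros(3)

# Sanity check with a constant weight 1/(4*pi)
kappa_uniform = residual_kernel(
    r1, r2, k, lambda e: np.full(len(e), 1 / (4 * np.pi)), eta, weights)

# Random, untrained network parameters (illustration only)
rng = np.random.default_rng(0)
params = (0.5 * rng.standard_normal((3, 8)), 0.5 * rng.standard_normal(8),
          0.5 * rng.standard_normal(8), 0.0)
kappa_12 = residual_kernel(r1, r2, k, lambda e: tiny_mlp_weight(e, params), eta, weights)
kappa_21 = residual_kernel(r2, r1, k, lambda e: tiny_mlp_weight(e, params), eta, weights)
print(kappa_uniform, kappa_12)
```

Since the kernel is a superposition of plane waves whatever the network outputs, the Helmholtz equation holds by construction; keeping the weight real and positive (here through `np.exp`) keeps the kernel Hermitian.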

Physics-Constrained Neural Kernel
Ø The kernel function is the sum of the directed and residual kernels
\[ \kappa = \kappa_{\mathrm{dir}} + \kappa_{\mathrm{res}} \]
– The hyperparameters (those of the directed component and the network parameters \theta) are jointly optimized by a steepest descent-based algorithm
– The solution still satisfies the Helmholtz equation
– Inference is a linear operation based on kernel ridge regression
– Estimation is still achieved by an FIR filter in the time domain

Physics-Constrained Neural Kernel
Ø Numerical experiment: T60: 400 ms, # mics: 41, spherical shell array
(Figure panels: ground truth (600 Hz), NN, PINN, adaptive kernel, proposed PCNK; NMSE values shown: -6.8 dB, -16.3 dB, -24.8 dB) [Koyama+ 2025]
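
For reference, NMSE in dB is commonly computed as below. This assumes the usual definition (error energy normalized by the energy of the true field); the talk does not state over which positions or frequencies its values are averaged, and the toy numbers are hypothetical.

```python
import numpy as np

def nmse_db(estimate, truth):
    """Normalized mean square error in dB between estimated and true pressure."""
    error_energy = np.sum(np.abs(estimate - truth) ** 2)
    return 10 * np.log10(error_energy / np.sum(np.abs(truth) ** 2))

truth = np.ones(4, dtype=complex)
estimate = 1.1 * truth          # 10 % amplitude error everywhere
value = nmse_db(estimate, truth)
print(f"{value:.1f} dB")
```

A 10 % amplitude error corresponds to about -20 dB, which gives a sense of scale for the values on the slide.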

Magnitude field estimation
Ø Estimating the “magnitude” distribution of the acoustic transfer function (ATF)
Ø Spatial distribution of ATF magnitude from a discrete set of measurements of ATF magnitudes, e.g.,
– Estimating the sound field using signals that are not synchronized
– Estimating the directivity of musical instruments or other vibrating bodies
(Figure panels: pressure and magnitude over the target region, with microphone positions; axis: z (m))

Basis expansion revisited
Ø The sound field is approximated as a linear combination of basis functions
Ø Least-squares estimation of the expansion coefficients
Ø Estimation of the sound field at target positions
Ø Basis expansion as a linear autoencoder
– Encoder: basis function matrix (spatially dependent)
– Latent variable: expansion coefficients (spatially independent)
– Decoder: basis function matrix (spatially dependent)
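
The view of basis expansion as a linear autoencoder can be sketched as follows. The choice of six plane waves as basis functions, the positions, and the coefficients are hypothetical; the point is only that the encoder is the pseudo-inverse of the basis function matrix at the mics and the decoder is the basis function matrix at the target positions.

```python
import numpy as np

def plane_wave_basis(pos, directions, k):
    """Basis function matrix: one column per plane wave direction."""
    return np.exp(-1j * k * pos @ directions.T)

k = 2 * np.pi * 200 / 343.0
directions = np.vstack([np.eye(3), -np.eye(3)])    # six assumed directions
rng = np.random.default_rng(0)
mic_pos = rng.uniform(-0.5, 0.5, size=(20, 3))
target_pos = rng.uniform(-0.5, 0.5, size=(100, 3))

# Synthetic field that lies exactly in the span of the basis
true_coef = rng.standard_normal(6) + 1j * rng.standard_normal(6)
mic_sig = plane_wave_basis(mic_pos, directions, k) @ true_coef

# Encoder: least-squares estimation of the expansion coefficients
encoder = np.linalg.pinv(plane_wave_basis(mic_pos, directions, k))   # (6, 20)
# Decoder: basis functions evaluated at the target positions
decoder = plane_wave_basis(target_pos, directions, k)                # (100, 6)

latent = encoder @ mic_sig       # spatially independent coefficients
estimate = decoder @ latent      # field at the target positions
print(np.max(np.abs(latent - true_coef)))
```

Only the encoder and decoder depend on positions; the latent coefficients do not.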

Autoencoder conditioned on source and mic positions [Koyama+ 2025]
Ø Nonlinear extension of the basis expansion-based method
– The ATF at arbitrary source positions, mic positions, and frequencies can be obtained owing to the conditioning
– Combining multiple datasets with different measurement setups is possible
– Retraining is unnecessary at inference; therefore, the method is computationally efficient
(Figure: nonlinear autoencoder for ATF magnitude estimation; labels: encoder, decoder, prototypes, latent variables, average, log-ATF magnitudes; spatially dependent, spatially independent, spatially dependent)

Experiments: Setting
Ø Dataset preparation
– ATF of a single room simulated using the image source method
– Room: shoebox-shaped, 4.0 m x 6.0 m x 3.0 m
– Target region: cuboid, 1.0 m x 1.0 m x 1.0 m
– Target positions: 1331, every 0.1 m
– Source positions: 1024, random, outside the target region
– Target frequencies: 0 - 1000 Hz
– Training 820, validation 122, and test 122
– # measurements: 5, 10, 20, and 100 (random)
Ø Comparison
– Proposed method based on autoencoder
– Neural field (NF)
– Kernel ridge regression with Gaussian kernel (KRR)
(Figure labels: target region, room)
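
The target grid in this setting can be reproduced with a few lines of NumPy: a 1.0 m cube sampled every 0.1 m has 11 points per axis, hence 11^3 = 1331 positions. The sketch assumes the cube is centered at the origin and that the measurement positions are drawn at random from the target grid; neither detail is stated on the slide.

```python
import numpy as np

# 11 points per axis at 0.1 m spacing (cube assumed centered at the origin)
axis = np.linspace(-0.5, 0.5, 11)
grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1).reshape(-1, 3)

# Random measurement subsets of each size (assumed to be taken from the grid)
rng = np.random.default_rng(0)
subsets = {m: rng.choice(len(grid), size=m, replace=False) for m in (5, 10, 20, 100)}

print(grid.shape, {m: len(idx) for m, idx in subsets.items()})
```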

Experiments: Results
Ø Average LSD w.r.t. # mics
– Consistently low LSD is achieved by the proposed method
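
A minimal sketch of the metric, assuming LSD denotes the usual log-spectral distortion (root-mean-square difference of log magnitudes in dB over frequency). The exact definition and averaging used in the experiments are not given on the slide, and the small constant `eps` is a hypothetical guard against taking the log of zero.

```python
import numpy as np

def lsd_db(est_mag, true_mag, eps=1e-12):
    """Log-spectral distortion in dB between two magnitude responses."""
    diff_db = 20 * np.log10((est_mag + eps) / (true_mag + eps))
    return np.sqrt(np.mean(diff_db ** 2))

true_mag = np.array([1.0, 0.5, 0.25, 2.0])
same = lsd_db(true_mag, true_mag)
doubled = lsd_db(2 * true_mag, true_mag)    # a factor of 2 everywhere is about 6.02 dB
print(same, doubled)
```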

Experiments: Results
Ø Estimated ATF magnitude at the center of the target region when M=5
(Figure panels: Proposed, NF, KRR)

Experiments: Results
Ø Magnitude distribution on the horizontal plane at 250 Hz when M=5
(Figure panels: Proposed, NF, KRR, ground truth)

Conclusion
Ø Neural spatial audio processing for sound field analysis and control
– NNs will provide adaptability to the acoustic environment and data-driven prior information
– Physics-Constrained Neural Kernel
  • Implicit neural representation of the kernel function
  • Constraint of the Helmholtz equation by a plane wave expansion-based representation
– Autoencoder conditioned on source and mic positions
  • Nonlinear extension of the basis expansion-based method
  • Magnitude field estimation using a very small number of mics
– Interested?
  • Join our special session “Neural spatial audio processing” at ICASSP 2026!
Thank you for your attention!
