Comparison of modal versus delay-and-sum beamforming in the context of data-based binaural synthesis

Talk given at the 132nd Convention of the Audio Engineering Society presenting the paper
S. Spors, H. Wierstorf, and M. Geier. Comparison of modal versus delay-and-sum beamforming in the context of data-based binaural synthesis. In 132nd AES Convention. Audio Engineering Society (AES), April 2012.

Sascha Spors

May 09, 2012

Transcript

  1. Comparison of Modal versus Delay-and-Sum Beamforming
    in the Context of Data-Based Binaural Synthesis
    Sascha Spors, Hagen Wierstorf and Matthias Geier
    Quality and Usability Lab
    Deutsche Telekom Laboratories
    Technische Universität Berlin
    132nd Convention of the Audio Engineering Society
    Budapest, April 2012

  2. Introduction Beamforming Techniques Results Conclusions
    Dynamic Data-based Binaural Synthesis
    [Duraiswami et al. 2005, Li et al. 2006, Melchior et al. 2009, ...]
    capture and decompose sound field into plane waves
    filter with far-field head-related transfer functions (HRTFs)
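    [Figure: block diagram. The primary source creates the sound field P(x, ω), which is captured by M microphones; the plane wave decomposition yields P̄(φ, θ, ω), which is filtered with the left and right far-field HRTFs H̄_L(φ, θ, γ, δ, ω) and H̄_R(φ, θ, γ, δ, ω). Angle labels: (φ, θ) and (γ, δ); axes: x, y.]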
    analysis by spherical microphone arrays is subject to practical limitations
    perceptual impact of limitations not fully investigated
    S.Spors, H.Wierstorf and M.Geier
    Comparison of data-based binaural synthesis
    132ndAES
    1 / 15
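
The two processing steps named on this slide (plane wave decomposition, then filtering with far-field HRTFs) can be sketched as a weighted sum over discrete plane wave directions. This is a minimal sketch, not the authors' implementation: the function name `binaural_spectrum`, the array shapes, the quadrature weights and the random toy "HRTFs" are all assumptions made for illustration.

```python
import numpy as np

def binaural_spectrum(plane_waves, hrtfs, weights):
    """Sum plane wave spectra filtered by per-direction HRTFs.

    plane_waves: (Q, F) complex plane wave spectra, one row per direction
    hrtfs:       (Q, F) complex far-field HRTFs of one ear, same directions
    weights:     (Q,)   quadrature weights of the direction grid (assumed)
    """
    return np.sum(weights[:, None] * plane_waves * hrtfs, axis=0)

# Toy data (hypothetical): 4 directions, 8 frequency bins.
rng = np.random.default_rng(0)
hrtf_left = rng.standard_normal((4, 8)) + 1j * rng.standard_normal((4, 8))
pw = np.zeros((4, 8), dtype=complex)
pw[2] = 1.0  # a single unit-amplitude plane wave from direction index 2
left_ear = binaural_spectrum(pw, hrtf_left, np.ones(4))
```

With a single unit-amplitude plane wave, the ear spectrum reduces to the HRTF of that direction, which is the reference case an ideal decomposition should reproduce.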

  3. Previous Studies
    Influence of spatial bandlimitation and sampling in modal beamforming
    interaural cross correlation (ICC) [Rafaely et al. 2010]
    localization (ITD/ILD) [Avni et al. 2010]
    listening experiment incl. dual radius cardioid array [Melchior et al. 2009]
    localization/coloration w/o sampling [Spors et al. 2012]
    ...
    Aims of this paper
    comparison of two beamforming techniques
    separate consideration of spatial bandwidth and sampling
    localization properties estimated by binaural model

  4. Spatial Dimensionality of Sound Fields
    Summary of concept from [Kennedy et al. 2007]
    A finite number of orthogonal components is sufficient to represent a multipath sound
    field with bounded error within a bounded source-free region
    Quantitative results for a spherical region of radius R
    spherical harmonics series expansion truncated to order N
    field truncation error ε_N decays exponentially with increasing N
    N increases linearly with frequency f and radius R for fixed ε
    Example for human head
    R = 9 cm, full audio bandwidth f = 20 kHz
    scattering around human head neglected
    ε < 67.8% for N > 45 ⇒ M ≥ 2116 sensors required
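
The numbers in the head example can be reproduced with a few lines of arithmetic. The sketch below assumes the rule N = ⌈e·π·R/λ⌉ as the relevant result of [Kennedy et al. 2007], a speed of sound of c = 343 m/s, and (N + 1)² as the number of spherical harmonics up to order N; neither the rule nor the value of c is stated on the slide, and the function names are hypothetical.

```python
import math

def required_order(radius_m, freq_hz, c=343.0):
    """Order N = ceil(e * pi * R / lambda), with wavelength lambda = c / f (assumed rule)."""
    return math.ceil(math.e * math.pi * radius_m * freq_hz / c)

def num_components(order):
    """Number of spherical harmonics up to and including order N."""
    return (order + 1) ** 2

N = required_order(0.09, 20000.0)  # R = 9 cm, f = 20 kHz
M = num_components(N)
print(N, M)  # 45 2116
```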

  5. Principle of Modal Beamforming
    [Figure: block diagram of modal beamforming. P(x, ω) from M microphones → spherical harmonics decomposition up to order N → coefficients P̊_n^m(ω) → plane wave decomposition → P̄(φ, θ, ω).]
    (dual) open/rigid spheres with pressure/cardioid microphones
    representation of captured sound field w.r.t. surface spherical harmonics
    plane wave decomposition from spherical harmonics expansion
    issues with numerical conditioning and complexity
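
For a single unit-amplitude plane wave and an ideal, noise-free, spatially continuous array, the order-limited plane wave decomposition depends only on the angle Θ between the incidence and the look direction. By the spherical harmonics addition theorem it can be written as a Legendre series. The sketch below evaluates that series; the 1/(4π) normalization and the function name `pwd_pattern` are assumptions, and the radial (mode strength) filters, which cause the conditioning issues mentioned above, are not modeled.

```python
import numpy as np
from numpy.polynomial import legendre

def pwd_pattern(theta, order):
    """Order-limited response sum_{n=0}^{N} (2n+1)/(4 pi) P_n(cos theta)."""
    coeffs = (2 * np.arange(order + 1) + 1) / (4 * np.pi)
    return legendre.legval(np.cos(theta), coeffs)

theta = np.linspace(0.0, np.pi, 181)      # angle to the incidence direction
pattern = pwd_pattern(theta, order=23)    # N = 23 as used in the talk
peak = pattern[0]                         # equals (N + 1)^2 / (4 pi)
```

Increasing the order narrows the main lobe around Θ = 0, which is why a limited spatial bandwidth smears the decomposed plane wave over neighboring directions.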

  6. Practical Limitations of Modal Beamforming
    1 spatial sampling
    repetitions of spatial spectrum (e.g. Gaussian sampling [Ahrens et al. 2012])
    typically limitation of spatial bandwidth
    2 equipment noise
    differential operation for low frequencies
    frequency dependent white noise gain (WNG)
    3 sensor and position mismatch
    [Figure: example with N = 6, M = 84, uniform sampling, from Rafaely, Analysis and Design of Spherical Microphone Arrays, 2005.]

  7. Principle of Delay-and-Sum Beamforming
    [Figure: block diagram of delay-and-sum beamforming. P(x, ω) from M microphones → delays τ0, τ1, ..., τM → summation → P̄(φ, θ, ω).]
    arbitrary (free-field) sensor geometries
    delays compensate for the propagation time between microphones
    constructive interference by summation
    numerically efficient and stable
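
The principle above can be sketched in the frequency domain, where each delay becomes a phase shift. This is a minimal free-field sketch under stated assumptions: omnidirectional sensors, c = 343 m/s, and a Fibonacci lattice as a stand-in for the microphone grid (the talk uses a Lebedev grid). All function names are hypothetical.

```python
import numpy as np

def fibonacci_sphere(num_points, radius):
    """Nearly uniform points on a sphere (illustrative stand-in for a real grid)."""
    i = np.arange(num_points)
    z = 1.0 - (2.0 * i + 1.0) / num_points
    phi = i * np.pi * (3.0 - np.sqrt(5.0))
    rho = np.sqrt(1.0 - z ** 2)
    return radius * np.stack([rho * np.cos(phi), rho * np.sin(phi), z], axis=1)

def unit_vector(azimuth):
    """Unit vector in the horizontal plane for an azimuth in radians."""
    return np.array([np.cos(azimuth), np.sin(azimuth), 0.0])

def delay_and_sum(pressure, positions, look_azimuth, freq_hz, c=343.0):
    """Phase-align the microphone spectra for the look direction and average."""
    k = 2.0 * np.pi * freq_hz / c
    delays = np.exp(-1j * k * positions @ unit_vector(look_azimuth))
    return np.mean(pressure * delays)

mics = fibonacci_sphere(770, 0.5)   # M = 770, R = 0.5 m
f = 1000.0
k = 2.0 * np.pi * f / 343.0
# Free-field spectrum of a unit-amplitude plane wave arriving from azimuth 0.
p = np.exp(1j * k * mics @ unit_vector(0.0))
on_axis = abs(delay_and_sum(p, mics, 0.0, f))
off_axis = abs(delay_and_sum(p, mics, np.pi / 2, f))
```

In the look direction all phase-aligned contributions add constructively, so the output has unit magnitude; for other directions the contributions partly cancel.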

  8. Practical Aspects of Delay-and-Sum Beamforming
    1 spatial sampling
    leads to repetitions in spatial spectrum
    no limitation of spatial bandwidth
    2 equipment noise
    no differential operation for low frequencies
    constant white noise gain (WNG)
    3 sensor and position mismatch
    [Figure: white noise gain (WNG, in dB) and directivity index (DI, in dB) over frequency (Hz), each for delay-and-sum and modal (N = 23) beamforming; R = 0.5 m, N = 23, M = 770 [Rafaely 2005].]
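
The constant WNG of the delay-and-sum beamformer can be checked numerically. The sketch assumes the common definition WNG = |wᴴd|² / (wᴴw) for weights w and steering vector d, free-field omnidirectional sensors, and randomly placed microphones on the sphere; the positions and names are illustrative, not the grid used in the talk.

```python
import numpy as np

def white_noise_gain_db(weights, steering):
    """WNG = |w^H d|^2 / (w^H w) in dB for weights w and steering vector d."""
    gain = abs(np.vdot(weights, steering)) ** 2 / np.vdot(weights, weights).real
    return 10.0 * np.log10(gain)

M = 770
rng = np.random.default_rng(1)
positions = rng.standard_normal((M, 3))
positions *= 0.5 / np.linalg.norm(positions, axis=1, keepdims=True)  # R = 0.5 m
look = np.array([1.0, 0.0, 0.0])

wng = {}
for f in (100.0, 1000.0, 10000.0):
    k = 2.0 * np.pi * f / 343.0
    d = np.exp(1j * k * positions @ look)  # free-field steering vector
    w = d / M                              # delay-and-sum weights
    wng[f] = white_noise_gain_db(w, d)
```

Under these assumptions the result is 10·log10(M) at every frequency, about 28.9 dB for M = 770, regardless of where the microphones are placed.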

  9. Experimental Conditions
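    [Figure: block diagram of the experimental conditions. A plane wave from (φpw, θpw) produces the sound field P(x, ω). Signal paths: cont. delay-and-sum beamforming, cont. modal beamforming (order N), delay-and-sum beamforming, and modal beamforming (spherical harmonics decomposition of order N with coefficients P̊_n^m(ω), followed by plane wave decomposition); M microphones. The resulting P̄(φ, θ, ω) is filtered with H_L(φ, θ, γ, δ, ω) and H_R(φ, θ, γ, δ, ω).]
    unit-amplitude plane wave as incident sound field (φpw = 0°)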
    analytic spatially continuous solutions [Rafaely 2005]
    numerical simulation w/o equipment noise

  10. Experimental Parameters
    horizontal-plane HRTFs, 3 m distance [Wierstorf et al. 2011]
    sampling rate fS = 44.1 kHz
    radius of microphone array R = 0.5 m
    M = 770 microphones, maximum order N = 23
    omnidirectional/cardioid microphones for delay-and-sum/modal beamformer
    localization estimated by binaural model [Dietz et al. 2011]
    Implementation
    Sound Field Analysis Toolbox (SOFiA) [Bernschütz et al. 2011]
    Auditory Modeling Toolbox (AMT) [Sondergaard et al. 2011]
    delay-and-sum beamforming requires 6 dB per octave high-pass filter
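
Two quick checks on the array parameters may help when reading the results. The first counts the spherical harmonic coefficients up to order N and compares the count with M. The second uses the rule of thumb kR ≈ N from the spherical array literature to estimate up to which frequency order N covers the array aperture. That rule and c = 343 m/s are assumptions and are not stated on the slide; the function name is hypothetical.

```python
import math

def max_frequency_for_order(order, radius_m, c=343.0):
    """Frequency at which k * R equals the order N (rule of thumb, assumed)."""
    return order * c / (2.0 * math.pi * radius_m)

N, M, R = 23, 770, 0.5
coefficients = (N + 1) ** 2  # spherical harmonics up to order N
f_limit = max_frequency_for_order(N, R)
print(coefficients, M >= coefficients, round(f_limit))  # 576 True 2511
```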

  11. Continuous Plane Wave Decomposition
    Frequency Response
    [Figure: frequency responses for modal beamforming (N = 23) and delay-and-sum beamforming; R = 0.5 m, fS = 44.1 kHz.]

  12. Continuous Plane Wave Decomposition
    Temporal Response
    [Figure: temporal responses, level in dB (−50 to 0) over φ (−180° to 180°) and time (ms), for modal beamforming (N = 23) and delay-and-sum beamforming; R = 0.5 m, fS = 44.1 kHz.]

  13. Spatially Sampled Plane Wave Decomposition
    Frequency Response
    [Figure: frequency responses for modal beamforming (N = 23) and delay-and-sum beamforming; R = 0.5 m, fS = 44.1 kHz, M = 770, Lebedev grid.]

  14. Spatially Sampled Plane Wave Decomposition
    Temporal Response
    [Figure: temporal responses, level in dB (−50 to 0) over φ (−180° to 180°) and time (−3 ms to 3 ms), for modal beamforming (N = 23) and delay-and-sum beamforming; R = 0.5 m, fS = 44.1 kHz, M = 770, Lebedev grid.]

  15. Perceived Direction
    [Figure: |Δ azimuth angle| (0° to 20°) over azimuth angle (−90° to 90°), with the JND marked. Modal beamforming: cont. N = 3, 5, 10, 23 and sampled. Delay-and-sum beamforming: continuous and sampled. R = 0.5 m, fS = 44.1 kHz, M = 770, Lebedev grid.]

  16. Weighted Frequency Response
    [Figure: Δ magnitude (−10 dB to 20 dB) over center frequency (100 Hz to 10 kHz). Modal beamforming: cont. N = 3, 5, 10, 23 and sampled. Delay-and-sum beamforming: continuous and sampled. R = 0.5 m, fS = 44.1 kHz, M = 770, Lebedev grid.]
    comparison of left-ear HRTF/BRTF for source in front
    weighting by auditory filter bank & loudness compression

  17. Results from Informal Listening
    Spatially continuous plane wave decomposition
    hardly any artifacts are audible for speech and music
    slight change in coloration for noise bursts
    perceived distance increases with increasing order N in modal beamforming
    Spatially sampled plane wave decomposition
    slight localization errors and coloration for modal beamforming (N = 23)
    spatial artifacts for delay-and-sum beamforming
    Listening examples can be downloaded from http://audio.qu.tu-berlin.de

  18. Conclusions
    numerical implementation is critical
    modal beamforming seems to provide better results
    spatial artifacts of delay-and-sum beamforming not predicted by model
    final conclusion requires formal listening test incl. equipment noise
    similarities to (perceptual) properties of WFS and NFC-HOA
    Future work
    improved delay-and-sum beamforming by selection of microphones
    results for complex sound fields incl. elevated sources

  19. Thanks!