Slide 1

Slide 1 text

Optimal transport in linear inverse problems Filip Elvander Dept. Information and Communications Engineering Aalto University

Slide 2

Slide 2 text

0. Structured and Stochastic Modeling Group My background • PhD in mathematical statistics from Lund University 2020. • 2020-2021, postdoc at KU Leuven. • 2021-2022, postdoctoral fellow with FWO (Belgian research council) and KU Leuven. • Assistant prof. in signal processing at Aalto University since 2022. Theme of group • Statistical signal processing (in very wide sense) • Applications in localization, audio processing, spectroscopy. PhD students at Aalto University • Anton Bj¨ orkman, • Linda Fabiani, • Rumeshika Pallewela, • Yuyang Liu. PhD student at Lund University • David Sundstr¨ om (main supervisor prof. Andreas Jakobsson).

Slide 3

Slide 3 text

0. Distance measures in learning • The standard distance measures we use are based on the ”y-axis”. • Works great in most applications, but troublesome in some. • Spectral estimation: what are sensible ways of describing spectral perturbations? • Φ(ω)+ε: no problem. • Φ(ω +ε): immediate breakdown. • Vanishing gradients due to saturation, robust recovery impossible,... Optimal (mass) transport offers an alternative with many powerful properties. Shifting a narrow-band spectrum -3 -2 -1 0 1 2 3 100 0.4 0.45 0.5 0.55 0.6 c 100 L 2 distance KL divergence

Slide 4

Slide 4 text

0. Optimal mass transport

Slide 5

Slide 5 text

0. Optimal mass transport: ingredients • Two spaces, X, Y, and two distributions µX ∈ M+(X) and µY ∈ M+(Y). • What is the cheapest way of rearranging µX to µY ? • The effort of rearrangement is measured in terms of a ”ground cost”, c : X ×Y → R+. • Example: X = Y = R3 and c(x,y) = ∥x−y∥2 2 . Image from C. Villani, Optimal Transport - Old and New (Springer, 2009)

Slide 6

Slide 6 text

0. Optimal mass transport: Monge Define the distance between µX and µY as S(µX ,µY ) ≜ min T:X→Y ˆ X c(x,T(x))dµX (x) , s.t. T#µX = µY . • x ∈ X is mapped to T(x) ∈ Y with ground cost c(x,T(x)). • The amount of mass at x is dµX (x). • T rearranges µX into µY (T#µX = µY ). • Drawbacks: (1) difficult nonlinear problem, (2) Cannot split masses Image from C. Villani, Optimal Transport - Old and New (Springer, 2009)

Slide 7

Slide 7 text

0. Kantorovich formulation Instead of a mapping T : X → Y, look for a coupling m ∈ M+(X ×Y): minimize m ˆ c(x,y)dm(x,y) , s.t. ˆ dm(·,y) = dµX , ˆ dm(x,·) = dµY . • Equivalent to Monge problem in many interesting cases. • Allows for mass splitting • Convex problem! Figure: m = µX ⊗ µY . Figure: Minimizing m for c(x,y) = (x−y)2.

Slide 8

Slide 8 text

0. Induced geometry • The space of distributions (e.g., spectra) M+(X) inherits properties of X via the ”ground cost” c. • Possible to define geodesics on M+(X). 1 0 0.5 3 2 1 0.02 0 0 -1 -2 -3 0.04 0.06 0.08

Slide 9

Slide 9 text

0. Linear inverse problems In most setups, we only have imperfect knowledge: y = A(µ)+w , G linear operator. • Temporal spectral estimation: y covariance function, G Fourier transform. • Spatial spectral estimation: y array covariance, G incorporating array steering vectors. • System identification: y output, G convolution operator, (µ not non-negative in this case). Solving inverse problems with OT regularization: minimize µ 1 2 ∥y−G(µ)∥2 2 +S(µ) , S modeling transport. • In many useful situations, S is convex in µ. • Induce smoothness (in support) for sequential observations, • Averaging for ensembles of observations, • Shrinkage towards a prior µ0, • ...

Slide 10

Slide 10 text

0. Tracking with dynamics (rad) 5 4 3 t (s) 2 1 0 0 6 4 5 2 3 1 Point-wise Capon. 0 1 2 3 4 5 t (s) 2 4 6 (rad) 0 0.5 1 0 1 2 3 4 5 t (s) -1 0 1 v (rad/s) 0 0.5 1 OT regularization.

Slide 11

Slide 11 text

0. Spatial barycenter -2 -1.5 -1 -0.5 0 0.5 1 1.5 x-position -2 -1.5 -1 -0.5 0 0.5 1 1.5 y-position array 1 array 2 array 1, orientation array 1, assumed orientation sources 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 MUSIC. -2 -1.5 -1 -0.5 0 0.5 1 1.5 x-position -2 -1.5 -1 -0.5 0 0.5 1 1.5 y-position array 1 array 2 array 1, orientation array 1, assumed orientation sources 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 OT barycenter.

Slide 12

Slide 12 text

0. Clustered transport • If target sources are broad-band, the corresponding to spatio-temporal spectra are sparse in location but not frequency. • Easy to include in transport formulation: promote group-sparse flow of mass. • Potential benefit in information sharing between carriers. 0.5 1 1.5 2 2.5 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 Temporal spectra. t = 1 0.5 1 1.5 2 2.5 -80 -60 -40 -20 0 20 40 60 80 t = 2 0.5 1 1.5 2 2.5 -80 -60 -40 -20 0 20 40 60 80 t = 3 0.5 1 1.5 2 2.5 -80 -60 -40 -20 0 20 40 60 80 t = 4 0.5 1 1.5 2 2.5 -80 -60 -40 -20 0 20 40 60 80 t = 5 0.5 1 1.5 2 2.5 -80 -60 -40 -20 0 20 40 60 80 Group-sparse OT. t = 1 0.5 1 1.5 2 2.5 -80 -60 -40 -20 0 20 40 60 80 t = 2 0.5 1 1.5 2 2.5 -80 -60 -40 -20 0 20 40 60 80 t = 3 0.5 1 1.5 2 2.5 -80 -60 -40 -20 0 20 40 60 80 t = 4 0.5 1 1.5 2 2.5 -80 -60 -40 -20 0 20 40 60 80 t = 5 0.5 1 1.5 2 2.5 -80 -60 -40 -20 0 20 40 60 80 Group lasso.

Slide 13

Slide 13 text

0. Tracking with hydrophone data Temporal spectrum. 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Spatial spectrum.

Slide 14

Slide 14 text

0. Multi-marginal formulation All these problems are extensions of the multi-marginal transport formulation minimize m∈M+(X) ˆ X m(x)c(x)dx+ 1 2 T ∑ t=1 ∥G (Pt(m))−yt∥2 2 • x = (x1,x2,...,xT ). • There are T margins, each being an element of M+(X), with ground-space X. • X = XT := X ×X ×...×X • Projection on each margin, Pt : M+(X) → M+(X). • The way that the ground-cost c is defined determines the interpretation. • In practice, we discretize the space X.

Slide 15

Slide 15 text

0. Discretization and entropy regularization Primal problem minimize M ⟨M,C⟩+ 1 2 T ∑ t=1 ∥GPt(M)−yt∥2 2 +εD(M) • With n discretization points in each margin, M and C are tensors in RnT := Rn×n×...×n. • D(M) = ∑i1 ,...,iT [M]i1 ,...,iT log[M]i1 ,...,iT −1 +1. • Impossible to solve directly, but the entropy term leads to a nice dual problem. Dual problem maximize λ λ λt ∈RN ,t=1,...,T −ε⟨U,K⟩− 1 2 T ∑ t=1 ∥λ λ λt∥2 2 + T ∑ t=1 ⟨λ λ λt,yt⟩ • Here U = u1 ⊗u2 ⊗...⊗uT , with ut = exp 1 ε GT λ λ λt . K = exp −1 ε C . • The primal solution is M = U⊙K. • Dimension of dual is considerably smaller than that of the primal. • The dual can be solved efficiently, with computational bottleneck being computation of projections Pt (U⊙K). Often, structure in K can be exploited.

Slide 16

Slide 16 text

0. Identification of acoustic systems Input-output relationship for acoustic systems: y = x∗h+w • h ∈ RNh room impulse response (RIR). • h depends on relative source-receiver positions, room geometry, air temperature, etc. • Perturbations in the environment manifest as perturbations in the delay path, i.e., support of h. • Seeing an RIR as a set of tuples, h” = ” (h2 k ,τk) k , transport can be used to re-arrange the energy distribution over time. • For example, we can use a crude geometry-based h0 as a ”prior” when estimating h from y.

Slide 17

Slide 17 text

0. RIR estimation Letting h0 be the prior RIR, we can define S(·,h0) : RNh → R as S(h,h0) = min M∈R Nh×Nh + ⟨C,M⟩+εD(M) s.t. M1Nh = h2 0 , MT 1Nh ≥ h2. • D(M) = ∑i,j[M]i,j log[M]i,j −1 +1, entropic term. • S convex in its first argument. We can estimate h as ˆ h = arg min h∈RN h 1 2 ||y−h∗x||2 2 +ηS(h,h0). • Minimization by forward-backward splitting. • Entropy term D(M) makes for (fairly) easy proximal operator. • Accuracy can be tuned by ε, but with price of slower convergence.

Slide 18

Slide 18 text

0. RIR estimation Regularizing with S(h,h0), we expect more robustness than with standard metrics. • E.g., ∥h−h0∥p only causes bias when support has changed. 16 18 20 22 24 Temperature, °C -12 -10 -8 -6 -4 -2 0 2 NMSE (dB) Simulated RIR with temperature perturbations. 70 80 90 100 110 120 0 5 10 Amplitude 10-3 OT. 70 80 90 100 110 120 0 5 10 Amplitude 10-3 ℓ1 .

Slide 19

Slide 19 text

0. A completely different problem Joint data association and source localization

Slide 20

Slide 20 text

0. Summary • OT offers a fundamentally different way of performing comparison and quantifying distances, as compared to standard metrics. • Bulk of work done has been in cases where the marginals/distributions are known or directly observable ⇒ many opportunities from a signal processing perspective. • Very flexible and modular framework. • Challenges: computational complexity and extensions to transport between non-positive objects.