Entropic Transfer Operators for Data-driven Analysis of Dynamical Systems
Bernhard Schmitzer (Georg-August-Universität Göttingen, Germany)
WORKSHOP ON OPTIMAL TRANSPORT
FROM THEORY TO APPLICATIONS
INTERFACING DYNAMICAL SYSTEMS, OPTIMIZATION, AND MACHINE LEARNING
Venue: Humboldt University of Berlin, Dorotheenstraße 24
metric space X, update map F : X → X (for simplicity: F continuous), x_{t+1} = F(x_t)
Remarks
- time-continuous systems can be treated by integrating the flow
- can be extended to stochastic dynamics x_{t+1} ∼ κ_{x_t} ∈ P(X)
Challenge
- systems of interest are often high-dimensional, stochastic, chaotic
  ⇒ little insight is gained from studying individual trajectories (sketch below)
- seek a simplified, coarse-grained, effective description: cyclic behaviour, almost-invariant regions, fast and slow coordinates
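A quick illustration of why single trajectories reveal little, as a minimal sketch; the logistic map is an assumed stand-in for a chaotic system, not an example from the slides. Two trajectories started 10^{-8} apart separate to order one within a few dozen steps.

```python
def F(x):
    return 4.0 * x * (1.0 - x)       # logistic map: stand-in chaotic update map

x, x_pert, gap = 0.2, 0.2 + 1e-8, []
for t in range(30):
    x, x_pert = F(x), F(x_pert)      # iterate both trajectories
    gap.append(abs(x - x_pert))
print(max(gap))                      # separation reaches order one
```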
- points distributed according to x_t ∼ µ_t ∈ P(X); distribution at time t+1: x_{t+1} ∼ µ_{t+1} = F_# µ_t
- E_{µ_t}[ϕ] = ∫_X ϕ(x) dµ_t(x),  E_{µ_{t+1}}[ϕ] = ∫_X ϕ(F(x)) dµ_t(x) = ∫_X ϕ d(F_# µ_t)
Transfer operator T : P(X) → P(X), µ ↦ F_# µ
- linear operator, represents the dynamical system at the level of distributions
- adjoint Koopman operator: ϕ ↦ ϕ ∘ F
- often interested in invariant measures: T µ = µ (numerical check below)
- restriction to densities: T : L^p(µ) → L^p(F_# µ)
- less complex spaces ⇒ spectral analysis, recover dominant dynamics
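A small numerical check of the invariance T µ = µ at the level of samples; the logistic map and its arcsine invariant density Beta(1/2, 1/2) are an assumed example, not taken from the slides. Both Monte Carlo averages estimate E_µ[ϕ] and agree up to sampling error.

```python
import numpy as np

rng = np.random.default_rng(0)
F = lambda x: 4.0 * x * (1.0 - x)      # logistic map; Beta(1/2, 1/2) is invariant
phi = lambda x: np.cos(3.0 * x)        # arbitrary test observable

x = rng.beta(0.5, 0.5, 200_000)        # samples x ~ mu (invariant measure)
# E_mu[phi]  vs  E_mu[phi o F] = E_{F# mu}[phi]; equal since F# mu = mu
print(np.mean(phi(x)), np.mean(phi(F(x))))
```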
reference measure m ∈ P(X), partition X = ⋃_i X_i (m-essentially disjoint), reduced space {X_1, …, X_N}
- Markov matrix P over the partition: P_{i,j} := m(X_j ∩ F^{-1}(X_i)) / m(X_j), estimated by sampling
- slow convergence if the support of m is high-dimensional
Modern variants
- Markov state models, reaction coordinates, transition manifolds, …
Estimate adjoint Koopman operator K
- basis functions (ψ_1, …, ψ_M) : X → R, estimate K on the subspace spanned by (ψ_a)_a, based on samples (x_i, y_i = F(x_i))_{i=1}^N:
  ψ_a(y_i) = (K ψ_a)(x_i) ≈ Σ_b K_{a,b} ψ_b(x_i)
- least squares approximation for the coefficients K_{a,b} (sketch below):
  min_K Σ_{i,a} ( ψ_a(y_i) − Σ_b K_{a,b} ψ_b(x_i) )²
- wide variety of choices for (ψ_a)_a: dictionary learning, kernel methods, … ⇒ "Koopmanism"
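A minimal sketch of this least-squares Koopman estimate for an assumed toy system (a circle rotation) with a small Fourier dictionary; system, dictionary, and parameter choices are illustrative, not from the talk. The eigenvalues of the fitted matrix approximate exp(±2πi kθ).

```python
import numpy as np

rng = np.random.default_rng(1)
theta = 0.17
F = lambda x: (x + theta) % 1.0            # toy dynamics: circle rotation

x = rng.random(2000)                       # samples x_i
y = F(x)                                   # y_i = F(x_i)

def dictionary(z, M=3):
    # real Fourier features psi_a on the circle
    feats = [np.ones_like(z)]
    for k in range(1, M + 1):
        feats += [np.cos(2 * np.pi * k * z), np.sin(2 * np.pi * k * z)]
    return np.stack(feats, axis=1)         # shape (N, 2M + 1)

Psi_x, Psi_y = dictionary(x), dictionary(y)
# least squares: Psi_y ≈ Psi_x @ K.T, i.e. psi_a(y_i) ≈ sum_b K_{a,b} psi_b(x_i)
Kt, *_ = np.linalg.lstsq(Psi_x, Psi_y, rcond=None)
eigs = np.linalg.eigvals(Kt.T)             # spectrum of the estimated K
print(np.round(sorted(eigs, key=np.angle), 3))   # ≈ exp(±2πi k θ), k = 0, …, 3
```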
transport plans Γ(µ, ν) := {γ ∈ M_+(X × X) : P_{1#} γ = µ, P_{2#} γ = ν}
- marginals: P_{1#} γ(A) := γ(A × X), P_{2#} γ(B) := γ(X × B)
Optimal transport
- C(µ, ν) := inf { ∫_{X×X} c(x, y) dγ(x, y) : γ ∈ Γ(µ, ν) }
           = sup { ∫_X f dµ + ∫_X g dν : f, g ∈ C(X), f ⊕ g ≤ c }
- cost function c ∈ C(X × X): cost of moving a unit of mass from x to y
Wasserstein distance on probability measures P(X)
- W_p(µ, ν) := (C(µ, ν))^{1/p} for c(x, y) := d(x, y)^p, p ∈ [1, ∞) (sketch below)
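For empirical measures in one dimension the optimal plan is the monotone rearrangement, so W_p can be computed by sorting; a minimal sketch with assumed Gaussian samples, where W_2 has the closed form √((m_1 − m_2)² + (σ_1 − σ_2)²) for comparison.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, 5000)          # samples of mu_N
y = rng.normal(0.5, 1.2, 5000)          # samples of nu_N
p = 2

# in 1-D the optimal coupling of equal-size empirical measures sorts both samples
W_p = np.mean(np.abs(np.sort(x) - np.sort(y)) ** p) ** (1 / p)
print(W_p)   # ≈ sqrt(0.5**2 + 0.2**2) ≈ 0.54 for these two Gaussians
```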
samples (x_i, y_i = F(x_i))_{i=1}^N, x_i ∼ µ, µ: invariant measure
- empirical measures µ_N := (1/N) Σ_{i=1}^N δ_{x_i}, ν_N := (1/N) Σ_{i=1}^N δ_{y_i}; µ_N, ν_N ⇀* µ as N → ∞
- goal: estimate (approximate) the transfer operator T : L²(µ) → L²(µ)
Naive first proposal
- T_N : L²(µ_N) → L²(ν_N), T_N 1_{x_i} = 1_{y_i}
- usually µ_N ≠ ν_N, so T_N is not an endomorphism
- T_N is the identity matrix in the canonical bases {1_{x_i}}_i, {1_{y_i}}_i
  ⇒ no useful information; need to map back from L²(ν_N) to L²(µ_N)
γ ∈ Γ(ν_N, µ_N) induces an operator G : L²(ν_N) → L²(µ_N) via
  ⟨ϕ, G ψ⟩_{L²(µ_N)} = ∫_X ϕ · (Gψ) dµ_N := ∫_{X×X} ϕ(y) ψ(x) dγ(x, y)
- discrete case: matrix representation of G given by that of γ
'Closing' T_N : L²(µ_N) → L²(ν_N)
- let γ_N be the optimal W_2 plan from ν_N to µ_N, with induced operator G_N
- composition G_N ∘ T_N : L²(µ_N) → L²(µ_N)
- product of two permutation matrices, spectrum dominated by combinatorial artefacts (sketch below)
- when T is non-compact, do not expect convergence G_N ∘ T_N → T
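The combinatorial artefacts can be made concrete: for uniform empirical measures of equal size the unregularized W_2 plan is a permutation, so the matrix of G_N ∘ T_N is a permutation matrix and every eigenvalue lies on the unit circle, regardless of the dynamics. The toy data and the use of an assignment solver below are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(3)
F = lambda x: (x + 0.31) % 1.0              # toy circle rotation
x = rng.random(50)                          # samples of mu_N
y = F(x)                                    # samples of nu_N; T_N 1_{x_i} = 1_{y_i}

# W2 plan from nu_N to mu_N = optimal assignment y_j -> x_i (squared torus cost)
d = np.abs(y[:, None] - x[None, :])
C = np.minimum(d, 1.0 - d) ** 2
rows, cols = linear_sum_assignment(C)
G = np.zeros((len(x), len(x)))
G[cols, rows] = 1.0                         # G_N maps 1_{y_j} to 1_{x_{sigma(j)}}

# T_N is the identity in the canonical bases, so G_N o T_N has matrix G
print(np.round(np.abs(np.linalg.eigvals(G)), 6))   # all moduli equal to 1
```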
X: the d-torus; F : x ↦ x + θ, θ ∈ R^d
- eigenbasis of T: for k ∈ Z^d, eigenvector φ_k(x) = exp(2πi k^⊤ x), eigenvalue λ_k = exp(−2πi k^⊤ θ)
Smoothed operator T^ε = G^ε ∘ T
- eigenvector φ^ε_k = φ_k, eigenvalue λ^ε_k ≈ exp(−π² ε ∥k∥²) · λ_k for small ε
- G^ε acts approximately like a diffusion kernel with time step ∆t ∝ ε
  ⇒ the spectrum of T^ε approximates that of T well for eigenvectors whose length scale 1/∥k∥ lies above the blur scale √ε:
  √ε ≪ 1/∥k∥ ⇔ ε ∥k∥² ≪ 1 ⇒ λ^ε_k ≈ λ_k
Discretization
- N = n^d, (x_i)_{i=1}^N: uniform Cartesian lattice, n points along each axis
- T_{N,ε} = G_{N,ε} ∘ T_N: eigenvector φ^{N,ε}_k = φ_k, eigenvalue λ^{N,ε}_k ≈ λ^ε_k if 1/n ≪ √ε ≪ 1/∥k∥ (numerical sketch below)
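The claims about T_{N,ε} can be checked numerically. A minimal sketch for d = 1, assuming a basic Sinkhorn loop to compute the entropic plan between µ_N and ν_N; with the conventions of the previous slide, the matrix of G_{N,ε} ∘ T_N in the indicator basis works out to N times that plan. Parameter values (n, θ, ε) are illustrative; the leading eigenvalues should approximate exp(−π² ε k² − 2πi kθ).

```python
import numpy as np

def sinkhorn_plan(C, eps, a, b, iters=2000):
    """Basic Sinkhorn iteration: entropic plan with marginals a (rows), b (columns)."""
    K = np.exp(-C / eps)
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]

n, theta, eps = 200, 0.17, 2e-3
x = np.arange(n) / n                      # lattice samples x_i      (mu_N)
y = (x + theta) % 1.0                     # y_i = F(x_i)             (nu_N)

d = np.abs(x[:, None] - y[None, :])
C = np.minimum(d, 1.0 - d) ** 2           # squared torus distance c(x_i, y_j)

w = np.full(n, 1.0 / n)                   # uniform weights of mu_N and nu_N
pi = sinkhorn_plan(C, eps, w, w)          # entropic plan between mu_N and nu_N
T_eps = n * pi                            # matrix of G_{N,eps} o T_N on L^2(mu_N)

eigs = np.linalg.eigvals(T_eps)
eigs = eigs[np.argsort(-np.abs(eigs))]
print("leading eigenvalue:", np.round(eigs[0], 4))      # ≈ 1 (invariant measure)
for k in (1, 2, 3):
    pred = np.exp(-np.pi**2 * eps * k**2 - 2j * np.pi * k * theta)
    print(k, np.round(pred, 4), np.round(eigs[2*k - 1:2*k + 1], 4))  # conjugate pair
```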
partition X = ⋃_i X_i, discrete transition rates P_{i,j} := µ(X_j ∩ F^{-1}(X_i)) / µ(X_j)
- finding appropriate X_i in high dimensions is difficult
- the entropic transfer operator is mesh-free, non-parametric, complexity controlled by ε (work in progress)
Gaussian perturbations
- T^ε_Gauss := G^ε_Gauss ∘ T, with G^ε_Gauss a Gaussian blur at scale √ε
- perturbs the invariant measure of T, has full support
- restricting the Gaussian to spt µ still perturbs the invariant measure
Diffusion maps, graph Laplacians
- bi-stochastic normalization [Marshall and Coifman, 2019]
RKHS embedding
- embed x_i, y_i = F(x_i) into an RKHS, k(x, y) = ⟨Φ(x), Φ(y)⟩ = exp(−c(x, y)/ε)
- (regularized) least squares regression problem for a linear operator between span{Φ(x_i)}_i and span{Φ(y_i)}_i
- T_RKHS is not a Markov operator; the role of ε is much less clear
Entropic transfer operators: first impression
- new method for estimating the transfer operator, connecting dynamical system analysis and optimal transport
- fully data-driven, only parameter: the blur scale √ε
- trade-off: number of samples ⇔ resolution of the analysis
- mesh-free, seems to work in high dimensions (high ≈ 30)
- OT theory provides a framework for the analysis
Future work
- extension to stochastic systems
- out-of-sample embedding
- quantitative convergence analysis
- interpretation of G^ε as an (approximate) diffusion
- relation to other dimensionality reduction methods
- applications