Slide 1

Slide 1 text

Bayesian Cyclic Networks, Mutual Information and Reduced-Order Bayesian Inference
Laboratoire des signaux et systèmes, CNRS-Centrale Supelec-Univ Paris Sud, 17 July 2015
Robert K. Niven, UNSW Canberra, ACT, Australia. [email protected]
Bernd R. Noack, Institut PPrime, Poitiers, France
Eurika Kaiser, Institut PPrime, Poitiers, France
Lou Cattafesta, Florida State University, USA
Laurent Cordier, Institut PPrime, Poitiers, France
Markus Abel, Ambrosys GmbH / Univ. of Potsdam, Germany
Funding from ARC, Go8/DAAD, CNRS, Region Poitou-Charentes

Slide 2

Slide 2 text

© R.K. Niven
Bayesian Updating
Discrete: $p(H_j \mid D_i) = \dfrac{p(H_j)\, p(D_i \mid H_j)}{p(D_i)}$
Continuous: $p(\theta \mid D) = \dfrac{p(\theta)\, p(D \mid \theta)}{p(D)}$
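Not from the slides: a minimal numerical sketch of the discrete Bayesian update, with an illustrative three-hypothesis prior and likelihood (all values are assumptions for illustration).

```python
import numpy as np

# Discrete Bayesian updating: p(H_j | D_i) = p(H_j) p(D_i | H_j) / p(D_i).
# Prior and likelihood values below are illustrative assumptions.
prior = np.array([0.5, 0.3, 0.2])        # p(H_j) over three hypotheses
likelihood = np.array([0.9, 0.4, 0.1])   # p(D_i | H_j) for one observed datum D_i

evidence = prior @ likelihood            # p(D_i) = sum_j p(H_j) p(D_i | H_j)
posterior = prior * likelihood / evidence

print(posterior)   # the datum favours the first hypothesis, raising it above its prior
```

The posterior is just the prior reweighted by the likelihood and renormalised; hypotheses the datum disfavours lose probability mass.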

Slide 3

Slide 3 text

[Diagram: Reduced Order Model (ROM); ROM Inversion; ROM-Bayesian Updating; Bayesian Updating]

Slide 4

Slide 4 text

Contents
• Cluster-based reduced-order modelling
  - algorithm
  - examples
• "Bayesian cyclic networks"
  - concept
  - mathematical implications
• Application: reduced-order Bayesian inference
  - modelling of turbulent flows
  - turbulent flow control

Slide 5

Slide 5 text

Cluster-based Reduced-Order Modelling (Kaiser et al., J. Fluid Mech. 754: 365-414, 2014)
- time-series data are partitioned into similar clusters
- compute probability transition matrix → clustered dynamical model

Slide 6

Slide 6 text

Clustering Algorithm (Kaiser et al. 2014)
1. Time-series data (e.g. flow snapshots) are classified by a distance metric, e.g. the k-means algorithm with the Euclidean metric
$d_{mn} = \| \mathbf{x}_m - \mathbf{x}_n \|$, induced by the inner product $(\mathbf{x}_m, \mathbf{x}_n) = \int_{\Omega} \mathbf{x}_m(s) \cdot \mathbf{x}_n(s)\, ds$
2. Data are partitioned into K equally weighted Voronoi cells ("clusters")
3. Cluster allocations are optimised by minimising an objective function representing the intracluster variances,
$J = \sum_{k=1}^{K} \sum_{\mathbf{x}_m \in \mathcal{C}_k} \| \mathbf{x}_m - \mathbf{c}_k \|^2$, where $\mathbf{c}_k$ is the centroid of cluster $\mathcal{C}_k$
→ cluster-based reduced-order model of the data (CROM)
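A minimal sketch of the three steps in plain NumPy (Lloyd's k-means iteration); the synthetic 2-D "snapshots" and the deterministic one-per-blob initialisation are my assumptions, not the paper's data or algorithmic details.

```python
import numpy as np

# Synthetic "snapshots": three well-separated 2-D blobs of 50 points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 0.3, size=(50, 2)) for m in (0.0, 3.0, 6.0)])
K = 3                                   # number of clusters, fixed in advance

centroids = X[[0, 50, 100]].copy()      # deterministic initialisation, one per blob
for _ in range(100):
    # step 1: classify each snapshot by the Euclidean metric d_mn
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = d.argmin(axis=1)           # step 2: Voronoi cells ("clusters")
    # step 3: move each centroid c_k to the mean of its cluster C_k
    new = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                    else centroids[k] for k in range(K)])
    if np.allclose(new, centroids):
        break
    centroids = new

# objective: intracluster variance J = sum_k sum_{x_m in C_k} ||x_m - c_k||^2
J = sum(np.sum((X[labels == k] - centroids[k]) ** 2) for k in range(K))
print(J)
```

The choice of K is external to the iteration, which is exactly the limitation flagged later in the talk.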

Slide 7

Slide 7 text

Clustered Dynamical Model (Kaiser et al. 2014)
1. Calculate the stepwise probability transition matrix $\mathbf{P}$, based on the frequencies of transitions:
$\mathbf{P} = [P_{j|i}]$ with $P_{j|i} = \dfrac{n_{j|i}}{\sum_j n_{j|i}}$
2. → Markov model for the probability vector of clusters at the $\ell$-th time step:
$\mathbf{p}^{\ell} = \mathbf{P}\,\mathbf{p}^{\ell-1} = \mathbf{P}^{\ell}\,\mathbf{p}^{0}$, with asymptotic limits $\mathbf{p}^{\infty} = \lim_{\ell\to\infty} \mathbf{P}^{\ell}\,\mathbf{p}^{0}$ and $\mathbf{P}^{\infty} = \lim_{\ell\to\infty} \mathbf{P}^{\ell}$
3. → Clustered dynamical system model

Slide 8

Slide 8 text

Example 1: Lorenz Attractor (Kaiser et al. 2014)

Slide 9

Slide 9 text

Example 2: Mixing Layer

Slide 10

Slide 10 text

Example 2: Mixing Layer (cont'd) (Kaiser et al. 2014)
[Figures: 1-step transition matrix; simplified dynamical model]

Slide 11

Slide 11 text

Example 2: Mixing Layer (cont'd) (Kaiser et al. 2014)
[Figure: Voronoi plot]

Slide 12

Slide 12 text

Example 2: Mixing Layer (cont'd)
[Figures: transition matrices for $\ell$ = (a) 1; (b) 10; (c) 100; (d) 1000]

Slide 13

Slide 13 text

Example 3: Ahmed Body (Kaiser et al. 2014)
[Figures: instantaneous isosurface (pressure coefficient); 1-step transition matrix; simplified dynamical model]

Slide 14

Slide 14 text

Example 3: Ahmed Body (Kaiser et al. 2014)
[Figure: Voronoi plot]

Slide 15

Slide 15 text

Example 4: Engine Combustion Cycle (Cao et al. 2015)
[Figure: Voronoi plot]

Slide 16

Slide 16 text

Advantages
1. Clear representation of transitions → simplified dynamical model
2. Dramatic reduction in order
3. Computationally efficient (although a Galerkin ROM was used)
Disadvantages
1. Purely "data-driven":
   - inference on the data space only
   - does not incorporate any information on the model space, or any uncertainties
2. Number of clusters chosen in advance (not optimised)
3. Dynamical model is oversimplified (not probabilistic), e.g. what is the space of possible clustered dynamical models?

Slide 17

Slide 17 text

Q1: How can we combine the advantages of clustering for (data) reduction and simplification with a more robust framework for probabilistic inference (= Bayes)?
Q2: How can we build on this framework for flow control?

Slide 18

Slide 18 text

Theoretical Framework

Slide 19

Slide 19 text

Functional Analysis
But here we want to represent probabilistic connections

Slide 20

Slide 20 text

Bayesian Networks = acyclic probability networks - not so useful here!

Slide 21

Slide 21 text

Markov Chains = networks of probabilities - assumed independent of history - almost what we want!

Slide 22

Slide 22 text

"Bayesian Cyclic Networks"
Here defined as a probabilistic network which:
- includes probabilistic cycles (complete graph)
- includes all prior probabilities
Here assumed Markovian, but this can be extended if necessary

Slide 23

Slide 23 text

2-D Bayesian Cyclic Network (Discrete)
$p(i,j) = p(i)\,p(j \mid i) = p(j)\,p(i \mid j)$
$\Rightarrow \dfrac{p(i,j)}{p(i)} = p(j \mid i), \quad \dfrac{p(i,j)}{p(j)} = p(i \mid j)$
$\Rightarrow p(j \mid i) = \dfrac{p(j)\, p(i \mid j)}{p(i)}$
Consider $i = D_i$, $j = H_j$ ⇒ extended Bayes' theorem
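A quick numerical check of the two-node identities above, on an assumed joint table p[i, j] (any valid joint distribution would do; the table here is arbitrary).

```python
import numpy as np

# Assumed joint distribution p(i, j) over two binary variables.
p = np.array([[0.10, 0.25],
              [0.30, 0.35]])
pi = p.sum(axis=1)                      # marginal p(i)
pj = p.sum(axis=0)                      # marginal p(j)

p_j_given_i = p / pi[:, None]           # p(j | i) = p(i,j) / p(i)
p_i_given_j = p / pj[None, :]           # p(i | j) = p(i,j) / p(j)

# Bayes: p(j | i) = p(j) p(i | j) / p(i)
bayes = pj[None, :] * p_i_given_j / pi[:, None]
print(np.allclose(p_j_given_i, bayes))  # prints True
```

The identity holds for any joint table, which is the point of the slide: Bayes' theorem is just the consistency of the two factorisations of $p(i,j)$.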

Slide 24

Slide 24 text

2-D Bayesian Cyclic Network (Continuous)
$p(x,y)\,dx\,dy = p(x)\,dx\; p(y \mid x)\,dy = p(y)\,dy\; p(x \mid y)\,dx$
$\Rightarrow \dfrac{p(x,y)}{p(x)} = p(y \mid x), \quad \dfrac{p(x,y)}{p(y)} = p(x \mid y)$
$\Rightarrow p(y \mid x) = \dfrac{p(y)\, p(x \mid y)}{p(x)}$
Consider $x = D$, $y = \theta$ ⇒ continuous Bayes' theorem

Slide 25

Slide 25 text

3-D Bayesian Cyclic Network (Discrete)
$p(i,j,k) = p(i)\,p(j \mid i)\,p(k \mid j) = p(i)\,p(k \mid i)\,p(j \mid k) = p(j)\,p(k \mid j)\,p(i \mid k) = p(j)\,p(i \mid j)\,p(k \mid i) = p(k)\,p(i \mid k)\,p(j \mid i) = p(k)\,p(j \mid k)\,p(i \mid j)$  (3! = 6 relations)

Slide 26

Slide 26 text

3-D Bayesian Cyclic Network (Discrete)
From $p(i,j,k)$ and Bayes →
$\dfrac{p(i \mid j)}{p(i)} = \dfrac{p(i \mid k)}{p(i)} = \dfrac{p(j \mid i)}{p(j)} = \dfrac{p(j \mid k)}{p(j)} = \dfrac{p(k \mid i)}{p(k)} = \dfrac{p(k \mid j)}{p(k)}$
→ $\dfrac{p(i,j)}{p(i)\,p(j)} = \dfrac{p(i,k)}{p(i)\,p(k)} = \dfrac{p(j,k)}{p(j)\,p(k)}$
→ $\sum_i \sum_j p(i,j)\ln\dfrac{p(i,j)}{p(i)\,p(j)} = \sum_i \sum_k p(i,k)\ln\dfrac{p(i,k)}{p(i)\,p(k)} = \sum_j \sum_k p(j,k)\ln\dfrac{p(j,k)}{p(j)\,p(k)}$
This is the mutual information! $I(\Upsilon_i,\Upsilon_j) = I(\Upsilon_i,\Upsilon_k) = I(\Upsilon_j,\Upsilon_k)$
→ the mutual information between any pair of parameters is identical
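The equal-MI result above is specific to the slide's Markovian cyclic network; as a generic illustration only (the function name and joint table are mine), this is how each pairwise mutual information in the sums would be evaluated from a joint distribution, together with its symmetry and the independent case.

```python
import numpy as np

def mutual_info(p):
    """I = sum_ij p(i,j) ln[ p(i,j) / (p(i) p(j)) ], with 0 ln 0 := 0."""
    pi = p.sum(axis=1, keepdims=True)   # marginal over the first index
    pj = p.sum(axis=0, keepdims=True)   # marginal over the second index
    mask = p > 0
    return float(np.sum(p[mask] * np.log((p / (pi * pj))[mask])))

p_ij = np.array([[0.3, 0.1],
                 [0.1, 0.5]])                           # assumed joint table
print(mutual_info(p_ij))                                # > 0: dependent variables
print(mutual_info(np.outer([0.4, 0.6], [0.5, 0.5])))    # independent -> 0
```

Mutual information is symmetric in its two arguments and vanishes exactly when the joint table factorises into its marginals.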

Slide 27

Slide 27 text

3-D Bayesian Cyclic Network (Continuous)
From $p(x,y,z)\,dx\,dy\,dz$ and Bayes → same relations
→ $I(\Upsilon_x,\Upsilon_y) = I(\Upsilon_x,\Upsilon_z) = I(\Upsilon_y,\Upsilon_z)$, with
$I(\Upsilon_x,\Upsilon_y) = \int_{\Omega_x}\int_{\Omega_y} p(x,y)\ln\dfrac{p(x,y)}{p(x)\,p(y)}\, dx\, dy$

Slide 28

Slide 28 text

4-D Bayesian Cyclic Network (Discrete)
$p(i,j,k,\ell) = p(i)\,p(j \mid i)\,p(k \mid j)\,p(\ell \mid k)$, etc. (4! = 24 relations)

Slide 29

Slide 29 text

4-D Bayesian Cyclic Network (Discrete)
From $p(i,j,k,\ell)$ and Bayes →
$\dfrac{p(\alpha,\beta)}{p(\alpha)\,p(\beta)} = \text{constant}$ for $\alpha,\beta \in \{i,j,k,\ell\}$, so $I(\Upsilon_\alpha,\Upsilon_\beta) = \text{constant}$
→ the mutual information between any pair of parameters is identical
The same result holds for the continuous case, and similarly for any Markovian Bayesian cyclic network with n nodes

Slide 30

Slide 30 text

Reduced-Order Bayesian Inference

Slide 31

Slide 31 text

Bayesian Updating
Discrete: $p(H_j \mid D_i) = \dfrac{p(H_j)\, p(D_i \mid H_j)}{p(D_i)}$
Continuous: $p(\theta \mid D) = \dfrac{p(\theta)\, p(D \mid \theta)}{p(D)}$

Slide 32

Slide 32 text

Bayesian Updating
Discrete: $p(H_j \mid D_i) = \dfrac{p(H_j)\, p(D_i \mid H_j)}{p(D_i)}$; Continuous: $p(\theta \mid D) = \dfrac{p(\theta)\, p(D \mid \theta)}{p(D)}$
[Diagram: data space ↔ model space; computationally expensive!]

Slide 33

Slide 33 text

Reduced-Order Bayesian Inference
[Diagram: clustering (or ROM); clustered dynamical model; Bayesian updating; declustering]

Slide 34

Slide 34 text

Reduced-Order Bayesian Inference
[Diagram: continuous (or dense) vs reduced-order representations of the data space and model space]

Slide 35

Slide 35 text

Reduced-Order Bayesian Inference
Note: mix of continuous and discrete variables (omit diagonals)
There will be a loss of information due to clustering:
$I(\Upsilon_x,\Upsilon_\theta)_{\text{direct}} \ge I(\Upsilon_x,\Upsilon_\theta)_{\text{loop}}$
→ a measure of the uncertainty in the algorithm:
$\Delta I_{x,\theta} = I(\Upsilon_x,\Upsilon_\theta)_{\text{direct}} - I(\Upsilon_x,\Upsilon_\theta)_{\text{loop}} \ge 0$
Compare to the computational "costs":
$\Delta C_{x,\theta} = C(\Upsilon_x,\Upsilon_\theta)_{\text{direct}} - C(\Upsilon_x,\Upsilon_\theta)_{\text{loop}} \ge 0$
Together: $\min_N (\Delta C_{x,\theta} + \Delta I_{x,\theta}) = \max_N \left( C(\Upsilon_x,\Upsilon_\theta)_{\text{loop}} + I(\Upsilon_x,\Upsilon_\theta)_{\text{loop}} \right)$
→ optimal criterion!
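A toy demonstration (all numbers assumed) of why the information loss is non-negative: coarse-graining the data axis of a joint distribution, as clustering does, cannot increase the mutual information (a data-processing inequality).

```python
import numpy as np

def mutual_info(p):
    # I = sum p(x,t) ln[ p(x,t) / (p(x) p(t)) ], with 0 ln 0 := 0
    px = p.sum(axis=1, keepdims=True)
    pt = p.sum(axis=0, keepdims=True)
    mask = p > 0
    return float(np.sum(p[mask] * np.log((p / (px * pt))[mask])))

# Assumed joint p(x, theta): 4 data states x, 2 model states theta.
p_direct = np.array([[0.20, 0.02],
                     [0.15, 0.05],
                     [0.05, 0.15],
                     [0.03, 0.35]])

# Cluster the data axis: x in {0,1} -> cluster 0, x in {2,3} -> cluster 1.
cluster = [0, 0, 1, 1]
p_loop = np.zeros((2, 2))
for x, k in enumerate(cluster):
    p_loop[k] += p_direct[x]             # merge rows within each cluster

dI = mutual_info(p_direct) - mutual_info(p_loop)   # Delta I_{x,theta}
print(dI)                                # >= 0: clustering discards information
```

Choosing a finer clustering shrinks this loss at higher computational cost, which is the trade-off the optimality criterion above balances.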

Slide 36

Slide 36 text

Synthesis of Bayes and ROM for Flow Control

Slide 37

Slide 37 text

Flow Control
$\dfrac{d\xi(t)}{dt} = f(\xi(t),u(t))$  (dynamical system)
$y(t) = g(\xi(t),u(t))$  (sensor system)
$u(t) = K(y(t))$  (control operator)
where $\xi(t)$ = parameter(s), $y(t)$ = sensor signals, $u(t)$ = control signals
We want the models $f$ and $g$; commonly $K$ is found by minimising an objective function $J(\xi,u)$
[Diagram: plant → y(t) → controller → u(t) → plant; other outputs]
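Not from the slides: a toy closed-loop instance of the three equations, with a scalar linear plant, identity sensor, and proportional control law (all parameter values are illustrative assumptions).

```python
# Toy instance of the flow-control equations: scalar linear plant
# dxi/dt = f(xi, u) = a*xi + b*u, sensor y = g(xi, u) = xi, and a
# proportional control operator u = K(y) = -k*y.  Values are illustrative.
a, b = 0.5, 1.0       # open-loop unstable plant (a > 0)
k = 2.0               # feedback gain; closed loop: dxi/dt = (a - b*k)*xi
dt, steps = 0.01, 2000

xi = 1.0              # initial state
J = 0.0               # running quadratic objective J = sum (xi^2 + u^2) dt
for _ in range(steps):
    y = xi            # sensor signal
    u = -k * y        # control signal
    J += (xi ** 2 + u ** 2) * dt
    xi += dt * (a * xi + b * u)   # explicit Euler step of the dynamics

print(xi, J)          # xi decays toward 0: the controller stabilises the plant
```

With $a - bk < 0$ the closed loop is stable even though the open loop is not; choosing $k$ to minimise $J$ is the simplest version of the objective-function approach mentioned above.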

Slide 38

Slide 38 text

Flow Control Framework
$\dfrac{d\xi(t)}{dt} = f(\xi(t),u(t)), \quad y(t) = g(\xi(t),u(t)), \quad u(t) = K(y(t))$
Propose the same framework, but now with
$x = \{D(\xi(t),u(t))_m,\; y(t),\; u(t)\}$ and $\theta = \{f, g, K\}$

Slide 39

Slide 39 text

Conclusions
• Cluster-based reduced-order modelling
  - algorithm
  - examples: Lorenz, mixing layer, Ahmed body, engine cycle
• "Bayesian cyclic networks" = cyclic probabilistic network (complete graph)
  - Markovian → mutual information is equivalent between pairs of variables
• Application: reduced-order Bayesian inference
  - flow modelling
  - inequality in mutual information (non-Markovian) → criterion for optimal choice of ROM
  - flow control

Slide 40

Slide 40 text

Merci! (Thank you!)

Slide 41

Slide 41 text
