
Tan et al. 2019: Factorized Inference in Deep Markov Models for Incomplete Multimodal Time Series

Minqi Pan
March 18, 2020


Transcript

  1. Tan et al. 2019: Factorized Inference in Deep Markov Models for Incomplete Multimodal Time Series. Minqi Pan, March 17, 2020.
  2. Factorized Inference in Deep Markov Models for Incomplete Multimodal Time Series. AAAI 2020, "ML: Probabilistic Methods II", Feb 12, 2020. Tan Zhi-Xuan, Harold Soh, Desmond C. Ong (A*STAR, MIT, National University of Singapore).
  3. Outline
     1 Methods: Factorized Posterior Distributions; Multimodal Fusion via Product of Gaussians; Approximate Filtering with Missing Data; Backward-Forward Variational Inference
     2 Experiments: Datasets; Inference Tasks; Weakly Supervised Learning
  4. Outline (same as slide 3; this copy opens Methods / Factorized Posterior Distributions)
  5. Multimodal Deep Markov Models (MDMMs)
     $z_t$: vector-valued latent state; $x_t^m$: vector-valued observation for modality $m$ at time $t$. An MDMM with $M$ modalities is defined by:
     Transition distributions, assumed to be multivariate Gaussians whose means and covariances are differentiable functions of the previous latent state: $z_t \sim N(\mu_\theta(z_{t-1}), \Sigma_\theta(z_{t-1}))$
     Emission distributions: $x_t^m \sim \Pi(\kappa_\theta^m(z_t))$. E.g. if the data is binary, $\Pi$ is a product of independent Bernoullis parameterized by $\kappa_\theta^m(z_t)$.
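A minimal generative sketch of this process, in plain numpy; the linear/tanh "networks" and all dimensions below are illustrative stand-ins, not the paper's architecture:

```python
# Hedged sketch of the MDMM generative process (toy stand-ins throughout).
import numpy as np

rng = np.random.default_rng(0)
Z_DIM, X_DIM, M = 5, 3, 2          # latent size, per-modality obs size, #modalities

W_trans = rng.normal(scale=0.3, size=(Z_DIM, Z_DIM))
W_emit = [rng.normal(size=(X_DIM, Z_DIM)) for _ in range(M)]

def mu_theta(z_prev):               # transition mean (stand-in for a neural net)
    return np.tanh(W_trans @ z_prev)

def sigma_theta(z_prev):            # diagonal transition covariance (constant here)
    return 0.1 * np.ones(Z_DIM)

def kappa_theta(m, z):              # emission parameters for modality m
    return W_emit[m] @ z

def sample_mdmm(T):
    z = rng.normal(size=Z_DIM)      # initial latent
    zs, xs = [], []
    for _ in range(T):
        z = mu_theta(z) + np.sqrt(sigma_theta(z)) * rng.normal(size=Z_DIM)
        x = [kappa_theta(m, z) + 0.1 * rng.normal(size=X_DIM) for m in range(M)]
        zs.append(z)
        xs.append(x)
    return zs, xs

zs, xs = sample_mdmm(T=10)          # 10 steps, each with M modality observations
```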
  6. Subsuming Linear Gaussian State Space Models
     $z_t \sim N(\mu_\theta(z_{t-1}), \Sigma_\theta(z_{t-1}))$, $x_t^m \sim \Pi(\kappa_\theta^m(z_t))$
     Kalman filters: $\mu_\theta(z_{t-1}) = G_t z_{t-1} + B_t u_t$ where $G_t, B_t$ are matrices; $\Sigma_\theta(z_{t-1}) = K_t$ where $K_t$ is a matrix; $\kappa_\theta^m(z_t) = F_t z_t$ where $F_t$ is a matrix; $\Pi = N$. We can do inference analytically!
     Deep nonlinear models: $\mu_\theta(z_{t-1})$, $\Sigma_\theta(z_{t-1})$, and $\kappa_\theta^m(z_t)$ are neural networks parameterized by $\theta$.
  7. Jointly Learning θ (Generative) and φ (Inference)
     θ of the generative model $p_\theta(z_{1:T}, x_{1:T})$. Assumption: we consider learning in a Bayesian network whose joint distribution factorizes (generatively) as $p_\theta(z_{1:T}, x_{1:T}) = p_\theta(x_{1:T}|z_{1:T})\, p_\theta(z_{1:T})$. Note that the marginal data likelihood is intractable: $p_\theta(x_{1:T}) = \int p_\theta(z_{1:T})\, p_\theta(x_{1:T}|z_{1:T})\, dz$
     φ of the variational posterior $q_\phi(z_{1:T}|x_{1:T})$, which approximates the true posterior $p_\theta(z_{1:T}|x_{1:T})$: the exact posterior $p_\theta(z_{1:T}|x_{1:T}) = p_\theta(x_{1:T}|z_{1:T})\, p_\theta(z_{1:T}) / p_\theta(x_{1:T})$ is intractable.
  8. Evidence Lower Bound (ELBO)
     $\mathcal{L}(x; \theta, \phi) = E_{q_\phi(z_{1:T}|x_{1:T})}[\log p_\theta(x_{1:T}|z_{1:T})] - KL(q_\phi(z_{1:T}|x_{1:T}) \,\|\, p_\theta(z_{1:T}))$
     Jensen's inequality: $\mathcal{L}$ is a lower bound on the log marginal likelihood, $\mathcal{L}(x; \theta, \phi) \le \log p_\theta(x_{1:T})$
     ML learning ⇒ maximize $\mathcal{L}$ (via gradient ascent with stochastic backpropagation, sampling from $q_\phi$). The expectation w.r.t. $q_\phi(z_{1:T}|x_{1:T})$ implicitly depends on the network parameters φ. When using a Gaussian variational approximation $q_\phi(z_{1:T}|x_{1:T}) = N(\mu_\phi(x_{1:T}), \Sigma_\phi(x_{1:T}))$, $\mu_\phi, \Sigma_\phi$ are parametric functions of the observations.
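For readers unfamiliar with stochastic backpropagation, a generic single-sample reparameterized ELBO estimate looks roughly like this (a VAE-style sketch; `log_px_given_z` and `log_pz` are hypothetical callables, not the paper's code):

```python
# Generic one-sample reparameterized ELBO estimate (sketch, not the paper's code).
import torch

def elbo_estimate(x, mu_phi, logvar_phi, log_px_given_z, log_pz):
    std = torch.exp(0.5 * logvar_phi)
    z = mu_phi + std * torch.randn_like(std)          # reparameterization trick
    recon = log_px_given_z(x, z)                      # one-sample E_q[log p(x|z)]
    log_qz = torch.distributions.Normal(mu_phi, std).log_prob(z).sum()
    kl = log_qz - log_pz(z)                           # one-sample KL(q || p)
    return recon - kl                                 # maximize w.r.t. theta, phi
```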
  9. MDMMs Can Do 3 Kinds of Inference
     1 Filtering: given the past, infer $p(z_t|x_{1:t})$ for some $z_t$
     2 Smoothing: given past and future, infer $p(z_t|x_{1:T})$ for some $z_t$
     3 Sequencing: given past and future, infer $p(z_{1:T}|x_{1:T})$
  10. Factorization over Time
     $p(z_{1:T}|x_{1:T}) = p(z_1|x_{1:T})\, p(z_2|z_1, x_{1:T})\, p(z_3|z_2, x_{1:T}) \cdots = p(z_1|x_{1:T})\, p(z_2|z_1, x_{2:T})\, p(z_3|z_2, x_{3:T}) \cdots = p(z_1|x_{1:T}) \prod_{t=2}^T p(z_t|z_{t-1}, x_{t:T})$
     Each latent state $z_t$ depends only on the previous latent state $z_{t-1}$ and all current and future observations $x_{t:T}$.
  11. "Conditional Smoothing Posterior" $p(z_t|z_{t-1}, x_{t:T})$
     It is the posterior corresponding to the conditional prior $p(z_t|z_{t-1})$, hence we call it a conditional "posterior"; it combines information from both past and future, hence "smoothing".
  12. Factorizing the Conditional Smoothing Posterior (1)
     $x_{t:T}^{1:M} \perp z_{t-1} \mid z_t$ (by d-separation) ⇒
     $p(z_t|z_{t-1}, x_{t:T}^{1:M}) = \dfrac{p(z_{t-1}, z_t, x_{t:T}^{1:M})}{p(z_{t-1}, x_{t:T}^{1:M})} = \dfrac{p(x_{t:T}^{1:M}|z_{t-1}, z_t)\, p(z_{t-1}, z_t)}{p(z_{t-1}, x_{t:T}^{1:M})} = \dfrac{p(z_{t-1})\, p(z_t|z_{t-1})\, p(x_{t:T}^{1:M}|z_t)}{p(x_{t:T}^{1:M}|z_{t-1})\, p(z_{t-1})}$
  13. Factorizing the Conditional Smoothing Posterior (2)
     $x_t \perp x_{t+1:T} \mid z_t$ (by the local Markov property) ⇒
     $p(z_t|z_{t-1}, x_{t:T}^{1:M}) = \dfrac{p(z_{t-1})\, p(z_t|z_{t-1})\, p(x_t^{1:M}|z_t)\, p(x_{t+1:T}^{1:M}|z_t)}{p(x_{t:T}^{1:M}|z_{t-1})\, p(z_{t-1})} = p(x_{t+1:T}^{1:M}|z_t)\, p(x_t^{1:M}|z_t)\, \dfrac{p(z_t|z_{t-1})}{p(x_{t:T}^{1:M}|z_{t-1})}$
  14. Factorizing the Conditional Smoothing Posterior (3)
     Dropping $\frac{1}{p(x_{t:T}^{1:M}|z_{t-1})}$ and assuming $p(x_t^{1:M}|z_t) = \prod_{m=1}^M p(x_t^m|z_t)$ ⇒
     $p(z_t|z_{t-1}, x_{t:T}^{1:M}) \propto p(x_{t+1:T}^{1:M}|z_t)\, p(x_t^{1:M}|z_t)\, p(z_t|z_{t-1}) = p(x_{t+1:T}^{1:M}|z_t) \left[\prod_{m=1}^M p(x_t^m|z_t)\right] p(z_t|z_{t-1})$
  15. Factorizing the Conditional Smoothing Posterior (4)
     Applying Bayes' rule to each likelihood term and dropping the observation marginals $p(x_{t+1:T}^{1:M})$ and $\prod_{m=1}^M p(x_t^m)$ ⇒
     $p(z_t|z_{t-1}, x_{t:T}^{1:M}) \propto p(x_{t+1:T}^{1:M}|z_t) \left[\prod_{m=1}^M p(x_t^m|z_t)\right] p(z_t|z_{t-1}) = \dfrac{p(z_t|x_{t+1:T}^{1:M})\, p(x_{t+1:T}^{1:M})}{p(z_t)} \left[\prod_{m=1}^M \dfrac{p(z_t|x_t^m)\, p(x_t^m)}{p(z_t)}\right] p(z_t|z_{t-1}) \propto p(z_t|x_{t+1:T}^{1:M}) \left[\prod_{m=1}^M \dfrac{p(z_t|x_t^m)}{p(z_t)}\right] \dfrac{p(z_t|z_{t-1})}{p(z_t)}$
  16. Future × Present × Past (1)
     Backward filtering: $p(z_t|x_{t:T}) \propto p(z_t|x_{t+1:T}) \prod_m \frac{p(z_t|x_t^m)}{p(z_t)}$
     Forward smoothing: $p(z_t|x_{1:T}) \propto p(z_t|x_{t+1:T}) \left[\prod_m \frac{p(z_t|x_t^m)}{p(z_t)}\right] \frac{p(z_t|x_{1:t-1})}{p(z_t)}$
     Conditional smoothing posterior: $p(z_t|z_{t-1}, x_{t:T}) \propto p(z_t|x_{t+1:T}) \left[\prod_m \frac{p(z_t|x_t^m)}{p(z_t)}\right] \frac{p(z_t|z_{t-1})}{p(z_t)}$
  17. Future × Present × Past (2)
     Each distribution decomposes into:
     1 Its dependence on future observations, $p(z_t|x_{t+1:T})$
     2 Its dependence on each modality $m$ in the present, $p(z_t|x_t^m)$
     3 Its dependence on the past, $p(z_t|z_{t-1})$ or $p(z_t|x_{1:t-1})$
  18. Insights from the Factorizations
     Any missing modality $\bar m \in [1, M]$ at time $t$ can simply be left out of the product over modalities, leaving distributions that correctly condition on only the modalities $[1, M] \setminus \{\bar m\}$ that are present.
     We can compute all three distributions if we can approximate the dependence on the future, $q(z_t|x_{t+1:T}) \approx p(z_t|x_{t+1:T})$, learn approximate posteriors $q(z_t|x_t^m) \approx p(z_t|x_t^m)$ for each modality $m$, and know the model dynamics $p(z_t)$ and $p(z_t|z_{t-1})$.
  19. Outline (same as slide 3; this copy opens Methods / Multimodal Fusion via Product of Gaussians)
  20. Gaussian Assumption
     It is not tractable to compute the product of generic probability distributions, so assume that each term in the factorization is Gaussian. If each distribution is Gaussian, then their products and quotients are also Gaussian, and can be computed in closed form.
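The closed form is just precision-weighted averaging. A sketch for diagonal Gaussians (quotients work analogously with a subtracted precision term, provided the result stays positive):

```python
# Sketch: closed-form product of diagonal Gaussians (precision-weighted fusion).
import numpy as np

def product_of_gaussians(mus, vars_):
    """Fuse diagonal Gaussians; returns (mu, var) of the normalized product."""
    precisions = [1.0 / v for v in vars_]
    var = 1.0 / np.sum(precisions, axis=0)
    mu = var * np.sum([p * m for p, m in zip(precisions, mus)], axis=0)
    return mu, var

# The low-variance (high-precision) input dominates the fused estimate:
mu, var = product_of_gaussians([np.array([0.0]), np.array([1.0])],
                               [np.array([0.01]), np.array([1.0])])
# mu ~= 0.0099, i.e. almost entirely the first, more certain input
```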
  21. Uncertainty Awareness
     The output of a product of Gaussians is dominated by the input Gaussian with lower variance (higher precision), so fusion gives more weight to higher-certainty inputs.
     This automatically balances the information provided by each modality $m$, depending on whether $p(z_t|x_t^m)$ has high or low certainty, and on the information provided from the past and future through $p(z_t|z_{t-1})$ and $p(z_t|x_{t+1:T})$, thereby performing multimodal temporal fusion in an uncertainty-aware manner.
  22. Outline (same as slide 3; this copy opens Methods / Approximate Filtering with Missing Data)
  23. Missing Observations in the Future
     $p(z_t|x_{t+1:T})$ does not admit further factorization, hence does not readily handle missing data among the future observations.
     $z_t \perp x_{t+1:T} \mid z_{t+1}$ (by d-separation) ⇒
     $p(z_t|x_{t+1:T}) = \int p(z_t, z_{t+1}|x_{t+1:T})\, dz_{t+1} = \int p(z_t|z_{t+1}, x_{t+1:T})\, p(z_{t+1}|x_{t+1:T})\, dz_{t+1} = \int p(z_t|z_{t+1})\, p(z_{t+1}|x_{t+1:T})\, dz_{t+1} = E_{p(z_{t+1}|x_{t+1:T})}[p(z_t|z_{t+1})]$
  24. Approximating $p(z_t|x_{t+1:T}) = E_{p(z_{t+1}|x_{t+1:T})}[p(z_t|z_{t+1})]$
     Tractable approximation via Huber et al. 2011:
     Assume $p(z_t|x_{t+1:T}) = N(\hat\mu, \hat\Sigma)$ with diagonal $\hat\Sigma$
     Assume $p(z_t|z_{t+1}) = N(\mu, \Sigma)$ with diagonal $\Sigma$
     Draw parameters $(\mu_1, \Sigma_1), \ldots, (\mu_K, \Sigma_K)$ of $p(z_t|z_{t+1})$ under samples $z_{t+1} \sim p(z_{t+1}|x_{t+1:T})$, then approximate by moment matching: $\hat\mu = \frac{1}{K}\sum_{k=1}^K \mu_k$ and $\hat\Sigma = \frac{1}{K}\sum_{k=1}^K (\Sigma_k + \mu_k^2) - \hat\mu^2$
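A sketch of that moment-matching step for an equal-weight mixture of diagonal Gaussians (the array shapes are my assumption):

```python
# Sketch: moment-matching a mixture of K diagonal Gaussians with one Gaussian.
import numpy as np

def moment_match(mus, vars_):
    """mus, vars_: arrays of shape (K, z_dim), one row per mixture component."""
    mu_hat = mus.mean(axis=0)
    # law of total variance: E[var] + E[mu^2] - (E[mu])^2
    var_hat = (vars_ + mus ** 2).mean(axis=0) - mu_hat ** 2
    return mu_hat, var_hat
```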
  25. Insights from $p(z_t|x_{t+1:T}) = E_{p(z_{t+1}|x_{t+1:T})}[p(z_t|z_{t+1})]$ (1)
     The backward filtering distribution $p(z_t|x_{t:T}) \propto p(z_t|x_{t+1:T}) \prod_m \frac{p(z_t|x_t^m)}{p(z_t)}$ becomes $p(z_t|x_{t:T}) \propto E_{p(z_{t+1}|x_{t+1:T})}[p(z_t|z_{t+1})] \prod_{m=1}^M \frac{p(z_t|x_t^m)}{p(z_t)}$
     By sampling under the filtering distribution for time $t+1$, $p(z_{t+1}|x_{t+1:T})$, we can compute the filtering distribution for time $t$, $p(z_t|x_{t:T})$.
     We can recursively compute $p(z_t|x_{t:T})$ backwards in time, starting from $t = T$: $p(z_T|x_{T:T}) \to p(z_{T-1}|x_{T:T}) \to p(z_{T-1}|x_{T-1:T}) \to \cdots \to p(z_1|x_{1:T})$
  26. Insights from $p(z_t|x_{t+1:T}) = E_{p(z_{t+1}|x_{t+1:T})}[p(z_t|z_{t+1})]$ (2)
     Once we can filter backwards in time via $p(z_t|x_{t:T}) \propto E_{p(z_{t+1}|x_{t+1:T})}[p(z_t|z_{t+1})] \prod_{m=1}^M \frac{p(z_t|x_t^m)}{p(z_t)}$, we can use this to approximate $p(z_t|x_{t+1:T})$ in the smoothing distribution $p(z_t|x_{1:T}) \propto p(z_t|x_{t+1:T}) \left[\prod_m \frac{p(z_t|x_t^m)}{p(z_t)}\right] \frac{p(z_t|x_{1:t-1})}{p(z_t)}$ and the conditional smoothing posterior $p(z_t|z_{t-1}, x_{t:T}) \propto p(z_t|x_{t+1:T}) \left[\prod_m \frac{p(z_t|x_t^m)}{p(z_t)}\right] \frac{p(z_t|z_{t-1})}{p(z_t)}$
  27. Insights from $p(z_t|x_{t+1:T}) = E_{p(z_{t+1}|x_{t+1:T})}[p(z_t|z_{t+1})]$ (3)
     This approach removes the explicit dependence on all future observations $x_{t+1:T}$, allowing us to handle missing data.
     Suppose the data points $\mathcal{X} = \{x_{t_i}^{m_i}\}$ are missing. Rather than directly compute the dependence on an incomplete set of future observations, $p(z_t|x_{t+1:T} \setminus \mathcal{X})$, we can instead sample $z_{t+1}$ under the filtering distribution conditioned on the incomplete observations, $p(z_{t+1}|x_{t+1:T} \setminus \mathcal{X})$, and then compute $p(z_t|z_{t+1})$ given the sampled $z_{t+1}$, thereby approximating $p(z_t|x_{t+1:T} \setminus \mathcal{X})$.
  28. Outline (same as slide 3; this copy opens Methods / Backward-Forward Variational Inference)
  29. Factorized Variational Approximations (1)
     Define the variational posterior approximation $q$ via $q(z_t|x_t^m) \equiv \tilde q(z_t|x_t^m)\, p(z_t)$, where $\tilde q(z_t|x_t^m)$ is parameterized by a time-invariant neural network for each modality $m$.
     We learn the Gaussian quotients $\tilde q(z_t|x_t^m)$ directly, so as to avoid the constraint required to ensure that the quotient of Gaussians $\tilde q(z_t|x_t^m) = q(z_t|x_t^m) / p(z_t)$ is well-defined.
     We also parameterize the transition dynamics $p(z_t|z_{t-1})$ and $p(z_t|z_{t+1})$ using neural networks for the quotient distributions.
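As an illustration, one plausible shape for such a time-invariant per-modality encoder; the architecture below is assumed, not taken from the paper:

```python
# Assumed architecture: a per-modality encoder that outputs the mean and
# log-variance of the Gaussian quotient term q~(z_t | x_t^m).
import torch
import torch.nn as nn

class QuotientEncoder(nn.Module):
    def __init__(self, x_dim, z_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, z_dim)
        self.logvar = nn.Linear(hidden, z_dim)

    def forward(self, x_t):
        h = self.body(x_t)
        return self.mu(h), self.logvar(h)   # parameters of q~(z_t | x_t^m)
```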
  30. Factorized Variational Approximations (2)
     Write $E_\leftarrow$ as shorthand for the expectation under the approximate backward filtering distribution $q(z_{t+1}|x_{t+1:T})$: $p(z_t|x_{t+1:T}) = E_{p(z_{t+1}|x_{t+1:T})}[p(z_t|z_{t+1})] \approx E_\leftarrow[p(z_t|z_{t+1})]$
     Write $E_\rightarrow$ for the expectation under the forward smoothing distribution $q(z_{t-1}|x_{1:T})$: $p(z_t|x_{1:t-1}) \approx E_{q(z_{t-1}|x_{1:T})}[p(z_t|z_{t-1})] = E_\rightarrow[p(z_t|z_{t-1})]$
  31. Factorized Variational Approximations (3)
     1 Backward filtering (variational backward algorithm): $q(z_t|x_{t:T}) \propto E_\leftarrow[p(z_t|z_{t+1})] \prod_m \tilde q(z_t|x_t^m)$
     2 Forward smoothing (variational backward-forward algorithm): $q(z_t|x_{1:T}) \propto E_\leftarrow[p(z_t|z_{t+1})] \left[\prod_m \tilde q(z_t|x_t^m)\right] \frac{E_\rightarrow[p(z_t|z_{t-1})]}{p(z_t)}$
     3 Conditional smoothing posterior: $q(z_t|z_{t-1}, x_{t:T}) \propto E_\leftarrow[p(z_t|z_{t+1})] \left[\prod_m \tilde q(z_t|x_t^m)\right] \frac{p(z_t|z_{t-1})}{p(z_t)}$
  32. Variational Backward Algorithm
     function BackwardFilter($x_{1:T}$, $K$)
       Initialize $q(z_T|x_{T+1:T}) \leftarrow p(z_T)$
       for $t = T$ to $1$ do
         Let $\mathcal{M} \subseteq [1, M]$ be the observed modalities at $t$
         $q(z_t|x_{t:T}) \leftarrow q(z_t|x_{t+1:T}) \prod_{m \in \mathcal{M}} \tilde q(z_t|x_t^m)$
         Sample $K$ particles $z_t^k \sim q(z_t|x_{t:T})$ for $k \in [1, K]$
         Compute $p(z_{t-1}|z_t^k)$ for each particle $z_t^k$
         $q(z_{t-1}|x_{t:T}) \leftarrow \frac{1}{K} \sum_{k=1}^K p(z_{t-1}|z_t^k)$
       end for
       return $\{q(z_t|x_{t:T}),\, q(z_t|x_{t+1:T})$ for $t \in [1, T]\}$
     end function
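Putting the pieces together, a hedged numpy sketch of this backward pass, reusing `product_of_gaussians` and `moment_match` from the earlier sketches; `encode` and `transition_back` stand in for the learned networks $\tilde q(z_t|x_t^m)$ and $p(z_{t-1}|z_t)$, and none of this is the paper's code:

```python
# Hedged sketch of BackwardFilter for diagonal Gaussians.
import numpy as np

def backward_filter(x, encode, transition_back, prior_mu, prior_var,
                    K=25, rng=np.random.default_rng()):
    """x: list over time of dicts {modality: observation}; missing ones absent."""
    T = len(x)
    mu, var = prior_mu, prior_var                 # q(z_T | x_{T+1:T}) = p(z_T)
    filtered = [None] * T
    for t in reversed(range(T)):
        # Fuse the backward prediction with each *observed* modality's term;
        # missing modalities are simply absent from x[t].
        mus = [mu] + [encode(m, obs)[0] for m, obs in x[t].items()]
        vars_ = [var] + [encode(m, obs)[1] for m, obs in x[t].items()]
        mu, var = product_of_gaussians(mus, vars_)
        filtered[t] = (mu, var)
        if t > 0:
            # Propagate backward: sample K particles, moment-match the mixture.
            z = mu + np.sqrt(var) * rng.normal(size=(K, mu.shape[0]))
            params = [transition_back(zk) for zk in z]  # p(z_{t-1} | z_t^k)
            mu, var = moment_match(np.stack([m for m, _ in params]),
                                   np.stack([v for _, v in params]))
    return filtered
```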
  33. Variational Backward Algorithm (Remarks)
     By reversing time: the algorithm yields a variational forward algorithm that computes the forward filtering distribution $q(z_t|x_{1:t})$.
     By setting the number of particles $K = 1$: the algorithm effectively computes the conditional filtering posterior $q(z_t|z_{t+1}, x_t)$ and conditional prior $p(z_t|z_{t+1})$ for a randomly sampled latent sequence $z_{1:T}$.
  34. Variational Backward-Forward Algorithm
     function ForwardSmooth($x_{1:T}$, $K_b$, $K_f$)
       Initialize $q(z_1|x_{1:0}) \leftarrow 1$
       Collect $q(z_t|x_{t+1:T})$ from BackwardFilter($x_{1:T}$, $K_b$)
       for $t = 1$ to $T$ do
         Let $\mathcal{M} \subseteq [1, M]$ be the observed modalities at $t$
         $q(z_t|x_{1:T}) \leftarrow q(z_t|x_{t+1:T}) \left[\prod_{m \in \mathcal{M}} \tilde q(z_t|x_t^m)\right] \frac{q(z_t|x_{1:t-1})}{p(z_t)}$
         Sample $K_f$ particles $z_t^k \sim q(z_t|x_{1:T})$ for $k \in [1, K_f]$
         Compute $p(z_{t+1}|z_t^k)$ for each particle $z_t^k$
         $q(z_{t+1}|x_{1:t}) \leftarrow \frac{1}{K_f} \sum_{k=1}^{K_f} p(z_{t+1}|z_t^k)$
       end for
       return $\{q(z_t|x_{1:T}),\, q(z_t|x_{1:t-1})$ for $t \in [1, T]\}$
     end function
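The forward pass additionally divides by $p(z_t)$. A Gaussian quotient sketch shows why this needs care, and why slide 29 learns the quotient terms $\tilde q$ directly instead of dividing:

```python
# Sketch: quotient of two diagonal Gaussians. The subtracted precision can
# go negative, which is exactly the well-definedness issue noted on slide 29.
import numpy as np

def gaussian_quotient(mu1, var1, mu2, var2):
    """Parameters of N(mu1, var1) / N(mu2, var2), elementwise diagonal."""
    prec = 1.0 / var1 - 1.0 / var2
    assert np.all(prec > 0), "quotient is not a normalizable Gaussian"
    var = 1.0 / prec
    mu = var * (mu1 / var1 - mu2 / var2)
    return mu, var
```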
  35. Variational Backward-Forward Algorithm (Remarks)
     By setting the number of particles $K_f = 1$: the algorithm effectively computes the conditional smoothing posterior $q(z_t|z_{t-1}, x_{t:T})$ and conditional prior $p(z_t|z_{t-1})$ for a randomly sampled latent sequence $z_{1:T}$.
  36. Knowing $p(z_t)$ for Each $t$
     The variational backward-forward algorithm requires knowing $p(z_t)$ for each $t$, i.e. sampling $p(z_t)$ in the forward pass. We avoid the instability of sampling $T$ successive latents with no observations by instead assuming $p(z_t)$ is constant in time, i.e. the MDMM is stationary when nothing is observed.
     During training, we add $KL\big(p(z_t) \,\|\, E_{z_{t-1}}[p(z_t|z_{t-1})]\big) + KL\big(p(z_t) \,\|\, E_{z_{t+1}}[p(z_t|z_{t+1})]\big)$ to the loss to ensure that the transition dynamics obey this assumption.
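For diagonal Gaussians both KL terms have the standard closed form; a sketch, assuming the inner expectations have already been moment-matched to Gaussians as in the earlier sketches:

```python
# Sketch: closed-form KL between diagonal Gaussians, usable for the two
# stationarity regularizer terms above.
import numpy as np

def kl_diag_gauss(mu0, var0, mu1, var1):
    """KL( N(mu0, diag var0) || N(mu1, diag var1) )."""
    return 0.5 * np.sum(np.log(var1 / var0)
                        + (var0 + (mu0 - mu1) ** 2) / var1
                        - 1.0)
```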
  37. ELBO for Backward Filtering
     The filtering ELBO: $\mathcal{L}_{\text{filter}} = \sum_{t=1}^T \big[ E_{q(z_t|x_{t:T})} \log p(x_t|z_t) - E_{q(z_{t+1}|x_{t+1:T})} KL\big(q(z_t|z_{t+1}, x_t) \,\|\, p(z_t|z_{t+1})\big) \big]$
     It corresponds to a "backward filtering" variational posterior $q(z_{1:T}|x_{1:T}) = \prod_t q(z_t|z_{t+1}, x_t)$, where each $z_t$ is inferred using only the current observation $x_t$ and the future latent state $z_{t+1}$.
  38. ELBO for Forward Smoothing
     The smoothing ELBO: $\mathcal{L}_{\text{smooth}} = \sum_{t=1}^T \big[ E_{q(z_t|x_{1:T})} \log p(x_t|z_t) - E_{q(z_{t-1}|x_{1:T})} KL\big(q(z_t|z_{t-1}, x_{t:T}) \,\|\, p(z_t|z_{t-1})\big) \big]$
     It corresponds to the correct factorization of the posterior, $p(z_{1:T}|x_{1:T}) = p(z_1|x_{1:T}) \prod_{t=2}^T p(z_t|z_{t-1}, x_{t:T})$, where each term combines information from both past and future.
  39. Backward-Forward Variational Inference (BFVI)
     Since $\mathcal{L}_{\text{smooth}}$ corresponds to the correct factorization, maximizing $\mathcal{L}_{\text{smooth}}$ alone should theoretically be enough to learn good MDMM parameters θ, φ.
     However, computing $\mathcal{L}_{\text{smooth}}$ requires a backward pass, which requires sampling under the backward filtering distribution. Hence, to approximate $\mathcal{L}_{\text{smooth}}$ accurately, the backward filtering distribution has to be reasonably accurate as well.
     This motivates learning the parameters θ, φ by jointly maximizing the filtering and smoothing ELBOs as a weighted sum (see the sketch below). We call this paradigm BFVI due to its use of variational posteriors for both backward filtering and forward smoothing.
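A sketch of the resulting training objective; the weight `alpha` is my placeholder for however the two ELBOs are combined, which the slide does not specify:

```python
# Sketch: BFVI objective as a weighted sum of the two ELBOs, negated for
# minimization with a standard optimizer.
def bfvi_loss(l_filter, l_smooth, alpha=0.5):
    return -(alpha * l_filter + (1.0 - alpha) * l_smooth)
```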
  40. Outline (same as slide 3; this copy opens Experiments / Datasets)
  41. MTS Dataset I: Noisy Spirals
     $R \sim 2^{U[-1,1)}$ (random aspect-ratio factor)
     $x(t): \{0, 1, \ldots, 99\} \to \mathbb{R}^2$, $x(t) = \big(\sqrt{R}\, r(t) \cos\theta(t) + 0.1\,\mathcal{N},\ \tfrac{1}{\sqrt{R}}\, r(t) \sin\theta(t) + 0.1\,\mathcal{N}\big)$
     Radii $r(0), \ldots, r(99)$: $r(0) = 0.25 + U[0,0.5), \ldots, r(99) = 2.25 + U[0,0.5)$
     Angles $\theta(0), \ldots, \theta(99)$: $\theta(0) = U[0,\pi), \ldots, \theta(99) = U[4\pi,5\pi)$, or the negated ranges for spirals winding the opposite way
     5 latent dimensions; two-layer perceptrons for encoding $q(z_t|x_t^m)$ and decoding $p(x_t^m|z_t)$
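My reading of the slide's garbled generator math, written out as code for concreteness; the exact parameterization here is an assumption:

```python
# Assumed spiral generator: R scales the two axes reciprocally; radius and
# angle grow linearly over the 100 steps, with uniform jitter and 0.1 noise.
import numpy as np

def make_spiral(T=100, rng=np.random.default_rng()):
    R = 2.0 ** rng.uniform(-1.0, 1.0)              # random aspect-ratio factor
    t = np.arange(T)
    r = 0.25 + 2.0 * t / (T - 1) + rng.uniform(0.0, 0.5, size=T)
    sign = rng.choice([-1.0, 1.0])                 # winding direction
    theta = sign * (rng.uniform(0.0, np.pi) + 4.0 * np.pi * t / (T - 1))
    x = np.sqrt(R) * r * np.cos(theta) + 0.1 * rng.normal(size=T)
    y = (1.0 / np.sqrt(R)) * r * np.sin(theta) + 0.1 * rng.normal(size=T)
    return np.stack([x, y], axis=1)                # (T, 2): modalities x and y
```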
  42. MTS Dataset II: Weizmann Human Actions
     90 videos of 9 people, each performing 10 actions. We converted it to a trimodal time series dataset by treating silhouette masks as an additional modality and actions as per-frame labels. We selected one person's videos as the test set and the other 80 videos as the training set, allowing us to test action-label prediction on an unseen person.
     256 latent dimensions; convolutional / deconvolutional neural networks for encoding and decoding.
  43. Outline (same as slide 3; this copy opens Experiments / Inference Tasks)
  44. Temporal Inference Tasks
     1 Reconstruction: reconstruction given complete observations
     2 Drop Half: reconstruction after half of the inputs are randomly deleted
     3 Forward Extrapolation: predicting the last 25% of a sequence when the rest is given
     4 Backward Extrapolation: inferring the first 25% of a sequence when the rest is given
  45. Weizmann Human Actions
     Multimodal training, unimodal testing: we provided only video frames as input; no silhouette masks, no action labels.
  46. Cross-Modal Inference Tasks
     1 Conditional generation for Spirals: given the x coordinates and the initial 25% of the y coordinates, generate the rest of the spiral
     2 Conditional generation for Weizmann: given the video frames, generate the silhouette masks
     3 Label prediction for Weizmann: infer action labels given only video frames
  47. BFVI vs. RNN-Based Methods
     F-Mask and F-Skip: forward RNNs, one per modality, with zero-masking and update skipping respectively. B-Mask and B-Skip: backward RNNs, with masking and skipping respectively.
     BFVI achieves high performance on all tasks, whereas the RNN-based methods only perform well on a few; in particular, all methods besides BFVI do poorly on the conditional generation task. RNNs lack a principled approach to multimodal fusion, and hence fail to learn a latent space that captures the mutual information between action labels and images. BFVI learns both to predict one modality from another and to propagate information across time.
  48. Outline (same as slide 3; this copy opens Experiments / Weakly Supervised Learning)
  49. Two Forms of Weakly Supervised Learning
     Learning with data missing uniformly at random: noisy sensors; asynchronous sensors.
     Learning with missing modalities: semi-supervised learning (the dataset is partially unlabelled by annotators); a fraction of the sequences in the dataset has only a single modality present (sensor breakdown).