

[TMLR 2026, Featured Certification] Double Bounded α-Divergence Optimization for Density Estimation

Authors: Kazu Ghalamkari, Jesper Løve Hinrich, Morten Mørup

Abstract: Tensor-based discrete density estimation requires flexible modeling and proper divergence criteria to enable effective learning; however, traditional approaches using α-divergence face analytical challenges due to the α-power terms in the objective function, which hinder the derivation of closed-form update rules. We present a generalization of the expectation-maximization (EM) algorithm, called the E2M algorithm. It circumvents this issue by first relaxing the optimization into the minimization of a surrogate objective based on the Kullback–Leibler (KL) divergence, which is tractable via the standard EM algorithm, and subsequently applying a tensor many-body approximation in the M-step to enable simultaneous closed-form updates of all parameters. Our approach offers flexible modeling for not only a variety of low-rank structures, including the CP, Tucker, and Tensor Train formats, but also their mixtures, thus allowing us to leverage the strengths of different low-rank structures. We evaluate the effectiveness of our approach on synthetic and real datasets, highlighting its superior convergence to gradient-based procedures, robustness to outliers, and favorable density estimation performance compared to prominent existing tensor-based methods.

Published in Transactions on Machine Learning Research (TMLR) with Featured Certification in 2026

[Video]
https://www.youtube.com/watch?v=adXhQ8roDGA

[Paper]
https://openreview.net/forum?id=954CjhXSXL


Kazu Ghalamkari

April 08, 2026


Transcript

  1. E2M: Double Bounded α-Divergence Optimization for Tensor-based Discrete Density Estimation

    Published in Transactions on Machine Learning Research (TMLR), 2026, with a Featured Certification. Kazu Ghalamkari (@KazuGhalamkari), Jesper Løve Hinrich, and Morten Mørup, Technical University of Denmark.
  2. Non-negative tensor factorization for density estimation

    [Figure: a non-negative empirical distribution built from discrete samples is approximated by a non-negative low-rank model, a sum of rank-one terms, by minimizing a divergence between the two.] Approximates the distribution behind the data.
  3. Non-negative tensor factorization for density estimation

    [Figure: as on the previous slide, the empirical distribution is approximated by a non-negative low-rank model.] Approximates the distribution behind the data. What kind of low-rank structure should we use?
  4. Non-negative tensor factorization for density estimation

    [Figure: as on the previous slides.] Approximates the distribution behind the data. What kind of low-rank structure should we use? Let's mix various low-rank structures; the mixture weights can be learned from the data automatically. What kind of objective should we optimize? Let's focus on the α-divergence, which enhances noise robustness. How can we optimize the α-divergence? We generalized the EM-algorithm to optimize the α-divergence.
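To make the setting concrete, here is a minimal sketch of fitting a non-negative low-rank model to an empirical distribution. It uses the classical Lee–Seung multiplicative updates for the generalized KL objective on an order-2 model (a matrix product, i.e., a rank-R CP model of a matrix); this is an illustration of the problem, not the paper's E2M updates, and all sizes are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# Empirical distribution over two discrete variables, built from samples.
samples = rng.integers(0, 5, size=(1000, 2))
T = np.zeros((5, 5))
for i, j in samples:
    T[i, j] += 1.0
T /= T.sum()

# Order-2 CP model: T ≈ W @ H with non-negative factors, rank R = 3,
# fitted by Lee-Seung multiplicative updates for the generalized KL
# objective, which is non-increasing per iteration.
R = 3
W = rng.random((5, R)) + 0.1
H = rng.random((R, 5)) + 0.1

def gkl(T, P, eps=1e-12):
    # Generalized KL divergence D(T || P) for non-negative arrays.
    return np.sum(T * np.log((T + eps) / (P + eps)) - T + P)

history = []
for _ in range(200):
    P = W @ H
    W *= ((T / (P + 1e-12)) @ H.T) / (H.sum(axis=1) + 1e-12)
    P = W @ H
    H *= (W.T @ (T / (P + 1e-12))) / (W.sum(axis=0)[:, None] + 1e-12)
    history.append(gkl(T, W @ H))

print(history[0], "->", history[-1])  # objective decreases over iterations
```

The factors stay non-negative by construction, which is what makes the normalized reconstruction interpretable as a probability distribution.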
  5. KL-divergence is not a perfect measure

    The KL-divergence incurs a large penalty whenever the data has mass in bins where the model assigns almost no probability, which induces overfitting to noise: instead of ignoring noise spikes, the model bends toward them. [Figure: a good fit ignores the noise, while a KL-based fit overfits it; fits of the same data with α = 1.0, 0.5, and 0.1.] Alternative divergences, such as the α-divergence with hyperparameter α > 0, reduce this weakness of the KL-divergence.
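A small numeric illustration of this penalty gap (the distributions below are made up for the demo, and the α-divergence here uses one common Amari parameterization, D_α(t‖p) = (1 − Σᵢ tᵢ^α pᵢ^(1−α)) / (α(1−α)), which may differ from the paper's convention by reparameterization):

```python
import numpy as np

def alpha_div(t, p, alpha):
    # One common Amari parameterization (α not in {0, 1}); α → 1 recovers KL(t || p).
    return (1.0 - np.sum(t**alpha * p**(1.0 - alpha))) / (alpha * (1.0 - alpha))

def kl(t, p):
    return np.sum(t * np.log(t / p))

# Model p fits the two bulk bins well but assigns almost no mass to a
# third bin that is pure outlier noise in the data t.
t = np.array([0.475, 0.475, 0.05])
p = np.array([0.4995, 0.4995, 0.001])

print(kl(t, p))               # dominated by the outlier bin
print(alpha_div(t, p, 0.5))   # same mismatch, penalized far less
print(abs(alpha_div(t, p, 0.999) - kl(t, p)) < 1e-2)  # α → 1 limit is KL
```

The outlier bin contributes 0.05 · ln(50) ≈ 0.196 to the KL value, while the α = 0.5 divergence stays well below that for the same model.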
  6. Double-bound strategy for α-divergence optimization

    [Diagram: the E1-step bounds the α-divergence by a KL-divergence surrogate, whose M-step minimizes it in closed form. For a simple low-rank model there is still no closed form due to the summation inside the logarithm, so the E2-step applies a second bound; the M-step then minimizes the resulting surrogate with closed-form updates.]
  7. Derivation of the double bound for the E2M-algorithm

    Minimizing the α-divergence is recast as maximizing a lower bound on the objective for the low-rank model, obtained by applying Jensen's inequality for a concave function f. Applying it twice yields the E1-step and the E2-step, making it easier to find closed-form update rules.
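The "summation in the logarithm" obstacle is handled by the standard EM-style application of Jensen's inequality; as a generic sketch (not the paper's exact surrogate), for any auxiliary distribution φ over the summation index,

```latex
\log \sum_{r} a_r
  \;=\; \log \sum_{r} \phi_r \,\frac{a_r}{\phi_r}
  \;\ge\; \sum_{r} \phi_r \log \frac{a_r}{\phi_r},
\qquad \phi_r \ge 0, \quad \sum_{r} \phi_r = 1,
```

with equality when φ_r ∝ a_r. Choosing that φ in the E-step makes the bound tight, and in the subsequent M-step the logarithm acts on each term separately, which is what opens the door to closed-form updates.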
  8. Complexity of the E2M-algorithm

    The M-step is a convex optimization problem with closed-form updates for CPD; closed-form updates for Tucker and Tensor-Train can be found in our paper. Computational complexity of each iteration (E1-step, E2-step, M-step) for tensor dimension D, number of samples N, and rank R:

    Low-rank model   Complexity
    CPD              O(DNR)
    Tucker           O(DNR^D)
    Tensor-Train     O(DNR^2)
  9. Noise-robustness of the α-divergence

    We can adjust the sensitivity to outliers and noise through the hyperparameter α > 0. [Figure: true distribution; observed 90 × 90 × 2 tensor T (classes A and B) with 50 noisy entries; rank-30 CP tensors P reconstructed with varying α, ranging from noise-sensitive to noise-robust fits.]
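This sensitivity dial can be seen numerically. Sweeping α on a made-up example (the same common Amari parameterization of the α-divergence as above, not the paper's experiment), the penalty for an outlier bin the model nearly ignores shrinks as α decreases:

```python
import numpy as np

def alpha_div(t, p, alpha):
    # One common Amari parameterization of the α-divergence (α not in {0, 1}).
    return (1.0 - np.sum(t**alpha * p**(1.0 - alpha))) / (alpha * (1.0 - alpha))

t = np.array([0.475, 0.475, 0.05])     # data with an outlier bin of mass 0.05
p = np.array([0.4995, 0.4995, 0.001])  # model that nearly ignores that bin

pens = [alpha_div(t, p, a) for a in (0.9, 0.5, 0.1)]
print(pens)  # penalty shrinks as α decreases
```

So small α tolerates the unexplained outlier mass, while α near 1 (the KL regime) forces the model to chase it.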
  10. Comparison of the convergence performance

    [Figure: input T, reconstruction P, and convergence curves.] The gradient-based method requires learning-rate tuning; the proposed method needs no learning-rate tuning.
  11. Comparison of the convergence performance

    [Figure: input T, reconstruction P, and convergence curves.] The gradient-based method requires learning-rate tuning; the proposed method needs no learning-rate tuning, and the objective function decreases monotonically with a convergence guarantee.
  12. Conclusion

    The double-bounded EM-algorithm (E1-step and E2-step) monotonically decreases the objective, offers sensitivity control through the parameter α, and admits closed-form updates in each step (in many cases). Our paper covers a comparison with DNN-based methods, experiments with real datasets, and a generalization to continuous settings.