

[TMLR 2026, Featured Certification] Double Bounded α-Divergence Optimization for Density Estimation

Authors: Kazu Ghalamkari, Jesper Løve Hinrich, Morten Mørup

Abstract: Tensor-based discrete density estimation requires flexible modeling and proper divergence criteria to enable effective learning; however, traditional approaches using α-divergence face analytical challenges due to the α-power terms in the objective function, which hinder the derivation of closed-form update rules. We present a generalization of the expectation-maximization (EM) algorithm, called the E2M algorithm. It circumvents this issue by first relaxing the optimization into the minimization of a surrogate objective based on the Kullback–Leibler (KL) divergence, which is tractable via the standard EM algorithm, and subsequently applying a tensor many-body approximation in the M-step to enable simultaneous closed-form updates of all parameters. Our approach offers flexible modeling for not only a variety of low-rank structures, including the CP, Tucker, and Tensor Train formats, but also their mixtures, thus allowing us to leverage the strengths of different low-rank structures. We evaluate the effectiveness of our approach on synthetic and real datasets, highlighting its superior convergence to gradient-based procedures, robustness to outliers, and favorable density estimation performance compared to prominent existing tensor-based methods.

Published in Transactions on Machine Learning Research (TMLR) with Featured Certification in 2026

[Video]
https://www.youtube.com/watch?v=adXhQ8roDGA

[Paper]
https://openreview.net/forum?id=954CjhXSXL


Kazu Ghalamkari

April 08, 2026


Transcript

  1. E2M: Double Bounded α-Divergence Optimization for Tensor-based Discrete Density Estimation

    Published in Transactions on Machine Learning Research (TMLR), 2026, with a Featured Certification. Kazu Ghalamkari (@KazuGhalamkari), Jesper Løve Hinrich, and Morten Mørup, Technical University of Denmark.
  2. Non-negative tensor factorization for density estimation

    [Figure: a non-negative empirical distribution built from discrete samples is approximated by a non-negative low-rank model, a sum of rank-one terms, by minimizing a divergence between the two.] Approximates the distribution behind the data.
  3. Non-negative tensor factorization for density estimation

    [Figure: as on the previous slide, the empirical distribution is approximated by a non-negative low-rank model.] Approximates the distribution behind the data. What kind of low-rank structure should we use?
  4. Non-negative tensor factorization for density estimation

    [Figure: as on the previous slides.] Approximates the distribution behind the data. What kind of low-rank structure should we use? Let's mix various low-rank structures; the mixture weights can be learned from the data automatically. What kind of objective should we optimize? Let's focus on the α-divergence, which enhances noise robustness. How can we optimize the α-divergence? We generalized the EM-algorithm to optimize the α-divergence.
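To make the setting concrete, here is a minimal sketch of fitting a non-negative low-rank model to an empirical distribution. It uses the classical Lee–Seung multiplicative updates for the generalized KL objective on an order-2 model (a matrix product, i.e., a rank-R CP model of a matrix); this is an illustration of the problem, not the paper's E2M updates, and all sizes are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# Empirical distribution over two discrete variables, built from samples.
samples = rng.integers(0, 5, size=(1000, 2))
T = np.zeros((5, 5))
for i, j in samples:
    T[i, j] += 1.0
T /= T.sum()

# Order-2 CP model: T ≈ W @ H with non-negative factors, rank R = 3,
# fitted by Lee-Seung multiplicative updates for the generalized KL
# objective, which is non-increasing per iteration.
R = 3
W = rng.random((5, R)) + 0.1
H = rng.random((R, 5)) + 0.1

def gkl(T, P, eps=1e-12):
    # Generalized KL divergence D(T || P) for non-negative arrays.
    return np.sum(T * np.log((T + eps) / (P + eps)) - T + P)

history = []
for _ in range(200):
    P = W @ H
    W *= ((T / (P + 1e-12)) @ H.T) / (H.sum(axis=1) + 1e-12)
    P = W @ H
    H *= (W.T @ (T / (P + 1e-12))) / (W.sum(axis=0)[:, None] + 1e-12)
    history.append(gkl(T, W @ H))

print(history[0], "->", history[-1])  # objective decreases over iterations
```

The factors stay non-negative by construction, which is what makes the normalized reconstruction interpretable as a probability distribution.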
  5. KL-divergence is not a perfect measure

    The KL-divergence incurs a large penalty whenever the data has mass in bins where the model assigns almost no probability, which induces overfitting to noise: instead of ignoring noise spikes, the model bends toward them. [Figure: a good fit ignores the noise, while a KL-based fit overfits it; fits of the same data with α = 1.0, 0.5, and 0.1.] Alternative divergences, such as the α-divergence with hyperparameter α > 0, reduce this weakness of the KL-divergence.
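A small numeric illustration of this penalty gap (the distributions below are made up for the demo, and the α-divergence here uses one common Amari parameterization, D_α(t‖p) = (1 − Σᵢ tᵢ^α pᵢ^(1−α)) / (α(1−α)), which may differ from the paper's convention by reparameterization):

```python
import numpy as np

def alpha_div(t, p, alpha):
    # One common Amari parameterization (α not in {0, 1}); α → 1 recovers KL(t || p).
    return (1.0 - np.sum(t**alpha * p**(1.0 - alpha))) / (alpha * (1.0 - alpha))

def kl(t, p):
    return np.sum(t * np.log(t / p))

# Model p fits the two bulk bins well but assigns almost no mass to a
# third bin that is pure outlier noise in the data t.
t = np.array([0.475, 0.475, 0.05])
p = np.array([0.4995, 0.4995, 0.001])

print(kl(t, p))               # dominated by the outlier bin
print(alpha_div(t, p, 0.5))   # same mismatch, penalized far less
print(abs(alpha_div(t, p, 0.999) - kl(t, p)) < 1e-2)  # α → 1 limit is KL
```

The outlier bin contributes 0.05 · ln(50) ≈ 0.196 to the KL value, while the α = 0.5 divergence stays well below that for the same model.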
  6. Double-bound strategy for α-divergence optimization

    [Diagram: the E1-step bounds the α-divergence by a KL-divergence surrogate, whose M-step minimizes it in closed form. For a simple low-rank model there is still no closed form due to the summation inside the logarithm, so the E2-step applies a second bound; the M-step then minimizes the resulting surrogate with closed-form updates.]
  7. Derivation of the double bound for the E2M-algorithm

    Minimizing the α-divergence is recast as maximizing a lower bound on the objective for the low-rank model, obtained by applying Jensen's inequality for a concave function f. Applying it twice yields the E1-step and the E2-step, making it easier to find closed-form update rules.
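The "summation in the logarithm" obstacle is handled by the standard EM-style application of Jensen's inequality; as a generic sketch (not the paper's exact surrogate), for any auxiliary distribution φ over the summation index,

```latex
\log \sum_{r} a_r
  \;=\; \log \sum_{r} \phi_r \,\frac{a_r}{\phi_r}
  \;\ge\; \sum_{r} \phi_r \log \frac{a_r}{\phi_r},
\qquad \phi_r \ge 0, \quad \sum_{r} \phi_r = 1,
```

with equality when φ_r ∝ a_r. Choosing that φ in the E-step makes the bound tight, and in the subsequent M-step the logarithm acts on each term separately, which is what opens the door to closed-form updates.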
  8. Complexity of the E2M-algorithm

    The M-step is a convex optimization problem with closed-form updates for CPD; closed-form updates for Tucker and Tensor-Train can be found in our paper. Computational complexity of each iteration (E1-step, E2-step, M-step) for tensor dimension D, number of samples N, and rank R:

    Low-rank model   Complexity
    CPD              O(DNR)
    Tucker           O(DNR^D)
    Tensor-Train     O(DNR^2)
  9. Noise-robustness of the α-divergence

    We can adjust the sensitivity to outliers and noise through the hyperparameter α > 0. [Figure: true distribution; observed 90 × 90 × 2 tensor T (classes A and B) with 50 noisy entries; rank-30 CP tensors P reconstructed with varying α, ranging from noise-sensitive to noise-robust fits.]
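This sensitivity dial can be seen numerically. Sweeping α on a made-up example (the same common Amari parameterization of the α-divergence as above, not the paper's experiment), the penalty for an outlier bin the model nearly ignores shrinks as α decreases:

```python
import numpy as np

def alpha_div(t, p, alpha):
    # One common Amari parameterization of the α-divergence (α not in {0, 1}).
    return (1.0 - np.sum(t**alpha * p**(1.0 - alpha))) / (alpha * (1.0 - alpha))

t = np.array([0.475, 0.475, 0.05])     # data with an outlier bin of mass 0.05
p = np.array([0.4995, 0.4995, 0.001])  # model that nearly ignores that bin

pens = [alpha_div(t, p, a) for a in (0.9, 0.5, 0.1)]
print(pens)  # penalty shrinks as α decreases
```

So small α tolerates the unexplained outlier mass, while α near 1 (the KL regime) forces the model to chase it.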
  10. Comparison of the convergence performance

    [Figure: input T, reconstruction P, and convergence curves.] The gradient-based method requires learning-rate tuning; the proposed method needs no learning-rate tuning.
  11. Comparison of the convergence performance

    [Figure: input T, reconstruction P, and convergence curves.] The gradient-based method requires learning-rate tuning; the proposed method needs no learning-rate tuning, and the objective function decreases monotonically with a convergence guarantee.
  12. Conclusion

    The double-bounded EM-algorithm (E1-step and E2-step) monotonically decreases the objective, offers sensitivity control through the parameter α, and admits closed-form updates in each step (in many cases). Our paper covers a comparison with DNN-based methods, experiments with real datasets, and a generalization to continuous settings.