Tensor Factorization Meets Deformed Information Geometry: Convex Relaxation under Deformed Algebra

Tensor Factorization Meets Deformed Information Geometry: Convex Relaxation under Deformed
Algebra @ Technical University of Denmark Kazu Ghalamkari Current and Future Computational Approaches to Quantum Many-Body Systems 2026 (CompQMB2026), Okinawa, 5 Mar. 2026 @KazuGhalamkari Accepted in AISTATS 2026

4 Tensor decomp. Pattern extraction Interaction, Energy Mean-field approximation Tensor
Modeling Optimization Information geometry At the intersection of Informatics, Physics, and Geometry. Pattern extraction and information reduction by tensor factorization Modeling with physics, e.g., interaction, energy, and mean-field Optimization via information geometry — the geometry of distributions Flatness

Applications for data analysis, data compression, data mining, pattern recognition,
and denoising.. Non-negative tensor factorization 5 = minimize + ⋯ + Discrete samples How about other objectives, such as the KL-divergence? Tensor decomposition is often ill-posed or NP-hard. ・ The rank-1 decomp. for minimizing L2 norm is NP-hard. The objective function is typically non-convex. ・Initial values dependency No guarantee to be optimal

Applications for data analysis, data compression, data mining, pattern recognition,
and denoising.. Non-negative tensor factorization 6 = minimize + ⋯ + Discrete samples Low-rank model Empirical dist. How about other objectives, such as the KL-divergence? Approximates the distribution behind the data (MLE) (non-negative) (non-negative) Relative entropy

Convex rank-1 decomposition minimizing the KL-div. Rank-1 tensors Flat manifold
▪ Rank-1 decomposition minimizing the KL div. is a convex optimization problem Limited capability Convex optimization Large capability Non-convex optimization increasing rank Rank-R tensors Non-flat manifold + ⋯ + Destroy flattens of the model space 7

Many-body approximation ensures the flattens of the model space Rank-1
tensors Flat manifold ▪ Rank-1 decomposition minimizing the KL div. is a convex optimization problem Limited capability Convex optimization Large capability Non-convex optimization increasing rank increasing high-order interactions among tensor modes Large capability Convex optimization Low-body tensors Flat manifold Rank-R tensors Non-flat manifold Energy-based modeling flattens the model space Convex optimization ensures global optimality Many-body approximation for tensors Ghalamkari, K., et. al (2023) 8

The KL-divergence is not perfect measure 9 The KL-divergence has
a large penalty if and , which induces overfitting to noise. noise Good fitting ignores noise Overfit to noise due to the nature of the KL-divergence. Large penalty for KL noise Alternative divergences, such as q-divergence, reduce this weakness. KL-divergence Data Model q-divergence Hyper-parameter (q>0) Fitting with q=1.0 Fitting with q=0.5 Fitting with q=0.1

Convex rank-1 decomposition minimizing the KL-div. Convex optimization Non-convex optimization
Replacing the objective Modify coordinate system Convex optimization increasing high-order interactions Convex optimization Large capability deformed product Deformed rank-1 approx. Deformed many-body approximation 10

We can adjust model properties by changing function. Deformed many-body
approximation for non-negative tensors Natural parameter of deformed exponential family. Energy function Free energy Examples: Temperature parameter ❖ Tsallis deformation ❖ Kaniadakis deformation ⇒ ⇒ ⇒ ❖ Standard exponential function χ-exponential and logarithm function For any increasing function 14 (appeared in statistical mechanics) (appeared in theory of relativity) Without deformation

Deformed many-body approximation for non-negative tensors 15 Control relation between
mode-k and mode-l. Control relation among mode-j, -k and -l.

Deformed many-body approximation for non-negative tensors One-body approx. Deformed product
Examples: ⇒ ❖ Tsallis product ⇒ Deformed rank-1 approximation （deformed mean-field approximation） 16 ⇒ Intermediate between standard sum and standard product

Deformed many-body approximation for non-negative tensors 17 One-body approx. Two-body
approx. The global optimal solution minimizing from can be obtained by a convex optimization. Control relation between mode-k and mode-l. Deformed rank-1 approximation （deformed mean-field approximation） Larger Capability

Deformed many-body approximation for non-negative tensors 18 One-body approx. Two-body
approx. Three-body approx. Larger Capability The global optimal solution minimizing from can be obtained by a convex optimization. Intuitive modeling focusing on interactions between modes Control relation between mode-k and mode-l. Control relation among mode-j, -k and -l. Two-body Interaction Three-body Interaction Deformed rank-1 approximation （deformed mean-field approximation）

Theoretical idea behind the proposal Dually-flat manifold generated by convex
functions Non-negative normalized tensors as deformed exponential family For any increasing function χ-exponential and logarithm function -deformed-exponential family χ-free energy χ-entropy Legendre transform The linear constraint to θ ⇔ eχ -flat manifold The linear constraint to η ⇔ mχ -flat manifold The projection onto eχ -flat manifold globally minimizes the Bregman divergence generated by . The projection onto mχ -flat manifold globally minimizes the Bregman divergence generated by . Natural parameter

The projection onto eχ -flat manifold globally minimizes the Bregman
divergence generated by . Theoretical idea behind the proposal Legendre transform where the escort is defined as Bregman divergence generated by = χ-entropy 20

The projection onto eχ -flat manifold globally minimizes the Bregman
divergence generated by . Theoretical idea behind the proposal Legendre transform where the escort is defined as Bregman divergence generated by = χ-entropy Examples of χ-divergence KL-divergence. q-divergence Model space of deformed MBA is constrained linear to θ 21

Optimization via deformed natural gradient method Natural gradient method update
update Deformed Fisher information matrix Always finds the globally optimal solution No initial value dependency Legendre transform Repeat Until convergence Riemannian metric in θ-space First-order derivative of Cumulative sum of escort distribution We estimate by the bisection method. Input Output Deformed Fisher information matrix zeta-function :tensor indices But… how to choose interactions to be reduced? 22

Example tensor reconstruction by proposal, χ(t)=t Larger capability Reconstruction for
40×40×3×10 tensor. (width, height, colors, # images) Color depends on image index Shape of each image Color is uniform within each image. Intuitive model designable that captures the relationship between modes Color depends on pixel Three-body Approx. 23

Color image is decomposed into shape × color Approximation 40×40×3×10
24

Color image is decomposed into shape × color × ≃
= Shape of each image Color of each image 40×40×3×10 40×40×10 3×10 But… how to choose the deformed function ? 25

Tsallis deformed three-body approx. in noisy settings; χ(t)=tq More noisy
Less noisy Temperature q controls the sensitivity against the noise. Vanilla MBA (q=1.0) Tsallis MBA (q=0.8) Tsallis MBA (q=0.6) Tsallis MBA (q=0.4) Tsallis MBA (q=0.2) Noisy Image True Image Better Worse 26

Kaniadakis deformed three-body approx. in noisy settings More noisy Less
noisy Parameter κ enhances the sensitivity against the noise. Vanilla MBA (κ=0.0) Noisy Image True Image Kani’s MBA (κ=0.8) Kani’s MBA (κ=0.6) Kani’s MBA (κ=0.4) Kani’s MBA (κ=0.2) Better Worse 27

Deformed low-rank approximation increasing high-order interactions increasing rank Limited capability
Convex optimization Large capability Convex optimization Large capability Non-convex optimization Deformed rank-1 approx. Deformed many-body approximation Deformed rank-R approx. No longer convex, Is there a benefit? Yes! Deformation induces implicit regulation! Restricting effective number of parameters: Smaller q prevent overfitting 28

… Deformed CP decompositions by em-method 30 Non-convex 3-dim tensor
space Non-flat manifold Deformed low-rank tensors Deformed low-rank decomposition for 3rd order tensor Low-body tensors eq -flat (3+1)-dim tensor space Linear condition of θq -parameter. Convex mq -projection Linear condition of η1 -parameter. A series of 3rd order tensor is a 4th order tensor Convex e1 -projection m1 -flat

Deformed CP decompositions by em-method 31 Low-body tensors eq -flat
(3+1)-dim tensor space Convex mq -projection Convex e1 -projection m1 -flat em-based non-negative low-rank approx. e-step: e1 -projection onto m1 -flat manifold m-step: mq -projection onto eq -flat manifold q-deformed many-body approximation The optimal update is given as where which is show by the inequality that the e-step tightens the bound as and the fact Convergence guarantees

Noisy data reconst. by deformed low-rank approx. ① Traditional model
Training error True Noisy data Low-rank model Test error ② q-deformed model Noisy image reconstruction ① Traditional CP model ② Deformed CP model Training error Test error Smaller q leads to robustness against the noise. (Known effect of the q-divergence) Overfitting Smaller q leads to regularization KL-div. (q=1.0) KL-div. (q=1.0) Small q Small q KL-div. (q=1.0) Small q 32 Deformed rank

Implicit regularization induced by Tsallis deformation Training error Test error
② Deformed model Training error Test error Overfitting KL-div. (q=1.0) KL-div. (q=1.0) Small q Small q KL-div. (q=1.0) Small q Reconstructions with q=0.5 Reconstructions with q=0.5 No noise even with larger ranks Overfit to noise if the rank is large. The model’s capability increases as the rank increases. For small 𝑞, the model capacity remains limited despite a large deformed rank. Theorem. tensor order ① Traditional model 33

Regularization in discrete density estimation Training samples Discrete density estimation
① Traditional model ② q-deformed model Baseline ① w/o. deformation ② Proposed method w. deformation better worse better worse More over-fitting Less over-fitting Tsallis-deformation induces implicit regularization and prevents overfitting 34

Summary □ Deformed low-rank approximation □ Deformed many-body approximation for
non-negative tensors Global optimization of a wide family of divergences, χ-divergence One-body Approx. Three-body Approx. Two-body Approx. Noise sensitive Noise robust Non-flat manifold Non-convex 37 Visit high-dimensional space to seek flatness. The deformation flexibly adjusts the model’s behavior. Smaller q leads to implicit regularization Visit high-dimensional space to seek flatness. m-step e-step flat manifold flat manifold e-step

Tensor Factorization Meets Deformed Information...

Tensor Factorization Meets Deformed Information Geometry: Convex Relaxation under Deformed Algebra

Kazu Ghalamkari

More Decks by Kazu Ghalamkari

Other Decks in Science

Featured

Transcript

Tensor Factorization Meets Deformed Information Geometry: Convex Relaxation under Deformed

4 Tensor decomp. Pattern extraction Interaction, Energy Mean-field approximation Tensor

Applications for data analysis, data compression, data mining, pattern recognition,

Applications for data analysis, data compression, data mining, pattern recognition,

Convex rank-1 decomposition minimizing the KL-div. Rank-1 tensors Flat manifold

Many-body approximation ensures the flattens of the model space Rank-1

The KL-divergence is not perfect measure 9 The KL-divergence has

Convex rank-1 decomposition minimizing the KL-div. Convex optimization Non-convex optimization

We can adjust model properties by changing function. Deformed many-body

Deformed many-body approximation for non-negative tensors 15 Control relation between

Deformed many-body approximation for non-negative tensors One-body approx. Deformed product

Deformed many-body approximation for non-negative tensors 17 One-body approx. Two-body

Deformed many-body approximation for non-negative tensors 18 One-body approx. Two-body

Theoretical idea behind the proposal Dually-flat manifold generated by convex

The projection onto eχ -flat manifold globally minimizes the Bregman

The projection onto eχ -flat manifold globally minimizes the Bregman

Optimization via deformed natural gradient method Natural gradient method update

Example tensor reconstruction by proposal, χ(t)=t Larger capability Reconstruction for

Color image is decomposed into shape × color Approximation 40×40×3×10

Color image is decomposed into shape × color × ≃

Tsallis deformed three-body approx. in noisy settings; χ(t)=tq More noisy

Kaniadakis deformed three-body approx. in noisy settings More noisy Less

Deformed low-rank approximation increasing high-order interactions increasing rank Limited capability

… Deformed CP decompositions by em-method 30 Non-convex 3-dim tensor

Deformed CP decompositions by em-method 31 Low-body tensors eq -flat

Noisy data reconst. by deformed low-rank approx. ① Traditional model

Implicit regularization induced by Tsallis deformation Training error Test error

Regularization in discrete density estimation Training samples Discrete density estimation

Summary □ Deformed low-rank approximation □ Deformed many-body approximation for