Slide 1

Slide 1 text

Non-negative low-rank approximations for multi-dimensional arrays on statistical manifold
Kazu Ghalamkari (1,2), Mahito Sugiyama (1,2)
International Conference on Information Geometry for Data Science (IG4DS 2022)
1: The Graduate University for Advanced Studies, SOKENDAI  2: National Institute of Informatics

Slide 2

Slide 2 text

Motivation
□ Non-negative low-rank approximation of data with various structures: approximate data with a linear combination of fewer bases (principal components) for feature extraction, memory reduction, and pattern discovery. 😀

Slide 3

Slide 3 text

Motivation
□ Non-negative low-rank approximation of data with various structures: approximate data with a linear combination of fewer bases (principal components) for feature extraction, memory reduction, and pattern discovery. 😀
□ The non-negative constraint improves interpretability.

Slide 4

Slide 4 text

Motivation
□ Non-negative low-rank approximation of data with various structures: approximate data with a linear combination of fewer bases (principal components) for feature extraction, memory reduction, and pattern discovery. 😀
□ The non-negative constraint improves interpretability.
□ Low-rank approximation with non-negative constraints is usually solved by gradient methods → appropriate settings for the stopping criterion, learning rate, and initial values are necessary. 😢

Slide 5

Slide 5 text

Strategy
□ Model the data with a probability mass function on a directed acyclic graph (DAG).
□ Utilize the projection theory of information geometry.

Slide 9

Slide 9 text

Contribution
□ LTR: faster Tucker-rank reduction. No worries about initial values, stopping criterion, or learning rate 😄
Information-geometric analysis using distributions on DAGs that correspond to data structures.

Slide 10

Slide 10 text

Contribution
□ LTR: faster Tucker-rank reduction. No worries about initial values, stopping criterion, or learning rate 😄 (Rank-1 here means rank (1,1,1).)
Information-geometric analysis using distributions on DAGs that correspond to data structures.

Slide 11

Slide 11 text

Contribution
□ LTR: faster Tucker-rank reduction. No worries about initial values, stopping criterion, or learning rate 😄 (Rank-1 here means rank (1,1,1).)
□ A1GM: faster rank-1 NMF with missing values. Solve the task as a coupled NMF; find the most dominant factor rapidly.
Information-geometric analysis using distributions on DAGs that correspond to data structures.

Slide 12

Slide 12 text

Contents
□ Motivation, Strategy, and Contributions
□ Introduction of the log-linear model on a DAG
□ The best rank-1 approximation formula
□ Legendre Tucker-Rank Reduction (LTR)
□ The best rank-1 NMMF
□ A1GM: faster rank-1 NMF with missing values
□ Theoretical Remarks
□ Conclusion
Code: github.com/gkazunii/Legendre-tucker-rank-reduction, github.com/gkazunii/A1GM

Slide 13

Slide 13 text

Modeling tensors and matrices
□ Flexible modeling is required to capture the structure of various data → formulate low-rank approximations with probabilistic models on DAGs.

Slide 14

Slide 14 text

Log-linear model on a directed acyclic graph (DAG)
□ DAG (poset): a set with a relation ≤ is a poset ⇔ for all elements s1, s2, s3 the following three properties are satisfied:
(1) Reflexivity: s1 ≤ s1
(2) Antisymmetry: s1 ≤ s2, s2 ≤ s1 ⇒ s1 = s2
(3) Transitivity: s1 ≤ s2, s2 ≤ s3 ⇒ s1 ≤ s3
[Mahito Sugiyama, Hiroyuki Nakahara, and Koji Tsuda, "Tensor balancing on statistical manifold," ICML 2017]
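The three axioms can be checked mechanically on a small finite example. A minimal sketch (the element set, the component-wise order, and the name `is_poset` are illustrative, not from the talk):

```python
from itertools import product

def is_poset(elements, leq):
    """Check reflexivity, antisymmetry, and transitivity of `leq`
    over a finite set of elements."""
    for s1, s2, s3 in product(elements, repeat=3):
        if not leq(s1, s1):                                  # (1) reflexivity
            return False
        if leq(s1, s2) and leq(s2, s1) and s1 != s2:         # (2) antisymmetry
            return False
        if leq(s1, s2) and leq(s2, s3) and not leq(s1, s3):  # (3) transitivity
            return False
    return True

# Index pairs of a 2x2 matrix with the component-wise order,
# the kind of index DAG used for tensors later in the talk.
S = [(i, j) for i in (1, 2) for j in (1, 2)]
leq = lambda a, b: a[0] <= b[0] and a[1] <= b[1]
```

`is_poset(S, leq)` returns True; replacing `leq` with a strict order breaks reflexivity.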

Slide 15

Slide 15 text

Log-linear model on a directed acyclic graph (DAG)
□ DAG (poset): a set with a relation ≤ is a poset ⇔ for all elements s1, s2, s3: (1) reflexivity: s1 ≤ s1; (2) antisymmetry: s1 ≤ s2, s2 ≤ s1 ⇒ s1 = s2; (3) transitivity: s1 ≤ s2, s2 ≤ s3 ⇒ s1 ≤ s3.
□ Log-linear model on a DAG: we define the log-linear model on a DAG as a mapping p from the DAG to (0,1). The natural parameters θ describe the model (θ-space).
[Mahito Sugiyama, Hiroyuki Nakahara, and Koji Tsuda, "Tensor balancing on statistical manifold," ICML 2017]

Slide 16

Slide 16 text

Log-linear model on a directed acyclic graph (DAG)
□ DAG (poset): a set with a relation ≤ is a poset ⇔ for all elements s1, s2, s3: (1) reflexivity: s1 ≤ s1; (2) antisymmetry: s1 ≤ s2, s2 ≤ s1 ⇒ s1 = s2; (3) transitivity: s1 ≤ s2, s2 ≤ s3 ⇒ s1 ≤ s3.
□ Log-linear model on a DAG: we define the log-linear model on a DAG as a mapping p from the DAG to (0,1). The natural parameters θ describe the model (θ-space). We can also describe the model by the expectation parameters η, using the Möbius function (η-space).
[Mahito Sugiyama, Hiroyuki Nakahara, and Koji Tsuda, "Tensor balancing on statistical manifold," ICML 2017]
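As a concrete toy instance, take the simplest DAG, a totally ordered chain. A sketch of the θ → p and p → η maps under that assumption (the chain and the parameter values are illustrative; the talk uses the DAG of tensor indices instead):

```python
import numpy as np

def p_from_theta(theta):
    """Log-linear model on a chain: log p(x) = sum of theta_s over s <= x,
    with the theta of the least element absorbing the normalization."""
    logp = np.cumsum(theta)
    logp -= np.log(np.exp(logp).sum())  # normalize so that p sums to 1
    return np.exp(logp)

def eta_from_p(p):
    """Expectation parameters on a chain: eta_s = sum of p(x) over x >= s."""
    return np.cumsum(p[::-1])[::-1]

theta = np.array([0.0, 0.3, -0.5, 0.2])
p = p_from_theta(theta)
eta = eta_from_p(p)
```

On a chain the Möbius inversion is just a difference: `np.diff(np.log(p))` recovers `theta[1:]`.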

Slide 18

Slide 18 text

Introducing DAGs for tensors

Slide 21

Slide 21 text

Describe a tensor with (θ, η)

Slide 23

Slide 23 text

Describe a tensor with (θ, η): the Möbius inversion formula

Slide 25

Slide 25 text

Describe a tensor with (θ, η)
Relation between a distribution and a tensor (via the Möbius inversion formula):
- random variables ↔ the indices (i, j, k) of the tensor
- sample space ↔ the index set
- probability values ↔ the tensor values P_ijk

Slide 26

Slide 26 text

One-body and many-body parameters
The θ- and η-parameters are divided into one-body parameters (at most one index differs from 1, e.g. θ_i11) and many-body parameters (two or more indices differ from 1, e.g. θ_ijk with j, k > 1).

Slide 27

Slide 27 text

θ-representation of a rank-1 tensor
Rank-1 condition (θ-representation): all of the many-body θ-parameters are 0. The set of such tensors is the rank-1 subspace.

Slide 28

Slide 28 text

θ-representation of a rank-1 tensor
Rank-1 condition (θ-representation): all of the many-body θ-parameters are 0 (the rank-1 subspace).
The rank-1 subspace is e-flat, so the projection onto it is unique.

Slide 29

Slide 29 text

θ-representation of a rank-1 tensor
Rank-1 condition (θ-representation): all of the many-body θ-parameters are 0 (the rank-1 subspace). The rank-1 subspace is e-flat, so the projection onto it is unique.
We can find the projection destination by a gradient method, but gradient methods require appropriate settings for the stopping criterion, learning rate, and initial values. 😢

Slide 30

Slide 30 text

θ-representation of a rank-1 tensor
Rank-1 condition (θ-representation): all of the many-body θ-parameters are 0 (the rank-1 subspace). The rank-1 subspace is e-flat, so the projection onto it is unique.
We can find the projection destination by a gradient method, but gradient methods require appropriate settings for the stopping criterion, learning rate, and initial values. 😢 Let us instead describe the rank-1 condition with the η-parameters.

Slide 31

Slide 31 text

η-representation of a rank-1 tensor
Rank-1 condition (θ-representation): all of the many-body θ-parameters are 0.
Rank-1 condition (η-representation): η_ijk = η_i11 η_1j1 η_11k.

Slide 32

Slide 32 text

η-representation of a rank-1 tensor
Rank-1 condition (θ-representation): all of the many-body θ-parameters are 0.
Rank-1 condition (η-representation): η_ijk = η_i11 η_1j1 η_11k.
The m-projection does not change the one-body η-parameters [Shun-ichi Amari, Information Geometry and Its Applications, Theorem 11.6].

Slide 33

Slide 33 text

าง ๐œ‚๐‘–๐‘—๐‘˜ = าง ๐œ‚๐‘–11 าง ๐œ‚1๐‘—1 าง ๐œ‚11๐‘˜ Find the best rank-1 approximation 33 One-body parameter Many-body parameter Rank-1 condition (๐œผ- representation) Rank-1 condition (๐œฝ-representation) Rank-1 subspace Its all many-body ๐œƒ-parameters are 0.

Slide 34

Slide 34 text

าง ๐œ‚๐‘–๐‘—๐‘˜ = าง ๐œ‚๐‘–11 าง ๐œ‚1๐‘—1 าง ๐œ‚11๐‘˜ Find the best rank-1 approximation 34 One-body parameter Many-body parameter Mรถbius inversion formula = ๐œ‚๐‘–11 ๐œ‚1๐‘—1 ๐œ‚11๐‘˜ Rank-1 condition (๐œผ- representation) Rank-1 condition (๐œฝ-representation) Rank-1 subspace All ๐œผ-parameters after the projection are identified. Using inversion formula, we found the projection destination. Its all many-body ๐œƒ-parameters are 0.

Slide 35

Slide 35 text

Best rank-1 tensor formula minimizing the KL divergence (d = 3)
The best rank-1 approximation of P ∈ ℝ_{>0}^{I×J×K}, the one that minimizes the KL divergence from P, is given in closed form. We reproduce the result of K. Huang et al., "Kullback-Leibler principal component for tensors is not NP-hard," ACSSC 2017.

Slide 36

Slide 36 text

Best rank-1 tensor formula minimizing the KL divergence (d = 3)
The best rank-1 approximation of P ∈ ℝ_{>0}^{I×J×K}, the one that minimizes the KL divergence from P, is given in closed form. We reproduce the result of K. Huang et al., "Kullback-Leibler principal component for tensors is not NP-hard," ACSSC 2017. (By the way, minimizing the Frobenius error is NP-hard.)

Slide 37

Slide 37 text

Best rank-1 tensor formula minimizing the KL divergence (d = 3)
The closed form is a product of three normalized vectors, depending only on i, only on j, and only on k. A tensor with d indices is a joint distribution of d random variables; a vector with a single index is an independent distribution of one random variable.
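Concretely, the closed form multiplies the three normalized mode marginals back together and rescales by the total sum. A numpy sketch of that formula (variable names are mine):

```python
import numpy as np

def best_rank1_kl(P):
    """Closed-form best rank-1 approximation of a positive tensor P
    under the KL divergence: the outer product of the normalized
    mode marginals, rescaled to the total sum of P."""
    S = P.sum()
    a = P.sum(axis=(1, 2)) / S   # normalized vector depending only on i
    b = P.sum(axis=(0, 2)) / S   # normalized vector depending only on j
    c = P.sum(axis=(0, 1)) / S   # normalized vector depending only on k
    return S * np.einsum("i,j,k->ijk", a, b, c)

P = np.arange(1.0, 25.0).reshape(2, 3, 4)   # example positive tensor
Q = best_rank1_kl(P)
```

The m-projection keeps the one-body η-parameters, so all three mode marginals of `Q` match those of `P`, and `Q` has rank 1.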

Slide 38

Slide 38 text

Best rank-1 tensor formula minimizing the KL divergence (d = 3)
Rank-1 approximation thus approximates a joint distribution by a product of independent distributions. This is a mean-field approximation: a methodology in physics for reducing a many-body problem to a one-body problem.

Slide 39

Slide 39 text

Mean-field approximation and rank-1 approximation
MFA of a Boltzmann machine: p(x) = (1/Z(θ)) exp( Σ_i θ_i x_i + Σ_{i<j} θ_ij x_i x_j ), with bias θ_i and interaction θ_ij, and expectations η_i = Σ_{x_1=0}^{1} ⋯ Σ_{x_n=0}^{1} x_i p(x). The goal is to approximate p by a tractable p̂, measured by a KL divergence.

Slide 40

Slide 40 text

Mean-field approximation and rank-1 approximation
MFA of a Boltzmann machine: p(x) = (1/Z(θ)) exp( Σ_i θ_i x_i + Σ_{i<j} θ_ij x_i x_j ), with bias θ_i and interaction θ_ij, and η_i = Σ_{x_1=0}^{1} ⋯ Σ_{x_n=0}^{1} x_i p(x).
The mean-field approximation restricts p to p̂(x) = (1/Z(θ)) exp( Σ_i θ_i x_i ) = p(x_1) ⋯ p(x_n).

Slide 41

Slide 41 text

๐‘‚ 2๐‘› ๐ท๐พ๐ฟ ๐‘, ฦธ ๐‘ ๐ท๐พ๐ฟ ฦธ ๐‘๐‘’ , ๐‘ าง ๐œ‚๐‘– = sigmoid ๐œƒ๐‘– + เท ๐‘˜ ๐œƒ๐‘˜๐‘— าง ๐œ‚๐‘˜ 41 Mean-field approximation and rank-1 approximation MF equations MFA of Boltzmann-machine ๐‘ ๐’™ = 1 ๐‘(๐œฝ) exp เท ๐‘– ๐œƒ๐‘– ๐‘ฅ๐‘– + เท ๐‘–<๐‘— ๐œƒ๐‘–๐‘— ๐‘ฅ๐‘– ๐‘ฅ๐‘— ๐œ‚๐‘– = เท ๐‘ฅ1=0 1 โ‹ฏ เท ๐‘ฅ๐‘›=0 1 ๐‘ฅ๐‘– ๐‘ ๐’™ ๐‘‚ 2๐‘› Interaction Bias

Slide 42

Slide 42 text

Rank-1 approximation ๐‘๐œƒ (๐‘–, ๐‘—, ๐‘˜) = exp เท ๐‘–โ€ฒ=1 ๐‘– เท ๐‘—โ€ฒ=1 ๐‘— เท ๐‘˜โ€ฒ=1 ๐‘˜ ๐œƒ๐‘–โ€ฒ๐‘—โ€ฒ๐‘˜โ€ฒ ๐‘‚ 2๐‘› ๐ท๐พ๐ฟ ๐‘, ฦธ ๐‘ ๐ท๐พ๐ฟ ๐‘, ฦธ ๐‘ MF equations Set of products of independent distributions ๐œ‚๐‘–11 = เท ๐‘—โ€ฒ=1 ๐ฝ เท ๐‘˜โ€ฒ=1 ๐พ ๐’ซ๐‘–๐‘—โ€ฒ๐‘˜โ€ฒ าง ๐œ‚๐‘– = sigmoid ๐œƒ๐‘– + เท ๐‘˜ ๐œƒ๐‘˜๐‘— าง ๐œ‚๐‘˜ 42 Mean-field approximation and rank-1 approximation ๐ท๐พ๐ฟ ฦธ ๐‘๐‘’ , ๐‘ MFA of Boltzmann-machine ๐‘ ๐’™ = 1 ๐‘(๐œฝ) exp เท ๐‘– ๐œƒ๐‘– ๐‘ฅ๐‘– + เท ๐‘–<๐‘— ๐œƒ๐‘–๐‘— ๐‘ฅ๐‘– ๐‘ฅ๐‘— ๐œ‚๐‘– = เท ๐‘ฅ1=0 1 โ‹ฏ เท ๐‘ฅ๐‘›=0 1 ๐‘ฅ๐‘– ๐‘ ๐’™ ๐‘‚ 2๐‘› Interaction Bias

Slide 43

Slide 43 text

Rank-1 approximation ๐‘๐œƒ (๐‘–, ๐‘—, ๐‘˜) = exp เท ๐‘–โ€ฒ=1 ๐‘– เท ๐‘—โ€ฒ=1 ๐‘— เท ๐‘˜โ€ฒ=1 ๐‘˜ ๐œƒ๐‘–โ€ฒ๐‘—โ€ฒ๐‘˜โ€ฒ ๐‘‚ 2๐‘› ๐‘‚ ๐ผ๐ฝ๐พ ๐ท๐พ๐ฟ ๐‘, ฦธ ๐‘ ๐ท๐พ๐ฟ ๐‘, ฦธ ๐‘ MF equations Set of products of independent distributions ๐œ‚๐‘–11 = เท ๐‘—โ€ฒ=1 ๐ฝ เท ๐‘˜โ€ฒ=1 ๐พ ๐’ซ๐‘–๐‘—โ€ฒ๐‘˜โ€ฒ าง ๐œ‚๐‘– = sigmoid ๐œƒ๐‘– + เท ๐‘˜ ๐œƒ๐‘˜๐‘— าง ๐œ‚๐‘˜ ๐ถ๐‘œ๐‘š๐‘๐‘ข๐‘ก๐‘Ž๐‘๐‘™๐‘’ 43 Mean-field approximation and rank-1 approximation ๐ท๐พ๐ฟ ฦธ ๐‘๐‘’ , ๐‘ MFA of Boltzmann-machine ๐‘ ๐’™ = 1 ๐‘(๐œฝ) exp เท ๐‘– ๐œƒ๐‘– ๐‘ฅ๐‘– + เท ๐‘–<๐‘— ๐œƒ๐‘–๐‘— ๐‘ฅ๐‘– ๐‘ฅ๐‘— ๐œ‚๐‘– = เท ๐‘ฅ1=0 1 โ‹ฏ เท ๐‘ฅ๐‘›=0 1 ๐‘ฅ๐‘– ๐‘ ๐’™ ๐‘‚ 2๐‘› Interaction Bias

Slide 44

Slide 44 text

Mean-field approximation and rank-1 approximation
- MFA of a Boltzmann machine: minimizes the inverse KL divergence; e-projection onto an e-flat space; the projection is not unique; no closed formula, only the MF equations η̄_i = σ( θ_i + Σ_k θ_ki η̄_k ) with cost O(2^n).
- Rank-1 approximation: minimizes the KL divergence; m-projection onto an e-flat space; the projection is unique and has a closed formula.

Slide 46

Slide 46 text

Formulate Tucker rank reduction by relaxing the rank-1 condition
Rank-1 condition (θ-representation): rank(P) = 1 ⇔ all of its many-body θ-parameters are 0 (only one-body parameters such as θ_211, θ_311, θ_121, θ_131, θ_112, θ_113 remain).
Expand the θ-parameters along the m-th axis into a rectangular matrix θ_(m) (mode-m expansion).
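The mode-m expansion is an ordinary tensor unfolding. A one-line numpy sketch (applied here to raw entries for illustration; LTR applies the same unfolding to the θ-parameters):

```python
import numpy as np

def mode_unfold(T, m):
    """Mode-m expansion: axis m becomes the rows, the remaining
    axes are flattened into the columns."""
    return np.moveaxis(T, m, 0).reshape(T.shape[m], -1)

T = np.arange(24).reshape(2, 3, 4)  # example 2x3x4 tensor
```

For a tensor of shape (I, J, K), `mode_unfold(T, 0)` is I x JK, `mode_unfold(T, 1)` is J x IK, and `mode_unfold(T, 2)` is K x IJ.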

Slide 47

Slide 47 text

๐œƒ(3) = ๐œƒ111 ๐œƒ211 ๐œƒ311 ๐œƒ121 0 0 ๐œƒ131 0 0 ๐œƒ112 0 0 0 0 0 0 0 0 ๐œƒ113 0 0 0 0 0 0 0 0 ๐œƒ(1) = ๐œƒ111 ๐œƒ121 ๐œƒ131 ๐œƒ112 0 0 ๐œƒ113 0 0 ๐œƒ211 0 0 0 0 0 0 0 0 ๐œƒ311 0 0 0 0 0 0 0 0 ๐œƒ(2) = ๐œƒ111 ๐œƒ211 ๐œƒ311 ๐œƒ112 0 0 ๐œƒ311 0 0 ๐œƒ121 0 0 0 0 0 0 0 0 ๐œƒ131 0 0 0 0 0 0 0 0 Formulate Tucker rank reduction by relaxing the rank-1 condition ๐œƒ๐‘–๐‘—๐‘˜ = 0 ๐œƒ112 ๐œƒ131 ๐œƒ121 ๐œƒ113 ๐œƒ211 ๐œƒ311 Expand the tensor by focusing on the ๐‘š-th axis into a rectangular matrix ๐œƒ(๐‘š) (mode-๐‘š expansion) Rank 1,1,1 rank ๐’ซ = 1 โŸบ its all manyโˆ’body ๐œƒ parameters are 0 Rank-1 condition (๐œฝ-representation) 47

Slide 48

Slide 48 text

Formulate Tucker rank reduction by relaxing the rank-1 condition
Rank (1,1,1): each mode-m expansion θ_(m) above has two bingos, i.e. two rows that are zero outside the first column.

Slide 49

Slide 49 text

Formulate Tucker rank reduction by relaxing the rank-1 condition
The first row and first column of each θ_(m) are the scaling factors; a bingo is a row (other than the first) whose entries outside the first column are all zero. At rank (1,1,1) each mode has two bingos.

Slide 50

Slide 50 text

The relationship between bingos and rank
Example with one bingo in mode 1 and rank (2,3,3):
θ_(1):  (one bingo)
  θ111 θ121 θ131 θ112 0    0    θ113 0    0
  θ211 0    0    0    0    0    0    0    0
  θ311 θ321 θ331 θ312 θ322 θ332 θ313 θ323 θ333
θ_(2):  (no bingo)
  θ111 θ211 θ311 θ112 0 θ312 θ113 0 θ313
  θ121 0    θ321 0    0 θ322 0    0 θ323
  θ131 0    θ331 0    0 θ332 0    0 θ333
θ_(3):  (no bingo)
  θ111 θ211 θ311 θ121 0 θ321 θ131 0 θ331
  θ112 0    θ312 0    0 θ322 0    0 θ332
  θ113 0    θ313 0    0 θ323 0    0 θ333

Slide 51

Slide 51 text

The relationship between bingos and rank
With one bingo in mode 1 (rank (2,3,3)), the input tensor P is m-projected onto the subspace B^(1) of tensors with one bingo in the mode-1 direction; the projection P̄ minimizes D_KL(P, P̄).

Slide 52

Slide 52 text

The relationship between bingos and rank
Bingo rule (d = 3): if the mode-k expansion θ_(k) of the natural parameters of a tensor P ∈ ℝ_{>0}^{I1×I2×I3} has b_k bingos, then rank(P) ≤ (I1 - b1, I2 - b2, I3 - b3). In the example above, one bingo in mode 1 and none in modes 2 and 3 gives rank ≤ (2,3,3).

Slide 53

Slide 53 text

Example: reduce the rank of an (8,8,3) tensor to (5,8,3) or less
STEP 1: choose a bingo location (which θ-parameters are forced to zero; the rest can be anything).

Slide 54

Slide 54 text

Example: reduce the rank of an (8,8,3) tensor to (5,8,3) or less
STEP 1: choose a bingo location; three bingos in mode 1 give mode-1 rank at most 8 - 3 = 5.

Slide 55

Slide 55 text

Example: reduce the rank of an (8,8,3) tensor to (5,8,3) or less
STEP 1: choose a bingo location (which θ-parameters are forced to zero; the rest can be anything).
STEP 2: replace the bingo part with the best rank-1 tensor. The shaded areas do not change their values in the projection.

Slide 56

Slide 56 text

Example: reduce the rank of an (8,8,3) tensor to (5,8,3) or less
STEP 2: replace the partial tensor in the red box using the best rank-1 approximation formula.

Slide 58

Slide 58 text

Example: reduce the rank of an (8,8,3) tensor to (5,8,3) or less
STEP 2: replace the partial tensor in the red box with the best rank-1 tensor. The result is the best tensor within the specified bingo space 😄, but there is no guarantee that it is the best rank-(5,8,3) approximation. 😢
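The two steps can be sketched in a few lines: choosing the bingo location amounts to choosing a trailing block of mode-1 slices, and the replacement uses the closed-form best rank-1 approximation. A simplified mode-1-only sketch (block choice and the test tensor are illustrative, not the full LTR of the paper):

```python
import numpy as np

def best_rank1_kl(P):
    """Closed-form best rank-1 KL approximation: rescaled outer
    product of the three mode marginals."""
    S = P.sum()
    a, b, c = P.sum(axis=(1, 2)), P.sum(axis=(0, 2)), P.sum(axis=(0, 1))
    return np.einsum("i,j,k->ijk", a, b, c) / S**2

def ltr_mode1(P, r):
    """Reduce the mode-1 rank of a positive tensor to at most r by
    replacing the trailing block of slices (the "red box") with its
    best rank-1 approximation."""
    Q = P.copy()
    Q[r - 1:] = best_rank1_kl(P[r - 1:])
    return Q

P = np.exp(np.sin(np.arange(192.0))).reshape(8, 8, 3)  # generic positive tensor
Q = ltr_mode1(P, 5)
```

Because the replaced block is exactly rank 1, the mode-1 unfolding of `Q` has at most 4 + 1 = 5 independent rows, and the total sum (hence normalization) is preserved.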

Slide 59

Slide 59 text

Example: reduce the rank of an (8,8,3) tensor to (5,7,3) or less
The same two steps apply: choose the bingo locations, then replace each bingo part with the best rank-1 tensor. The shaded areas do not change their values in the projection.

Slide 60

Slide 60 text

Experimental results (synthetic data)
LTR is faster with competitive approximation performance.

Slide 61

Slide 61 text

Experimental results (real data)
LTR is faster with competitive approximation performance (tensor sizes (92, 112, 400) and (9, 9, 512, 512, 3)).

Slide 63

Slide 63 text

Strategy for rank-1 NMF with missing values
□ Collect the missing values in a corner of the matrix to solve the task as a coupled NMF. The weight is Φ_ij = 0 if X_ij is missing and 1 otherwise, and the objective uses the element-wise product Φ ∘ X.
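A gradient-style baseline for this objective is rank-1 KL-NMF with multiplicative updates masked by Φ; this is the kind of iterative method (cf. KL-WNMF in the experiments) that A1GM replaces with a closed form. A sketch (initialization and iteration count are illustrative):

```python
import numpy as np

def rank1_kl_wnmf(X, Phi, n_iter=2000):
    """Multiplicative-update rank-1 KL-NMF that ignores missing
    entries through the 0/1 mask Phi; X is fit by outer(w, h)."""
    w = np.ones(X.shape[0])
    h = np.full(X.shape[1], X[Phi == 1].mean())
    for _ in range(n_iter):
        R = Phi * X / np.outer(w, h)                   # ratios on observed entries
        w *= (R @ h) / np.maximum(Phi @ h, 1e-12)
        R = Phi * X / np.outer(w, h)
        h *= (w @ R) / np.maximum(w @ Phi, 1e-12)
    return w, h

X = np.outer([1.0, 2.0, 3.0], [2.0, 1.0, 4.0])  # exactly rank-1 data
Phi = np.ones_like(X)
Phi[2, 2] = 0.0                                  # one missing entry
w, h = rank1_kl_wnmf(X, Phi)
```

A1GM instead gathers the missing entries into a corner and applies a closed-form solution, avoiding the iteration entirely.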

Slide 64

Slide 64 text

Strategy for rank-1 NMF with missing values
□ Collect the missing values in a corner of the matrix (Φ_ij = 0 if X_ij is missing, 1 otherwise); the problem then becomes equivalent to NMMF (Takeuchi et al., 2013).

Slide 65

Slide 65 text

NMMF, non-negative multiple matrix factorization (Takeuchi et al., 2013): factorize a target matrix (e.g. user × artist) jointly with auxiliary matrices that share its row or column factors (e.g. matrices involving tags).

Slide 66

Slide 66 text

The best rank-1 approximation of NMMF

Slide 67

Slide 67 text

Modeling of NMMF: a one-to-one correspondence between the matrix triple and a distribution on a DAG.

Slide 68

Slide 68 text

One-body and many-body parameters
(X, Y, Z) is simultaneously rank-1 decomposable ⇔ it can be written as (w ⊗ h, a ⊗ h, w ⊗ b). The parameters split into one-body and two-body parameters.

Slide 69

Slide 69 text

Information geometry of rank-1 NMMF
(X, Y, Z) is simultaneously rank-1 decomposable ⇔ it can be written as (w ⊗ h, a ⊗ h, w ⊗ b) ⇔ all of the two-body θ-parameters are 0 (the simultaneous rank-1 θ-condition).

Slide 70

Slide 70 text

Information geometry of rank-1 NMMF
Simultaneous rank-1 θ-condition: all of the two-body θ-parameters are 0.
Simultaneous rank-1 η-condition: η_ij = η_i1 η_1j.
The solution space is e-flat, so the projection onto it is unique.

Slide 71

Slide 71 text

Find the global optimal solution of rank-1 NMMF
The m-projection does not change the one-body η-parameters [Shun-ichi Amari, Information Geometry and Its Applications, Theorem 11.6], and the simultaneous rank-1 η-condition is η_ij = η_i1 η_1j.

Slide 72

Slide 72 text

Find the global optimal solution of rank-1 NMMF
Combining the two facts, i.e. the m-projection preserves the one-body η-parameters and η_ij = η_i1 η_1j on the rank-1 subspace, all η-parameters after the projection are identified, giving the global optimum in closed form.

Slide 73

Slide 73 text

Rank-1 NMF with missing values
□ NMMF can be viewed as a special case of NMF with missing values; the two formulations are equivalent.

Slide 74

Slide 74 text

Rank-1 NMF with missing values
□ NMMF can be viewed as a special case of NMF with missing values; the two formulations are equivalent.
□ NMF is invariant under row and column permutations.

Slide 75

Slide 75 text

A1GM: algorithm
Step 1: gather the missing values in the bottom-right corner by permutation.
Step 2: use the formula for the best rank-1 NMMF.
Step 3: undo the permutation.
This finds the exact solution. But can we always gather the missing values into a corner? 🤔❓
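Step 1 can be sketched as a stable argsort that pushes every row and column containing a missing value (NaN here) to the bottom/right, and step 3 is the inverse permutation. A minimal sketch (it assumes a pure permutation suffices, the case the next slides relax):

```python
import numpy as np

def gather_missing(X):
    """Permute rows and columns so that rows/columns containing
    missing entries (NaN) move to the bottom/right.  Returns the
    permuted matrix and the permutations for undoing it later."""
    miss = np.isnan(X)
    row_perm = np.argsort(miss.any(axis=1), kind="stable")  # missing rows last
    col_perm = np.argsort(miss.any(axis=0), kind="stable")  # missing cols last
    return X[np.ix_(row_perm, col_perm)], row_perm, col_perm

X = np.array([[1.0, np.nan, 2.0],
              [3.0, 4.0,    5.0],
              [6.0, np.nan, 7.0]])
G, rp, cp = gather_missing(X)
```

Applying the inverse permutations `np.argsort(rp)` and `np.argsort(cp)` to `G` restores `X`, which is what step 3 does to the reconstructed factors.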

Slide 76

Slide 76 text

There are examples where permutations alone cannot collect the missing values into a corner.

Slide 77

Slide 77 text

In that case, add missing values so that the problem can be solved as NMMF.

Slide 78

Slide 78 text

In that case, add missing values so that the problem can be solved as NMMF. The reconstruction error worsens. 😢

Slide 79

Slide 79 text

In that case, add missing values so that the problem can be solved as NMMF. The reconstruction error worsens 😢, but we gain efficiency. 😀

Slide 80

Slide 80 text

🙆 Data that A1GM is good at, and 🙅 data it is not good at
🙅 Missing values evenly distributed over rows and columns.

Slide 81

Slide 81 text

🙆 Data that A1GM is good at, and 🙅 data it is not good at
🙅 Missing values evenly distributed over rows and columns.
🙆 Missing values heavily concentrated in certain rows and columns. In some real datasets missing values tend to concentrate in certain columns, e.g. a disconnected sensing device or an optional answer field in a questionnaire.

Slide 82

Slide 82 text

A1GM: algorithm
Step 1: increase the number of missing values (if needed).
Step 2: gather the missing values in the bottom-right corner.
Step 3: use the rank-1 NMMF formula and undo the permutation.

Slide 83

Slide 83 text

Experiments on real data
□ A1GM is compared with gradient-based KL-WNMF.
- Relative runtime < 1 means A1GM is faster than KL-WNMF.
- Relative error > 1 means A1GM's reconstruction error is worse than KL-WNMF's.
- The increase rate is the ratio of the number of missing values after the addition in step 1.
A1GM is 5 to 10 times faster. When no missing values need to be added, it finds the best solution; when missing values are added, accuracy decreases.

Slide 85

Slide 85 text

Theoretical remark 1: extended NMMF
□ With Φ_ij = 0 if X_ij is missing and 1 otherwise, the rank of the weight matrix Φ is 2 after adding missing values.

Slide 86

Slide 86 text

Theoretical remark 1: extended NMMF
□ The rank of the weight matrix Φ is 2 after adding missing values.
□ Can we exactly solve rank-1 NMF whenever rank(Φ) = 2?

Slide 87

Slide 87 text

Theoretical remark 1: extended NMMF
We derive the best rank-1 approximation of an extended NMMF.

Slide 88

Slide 88 text

Theoretical remark 1: extended NMMF
The best rank-1 approximation of the extended NMMF is equivalent to rank-1 NMF with the weight Φ (Φ_ij = 0 if X_ij is missing, 1 otherwise).

Slide 89

Slide 89 text

Theoretical remark 1: extended NMMF
If rank(Φ) ≤ 2, the weight matrix can be transformed by permutation into the required form, so we can exactly solve rank-1 NMF with missing values whenever rank(Φ) ≤ 2.

Slide 90

Slide 90 text

Theoretical remark 2: connection to balancing
□ Matrix balancing: transform a matrix into a balanced (doubly stochastic) matrix [Mahito Sugiyama, Hiroyuki Nakahara, and Koji Tsuda, "Tensor balancing on statistical manifold," ICML 2017].

Slide 91

Slide 91 text

Theoretical remark 2: connection to balancing
□ Matrix balancing: transform a matrix into a balanced (doubly stochastic) matrix; the balancing condition constrains the row and column sums.
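Matrix balancing is classically computed by the Sinkhorn iteration (the cited work instead formulates it as a projection in the (θ, η) coordinates of the DAG model). A sketch of the classical iteration only:

```python
import numpy as np

def sinkhorn_balance(A, n_iter=1000):
    """Sinkhorn iteration: alternately normalize rows and columns of a
    positive square matrix until it is approximately doubly stochastic."""
    B = A.astype(float).copy()
    for _ in range(n_iter):
        B /= B.sum(axis=1, keepdims=True)  # make row sums 1
        B /= B.sum(axis=0, keepdims=True)  # make column sums 1
    return B

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = sinkhorn_balance(A)
```

For a strictly positive matrix the iteration converges, and the limit is the balanced matrix.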

Slide 92

Slide 92 text

Theoretical remark 2: connection to balancing
Combining the balancing condition with the rank-1 condition (all many-body θ-parameters are 0), the balanced rank-1 matrix is unique.

Slide 93

Slide 93 text

Conclusion
□ Information geometry with distributions on DAGs matched to the data structure lets us describe low-rank conditions using (θ, η):
  - rank-1 condition (θ-representation): all many-body θ̄_ijk are 0
  - rank-1 condition (η-representation): η̄_ijk = η̄_i11 η̄_1j1 η̄_11k
□ Closed formula for the best rank-1 approximation of NMMF.
□ A1GM: faster rank-1 NMF with missing values.