
Non-negative low-rank approximations for multi-dimensional arrays on statistical manifold

Kazu Ghalamkari
September 23, 2022

International Conference on Information Geometry for Data Science (IG4DS 2022), virtual, 2022.9.19–23

Transcript

  1. Non-negative low-rank approximations for multi-dimensional arrays on statistical manifold
    Kazu Ghalamkari (1, 2), Mahito Sugiyama (1, 2)
    International Conference on Information Geometry for Data Science (IG4DS 2022)
    1: The Graduate University for Advanced Studies, SOKENDAI; 2: National Institute of Informatics
  2. Motivation
    □ Non-negative low-rank approximation of data with various structures
    Approximate the data by a linear combination of fewer bases (principal components) for feature extraction, memory reduction, and pattern discovery. 😀
  3. Motivation
    As slide 2, adding: the non-negativity constraint improves interpretability.
  4. Motivation
    As slide 3, adding: existing low-rank approximations with non-negativity constraints are based on gradient methods, so appropriate settings for the stopping criterion, learning rate, and initial values are necessary. 😢
  5. Strategy
    □ Model the data with a probability mass function on a directed acyclic graph (DAG).
    □ Utilize the projection theory of information geometry.
  9. Contribution
    Information-geometric analysis using distributions on DAGs that correspond to data structures
    □ LTR: faster Tucker-rank reduction
    No worries about initial values, stopping criterion, or learning rate. 😄
  10. Contribution
    As slide 9, adding: rank-1 = Tucker rank (1, 1, 1).
  11. Contribution
    As slide 10, adding:
    □ A1GM: faster rank-1 NMF with missing values. Solve the task as a coupled NMF and find the most dominant factor rapidly.
  12. Contents
    □ Motivation, Strategy, and Contributions
    □ Introduction of the log-linear model on DAGs
    □ The best rank-1 approximation formula
    □ Legendre Tucker-Rank Reduction (LTR): github.com/gkazunii/Legendre-tucker-rank-reduction
    □ The best rank-1 NMMF
    □ A1GM: faster rank-1 NMF with missing values: github.com/gkazunii/A1GM
    □ Theoretical Remarks
    □ Conclusion
  13. Modeling tensors and matrices
    □ Flexible modeling is required to capture the structure of various data.
    Formulate low-rank approximations with probabilistic models on DAGs.
  14. Log-linear model on Directed Acyclic Graph (DAG)
    □ DAG (poset): a set with a partial order $\le$ is a DAG ⇔ for all $s_1, s_2, s_3$ in the set, the following three properties are satisfied:
      (1) Reflexivity: $s_1 \le s_1$
      (2) Antisymmetry: $s_1 \le s_2,\ s_2 \le s_1 \Rightarrow s_1 = s_2$
      (3) Transitivity: $s_1 \le s_2,\ s_2 \le s_3 \Rightarrow s_1 \le s_3$
    Mahito Sugiyama, Hiroyuki Nakahara and Koji Tsuda, "Tensor balancing on statistical manifold", ICML 2017.
  15. Log-linear model on Directed Acyclic Graph (DAG)
    As slide 14, adding:
    □ Log-linear model on a DAG: we define the log-linear model on a DAG as a mapping $p$ from the DAG to $(0, 1)$. The natural parameters $\boldsymbol{\theta}$ describe the model (θ-space).
  16. Log-linear model on Directed Acyclic Graph (DAG)
    As slide 15, adding: we can also describe the model by the expectation parameters $\boldsymbol{\eta}$ with the Möbius function (η-space).
  18. Introducing DAGs for tensors

  21. Describe a tensor with (θ, η)

  23. Describe a tensor with (θ, η): Möbius inversion formula

  25. Describe a tensor with (θ, η)
    Relation between a distribution and a tensor:
      random variables: $(i, j, k)$, the indices of the tensor
      sample space: the index set
      probability values: the tensor values $\mathcal{P}_{ijk}$
    Möbius inversion formula
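The (θ, η) description above can be sketched in NumPy for a small positive tensor. This is a minimal sketch that assumes the componentwise order on indices used by the log-linear model of Sugiyama et al. (2017): $\log\mathcal{P}_{ijk}=\sum_{i'\le i,\,j'\le j,\,k'\le k}\theta_{i'j'k'}$ and $\eta_{ijk}=\sum_{i'\ge i,\,j'\ge j,\,k'\ge k}\mathcal{P}_{i'j'k'}$. On this grid-shaped DAG, Möbius inversion reduces to finite differences along each axis. Indices are 0-based in the code, and all function names are illustrative rather than taken from the authors' repositories.

```python
import numpy as np

def theta_from_tensor(P):
    """Natural parameters: log P is the cumulative sum of theta over the lower set.
    Inverting that sum (Möbius inversion on the grid) = forward differences on every axis."""
    theta = np.log(P)
    for axis in range(P.ndim):
        theta = np.diff(theta, axis=axis, prepend=0.0)
    return theta

def tensor_from_theta(theta):
    """Cumulative sums along every axis rebuild log P from theta."""
    log_p = theta
    for axis in range(theta.ndim):
        log_p = np.cumsum(log_p, axis=axis)
    return np.exp(log_p)

def eta_from_tensor(P):
    """Expectation parameters: eta[i,j,k] = sum of P over {i'>=i, j'>=j, k'>=k}."""
    eta = P
    for axis in range(P.ndim):
        eta = np.flip(np.cumsum(np.flip(eta, axis=axis), axis=axis), axis=axis)
    return eta

def tensor_from_eta(eta):
    """Möbius inversion of the upper-set sums: eta[i] - eta[i+1] along every axis."""
    P = eta
    for axis in range(eta.ndim):
        P = -np.diff(P, axis=axis, append=0.0)
    return P

rng = np.random.default_rng(0)
P = rng.random((3, 4, 2)) + 0.1
P /= P.sum()                      # normalize so that P is a probability mass function
theta, eta = theta_from_tensor(P), eta_from_tensor(P)
print(np.allclose(tensor_from_theta(theta), P), np.allclose(tensor_from_eta(eta), P))
print(eta[0, 0, 0])               # eta at the minimum element is the total mass (1 up to rounding)
```

The θ round trip uses lower-set cumulative sums and the η round trip uses upper-set sums, which is the duality the following slides exploit.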
  26. One-body and many-body parameters

  27. θ-representation of rank-1 tensor
    Rank-1 condition (θ-representation): all of its many-body θ-parameters are 0. Such tensors form the rank-1 subspace.
  28. θ-representation of rank-1 tensor
    As slide 27, adding: the rank-1 subspace is e-flat, so the projection onto it is unique.
  29. θ-representation of rank-1 tensor
    As slide 28, adding: we can find the projection destination by a gradient method, but gradient methods require appropriate settings for the stopping criterion, learning rate, and initial values. 😢
  30. θ-representation of rank-1 tensor
    As slide 29, adding: let us describe the rank-1 condition with the η-parameters.
  31. η-representation of rank-1 tensor
    Rank-1 condition (θ-representation): all many-body θ-parameters are 0.
    Rank-1 condition (η-representation): $\eta_{ijk} = \eta_{i11}\,\eta_{1j1}\,\eta_{11k}$.
    Both describe the same rank-1 subspace.
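Both rank-1 conditions can be checked numerically on a tensor built as an outer product. This is a minimal sketch under the same assumed (θ, η) conventions as the lower-set/upper-set definitions from the log-linear model, re-defined here so the snippet runs on its own. With 0-based indices, index 0 plays the role of "1", so a parameter counts as many-body when at least two of its indices are nonzero. `many_body_mask` is a hypothetical helper.

```python
import numpy as np

def theta_from_tensor(P):
    theta = np.log(P)
    for axis in range(P.ndim):
        theta = np.diff(theta, axis=axis, prepend=0.0)   # Möbius inversion on the grid
    return theta

def eta_from_tensor(P):
    eta = P
    for axis in range(P.ndim):                            # upper-set (tail) sums
        eta = np.flip(np.cumsum(np.flip(eta, axis=axis), axis=axis), axis=axis)
    return eta

def many_body_mask(shape):
    """True where at least two (0-based) indices are nonzero, i.e. many-body parameters."""
    idx = np.indices(shape)
    return (idx != 0).sum(axis=0) >= 2

rng = np.random.default_rng(1)
a, b, c = rng.random(3) + 0.1, rng.random(4) + 0.1, rng.random(2) + 0.1
P = np.einsum("i,j,k->ijk", a, b, c)                      # a rank-1 tensor
P /= P.sum()

theta, eta = theta_from_tensor(P), eta_from_tensor(P)
# theta-representation: every many-body theta vanishes
print(np.allclose(theta[many_body_mask(P.shape)], 0.0))
# eta-representation: eta_ijk = eta_i11 * eta_1j1 * eta_11k (index 0 stands for "1")
outer = np.einsum("i,j,k->ijk", eta[:, 0, 0], eta[0, :, 0], eta[0, 0, :])
print(np.allclose(eta, outer))
```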
  32. η-representation of rank-1 tensor
    As slide 31, adding: the m-projection does not change the one-body η-parameters (Shun-ichi Amari, Information Geometry and Its Applications, 2008, Theorem 11.6).
  33. Find the best rank-1 approximation
    The projection destination satisfies the rank-1 condition (η-representation) $\bar{\eta}_{ijk} = \bar{\eta}_{i11}\,\bar{\eta}_{1j1}\,\bar{\eta}_{11k}$; equivalently, all of its many-body θ-parameters are 0.
  34. Find the best rank-1 approximation
    As slide 33, adding: since the m-projection does not change the one-body η-parameters,
    $$\bar{\eta}_{ijk} = \bar{\eta}_{i11}\,\bar{\eta}_{1j1}\,\bar{\eta}_{11k} = \eta_{i11}\,\eta_{1j1}\,\eta_{11k}.$$
    All η-parameters after the projection are thus identified, and the Möbius inversion formula gives the projection destination.
  35. Mean-field approximation and rank-1 approximation
    Best rank-1 tensor formula for minimizing the KL divergence ($d = 3$): the best rank-1 approximation of $\mathcal{P} \in \mathbb{R}_{>0}^{I\times J\times K}$, i.e., the rank-1 tensor that minimizes the KL divergence from $\mathcal{P}$, is given by a closed formula.
    We reproduce the result in K. Huang et al., "Kullback-Leibler principal component for tensors is not NP-hard," ACSSC 2017.
  36. Mean-field approximation and rank-1 approximation
    As slide 35, adding: by the way, minimizing the Frobenius error is NP-hard.
  37. Mean-field approximation and rank-1 approximation
    As slide 36, adding: a tensor with $d$ indices is a joint distribution of $d$ random variables, and a vector with only one index is a distribution of a single random variable. The formula is built from normalized vectors depending only on $i$, only on $j$, and only on $k$.
  38. Mean-field approximation and rank-1 approximation
    As slide 37, adding: rank-1 approximation approximates a joint distribution by a product of independent distributions. Mean-field approximation is a methodology in physics for reducing a many-body problem to a one-body problem.
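The closed formula can be sketched as the product of the normalized mode marginals, rescaled by the total mass. This is how we read the result of Huang et al. (2017) reproduced above. The sketch is minimal and assumes a strictly positive input and the generalized KL divergence $\sum(\mathcal{P}\log(\mathcal{P}/\mathcal{Q})-\mathcal{P}+\mathcal{Q})$, so that unnormalized tensors are allowed. `best_rank1_kl` and `gen_kl` are illustrative names.

```python
import numpy as np

def best_rank1_kl(P):
    """Rank-1 tensor lam * s1 (x) s2 (x) s3 built from the normalized mode marginals of P."""
    lam = P.sum()
    out = np.asarray(lam)
    for axis in range(P.ndim):
        others = tuple(ax for ax in range(P.ndim) if ax != axis)
        s = P.sum(axis=others) / lam          # normalized vector depending on one index only
        out = np.multiply.outer(out, s)
    return out

def gen_kl(P, Q):
    """Generalized KL divergence D(P, Q) for positive arrays."""
    return float(np.sum(P * np.log(P / Q) - P + Q))

rng = np.random.default_rng(2)
P = rng.random((4, 3, 5)) + 0.1
Q = best_rank1_kl(P)
# The one-body marginals (hence the one-body eta-parameters) are unchanged by the projection.
print(np.allclose(Q.sum(axis=(1, 2)), P.sum(axis=(1, 2))))
print(gen_kl(P, Q))
```

No initial values, learning rate, or stopping criterion appear. The output is computed in a single pass over the tensor.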
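  39. Mean-field approximation and rank-1 approximation
    MFA of the Boltzmann machine (BM):
    $$p(\mathbf{x}) = \frac{1}{Z(\boldsymbol{\theta})}\exp\Big(\sum_i \theta_i x_i + \sum_{i<j}\theta_{ij}x_i x_j\Big),$$
    with biases $\theta_i$ and interactions $\theta_{ij}$. Minimizing $D_{KL}(p, \hat{p})$ requires $\eta_i = \sum_{x_1=0}^{1}\cdots\sum_{x_n=0}^{1} x_i\,p(\mathbf{x})$.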
  40. Mean-field approximation and rank-1 approximation
    As slide 39, adding: without interactions the model factorizes into independent distributions, $\frac{1}{Z(\boldsymbol{\theta})}\exp\big(\sum_i \theta_i x_i\big) = p(x_1)\cdots p(x_n)$.
  41. Mean-field approximation and rank-1 approximation
    As slide 39, adding: computing $\eta_i$ exactly costs $O(2^n)$. Minimizing the reverse divergence $D_{KL}(\hat{p}_e, p)$ instead yields the MF equations
    $$\bar{\eta}_i = \operatorname{sigmoid}\Big(\theta_i + \sum_k \theta_{ik}\,\bar{\eta}_k\Big).$$
  42. Mean-field approximation and rank-1 approximation
    Rank-1 approximation: $p_\theta(i,j,k) = \exp\Big(\sum_{i'=1}^{i}\sum_{j'=1}^{j}\sum_{k'=1}^{k}\theta_{i'j'k'}\Big)$ is projected onto the set of products of independent distributions by minimizing $D_{KL}(p, \hat{p})$, using $\eta_{i11} = \sum_{j'=1}^{J}\sum_{k'=1}^{K}\mathcal{P}_{ij'k'}$.
    (Boltzmann-machine side as on slide 41.)
  43. Mean-field approximation and rank-1 approximation
    As slide 42, adding: the rank-1 side costs $O(IJK)$ and is computable, whereas the exact Boltzmann-machine side costs $O(2^n)$.
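  44. Mean-field approximation and rank-1 approximation
    - Rank-1 approximation, minimizing the KL divergence (m-projection onto an e-flat space): closed formula; unique.
    - Mean-field approximation of the BM, minimizing the KL divergence (m-projection onto an e-flat space): impossible in practice, since it costs $O(2^n)$; unique.
    - Mean-field approximation of the BM, minimizing the inverse KL divergence (e-projection onto an e-flat space): MF equations $\eta_i = \sigma\big(\theta_i + \sum_k \theta_{ik}\eta_k\big)$; not unique.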
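For contrast, the Boltzmann-machine side can be sketched on a tiny model. The exact one-body expectations $\eta_i$ enumerate all $2^n$ states, while the MF equations are solved by damped fixed-point iteration. This is a minimal sketch that assumes symmetric couplings with a zero diagonal (so $\sum_{i<j}\theta_{ij}x_ix_j=\tfrac12\mathbf{x}^\top\Theta\mathbf{x}$) and couplings weak enough for the iteration to converge. The damping and iteration count are arbitrary choices, and the function names are illustrative.

```python
import itertools
import numpy as np

def exact_eta(theta_bias, Theta):
    """eta_i = E[x_i] by brute-force enumeration of all 2^n binary states: O(2^n)."""
    n = len(theta_bias)
    states = np.array(list(itertools.product([0, 1], repeat=n)), dtype=float)
    energy = states @ theta_bias + 0.5 * np.einsum("si,ij,sj->s", states, Theta, states)
    weights = np.exp(energy - energy.max())   # shift for numerical stability; Z cancels
    probs = weights / weights.sum()
    return probs @ states

def mean_field_eta(theta_bias, Theta, damping=0.5, iters=1000):
    """Damped fixed-point iteration of eta_i = sigmoid(theta_i + sum_k Theta_ik eta_k)."""
    eta = np.full(len(theta_bias), 0.5)
    for _ in range(iters):
        target = 1.0 / (1.0 + np.exp(-(theta_bias + Theta @ eta)))
        eta = (1 - damping) * eta + damping * target
    return eta

rng = np.random.default_rng(3)
n = 6
theta_bias = rng.normal(size=n)
upper = np.triu(rng.uniform(-0.2, 0.2, size=(n, n)), k=1)
Theta = upper + upper.T                       # symmetric couplings, zero diagonal
print(exact_eta(theta_bias, Theta))
print(mean_field_eta(theta_bias, Theta))
```

With all couplings set to zero the two functions agree exactly, because the model is then a product of independent distributions.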
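  46. Formulate Tucker rank reduction by relaxing the rank-1 condition
    Rank-1 condition (θ-representation): $\operatorname{rank}\mathcal{P} = 1$ ⟺ all of its many-body θ-parameters are 0 ($\theta_{ijk} = 0$); for a 3×3×3 tensor only $\theta_{111}$ and the one-body parameters $\theta_{211}, \theta_{311}, \theta_{121}, \theta_{131}, \theta_{112}, \theta_{113}$ remain.
    Expand the tensor along the $m$-th axis into a rectangular matrix $\theta_{(m)}$ (mode-$m$ expansion).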
  47. Formulate Tucker rank reduction by relaxing the rank-1 condition
    As slide 46, adding the mode-$m$ expansions for a rank-(1,1,1) tensor:
    $$\theta_{(1)}=\begin{pmatrix}\theta_{111}&\theta_{121}&\theta_{131}&\theta_{112}&0&0&\theta_{113}&0&0\\\theta_{211}&0&0&0&0&0&0&0&0\\\theta_{311}&0&0&0&0&0&0&0&0\end{pmatrix}$$
    $$\theta_{(2)}=\begin{pmatrix}\theta_{111}&\theta_{211}&\theta_{311}&\theta_{112}&0&0&\theta_{113}&0&0\\\theta_{121}&0&0&0&0&0&0&0&0\\\theta_{131}&0&0&0&0&0&0&0&0\end{pmatrix}$$
    $$\theta_{(3)}=\begin{pmatrix}\theta_{111}&\theta_{211}&\theta_{311}&\theta_{121}&0&0&\theta_{131}&0&0\\\theta_{112}&0&0&0&0&0&0&0&0\\\theta_{113}&0&0&0&0&0&0&0&0\end{pmatrix}$$
  48. Formulate Tucker rank reduction by relaxing the rank-1 condition
    As slide 47, adding: each mode-$m$ expansion has two bingos (rows that are zero except for their first entry), matching rank (1, 1, 1).
  49. Formulate Tucker rank reduction by relaxing the rank-1 condition
    As slide 48, adding: the first row and first column are the scaling factors.
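  50. The relationship between bingo and rank
    $$\theta_{(1)}=\begin{pmatrix}\theta_{111}&\theta_{121}&\theta_{131}&\theta_{112}&0&0&\theta_{113}&0&0\\\theta_{211}&0&0&0&0&0&0&0&0\\\theta_{311}&\theta_{321}&\theta_{331}&\theta_{312}&\theta_{322}&\theta_{332}&\theta_{313}&\theta_{323}&\theta_{333}\end{pmatrix}$$
    $$\theta_{(2)}=\begin{pmatrix}\theta_{111}&\theta_{211}&\theta_{311}&\theta_{112}&0&\theta_{312}&\theta_{113}&0&\theta_{313}\\\theta_{121}&0&\theta_{321}&0&0&\theta_{322}&0&0&\theta_{323}\\\theta_{131}&0&\theta_{331}&0&0&\theta_{332}&0&0&\theta_{333}\end{pmatrix}$$
    $$\theta_{(3)}=\begin{pmatrix}\theta_{111}&\theta_{211}&\theta_{311}&\theta_{121}&0&\theta_{321}&\theta_{131}&0&\theta_{331}\\\theta_{112}&0&\theta_{312}&0&0&\theta_{322}&0&0&\theta_{332}\\\theta_{113}&0&\theta_{313}&0&0&\theta_{323}&0&0&\theta_{333}\end{pmatrix}$$
    $\theta_{(1)}$ has one bingo; $\theta_{(2)}$ and $\theta_{(3)}$ have no bingo. Rank (2, 3, 3).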
  51. The relationship between bingo and rank
    As slide 50, adding (figure): the input tensor $\mathcal{P}$ is m-projected onto the subspace $\mathcal{B}^{(1)}$ with one bingo in the mode-1 direction, giving the tensor $\bar{\mathcal{P}}$ that minimizes $D_{KL}(\mathcal{P}, \bar{\mathcal{P}})$.
  52. The relationship between bingo and rank
    As slide 51, adding the bingo rule ($d = 3$): if the mode-$k$ expansion $\theta_{(k)}$ of the natural parameter of a tensor $\mathcal{P} \in \mathbb{R}_{>0}^{I_1\times I_2\times I_3}$ has $b_k$ bingos, then
    $$\operatorname{rank}(\mathcal{P}) \le (I_1 - b_1,\ I_2 - b_2,\ I_3 - b_3).$$
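The bingo rule can be illustrated by counting bingos in each mode-$m$ expansion of $\theta$. This is a minimal sketch that assumes the θ convention of the log-linear model ($\log\mathcal{P}$ is a cumulative sum of $\theta$). It reads a bingo as a row of $\theta_{(m)}$, other than the first, whose entries vanish everywhere except in the first column (the scaling factor). `count_bingos`, `bingo_rank_bound`, and the tolerance are hypothetical choices.

```python
import numpy as np

def theta_from_tensor(P):
    theta = np.log(P)
    for axis in range(P.ndim):
        theta = np.diff(theta, axis=axis, prepend=0.0)  # Möbius inversion on the grid
    return theta

def mode_expansion(T, m):
    """Mode-m expansion: axis m becomes the rows; the remaining axes are flattened into columns.
    Column 0 holds the entries whose other indices are all at their first value."""
    return np.moveaxis(T, m, 0).reshape(T.shape[m], -1)

def count_bingos(theta, m, tol=1e-10):
    rows = mode_expansion(theta, m)[1:, 1:]   # skip the scaling-factor row and column
    return int(np.sum(np.all(np.abs(rows) < tol, axis=1)))

def bingo_rank_bound(P):
    """Upper bound (I_1 - b_1, ..., I_d - b_d) on the Tucker rank of P."""
    theta = theta_from_tensor(P)
    return tuple(P.shape[m] - count_bingos(theta, m) for m in range(P.ndim))

rng = np.random.default_rng(4)
a, b, c = rng.random(3) + 0.1, rng.random(3) + 0.1, rng.random(3) + 0.1
print(bingo_rank_bound(np.einsum("i,j,k->ijk", a, b, c)))   # rank-1: two bingos per mode
print(bingo_rank_bound(rng.random((3, 3, 3)) + 0.1))        # generic tensor: no bingos
```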
  53. Example: reduce the rank of an (8, 8, 3) tensor to (5, 8, 3) or less
    STEP 1: Choose a bingo location. (Legend: θ is zero / θ can be any value.)
  54. Example: reduce the rank of an (8, 8, 3) tensor to (5, 8, 3) or less
    As slide 53; the chosen location gives three bingos.
  55. Example: reduce the rank of an (8, 8, 3) tensor to (5, 8, 3) or less
    STEP 1: Choose a bingo location. The shaded areas do not change their values in the projection.
    STEP 2: Replace the bingo part with the best rank-1 tensor.
  56. Example: reduce the rank of an (8, 8, 3) tensor to (5, 8, 3) or less
    As slide 55, adding: replace the partial tensor in the red box using the best rank-1 approximation formula.
  58. Example: reduce the rank of an (8, 8, 3) tensor to (5, 8, 3) or less
    As slide 56, adding: the best tensor is obtained in the specified bingo space. 😄 There is no guarantee that it is the best rank-(5, 8, 3) approximation. 😢
  59. Example: reduce the rank of an (8, 8, 3) tensor to (5, 7, 3) or less
    STEP 1: Choose a bingo location. STEP 2: Replace the bingo part with the best rank-1 tensor. The shaded areas do not change their values in the projection. (Legend: θ is zero / θ can be any value.)
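A single-mode version of this projection can be sketched directly. This is a minimal sketch that assumes the bingos occupy the last $I_m-r$ slices along mode $m$. The partial tensor formed by slices $r,\dots,I_m$ (1-based) is replaced by the best KL rank-1 approximation of its mode-$m$ expansion (row sums times column sums over the total), and every other slice is left unchanged. Under these assumptions the result keeps the free η-parameters and zeroes the bingo θ-parameters, which characterizes the m-projection. It is not the authors' LTR code (github.com/gkazunii/Legendre-tucker-rank-reduction), which may handle several modes differently, and `reduce_mode_rank` is a hypothetical name.

```python
import numpy as np

def reduce_mode_rank(P, m, r):
    """Project a positive tensor P onto the bingo space whose bingos are the last
    P.shape[m] - r slices along mode m (0-based slices r..I_m-1); mode-m rank becomes <= r."""
    Pm = np.moveaxis(P, m, 0).astype(float)            # mode m first (astype makes a copy)
    n_box = Pm.shape[0] - r + 1
    box = Pm[r - 1:].reshape(n_box, -1)                # partial tensor: slices r-1..I_m-1
    # Best KL rank-1 approximation of the box's expansion: (row sums)(column sums)/total.
    rank1 = np.outer(box.sum(axis=1), box.sum(axis=0)) / box.sum()
    Pm[r - 1:] = rank1.reshape(Pm[r - 1:].shape)       # slices before r-1 stay unchanged
    return np.moveaxis(Pm, 0, m)

rng = np.random.default_rng(5)
P = rng.random((8, 8, 3)) + 0.1
Q = reduce_mode_rank(P, 0, 5)                          # (8, 8, 3) -> mode-1 rank <= 5
print(np.linalg.matrix_rank(Q.reshape(8, -1)))
print(np.allclose(Q.sum(axis=(1, 2)), P.sum(axis=(1, 2))))  # mode-1 marginal preserved
```

The slices above the box are copied unchanged, which corresponds to the shaded areas on slides 55 and 59.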
  60. Experimental results (synthetic data)
    LTR is faster, with competitive approximation performance.
  61. Experimental results (real data)
    LTR is faster, with competitive approximation performance. Tensor sizes: (92, 112, 400) and (9, 9, 512, 512, 3).
  63. Strategy for rank-1 NMF with missing values
    Weight matrix: $\Phi_{ij} = 0$ if $X_{ij}$ is missing and $\Phi_{ij} = 1$ otherwise; it masks the data by an element-wise product.
    □ Collect the missing values in a corner of the matrix to solve the problem as a coupled NMF.
  64. Strategy for rank-1 NMF with missing values
    As slide 63, adding: the resulting problem is equivalent to NMMF (Takeuchi et al., 2013).
  65. NMMF: non-negative multiple matrix factorization (Takeuchi et al., 2013)
    (Figure with matrices indexed by users, artists, and tags.)
  66. The best rank-1 approximation of NMMF
    (Figure with matrices indexed by users, artists, and tags.)
  67. Modeling of NMMF (figure: one-to-one correspondence)

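  68. One-body and many-body parameters
    $\mathbf{X}, \mathbf{Y}, \mathbf{Z}$ are simultaneously rank-1 decomposable ⇔ they can be written as $(\mathbf{w}\otimes\mathbf{h},\ \mathbf{a}\otimes\mathbf{h},\ \mathbf{w}\otimes\mathbf{b})$.
    (Figure labels: one-body parameter, two-body parameter.)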
  69. Information geometry of rank-1 NMMF
    As slide 68, adding the simultaneous rank-1 θ-condition: all two-body θ-parameters are 0.
  70. Information geometry of rank-1 NMMF
    As slide 69, adding the simultaneous rank-1 η-condition: $\eta_{ij} = \eta_{i1}\,\eta_{1j}$. The subspace is e-flat, so the projection is unique.
  71. Find the global optimal solution of rank-1 NMMF
    As slide 70, adding: the m-projection does not change the one-body η-parameters (Shun-ichi Amari, Information Geometry and Its Applications, 2008, Theorem 11.6).
  72. Find the global optimal solution of rank-1 NMMF
    As slide 71, adding: all η-parameters after the projection are identified.
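A closed form for the rank-1 NMMF can be sketched from row and column sums. This is a minimal sketch that assumes the generalized KL objective $D(\mathbf{X},\mathbf{w}\mathbf{h}^\top)+D(\mathbf{Y},\mathbf{a}\mathbf{h}^\top)+D(\mathbf{Z},\mathbf{w}\mathbf{b}^\top)$ with equal weights and strictly positive inputs. We derived the factors below from the stationarity conditions of that objective under the gauge $\sum_i w_i = 1$. They preserve exactly the row and column sums that play the role of one-body η-parameters. We have not checked them symbol by symbol against the formula on the slide; see github.com/gkazunii/A1GM for the authors' implementation. The function names are illustrative.

```python
import numpy as np

def rank1_nmmf(X, Y, Z):
    """Rank-1 NMMF: X ~ w h^T, Y ~ a h^T, Z ~ w b^T with X (I x J), Y (M x J), Z (I x L)."""
    sx, sy, sz = X.sum(), Y.sum(), Z.sum()
    w = (X.sum(axis=1) + Z.sum(axis=1)) / (sx + sz)      # shared row factor, sums to 1
    h = sx * (X.sum(axis=0) + Y.sum(axis=0)) / (sx + sy)  # shared column factor
    a = Y.sum(axis=1) / sx
    b = Z.sum(axis=0)
    return w, h, a, b

def gen_kl(P, Q):
    return float(np.sum(P * np.log(P / Q) - P + Q))

def nmmf_loss(X, Y, Z, w, h, a, b):
    return gen_kl(X, np.outer(w, h)) + gen_kl(Y, np.outer(a, h)) + gen_kl(Z, np.outer(w, b))

rng = np.random.default_rng(6)
X, Y, Z = rng.random((4, 3)) + 0.1, rng.random((2, 3)) + 0.1, rng.random((4, 5)) + 0.1
w, h, a, b = rank1_nmmf(X, Y, Z)

# Equivalent view: the 6 x 8 matrix [[X, Z], [Y, ?]] with its bottom-right block missing,
# scored by a KL divergence weighted element-wise with Phi (Phi = 0 on missing entries).
M = np.block([[X, Z], [Y, np.ones((2, 5))]])            # placeholder values in the missing block
Phi = np.block([[np.ones((4, 8))], [np.ones((2, 3)), np.zeros((2, 5))]])
M_hat = np.outer(np.concatenate([w, a]), np.concatenate([h, b]))
weighted = float(np.sum(Phi * (M * np.log(M / M_hat) - M + M_hat)))
print(np.isclose(weighted, nmmf_loss(X, Y, Z, w, h, a, b)))
```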
  73. Rank-1 NMF with missing values
    □ NMMF can be viewed as a special case of NMF with missing values (the two problems are equivalent).
  74. Rank-1 NMF with missing values
    As slide 73, adding: □ NMF is invariant under row and column permutations.
  75. A1GM: Algorithm
    Step 1: Gather the missing values in the bottom-right corner.
    Step 2: Use the formula for the best rank-1 NMMF.
    Step 3: Undo the permutation.
    Do we find the exact solution? 🤔❓
  76. Examples where permutations cannot collect the missing values into a corner

  77. Add missing values to solve the problem as NMMF

  78. Add missing values to solve the problem as NMMF
    Reconstruction error worsens. 😢
  79. Add missing values to solve the problem as NMMF
    Gain in efficiency. 😀 Reconstruction error worsens. 😢
  80. 🙆 Data that A1GM is good at and not good at 🙅
    🙅 Missing values are evenly distributed across rows and columns.
  81. 🙆 Data that A1GM is good at and not good at 🙅
    🙅 Missing values are evenly distributed across rows and columns.
    🙆 Missing values are concentrated in certain rows and columns. In some real datasets, missing values tend to be in certain columns, e.g., because of a disconnected sensing device or an optional answer field in a questionnaire.
  82. A1GM: Algorithm
    Step 1: Increase the number of missing values.
    Step 2: Gather the missing values in the bottom-right corner.
    Step 3: Use the formula for the rank-1 NMMF and undo the permutation.
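The three steps can be sketched end to end. This is a minimal sketch that assumes a deliberately simple Step 1 rule, which is ours and not necessarily the authors': every entry in the rows that contain a missing value and the columns that contain a missing value is treated as missing. That always turns the missing set into a single block, at the cost of discarding the observed entries inside it. The sketch also assumes at least one fully observed row and one fully observed column. `a1gm_sketch` is a hypothetical name; the authors' implementation is at github.com/gkazunii/A1GM.

```python
import numpy as np

def rank1_nmmf(X, Y, Z):
    """Closed-form rank-1 NMMF factors (X ~ w h^T, Y ~ a h^T, Z ~ w b^T) from row/column sums."""
    sx, sy, sz = X.sum(), Y.sum(), Z.sum()
    w = (X.sum(axis=1) + Z.sum(axis=1)) / (sx + sz)
    h = sx * (X.sum(axis=0) + Y.sum(axis=0)) / (sx + sy)
    return w, h, Y.sum(axis=1) / sx, Z.sum(axis=0)

def a1gm_sketch(M, observed):
    """Rank-1 completion of a nonnegative matrix M; observed is a boolean mask (False = missing)."""
    miss_rows = ~observed.all(axis=1)
    miss_cols = ~observed.all(axis=0)
    assert (~miss_rows).any() and (~miss_cols).any(), "needs a fully observed row and column"
    # Step 1 (our simple rule): treat (missing rows) x (missing columns) as one missing block.
    # Step 2: put complete rows/columns first, so that block sits in the bottom-right corner.
    row_perm = np.argsort(miss_rows, kind="stable")
    col_perm = np.argsort(miss_cols, kind="stable")
    n_top, n_left = int((~miss_rows).sum()), int((~miss_cols).sum())
    Mp = M[np.ix_(row_perm, col_perm)]
    X, Z, Y = Mp[:n_top, :n_left], Mp[:n_top, n_left:], Mp[n_top:, :n_left]
    # Step 3: closed-form rank-1 NMMF, then undo the permutation.
    w, h, a, b = rank1_nmmf(X, Y, Z)
    completed = np.empty(M.shape)
    completed[np.ix_(row_perm, col_perm)] = np.outer(np.concatenate([w, a]), np.concatenate([h, b]))
    return completed

rng = np.random.default_rng(7)
truth = np.outer(rng.random(6) + 0.1, rng.random(5) + 0.1)
observed = np.ones(truth.shape, dtype=bool)
observed[1, 3] = observed[4, 0] = False
M = np.where(observed, truth, np.nan)   # missing entries are never read
print(np.allclose(a1gm_sketch(M, observed), truth))
```

On an exactly rank-1 matrix the sketch recovers every entry, including the missing ones. On real data, the entries discarded in Step 1 are what make the reconstruction error worse (slides 78 and 79).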
  83. Experiments on real data
    □ A1GM is compared with the gradient-based KL-WNMF.
    - Relative runtime < 1 means that A1GM is faster than KL-WNMF.
    - Relative error > 1 means that A1GM has a worse reconstruction error than KL-WNMF.
    - Increase rate is the ratio of the number of missing values after the addition in Step 1 to the original number.
    A1GM is 5–10 times faster! It finds the best solution of the augmented problem, but adding missing values decreases accuracy.
  85. Theoretical Remarks 1: Extended NMMF
    Weight matrix: $\Phi_{ij} = 0$ if $X_{ij}$ is missing and $\Phi_{ij} = 1$ otherwise.
    □ The rank of the weight matrix is 2 after adding missing values: $\operatorname{rank}(\Phi) = 2$.
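The rank statement can be checked directly. Once the missing set is the full block (missing rows) × (missing columns), $\Phi=\mathbf{1}\mathbf{1}^\top-\mathbf{r}\mathbf{c}^\top$, where $\mathbf{r}$ and $\mathbf{c}$ indicate those rows and columns, so $\operatorname{rank}(\Phi)\le 2$. This is a minimal sketch that assumes the same simple "fill the whole block" rule for adding missing values; the random mask is only an illustration.

```python
import numpy as np

rng = np.random.default_rng(8)
Phi = (rng.random((7, 6)) > 0.15).astype(float)   # 1 = observed, 0 = missing
r = (Phi == 0).any(axis=1).astype(float)           # rows containing a missing value
c = (Phi == 0).any(axis=0).astype(float)           # columns containing a missing value
Phi_added = np.ones_like(Phi) - np.outer(r, c)     # add missing values to fill the block
print(np.linalg.matrix_rank(Phi), np.linalg.matrix_rank(Phi_added))
```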
  86. Theoretical Remarks 1: Extended NMMF
    As slide 85, adding: □ Can we exactly solve rank-1 NMF if $\operatorname{rank}(\Phi) = 2$?
  87. Theoretical Remarks 1: Extended NMMF
    The best rank-1 approximation of extended NMMF
  88. Theoretical Remarks 1: Extended NMMF
    The best rank-1 approximation of extended NMMF is equivalent to rank-1 NMF with missing values weighted by $\Phi$ ($\Phi_{ij} = 0$ if $X_{ij}$ is missing, 1 otherwise).
  89. Theoretical Remarks 1: Extended NMMF
    As slide 88, adding: if $\operatorname{rank}(\Phi) \le 2$, the matrix can be transformed by permutation into the extended-NMMF form. We can therefore exactly solve rank-1 NMF with missing values by permutation if $\operatorname{rank}(\Phi) \le 2$.
  90. Theoretical Remarks 2: Connection to balancing
    □ Matrix balancing: transform a matrix into a balanced matrix (doubly stochastic matrix).
    Mahito Sugiyama, Hiroyuki Nakahara and Koji Tsuda, "Tensor balancing on statistical manifold", ICML 2017.
  91. Theoretical Remarks 2: Connection to balancing
    As slide 90, adding the balancing condition.
  92. Theoretical Remarks 2: Connection to balancing
    As slide 91, adding: rank-1 condition: all many-body θ-parameters are 0. The balanced rank-1 matrix is unique.
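Matrix balancing itself can be sketched with the classical Sinkhorn-Knopp iteration, which alternately rescales rows and columns. This is a swapped-in algorithm: Sugiyama et al. (2017) solve balancing with Newton's method on the statistical manifold, which this sketch does not reproduce. It also illustrates the uniqueness remark above: every positive rank-1 $n\times n$ matrix balances to the same matrix, with all entries equal to $1/n$. The iteration count is an arbitrary choice and `sinkhorn_balance` is an illustrative name.

```python
import numpy as np

def sinkhorn_balance(A, iters=1000):
    """Scale a positive square matrix to doubly stochastic form diag(x) A diag(y)."""
    B = np.array(A, dtype=float)
    for _ in range(iters):
        B /= B.sum(axis=1, keepdims=True)   # make every row sum to 1
        B /= B.sum(axis=0, keepdims=True)   # make every column sum to 1
    return B

rng = np.random.default_rng(9)
B = sinkhorn_balance(rng.random((4, 4)) + 0.1)
print(B.sum(axis=0), B.sum(axis=1))
R = sinkhorn_balance(np.outer(rng.random(4) + 0.1, rng.random(4) + 0.1))
print(np.allclose(R, 0.25))   # the balanced rank-1 matrix: every entry is 1/n
```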
  93. Conclusion
    Data structure → DAG → information geometry
    □ Describe the low-rank condition using (θ, η):
      Rank-1 condition (θ-representation): all many-body $\bar{\theta}_{ijk}$ are 0.
      Rank-1 condition (η-representation): $\bar{\eta}_{ijk} = \bar{\eta}_{i11}\,\bar{\eta}_{1j1}\,\bar{\eta}_{11k}$.
    □ Closed formula of the best rank-1 NMMF
    □ A1GM: faster rank-1 NMF with missing values