Slide 1

Slide 1 text

Non-negative low-rank approximations for multi-dimensional arrays on statistical manifold
Kazu Ghalamkari (1,2), Mahito Sugiyama (1,2)
International Conference on Information Geometry for Data Science (IG4DS 2022)
1: The Graduate University for Advanced Studies, SOKENDAI  2: National Institute of Informatics

Slide 2

Slide 2 text

Motivation
□ Non-negative low-rank approximation of data with various structures: approximate data with a linear combination of fewer bases (principal components) for feature extraction, memory reduction, and pattern discovery. 😀

Slide 3

Slide 3 text

Motivation
□ Non-negative low-rank approximation of data with various structures: approximate data with a linear combination of fewer bases (principal components) for feature extraction, memory reduction, and pattern discovery. 😀
□ The non-negative constraint improves interpretability.

Slide 4

Slide 4 text

Motivation
□ Non-negative low-rank approximation of data with various structures: approximate data with a linear combination of fewer bases (principal components) for feature extraction, memory reduction, and pattern discovery. 😀
□ The non-negative constraint improves interpretability.
□ Low-rank approximation with non-negative constraints is usually solved by gradient methods → appropriate settings for the stopping criterion, learning rate, and initial values are necessary. 😢

Slide 5

Slide 5 text

Strategy
□ Model the data with a probability mass function on a directed acyclic graph (DAG).
□ Utilize the projection theory of information geometry.

Slide 9

Slide 9 text

Contribution
□ LTR: faster Tucker-rank reduction. No worries about initial values, stopping criterion, or learning rate 😄
Information-geometric analysis using distributions on DAGs that correspond to data structures.

Slide 10

Slide 10 text

Contribution
□ LTR: faster Tucker-rank reduction. No worries about initial values, stopping criterion, or learning rate 😄 (Rank-1 here means rank (1,1,1).)
Information-geometric analysis using distributions on DAGs that correspond to data structures.

Slide 11

Slide 11 text

Contribution
□ LTR: faster Tucker-rank reduction. No worries about initial values, stopping criterion, or learning rate 😄 (Rank-1 here means rank (1,1,1).)
□ A1GM: faster rank-1 NMF with missing values. Solve the task as a coupled NMF; find the most dominant factor rapidly.
Information-geometric analysis using distributions on DAGs that correspond to data structures.

Slide 12

Slide 12 text

Contents
□ Motivation, Strategy, and Contributions
□ Introduction of the log-linear model on a DAG
□ The best rank-1 approximation formula
□ Legendre Tucker-Rank Reduction (LTR)
□ The best rank-1 NMMF
□ A1GM: faster rank-1 NMF with missing values
□ Theoretical Remarks
□ Conclusion
Code: github.com/gkazunii/Legendre-tucker-rank-reduction, github.com/gkazunii/A1GM

Slide 13

Slide 13 text

Modeling tensors and matrices
□ Flexible modeling is required to capture the structure of various data → formulate low-rank approximations with probabilistic models on DAGs.

Slide 14

Slide 14 text

Log-linear model on a directed acyclic graph (DAG)
□ DAG (poset): a set with a relation ≤ is a poset ⇔ for all elements s1, s2, s3 the following three properties are satisfied:
(1) Reflexivity: s1 ≤ s1
(2) Antisymmetry: s1 ≤ s2, s2 ≤ s1 ⇒ s1 = s2
(3) Transitivity: s1 ≤ s2, s2 ≤ s3 ⇒ s1 ≤ s3
[Mahito Sugiyama, Hiroyuki Nakahara, and Koji Tsuda, "Tensor balancing on statistical manifold," ICML 2017]
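The three axioms can be checked mechanically on a small finite example. A minimal sketch (the element set, the component-wise order, and the name `is_poset` are illustrative, not from the talk):

```python
from itertools import product

def is_poset(elements, leq):
    """Check reflexivity, antisymmetry, and transitivity of `leq`
    over a finite set of elements."""
    for s1, s2, s3 in product(elements, repeat=3):
        if not leq(s1, s1):                                  # (1) reflexivity
            return False
        if leq(s1, s2) and leq(s2, s1) and s1 != s2:         # (2) antisymmetry
            return False
        if leq(s1, s2) and leq(s2, s3) and not leq(s1, s3):  # (3) transitivity
            return False
    return True

# Index pairs of a 2x2 matrix with the component-wise order,
# the kind of index DAG used for tensors later in the talk.
S = [(i, j) for i in (1, 2) for j in (1, 2)]
leq = lambda a, b: a[0] <= b[0] and a[1] <= b[1]
```

`is_poset(S, leq)` returns True; replacing `leq` with a strict order breaks reflexivity.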

Slide 15

Slide 15 text

Log-linear model on a directed acyclic graph (DAG)
□ DAG (poset): a set with a relation ≤ is a poset ⇔ for all elements s1, s2, s3: (1) reflexivity: s1 ≤ s1; (2) antisymmetry: s1 ≤ s2, s2 ≤ s1 ⇒ s1 = s2; (3) transitivity: s1 ≤ s2, s2 ≤ s3 ⇒ s1 ≤ s3.
□ Log-linear model on a DAG: we define the log-linear model on a DAG as a mapping p from the DAG to (0,1). The natural parameters θ describe the model (θ-space).
[Mahito Sugiyama, Hiroyuki Nakahara, and Koji Tsuda, "Tensor balancing on statistical manifold," ICML 2017]

Slide 16

Slide 16 text

Log-linear model on a directed acyclic graph (DAG)
□ DAG (poset): a set with a relation ≤ is a poset ⇔ for all elements s1, s2, s3: (1) reflexivity: s1 ≤ s1; (2) antisymmetry: s1 ≤ s2, s2 ≤ s1 ⇒ s1 = s2; (3) transitivity: s1 ≤ s2, s2 ≤ s3 ⇒ s1 ≤ s3.
□ Log-linear model on a DAG: we define the log-linear model on a DAG as a mapping p from the DAG to (0,1). The natural parameters θ describe the model (θ-space). We can also describe the model by the expectation parameters η, using the Möbius function (η-space).
[Mahito Sugiyama, Hiroyuki Nakahara, and Koji Tsuda, "Tensor balancing on statistical manifold," ICML 2017]
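As a concrete toy instance, take the simplest DAG, a totally ordered chain. A sketch of the θ → p and p → η maps under that assumption (the chain and the parameter values are illustrative; the talk uses the DAG of tensor indices instead):

```python
import numpy as np

def p_from_theta(theta):
    """Log-linear model on a chain: log p(x) = sum of theta_s over s <= x,
    with the theta of the least element absorbing the normalization."""
    logp = np.cumsum(theta)
    logp -= np.log(np.exp(logp).sum())  # normalize so that p sums to 1
    return np.exp(logp)

def eta_from_p(p):
    """Expectation parameters on a chain: eta_s = sum of p(x) over x >= s."""
    return np.cumsum(p[::-1])[::-1]

theta = np.array([0.0, 0.3, -0.5, 0.2])
p = p_from_theta(theta)
eta = eta_from_p(p)
```

On a chain the Möbius inversion is just a difference: `np.diff(np.log(p))` recovers `theta[1:]`.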

Slide 18

Slide 18 text

Introducing DAGs for tensors

Slide 21

Slide 21 text

Describe a tensor with (θ, η)

Slide 23

Slide 23 text

Describe a tensor with (θ, η): the Möbius inversion formula

Slide 25

Slide 25 text

Describe a tensor with (θ, η)
Relation between a distribution and a tensor (via the Möbius inversion formula):
- random variables ↔ the indices (i, j, k) of the tensor
- sample space ↔ the index set
- probability values ↔ the tensor values P_ijk

Slide 26

Slide 26 text

One-body and many-body parameters
The θ- and η-parameters are divided into one-body parameters (at most one index differs from 1, e.g. θ_i11) and many-body parameters (two or more indices differ from 1, e.g. θ_ijk with j, k > 1).

Slide 27

Slide 27 text

θ-representation of a rank-1 tensor
Rank-1 condition (θ-representation): all of the many-body θ-parameters are 0. The set of such tensors is the rank-1 subspace.

Slide 28

Slide 28 text

θ-representation of a rank-1 tensor
Rank-1 condition (θ-representation): all of the many-body θ-parameters are 0 (the rank-1 subspace).
The rank-1 subspace is e-flat, so the projection onto it is unique.

Slide 29

Slide 29 text

θ-representation of a rank-1 tensor
Rank-1 condition (θ-representation): all of the many-body θ-parameters are 0 (the rank-1 subspace). The rank-1 subspace is e-flat, so the projection onto it is unique.
We can find the projection destination by a gradient method, but gradient methods require appropriate settings for the stopping criterion, learning rate, and initial values. 😢

Slide 30

Slide 30 text

θ-representation of a rank-1 tensor
Rank-1 condition (θ-representation): all of the many-body θ-parameters are 0 (the rank-1 subspace). The rank-1 subspace is e-flat, so the projection onto it is unique.
We can find the projection destination by a gradient method, but gradient methods require appropriate settings for the stopping criterion, learning rate, and initial values. 😢 Let us instead describe the rank-1 condition with the η-parameters.

Slide 31

Slide 31 text

η-representation of a rank-1 tensor
Rank-1 condition (θ-representation): all of the many-body θ-parameters are 0.
Rank-1 condition (η-representation): η_ijk = η_i11 η_1j1 η_11k.

Slide 32

Slide 32 text

η-representation of a rank-1 tensor
Rank-1 condition (θ-representation): all of the many-body θ-parameters are 0.
Rank-1 condition (η-representation): η_ijk = η_i11 η_1j1 η_11k.
The m-projection does not change the one-body η-parameters [Shun-ichi Amari, Information Geometry and Its Applications, Theorem 11.6].

Slide 33

Slide 33 text

าง ๐œ‚๐‘–๐‘—๐‘˜ = าง ๐œ‚๐‘–11 าง ๐œ‚1๐‘—1 าง ๐œ‚11๐‘˜ Find the best rank-1 approximation 33 One-body parameter Many-body parameter Rank-1 condition (๐œผ- representation) Rank-1 condition (๐œฝ-representation) Rank-1 subspace Its all many-body ๐œƒ-parameters are 0.

Slide 34

Slide 34 text

าง ๐œ‚๐‘–๐‘—๐‘˜ = าง ๐œ‚๐‘–11 าง ๐œ‚1๐‘—1 าง ๐œ‚11๐‘˜ Find the best rank-1 approximation 34 One-body parameter Many-body parameter Mรถbius inversion formula = ๐œ‚๐‘–11 ๐œ‚1๐‘—1 ๐œ‚11๐‘˜ Rank-1 condition (๐œผ- representation) Rank-1 condition (๐œฝ-representation) Rank-1 subspace All ๐œผ-parameters after the projection are identified. Using inversion formula, we found the projection destination. Its all many-body ๐œƒ-parameters are 0.

Slide 35

Slide 35 text

Best rank-1 tensor formula minimizing the KL divergence (d = 3)
The best rank-1 approximation of P ∈ ℝ_{>0}^{I×J×K}, the one that minimizes the KL divergence from P, is given in closed form. We reproduce the result of K. Huang et al., "Kullback-Leibler principal component for tensors is not NP-hard," ACSSC 2017.

Slide 36

Slide 36 text

Best rank-1 tensor formula minimizing the KL divergence (d = 3)
The best rank-1 approximation of P ∈ ℝ_{>0}^{I×J×K}, the one that minimizes the KL divergence from P, is given in closed form. We reproduce the result of K. Huang et al., "Kullback-Leibler principal component for tensors is not NP-hard," ACSSC 2017. (By the way, minimizing the Frobenius error is NP-hard.)

Slide 37

Slide 37 text

Best rank-1 tensor formula minimizing the KL divergence (d = 3)
The closed form is a product of three normalized vectors, depending only on i, only on j, and only on k. A tensor with d indices is a joint distribution of d random variables; a vector with a single index is an independent distribution of one random variable.
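Concretely, the closed form multiplies the three normalized mode marginals back together and rescales by the total sum. A numpy sketch of that formula (variable names are mine):

```python
import numpy as np

def best_rank1_kl(P):
    """Closed-form best rank-1 approximation of a positive tensor P
    under the KL divergence: the outer product of the normalized
    mode marginals, rescaled to the total sum of P."""
    S = P.sum()
    a = P.sum(axis=(1, 2)) / S   # normalized vector depending only on i
    b = P.sum(axis=(0, 2)) / S   # normalized vector depending only on j
    c = P.sum(axis=(0, 1)) / S   # normalized vector depending only on k
    return S * np.einsum("i,j,k->ijk", a, b, c)

P = np.arange(1.0, 25.0).reshape(2, 3, 4)   # example positive tensor
Q = best_rank1_kl(P)
```

The m-projection keeps the one-body η-parameters, so all three mode marginals of `Q` match those of `P`, and `Q` has rank 1.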

Slide 38

Slide 38 text

Best rank-1 tensor formula minimizing the KL divergence (d = 3)
Rank-1 approximation thus approximates a joint distribution by a product of independent distributions. This is a mean-field approximation: a methodology in physics for reducing a many-body problem to a one-body problem.

Slide 39

Slide 39 text

Mean-field approximation and rank-1 approximation
MFA of a Boltzmann machine: p(x) = (1/Z(θ)) exp( Σ_i θ_i x_i + Σ_{i<j} θ_ij x_i x_j ), with bias θ_i and interaction θ_ij, and expectations η_i = Σ_{x_1=0}^{1} ⋯ Σ_{x_n=0}^{1} x_i p(x). The goal is to approximate p by a tractable p̂, measured by a KL divergence.

Slide 40

Slide 40 text

Mean-field approximation and rank-1 approximation
MFA of a Boltzmann machine: p(x) = (1/Z(θ)) exp( Σ_i θ_i x_i + Σ_{i<j} θ_ij x_i x_j ), with bias θ_i and interaction θ_ij, and η_i = Σ_{x_1=0}^{1} ⋯ Σ_{x_n=0}^{1} x_i p(x).
The mean-field approximation restricts p to p̂(x) = (1/Z(θ)) exp( Σ_i θ_i x_i ) = p(x_1) ⋯ p(x_n).

Slide 41

Slide 41 text

๐‘‚ 2๐‘› ๐ท๐พ๐ฟ ๐‘, ฦธ ๐‘ ๐ท๐พ๐ฟ ฦธ ๐‘๐‘’ , ๐‘ าง ๐œ‚๐‘– = sigmoid ๐œƒ๐‘– + เท ๐‘˜ ๐œƒ๐‘˜๐‘— าง ๐œ‚๐‘˜ 41 Mean-field approximation and rank-1 approximation MF equations MFA of Boltzmann-machine ๐‘ ๐’™ = 1 ๐‘(๐œฝ) exp เท ๐‘– ๐œƒ๐‘– ๐‘ฅ๐‘– + เท ๐‘–<๐‘— ๐œƒ๐‘–๐‘— ๐‘ฅ๐‘– ๐‘ฅ๐‘— ๐œ‚๐‘– = เท ๐‘ฅ1=0 1 โ‹ฏ เท ๐‘ฅ๐‘›=0 1 ๐‘ฅ๐‘– ๐‘ ๐’™ ๐‘‚ 2๐‘› Interaction Bias

Slide 42

Slide 42 text

Rank-1 approximation ๐‘๐œƒ (๐‘–, ๐‘—, ๐‘˜) = exp เท ๐‘–โ€ฒ=1 ๐‘– เท ๐‘—โ€ฒ=1 ๐‘— เท ๐‘˜โ€ฒ=1 ๐‘˜ ๐œƒ๐‘–โ€ฒ๐‘—โ€ฒ๐‘˜โ€ฒ ๐‘‚ 2๐‘› ๐ท๐พ๐ฟ ๐‘, ฦธ ๐‘ ๐ท๐พ๐ฟ ๐‘, ฦธ ๐‘ MF equations Set of products of independent distributions ๐œ‚๐‘–11 = เท ๐‘—โ€ฒ=1 ๐ฝ เท ๐‘˜โ€ฒ=1 ๐พ ๐’ซ๐‘–๐‘—โ€ฒ๐‘˜โ€ฒ าง ๐œ‚๐‘– = sigmoid ๐œƒ๐‘– + เท ๐‘˜ ๐œƒ๐‘˜๐‘— าง ๐œ‚๐‘˜ 42 Mean-field approximation and rank-1 approximation ๐ท๐พ๐ฟ ฦธ ๐‘๐‘’ , ๐‘ MFA of Boltzmann-machine ๐‘ ๐’™ = 1 ๐‘(๐œฝ) exp เท ๐‘– ๐œƒ๐‘– ๐‘ฅ๐‘– + เท ๐‘–<๐‘— ๐œƒ๐‘–๐‘— ๐‘ฅ๐‘– ๐‘ฅ๐‘— ๐œ‚๐‘– = เท ๐‘ฅ1=0 1 โ‹ฏ เท ๐‘ฅ๐‘›=0 1 ๐‘ฅ๐‘– ๐‘ ๐’™ ๐‘‚ 2๐‘› Interaction Bias

Slide 43

Slide 43 text

Rank-1 approximation ๐‘๐œƒ (๐‘–, ๐‘—, ๐‘˜) = exp เท ๐‘–โ€ฒ=1 ๐‘– เท ๐‘—โ€ฒ=1 ๐‘— เท ๐‘˜โ€ฒ=1 ๐‘˜ ๐œƒ๐‘–โ€ฒ๐‘—โ€ฒ๐‘˜โ€ฒ ๐‘‚ 2๐‘› ๐‘‚ ๐ผ๐ฝ๐พ ๐ท๐พ๐ฟ ๐‘, ฦธ ๐‘ ๐ท๐พ๐ฟ ๐‘, ฦธ ๐‘ MF equations Set of products of independent distributions ๐œ‚๐‘–11 = เท ๐‘—โ€ฒ=1 ๐ฝ เท ๐‘˜โ€ฒ=1 ๐พ ๐’ซ๐‘–๐‘—โ€ฒ๐‘˜โ€ฒ าง ๐œ‚๐‘– = sigmoid ๐œƒ๐‘– + เท ๐‘˜ ๐œƒ๐‘˜๐‘— าง ๐œ‚๐‘˜ ๐ถ๐‘œ๐‘š๐‘๐‘ข๐‘ก๐‘Ž๐‘๐‘™๐‘’ 43 Mean-field approximation and rank-1 approximation ๐ท๐พ๐ฟ ฦธ ๐‘๐‘’ , ๐‘ MFA of Boltzmann-machine ๐‘ ๐’™ = 1 ๐‘(๐œฝ) exp เท ๐‘– ๐œƒ๐‘– ๐‘ฅ๐‘– + เท ๐‘–<๐‘— ๐œƒ๐‘–๐‘— ๐‘ฅ๐‘– ๐‘ฅ๐‘— ๐œ‚๐‘– = เท ๐‘ฅ1=0 1 โ‹ฏ เท ๐‘ฅ๐‘›=0 1 ๐‘ฅ๐‘– ๐‘ ๐’™ ๐‘‚ 2๐‘› Interaction Bias

Slide 44

Slide 44 text

Mean-field approximation and rank-1 approximation
- MFA of a Boltzmann machine: minimizes the inverse KL divergence; e-projection onto an e-flat space; the projection is not unique; no closed formula, only the MF equations η̄_i = σ( θ_i + Σ_k θ_ki η̄_k ) with cost O(2^n).
- Rank-1 approximation: minimizes the KL divergence; m-projection onto an e-flat space; the projection is unique and has a closed formula.

Slide 46

Slide 46 text

Formulate Tucker rank reduction by relaxing the rank-1 condition
Rank-1 condition (θ-representation): rank(P) = 1 ⇔ all of its many-body θ-parameters are 0 (only one-body parameters such as θ_211, θ_311, θ_121, θ_131, θ_112, θ_113 remain).
Expand the θ-parameters along the m-th axis into a rectangular matrix θ_(m) (mode-m expansion).
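The mode-m expansion is an ordinary tensor unfolding. A one-line numpy sketch (applied here to raw entries for illustration; LTR applies the same unfolding to the θ-parameters):

```python
import numpy as np

def mode_unfold(T, m):
    """Mode-m expansion: axis m becomes the rows, the remaining
    axes are flattened into the columns."""
    return np.moveaxis(T, m, 0).reshape(T.shape[m], -1)

T = np.arange(24).reshape(2, 3, 4)  # example 2x3x4 tensor
```

For a tensor of shape (I, J, K), `mode_unfold(T, 0)` is I x JK, `mode_unfold(T, 1)` is J x IK, and `mode_unfold(T, 2)` is K x IJ.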

Slide 47

Slide 47 text

๐œƒ(3) = ๐œƒ111 ๐œƒ211 ๐œƒ311 ๐œƒ121 0 0 ๐œƒ131 0 0 ๐œƒ112 0 0 0 0 0 0 0 0 ๐œƒ113 0 0 0 0 0 0 0 0 ๐œƒ(1) = ๐œƒ111 ๐œƒ121 ๐œƒ131 ๐œƒ112 0 0 ๐œƒ113 0 0 ๐œƒ211 0 0 0 0 0 0 0 0 ๐œƒ311 0 0 0 0 0 0 0 0 ๐œƒ(2) = ๐œƒ111 ๐œƒ211 ๐œƒ311 ๐œƒ112 0 0 ๐œƒ311 0 0 ๐œƒ121 0 0 0 0 0 0 0 0 ๐œƒ131 0 0 0 0 0 0 0 0 Formulate Tucker rank reduction by relaxing the rank-1 condition ๐œƒ๐‘–๐‘—๐‘˜ = 0 ๐œƒ112 ๐œƒ131 ๐œƒ121 ๐œƒ113 ๐œƒ211 ๐œƒ311 Expand the tensor by focusing on the ๐‘š-th axis into a rectangular matrix ๐œƒ(๐‘š) (mode-๐‘š expansion) Rank 1,1,1 rank ๐’ซ = 1 โŸบ its all manyโˆ’body ๐œƒ parameters are 0 Rank-1 condition (๐œฝ-representation) 47

Slide 48

Slide 48 text

Formulate Tucker rank reduction by relaxing the rank-1 condition
Rank (1,1,1): each mode-m expansion θ_(m) above has two bingos, i.e. two rows that are zero outside the first column.

Slide 49

Slide 49 text

Formulate Tucker rank reduction by relaxing the rank-1 condition
The first row and first column of each θ_(m) are the scaling factors; a bingo is a row (other than the first) whose entries outside the first column are all zero. At rank (1,1,1) each mode has two bingos.

Slide 50

Slide 50 text

The relationship between bingos and rank
Example with one bingo in mode 1 and rank (2,3,3):
θ_(1):  (one bingo)
  θ111 θ121 θ131 θ112 0    0    θ113 0    0
  θ211 0    0    0    0    0    0    0    0
  θ311 θ321 θ331 θ312 θ322 θ332 θ313 θ323 θ333
θ_(2):  (no bingo)
  θ111 θ211 θ311 θ112 0 θ312 θ113 0 θ313
  θ121 0    θ321 0    0 θ322 0    0 θ323
  θ131 0    θ331 0    0 θ332 0    0 θ333
θ_(3):  (no bingo)
  θ111 θ211 θ311 θ121 0 θ321 θ131 0 θ331
  θ112 0    θ312 0    0 θ322 0    0 θ332
  θ113 0    θ313 0    0 θ323 0    0 θ333

Slide 51

Slide 51 text

The relationship between bingos and rank
With one bingo in mode 1 (rank (2,3,3)), the input tensor P is m-projected onto the subspace B^(1) of tensors with one bingo in the mode-1 direction; the projection P̄ minimizes D_KL(P, P̄).

Slide 52

Slide 52 text

The relationship between bingos and rank
Bingo rule (d = 3): if the mode-k expansion θ_(k) of the natural parameters of a tensor P ∈ ℝ_{>0}^{I1×I2×I3} has b_k bingos, then rank(P) ≤ (I1 - b1, I2 - b2, I3 - b3). In the example above, one bingo in mode 1 and none in modes 2 and 3 gives rank ≤ (2,3,3).

Slide 53

Slide 53 text

Example: reduce the rank of an (8,8,3) tensor to (5,8,3) or less
STEP 1: choose a bingo location (which θ-parameters are forced to zero; the rest can be anything).

Slide 54

Slide 54 text

Example: reduce the rank of an (8,8,3) tensor to (5,8,3) or less
STEP 1: choose a bingo location; three bingos in mode 1 give mode-1 rank at most 8 - 3 = 5.

Slide 55

Slide 55 text

Example: reduce the rank of an (8,8,3) tensor to (5,8,3) or less
STEP 1: choose a bingo location (which θ-parameters are forced to zero; the rest can be anything).
STEP 2: replace the bingo part with the best rank-1 tensor. The shaded areas do not change their values in the projection.

Slide 56

Slide 56 text

Example: reduce the rank of an (8,8,3) tensor to (5,8,3) or less
STEP 2: replace the partial tensor in the red box using the best rank-1 approximation formula.

Slide 58

Slide 58 text

Example: reduce the rank of an (8,8,3) tensor to (5,8,3) or less
STEP 2: replace the partial tensor in the red box with the best rank-1 tensor. The result is the best tensor within the specified bingo space 😄, but there is no guarantee that it is the best rank-(5,8,3) approximation. 😢
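The two steps can be sketched in a few lines: choosing the bingo location amounts to choosing a trailing block of mode-1 slices, and the replacement uses the closed-form best rank-1 approximation. A simplified mode-1-only sketch (block choice and the test tensor are illustrative, not the full LTR of the paper):

```python
import numpy as np

def best_rank1_kl(P):
    """Closed-form best rank-1 KL approximation: rescaled outer
    product of the three mode marginals."""
    S = P.sum()
    a, b, c = P.sum(axis=(1, 2)), P.sum(axis=(0, 2)), P.sum(axis=(0, 1))
    return np.einsum("i,j,k->ijk", a, b, c) / S**2

def ltr_mode1(P, r):
    """Reduce the mode-1 rank of a positive tensor to at most r by
    replacing the trailing block of slices (the "red box") with its
    best rank-1 approximation."""
    Q = P.copy()
    Q[r - 1:] = best_rank1_kl(P[r - 1:])
    return Q

P = np.exp(np.sin(np.arange(192.0))).reshape(8, 8, 3)  # generic positive tensor
Q = ltr_mode1(P, 5)
```

Because the replaced block is exactly rank 1, the mode-1 unfolding of `Q` has at most 4 + 1 = 5 independent rows, and the total sum (hence normalization) is preserved.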

Slide 59

Slide 59 text

Example: reduce the rank of an (8,8,3) tensor to (5,7,3) or less
The same two steps apply: choose the bingo locations, then replace each bingo part with the best rank-1 tensor. The shaded areas do not change their values in the projection.

Slide 60

Slide 60 text

Experimental results (synthetic data)
LTR is faster with competitive approximation performance.

Slide 61

Slide 61 text

Experimental results (real data)
LTR is faster with competitive approximation performance (tensor sizes (92, 112, 400) and (9, 9, 512, 512, 3)).

Slide 63

Slide 63 text

Strategy for rank-1 NMF with missing values
□ Collect the missing values in a corner of the matrix to solve the task as a coupled NMF. The weight is Φ_ij = 0 if X_ij is missing and 1 otherwise, and the objective uses the element-wise product Φ ∘ X.
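A gradient-style baseline for this objective is rank-1 KL-NMF with multiplicative updates masked by Φ; this is the kind of iterative method (cf. KL-WNMF in the experiments) that A1GM replaces with a closed form. A sketch (initialization and iteration count are illustrative):

```python
import numpy as np

def rank1_kl_wnmf(X, Phi, n_iter=2000):
    """Multiplicative-update rank-1 KL-NMF that ignores missing
    entries through the 0/1 mask Phi; X is fit by outer(w, h)."""
    w = np.ones(X.shape[0])
    h = np.full(X.shape[1], X[Phi == 1].mean())
    for _ in range(n_iter):
        R = Phi * X / np.outer(w, h)                   # ratios on observed entries
        w *= (R @ h) / np.maximum(Phi @ h, 1e-12)
        R = Phi * X / np.outer(w, h)
        h *= (w @ R) / np.maximum(w @ Phi, 1e-12)
    return w, h

X = np.outer([1.0, 2.0, 3.0], [2.0, 1.0, 4.0])  # exactly rank-1 data
Phi = np.ones_like(X)
Phi[2, 2] = 0.0                                  # one missing entry
w, h = rank1_kl_wnmf(X, Phi)
```

A1GM instead gathers the missing entries into a corner and applies a closed-form solution, avoiding the iteration entirely.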

Slide 64

Slide 64 text

Strategy for rank-1 NMF with missing values
□ Collect the missing values in a corner of the matrix (Φ_ij = 0 if X_ij is missing, 1 otherwise); the problem then becomes equivalent to NMMF (Takeuchi et al., 2013).

Slide 65

Slide 65 text

NMMF, non-negative multiple matrix factorization (Takeuchi et al., 2013): factorize a target matrix (e.g. user × artist) jointly with auxiliary matrices that share its row or column factors (e.g. matrices involving tags).

Slide 66

Slide 66 text

The best rank-1 approximation of NMMF

Slide 67

Slide 67 text

Modeling of NMMF: a one-to-one correspondence between the matrix triple and a distribution on a DAG.

Slide 68

Slide 68 text

One-body and many-body parameters
(X, Y, Z) is simultaneously rank-1 decomposable ⇔ it can be written as (w ⊗ h, a ⊗ h, w ⊗ b). The parameters split into one-body and two-body parameters.

Slide 69

Slide 69 text

Information geometry of rank-1 NMMF
(X, Y, Z) is simultaneously rank-1 decomposable ⇔ it can be written as (w ⊗ h, a ⊗ h, w ⊗ b) ⇔ all of the two-body θ-parameters are 0 (the simultaneous rank-1 θ-condition).

Slide 70

Slide 70 text

Information geometry of rank-1 NMMF
Simultaneous rank-1 θ-condition: all of the two-body θ-parameters are 0.
Simultaneous rank-1 η-condition: η_ij = η_i1 η_1j.
The solution space is e-flat, so the projection onto it is unique.

Slide 71

Slide 71 text

Find the global optimal solution of rank-1 NMMF
The m-projection does not change the one-body η-parameters [Shun-ichi Amari, Information Geometry and Its Applications, Theorem 11.6], and the simultaneous rank-1 η-condition is η_ij = η_i1 η_1j.

Slide 72

Slide 72 text

Find the global optimal solution of rank-1 NMMF
Combining the two facts, i.e. the m-projection preserves the one-body η-parameters and η_ij = η_i1 η_1j on the rank-1 subspace, all η-parameters after the projection are identified, giving the global optimum in closed form.

Slide 73

Slide 73 text

Rank-1 NMF with missing values
□ NMMF can be viewed as a special case of NMF with missing values; the two formulations are equivalent.

Slide 74

Slide 74 text

Rank-1 NMF with missing values
□ NMMF can be viewed as a special case of NMF with missing values; the two formulations are equivalent.
□ NMF is invariant under row and column permutations.

Slide 75

Slide 75 text

A1GM: algorithm
Step 1: gather the missing values in the bottom-right corner by permutation.
Step 2: use the formula for the best rank-1 NMMF.
Step 3: undo the permutation.
This finds the exact solution. But can we always gather the missing values into a corner? 🤔❓
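Step 1 can be sketched as a stable argsort that pushes every row and column containing a missing value (NaN here) to the bottom/right, and step 3 is the inverse permutation. A minimal sketch (it assumes a pure permutation suffices, the case the next slides relax):

```python
import numpy as np

def gather_missing(X):
    """Permute rows and columns so that rows/columns containing
    missing entries (NaN) move to the bottom/right.  Returns the
    permuted matrix and the permutations for undoing it later."""
    miss = np.isnan(X)
    row_perm = np.argsort(miss.any(axis=1), kind="stable")  # missing rows last
    col_perm = np.argsort(miss.any(axis=0), kind="stable")  # missing cols last
    return X[np.ix_(row_perm, col_perm)], row_perm, col_perm

X = np.array([[1.0, np.nan, 2.0],
              [3.0, 4.0,    5.0],
              [6.0, np.nan, 7.0]])
G, rp, cp = gather_missing(X)
```

Applying the inverse permutations `np.argsort(rp)` and `np.argsort(cp)` to `G` restores `X`, which is what step 3 does to the reconstructed factors.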

Slide 76

Slide 76 text

There are examples where permutations alone cannot collect the missing values into a corner.

Slide 77

Slide 77 text

In that case, add missing values so that the problem can be solved as NMMF.

Slide 78

Slide 78 text

In that case, add missing values so that the problem can be solved as NMMF. The reconstruction error worsens. 😢

Slide 79

Slide 79 text

In that case, add missing values so that the problem can be solved as NMMF. The reconstruction error worsens 😢, but we gain efficiency. 😀

Slide 80

Slide 80 text

🙆 Data that A1GM is good at, and 🙅 data it is not good at
🙅 Missing values evenly distributed over rows and columns.

Slide 81

Slide 81 text

🙆 Data that A1GM is good at, and 🙅 data it is not good at
🙅 Missing values evenly distributed over rows and columns.
🙆 Missing values heavily concentrated in certain rows and columns. In some real datasets missing values tend to concentrate in certain columns, e.g. a disconnected sensing device or an optional answer field in a questionnaire.

Slide 82

Slide 82 text

A1GM: algorithm
Step 1: increase the number of missing values (if needed).
Step 2: gather the missing values in the bottom-right corner.
Step 3: use the rank-1 NMMF formula and undo the permutation.

Slide 83

Slide 83 text

Experiments on real data
□ A1GM is compared with gradient-based KL-WNMF.
- Relative runtime < 1 means A1GM is faster than KL-WNMF.
- Relative error > 1 means A1GM's reconstruction error is worse than KL-WNMF's.
- The increase rate is the ratio of the number of missing values after the addition in step 1.
A1GM is 5 to 10 times faster. When no missing values need to be added, it finds the best solution; when missing values are added, accuracy decreases.

Slide 85

Slide 85 text

Theoretical remark 1: extended NMMF
□ With Φ_ij = 0 if X_ij is missing and 1 otherwise, the rank of the weight matrix Φ is 2 after adding missing values.

Slide 86

Slide 86 text

Theoretical remark 1: extended NMMF
□ The rank of the weight matrix Φ is 2 after adding missing values.
□ Can we exactly solve rank-1 NMF whenever rank(Φ) = 2?

Slide 87

Slide 87 text

Theoretical remark 1: extended NMMF
We derive the best rank-1 approximation of an extended NMMF.

Slide 88

Slide 88 text

Theoretical remark 1: extended NMMF
The best rank-1 approximation of the extended NMMF is equivalent to rank-1 NMF with the weight Φ (Φ_ij = 0 if X_ij is missing, 1 otherwise).

Slide 89

Slide 89 text

Theoretical remark 1: extended NMMF
If rank(Φ) ≤ 2, the weight matrix can be transformed by permutation into the required form, so we can exactly solve rank-1 NMF with missing values whenever rank(Φ) ≤ 2.

Slide 90

Slide 90 text

Theoretical remark 2: connection to balancing
□ Matrix balancing: transform a matrix into a balanced (doubly stochastic) matrix [Mahito Sugiyama, Hiroyuki Nakahara, and Koji Tsuda, "Tensor balancing on statistical manifold," ICML 2017].

Slide 91

Slide 91 text

Theoretical remark 2: connection to balancing
□ Matrix balancing: transform a matrix into a balanced (doubly stochastic) matrix; the balancing condition constrains the row and column sums.
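Matrix balancing is classically computed by the Sinkhorn iteration (the cited work instead formulates it as a projection in the (θ, η) coordinates of the DAG model). A sketch of the classical iteration only:

```python
import numpy as np

def sinkhorn_balance(A, n_iter=1000):
    """Sinkhorn iteration: alternately normalize rows and columns of a
    positive square matrix until it is approximately doubly stochastic."""
    B = A.astype(float).copy()
    for _ in range(n_iter):
        B /= B.sum(axis=1, keepdims=True)  # make row sums 1
        B /= B.sum(axis=0, keepdims=True)  # make column sums 1
    return B

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = sinkhorn_balance(A)
```

For a strictly positive matrix the iteration converges, and the limit is the balanced matrix.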

Slide 92

Slide 92 text

Theoretical remark 2: connection to balancing
Combining the balancing condition with the rank-1 condition (all many-body θ-parameters are 0), the balanced rank-1 matrix is unique.

Slide 93

Slide 93 text

Conclusion
□ Information geometry with distributions on DAGs matched to the data structure lets us describe low-rank conditions using (θ, η):
  - rank-1 condition (θ-representation): all many-body θ̄_ijk are 0
  - rank-1 condition (η-representation): η̄_ijk = η̄_i11 η̄_1j1 η̄_11k
□ Closed formula for the best rank-1 approximation of NMMF.
□ A1GM: faster rank-1 NMF with missing values.