Slide 1

Slide 1 text

Tensor Representations in Signal Processing and Machine Learning Tatsuya Yokota Tutorial in APSIPA-ASC 2020

Slide 2

Slide 2 text

2 Organization of This Tutorial Basics Applications Advances This tutorial consists of 3 parts. Each part may have 45 mins for presentation + 5-10 mins for Q&A + break

Slide 3

Slide 3 text

3 3 Introduction What & Why tensor? Basics

Slide 4

Slide 4 text

4 Tensors Variables having indices; a generalization of scalars, vectors, and matrices. The number of indices is called the "order": Scalar : 0th order tensor Vector : 1st order tensor Matrix : 2nd order tensor …

Slide 5

Slide 5 text

5 Tensor data A tensor is a multi-way array: a vector is an array of scalars, a matrix is an array of vectors, and a 3rd order tensor is an array of matrices (0th, 1st, 2nd, and 3rd order tensors).

Slide 6

Slide 6 text

6 Recursive representations of an Nth order tensor Let us imagine a 3rd order tensor as a "cube". A 4th order tensor is: a block 3rd order tensor whose elements are vectors; a block matrix whose elements are matrices; a block vector whose elements are 3rd order tensors. A 5th order tensor is: a block 4th order tensor whose elements are vectors; a block 3rd order tensor whose elements are matrices; a block matrix whose elements are 3rd order tensors; a block vector whose elements are 4th order tensors. In general, an Nth order tensor is a block vector whose elements are (N-1)th order tensors.

Slide 7

Slide 7 text

7 Why tensors Tensor representation has benefits when tensors have "low-rank structures" or "smooth structures". Many applications: dimensionality reduction (compression), pattern recognition, blind source separation, signal restoration, deep neural networks, etc.

Slide 8

Slide 8 text

8 Fields of applications Psychometrics (1960s-), Linguistics (1970s-), Chemometrics (1980s-), Food industry (1990s-), Computer Science / Data Engineering (1990s-): multi-media signal processing (audio, speech, image, video), bio-medical signal processing (EEG, MEG, functional MRI, diffusion MRI), bio-informatics (gene, protein), remote sensing (hyperspectral images), web/data mining (SNS analysis, recommender systems), non-linear time series analysis, wireless communications, deep learning, etc.

Slide 9

Slide 9 text

9 Related Tutorials / Workshops "Tensor Component Analysis" by Yipeng Liu in APSIPA ASC 2019; "Cause-and-Effect in a Tensor Framework" by L. De Lathauwer, M. A. O. Vasilescu, and J. Kossaifi in CVPR 2019; "Tensor Factorizations, Data Fusion & Applications" by E. Acar in LVA/ICA 2018; "Blind Audio Source Separation on Tensor Representation" by H. Sawada, N. Ono, H. Kameoka, and D. Kitamura in ICASSP 2018; "Tensor Decomposition for Signal Processing and Machine Learning" by N. D. Sidiropoulos, L. De Lathauwer, X. Fu, and E. E. Papalexakis in ICASSP 2017; "A New Tensor Algebra" by L. Horesh and M. Kilmer in CVPR 2017

Slide 10

Slide 10 text

10 SPM Best Paper Award "Tensor Decompositions for Signal Processing Applications: From Two-Way to Multiway Component Analysis" received the SPM Best Paper Award at ICASSP 2019.

Slide 11

Slide 11 text

11 Intermediate Summary A tensor is a multi-dimensional array, a generalization of vectors, matrices, … Many kinds of data can be represented using tensors. Tensor processing has many applications. Tensor processing has attracted attention: workshops/tutorials, awards. So, let's learn about tensors!!

Slide 12

Slide 12 text

12 12 Basic methods for handling tensors Part I : Basics

Slide 13

Slide 13 text

13 Mathematical notations Scalar variables: lowercase / uppercase letters. Vector variables: lowercase bold letters. Matrix variables: uppercase bold letters. Higher order tensor variables (3rd, 4th, …, Nth order): calligraphic bold letters.

Slide 14

Slide 14 text

14 Modes of a tensor Here, we call the axes of a tensor "modes": the 1st mode, the 2nd mode, the 3rd mode, …

Slide 15

Slide 15 text

15 Elements of a tensor The value of the (i, j, k) element of a 3rd order tensor is denoted by x_{ijk}.

Slide 16

Slide 16 text

16 Fibers of a tensor Fibers of a tensor are denoted by using " : ": a fiber along the 1st mode, along the 2nd mode, and along the 3rd mode.

Slide 17

Slide 17 text

17 Slices of a tensor Slices of a tensor are also denoted by using " : ": a slice of the 1st mode, of the 2nd mode, and of the 3rd mode.
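A minimal NumPy sketch (not from the slides) of elements, fibers, and slices of a 3rd order tensor; the array X and its sizes are arbitrary choices for illustration.

```python
import numpy as np

# A 3rd order tensor of size (I, J, K) = (3, 4, 5)
X = np.arange(3 * 4 * 5).reshape(3, 4, 5)

x_ijk = X[0, 1, 2]        # element x_{ijk}

fiber_mode1 = X[:, 1, 2]  # fiber along the 1st mode (length I)
fiber_mode2 = X[0, :, 2]  # fiber along the 2nd mode (length J)
fiber_mode3 = X[0, 1, :]  # fiber along the 3rd mode (length K)

slice_mode1 = X[0, :, :]  # slice of the 1st mode, size (J, K)
slice_mode2 = X[:, 1, :]  # slice of the 2nd mode, size (I, K)
slice_mode3 = X[:, :, 2]  # slice of the 3rd mode, size (I, J)
```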

Slide 18

Slide 18 text

18 Transposition / Mode permutation Matrix transposition is unique: switch the 1st and 2nd modes. Tensor transposition has several choices: permute any selected modes (illustrated for a 3rd order tensor).

Slide 19

Slide 19 text

19 Vectorization Vectorization is an operation that extracts all fibers from a tensor and constructs a very long vector by concatenating them.

Slide 20

Slide 20 text

20 Note the order of elements of a vector transformed from a tensor Vectorization (cont’d)

Slide 21

Slide 21 text

21 A motivation of vectorization Some functions become easier to understand because they do not depend on whether the argument is a tensor or a vector: the l2 norm of a tensor (Frobenius norm) and the inner product between two tensors.
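A small NumPy sketch of this point (an illustration, not the presenter's code); the column-major (Fortran-order) vectorization is an assumed convention.

```python
import numpy as np

X = np.random.rand(3, 4, 5)
Y = np.random.rand(3, 4, 5)

# Vectorization: concatenate all mode-1 fibers into one long vector
# (order='F' lets the 1st index vary fastest, i.e., column-major).
vec_X = X.reshape(-1, order='F')

# Frobenius norm of the tensor = l2 norm of its vectorization
fro = np.linalg.norm(vec_X)
assert np.isclose(fro, np.sqrt((X ** 2).sum()))

# Inner product of two tensors = inner product of their vectorizations
inner = np.dot(vec_X, Y.reshape(-1, order='F'))
assert np.isclose(inner, (X * Y).sum())
```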

Slide 22

Slide 22 text

22 Unfolding / Matricization The n-th mode unfolding is an operation that extracts all fibers along the n-th mode and constructs a horizontally long matrix by concatenating them.

Slide 23

Slide 23 text

23 Meaning of "n-th mode unfolding" We should carefully understand the meaning of the n-th mode unfolding: unfold all modes except the n-th mode. Ex) The 3rd mode unfolding of a 5th order tensor: keep the 3rd mode and unfold the 1st, 2nd, 4th, and 5th modes.

Slide 24

Slide 24 text

24 A motivation of n-th mode unfolding The n-th mode unfolding is useful to see the linear characteristics of a tensor from each mode. For example, the Tucker rank (multilinear rank) of a tensor can be defined by the matrix ranks of its n-th mode unfoldings.
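A sketch of this idea in NumPy, assuming a column-major unfolding convention; the helper name unfold and the tensor sizes are illustrative, not part of the slides.

```python
import numpy as np

def unfold(X, n):
    """n-th mode unfolding: mode n becomes the rows, all other modes
    are merged into the columns (column-major ordering assumed)."""
    return np.reshape(np.moveaxis(X, n, 0), (X.shape[n], -1), order='F')

# Build a tensor with multilinear rank (2, 3, 4) from a random Tucker model
G = np.random.rand(2, 3, 4)                       # core tensor
U = [np.random.rand(10, 2), np.random.rand(11, 3), np.random.rand(12, 4)]
X = np.einsum('abc,ia,jb,kc->ijk', G, U[0], U[1], U[2])

# Tucker (multilinear) rank = matrix ranks of the mode-n unfoldings
ml_rank = tuple(np.linalg.matrix_rank(unfold(X, n)) for n in range(3))
print(ml_rank)   # (2, 3, 4) up to numerical precision
```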

Slide 25

Slide 25 text

25 Folding / Tensorization Vectorization and matricization are generalized as "unfolding". It sometimes helps us to understand the linear characteristics of tensors. On the other hand, the inverse operation, "folding", can be considered, too.

Slide 26

Slide 26 text

26 An example of folding: vector to matrix Folding can be considered as a "grouping" of elements. Ex: a triplet is extracted for each column vector, so the folding is not unique. How should we decide which folding is better? ➔ low-rankness can be one criterion: in the example, Folding 1 gives a rank-1 matrix while Folding 2 gives a rank-2 matrix (see the sketch below).
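A small NumPy sketch of this criterion. The vector and the two groupings here are illustrative assumptions, not the slide's exact numbers; the point is that two foldings of the same elements can have different ranks.

```python
import numpy as np

# A length-8 vector whose consecutive pairs are proportional
x = np.array([1, 2, 2, 4, 3, 6, 4, 8])

# Folding 1: consecutive pairs become columns of a 2x4 matrix
M1 = x.reshape(2, 4, order='F')
# Folding 2: consecutive quadruplets become columns of a 4x2 matrix
M2 = x.reshape(4, 2, order='F')

print(np.linalg.matrix_rank(M1))   # 1 -> this grouping matches the structure
print(np.linalg.matrix_rank(M2))   # 2 -> this grouping does not
```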

Slide 27

Slide 27 text

27 An interesting tensor representation of images Image data can be naturally represented by a matrix, but folding has some benefits for compression. Ex: a (256, 256) image matrix vs. its folding into a 16th order (2,2,…,2) tensor; low-rank approximation of each with #param = 2048.

Slide 28

Slide 28 text

28 Higher order tensor representations of images Is there a more beneficial tensor representation? Ex: the (256, 256) image folded into a 16th order (2,2,…,2) tensor (Folding 1) vs. an 8th order (4,4,…,4) tensor (Folding 2); low-rank approximation of each with #param = 2048.

Slide 29

Slide 29 text

29 Folding / unfolding Vectorization and matricization are special cases of unfolding/folding. Folding and unfolding are linear operations in general.

Slide 30

Slide 30 text

30 30 Calculation of Tensors Part I : Basics

Slide 31

Slide 31 text

31 Addition & Scaling Addition of two tensors ➔ element-wise addition. Scaling: a tensor multiplied by a scalar.

Slide 32

Slide 32 text

32 Elementwise multiplication / division Elementwise multiplication (Hadamard product) and division are denoted by a circled asterisk and a circled slash, respectively.

Slide 33

Slide 33 text

33 Inner product The inner product of two vectors is the summation of all elements of their elementwise (Hadamard) product. It is naturally generalized to tensors as "the inner product of the two vectors obtained by vectorizing the two tensors".

Slide 34

Slide 34 text

34 Outer product The outer product of two vectors outputs a matrix. The outer product of 3 vectors outputs a 3rd order tensor.
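A short NumPy sketch of the elementwise, inner, and outer products discussed above; the shapes are arbitrary choices for illustration.

```python
import numpy as np

a, b, c = np.random.rand(3), np.random.rand(4), np.random.rand(5)
X, Y = np.random.rand(3, 4, 5), np.random.rand(3, 4, 5)

had = X * Y            # Hadamard (elementwise) product of two tensors
div = X / Y            # elementwise division
inner = (X * Y).sum()  # inner product = sum of the Hadamard product

M = np.outer(a, b)                    # outer product of 2 vectors -> (3, 4) matrix
T = np.einsum('i,j,k->ijk', a, b, c)  # outer product of 3 vectors -> (3, 4, 5) tensor
```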

Slide 35

Slide 35 text

35 Kronecker product The Kronecker product outputs a large matrix from two small matrices.

Slide 36

Slide 36 text

36 Kronecker product (cont'd) The Kronecker product and the outer product are essentially the same: the Kronecker product of two matrices is obtained by vectorizing each factor and reshaping their outer product.

Slide 37

Slide 37 text

37 Khatri-Rao product The Khatri-Rao product is defined between two matrices that have the same number of columns: it is the column-wise Kronecker product.
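A NumPy sketch of the Kronecker and Khatri-Rao products. np.kron is a real NumPy function; the helper khatri_rao below is our own illustrative definition (NumPy itself does not provide one).

```python
import numpy as np

A = np.random.rand(3, 4)
B = np.random.rand(5, 4)

# Kronecker product: a (3*5, 4*4) matrix built from all pairwise products
K = np.kron(A, B)

def khatri_rao(A, B):
    """Column-wise Kronecker product; requires equal numbers of columns."""
    assert A.shape[1] == B.shape[1]
    return np.einsum('ir,jr->ijr', A, B).reshape(A.shape[0] * B.shape[0], A.shape[1])

KR = khatri_rao(A, B)   # shape (15, 4)
# Column r of the Khatri-Rao product equals the Kronecker product of the r-th columns
assert np.allclose(KR[:, 0], np.kron(A[:, 0], B[:, 0]))
```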

Slide 38

Slide 38 text

38 Matrix product The matrix product is defined between two matrices for which the number of columns of the first equals the number of rows of the second. The matrix product is a combination of an inner product (w.r.t. r) and an outer product (w.r.t. i and j).

Slide 39

Slide 39 text

39 Matrix product (cont'd) The matrix product is a combination of an inner product and an outer product, and the Khatri-Rao product is a sub-part of the matrix product: summing and folding the columns of a Khatri-Rao product recovers the matrix product.

Slide 40

Slide 40 text

40 Tensor-Matrix product The tensor-matrix product is defined between a tensor and a matrix whose size matches that of one mode of the tensor. It combines an inner product (w.r.t. r) with an outer product (w.r.t. i and (j, k)), via folding/unfolding.

Slide 41

Slide 41 text

41 Tensor-Matrix product (cont'd) The tensor-matrix product is defined between a tensor and a matrix whose size matches that of one mode of the tensor. It is also called the n-th mode product, and it is easier to understand through the n-th mode unfolding.

Slide 42

Slide 42 text

42 Tensor-Matrix product (cont'd) An Nth order tensor can be multiplied by N matrices, one from each mode (all modes).

Slide 43

Slide 43 text

43 Tensor-Matrix product (cont'd) The Kronecker product helps to understand it as a linear transformation of the vectorized tensor.
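A NumPy sketch that checks this view numerically, assuming the standard column-major convention vec(X ×_1 A1 ×_2 A2 ×_3 A3) = (A3 ⊗ A2 ⊗ A1) vec(X); the helper names and sizes are illustrative.

```python
import numpy as np

def unfold(X, n):
    return np.reshape(np.moveaxis(X, n, 0), (X.shape[n], -1), order='F')

def fold(M, n, shape):
    full = [shape[n]] + [s for i, s in enumerate(shape) if i != n]
    return np.moveaxis(np.reshape(M, full, order='F'), 0, n)

def mode_n_product(X, A, n):
    """Y = X x_n A, i.e., the n-th mode unfolding of Y equals A @ unfold(X, n)."""
    shape = list(X.shape)
    shape[n] = A.shape[0]
    return fold(A @ unfold(X, n), n, shape)

X = np.random.rand(3, 4, 5)
A1, A2, A3 = np.random.rand(6, 3), np.random.rand(7, 4), np.random.rand(8, 5)

Y = mode_n_product(mode_n_product(mode_n_product(X, A1, 0), A2, 1), A3, 2)

# vec(X x1 A1 x2 A2 x3 A3) == (A3 kron A2 kron A1) vec(X)   (column-major vec)
lhs = Y.reshape(-1, order='F')
rhs = np.kron(A3, np.kron(A2, A1)) @ X.reshape(-1, order='F')
assert np.allclose(lhs, rhs)
```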

Slide 44

Slide 44 text

44 Tensor Diagram We denote a single tensor by the following diagram (it is very helpful): a node with edges, where the number of edges equals the order of the tensor (vector, matrix, 3rd order tensor, 4th order tensor, …). The inner product of tensors can be denoted by linking all edges.

Slide 45

Slide 45 text

45 Tensor Diagram (cont'd) Matrix multiplication: an (I×J) matrix is drawn as an (I×R) node and an (R×J) node connected by the edge R. General tensor multiplication is essentially connecting the edges (modes) of tensors, and there is some freedom in how to connect them.

Slide 46

Slide 46 text

46 Summary of Part I Important concepts in tensor processing were explained: notations, fibers / slices, unfolding (vectorization, matricization, etc.), inner product, outer product, special products (Kronecker, Khatri-Rao, etc.), general tensor multiplication, and tensor diagrams. I think it is nice to geometrically imagine how tensors are unfolded, folded, and transformed, so in this tutorial I tried to explain them using figures. Q&A

Slide 47

Slide 47 text

47 47 Part II : Applications

Slide 48

Slide 48 text

48 Outline of Part II Tensor decompositions CP decomposition Tucker decomposition TT decomposition Others Applications Dimensionality Reduction Pattern Recognition Blind Source Separation Signal Restoration Deep Neural Networks

Slide 49

Slide 49 text

49 Tensor decomposition Tensor decomposition is a higher order generalization of matrix decomposition. A general definition may be "a tensor represented by multiple tensors". A specific decomposition is given by a parametric model f and its parameter tensors (e.g., a core tensor and factor matrices).

Slide 50

Slide 50 text

50 Canonical polyadic (CP) decomposition The Canonical Polyadic (CP) decomposition/model is a natural extension of low-rank matrix decomposition in terms of the Khatri-Rao product: the 3rd order CP model is a sum of R rank-1 terms a_r ∘ b_r ∘ c_r, and its mode-1 unfolding is A (C ⊙ B)^T, which extends (by folding) the low-rank matrix decomposition A B^T.

Slide 51

Slide 51 text

51 An algorithm for CP decomposition The optimization problem is a least-squares fit of the CP model to the data. The Alternating Least Squares (ALS) algorithm is typical and widely used: initialize the factors, then alternately update each factor matrix with a least-squares solution while freezing the other factors.
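A minimal CP-ALS sketch in NumPy under the standard least-squares formulation; this is an illustrative implementation (not the presenter's code), and the helper names, sizes, and iteration count are assumptions.

```python
import numpy as np

def unfold(X, n):
    return np.reshape(np.moveaxis(X, n, 0), (X.shape[n], -1), order='F')

def khatri_rao(A, B):
    return np.einsum('ir,jr->ijr', A, B).reshape(-1, A.shape[1])

def cp_als(X, R, n_iter=100):
    """Rank-R CP decomposition of a 3rd order tensor by ALS:
    each factor is updated by least squares while the others are frozen."""
    I, J, K = X.shape
    A, B, C = np.random.rand(I, R), np.random.rand(J, R), np.random.rand(K, R)
    for _ in range(n_iter):
        A = unfold(X, 0) @ np.linalg.pinv(khatri_rao(C, B).T)
        B = unfold(X, 1) @ np.linalg.pinv(khatri_rao(C, A).T)
        C = unfold(X, 2) @ np.linalg.pinv(khatri_rao(B, A).T)
    return A, B, C

# Example: recover an exactly rank-3 tensor
R = 3
A0, B0, C0 = np.random.rand(5, R), np.random.rand(6, R), np.random.rand(7, R)
X = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
A, B, C = cp_als(X, R)
X_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
print(np.linalg.norm(X - X_hat) / np.linalg.norm(X))   # small reconstruction error
```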

Slide 52

Slide 52 text

52 Least squares rule for the update in CP Unfold the cost function, take its derivative, set it to zero, and solve for the factor matrix.
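A hedged reconstruction of the standard CP-ALS factor update that this slide derives (using common notation; the exact symbols on the slide may differ).

```latex
\begin{align*}
\min_{\mathbf{A}}\;
  \big\| \mathbf{X}_{(1)} - \mathbf{A}\,(\mathbf{C} \odot \mathbf{B})^{\top} \big\|_F^2
&\;\Rightarrow\;
  -2\,\mathbf{X}_{(1)}(\mathbf{C} \odot \mathbf{B})
    + 2\,\mathbf{A}\,(\mathbf{C} \odot \mathbf{B})^{\top}(\mathbf{C} \odot \mathbf{B}) = \mathbf{0} \\
&\;\Rightarrow\;
  \mathbf{A} \leftarrow
  \mathbf{X}_{(1)}\,(\mathbf{C} \odot \mathbf{B})
  \big[(\mathbf{C}^{\top}\mathbf{C}) \circledast (\mathbf{B}^{\top}\mathbf{B})\big]^{-1},
\end{align*}
% note: (C \odot B)^T (C \odot B) = (C^T C) \circledast (B^T B)  (Hadamard product)
```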

Slide 53

Slide 53 text

53 Tucker decomposition The Tucker decomposition/model is a natural extension of low-rank matrix decomposition in terms of the Kronecker product: a small core tensor is multiplied by a factor matrix along each mode, and the unfolded model extends (by folding) the low-rank matrix decomposition.

Slide 54

Slide 54 text

54 An algorithm for Tucker decomposition The optimization problem is a least-squares fit of the Tucker model. First, we unfold the cost function; taking the partial derivative with respect to the core g, the parameter g can be eliminated.

Slide 55

Slide 55 text

55 An algorithm for Tucker decomposition (cont'd) The optimization problem, with the core eliminated, is as follows.

Slide 56

Slide 56 text

56 An algorithm for Tucker decomposition (cont'd) The optimization problem is solved by the Alternating Least Squares (ALS) algorithm, which is typical and widely used: initialize the factors, then alternately update each factor matrix with a least-squares solution while freezing the other factors.
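A compact NumPy sketch of this alternating update for orthonormal factors (often called HOOI). It is an illustrative implementation under the standard formulation, not the presenter's code; function names, ranks, and iteration counts are assumptions.

```python
import numpy as np

def unfold(X, n):
    return np.reshape(np.moveaxis(X, n, 0), (X.shape[n], -1), order='F')

def mode_n_product(X, A, n):
    shape = list(X.shape); shape[n] = A.shape[0]
    full = [shape[n]] + [s for i, s in enumerate(shape) if i != n]
    return np.moveaxis(np.reshape(A @ unfold(X, n), full, order='F'), 0, n)

def tucker_als(X, ranks, n_iter=20):
    """Tucker decomposition with orthonormal factor matrices (ALS / HOOI style)."""
    N = X.ndim
    # Initialize factors by truncated SVD of each unfolding (HOSVD initialization)
    U = [np.linalg.svd(unfold(X, n))[0][:, :ranks[n]] for n in range(N)]
    for _ in range(n_iter):
        for n in range(N):
            # Project X onto all factors except mode n, then take the leading
            # left singular vectors of the mode-n unfolding of the projection
            Y = X
            for m in range(N):
                if m != n:
                    Y = mode_n_product(Y, U[m].T, m)
            U[n] = np.linalg.svd(unfold(Y, n))[0][:, :ranks[n]]
    # Core tensor
    G = X
    for n in range(N):
        G = mode_n_product(G, U[n].T, n)
    return G, U

X = np.random.rand(10, 11, 12)
G, U = tucker_als(X, ranks=(3, 4, 5))
```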

Slide 57

Slide 57 text

57 Least squares rule for the update in Tucker Unfold the cost function. Because of the orthogonality constraint, we consider the Lagrange function (with a Lagrange multiplier) and solve for the factor matrix.

Slide 58

Slide 58 text

58 Tensor-Train (TT) decomposition Tensor-Train (TT) decomposition [Oseledets, 2011] is a relatively recent model. It has the benefit of an extremely light parameterization for high order, huge tensors. Each element of the tensor can be calculated by multiplying slices/fibers of the core tensors.

Slide 59

Slide 59 text

59 TT-SVD A sequential algorithm for TT decomposition: the TT cores can be obtained by repeated truncated SVD and unfolding (TT-SVD).
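A minimal TT-SVD sketch in NumPy, an illustrative implementation of the sequential truncated-SVD procedure (the function name, ranks, and tensor sizes are assumptions, not the presenter's exact setup).

```python
import numpy as np

def tt_svd(X, ranks):
    """TT-SVD: sequentially reshape the remainder and truncate by SVD.
    ranks = (r1, ..., r_{N-1}); returns N cores G_n of shape (r_{n-1}, I_n, r_n)."""
    shape = X.shape
    N = len(shape)
    cores = []
    C = X.reshape(shape[0], -1)          # unfold the remaining tensor
    r_prev = 1
    for n in range(N - 1):
        U, s, Vt = np.linalg.svd(C, full_matrices=False)
        r = min(ranks[n], len(s))        # truncation rank
        cores.append(U[:, :r].reshape(r_prev, shape[n], r))
        C = (np.diag(s[:r]) @ Vt[:r]).reshape(r * shape[n + 1], -1)
        r_prev = r
    cores.append(C.reshape(r_prev, shape[-1], 1))
    return cores

X = np.random.rand(4, 5, 6, 7)
# With these (untruncated) ranks the reconstruction is exact up to machine precision;
# smaller ranks give a compressed TT with some approximation error.
cores = tt_svd(X, ranks=(4, 20, 7))

Y = cores[0]
for G in cores[1:]:
    Y = np.tensordot(Y, G, axes=([-1], [0]))   # chain the cores
Y = Y.reshape(X.shape)
print(np.linalg.norm(X - Y) / np.linalg.norm(X))
```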

Slide 60

Slide 60 text

60 ALS update for TT decomposition Starting from TT-SVD (⓪), the cores are updated by a sequence of SVDs (①–⑩) sweeping across the train, keeping the cores to the left of the current position left-orthogonal and those to the right right-orthogonal, with a diagonal matrix of singular values in between.

Slide 61

Slide 61 text

61 Tensor-Ring (TR) decomposition and others Tensor-Ring (TR) decomposition is a generalization of TT decomposition. Many other networks can also be considered: CP, Tucker, Hierarchical Tucker (HT), TT, Projected Entangled-Pair States (PEPS), etc. Nowadays, tensor decompositions (including CP, Tucker, etc.) are generally called Tensor Networks.

Slide 62

Slide 62 text

62 62 Part II : Applications Applications of tensor decomposition

Slide 63

Slide 63 text

63 Applications Dimensionality Reduction Pattern Recognition Blind Source Separation Signal Restoration Deep Neural Networks

Slide 64

Slide 64 text

64 Dimensionality reduction In general, the number of elements/parameters needed to describe tensor data grows exponentially with the tensor order N. Tensor decomposition works to reduce the number of parameters. For example, CP decomposition with I = 10 and R = 10: N = 3: 1,000 (original tensor) vs. 300 (CP); N = 4: 10,000 vs. 400; N = 5: 100,000 vs. 500; N = 6: 1,000,000 vs. 600.

Slide 65

Slide 65 text

65 Dimensionality reduction (cont'd) A simple experiment: image reconstruction with a smaller number of parameters. A (128, 128, 128) volume is approximated by (i) unfolding to a 128×(128*128) matrix and taking a rank-R SVD, (ii) a rank-R CP decomposition, and (iii) a Tucker decomposition of rank (R, R, R).

Slide 66

Slide 66 text

66 Dimensionality reduction (cont'd) Reconstructed MR images: CP with R = 1, 10, 100, 200, 400, 800, 1600; SVD with R = 1, 5, 10, 20, 40, 70, 100; Tucker with R = (1,1,1), (10,10,10), (30,30,30), (40,40,40), (50,50,50), (80,80,80), (120,120,120).

Slide 67

Slide 67 text

67 Dimensionality reduction (cont'd) SNR vs. #parameters, comparing SVD (R = 10), CP (R = 400), and Tucker (R = [50, 50, 50]).

Slide 68

Slide 68 text

68 Advances in dimensionality reduction Sparse coding of tensor [Caiafa+, 2013 (in NECO)] [Caiafa+, 2013 (in WIRES)] Tensor networks (TNs) for dimensionality reduction [Zhao+, 2016] [Cichocki+, 2017] Genetic algorithm for TN model selection [Li+, 2020 (in ICML)]

Slide 69

Slide 69 text

69 Pattern recognition Linear Discriminant Analysis (LDA) is a typical method to obtain the best linear projection that separates patterns of different categories (maximize between-class scatter, minimize within-class scatter).

Slide 70

Slide 70 text

70 Pattern recognition (cont'd) LDA can be naturally extended by using a multi-linear projection instead of a linear projection (Multiway LDA): the samples are stacked into an (N+1)-th order tensor, and the same maximize/minimize criterion is applied to the multi-linear projection.

Slide 71

Slide 71 text

71 Pattern recognition (cont'd) LDA has a closed-form solution using Eigenvalue Decomposition (EVD). Multi-way LDA is obtained by alternating optimization: repeat { for n = 1, …, N: update the n-th projection matrix by EVD } until convergence.

Slide 72

Slide 72 text

72 Pattern recognition (cont'd) Human face identification [Ye+, 2005 (in NeurIPS)]: ORL faces are 112×92 images. In the case of LDA, the EVD of a very large (≈10^4 × 10^4) matrix is necessary; in the case of (multi-way) 2DLDA, the matrices are much smaller (112×112 and 92×92, updated by alternating EVDs) ☺.

Slide 73

Slide 73 text

73 Pattern recognition (cont'd) Human face identification [Ye+, 2005 (in NeurIPS)]: a two-stage feature extraction using 2DLDA + LDA is proposed: a 112×92 image is projected to 10×10 by 2DLDA, vectorized to length 100, reduced to 40 by LDA, and classified by a nearest-neighbor classifier. The running time of 2DLDA was much shorter than LDA, and the accuracy of 2DLDA+LDA was slightly better than LDA.

Slide 74

Slide 74 text

74 Pattern recognition (cont'd) Gait recognition [Tao+, 2007 (in TPAMI)]: a gait image is represented as a 4th order tensor by convolving with 5×8 Gabor functions. The Gabor tensor is discriminated by (multi-way) Generalized LDA (GLDA), followed by vectorization, LDA, and a simple classifier.

Slide 75

Slide 75 text

75 Blind source separation (BSS) is a method to estimate original sources from multiple mixture signals Blind source separation

Slide 76

Slide 76 text

76 Blind source separation (cont'd) BSS on tensor representation [Sawada+, Tutorial in ICASSP 2018]; Independent low-rank matrix analysis (ILRMA) [Kitamura+, 2016]: a (channel × time) multichannel signal is turned into a (frequency × time × channel) tensor by the STFT of each channel. ① The reconstructed tensor is parameterized using demixing matrices for each frequency; ② the power tensor is modeled by a nonnegative CP decomposition.

Slide 77

Slide 77 text

77 Blind source separation (cont'd) Tensorizing the BSS problem using "segmentation" [Bousse+, 2017 (in IEEE-TSP)]: segmentation is one way of tensorization (folding). This strategy works well when the sources have a low-rank structure in their matrix/tensor representation.

Slide 78

Slide 78 text

78 Signal restoration It is a technique to recover corrupted signals: noise reduction, deconvolution, inpainting, super-resolution.

Slide 79

Slide 79 text

79 Signal restoration (cont'd) Tensor decomposition can be used for recovering signals which have a low-rank structure: the reconstruction is constrained to be consistent with the observed data while following a tensor decomposition model (CP decomposition, Tucker decomposition, TT decomposition, etc.).

Slide 80

Slide 80 text

80 Signal restoration (cont'd) Low-rank tensor completion: a corrupted image is directly represented as a 3rd order tensor (height × width × RGB). Smooth CP decomposition [Yokota+, 2016 (in IEEE-TSP)]: the reconstruction is kept consistent with the observed entries and is modeled as a sum of smooth rank-1 components.

Slide 81

Slide 81 text

81 Signal restoration (cont'd) Non-local filtering for image denoising: BM3D [Dabov+, 2007 (in IEEE-TIP)] (http://www.cs.tut.fi/~foi/GCF-BM3D/), GSR [Zhang+, 2014 (in IEEE-TIP)], Truncated HOSVD [Yokota+, 2017 (in IEEE-TSP)]. ① Extract similar patches and represent them as a tensor; ② approximate the tensor by a low-rank tensor to reduce the noise component.

Slide 82

Slide 82 text

82 Signal restoration (cont'd) There are many studies of low-rank based tensor data recovery: Low-n-rank tensor recovery [Gandy+, 2011 (in Inverse Problems)], Low-rank tensor completion [Liu+, 2012 (in IEEE-TPAMI)], Bayesian CP decomposition [Zhao+, 2015 (in IEEE-TPAMI)], t-SVD [Zhang+, 2017 (in IEEE-TSP)], Low-rank and smooth tensor recovery [Yokota+, 2017 (in CVPR)] [Yokota+, 2019 (in IEEE-Access)], Tensor ring completion [Wang+, 2017 (in ICCV)].

Slide 83

Slide 83 text

83 Deep neural networks (DNNs) Currently, DNNs play central roles in various signal and information processing studies. In particular, convolutional neural networks (CNNs) are widely employed for their efficiency in recognition, segmentation, enhancement, restoration, and synthesis of audio, speech, language, image, and bio-signal data (Input → Convolution → Activation → … → Output).

Slide 84

Slide 84 text

84 DNNs (cont'd) A 2D convolutional layer in CNNs is a mapping from one 3rd order tensor space (height × width × channels) to another 3rd order tensor space. Its parameter is a 4th order tensor.

Slide 85

Slide 85 text

85 DNNs (cont'd) Tensor decomposition in DNNs: convolutional layers [Lebedev+, 2015 (in ICLR)] are compressed by CP decomposition of the 4th order kernel; fully connected layers [Novikov+, 2015 (in NeurIPS)] are folded/tensorized and compressed by TT decomposition.

Slide 86

Slide 86 text

86 DNNs (cont'd) Tensor decomposition in DNNs: convolutional layer [Lebedev+, 2015 (in ICLR)]. The full convolution kernel (a 4th order tensor, e.g., 9×9×48×128) is approximated by a rank-R CP model, so the layer becomes a sequence of convolutions with fibers. Experiments: evaluated on image recognition (large speedup with a small accuracy drop).

Slide 87

Slide 87 text

87 DNNs (cont'd) Tensor decomposition in DNNs: fully connected (FC) layer [Novikov+, 2015 (in NeurIPS)]. Experiments: 1000-class ImageNet using VGG16 and VGG19, which have three FC layers: FC1 (25088, 4096), FC2 (4096, 4096), FC3 (4096, 1000), reshaped using 25088 = 2*7*8*8*7*4 and 4096 = 4*4*4*4*4*4. TT compression (TT-rank 1, 2, or 4) is compared with no compression and with matrix decomposition (matrix rank 1, 5, or 50). Result: efficient compression with little degradation!!

Slide 88

Slide 88 text

88 DNNs (cont'd) Other references in machine learning: Compression of CNN with Tucker [Kim+, 2016 (in ICLR)], Supervised Learning with Tensor Networks [Stoudenmire+, 2016 (in NeurIPS)], Exponential Machines [Novikov+, 2017 (in ICLR workshop)], Compact RNNs with Block-term Tensor Decomposition [Ye+, 2018 (in CVPR)], Compressing RNNs with Tensor Ring [Pan+, 2019 (in AAAI)], Tensor network decomposition of CNN kernels [Hayashi+, 2019 (in NeurIPS)], For understanding DNNs [Li+, 2019 (in ICASSP)] [Li+, 2020 (in AISTATS)]

Slide 89

Slide 89 text

89 Summary of Part II Tensor decomposition has various applications: dimensionality reduction, pattern recognition, blind source separation, signal restoration, deep neural networks. A common concept is to represent signals as high-order tensors that have a low-rank structure and then apply tensor decomposition: direct representation, multi-scale Gabor representation, time-frequency representation, similar-patch grouping, high-order tensorization. Other perspectives in tensor studies: decomposition models, optimization algorithms, model selection, rank estimation, etc.

Slide 90

Slide 90 text

90 Some Discussions The main flow was tensorization + decomposition. However, strictly speaking, we also need to consider how to select the decomposition model, and how to efficiently obtain the decomposition by optimization. Model selection: usually, the model of a tensor decomposition is manually selected by researchers / users; however, there are many candidate tensor decompositions nowadays. Determining the model from given data is a very difficult task, and it usually requires professional knowledge of both the application and the tensor fields. Evolutionary Topology Search for Tensor Networks [Li+, 2020 (in ICML)]. Optimization: it is quite important to study how to efficiently solve the complicated optimization problems that arise in tensor (network) decompositions. Many optimization methods exist: gradient method, Newton method, alternating least squares, hierarchical alternating least squares, stochastic gradient, … Randomized SVD [Minster+, 2020] [Ahmadi-Asl+, 2020], TensorSketch [Malik+, 2018 (in NeurIPS)], GPU acceleration [Choi+, 2018].

Slide 91

Slide 91 text

91 91 Part III : Advances

Slide 92

Slide 92 text

92 Introduction Until now, I have shown tensor operations, decompositions, and their applications. A common concept was "tensorize" and "decompose". A question here is "how to tensorize the signals": generally, some specific knowledge of the signals is necessary to design the tensors. Here, I introduce multi-way Hankelization / delay embedding for tensors as an advanced topic in tensor processing.

Slide 93

Slide 93 text

93 Outline Hankelization / delay-embedding for time-series signals Multi-way Hankelization / delay-embedding for tensors Applications Open problems and perspectives

Slide 94

Slide 94 text

94 Limitation of direct representation Revisiting signal restoration using tensor decomposition: when whole slices are missing in a tensor, low-rank tensor decomposition does not work. Random pixel missing can be recovered, but missing slices cannot: slice missing deflates the matrix/tensor rank, so the deflated component cannot be recovered.

Slide 95

Slide 95 text

95 Approach using the Hankel representation Here, we introduce low-rank tensor decomposition not in the direct tensor space, but in an embedded (Hankel) tensor space: a (height × width × RGB) incomplete tensor is delay-embedded / Hankelized into a higher order incomplete tensor, completed by low-rank tensor completion, and mapped back by the inverse operation ☺.

Slide 96

Slide 96 text

96 What is delay-embedding (DE)? Here, I introduce a technique from dynamical system analysis. For a time series x_t (t = 1, …, T), we consider a state space of delay vectors (x_t, x_{t+1}, …, x_{t+τ-1}) for t = 1, …, T-τ+1. This is called "delay-embedding (DE)".

Slide 97

Slide 97 text

97 Delay-embedding (DE) is Hankelization DE is a transformation from a vector to a Hankel matrix (i.e., Hankelization). A Hankel matrix is a matrix whose anti-diagonal elements are the same.

Slide 98

Slide 98 text

98 DE = duplication & unfolding DE can be regarded as a linear operation.
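A small NumPy sketch of delay-embedding a vector into a Hankel matrix; the function name delay_embedding and the choice τ = 3 are illustrative.

```python
import numpy as np

def delay_embedding(x, tau):
    """Hankelization: stack the windows (x_t, ..., x_{t+tau-1}) as columns.
    Returns a (tau, T - tau + 1) Hankel matrix (anti-diagonals are constant)."""
    T = len(x)
    return np.stack([x[t:t + tau] for t in range(T - tau + 1)], axis=1)

x = np.arange(1, 8)            # [1, 2, 3, 4, 5, 6, 7]
H = delay_embedding(x, tau=3)
print(H)
# [[1 2 3 4 5]
#  [2 3 4 5 6]
#  [3 4 5 6 7]]
```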

Slide 99

Slide 99 text

99 Inverse of delay-embedding (DE) Can we define the inverse of DE? There is no exact inverse of DE since it is essentially a duplication; we only consider the pseudo-inverse of DE: fold and average the duplicated entries ☺.

Slide 100

Slide 100 text

100 SVD of Hankel matrix Looking at the singular values of the Hankel matrix for various values of τ, the Hankel matrix has a low-rank structure. (Figure: 3D plot of DE(x, τ=3) and its principal patterns; singular values in decreasing order.)

Slide 101

Slide 101 text

101 SVD of Hankel matrix (cont'd) Looking at the left and right singular vectors of the Hankel matrix: the left singular vectors are very similar to discrete cosine transform (DCT) bases.

Slide 102

Slide 102 text

102 Time-series recovery by using truncated SVD A missing & noisy time series is delay-embedded into a (missing & noisy) Hankel matrix, which is completed / approximated by a low-rank matrix; the recovered series is obtained by the inverse DE.
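A sketch of this pipeline for the denoising case (an SSA-style illustration; handling missing entries additionally requires iterating a completion step with the low-rank approximation, which is omitted here). The function names and the sinusoid test signal are assumptions for the example.

```python
import numpy as np

def delay_embedding(x, tau):
    T = len(x)
    return np.stack([x[t:t + tau] for t in range(T - tau + 1)], axis=1)

def inverse_delay_embedding(H, T):
    """Pseudo-inverse of DE: average the duplicated entries (anti-diagonals)."""
    tau, L = H.shape
    x = np.zeros(T)
    count = np.zeros(T)
    for t in range(L):
        x[t:t + tau] += H[:, t]
        count[t:t + tau] += 1
    return x / count

# Noisy sinusoid: its Hankel matrix is (essentially) rank 2
T, tau, rank = 200, 32, 2
t = np.arange(T)
x_clean = np.sin(0.2 * t)
x_noisy = x_clean + 0.3 * np.random.randn(T)

# DE -> truncated SVD -> inverse DE
H = delay_embedding(x_noisy, tau)
U, s, Vt = np.linalg.svd(H, full_matrices=False)
H_lr = U[:, :rank] * s[:rank] @ Vt[:rank]
x_rec = inverse_delay_embedding(H_lr, T)

# Relative error of the recovered signal, typically well below the noise level
print(np.linalg.norm(x_rec - x_clean) / np.linalg.norm(x_clean))
```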

Slide 103

Slide 103 text

103 How to extend it to tensors? A vector is delay-embedded into a matrix; how should DE be defined for a matrix, and for a tensor?

Slide 104

Slide 104 text

104 Delay-embedding for a matrix We have three choices of delay-embedding for matrix data: (a) delay-embedding only the column direction, (b) delay-embedding only the row direction, (c) delay-embedding both directions; (a) and (b) give 3rd order tensors, while (c) gives a 4th order tensor.

Slide 105

Slide 105 text

105 Block Hankel structure Considering the "both directions" case of matrix delay-embedding: each column is Hankelized by DE, which gives a time series of Hankel matrices; a further DE gives a "block Hankel matrix". A block Hankel matrix is a block matrix which has Hankel structure and whose elements are themselves Hankel matrices.

Slide 106

Slide 106 text

106 MDT: A generalization of DE for tensors DE = linear duplication + folding; multiway delay-embedding = multilinear duplication + folding. We call it the "multiway delay-embedding transform (MDT)". Ex: along each mode, a size-I mode is duplicated to τ*(I-τ+1) and folded into (τ, I-τ+1), so a 3rd order tensor of size (I1, I2, I3) becomes a 6th order tensor.

Slide 107

Slide 107 text

107 DE vs Inverse DE, MDT vs Inverse MDT The inverse MDT maps the 6th order tensor back to the 3rd order tensor of size (I1, I2, I3) by applying the pseudo-inverse duplication along each mode.

Slide 108

Slide 108 text

108 MDT of an image and its higher order SVD Looking at the components of the Hankel tensor [Yokota+, 2018 (in APSIPA)]: they are very similar to DCT bases.

Slide 109

Slide 109 text

109 Application to Tensor Completion Low-rank Tucker decomposition in the embedded space [Yokota+, 2018 (in CVPR)]: ① MDT of the given incomplete tensor of original size (Nth order ➔ 2Nth order); ② Tucker decomposition of the incomplete block Hankel tensor; ③ inverse MDT of the completed block Hankel tensor (2Nth order ➔ Nth order).

Slide 110

Slide 110 text

110 Tucker decomposition of an incomplete tensor Tucker decomposition of a complete tensor was explained before; here, I explain the method for Tucker decomposition of an incomplete tensor. The optimization problem is given with a binary mask tensor of the same size as the incomplete tensor (0 represents a missing entry, 1 an observed entry). It can be solved by an auxiliary function approach and the ALS algorithm: alternately update an auxiliary complete tensor (fill the missing entries with the current model) and the Tucker factors (which is the same as Tucker for a complete tensor), until convergence.
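A hedged reconstruction of the objective and the auxiliary-function update sketched on this slide (generic symbols; the slide's exact notation may differ).

```latex
\begin{align*}
&\min_{\mathcal{G},\,\mathbf{U}^{(1)},\dots,\mathbf{U}^{(N)}}\;
 \big\| \mathcal{Q} \circledast \big( \mathcal{X}
   - \mathcal{G} \times_1 \mathbf{U}^{(1)} \cdots \times_N \mathbf{U}^{(N)} \big) \big\|_F^2
 \quad (\mathcal{Q}: \text{binary mask}), \\[2pt]
&\text{auxiliary (complete) tensor: }\;
 \mathcal{Z} \leftarrow \mathcal{Q} \circledast \mathcal{X}
   + (\mathbf{1}-\mathcal{Q}) \circledast
     \big( \mathcal{G} \times_1 \mathbf{U}^{(1)} \cdots \times_N \mathbf{U}^{(N)} \big), \\
&\text{then update } \{\mathcal{G}, \mathbf{U}^{(n)}\}
 \text{ by the ALS for the complete tensor } \mathcal{Z}, \text{ and repeat until convergence.}
\end{align*}
```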

Slide 111

Slide 111 text

111 Experimental results A (256, 256, 3) image is MDT-embedded into a (32, 225, 32, 225, 3) tensor, Tucker-decomposed with rank (R1, R2, R3, R4, 3), and mapped back to (256, 256, 3); (32, 32) → (R1, R3) and (225, 225) → (R2, R4), with ranks such as 4, 8, 16, 32, 64. Slice missing was recovered with an appropriate-rank decomposition ➔ but the best rank is unknown.

Slide 112

Slide 112 text

112 Adaptive rank increment algorithm To be free from the rank estimation problem in tensor completion, we propose to use a rank increment algorithm.

Slide 113

Slide 113 text

113 Experimental results Initialization is very important!! (Hankel tensor size: (32, 225, 32, 225, 3))

Slide 114

Slide 114 text

114 Experimental results at 95% and 99% missing rates.

Slide 115

Slide 115 text

115 Recovery of functional MRI data

Slide 116

Slide 116 text

116 116 Video reconstruction

Slide 117

Slide 117 text

117 Other works Here, I showed the original work on MDT: Tucker decomposition with MDT [Yokota+, 2018 (in CVPR)]. We are extending it to others: TT decomposition with MDT [Sedighin+, 2020 (in IEEE-SPL)], time series analysis with MDT [Shi+, 2020 (in AAAI)], manifold modeling with MDT [Yokota+, 2020 (in IEEE-TNNLS)].

Slide 118

Slide 118 text

118 Relation between Hankelization and Convolution Actually, Hankelization is a part of convolution. Hankelization can be applied to incomplete data ☺, whereas convolution (STFT, wavelet) cannot be applied to incomplete data.

Slide 119

Slide 119 text

119 Open problems How to decide the value of τ? Some criteria would be helpful. Computational expense: the number of elements of the Hankel tensor increases dramatically with τ. Ex: for the MDT of a (128, 128, 128) tensor, τ = 8 gives a Hankel tensor of size (8, 121, 8, 121, 8, 121) requiring approx. 7 GB of memory; τ = 16 gives (16, 113, 16, 113, 16, 113), approx. 47 GB; τ = 32 gives (32, 97, 32, 97, 32, 97), approx. 239 GB; τ = 64 gives (64, 65, 64, 65, 64, 65), approx. 575 GB.

Slide 120

Slide 120 text

120 Other perspectives More applications would be interesting: network data analysis, medical imaging, hyperspectral images, etc. Hankel representation in the analysis of CNNs: convolutional sparse coding [Papyan+, 2017], neural tangent denoiser [Tachella+, 2020], dual convex formulation of CNNs [Anonymous, 2020].

Slide 121

Slide 121 text

121 Closing tutorial I'd like to show my appreciation to the organizers of APSIPA ASC 2020 and all participants!! This tutorial was an introduction to tensor methods in signal processing and machine learning (Basics, Applications, Advances). I tried to avoid cutting-edge, difficult content as much as possible and to make the tutorial friendly to students and beginners. The community of tensor studies is growing, and everyone with an interest in theory, algorithms, software development, or applications is welcome to this field!!

Slide 122

Slide 122 text

122 Acknowledgements Andrzej Cichocki (Skoltech) Qibin Zhao (RIKEN AIP) Chao Li (RIKEN AIP) Cesar F. Caiafa (CONICET) Hidekata Hontani (Nitech) Burak Erem (TrueMotion) Simon K. Warfield (Harvard)