Slide 1

TREE TENSOR NETWORKS FOR SUPERVISED LEARNING
ROHIT GOSWAMI
Created: 2021-07-13 Tue 12:27

Slide 2

BRIEF INTRODUCTION

Slide 3

HELLO!
Who? Rohit Goswami MInstP
Doctoral Researcher, University of Iceland, Faculty of Physical Sciences
Find me here: https://rgoswami.me

Slide 4

THE PAPER

Slide 5

TENSORS

Slide 6

TENSORS IN TENSORFLOW

import tensorflow as tf

rank_3_tensor = tf.constant([
    [[0, 1, 2, 3, 4],
     [5, 6, 7, 8, 9]],
    [[10, 11, 12, 13, 14],
     [15, 16, 17, 18, 19]],
    [[20, 21, 22, 23, 24],
     [25, 26, 27, 28, 29]],
])

rank_4_tensor = tf.zeros([3, 2, 4, 5])

From the tensorflow tutorial

Slide 7

TENSORS AS DATA STRUCTURES
X ∈ ℝ^(I×J×K) with sub-tensors
Any multi-dimensional dataset can be viewed as a tensor
This is the ML tensor, or multi-array, form
TensorFlow and others do not have tensor-network contractions defined [cichockiLowRankTensorNetworks2016]
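
Pairwise contractions can still be spelled out by hand in these frameworks via einsum; a minimal sketch (my own illustration, not from the deck):

import tensorflow as tf

# Contract a rank-3 tensor X (I x J x K) with a matrix M (K x L)
# over the shared index k, giving a rank-3 result of shape (I, J, L).
X = tf.random.uniform([3, 2, 5])
M = tf.random.uniform([5, 4])
Y = tf.einsum('ijk,kl->ijl', X, M)
print(Y.shape)  # (3, 2, 4)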

Slide 8

GRAPHICAL NOTATION [cichockiLowRankTensorNetworks2016]

Slide 9

TENSORS IN MECHANICS

T⁽ⁿ⁾ = n · σ, or T⁽ⁿ⁾_j = σ_ij n_i

σ = [ σ₁₁ σ₁₂ σ₁₃ ]   [ σx  τxy τxz ]
    [ σ₂₁ σ₂₂ σ₂₃ ] ≡ [ τyx σy  τyz ]
    [ σ₃₁ σ₃₂ σ₃₃ ]   [ τzx τzy σz  ]

[foxIntroductionFluidMechanics2004]
Image: the Cauchy stress tensor, from Wikipedia
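
As a concrete instance of T⁽ⁿ⁾ = n · σ, a small numpy sketch (the stress values are made up for illustration):

import numpy as np

# Symmetric Cauchy stress tensor (illustrative values)
sigma = np.array([[10.0, 2.0, 0.0],
                  [ 2.0, 5.0, 1.0],
                  [ 0.0, 1.0, 8.0]])

n = np.array([1.0, 0.0, 0.0])   # unit normal of the cut plane
T = n @ sigma                   # traction: T_j = sigma_ij n_i
print(T)                        # [10. 2. 0.], the first row of sigma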

Slide 10

IN ESSENCE
Image: a coordinate system Ox, from [irgensTensorAnalysis2019]

Slide 11

A MATHEMATICAL DEFINITION
A set of quantities T^r_s associated with a point P are said to be the components of a second-order tensor if, under a change of coordinates from x^s to x′^s, they transform according to:

T′^r_s = (∂x′^r/∂x^i)(∂x^j/∂x′^s) T^i_j

where the partial derivatives are evaluated at P.
Perhaps not the most natural of definitions for ML [neuenschwanderTensorCalculusPhysics2015]
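
For an orthogonal change of Cartesian coordinates, this law collapses to T′ = R T Rᵀ; a quick numpy check of the component form (my own sketch, not from the deck):

import numpy as np

# Rotation about z: for orthogonal coordinates the Jacobian dx'/dx is R itself
t = 0.3
R = np.array([[np.cos(t), -np.sin(t), 0.0],
              [np.sin(t),  np.cos(t), 0.0],
              [0.0,        0.0,       1.0]])

T = np.random.rand(3, 3)   # rank-2 tensor components in the x frame

# Component form: T'_rs = (dx'_r/dx_i)(dx_j/dx'_s) T_ij, with dx_j/dx'_s = R_sj
T_prime = np.einsum('ri,sj,ij->rs', R, R, T)
assert np.allclose(T_prime, R @ T @ R.T)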

Slide 12

A ROSE BY ANY OTHER NAME…
Treatment of notation [penroseApplicationsNegativeDimensional1971] [bridgemanHandwavingInterpretiveDance2017]

Slide 13

MATRIX PRODUCT STATES / TENSOR TRAINS
Bond dimension
From the MPS description at tensornetwork.org
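
Any dense tensor can be brought into MPS / tensor-train form by sweeping a reshape-and-SVD across its indices; the retained rank at each cut is the bond dimension. A minimal numpy sketch (my own, under the usual left-to-right convention):

import numpy as np

def tensor_train(X, max_bond):
    """Decompose a dense tensor into TT/MPS cores by successive SVDs."""
    dims = X.shape
    cores, r, M = [], 1, X
    for d in dims[:-1]:
        M = M.reshape(r * d, -1)
        U, S, Vt = np.linalg.svd(M, full_matrices=False)
        chi = min(max_bond, len(S))            # truncate to the bond dimension
        cores.append(U[:, :chi].reshape(r, d, chi))
        M = S[:chi, None] * Vt[:chi]           # carry the remainder rightwards
        r = chi
    cores.append(M.reshape(r, dims[-1], 1))
    return cores

X = np.random.rand(2, 3, 4, 2)
print([c.shape for c in tensor_train(X, max_bond=50)])
# [(1, 2, 2), (2, 3, 6), (6, 4, 2), (2, 2, 1)]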

Slide 14

BACK TO THE PAPER
The remaining slides, unless mentioned otherwise, are derived from [stoudenmireLearningRelevantFeatures2018]

Slide 15

KEY QUESTION
Are tensors useful abstractions for learning?

Slide 16

TENSORS FOR LEARNING PROBLEMS

Slide 17

KERNELS AND TENSORS
Kernel learning: f(x) = W · Φ(x)
Φ is a feature map
W are the weights
Representer theorem: W = ∑_{j=1}^{N_T} α_j Φ†(x_j)
x_j are the training inputs
With a tensor basis
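
With a tensor-product feature map such as the paper's local map (1, x_n), the full Φ(x) has 2^N components; a minimal numpy sketch of f(x) = W · Φ(x) and the representer expansion (my own illustration):

import numpy as np

def phi(x):
    """Full feature map: tensor product of the local maps (1, x_n)."""
    v = np.array([1.0])
    for xn in x:
        v = np.kron(v, np.array([1.0, xn]))
    return v                        # 2^N components

N = 4
x = np.random.rand(N)
W = np.random.rand(2 ** N)          # flattened weight tensor
print(W @ phi(x))                   # f(x) = W . Phi(x)

# Representer theorem: W lies in the span of the training feature vectors
X_train, alpha = np.random.rand(10, N), np.random.rand(10)
W_rep = sum(a * phi(xj) for a, xj in zip(alpha, X_train))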

Slide 18

PROBLEM FORMULATION
SVD: Φ^s_j = ∑_{n n′} U^s_n S^n_{n′} V†^{n′}_j
S is the matrix of singular values
Obtains a basis set spanning the original feature map
Feature-space covariance matrix: ρ_{s s′} = (1/N_T) ∑_{j=1}^{N_T} Φ^{s′}_j Φ†_{j s}
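
The covariance ρ shares its eigenvectors with the left singular vectors U of Φ, with eigenvalues S²/N_T; a dense numpy sketch (in practice ρ is never formed explicitly but contracted layer by layer):

import numpy as np

Phi = np.random.rand(16, 100)       # Phi[s, j]: 2^4 features x 100 samples
N_T = Phi.shape[1]

U, S, Vt = np.linalg.svd(Phi, full_matrices=False)   # Phi = U S V^dagger

# rho_{s s'} = (1/N_T) sum_j Phi^{s}_j Phi^{s'}_j  (real data, so no conjugate)
rho = (Phi @ Phi.T) / N_T

# eigenvalues of rho are the squared singular values over N_T
evals = np.linalg.eigvalsh(rho)[::-1]
assert np.allclose(evals, S ** 2 / N_T)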

Slide 19

COARSE GRAINING

Slide 20

ISOMETRIES [stoudenmireLearningRelevantFeatures2018]
Maps two vector spaces into a single one
Such that contraction over its indices yields the identity matrix
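
Concretely, an isometry can be built from the leading eigenvectors of a reduced covariance and reshaped to carry two in-legs and one out-leg; a numpy sketch (my own, not the paper's code):

import numpy as np

d1, d2, D = 2, 2, 3                      # two input spaces, fused output space
rho12 = np.random.rand(d1 * d2, d1 * d2)
rho12 = rho12 @ rho12.T                  # symmetric positive semi-definite

w, v = np.linalg.eigh(rho12)
U = v[:, ::-1][:, :D]                    # D leading eigenvectors as columns
U3 = U.reshape(d1, d2, D)                # isometry: two in-legs, one out-leg

# Contracting U with itself over the input legs yields the identity
assert np.allclose(np.einsum('abi,abj->ij', U3, U3), np.eye(D))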

Slide 21

FIDELITY
F = Tr[ρ] = (1/N_T) ∑_j Φ†_j Φ_j
Is maximized by computing the isometry U₁
An average inner product

Slide 22

TRUNCATION
Reduced covariance ρ_{12}
Where the truncation error after diagonalization is:
E = (∑_{i=D}^{r} p_i) / Tr[ρ_{12}] < ε
Rank r, eigenvalues p_i
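
The kept dimension D is then the smallest cut whose discarded eigenvalue weight stays below ε; a minimal sketch of that selection rule (my own illustration):

import numpy as np

def choose_rank(p, eps):
    """Smallest D whose discarded eigenvalue weight is below eps."""
    p = np.sort(p)[::-1]                     # eigenvalues, descending
    discarded = np.cumsum(p[::-1])[::-1]     # discarded[i] = sum(p[i:])
    for D in range(len(p)):
        if discarded[D] / p.sum() < eps:
            return D
    return len(p)

p = np.array([0.6, 0.3, 0.05, 0.03, 0.02])
print(choose_rank(p, eps=0.06))              # keeps the 3 largest -> 3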

Slide 23

SUPERVISED LEARNING
Replace the top tensor with one which can be optimized
The layers below are fixed

Slide 24

MNIST RESULTS
MNIST grayscale, 60,000 train, 10,000 test
Feature map: Φ^{s_n=1}(x_n) = 1, Φ^{s_n=2}(x_n) = x_n

ε        | T1, T2 indices | Train Acc. | Test Acc.
10⁻³     | (107, 151)     | 98.75      | 97.44
6 × 10⁻⁴ | (328, 444)     | 99.68      | 98.08

Slide 25

MIXING AND COMPRESSION
MPS mapping of a linear classifier:

f(x) = V · x = ∑_{n=1}^{N} V_n x^n

W^{s_1 s_2 … s_N} = ∑_{α} A^{s_1}_{α_1} A^{s_2}_{α_1 α_2} … A^{s_N}_{α_{N−1}}

A^{s_j=1}_{α_j α_{j−1}} = [ 1 0 ]   A^{s_j=2}_{α_j α_{j−1}} = [ 0   0 ]
                          [ 0 1 ]                             [ V_j 1 ]

Maps a linear classifier into an MPS
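
A numpy sketch of this kind of encoding (my own illustration: I use a nilpotent interaction block and explicit boundary vectors, which may differ from the paper's exact convention, but it reproduces f(x) = V · x with bond dimension 2):

import numpy as np

def linear_as_mps(V):
    """Encode f(x) = sum_n V_n x_n as MPS cores A[s, left, right]."""
    cores = []
    for Vj in V:
        A = np.zeros((2, 2, 2))
        A[0] = np.eye(2)        # s_j = 1: carry the identity along
        A[1, 1, 0] = Vj         # s_j = 2: deposit V_j once (nilpotent block)
        cores.append(A)
    return cores

def evaluate(cores, x):
    """Contract the MPS against the local feature vectors (1, x_n)."""
    v = np.array([0.0, 1.0])                 # left boundary
    for A, xn in zip(cores, x):
        v = v @ (A[0] + xn * A[1])           # sum over the physical index
    return v @ np.array([1.0, 0.0])          # right boundary

V, x = np.array([0.5, -1.0, 2.0]), np.array([1.0, 2.0, 3.0])
print(evaluate(linear_as_mps(V), x), V @ x)  # both give 4.5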

Slide 26

MNIST RESULTS
MNIST grayscale, 60,000 train, 10,000 test
Feature map: Φ^{s_n=1}(x_n) = 1, Φ^{s_n=2}(x_n) = x_n

Table 1: μ = 0.5 for the underlined (last) row

ε        | T1, T2 indices | Train Acc. | Test Acc.
10⁻³     | (107, 151)     | 98.75      | 97.44
6 × 10⁻⁴ | (328, 444)     | 99.68      | 98.08
4 × 10⁻⁴ | (279, 393)     | 99.798     | 98.11

Slide 27

TREE CURTAIN MODEL FOR FASHION MNIST
Fashion MNIST grayscale, 60,000 train, 10,000 test
Linear classifier → 83 percent test accuracy
Convert the classifier vectors to MPS form
Use mixing with μ = 0.9 to optimize 4 tree tensor layers, ε = 2 × 10⁻⁹
Fix top MPS at bond dimension 300
Optimize with DMRG (density matrix renormalization group)

Slide 28

FASHION MNIST RESULTS

Model         | Test Acc.
XGBoost       | 89.8
AlexNet       | 89.9
Keras 2-layer | 87.6
GoogLeNet     | 93.7
Tree-curtain  | 88.97

Slide 29

ONE-SITE DMRG

Slide 30

FACTORIZE ITERATIVELY
Optimize over the bond tensor, using, say, Davidson or Lanczos
From the tensornetwork tutorial

Slide 31

RESTORE
Factorize back to MPS form with a truncated SVD
From the tensornetwork tutorial
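
The restore step amounts to reshaping the optimized bond tensor into a matrix, splitting it with a truncated SVD, and absorbing the singular values into one side; a numpy sketch (my own illustration):

import numpy as np

# Bond tensor B with legs (left bond, site 1, site 2, right bond)
chiL, d1, d2, chiR = 4, 2, 2, 4
B = np.random.rand(chiL, d1, d2, chiR)

M = B.reshape(chiL * d1, d2 * chiR)
U, S, Vt = np.linalg.svd(M, full_matrices=False)
chi = int(np.sum(S > 1e-12))                 # truncated bond dimension

A1 = U[:, :chi].reshape(chiL, d1, chi)       # left tensor, kept isometric
A2 = (S[:chi, None] * Vt[:chi]).reshape(chi, d2, chiR)  # S absorbed right

# The pair approximately reconstructs the original bond tensor
B_restored = np.einsum('ais,sbj->aibj', A1, A2)
print(np.linalg.norm(B - B_restored))        # ~0 up to the truncation error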

Slide 32

CONCLUSIONS

Slide 33

SALIENT POINTS
No explicit regularization
More structure than neural networks
Scales linearly in the number of training inputs
Similar to kernel PCA, but in feature space

Slide 34

THOUGHTS

Pros:
Very valuable approach towards ML
Can be used to derive more detailed results
Many more tensor networks to explore [cichockiLowRankTensorNetworks2016]

Cons:
Significant learning curve, though good resources are available
Complexity and time resources are not discussed
The total number of parameters is not given
Contractions are not unique [orusPracticalIntroductionTensor2014a]

Slide 35

BIBLIOGRAPHY

[bridgemanHandwavingInterpretiveDance2017] Bridgeman & Chubb, Hand-Waving and Interpretive Dance: An Introductory Course on Tensor Networks, Journal of Physics A: Mathematical and Theoretical, 50(22), 223001.
[cichockiLowRankTensorNetworks2016] Cichocki, Lee, Oseledets, Phan, Zhao & Mandic, Low-Rank Tensor Networks for Dimensionality Reduction and Large-Scale Optimization Problems: Perspectives and Challenges PART 1, Foundations and Trends® in Machine Learning, 9(4-5), 249-429.
[foxIntroductionFluidMechanics2004] Fox, McDonald & Pritchard, Introduction to Fluid Mechanics, Wiley.
[irgensTensorAnalysis2019] Irgens, Tensor Analysis, Springer International Publishing.
[neuenschwanderTensorCalculusPhysics2015] Neuenschwander, Tensor Calculus for Physics: A Concise Guide, Johns Hopkins University Press.
[orusPracticalIntroductionTensor2014a] Orus, A Practical Introduction to Tensor Networks: Matrix Product States and Projected Entangled Pair States, Annals of Physics, 349, 117-158.
[penroseApplicationsNegativeDimensional1971] Penrose, Applications of Negative Dimensional Tensors, Combinatorial Mathematics and Its Applications, 1, 221-244.
[stoudenmireLearningRelevantFeatures2018] Stoudenmire, Learning Relevant Features of Data with Multi-Scale Tensor Networks, Quantum Science and Technology, 3(3), 034003.

Slide 36

THANKS!