
# JC: Tree Tensor Networks for Supervised Learning

Presented at the 2021 TOL208M group

July 20, 2021

## Transcript


3. ### HELLO!

Find me here: Rohit Goswami MInstP, Doctoral Researcher, University of Iceland, Faculty of Physical Sciences. https://rgoswami.me

6. ### TENSORS IN TENSORFLOW

```python
import tensorflow as tf

# A rank-3 tensor: 3 matrices, each of shape 2 x 5
rank_3_tensor = tf.constant([
    [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]],
    [[10, 11, 12, 13, 14], [15, 16, 17, 18, 19]],
    [[20, 21, 22, 23, 24], [25, 26, 27, 28, 29]],
])

# A rank-4 tensor of zeros, shape (3, 2, 4, 5)
rank_4_tensor = tf.zeros([3, 2, 4, 5])
```

From the TensorFlow tutorial.
7. ### TENSORS AS DATA STRUCTURES

$X \in \mathbb{R}^{I \times J \times K}$, with sub-tensors. Any multi-dimensional dataset can be viewed as a tensor; this is the ML tensor, or multi-array, form. TensorFlow and others do not have network contractions defined. [cichockiLowRankTensorNetworks2016]
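
Even in the multi-array frameworks, a pairwise contraction can be spelled out by hand; a minimal NumPy sketch (the shapes and index labels here are illustrative, not from the slides):

```python
import numpy as np

# Two multi-arrays sharing the index K
X = np.arange(2 * 3 * 4).reshape(2, 3, 4)   # shape (I, J, K)
Y = np.arange(4 * 5 * 6).reshape(4, 5, 6)   # shape (K, L, M)

# Contract over the shared index K: Z_ijlm = sum_k X_ijk Y_klm
Z = np.einsum('ijk,klm->ijlm', X, Y)
assert Z.shape == (2, 3, 5, 6)
```
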

9. ### TENSORS IN MECHANICS

$T^{(\mathbf{n})} = \mathbf{n} \cdot \boldsymbol{\sigma}$, or $T^{(\mathbf{n})}_j = \sigma_{ij} n_i$.

$$\boldsymbol{\sigma} = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \sigma_{13} \\ \sigma_{21} & \sigma_{22} & \sigma_{23} \\ \sigma_{31} & \sigma_{32} & \sigma_{33} \end{pmatrix} \equiv \begin{pmatrix} \sigma_x & \tau_{xy} & \tau_{xz} \\ \tau_{yx} & \sigma_y & \tau_{yz} \\ \tau_{zx} & \tau_{zy} & \sigma_z \end{pmatrix}$$

[foxIntroductionFluidMechanics2004] Image: the Cauchy stress tensor, from Wikipedia.
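
The traction relation above is just a matrix-vector product; a quick NumPy check with made-up stress components (the numbers are illustrative only):

```python
import numpy as np

# An illustrative symmetric Cauchy stress tensor, sigma_ij
sigma = np.array([
    [10.0, 2.0, 0.0],
    [ 2.0, 5.0, 1.0],
    [ 0.0, 1.0, 3.0],
])

# Unit normal of the cutting plane
n = np.array([1.0, 0.0, 0.0])

# Traction vector on that plane: T_j = sigma_ij n_i
T = n @ sigma
print(T)  # first row of sigma: [10.  2.  0.]
```
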

11. ### A MATHEMATICAL DEFINITION

A set of quantities $T^r_s$ associated with a point $P$ are said to be the components of a second-order tensor if, under a change of coordinates from $x^s$ to $x'^s$, they transform according to:

$$T'^{\,r}_{\,s} = \frac{\partial x'^r}{\partial x^i} \frac{\partial x^j}{\partial x'^s} T^i_j$$

where the partial derivatives are evaluated at $P$. Perhaps not the most natural of definitions for ML. [neuenschwanderTensorCalculusPhysics2015]
12. ### TREATMENT OF NOTATION: A ROSE BY ANY OTHER NAME...

[penroseApplicationsNegativeDimensional1971] [bridgemanHandwavingInterpretiveDance2017] from here
13. ### MATRIX PRODUCT STATES / TENSOR TRAINS

Bond dimension. MPS description at tensornetwork.org.
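
To make the MPS / tensor-train picture concrete, here is a minimal NumPy sketch that contracts a chain of bond-dimension-χ cores into one full rank-N tensor (random cores, shapes chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, chi = 4, 2, 3  # sites, physical dimension, bond dimension

# MPS cores with shape (left bond, physical, right bond);
# boundary cores have trivial bond dimension 1
cores = [rng.normal(size=(1, d, chi))]
cores += [rng.normal(size=(chi, d, chi)) for _ in range(N - 2)]
cores += [rng.normal(size=(chi, d, 1))]

# Contract the shared (bond) indices left to right
W = cores[0]
for A in cores[1:]:
    # (..., chi) x (chi, d, chi') -> (..., d, chi')
    W = np.tensordot(W, A, axes=([-1], [0]))
W = W.squeeze()             # drop the trivial boundary bonds
assert W.shape == (d,) * N  # one dense rank-N tensor
```

The point of the format is that the chain stores N·d·χ² numbers instead of the dense d^N, at the cost of the bond-dimension truncation.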
14. ### BACK TO THE PAPER

The remaining slides, unless mentioned otherwise, are derived from [stoudenmireLearningRelevantFeatures2018].

17. ### KERNELS AND TENSORS

Kernel learning: $f(x) = W \cdot \Phi(x)$, where $\Phi$ is a feature map and $W$ are weights.

Representer theorem: $W = \sum_{j=1}^{N_T} \alpha_j \Phi^{\dagger}(x_j)$, where the $x_j$ are training inputs. With a tensor basis.
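
A small numerical sketch of the representer theorem: the weight vector is a combination of mapped training points, so evaluating $f$ is equivalent to a sum of kernel evaluations. The feature map, training points, and coefficients below are toy choices, not from the paper:

```python
import numpy as np

# Toy feature map: Phi(x) = (1, x, x^2); illustrative only
def phi(x):
    return np.array([1.0, x, x * x])

x_train = np.array([0.0, 1.0, 2.0])   # N_T = 3 training inputs
alpha = np.array([0.5, -1.0, 0.25])   # coefficients (assumed given)

# Representer theorem: W lives in the span of the mapped training points
W = sum(a * phi(xj) for a, xj in zip(alpha, x_train))

# Decision function: a dot product in feature space
def f(x):
    return W @ phi(x)

# Equivalent kernel form: f(x) = sum_j alpha_j K(x_j, x)
def K(x1, x2):
    return phi(x1) @ phi(x2)

assert np.isclose(f(1.5), sum(a * K(xj, 1.5) for a, xj in zip(alpha, x_train)))
```
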
18. ### PROBLEM FORMULATION

SVD: $\Phi^{s}_{j} = \sum_{n n'} U^{s}_{n} S^{n}_{n'} V^{\dagger\, n'}_{j}$, where $S$ is the matrix of singular values. This obtains a basis set spanning the original feature map.

Feature-space covariance matrix: $\rho_{s s'} = \frac{1}{N_T} \sum_{j=1}^{N_T} \Phi^{s'}_{j} \Phi^{\dagger\, j}_{s}$
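
The two objects are linked: the eigenvectors of the covariance $\rho$ are the left singular vectors $U$ of the feature matrix, with eigenvalues $s^2/N_T$. A NumPy sketch with a random stand-in for the feature matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
S_dim, N_T = 6, 50
Phi = rng.normal(size=(S_dim, N_T))  # columns: feature-mapped training inputs

# SVD of the feature matrix
U, s, Vh = np.linalg.svd(Phi, full_matrices=False)

# Feature-space covariance: rho = (1/N_T) sum_j Phi_j Phi_j^dagger
rho = (Phi @ Phi.T) / N_T

# Eigenvalues of rho equal the squared singular values over N_T
evals = np.sort(np.linalg.eigvalsh(rho))[::-1]
assert np.allclose(evals, s**2 / N_T)
```
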

20. ### ISOMETRIES

[stoudenmireLearningRelevantFeatures2018] An isometry maps two vector spaces to a single vector space, such that the contraction over its indices yields the identity matrix.
21. ### FIDELITY

$$F = \mathrm{Tr}[\rho] = \frac{1}{N_T} \sum_{j} \Phi^{\dagger}_{j} \Phi_{j}$$

The average inner product; maximized by computing the isometry $U_1$.
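
A sketch of how such an isometry is obtained in practice: keep the leading eigenvectors of the covariance, check the isometry condition, and note that the retained fidelity is the sum of the kept eigenvalues (random data, illustrative dimensions):

```python
import numpy as np

rng = np.random.default_rng(2)
Phi = rng.normal(size=(8, 40))
rho = (Phi @ Phi.T) / Phi.shape[1]   # feature-space covariance

# Keep the D leading eigenvectors of rho as an isometry U
D = 3
evals, evecs = np.linalg.eigh(rho)   # eigenvalues in ascending order
U = evecs[:, -D:]                    # columns: top-D eigenvectors

# Isometry condition: contraction U^dagger U gives the identity
assert np.allclose(U.T @ U, np.eye(D))

# Retained fidelity = sum of the kept eigenvalues
kept = np.trace(U.T @ rho @ U)
assert np.isclose(kept, evals[-D:].sum())
```
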
22. ### TRUNCATION

Reduced covariance, where the truncation error after diagonalization is:

$$E = \frac{\sum_{i=D}^{r} p_i}{\mathrm{Tr}[\rho_{12}]} < \epsilon$$

for rank $r$ and eigenvalues $p_i$.
23. ### SUPERVISED LEARNING

Replace the top tensor with one which can be optimized; the layers below are fixed.
24. ### MNIST RESULTS

MNIST grayscale, 60,000 train, 10,000 test. Feature map: $\Phi^{s_n=1} = 1$, $\Phi^{s_n=2}(x_n) = x_n$.

| $\epsilon$ | T1, T2 indices | Train Acc. | Test Acc. |
|---|---|---|---|
| $10^{-3}$ | (107, 151) | 98.75 | 97.44 |
| $6 \times 10^{-4}$ | (328, 444) | 99.68 | 98.08 |
25. ### MIXING AND COMPRESSION

MPS mapping: $f(x) = V \cdot x = \sum_{n=1}^{N} V_n x^n$

$$W^{s_1 s_2 \ldots s_N} = \sum_{\{\alpha\}} A^{s_1}_{\alpha_1} A^{s_2}_{\alpha_1 \alpha_2} \cdots A^{s_N}_{\alpha_{N-1}}$$

$$A^{s_j=1}_{\alpha_j \alpha_{j-1}} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad A^{s_j=2}_{\alpha_j \alpha_{j-1}} = \begin{pmatrix} 0 & 0 \\ V_j & 0 \end{pmatrix}$$

This maps a linear classifier into an MPS.
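
A quick numerical check of this construction in NumPy. With the local feature map $\phi(x_n) = (1, x_n)$, each site contributes a bond-dimension-2 matrix, and boundary vectors pick out the single-$V_j$ term; the boundary vector choice here is an assumption made for the sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 5
V = rng.normal(size=N)   # linear classifier weights
x = rng.normal(size=N)   # one input vector

# Contract each site's physical index with phi(x_j) = (1, x_j)
def site_matrix(Vj, xj):
    A1 = np.eye(2)                            # s_j = 1 component
    A2 = np.array([[0.0, 0.0], [Vj, 0.0]])    # s_j = 2 component
    return A1 * 1.0 + A2 * xj

# Multiply the bond matrices and close with boundary vectors
M = np.eye(2)
for Vj, xj in zip(V, x):
    M = M @ site_matrix(Vj, xj)
f_mps = np.array([0.0, 1.0]) @ M @ np.array([1.0, 0.0])

# The bond-dimension-2 MPS reproduces the linear classifier
assert np.isclose(f_mps, V @ x)
```

Each site matrix is lower-triangular with ones on the diagonal, so the product accumulates exactly $\sum_j V_j x_j$ in the off-diagonal entry.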
26. ### MNIST RESULTS

MNIST grayscale, 60,000 train, 10,000 test. Feature map: $\Phi^{s_n=1} = 1$, $\Phi^{s_n=2}(x_n) = x_n$.

Table 1: $\mu = 0.5$ for the underlined row.

| $\epsilon$ | T1, T2 indices | Train Acc. | Test Acc. |
|---|---|---|---|
| $10^{-3}$ | (107, 151) | 98.75 | 97.44 |
| $6 \times 10^{-4}$ | (328, 444) | 99.68 | 98.08 |
| $4 \times 10^{-4}$ | (279, 393) | 99.798 | 98.11 |
27. ### TREE CURTAIN MODEL FOR FASHION MNIST

Fashion MNIST grayscale, 60,000 train, 10,000 test.

- Linear classifier → 83 percent test accuracy
- Convert the classifier vectors to MPS form
- Use mixing with $\mu = 0.9$ to optimize 4 tree tensor layers, $\epsilon = 2 \times 10^{-9}$
- Fix the top MPS at bond dimension 300
- Optimize with DMRG (density matrix renormalization group)
28. ### FASHION MNIST RESULTS

| Model | Test Acc. |
|---|---|
| XGBoost | 89.8 |
| AlexNet | 89.9 |
| Keras 2-layer | 87.6 |
| GoogLeNet | 93.7 |
| Tree-curtain | 88.97 |

30. ### FACTORIZE ITERATIVELY

Optimize over the bond tensor, using, say, Davidson or Lanczos. From the tensornetwork tutorial.
31. ### RESTORE

Factorize back to an MPS with a truncated SVD. From the tensornetwork tutorial.
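
The restore step can be sketched in NumPy: group the bond tensor's indices into a matrix, take a truncated SVD, and reshape the factors back into two MPS cores. The shapes and truncation rank below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
chi, d = 4, 2

# A two-site "bond tensor" with indices (left bond, s1, s2, right bond)
B = rng.normal(size=(chi, d, d, chi))

# Group (left, s1) vs (s2, right) and take a truncated SVD
M = B.reshape(chi * d, d * chi)
U, s, Vh = np.linalg.svd(M, full_matrices=False)
keep = 3                                   # truncated bond dimension
A1 = U[:, :keep].reshape(chi, d, keep)     # left MPS core
A2 = (np.diag(s[:keep]) @ Vh[:keep]).reshape(keep, d, chi)  # right core

# The contraction of the two cores approximates the bond tensor;
# the error is the norm of the discarded singular values
B_approx = np.tensordot(A1, A2, axes=([2], [0]))
err = np.linalg.norm(B - B_approx)
assert np.isclose(err, np.sqrt((s[keep:] ** 2).sum()))
```
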

33. ### SALIENT POINTS

- No explicit regularization
- More structure than neural networks
- Scales linearly in the number of training inputs
- Similar to kernel PCA, but in feature space
34. ### THOUGHTS: PROS AND CONS

Pros:

- A very valuable approach towards ML
- Can be used to derive more detailed results
- Many more tensor networks to explore [cichockiLowRankTensorNetworks2016]
- Good resources available

Cons:

- Significant learning curve
- Complexity and time resources are not discussed
- Total number of parameters missing
- Contractions are not unique [orusPracticalIntroductionTensor2014a]
35. ### BIBLIOGRAPHY

- [bridgemanHandwavingInterpretiveDance2017] Bridgeman & Chubb, Hand-Waving and Interpretive Dance: An Introductory Course on Tensor Networks, Journal of Physics A: Mathematical and Theoretical, 50(22), 223001.
- [cichockiLowRankTensorNetworks2016] Cichocki, Lee, Oseledets, Phan, Zhao & Mandic, Low-Rank Tensor Networks for Dimensionality Reduction and Large-Scale Optimization Problems: Perspectives and Challenges PART 1, Foundations and Trends® in Machine Learning, 9(4-5), 249-429.
- [foxIntroductionFluidMechanics2004] Fox, McDonald & Pritchard, Introduction to Fluid Mechanics, Wiley.
- [irgensTensorAnalysis2019] Irgens, Tensor Analysis, Springer International Publishing.
- [neuenschwanderTensorCalculusPhysics2015] Neuenschwander, Tensor Calculus for Physics: A Concise Guide, Johns Hopkins University Press.
- [orusPracticalIntroductionTensor2014a] Orus, A Practical Introduction to Tensor Networks: Matrix Product States and Projected Entangled Pair States, Annals of Physics, 349, 117-158.
- [penroseApplicationsNegativeDimensional1971] Penrose, Applications of Negative Dimensional Tensors, Combinatorial Mathematics and Its Applications, 1, 221-244.
- [stoudenmireLearningRelevantFeatures2018] Stoudenmire, Learning Relevant Features of Data with Multi-Scale Tensor Networks, Quantum Science and Technology, 3(3), 034003.