JC: Tree Tensor Networks for Supervised Learning

TREE TENSOR NETWORKS FOR SUPERVISED LEARNING ROHIT GOSWAMI Created: 2021-07-13
Tue 12:27 1

BRIEF INTRODUCTION 3

HELLO! Find me here: Who? Rohit Goswami MInstP Doctoral Researcher,
University of Iceland, Faculty of Physical Sciences https://rgoswami.me 4

THE PAPER 6

TENSORS 8

From TENSORS IN TENSORFLOW import tensorflow as tf rank_3_tensor =
tf.constant([ [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]], [[10, 11, 12, 13, 14], [15, 16, 17, 18, 19]], [[20, 21, 22, 23, 24], [25, 26, 27, 28, 29]],]) import tensorflow as tf rank_4_tensor = tf.zeros([3, 2, 4, 5]) the tensorflow tutorial 9

TENSORS AS DATA STRUCTURES X ∈ RI × J× K
with sub- tensors Any multi-dimensional dataset can be viewed as a tensor This is the ML tensor or Multi-array form TensorFlow and others Do not have contractions defined [cichockiLowRankTensorNetworks2016] 10

GRAPHICAL NOTATION [cichockiLowRankTensorNetworks2016] 11

TENSORS IN MECHANICS T ( n ) = n ⋅
σ or T ( n) j = σij ni . σ = σ11 σ12 σ13 σ21 σ22 σ23 σ31 σ32 σ33 ≡ σx τxy τxz τyx σy τyz τzx τzy σz [ ] [ ] [foxIntroductionFluidMechanics2004] Image The Cauchy stress tensor from wikipedia 12

IN ESSENCE Coordinate Ox in Rf [irgensTensorAnalysis2019] 13

A MATHEMATICAL DEFINITION A set of quantities Tr s associated
with a point P are said to be the components of a second order tensor if, under a change of coordinates, from a set of coordinates xs to xs′ they transform according to: T ′ r s = 𝛿 x ′ r 𝛿 xⁱ 𝛿 xj 𝛿 x ′ s Where the partial derivatives are evaluated at P. Perhaps not the most natural of definitions for ML [neuenschwanderTensorCalculusPhysics2015] 14

Treatment of notation , A ROSE BY ANY OTHER NAME..
[penroseApplicationsNegativeDimensional1971] [bridgemanHandwavingInterpretiveDance2017] from here 15

From the MATRIX PRODUCT STATES / TENSOR TRAINS Bond Dimension
MPS description at tensornetwork.org 16

BACK TO THE PAPER The remaining slides unless mentioned otherwise
are derived from [stoudenmireLearningRelevantFeatures2018] 18

KEY QUESTION Are tensors useful abstractions for learning? 19

TENSORS FOR LEARNING PROBLEMS 21

KERNELS AND TENSORS Kernel Learning f(x) = W ⋅ 𝛷
(x) 𝛷 is a feature map W are weights Representer Theorem W = ∑NT j= 1𝛼 ⱼ 𝛷 † (x ⱼ ) x ⱼ are training inputs With a tensor basis 22

PROBLEM FORMULATION SVD 𝛷 ⱼ s ⱼ = ∑nn′ U
ₙˢSn′ ⁿV ⱼ †n′ S is the matrix of singular values Obtains a basis set spanning the original feature map Feature Space Covariance Matrix 𝜌 ₛs′ = 1 NT ∑NT j= 1𝛷 ⱼ s′ 𝛷 ₛj† 23

COARSE GRAINING 25

ISOMETRIES [stoudenmireLearningRelevantFeatures2018] Maps two vector spaces to a single vector
Such that the contraction over indices yields the identity matrix 26

FIDELITY F = Tr[𝜌 ] = 1 NT ∑ ⱼ
𝛷 ⱼ † 𝛷 ⱼ Is maximized by computing isometry U₁ Average inner product 27

TRUNCATION Reduced covariance Where the truncation after diagonalization is: E
= ∑i =D ʳ p ᵢ Tr[ 𝜌 ₁ ₂ ] < 𝜖 Rank r, eigenvalues p ᵢ 28

SUPERVISED LEARNING Replace the top tensor with one which can
be optimized Layers are fixed 29

MNIST RESULTS MNIST grayscale, 60, 000 train, 10, 000 test
𝛷 s ₙ = 1 = 1, 𝛷 s ₙ = 2(x ₙ) = x ₙ 𝜖 T1,T2 indices Train Acc. Test Acc. 10⁻³ (107, 151) 98.75 97.44 6 × 10⁻⁴ (328, 444) 99.68 98.08 30

MIXING AND COMPRESSION MPS Mapping f(x) = V ⋅ x
= ∑n= 1 ᴺ V ₙxⁿ Ws₁s₂ …sN = ∑ 𝛼 = As₁ 𝛼 ₁As₂ 𝛼 ₁ 𝛼 ₂…AsN 𝛼 N −1 Asⱼ = 1 𝛼 ⱼ 𝛼 j −1 = 1 0 0 1 Asⱼ = 2 𝛼 ⱼ 𝛼 j −1 = 0 0 Vj 1 Maps a linear classifier into an MPS 31

MNIST RESULTS MNIST grayscale, 60, 000 train, 10, 000 test
𝛷 s ₙ = 1 = 1, 𝛷 s ₙ = 2(x ₙ) = x ₙ Table 1: 𝜇 = 0.5 for the underlined row 𝜖 T1,T2 indices Train Acc. Test Acc. 10⁻³ (107, 151) 98.75 97.44 6 × 10⁻⁴ (328, 444) 99.68 98.08 4×10⁻⁴ (279, 393) 99.798 98.11 32

TREE CURTAIN MODEL FOR FASHION MNIST MNIST grayscale, 60, 000
train, 10, 000 test Linear classifier –> 83 percent test accuracy Convert to classifier vectors to MPS form Use mixing with 𝜇 = 0.9 to optimize 4 tree tensor layers 𝜖 = 2 × 10⁻⁹ Fix top MPS at bond dimension 300 Optimize with DMRG (density matrix renormalization group) 34

FASHION MNIST RESULTS Model Test Acc. XGBoost 89.8 AlexNet 89.9
Keras 2-layer 87.6 GoogLeNet 93.7 Tree-curtain 88.97 35

ONE SITE DMRG 36

FACTORIZE ITERATIVELY Optimize over the bond tensor, using say Davidson
or Lanczos From the tensornetwork tutorial 37

RESTORE Factorize to an MPS with truncated SVD From the
tensornetwork tutorial 38

CONCLUSIONS 40

SALIENT POINTS No explicit regularization More structure than Neural Networks
Linearly scales in number of training inputs Similar to kernel PCA but in feature space 41

Pros Cons THOUGHTS Very valuable approach towards ML Can be
used to derive more detailed results Many more tensor networks to explore [cichockiLowRankTensorNetworks2016] Significant learning curve Good resources available Complexity and time resources are not discussed Total number of parameters missing Contractions are not unique [orusPracticalIntroductionTensor2014a] 42

BIBLIOGRAPHY Bridgeman & Chubb, Hand-Waving and Interpretive Dance: An Introductory
Course on Tensor Networks, Journal of Physics A: Mathematical and Theoretical, 50(22), 223001 . . . Cichocki, Lee, Oseledets, Phan, Zhao & Mandic, Low-Rank Tensor Networks for Dimensionality Reduction and Large-Scale Optimization Problems: Perspectives and Challenges PART 1, Foundations and Trends® in Machine Learning, 9(4-5), 249-429 . . . Fox, McDonald & Pritchard, Introduction to Fluid Mechanics, Wiley . Irgens, Tensor Analysis, Springer International Publishing . Neuenschwander, Tensor Calculus for Physics: A Concise Guide, Johns Hopkins University Press . Orus, A Practical Introduction to Tensor Networks: Matrix Product States and Projected Entangled Pair States, Annals of Physics, 349, 117-158 . . . Penrose, Applications of Negative Dimensional Tensors, Combinatorial mathematics and its applications, 1, 221-244 . . Stoudenmire, Learning Relevant Features of Data with Multi-Scale Tensor Networks, Quantum Science and Technology, 3(3), 034003 . . . [bridgemanHandwavingInterpretiveDance2017] link doi [cichockiLowRankTensorNetworks2016] link doi [foxIntroductionFluidMechanics2004] [irgensTensorAnalysis2019] [neuenschwanderTensorCalculusPhysics2015] [orusPracticalIntroductionTensor2014a] link doi [penroseApplicationsNegativeDimensional1971] link [stoudenmireLearningRelevantFeatures2018] link doi 43

THANKS! 44

JC: Tree Tensor Networks for Supervised Learning

JC: Tree Tensor Networks for Supervised Learning

More Decks by Rohit Goswami

Featured

Transcript