
JC: Tree Tensor Networks for Supervised Learning

Rohit Goswami
July 20, 2021

Presented at the 2021 TOL208M group


Transcript

  1. TREE TENSOR NETWORKS FOR SUPERVISED LEARNING
    Rohit Goswami. Created: 2021-07-13.
  2. BRIEF INTRODUCTION

  3. HELLO! Who? Rohit Goswami MInstP, Doctoral Researcher,
    University of Iceland, Faculty of Physical Sciences. Find me here: https://rgoswami.me
  4. THE PAPER

  5. TENSORS

  6. TENSORS IN TENSORFLOW (from the TensorFlow tutorial)

       import tensorflow as tf

       # A rank-3 tensor with shape (3, 2, 5).
       rank_3_tensor = tf.constant([
           [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]],
           [[10, 11, 12, 13, 14], [15, 16, 17, 18, 19]],
           [[20, 21, 22, 23, 24], [25, 26, 27, 28, 29]],
       ])

       # A rank-4 tensor of zeros with shape (3, 2, 4, 5).
       rank_4_tensor = tf.zeros([3, 2, 4, 5])
  7. TENSORS AS DATA STRUCTURES
    $X \in \mathbb{R}^{I \times J \times K}$, with sub-tensors.
    Any multi-dimensional dataset can be viewed as a tensor; this is the ML tensor, or multi-array, form. TensorFlow and others do not have tensor-network contractions defined [cichockiLowRankTensorNetworks2016]. A minimal contraction sketch follows below.
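    For concreteness, a minimal sketch of a single pairwise contraction written with numpy's einsum; all shapes and names here are illustrative, not from the paper:

       import numpy as np

       # A rank-3 tensor X_{ijk} and a matrix A_{kl}, sharing the index k.
       X = np.random.rand(3, 4, 5)
       A = np.random.rand(5, 6)

       # Contraction: Y_{ijl} = sum_k X_{ijk} A_{kl}
       Y = np.einsum("ijk,kl->ijl", X, A)
       print(Y.shape)  # (3, 4, 6)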
  8. GRAPHICAL NOTATION [cichockiLowRankTensorNetworks2016]

  9. TENSORS IN MECHANICS
    $T^{(n)} = n \cdot \sigma$, or $T^{(n)}_j = \sigma_{ij} n_i$, where
    $\sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} & \sigma_{13} \\ \sigma_{21} & \sigma_{22} & \sigma_{23} \\ \sigma_{31} & \sigma_{32} & \sigma_{33} \end{bmatrix} \equiv \begin{bmatrix} \sigma_x & \tau_{xy} & \tau_{xz} \\ \tau_{yx} & \sigma_y & \tau_{yz} \\ \tau_{zx} & \tau_{zy} & \sigma_z \end{bmatrix}$
    [foxIntroductionFluidMechanics2004]. Image: the Cauchy stress tensor, from Wikipedia.
  10. IN ESSENCE A coordinate system $Ox$ in $\mathbb{R}^f$ [irgensTensorAnalysis2019]

  11. A MATHEMATICAL DEFINITION A set of quantities $T^r_s$ associated
    with a point $P$ are said to be the components of a second-order tensor if, under a change of coordinates from $x^s$ to $x'^s$, they transform according to:
    $T'^r_s = \frac{\partial x'^r}{\partial x^i} \frac{\partial x^j}{\partial x'^s} T^i_j$
    where the partial derivatives are evaluated at $P$. Perhaps not the most natural of definitions for ML [neuenschwanderTensorCalculusPhysics2015]
  12. A ROSE BY ANY OTHER NAME: treatment of notation
    [penroseApplicationsNegativeDimensional1971] [bridgemanHandwavingInterpretiveDance2017]
  13. MATRIX PRODUCT STATES / TENSOR TRAINS Bond dimension;
    from the MPS description at tensornetwork.org. A toy construction follows below.
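    A toy construction, assuming nothing beyond numpy: build random tensor-train cores with a fixed bond dimension and contract them back into the full tensor (all sizes illustrative):

       import numpy as np

       # A toy MPS / tensor train: N sites, physical dimension d,
       # bond dimension chi (values illustrative).
       N, d, chi = 4, 2, 3
       cores = [np.random.rand(1, d, chi)]
       cores += [np.random.rand(chi, d, chi) for _ in range(N - 2)]
       cores += [np.random.rand(chi, d, 1)]

       # Contract the shared bond indices to recover the full rank-N tensor.
       W = cores[0]
       for core in cores[1:]:
           W = np.tensordot(W, core, axes=1)  # join last leg of W to first of core
       W = W.reshape([d] * N)

       # 2**4 = 16 tensor entries, parameterized by far fewer core entries
       # as N grows.
       print(W.shape)  # (2, 2, 2, 2)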
  14. BACK TO THE PAPER The remaining slides, unless mentioned otherwise,
    are derived from [stoudenmireLearningRelevantFeatures2018]
  15. KEY QUESTION: Are tensors useful abstractions for learning?

  16. TENSORS FOR LEARNING PROBLEMS

  17. KERNELS AND TENSORS Kernel learning: $f(x) = W \cdot \Phi(x)$,
    where $\Phi$ is a feature map and $W$ are the weights.
    Representer theorem: $W = \sum_{j=1}^{N_T} \alpha_j \Phi^{\dagger}(x_j)$, where the $x_j$ are training inputs.
    With a tensor basis. A small numerical check of the representer form follows below.
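    A small check of the representer form, with an illustrative polynomial feature map standing in for the paper's tensor feature map (all names are assumptions):

       import numpy as np

       # An illustrative polynomial feature map.
       def Phi(x):
           return np.array([1.0, x, x ** 2])

       x_train = np.array([0.1, 0.5, 0.9])   # training inputs x_j
       alpha = np.array([0.3, -1.2, 0.8])    # representer coefficients alpha_j

       # Representer theorem: W = sum_j alpha_j Phi(x_j)† (real case here,
       # so the dagger is just a transpose).
       W = sum(a * Phi(xj) for a, xj in zip(alpha, x_train))

       x = 0.4
       f_direct = W @ Phi(x)                 # f(x) = W . Phi(x)
       f_kernel = sum(a * (Phi(xj) @ Phi(x)) for a, xj in zip(alpha, x_train))
       print(np.isclose(f_direct, f_kernel))  # True: both forms agree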
  18. PROBLEM FORMULATION SVD: $\Phi^{s}_{j} = \sum_{n n'} U^{s}_{n} S^{n}_{n'} V^{\dagger\, n'}_{j}$,
    where $S$ is the matrix of singular values. This obtains a basis set spanning the original feature map.
    Feature-space covariance matrix: $\rho_{s s'} = \frac{1}{N_T} \sum_{j=1}^{N_T} \Phi^{s'}_{j} \Phi^{\dagger\, j}_{s}$
    A numpy sketch of this step follows below.
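    A minimal numpy sketch, assuming the feature vectors are simply stacked row-wise into a matrix (shapes illustrative):

       import numpy as np

       # N_T feature vectors Phi_j stacked row-wise (illustrative shapes).
       N_T, s_dim = 100, 8
       Phi = np.random.rand(N_T, s_dim)

       # Feature-space covariance: rho_{s s'} = (1/N_T) sum_j Phi_j[s] Phi_j[s']
       rho = (Phi.T @ Phi) / N_T

       # Diagonalizing rho recovers the SVD basis of the Phi matrix:
       # if Phi = A S U† row-wise, then rho = U (S^2 / N_T) U†.
       evals, U = np.linalg.eigh(rho)
       print(evals[::-1][:3])  # leading part of the spectrum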
  19. COARSE GRAINING

  20. ISOMETRIES [stoudenmireLearningRelevantFeatures2018] An isometry maps two vector spaces
    into a single vector space, such that contraction over its indices yields the identity matrix.
  21. FIDELITY $F = \mathrm{Tr}[\rho] = \frac{1}{N_T} \sum_j \Phi^{\dagger}_j \Phi_j$
    The average inner product; maximized by computing the isometry $U_1$.
  22. TRUNCATION For the reduced covariance, the truncation error after diagonalization is:
    $E = \frac{\sum_{i=D}^{r} p_i}{\mathrm{Tr}[\rho_{12}]} < \epsilon$
    with rank $r$ and eigenvalues $p_i$. A numerical sketch of choosing $D$ follows below.
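    A numerical sketch of choosing the kept rank $D$, assuming an illustrative geometrically decaying spectrum normalized so that $\mathrm{Tr}[\rho_{12}] = 1$:

       import numpy as np

       # Illustrative spectrum: eigenvalues p_i of the reduced covariance,
       # sorted descending and normalized to unit trace.
       p = 0.5 ** np.arange(16)
       p /= p.sum()

       eps = 1e-3
       # tail[i] = sum_{k >= i} p_k: the weight discarded if only the
       # first i eigenvectors are kept.
       tail = np.cumsum(p[::-1])[::-1]

       # Smallest rank D whose discarded weight falls below eps.
       D = int(np.argmax(tail < eps))
       print(D, tail[D])  # D = 10 for this geometric spectrum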
  23. SUPERVISED LEARNING Replace the top tensor with one which can
    be optimized; the lower layers are fixed.
  24. MNIST RESULTS MNIST grayscale, 60,000 train, 10,000 test.
    Local feature map: $\Phi^{s_n=1}(x_n) = 1$, $\Phi^{s_n=2}(x_n) = x_n$ (a sketch follows below).

       𝜖           (T1, T2) indices   Train Acc.   Test Acc.
       10⁻³        (107, 151)         98.75        97.44
       6 × 10⁻⁴    (328, 444)         99.68        98.08
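    The local feature map above is simple enough to write out directly; a sketch, with the function name being an assumption:

       import numpy as np

       def local_feature_map(pixels):
           # Map each grayscale pixel x_n in [0, 1] to the two-component
           # local feature vector (1, x_n).
           pixels = np.asarray(pixels, dtype=float)
           return np.stack([np.ones_like(pixels), pixels], axis=-1)

       image = np.random.rand(28 * 28)  # a flattened grayscale image (illustrative)
       phi = local_feature_map(image)
       print(phi.shape)                 # (784, 2): one two-vector per pixel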
  25. MIXING AND COMPRESSION MPS mapping: $f(x) = V \cdot x = \sum_{n=1}^{N} V_n x^n$
    $W^{s_1 s_2 \cdots s_N} = \sum_{\{\alpha\}} A^{s_1}_{\alpha_1} A^{s_2}_{\alpha_1 \alpha_2} \cdots A^{s_N}_{\alpha_{N-1}}$
    $A^{s_j = 1}_{\alpha_j \alpha_{j-1}} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad A^{s_j = 2}_{\alpha_j \alpha_{j-1}} = \begin{bmatrix} 0 & 0 \\ V_j & 0 \end{bmatrix}$
    This maps a linear classifier into an MPS; a numerical check follows below.
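    A small numpy check of this construction, under the reconstruction above, with boundary vectors chosen to read off the accumulated sum (all names illustrative):

       import numpy as np

       N = 5
       V = np.random.rand(N)   # linear classifier weights
       x = np.random.rand(N)   # one input

       # Per-site matrices contracted with the local feature vector (1, x_j):
       # A^{s=1} = I, A^{s=2} carries V_j in the lower-left corner.
       def site_matrix(j, xj):
           A1 = np.eye(2)
           A2 = np.array([[0.0, 0.0], [V[j], 0.0]])
           return A1 + xj * A2

       M = np.eye(2)
       for j in range(N):
           M = M @ site_matrix(j, x[j])

       # Boundary vectors pick out the accumulated lower-left entry: f(x) = V . x
       f = np.array([0.0, 1.0]) @ M @ np.array([1.0, 0.0])
       print(np.isclose(f, V @ x))  # True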
  26. MNIST RESULTS MNIST grayscale, 60,000 train, 10,000 test.
    Local feature map: $\Phi^{s_n=1}(x_n) = 1$, $\Phi^{s_n=2}(x_n) = x_n$
    Table 1: 𝜇 = 0.5 for the underlined row.

       𝜖           (T1, T2) indices   Train Acc.   Test Acc.
       10⁻³        (107, 151)         98.75        97.44
       6 × 10⁻⁴    (328, 444)         99.68        98.08
       4 × 10⁻⁴    (279, 393)         99.798       98.11
  27. TREE CURTAIN MODEL FOR FASHION MNIST Fashion MNIST grayscale, 60,000
    train, 10,000 test. A linear classifier gives 83 percent test accuracy. Convert the classifier vectors to MPS form, then use mixing with 𝜇 = 0.9 to optimize 4 tree tensor layers at 𝜖 = 2 × 10⁻⁹. Fix the top MPS at bond dimension 300 and optimize with DMRG (density matrix renormalization group).
  28. FASHION MNIST RESULTS

       Model           Test Acc.
       XGBoost         89.8
       AlexNet         89.9
       Keras 2-layer   87.6
       GoogLeNet       93.7
       Tree-curtain    88.97
  29. ONE-SITE DMRG

  30. FACTORIZE ITERATIVELY Optimize over the bond tensor, using, say, Davidson
    or Lanczos (a Lanczos-style sketch follows below). From the tensornetwork tutorial.
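    The local solve can be sketched with scipy's iterative eigensolver (eigsh, an implicitly restarted Lanczos method); the matrix below is a random stand-in for the effective local operator, not the paper's actual update:

       import numpy as np
       from scipy.sparse.linalg import eigsh

       # A random symmetric matrix standing in for the effective operator
       # seen by the local tensor during a sweep (purely illustrative).
       n = 64
       H = np.random.rand(n, n)
       H = (H + H.T) / 2

       # Lanczos-style iterative solve for one extremal eigenpair.
       w, v = eigsh(H, k=1, which="SA")  # smallest algebraic eigenvalue
       print(w[0], v.shape)              # scalar eigenvalue, (64, 1) eigenvector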
  31. RESTORE Factorize back to an MPS with a truncated SVD (a sketch follows below).
    From the tensornetwork tutorial.
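    A sketch of the restore step: split an optimized two-site (bond) tensor back into two MPS cores with a truncated SVD; the shapes, and the choice to absorb the singular values to the right, are assumptions:

       import numpy as np

       # An optimized bond tensor with indices (left bond, s1, s2, right bond).
       chi, d = 4, 2
       B = np.random.rand(chi, d, d, chi)

       # Group (left bond, s1) against (s2, right bond) and factorize.
       mat = B.reshape(chi * d, d * chi)
       U, S, Vh = np.linalg.svd(mat, full_matrices=False)

       # Truncate to a maximum kept bond dimension (or a singular-value cutoff).
       chi_max = 3
       U, S, Vh = U[:, :chi_max], S[:chi_max], Vh[:chi_max, :]

       # Restore two MPS cores; singular values absorbed into the right core.
       left_core = U.reshape(chi, d, chi_max)
       right_core = (np.diag(S) @ Vh).reshape(chi_max, d, chi)
       print(left_core.shape, right_core.shape)  # (4, 2, 3) (3, 2, 4)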
  32. CONCLUSIONS

  33. SALIENT POINTS No explicit regularization. More structure than neural networks.
    Scales linearly in the number of training inputs. Similar to kernel PCA, but in feature space.
  34. THOUGHTS
    Pros: a very valuable approach towards ML; can be used to derive more detailed results; many more tensor networks to explore [cichockiLowRankTensorNetworks2016]; good resources available.
    Cons: significant learning curve; complexity and time resources are not discussed; the total number of parameters is missing; contractions are not unique [orusPracticalIntroductionTensor2014a].
  35. BIBLIOGRAPHY
    [bridgemanHandwavingInterpretiveDance2017] Bridgeman & Chubb, "Hand-Waving and Interpretive Dance: An Introductory Course on Tensor Networks", Journal of Physics A: Mathematical and Theoretical, 50(22), 223001.
    [cichockiLowRankTensorNetworks2016] Cichocki, Lee, Oseledets, Phan, Zhao & Mandic, "Low-Rank Tensor Networks for Dimensionality Reduction and Large-Scale Optimization Problems: Perspectives and Challenges PART 1", Foundations and Trends® in Machine Learning, 9(4-5), 249-429.
    [foxIntroductionFluidMechanics2004] Fox, McDonald & Pritchard, "Introduction to Fluid Mechanics", Wiley.
    [irgensTensorAnalysis2019] Irgens, "Tensor Analysis", Springer International Publishing.
    [neuenschwanderTensorCalculusPhysics2015] Neuenschwander, "Tensor Calculus for Physics: A Concise Guide", Johns Hopkins University Press.
    [orusPracticalIntroductionTensor2014a] Orus, "A Practical Introduction to Tensor Networks: Matrix Product States and Projected Entangled Pair States", Annals of Physics, 349, 117-158.
    [penroseApplicationsNegativeDimensional1971] Penrose, "Applications of Negative Dimensional Tensors", Combinatorial Mathematics and its Applications, 1, 221-244.
    [stoudenmireLearningRelevantFeatures2018] Stoudenmire, "Learning Relevant Features of Data with Multi-Scale Tensor Networks", Quantum Science and Technology, 3(3), 034003.
  36. THANKS!