
JC: Tree Tensor Networks for Supervised Learning

Rohit Goswami
July 20, 2021

Presented at the 2021 TOL208M group

Transcript

  1. HELLO! Who? Rohit Goswami MInstP, Doctoral Researcher, University of
    Iceland, Faculty of Physical Sciences. Find me here: https://rgoswami.me
  2. TENSORS IN TENSORFLOW, from the tensorflow tutorial:

    import tensorflow as tf

    rank_3_tensor = tf.constant([
        [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]],
        [[10, 11, 12, 13, 14], [15, 16, 17, 18, 19]],
        [[20, 21, 22, 23, 24], [25, 26, 27, 28, 29]],
    ])
    rank_4_tensor = tf.zeros([3, 2, 4, 5])
  3. TENSORS AS DATA STRUCTURES. $X \in \mathbb{R}^{I \times J \times K}$, with
    sub-tensors. Any multi-dimensional dataset can be viewed as a tensor; this
    is the ML tensor, or multi-array, form. TensorFlow and others do not have
    contractions defined. [cichockiLowRankTensorNetworks2016]
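
    A contraction sums over a shared index between two tensors; a minimal
    numpy sketch of what that means (the arrays X and M here are illustrative
    stand-ins, and np.einsum/tf.einsum can express pairwise contractions even
    where contraction is not a first-class tensor-network operation):

      import numpy as np

      # A rank-3 "data tensor", e.g. (I samples) x (J features) x (K channels)
      X = np.arange(24.0).reshape(2, 3, 4)
      M = np.ones((3, 5))

      # Contract the shared J index of X and M; the free indices remain.
      Y = np.einsum('ijk,jl->ilk', X, M)
      print(Y.shape)  # (2, 5, 4)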
  4. TENSORS IN MECHANICS. $T^{(n)} = n \cdot \sigma$, or
    $T^{(n)}_j = \sigma_{ij} n_i$, with
    $$\sigma =
      \begin{pmatrix}
        \sigma_{11} & \sigma_{12} & \sigma_{13} \\
        \sigma_{21} & \sigma_{22} & \sigma_{23} \\
        \sigma_{31} & \sigma_{32} & \sigma_{33}
      \end{pmatrix}
      \equiv
      \begin{pmatrix}
        \sigma_x  & \tau_{xy} & \tau_{xz} \\
        \tau_{yx} & \sigma_y  & \tau_{yz} \\
        \tau_{zx} & \tau_{zy} & \sigma_z
      \end{pmatrix}$$
    [foxIntroductionFluidMechanics2004] Image: the Cauchy stress tensor, from
    Wikipedia.
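
    As a concrete check of $T^{(n)}_j = \sigma_{ij} n_i$, a small numpy sketch
    (the stress values are made up, and σ is taken symmetric so σ·n equals
    n·σ):

      import numpy as np

      # Symmetric Cauchy stress tensor, in the [sigma, tau] layout above
      sigma = np.array([[2.0, 0.5, 0.0],
                        [0.5, 1.0, 0.0],
                        [0.0, 0.0, 3.0]])
      n = np.array([1.0, 0.0, 0.0])  # unit normal of the surface

      T = sigma @ n                  # traction vector T(n) = n . sigma
      print(T)                       # [2.  0.5 0. ]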
  5. A MATHEMATICAL DEFINITION. A set of quantities $T^r_s$ associated with a
    point $P$ are said to be the components of a second-order tensor if, under
    a change of coordinates from $x^s$ to $x'^s$, they transform according to
    $$T'^r_s = \frac{\partial x'^r}{\partial x^i}
               \frac{\partial x^j}{\partial x'^s} T^i_j$$
    where the partial derivatives are evaluated at $P$. Perhaps not the most
    natural of definitions for ML. [neuenschwanderTensorCalculusPhysics2015]
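
    For a Cartesian rank-2 tensor under a rotation $x' = Rx$, this
    transformation law reduces to $T' = R T R^\top$; a small numpy
    verification (the tensor and angle are arbitrary choices):

      import numpy as np

      theta = np.pi / 6
      R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                    [np.sin(theta),  np.cos(theta), 0.0],
                    [0.0,            0.0,           1.0]])
      T = np.diag([2.0, 1.0, 3.0])   # components in the unprimed frame

      # T'_{rs} = R_{ri} R_{sj} T_{ij}, i.e. the matrix product R T R^T
      T_prime = np.einsum('ri,sj,ij->rs', R, R, T)
      print(np.allclose(T_prime, R @ T @ R.T))  # True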
  6. A ROSE BY ANY OTHER NAME: treatment of notation.
    [penroseApplicationsNegativeDimensional1971]
    [bridgemanHandwavingInterpretiveDance2017] from here
  7. MATRIX PRODUCT STATES / TENSOR TRAINS. Bond dimension; from the MPS
    description at tensornetwork.org
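
    A tensor train / MPS can be built from any tensor by repeated SVDs, with
    the bond dimension set by how many singular values are kept at each cut; a
    minimal numpy sketch (no truncation, so the decomposition is exact):

      import numpy as np

      W = np.random.rand(2, 2, 2, 2)        # a rank-4 tensor to decompose
      cores, chi = [], 1
      mat = W.reshape(chi * 2, -1)
      for _ in range(3):                    # one SVD per internal bond
          U, S, Vh = np.linalg.svd(mat, full_matrices=False)
          chi_new = S.size                  # keep all values: exact
          cores.append(U.reshape(chi, 2, chi_new))
          mat = (np.diag(S) @ Vh).reshape(chi_new * 2, -1)
          chi = chi_new
      cores.append(mat.reshape(chi, 2, 1))
      print([c.shape for c in cores])       # bond dimensions between cores

      # Recontract the chain to check the decomposition is exact
      out = cores[0]
      for c in cores[1:]:
          out = np.tensordot(out, c, axes=(-1, 0))
      print(np.allclose(out.reshape(W.shape), W))  # True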
  8. BACK TO THE PAPER. The remaining slides, unless mentioned otherwise, are
    derived from [stoudenmireLearningRelevantFeatures2018]
  9. KERNELS AND TENSORS. Kernel learning: $f(x) = W \cdot \Phi(x)$, where
    $\Phi$ is a feature map and $W$ are the weights. Representer theorem:
    $$W = \sum_{j=1}^{N_T} \alpha_j \Phi^\dagger(x_j)$$
    The $x_j$ are the training inputs, here with a tensor basis.
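
    A minimal sketch of the representer-theorem form of the weights, with a
    plain vector feature map standing in for the paper's tensor-product map
    (phi, alpha and the data below are illustrative):

      import numpy as np

      def phi(x):
          # illustrative feature map: prepend a bias feature to x
          return np.concatenate(([1.0], x))

      rng = np.random.default_rng(0)
      X_train = rng.random((5, 3))      # N_T = 5 training inputs
      alpha = rng.standard_normal(5)    # expansion coefficients

      # W = sum_j alpha_j Phi^dagger(x_j), so that f(x) = W . Phi(x)
      W = sum(a * phi(x) for a, x in zip(alpha, X_train))
      f = lambda x: W @ phi(x)
      print(f(X_train[0]))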
  10. PROBLEM FORMULATION. SVD of the feature map,
    $$\Phi^s_j = \sum_{n n'} U^s_n S^n_{n'} V^{\dagger n'}_j$$
    where $S$ is the matrix of singular values; this obtains a basis set
    spanning the original feature map. Feature-space covariance matrix:
    $$\rho_{s s'} = \frac{1}{N_T} \sum_{j=1}^{N_T} \Phi^{s'}_j \Phi^{\dagger j}_s$$
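
    Sketching the same objects in numpy: stack the $\Phi_j$ as columns of a
    matrix, SVD it, and note that the eigenvalues of the covariance $\rho$ are
    the squared singular values over $N_T$ (the sizes are illustrative):

      import numpy as np

      rng = np.random.default_rng(1)
      Phi = rng.random((8, 5))          # feature dimension 8, N_T = 5 columns

      U, S, Vh = np.linalg.svd(Phi, full_matrices=False)
      rho = Phi @ Phi.conj().T / Phi.shape[1]   # feature-space covariance

      # eigenvalues of rho (descending) equal S**2 / N_T
      print(np.allclose(np.linalg.eigvalsh(rho)[::-1], S**2 / Phi.shape[1]))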
  11. ISOMETRIES [stoudenmireLearningRelevantFeatures2018]. An isometry maps
    two vector spaces to a single vector space, such that the contraction over
    its indices yields the identity matrix.
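
    A numpy sketch of an isometry as a three-index tensor: fuse the two input
    indices, take a matrix with orthonormal columns, and check
    $U^\dagger U = 1$ (the dimensions are illustrative):

      import numpy as np

      d1, d2, m = 3, 3, 4               # needs m <= d1 * d2
      A = np.random.rand(d1 * d2, d1 * d2)
      U = np.linalg.qr(A)[0][:, :m]     # orthonormal columns via QR
      U_tensor = U.reshape(d1, d2, m)   # the 3-index isometry of the diagrams

      # Contracting both input indices with the conjugate gives the identity
      ident = np.einsum('abi,abj->ij', U_tensor.conj(), U_tensor)
      print(np.allclose(ident, np.eye(m)))  # True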
  12. FIDELITY.
    $$F = \mathrm{Tr}[\rho] = \frac{1}{N_T} \sum_j \Phi^\dagger_j \Phi_j$$
    The average inner product; it is maximized by computing the isometry $U_1$.
  13. TRUNCATION. Reduced covariance $\rho_{12}$, where the truncation error
    after diagonalization, for rank $r$ and eigenvalues $p_i$, is
    $$E = \frac{\sum_{i=r}^{D} p_i}{\mathrm{Tr}[\rho_{12}]} < \epsilon$$
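
    A numpy sketch of the truncation rule: diagonalize a stand-in reduced
    covariance, sort the eigenvalues, and keep the smallest rank whose
    discarded weight relative to the trace drops below $\epsilon$ (the
    spectrum and tolerance here are synthetic):

      import numpy as np

      # Stand-in reduced covariance with a known, decaying spectrum
      rho12 = np.diag(0.5 ** np.arange(8))
      p = np.linalg.eigvalsh(rho12)[::-1]   # eigenvalues p_i, descending

      epsilon = 1e-2
      # tail[r] = sum_{i >= r} p_i / Tr[rho12]: discarded weight at rank r
      tail = np.cumsum(p[::-1])[::-1] / np.trace(rho12)
      r = int(np.argmax(tail < epsilon))    # smallest rank with E < epsilon
      print(r, tail[r])                     # 7 0.0039...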
  14. MNIST RESULTS. MNIST grayscale, 60,000 train, 10,000 test, with the
    local feature map $\Phi^{s_n=1} = 1$, $\Phi^{s_n=2}(x_n) = x_n$.

    ε           (T1, T2) indices    Train Acc.    Test Acc.
    10⁻³        (107, 151)          98.75         97.44
    6 × 10⁻⁴    (328, 444)          99.68         98.08
  15. MIXING AND COMPRESSION. MPS mapping: a linear classifier
    $$f(x) = V \cdot x = \sum_{n=1}^{N} V_n x_n$$
    maps into an MPS of bond dimension 2,
    $$W^{s_1 s_2 \ldots s_N} =
      \sum_{\{\alpha\}} A^{s_1}_{\alpha_1} A^{s_2}_{\alpha_1 \alpha_2}
      \cdots A^{s_N}_{\alpha_{N-1}}$$
    with cores
    $$A^{s_j=1}_{\alpha_j \alpha_{j-1}} =
      \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad
      A^{s_j=2}_{\alpha_j \alpha_{j-1}} =
      \begin{pmatrix} 0 & 0 \\ V_j & 0 \end{pmatrix}$$
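
    A numpy check of this construction; the boundary vectors are an assumption
    (the slide does not give them), chosen here as (0, 1) on the left and
    (1, 0) on the right so that the contraction reproduces $V \cdot x$:

      import numpy as np

      rng = np.random.default_rng(3)
      N = 6
      V, x = rng.standard_normal(N), rng.standard_normal(N)

      def A(j, s):
          # s = 0: the identity core; s = 1: the V_j core from the slide
          return np.eye(2) if s == 0 else np.array([[0.0, 0.0], [V[j], 0.0]])

      phi = [np.array([1.0, xn]) for xn in x]   # local feature map (1, x_n)
      left, right = np.array([0.0, 1.0]), np.array([1.0, 0.0])

      # Contract W with Phi(x): multiply phi-weighted cores along the chain
      out = left
      for j in range(N):
          out = out @ sum(phi[j][s] * A(j, s) for s in (0, 1))
      print(np.allclose(out @ right, V @ x))     # True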
  16. MNIST RESULTS. MNIST grayscale, 60,000 train, 10,000 test, with the
    local feature map $\Phi^{s_n=1} = 1$, $\Phi^{s_n=2}(x_n) = x_n$.

    Table 1: μ = 0.5 for the underlined row.

    ε           (T1, T2) indices    Train Acc.    Test Acc.
    10⁻³        (107, 151)          98.75         97.44
    6 × 10⁻⁴    (328, 444)          99.68         98.08
    4 × 10⁻⁴    (279, 393)          99.798        98.11
  17. TREE CURTAIN MODEL FOR FASHION MNIST. Fashion MNIST grayscale, 60,000
    train, 10,000 test. A linear classifier gives 83 percent test accuracy.
    Convert the classifier vectors to MPS form; use mixing with μ = 0.9 to
    optimize 4 tree tensor layers at ε = 2 × 10⁻⁹; fix the top MPS at bond
    dimension 300; optimize with DMRG (density matrix renormalization group).
  18. FASHION MNIST RESULTS.

    Model            Test Acc.
    XGBoost          89.8
    AlexNet          89.9
    Keras 2-layer    87.6
    GoogLeNet        93.7
    Tree-curtain     88.97
  19. FACTORIZE ITERATIVELY. Optimize over the bond tensor, using, say,
    Davidson or Lanczos. From the tensornetwork tutorial.
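
    A rough numpy sketch of one such sweep step, with a placeholder update in
    place of a real Davidson/Lanczos solve (the shapes and the update are
    illustrative assumptions):

      import numpy as np

      chi, d, chi_max = 4, 2, 4
      A = np.random.rand(chi, d, chi)           # MPS core at site j
      B = np.random.rand(chi, d, chi)           # MPS core at site j + 1

      # Merge the two cores into one bond tensor over the shared index
      bond = np.tensordot(A, B, axes=(2, 0))    # shape (chi, d, d, chi)
      bond -= 0.1 * np.random.rand(*bond.shape) # placeholder optimization step

      # Factorize back with a truncated SVD to restore the MPS form
      U, S, Vh = np.linalg.svd(bond.reshape(chi * d, d * chi),
                               full_matrices=False)
      k = min(chi_max, S.size)                  # truncate to the bond dimension
      A_new = U[:, :k].reshape(chi, d, k)
      B_new = (np.diag(S[:k]) @ Vh[:k]).reshape(k, d, chi)
      print(A_new.shape, B_new.shape)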
  20. SALIENT POINTS. No explicit regularization; more structure than neural
    networks; scales linearly in the number of training inputs; similar to
    kernel PCA, but in feature space.
  21. THOUGHTS. Pros: a very valuable approach towards ML; can be used to
    derive more detailed results; many more tensor networks to explore
    [cichockiLowRankTensorNetworks2016]; good resources available. Cons:
    significant learning curve; complexity and time requirements are not
    discussed; the total number of parameters is missing; contractions are
    not unique [orusPracticalIntroductionTensor2014a].
  22. BIBLIOGRAPHY
    [bridgemanHandwavingInterpretiveDance2017] Bridgeman & Chubb, Hand-Waving
      and Interpretive Dance: An Introductory Course on Tensor Networks,
      Journal of Physics A: Mathematical and Theoretical, 50(22), 223001.
    [cichockiLowRankTensorNetworks2016] Cichocki, Lee, Oseledets, Phan, Zhao &
      Mandic, Low-Rank Tensor Networks for Dimensionality Reduction and
      Large-Scale Optimization Problems: Perspectives and Challenges PART 1,
      Foundations and Trends® in Machine Learning, 9(4-5), 249-429.
    [foxIntroductionFluidMechanics2004] Fox, McDonald & Pritchard,
      Introduction to Fluid Mechanics, Wiley.
    [irgensTensorAnalysis2019] Irgens, Tensor Analysis, Springer International
      Publishing.
    [neuenschwanderTensorCalculusPhysics2015] Neuenschwander, Tensor Calculus
      for Physics: A Concise Guide, Johns Hopkins University Press.
    [orusPracticalIntroductionTensor2014a] Orus, A Practical Introduction to
      Tensor Networks: Matrix Product States and Projected Entangled Pair
      States, Annals of Physics, 349, 117-158.
    [penroseApplicationsNegativeDimensional1971] Penrose, Applications of
      Negative Dimensional Tensors, Combinatorial Mathematics and Its
      Applications, 1, 221-244.
    [stoudenmireLearningRelevantFeatures2018] Stoudenmire, Learning Relevant
      Features of Data with Multi-Scale Tensor Networks, Quantum Science and
      Technology, 3(3), 034003.