Tensor Networks and their applications in Physics and Machine Learning

Siva Swaminathan
December 12, 2019

Tensor networks originated as a very useful tool to model states of quantum systems with many degrees of freedom (effectively equivalent to high-dimensional probability distributions). By exploiting the naturally sparse entanglement structure, well-designed networks provide variational ansatzes conducive to efficiently modelling such states. Of particular importance are 'MERA' networks, in which information is organized hierarchically, in a manner comparable to feed-forward neural networks. In this talk, I briefly explain the motivation behind and usage of tensor networks, and summarize some of their applications, both in physics and, more recently, in machine learning.


Transcript

  1. Tensor Networks and their applications
    in Physics and Machine Learning
    Sivaramakrishnan Swaminathan
    Vicarious AI
    http://sivark.me
    12 December 2019
    Indian Institute of Technology Bombay


  2. Before we begin. . .
    Please feel free to interrupt and ask questions!
    Comments are my own, and do not represent Vicarious AI


  3. Tensor networks and quantum states


  4. History of tensor networks
    Graphical notation (Roger Penrose in the 1970s)
    Representing and formally manipulating computations
    Index gymnastics on multilinear operators i.e. “tensors”
    Einstein summation convention
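
    In code, the same bookkeeping is what einsum does: repeated indices are summed,
    and each contraction is an edge in the diagram. A minimal sketch (arrays and
    index names are illustrative, not from the slides):

    ```python
    import numpy as np

    A = np.arange(6.0).reshape(2, 3)    # tensor with indices (i, j)
    B = np.arange(12.0).reshape(3, 4)   # tensor with indices (j, k)

    # Einstein convention: the repeated index j is summed over,
    # i.e. the edge connecting the two tensor nodes is contracted
    C = np.einsum('ij,jk->ik', A, B)
    assert np.allclose(C, A @ B)

    # Higher-rank tensors work the same way: contract one leg of T with v
    T = np.random.default_rng(0).normal(size=(2, 3, 4))
    v = np.ones(3)
    M = np.einsum('ijk,j->ik', T, v)    # sum over the middle leg
    ```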


  5. Primer on tensors


  6. Quantum states ∼ probability distributions
    States are vectors in a Hilbert space, with ⟨ψ|ψ⟩ = 1
    Alternately, density matrices ρ ≡ |ψ⟩⟨ψ| with Tr ρ = 1
    Compute expectation values of operators:
    ⟨O⟩ ≡ Tr[ρO] = ⟨ψ|O|ψ⟩
    Entanglement ∼ Mutual Information
    Bell’s inequality
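
    A quick numerical check of the expectation-value formula for a single qubit
    (the state and observable below are my own illustrative choices):

    ```python
    import numpy as np

    psi = np.array([1.0, 1.0j]) / np.sqrt(2)   # normalized: ⟨ψ|ψ⟩ = 1
    rho = np.outer(psi, psi.conj())            # density matrix |ψ⟩⟨ψ|
    O = np.array([[1.0, 0.0], [0.0, -1.0]])    # observable (Pauli Z)

    assert np.isclose(np.trace(rho), 1.0)      # Tr ρ = 1
    assert np.isclose(np.trace(rho @ O),       # Tr[ρO] ...
                      psi.conj() @ O @ psi)    # ... equals ⟨ψ|O|ψ⟩
    ```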


  7. Many-body states are high-dimensional
    Joint distribution on N random variables ⇒ dim. ∼ exp(N)
    Would be nice to handle infinite systems
    This is why QM is hard, even though it’s linear
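
    Back-of-envelope, for spin-1/2 DOFs stored at complex double precision:

    ```python
    N = 50                            # a modest number of spins
    amplitudes = 2 ** N               # ~1.1e15 complex amplitudes
    print(amplitudes * 16 / 1e15)     # ~18 petabytes just to store one state
    ```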


  8. Aside: Curse or blessing of dimensionality?
    Exponentially many dimensions
    Every direction corresponds to a (soft) partitioning!
    (high “shattering” capacity)


  9. Statistical physics


  10. Basic problem setup
    Local DOFs on a lattice (eg: Ising model)
    Hamiltonian describing how the states are coupled
    Implicitly defines a distribution, and a “ground state”
    Compute observables to explain behavior
    correlation functions ∼ statistical moments
    Why bother?
    Condensed matter goodies!
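
    To make "local DOFs + Hamiltonian" concrete, a toy 1d Ising energy function
    (the couplings and the open-chain choice are illustrative):

    ```python
    import numpy as np

    def ising_energy(spins, J=1.0, h=0.0):
        """E(s) = -J Σ_i s_i s_{i+1} - h Σ_i s_i, with s_i ∈ {-1, +1}."""
        return -J * np.sum(spins[:-1] * spins[1:]) - h * np.sum(spins)

    print(ising_energy(np.array([1, 1, 1, 1])))    # -3.0: aligned, lowest energy
    print(ising_energy(np.array([1, -1, 1, -1])))  # +3.0: fully anti-aligned
    ```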


  11. Strategy
    Seek representation amenable to
    Efficient storage
    Efficient computations
    Lossy representations that allow controlled approximations
    Start with the simplest cases (most symmetry) and slowly generalize


  12. Exploit physical principles?!
    Model typical states
    eg: Lowest energy state; use “power method”
    Most states in Hilbert space are crazy unphysical!
    Typical states form a vanishing fraction of Hilbert space
    Locality ⇒ “area scaling” of entanglement
    Additional symmetries (translation, scale invariance)


  13. Tensor networks1
    Approximate joint distributions (states) by some variational ansatz;
    allows efficient representation and computation
    Can condition/marginalize over variables efficiently.
    (Number of variational parameters scales favorably)
    Often massively over-parametrized
    1See Orús 2019 for a recent review
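
    To see how favorably the parameters scale, compare a dense joint state with
    an MPS ansatz (the numbers below are a rough illustration):

    ```python
    N, d, D = 100, 2, 50            # sites, local dimension, bond dimension
    dense_params = d ** N           # ~1.3e30 amplitudes: hopeless
    mps_params = N * d * D ** 2     # 500,000 parameters: easy
    print(f"{dense_params:.1e} vs {mps_params:,}")
    ```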


  14. Matrix Product States2
    (Markov models, tensor trains, etc.)
    Modern perspective on DMRG
    Exponentially decaying correlations
    Could passably fake power laws through interesting dynamics, or a suitable
    sum of exponentials! (rich statistics literature)
    x^{-r} = (1/Γ(r)) ∫₀^∞ t^{r-1} e^{-xt} dt
    2See Schollwöck 2011 for a review
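
    The identity above (a power law as a continuous mixture of decaying
    exponentials) is easy to verify numerically, e.g. with scipy:

    ```python
    import numpy as np
    from scipy.integrate import quad
    from scipy.special import gamma

    x, r = 2.0, 1.5
    integral, _ = quad(lambda t: t ** (r - 1) * np.exp(-x * t), 0, np.inf)
    assert np.isclose(integral / gamma(r), x ** (-r))
    ```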


  15. Multiscale Entanglement Renormalization Ansatz3
    Modelling scale-invariant (critical) systems
    3Vidal 2008, Evenbly+Vidal 2009, Pfeifer+Evenbly+Vidal 2009


  16. MERA: Constraints
    (Constraint diagrams: contracting a disentangler u with u† gives the
    identity on its legs i, j, k, l — u is unitary, u†u = 1; contracting an
    isometry w with w† gives the identity on legs i, j — w†w = 1.)
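
    A numerical reading of those constraint diagrams (the shapes are illustrative):

    ```python
    import numpy as np

    d, chi = 2, 3
    rng = np.random.default_rng(0)

    # Disentangler u: a unitary on two sites, viewed as a 4-leg tensor
    q, _ = np.linalg.qr(rng.normal(size=(d * d, d * d)))
    u = q.reshape(d, d, d, d)
    # u is unitary: contracting it with its conjugate gives the identity
    uu = np.einsum('ijkl,mnkl->ijmn', u, u.conj())
    assert np.allclose(uu.reshape(d * d, d * d), np.eye(d * d))

    # Isometry w: maps d*d dims down to chi dims, with w†w = 1 (but ww† ≠ 1)
    w, _ = np.linalg.qr(rng.normal(size=(d * d, chi)))
    assert np.allclose(w.conj().T @ w, np.eye(chi))
    ```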


  17. MERA: Efficient computations
    Causal structure of influence simplifies computations


  18. Computing the variational parameters in MERA
    Non-trivial optimization problem, given unitarity constraints
    For each layer
    Reduce problem to optimizing Tr[t A t† B]
    Approximate by optimizing Tr[t C] ⇒ use SVD!
    Alternating minimization to optimize tensors (t ∈ {u, w})
    (More recent developments demonstrate better learning techniques)
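
    A minimal sketch of the SVD step (the environment C here is a random
    stand-in for what the rest of the network would supply):

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    C = rng.normal(size=(8, 8))        # "environment" tensor for t

    # Maximize Tr[t C] over unitary t: with C = U S V†, the optimum is
    # t = V U†, which saturates the bound |Tr[t C]| ≤ Σ singular values
    U, S, Vh = np.linalg.svd(C)
    t = Vh.conj().T @ U.conj().T

    assert np.isclose(np.trace(t @ C), S.sum())
    ```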


  19. Aside: MERA and Wavelets
    Multi-resolution “shape” of MERA reminiscent of wavelets
    Connection established4 more rigorously
    Used to design new wavelets from quantum circuits!
    4Evenbly+White 2016, 2018


  20. Quantum gravity


  21. A hard problem
    Holy grail of fundamental physics for the past half-century
    If we naively combine gravity and quantum mechanics
    “Infinities” from marginalizing over infinitely many DOFs
    (dependence on prior; loss of predictivity)
    Physicists care about answers being finite and unique,
    so they may be compared with experiment.


  22. Holographic quantum gravity
    Quantum Mechanics on "boundary" = Quantum Gravity in "bulk"
    (justifications from string theory)
    (Figure from https://commons.wikimedia.org/wiki/File:AdS3_(new).png)


  23. MERA ←→ Holography ???
    How does “space” emerge from correlated DOFs?
    (deep question in AI/cognition)
    MERA models entanglement structure in quantum states
    Holographic spacetime maps entanglement structure
    Bulk geodesic length ∼ boundary entanglement (Ryu-Takayanagi formula)
    Emergent direction encodes scale-dependence of entanglement
    (renormalization group flow)


  24. Searching for a more direct relationship
    Lots of discussion over the last several years. . .
    I’ll summarize recent understanding, without detailing justifications
    MERA discretizes the integral transform of bulk geometry5
    5Czech+Lamprou+McCandlish+Sully 2015, 2016


  25. Simplest example: hyperbolic space H₂
    Full conformal symmetry
    Start with H₂ and obtain dS₁₊₁
    MERA discretizes dS₁₊₁
    Causal structure and scaling of correlations
    (I’m happy to sketch the calculation if desired)


  26. Minimal Updates Proposal (MUP)
    Modeling scale invariant systems with a local defect
    Originally6 motivated by computational convenience
    6Evenbly+Vidal 2015


  27. Our generalization: defect geometries
    Reduced symmetry: more nuanced duality; harder computations
    Proposed7 a novel generalization of the MUP: Rayed MERA
    principled justification based on symmetry arguments
    (Boundary OPE)
    7Czech+Nguyen+Swaminathan 2017


  28. Summary: Quantum mechanics ↔ Spacetime geometry
    TNs organize many-body systems by structure of correlations
    Sparsity in entanglement ↔ spatial structure


  29. (Figure-only slide)

  30. Machine learning8
    8Hopelessly incomplete selection of things to touch on


  31. TNs for discriminative models
    (Reminiscent of quantum circuit interpretation of tensor network)
    Linear classifier on a suitable encoding of the input
    y = W · Φ(x)
    Represent classifier (W) by a tensor network
    Tensor bond dimensions regularize model capacity;
    can be chosen adaptively


  32. MPS for MNIST9
    Generalize one-hot encoding at each pixel;
    tensor product over locations
    Reshape image to 1d (ugh!),
    and represent linear classifier functional as MPS
    Regularization from approximation
    L2 cost function; network structure gives efficient gradients
    Choose internal bond dimension adaptively while optimizing (SVD step)
    9Stoudenmire+Schwab 2016
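
    A sketch of the encoding step (function names below are mine; the cos/sin
    map is the one from the paper):

    ```python
    import numpy as np

    def local_feature(x):
        """Map one pixel value x ∈ [0, 1] to a normalized 2-vector."""
        return np.array([np.cos(np.pi * x / 2), np.sin(np.pi * x / 2)])

    def full_feature(pixels):
        """Φ(x): tensor product of local features, a 2^N-dim vector."""
        phi = np.array([1.0])
        for x in pixels:
            phi = np.kron(phi, local_feature(x))
        return phi

    print(full_feature(np.array([0.0, 0.5, 1.0])).shape)   # (8,) = 2**3
    # In practice the 2^N-dim Φ(x) and W are never materialized densely:
    # W is kept as an MPS, and y = W · Φ(x) is evaluated by contracting
    # the local 2-vectors into the MPS cores one site at a time.
    ```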


  33. (Figures from Stoudenmire+Schwab 2016)


  34. TNs for generative models11
    (Reminiscent of wavefunction interpretation of tensor network)
    Efficient contraction schemes provide inference,
    supporting a variety of “queries” à la graphical models
    Direct sampling schemes10 instead of MCMC
    10Ferris+Vidal 2012
    11Han+Wang+Fan+Wang+Zhang 2018
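
    A minimal sketch of direct (MCMC-free) sampling from an MPS “Born machine”,
    in the spirit of Ferris+Vidal 2012 and Han et al. 2018; the random cores,
    shapes, and variable names are my own illustrative choices:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    N, d, D = 6, 2, 4                  # sites, physical dim, bond dim
    dims = [1] + [D] * (N - 1) + [1]
    cores = [rng.normal(size=(dims[k], d, dims[k + 1])) for k in range(N)]

    # Right environments: R[k] marginalizes |ψ|² over sites k..N-1
    R = [None] * (N + 1)
    R[N] = np.ones((1, 1))
    for k in reversed(range(N)):
        A = cores[k]
        R[k] = np.einsum('asb,bc,dsc->ad', A, R[k + 1], A.conj())

    # Sample site by site; L accumulates the already-fixed left part
    sample, L = [], np.ones((1, 1))
    for k in range(N):
        A = cores[k]
        # p(x_k = s | earlier choices), up to normalization
        probs = np.einsum('aA,asb,bB,AsB->s', L, A, R[k + 1], A.conj()).real
        probs = np.clip(probs, 0, None)
        probs /= probs.sum()
        s = int(rng.choice(d, p=probs))
        sample.append(s)
        As = A[:, s, :]
        L = np.einsum('aA,ab,AB->bB', L, As, As.conj())

    print(sample)   # one exact draw from p(x) ∝ |ψ(x)|²
    ```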


  35. TN ↔ more familiar ML models
    MPS and RBMs
    Tree tensor networks and Conv. Arithmetic Circuits
    Coarse graining structure of language models
    etc, etc, etc.
    This slide is just meant to be indicative.
    See Orús 2019 for a more comprehensive listing and references


  36. TensorNetwork12 API on top of TensorFlow (2019)
    Previously had to write efficient bespoke code
    Recently released by Google X, one of the highlights at NeurIPS 2019
    Convenient Python interface
    GPU backend ⇒ massive speedup!
    12Roberts et al. 2019
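
    The basic Node/edge usage (following the library's documented API; the
    arrays here are toy examples):

    ```python
    import numpy as np
    import tensornetwork as tn

    a = tn.Node(np.ones((2, 3)))
    b = tn.Node(np.ones((3, 4)))
    edge = a[1] ^ b[0]        # wire up the shared bond, as in the diagrams
    c = tn.contract(edge)     # contract that edge
    print(c.tensor.shape)     # (2, 4)
    ```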


  37. Themes to explore
    Engineering
    Develop better ansatzes (esp. for higher dimensional space)
    Make sense of these classes
    Better techniques (differentiable programming13)
    Exploit them for ML!
    ML on quantum computers!?
    Physics
    Quantum many-body systems (condensed matter physics)
    Why do these variational models work so well!?
    MERA and renormalization group flow
    Quantum gravity (holography)
    13Liao+Liu+Wang+Xiang 2019


  38. Thank you!
