
Neural Abstract Machines

A summary talk on neural abstract machines and program induction, given at an offline meeting.

Haruki Kirigaya

January 04, 2017

Transcript

  1. Approaches of Program Induction
     • Inductive Logic Programming (ancient)
     • Probabilistic Programming Languages (modern)
     • Neural Abstract Machines (contemporary)
  2. Approaches of Program Induction
     • Inductive Logic Programming (ancient)
       • Program Synthesis (Manna et al. 1980; Solar-Lezama et al. 2006)
       • Inductive Logic Programming (Muggleton et al. 1994)
     • Probabilistic Programming Languages (modern)
     • Neural Abstract Machines (contemporary)
  3. Approaches of Program Induction
     • Inductive Logic Programming (ancient)
     • Probabilistic Programming Languages (modern)
       • Bayesian Logic (with the Swift compiler [Wu et al. 2016])
       • TerpreT (Gaunt et al. 2016)
       • and a LONG list of other PPLs
     • Neural Abstract Machines (contemporary)
  4. Approaches of Program Induction
     • Inductive Logic Programming (ancient)
     • Probabilistic Programming Languages (modern)
     • Neural Abstract Machines (contemporary)
       • Neural Programmer (Neelakantan et al., ICLR 2016)
       • Hybrid Computing (Graves et al. 2016, Nature)
       • Coupling Distributed and Symbolic Execution (Mou et al. 2016, arXiv)
       • Neural Symbolic Machines (Liang et al. 2016, NIPS 2016 NAMPI workshop)
  5. Neural Programmer: Inducing Latent Programs with Gradient Descent
     • Motivation
       • NNs are unable to learn even simple arithmetic and logic operations (Joulin and Mikolov, 2015)
       • tasks like QA need complex reasoning
     • Contributions
       • Neural Programmer, an NN augmented with arithmetic and logic operators
       • learns from weak supervision
       • fully differentiable
     • Key point: select operators and save results iteratively for T timesteps
     • Synthetic dataset (tables)
  6. Neural Programmer: Inducing Latent Programs with Gradient Descent
     • Question RNN
     • Select operators and data source (table columns)
     • Operators
     • History RNN to save results
  7. Neural Programmer: Inducing Latent Programs with Gradient Descent
     • Operators
       • Maintained output variables:
         • scalar answer at step t: a value in ℝ, initialized to 0
         • lookup answer at step t: selection probabilities over table entries (i, j), in [0,1]^(M×C), initialized to 0
         • row selector at step t: selection probabilities over rows i, in [0,1]^M, initialized to 1
       • Operator list
  8. Neural Programmer: Inducing Latent Programs with Gradient Descent
     • Operators
       • the output variables are then updated as the weighted sum of every operator's result
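The weighted-sum update can be illustrated with a short numpy sketch. This is a toy rendering of the idea, not the paper's implementation; the operator set, shapes, and names below are assumptions.

```python
import numpy as np

# Hypothetical soft-selection update in the spirit of Neural Programmer:
# every operator is applied, and the maintained variables become the
# probability-weighted sum of all operator outputs, keeping the step
# differentiable end to end.

def soft_update(op_probs, scalar_outputs, row_outputs):
    """op_probs:       (K,)   softmax over K operators from the controller
       scalar_outputs: (K,)   scalar result each operator would produce
       row_outputs:    (K, M) row-selection result each operator would produce"""
    scalar_answer = np.dot(op_probs, scalar_outputs)   # weighted sum of scalar results
    row_select    = op_probs @ row_outputs             # weighted sum of row selections
    return scalar_answer, row_select

# Toy usage: 3 operators (e.g. sum, count, greater-than) over a 4-row table.
op_probs = np.array([0.7, 0.2, 0.1])
scalar_answer, row_select = soft_update(
    op_probs,
    scalar_outputs=np.array([12.0, 4.0, 0.0]),
    row_outputs=np.random.rand(3, 4),
)
```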
  9. Neural Programmer: Inducing Latent Programs with Gradient Descent
     • Question templates
     • Table generation:
       • cells: [-100, 100] for training, [-200, 200] for testing
       • rows: [30, 100] for training, 120 for testing
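A toy generator for such synthetic tables might look as follows; the column count and integer sampling are assumptions, since the slides only give the cell-value and row-count ranges.

```python
import numpy as np

# Assumed column count; the actual question templates and column semantics
# come from the paper's generator, not from this sketch.
def make_table(train=True, n_cols=5, rng=np.random.default_rng(0)):
    n_rows = int(rng.integers(30, 101)) if train else 120
    lo, hi = (-100, 100) if train else (-200, 200)
    return rng.integers(lo, hi + 1, size=(n_rows, n_cols))
```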
  10. Related Work: Neural Programmer-Interpreters (Reed and de Freitas, ICLR 2016)
     • Uses the complete program execution sequence as supervision
     • Core LSTM + domain-specific encoders, for various environments
     • Augmented with programs saved in a key-value memory
     • [Architecture figure: environment and arguments → domain-specific encoder → core; outputs: prob(end-of-prog), program key / next program, next arguments, next environment]
  11. Hybrid computing using a neural network with dynamic external memory (Graves et al. 2016)
     • Differentiable Neural Computer (DNC): benefits from both sides
       • manipulates data structures using memory like a traditional computer, and learns from data like an NN
     • Core (deep LSTM) + 3 forms of attention for interacting with memory
       • Content-based lookup
       • Temporal link matrix, to record consecutively modified locations for reading
       • Dynamic memory allocation, to allocate unused memory locations for writing
  12. Hybrid computing using a neural network with dynamic external memory (Graves et al. 2016)
     • Input: x_t concatenated with the R vectors read from memory at the previous step
     • Core LSTM
     • Final output
     • Interface parameters
  13. Hybrid computing using a neural network with dynamic external memory (Graves et al. 2016)
     • Read from memory
       • the weights live in the non-negative orthant with the unit simplex as its boundary
       • suppose we have R read weightings w^{r,1}, ..., w^{r,R}
       • the read vectors are r^i = Mᵀ w^{r,i} (i = 1, 2, ..., R)
     • Write to memory
       • with write weighting w^w, erase vector e, and write vector v
       • erase and then write to the memory: M ← M ∘ (E − w^w eᵀ) + w^w vᵀ, where E is the all-ones matrix and ∘ is elementwise multiplication
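These read and erase/write steps can be written down directly; below is a minimal numpy sketch following the standard DNC formulation, with shapes (N locations, W word size) assumed.

```python
import numpy as np

def read(M, read_weights):
    """M: (N, W) memory; read_weights: (R, N), each row on the unit simplex.
    Returns the R read vectors, one per read head."""
    return read_weights @ M                         # (R, W)

def write(M, w, erase_vec, write_vec):
    """w: (N,) write weighting; erase_vec, write_vec: (W,)."""
    M = M * (1.0 - np.outer(w, erase_vec))          # erase...
    M = M + np.outer(w, write_vec)                  # ...then add new content
    return M
```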
  14. Hybrid computing using a neural network with dynamic external memory (Graves et al. 2016)
     • Reinterpret the interface parameters: the interface vector is split into read keys and strengths, a write key and strength, an erase vector, a write vector, free gates, an allocation gate, a write gate, and read modes
     • Normalize each component to its valid domain: gates with the logistic sigmoid, strengths with oneplus(x) = 1 + log(1 + exp(x)), read modes with a softmax
  15. Hybrid computing using a neural network with dynamic external memory (Graves et al. 2016)
     • Attention 1: Content-based Addressing
       • input: M, the memory; k, a comparison key; β, a comparison strength
       • weights are a softmax over cosine similarities: C(M, k, β)[i] = exp(β·cos(k, M[i,·])) / Σⱼ exp(β·cos(k, M[j,·]))
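A content-based weighting of this form is easy to sketch in numpy; the epsilon and the normalization details below are my own choices.

```python
import numpy as np

def content_weights(M, key, beta, eps=1e-8):
    """M: (N, W) memory; key: (W,); beta: scalar comparison strength."""
    sims = (M @ key) / (np.linalg.norm(M, axis=1) * np.linalg.norm(key) + eps)
    logits = beta * sims
    logits -= logits.max()              # numerical stability
    w = np.exp(logits)
    return w / w.sum()                  # (N,) weights on the unit simplex
```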
  16. Hybrid computing using a neural network with dynamic external memory (Graves et al. 2016)
     • Attention 2: Dynamic Memory Allocation (analogous to the free-memory list in a traditional computer)
       • Synthesize the free gates for every previous read weighting
       • Update the usage vector
       • Sort the usage vector into φ (φ[1] is the index of the least-used location)
       • Allocation weighting
       • Combined with content-based attention
  17. Hybrid computing using a neural network with dynamic external memory (Graves et al. 2016)
     • Attention 2: Dynamic Memory Allocation (cont.)
       • Synthesize the free gates for every previous read weighting
       • Update the usage vector
       • Sort the usage vector into φ (φ[1] is the index of the least-used location)
       • Allocation weighting
       • Combined with content-based attention
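The allocation steps listed above can be sketched as follows; this follows the usual DNC formulation, with variable names that are my own.

```python
import numpy as np

def allocation_weights(usage_prev, write_w_prev, read_w_prev, free_gates):
    """usage_prev: (N,); write_w_prev: (N,); read_w_prev: (R, N); free_gates: (R,)."""
    # retention: locations not freed by any read head this step
    psi = np.prod(1.0 - free_gates[:, None] * read_w_prev, axis=0)          # (N,)
    usage = (usage_prev + write_w_prev - usage_prev * write_w_prev) * psi
    # "free list" phi: locations sorted by usage, least-used first
    phi = np.argsort(usage)
    alloc = np.zeros_like(usage)
    cumprod = 1.0
    for j in phi:                       # give allocation mass to least-used first
        alloc[j] = (1.0 - usage[j]) * cumprod
        cumprod *= usage[j]
    return usage, alloc
```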
  18. Hybrid computing using a neural network with dynamic external memory (Graves et al. 2016)
     • Attention 3: Temporal Memory Linkage; the cell L[i, j] is the degree to which location i was written to just after location j
       • Record the degree to which each location was most recently written (the precedence weighting)
       • Update the link matrix L
       • Forward and backward weightings
       • Combined with content-based addressing
  19. Hybrid computing using a neural network with dynamic external memory (Graves et al. 2016)
     • Attention 3: Temporal Memory Linkage (cont.)
       • Record the degree to which each location was most recently written
       • Update the link matrix L
       • Forward and backward weightings
       • Combined with content-based addressing
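A compact sketch of the link-matrix update and the derived forward/backward weightings, following the standard formulation with assumed names:

```python
import numpy as np

def update_links(L, p_prev, w_write):
    """L: (N, N) link matrix; p_prev: (N,) precedence weighting; w_write: (N,)."""
    # links decay where i or j is being written, and grow where location i
    # is written right after location j was
    scale = 1.0 - w_write[:, None] - w_write[None, :]
    L = scale * L + np.outer(w_write, p_prev)
    np.fill_diagonal(L, 0.0)                        # no self-links
    p = (1.0 - w_write.sum()) * p_prev + w_write    # new precedence weighting
    return L, p

def directional_weights(L, w_read):
    """Forward/backward read weightings from a previous read weighting."""
    forward  = L @ w_read       # follow write order forwards
    backward = L.T @ w_read     # ...or backwards
    return forward, backward
```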
  20. Hybrid computing using a neural network with dynamic external memory (Graves et al. 2016)
     • Comparison with the Neural Turing Machine (NTM)
       • similar architecture: neural network + memory
       • differs in the attention mechanism over memory: the NTM relies on location-based (shift) addressing alongside content-based lookup, so memory tends to be allocated sequentially
     • Drawbacks of the NTM
       • the NTM cannot ensure that allocated blocks do not overlap and interfere; the DNC needs no such assurance because its allocated locations need not be contiguous
       • the NTM cannot free memory, whereas the DNC has free gates
       • the NTM preserves sequential information only while iterating through memory contiguously; the DNC has the temporal link matrix
  21. Hybrid computing using a neural network with dynamic external memory (Graves et al. 2016)
     • Trained on random graph tasks
       • input triples, e.g. (123, 456, 789)
       • N points sampled from a unit square, with a random K-NN graph as the edges
     • Traversal
       • query: (src, e, _), (_, e1, _), (_, e2, _), ...
       • answer (no further input): (src, e, p1), (p1, e1, p2), ...
     • Shortest path
       • query: (src, _, dest), blank1, ..., blank10
       • answer (with the last input): a sequence of triples
       • a structured prediction problem (imitation learning)
     • Inference
       • query: (s, relation, _)
       • answer: (s, relation, dest)
     • with curriculum learning
  22. Hybrid computing using a neural network with dynamic external memory (Graves et al. 2016)
     • Other experiments skipped here
       • Functional:
         • a puzzle game inspired by Winograd's SHRDLU (Winograd, 1971): mini-SHRDLU, combined with reinforcement learning
       • Theoretical:
         • how garbage memory is re-allocated
         • the allocation is independent of memory size and content
         • the effect of a sparse temporal link matrix
         • hyper-parameters and cross-validation
         • curriculum learning
  23. Coupling Distributed and Symbolic Execution for Natural Language Queries (Mou et al. 2016)
     • Applications of querying databases with natural language:
       • Generative QA [Yin et al. 2016a]
       • Human-computer conversation [Wen et al. 2016]
       • Table querying [this work]
     • Derived from semantic parsing (language to logical forms):
       • [Long et al. 2016; Pasupat and Liang, 2016 (extending their 2015 work)]
       • Seq2seq with ground-truth logical forms as supervision: [Dong and Lapata, 2016] seq2tree, and [Xiao et al. 2016] DSP
     • Neural semantic parsing:
       • [Yin et al. 2016b] Neural Enquirer (the basis of this work): lacks explicit interpretation
       • [Neelakantan et al. 2016] Neural Programmer (ICLR 2016): symbolic operations only for numeric tables, not string matching; exponential number of combinatorial states
       • [Liang et al. 2016] Neural Symbolic Machines: REINFORCE is sensitive to the initial policy
  24. Coupling Distributed and Symbolic Execution for Natural Language Queries (Mou et al. 2016)
     • Distributed enquirer (similar to the Neural Enquirer [Yin et al. 2016])
       • executes step by step towards the final (softmax) output
       • selects columns and rows, outputting a column distribution p and row weights r
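For intuition, a soft column/row selection of this kind can be sketched as below; the scoring functions (dot products and a sigmoid) are assumptions, not the paper's exact architecture.

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def select(question_vec, col_embs, row_embs):
    """question_vec: (d,); col_embs: (C, d); row_embs: (M, d).
    Returns a column distribution p and per-row weights r, both differentiable."""
    p = softmax(col_embs @ question_vec)                    # (C,) column distribution
    r = 1.0 / (1.0 + np.exp(-(row_embs @ question_vec)))    # (M,) row weights in (0, 1)
    return p, r
```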
  25. Coupling Distributed and Symbolic Execution for Natural Language Queries (Mou et al. 2016)
     • Symbolic executor
       • operators
       • computation: trained with REINFORCE in a trial-and-error fashion
  26. Coupling Distributed and Symbolic Execution for Natural Language Queries (Mou et al. 2016)
     • The distributed model can be trained end-to-end, but executes inefficiently (matrix multiplications) and lacks explanations
     • The symbolic model executes efficiently and gives explicit explanations, but cannot be trained end-to-end and suffers from REINFORCE's cold-start and local-optimum problems
     • Coupling: use the distributed model's intermediate execution results to pretrain (warm-start) the symbolic executor, which is then fine-tuned with REINFORCE
  27. Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision
     • Goal: apply NNs to language understanding together with symbolic reasoning
       • complex operations
       • external memory
     • Existing neural program induction methods are
       • differentiable (Memory Networks)
       • low-level (Neural Turing Machine)
       • limited to small synthetic datasets
     • The proposed MPC (Manager-Programmer-Computer) framework
       • is non-differentiable
       • supports abstract, scalable and precise operations
  28. Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision
     • Figurative narrative
       • Manager: provides the input data (and the reward, as weak supervision)
       • Programmer: a seq-to-seq program induction model
       • Computer: executes the high-level-language programs
  29. Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision
     • The Computer executes programs consisting of LISP-style expressions
     • Code assistant: the Computer restricts the Programmer's next token to the set of valid tokens
       • e.g. after "(", "Hop", "v", the next token must be a relation reachable from v
       • similar to typing constraints in functional programming
       • another option is to induce derivations rather than tokens (Xiao et al. 2016)
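The "code assistant" restriction can be pictured as masking the decoder's output distribution before sampling; the reachable_relations helper mentioned in the usage comment is hypothetical.

```python
import numpy as np

def mask_invalid(logits, vocab, valid_tokens):
    """logits: (V,) decoder scores; vocab: list of V token strings;
    valid_tokens: set of tokens the Computer allows next."""
    mask = np.array([0.0 if tok in valid_tokens else -np.inf for tok in vocab])
    masked = logits + mask
    masked -= masked.max()              # stable softmax over the allowed tokens
    probs = np.exp(masked)
    return probs / probs.sum()

# e.g. after "( Hop m.12345", only relations reachable from m.12345 are valid:
# probs = mask_invalid(logits, vocab, reachable_relations("m.12345"))
```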
  30. Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision
     • Seq-to-seq model as the Programmer
       • GRUs as both encoder and decoder
       • attention similar to [Dong and Lapata, 2016]
  31. Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision
     • Key-value memory (key = embedding, value = entity token or list)
     • At encoding (entity resolver similar to [Yih et al. 2015]):
       • key = average of the GRU outputs over the tokens within the entity span
       • value = the entity token, e.g. "m.12345"
     • At decoding:
       • an expression is evaluated as soon as its ")" is emitted
       • key = the current GRU output, value = the execution result
       • each time a new variable is added, its token is added to the vocabulary
     • Freebase relation embedding: ParentOf: /people/person/parents -> [avg(E(people), E(person)); E(parents)]
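The variable memory can be pictured as a simple key-value store where keys are vectors (GRU states) and values are symbolic tokens; this is an illustrative data structure with assumed behavior, not the authors' code.

```python
import numpy as np

class VariableMemory:
    def __init__(self):
        self.keys, self.values = [], []   # parallel lists

    def add(self, key_vec, value_token):
        """value_token: e.g. the entity id "m.12345" or an executed result list."""
        self.keys.append(np.asarray(key_vec, dtype=float))
        self.values.append(value_token)

    def rank(self, query_vec):
        """Return stored values ranked by key similarity to the query vector."""
        if not self.keys:
            return []
        scores = np.stack(self.keys) @ np.asarray(query_vec, dtype=float)
        order = np.argsort(-scores)
        return [(self.values[i], float(scores[i])) for i in order]

# usage: mem.add(avg_gru_outputs_of_span, "m.12345")   # at encoding
#        mem.add(current_gru_output, executed_result)  # at decoding, after ")"
```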
  32. Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision
     • Abstract: uses Lisp-style expressions rather than low-level operations
     • Scalable: executable against the whole Freebase
     • Precise: uses exact tokens rather than embeddings
  33. Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision
     • Training with the REINFORCE algorithm (Williams, 1992)
       • action: a token permitted by the Computer
       • state: the question plus the action sequence so far
       • reward: non-zero only at the last step
       • policy: π(s, a; θ), given by the seq-to-seq Programmer
       • the environment is deterministic, so the expected reward is a sum over generated programs weighted by their probabilities
       • loss: the negative expected reward
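Schematically, the REINFORCE update for this setting looks like the sketch below; sample_program, execute, and grad_log_prob are hypothetical stand-ins for the seq-to-seq Programmer and the Computer.

```python
import numpy as np

def reinforce_gradient(theta, question, sample_program, execute, grad_log_prob,
                       n_samples=8):
    """Monte-Carlo policy gradient: weight the log-likelihood gradient of each
    sampled program by its final reward (non-zero only at the last step)."""
    grads = []
    for _ in range(n_samples):
        program = sample_program(theta, question)   # roll out the current policy
        reward = execute(program, question)         # e.g. F1 of the answer set
        grads.append(reward * grad_log_prob(theta, question, program))
    return np.mean(grads, axis=0)
```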
  34. Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision
     • The REINFORCE algorithm suffers from
       • the local-optimum problem
       • a search space that is too large
     • Find approximate gold programs and train on them with a maximum-likelihood loss
       • the target is the program that has achieved the highest reward with the shortest length on a question so far (the question is skipped if no such program has been found)
       • drawbacks: spurious programs (correct result from an incorrect program), and the lack of negative examples makes the objective hard to estimate
     • Curriculum learning (complexity measured by the functions used and the program length)
       • first 10 iterations of ML training using only the Hop function and length-2 programs
       • then further iterations using both Hop and Equal with length 3
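The "approximate gold program" bookkeeping can be sketched as a per-question cache of the best program found so far (highest reward, shortest on ties); the names here are illustrative.

```python
# question id -> (reward, program)
best_programs = {}

def update_best(qid, program, reward):
    """Keep the highest-reward program; break ties by preferring shorter programs."""
    cur = best_programs.get(qid)
    if cur is None or (reward, -len(program)) > (cur[0], -len(cur[1])):
        best_programs[qid] = (reward, program)

def ml_targets():
    """Targets for the maximum-likelihood loss; questions with no successful
    program so far are simply skipped."""
    return {qid: prog for qid, (r, prog) in best_programs.items() if r > 0}
```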
  35. Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision
     • Augmented REINFORCE (similar to [Game of Go] and [Google MT])
       • add the approximate gold program to the final beam with probability α = 0.1
       • the other beam candidates share the remaining probability 1 − α = 0.9
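The α-mixing can be written out directly; this small sketch only shows how the probability mass is shared, with α = 0.1 as on the slide.

```python
import numpy as np

def augmented_weights(beam_probs, alpha=0.1):
    """beam_probs: probabilities of the beam programs (pseudo-gold excluded).
    The pseudo-gold program gets fixed mass alpha; the rest share 1 - alpha."""
    beam_probs = np.asarray(beam_probs, dtype=float)
    beam_part = (1.0 - alpha) * beam_probs / beam_probs.sum()
    return alpha, beam_part   # weight of the pseudo-gold program, weights of the rest
```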
  36. Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision
     • Evaluation on WebQuestionsSP (Yih et al. 2016)
  37. Conclusion
     • The Neural Abstract Machine is a reusable framework capable of real reasoning to some degree
     • Combining neural networks with symbolic methods is promising
       • powerful
       • explicitly explainable
     • High-level operators can be designed to target a specific task
       • better at scaling up, e.g. to the magnitude of Freebase
     • Low-level operations can be reused across various tasks
     • Memory is essential for high-level operators to achieve expression composition