
Learning Structural Edits via Incremental Tree Transformations

Breandan Considine
September 28, 2021

Transcript

  1. The computational challenges of learning programs

     - Programs are graphs, but even matching in graph grammars is NP-hard, and termination of graph rewriting is Turing-complete.
  2. - Source code is syntactically tree-like (i.e., describable by a context-free grammar).
  3. - But whole-tree synthesis is still extremely hard: the number of possible ASTs grows super-exponentially with depth, ~O(Ack(n, m)).
  4. - Even counting shallow trees requires galactic computation.
  5. Counting the number of binary trees up to height n

     All trees: S_{n+1} = (S_n + 2)^2 - 1. Distinct trees: T_n = S_n - S_{n-1}.

     n | S_n          | T_n
     1 | 3            | 3
     2 | 24           | 21
     3 | 675          | 651
     4 | 458328       | 457653
     5 | 210066388899 | 210065930571
  6. - Much more computationally feasible to learn incremental transformations, but how do we represent incremental edits?
  7. Previous work in incremental program learning

     - Tarlow et al. (2019), Dinella et al. (2020), and Brody et al. (2020) explore incremental editing of code sequences and tree-structured data.
     - Unlike these and other prior works, e.g., Yin et al. (2019), Panthaplackel et al. (2020b), and Hoang et al. (2020), Yao et al. (2021):
       - study intent, i.e., edits conditioned on a learned or provided specification
       - use a language-agnostic, type-safe DSL based on Wang et al. (1997)
       - perform all transformations entirely in the tree domain
       - allow edit trajectories to modify any part of the tree at any time
       - support new flexible operators such as subtree copying
       - propose an imitation learning (IL)-based training algorithm with a dynamic oracle
  8. How do we generate the gold edit sequences? A dynamic oracle! See Goldberg and Nivre (2012).
  9. References and Related Work

     - Wang et al. The Zephyr Abstract Syntax Description Language, 1997.
     - Yin et al. Learning to represent edits, 2018.
     - Panthaplackel et al. Learning to update natural language comments based on code changes, 2020.
     - Tarlow et al. Learning to fix build errors with graph2diff neural networks, 2019.
     - Dinella et al. Hoppity: Learning graph transformations to detect and fix bugs in programs, 2020.
     - Hoang et al. CC2Vec: Distributed representations of code changes, 2020.
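The tree-counting recurrence on slide 5 is easy to check mechanically. A short sketch (the function name is mine, not from the talk) that iterates S_{n+1} = (S_n + 2)^2 - 1 from S_1 = 3 and derives T_n = S_n - S_{n-1}; note the recurrence yields S_4 = 458328, which is the value consistent with T_4 = 457653:

```python
# Sketch verifying the recurrence from slide 5:
#   S_{n+1} = (S_n + 2)^2 - 1  counts all binary trees up to height n,
#   T_n = S_n - S_{n-1}        counts distinct trees of height exactly n.

def count_trees(n_max):
    """Return S[1..n_max] and T[1..n_max]; S_0 = 0 so T_1 = S_1."""
    S = [0, 3]  # S_0 = 0, S_1 = 3
    for _ in range(n_max - 1):
        S.append((S[-1] + 2) ** 2 - 1)
    T = [S[n] - S[n - 1] for n in range(1, n_max + 1)]
    return S[1:], T

S, T = count_trees(5)
print(S)  # [3, 24, 675, 458328, 210066388899]
print(T)  # [3, 21, 651, 457653, 210065930571]
```

The doubly-exponential growth of S_n is what makes whole-tree synthesis infeasible even at shallow depths, motivating the incremental approach.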
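Slide 6 asks how incremental edits might be represented. A minimal sketch, with all names hypothetical (not the DSL of Yao et al., 2021), of the general idea: each edit rewrites one node addressed by a path instead of regenerating the whole AST, and a copy operator reuses an existing subtree verbatim:

```python
# Minimal sketch of incremental tree edits (names hypothetical): each
# edit touches one location, and "copy" splices in an existing subtree.
from dataclasses import dataclass, field
from copy import deepcopy

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)

def apply_edit(root, path, op, arg=None):
    """Apply one edit at `path` (a list of child indices):
    'delete' removes the child there, 'add' inserts a new leaf labeled
    `arg`, 'copy' inserts a deep copy of the subtree `arg`."""
    node = root
    for i in path[:-1]:
        node = node.children[i]
    i = path[-1]
    if op == "delete":
        node.children.pop(i)
    elif op == "add":
        node.children.insert(i, Node(arg))
    elif op == "copy":
        node.children.insert(i, deepcopy(arg))
    return root

# Usage: build "x + x" by copying the existing 'x' subtree into slot 1.
tree = Node("+", [Node("x")])
tree = apply_edit(tree, [1], "copy", tree.children[0])
print([c.label for c in tree.children])  # ['x', 'x']
```

An edit trajectory is then just a sequence of such (path, op, arg) triples, which is the object a dynamic oracle (slide 8) must supply gold sequences for during training.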