

Learning Structural Edits via Incremental Tree Transformations

Breandan Considine

September 28, 2021

Transcript

  1. The computational challenges of learning programs

     - Programs are graphs, but even matching in graph grammars is NP-hard, and termination of graph rewriting is Turing-complete
  2. The computational challenges of learning programs

     - Programs are graphs, but even matching in graph grammars is NP-hard, and termination of graph rewriting is Turing-complete
     - Source code is syntactically tree-like (i.e., a context-free grammar)
  3. The computational challenges of learning programs

     - Programs are graphs, but even matching in graph grammars is NP-hard, and termination of graph rewriting is Turing-complete
     - Source code is syntactically tree-like (i.e., a context-free grammar)
     - But whole-tree synthesis is still extremely hard, i.e., the number of possible ASTs grows super-exponentially with depth, ~O(Ack(n, m))
  4. The computational challenges of learning programs

     - Programs are graphs, but even matching in graph grammars is NP-hard, and termination of graph rewriting is Turing-complete
     - Source code is syntactically tree-like (i.e., a context-free grammar)
     - But whole-tree synthesis is still extremely hard, i.e., the number of possible ASTs grows super-exponentially with depth, ~O(Ack(n, m))
     - Even counting shallow trees requires galactic computation
  5. Counting the number of binary trees up to height n

     All trees:      S_{n+1} = (S_n + 2)^2 - 1
     Distinct trees: T_n = S_n - S_{n-1}

     n    S_n              T_n
     1    3                3
     2    24               21
     3    675              651
     4    458328           457653
     5    210066388899     210065930571
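The slide's recurrences can be checked directly. A quick sketch (assuming S_0 = 0, so that T_1 = S_1 = 3; note the recurrence gives S_4 = 677^2 - 1 = 458328, consistent with T_4 = 457653):

```python
def tree_counts(max_height):
    """Return dicts {n: S_n} (all trees up to height n) and {n: T_n}
    (trees of exactly height n) using the slide's recurrences."""
    S = {0: 0, 1: 3}  # S_1 = 3 as given on the slide
    for n in range(1, max_height):
        S[n + 1] = (S[n] + 2) ** 2 - 1   # S_{n+1} = (S_n + 2)^2 - 1
    T = {n: S[n] - S[n - 1] for n in range(1, max_height + 1)}
    return S, T

S, T = tree_counts(5)
print(S[5], T[5])  # 210066388899 210065930571
```

Even at height 5 the count already exceeds 2 × 10^11, which is why exhaustive whole-tree synthesis is hopeless.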
  6. The computational challenges of learning programs

     - Programs are graphs, but even matching in graph grammars is NP-hard, and termination of graph rewriting is Turing-complete
     - Source code is syntactically tree-like (i.e., a context-free grammar)
     - But whole-tree synthesis is still extremely hard, i.e., the number of possible ASTs grows super-exponentially with depth, ~O(Ack(n, m))
     - Even counting shallow trees requires galactic computation
     - Much more computationally feasible to learn incremental transformations, but how do we represent incremental edits?
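To give a feel for the ~O(Ack(n, m)) bound above, here is the standard two-argument Ackermann function, which outgrows every primitive-recursive function (a sketch for intuition only; the slide does not define its n and m):

```python
def ack(m, n):
    """Ackermann function: ack(3, n) = 2^(n+3) - 3 already grows
    exponentially, and ack(4, 1) = 65533; ack(4, 2) has ~19,000 digits."""
    if m == 0:
        return n + 1
    if n == 0:
        return ack(m - 1, 1)
    return ack(m - 1, ack(m, n - 1))

print(ack(2, 3), ack(3, 3))  # 9 61
```

Any search space that scales like this is infeasible to enumerate, motivating the shift to incremental edits.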
  7. Previous work in incremental program learning

     - Tarlow et al. (2019), Dinella et al. (2020), and Brody et al. (2020) explore incremental editing on code sequences and tree-structured data
     - Unlike theirs and prior work, e.g., Yin et al. (2019), Panthaplackel et al. (2020b), and Hoang et al. (2020), Yao et al. (2021):
       - Study intent, i.e., edits conditioned on a learned or provided specification
       - Use a language-agnostic, type-safe DSL based on Wang (1997)
       - Perform all transformations entirely in the tree domain
       - Allow edit trajectories to modify any part of the tree at any time
       - Support new flexible operators such as subtree copying
       - Propose an imitation-learning-based training algorithm with a dynamic oracle
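To make the tree-level operators concrete, here is a toy sketch (the class and function names are hypothetical illustrations, not the paper's ASDL-based DSL) of add, delete, and the flexible subtree-copy operator mentioned above:

```python
import copy

class Node:
    """A minimal labeled ordered tree, standing in for an AST node."""
    def __init__(self, label, children=None):
        self.label = label
        self.children = list(children or [])

    def __repr__(self):
        if not self.children:
            return self.label
        return f"{self.label}({', '.join(map(repr, self.children))})"

def add_child(parent, i, node):
    """Edit op: insert a new subtree as the i-th child."""
    parent.children.insert(i, node)

def delete_child(parent, i):
    """Edit op: remove the i-th child subtree."""
    del parent.children[i]

def copy_subtree(src, parent, i):
    """Edit op: duplicate an existing subtree elsewhere in the tree,
    rather than regenerating it node by node."""
    parent.children.insert(i, copy.deepcopy(src))

tree = Node("Add", [Node("x"), Node("y")])
copy_subtree(tree.children[0], tree, 2)  # Add(x, y, x)
delete_child(tree, 1)                    # Add(x, x)
print(tree)  # Add(x, x)
```

An edit trajectory is then just a sequence of such operations applied anywhere in the tree, each step keeping the structure a well-formed tree.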
  8. How do we generate the gold edit sequences? A Dynamic Oracle! See: Goldberg and Nivre (2012)
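The defining property of a dynamic oracle (Goldberg and Nivre, 2012) is that it supplies a good next action from any intermediate state, not only from states along one precomputed gold trajectory. A toy analogy over flat token lists (deliberately not the paper's tree-level oracle) illustrates the idea:

```python
def oracle(current, target):
    """Return a next edit ('add', i, tok) or ('delete', i) that makes
    progress from ANY current state toward target, or None when done."""
    for i, (c, t) in enumerate(zip(current, target)):
        if c != t:
            return ("add", i, t)  # fix the first mismatching position
    if len(current) > len(target):
        return ("delete", len(target))  # trim surplus suffix
    if len(current) < len(target):
        return ("add", len(current), target[len(current)])
    return None

def apply_edit(seq, edit):
    if edit[0] == "add":
        seq.insert(edit[1], edit[2])
    else:
        del seq[edit[1]]

# Even from an arbitrary (e.g., model-perturbed) state, repeatedly
# querying the oracle converges to the target.
state, goal = list("acb"), list("abc")
while (e := oracle(state, goal)) is not None:
    apply_edit(state, e)
print("".join(state))  # abc
```

Training against such an oracle lets the learner recover from its own mistakes, instead of only imitating a single fixed gold sequence.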
  9. References and Related Work

     - Wang et al. The Zephyr Abstract Syntax Description Language, 1997.
     - Yin et al. Learning to represent edits, 2018.
     - Panthaplackel et al. Learning to update natural language comments based on code changes, 2020.
     - Tarlow et al. Learning to fix build errors with graph2diff neural networks, 2019.
     - Dinella et al. Hoppity: Learning graph transformations to detect and fix bugs in programs, 2020.
     - Thong Hoang et al. CC2Vec: Distributed representations of code changes, 2020.