Learning Structural Edits via Incremental Tree Transformations

Breandan Considine
September 28, 2021

Transcript

  1. Learning Structural Edits via
    Incremental Tree Transformations
    Ziyu Yao, Frank Xu, Pengcheng Yin,
    Huan Sun, Graham Neubig

  2. The computational challenges of learning programs
    - Programs are graphs, but even matching in graph grammars is
    NP-hard, and graph rewriting is Turing-complete (so its termination is undecidable)

  3. The computational challenges of learning programs
    - Programs are graphs, but even matching in graph grammars is
    NP-hard, and graph rewriting is Turing-complete (so its termination is undecidable)
    - Source code is syntactically tree-like (i.e., its syntax is typically specified by a context-free grammar)

  4. The computational challenges of learning programs
    - Programs are graphs, but even matching in graph grammars is
    NP-hard, and graph rewriting is Turing-complete (so its termination is undecidable)
    - Source code is syntactically tree-like (i.e., its syntax is typically specified by a context-free grammar)
    - But whole-tree synthesis is still extremely hard: the number of
    possible ASTs grows super-exponentially (in fact, doubly exponentially) with depth

  5. The computational challenges of learning programs
    - Programs are graphs, but even matching in graph grammars is
    NP-hard, and graph rewriting is Turing-complete (so its termination is undecidable)
    - Source code is syntactically tree-like (i.e., its syntax is typically specified by a context-free grammar)
    - But whole-tree synthesis is still extremely hard: the number of
    possible ASTs grows super-exponentially (in fact, doubly exponentially) with depth
    - Even shallow trees are astronomically numerous: there are over 2 × 10^11
    binary trees of height at most 5

  6. Counting the number of binary trees up to height n
    All trees (height at most n): $S_{n+1} = (S_n + 2)^2 - 1$, with $S_0 = 0$
    Distinct trees (height exactly n): $T_n = S_n - S_{n-1}$

    n    S_n              T_n
    1    3                3
    2    24               21
    3    675              651
    4    458328           457653
    5    210066388899     210065930571
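The two recurrences on this slide can be checked mechanically. The sketch below assumes the base case $S_0 = 0$ (the value that reproduces $S_1 = 3$ and $T_1 = 3$); the function name `count_trees` is hypothetical and not from the talk.

```python
def count_trees(max_height: int) -> list:
    """Return (n, S_n, T_n) rows from the slide's recurrences.

    S_n counts trees of height at most n (assumed base case S_0 = 0);
    T_n = S_n - S_{n-1} counts trees of height exactly n.
    """
    rows = []
    s_prev = 0  # S_0, assumed so that S_1 = (0 + 2)^2 - 1 = 3
    for n in range(1, max_height + 1):
        s = (s_prev + 2) ** 2 - 1      # S_n = (S_{n-1} + 2)^2 - 1
        rows.append((n, s, s - s_prev))  # T_n = S_n - S_{n-1}
        s_prev = s
    return rows

for n, s, t in count_trees(5):
    print(f"{n}  {s:>15,}  {t:>15,}")
```

Running it gives S_4 = 458,328, which is consistent with T_4 = 457,653 = S_4 − S_3. Because S_{n+1} ≈ S_n², the number of digits roughly doubles with each extra level, which is what "doubly exponential in depth" means in practice.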

  7. The computational challenges of learning programs
    - Programs are graphs, but even matching in graph grammars is
    NP-hard, and graph rewriting is Turing-complete (so its termination is undecidable)
    - Source code is syntactically tree-like (i.e., its syntax is typically specified by a context-free grammar)
    - But whole-tree synthesis is still extremely hard: the number of
    possible ASTs grows super-exponentially (in fact, doubly exponentially) with depth
    - Even shallow trees are astronomically numerous: there are over 2 × 10^11
    binary trees of height at most 5
    - It is much more computationally feasible to learn incremental
    transformations, but how should incremental edits be represented?

  8. Previous work in incremental program learning
    - Tarlow et al. (2019), Dinella et al. (2020), and Brody et al. (2020) explore
    incremental editing of code sequences and tree-structured data
    - Unlike these and earlier approaches, e.g., Yin et al. (2019), Panthaplackel et al.
    (2020b), and Hoang et al. (2020), Yao et al. (2021):
    - Model editing intent, i.e., condition edits on a learned or provided specification
    - Use a language-agnostic, type-safe DSL based on ASDL (Wang et al., 1997)
    - Perform all transformations entirely in the tree domain
    - Allow edit trajectories to modify any part of the tree at any time step
    - Support new, flexible operators such as subtree copying
    - Propose an imitation learning (IL)-based training algorithm with a dynamic oracle

  9. Code transformations as autoregressive tree editing

  10. Code transformations as autoregressive tree editing
    x = list.ElementAt(i+1)  →  x = list[i+1]

  11. Code transformations as autoregressive tree editing
    x = list.ElementAt(i+1)  →  x = list[i+1]
    (Λ, Λ) = φ( )

  12. Code transformations as autoregressive tree editing
    x = list.ElementAt(i+1)  →  x = list[i+1]
    (Λ, Λ) = φ( )

  13. Code transformations as autoregressive tree editing
    x = list.ElementAt(i+1)  →  x = list[i+1]
    (Λ, Λ) = φ( )

  14. Code transformations as autoregressive tree editing
    x = list.ElementAt(i+1)  →  x = list[i+1]
    (Λ, Λ) = φ( )

  15. Code transformations as autoregressive tree editing
    x = list.ElementAt(i+1)  →  x = list[i+1]
    (Λ, Λ) = φ( )

  16. Code transformations as autoregressive tree editing
    x = list.ElementAt(i+1)  →  x = list[i+1]
    (Λ, Λ) = φ( )
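To make the idea concrete, here is a minimal sketch of how the `ElementAt` → indexer edit might be expressed as a sequence of incremental tree operations. It assumes a toy AST rather than the ASDL-typed trees used by Yao et al. (2021); the operation names (`Delete`, `Add`, `CopySubTree`, `Stop`), the child-index path encoding, and the node labels are hypothetical choices for this sketch. In the actual model each operation would be predicted by a neural decoder conditioned on the current tree, not scripted by hand.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Node:
    label: str
    # None marks an empty child slot (a "hole"), e.g. left behind by a deletion
    children: List[Optional["Node"]] = field(default_factory=list)

    def __str__(self) -> str:
        if not self.children:
            return self.label
        parts = [str(c) if c is not None else "□" for c in self.children]
        return f"{self.label}({', '.join(parts)})"

Path = Tuple[int, ...]  # child indices from the root (hypothetical encoding)

def get(tree: Node, path: Path) -> Optional[Node]:
    """Follow child indices from the root to the node at `path`."""
    node = tree
    for i in path:
        node = node.children[i]
    return node

def set_slot(tree: Node, path: Path, value: Optional[Node]) -> None:
    """Overwrite the child slot at a non-empty `path`."""
    get(tree, path[:-1]).children[path[-1]] = value

def apply(tree: Node, op: tuple, original: Node) -> Node:
    kind = op[0]
    if kind == "Delete":          # remove a subtree, leaving a hole
        set_slot(tree, op[1], None)
    elif kind == "Add":           # fill a hole with a new node whose children are holes
        _, path, label, arity = op
        set_slot(tree, path, Node(label, [None] * arity))
    elif kind == "CopySubTree":   # copy a subtree of the original input into a hole
        _, path, src_path = op
        set_slot(tree, path, copy.deepcopy(get(original, src_path)))
    elif kind != "Stop":
        raise ValueError(f"unknown operation {kind!r}")
    return tree

# x = list.ElementAt(i+1) as a toy AST (labels are illustrative, not a real C# AST)
src = Node("Assign", [Node("x"),
                      Node("Call", [Node("Attr", [Node("list"), Node("ElementAt")]),
                                    Node("Plus", [Node("i"), Node("1")])])])

trajectory = [
    ("Delete", (1,)),                      # drop the method call
    ("Add", (1,), "Subscript", 2),         # insert an indexer node with two holes
    ("CopySubTree", (1, 0), (1, 0, 0)),    # reuse `list` from the input
    ("CopySubTree", (1, 1), (1, 1)),       # reuse `i+1` from the input
    ("Stop",),
]

tree = copy.deepcopy(src)
for op in trajectory:
    tree = apply(tree, op, src)
    print(tree)
```

Copying `list` and `i+1` as whole subtrees, rather than regenerating them node by node, keeps the trajectory short; the printed intermediate trees show each hole (□) being filled in turn, ending with `Assign(x, Subscript(list, Plus(i, 1)))`.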

  17. Iterative tree edit decoding process

  18. Iterative tree edit decoding process

  19. Iterative tree edit decoding process

  20. Iterative tree edit encoding process
    Given: Learn:

  21. Iterative tree edit encoding process
    Given: Learn:

  22. How do we generate the gold edit sequences?
    A dynamic oracle! See: Goldberg and Nivre (2012)
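The dynamic-oracle idea from Goldberg and Nivre, carried over to tree editing, is to answer "what is a good next edit?" from any intermediate tree, including trees a partly trained model reached by mistake, rather than only from states on one precomputed gold path. Below is a minimal sketch of that idea under strong simplifying assumptions: toy nested-tuple trees, a greedy top-down comparison that is not guaranteed to produce a minimal edit script, and hypothetical operation names. It is not the oracle from Yao et al. (2021).

```python
from typing import Optional, Tuple

# Toy trees as nested tuples: (label, child_1, ..., child_k); None is an empty slot.
Tree = Optional[tuple]
Path = Tuple[int, ...]

def contains(tree: Tree, sub: tuple) -> bool:
    """True if `sub` occurs as a subtree of `tree`."""
    if tree is None:
        return False
    return tree == sub or any(contains(c, sub) for c in tree[1:])

def replace_at(tree: Tree, path: Path, new: Tree) -> Tree:
    """Return a copy of `tree` with the slot at `path` replaced by `new`."""
    if not path:
        return new
    children = list(tree[1:])
    children[path[0]] = replace_at(children[path[0]], path[1:], new)
    return (tree[0], *children)

def oracle(cur: Tree, tgt: tuple, original: tuple, path: Path = ()) -> Optional[tuple]:
    """Greedy next edit from ANY current tree toward `tgt`; None means Stop."""
    if cur == tgt:
        return None
    if cur is None:  # a hole: copy the target subtree if the input has it, else add its root
        if contains(original, tgt):
            return ("CopySubTree", path, tgt)
        return ("Add", path, tgt[0], len(tgt) - 1)
    if cur[0] != tgt[0] or len(cur) != len(tgt):
        return ("Delete", path)  # wrong node here: remove it, however it got there
    for i, (c, t) in enumerate(zip(cur[1:], tgt[1:])):
        op = oracle(c, t, original, path + (i,))
        if op is not None:
            return op
    return None

def apply(tree: Tree, op: tuple) -> Tree:
    kind, path = op[0], op[1]
    if kind == "Delete":
        return replace_at(tree, path, None)
    if kind == "Add":  # new node whose children are all holes
        return replace_at(tree, path, (op[2],) + (None,) * op[3])
    return replace_at(tree, path, op[2])  # CopySubTree

def rollout(start: Tree, tgt: tuple, original: tuple, max_steps: int = 100):
    """Follow the oracle from `start` until it says Stop."""
    tree, ops = start, []
    for _ in range(max_steps):
        op = oracle(tree, tgt, original)
        if op is None:
            break
        ops.append(op)
        tree = apply(tree, op)
    return tree, ops

src = ("Assign", ("x",), ("Call", ("Attr", ("list",), ("ElementAt",)), ("Plus", ("i",), ("1",))))
tgt = ("Assign", ("x",), ("Subscript", ("list",), ("Plus", ("i",), ("1",))))

# Gold trajectory starting from the input tree
_, gold = rollout(src, tgt, src)
print([op[0] for op in gold])

# A state the gold path never visits: `i` was copied into the wrong slot
bad = ("Assign", ("x",), ("Subscript", ("i",), None))
print(oracle(bad, tgt, src))
```

The second call shows what makes the oracle "dynamic": the erroneous state is off the gold path, yet the oracle still returns a corrective `Delete` at that slot, which is what allows an imitation-learning loop to train on the model's own (imperfect) trajectories.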

  23. Trajectory-sampling algorithms

  24. References and Related Work
    ● Wang et al. The Zephyr Abstract Syntax Description Language, 1997.
    ● Yin et al. Learning to represent edits, 2019.
    ● Panthaplackel et al. Learning to update natural language comments
    based on code changes, 2020.
    ● Tarlow et al. Learning to fix build errors with Graph2Diff neural
    networks, 2019.
    ● Dinella et al. Hoppity: Learning graph transformations to detect and
    fix bugs in programs, 2020.
    ● Hoang et al. CC2Vec: Distributed representations of code
    changes, 2020.
