Natural Language to Code Generation: A Brief Survey

Natural Language to Code Generation: A Brief Survey (RU)

Pavel Braslavski
Associate Professor, Higher School of Economics
Researcher, JetBrains Research

Online Dev Meetup
23 April 2020

Video (RU): https://youtu.be/ZZJgtSyMfmU

Pavel will give a talk on generating program code based on natural language text input. Pavel will briefly survey various task settings, methods, and evaluation, as well as available data.
The talk is a must for those interested in the modern methods and applications of natural language processing.

To learn more about Exactpro, visit our website https://exactpro.com/

Transcript

  1. Natural Language to Code: Brief Overview. Pavel Braslavski, 23.04.2020

  2. About myself
     • Research/academia: JetBrains Research / Higher School of Economics SPb / Ural Federal University
     • Past industrial experience: Yandex / SKB Kontur
     • Recent research interests: question answering, fiction analysis, computational humor
     Homepage: http://kansas.ru/pb/

  3. Why NL2Code?
     • Applications
       • Code generation/search
       • Question answering
       • Instructing a robot
     • Interesting NLP task
     • Possibly complete executable meaning representation
     "at the chair, move forward three steps past the sofa" [from Yoav Artzi’s slides]

  4. SHRDLU by Terry Winograd (1968) https://www.youtube.com/watch?v=bo4RvYJYOzI

  5. Approaches
     • Bottom-up processing: Words → Syntax → Meaning
     • End-to-end
     • Hand-written rules/grammars
     • Annotated data → machine-learned models

  6. GeoQuery (Zelle and Mooney 1996) [Dragomir Radev]

  7. Compositional Semantics
     S  -> NP VP   {VP.Sem(NP.Sem)}    t
     VP -> V NP    {V.Sem(NP.Sem)}     <e,t>
     NP -> N       {N.Sem}             e
     V  -> likes   {λ x,y likes(x,y)}  <e,<e,t>>
     N  -> Javier  {Javier}            e
     N  -> pizza   {pizza}             e
     [Radev]

  8. Semantic Parsing
     • Associate a semantic expression with each node
     Javier likes pizza
     V: λ x,y likes(x,y)
     N: pizza
     VP: λx likes(x,pizza)
     N: Javier
     S: likes(Javier, pizza)
     [Radev]

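The bottom-up composition on slides 7 and 8 can be sketched in a few lines of Python. This is only an illustration, not code from the talk: the names `LEXICON` and `interpret` are hypothetical, entities are modeled as plain strings, and the verb is curried so that it consumes its object first (an assumption chosen to reproduce the VP value `λx likes(x,pizza)` shown on the slide).

```python
# Lexicon: word -> semantic value. Entities are strings; the verb is a curried function.
LEXICON = {
    "Javier": "Javier",                                          # N -> Javier {Javier}
    "pizza": "pizza",                                            # N -> pizza {pizza}
    # Assumption: the verb takes its object first, then its subject.
    "likes": lambda obj: lambda subj: f"likes({subj}, {obj})",   # V -> likes
}


def interpret(subject, verb, obj):
    """Compose the meaning of a 'subject verb object' sentence bottom-up."""
    np_subj = LEXICON[subject]   # NP -> N   {N.Sem}
    np_obj = LEXICON[obj]        # NP -> N   {N.Sem}
    v_sem = LEXICON[verb]
    vp_sem = v_sem(np_obj)       # VP -> V NP {V.Sem(NP.Sem)}
    return vp_sem(np_subj)       # S -> NP VP {VP.Sem(NP.Sem)}


print(interpret("Javier", "likes", "pizza"))  # likes(Javier, pizza)
```

Each line of `interpret` mirrors one grammar rule, so the semantic value of a node is always computed from the values of its children.
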
  9. Zettlemoyer and Collins (2005) [Dragomir Radev]

  10. Zettlemoyer and Collins (2005) [Dragomir Radev]

  11. seq2tree
      Li Dong and Mirella Lapata, "Language to Logical Form with Neural Attention", 2016
      (following slides by Dong & Lapata)

  12–17. (no text transcribed)

  18. Seq2SQL
      Victor Zhong, Caiming Xiong, Richard Socher, "Seq2SQL: Generating Structured Queries from Natural Language Using Reinforcement Learning", 2017

  19. WikiSQL dataset
      • 80,654 examples: question + SQL query
      • 24,241 tables extracted from Wikipedia
      • Table → SQL query → crude question based on a template → human paraphrase via crowdsourcing
      • https://github.com/salesforce/WikiSQL

  20. Architecture [Zhong, Xiong, Socher, 2017]
      Mixed objective function: $L = L^{agg} + L^{sel} + L^{whe}$

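The three loss terms correspond to three parts of the predicted query: the aggregation operator, the selected column, and the WHERE conditions. A minimal sketch of how such a three-part prediction could be assembled into a SQL string is given below. The operator lists, the function `render_query`, and the example table are assumptions made for illustration; they are not taken from the slides or from the Seq2SQL code.

```python
# Assumed operator vocabularies for this sketch (hypothetical, not from the slides).
AGG_OPS = ["", "MAX", "MIN", "COUNT", "SUM", "AVG"]
COND_OPS = ["=", ">", "<"]


def render_query(table, columns, agg, sel, conds):
    """Assemble a SQL string from aggregation, selection, and WHERE predictions.

    agg and sel are indices into AGG_OPS and columns; conds is a list of
    (column_index, operator_index, value) triples.
    """
    column = columns[sel]
    select = f"{AGG_OPS[agg]}({column})" if AGG_OPS[agg] else column
    where = " AND ".join(
        f"{columns[c]} {COND_OPS[op]} '{value}'" for c, op, value in conds
    )
    sql = f"SELECT {select} FROM {table}"
    return f"{sql} WHERE {where}" if where else sql


columns = ["player", "country", "points"]
print(render_query("scores", columns, agg=3, sel=0, conds=[(1, 0, "Spain")]))
# SELECT COUNT(player) FROM scores WHERE country = 'Spain'
```

Splitting the output this way lets each part be predicted, and penalized, separately, which is what the sum of three losses expresses.
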
  21. Results
      ex – execution accuracy (the same result)
      lf – logical form accuracy (the same query)

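The difference between the two measures is easy to see on a toy database: two different queries can return the same rows. The sketch below uses Python's standard `sqlite3` module. The table, the queries, and the helper names `execution_match` and `logical_form_match` are hypothetical, and comparing whitespace-normalized strings is a simplification of how a real benchmark would compare logical forms.

```python
import sqlite3


def execution_match(conn, predicted_sql, gold_sql):
    """ex: both queries return the same rows (order ignored)."""
    predicted = conn.execute(predicted_sql).fetchall()
    gold = conn.execute(gold_sql).fetchall()
    return sorted(predicted) == sorted(gold)


def logical_form_match(predicted_sql, gold_sql):
    """lf: the queries themselves are identical up to case and whitespace."""
    def normalize(query):
        return " ".join(query.lower().split())
    return normalize(predicted_sql) == normalize(gold_sql)


# Toy in-memory database (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (player TEXT, points INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?)", [("Ann", 10), ("Bob", 7)])

gold = "SELECT player FROM scores WHERE points > 8"
pred = "SELECT player FROM scores WHERE points = 10"

print(execution_match(conn, pred, gold))   # True: both return only Ann
print(logical_form_match(pred, gold))      # False: the queries differ
```

Execution accuracy can therefore credit a query that is wrong in general but happens to give the right answer on the given table, while logical form accuracy can penalize a correct query that is merely written differently.
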
  22. Coarse to Fine
      Li Dong and Mirella Lapata, "Coarse-to-Fine Decoding for Neural Semantic Parsing", 2018
      (following slides by Dong & Lapata)

  23. Task: ATIS [Hemphill et al. 1990; Dahl et al. 1994]
      Request: Show me flights from Seattle to Boston next Monday
      SQL query:
        (SELECT DISTINCT flight.flight_id
         FROM flight
         WHERE (flight.from_airport IN
                 (SELECT airport_service.airport_code FROM airport_service
                  WHERE airport_service.city_code IN
                    (SELECT city.city_code FROM city WHERE city.city_name = 'SEATTLE')))
           AND (flight.to_airport IN
                 (SELECT airport_service.airport_code FROM airport_service
                  WHERE airport_service.city_code IN
                    (SELECT city.city_code FROM city WHERE city.city_name = 'BOSTON')))
           AND (flight.flight_days IN
                 (SELECT days.days_code FROM days
                  WHERE days.day_name IN
                    (SELECT date_day.day_name FROM date_day
                     WHERE date_day.year = 1993 AND date_day.month_number = 2 AND date_day.day_number = 8))));
      Database result: 31 flights available
      ~5,000 examples; database with 27 tables and ~160,000 entries
      [Alane Suhr, ACL 2018 tutorial]

  24. DJANGO (2015)
      • Source code → pseudo-code
      • Py2En: 18,805; Py2Jp: 722
      • Evaluation: BLEU + manual (acceptability/understanding)
      • https://github.com/odashi/ase15-django-dataset

  25–31. (no text transcribed)

  32. From the NL2Bash paper

  33. Spider (2018)
      • 200 DBs with multiple tables
      • 10,181 questions
      • 5,693 complex SQL queries
      • Tables → human questions + SQL queries → checking + paraphrasing
      • Several evaluation measures accounting for SQL structure
      • https://yale-lily.github.io/spider
      + SParC, CoSQL (2019)

  34. (no text transcribed)

  35. NL2Bash (2018)
      • Bash one-liners from the Web + NL descriptions
      • https://github.com/TellinaTool/nl2bash

  36. CoNaLa (2018)
      Intent + Python code snippets from Stack Overflow
      Manually annotated: 2,379 training / 500 test
      Automatically mined: 598,237 intent/snippet pairs
      Evaluation: BLEU
      https://conala-corpus.github.io/

  37. StaQC (2018)
      • Stack Overflow Question-Code pairs
      • https://github.com/LittleYUYU/StackOverflow-Question-Code-Dataset

  38–39. (no text transcribed)

  40. CodeSearchNet benchmark: https://app.wandb.ai/github/codesearchnet/benchmark

  41. Questions?

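  42. BLEU
      • Bilingual Evaluation Understudy
      • Most widely used evaluation metric in machine translation
      • n-gram precision of the MT output + brevity penalty

      $$\mathrm{BLEU} = \mathrm{BP} \cdot \exp\left(\sum_{n=1}^{N} w_n \log p_n\right)$$

      $$\mathrm{BP} = \min\left(1, \frac{\mathit{output\_length}}{\mathit{reference\_length}}\right)$$

      Here $p_n$ is the n-gram precision of order $n$ and $w_n$ is its weight. For $N = 4$ and uniform weights $w_n = 1/4$:

      $$\exp\left(\sum_{n=1}^{4} \tfrac{1}{4} \log p_n\right) = \left(\prod_{n=1}^{4} p_n\right)^{1/4}$$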
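
A minimal sketch of this metric in Python is given below, assuming a single tokenized reference, uniform weights, clipped n-gram counts, and the simplified brevity penalty from the slide (the ratio of output length to reference length, capped at 1). The function names are hypothetical, the reference is assumed to be non-empty, and production toolkits add smoothing and corpus-level aggregation that are omitted here.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU against one non-empty reference, uniform weights."""
    log_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        # Clipped matches: a candidate n-gram is credited at most as often
        # as it occurs in the reference.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # no smoothing in this sketch
        log_sum += math.log(overlap / total) / max_n
    brevity_penalty = min(1.0, len(candidate) / len(reference))
    return brevity_penalty * math.exp(log_sum)


reference = "the cat sat on the mat".split()
print(bleu(reference, reference))                   # 1.0
print(bleu("the cat sat on the".split(), reference))
```

In the second call every n-gram of the shorter output also occurs in the reference, so all precisions equal 1 and only the brevity penalty (5/6) lowers the score. This is why the penalty is needed: precision alone would reward very short outputs.
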