
Development Emails Content Analyzer: Intention Mining in Developer Discussions


Written development communication (e.g., mailing lists, issue trackers) constitutes a precious source of information for building recommenders for software engineers, for example aimed at suggesting experts or at re-documenting existing source code. In this paper we propose a novel, semi-supervised approach
named DECA (Development Emails Content Analyzer) that uses natural language parsing to classify the content of development emails according to their purpose (e.g., feature request, opinion asking, problem discovery, solution proposal, information giving), identifying email elements that can be used for specific tasks.
A study based on data from Qt and Ubuntu highlights the high precision (90%) and recall (70%) of DECA in classifying email content, outperforming traditional machine-learning strategies. Moreover, we successfully used DECA for re-documenting source code of Eclipse and Lucene, improving the recall, while keeping the high precision, of a previous approach based on ad hoc heuristics.

Sebastiano Panichella

July 12, 2016


Transcript

  1. Development Emails Content Analyzer: Intention Mining in Developer Discussions

     Andrea Di Sorbo, Sebastiano Panichella, Corrado Visaggio, Massimiliano Di Penta, Gerardo Canfora, Harald Gall
  2. Outline

     - Context: Written Development Discussions
     - Case Study: Development Mailing Lists of 2 Open Source Projects
     - Results: Automatic Classification of Relevant Contents in Developers' Communication
  3. Development Communication Means

     Recommender systems:
     - Bug Triaging [1]
     - Suggest Mentors [2]
     - Code re-documentation [3]
     - Etc.
     [1] Anvik et al., "Who should fix this bug?"
     [2] Canfora et al., "Who is going to mentor newcomers in open source projects?"
     [3] Panichella et al., "Mining source code descriptions from developer communications"
  4. Development Communication Means

     [1] Bacchelli et al., "Content classification of development emails"
     [2] Cerulo et al., "A Hidden Markov Model to detect coded information islands in free text"
  5. A Considerable Effort for Developers

     Many messages: developers get lost in unnecessary details, missing potentially useful information…
  6. Previous Work

     Hana et al.: "…'Lazy' RTC occurs when a core developer posts a change to a mailing list and nobody responds; it is assumed that other developers reviewed the code…"
  7. Previous Work

     Approaches for:
     - Generating summaries of emails → Lam et al., Rambow et al.
     - Generating summaries of bug reports → Rastkar et al.
  8. DECA (Development Email Content Analyzer)

     An approach to classify paragraphs according to intentions.
     http://www.ifi.uzh.ch/seal/people/panichella/tools/DECA.html
  9. Example

     i. We could use a leaky bucket algorithm to limit the bandwidth
     ii. The leaky bucket algorithm fails in limiting the bandwidth
  10. Example

     i. We could use a leaky bucket algorithm to limit the bandwidth
     ii. The leaky bucket algorithm fails in limiting the bandwidth
     → A high percentage of words in common
  11. Example

     i. We could use a leaky bucket algorithm to limit the bandwidth
     ii. The leaky bucket algorithm fails in limiting the bandwidth
     → Discuss the same topics
  12. Example

     i. We could use a leaky bucket algorithm to limit the bandwidth
     ii. The leaky bucket algorithm fails in limiting the bandwidth
     → Have different intentions
  13. Example

     i. We could use a leaky bucket algorithm to limit the bandwidth
     ii. The leaky bucket algorithm fails in limiting the bandwidth
     → Have different intentions
     "Techniques based on lexicon analysis, such as VSM [1], LSI [2], or LDA [3], would not be sufficient to classify paragraphs according to intentions."
     [1] Baeza-Yates et al., "Modern Information Retrieval"
     [2] de Marneffe et al., "The Stanford typed dependencies representation"
     [3] Blei et al., "Latent dirichlet allocation"
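The point made on this slide can be checked with a few lines of code: under a plain bag-of-words vector space model (a deliberately simplified stand-in for the VSM/LSI/LDA techniques named above; whitespace tokenization and raw term frequencies are my assumptions, not the paper's setup), the two example sentences come out lexically similar even though their intentions are opposite.

```python
from collections import Counter
import math

def cosine(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words term-frequency vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb)

s1 = "we could use a leaky bucket algorithm to limit the bandwidth"
s2 = "the leaky bucket algorithm fails in limiting the bandwidth"

print(round(cosine(s1, s2), 2))  # → 0.55
```

More than half of the lexical mass is shared, so a purely lexical classifier sees the proposal and the problem report as near-duplicates.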
  14. Case Study

     Goal: understanding to what extent NL parsing could be used in recognizing informative text fragments in emails from a software maintenance and evolution perspective.
     Quality focus: detection of text paragraphs in development discussions containing helpful information for developers.
     Perspective: guide developers in maintaining and evolving their products.
  15. Research Questions

     RQ1: Can an NLP approach (i.e., DECA) be effective in classifying writers' intentions in development emails?
     RQ2: Is DECA more effective than existing Machine Learning techniques in classifying development emails content?
  16. Sampling

     We selected 100 … of the … Project
  17. Why NL parsing? Well-defined predicate-argument structures

     "we could use a leaky bucket algorithm to limit the bandwidth"
     use → nsubj(we), aux(could), dobj(algorithm), xcomp(limit); algorithm → det(a), amod(leaky), nn(bucket); limit → aux(to), dobj(bandwidth); bandwidth → det(the)
     "the leaky bucket algorithm fails in limiting the bandwidth"
     fails → nsubj(algorithm), prep(in); algorithm → det(the), amod(leaky), nn(bucket); in → pcomp(limiting); limiting → dobj(bandwidth); bandwidth → det(the)
  18. NL parsing: Natural Language Templates

     use → nsubj([someone]), aux(could), dobj([something])
     fails → nsubj([something])
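The template idea above can be sketched in a few lines. This is a toy matcher, not DECA itself: the real tool matches predicate-argument structures produced by a parser, while here the dependency triples are hand-coded and the two intention labels ("solution proposal", "problem discovery") are taken from the paper's category names; everything else is illustrative.

```python
# A parsed sentence is represented as (head, relation, dependent) triples.
ANY = "*"

# Hand-written templates mapping dependency patterns to intentions.
TEMPLATES = {
    (("use", "nsubj", ANY), ("use", "aux", "could")): "solution proposal",
    (("fails", "nsubj", ANY),): "problem discovery",
}

def matches(triple, pattern):
    """A pattern element matches a triple element if equal or a wildcard."""
    return all(p == ANY or p == t for t, p in zip(triple, pattern))

def classify(parse):
    """Return the first intention whose every pattern matches some triple."""
    for patterns, label in TEMPLATES.items():
        if all(any(matches(t, p) for t in parse) for p in patterns):
            return label
    return "unknown"

# Hand-coded parses of the two running examples.
parse1 = [("use", "nsubj", "we"), ("use", "aux", "could"),
          ("use", "dobj", "algorithm"), ("use", "xcomp", "limit")]
parse2 = [("fails", "nsubj", "algorithm"), ("fails", "prep", "in")]

print(classify(parse1))  # → solution proposal
print(classify(parse2))  # → problem discovery
```

Because the templates key on the predicate and its argument structure rather than on shared vocabulary, the two lexically similar sentences now receive different labels.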
  29. RQ2: Is the proposed approach more effective than existing ML in classifying development emails content?
  30. ML for Email Classification

     An Approach Based on ML for Email Content Classification
     → Antoniol et al., CASCON 2008
     → Zhou et al., ICSME 2014
  34. ML for Email Classification

     An Approach Based on ML for Email Content Classification:
     1) Text features
     2) Split training and test sets
     3) Oracle building
     4) Classification (training → prediction)
     → Antoniol et al., CASCON 2008
     → Zhou et al., ICSME 2014
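The four-step baseline pipeline above can be sketched end to end with a minimal bag-of-words Naive Bayes classifier. This is only a stand-in for the ML baselines of Antoniol et al. and Zhou et al. (whose exact features and learners differ); the tiny labeled "oracle" and the labels "proposal"/"problem" are made up for illustration.

```python
import math
from collections import Counter, defaultdict

def train(samples):
    """Step 1-3: build text features and an oracle of (text, label) pairs."""
    counts, totals, priors = defaultdict(Counter), Counter(), Counter()
    for text, label in samples:
        words = text.lower().split()
        counts[label].update(words)   # per-label word frequencies
        totals[label] += len(words)   # words per label
        priors[label] += 1            # documents per label
    vocab = {w for c in counts.values() for w in c}
    return counts, totals, priors, vocab

def predict(model, text):
    """Step 4: pick the label with the highest smoothed log-probability."""
    counts, totals, priors, vocab = model
    n = sum(priors.values())
    best, best_lp = None, -math.inf
    for label in priors:
        lp = math.log(priors[label] / n)
        for w in text.lower().split():
            # Laplace smoothing over the shared vocabulary
            lp += math.log((counts[label][w] + 1) / (totals[label] + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

oracle = [
    ("we could use a leaky bucket algorithm", "proposal"),
    ("maybe we could try a different cache", "proposal"),
    ("the leaky bucket algorithm fails in limiting the bandwidth", "problem"),
    ("the build fails on windows", "problem"),
]
model = train(oracle)
print(predict(model, "we could try a new algorithm"))   # → proposal
print(predict(model, "the parser fails on long emails"))  # → problem
```

Even this toy version shows why the baseline is purely lexical: the decision depends only on which words appear, not on how they relate to the predicate.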
  44. Summary

     - RQ1: the automatic classification performed by DECA achieves very good results in terms of precision, recall and F-measure (over all the experiments).
     - RQ2: DECA outperforms traditional ML techniques in terms of recall, precision and F-measure when classifying e-mail content.
  45. Summary

     - RQ1: the automatic classification performed by DECA achieves very good results in terms of precision, recall and F-measure (over all the experiments).
     - RQ2: DECA outperforms traditional ML techniques in terms of recall, precision and F-measure when classifying e-mail content.
     "…it took the MSR community more than 10 years to figure out that machine learning is not the best method for analyzing human-written text. Thank you for helping move the field forward…" [One of the ASE Reviewers]
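The metrics quoted throughout (precision, recall, F-measure) follow the standard definitions. As a worked example, the confusion counts below are invented to reproduce the ~90% precision and ~70% recall figures from the abstract; they are not the paper's actual counts.

```python
def prf(tp: int, fp: int, fn: int):
    """Precision, recall, F1 from true/false positives and false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: 90 of 100 predicted paragraphs are correct,
# and 90 of 129 relevant paragraphs are retrieved.
p, r, f = prf(tp=90, fp=10, fn=39)
print(round(p, 2), round(r, 2), round(f, 2))  # → 0.9 0.7 0.79
```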
  47. Code re-documentation

     → Panichella et al., ICPC 2012: extract methods' descriptions from developers' discussions
     → Vector Space Models
     → ad hoc heuristics
     "… several are the discourse patterns that characterize false negative method descriptions…"
  48. Code re-documentation

     "… several are the discourse patterns that characterize false negative method descriptions…"
  54. Future work

     1) DECA as preprocessing support to discard irrelevant sentences in summarization approaches
     2) DECA in combination with topic models for mining contents with the same intentions and the same topics