
Development Emails Content Analyzer: Intention Mining in Developer Discussions


Written development communication (e.g., mailing lists, issue trackers) constitutes a precious source of information for building recommenders for software engineers, for example aimed at suggesting experts or at re-documenting existing source code. In this paper we propose a novel, semi-supervised approach
named DECA (Development Emails Content Analyzer) that uses natural language parsing to classify the content of development emails according to their purpose (e.g., feature request, opinion asking, problem discovery, solution proposal, information giving), identifying email elements that can be used for specific tasks.
A study based on data from Qt and Ubuntu highlights the high precision (90%) and recall (70%) of DECA in classifying email content, outperforming traditional machine-learning strategies. Moreover, we successfully used DECA for re-documenting source code of Eclipse and Lucene, improving the recall, while keeping the high precision, of a previous approach based on ad hoc heuristics.

Sebastiano Panichella

July 12, 2016


Transcript

  1. Development Emails Content Analyzer: Intention Mining in Developer Discussions

     Andrea Di Sorbo, Sebastiano Panichella, Corrado Visaggio, Massimiliano Di Penta, Gerardo Canfora, Harald Gall
  2. Outline

     - Context: Written Development Discussions
     - Case Study: Development Mailing Lists of 2 Open Source Projects
     - Results: Automatic Classification of Relevant Contents in Developers' Communication
  3. Development Communication Means

     Recommender systems:
     - Bug Triaging [1]
     - Suggest Mentors [2]
     - Code re-documentation [3]
     - Etc.
     [1] Anvik et al., "Who should fix this bug?"
     [2] Canfora et al., "Who is going to mentor newcomers in open source projects?"
     [3] Panichella et al., "Mining source code descriptions from developer communications"
  4. Development Communication Means

     [1] Bacchelli et al., "Content classification of development emails"
     [2] Cerulo et al., "A Hidden Markov Model to detect coded information islands in free text"
  5. A Considerable Effort for Developers

     Many messages: developers get lost in unnecessary details, missing potentially useful information…
  6. Previous Work

     Hana et al.: "…'Lazy' RTC occurs when a core developer posts a change to a mailing list and nobody responds; it is assumed that other developers reviewed the code…"
  7. Previous Work

     Approaches for:
     - Generating summaries of emails → Lam et al., Rambow et al.
     - Generating summaries of bug reports → Rastkar et al.
  8. DECA (Development Email Content Analyzer)

     An approach to classify paragraphs according to intentions.
     http://www.ifi.uzh.ch/seal/people/panichella/tools/DECA.html
  9. Example

     i. We could use a leaky bucket algorithm to limit the bandwidth
     ii. The leaky bucket algorithm fails in limiting the bandwidth
  10. Example

     i. We could use a leaky bucket algorithm to limit the bandwidth
     ii. The leaky bucket algorithm fails in limiting the bandwidth
     → A high percentage of words in common
  11. Example

     i. We could use a leaky bucket algorithm to limit the bandwidth
     ii. The leaky bucket algorithm fails in limiting the bandwidth
     → Discuss the same topics
  12. Example

     i. We could use a leaky bucket algorithm to limit the bandwidth
     ii. The leaky bucket algorithm fails in limiting the bandwidth
     → Have different intentions
  13. Example

     i. We could use a leaky bucket algorithm to limit the bandwidth
     ii. The leaky bucket algorithm fails in limiting the bandwidth
     → Have different intentions
     "Techniques based on lexicon analysis, such as VSM [1], LSI [2], or LDA [3], would not be sufficient to classify paragraphs according to intentions."
     [1] Baeza-Yates et al., "Modern Information Retrieval"
     [2] de Marneffe et al., "The Stanford typed dependencies representation"
     [3] Blei et al., "Latent dirichlet allocation"
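The point made on this slide can be checked with a few lines of code: under a plain bag-of-words vector space model (a deliberately simplified stand-in for the VSM/LSI/LDA techniques named above; whitespace tokenization and raw term frequencies are my assumptions, not the paper's setup), the two example sentences come out lexically similar even though their intentions are opposite.

```python
from collections import Counter
import math

def cosine(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words term-frequency vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb)

s1 = "we could use a leaky bucket algorithm to limit the bandwidth"
s2 = "the leaky bucket algorithm fails in limiting the bandwidth"

print(round(cosine(s1, s2), 2))  # → 0.55
```

More than half of the lexical mass is shared, so a purely lexical classifier sees the proposal and the problem report as near-duplicates.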
  14. Case Study

     Goal: understanding to what extent NL parsing could be used in recognizing informative text fragments in emails from a software maintenance and evolution perspective.
     Quality focus: detection of text paragraphs in development discussions containing helpful information for developers.
     Perspective: guide developers in maintaining and evolving their products.
  15. Research Questions

     RQ1: Can an NLP approach (i.e., DECA) be effective in classifying writers' intentions in development emails?
     RQ2: Is DECA more effective than existing Machine Learning techniques in classifying development emails content?
  16. Sampling

     We selected 100 … of the … Project
  17. Why NL parsing? Well-defined predicate-argument structures

     "we could use a leaky bucket algorithm to limit the bandwidth"
     use → nsubj(we), aux(could), dobj(algorithm), xcomp(limit); algorithm → det(a), amod(leaky), nn(bucket); limit → aux(to), dobj(bandwidth); bandwidth → det(the)
     "the leaky bucket algorithm fails in limiting the bandwidth"
     fails → nsubj(algorithm), prep(in); algorithm → det(the), amod(leaky), nn(bucket); in → pcomp(limiting); limiting → dobj(bandwidth); bandwidth → det(the)
  18. NL parsing: Natural Language Templates

     use → nsubj([someone]), aux(could), dobj([something])
     fails → nsubj([something])
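The template idea above can be sketched in a few lines. This is a toy matcher, not DECA itself: the real tool matches predicate-argument structures produced by a parser, while here the dependency triples are hand-coded and the two intention labels ("solution proposal", "problem discovery") are taken from the paper's category names; everything else is illustrative.

```python
# A parsed sentence is represented as (head, relation, dependent) triples.
ANY = "*"

# Hand-written templates mapping dependency patterns to intentions.
TEMPLATES = {
    (("use", "nsubj", ANY), ("use", "aux", "could")): "solution proposal",
    (("fails", "nsubj", ANY),): "problem discovery",
}

def matches(triple, pattern):
    """A pattern element matches a triple element if equal or a wildcard."""
    return all(p == ANY or p == t for t, p in zip(triple, pattern))

def classify(parse):
    """Return the first intention whose every pattern matches some triple."""
    for patterns, label in TEMPLATES.items():
        if all(any(matches(t, p) for t in parse) for p in patterns):
            return label
    return "unknown"

# Hand-coded parses of the two running examples.
parse1 = [("use", "nsubj", "we"), ("use", "aux", "could"),
          ("use", "dobj", "algorithm"), ("use", "xcomp", "limit")]
parse2 = [("fails", "nsubj", "algorithm"), ("fails", "prep", "in")]

print(classify(parse1))  # → solution proposal
print(classify(parse2))  # → problem discovery
```

Because the templates key on the predicate and its argument structure rather than on shared vocabulary, the two lexically similar sentences now receive different labels.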
  29. RQ2: Is the proposed approach more effective than existing ML in classifying development emails content?
  30. ML for Email Classification

     An Approach Based on ML for Email Content Classification
     → Antoniol et al., CASCON 2008
     → Zhou et al., ICSME 2014
  34. ML for Email Classification

     An Approach Based on ML for Email Content Classification:
     1) Text features
     2) Split training and test sets
     3) Oracle building
     4) Classification (training → prediction)
     → Antoniol et al., CASCON 2008
     → Zhou et al., ICSME 2014
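The four-step baseline pipeline above can be sketched end to end with a minimal bag-of-words Naive Bayes classifier. This is only a stand-in for the ML baselines of Antoniol et al. and Zhou et al. (whose exact features and learners differ); the tiny labeled "oracle" and the labels "proposal"/"problem" are made up for illustration.

```python
import math
from collections import Counter, defaultdict

def train(samples):
    """Step 1-3: build text features and an oracle of (text, label) pairs."""
    counts, totals, priors = defaultdict(Counter), Counter(), Counter()
    for text, label in samples:
        words = text.lower().split()
        counts[label].update(words)   # per-label word frequencies
        totals[label] += len(words)   # words per label
        priors[label] += 1            # documents per label
    vocab = {w for c in counts.values() for w in c}
    return counts, totals, priors, vocab

def predict(model, text):
    """Step 4: pick the label with the highest smoothed log-probability."""
    counts, totals, priors, vocab = model
    n = sum(priors.values())
    best, best_lp = None, -math.inf
    for label in priors:
        lp = math.log(priors[label] / n)
        for w in text.lower().split():
            # Laplace smoothing over the shared vocabulary
            lp += math.log((counts[label][w] + 1) / (totals[label] + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

oracle = [
    ("we could use a leaky bucket algorithm", "proposal"),
    ("maybe we could try a different cache", "proposal"),
    ("the leaky bucket algorithm fails in limiting the bandwidth", "problem"),
    ("the build fails on windows", "problem"),
]
model = train(oracle)
print(predict(model, "we could try a new algorithm"))   # → proposal
print(predict(model, "the parser fails on long emails"))  # → problem
```

Even this toy version shows why the baseline is purely lexical: the decision depends only on which words appear, not on how they relate to the predicate.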
  44. Summary

     - RQ1: the automatic classification performed by DECA achieves very good results in terms of precision, recall and F-measure (over all the experiments).
     - RQ2: DECA outperforms traditional ML techniques in terms of recall, precision and F-measure when classifying e-mail content.
  45. Summary

     - RQ1: the automatic classification performed by DECA achieves very good results in terms of precision, recall and F-measure (over all the experiments).
     - RQ2: DECA outperforms traditional ML techniques in terms of recall, precision and F-measure when classifying e-mail content.
     "…it took the MSR community more than 10 years to figure out that machine learning is not the best method for analyzing human-written text. Thank you for helping move the field forward…" [One of the ASE Reviewers]
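The metrics quoted throughout (precision, recall, F-measure) follow the standard definitions. As a worked example, the confusion counts below are invented to reproduce the ~90% precision and ~70% recall figures from the abstract; they are not the paper's actual counts.

```python
def prf(tp: int, fp: int, fn: int):
    """Precision, recall, F1 from true/false positives and false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: 90 of 100 predicted paragraphs are correct,
# and 90 of 129 relevant paragraphs are retrieved.
p, r, f = prf(tp=90, fp=10, fn=39)
print(round(p, 2), round(r, 2), round(f, 2))  # → 0.9 0.7 0.79
```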
  47. Code re-documentation

     → Panichella et al., ICPC 2012: extract methods' descriptions from developers' discussions
     → Vector Space Models
     → ad hoc heuristics
     "… several are the discourse patterns that characterize false negative method descriptions…"
  48. Code re-documentation

     "… several are the discourse patterns that characterize false negative method descriptions…"
  54. Future work

     1) DECA as preprocessing support to discard irrelevant sentences in summarization approaches
     2) DECA in combination with topic models for mining contents with the same intentions and the same topics