Modeling Changeset Topics

Christopher S. Corley, Kelly L. Kashuda
The University of Alabama

Daniel S. May
Swarthmore College

Nicholas A. Kraft
ABB Corporate Research

Topic modeling has been applied to several areas of software engineering, such as bug localization, feature location, triaging change requests, and traceability link recovery. Many of these approaches combine mining unstructured data, such as bug reports, with topic modeling a snapshot (or release) of source code. However, source code evolves, which causes models to become obsolete. In this paper, we explore the approach of topic modeling changesets over the traditional release approach. We conduct an exploratory study of four open source systems. We investigate the differences in corpora in each project, and evaluate the topic distinctness of the models.
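As a rough illustration of the two corpus styles contrasted above, the sketch below builds a release-style corpus (one document per file in a snapshot) and a changeset-style corpus (one document per change between consecutive revisions), then compares their term counts with cosine similarity. The toy history, the tokenizer, and every function name are hypothetical. `difflib` stands in for a real version-control diff, keeping context lines is an assumption, and cosine similarity over raw term counts is only one plausible way to compare corpora, not necessarily the authors' procedure.

```python
import difflib
import math
import re
from collections import Counter

# Hypothetical toy history: each revision maps file name -> file contents.
revisions = [
    {"parser.py": "def parse(text):\n    return text.split()\n"},
    {"parser.py": "def parse(text):\n    tokens = text.split()\n    return tokens\n"},
    {"parser.py": "def parse(text):\n    tokens = text.split()\n    return tokens\n",
     "report.py": "def report(tokens):\n    print(len(tokens))\n"},
]

def tokenize(text):
    """Lowercase and split into alphabetic terms (an assumed, simplistic tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

def release_corpus(snapshot):
    """Release approach: one document per file in a single snapshot."""
    return [tokenize(contents) for contents in snapshot.values()]

def changeset_corpus(history):
    """Changeset approach: one document per step between consecutive revisions."""
    docs = []
    previous = {}  # the first revision is diffed against an empty tree
    for current in history:
        lines = []
        for name in sorted(set(previous) | set(current)):
            before = previous.get(name, "").splitlines()
            after = current.get(name, "").splitlines()
            diff = list(difflib.unified_diff(before, after, lineterm=""))
            # diff[0:2] are the ---/+++ file headers; "@@" lines are hunk headers.
            for line in diff[2:]:
                if not line.startswith("@@"):
                    lines.append(line[1:])  # drop the ' ', '+', or '-' marker
        docs.append(tokenize("\n".join(lines)))
        previous = current
    return docs

def term_counts(corpus):
    """Sum term frequencies over all documents in a corpus."""
    counts = Counter()
    for doc in corpus:
        counts.update(doc)
    return counts

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

release_terms = term_counts(release_corpus(revisions[-1]))
changeset_terms = term_counts(changeset_corpus(revisions))
similarity = cosine_similarity(release_terms, changeset_terms)
print(f"cosine similarity: {similarity:.3f}")
```

Because the changeset corpus only grows as revisions arrive, a model trained on it could in principle be updated incrementally, whereas the release corpus has to be rebuilt from a new snapshot.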

*Note*: these slides were animation-heavy; a YouTube recording is available here: https://www.youtube.com/watch?v=S12B_CTeUtA

Christopher Corley

September 30, 2014

Transcript

  1. 1.

    Modeling Changeset Topics. C.S. Corley, K.L. Kashuda, D.S. May, N.A. Kraft. @excsc, cscorley@ua.edu, cscorley/mud2014-modeling-changeset-topics
  13. 22.

    [Figure: revision timeline. Release A … A+1, A+2, A+3, Release B]
  17. 27.

    Rome wasn’t built in a day … neither is software.
  19. 29.

    [Figure: revision timeline. Release A … A+1, A+2, A+3, Release B] (not a good idea)
  23. 33.

    [Figure: revision timeline. Release A … A+1, A+2, A+3, Release B] (a much better idea)
  24. 39.

    [Figure: revision timeline. Release A … A+1, A+2, A+3, Release B] Source code repositories!
  25. 40.

    [Figure: revision timeline. Release A … A+1, A+2, A+3, Release B] But, how?
  28. 43.

    [Figure: revision timeline. Release A … A+1, A+2, A+3, Release B] diff A..A+1, diff A+1..A+2, diff A+2..A+3
  31. 47.

    Release / Changeset. How does the corpus change? How does the model change?
  32. 53.

    RQ1: cosine similarity (AspectJ, Joda-Time). Values shown: 99.7%, 93.1%, 93.5%, 67.1%
  34. 59.

    RQ2: distinctness score (AspectJ, Joda-Time). Value pairs shown: (2.31, 3.17), (3.75, 2.78), (1.34, 1.03), (2.59, 3.56)
  37. 62.

    The Way that can be told of is not the eternal Way;
    The name that can be named is not the eternal name.
    The Nameless is the origin of Heaven and Earth;
    The Named is the mother of all things.
    Therefore let there always be non-being,
      so we may see their subtlety,
    And let there always be being,
      so we may see their outcome.
    The two are the same,
    But after they are produced,
      they have different names.
  39. 64.

    The Nameless is the origin of Heaven and Earth;
    The named is the mother of all things.

    Therefore let there always be non-being,
      so we may see their subtlety,
    And let there always be being,
      so we may see their outcome.
    The two are the same,
    But after they are produced,
      they have different names.
    They both may be called deep and profound.
    Deeper and more profound,
    The door of all subtleties!
  40. 65.

    The Way that can be told of is not the eternal Way;
    The name that can be named is not the eternal name.
    The Named is the mother of all things.
    The named is the mother of all things.
    They both may be called deep and profound.
    Deeper and more profound,
    The door of all subtleties!

    diff --git a/lao b/tzu
    index 635ef2c..5af88a8 100644
    --- a/lao
    +++ b/tzu
    @@ -1,7 +1,6 @@
    -The Way that can be told of is not the eternal Way;
    -The name that can be named is not the eternal name.
     The Nameless is the origin of Heaven and Earth;
    -The Named is the mother of all things.
    +The named is the mother of all things.
    +
     Therefore let there always be non-being,
       so we may see their subtlety,
     And let there always be being,
    @@ -9,3 +8,6 @@ And let there always be being,
     The two are the same,
     But after they are produced,
       they have different names.
    +They both may be called deep and profound.
    +Deeper and more profound,
    +The door of all subtleties!
  41. 66.

    The Way that can be told of is not the eternal Way;
    The name that can be named is not the eternal name.
    The Nameless is the origin of Heaven and Earth;
    The Named is the mother of all things.
    The named is the mother of all things.
    Therefore let there always be non-being,
      so we may see their subtlety,
    And let there always be being,
    The two are the same,
    But after they are produced,
      they have different names.
    They both may be called deep and profound.
    Deeper and more profound,
    The door of all subtleties!

    The Way that can be told of is not the eternal Way;
    The name that can be named is not the eternal name.
    The Named is the mother of all things.
    The named is the mother of all things.
    They both may be called deep and profound.
    Deeper and more profound,
    The door of all subtleties!
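
The last two slides contrast two ways of reading a diff as a document: every line in the hunks, or only the added and removed lines. A minimal sketch of that extraction follows, using the lao/tzu diff from slide 65. The function name `changeset_lines` and its `include_context` flag are hypothetical, the indentation of continuation lines is assumed since the transcript collapses whitespace, and this is an illustration rather than the authors' implementation.

```python
# The lao -> tzu diff shown on slide 65, in unified format.
DIFF = """\
diff --git a/lao b/tzu
index 635ef2c..5af88a8 100644
--- a/lao
+++ b/tzu
@@ -1,7 +1,6 @@
-The Way that can be told of is not the eternal Way;
-The name that can be named is not the eternal name.
 The Nameless is the origin of Heaven and Earth;
-The Named is the mother of all things.
+The named is the mother of all things.
+
 Therefore let there always be non-being,
   so we may see their subtlety,
 And let there always be being,
@@ -9,3 +8,6 @@ And let there always be being,
 The two are the same,
 But after they are produced,
   they have different names.
+They both may be called deep and profound.
+Deeper and more profound,
+The door of all subtleties!
"""

def changeset_lines(diff_text, include_context=True):
    """Return hunk content lines of a unified diff, without markers or headers."""
    kept = []
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("diff --git"):
            in_hunk = False  # a new file section starts with header lines
            continue
        if line.startswith("@@"):
            in_hunk = True   # hunk header: content lines follow
            continue
        if not in_hunk:
            continue         # skip 'index', '---', and '+++' header lines
        marker, content = line[:1], line[1:]
        if marker in ("+", "-") or (include_context and marker == " "):
            kept.append(content)
    return kept

with_context = changeset_lines(DIFF)                        # context + added + removed
changed_only = changeset_lines(DIFF, include_context=False)  # added + removed only

print(len(with_context), "lines with context")
print(len(changed_only), "changed lines")
```

Either list can then be tokenized and treated as one document in a changeset corpus; which variant yields better topics is an empirical question.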