Slide 1

Slide 1 text

Modeling Changeset Topics
C.S. Corley, K.L. Kashuda, D.S. May, N.A. Kraft
@excsc
[email protected]
cscorley/mud2014-modeling-changeset-topics

Slide 2

Slide 2 text

Topic Modeling

Slide 4

Slide 4 text

Latent Dirichlet allocation (LDA)

Slide 18

Slide 18 text

Feature location
Bug localization
Traceability links
Developer identification

Slide 19

Slide 19 text

[Figure: Release A]

Slide 22

Slide 22 text

[Figure: Release A … A+1, A+2, A+3, Release B]

Slide 27

Slide 27 text

Rome wasn’t built in a day
… neither is software.

Slide 29

Slide 29 text

[Figure: Release A … A+1, A+2, A+3, Release B]
(not a good idea)

Slide 33

Slide 33 text

[Figure: Release A … A+1, A+2, A+3, Release B]
(a much better idea)

Slide 37

Slide 37 text

LDA can process an unknown number of documents
LDA is online => streamed => ∞
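The "streamed" property can be illustrated without any topic-modeling library. The sketch below is not LDA: `RunningWordCounts` is a hypothetical stand-in for an online model, and `changeset_stream` is a toy source invented for illustration. It only shows the consumption pattern an online learner relies on: documents arrive in batches, and nothing ever asks how many documents exist in total.

```python
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator, List


def batches(docs: Iterable[List[str]], size: int) -> Iterator[List[List[str]]]:
    """Yield lists of up to `size` documents without ever calling len(docs)."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


class RunningWordCounts:
    """Hypothetical stand-in for an online model: state is updated per batch."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()
        self.docs_seen = 0

    def update(self, chunk: List[List[str]]) -> None:
        for doc in chunk:
            self.counts.update(doc)
            self.docs_seen += 1


def changeset_stream() -> Iterator[List[str]]:
    """Toy stream (assumed data); a real one could keep yielding as commits arrive."""
    yield ["parse", "date"]
    yield ["format", "date", "zone"]
    yield ["fix", "zone", "offset"]


model = RunningWordCounts()
for chunk in batches(changeset_stream(), size=2):
    model.update(chunk)
```

An online LDA implementation would replace `RunningWordCounts.update` with its own per-batch update; the loop around it would stay the same.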

Slide 39

Slide 39 text

Source code repositories!
[Figure: Release A … A+1, A+2, A+3, Release B]

Slide 41

Slide 41 text

[Figure: Release A … A+1, A+2, A+3, Release B]
But, how?

Slide 43

Slide 43 text

[Figure: Release A … A+1, A+2, A+3, Release B]
diff A..A+1
diff A+1..A+2
diff A+2..A+3
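A minimal sketch of the idea on this slide: each pair of successive snapshots yields one changeset. It uses Python's standard `difflib` on in-memory strings. The `snapshots` contents and the `changeset` helper are invented for illustration; a real pipeline would read diffs from the version control system rather than recompute them.

```python
import difflib
from typing import List


def changeset(old: str, new: str, name: str = "file") -> List[str]:
    """Return the unified-diff lines between two versions of one file."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile=f"a/{name}",
            tofile=f"b/{name}",
            lineterm="",
        )
    )


# Hypothetical snapshots standing in for A, A+1, A+2.
snapshots = [
    "int add(int a, int b)\n",
    "int add(int a, int b)\nint sub(int a, int b)\n",
    "int add(int x, int y)\nint sub(int a, int b)\n",
]

# One changeset per adjacent pair: diff A..A+1, diff A+1..A+2.
changesets = [changeset(a, b) for a, b in zip(snapshots, snapshots[1:])]
```

With N snapshots this produces N-1 changesets, each of which can be treated as one document.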

Slide 47

Slide 47 text

[Figure: Release, Changeset]
How does the corpus change?
How does the model change?

Slide 53

Slide 53 text

RQ1: cosine similarity
AspectJ, Joda-Time
99.7%, 93.1%, 93.5%, 67.1%
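The slide does not say which vectors are compared. As an assumption, the sketch below compares term-frequency vectors of two corpora (for example, a release corpus and a changeset corpus). The helper names and the toy counts are illustrative only and are not taken from the study.

```python
import math
from collections import Counter
from typing import Iterable


def term_frequencies(docs: Iterable[Iterable[str]]) -> Counter:
    """Sum term counts over every document in a corpus."""
    tf: Counter = Counter()
    for doc in docs:
        tf.update(doc)
    return tf


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy corpora (assumed data).
release_tf = term_frequencies([["date", "zone"], ["date"]])
changeset_tf = term_frequencies([["date", "fix"], ["zone"]])
similarity = cosine_similarity(release_tf, changeset_tf)
```

A value near 1 means the two corpora use terms in nearly the same proportions; a value near 0 means they share almost no vocabulary.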

Slide 59

Slide 59 text

RQ2: distinctness score
AspectJ, Joda-Time
2.31, 3.17
3.75, 2.78
1.34, 1.03
2.59, 3.56
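The slides do not define the distinctness score. One definition found in the topic-modeling literature, assumed here and not confirmed by the deck, takes a topic's distinctness to be the mean Kullback-Leibler divergence from that topic to every other topic, and a model's score to be the mean over its topics. The sketch below implements that assumed definition on toy topic-word distributions.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q); q must be positive wherever p is positive."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def topic_distinctness(topics: Sequence[Sequence[float]], k: int) -> float:
    """Mean KL divergence from topic k to every other topic (assumed definition)."""
    others = [t for i, t in enumerate(topics) if i != k]
    return sum(kl_divergence(topics[k], o) for o in others) / len(others)


def model_distinctness(topics: Sequence[Sequence[float]]) -> float:
    """Mean topic distinctness over all topics in the model."""
    return sum(topic_distinctness(topics, k) for k in range(len(topics))) / len(topics)


# Toy topic-word distributions over a three-word vocabulary (assumed data).
topics = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
score = model_distinctness(topics)
```

Under this definition a higher score means the topics overlap less, and a model whose topics are all identical scores 0.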

Slide 60

Slide 60 text

[Figure: Release, Changeset]
How does the corpus change?
How does the model change?

Slide 62

Slide 62 text

The Way that can be told of is not the eternal Way;
The name that can be named is not the eternal name.
The Nameless is the origin of Heaven and Earth;
The Named is the mother of all things.
Therefore let there always be non-being,
  so we may see their subtlety,
And let there always be being,
  so we may see their outcome.
The two are the same,
But after they are produced,
  they have different names.

Slide 64

Slide 64 text

The Nameless is the origin of Heaven and Earth;
The named is the mother of all things.

Therefore let there always be non-being,
  so we may see their subtlety,
And let there always be being,
  so we may see their outcome.
The two are the same,
But after they are produced,
  they have different names.
They both may be called deep and profound.
Deeper and more profound,
The door of all subtleties!

Slide 65

Slide 65 text

The Way that can be told of is not the eternal Way;
The name that can be named is not the eternal name.
The Named is the mother of all things.
The named is the mother of all things.
They both may be called deep and profound.
Deeper and more profound,
The door of all subtleties!

diff --git a/lao b/tzu
index 635ef2c..5af88a8 100644
--- a/lao
+++ b/tzu
@@ -1,7 +1,6 @@
-The Way that can be told of is not the eternal Way;
-The name that can be named is not the eternal name.
 The Nameless is the origin of Heaven and Earth;
-The Named is the mother of all things.
+The named is the mother of all things.
+
 Therefore let there always be non-being,
   so we may see their subtlety,
 And let there always be being,
@@ -9,3 +8,6 @@ And let there always be being,
 The two are the same,
 But after they are produced,
   they have different names.
+They both may be called deep and profound.
+Deeper and more profound,
+The door of all subtleties!
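The step from a diff to a document can be sketched as follows: keep the added and removed lines of each hunk, drop the diff headers, and tokenize what remains. Whether context lines belong in the document is not clear from the slide, so `include_context` is left as a switch. The function names and the tokenizer are assumptions for illustration, not the authors' pipeline.

```python
import re
from collections import Counter
from typing import Iterable, List

DIFF = """\
diff --git a/lao b/tzu
index 635ef2c..5af88a8 100644
--- a/lao
+++ b/tzu
@@ -1,7 +1,6 @@
-The Way that can be told of is not the eternal Way;
-The name that can be named is not the eternal name.
 The Nameless is the origin of Heaven and Earth;
-The Named is the mother of all things.
+The named is the mother of all things.
+
 Therefore let there always be non-being,
   so we may see their subtlety,
 And let there always be being,
@@ -9,3 +8,6 @@ And let there always be being,
 The two are the same,
 But after they are produced,
   they have different names.
+They both may be called deep and profound.
+Deeper and more profound,
+The door of all subtleties!
"""


def changed_lines(diff_lines: Iterable[str], include_context: bool = False) -> List[str]:
    """Return hunk lines with the leading +, -, or space marker stripped."""
    kept: List[str] = []
    in_hunk = False
    for line in diff_lines:
        if line.startswith("diff --git"):
            in_hunk = False  # file header follows; skip until the next hunk
        elif line.startswith("@@"):
            in_hunk = True
        elif in_hunk and line[:1] in ("+", "-"):
            kept.append(line[1:])
        elif in_hunk and include_context and line.startswith(" "):
            kept.append(line[1:])
    return kept


def bag_of_words(lines: Iterable[str]) -> Counter:
    """Lowercase and split on non-letters; a deliberately simple tokenizer."""
    return Counter(w for line in lines for w in re.findall(r"[a-z]+", line.lower()))


doc = bag_of_words(changed_lines(DIFF.splitlines()))
doc_with_context = bag_of_words(changed_lines(DIFF.splitlines(), include_context=True))
```

Tracking `in_hunk` rather than matching header prefixes keeps a removed line that happens to begin with `-- ` from being mistaken for a `--- ` file header.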

Slide 66

Slide 66 text

The Way that can be told of is not the eternal Way;
The name that can be named is not the eternal name.
The Nameless is the origin of Heaven and Earth;
The Named is the mother of all things.
The named is the mother of all things.
Therefore let there always be non-being,
  so we may see their subtlety,
And let there always be being,
The two are the same,
But after they are produced,
  they have different names.
They both may be called deep and profound.
Deeper and more profound,
The door of all subtleties!

The Way that can be told of is not the eternal Way;
The name that can be named is not the eternal name.
The Named is the mother of all things.
The named is the mother of all things.
They both may be called deep and profound.
Deeper and more profound,
The door of all subtleties!