Slide 1

Slide 1 text

APSEC2019 Day 3 – December 5th, 2019
Class Name Recommendation based on Graph Embedding of Program Elements
Shintaro Kurimoto, Yasuhiro Hayase, Hiroshi Yonai, Hiroyoshi Ito and Hiroyuki Kitagawa

Slide 2

Slide 2 text

Contents
- Introduction
- Proposed approach
- Experiments
- Conclusion

Slide 3

Slide 3 text

Introduction

Slide 4

Slide 4 text

Introduction | Program Elements & Identifier Names
Program Elements: program components representing properties or behaviors, such as classes, methods, and fields
Identifier Names: names given to uniquely identify each program element

Slide 5

Slide 5 text

Introduction | Appropriate Identifier Naming
Important
- The quality of identifier names greatly affects program comprehension [Takang et al., 1996]
- Appropriate names enable developers to spend less time on program comprehension [Lawrie et al., 2006]
Example names shown: UserAccountInfo, ClassABC

Slide 6

Slide 6 text

Introduction | Appropriate Identifier Naming
Difficulties
- Naming conventions lack the empirical knowledge necessary for good naming [Tichy, 1997]
- Appropriate identifier naming requires domain knowledge of the software [Deißenbock et al., 2005]
[Figure labels: follow, follow, ≠]

Slide 7

Slide 7 text

Introduction | Situation of Class Name Recommendation
class Foo { fieldA method method }
What is a good name for this class…?
Recommended
1. Libsystemd
2. SystemdPlugin
・・・

Slide 8

Slide 8 text

Introduction | Studies on Class Name Recommendation
[Fukuda et al., 2015] Recommended class names by association rule mining
- The content of a class is beneficial for recommending class names
- Low accuracy
[Allamanis et al., 2015] Recommended identifier names by embedding with Skip-gram
- Embedding is beneficial for recommending identifier names
- Recommendation is not available before a program element is used somewhere

Slide 9

Slide 9 text

Introduction | Goal
Propose an approach that can recommend class names
1. before a class is used
2. with high accuracy

Slide 10

Slide 10 text

Introduction | Summary
1. Appropriate identifier naming is important but difficult
2. Previous approaches fall short either in the situations where they are available or in accuracy

Slide 11

Slide 11 text

Proposed approach

Slide 12

Slide 12 text

Proposed | Key Idea
Embed a graph that represents the relationships between classes, methods, and fields
[Yonai et al., 2019] (the proposed approach basically extends this work)
- Recommends a method name by embedding a method-call graph
- Recommendation is available before a method is used
- [Figure labels: connect, ?, write, newline]
[Dong et al., 2017] (the extension is inspired by this work)
- Introduced heterogeneous graph embedding
- Higher accuracy than homogeneous graph embedding

Slide 13

Slide 13 text

Proposed | Overview
(1) Extract relationships
(2) Learn embedding
(3) Obtain embedding of a target class
(4) Recommend class names
[Figure: a Code Corpus feeds a Model, which outputs Recommended class names (c ・・ c); phases are labeled Train and Recommend; the target class { } holds methods m, m and a field f]

Slide 14

Slide 14 text

Proposed | (1) Extract Relationships between Program Elements
Node types: Class, Method, Field
Edge labels: type, return type, access, call, extend, possess, possess
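The extracted relationships can be held as a list of typed edges. The following is a minimal Python sketch, not the authors' extractor. The slide shows only the edge labels, so the endpoint types in ASSUMED_SCHEMA are an assumption, and Node, Edge, make_edge and neighbors are hypothetical names. The class and method names are taken from the recommendation example later in the talk.

```python
from collections import defaultdict
from typing import NamedTuple


class Node(NamedTuple):
    kind: str  # "class", "method", or "field"
    name: str


class Edge(NamedTuple):
    src: Node
    label: str
    dst: Node


# ASSUMPTION: the slide lists seven edge labels but not their endpoints;
# these (source kind, label, destination kind) triples are one plausible reading.
ASSUMED_SCHEMA = {
    ("class", "extend", "class"),
    ("class", "possess", "method"),
    ("class", "possess", "field"),
    ("method", "call", "method"),
    ("method", "access", "field"),
    ("method", "return type", "class"),
    ("field", "type", "class"),
}


def make_edge(src: Node, label: str, dst: Node) -> Edge:
    """Create an edge, rejecting triples outside the assumed schema."""
    if (src.kind, label, dst.kind) not in ASSUMED_SCHEMA:
        raise ValueError(f"unsupported relationship: {src.kind} -{label}-> {dst.kind}")
    return Edge(src, label, dst)


def neighbors(edges, target):
    """Group the nodes adjacent to `target` by edge label, in either direction."""
    grouped = defaultdict(list)
    for edge in edges:
        if edge.src == target:
            grouped[edge.label].append(edge.dst)
        elif edge.dst == target:
            grouped[edge.label].append(edge.src)
    return dict(grouped)


target = Node("class", "JdbcRecordSetProvider")
parent = Node("class", "ConnectorRecordSetProvider")
method = Node("method", "getRecordSet")
edges = [
    make_edge(target, "extend", parent),
    make_edge(target, "possess", method),
]
related = neighbors(edges, target)
```

Keeping the node kind on every node is what makes the graph heterogeneous: the same edge list can later be filtered by kind or by label.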

Slide 15

Slide 15 text

Proposed | (2) Learning Embedding
Move the embedding of a target class nearer to the weighted sum of the embeddings of the program elements that have a relationship with the target
Legend: Target class, Class, Method, Field, Weighted sum of program elements related to the target class
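One update of this kind can be sketched in a few lines of Python. This is only an illustration of "move nearer to the weighted sum": the slides do not state the loss function, how the weights are chosen, or the learning rate, so the uniform weights, the `lr` parameter and the helper names are all assumptions.

```python
def weighted_sum(vectors, weights):
    """Return sum_i weights[i] * vectors[i] as a list of floats."""
    dim = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, weights)) for i in range(dim)]


def move_toward(target, context, lr):
    """Move `target` a fraction `lr` of the way toward `context`."""
    return [t + lr * (c - t) for t, c in zip(target, context)]


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Toy 2-dimensional embeddings; the values are made up for illustration.
related = [[1.0, 0.0], [0.0, 1.0]]  # e.g. one method and one field
weights = [0.5, 0.5]                # ASSUMPTION: uniform weights
target = [0.0, 0.0]

context = weighted_sum(related, weights)
updated = move_toward(target, context, lr=0.5)
```

After the step, `updated` is closer to `context` than `target` was, which is the behavior the slide describes.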

Slide 23

Slide 23 text

Proposed | Novelty
[Yonai et al., 2019]: considers 1 relationship (method-method only), using homogeneous embedding
Proposed approach: considers 7 relationships among classes, methods, and fields, using heterogeneous embedding

Slide 24

Slide 24 text

Proposed | (3) Obtaining the Embedding of a Target Class
1. Given code including a target class
2. Obtain the embedding of the target class from its owned methods and fields by procedure (2)
[Figure: code including a target class { } with methods m, m and a field f. Legend: Class, Method, Field, Target class]
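For a class that was not in the training corpus, this step can be sketched as follows, assuming it amounts to a weighted sum over the already-learned embeddings of the members the class owns, with those embeddings kept fixed. The function name, the uniform default weights, the member names other than getRecordSet, and the vector values are hypothetical.

```python
def embed_unseen_class(member_names, learned, weights=None):
    """Estimate a class embedding from the learned embeddings of its members.

    Members that have no learned embedding are skipped.
    """
    known = [name for name in member_names if name in learned]
    if not known:
        raise ValueError("no member has a learned embedding")
    if weights is None:
        # ASSUMPTION: uniform weights over the known members.
        weights = {name: 1.0 / len(known) for name in known}
    dim = len(learned[known[0]])
    return [sum(weights[name] * learned[name][i] for name in known) for i in range(dim)]


# Toy learned embeddings; the values are made up for illustration.
learned = {
    "getRecordSet": [0.8, 0.2],
    "connectionUrl": [0.4, 0.6],
}
vec = embed_unseen_class(["getRecordSet", "connectionUrl", "notInCorpus"], learned)
```

Because only the members of the class are needed, the embedding can be computed before any other code refers to the class.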

Slide 25

Slide 25 text

Proposed | (4) Class Name Recommendation
1. Calculate the cosine similarity between the target class and all candidate classes
2. Recommend class names based on the similarity
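The two steps above can be sketched with the standard library alone. The candidate names come from the example slide of this talk, but the vectors are made up, and `recommend` and its `k` parameter are hypothetical names rather than the authors' API.

```python
import math


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(target_vec, candidates, k=10):
    """Return the names of the k candidates most similar to the target."""
    ranked = sorted(
        candidates.items(),
        key=lambda item: cosine(target_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Toy candidate embeddings; the values are made up for illustration.
candidates = {
    "JdbcRecordSetProvider": [1.0, 0.0],
    "ViewResolutionTests": [0.0, 1.0],
    "JdbcMetadata": [1.0, 1.0],
}
top = recommend([1.0, 0.1], candidates, k=2)
```

Cosine similarity ignores vector length, so a candidate is ranked by the direction of its embedding only.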

Slide 26

Slide 26 text

Proposed | Summary
1. Our approach considers more relationships among program elements
2. The core of realizing this is applying heterogeneous graph embedding

Slide 27

Slide 27 text

Experiments

Slide 28

Slide 28 text

Experiments
1. Recommendation before a class is used
2. Recommendation after a class is used
3. Where does the proposed approach work well?

Slide 29

Slide 29 text

Experiments | Dataset
20 large Java projects, such as ElasticSearch and Clojure [Allamanis et al., 2015]
The quality of identifier names is assured because many developers have maintained the projects for years
[Table columns: Edges, Candidate Class, Target Class]

Slide 30

Slide 30 text

Experiments | 1. Recommendation before a class is used
Goal: evaluate whether the proposed approach recommends
- before a class is used
- with high accuracy
Task: class name recommendation (before a class is used)
Criterion: the ratio of recommended names within the top 10 that partially match the actual name
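The criterion can be sketched as below under two loudly stated assumptions, since the slides do not define them: a "partial match" is taken to mean that the recommended and actual names share at least one camel-case word, and the ratio is taken over target classes, counting a target as successful when any of its top-k recommendations partially matches. All function names are hypothetical.

```python
import re


def split_words(name):
    """Split a camel-case identifier into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return {part.lower() for part in parts}


def partial_match(recommended, actual):
    """ASSUMPTION: a partial match means sharing at least one word."""
    return bool(split_words(recommended) & split_words(actual))


def success_ratio(cases, k=10):
    """Fraction of (actual name, ranked recommendations) cases with a hit in the top k."""
    if not cases:
        return 0.0
    hits = sum(
        1
        for actual, ranked in cases
        if any(partial_match(name, actual) for name in ranked[:k])
    )
    return hits / len(cases)


# Toy cases built from names that appear in this talk; the rankings are made up.
cases = [
    ("JdbcRecordSetProvider", ["JdbcConnector", "ViewResolutionTests"]),
    ("Libsystemd", ["ViewResolutionTests"]),
]
ratio = success_ratio(cases)
```

A stricter definition, such as requiring an exact match, would only change `partial_match`; the ratio computation stays the same.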

Slide 31

Slide 31 text

Experiments | 1. Recommendation before a class is used
[Figure: the ratio of successful recommendation]
1.2x higher than [Fukuda et al., 2015]
-> recommends before a class is used and with high accuracy

Slide 32

Slide 32 text

Experiments | 2. Recommendation after a class is used
Goal: evaluate how well the proposed approach recommends
- after a class is used
Task: class name recommendation (after a class is used)
Criterion: the ratio of recommended names within the top 10 that partially match the actual name

Slide 33

Slide 33 text

Experiments | 2. Recommendation after a class is used
[Figure: the ratio of successful recommendation]
1.7x higher than [Fukuda et al., 2015]
-> higher accuracy after a class is used

Slide 34

Slide 34 text

Experiments | Recommendation example
Target class: ??? (extends ConnectorRecordSetProvider, has getRecordSet())
Listed class names: JdbcConnector, JdbcRecordSinkProvider, TestJdbcRecordSet, JdbcMetadata, TestJdbcClient, TestingDatabase, TestJdbcMetadata, TestJdbcRecordSetProvider, ConnectorColumnHandle, ViewResolutionTests

Slide 35

Slide 35 text

Experiments | Recommendation example
Target class: JdbcRecordSetProvider (extends ConnectorRecordSetProvider, has getRecordSet())
Listed class names: JdbcConnector, JdbcRecordSinkProvider, TestJdbcRecordSet, JdbcMetadata, TestJdbcClient, TestingDatabase, TestJdbcMetadata, TestJdbcRecordSetProvider, ConnectorColumnHandle, ViewResolutionTests

Slide 37

Slide 37 text

Experiments | 3. Where does the proposed approach work well?
Goal: evaluate where the proposed approach works well
Task: class name recommendation (before & after a class is used)
Criterion: the ratio of recommended names within the top 10 that partially match the actual name

Slide 38

Slide 38 text

Experiments | 3. Where does the proposed approach work well?
The proposed approach works well where a target class has relations with all types of program elements

Slide 39

Slide 39 text

Experiments | Summary
1. Our approach can recommend before a class is used and with high accuracy
2. Our approach works better (1) after a class is used and (2) where a class has relations with all types of program elements

Slide 40

Slide 40 text

Conclusion

Slide 41

Slide 41 text

Conclusion
Goal: propose an approach that recommends class names (1) before a class is used (2) with high accuracy
Accomplishment: proposed a class name recommendation approach based on graph embedding of program elements
Future work
- Step-by-step recommendation
- Combining with another approach such as rule mining
The code and notebook are available: https://github.com/kuri8ive/apsec2019class