Classification of UML Diagrams to Support Software Engineering Education

Slide 1

Slide 1 text

RAISE'2021: 9th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering
Classification of UML Diagrams to Support Software Engineering Education
José Fernando Tavares, Yandre Costa, Thelma E. Colanzi
State University of Maringá, Brazil

Slide 2

Slide 2 text

Related paper
• This presentation is related to the paper: Classification of UML Diagrams to Support Software Engineering Education, https://easychair.org/publications/preprint/cnC1
• The dataset and code are freely available at https://doi.org/10.5281/zenodo.5544379

Slide 3

Slide 3 text

I. INTRODUCTION
• Science, Technology, Engineering and Maths (STEM) undergraduate courses have often been cited as difficult to learn for people who have some form of visual impairment or blindness.
• Diagrams are very common in Software Engineering education. In particular, UML (Unified Modeling Language) diagrams are of vital importance for teaching object-oriented techniques and more advanced concepts.

Slide 4

Slide 4 text

I. INTRODUCTION
• UML diagrams are usually described and stored as digital images (jpg/png/svg).
• In addition, the current educational context demands increasing attention to accessibility. It would therefore be opportune to describe these diagrams in an alternative form (text, sound, or physical devices) that facilitates access to information for blind or low-vision people.

Slide 5

Slide 5 text

I. INTRODUCTION
• Our project goal is to support the creation and manipulation of didactic materials and books that contain UML diagrams, aiming to make them accessible to visually impaired students.
• In this work we performed the first step of the project: an exploratory study applying Convolutional Neural Networks (CNNs) to the task of UML diagram classification.
Image source: Microsoft design toolkit

Slide 6

Slide 6 text

II. MACHINE LEARNING TECHNIQUES
A. Convolutional Neural Networks (CNN)
• A CNN is a specific type of neural network and is currently one of the best-known deep models used by the machine learning research community for image classification tasks.
• In this work we selected three of these models, all widely used in the recent literature for image classification across a wide range of applications: VGG16, Inception V3, and ResNet50.

Slide 7

Slide 7 text

B. Transfer Learning
• With transfer learning, instead of starting the learning process from scratch, we can start from patterns that were learned when solving a different problem.
• In computer vision, transfer learning is expressed through the use of pre-trained models. A pre-trained model is a model usually trained on a large benchmark dataset to solve a problem similar to the one we want to solve.

Slide 8

Slide 8 text

B. Transfer Learning
[Figure: three transfer learning strategies, each drawn as a network from input to prediction, with layers marked as frozen or trained.]
• Strategy 1: Train the entire model
• Strategy 2: Train some layers and leave the others frozen
• Strategy 3: Freeze the convolutional base
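The three strategies differ only in which layers of the convolutional base keep learning. A minimal sketch of that selection logic follows. The `Layer` class and the `apply_strategy` function are hypothetical names introduced for illustration, and the choice to unfreeze the last layers under Strategy 2 is an assumption (the slide only says that some layers are trained), although it is a common practice.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """Hypothetical stand-in for a network layer with a `trainable` flag."""
    name: str
    trainable: bool = True


def apply_strategy(base_layers, strategy, n_unfrozen=0):
    """Set the trainable flag of each base layer for strategy 1, 2 or 3.

    Returns the names of the layers left trainable.
    """
    if strategy == 1:
        # Strategy 1: train the entire model, nothing is frozen.
        cutoff = 0
    elif strategy == 2:
        # Strategy 2: freeze the early layers and train the last n_unfrozen
        # (assumption: the slide does not say which layers are trained).
        cutoff = max(len(base_layers) - n_unfrozen, 0)
    elif strategy == 3:
        # Strategy 3: freeze the whole convolutional base.
        cutoff = len(base_layers)
    else:
        raise ValueError("strategy must be 1, 2 or 3")

    for index, layer in enumerate(base_layers):
        layer.trainable = index >= cutoff
    return [layer.name for layer in base_layers if layer.trainable]


base = [Layer(f"conv{i}") for i in range(5)]
print(apply_strategy(base, 2, n_unfrozen=2))  # prints ['conv3', 'conv4']
```

Because the function only reads and writes a `trainable` attribute, the same loop could in principle be applied to the layers of a Keras base model, whose layers expose an attribute of that name; under Strategy 3 only a newly added classification head would then be trained.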

Slide 9

Slide 9 text

III. RELATED WORK
• Several works have applied transfer learning to image classification tasks using CNN architectures, but they are not devoted to UML diagram classification.
• Transfer learning with VGG16 was employed for class diagram and sequence diagram classification in [12], but other CNN architectures still need to be tested to evaluate CNN accuracy in this context.
• Different types of UML diagrams are used in practice, which creates the need to support a more comprehensive set of UML diagrams.

• M. J. R. Torres and R. Barwaldt, “Approaches for diagrams accessibility for blind people: a systematic review,” in 2019 IEEE Frontiers in Education Conference (FIE), 2019, pp. 1–7.
• T. Ho-Quang, M. Chaudron, I. Samuelsson, J. Hjaltason, B. Karasneh, and M. H. Osman, “Automatic classification of UML class diagrams from images,” Dec. 2014. [Online]. Available: 10.1109/APSEC.2014.65
• [12] N. Best, J. Ott, and E. Linstead, “Exploring the efficacy of transfer learning in mining image-based software artifacts,” Journal of Big Data, vol. 7, Aug. 2020.

Slide 10

Slide 10 text

IV. STUDY DESIGN AND EXECUTION
Research Questions:
RQ1 - Is there a suitable combination of transfer learning strategy and CNN architecture for UML diagram classification?
RQ2 - Can different types of UML diagrams be predicted by a single classifier?

Slide 11

Slide 11 text

A. UML Dataset Creation
• Our dataset is composed of six categories of UML diagrams, containing images for training, validation, and testing.
• UML diagrams included in our dataset:
  – class diagram,
  – use case diagram,
  – sequence diagram,
  – component diagram,
  – activity diagram, and
  – deployment diagram.

Slide 12

Slide 12 text

A. UML Dataset Creation
• To train and validate the CNN models we used a dataset composed of 200 images for each category (the small dataset).
• To test the CNN models we used another dataset composed of 50 images for each category (the test dataset).
• Data augmentation: we performed data augmentation on the small dataset, generating four more variations of each image. This produced a bigger version of the dataset (the augmented dataset).
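As a quick check of the dataset sizes implied by these figures, the sketch below multiplies them out. It assumes that the augmented dataset keeps each original image alongside its four variations, which the slide does not state explicitly; the function name and its parameters are illustrative only.

```python
# Category names are taken from the dataset description in this presentation.
CATEGORIES = ["class", "use case", "sequence", "component", "activity", "deployment"]


def dataset_size(images_per_category, variations_per_image=0, keep_originals=True):
    """Total number of images across all categories.

    keep_originals=True is an assumption: the augmented set is taken to
    contain each original image plus its generated variations.
    """
    copies = variations_per_image + (1 if keep_originals else 0)
    return len(CATEGORIES) * images_per_category * copies


small_set = dataset_size(200)                              # training/validation images
test_set = dataset_size(50)                                # held-out test images
augmented_set = dataset_size(200, variations_per_image=4)  # originals plus 4 variations each

print(small_set, test_set, augmented_set)  # prints 1200 300 6000
```

If the augmented dataset contained only the generated variations, passing `keep_originals=False` would give 4800 images instead.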

Slide 13

Slide 13 text

B. Experimental protocol
[Figure: flowchart of the experimental protocol. Data acquisition and augmentation techniques feed the UML dataset creation.
Experiment 1: small dataset, transfer learning Strategy 3, 5-fold cross-validation, with VGG16, ResNet50, and InceptionV3.
Experiment 2: augmented dataset, transfer learning Strategy 2, 10-fold cross-validation, with VGG16, ResNet50, and InceptionV3.
Experiment 3: test dataset, using the model trained in Experiment 2 with InceptionV3.]
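The k-fold cross-validation named in the protocol (5-fold and 10-fold) can be sketched with the standard library alone. This is an illustration, not the authors' code: the function names are hypothetical, and both the stratification by class and the fixed random seed are assumptions that the slides do not mention.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds with a near-equal share of every class."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin over the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def train_val_splits(labels, k, seed=0):
    """Yield (train_indices, validation_indices), one pair per fold."""
    folds = stratified_folds(labels, k, seed)
    for held_out in range(k):
        validation = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, validation


# Six categories with 200 images each, as in the small dataset.
labels = [category for category in range(6) for _ in range(200)]
for train, validation in train_val_splits(labels, k=5):
    print(len(train), len(validation))  # prints "960 240" five times
```

Each image is used for validation exactly once, so the accuracy averaged over the folds depends less on one particular split than a single hold-out estimate would.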

Slide 14

Slide 14 text

B. Experimental protocol
• The implementation was carried out using the TensorFlow and Keras frameworks within the Google Colaboratory environment. This environment offered a more powerful GPU and consequently the possibility of running more tests in less time.
• The datasets were uploaded to Google Drive, allowing easy access through the Google Colab notebook API.
• All the code used is publicly available at https://bityli.com/sc9o5.

Slide 15

Slide 15 text

V. RESULTS AND DISCUSSION
• VGG16 underperformed when compared to the other two models.
• Inception V3 achieved the best results in both experiments.
• Experiment 1: using the small dataset and changing only the last layer of the model.
• Experiment 2: using the augmented dataset and freezing part of the model.

Slide 16

Slide 16 text

V. RESULTS AND DISCUSSION
• Regarding RQ1 - Is there a suitable combination of transfer learning strategy and CNN architecture for UML diagram classification?
• The experimental results showed that Inception V3, using the augmented dataset and Strategy 2 for transfer learning, achieved the best results in terms of accuracy.

Slide 17

Slide 17 text

V. RESULTS AND DISCUSSION
• Given the superior results obtained by Inception V3, we performed Experiment 3 using the Inception V3 model trained in Experiment 2.
• In Experiment 3 we applied the model to the test dataset, which contains 50 images for each type of UML diagram.

Slide 18

Slide 18 text

V. RESULTS AND DISCUSSION

Slide 19

Slide 19 text

V. RESULTS AND DISCUSSION
• The classifier performed worst on class diagrams.
• This kind of misclassification can be explained, in part, by the fact that the wrongly classified images contain elements that resemble those present in deployment diagrams.
• In fact, the class diagram is more complex to classify.

Slide 20

Slide 20 text

V. RESULTS AND DISCUSSION
• Approximately 20% of the images correspond to complex class diagrams.
• Most images (80%) represent simple class diagrams.
• This lack of uniformity tends to lower the hit rates, but a classifier developed in this way is expected to perform better in more realistic scenarios.

Slide 21

Slide 21 text

V. RESULTS AND DISCUSSION
• Regarding RQ2 - Can different types of UML diagrams be predicted by a single classifier?
• The accuracy of the generated classifier on the test dataset varied according to the type of UML diagram.
• InceptionV3 achieved quite encouraging results for most UML diagrams; however, it had low accuracy on class diagrams.
• The experimental results provided initial evidence that a single classifier can efficiently predict different types of UML diagrams.
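Accuracy broken down by diagram type can be computed from the true and predicted labels of the test images. The sketch below shows one way to do this; the function name is hypothetical, and the label lists are made-up values that only exercise the function. They are not results from the study.

```python
from collections import Counter


def per_class_accuracy(true_labels, predicted_labels):
    """Fraction of correctly predicted images for each true class."""
    totals = Counter(true_labels)
    hits = Counter(
        true for true, predicted in zip(true_labels, predicted_labels)
        if true == predicted
    )
    return {label: hits[label] / totals[label] for label in totals}


# Made-up labels for illustration only (not results from the study).
true_labels = ["class"] * 4 + ["deployment"] * 4
predicted_labels = ["class", "class", "deployment", "deployment"] + ["deployment"] * 4

print(per_class_accuracy(true_labels, predicted_labels))
# prints {'class': 0.5, 'deployment': 1.0}
```

Reporting one value per diagram type, rather than a single overall accuracy, is what makes a weak class visible.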

Slide 22

Slide 22 text

VI. LESSONS LEARNED
- Data augmentation is beneficial for the problem.
- Transfer learning is a good practice for the problem.
- Multi-class classification of UML diagrams is not such a simple problem.
- A huge dataset of UML diagram images is needed to perform further studies.

Slide 23

Slide 23 text

VII. CONCLUDING REMARKS
• The results suggest that UML classification is a complex problem that requires classifiers carefully designed for specific types of UML diagrams.
• The goal is to create a specific CNN model that takes into account the specific characteristics of each type of diagram and improves on the results achieved with transfer learning.
• The next step is to work on object recognition within the diagrams, to make it possible to extract each object’s information and transform it into accessible information.

Slide 24

Slide 24 text

VII. CONCLUDING REMARKS
Contributions of this paper:
• A dataset of six different types of UML diagrams: https://doi.org/10.5281/zenodo.5544379
• A comparison of different neural networks for UML diagram classification

Slide 25

Slide 25 text

Thank you for listening
José Fernando Tavares
Yandre Costa
Thelma E. Colanzi