Classification of UML Diagrams to Support Software Engineering Education

/23 1 RAISE'2021 9TH INTERNATIONAL WORKSHOP ON REALIZING ARTIFICIAL INTELLIGENCE SYNERGIES IN SOFTWARE ENGINEERING
Classification of UML Diagrams to Support Software Engineering Education
José Fernando Tavares, Yandre Costa, Thelma E. Colanzi
State University of Maringa, Brazil

/23 2 Related paper
• This presentation is related to the paper: Classification of UML Diagrams to Support Software Engineering Education https://easychair.org/publications/preprint/cnC1
• The dataset and code are freely available here: https://doi.org/10.5281/zenodo.5544379

/23 3 I. INTRODUCTION
• Science, Technology, Engineering and Maths (STEM) undergraduate courses have often been cited as difficult to learn for people who have some form of visual impairment or blindness.
• Diagrams are very common in Software Engineering education. In particular, UML (Unified Modeling Language) diagrams are of vital importance for teaching object-oriented techniques and more advanced concepts.

/23 4 I. INTRODUCTION
• UML diagrams are usually described and stored as digital images (jpg/png/svg).
• In addition, the current educational context demands increasing attention to accessibility. It would therefore be opportune to describe these diagrams in an alternative form (text, sound or physical devices) that facilitates access to information for blind or low-vision people.

/23 5 I. INTRODUCTION
Source: Microsoft design toolkit
• Our project goal is to support the creation and manipulation of didactic materials and books that contain UML diagrams, aiming to make them accessible to visually impaired students.
• In this work we performed the first step of the project: an exploratory study applying Convolutional Neural Networks (CNN) to the task of UML diagram classification.

/23 6 II. MACHINE LEARNING TECHNIQUES
A. Convolutional Neural Networks (CNN)
• A CNN is a specific type of neural network, and it is currently one of the most popular deep models used by the machine learning research community for image classification tasks.
• In this work we selected three CNN models that are widely used in the recent literature for image classification in a wide range of applications: VGG16, Inception V3, and ResNet50.

/23 7 B. Transfer Learning
• With transfer learning, instead of starting the learning process from scratch, we can start from patterns that were learned when solving a different problem.
• In computer vision, transfer learning is expressed through the use of pre-trained models. A pre-trained model is a model usually trained on a large benchmark dataset to solve a problem similar to the one we want to solve.

/23 8 B. Transfer Learning
[Figure: three transfer learning strategies, each drawn as a network from Input to Prediction. Legend: Frozen / Trained]
• Strategy 1: Train the entire model
• Strategy 2: Train some layers and leave the others frozen
• Strategy 3: Freeze the convolutional base
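As a rough illustration of the three strategies above, the sketch below marks which layers of a pre-trained convolutional base would be trained or frozen, and shows one way this could be applied to a Keras model. The function names, the `frozen_fraction` cut point, and the single softmax head are assumptions made for this example; the slides do not state which layers were frozen or how the head was built.

```python
def trainable_mask(num_layers, strategy, frozen_fraction=0.5):
    """Return one boolean per base layer: True = trained, False = frozen."""
    if strategy == 1:      # Strategy 1: train the entire model
        return [True] * num_layers
    if strategy == 2:      # Strategy 2: freeze the first layers, train the rest
        cut = int(num_layers * frozen_fraction)  # assumed cut point
        return [False] * cut + [True] * (num_layers - cut)
    if strategy == 3:      # Strategy 3: freeze the convolutional base
        return [False] * num_layers
    raise ValueError("strategy must be 1, 2 or 3")


def build_classifier(strategy, num_classes=6, frozen_fraction=0.5):
    """Hypothetical helper: InceptionV3 base plus a new softmax head."""
    import tensorflow as tf  # imported lazily; needed only to build the model

    base = tf.keras.applications.InceptionV3(
        weights="imagenet", include_top=False,
        input_shape=(299, 299, 3), pooling="avg")
    mask = trainable_mask(len(base.layers), strategy, frozen_fraction)
    for layer, flag in zip(base.layers, mask):
        layer.trainable = flag
    # The new classification head is always trained.
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(base.output)
    return tf.keras.Model(inputs=base.input, outputs=outputs)
```

For example, `trainable_mask(4, 2)` returns `[False, False, True, True]`, and `build_classifier(3)` would yield a model in which only the new output layer is updated during training.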

/23 9 III. RELATED WORK
• Several works applied transfer learning in image classification tasks using CNN architectures, but they are not devoted to UML diagram classification.
• Transfer learning with VGG16 was employed for class diagram and sequence diagram classification in [12], but other CNN architectures still need to be tested in order to evaluate CNN accuracy in this context.
• Different types of UML diagrams are used in practice, which creates the need to support a more comprehensive set of UML diagrams.
• M. J. R. Torres and R. Barwaldt, “Approaches for diagrams accessibility for blind people: a systematic review,” in 2019 IEEE Frontiers in Education Conference (FIE), 2019, pp. 1–7.
• T. Ho-Quang, M. Chaudron, I. Samuelsson, J. Hjaltason, B. Karasneh, and M. H. Osman, “Automatic classification of UML class diagrams from images,” Dec. 2014. [Online]. Available: 10.1109/APSEC.2014.65
• [12] N. Best, J. Ott, and E. Linstead, “Exploring the efficacy of transfer learning in mining image-based software artifacts,” Journal of Big Data, vol. 7, Aug. 2020.

/23 10 IV. STUDY DESIGN AND EXECUTION
Research Questions:
RQ1 - Is there a suitable combination of transfer learning strategy and CNN architecture for UML diagram classification?
RQ2 - Can different types of UML diagrams be predicted by a single classifier?

/23 11 A. UML Dataset Creation
• Our dataset is composed of six categories of UML diagrams, containing images for training, validation, and testing.
• UML diagrams included in our dataset:
– class diagram,
– use case diagram,
– sequence diagram,
– component diagram,
– activity diagram, and
– deployment diagram.

/23 12 A. UML Dataset Creation
• To train and validate the CNN models we used a dataset composed of 200 images for each category (small dataset).
• To test the CNN models we used another dataset composed of 50 images for each category (test dataset).
• Data augmentation: we performed data augmentation on the small dataset, generating four more variations of each image. This produced a bigger version of the dataset (augmented dataset).
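The dataset sizes implied by these figures can be worked out directly. The short calculation below is only a sketch: the constant names are made up for this example, and whether the augmented dataset keeps the original images alongside the four generated variations is an assumption, since the slides do not say.

```python
CATEGORIES = 6            # diagram types in the dataset
TRAIN_PER_CATEGORY = 200  # small dataset
TEST_PER_CATEGORY = 50    # test dataset
EXTRA_VARIATIONS = 4      # augmented variations generated per image

small = CATEGORIES * TRAIN_PER_CATEGORY
test = CATEGORIES * TEST_PER_CATEGORY
generated = small * EXTRA_VARIATIONS
augmented_with_originals = small + generated  # assumption: originals are kept

print(small, test, generated, augmented_with_originals)  # 1200 300 4800 6000
```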

/23 13 B. Experimental protocol
[Figure: experimental workflow. Labels: Data Acquisition; Augmentation Techniques; UML Dataset Creation; Small Dataset; Augmented Dataset; Test Dataset; Experiment 1; Experiment 2; Experiment 3; Transfer Learning Strategy 2; Transfer Learning Strategy 3; Cross Validation 5-Fold; Cross Validation 10-Fold; VGG16, ResNet50, InceptionV3; Trained model from Experiment 2 using InceptionV3]
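The workflow names 5-fold and 10-fold cross-validation. A minimal sketch of how such folds can be formed is given below in plain Python. The helper `kfold_indices` and its seed are hypothetical, the split is not stratified by diagram type, and it is not necessarily the procedure used in the experiments.

```python
import random


def kfold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Distribute samples so that fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx


# 1200 = 6 categories x 200 images (small dataset)
for train_idx, val_idx in kfold_indices(1200, k=5):
    print(len(train_idx), len(val_idx))  # 960 240 on every fold
```

Each image is used for validation exactly once, so the reported accuracy can be averaged over the k runs.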

/23 14 B. Experimental protocol
• The implementation was carried out using the TensorFlow and Keras frameworks within the Google Colaboratory environment. This environment offered a more powerful GPU and consequently the possibility of performing more tests in less time.
• The datasets were uploaded to Google Drive, allowing easy access through the Google Colab notebook API.
• All code used was made publicly available at https://bityli.com/sc9o5.

/23 15 V. RESULTS AND DISCUSSION
• VGG16 underperformed when compared to the other two models.
• Inception V3 achieved the best results in both experiments.
• Experiment 1: using the small dataset and changing only the last layer of the model.
• Experiment 2: using the augmented dataset and freezing part of the model.

/23 16 V. RESULTS AND DISCUSSION
• Regarding RQ1 - Is there a suitable combination of transfer learning strategy and CNN architecture for UML diagram classification?
• The experimental results showed that Inception V3, trained on the augmented dataset with transfer learning Strategy 2, achieved the best results in terms of accuracy.

/23 17 V. RESULTS AND DISCUSSION
• Given the superior results obtained by Inception V3, we performed Experiment 3 using the Inception V3 model trained in Experiment 2.
• In Experiment 3 we applied the model to the test dataset, which contains 50 images for each type of UML diagram.

/23 18 V. RESULTS AND DISCUSSION

/23 19 V. RESULTS AND DISCUSSION
• The worst performance was obtained for class diagrams.
• This kind of misclassification can be explained, in part, by the fact that the wrongly classified images contain elements that resemble those present in deployment diagrams.
• In fact, the class diagram is more complex to classify.

/23 20 V. RESULTS AND DISCUSSION
• Approximately 20% of the images correspond to complex class diagrams.
• Most images (80%) represent simple class diagrams.
• This lack of uniformity tends to lower the hit rates, but a classifier developed in this way is expected to perform better in more realistic scenarios.

/23 21 V. RESULTS AND DISCUSSION
• Regarding RQ2 - Can different types of UML diagrams be predicted by a single classifier?
• The accuracy of the generated classifier on the test dataset varied according to the type of UML diagram.
• InceptionV3 achieved quite encouraging results for most UML diagrams; however, it had low accuracy on class diagrams.
• The experimental results provided initial evidence that a single classifier can efficiently predict different types of UML diagrams.
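Accuracy per diagram type, as discussed above, is typically read off a confusion matrix. The sketch below shows that computation; the function name is hypothetical and the counts are made up purely for illustration. They are not the results reported in the paper.

```python
def per_class_accuracy(confusion, labels):
    """Fraction of each true class that was predicted correctly (recall).

    confusion[i][j] = number of images of true class i predicted as class j.
    """
    result = {}
    for i, label in enumerate(labels):
        total = sum(confusion[i])
        result[label] = confusion[i][i] / total if total else 0.0
    return result


# Made-up counts for illustration only; NOT the results reported in the paper.
labels = ["class", "sequence", "use case"]
toy_confusion = [
    [5, 3, 2],   # true: class
    [0, 10, 0],  # true: sequence
    [1, 0, 9],   # true: use case
]
print(per_class_accuracy(toy_confusion, labels))
# {'class': 0.5, 'sequence': 1.0, 'use case': 0.9}
```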

/23 22 VI. LESSONS LEARNED
- Data augmentation is beneficial for the problem.
- Transfer learning is a good practice for the problem.
- Multi-class classification of UML diagrams is not a simple problem.
- A huge dataset of UML diagram images is needed to perform further studies.

/23 23 VII. CONCLUDING REMARKS
• The results suggest that UML diagram classification is a complex problem that needs classifiers carefully designed for specific types of UML diagrams.
• The goal is to create a specific CNN model that takes into account the characteristics of each type of diagram and improves on the results achieved with transfer learning.
• The next step is to work on object recognition within the diagrams, making it possible to extract the objects' information and transform it into accessible information.

/23 24 VII. CONCLUDING REMARKS
Contributions of this paper:
• A dataset of six different types of UML diagrams: https://doi.org/10.5281/zenodo.5544379
• A comparison of different neural networks for UML diagram classification

/23 25 Thank you for listening
José Fernando Tavares
Yandre Costa
Thelma E. Colanzi
