Remodularization Analysis Using Semantic Clustering (CSMR-WCRE 2014)

In this paper, we report our experience using and adapting Semantic Clustering to evaluate software remodularizations. Semantic Clustering is an approach that relies on information retrieval and clustering techniques to extract sets of similar classes in a system, according to their vocabularies. We adapted Semantic Clustering to support remodularization analysis, and we evaluate our adaptation using six real-world remodularizations of four software systems. We report that Semantic Clustering and conceptual metrics can be used to express and explain the intention of architects when applying common modularization operators, such as module decomposition.

ASERG, DCC, UFMG

February 04, 2014

Transcript

  1. Remodularization Analysis using Semantic Clustering
    Applied Software Engineering Research Group
    Gustavo Santos and Marco Tulio Valente, Federal University of Minas Gerais; Nicolas Anquetil, RMoD Team, INRIA Lille
  2. Semantic Clustering
    Groups source artifacts that use similar vocabulary into semantic clusters, which are likely to reveal the intention of the code.
    Similarity considers the text the artifacts have in common.
    Remodularization Analysis using Semantic Clustering, February 4th, 2014
  3. Semantic Clustering
    (Extracted from Kuhn et al., 2007)
  4. Remodularization Analysis
  5. Improvements to Semantic Clustering
    #1: Text filtering. In comments, remove JavaDoc metadata, HTML code, etc.; also remove all public methods from the Object class.
    #2: Clustering stop criterion. Our approach: a similarity threshold.
  6. Improvement #3: Remodularization Analysis
    Challenge: comparing versions.
    Our approach: execute Semantic Clustering on the earlier version, then map the new classes to the previously calculated clusters, so that both versions have the same number of clusters.
  7. Evaluation
  8. Metrics Assessment
    Goal: Do conceptual aspects express architectural improvements?
    Methodology: isolate the remodularization, execute Semantic Clustering, apply a cluster comparison algorithm, and calculate the conceptual metrics.
  9. Dataset
  10. Conceptual Metrics
    Conceptual Cohesion [Marcus and Poshyvanyk, 2005]: average cosine similarity of all pairs of classes.
    Spread [Ducasse et al., 2006]: number of packages a cluster touches.
    Focus [Ducasse et al., 2006]: how concentrated a cluster is within the packages it touches.
  11. RQ1 – Impact on Clusters
    What is the impact of remodularizations on the clusters generated by Semantic Clustering?
    Global remodularizations; Wilcoxon test.
  12. Spread and Focus Results
    Charts: Focus and Spread for Eclipse 2.0.1 to 2.1 and JHotDraw 7.3.1 to 7.4.1
  13. Wilcoxon Results
    Summary: Remodularizations consistently tend to increase the spread of the existing semantic clusters across the new package structure. Focus is not an appropriate quality metric to support remodularization analysis.
  14. RQ2 – Operators With Highest Impact
    Which modularization operators have the most impact on the clusters generated by Semantic Clustering?
    Top-3 and Bottom-3 of Spread and Focus, using the common modularization operators proposed by [Rama and Patel, 2010]; Chi-square test.
  15. RQ2 – Operators With Highest Impact
    Summary: Module decomposition is commonly the operator behind the increase in spread and focus. This means that the semantic clusters cover more packages, but they are also more concentrated inside these packages.
  16. RQ3 – Impact on Conceptual Cohesion
    What is the impact of module decomposition in terms of conceptual cohesion?
    Analyze the impact considering restructured packages and new packages.
  17. RQ3 – Impact on Conceptual Cohesion
    Charts: Original Package, Restructured Package, New Packages
    Summary: After module decompositions, the new packages have better conceptual cohesion than the original packages. CCP (conceptual cohesion of packages) is an adequate metric to express a quality improvement.
  18. Conclusions
  19. Conclusions
    Metrics assessment approach with conceptual metrics
    Improvements to Semantic Clustering
    Module decomposition is the most common operator: Top-3 increase of Spread and Focus; increase in Conceptual Cohesion
  20. Future Work
    More conceptual metrics
    Mapping big refactorings to modularization operators
    Recommend operators according to conceptual metrics
  21. Thank You!
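Slides 2 and 5 describe Semantic Clustering and two of the proposed improvements: text filtering and a similarity threshold as the clustering stop criterion. The following is a minimal Python sketch of those ideas, not the authors' implementation. It assumes raw term-frequency vectors compared with cosine similarity and average-linkage agglomerative clustering. The original approach by Kuhn et al. (2007) also applies Latent Semantic Indexing, which this sketch omits. The function names, the tiny stop-word list, the regular expressions, and the 0.3 threshold are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter

# Assumption: identifiers named after java.lang.Object's public methods are dropped.
OBJECT_METHODS = {"equals", "hashcode", "tostring", "getclass",
                  "notify", "notifyall", "wait"}
# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"a", "an", "the", "of", "on", "into", "to"}


def filter_comment(comment: str) -> str:
    """Strip HTML markup and JavaDoc tag names such as @param or @return."""
    text = re.sub(r"<[^>]+>", " ", comment)
    return re.sub(r"@\w+", " ", text)


def split_identifier(identifier: str) -> list[str]:
    """Split camelCase/snake_case names: parseXMLFile -> parse, XML, File."""
    return re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", identifier)


def vocabulary(identifiers: list[str], comment: str) -> Counter:
    """Term-frequency vector of one class, built from identifiers and comments."""
    words = []
    for ident in identifiers:
        if ident.lower() in OBJECT_METHODS:
            continue  # e.g. toString(), equals(): shared by every class
        words += split_identifier(ident)
    words += re.findall(r"[A-Za-z]+", filter_comment(comment))
    return Counter(w.lower() for w in words if w.lower() not in STOP_WORDS)


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster(vectors: dict[str, Counter], threshold: float) -> list[set[str]]:
    """Average-linkage agglomerative clustering that stops as soon as the most
    similar pair of clusters falls below `threshold`."""
    clusters = [{name} for name in vectors]

    def linkage(c1: set[str], c2: set[str]) -> float:
        sims = [cosine(vectors[x], vectors[y]) for x in c1 for y in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        best, pair = -1.0, (0, 1)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = linkage(clusters[i], clusters[j])
                if sim > best:
                    best, pair = sim, (i, j)
        if best < threshold:
            break  # stop criterion: no remaining pair is similar enough
        i, j = pair
        merged = clusters.pop(j)  # j > i, so index i is unaffected
        clusters[i] |= merged
    return clusters


# Toy classes for illustration only.
classes = {
    "Parser": vocabulary(["parseToken", "nextToken"],
                         "Reads a <b>token</b> stream. @return the token"),
    "Lexer": vocabulary(["readToken", "toString"], "Splits input into tokens."),
    "Canvas": vocabulary(["drawFigure", "repaint"], "Draws a figure on screen."),
}
print(cluster(classes, threshold=0.3))
```

With these toy inputs, Parser and Lexer share the term "token" (cosine similarity 0.4) and are merged, while Canvas shares no terms and stays alone, because the best remaining similarity is below the threshold. A threshold lets the number of clusters follow the data instead of fixing it in advance.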
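Slide 6 says new classes are mapped to the clusters calculated on the earlier version, so both versions keep the same number of clusters. The slides do not say how a new class is assigned. The sketch below assumes two things, both hypothetical: classes that survive into the new version keep their original cluster, and a new class joins the cluster whose centroid (summed term vector) is most similar by cosine similarity. Function and variable names are illustrative only.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def map_to_clusters(old_clusters: list[set[str]],
                    old_vectors: dict[str, Counter],
                    new_vectors: dict[str, Counter]) -> list[set[str]]:
    """Project the classes of a newer version onto clusters computed on the
    earlier version; the number of clusters never changes."""
    # Centroid = summed term vector of a cluster's members (same direction as the mean).
    centroids = []
    for members in old_clusters:
        total = Counter()
        for name in members:
            total.update(old_vectors[name])
        centroids.append(total)
    owner = {name: i for i, members in enumerate(old_clusters) for name in members}
    mapped = [set() for _ in old_clusters]
    for name, vec in new_vectors.items():
        if name in owner:
            idx = owner[name]  # assumption: surviving classes keep their cluster
        else:
            # assumption: a new class joins the cluster with the most similar centroid
            idx = max(range(len(centroids)), key=lambda k: cosine(vec, centroids[k]))
        mapped[idx].add(name)
    return mapped


# Toy vectors for illustration only.
old_vectors = {"Parser": Counter(token=3, parse=1), "Canvas": Counter(figure=2, draw=1)}
old_clusters = [{"Parser"}, {"Canvas"}]
new_vectors = {**old_vectors,
               "Tokenizer": Counter(token=2, split=1),
               "Shape": Counter(figure=1, shape=2)}
print(map_to_clusters(old_clusters, old_vectors, new_vectors))
```

Classes deleted in the newer version do not appear in `new_vectors`, so they simply drop out of their clusters. Keeping the clusters fixed is what allows per-cluster metrics to be compared before and after the remodularization.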
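Slide 10 lists the three conceptual metrics. The sketch below computes them in Python under stated assumptions. Conceptual cohesion of a package follows the slide's wording (average cosine similarity over all pairs of its classes). Spread counts the packages that contain at least one class of a cluster. For focus, we assume the Distribution Map definition from Ducasse et al. (2006) as we recall it: the sum over packages p of touch(q, p) · touch(p, q), where each touch is the overlap |p ∩ q| divided by the size of one of the two sets. The product is therefore |p ∩ q|² / (|p| · |q|), which equals 1 when a cluster coincides exactly with one package. Returning `None` for single-class packages is a hypothetical choice.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Optional


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def conceptual_cohesion(package: set[str], vectors: dict[str, Counter]) -> Optional[float]:
    """Average cosine similarity over all pairs of classes in a package."""
    pairs = list(combinations(sorted(package), 2))
    if not pairs:
        return None  # undefined for a package with fewer than two classes
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


def spread(cluster: set[str], packages: dict[str, set[str]]) -> int:
    """Number of packages that contain at least one class of the cluster."""
    return sum(1 for members in packages.values() if members & cluster)


def focus(cluster: set[str], packages: dict[str, set[str]]) -> float:
    """Sum over packages p of |p & q|^2 / (|p| * |q|), i.e. touch(q, p) * touch(p, q)."""
    return sum(len(members & cluster) ** 2 / (len(members) * len(cluster))
               for members in packages.values() if members)


# Toy system for illustration only.
vectors = {"A": Counter(parse=1), "B": Counter(parse=1), "C": Counter(draw=1)}
packages = {"core": {"A", "B"}, "ui": {"C"}}
print(spread({"A", "C"}, packages), focus({"A", "C"}, packages))  # prints: 2 0.75
```

A cluster that matches the `core` package exactly has spread 1 and focus 1.0. The cluster {A, C} touches both packages and fills each only partially, so its spread rises to 2 and its focus drops to 0.75.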
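Slides 11 and 13 rely on a Wilcoxon test to compare spread and focus before and after each remodularization. The slides state neither the exact variant nor the tool used. The sketch below assumes a paired, two-sided Wilcoxon signed-rank test, which fits the fact that the same clusters exist in both versions. It uses the large-sample normal approximation without a tie correction in the variance, so p-values are only rough for small samples. In practice, a statistics library such as SciPy's `scipy.stats.wilcoxon` would be the more reliable choice. The function name and the toy values are hypothetical.

```python
import math


def wilcoxon_signed_rank(before: list[float], after: list[float]) -> tuple[float, float, float]:
    """Paired two-sided Wilcoxon signed-rank test (normal approximation).

    Returns (W+, W-, p). Zero differences are discarded, tied absolute
    differences receive their average rank, and the variance is not
    corrected for ties.
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all differences are zero")
    order = sorted(range(n), key=lambda k: abs(diffs[k]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of 1-based ranks i+1..j+1
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided: 2 * (1 - Phi(|z|))
    return w_plus, w_minus, p


# Toy per-cluster spread values, not data from the study.
spread_before = [2, 3, 1, 4, 2, 3]
spread_after = [3, 5, 2, 6, 4, 4]
print(wilcoxon_signed_rank(spread_before, spread_after))
```

A large W+ together with a small p suggests that the metric increased consistently across clusters. This is the shape of the finding reported for spread on slide 13.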