Remodularization Analysis Using Semantic Clustering (CSMR-WCRE 2014)

In this paper, we report our experience using and adapting Semantic Clustering to evaluate software remodularizations. Semantic Clustering is an approach that relies on information retrieval and clustering techniques to extract sets of classes with similar vocabularies from a system. We adapted Semantic Clustering to support remodularization analysis and evaluated the adaptation on six real-world remodularizations of four software systems. Our results show that Semantic Clustering and conceptual metrics can be used to express and explain the architects' intentions when they apply common modularization operators, such as module decomposition.

ASERG, DCC, UFMG

February 04, 2014

Transcript

  1. Remodularization Analysis using Semantic Clustering
    Applied Software Engineering Research Group
    Gustavo Santos
    Marco Tulio Valente
    Federal University of Minas Gerais
    Nicolas Anquetil
    RMoD Team – INRIA Lille

  2. Semantic Clustering
    Groups source artifacts that use similar vocabulary into
    semantic clusters, which are likely to reveal the
    intention of the code
    Similarity is measured by the vocabulary the artifacts have in common
    Remodularization Analysis using
    Semantic Clustering
    February 4th, 2014 2

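
    A minimal sketch of the similarity idea, under simplifying assumptions: each class is reduced to a bag of terms split from its identifiers, and two classes are compared by the cosine of their term-frequency vectors. Kuhn et al. additionally apply Latent Semantic Indexing to the term-by-document matrix before measuring similarity; that step is omitted here. The helper names (`tokenize`, `term_vector`, `cosine`) and the sample class texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split source text into lowercase terms, breaking camelCase and snake_case identifiers."""
    terms = []
    for word in re.findall(r"[A-Za-z]+", text):
        # 'HTMLParser' -> ['HTML', 'Parser'], 'drawFigure' -> ['draw', 'Figure']
        parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", word)
        terms.extend(part.lower() for part in parts)
    return terms


def term_vector(text):
    """Term-frequency vector of a class (its vocabulary)."""
    return Counter(tokenize(text))


def cosine(u, v):
    """Cosine similarity of two term-frequency vectors; 0.0 if either is empty."""
    dot = sum(count * v[term] for term, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


figure = term_vector("class RectangleFigure { void drawFigure(Graphics g) {} }")
tool = term_vector("class FigureTool { void drawFigure() {} }")
parser = term_vector("class XmlParser { void parseNode() {} }")
print(round(cosine(figure, tool), 2), round(cosine(figure, parser), 2))
```

    The two figure-related classes share most of their terms and score higher than the unrelated parser class. This difference is what lets the clustering step group them.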
  3. Semantic Clustering
    [Figure extracted from Kuhn et al., 2007]

  4. Remodularization Analysis

  5. Improvements to Semantic Clustering
    #1: Text filtering in comments
    Remove JavaDoc metadata, HTML code, etc.
    Remove the names of all public methods inherited from the Object class
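
    The filtering in improvement #1 can be sketched as follows. This is an illustration under assumptions, not the authors' implementation. The regular expressions cover only common cases (HTML tags, JavaDoc block tags such as `@param`, and comment delimiters), and the helper names are hypothetical. The stop list contains the public methods declared by `java.lang.Object`, whose names carry no domain meaning.

```python
import re

# Public methods declared by java.lang.Object; every class inherits them,
# so their names add noise rather than domain vocabulary.
OBJECT_METHODS = {"equals", "hashCode", "toString", "getClass",
                  "notify", "notifyAll", "wait"}


def filter_comment(comment):
    """Remove HTML tags, JavaDoc block tags and comment delimiters from a comment."""
    text = re.sub(r"</?[A-Za-z][^>]*>", " ", comment)  # HTML tags such as <b> or </p>
    text = re.sub(r"@\w+", " ", text)                  # JavaDoc tags such as @param, @return
    text = re.sub(r"[/*]+", " ", text)                 # delimiters /**, * and */
    return " ".join(text.split())


def filter_identifiers(identifiers):
    """Drop identifiers that merely name methods inherited from Object."""
    return [name for name in identifiers if name not in OBJECT_METHODS]


doc = "/** Resizes the <b>figure</b>.\n * @param width the new width\n */"
print(filter_comment(doc))
print(filter_identifiers(["toString", "resize", "equals", "drawFigure"]))
```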
    #2: Clustering Stop Criterion
    Our Approach: Similarity Threshold
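
    One way to read improvement #2 is as agglomerative (bottom-up) clustering. The algorithm repeatedly merges the two most similar clusters. It stops as soon as their similarity drops below a fixed threshold, instead of stopping at a predefined number of clusters. The slide does not specify the linkage rule or the threshold value, so average linkage and the `threshold` parameter below are assumptions. The similarity scores are made up.

```python
from itertools import combinations


def cluster(items, similarity, threshold):
    """Agglomerative clustering with a similarity-threshold stop criterion.

    similarity maps frozenset({a, b}) to the similarity of items a and b.
    """
    clusters = [[item] for item in items]

    def linkage(c1, c2):
        # Average linkage: mean similarity over all cross-cluster pairs.
        return sum(similarity[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best_pair, best = max(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda candidate: candidate[1],
        )
        if best < threshold:
            break  # stop criterion: even the closest clusters are not similar enough
        i, j = best_pair  # i < j, so deleting j does not shift index i
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


sims = {frozenset(p): s for p, s in [
    (("A", "B"), 0.9), (("C", "D"), 0.8), (("B", "C"), 0.2),
    (("A", "C"), 0.1), (("B", "D"), 0.1), (("A", "D"), 0.0),
]}
print(cluster(["A", "B", "C", "D"], sims, threshold=0.5))
```

    A higher threshold yields more, tighter clusters. The number of clusters falls out of the data rather than being fixed in advance.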

  6. Improvement #3: Remodularization Analysis
    Challenge: Comparison between versions
    Our Approach:
    Execute Semantic Clustering on the earlier version
    Map new classes to the previously computed clusters
    Result: both versions have the same number of clusters
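
    The mapping step can be sketched as follows. The sketch assumes each new class joins the earlier-version cluster whose members it resembles most on average, measured by cosine similarity of term-frequency vectors. The slide states only that new classes are mapped to the previously computed clusters. The assignment rule, the removal of deleted classes, and all names and data here are illustrative assumptions.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two term-frequency vectors; 0.0 if either is empty."""
    dot = sum(count * v[term] for term, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def map_new_classes(clusters, old_vectors, new_vectors):
    """Project the clusters of the earlier version onto the later version.

    clusters: lists of class names computed on the earlier version.
    old_vectors, new_vectors: class name -> Counter of terms, per version.
    The number of clusters never changes, so both versions stay comparable.
    """
    # Classes deleted in the later version leave their clusters (which may become empty).
    mapped = [[name for name in members if name in new_vectors] for members in clusters]
    for name, vector in new_vectors.items():
        if name in old_vectors:
            continue  # the class already existed and keeps its cluster
        # Average similarity between the new class and each original cluster.
        scores = [sum(cosine(vector, old_vectors[m]) for m in members) / len(members)
                  for members in clusters]
        mapped[scores.index(max(scores))].append(name)
    return mapped


old = {"Figure": Counter(figure=2, draw=1), "Parser": Counter(parse=2, xml=1)}
new = {"Figure": old["Figure"], "FigureTool": Counter(figure=1, tool=1)}
print(map_new_classes([["Figure"], ["Parser"]], old, new))
```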

  7. Evaluation

  8. Metrics Assessment
    Goal: Do conceptual aspects express architectural
    improvements?
    Methodology
    Isolate the remodularization
    Execute Semantic Clustering
    Cluster comparison algorithm
    Calculate the conceptual metrics

  9. Dataset

  10. Conceptual Metrics
    Conceptual Cohesion of Packages (CCP) [Marcus and Poshyvanyk, 2005]
    Average cosine similarity over all pairs of classes in a package
    Spread [Ducasse et al., 2006]
    Number of packages a cluster touches
    Focus [Ducasse et al., 2006]
    How concentrated a cluster is within the packages it touches
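
    The three metrics can be sketched as follows. CCP follows the slide's definition: the average cosine similarity over all pairs of classes in a package. Spread and focus follow our reading of the Distribution Map definitions [Ducasse et al., 2006]. With touch(c, p) = |c ∩ p| / |p| and touch(p, c) = |c ∩ p| / |c|, the focus of cluster c is the sum over packages p of touch(c, p) · touch(p, c). Focus equals 1 when a cluster coincides with exactly one package. The data structures, the names, and the value returned for single-class packages are assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two term-frequency vectors; 0.0 if either is empty."""
    dot = sum(count * v[term] for term, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def ccp(package, vectors):
    """Conceptual cohesion: mean cosine similarity over all pairs of classes in a package."""
    pairs = list(combinations(package, 2))
    if not pairs:
        return 0.0  # assumption: packages with fewer than two classes get 0.0
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


def spread(cluster, packages):
    """Number of packages that contain at least one class of the cluster."""
    return sum(1 for classes in packages.values() if cluster & classes)


def focus(cluster, packages):
    """Sum over packages of touch(cluster, p) * touch(p, cluster); 1.0 = one-to-one match."""
    total = 0.0
    for classes in packages.values():
        shared = len(cluster & classes)
        total += (shared / len(classes)) * (shared / len(cluster))
    return total


packages = {"figures": {"A", "B"}, "tools": {"C", "D", "E", "F"}}
cluster = {"A", "B", "C"}
print(spread(cluster, packages), focus(cluster, packages))
vectors = {"A": Counter(figure=1), "B": Counter(figure=1), "C": Counter(tool=1)}
print(ccp(["A", "B", "C"], vectors))
```

    In the example, the cluster touches two packages (spread 2). Its focus is (2/2)(2/3) + (1/4)(1/3) = 0.75, because one stray class sits in a large, otherwise unrelated package.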

  11. RQ1 – Impact on Clusters
    What is the impact of remodularizations on the clusters
    generated by Semantic Clustering?
    Global remodularizations
    Wilcoxon test
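
    Each semantic cluster is measured before and after a remodularization, so a paired Wilcoxon signed-rank test fits this design. The slide only says "Wilcoxon test", so the paired variant is our assumption. The sketch computes the statistic with a normal approximation and no tie correction, which is rough for small samples. In practice, a library routine such as `scipy.stats.wilcoxon` would be preferable. The spread values are hypothetical, not the paper's data.

```python
import math


def wilcoxon_signed_rank(before, after):
    """Paired Wilcoxon signed-rank test, normal approximation without tie correction.

    Returns (W_plus, W_minus, z, two_sided_p).
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank absolute differences; tied values receive the average of their ranks.
    order = sorted(range(n), key=lambda k: abs(diffs[k]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))  # two-sided, standard normal
    return w_plus, w_minus, z, p


spread_before = [2, 3, 1, 4, 2, 3, 2, 1]
spread_after = [4, 5, 2, 6, 3, 5, 3, 2]
w_plus, w_minus, z, p = wilcoxon_signed_rank(spread_before, spread_after)
print(w_plus, w_minus, round(z, 2), round(p, 3))
```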

  12. Spread and Focus Results
    [Figure: Focus and Spread results for Eclipse 2.0.1 to 2.1 and JHotDraw 7.3.1 to 7.4.1]

  13. Wilcoxon Results
    Summary: Remodularizations consistently tend to increase the spread of the
    existing semantic clusters across the new package structure. Focus is
    not an appropriate quality metric to support remodularization analysis.

  14. RQ2 – Operators With Highest Impact
    Which modularization operators have the greatest impact on the
    clusters generated by Semantic Clustering?
    Top-3 and Bottom-3 of Spread and Focus
    Using common modularization operators proposed by
    [Rama and Patel, 2010]
    Chi-square test
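
    The chi-square test checks whether modularization operators are distributed differently between the top-3 and bottom-3 cases. The slide does not show how the contingency table is built. The layout below (rows: operators; columns: top-3 and bottom-3 counts) and its counts are therefore hypothetical. The sketch returns only the Pearson statistic and the degrees of freedom. For one degree of freedom, a statistic above 3.841 is significant at the 0.05 level. `scipy.stats.chi2_contingency` also returns a p-value, but by default it applies Yates' continuity correction to 2x2 tables.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total  # count under independence
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical counts: rows = operators, columns = (top-3, bottom-3)
table = [
    [10, 2],  # module decomposition
    [2, 10],  # all other operators
]
statistic, dof = chi_square(table)
print(round(statistic, 2), dof)
```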

  15. RQ2 – Operators With Highest Impact
    Summary: Module decomposition is most often the operator behind increases
    in both spread and focus. That is, the semantic clusters cover more packages,
    but they are also more concentrated within those packages.

  16. RQ3 – Impact on Conceptual Cohesion
    What is the impact of module decomposition in terms
    of conceptual cohesion?
    Analyze the impact considering:
    Restructured packages
    New packages

  17. RQ3 – Impact on Conceptual Cohesion
    [Figure: conceptual cohesion of the original package vs. the restructured package and the new packages]
    Summary: After module decompositions, the new packages have higher
    conceptual cohesion than the original packages. CCP is an adequate
    metric to express this quality improvement.

  18. Conclusions

  19. Conclusions
    Metrics assessment approach with conceptual metrics
    Improvements to Semantic Clustering
    Module decomposition is the most common operator
    Among the top-3 increases in spread and focus
    It increases conceptual cohesion

  20. Future Work
    More conceptual metrics
    Mapping big refactorings to modularization operators
    Recommend operators according to conceptual metrics

  21. Thank You!