Smelly Relations: Measuring and Understanding Database Schema Quality

Smelly Relations: Measuring and Understanding Database Schema Quality

Presented at ICSE 2018 SEIP track in Gothenburg.

16a26342d4ff4724dfe59b0dc947dead?s=128

Tushar Sharma

May 31, 2018
Tweet

Transcript

  1. Smelly Relations: Measuring and Understanding Database Schema Quality Tushar Sharma,

    Marios Fragkoulis, Stamatia Rizou, Magiel Bruntink, Diomidis Spinellis This work is funded by SENECA project under Marie-Skłodowska Curie Actions Innovative Training Networks ITN-EID. Grant agreement number 642954.
  2. None
  3. Code Smells …certain structures in the code that suggest (sometimes

    they scream for) the possibility of refactoring. - Kent Beck
  4. http://www.tusharma.in/smells/

  5. Database smells Database smells are the characteristics of database code,

    database system, or stored data that indicate violation of the recommended best practices and potentially affect the quality of the software system in a negative way. Classification • Schema smells • Query smells • Data smells <!>
  6. Overview of the study Research questions Results DbDeo Catalog of

    database schema smells <!> <!> 2568 open-source and 357 industrial repositories Developers’ survey
  7. Database schema smells Compound attribute When a column is used

    to store a non-atomic attribute <!> <!> Id Title ISBN Authors … Book 123 Refactoring.. 0128013974 Girish S., Ganesh S., Tushar S. …
  8. Database schema smells Index abuse When indexes are used poorly.

    • Missing indexes • Insufficient indexes • Unused indexes <!> <!>
  9. Database schema smells 1. Compound attribute 2. Adjacency list 3.

    Superfluous key 4. Missing constraints 5. Metadata as data 6. Polymorphic association 7. Multicolumn attribute 8. Clone table 9. Values in attribute definition 10. Index abuse 11. God table 12. Meaningless name 13. Overload attribute names <!> <!>
  10. Developers’ survey • Online survey to understand developers’ perspective about

    the significance of database smells. • Presented database smells and asked them to rate based on their importance and usefulness 0 3 6 9 12 15 18 21 24 27 0 1 2 3 4 5-10 11-20 / >10 >20 / - Software development experience in years #Database applications developed Received 52 complete responses
  11. 0 5 10 15 20 25 30 35 40 45

    50 CA AL SK MC MD PA MA CT VA IA GT MN OA Respondents Don’t know Recommended practice Neither a smell nor a recommended practice Database schema smell Both a smell and a recommended practice depending on the context Clearly marked smells More context- sensitive
  12. Mining Repositories Attributes Industry OSS Repositories (initial) 840 16,057 Repositories

    with SQL statements 357 2,568 Files 2,559,984 3,297,932 LOC 220,489,273 409,155,497 SELECT 51,652 74,096 CREATE TABLE 18,907 50,682 INSERT 74,416 66,830 UPDATE 10,454 29,002 CREATE INDEX 7,152 10,798
  13. RQ1. What are the occurrence patterns of database smells?

  14. RQ1. Occurrence patterns of database smells Smells Occurrences Avg. smell

    density I OSS I OSS CA 5,517 7,966 0.04 0.04 AL 733 297 0.15 0.02 GT 4,428 5,507 0.44 0.24 VA 85 326 0.00 0.02 MD 944 1,003 0.16 0.09 MA 1,624 3,137 0.10 0.07 CT 101 3,704 0.00 0.05 OA 1,814 7,300 0.20 0.21 IA 12,643 9,475 1.25 1.76
  15. RQ1. Occurrence patterns of database smells Smells Occurrences Avg. smell

    density I OSS I OSS CA 5,517 7,966 0.04 0.04 AL 733 297 0.15 0.02 GT 4,428 5,507 0.44 0.24 VA 85 326 0.00 0.02 MD 944 1,003 0.16 0.09 MA 1,624 3,137 0.10 0.07 CT 101 3,704 0.00 0.05 OA 1,814 7,300 0.20 0.21 IA 12,643 9,475 1.25 1.76 Most frequently occurring smell Index abuse
  16. RQ1. Occurrence patterns of database smells Smells Occurrences Avg. smell

    density I OSS I OSS CA 5,517 7,966 0.04 0.04 AL 733 297 0.15 0.02 GT 4,428 5,507 0.44 0.24 VA 85 326 0.00 0.02 MD 944 1,003 0.16 0.09 MA 1,624 3,137 0.10 0.07 CT 101 3,704 0.00 0.05 OA 1,814 7,300 0.20 0.21 IA 12,643 9,475 1.25 1.76 Adjacency list prone to occur more in industrial projects OSS projects report more Clone table
  17. RQ2. Does the size of the project or the database

    play a role in smell density?
  18. RQ2. Size and smell density Smell density • Number of

    database smells per ten SQL statements • LOC vs smell density • ⍴ = 0.2420 (p-value = 3.724 × 10-6) for Industry • ⍴ = 0.0006 (p-value = 0.9731) for OSS • Database size vs smell density • ⍴ = 0.7338 (p-value < 2.2 × 10-16) for Industry • ⍴ = 0.6174 (p-value < 2.2 × 10-16) for OSS Strong correlation between database size and smell density.
  19. RQ3. Does the nature of code (type of the application,

    or usage of ORM frameworks) affect the smell density?
  20. RQ3. Nature of code vs smell density Application types •

    Desktop • Mobile • Web 0 0.4 0.8 1.2 1.6 2 2.4 2.8 Desktop (1998, 346) Mobile (40, 2) Web (530, 9) Average smell density Open-source Industry
  21. RQ3. Nature of code vs smell density ORM (Object-Relational Mapping)

    frameworks • 19 well-known frameworks identified 0 0.5 1 1.5 2 2.5 3 Projects using ORM (681, 238) Rest of the projects (1887, 199) Average smell density Open-source Industrial The difference is not statistically significant! Thus, ORM frameworks do not bring immunity from database smells.
  22. RQ4. What is the degree of co-occurrence among database smells?

  23. 0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 CT

    AL GT VA MD MA CT OA IA Average co-occurrence Open-source Industry RQ4. Co-occurrence among smells Smells show considerably higher correlations in industrial projects.
  24. Qualitative analysis Overloaded attributes • Attributes, even the primary keys,

    with identical names but with different types. • “ID” is the most popularly used name for a primary key. • > 40% in OSS, ~11% in Industrial code • Up to 13 different types used for “ID”
  25. Qualitative analysis Overloaded attributes and clone tables • Many instances

    of these smells observed in tests or in example code while manual exploration. • It highlights the quality deficit and possibly casual mindset while writing test or example code.
  26. Opportunities • Tool support • Native support within IDEs •

    Training and awareness • Database standards • Database APIs • CHECK clause
  27. DbDeo – Database Smells Detection Tool DbDeo – An open-source

    tool to • extract embedded SQL statements and • detect database schema smells https://github.com/tushartushar/DbDeo
  28. None
  29. Tushar Sharma http://www.tusharma.in