Slide 1

Slide 1 text

Smelly Relations: Measuring and Understanding Database Schema Quality Tushar Sharma, Marios Fragkoulis, Stamatia Rizou, Magiel Bruntink, Diomidis Spinellis This work is funded by SENECA project under Marie-Skłodowska Curie Actions Innovative Training Networks ITN-EID. Grant agreement number 642954.

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

Code Smells …certain structures in the code that suggest (sometimes they scream for) the possibility of refactoring. - Kent Beck

Slide 4

Slide 4 text

http://www.tusharma.in/smells/

Slide 5

Slide 5 text

Database smells Database smells are the characteristics of database code, database system, or stored data that indicate violation of the recommended best practices and potentially affect the quality of the software system in a negative way. Classification • Schema smells • Query smells • Data smells

Slide 6

Slide 6 text

Overview of the study Research questions Results DbDeo Catalog of database schema smells 2568 open-source and 357 industrial repositories Developers’ survey

Slide 7

Slide 7 text

Database schema smells Compound attribute When a column is used to store a non-atomic attribute Id Title ISBN Authors … Book 123 Refactoring.. 0128013974 Girish S., Ganesh S., Tushar S. …

Slide 8

Slide 8 text

Database schema smells Index abuse When indexes are used poorly. • Missing indexes • Insufficient indexes • Unused indexes

Slide 9

Slide 9 text

Database schema smells 1. Compound attribute 2. Adjacency list 3. Superfluous key 4. Missing constraints 5. Metadata as data 6. Polymorphic association 7. Multicolumn attribute 8. Clone table 9. Values in attribute definition 10. Index abuse 11. God table 12. Meaningless name 13. Overload attribute names

Slide 10

Slide 10 text

Developers’ survey • Online survey to understand developers’ perspective about the significance of database smells. • Presented database smells and asked them to rate based on their importance and usefulness 0 3 6 9 12 15 18 21 24 27 0 1 2 3 4 5-10 11-20 / >10 >20 / - Software development experience in years #Database applications developed Received 52 complete responses

Slide 11

Slide 11 text

0 5 10 15 20 25 30 35 40 45 50 CA AL SK MC MD PA MA CT VA IA GT MN OA Respondents Don’t know Recommended practice Neither a smell nor a recommended practice Database schema smell Both a smell and a recommended practice depending on the context Clearly marked smells More context- sensitive

Slide 12

Slide 12 text

Mining Repositories Attributes Industry OSS Repositories (initial) 840 16,057 Repositories with SQL statements 357 2,568 Files 2,559,984 3,297,932 LOC 220,489,273 409,155,497 SELECT 51,652 74,096 CREATE TABLE 18,907 50,682 INSERT 74,416 66,830 UPDATE 10,454 29,002 CREATE INDEX 7,152 10,798

Slide 13

Slide 13 text

RQ1. What are the occurrence patterns of database smells?

Slide 14

Slide 14 text

RQ1. Occurrence patterns of database smells Smells Occurrences Avg. smell density I OSS I OSS CA 5,517 7,966 0.04 0.04 AL 733 297 0.15 0.02 GT 4,428 5,507 0.44 0.24 VA 85 326 0.00 0.02 MD 944 1,003 0.16 0.09 MA 1,624 3,137 0.10 0.07 CT 101 3,704 0.00 0.05 OA 1,814 7,300 0.20 0.21 IA 12,643 9,475 1.25 1.76

Slide 15

Slide 15 text

RQ1. Occurrence patterns of database smells Smells Occurrences Avg. smell density I OSS I OSS CA 5,517 7,966 0.04 0.04 AL 733 297 0.15 0.02 GT 4,428 5,507 0.44 0.24 VA 85 326 0.00 0.02 MD 944 1,003 0.16 0.09 MA 1,624 3,137 0.10 0.07 CT 101 3,704 0.00 0.05 OA 1,814 7,300 0.20 0.21 IA 12,643 9,475 1.25 1.76 Most frequently occurring smell Index abuse

Slide 16

Slide 16 text

RQ1. Occurrence patterns of database smells Smells Occurrences Avg. smell density I OSS I OSS CA 5,517 7,966 0.04 0.04 AL 733 297 0.15 0.02 GT 4,428 5,507 0.44 0.24 VA 85 326 0.00 0.02 MD 944 1,003 0.16 0.09 MA 1,624 3,137 0.10 0.07 CT 101 3,704 0.00 0.05 OA 1,814 7,300 0.20 0.21 IA 12,643 9,475 1.25 1.76 Adjacency list prone to occur more in industrial projects OSS projects report more Clone table

Slide 17

Slide 17 text

RQ2. Does the size of the project or the database play a role in smell density?

Slide 18

Slide 18 text

RQ2. Size and smell density Smell density • Number of database smells per ten SQL statements • LOC vs smell density • ⍴ = 0.2420 (p-value = 3.724 × 10-6) for Industry • ⍴ = 0.0006 (p-value = 0.9731) for OSS • Database size vs smell density • ⍴ = 0.7338 (p-value < 2.2 × 10-16) for Industry • ⍴ = 0.6174 (p-value < 2.2 × 10-16) for OSS Strong correlation between database size and smell density.

Slide 19

Slide 19 text

RQ3. Does the nature of code (type of the application, or usage of ORM frameworks) affect the smell density?

Slide 20

Slide 20 text

RQ3. Nature of code vs smell density Application types • Desktop • Mobile • Web 0 0.4 0.8 1.2 1.6 2 2.4 2.8 Desktop (1998, 346) Mobile (40, 2) Web (530, 9) Average smell density Open-source Industry

Slide 21

Slide 21 text

RQ3. Nature of code vs smell density ORM (Object-Relational Mapping) frameworks • 19 well-known frameworks identified 0 0.5 1 1.5 2 2.5 3 Projects using ORM (681, 238) Rest of the projects (1887, 199) Average smell density Open-source Industrial The difference is not statistically significant! Thus, ORM frameworks do not bring immunity from database smells.

Slide 22

Slide 22 text

RQ4. What is the degree of co-occurrence among database smells?

Slide 23

Slide 23 text

0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 CT AL GT VA MD MA CT OA IA Average co-occurrence Open-source Industry RQ4. Co-occurrence among smells Smells show considerably higher correlations in industrial projects.

Slide 24

Slide 24 text

Qualitative analysis Overloaded attributes • Attributes, even the primary keys, with identical names but with different types. • “ID” is the most popularly used name for a primary key. • > 40% in OSS, ~11% in Industrial code • Up to 13 different types used for “ID”

Slide 25

Slide 25 text

Qualitative analysis Overloaded attributes and clone tables • Many instances of these smells observed in tests or in example code while manual exploration. • It highlights the quality deficit and possibly casual mindset while writing test or example code.

Slide 26

Slide 26 text

Opportunities • Tool support • Native support within IDEs • Training and awareness • Database standards • Database APIs • CHECK clause

Slide 27

Slide 27 text

DbDeo – Database Smells Detection Tool DbDeo – An open-source tool to • extract embedded SQL statements and • detect database schema smells https://github.com/tushartushar/DbDeo

Slide 28

Slide 28 text

No content

Slide 29

Slide 29 text

Tushar Sharma http://www.tusharma.in