Slide 1

Slide 1 text

Virtual Mutation Analysis of Relational Database Schemas Phil McMinn
 Gregory M. Kapfhammer
 Chris J. Wright University of Sheffield
 Allegheny College
 University of Sheffield

Slide 2

Slide 2 text

Relational Databases – 
 Why Should We (Still) Care? A vital component of many software systems Despite the wave of interest in “NoSQL” technologies, Relational Databases are still popular (and faster) For developers: schemas provide self-documentation

Slide 3

Slide 3 text

Relational Databases – 
 Why Should We (Still) Care? Relational Databases are still 
 important, popular and relevant A vital component of many software systems Despite the wave of interest in “NoSQL” technologies, Relational Databases are still popular (and faster) For developers: schemas provide self-documentation

Slide 4

Slide 4 text

A Relational Database Schema

Slide 5

Slide 5 text

A Relational Database Schema Table

Slide 6

Slide 6 text

A Relational Database Schema Table Column and
 data type

Slide 7

Slide 7 text

Integrity Constraints Prevent invalid data being entered into the database Encode domain logic

Slide 8

Slide 8 text

Integrity Constraints Prevent invalid data being entered into the database Encode domain logic

Slide 9

Slide 9 text

Testing the Schema Correctly accepted by the schema Correctly rejected by the schema Correctly rejected by the schema

Slide 10

Slide 10 text

Testing the Schema Correctly accepted by the schema Correctly rejected by the schema Correctly rejected by the schema

Slide 11

Slide 11 text

Testing the Schema Correctly accepted by the schema Correctly rejected by the schema Correctly rejected by the schema

Slide 12

Slide 12 text

Testing the Schema Correctly accepted by the schema Correctly rejected by the schema Correctly rejected by the schema

Slide 13

Slide 13 text

Why Do We Need to Do This?

Slide 14

Slide 14 text

Why Do We Need to Do This? To trap common errors when designing a schema
 For example: lack of uniqueness property on usernames, out of range values

Slide 15

Slide 15 text

Why Do We Need to Do This? To trap common errors when designing a schema
 For example: lack of uniqueness property on usernames, out of range values To test development behaviour vs deployment
 DBMSs have subtly different behaviors

Slide 16

Slide 16 text

Why Do We Need to Do This? To trap common errors when designing a schema
 For example: lack of uniqueness property on usernames, out of range values To test development behaviour vs deployment
 DBMSs have subtly different behaviors Nobody throws away a database of data
 To test the success of database migrations

Slide 17

Slide 17 text

Why Do We Need to Do This? To trap common errors when designing a schema
 For example: lack of uniqueness property on usernames, out of range values To test development behaviour vs deployment
 DBMSs have subtly different behaviors Nobody throws away a database of data
 To test the success of database migrations Industry advice
 Destroying database consistency can have huge cost implications

Slide 18

Slide 18 text

Mutation Analysis Once a test suite has been created, its fault finding capability can be estimated with mutation analysis. For relational database schema testing, mutants are created by making small changes to the schema

Slide 19

Slide 19 text

Mutation Analysis Once a test suite has been created, its fault finding capability can be estimated with mutation analysis. For relational database schema testing, mutants are created by making small changes to the schema

Slide 20

Slide 20 text

Mutation Analysis Once a test suite has been created, its fault finding capability can be estimated with mutation analysis. For relational database schema testing, mutants are created by making small changes to the schema

Slide 21

Slide 21 text

Mutation Analysis Once a test suite has been created, its fault finding capability can be estimated with mutation analysis. For relational database schema testing, mutants are created by making small changes to the schema

Slide 22

Slide 22 text

Mutation Analysis is Costly schema mutants

Slide 23

Slide 23 text

Mutation Analysis is Costly schema mutants mutant test suite database Mutant killed / alive +

Slide 24

Slide 24 text

Mutation Analysis is Costly schema mutants mutant test suite database Mutant killed / alive + SchemaAnalyst

Slide 25

Slide 25 text

Mutation Analysis is Costly

Slide 26

Slide 26 text

Mutation Analysis is Costly DO FEWER

Slide 27

Slide 27 text

Mutation Analysis is Costly DO FEWER DO SMARTER

Slide 28

Slide 28 text

Mutation Analysis is Costly DO FEWER DO SMARTER DO FASTER

Slide 29

Slide 29 text

Mutation Analysis is Costly schema mutants mutant test suite database Mutant killed / alive + SchemaAnalyst

Slide 30

Slide 30 text

Mutation Analysis is Costly schema mutants mutant test suite database Mutant killed / alive + SchemaAnalyst

Slide 31

Slide 31 text

Mutation Analysis is Costly schema mutants mutant test suite database Mutant killed / alive + SchemaAnalyst High cost of communicating with the DBMS and executing SQL queries on it

Slide 32

Slide 32 text

Reducing the Cost schema mutants mutant test suite database Mutant killed / alive + SchemaAnalyst

Slide 33

Slide 33 text

Reducing the Cost schema mutants mutant test suite database Mutant killed / alive + SchemaAnalyst Local, no communication overhead

Slide 34

Slide 34 text

Reducing the Cost schema mutants mutant test suite database Mutant killed / alive + SchemaAnalyst model of

Slide 35

Slide 35 text

Reducing the Cost schema mutants mutant test suite database Mutant killed / alive + SchemaAnalyst model of Virtual Mutation
 Analysis

Slide 36

Slide 36 text

Reducing the Cost schema mutants mutant test suite database Mutant killed / alive + SchemaAnalyst model of Virtual Mutation
 Analysis Lower execution overhead

Slide 37

Slide 37 text

The Model

Slide 38

Slide 38 text

The Model Integrity constraint predicate icp1

Slide 39

Slide 39 text

The Model icp2 icp3 Integrity constraint predicate icp1

Slide 40

Slide 40 text

The Model icp2 icp3 icp4 icp5 Integrity constraint predicate icp1

Slide 41

Slide 41 text

The Model Form an acceptance predicate for the table: icp2 icp3 icp4 icp5 ap = icp1 /\ icp2 /\ icp3 /\ icp4 /\ icp5 Integrity constraint predicate icp1

Slide 42

Slide 42 text

The Model Form an acceptance predicate for the table: icp2 icp3 icp4 icp5 ap = icp1 /\ icp2 /\ icp3 /\ icp4 /\ icp5 Integrity constraint predicate icp1 True when DBMS would accept the data False otherwise

Slide 43

Slide 43 text

Virtual DBMS Models

Slide 44

Slide 44 text

Empirical Study RQ1. What is the relative efficiency of the virtual approach? RQ2. What are the time savings? RQ3. How do mutation scores compare when the standard approach is run for as long as the virtual one?

Slide 45

Slide 45 text

Subject Schemas

Slide 46

Slide 46 text

RQ1: Efficiency Standard Virtual 1000 100000 1000 100000 1000 100000 HyperSQL PostgreSQL SQLite C offeeO rders Em ployee Inventory Iso3166 JW hoisServer M ozillaPerm issions N istW eather Person Products C offeeO rders Em ployee Inventory Iso3166 JW hoisServer M ozillaPerm issions N istW eather Person Products Database Schema Mutation Analysis Time (Log Transformed)

Slide 47

Slide 47 text

RQ1: Efficiency Standard Virtual 1000 100000 1000 100000 1000 100000 HyperSQL PostgreSQL SQLite C offeeO rders Em ployee Inventory Iso3166 JW hoisServer M ozillaPerm issions N istW eather Person Products C offeeO rders Em ployee Inventory Iso3166 JW hoisServer M ozillaPerm issions N istW eather Person Products Database Schema Mutation Analysis Time (Log Transformed) Virtual Mutation Analysis is significantly more efficient for Postgres and HyperSQL, but not SQLite

Slide 48

Slide 48 text

RQ2: Time Savings −100 −50 0 50 100 50 100 150 Number of Mutants Percentage of Mean Time Saved HyperSQL PostgreSQL SQLite

Slide 49

Slide 49 text

RQ2: Time Savings −100 −50 0 50 100 50 100 150 Number of Mutants Percentage of Mean Time Saved HyperSQL PostgreSQL SQLite Virtual Mutation Analysis yields large time savings for Postgres and HyperSQL but not always with SQLite, leading to an average time saving of 51% overall

Slide 50

Slide 50 text

HyperSQL PostgreSQL SQLite 0 50 100 150 C offeeO rders Em ployee Inventory Iso3166 JW hoisServer M ozillaPerm issions N istW eather Person Products C offeeO rders Em ployee Inventory Iso3166 JW hoisServer M ozillaPerm issions N istW eather Person Products C offeeO rders Em ployee Inventory Iso3166 JW hoisServer M ozillaPerm issions N istW eather Person Products Database Schema Total Number of Mutants Selective Virtual RQ3: Comparison

Slide 51

Slide 51 text

HyperSQL PostgreSQL SQLite 0 50 100 150 C offeeO rders Em ployee Inventory Iso3166 JW hoisServer M ozillaPerm issions N istW eather Person Products C offeeO rders Em ployee Inventory Iso3166 JW hoisServer M ozillaPerm issions N istW eather Person Products C offeeO rders Em ployee Inventory Iso3166 JW hoisServer M ozillaPerm issions N istW eather Person Products Database Schema Total Number of Mutants Selective Virtual RQ3: Comparison Virtual Mutation Analysis evaluates more mutants

Slide 52

Slide 52 text

RQ3: Comparison Selective Virtual 0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00 HyperSQL PostgreSQL SQLite C offeeO rders Em ployee Inventory Iso3166 JW hoisServer M ozillaPerm issions N istW eather Person Products C offeeO rders Em ployee Inventory Iso3166 JW hoisServer M ozillaPerm issions N istW eather Person Products Database Schema Mutation Score

Slide 53

Slide 53 text

RQ3: Comparison Selective Virtual 0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00 HyperSQL PostgreSQL SQLite C offeeO rders Em ployee Inventory Iso3166 JW hoisServer M ozillaPerm issions N istW eather Person Products C offeeO rders Em ployee Inventory Iso3166 JW hoisServer M ozillaPerm issions N istW eather Person Products Database Schema Mutation Score Virtual Mutation Analysis is the best option when highly accurate scores are needed under a time constraint

Slide 54

Slide 54 text

Conclusions Virtual Mutation Analysis Technique: Removes the need to use a real DBMS for relational database schema mutation testing More cost-effective while still being accurate: • More efficient for 22 of 27 configurations studied • Yields time savings of 13 to 99%