Slide 1

Slide 1 text

Automatically Evaluating the Efficiency of Search-Based Test Data Generation (for Relational Database Schemas)
Cody Kinneer
SEKE 2015, July 7, 2015

Slide 2

Slide 2 text

Random Testing

Slide 3

Slide 3 text

Random Testing Easy to implement — and yet not always very effective!
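
A toy illustration of why purely random generation can struggle: when only a narrow range of values satisfies a constraint, random sampling almost never hits it. The constraint below is a made-up example, not one from the paper.

    import random

    # Hypothetical constraint: only a narrow window of values is "interesting".
    def satisfies_constraint(x):
        return 40 <= x <= 42

    trials = 100_000
    hits = sum(satisfies_constraint(random.randint(0, 10_000)) for _ in range(trials))
    print(f"random inputs satisfying the constraint: {hits}/{trials}")  # roughly 0.03%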

Slide 4

Slide 4 text

Search-Based Testing [plot]

Slide 5

Slide 5 text

Search-Based Testing [plot] Often much more effective than random testing

Slide 6

Slide 6 text

Performance of SBST

Slide 7

Slide 7 text

Performance of SBST Fitness Function

Slide 8

Slide 8 text

Performance of SBST Fitness Function Data Generator

Slide 9

Slide 9 text

Performance of SBST Fitness Function Data Generator Restart Rule

Slide 10

Slide 10 text

Performance of SBST Fitness Function Data Generator Restart Rule Stop Rule

Slide 11

Slide 11 text

Performance of SBST Fitness Function Data Generator Restart Rule Stop Rule Search Budget

Slide 12

Slide 12 text

Performance of SBST Fitness Function Data Generator Restart Rule Stop Rule Search Budget How do parameter values influence the efficiency of SBST?
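
A minimal sketch of how these parameters could interact in a single search loop; the fitness function, data generator, restart rule, stop rule, and search budget below are hypothetical placeholders and do not reflect SchemaAnalyst's actual interfaces.

    import random

    def search(fitness, generate, restart_after, target, budget):
        """Hypothetical SBST loop showing where each parameter acts."""
        best, best_score, stale = None, float("inf"), 0
        for _ in range(budget):                      # search budget
            candidate = generate(best)               # data generator
            score = fitness(candidate)               # fitness function
            if score < best_score:
                best, best_score, stale = candidate, score, 0
            else:
                stale += 1
            if best_score <= target:                 # stop rule
                break
            if stale >= restart_after:               # restart rule
                best, best_score, stale = None, float("inf"), 0
        return best, best_score

    # Toy usage: search for a value near 42 with random proposals.
    best, score = search(fitness=lambda x: abs(x - 42),
                         generate=lambda _: random.uniform(0, 100),
                         restart_after=50, target=0.5, budget=10_000)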

Slide 13

Slide 13 text

Performance of SBST O( )

Slide 14

Slide 14 text

Performance of SBST O(?)

Slide 15

Slide 15 text

Performance of SBST O(?) Analytical

Slide 16

Slide 16 text

Performance of SBST O(?) Analytical

Slide 17

Slide 17 text

Performance of SBST O(?) Analytical Empirical

Slide 18

Slide 18 text

Performance of SBST O(?) Analytical Empirical

Slide 19

Slide 19 text

Doubling Experiment Input

Slide 20

Slide 20 text

Doubling Experiment Input

Slide 21

Slide 21 text

Doubling Experiment Input Time = 14.98

Slide 22

Slide 22 text

Doubling Experiment Input Time = 14.98 Input

Slide 23

Slide 23 text

Doubling Experiment Input Time = 14.98 Input

Slide 24

Slide 24 text

Doubling Experiment Input Time = 14.98 Input Time = 31.45

Slide 25

Slide 25 text

Doubling Experiment Input Time = 14.98 Input Time = 31.45

Slide 26

Slide 26 text

Doubling Experiment Input Time = 14.98 Input Time = 31.45 Ratio ≈ 2

Slide 27

Slide 27 text

Doubling Experiment Input Time = 14.98 Input Time = 31.45 Ratio ≈ 2 Linear — O(n)

Slide 28

Slide 28 text

Doubling Experiment Input Input

Slide 29

Slide 29 text

Doubling Experiment Input Time = 12.63 Input

Slide 30

Slide 30 text

Doubling Experiment Input Time = 12.63 Input Time = 51.48

Slide 31

Slide 31 text

Doubling Experiment Input Time = 12.63 Input Time = 51.48 Ratio ≈ 4

Slide 32

Slide 32 text

Doubling Experiment Input Time = 12.63 Input Time = 51.48 Ratio ≈ 4 Quadratic — O(n²)

Slide 33

Slide 33 text

Doubling Experiment Input Input

Slide 34

Slide 34 text

Doubling Experiment Input Time = 11.23 Input

Slide 35

Slide 35 text

Doubling Experiment Input Time = 11.23 Input Time = 89.72

Slide 36

Slide 36 text

Doubling Experiment Input Time = 11.23 Input Time = 89.72 Ratio ≈ 8

Slide 37

Slide 37 text

Doubling Experiment Input Time = 11.23 Input Time = 89.72 Ratio ≈ 8 Cubic — O(n³)
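
A minimal sketch of the doubling idea above: time the same routine on inputs of size n and 2n, then read the growth order off the ratio (a ratio near 2 suggests linear, near 4 quadratic, near 8 cubic). The workload here is a toy stand-in, not SchemaAnalyst.

    import math
    import time

    def timed(run, n):
        """Return the wall-clock time of run(n) in seconds."""
        start = time.perf_counter()
        run(n)
        return time.perf_counter() - start

    def doubling_ratio(run, n):
        """Time run at sizes n and 2n and return the observed ratio."""
        return timed(run, 2 * n) / timed(run, n)

    def quadratic_workload(n):
        # Toy O(n^2) routine used only to demonstrate the experiment.
        return sum(i * j for i in range(n) for j in range(n))

    ratio = doubling_ratio(quadratic_workload, 1_000)
    exponent = math.log2(ratio)   # ~1 linear, ~2 quadratic, ~3 cubic
    print(f"ratio ≈ {ratio:.2f}, suggesting O(n^{exponent:.1f})")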

Slide 38

Slide 38 text

Relational Databases Deployment Locations for Databases

Slide 39

Slide 39 text

Relational Databases Deployment Locations for Databases Database Application Server

Slide 40

Slide 40 text

Relational Databases Deployment Locations for Databases Database Application Server Mobile Phone or Tablet

Slide 41

Slide 41 text

Relational Databases Deployment Locations for Databases Database Application Server Mobile Phone or Tablet Office and Productivity Software

Slide 42

Slide 42 text

Relational Databases Deployment Locations for Databases Database Application Server Mobile Phone or Tablet Office and Productivity Software Government

Slide 43

Slide 43 text

Relational Databases Deployment Locations for Databases Database Application Server Mobile Phone or Tablet Office and Productivity Software Government Astrophysics

Slide 44

Slide 44 text

Database Schemas Relational Database Management System

Slide 45

Slide 45 text

Database Schemas Relational Database Management System E-commerce

Slide 46

Slide 46 text

Database Schemas Relational Database Management System E-commerce Schema

Slide 47

Slide 47 text

Database Schemas Relational Database Management System E-commerce Schema State

Slide 48

Slide 48 text

Database Schemas Relational Database Management System E-commerce Schema State Schema Integrity Constraints

Slide 49

Slide 49 text

Database Schemas Relational Database Management System E-commerce Schema State Schema Integrity Constraints PRIMARY KEY

Slide 50

Slide 50 text

Database Schemas Relational Database Management System E-commerce Schema State Schema Integrity Constraints PRIMARY KEY FOREIGN KEY

Slide 51

Slide 51 text

Database Schemas Relational Database Management System E-commerce Schema State Schema Integrity Constraints PRIMARY KEY FOREIGN KEY Arbitrary CHECK
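
A small, made-up e-commerce fragment illustrating the constraint kinds named above, run against SQLite's in-memory engine; it is not one of the schemas studied in the paper.

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("PRAGMA foreign_keys = ON")
    db.executescript("""
        CREATE TABLE customers (
            id   INTEGER PRIMARY KEY,                      -- PRIMARY KEY constraint
            name TEXT NOT NULL
        );
        CREATE TABLE orders (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(id),  -- FOREIGN KEY constraint
            total       REAL CHECK (total >= 0)            -- arbitrary CHECK constraint
        );
    """)

    db.execute("INSERT INTO customers VALUES (1, 'Ada')")
    db.execute("INSERT INTO orders VALUES (1, 1, 9.99)")    # satisfies every constraint
    try:
        db.execute("INSERT INTO orders VALUES (2, 1, -5)")  # violates the CHECK constraint
    except sqlite3.IntegrityError as err:
        print("rejected:", err)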

Slide 52

Slide 52 text

Database Schemas Relational Database Management System E-commerce Schema State State Relational Components

Slide 53

Slide 53 text

Database Schemas Relational Database Management System E-commerce Schema State State Relational Components Tables

Slide 54

Slide 54 text

Database Schemas Relational Database Management System E-commerce Schema State State Relational Components Tables Rows

Slide 55

Slide 55 text

Database Schemas Relational Database Management System E-commerce Schema State State Relational Components Tables Rows Columns

Slide 56

Slide 56 text

Database Testing The Data Warehouse Institute reports that North American organizations experience a $611 billion annual loss due to poor data quality

Slide 57

Slide 57 text

Database Testing The Data Warehouse Institute reports that North American organizations experience a $611 billion annual loss due to poor data quality Scott W. Ambler argues that the “virtual absence” of database testing — the validation of the contents, schema, and functionality of the database — is the primary cause of this loss

Slide 58

Slide 58 text

Database Testing The Data Warehouse Institute reports that North American organizations experience a $611 billion annual loss due to poor data quality Scott W. Ambler argues that the “virtual absence” of database testing — the validation of the contents, schema, and functionality of the database — is the primary cause of this loss Past papers presented SchemaAnalyst, a search-based system for testing the complex integrity constraints in relational schemas

Slide 59

Slide 59 text

Method of Approach SchemaAnalyst Execution

Slide 60

Slide 60 text

Method of Approach SchemaAnalyst Execution Coverage Criterion

Slide 61

Slide 61 text

Method of Approach SchemaAnalyst Execution Coverage Criterion Data Generator

Slide 62

Slide 62 text

Method of Approach SchemaAnalyst Execution Coverage Criterion Data Generator Database Schema

Slide 63

Slide 63 text

Method of Approach SchemaAnalyst Execution Coverage Criterion Data Generator Database Schema Test Suite

Slide 64

Slide 64 text

Method of Approach SchemaAnalyst Execution Coverage Criterion Data Generator Database Schema Runtime

Slide 65

Slide 65 text

Method of Approach SchemaAnalyst Execution Coverage Criterion Data Generator Runtime Schema Doubler Provides Schema Database Schema

Slide 66

Slide 66 text

Method of Approach SchemaAnalyst Execution Coverage Criterion Data Generator Runtime Schema Doubler Provides Schema Database Schema Doubler Choice

Slide 67

Slide 67 text

Method of Approach SchemaAnalyst Execution Coverage Criterion Data Generator Runtime Schema Doubler Provides Schema Database Schema Doubler Choice Convergence Algorithm Continue?
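
A rough sketch of the driver loop implied by the diagram above: repeatedly double the schema, time a SchemaAnalyst execution on it, and let a convergence check decide whether to continue. Both run_schemaanalyst and double_schema are hypothetical placeholders, and the convergence test shown is only one simple possibility, not necessarily the paper's algorithm.

    def doubling_experiment(schema, run_schemaanalyst, double_schema,
                            tolerance=0.1, max_doublings=10):
        """Hypothetical driver for an automated doubling experiment.

        run_schemaanalyst(schema) -> runtime in seconds (placeholder)
        double_schema(schema)     -> schema with the chosen feature doubled (placeholder)
        """
        ratios = []
        previous = run_schemaanalyst(schema)
        for _ in range(max_doublings):
            schema = double_schema(schema)       # schema doubler provides the next input
            current = run_schemaanalyst(schema)  # one SchemaAnalyst execution, timed
            ratios.append(current / previous)
            previous = current
            # Continue? Stop once the last two doubling ratios agree closely.
            if len(ratios) >= 2 and abs(ratios[-1] - ratios[-2]) <= tolerance:
                break
        return ratios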

Slide 68

Slide 68 text

Doubling Schemas Table Column 1 Column 2 . . . Column n

Slide 69

Slide 69 text

Doubling Schemas Table Column 1 Column 2 . . . Column n NOT NULL

Slide 70

Slide 70 text

Doubling Schemas Table Column 1 Column 2 . . . Column n NOT NULL PRIMARY KEY

Slide 71

Slide 71 text

Doubling Schemas Table Column 1 Column 2 . . . Column n NOT NULL PRIMARY KEY UNIQUE

Slide 72

Slide 72 text

Doubling Schemas Table Column 1 Column 2 . . . Column n NOT NULL PRIMARY KEY UNIQUE CHECK

Slide 73

Slide 73 text

Doubling Schemas Table Column 1 Column 2 . . . Column n NOT NULL PRIMARY KEY UNIQUE CHECK FOREIGN KEY

Slide 74

Slide 74 text

Doubling Schemas Table Column 1 Column 2 . . . Column n NOT NULL PRIMARY KEY UNIQUE CHECK FOREIGN KEY NOT NULL
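
A sketch of what doubling the constrained columns of a one-table schema could look like as DDL; this is illustrative only and does not reproduce the paper's schema doubler.

    def doubled_table_ddl(n):
        """Build CREATE TABLE DDL with n columns, each carrying the
        constraint kinds listed above (illustrative placeholder)."""
        columns = ["id INTEGER PRIMARY KEY",
                   "parent INTEGER REFERENCES t(id)"]   # self-referencing FOREIGN KEY
        columns += [f"c{i} INTEGER NOT NULL UNIQUE CHECK (c{i} >= 0)"
                    for i in range(n)]
        return "CREATE TABLE t (\n  " + ",\n  ".join(columns) + "\n);"

    # Each doubling step simply asks for twice as many constrained columns.
    small_ddl, large_ddl = doubled_table_ddl(4), doubled_table_ddl(8)
    print(large_ddl)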

Slide 75

Slide 75 text

Experiments Experimental Parameters

Slide 76

Slide 76 text

Experiments Experimental Parameters Coverage Criterion

Slide 77

Slide 77 text

Experiments Experimental Parameters Coverage Criterion Data Generator

Slide 78

Slide 78 text

Experiments Experimental Parameters Coverage Criterion Data Generator Doubling Technique

Slide 79

Slide 79 text

Experiments Experimental Parameters Coverage Criterion Data Generator Doubling Technique Database Schema

Slide 80

Slide 80 text

Experiments Experimental Parameters Coverage Criterion Data Generator Doubling Technique Database Schema Over 2,000 unique combinations of parameters!

Slide 81

Slide 81 text

Experiments Experimental Parameters Coverage Criterion Data Generator Doubling Technique Database Schema Over 2,000 unique combinations of parameters! Experiments ran on HPC cluster with 3,440 cores

Slide 82

Slide 82 text

Relational Schemas

Schema         Tables  Columns  Constraints
BioSQL             28      129          186
Cloc                2       10            0
iTrust             42      309          134
JWhoisServer        6       49           50
NistWeather         2        9           13
NistXTS7            1        3            3
NistXTS749          1        3            3
RiskIt             13       57           36
UnixUsage           8       32           24

Slide 83

Slide 83 text

Relational Schemas

Slide 84

Slide 84 text

Relational Schemas

Slide 85

Slide 85 text

Relational Schemas

Slide 86

Slide 86 text

Empirical Results Doubled UNIQUEs NOT NULLs CHECKs

Slide 87

Slide 87 text

Empirical Results Doubled UNIQUEs NOT NULLs CHECKs 699 Experiments

Slide 88

Slide 88 text

Empirical Results Doubled UNIQUEs NOT NULLs CHECKs 699 Experiments 8% Stopped

Slide 89

Slide 89 text

Empirical Results Doubled UNIQUEs NOT NULLs CHECKs 699 Experiments 8% Stopped 20% O(1) or O(log n)

Slide 90

Slide 90 text

Empirical Results Doubled UNIQUEs NOT NULLs CHECKs 699 Experiments 8% Stopped 20% O(1) or O(log n) 72% O(n) or O(n log n)

Slide 91

Slide 91 text

Empirical Results Doubled UNIQUEs NOT NULLs CHECKs 699 Experiments 8% Stopped 20% O(1) or O(log n) 72% O(n) or O(n log n) SchemaAnalyst ∈ O(n) for constraints studied

Slide 92

Slide 92 text

Empirical Results Doubled Tables

Slide 93

Slide 93 text

Empirical Results Doubled Tables 467 Experiments

Slide 94

Slide 94 text

Empirical Results Doubled Tables 467 Experiments 56% Stopped

Slide 95

Slide 95 text

Empirical Results Doubled Tables 467 Experiments 56% Stopped 72 O(n²)

Slide 96

Slide 96 text

Empirical Results Doubled Tables 467 Experiments 56% Stopped 72 O(n²) 10 O(n³)

Slide 97

Slide 97 text

Empirical Results Doubled Tables 467 Experiments 56% Stopped 72 O(n²) 10 O(n³) SchemaAnalyst ∈ O(n³) or worse for tables

Slide 98

Slide 98 text

Empirical Results Doubled Columns

Slide 99

Slide 99 text

Empirical Results Doubled Columns 467 Experiments

Slide 100

Slide 100 text

Empirical Results Doubled Columns 467 Experiments 203 Stopped

Slide 101

Slide 101 text

Empirical Results Doubled Columns 467 Experiments 203 Stopped 208 O(n) or O(n log n)

Slide 102

Slide 102 text

Empirical Results Doubled Columns 467 Experiments 203 Stopped 208 O(n) or O(n log n) 28 O(n²) and 2 O(n³)

Slide 103

Slide 103 text

Empirical Results Doubled Columns 467 Experiments 203 Stopped 208 O(n) or O(n log n) 28 O(n²) and 2 O(n³) SchemaAnalyst ∈ O(n³) or worse for columns

Slide 104

Slide 104 text

Adequacy Criteria

Slide 105

Slide 105 text

Adequacy Criteria More effective criteria require additional runtime

Slide 106

Slide 106 text

Data Generator

Slide 107

Slide 107 text

Data Generator More effective generators can also be more efficient

Slide 108

Slide 108 text

Key Contributions Search-based test data generation is often highly effective, but worst-case time complexity unknown

Slide 109

Slide 109 text

Key Contributions Search-based test data generation is often highly effective, but worst-case time complexity unknown A technique for automated doubling experiments

Slide 110

Slide 110 text

Key Contributions Search-based test data generation is often highly effective, but worst-case time complexity unknown A technique for automated doubling experiments Empirical suggestions for worst-case time complexity

Slide 111

Slide 111 text

Key Contributions Search-based test data generation is often highly effective, but worst-case time complexity unknown A technique for automated doubling experiments Empirical suggestions for worst-case time complexity Tradeoffs in search-based test data generation

Slide 112

Slide 112 text

Key Contributions Search-based test data generation is often highly effective, but worst-case time complexity unknown A technique for automated doubling experiments Empirical suggestions for worst-case time complexity Tradeoffs in search-based test data generation https://github.com/kinneerc/ExpOse