Automatically evaluating the efficiency of search-based test data generation for relational database schemas

Interested in learning more about this topic? Visit this web site to read the paper: https://www.gregorykapfhammer.com/research/papers/Kinneer2015/

Gregory Kapfhammer

July 15, 2015

Transcript

  1. Automatically Evaluating the Efficiency of Search-Based Test Data Generation (for Relational Database Schemas). Cody Kinneer, SEKE 2015, July 7, 2015
  2. Search-Based Testing. Often much more effective than random testing. [plot omitted]
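The claim about random testing can be made concrete with a toy example (my illustration, not taken from the paper or from SchemaAnalyst): to find an input that satisfies a narrow CHECK-style predicate, a search guided by a distance-based fitness function typically needs far fewer evaluations than uniform random sampling.

    # Toy comparison of fitness-guided search and random testing; the predicate,
    # the fitness, and the search are all illustrative, not the paper's setup.
    import random

    TARGET = 4242
    predicate = lambda x: x == TARGET        # stands in for a narrow constraint, e.g. CHECK (price = 4242)
    fitness = lambda x: abs(x - TARGET)      # distance to a satisfying value

    def random_testing(budget=100_000):
        """Sample uniformly at random; this usually exhausts the budget."""
        for evals in range(1, budget + 1):
            if predicate(random.randint(-10**6, 10**6)):
                return evals
        return None

    def guided_search(budget=100_000):
        """Alternating-variable-style search: exponentially growing moves
        in whichever direction lowers the fitness."""
        x = random.randint(-10**6, 10**6)
        evals = 0
        while evals < budget and not predicate(x):
            direction = -1 if fitness(x - 1) < fitness(x) else 1
            step = 1
            while fitness(x + direction * step) < fitness(x):
                x += direction * step
                step *= 2
                evals += 1
            evals += 1
        return evals if predicate(x) else None

    print("random testing evaluations:", random_testing())
    print("guided search evaluations: ", guided_search())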
  3. Performance of SBST. Parameters: Fitness Function, Data Generator, Restart Rule, Stop Rule, Search Budget. How do parameter values influence the efficiency of SBST?
  4. Relational Databases. Deployment locations for databases: Database Application Server, Mobile Phone or Tablet, Office and Productivity Software, Government
  5. Relational Databases. Deployment locations for databases: Database Application Server, Mobile Phone or Tablet, Office and Productivity Software, Government, Astrophysics
  6. Database Schemas. A Relational Database Management System hosts schemas (e.g., an E-commerce Schema, a State Schema) with integrity constraints: PRIMARY KEY, FOREIGN KEY, and arbitrary CHECK
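For readers unfamiliar with these constraint types, here is a minimal, hypothetical two-table schema (my example, unrelated to the schemas studied in the paper) that exercises a PRIMARY KEY, a FOREIGN KEY, and an arbitrary CHECK, driven through Python's sqlite3 module.

    # Minimal illustration of the three constraint kinds named on the slide.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FOREIGN KEYs when enabled

    conn.executescript("""
    CREATE TABLE customer (
        id  INTEGER PRIMARY KEY,              -- PRIMARY KEY constraint
        age INTEGER CHECK (age >= 18)         -- arbitrary CHECK constraint
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer (id)  -- FOREIGN KEY constraint
    );
    """)

    conn.execute("INSERT INTO customer VALUES (1, 30)")      # satisfies every constraint
    try:
        conn.execute("INSERT INTO customer VALUES (2, 10)")  # violates the CHECK
    except sqlite3.IntegrityError as error:
        print("rejected:", error)

Generating rows on both sides of each constraint is the kind of test data that SchemaAnalyst, discussed next, produces with search.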
  7. Database Testing. The Data Warehouse Institute reports that North American organizations experience a $611 billion annual loss due to poor data quality
  8. Database Testing. The Data Warehouse Institute reports that North American organizations experience a $611 billion annual loss due to poor data quality. Scott W. Ambler argues that the “virtual absence” of database testing — the validation of the contents, schema, and functionality of the database — is the primary cause of this loss
  9. Database Testing. The Data Warehouse Institute reports that North American organizations experience a $611 billion annual loss due to poor data quality. Scott W. Ambler argues that the “virtual absence” of database testing — the validation of the contents, schema, and functionality of the database — is the primary cause of this loss. Past papers presented SchemaAnalyst, a search-based system for testing the complex integrity constraints in relational schemas
  10. Method of Approach (diagram). A Doubler Choice selects a Schema Doubler, which provides the Database Schema; SchemaAnalyst Execution, configured with a Coverage Criterion and a Data Generator, runs on that schema and reports a Runtime
  11. Method of Approach (diagram). A Doubler Choice selects a Schema Doubler, which provides the Database Schema; SchemaAnalyst Execution, configured with a Coverage Criterion and a Data Generator, reports a Runtime; a Convergence Algorithm inspects the Runtime and decides whether to Continue doubling
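In outline, the method doubles the schema, reruns the data generator, and lets a convergence algorithm decide whether another doubling is needed. A minimal sketch of that loop follows; double_schema, measure_runtime, and the convergence tolerance are hypothetical stand-ins, not the actual SchemaAnalyst or ExpOse code.

    # Sketch of an automated doubling experiment under the assumptions above.
    def doubling_experiment(schema, double_schema, measure_runtime,
                            tolerance=0.1, max_doublings=20):
        """Double the schema until the ratio of successive runtimes stabilizes,
        then return the observed ratios."""
        ratios = []
        previous_runtime = measure_runtime(schema)
        for _ in range(max_doublings):
            schema = double_schema(schema)         # e.g., double the columns
            runtime = measure_runtime(schema)      # run the data generator and time it
            ratios.append(runtime / previous_runtime)
            previous_runtime = runtime
            # Convergence check: the last two ratios agree within the tolerance.
            if len(ratios) >= 2 and abs(ratios[-1] - ratios[-2]) <= tolerance * ratios[-1]:
                break
        return ratios

The converged ratios are what the later result slides interpret as evidence for O(n), O(n²), and so on.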
  12. Doubling Schemas (diagram). A Table with Column 1, Column 2, ..., Column n and constraints: NOT NULL, PRIMARY KEY
  13. Doubling Schemas (diagram). A Table with Column 1, Column 2, ..., Column n and constraints: NOT NULL, PRIMARY KEY, UNIQUE
  14. Doubling Schemas (diagram). A Table with Column 1, Column 2, ..., Column n and constraints: NOT NULL, PRIMARY KEY, UNIQUE, CHECK
  15. Doubling Schemas (diagram). A Table with Column 1, Column 2, ..., Column n and constraints: NOT NULL, PRIMARY KEY, UNIQUE, CHECK, FOREIGN KEY
  16. Doubling Schemas (diagram). A Table with Column 1, Column 2, ..., Column n and constraints: NOT NULL, PRIMARY KEY, UNIQUE, CHECK, FOREIGN KEY, NOT NULL
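One plausible reading of the column-doubling step pictured above (a sketch under my own assumptions about the data model, not SchemaAnalyst's implementation): replicate every column of a table and carry its constraints along with the copy.

    # Sketch of column doubling for one table; Column and Table are hypothetical
    # stand-ins, not SchemaAnalyst's actual schema representation.
    from dataclasses import dataclass, field

    @dataclass
    class Column:
        name: str
        constraints: list  # e.g., ["NOT NULL", "UNIQUE", "CHECK (x > 0)"]

    @dataclass
    class Table:
        name: str
        columns: list = field(default_factory=list)

    def double_columns(table: Table) -> Table:
        """Return a copy of the table with every column duplicated, each copy
        carrying its column-level constraints (NOT NULL, UNIQUE, CHECK, ...)."""
        doubled = Table(table.name)
        for col in table.columns:
            doubled.columns.append(Column(col.name, list(col.constraints)))
            doubled.columns.append(Column(col.name + "_copy", list(col.constraints)))
        return doubled

Doubling the tables or the constraints instead of the columns would follow the same pattern.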
  17. Experiments. Experimental parameters: Coverage Criterion, Data Generator, Doubling Technique, Database Schema. Over 2,000 unique combinations of parameters! Experiments ran on an HPC cluster with 3,440 cores
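The parameter space is the Cartesian product of those choices; a sketch of how the configurations might be enumerated is below (the criterion, generator, and doubling names are placeholders, not the paper's exact lists).

    # Enumerate experiment configurations as a Cartesian product of the
    # parameters named on the slide; the concrete values are placeholders.
    import itertools

    coverage_criteria = ["criterion_A", "criterion_B", "criterion_C"]
    data_generators   = ["generator_A", "generator_B", "generator_C"]
    doubling_choices  = ["columns", "tables", "constraints"]
    schemas           = ["BioSQL", "Cloc", "iTrust", "JWhoisServer", "NistWeather",
                         "NistXTS7", "NistXTS749", "RiskIt", "UnixUsage"]

    configurations = list(itertools.product(
        coverage_criteria, data_generators, doubling_choices, schemas))
    # With the real parameter lists, this product exceeds 2,000 configurations,
    # each of which is an independent doubling experiment on the cluster.
    print(len(configurations), "configurations")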
  18. Relational Schemas

      Schema         Tables   Columns   Constraints
      BioSQL             28       129           186
      Cloc                2        10             0
      iTrust             42       309           134
      JWhoisServer        6        49            50
      NistWeather         2         9            13
      NistXTS7            1         3             3
      NistXTS749          1         3             3
      RiskIt             13        57            36
      UnixUsage           8        32            24
  22. Empirical Results. Doubled UNIQUEs, NOT NULLs, and CHECKs (699 experiments): 8% stopped, 20% O(1) or O(log n), 72% O(n) or O(n log n)
  23. Empirical Results. Doubled UNIQUEs, NOT NULLs, and CHECKs (699 experiments): 8% stopped, 20% O(1) or O(log n), 72% O(n) or O(n log n). SchemaAnalyst ∈ O(n) for the constraints studied
  24. Empirical Results. Doubled Tables (467 experiments): 56% stopped, 72 in O(n²), 10 in O(n³). SchemaAnalyst ∈ O(n³) or worse for tables
  25. Empirical Results. Doubled Columns (467 experiments): 203 stopped, 208 in O(n) or O(n log n), 28 in O(n²), and 2 in O(n³). SchemaAnalyst ∈ O(n³) or worse for columns
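The complexity labels come from the converged doubling ratios: if runtime grows like n^k, then doubling n multiplies the runtime by roughly 2^k, so a ratio near 2 suggests O(n), near 4 suggests O(n²), and near 8 suggests O(n³). A small sketch of that inference follows; the thresholds are my illustrative assumptions, not the paper's exact classification rules.

    import math

    def classify(ratio):
        """Map a converged runtime ratio to a rough big-O class via k = log2(ratio).
        The cut-offs are illustrative assumptions."""
        k = math.log2(ratio)
        if k < 0.5:
            return "O(1) or O(log n)"
        if k < 1.5:
            return "O(n) or O(n log n)"
        if k < 2.5:
            return "O(n^2)"
        return "O(n^3) or worse"

    print(classify(1.1))   # -> O(1) or O(log n)
    print(classify(2.1))   # -> O(n) or O(n log n)
    print(classify(7.8))   # -> O(n^3) or worse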
  26. Key Contributions. Search-based test data generation is often highly effective, but its worst-case time complexity is unknown
  27. Key Contributions. Search-based test data generation is often highly effective, but its worst-case time complexity is unknown. A technique for automated doubling experiments
  28. Key Contributions. Search-based test data generation is often highly effective, but its worst-case time complexity is unknown. A technique for automated doubling experiments. Empirical suggestions for worst-case time complexity
  29. Key Contributions. Search-based test data generation is often highly effective, but its worst-case time complexity is unknown. A technique for automated doubling experiments. Empirical suggestions for worst-case time complexity. Tradeoffs in search-based test data generation
  30. Key Contributions. Search-based test data generation is often highly effective, but its worst-case time complexity is unknown. A technique for automated doubling experiments. Empirical suggestions for worst-case time complexity. Tradeoffs in search-based test data generation. https://github.com/kinneerc/ExpOse