Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Testing database-driven applications: Challenges and solutions

Testing database-driven applications: Challenges and solutions

Interested in learning more about this topic? Read the overview of my research to learn more: https://www.gregorykapfhammer.com/research/

Gregory Kapfhammer

May 14, 2004
Tweet

More Decks by Gregory Kapfhammer

Other Decks in Research

Transcript

  1. Testing Database-Driven Applications:
    Challenges and Solutions
    Gregory M. Kapfhammer
    Department of Computer Science
    University of Pittsburgh
    Department of Computer Science
    Allegheny College
    Mary Lou Soffa
    Department of Computer Science
    University of Pittsburgh
    Database drIven Application T esting tOol ModuleS, IBM T.J. Watson Research Center, May 14, 2004 – p. 1/32

    View full-size slide

  2. Outline
    Introduction and Motivation
    Testing Challenges
    Database-Driven Applications
    A Unified Representation
    Test Adequacy Criteria
    Test Suite Execution
    Test Coverage Monitoring
    Conclusions and Resources
    Database drIven Application T esting tOol ModuleS, IBM T.J. Watson Research Center, May 14, 2004 – p. 2/32

    View full-size slide

  3. Motivation
    The Risks Digest, Volume 22, Issue 64, 2003
    Jeppesen reports airspace boundary problems
    About 350 airspace boundaries contained in Jeppesen
    NavData are incorrect, the FAA has warned. The error
    occurred at Jeppesen after a software upgrade when
    information was pulled from a database containing
    20,000 airspace boundaries worldwide for the March
    NavData update, which takes effect March 20.
    Database drIven Application T esting tOol ModuleS, IBM T.J. Watson Research Center, May 14, 2004 – p. 3/32

    View full-size slide

  4. Testing Challenges
    Should consider the environment in which software
    applications execute
    Must test a program and its interaction with a database
    Database-driven application’s state space is
    well-structured, but infinite (Chays et al.)
    Need to show program does not violate database
    integrity, where integrity = consistency + validity (Motro)
    Must locate program and database coupling points that
    vary in granularity
    Database drIven Application T esting tOol ModuleS, IBM T.J. Watson Research Center, May 14, 2004 – p. 4/32

    View full-size slide

  5. Testing Challenges
    The structured query language’s (SQL) data
    manipulation language (DML) and data definition
    language (DDL) have different interaction
    characteristics
    Database state changes cause modifications to the
    program representation
    Different kinds of test suites require different
    techniques for managing database state during testing
    Database drIven Application T esting tOol ModuleS, IBM T.J. Watson Research Center, May 14, 2004 – p. 5/32

    View full-size slide

  6. Testing Challenges
    The many testing challenges include, but are not
    limited to, the following:
    Unified program representation
    Family of test adequacy criteria
    Efficient test coverage monitoring techinques
    Intelligent approaches to test suite execution
    Database drIven Application T esting tOol ModuleS, IBM T.J. Watson Research Center, May 14, 2004 – p. 6/32

    View full-size slide

  7. Database-Driven Applications
    P
    m i
    m
    j
    Dl
    Dk
    R
    R2
    1
    E F G H
    A B C D
    I
    R3
    J K L
    Program P interacts with two relational
    databases
    Database drIven Application T esting tOol ModuleS, IBM T.J. Watson Research Center, May 14, 2004 – p. 7/32

    View full-size slide

  8. Database Interaction Levels
    Database Level
    D1
    P
    Dn
    A program can interact with a
    database at different levels of
    granularity
    Database drIven Application T esting tOol ModuleS, IBM T.J. Watson Research Center, May 14, 2004 – p. 8/32

    View full-size slide

  9. Database Interaction Levels
    UserInfo
    user_name
    4
    acct_lock
    1 Brian Zorman
    2 Robert Roos
    3
    card_number pin_number
    Geoffrey Arnold
    0
    0
    0
    0
    32142
    41601
    45322
    56471
    Marcus Bittman
    Relation Level
    P
    D1
    Dn
    Database drIven Application T esting tOol ModuleS, IBM T.J. Watson Research Center, May 14, 2004 – p. 8/32

    View full-size slide

  10. Database Interaction Levels
    UserInfo
    user_name
    4
    acct_lock
    1 Brian Zorman
    2 Robert Roos
    3
    card_number pin_number
    Geoffrey Arnold
    0
    0
    0
    0
    32142
    41601
    45322
    56471
    Marcus Bittman
    Record Level
    P
    n
    D1
    D
    Database drIven Application T esting tOol ModuleS, IBM T.J. Watson Research Center, May 14, 2004 – p. 8/32

    View full-size slide

  11. Database Interaction Levels
    UserInfo
    4
    acct_lock
    1 Brian Zorman
    2 Robert Roos
    3
    card_number pin_number
    0
    0
    0
    0
    32142
    41601
    45322
    56471
    user_name
    Attribute Level
    Marcus Bittman
    Geoffrey Arnold
    P
    D1
    Dn
    Database drIven Application T esting tOol ModuleS, IBM T.J. Watson Research Center, May 14, 2004 – p. 8/32

    View full-size slide

  12. Database Interaction Levels
    UserInfo
    4
    acct_lock
    1 Brian Zorman
    2 Robert Roos
    3
    card_number pin_number
    Geoffrey Arnold
    0
    0
    0
    0
    32142
    41601
    45322
    56471
    user_name
    Attribute Value Level
    Marcus Bittman
    P
    D1
    n
    D
    Database drIven Application T esting tOol ModuleS, IBM T.J. Watson Research Center, May 14, 2004 – p. 8/32

    View full-size slide

  13. Database Interaction Points
    Database interaction point Ir ∈ I corresponds to the
    execution of a SQL DML statement
    Consider a simplified version of SQL and ignore SQL
    DDL statements (for the moment)
    Interaction points are normally encoded within Java
    programs as dynamically constructed Strings
    select uses Dk
    , delete defines Dk
    , insert defines Dk
    ,
    update defines and/or uses Dk
    Database drIven Application T esting tOol ModuleS, IBM T.J. Watson Research Center, May 14, 2004 – p. 9/32

    View full-size slide

  14. Database Interaction Points (DML)
    select A1, A2, . . . , Aq
    from r1, r2, . . . , rm
    where Q
    delete from r
    where Q
    insert into r(A1, A2, . . . , Aq
    )
    values(v1, v2, . . . , vq
    )
    update r
    set Al
    = F(Al
    )
    where Q
    Database drIven Application T esting tOol ModuleS, IBM T.J. Watson Research Center, May 14, 2004 – p. 10/32

    View full-size slide

  15. Refined Database-Driven Application
    P
    m i
    m
    j
    R
    R2
    1
    E F G H
    A B C D
    l
    D
    k
    D
    where
    set J = 500
    update
    L < 1000
    R3
    select 1
    * from R
    R2
    from )
    select
    where A < ( avg(G)
    I
    R3
    J K L
    Database drIven Application T esting tOol ModuleS, IBM T.J. Watson Research Center, May 14, 2004 – p. 11/32

    View full-size slide

  16. Test Adequacy Criteria
    P violates a database Dk
    ’s validity when it:
    (1-v) inserts entities into Dk
    that do not reflect real
    world
    P violates a database Dk
    ’s completeness when it:
    (1-c) deletes entities from Dk
    that still reflect real
    world
    In order to verify (1-v) and (1-c), T must cause P to
    define and then use entities within D1
    , . . . , Dn
    !
    Database drIven Application T esting tOol ModuleS, IBM T.J. Watson Research Center, May 14, 2004 – p. 12/32

    View full-size slide

  17. Data Flow Information
    Interaction point: ‘‘UPDATE UserInfo SET
    acct lock=1 WHERE card number=’’ +
    card number + ‘‘;’’;
    Database Level: define(BankDB)
    Attribute Level: define(acct_lock) and
    use(card_number)
    Data flow information varies with respect to
    the granularity of the database interaction
    Database drIven Application T esting tOol ModuleS, IBM T.J. Watson Research Center, May 14, 2004 – p. 13/32

    View full-size slide

  18. Database Entities
    UserInfo user_name
    4
    acct_lock
    1 Brian Zorman
    2 Robert Roos
    3
    card_number pin_number
    Marcus Bittman
    Geoffrey Arnold
    41601
    45322
    56471
    32142
    0
    0
    0
    0
    v r
    A (I ) = { 32142 }
    1 Geoffrey Arnold 0
    , , . . . , ,
    Database drIven Application T esting tOol ModuleS, IBM T.J. Watson Research Center, May 14, 2004 – p. 14/32

    View full-size slide

  19. The DICFG: A Unified Representation
    entry lockAccount
    temp1 = parameter0:c_n
    temp2 = LocalDatabaseEntity0:Bank
    temp3 = LocalDatabaseEntity1:acct_lock
    temp4 = LocalDatabaseEntity2:card_number
    “Database-enhanced”
    CFG for lockAccount
    Define temporaries to
    represent the
    program’s interaction
    at the levels of
    database and attribute
    Database drIven Application T esting tOol ModuleS, IBM T.J. Watson Research Center, May 14, 2004 – p. 15/32

    View full-size slide

  20. The DICFG: A Unified Representation
    exit
    G
    G G
    G
    r
    r2
    r 2
    r1
    1
    entry entry
    exit
    lockAccount
    update_lock = m_connect.createStatement()
    if( result_lock == 1)
    completed = true
    exit
    D
    qu_lck = "UPDATE UserInfo ..." + temp1 + ";"
    use(temp4)
    result_lock = update_lock.executeUpdate(qu_lck)
    define(temp2)
    A
    Ir
    define(temp3)
    Database interaction
    graphs (DIGs) are
    placed before interaction
    point Ir
    Multiple DIGs can be
    integrated into a single
    CFG
    String at Ir is
    determined in a
    control-flow sensitive
    fashion
    Database drIven Application T esting tOol ModuleS, IBM T.J. Watson Research Center, May 14, 2004 – p. 15/32

    View full-size slide

  21. Test Adequacy Criteria
    all−attribute−value−DUs
    all−record−DUs all−attribute−DUs
    all−relation−DUs all−database−DUs
    Database interaction
    association (DIA) involves the
    def and use of a database
    entity
    DIAs can be located in the
    DICFG with data flow analysis
    all-database-DUs requires
    tests to exercise all DIAs for all
    of the accessed databases
    Database drIven Application T esting tOol ModuleS, IBM T.J. Watson Research Center, May 14, 2004 – p. 16/32

    View full-size slide

  22. Generating Test Requirements
    Database Seeder
    Database
    (P)
    Test Adequacy Criterion
    (C)
    System Under Test
    Test Case Specification
    Relational Schema
    Requirements
    Test
    Measured time and space overhead
    when computing family of test adequacy
    criteria
    Modified ATM and mp3cd to contain
    appropriate database interaction points
    Soot 1.2.5 to calculate intraprocedural
    associations
    GNU/Linux workstation with kernel
    2.4.18-smp and dual 1 GHz Pentium III
    Xeon processors
    Database drIven Application T esting tOol ModuleS, IBM T.J. Watson Research Center, May 14, 2004 – p. 17/32

    View full-size slide

  23. Counting Associations and Definitions
    D Rc Rl A Av
    Database Granularity
    0
    250
    500
    750
    1000
    1250
    1500
    1750
    Assoc & Def Count
    D Rc Rl A Av
    mp3cd HD
    ATM HD
    mp3cd DB
    ATM DB
    DIAs at attribute value level represent 16.8% of mp3cd’s and
    9.6% of ATM’s total number of intraprocedural associations
    – p. 18/32

    View full-size slide

  24. Measuring Time Overhead
    None D Rc Rl A Av
    Database Granularity
    22.5
    25
    27.5
    30
    32.5
    35
    37.5
    System Time sec
    None D Rc Rl A Av
    Time Overhead
    mp3cd
    ATM
    Computing DIAs at the attribute value level incurs no more
    than a 5 second time overhead
    – p. 19/32

    View full-size slide

  25. Measuring Average Space Overhead
    None D Rc Rl A Av
    Database Granularity
    16
    18
    20
    22
    24
    26
    28
    30
    Node & Edge Count
    None D Rc Rl A Av
    Emp3
    Eatm
    Nmp3
    Natm
    mp3cd shows a more marked increase in the average
    number of nodes and edges than ATM – p. 20/32

    View full-size slide

  26. Measuring Maximum Space Overhead
    None D Rc Rl A Av
    Database Granularity
    200
    400
    600
    800
    1000
    1200
    1400
    Node & Edge Count
    None D Rc Rl A Av
    Emp3
    Eatm
    Nmp3
    Natm
    mp3cd shows a significantly greater maximum space
    overhead than ATM
    – p. 21/32

    View full-size slide

  27. Automatic Representation Construction
    Manual construction of DICFGs is not practical
    Use extension of BRICS Java String Analyzer (JSA) to
    determine content of String at Ir
    Per-class analysis is inter-procedural and control flow
    sensitive
    Conservative analysis might determine that all
    database entities are accessed
    Include coverage monitoring instrumentation to track
    DIGs that are covered during test suite execution
    Database drIven Application T esting tOol ModuleS, IBM T.J. Watson Research Center, May 14, 2004 – p. 22/32

    View full-size slide

  28. Tracking Covered DIGs and DIAs
    DB
    P DIGr
    m i
    m
    j
    DIGs
    DIG #
    1
    DEF USE
    { ... } { ... }
    q { ... } { ... }
    1
    1
    2
    2
    TEST
    { ... }
    { ... }
    COV?
    DIG Coverage Table
    DIA coverage can be tracked by recording which DIGs
    within a DICFG were executed during testing
    Database drIven Application T esting tOol ModuleS, IBM T.J. Watson Research Center, May 14, 2004 – p. 23/32

    View full-size slide

  29. Types of Test Suites
    T
    T

    1
    e
    m
    1

    1
    e

    0
    m
    e
    ∆e−1
    Independent

    m
    T1 1
    0
    ∆1
    ε−1

    Tε ε
    m
    ε

    e

    e
    m
    Te
    ∆e
    Partially Independent
    T
    T

    1
    e
    m
    1

    1
    e

    0
    m
    e
    ∆e−1
    Non-restricted
    – p. 24/32

    View full-size slide

  30. Test Suite Execution
    Independent test suites can be executed by using
    provided setup code to ensure that all ∆γ
    = ∆0
    Non-restricted test suites simply allow state to accrue
    Partially independent test suites must return to ∆ε
    after

    is executed by :
    1. Re-executing all SQL statements that resulted in
    the creation of ∆ε
    2. Creating a compensating transaction to undo the
    SQL statements executed by each test after Tε
    Database drIven Application T esting tOol ModuleS, IBM T.J. Watson Research Center, May 14, 2004 – p. 25/32

    View full-size slide

  31. Representation Extension
    The execution of a SQL insert during testing requires
    the re-creation of DICFG(s)
    The SQL delete does not require re-creation because
    we must still determine if deleted entity is ever used
    DICFG re-creation only needed when database
    interactions are viewed at the record or attribute-value
    level
    Representation extension ripples to other methods
    DICFGs can be re-constructed after test suite has
    executed, thus incurring smaller time overhead
    Database drIven Application T esting tOol ModuleS, IBM T.J. Watson Research Center, May 14, 2004 – p. 26/32

    View full-size slide

  32. Test Coverage Monitoring
    For each tested method mi
    that interacts with a
    database and each interaction point Ir
    that involves an
    insert we must:
    1. Update the DICFG
    2. Re-compute the test requirements
    We can compute the set of covered DIAs by consulting
    the DIG coverage table
    Test adequacy is : # covered DIAs / # total DIAs
    Database drIven Application T esting tOol ModuleS, IBM T.J. Watson Research Center, May 14, 2004 – p. 27/32

    View full-size slide

  33. Calculating Adequacy
    m
    i
    mj
    m
    i
    DIA


    COV?


    Test Requirements
    DIA COV?






    Test Requirements
    m
    j
    Tf
    cov(mi
    ) = 2
    4
    cov(mj
    ) = 4
    6
    cov(Tf
    ) = 6
    10
    Database drIven Application T esting tOol ModuleS, IBM T.J. Watson Research Center, May 14, 2004 – p. 28/32

    View full-size slide

  34. Related Work
    Jin and Offutt and Whittaker and Voas have suggested that
    the environment of a software system is important
    Chan and Cheung transform SQL statements into C code
    segments
    Chays et al. and Chays and Deng have created the
    category-partition inspired AGENDA tool suite
    Neufeld et al. and Zhang et al. have proposed techniques
    for database state generation
    Dauo et al. focused on the regression testing of
    database-driven applications
    Database drIven Application T esting tOol ModuleS, IBM T.J. Watson Research Center, May 14, 2004 – p. 29/32

    View full-size slide

  35. Conclusions
    Must test the program’s interaction with the database
    Many challenges associated with (1) unified program
    representation, (2) test adequacy criteria, (3) test coverage
    monitoring, (4) test suite execution
    The DICFG shows database interactions at varying levels of
    granularity
    Unique family of test adequacy criteria to detect type (1)
    violations of database validity and completeness
    Intraprocedural database interactions can be computed from
    a DICFG with minimal time and space overhead
    Database drIven Application T esting tOol ModuleS, IBM T.J. Watson Research Center, May 14, 2004 – p. 30/32

    View full-size slide

  36. Conclusions
    Test coverage monitoring instrumentation supports the
    tracking of DIAs executed during testing
    Three types of test suites require different techniques to
    manage the state of the database
    SQL insert statement causes the re-creation of the
    representation and re-computation of test requirements
    Data flow-based test adequacy criteria can serve as the
    foundation for automatically generating test cases and
    supporting regression testing
    Database drIven Application T esting tOol ModuleS, IBM T.J. Watson Research Center, May 14, 2004 – p. 31/32

    View full-size slide

  37. Resources
    Gregory M. Kapfhammer and Mary Lou Soffa. A Family of
    Test Adequacy Criteria for Database-Driven Applications.
    In FSE 2003.
    Gregory M. Kapfhammer. Software Testing. CRC Press
    Computer Science Handbook. June, 2004.
    http://cs.allegheny.edu/˜gkapfham/research/diatoms/
    Database drIven Application T esting tOol ModuleS, IBM T.J. Watson Research Center, May 14, 2004 – p. 32/32

    View full-size slide