
Highly Reliable Tests Data Management

August 25, 2018

Automated tests are a great tool for regression testing, but they are only as good as the data they use. There are different approaches to test data management, and they all trade off data quality, quantity, and availability. One team may use whatever data is currently present, another an obfuscated subset of production data, and yet another may preload the full database with synthetic data. There are even costly commercial tools that promise to do ‘data virtualization’ magic for you.

However, if you want highly reliable automated tests, the best approach is for each test to create all the data that it needs. With this strategy your tests are reliable and independent; they can run in parallel, on any environment, and on an empty or a dirty database; and they can detect problems they were not specifically written to catch. The tests will also be very stable (we achieved 0.13% flakiness) and fast (we lowered the execution time from 3 hours to less than 3 minutes).

Those great advantages come at a cost, however: you need to completely overhaul your testing framework. This presentation will help you do just that, from deciding which interfaces to use for data insertion to abstracting this low-level functionality at the correct level in your framework.

Test data generation at the test case level is only one part of the solution. This presentation will also touch on topics such as random test data generation, strategies for cleaning test data, and how to deal with test data if you’re using service virtualization when testing against third-party services outside of your control.
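
Random test data generation can be sketched as follows. The deck's slide on random tweets names three building blocks: a random sentence, a constant string, and a special character. The word list, the special-character set, the mention format, and the seeding approach below are assumptions made for illustration, not the generator used in the talk.

```python
import random
import string

# Assumed vocabulary and special characters; any realistic source would do.
WORDS = ["provident", "ipsa", "dolor", "excepturi", "quo", "asperiores", "animi"]
SPECIAL = ["'", '"', "&", "<", ">", "%", "\\"]


def random_sentence(rng, n_words=5):
    """Build a capitalized sentence from randomly chosen words."""
    words = [rng.choice(WORDS) for _ in range(n_words)]
    return " ".join(words).capitalize() + "."


def random_mention(rng, length=8):
    """Build a hypothetical @mention from lowercase letters."""
    return "@" + "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def random_tweet(rng, constant="random tweet"):
    """Combine a random sentence, a special character, a mention, and a constant string.

    The constant string lets a test find its own record later.
    """
    parts = [random_sentence(rng), rng.choice(SPECIAL), random_mention(rng), constant]
    rng.shuffle(parts)
    return " ".join(parts)


# A seeded generator makes a failing input reproducible.
rng = random.Random(42)
print(random_tweet(rng))
```

Recording the seed with each run is one way to replay the exact input that triggered a failure.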




  1. Falcon’s flaky test rate: 0.13%. Google’s flaky test rate: 1.5%*

    *Flaky Tests at Google and How We Mitigate Them @EmanuilSlavov
  2. Each test creates all the data that it needs. The way we achieved this.
  3. The time needed to create data for one test: call 12 API endpoints, modify data in 11 tables; takes about 1.2 seconds. And then the test starts.
  4. Random tweet generation. Building blocks: Random Sentence, Constant String, Special Character. Sample fragments: "Eum odit omnis impedit officia adipisci id non."; "random tweet"; "''"; "Provident ipsa dolor excepturi quo asperiores animi. @someMention &"; "Dignissimos eos accusamus aut ratione [email protected]"; "Ut optio illum libero. Natus accusantium aliquam dolore atque voluptatum et a. http://ryanpacocha.biz/nikita".
  5. Existing Tools (March 2016). Features compared: Transparent, Fake SSL certs, Dynamic Responses, Persist State, Return Binary Data, Regex URL match. Tools compared: Stubby4J, WireMock, Wilma, soapUI, MockServer, mountebank, Hoverfly, Mirage.
  6. Advantages: independent (run in isolation); run in random order (do all the state setting); run in parallel (to bring speed); run on any database (only schema is needed); easy to investigate (independent data per test); catch more bugs (using realistic generators).
  7. Use an official interface to insert the test data. Be careful with write operations when testing in production. Don’t expose test-only endpoints to the outside world.
  8. Use a dedicated test data set. Use (sanitized) production data. Seed a DB with test data before all tests start.
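
The random-order and parallel-execution advantages listed in the slides can be illustrated with a small sketch. The `Store` class, the `make_test` helper, and the runner below are hypothetical stand-ins, not part of the framework described in the talk; they only show that tests which create their own uniquely keyed data do not depend on execution order or on each other.

```python
import random
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor


class Store:
    """Hypothetical shared data store, standing in for a real database."""

    def __init__(self):
        self._lock = threading.Lock()
        self.rows = {}

    def insert(self, value):
        key = uuid.uuid4().hex  # unique key per record, so tests never collide
        with self._lock:
            self.rows[key] = value
        return key

    def get(self, key):
        with self._lock:
            return self.rows[key]


def make_test(n):
    """Return a test that creates its own record and checks only that record."""
    def test(store):
        key = store.insert(n)
        assert store.get(key) == n
        return True
    return test


def run_shuffled_in_parallel(tests, store, seed, workers=4):
    """Run the tests in a seeded random order on a thread pool."""
    order = list(tests)
    random.Random(seed).shuffle(order)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces evaluation and re-raises any assertion failure.
        return list(pool.map(lambda t: t(store), order))


store = Store()
results = run_shuffled_in_parallel([make_test(i) for i in range(20)], store, seed=7)
```

Changing the seed or the worker count should not change the outcome, which is the property a shared, pre-seeded data set usually cannot guarantee.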