Automated tests are a great tool for regression testing, but they are only as good as the data they use. There are different approaches to test data management, and they all trade off the quality and quantity of the data against its availability. One approach uses whatever data is currently present, another uses an obfuscated subset of production data, and a third preloads the full database with synthetic data. There are even costly commercial tools that promise to do magic ‘data virtualization’ for you.
However, if you want highly reliable automated tests, the best approach is for each test to create all the data that it needs. With this strategy your tests will be reliable and independent. They can run in parallel, on any environment, and on an empty or a dirty database, and they can detect problems they were not specifically written to catch. The tests will also be very stable (we achieved 0.13% flakiness) and fast (we lowered the execution time from 3 hours to less than 3 minutes).
Those great advantages come at a cost, however: you need to completely overhaul your testing framework. This presentation will help you do just that, from deciding which interfaces to use for data insertion to abstracting this low-level functionality at the correct level in your framework.
Test data creation at the test case level is only one part of the solution. This presentation will also touch on topics such as random test data generation, strategies for cleaning test data, and how to deal with test data if you’re using service virtualization when testing against third-party services outside of your control.
HIGHLY RELIABLE TESTS
High Level Automated Tests Problems
*Need for Speed: Accelerate Tests from 3 Hours to 3 Minutes
Falcon’s flaky test rate: 0.13%
Google’s flaky test rate: 1.5%*
*Flaky Tests at Google and How We Mitigate Them
Each test creates all the data that it needs.
The way we achieved this
The time needed to create data for one test
And then the test starts
Call 12 API endpoints
Modify data in 11 tables
Takes about 1.2 seconds
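The idea of a test that creates all of its own data can be sketched roughly as follows. This is only an illustration: FakeApi, create_user, create_post and posts_for are hypothetical names standing in for the application's real API endpoints, which the slides do not list.

```python
import uuid


class FakeApi:
    """In-memory stand-in for the application's public API (hypothetical)."""

    def __init__(self):
        self.users = {}
        self.posts = {}

    def create_user(self, name):
        user_id = str(uuid.uuid4())
        self.users[user_id] = {"id": user_id, "name": name}
        return self.users[user_id]

    def create_post(self, user_id, text):
        if user_id not in self.users:
            raise KeyError(f"unknown user {user_id}")
        post_id = str(uuid.uuid4())
        self.posts[post_id] = {"id": post_id, "user_id": user_id, "text": text}
        return self.posts[post_id]

    def posts_for(self, user_id):
        return [p for p in self.posts.values() if p["user_id"] == user_id]


def scenario_user_sees_own_post(api):
    # Arrange: the test creates every record it depends on,
    # with a unique name so parallel runs cannot collide.
    user = api.create_user(f"user-{uuid.uuid4().hex[:8]}")
    post = api.create_post(user["id"], "hello")
    # Assert only against the data this test created.
    assert api.posts_for(user["id"]) == [post]
```

Because the scenario never reads data it did not create, it gives the same result on an empty or a dirty database and can run alongside other copies of itself.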
Static vs Dynamic Data
Eum odit omnis impedit officia adipisci id non. random tweet ''
Random Sentence Constant String Special Character
random tweet Provident ipsa dolor excepturi quo asperiores animi. @someMention
& random tweet Dignissimos eos accusamus aut ratione
[email protected] random tweet Ut optio illum libero.
Natus accusantium aliquam dolore atque voluptatum et a. http://ryanpacocha.biz/nikita random tweet
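The sample tweets above combine a random sentence, a constant string and a special character in varying order. A minimal sketch of such a generator, assuming a small hypothetical word list and special-character list (WORDS and SPECIALS are illustrative, not the values used in the talk):

```python
import random

# Illustrative inputs; the real generator's vocabulary is not given in the slides.
WORDS = ["natus", "dolor", "optio", "illum", "libero", "ratione", "animi"]
SPECIALS = ["''", "@someMention", "&", "http://example.com/path"]
MARKER = "random tweet"  # constant string that makes test data easy to spot


def random_sentence(rng, min_words=3, max_words=8):
    """Build a sentence of random words from WORDS."""
    count = rng.randint(min_words, max_words)
    words = [rng.choice(WORDS) for _ in range(count)]
    return " ".join(words).capitalize() + "."


def random_tweet(rng):
    """Combine a random sentence, the constant marker and one special token."""
    parts = [random_sentence(rng), MARKER, rng.choice(SPECIALS)]
    rng.shuffle(parts)  # vary the position of each part
    return " ".join(parts)


# Assumption: a seeded generator is used so a failing run can be replayed.
rng = random.Random(42)
tweet = random_tweet(rng)
```

Seeding the generator is a design choice of this sketch rather than something the slides state; it keeps random data reproducible when a test fails.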
Existing Tools (March 2016)
Fake SSL certs
Return Binary Data
Regex URL match
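Two of the capabilities listed above, regex URL matching and binary responses, can be illustrated with a tiny in-process stub. This is a sketch only: StubService, when and handle are hypothetical names, it does not model any of the tools evaluated in the talk, and it leaves out SSL entirely.

```python
import re


class StubService:
    """Minimal in-process stand-in for a third-party HTTP service."""

    def __init__(self):
        self._routes = []  # list of (compiled pattern, status, body bytes)

    def when(self, url_pattern, status=200, body=b""):
        """Register a canned response for URLs matching the regex."""
        self._routes.append((re.compile(url_pattern), status, body))

    def handle(self, url):
        """Return (status, body) for the first route whose regex matches."""
        for pattern, status, body in self._routes:
            if pattern.fullmatch(url):
                return status, body
        return 404, b""


stub = StubService()
# Binary body: the 8-byte PNG signature stands in for a real image.
stub.when(r"/users/\d+/avatar", body=b"\x89PNG\r\n\x1a\n")
status, body = stub.handle("/users/42/avatar")
```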
Independent (run in isolation)
Run in random order (do all the state setting)
Run in parallel (to bring speed)
Run on any database (only schema is needed)
Easy to investigate (independent data per test)
Catch more bugs (using realistic generators)
Tips & Tricks
Use an official interface to insert the test data
Be careful with write operations when testing in production
Don’t expose test-only endpoints to the outside world
Test Data Cleaning
Each Test Deletes its Data
Tag Test Data
In case of a Dedicated Test Environment
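Tagging test data and having each test delete what it created might look roughly like this. The table, the test_tag column and the helper names are assumptions made for illustration, and direct SQL is used only to keep the sketch self-contained.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tweets (id INTEGER PRIMARY KEY, text TEXT, test_tag TEXT)"
)


def insert_tweet(conn, text, test_tag=None):
    """Insert a row; rows created by tests carry a tag."""
    conn.execute(
        "INSERT INTO tweets (text, test_tag) VALUES (?, ?)", (text, test_tag)
    )


def delete_tagged(conn, test_tag):
    """Delete every row created under the given tag; return the row count."""
    cur = conn.execute("DELETE FROM tweets WHERE test_tag = ?", (test_tag,))
    return cur.rowcount


insert_tweet(conn, "real user data")  # untagged row that must survive cleanup
tag = f"test-{uuid.uuid4().hex}"      # unique tag per test run
try:
    insert_tweet(conn, "created by a test", test_tag=tag)
    insert_tweet(conn, "also created by a test", test_tag=tag)
finally:
    # Cleanup runs even if the test body raises.
    removed = delete_tagged(conn, tag)
```

A unique tag per test means cleanup touches only that test's rows, so it stays safe when tests run in parallel against a shared database.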
Other Test Data Strategies
Use a dedicated test data set
Use (sanitized) production data
Seed a DB with test data before all tests start
Sofia · Copenhagen · Budapest