Hunting for Bad Data
A Practitioner’s Guide to Self Healing Systems
@EmanuilSlavov
Slide 2
A Tale of Two Defects
Slide 3
Slide 4
Slide 5
Defects
Manifestations
Resources
Logs
Data
Slide 6
Test your data the way
you test your code.
Slide 7
Slide 8
Gartner, 2013
IBM, 2016
Ovum, 2014
Slide 9
5% of our data was bad. Any operation
on it would cause an exception or a defect.
Slide 10
19% of our backend exceptions
were caused by bad data.
Slide 11
What is Bad Data?*
Missing
Bad Format
Unrealistic
Unsynchronized
Conflicting
Duplicated
* The Quartz guide to bad data
github.com/Quartz/bad-data-guide
Slide 12
Two Types
of Checks
Slide 13
Data Sanity Checks
The data is clearly not valid, wrong format,
missing etc.
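A minimal sketch of what a sanity check could look like, assuming records arrive as dictionaries. The field names (`id`, `email`, `created_at`), the email pattern, and the date format are hypothetical choices for illustration, not something taken from the talk.

```python
import re
from datetime import datetime

# Deliberately loose pattern: something@something.something (assumption).
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def sanity_problems(record):
    """Return a list of sanity problems for one record (empty if clean)."""
    problems = []

    # Missing: required fields that are absent or empty.
    for field in ("id", "email", "created_at"):  # hypothetical fields
        if record.get(field) in (None, ""):
            problems.append(f"missing {field}")

    # Bad format: the value is present but clearly malformed.
    email = record.get("email")
    if email and not EMAIL_RE.match(email):
        problems.append("bad email format")

    created = record.get("created_at")
    if created:
        try:
            datetime.strptime(created, "%Y-%m-%d")  # assumed date format
        except ValueError:
            problems.append("bad created_at format")

    return problems


clean = sanity_problems({"id": 1, "email": "a@b.com", "created_at": "2017-03-01"})
malformed = sanity_problems({"id": 2, "email": "not-an-email", "created_at": "01/03/2017"})
incomplete = sanity_problems({"id": 3, "email": "", "created_at": None})
```

Returning a list of problems instead of raising on the first one makes the same function usable both for reporting and for gating writes.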
Slide 14
Business Data Checks
The data looks valid, but does not conform
to the business at hand.
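A business check might look like the sketch below. Every value here would pass a format check, yet the record as a whole makes no sense. The order fields and the three rules are invented for illustration; real rules depend entirely on the domain.

```python
from datetime import date


def business_problems(order):
    """Return business-rule violations for one order (hypothetical rules)."""
    problems = []

    # Each date is valid on its own, but the sequence is impossible.
    if order["shipped_on"] is not None and order["shipped_on"] < order["ordered_on"]:
        problems.append("shipped before ordered")

    # Both numbers are well formed, but their relation is wrong.
    if order["discount"] > order["total"]:
        problems.append("discount exceeds total")

    # Conflicting state: marked as paid with no payment on record.
    if order["status"] == "paid" and order["payment_id"] is None:
        problems.append("paid order without payment")

    return problems


good_order = {
    "ordered_on": date(2017, 3, 1), "shipped_on": date(2017, 3, 2),
    "total": 100, "discount": 10, "status": "paid", "payment_id": "p-1",
}
bad_order = {
    "ordered_on": date(2017, 3, 5), "shipped_on": date(2017, 3, 2),
    "total": 100, "discount": 150, "status": "paid", "payment_id": None,
}
good_result = business_problems(good_order)
bad_result = business_problems(bad_order)
```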
Slide 15
Self Check
Slide 16
Investigate Defect or Exception
Fix Root Cause
Write Automated Test
Slide 17
Investigate Defect or Exception
Caused by Bad Data?
Add Automatic Bad Data Check
Fix Root Cause
Write Automated Test
Yes
No
System Level [database]
Unit Level [codebase]
Slide 18
Check production periodically
Run after automated tests pass*
*on dedicated test environment
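One possible shape for such a database-level check run is a set of named queries, each counting offending rows. This is a sketch using Python's built-in `sqlite3` module with an in-memory database. The `users` and `orders` tables and the three checks are assumptions for illustration, and a real run would connect to the production or test database instead.

```python
import sqlite3

# Each check is a query that returns the number of bad rows (hypothetical checks).
CHECKS = {
    "users_without_email":
        "SELECT COUNT(*) FROM users WHERE email IS NULL OR email = ''",
    "duplicate_emails":
        "SELECT COUNT(*) FROM "
        "(SELECT email FROM users GROUP BY email HAVING COUNT(*) > 1)",
    "orders_without_user":
        "SELECT COUNT(*) FROM orders o "
        "LEFT JOIN users u ON u.id = o.user_id WHERE u.id IS NULL",
}


def run_checks(conn):
    """Run every check and return {check name: number of bad rows}."""
    return {name: conn.execute(sql).fetchone()[0] for name, sql in CHECKS.items()}


# Demo data with one instance of each problem.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users (id, email)
        VALUES (1, 'a@example.com'), (2, 'a@example.com'), (3, NULL);
    INSERT INTO orders (id, user_id) VALUES (10, 1), (11, 99);
""")

report = run_checks(conn)
failed = {name: count for name, count in report.items() if count > 0}
```

Run after the automated tests pass, a non-empty `failed` dictionary would indicate that the tests themselves left bad data behind.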
Slide 19
The Problems
with DB Checks
May take too much time
Data entering the system
Slide 20
Data Input
Data Output
Some Manipulation
DB
Usually covered by input validation
Most likely checks are missing
Read/Write Checks
Slide 21
Checks Before
DB Write
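A check before a database write could be as simple as the sketch below: validate the record at the last moment before it is persisted, and refuse the write if it fails. The `users` table, the `BadDataError` class, and the validation rules are hypothetical; `sqlite3` stands in for whatever database the system uses.

```python
import sqlite3


class BadDataError(ValueError):
    """Raised when a record fails its pre-write check (hypothetical name)."""


def validate_user(user):
    # Illustrative rules only; real ones come from your own defects.
    if not user.get("email") or "@" not in user["email"]:
        raise BadDataError("email is missing or malformed")
    if not isinstance(user.get("age"), int) or not 0 <= user["age"] <= 150:
        raise BadDataError("age is missing or unrealistic")


def save_user(conn, user):
    """Validate first; only clean records reach the database."""
    validate_user(user)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO users (email, age) VALUES (?, ?)",
            (user["email"], user["age"]),
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT, age INTEGER)")

save_user(conn, {"email": "a@example.com", "age": 30})
try:
    save_user(conn, {"email": "b@example.com", "age": 999})
    rejected = None
except BadDataError as exc:
    rejected = str(exc)

stored = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

Unlike a periodic scan, this check costs only one record's worth of work per write, which addresses the concern that database-wide checks may take too long.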
Slide 22
Schema
vs
NoSchema
Slide 23
DB Schema Advantages
Data Type
Default Value
Permitted Values
Nullable
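The four schema features above can be sketched with SQLite through Python's `sqlite3` module. The `accounts` table is a made-up example. Note one SQLite-specific assumption: ordinary SQLite tables do not strictly enforce column types, so the data type rule is expressed here as a `CHECK` on `typeof()`; other databases enforce the declared type directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE accounts (
        id      INTEGER PRIMARY KEY,
        -- Nullable: email must always be present.
        email   TEXT NOT NULL,
        -- Default value plus permitted values.
        status  TEXT NOT NULL DEFAULT 'active'
                CHECK (status IN ('active', 'suspended', 'closed')),
        -- Data type, enforced explicitly because SQLite is loosely typed.
        balance INTEGER NOT NULL DEFAULT 0
                CHECK (typeof(balance) = 'integer')
    )
""")


def try_insert(sql, params):
    """Attempt one insert and report whether the schema accepted it."""
    try:
        with conn:
            conn.execute(sql, params)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


results = [
    try_insert("INSERT INTO accounts (email) VALUES (?)", ("a@example.com",)),
    try_insert("INSERT INTO accounts (email) VALUES (?)", (None,)),
    try_insert("INSERT INTO accounts (email, status) VALUES (?, ?)",
               ("b@example.com", "banana")),
    try_insert("INSERT INTO accounts (email, balance) VALUES (?, ?)",
               ("c@example.com", "lots")),
]
defaults = conn.execute("SELECT status, balance FROM accounts").fetchall()
```

Only the first insert succeeds, and it picks up the default status and balance without the application supplying them.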
Slide 24
github.com/emanuil/kobold
Slide 25
Slide 26
Slide 27
As testers, our job does not
end when we release a feature.
Slide 28
Data Repair
Slide 29
Automatic detection is good,
but automatic repair is better.
Slide 30
It's faster to fix data than code
The offending code might no longer be there
It might be hard to find what caused the bad data
Future defect prevention
Slide 31
Standard
Fixes
Slide 32
Remove an entry
Set a default value
Extract missing value from metadata
Approximate from neighboring data
Request missing data again
Archive/delete old data
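Three of the fixes listed above, sketched as small Python functions over in-memory records. The function names, record fields, and the choice of a simple two-neighbor average are assumptions for illustration; the remaining fixes (metadata extraction, re-requesting data, archiving) depend too much on the system to sketch generically.

```python
def set_default(records, field, default):
    """Set a default value wherever the field is missing; return the fix count."""
    fixed = 0
    for record in records:
        if record.get(field) is None:
            record[field] = default
            fixed += 1
    return fixed


def remove_entries(records, is_bad):
    """Remove entries that cannot be repaired; return the ones that are kept."""
    return [record for record in records if not is_bad(record)]


def approximate_from_neighbors(values):
    """Fill an isolated missing value with the mean of its two neighbors."""
    out = list(values)
    for i in range(1, len(out) - 1):
        if out[i] is None and out[i - 1] is not None and out[i + 1] is not None:
            out[i] = (out[i - 1] + out[i + 1]) / 2
    return out


users = [
    {"id": 1, "locale": "en"},
    {"id": 2, "locale": None},
    {"id": None, "locale": "de"},
]
defaults_set = set_default(users, "locale", "en")
kept = remove_entries(users, lambda r: r["id"] is None)

# Gaps wider than one reading are left alone rather than guessed.
readings = approximate_from_neighbors([10, None, 14, None, None, 20])
```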
Slide 33
Those standard fixes
are easy to script.
Slide 34
Run the script automatically
at a regular interval to self-heal
your system.
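A single self-healing pass might be sketched as detect, repair, then detect again to confirm. The `users` table and the "missing status becomes 'active'" rule are hypothetical. Scheduling is left out on purpose: a cron job or any other scheduler would simply call `heal_once` at the chosen interval.

```python
import sqlite3


def count_bad(conn):
    """Detection: how many rows currently violate the rule?"""
    return conn.execute(
        "SELECT COUNT(*) FROM users WHERE status IS NULL"
    ).fetchone()[0]


def heal_once(conn):
    """One pass: detect, apply the standard fix, then verify."""
    before = count_bad(conn)
    with conn:  # the repair is committed as a single transaction
        conn.execute("UPDATE users SET status = 'active' WHERE status IS NULL")
    after = count_bad(conn)
    return before, after


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, status TEXT);
    INSERT INTO users (id, status) VALUES (1, 'active'), (2, NULL), (3, NULL);
""")

first_pass = heal_once(conn)
second_pass = heal_once(conn)
```

Logging the `before` count on each pass is worthwhile: a number that never drops to zero between runs suggests the root cause is still producing bad data.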
Slide 35
How to
Start?
Slide 36
Define what is bad data for your context
Examine bugs and exceptions
Put checks in a script and run it periodically
Study the common fixes and script them
Make sure your backups are working