Hunting for Bad Data: A Practitioner’s Guide to Self Healing Systems

@EmanuilSlavov

A Tale of Two Defects

Defect Manifestations: Resources, Logs, Data

Test your data the way you test your code.

Gartner, 2013; IBM, 2016; Ovum, 2014

5% of our data was bad. Any operation on it would
cause an exception or a defect.

19% of our backend exceptions are caused by bad data.

What is Bad Data?*
Missing, Bad Format, Unrealistic, Unsynchronized, Conflicting, Duplicated
* The Quartz guide to bad data: github.com/Quartz/bad-data-guide

Two Types of Checks

Data Sanity Checks
The data is clearly not valid: wrong format, missing values, etc.
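
A sanity check of this kind can be a short script that scans a table and reports rows breaking an obvious rule. The sketch below is only an illustration: the `users` table, its `email` and `created_at` columns, and the email pattern are assumptions, not taken from the talk.

```python
import re
import sqlite3

# Hypothetical rule: an email looks like local@domain.tld.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def find_insane_rows(conn):
    """Return (id, reason) pairs for rows that are clearly invalid."""
    problems = []
    rows = conn.execute("SELECT id, email, created_at FROM users ORDER BY id")
    for row_id, email, created_at in rows:
        if email is None or email.strip() == "":
            problems.append((row_id, "missing email"))
        elif not EMAIL_RE.match(email):
            problems.append((row_id, "bad email format"))
        if created_at is None:
            problems.append((row_id, "missing created_at"))
    return problems


def demo_connection():
    """Build a tiny in-memory database with one good row and two bad ones."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, created_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO users (id, email, created_at) VALUES (?, ?, ?)",
        [
            (1, "ann@example.com", "2017-01-01"),
            (2, "not-an-email", "2017-01-02"),
            (3, None, None),
        ],
    )
    return conn
```

Calling `find_insane_rows(demo_connection())` flags row 2 for its format and row 3 for both missing fields.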

Business Data Checks
The data looks valid but does not conform to the rules of the business at hand.
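
Business checks differ from sanity checks in that every field is well formed on its own; only the combination breaks a rule. As a rough sketch, each rule can be a query returning the ids of offending rows. The `orders` table and both rules are hypothetical examples.

```python
import sqlite3

# Each check is a name plus a query returning ids of offending rows.
# Both rules are hypothetical examples of business constraints.
BUSINESS_CHECKS = {
    "shipped before ordered": (
        "SELECT id FROM orders WHERE shipped_at < ordered_at ORDER BY id"
    ),
    "paid order with zero total": (
        "SELECT id FROM orders "
        "WHERE status = 'paid' AND total_cents = 0 ORDER BY id"
    ),
}


def run_business_checks(conn):
    """Return {check name: [offending ids]} for checks that found something."""
    results = {}
    for name, query in BUSINESS_CHECKS.items():
        ids = [row[0] for row in conn.execute(query)]
        if ids:
            results[name] = ids
    return results
```

The date comparison assumes ISO 8601 strings, which sort correctly as text.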

Self Check

Investigate Defect or Exception -> Fix Root Cause -> Write Automated Test

Investigate Defect or Exception -> Caused by Bad Data?
Yes: Add Automatic Bad Data Check. System Level [database]
No: Fix Root Cause -> Write Automated Test. Unit Level [codebase]

Check production periodically
Run after automated tests pass*
* on a dedicated test environment
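
One way to use the same checks in both places is a small runner whose result can be turned into a process exit code: a scheduled job can alert on it in production, and a pipeline step can fail on it after the automated tests finish. The check below and the `users` table are assumptions for illustration.

```python
import sqlite3

# Hypothetical checks: each query must return zero rows on healthy data.
CHECKS = {
    "users without email": "SELECT id FROM users WHERE email IS NULL",
}


def run_checks(conn, checks=CHECKS):
    """Return {check name: number of offending rows} for failing checks only."""
    failures = {}
    for name, query in checks.items():
        bad_rows = conn.execute(query).fetchall()
        if bad_rows:
            failures[name] = len(bad_rows)
    return failures


def main(db_path):
    """Run all checks against a database file; return 1 if any check failed."""
    conn = sqlite3.connect(db_path)
    try:
        failures = run_checks(conn)
    finally:
        conn.close()
    for name, count in failures.items():
        print(f"BAD DATA: {name}: {count} row(s)")
    return 1 if failures else 0
```

A wrapper script could pass the return value of `main` to `sys.exit` so that the calling job sees a non-zero status when bad data is found.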

The Problems with DB Checks
May take too much time
Data entering the system

Data Input, Some Manipulation, DB, Data Output
Data Input: usually covered by input validation
Read/Write Checks: most likely checks are missing

Checks Before DB Write
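
A check before the write can be sketched as a thin wrapper that validates a record and refuses to insert it when any rule fails. The `messages` table, the two rules, and the `BadDataError` name are hypothetical.

```python
import sqlite3


class BadDataError(ValueError):
    """Raised when a record fails validation before it reaches the database."""


def validate_message(message):
    """Return a list of rule violations; an empty list means the record is fine."""
    errors = []
    author_id = message.get("author_id")
    if not isinstance(author_id, int) or author_id <= 0:
        errors.append("author_id must be a positive integer")
    body = message.get("body")
    if not isinstance(body, str) or not body.strip():
        errors.append("body must be a non-empty string")
    return errors


def checked_insert(conn, message):
    """Insert the message only if it passes every validation rule."""
    errors = validate_message(message)
    if errors:
        raise BadDataError("; ".join(errors))
    conn.execute(
        "INSERT INTO messages (author_id, body) VALUES (?, ?)",
        (message["author_id"], message["body"]),
    )
```

Raising instead of silently dropping the record keeps the failure visible at the point where the bad data was produced.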

Schema vs NoSchema

DB Schema Advantages: Data Type, Default Value, Permitted Values, Nullable
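
Each of these four advantages maps to a column constraint. The sketch below uses SQLite through Python's `sqlite3` module; the `tickets` table is a made-up example. SQLite applies type affinity rather than strict typing by default, so the data type rule is expressed here as a `typeof` check.

```python
import sqlite3

# Hypothetical table showing the four schema features as constraints.
SCHEMA = """
CREATE TABLE tickets (
    id       INTEGER PRIMARY KEY,
    title    TEXT    NOT NULL,                      -- nullable: no
    status   TEXT    NOT NULL DEFAULT 'open'        -- default value
             CHECK (status IN ('open', 'closed')),  -- permitted values
    priority INTEGER NOT NULL DEFAULT 3
             CHECK (typeof(priority) = 'integer')   -- data type
)
"""


def try_insert(conn, **columns):
    """Insert a row; return True if the schema accepted it, False if rejected."""
    names = ", ".join(columns)
    marks = ", ".join("?" for _ in columns)
    try:
        conn.execute(
            f"INSERT INTO tickets ({names}) VALUES ({marks})",
            tuple(columns.values()),
        )
        return True
    except sqlite3.IntegrityError:
        # Raised by SQLite for NOT NULL and CHECK constraint violations.
        return False
```

With this schema in place, a row with a missing title, an unknown status, or a non-numeric priority is rejected by the database itself, without any application code running.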

github.com/emanuil/kobold

As testers, our job does not end when we release
a feature.

Data Repair

Automatic detection is good, but automatic repair is better.

It's faster to fix data than code
The offending code might not be there anymore
It might be hard to find what caused the bad data
Future defect prevention

Standard Fixes

Remove an entry
Set a default value
Extract missing value from metadata
Approximate from neighboring data
Request missing data again
Archive/delete old data
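
Three of these fixes are sketched below as small functions: removing an entry, setting a default value, and approximating from neighboring data. The `users` and `posts` tables, the `language` column, and the default of `"en"` are assumptions made for the example.

```python
import sqlite3


def remove_orphans(conn):
    """Remove an entry: delete posts whose user no longer exists."""
    cur = conn.execute(
        "DELETE FROM posts WHERE user_id NOT IN (SELECT id FROM users)"
    )
    return cur.rowcount


def set_default_language(conn, default="en"):
    """Set a default value where the language is missing or empty."""
    cur = conn.execute(
        "UPDATE users SET language = ? WHERE language IS NULL OR language = ''",
        (default,),
    )
    return cur.rowcount


def fill_from_neighbors(values):
    """Approximate a missing value (None) as the mean of its two neighbors.

    Gaps at the edges, or next to another gap, are left untouched.
    """
    repaired = list(values)
    for i in range(1, len(values) - 1):
        left, right = values[i - 1], values[i + 1]
        if values[i] is None and left is not None and right is not None:
            repaired[i] = (left + right) / 2
    return repaired
```

Each function returns what it changed, which makes it easy to log how much was repaired on every run.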

Those standard fixes are easy to script.

Run the script automatically at a regular interval to
self-heal your system.
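
A self-healing cycle pairs each detection query with its repair statement and runs both on a schedule. This is a minimal sketch under assumed names: the `users` table, the single healer, and the one-hour interval are all hypothetical, and a cron job calling `heal_once` would serve the same purpose as the loop.

```python
import sqlite3
import time

# (name, query counting bad rows, statement repairing them); all hypothetical.
HEALERS = [
    (
        "missing language",
        "SELECT COUNT(*) FROM users WHERE language IS NULL",
        "UPDATE users SET language = 'en' WHERE language IS NULL",
    ),
]


def heal_once(conn):
    """Detect, repair, and re-check; return {name: (bad before, bad after)}."""
    report = {}
    for name, count_sql, fix_sql in HEALERS:
        before = conn.execute(count_sql).fetchone()[0]
        if before:
            conn.execute(fix_sql)
            conn.commit()
        after = conn.execute(count_sql).fetchone()[0]
        report[name] = (before, after)
    return report


def heal_forever(connect, interval_seconds=3600, max_runs=None):
    """Call heal_once on a fresh connection every interval_seconds."""
    runs = 0
    while max_runs is None or runs < max_runs:
        conn = connect()
        try:
            print(heal_once(conn))
        finally:
            conn.close()
        runs += 1
        if max_runs is None or runs < max_runs:
            time.sleep(interval_seconds)
```

Re-running the count after the fix shows whether the repair actually worked, and a non-zero "after" value is a signal that the fix itself needs attention.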

How to Start?

Define what is bad data for your context
Examine bugs and exceptions
Put checks in a script and run it periodically
Study the common fixes and script them
Make sure your backups are working

Jidoka 自働化
