Testing at Stream Scale

Testing at Stream Scale
Matt Farmer | All Things Open 2017

About Me

About You

The MailChimp Data Pipeline

The MailChimp Data Pipeline
[Diagram: MailChimp, MC Data Pipeline, Kafka, Data Science]

The MailChimp Data Pipeline
- We care about the order of data.
- We cannot avoid binary data. (MySQL Binlogs, Thrift)
- Our ideal performance is to deliver updates within a minute.
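The "within a minute" target above can be turned into a mechanical check. The following is a minimal sketch, not the pipeline's actual monitoring code: the event shape (record id, produced time, delivered time) and the names `late_deliveries` and `MAX_DELIVERY_LAG` are hypothetical and exist only for illustration.

```python
from datetime import datetime, timedelta

# Assumed target, taken from the slide: updates are delivered within a minute.
MAX_DELIVERY_LAG = timedelta(minutes=1)


def late_deliveries(events, max_lag=MAX_DELIVERY_LAG):
    """Return (record_id, lag) for every event delivered later than max_lag.

    Each event is a (record_id, produced_at, delivered_at) tuple. This shape
    is hypothetical and only serves the example.
    """
    return [
        (record_id, delivered_at - produced_at)
        for record_id, produced_at, delivered_at in events
        if delivered_at - produced_at > max_lag
    ]


# Two sample events: one delivered after 12 seconds, one after 95 seconds.
start = datetime(2017, 1, 1, 12, 0, 0)
sample_events = [
    ("user-1", start, start + timedelta(seconds=12)),
    ("user-2", start, start + timedelta(seconds=95)),
]
late = late_deliveries(sample_events)
```

Only `user-2` is reported here, since its 95-second lag exceeds the one-minute target.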

The time to build a better pipeline

1. Articulate and communicate a testing strategy.

Our Testing Guidelines
Unit tests should test complex, non-obvious behavior and exercise various failure conditions. They should be true unit tests.

Our Testing Guidelines
Integration tests need to be easier to build and even easier to keep up to date, and they should stick to testing the "happy path." Test cases should be easy to generate.

Our Testing Guidelines
Continuous end-to-end testing of accuracy and performance. Reporting and alerting on that testing over time.

1. Articulate and communicate a testing strategy.

2. Remove barriers to debugging and testing

Prologue Tools

2. Remove barriers to debugging and testing

3. Pursue Greatness in Staging

Attributes of a "Great" Streaming Staging Environment
- Sourced from the same data as production.
- Owned by the team maintaining the streaming application.
- Sees a significant % of production data.
- Stable and alerts when things break.

3. Pursue Greatness in Staging

4. Continuous End-to-End Testing

Kafka Detective

Kafka Detective
Kafka Detective is a highly configurable utility for reporting on differences between two Kafka topics that should be semantically identical. This enables true, continuous end-to-end testing of a Kafka-based application.
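To make the idea of comparing two topics concrete, here is a minimal in-memory sketch. It is not Kafka Detective's implementation or API: the function `diff_topics`, the report keys, and the sample data are hypothetical. It also assumes that keys are unique within each topic and that both topics fit in memory, neither of which holds for a real stream.

```python
def diff_topics(reference, candidate):
    """Compare two lists of (key, value) messages by key.

    Assumes keys are unique within each list (a simplification).
    Returns the keys that are missing from the candidate, unexpected in
    the candidate, or present in both with different values.
    """
    ref_by_key = dict(reference)
    cand_by_key = dict(candidate)
    return {
        "missing": sorted(k for k in ref_by_key if k not in cand_by_key),
        "unexpected": sorted(k for k in cand_by_key if k not in ref_by_key),
        "mismatched": sorted(
            k
            for k in ref_by_key
            if k in cand_by_key and ref_by_key[k] != cand_by_key[k]
        ),
    }


# Hypothetical output of the production and staging pipelines.
production = [("a", b"1"), ("b", b"2"), ("c", b"3")]
staging = [("a", b"1"), ("b", b"9"), ("d", b"4")]
report = diff_topics(production, staging)
```

In this sample, key `c` never arrives in staging, key `d` appears only in staging, and key `b` carries a different value.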

[Diagram: Staging Pipeline, Production Pipeline, Kafka Detective]

Kafka Detective Catches Real Issues!
A configuration issue that caused our user data to become ordered incorrectly (e.g., inserts after updates for the same DB record).
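The ordering problem described above (an insert arriving after an update for the same record) can be detected with a single pass over the stream. This is a simplified sketch under assumptions: messages are modeled as hypothetical `(key, operation)` pairs, and the function name `inserts_after_updates` is invented for the example.

```python
def inserts_after_updates(messages):
    """Return the keys whose insert arrives after an update for the same key.

    messages is an iterable of (key, operation) pairs in arrival order,
    where operation is "insert" or "update" (a hypothetical shape).
    """
    keys_with_update = set()
    violations = []
    for key, operation in messages:
        if operation == "update":
            keys_with_update.add(key)
        elif operation == "insert" and key in keys_with_update:
            violations.append(key)
    return violations


# row-1 arrives in the expected order; row-2 is updated before it is inserted.
stream = [
    ("row-1", "insert"),
    ("row-1", "update"),
    ("row-2", "update"),
    ("row-2", "insert"),
]
out_of_order = inserts_after_updates(stream)
```

Only `row-2` is flagged, because its insert shows up after an update has already been seen for that key.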

Kafka Detective Catches Real Issues!
An issue constructing our message keys that broke MailChimp Pro.

Kafka Detective Features
- Pluggable semantics around matching messages
- Pluggable reporters (Graphite and Kafka reporter included)
- Capable of transforming (e.g. deserializing) mismatches before reporting
- Capable of back pressuring when things appear out-of-sync
- Capable of handling high volume with low memory
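The first two features above, pluggable matching and pluggable reporting, can be illustrated with plain functions. The sketch below is not Kafka Detective's plugin interface: `byte_equal`, `json_equal_ignoring`, `compare`, and the JSON payloads are all hypothetical. It only shows why "semantically identical" may call for something looser than byte equality.

```python
import json


def byte_equal(expected, actual):
    """Strictest matcher: the two payloads must be byte-for-byte identical."""
    return expected == actual


def json_equal_ignoring(ignored_fields):
    """Build a matcher that compares JSON payloads, skipping ignored_fields."""
    def normalise(raw):
        document = json.loads(raw)
        return {k: v for k, v in document.items() if k not in ignored_fields}

    def matcher(expected, actual):
        return normalise(expected) == normalise(actual)

    return matcher


def compare(pairs, matcher, reporter):
    """Call reporter(key, expected, actual) for every pair the matcher rejects."""
    for key, expected, actual in pairs:
        if not matcher(expected, actual):
            reporter(key, expected, actual)


# Hypothetical payloads: "a" differs only in a timestamp, "b" differs in its id.
pairs = [
    ("a", b'{"id": 1, "processed_at": 100}', b'{"id": 1, "processed_at": 105}'),
    ("b", b'{"id": 2, "processed_at": 100}', b'{"id": 3, "processed_at": 100}'),
]

strict_failures = []
compare(pairs, byte_equal, lambda key, e, a: strict_failures.append(key))

lenient_failures = []
compare(
    pairs,
    json_equal_ignoring({"processed_at"}),
    lambda key, e, a: lenient_failures.append(key),
)
```

With byte equality both pairs are reported. With the timestamp field ignored, only `b` is reported. The reporter here appends to a list, but the same callback slot could forward mismatches to any destination.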

Kafka Detective Roadmap
- Better support for things being unhealthy when Detective starts
- Better support for use-cases that repartition data
- Better support for use-cases where keys are not globally unique
- Look into re-thinking what "success" means

4. Continuous End-to-End Testing

TL;DL
1. Articulate and communicate a testing strategy.
2. Remove barriers to debugging and testing.
3. Pursue greatness in staging.
4. Do continuous end-to-end testing.

We've open sourced Kafka Detective!

detective.frmr.me

Thank you!
Twitter/GitHub: farmdawgnation
