Testing at Stream Scale

Testing code is hard. Testing code that's processing 40 billion messages per day is harder. In this talk, I speak a bit about how we do that on MailChimp's Data Systems Team.

Matt Farmer

October 24, 2017

Transcript

  1. The MailChimp Data Pipeline
     - We care about the order of data.
     - We cannot avoid binary data. (MySQL Binlogs, Thrift)
     - Our ideal performance is to deliver updates within a minute.
  2. Our Testing Guidelines
     Unit tests should test complex, non-obvious behavior and exercise various failure conditions. They should be true unit tests.
  3. Our Testing Guidelines
     Integration tests need to be easier to build, even easier to keep up to date, and stick to testing the "happy path." Test cases should be easy to generate.
  4. Attributes of a "Great" Streaming Staging Environment
     - Sourced from the same data as production.
     - Owned by the team maintaining the streaming application.
     - Sees a significant % of production data.
     - Stable and alerts when things break.
  5. Kafka Detective
     Kafka Detective is a highly configurable utility for reporting on differences between two Kafka topics that should be semantically identical. This enables true, continuous end-to-end testing of a Kafka-based application.
  6. Kafka Detective
     Kafka Detective is a highly configurable utility for reporting on differences between two Kafka topics that should be semantically identical. This enables true, continuous end-to-end testing of a Kafka-based application.
  7. Kafka Detective Catches Real Issues!
     A configuration issue that caused our user data to become ordered incorrectly. (e.g. Inserts after Updates for the same DB record.)
  8. Kafka Detective Features
     - Pluggable semantics around matching messages
     - Pluggable reporters (Graphite and Kafka reporter included)
     - Capable of transforming (e.g. deserializing) mismatches before reporting
     - Capable of back pressuring when things appear out-of-sync
     - Capable of handling high volume with low memory
  9. Kafka Detective Roadmap
     - Better support for things being unhealthy when Detective starts
     - Better support for use-cases that repartition data
     - Better support for use-cases where keys are not globally unique
     - Look into re-thinking what "success" means
  10. TL;DL
      1. Articulate and communicate a testing strategy.
      2. Remove barriers to debugging and testing.
      3. Pursue greatness in staging.
      4. Do continuous end-to-end testing.
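
The comparison idea behind the Kafka Detective slides, checking that two streams which should be semantically identical really are, can be illustrated with a small in-memory sketch. This is not Kafka Detective's implementation or API. The names `Message`, `exact_match`, and `compare_streams` are hypothetical, and plain Python lists stand in for the two Kafka topics. The matcher is passed in as a function to mirror the "pluggable semantics around matching messages" feature, and messages are paired by position so that an ordering problem, such as an insert arriving after an update for the same record, shows up as a mismatch.

```python
from dataclasses import dataclass
from itertools import zip_longest
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Message:
    """Hypothetical stand-in for a Kafka record: a key plus raw bytes."""
    key: str
    value: bytes


# A matcher decides whether two messages count as "semantically identical".
Matcher = Callable[[Message, Message], bool]

# A mismatch is the pair that failed to match; None marks a missing message.
Mismatch = Tuple[Optional[Message], Optional[Message]]


def exact_match(reference: Message, test: Message) -> bool:
    """Simplest possible semantics: same key and byte-identical value."""
    return reference.key == test.key and reference.value == test.value


def compare_streams(
    reference: Iterable[Message],
    test: Iterable[Message],
    matches: Matcher = exact_match,
) -> List[Mismatch]:
    """Pair messages by position and collect every pair that fails to match."""
    mismatches: List[Mismatch] = []
    for ref_msg, test_msg in zip_longest(reference, test):
        # A stream that ends early yields None on its side of the pair.
        if ref_msg is None or test_msg is None or not matches(ref_msg, test_msg):
            mismatches.append((ref_msg, test_msg))
    return mismatches


if __name__ == "__main__":
    production = [Message("user:1", b"insert"), Message("user:1", b"update")]
    # Staging delivered the same records in the wrong order.
    staging = [Message("user:1", b"update"), Message("user:1", b"insert")]
    for ref_msg, test_msg in compare_streams(production, staging):
        print("mismatch:", ref_msg, test_msg)
```

A real comparison tool would consume continuously from both topics, bound its memory use, and tolerate one side lagging behind the other; the sketch leaves all of that out and only shows the matching step.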