Testing at Stream Scale

Testing code is hard. Testing code that's processing 40 billion messages per day is harder. In this talk, I speak a bit about how we do that on MailChimp's Data Systems Team.

Matt Farmer

October 24, 2017

Transcript

  1. Testing at Stream Scale
    Matt Farmer | All Things Open 2017

  2. About Me

  3. About You

  4. The MailChimp Data Pipeline

  5. The MailChimp Data Pipeline
    [Diagram: MailChimp, MC Data Pipeline, Kafka, Σ, Data Science]

  6. The MailChimp Data Pipeline
    - We care about the order of data.

    - We cannot avoid binary data. (MySQL Binlogs, Thrift)

    - Ideally, we deliver updates within a minute.

  7. The time to build a better pipeline

  11. The time to build a better pipeline

  12. 1. Articulate and communicate a testing strategy.

  13. Our Testing Guidelines
    Unit tests should cover complex, non-obvious behavior and
    exercise various failure conditions. They should be true unit tests.
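
As an illustration of this guideline, here is a minimal sketch of unit tests that exercise failure conditions as well as the normal case. The `decode_record` function and its wire format (4-byte id, 2-byte length, body) are hypothetical and not taken from the talk; they stand in for any small piece of binary-decoding logic.

```python
import struct
import unittest
from typing import Tuple

HEADER_SIZE = 6  # 4-byte big-endian record id + 2-byte big-endian body length


def decode_record(payload: bytes) -> Tuple[int, bytes]:
    """Hypothetical decoder for a length-prefixed binary record."""
    if len(payload) < HEADER_SIZE:
        raise ValueError("payload shorter than header")
    record_id, body_len = struct.unpack(">IH", payload[:HEADER_SIZE])
    body = payload[HEADER_SIZE:]
    if len(body) != body_len:
        raise ValueError("body length does not match header")
    return record_id, body


class DecodeRecordTest(unittest.TestCase):
    def test_round_trip(self):
        payload = struct.pack(">IH", 7, 3) + b"abc"
        self.assertEqual(decode_record(payload), (7, b"abc"))

    def test_truncated_header_is_rejected(self):
        with self.assertRaises(ValueError):
            decode_record(b"\x00\x01")

    def test_truncated_body_is_rejected(self):
        # Header claims 10 bytes of body but only 3 are present.
        payload = struct.pack(">IH", 7, 10) + b"abc"
        with self.assertRaises(ValueError):
            decode_record(payload)
```

The tests touch no network, disk, or broker, which is one reading of "true unit tests."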

  14. Our Testing Guidelines
    Integration tests need to be easier to build and even easier to
    keep up to date, and they should stick to testing the "happy path."
    Test cases should be easy to generate.
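
One way to keep happy-path cases easy to generate is to store them as data rather than code. The sketch below assumes a hypothetical `transform` stage and a JSON-lines case format; neither comes from the talk.

```python
import json
from typing import List

# Each line is one happy-path case: an input event and the expected output.
CASES_JSONL = """
{"name": "insert", "input": {"id": 1, "email": "A@Example.com"}, "expected": {"key": "1", "email": "a@example.com"}}
{"name": "update", "input": {"id": 1, "email": "b@example.com"}, "expected": {"key": "1", "email": "b@example.com"}}
"""


def transform(event: dict) -> dict:
    """Hypothetical stand-in for the pipeline stage under test."""
    return {"key": str(event["id"]), "email": event["email"].lower()}


def run_cases(cases_text: str) -> List[str]:
    """Run every case and return the names of the cases that failed."""
    failures = []
    for line in cases_text.splitlines():
        if not line.strip():
            continue  # skip blank lines between cases
        case = json.loads(line)
        if transform(case["input"]) != case["expected"]:
            failures.append(case["name"])
    return failures
```

Adding a case then means appending one line of JSON, which could in principle be captured from real traffic.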

  15. Our Testing Guidelines
    Continuous end-to-end testing of accuracy and
    performance. Reporting and alerting on that testing over
    time.

  16. 1. Articulate and communicate a testing strategy.

  17. 2. Remove barriers to debugging and testing

  18. Prologue Tools

  19. 2. Remove barriers to debugging and testing

  20. 3. Pursue Greatness in Staging

  21. Attributes of a "Great" Streaming Staging Environment
    - Sourced from the same data as production.

    - Owned by the team maintaining the streaming application.

    - Sees a significant % of production data.

    - Stable and alerts when things break.
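
The slides do not say how staging comes to see a significant percentage of production data. One possible approach, shown here purely as an assumption, is to sample deterministically by message key, so that every message for a given key is either in or out of the sample and per-key ordering is preserved. The function name and the use of CRC32 are illustrative choices.

```python
import zlib


def in_staging_sample(key: bytes, percent: int) -> bool:
    """Return True if messages with this key belong in the staging sample.

    Hashing the key, instead of sampling at random, keeps all messages
    for one key together, so ordering per key survives the sampling.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return zlib.crc32(key) % 100 < percent
```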

  22. 3. Pursue Greatness in Staging

  23. 4. Continuous End-to-End Testing

  24. Kafka Detective

  25. Kafka Detective
    Kafka Detective is a highly configurable utility for reporting
    on differences between two Kafka topics that should be
    semantically identical. This enables true, continuous
    end-to-end testing of a Kafka-based application.
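
To make the idea concrete, here is a rough sketch of comparing two topics that should be semantically identical. It is not Kafka Detective's implementation or API: it works on in-memory lists of `(key, value)` pairs instead of live Kafka consumers, and the function names and report categories are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Message = Tuple[bytes, bytes]  # (key, value)


def group_by_key(messages: Iterable[Message]) -> Dict[bytes, List[bytes]]:
    """Collect the values seen for each key, in arrival order."""
    grouped: Dict[bytes, List[bytes]] = defaultdict(list)
    for key, value in messages:
        grouped[key].append(value)
    return grouped


def diff_topics(
    reference: Iterable[Message], test: Iterable[Message]
) -> Dict[str, List[bytes]]:
    """Report keys whose message history differs between the two topics."""
    ref, tst = group_by_key(reference), group_by_key(test)
    return {
        "missing_in_test": sorted(k for k in ref if k not in tst),
        "unexpected_in_test": sorted(k for k in tst if k not in ref),
        # Same key on both sides, but different values or a different order.
        "mismatched": sorted(k for k in ref if k in tst and ref[k] != tst[k]),
    }
```

A continuous version would consume both topics as streams and bound its memory use; this batch form only shows the comparison itself.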

  27. [Diagram: Staging Pipeline, Production Pipeline, Σ, Kafka Detective]

  28. Kafka Detective Catches Real Issues!
    A configuration issue that caused our user data to become ordered
    incorrectly (e.g., inserts arriving after updates for the same DB record).
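
The ordering problem described here can be illustrated with a small check. This is a sketch under simplifying assumptions, not how Kafka Detective detected the issue: events are `(record_id, op)` pairs, and a record is never deleted and re-inserted.

```python
from typing import Hashable, Iterable, List, Tuple


def find_inserts_after_updates(
    events: Iterable[Tuple[Hashable, str]]
) -> List[Hashable]:
    """Return record ids whose insert arrived after an update for that record."""
    updated = set()
    offenders = []
    for record_id, op in events:
        if op == "update":
            updated.add(record_id)
        elif op == "insert" and record_id in updated:
            offenders.append(record_id)
    return offenders
```

For example, the stream `[(1, "insert"), (1, "update"), (2, "update"), (2, "insert")]` flags record `2` only.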

  29. Kafka Detective Catches Real Issues!
    An issue constructing our message keys that broke MailChimp Pro.

  30. Kafka Detective Features
    - Pluggable semantics around matching messages

    - Pluggable reporters (Graphite and Kafka reporter included)

    - Capable of transforming (e.g. deserializing) mismatches before reporting

    - Capable of applying back pressure when things appear out of sync

    - Capable of handling high volume with low memory
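
"Pluggable" matching and reporting can be pictured as passing the comparison rule and the reporting action in as functions. The sketch below is only an illustration of that design idea; the names `compare_pairs`, `exact_match`, and `key_only_match` are hypothetical and are not Kafka Detective's interfaces.

```python
from typing import Callable, Iterable, Tuple

Message = Tuple[bytes, bytes]  # (key, value)
Matcher = Callable[[Message, Message], bool]
Reporter = Callable[[Message, Message], None]


def exact_match(reference: Message, test: Message) -> bool:
    """Strict semantics: key and value must both be identical."""
    return reference == test


def key_only_match(reference: Message, test: Message) -> bool:
    """Looser semantics: only the keys need to agree."""
    return reference[0] == test[0]


def compare_pairs(
    pairs: Iterable[Tuple[Message, Message]], matches: Matcher, report: Reporter
) -> int:
    """Apply the matcher to each pair, report mismatches, and count them."""
    mismatches = 0
    for reference_msg, test_msg in pairs:
        if not matches(reference_msg, test_msg):
            report(reference_msg, test_msg)
            mismatches += 1
    return mismatches
```

A reporter here could be anything callable, such as a function that appends to a list in tests or one that emits a metric in production.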

  31. Kafka Detective Roadmap
    - Better support for things being unhealthy when Detective starts

    - Better support for use-cases that repartition data

    - Better support for use-cases where keys are not globally unique

    - Look into re-thinking what "success" means

  32. 4. Continuous End-to-End Testing

  33. TL;DL
    1. Articulate and communicate a testing strategy.

    2. Remove barriers to debugging and testing.

    3. Pursue greatness in staging.

    4. Do continuous end-to-end testing.

  34. We've open sourced Kafka Detective!

  35. detective.frmr.me

  36. Thank you!
    Twitter/GitHub: farmdawgnation
