Measuring API performance using Druid

Running Druid with autoscaling, the monitoring metrics that build trust with our clients, and our wishlist for Druid.

Ananth Packkildurai

November 28, 2017

Transcript

  1. Measuring Slack API performance using Druid
    Ananth Packkildurai
    November 28, 2017

  2. About Slack
    Public launch: 2014
    800+ employees across 7 countries worldwide
    HQ in San Francisco
    Diverse set of industries, including software/technology, retail, media, telecom and professional services.

  3. An unprecedented adoption rate

  4. Agenda
    1. A bit of history.
    2. Druid infrastructure & use cases
    3. Challenges.

  5. A bit of history

  6. March 2016
    5 data engineers
    350+ Slack employees
    2M active users

  7. October 2017
    10 data engineers
    800+ Slack employees
    6M active users

  8. Data usage
    1 in 3 access data warehouse per week
    500+ tables
    400k events per sec

  9. It is all about Slogs

  10. Well, not exactly

  11. Druid infrastructure & use cases

  12. What can go wrong?

  13. We want more...

  14. Performance & Experimentation
    ● Engineering & CE teams should be able to detect performance bottlenecks proactively.
    ● Engineers should be able to see how their experiments perform in near real time.
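
    As a rough illustration of what "detecting a bottleneck proactively" can mean, the sketch below flags API methods whose 95th-percentile latency exceeds a threshold. It is plain Python rather than a Druid query, and the event tuples, the `slow_methods` helper and the 500 ms threshold are assumptions made for this example, not figures from the talk.

```python
from collections import defaultdict


def p95(values):
    """Nearest-rank 95th percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    # ceil(0.95 * n) computed with integers to avoid float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def slow_methods(events, threshold_ms):
    """Map each API method whose p95 latency exceeds threshold_ms to that p95.

    `events` is an iterable of (api_method, latency_ms) pairs (hypothetical shape).
    """
    by_method = defaultdict(list)
    for method, latency_ms in events:
        by_method[method].append(latency_ms)
    result = {}
    for method, latencies in by_method.items():
        value = p95(latencies)
        if value > threshold_ms:
            result[method] = value
    return result


# Made-up sample events, for illustration only.
sample_events = [
    ("chat.postMessage", 120),
    ("chat.postMessage", 900),
    ("users.list", 40),
    ("users.list", 55),
]
flagged = slow_methods(sample_events, threshold_ms=500)
```

    In a near-real-time setup the same grouping and percentile step would run continuously over a recent time window instead of a fixed list.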

  15. Near-real-time pipeline

  16. Why Analytics Kafka
    Keeps the load on DW Kafka predictable.
    Easier to upgrade to and verify newer Kafka versions.
    A smaller Kafka cluster is relatively more straightforward to operate.

  17. Druid Architecture

  18. Druid Architecture
    Middle managers autoscale based on the number of running tasks.
    Historical nodes autoscale based on the segment size.
    Fault-tolerant deployment for Overlord & Coordinator.
    Brokers autoscale and are load balanced by ELB.
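
    The two autoscaling rules on this slide can be sketched as simple sizing functions. This is a minimal illustration, not the autoscaler used at Slack: the function names, the `spare_slots` headroom and the 0.8 target utilization are assumptions for this example. `worker_capacity` plays the role of Druid's `druid.worker.capacity` setting (task slots per middle manager) and `max_size_bytes` the role of `druid.server.maxSize` (segment bytes a historical may hold).

```python
import math


def desired_middle_managers(running_tasks, worker_capacity, spare_slots=0):
    """Middle managers needed to run the current tasks plus optional spare slots."""
    if worker_capacity <= 0:
        raise ValueError("worker_capacity must be positive")
    needed_slots = running_tasks + spare_slots
    # Always keep at least one node so new tasks have somewhere to land.
    return max(1, math.ceil(needed_slots / worker_capacity))


def desired_historicals(total_segment_bytes, max_size_bytes, target_utilization=0.8):
    """Historical nodes needed to hold all segments at the target fill level."""
    if max_size_bytes <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("invalid capacity or utilization")
    usable_bytes = max_size_bytes * target_utilization
    return max(1, math.ceil(total_segment_bytes / usable_bytes))
```

    For example, 10 running tasks with 4 slots per node gives `desired_middle_managers(10, 4) == 3`. A real autoscaler would also need cooldowns and a replication-aware segment total, which this sketch leaves out.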

  19. Cascading failures

  20. Forward Index fields

  21. Bridge the gap between batch and realtime tables.

  22. Thank You!
