Measuring API performance using Druid

Running Druid with autoscaling, monitoring metrics to build trust with our clients, and our wishlist for Druid.

Ananth Packkildurai

November 28, 2017

Transcript

  1. Measuring Slack API performance using Druid
    Ananth Packkildurai
    November 28, 2017

  2. About Slack
    Public launch: 2014
    800+ employees across 7 countries worldwide
    HQ in San Francisco
    Diverse set of industries, including software/technology, retail, media,
    telecom and professional services.

  3. An unprecedented adoption rate

  4. Agenda
    1. A bit of history.
    2. Druid infrastructure & use cases.
    3. Challenges.

  5. A bit of history

  6. March 2016
    5 data engineers
    350+ Slack employees
    2M active users

  7. October 2017
    10 data engineers
    800+ Slack employees
    6M active users

  8. Data usage
    1 in 3 access the data warehouse per week
    500+ tables
    400k events per sec

  9. It is all about Slogs

  10. Well, not exactly

  11. Slog

  12. Slog

  13. Druid infrastructure & use cases

  14. What can go wrong?

  15. We want more...

  16. Performance & Experimentation
    ● Engineering & CE teams should be able to detect performance
    bottlenecks proactively.
    ● Engineers should be able to see the performance of their experiments in
    near real time.

  17. Near real-time pipeline

  18. Why Analytics Kafka
    Keeps the load on the DW Kafka predictable.
    Easier to upgrade to and verify newer Kafka versions.
    A smaller Kafka cluster is relatively more straightforward to operate.

  19. Druid Architecture

  20. Druid Architecture
    Middle managers autoscale based on the number of running tasks.
    Historical nodes autoscale based on segment size.
    Fault-tolerant deployment for Overlord & Coordinator.
    Brokers autoscale and are load balanced by ELB.
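
The two sizing rules on this slide (middle managers by running tasks, historical nodes by segment size) can be illustrated with a minimal sketch. This is not Slack's autoscaler and does not call any Druid API; the function names, the slots-per-node figure, the per-node storage budget and the min/max bounds are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


def _ceil_div(numerator: int, denominator: int) -> int:
    """Integer ceiling division, avoiding float rounding on large byte counts."""
    return -(-numerator // denominator)


@dataclass
class Bounds:
    """Hypothetical fleet limits; the talk does not give real values."""
    min_nodes: int = 1
    max_nodes: int = 20

    def clamp(self, wanted: int) -> int:
        return max(self.min_nodes, min(self.max_nodes, wanted))


def desired_middle_managers(running_tasks: int,
                            task_slots_per_node: int = 4,  # assumed capacity
                            bounds: Bounds = Bounds()) -> int:
    """Size the middle manager fleet from the number of running tasks."""
    return bounds.clamp(_ceil_div(running_tasks, task_slots_per_node))


def desired_historicals(total_segment_bytes: int,
                        bytes_per_node: int,  # assumed per-node segment budget
                        bounds: Bounds = Bounds()) -> int:
    """Size the historical fleet from the total size of loaded segments."""
    return bounds.clamp(_ceil_div(total_segment_bytes, bytes_per_node))


if __name__ == "__main__":
    GIB = 2 ** 30
    print(desired_middle_managers(running_tasks=9))
    print(desired_historicals(total_segment_bytes=10 * GIB,
                              bytes_per_node=4 * GIB))
```

Rounding up rather than down means a partially filled node is still provisioned, and the bounds keep a burst of tasks from growing the fleet without limit. A real deployment would also need cooldowns so nodes are not added and removed on every fluctuation.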

  21. Challenges

  22. Cascading failures

  23. Forward Index fields

  24. SQL

  25. Bridge the gap between batch and
    realtime tables.

  26. Thank You!
