Necessary tooling and monitoring for performance critical applications - RootConf 2017

Manan
May 11, 2017

Presented at RootConf on May 11 2017

Transcript

  1. NECESSARY TOOLING AND MONITORING FOR PERFORMANCE CRITICAL APPLICATIONS @mananbharara

  3. OTTO

  4. OTTO http://www.forbes.com/sites/adamtanner/2014/03/05/amazons-war-on-germanys-18-billion-patriarch/#7989b6eb4162

  10. OTTO
    ▸ 1 million visitors / day
    ▸ ~ 2 orders / second
    30 customers × 8 screens × 2 orders = 480 PIs / second
    ~ 1000 PIs / second
    http://dev.otto.de/2016/03/20/why-microservices/#more-2320
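
The load estimate on this slide can be reproduced with a few lines of arithmetic. This is a sketch under one possible reading of the slide: roughly 30 browsing customers per order placed, about 8 screens viewed per customer, and "PI" taken to mean page impression. The constant and function names are hypothetical; only the figures come from the slide.

```python
# Figures from the slide; the interpretation of each factor is an assumption.
ORDERS_PER_SECOND = 2
CUSTOMERS_PER_ORDER = 30    # assumed: ~30 browsing customers for every order placed
SCREENS_PER_CUSTOMER = 8    # assumed: ~8 screens (pages) viewed per customer


def page_impressions_per_second(orders, customers_per_order, screens_per_customer):
    """Estimate page impressions per second from the order rate."""
    return orders * customers_per_order * screens_per_customer


estimate = page_impressions_per_second(
    ORDERS_PER_SECOND, CUSTOMERS_PER_ORDER, SCREENS_PER_CUSTOMER
)
print(estimate)  # 480
```

The slide then rounds this up to roughly 1000 PIs / second, which leaves headroom above the computed average.
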
  18. WHY? WHAT MAKES THIS DIFFERENT
    ▸ We’re talking money…
    ▸ Building more features => Adding more value => > money
    ▸ Decision making
    ▸ Verify business assumptions
    300 billion 3 trillion
  27. REQUIREMENTS I NEED…
    ▸ What every other monitoring tool does
      - Database monitoring
      - Monitoring standard server metrics
      - Req/Sec, Response time, Throughput…
      - Alerting upon exceptions
    ▸ Measuring the state of the system
      - Orders/sec, Users on website
    ▸ Narrow down sources of bottlenecks
    ▸ Validate business assumptions
  31. LOGGING
    LOGGING IS ABOUT SCATTERED INCIDENTS IN THE SYSTEM

  33. CAPTURE
    METRICS HELP IN UNDERSTANDING THE STATE OF THE SYSTEM AND GAIN VALUABLE INSIGHTS… ANYTIME
  34. WHY?
    Collecting data is cheap, but not having it when you need it can be expensive, so you should instrument everything, and collect all the useful data you reasonably can…
    https://www.datadoghq.com
  35. CAPTURE: METRICS
    Measure the behavior of critical components in your production environment…
    http://metrics.dropwizard.io/3.1.0/
  49. CAPTURE
    ▸ Counters
      - HTTP/any connection pools
    ▸ Gauges
      - Used disk space
    ▸ Meters
      - Requests per second
      - Number of personalized users
      - Number of cached products
      - Orders per second
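
The three metric types above can be sketched in plain Python to show how they differ. This is not the Dropwizard Metrics API referenced earlier in the deck: the class names, the `clock` parameter, and the sample data are hypothetical, and the meter here reports only a mean rate since creation rather than the moving averages a production library would typically offer.

```python
import time


class Counter:
    """A value that goes up and down, e.g. connections currently in use."""

    def __init__(self):
        self.count = 0

    def inc(self, n=1):
        self.count += n

    def dec(self, n=1):
        self.count -= n


class Gauge:
    """An instantaneous reading, computed on demand by a callback."""

    def __init__(self, read):
        self._read = read

    @property
    def value(self):
        return self._read()


class Meter:
    """Counts events and reports their mean rate per second since creation."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._start = clock()
        self.count = 0

    def mark(self, n=1):
        self.count += n

    def mean_rate(self):
        elapsed = self._clock() - self._start
        return self.count / elapsed if elapsed > 0 else 0.0


# Counter: three connections taken from a pool, one returned.
pool_connections = Counter()
pool_connections.inc(3)
pool_connections.dec()

# Gauge: number of cached products, read from a (hypothetical) cache.
cache = {"sku-1": "shoes", "sku-2": "lamp", "sku-3": "sofa"}
cached_products = Gauge(lambda: len(cache))

# Meter: 20 orders over 10 seconds, using a fake clock for determinism.
now = [0.0]
orders = Meter(clock=lambda: now[0])
for _ in range(20):
    orders.mark()
now[0] = 10.0

print(pool_connections.count, cached_products.value, orders.mean_rate())
```

A counter is changed by the application, a gauge is read whenever someone asks, and a meter turns a stream of events into a rate.
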
  58. CAPTURE
    ▸ Histograms
      - Mean size of response bodies
      - Percentile of customers getting 5 personal recommendations or more
    ▸ Timers
      - Page rendering time
      - It took 80 ms to give recommendations to 99% of customers at 300 req/sec
      - But it took 200 ms at 800 req/sec
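
A histogram and a timer can be sketched in a few lines to make the examples above concrete. This is a minimal stdlib illustration, not the Dropwizard Metrics implementation: it keeps every sample in memory and uses the nearest-rank percentile, whereas a production library would typically sample. The class names, method names, and data are hypothetical.

```python
import math
import time
from contextlib import contextmanager


class Histogram:
    """Records values and answers questions about their distribution."""

    def __init__(self):
        self._values = []

    def update(self, value):
        self._values.append(value)

    def mean(self):
        return sum(self._values) / len(self._values)

    def percentile(self, p):
        """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
        ordered = sorted(self._values)
        rank = max(1, math.ceil(p * len(ordered) / 100))
        return ordered[rank - 1]

    def fraction_at_least(self, threshold):
        """Share of samples that are greater than or equal to threshold."""
        return sum(1 for v in self._values if v >= threshold) / len(self._values)


class Timer:
    """Measures how long a block of code takes and records it in milliseconds."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self.durations_ms = Histogram()

    @contextmanager
    def time(self):
        start = self._clock()
        try:
            yield
        finally:
            self.durations_ms.update((self._clock() - start) * 1000)


# Histogram: personal recommendations delivered to ten customers.
recommendations = Histogram()
for n in [0, 2, 5, 5, 6, 7, 8, 9, 10, 12]:
    recommendations.update(n)
share_with_five_or_more = recommendations.fraction_at_least(5)  # 0.8

# Timer: a fake clock simulates a call that takes about 80 ms.
ticks = iter([0.0, 0.08])
render_timer = Timer(clock=lambda: next(ticks))
with render_timer.time():
    pass  # page rendering would happen here
slowest_ms = render_timer.durations_ms.percentile(99)
```

Recording one timer per load level (for example 300 req/sec and 800 req/sec) would allow the kind of comparison the slide makes between 80 ms and 200 ms.
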
  59. AGGREGATE MAKE SENSE OF DATA

  64. AGGREGATE
    ▸ Query DSL
    ▸ Applying functions
    ▸ Scalable
    ▸ Up to date
    API METRIC STORAGE CACHE
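
"Applying functions" to stored metrics usually means grouping raw points into time windows and reducing each window with a function such as mean or max. The sketch below illustrates that idea in plain Python; it is not the query language of any particular metric store, and the function name, window size, and sample points are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def aggregate(points, window_seconds, fn):
    """Group (timestamp, value) points into fixed windows and apply fn to each window."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        window_start = timestamp - (timestamp % window_seconds)
        buckets[window_start].append(value)
    return {start: fn(values) for start, values in sorted(buckets.items())}


# (seconds since start, response time in ms)
response_times_ms = [(0, 80), (5, 90), (12, 200), (18, 220), (21, 85)]

mean_per_window = aggregate(response_times_ms, 10, mean)
max_per_window = aggregate(response_times_ms, 10, max)
print(mean_per_window)
print(max_per_window)
```

Swapping `mean` for `max`, `min`, or a percentile function changes the question being asked without touching the stored data.
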
  65. VISUALIZE - https://docs.influxdata.com/influxdb/v1.2/concepts/storage_engine/

  70. VISUALIZE OBSERVE VARIATIONS
    ‣ Drill down
    ‣ Interactive
    ‣ Query language and editor
    ‣ Correlate
  71. VISUALIZE OSCILLATOR

  75. VISUALIZE OSCILLATOR DEMO

  76. VISUALIZE ANNOTATIONS

  87. LEARNINGS
    ▸ Choose the right metrics
      - Disk utilization, User logins
    ▸ Mean can be absurd
      - Choose wisely
    ▸ Choose the right tool for the job. Histograms, counters, meters, timers.
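
The point that the mean can be absurd is easy to demonstrate with a skewed sample. The latencies below are made up for illustration: 98 fast requests and two very slow ones. The percentile helper uses the nearest-rank definition, which is one of several common conventions.

```python
import math
from statistics import mean, median


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical latencies in ms: 98 requests at 80 ms, two outliers.
latencies_ms = [80] * 98 + [5000, 9000]

mean_ms = mean(latencies_ms)        # pulled far above the typical request
median_ms = median(latencies_ms)    # what a typical request looks like
p99_ms = percentile(latencies_ms, 99)  # what the slowest 1-2% experience

print(mean_ms, median_ms, p99_ms)
```

In this sample no single request took anywhere near the mean, so the median and a high percentile together describe the system better than the mean alone.
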
  96. ALERT ALERTING
    ▸ Of extreme importance
    ▸ Definition of done
    ▸ Make it visible
  101. SUMMING IT UP THE COMPLETE SOLUTION
    CAPTURE, AGGREGATE, VISUALIZE, ALERT

  102. CONTINUOUS DELIVERY & MONITORING

  107. CURRENT SETUP
    SOFTWARE THAT CLAIMS TO MONITOR EVERYTHING FOR YOU
    ▸ Hooks into individual function calls
    ▸ Could be too expensive
    ▸ A great monitoring and alerting setup is possible for free
  108. THANKS! @mananbharara
    READ MORE
    ▸ Metrics, Metrics, Everywhere - https://www.youtube.com/watch?v=czes-oa0yik
    ▸ Oscillator - https://github.com/otto-de/oscillator
    ▸ Xray - https://github.com/otto-de/tesla-xray
    ▸ OTTO Dev blog - http://dev.otto.de/