Slide 1

Slide 1 text

NECESSARY TOOLING AND MONITORING FOR PERFORMANCE CRITICAL APPLICATIONS @mananbharara

Slide 3

Slide 3 text

OTTO

Slide 4

Slide 4 text

OTTO http://www.forbes.com/sites/adamtanner/2014/03/05/amazons-war-on-germanys-18-billion-patriarch/#7989b6eb4162

Slide 5

Slide 5 text

OTTO

Slide 6

Slide 6 text

OTTO ▸ 1 million visitors / day ▸ ~ 2 orders / second http://dev.otto.de/2016/03/20/why-microservices/#more-2320

Slide 9

Slide 9 text

OTTO ▸ 1 million visitors / day ▸ ~ 2 orders / second 30 customers × 8 screens × 2 orders = 480 PIs (page impressions) / second http://dev.otto.de/2016/03/20/why-microservices/#more-2320

Slide 10

Slide 10 text

OTTO ▸ 1 million visitors / day ▸ ~ 2 orders / second 30 customers × 8 screens × 2 orders = 480 PIs (page impressions) / second ~ 1000 PIs / second http://dev.otto.de/2016/03/20/why-microservices/#more-2320
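
The load estimate on this slide can be reproduced as a short calculation. This is only a sketch: the variable names are illustrative, and reading the figures as "about 30 browsing customers per order placed, each viewing about 8 screens" is an assumption taken from the slide's arithmetic, not something the slide states outright.

```python
# Back-of-the-envelope load estimate (all names are illustrative).
orders_per_second = 2       # "~ 2 orders / second" from the slide
customers_per_order = 30    # assumption: ~30 browsing customers per order placed
screens_per_customer = 8    # assumption: ~8 screens viewed per customer

page_impressions_per_second = (
    customers_per_order * screens_per_customer * orders_per_second
)
print(page_impressions_per_second)  # 480
```

The slide then quotes roughly 1000 PIs / second as the working figure.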

Slide 11

Slide 11 text

WHY? WHAT MAKES THIS DIFFERENT

Slide 12

Slide 12 text

WHY? WHAT MAKES THIS DIFFERENT ▸ We’re talking money…

Slide 13

Slide 13 text

WHY? WHAT MAKES THIS DIFFERENT ▸ We’re talking money… ▸ Building more features

Slide 14

Slide 14 text

WHY? WHAT MAKES THIS DIFFERENT ▸ We’re talking money… ▸ Building more features => Adding more value

Slide 15

Slide 15 text

WHY? WHAT MAKES THIS DIFFERENT ▸ We’re talking money… ▸ Building more features => Adding more value => More money

Slide 16

Slide 16 text

WHY? WHAT MAKES THIS DIFFERENT ▸ We’re talking money… ▸ Building more features => Adding more value => More money ▸ Decision making

Slide 17

Slide 17 text

WHY? WHAT MAKES THIS DIFFERENT ▸ We’re talking money… ▸ Building more features => Adding more value => More money ▸ Decision making ▸ Verify business assumptions

Slide 18

Slide 18 text

WHY? WHAT MAKES THIS DIFFERENT ▸ We’re talking money… ▸ Building more features => Adding more value => More money ▸ Decision making ▸ Verify business assumptions 300 billion 3 trillion

Slide 19

Slide 19 text

REQUIREMENTS I NEED…

Slide 20

Slide 20 text

REQUIREMENTS I NEED… ▸ What every other monitoring tool does

Slide 21

Slide 21 text

REQUIREMENTS I NEED… ▸ What every other monitoring tool does - Database monitoring

Slide 22

Slide 22 text

REQUIREMENTS I NEED… ▸ What every other monitoring tool does - Database monitoring - Monitoring standard server metrics - Req/Sec, Response time, Throughput…

Slide 23

Slide 23 text

REQUIREMENTS I NEED… ▸ What every other monitoring tool does - Database monitoring - Monitoring standard server metrics - Req/Sec, Response time, Throughput… - Alerting upon exceptions

Slide 24

Slide 24 text

REQUIREMENTS I NEED… ▸ What every other monitoring tool does - Database monitoring - Monitoring standard server metrics - Req/Sec, Response time, Throughput… - Alerting upon exceptions ▸ Measuring the state of the system

Slide 25

Slide 25 text

REQUIREMENTS I NEED… ▸ What every other monitoring tool does - Database monitoring - Monitoring standard server metrics - Req/Sec, Response time, Throughput… - Alerting upon exceptions ▸ Measuring the state of the system - Orders/sec, Users on website

Slide 26

Slide 26 text

REQUIREMENTS I NEED… ▸ What every other monitoring tool does - Database monitoring - Monitoring standard server metrics - Req/Sec, Response time, Throughput… - Alerting upon exceptions ▸ Measuring the state of the system - Orders/sec, Users on website ▸ Narrow down sources of bottlenecks

Slide 27

Slide 27 text

REQUIREMENTS I NEED… ▸ What every other monitoring tool does - Database monitoring - Monitoring standard server metrics - Req/Sec, Response time, Throughput… - Alerting upon exceptions ▸ Measuring the state of the system - Orders/sec, Users on website ▸ Narrow down sources of bottlenecks ▸ Validate business assumptions

Slide 28

Slide 28 text

LOGGING LOGGING

Slide 31

Slide 31 text

LOGGING LOGGING IS ABOUT SCATTERED INCIDENTS IN THE SYSTEM

Slide 32

Slide 32 text

CAPTURE METRICS HELP IN UNDERSTANDING THE STATE OF THE SYSTEM AND GAINING VALUABLE INSIGHTS…

Slide 33

Slide 33 text

CAPTURE METRICS HELP IN UNDERSTANDING THE STATE OF THE SYSTEM AND GAINING VALUABLE INSIGHTS… ANYTIME

Slide 34

Slide 34 text

WHY? Collecting data is cheap, but not having it when you need it can be expensive, so you should instrument everything, and collect all the useful data you reasonably can… https://www.datadoghq.com

Slide 35

Slide 35 text

CAPTURE METRICS Measure the behavior of critical components in your production environment… http://metrics.dropwizard.io/3.1.0/

Slide 41

Slide 41 text

CAPTURE

Slide 42

Slide 42 text

CAPTURE ▸ Counters

Slide 43

Slide 43 text

CAPTURE ▸ Counters - HTTP/any connection pools

Slide 44

Slide 44 text

CAPTURE ▸ Counters - HTTP/any connection pools ▸ Gauges

Slide 45

Slide 45 text

CAPTURE ▸ Counters - HTTP/any connection pools ▸ Gauges - Used disk space

Slide 46

Slide 46 text

CAPTURE ▸ Counters - HTTP/any connection pools ▸ Gauges - Used disk space ▸ Meters

Slide 47

Slide 47 text

CAPTURE ▸ Counters - HTTP/any connection pools ▸ Gauges - Used disk space ▸ Meters - Requests per second

Slide 48

Slide 48 text

CAPTURE ▸ Counters - HTTP/any connection pools ▸ Gauges - Used disk space ▸ Meters - Requests per second - Number of personalized users - Number of cached products - Orders per second
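
A minimal sketch of the three metric types listed above, written with the Python standard library only. The class and method names here are illustrative and are not the Dropwizard Metrics API; the injectable clock and the `cached_products` dictionary are hypothetical, added so the example is deterministic.

```python
import time
from typing import Callable


class Counter:
    """A value that goes up and down, e.g. connections checked out of a pool."""

    def __init__(self) -> None:
        self.count = 0

    def inc(self, n: int = 1) -> None:
        self.count += n

    def dec(self, n: int = 1) -> None:
        self.count -= n


class Gauge:
    """Reads an instantaneous value on demand, e.g. used disk space."""

    def __init__(self, read: Callable[[], float]) -> None:
        self._read = read

    def value(self) -> float:
        return self._read()


class Meter:
    """Counts events and reports their mean rate per second since creation."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._start = clock()
        self.count = 0

    def mark(self, n: int = 1) -> None:
        self.count += n

    def mean_rate(self) -> float:
        elapsed = self._clock() - self._start
        return self.count / elapsed if elapsed > 0 else 0.0


# Usage with a fake clock so the rate is deterministic.
now = [0.0]
orders = Meter(clock=lambda: now[0])
for _ in range(20):
    orders.mark()
now[0] = 10.0                      # pretend ten seconds have passed
orders_rate = orders.mean_rate()   # 20 events over 10 s

pool = Counter()
pool.inc(3)                        # three connections checked out
pool.dec()                         # one returned

cached_products = {"sku-1": "shoe", "sku-2": "lamp"}  # hypothetical cache
cache_size = Gauge(lambda: len(cached_products))
```

Libraries such as Dropwizard Metrics also report moving-average rates for meters, which this sketch leaves out.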

Slide 50

Slide 50 text

CAPTURE

Slide 51

Slide 51 text

CAPTURE ▸ Histograms

Slide 52

Slide 52 text

CAPTURE ▸ Histograms - Mean size of response bodies

Slide 53

Slide 53 text

CAPTURE ▸ Histograms - Mean size of response bodies - Percentile of customers getting 5 personal recommendations or more

Slide 54

Slide 54 text

CAPTURE ▸ Histograms - Mean size of response bodies - Percentile of customers getting 5 personal recommendations or more ▸ Timers

Slide 55

Slide 55 text

CAPTURE ▸ Histograms - Mean size of response bodies - Percentile of customers getting 5 personal recommendations or more ▸ Timers - Page rendering time

Slide 56

Slide 56 text

CAPTURE ▸ Histograms - Mean size of response bodies - Percentile of customers getting 5 personal recommendations or more ▸ Timers - Page rendering time - It took 80ms to give recommendations to 99% customers for 300 req/sec

Slide 57

Slide 57 text

CAPTURE ▸ Histograms - Mean size of response bodies - Percentile of customers getting 5 personal recommendations or more ▸ Timers - Page rendering time - It took 80ms to give recommendations to 99% customers for 300 req/sec - But took 200ms at 800 req/sec
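
The histogram and timer ideas above can be sketched as follows. This is an assumption-laden illustration, not the Dropwizard Metrics API: it keeps every sample in memory (production libraries sample instead), uses the nearest-rank percentile definition, and the latency figures are made up.

```python
import math
import time
from contextlib import contextmanager
from typing import Callable, Iterator, List


class Histogram:
    """Keeps every observed value; fine for a sketch, too costly in production."""

    def __init__(self) -> None:
        self.values: List[float] = []

    def update(self, value: float) -> None:
        self.values.append(value)

    def mean(self) -> float:
        return sum(self.values) / len(self.values)

    def percentile(self, p: float) -> float:
        """Nearest-rank percentile for p in (0, 100]."""
        ordered = sorted(self.values)
        rank = math.ceil(p * len(ordered) / 100)
        return ordered[max(rank, 1) - 1]


class Timer:
    """Records durations in milliseconds into a histogram."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter) -> None:
        self._clock = clock
        self.durations = Histogram()

    @contextmanager
    def time(self) -> Iterator[None]:
        start = self._clock()
        try:
            yield
        finally:
            self.durations.update((self._clock() - start) * 1000.0)


# Made-up samples: most requests are fast, a few are slow.
latencies = Histogram()
for ms in [20] * 98 + [80, 500]:
    latencies.update(ms)
p99 = latencies.percentile(99)     # value below which 99% of samples fall
mean_latency = latencies.mean()

# Timing a block of code; the fake clock makes the result deterministic.
ticks = iter([0.0, 0.08])
render = Timer(clock=lambda: next(ticks))
with render.time():
    pass                           # page rendering would happen here
```

Reading `p99` at different request rates is how a statement like "80ms for 99% of customers at 300 req/sec, but 200ms at 800 req/sec" would be obtained.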

Slide 59

Slide 59 text

AGGREGATE MAKE SENSE OF DATA

Slide 60

Slide 60 text

AGGREGATE API METRIC STORAGE CACHE

Slide 61

Slide 61 text

AGGREGATE ▸ Query DSL API METRIC STORAGE CACHE

Slide 62

Slide 62 text

AGGREGATE ▸ Query DSL ▸ Applying functions API METRIC STORAGE CACHE

Slide 63

Slide 63 text

AGGREGATE ▸ Query DSL ▸ Applying functions ▸ Scalable API METRIC STORAGE CACHE

Slide 64

Slide 64 text

AGGREGATE ▸ Query DSL ▸ Applying functions ▸ Scalable ▸ Up to date API METRIC STORAGE CACHE
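
"Applying functions" at the aggregation layer usually means collapsing raw points into time buckets with a function such as max or mean. The sketch below shows that idea in plain Python; the `downsample` helper and the sample data are hypothetical, and a real metric store would expose this through its own query language rather than a function call.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, Iterable, List, Tuple

Point = Tuple[int, float]  # (unix timestamp in seconds, value)


def downsample(
    points: Iterable[Point],
    bucket_seconds: int,
    fn: Callable[[List[float]], float],
) -> Dict[int, float]:
    """Group points into fixed-width time buckets and apply fn to each bucket."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for ts, value in points:
        bucket_start = ts - ts % bucket_seconds
        buckets[bucket_start].append(value)
    return {start: fn(values) for start, values in sorted(buckets.items())}


# Made-up response times (ms) over two minutes.
raw = [(0, 100.0), (10, 300.0), (59, 200.0), (60, 800.0), (119, 400.0)]

per_minute_max = downsample(raw, 60, max)
per_minute_mean = downsample(raw, 60, mean)
```

Keeping the raw points and choosing the function at query time is what lets the same data answer different questions later.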

Slide 65

Slide 65 text

VISUALIZE - https://docs.influxdata.com/influxdb/v1.2/concepts/storage_engine/

Slide 66

Slide 66 text

VISUALIZE OBSERVE VARIATIONS

Slide 67

Slide 67 text

VISUALIZE OBSERVE VARIATIONS ‣Drill down

Slide 68

Slide 68 text

VISUALIZE OBSERVE VARIATIONS ‣Drill down ‣Interactive

Slide 69

Slide 69 text

VISUALIZE OBSERVE VARIATIONS ‣Drill down ‣Interactive ‣Query language and editor

Slide 70

Slide 70 text

VISUALIZE OBSERVE VARIATIONS ‣Drill down ‣Interactive ‣Query language and editor ‣Correlate

Slide 71

Slide 71 text

VISUALIZE OSCILLATOR

Slide 75

Slide 75 text

VISUALIZE OSCILLATOR DEMO

Slide 76

Slide 76 text

VISUALIZE ANNOTATIONS

Slide 77

Slide 77 text

LEARNINGS LEARNINGS

Slide 78

Slide 78 text

LEARNINGS LEARNINGS ▸ Choose the right metrics

Slide 82

Slide 82 text

LEARNINGS LEARNINGS ▸ Choose the right metrics - Disk utilization, User logins

Slide 83

Slide 83 text

LEARNINGS LEARNINGS ▸ Choose the right metrics - Disk utilization, User logins ▸ Mean can be absurd - Choose wisely

Slide 87

Slide 87 text

LEARNINGS LEARNINGS ▸ Choose the right metrics - Disk utilization, User logins ▸ Mean can be absurd - Choose wisely ▸ Choose the right tool for the job. Histograms, counters, meters, timers.
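
A small illustration of why the mean can be absurd, using made-up response times in which a single request hangs. The numbers are invented for the example.

```python
from statistics import mean, median

# Made-up data: 99 requests take 50 ms, one request hangs for 10 seconds.
response_times_ms = [50] * 99 + [10_000]

avg = mean(response_times_ms)     # pulled far upward by the single outlier
mid = median(response_times_ms)   # what a typical request actually saw

print(avg, mid)
```

Here the mean is about three times the latency that 99 of the 100 requests experienced, which is why a median or a high percentile is often the better choice for latency.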

Slide 88

Slide 88 text

ALERT ALERTING

Slide 94

Slide 94 text

ALERT ALERTING ▸ Of extreme importance

Slide 95

Slide 95 text

ALERT ALERTING ▸ Of extreme importance ▸ Definition of done

Slide 96

Slide 96 text

ALERT ALERTING ▸ Of extreme importance ▸ Definition of done ▸ Make it visible

Slide 97

Slide 97 text

SUMMING IT UP THE COMPLETE SOLUTION

Slide 98

Slide 98 text

SUMMING IT UP THE COMPLETE SOLUTION CAPTURE

Slide 99

Slide 99 text

SUMMING IT UP THE COMPLETE SOLUTION CAPTURE AGGREGATE

Slide 100

Slide 100 text

SUMMING IT UP THE COMPLETE SOLUTION CAPTURE AGGREGATE VISUALIZE

Slide 101

Slide 101 text

SUMMING IT UP THE COMPLETE SOLUTION CAPTURE AGGREGATE VISUALIZE ALERT

Slide 102

Slide 102 text

CONTINUOUS DELIVERY CONTINUOUS DELIVERY & MONITORING

Slide 103

Slide 103 text

CURRENT SETUP SOFTWARE THAT CLAIMS TO MONITOR EVERYTHING FOR YOU

Slide 105

Slide 105 text

CURRENT SETUP SOFTWARE THAT CLAIMS TO MONITOR EVERYTHING FOR YOU ▸ Hooks into individual function calls

Slide 106

Slide 106 text

CURRENT SETUP SOFTWARE THAT CLAIMS TO MONITOR EVERYTHING FOR YOU ▸ Hooks into individual function calls ▸ Could be too expensive

Slide 107

Slide 107 text

CURRENT SETUP SOFTWARE THAT CLAIMS TO MONITOR EVERYTHING FOR YOU ▸ Hooks into individual function calls ▸ Could be too expensive ▸ A great monitoring and alerting setup is possible for free

Slide 108

Slide 108 text

THANKS! @mananbharara READ MORE ▸ Metrics, Metrics, Everywhere - https://www.youtube.com/watch?v=czes-oa0yik ▸ Oscillator - https://github.com/otto-de/oscillator ▸ Xray - https://github.com/otto-de/tesla-xray ▸ OTTO Dev blog - http://dev.otto.de/