Slide 1

Slide 1 text

Monitoring Applications with Prometheus and Grafana
Nancy Chauhan, Software Developer
Women Tech Global Conference 2020

Slide 2

Slide 2 text

About me
Software Developer at Grofers, India
You can find me at:
● GitHub: Nancy-Chauhan
● Twitter: _nancychauhan
● Website: nancychauhan.in

Slide 3

Slide 3 text

Agenda
• Need for services and systems monitoring
• Different approaches to monitoring and observability
• Prometheus: landscape and basic architecture
• Monitoring your application
  – What to monitor
  – How to monitor
  – Visualizing
  – Alerting
• Prometheus in the real world

Slide 4

Slide 4 text

What is broken? And why?

Slide 5

Slide 5 text

Why monitor?
● Know when things go wrong
● Be able to debug and gain insight
● Trending to see changes over time, and drive technical/business decisions
● To feed into other systems/processes
Reality: systems are complicated and break a lot!

Slide 6

Slide 6 text

Potential Problems
● Software bug -> Request/application errors
● Low memory utilization -> Money wasted
● High CPU utilization
● Disk full -> No new data stored, slow performance
● Network outage -> Services cannot communicate

Slide 7

Slide 7 text

Types of Monitoring
● Check-based monitoring (health checks)
  ○ Nagios, Icinga
● Logs/Events
  ○ Loki, Splunk, Elasticsearch
● Metrics & Time Series
  ○ Prometheus, StatsD, InfluxDB
● Request Tracing
  ○ Zipkin, Jaeger

Slide 8

Slide 8 text

What is Prometheus?

Slide 9

Slide 9 text

Architecture Diagram

Slide 10

Slide 10 text

https://prometheus.io/docs/introduction/overview/

Slide 11

Slide 11 text

Prometheus Architecture

Slide 12

Slide 12 text

Prometheus Architecture

Slide 13

Slide 13 text

Prometheus Architecture

Slide 14

Slide 14 text

Prometheus Architecture

Slide 15

Slide 15 text

Prometheus Architecture

Slide 16

Slide 16 text

Favourite Features
● Dimensional data model
● Powerful query language (PromQL)
● Simple & efficient server
● Support for service discovery integration
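
To make the dimensional data model concrete: a time series is identified by a metric name plus a set of key/value labels. The sketch below models that in plain Python and mimics roughly what the PromQL query `sum by (status) (http_requests_total)` would compute. The metric name, labels, and values are invented for illustration, and the helper `sum_by` is hypothetical, not part of Prometheus.

```python
from collections import defaultdict

# Each sample: (metric name, labels, current value). All values are invented.
samples = [
    ("http_requests_total", {"method": "GET", "status": "200"}, 120.0),
    ("http_requests_total", {"method": "POST", "status": "200"}, 30.0),
    ("http_requests_total", {"method": "GET", "status": "500"}, 5.0),
]


def sum_by(samples, metric, label):
    """Rough analogue of the PromQL query: sum by (<label>) (<metric>)."""
    totals = defaultdict(float)
    for name, labels, value in samples:
        if name == metric:
            # Series that lack the label are grouped under the empty string.
            totals[labels.get(label, "")] += value
    return dict(totals)


by_status = sum_by(samples, "http_requests_total", "status")
print(by_status)
```

Because every label is a dimension, the same data could just as well be grouped by `method` without changing how it was recorded.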

Slide 17

Slide 17 text

What is Prometheus?
Monitoring your Application
Illustration by Asif Jamal

Slide 18

Slide 18 text

What to monitor?
● Request and response statuses – success or failure
● Timings of transactions – external calls, data processing
● Request and response timings
● Application runtime metrics – memory, CPU, GC, etc.
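
As a rough sketch of the first few items, the decorator below records success/failure counts and time spent for one operation. It uses in-memory `Counter` objects as stand-ins for real metrics; the names `monitored`, `request_outcomes`, `request_seconds`, and `checkout` are all hypothetical.

```python
import time
from collections import Counter
from functools import wraps

# In-memory stand-ins for metrics; a real service would use a Prometheus client library.
request_outcomes = Counter()  # counts keyed by (operation, "success" | "failure")
request_seconds = Counter()   # total seconds spent, keyed by operation


def monitored(operation):
    """Record outcome and duration of every call to the wrapped function."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                result = func(*args, **kwargs)
            except Exception:
                request_outcomes[(operation, "failure")] += 1
                raise
            else:
                request_outcomes[(operation, "success")] += 1
                return result
            finally:
                # Runs for both outcomes, so failures are timed too.
                request_seconds[operation] += time.perf_counter() - start
        return wrapper
    return decorator


@monitored("checkout")
def checkout(fail=False):
    if fail:
        raise ValueError("payment declined")
    return "ok"


checkout()
try:
    checkout(fail=True)
except ValueError:
    pass
print(dict(request_outcomes), request_seconds["checkout"])
```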

Slide 19

Slide 19 text

What to monitor? (contd.)
● Environment metrics – container metrics, load factor of container / VM, network IO bytes, disk IO, etc.
● Event counts – number of times something happened, such as cache misses or exceptions
● Queue sizes, cache sizes

Slide 20

Slide 20 text

How to monitor
● Using your framework’s support for exporting metrics into Prometheus
● Using Prometheus client libraries directly
● Using Node Exporter to export metrics for your VM
● Prometheus Operator for Kubernetes

Slide 21

Slide 21 text

How to monitor (contd.)
● Identify what metrics you are interested in
● Categorize the data type for each metric – whether it is a gauge, counter, or histogram
● Write code to export the metric
● Set up Prometheus to scrape metrics from your application
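
The steps above can be sketched without any library to show what the three metric types look like once exported. This is a hand-rolled illustration of the Prometheus text exposition format, not the official client; the `Histogram` class, the metric names, and the bucket bounds are assumptions. In a real application the official client library for your language would do this work.

```python
import bisect


class Histogram:
    """Toy histogram rendered with cumulative buckets, as in the text exposition format."""

    def __init__(self, name, buckets):
        self.name = name
        self.bounds = sorted(buckets)
        self.counts = [0] * (len(self.bounds) + 1)  # last slot is the +Inf bucket
        self.total = 0.0

    def observe(self, value):
        # First bucket whose upper bound is >= value; +Inf slot if none matches.
        self.counts[bisect.bisect_left(self.bounds, value)] += 1
        self.total += value

    def render(self):
        lines = [f"# TYPE {self.name} histogram"]
        running = 0
        for bound, count in zip(self.bounds + [float("inf")], self.counts):
            running += count  # buckets are cumulative
            le = "+Inf" if bound == float("inf") else repr(bound)
            lines.append(f'{self.name}_bucket{{le="{le}"}} {running}')
        lines.append(f"{self.name}_sum {self.total}")
        lines.append(f"{self.name}_count {running}")
        return lines


requests_total = 0  # counter: only ever goes up
queue_size = 0      # gauge: can go up and down
latency = Histogram("request_duration_seconds", [0.1, 0.5, 1.0])

for duration in (0.05, 0.3, 0.3, 2.0):
    requests_total += 1
    latency.observe(duration)
queue_size = 7


def render_metrics():
    lines = [
        "# TYPE app_requests_total counter",
        f"app_requests_total {requests_total}",
        "# TYPE app_queue_size gauge",
        f"app_queue_size {queue_size}",
    ]
    lines += latency.render()
    return "\n".join(lines) + "\n"


print(render_metrics())
```

Serving this text at a `/metrics` endpoint and listing the application as a target under `scrape_configs` in `prometheus.yml` would cover the last step.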

Slide 22

Slide 22 text

Visualizing
Although Prometheus provides basic visualisation, Grafana provides a full framework for:
• sharing dashboards,
• creating advanced queries and graphs,
• and reusing those dashboards.

Slide 23

Slide 23 text

So that you can go from this:

Slide 24

Slide 24 text

To this:

Slide 25

Slide 25 text

Alerting
• To truly protect applications, teams must be alerted when something goes wrong.
• High-level tasks:
  • Set up an instance of Alertmanager
  • Configure the Prometheus instance to use Alertmanager
  • Create alert rules in the Prometheus configuration (Alertmanager handles routing and notifications)
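
An alert rule usually should not fire on a single bad reading. In Prometheus this is expressed with the `for:` clause of an alerting rule, which keeps the alert pending until the condition has held long enough. The sketch below imitates that idea in plain Python; it counts consecutive evaluations instead of tracking wall-clock time, which is a simplification, and `alert_state` and its parameters are hypothetical names.

```python
def alert_state(history, threshold, for_evaluations):
    """Return 'inactive', 'pending', or 'firing' for the latest evaluation.

    history: metric values, one per rule evaluation, oldest first.
    The alert fires once the value has exceeded the threshold for
    `for_evaluations` consecutive evaluations (a stand-in for `for:`).
    """
    streak = 0
    for value in reversed(history):
        if value > threshold:
            streak += 1
        else:
            break
    if streak == 0:
        return "inactive"
    return "firing" if streak >= for_evaluations else "pending"


# Invented error-rate readings, threshold of 5%, must hold for 3 evaluations.
print(alert_state([0.01, 0.08], 0.05, 3))
print(alert_state([0.01, 0.08, 0.09, 0.07], 0.05, 3))
print(alert_state([0.08, 0.09, 0.01], 0.05, 3))
```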

Slide 26

Slide 26 text

Prometheus in Real World

Slide 27

Slide 27 text

Monitoring Microservices
● Request and response times of the service
● Success and failure counters
● Business metrics like average order size, A/B testing metrics
● Database transactions and external call timings
● Kubernetes pod metrics

Slide 28

Slide 28 text

Prometheus as a scraper
● Prometheus is simple and leaves features like analysis and long-term storage to extensions
● Many organizations use Prometheus to collect metrics, then process and store them in a different system such as Cortex, Uber M3, Thanos, or CloudWatch

Slide 29

Slide 29 text

Prometheus exporters
● Sometimes a service you want to monitor isn’t Prometheus-ready
● Define exporters to “export” metrics from a system that is not Prometheus-aware into Prometheus
● The Prometheus project provides the JMX exporter, StatsD exporter, and Pushgateway
● Jenkins exporter exports metrics from Jenkins builds to Prometheus
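
A minimal sketch of the exporter idea, using only the Python standard library: read stats from a system that does not speak Prometheus, translate them into the text exposition format, and serve them on `/metrics`. The function `fetch_legacy_stats`, its stat names, the `legacy` prefix, and the port are all made up for illustration; a real exporter would normally build on a Prometheus client library.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def fetch_legacy_stats():
    """Hypothetical stand-in for querying a system that is not Prometheus-aware."""
    return {"jobs.queued": 4, "jobs.failed": 1}


def to_exposition(stats, prefix="legacy"):
    """Translate dotted stat names into Prometheus-style gauge lines."""
    lines = []
    for key, value in sorted(stats.items()):
        name = f"{prefix}_{key.replace('.', '_')}"
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        # Stats are fetched on every scrape, so values are always current.
        body = to_exposition(fetch_legacy_stats()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9999):
    """Blocks forever; the port number is arbitrary."""
    HTTPServer(("", port), MetricsHandler).serve_forever()


print(to_exposition(fetch_legacy_stats()))
```

Calling `serve()` starts the endpoint, which Prometheus could then scrape like any other target.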

Slide 30

Slide 30 text

References
• https://www.youtube.com/watch?v=5O1djJ13gRU
• https://www.youtube.com/watch?v=D09x0eR4vu4&t=176s

Slide 31

Slide 31 text

Thank you