Infrastructure & System Monitoring using Prometheus

Slide 1

Slide 1 text

No content

Slide 2

Slide 2 text

Some stuff about me...
● Mostly doing cloud-related stuff
  ○ Java, Groovy, Scala, Spring Boot, IoT, AWS, Terraform, Infrastructure
● Enjoying the good things
● “Chief of doing fun things” == “trying out cool and new stuff”
● Currently involved in a big IoT project
● Movie & Netflix addict

Slide 3

Slide 3 text

Agenda
● Monitoring
  ○ Introducing you to a Scary Movie
● Prometheus overview (demos)
  ○ Running Prometheus
  ○ Gathering host metrics
  ○ Introducing Grafana
  ○ Monitoring Docker containers
  ○ Alerting
  ○ Instrumenting your own code
  ○ Service Discovery (Consul) integration

Slide 4

Slide 4 text

..Quick Inventory..

Slide 5

Slide 5 text

I am going to introduce you to some bad movies

Slide 6

Slide 6 text

Bad, very bad, extremely bad movies

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

What do these movies have in common?

Slide 13

Slide 13 text

Watching a bad movie in our case is exactly what we want!

Slide 14

Slide 14 text

Monitoring

Slide 15

Slide 15 text

No content

Slide 16

Slide 16 text

Our scary movie “The Happy Developer”
● Let’s push out features
● I can demo it, so it works :)
● It works with 1 user, so it will work with multiple
● Don’t worry about performance, we will just scale using multiple machines/processes
● Logging is in place

Slide 17

Slide 17 text

But then the scary stuff begins

Slide 18

Slide 18 text

Did anyone notice? Disaster Strikes

Slide 19

Slide 19 text

Logging: “recording to diagnose a system”
Monitoring: “observation, checking and recording”
Logging != Monitoring

Slide 20

Slide 20 text

Vital Signs

Slide 21

Slide 21 text

Why Monitoring? ● Know when things go wrong ○ Detection & Alerting ● Be able to debug and gain insight ● Detect changes over time and drive technical/business decisions ● Feed into other systems/processes (e.g. security, automation)

Slide 22

Slide 22 text

What to monitor?
[Diagram: monitoring information is captured from the IT network, operating system, services and applications; it is divided into functional monitoring and operational monitoring, with operational monitoring producing metric data]

Slide 23

Slide 23 text

Functional Monitoring
[Diagram: functional monitoring of applications and services produces event data]

Slide 24

Slide 24 text

Houston, we have a storage problem!
[Diagram: many streams of metric data flowing into storage]
How do we store this massive amount of metrics and still keep them easy to query?

Slide 25

Slide 25 text

Time Series - Database
● Time series data is a sequence of data points collected at regular intervals over a period of time (metrics)
  ○ Examples:
    ■ Device data
    ■ Weather data
    ■ Stock prices
    ■ Tide measurements
    ■ Solar flare tracking
● The data requires aggregation and analysis
[Diagram: metric data flowing into a time series database]
A time series database offers:
● High write performance
● Data compaction
● Fast, easy range queries
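The write-heavy, range-query-heavy access pattern described above can be illustrated with a toy in-memory store. This is a minimal sketch under simplifying assumptions, not how Prometheus or any other real time series database is built: each series is an append-only list kept in time order, so writes are cheap and a range query can use binary search. Data compaction is not modelled, and the class and method names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class TimeSeriesStore:
    """Toy in-memory store with one time-ordered list of samples per series."""

    def __init__(self):
        self._times = defaultdict(list)   # series name -> timestamps
        self._values = defaultdict(list)  # series name -> values

    def append(self, series, timestamp, value):
        # Samples normally arrive in time order, so an append is O(1).
        times = self._times[series]
        if times and timestamp < times[-1]:
            raise ValueError("out-of-order sample")
        times.append(timestamp)
        self._values[series].append(value)

    def range_query(self, series, start, end):
        # Binary search locates the [start, end] bounds in O(log n).
        times = self._times.get(series, [])
        values = self._values.get(series, [])
        lo, hi = bisect_left(times, start), bisect_right(times, end)
        return list(zip(times[lo:hi], values[lo:hi]))

    def avg_over(self, series, start, end):
        # Example aggregation: average value over a time range.
        points = self.range_query(series, start, end)
        if not points:
            return None
        return sum(value for _, value in points) / len(points)


store = TimeSeriesStore()
for t, v in [(0, 1.0), (15, 3.0), (30, 5.0), (45, 7.0)]:  # one sample every 15 s
    store.append("cpu_usage", t, v)
print(store.range_query("cpu_usage", 10, 40))  # [(15, 3.0), (30, 5.0)]
print(store.avg_over("cpu_usage", 10, 40))     # 4.0
```

Keeping each series sorted by time is what makes both the append path and the range lookup cheap; a real TSDB adds compression and compaction on top of this idea.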

Slide 26

Slide 26 text

Time Series - Data format
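As a hedged illustration, assuming the Prometheus data model: a time series is identified by a metric name plus a set of key/value labels, and each sample is a float value with an optional millisecond timestamp. The sketch below parses one sample line of the Prometheus text exposition format. It is a simplification: it ignores `# HELP`/`# TYPE` comment lines, does not unescape label values, and assumes label values contain no `}`. The function and pattern names are hypothetical.

```python
import re

# One sample line of the text exposition format:
#   metric_name{label="value",...} value [timestamp_in_ms]
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+(?P<ts>-?\d+))?\s*$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Return (metric name, labels dict, value, timestamp in ms or None)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    ts = match.group("ts")
    return (match.group("name"), labels, float(match.group("value")),
            int(ts) if ts is not None else None)


print(parse_sample('http_requests_total{method="post",code="200"} 1027 1395066363000'))
# ('http_requests_total', {'method': 'post', 'code': '200'}, 1027.0, 1395066363000)
```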

Slide 27

Slide 27 text

Source: http://db-engines.com/en/ranking/time+series+dbms

Slide 28

Slide 28 text

Prometheus Overview

Slide 29

Slide 29 text

Prometheus
Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. It is now a standalone open-source project, maintained independently of any company.
https://prometheus.io
Implemented using Go

Slide 30

Slide 30 text

Prometheus Components
● The main Prometheus server, which scrapes and stores time series data
● Client libraries for instrumenting application code
● A push gateway for supporting short-lived jobs
● Special-purpose exporters (for HAProxy, StatsD, Graphite, etc.)
● An Alertmanager
● Various support tools
● White-box monitoring instead of probing (a.k.a. black-box monitoring)
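The pull model behind the first bullet can be sketched in a few lines. This is a hypothetical simplification, not Prometheus's scraper: `fetch` stands in for an HTTP GET of a target's metrics page, and the function name is made up. What it does mirror is that Prometheus attaches `job` and `instance` labels to every scraped sample and records a synthetic `up` series per target (1 for a successful scrape, 0 for a failed one). Real Prometheus resolves label clashes differently (see the `honor_labels` scrape option); here the target labels simply win.

```python
def scrape(job, targets, fetch):
    """Pull samples from each target and label them with job and instance.

    `fetch(instance)` must return (name, labels, value) tuples, or raise
    an exception when the target cannot be scraped.
    """
    samples = []
    for instance in targets:
        target_labels = {"job": job, "instance": instance}
        try:
            scraped, up = fetch(instance), 1.0
        except Exception:
            scraped, up = [], 0.0
        for name, labels, value in scraped:
            # Target labels override any clashing scraped labels (simplified).
            samples.append((name, {**labels, **target_labels}, value))
        # Health of the scrape itself, recorded for every target.
        samples.append(("up", dict(target_labels), up))
    return samples


# Hypothetical fetch: "web:9100" answers, "db:9100" is unreachable (KeyError).
responses = {"web:9100": [("node_load1", {}, 0.5)]}
for sample in scrape("node", ["web:9100", "db:9100"], responses.__getitem__):
    print(sample)
```

Because the server does the pulling, a target that stops answering shows up as `up == 0` instead of silently going missing.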

Slide 31

Slide 31 text

Prometheus Overview

Slide 32

Slide 32 text

List of Job Exporters
● Prometheus managed:
  ○ JMX
  ○ Node
  ○ Graphite
  ○ Blackbox
  ○ SNMP
  ○ HAProxy
  ○ Consul
  ○ Memcached
  ○ AWS CloudWatch
  ○ InfluxDB
  ○ StatsD
  ○ ...
● Custom ones:
  ○ Database
  ○ Hardware related
  ○ Messaging systems
  ○ Storage
  ○ HTTP
  ○ APIs
  ○ Logging
  ○ …
https://prometheus.io/docs/instrumenting/exporters/
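Custom exporters like the ones listed above are typically small HTTP services that expose metrics in Prometheus's text format. A minimal standard-library sketch of one: it serves `/metrics` (the default path Prometheus scrapes) on a local port. `collect()` and the `demo_queue_depth` metric are hypothetical placeholders for whatever database, hardware or API you want to expose; in practice you would normally build an exporter with one of the client libraries.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen


def collect():
    """Hypothetical collector: (name, help, type, value) for each metric."""
    return [("demo_queue_depth", "Items waiting in a demo queue.", "gauge", 42.0)]


def render(metrics):
    # Text exposition format: HELP and TYPE comments, then the sample line.
    lines = []
    for name, help_text, metric_type, value in metrics:
        lines += [f"# HELP {name} {help_text}",
                  f"# TYPE {name} {metric_type}",
                  f"{name} {value}"]
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render(collect()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the demo


def start_exporter(port=0):
    """Serve /metrics on a background thread (port 0 = any free port)."""
    server = HTTPServer(("127.0.0.1", port), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def scrape_once(server):
    """Fetch the metrics page the way a Prometheus scrape would."""
    host, port = server.server_address
    with urlopen(f"http://{host}:{port}/metrics") as response:
        return response.read().decode("utf-8")


server = start_exporter()
print(scrape_once(server))
server.shutdown()
server.server_close()
```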

Slide 33

Slide 33 text

Demo Structure

Slide 34

Slide 34 text

Demo: Run Prometheus (native)

Slide 35

Slide 35 text

No content

Slide 36

Slide 36 text

Code Demo “Running Prometheus Native”

Slide 37

Slide 37 text

Demo: Run Prometheus using Docker

Slide 38

Slide 38 text

No content

Slide 39

Slide 39 text

Code Demo “Running Prometheus Dockerized”

Slide 40

Slide 40 text

Demo: Add host metrics

Slide 41

Slide 41 text

No content

Slide 42

Slide 42 text

No content

Slide 43

Slide 43 text

Code Demo “Add host metrics”

Slide 44

Slide 44 text

Demo: Grafana

Slide 45

Slide 45 text

You get the idea :)

Slide 46

Slide 46 text

Code Demo “Grafana”

Slide 47

Slide 47 text

Demo: Monitor Docker containers

Slide 48

Slide 48 text

Code Demo “cAdvisor”

Slide 49

Slide 49 text

Demo: Alerting

Slide 50

Slide 50 text

Alerting Configuration
● Alert Rules
  ○ Which conditions should trigger an alert?
● Alert Manager
  ○ Where should the alert be sent?
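The split between alert rules and the Alert Manager can be sketched in plain Python. Prometheus evaluates alerting rules and reports each alert as inactive, pending (the condition is true but has not yet held for the rule's `for` duration) or firing; the Alertmanager then decides, based on the alert's labels, where a firing alert is sent. This is a hypothetical, heavily simplified model: the rule checks one numeric value instead of a PromQL expression, and real Alertmanager routing is a tree with grouping, inhibition and silencing. Rule, receiver and function names are made up.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class AlertRule:
    """Hypothetical rule: fires once `condition` has held for `for_seconds`."""
    name: str
    condition: Callable[[float], bool]
    for_seconds: float
    labels: dict = field(default_factory=dict)
    active_since: Optional[float] = None

    def evaluate(self, value, now):
        """Return 'inactive', 'pending' or 'firing' for the latest value."""
        if not self.condition(value):
            self.active_since = None  # condition cleared: reset the timer
            return "inactive"
        if self.active_since is None:
            self.active_since = now
        if now - self.active_since >= self.for_seconds:
            return "firing"
        return "pending"


def route(alert_labels, routes, default_receiver):
    """Return the first receiver whose label matchers all match the alert."""
    for matchers, receiver in routes:
        if all(alert_labels.get(key) == value for key, value in matchers.items()):
            return receiver
    return default_receiver


rule = AlertRule("HighLoad", lambda load: load > 2.0, for_seconds=60,
                 labels={"severity": "page"})
routes = [({"severity": "page"}, "on-call-pager"), ({"severity": "ticket"}, "team-email")]
for now, load in [(0, 3.0), (30, 3.5), (60, 3.1), (90, 1.0)]:
    state = rule.evaluate(load, now)
    print(now, state, route(rule.labels, routes, "team-chat") if state == "firing" else "-")
```

The `for` duration is what keeps a single spike from paging anyone: the alert only fires once the condition has held continuously for the whole window.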

Slide 51

Slide 51 text

No content

Slide 52

Slide 52 text

No content

Slide 53

Slide 53 text

No content

Slide 54

Slide 54 text

No content

Slide 55

Slide 55 text

Code Demo “Alerting -> The Alert Manager”

Slide 56

Slide 56 text

Instrumenting your own code

Slide 57

Slide 57 text

Instrumenting your own code!
● Counter
  ○ A cumulative metric that represents a single numerical value that only ever goes up
● Gauge
  ○ A single numerical value that can arbitrarily go up and down
● Histogram
  ○ Samples observations (usually things like request durations or response sizes) and counts them in configurable buckets. It also provides a sum of all observed values and a total count of observations
● Summary
  ○ Like a histogram, it samples observations and provides a total count of observations and a sum of all observed values, but it also calculates configurable quantiles over a sliding time window
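To make the four types concrete, here is a minimal sketch of what each one keeps track of. These are hypothetical classes, not the official client library API, although method names such as `inc`, `set` and `observe` follow the same style. Histogram buckets are cumulative with an implicit `+Inf` bucket, as in Prometheus; as a simplifying assumption, the summary computes quantiles over the last N observations rather than a sliding time window.

```python
import bisect
import math
from collections import deque


class Counter:
    """Cumulative value that only goes up."""
    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("a counter can only increase")
        self.value += amount


class Gauge:
    """Single value that can go up and down."""
    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = float(value)

    def inc(self, amount=1.0):
        self.value += amount

    def dec(self, amount=1.0):
        self.value -= amount


class Histogram:
    """Cumulative bucket counts (value <= upper bound) plus sum and count."""
    def __init__(self, upper_bounds):
        self.bounds = sorted(upper_bounds) + [math.inf]  # +Inf catches everything
        self.buckets = [0] * len(self.bounds)
        self.sum = 0.0
        self.count = 0

    def observe(self, value):
        self.sum += value
        self.count += 1
        # Buckets are cumulative: bump every bucket whose bound is >= value.
        for i in range(bisect.bisect_left(self.bounds, value), len(self.bounds)):
            self.buckets[i] += 1


class Summary:
    """Sum, count and quantiles over the most recent `window` observations."""
    def __init__(self, window=100):
        self.recent = deque(maxlen=window)
        self.sum = 0.0
        self.count = 0

    def observe(self, value):
        self.sum += value
        self.count += 1
        self.recent.append(value)

    def quantile(self, q):
        # Nearest-rank quantile over the retained observations.
        ordered = sorted(self.recent)
        if not ordered:
            return math.nan
        rank = max(1, math.ceil(q * len(ordered)))
        return ordered[rank - 1]


latency = Histogram([0.1, 0.5, 1.0])
for seconds in (0.05, 0.3, 0.5, 2.0):
    latency.observe(seconds)
print(list(zip(latency.bounds, latency.buckets)))  # [(0.1, 1), (0.5, 3), (1.0, 3), (inf, 4)]
```

The trade-off shows in the data each type keeps: a histogram only stores bucket counts, so quantiles must be estimated later (and can be aggregated across instances), while a summary computes quantiles up front from raw observations.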

Slide 58

Slide 58 text

Available Languages
● Official
  ○ Go, Java or Scala, Python, Ruby
● Unofficial
  ○ Bash, C++, Common Lisp, Elixir, Erlang, Haskell, Lua for Nginx, Lua for Tarantool, .NET / C#, Node.js, PHP, Rust

Slide 59

Slide 59 text

Prometheus Client Libraries: Spring Boot Example

Slide 60

Slide 60 text

Demo: Application metrics

Slide 61

Slide 61 text

Code Demo “Application metrics”
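As an illustration of what application-level instrumentation does (this is not the Spring Boot code used in the demo), the following Python sketch wraps a function so that every call increments a request counter, failures increment an error counter, and each call's duration is recorded. The registry dictionaries, the `instrumented` decorator and the `get_order` handler are all hypothetical; a real application would use a client library and expose these values on a `/metrics` endpoint.

```python
import functools
import time
from collections import defaultdict

# Hypothetical in-process registry, keyed by function name.
REQUESTS_TOTAL = defaultdict(float)   # counter: calls made
ERRORS_TOTAL = defaultdict(float)     # counter: calls that raised
LATENCY_SECONDS = defaultdict(list)   # raw call durations


def instrumented(func):
    """Count calls and failures and record how long each call took."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        REQUESTS_TOTAL[func.__name__] += 1
        try:
            return func(*args, **kwargs)
        except Exception:
            ERRORS_TOTAL[func.__name__] += 1
            raise
        finally:
            # Runs for both successful and failing calls.
            LATENCY_SECONDS[func.__name__].append(time.perf_counter() - start)
    return wrapper


@instrumented
def get_order(order_id):
    # Hypothetical request handler standing in for a real endpoint.
    if order_id < 0:
        raise ValueError("invalid order id")
    return {"id": order_id}


get_order(42)
print(REQUESTS_TOTAL["get_order"], ERRORS_TOTAL["get_order"], len(LATENCY_SECONDS["get_order"]))
```

Wrapping the handler keeps the business logic free of metric code, which is the same motivation behind annotation- or middleware-based instrumentation in other stacks.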

Slide 62

Slide 62 text

Service Discovery (Consul) Integration

Slide 63

Slide 63 text

Demo: Consul Integration

Slide 64

Slide 64 text

Service Discovery

Slide 65

Slide 65 text

Demo: Consul integration
Register the services with Consul (1) and monitor them (2)
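The two steps (services register with Consul, Prometheus discovers and monitors them) can be sketched as a function that turns catalog entries into scrape targets. In Prometheus itself this is configured with `consul_sd_configs` plus relabeling rules rather than written by hand; the sketch only mimics the idea. The input mirrors a few fields of Consul's catalog API response (`Address`, `ServiceAddress`, `ServicePort`), and the `__address__` and `__meta_consul_service` labels follow Prometheus's naming. The function names, service names and addresses are hypothetical, and using the service name as `job` is an assumed relabeling choice (by default `job` comes from the scrape config).

```python
def consul_targets(catalog):
    """Turn Consul catalog entries into scrape targets with meta labels.

    `catalog` maps a service name to entries shaped like the JSON returned by
    Consul's /v1/catalog/service/<name> endpoint (only the fields used here).
    """
    targets = []
    for service, entries in sorted(catalog.items()):
        for entry in entries:
            # An empty ServiceAddress means "use the node's Address".
            host = entry.get("ServiceAddress") or entry["Address"]
            targets.append({
                "__address__": f"{host}:{entry['ServicePort']}",
                "__meta_consul_service": service,
            })
    return targets


def relabel(targets):
    """Assumed relabeling: service name -> job, address -> instance.

    Labels starting with a double underscore are dropped afterwards.
    """
    return [{"job": t["__meta_consul_service"], "instance": t["__address__"]}
            for t in targets]


catalog = {
    "orders": [{"Address": "10.0.0.5", "ServiceAddress": "", "ServicePort": 8080}],
    "billing": [{"Address": "10.0.0.6", "ServiceAddress": "10.0.1.6", "ServicePort": 9090}],
}
for target in relabel(consul_targets(catalog)):
    print(target)
```

Because targets are derived from the catalog on every refresh, a newly registered service instance becomes a scrape target without editing the Prometheus configuration.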

Slide 66

Slide 66 text

Code Demo “Consul to the rescue”

Slide 67

Slide 67 text

No content

Slide 68

Slide 68 text

That’s a wrap!
Questions?
Marco Pas
Software geek, hands-on Developer/Architect/DevOps Engineer
@marcopas
https://github.com/mpas/infrastructure-and-system-monitoring-using-prometheus