Some stuff about me...
● Mostly doing cloud-related stuff
○ Java, Groovy, Scala, Spring Boot, IoT, AWS, Terraform, Infrastructure
● Enjoying the good things
● “Chef leuke dingen doen” (Dutch, roughly “Chief of doing fun things”) == “trying out cool and new stuff”
● Currently involved in a big IoT project
● Movie & Netflix addict
Agenda
● Monitoring
○ Introducing you to a Scary Movie
● Prometheus overview (demos)
○ Running Prometheus
○ Gathering host metrics
○ Introducing Grafana
○ Monitoring Docker containers
○ Alerting
○ Instrumenting your own code
○ Service Discovery (Consul) integration
..Quick Inventory..
I am going to introduce
you to some bad movies
Bad, very bad,
extremely bad movies
Commonality
between
these movies?
Watching a bad movie
in our case
is exactly what we want!
Monitoring
Our scary movie “The Happy Developer”
● Let's push out features
● I can demo it, so it works :)
● It works with 1 user, so it will work with multiple
● Don’t worry about performance, we will just scale using multiple machines/processes
● Logging is in place
But then
the scary stuff begins
Did anyone notice?
Disaster Strikes
Logging: “recording to diagnose a system”
Monitoring: “observation, checking and recording”
Logging != Monitoring
Vital Signs
Why Monitoring?
● Know when things go wrong
○ Detection & Alerting
● Be able to debug and gain insight
● Detect changes over time and drive technical/business decisions
● Feed into other systems/processes (e.g. security, automation)
What to monitor?
● IT Network
● Operating System
● Services
● Applications
Capture Monitoring Information
● Functional Monitoring
● Operational Monitoring
metric data
Functional Monitoring
Applications
Services
event data
Houston, we have a storage problem!
(Diagram: many streams of metric data feeding into Storage)
How do we store the massive amount of metrics and still keep them easy to query?
Time Series - Database
● Time series data is a sequence of data points collected at regular intervals over a period of time (metrics)
○ Examples:
■ Device data
■ Weather data
■ Stock prices
■ Tide measurements
■ Solar flare tracking
● The data requires aggregation and analysis
Time Series Database (stores metric data)
● High write performance
● Data compaction
● Fast, easy range queries
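To make "fast, easy range queries" concrete, here is a minimal sketch of a toy in-memory time series store. It is not how Prometheus stores data; the class name `TinyTimeSeriesStore` and its methods are hypothetical, and the sketch assumes samples arrive in time order so that a binary search can find the requested window.

```python
import bisect


class TinyTimeSeriesStore:
    """Toy store (hypothetical): per series, parallel lists of timestamps and values."""

    def __init__(self):
        self._timestamps = {}
        self._values = {}

    def append(self, series, timestamp, value):
        """Append one sample; assumes samples arrive in time order."""
        ts = self._timestamps.setdefault(series, [])
        if ts and timestamp < ts[-1]:
            raise ValueError("samples must arrive in time order")
        ts.append(timestamp)
        self._values.setdefault(series, []).append(value)

    def range_query(self, series, start, end):
        """Return all (timestamp, value) pairs with start <= timestamp <= end."""
        ts = self._timestamps.get(series, [])
        values = self._values.get(series, [])
        lo = bisect.bisect_left(ts, start)   # first sample at or after start
        hi = bisect.bisect_right(ts, end)    # one past the last sample at or before end
        return list(zip(ts[lo:hi], values[lo:hi]))

    def average(self, series, start, end):
        """Simple aggregation over a time window; None if the window is empty."""
        samples = self.range_query(series, start, end)
        if not samples:
            return None
        return sum(value for _, value in samples) / len(samples)
```

Because timestamps are kept sorted, a range query costs two binary searches plus the size of the result, which is the property a real time series database optimizes for at much larger scale.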
Prometheus
Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. It is now a standalone open-source project and is maintained independently of any company.
https://prometheus.io
Implemented using
Prometheus Components
● The main Prometheus server which scrapes and stores time series data
● Client libraries for instrumenting application code
● A push gateway for supporting short-lived jobs
● Special-purpose exporters (for HAProxy, StatsD, Graphite, etc.)
● An alertmanager
● Various support tools
● White-box monitoring instead of probing (aka black-box monitoring)
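The server scrapes targets that expose their own internal state as plain text. As a rough illustration of what such a target returns, the sketch below renders one metric in the Prometheus text exposition format. The helper `render_metric` is hypothetical (the official client libraries do this for you), and it skips escaping of special characters in label values.

```python
def render_metric(name, help_text, metric_type, samples):
    """Render one metric family as text (hypothetical helper, no label escaping).

    samples is a list of (labels_dict, value) pairs.
    """
    lines = [
        f"# HELP {name} {help_text}",
        f"# TYPE {name} {metric_type}",
    ]
    for labels, value in samples:
        if labels:
            # Labels are sorted here only to make the output deterministic.
            label_text = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{label_text}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"
```

A call such as `render_metric("http_requests_total", "Total HTTP requests.", "counter", [({"method": "get"}, 1027)])` produces the HELP and TYPE comment lines followed by one sample line per label set.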
Alerting Configuration
● Alert Rules
○ Which conditions do we need to alert on?
● Alert Manager
○ Where do we need to send the alert to?
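The split between "when to alert" and "where to send it" can be sketched as follows. This is a toy model, not the Prometheus rule syntax (real rules are PromQL expressions in a rules file, and routing is configured in the Alert Manager). The names `AlertRule`, `evaluate` and `route` are hypothetical; the states inactive, pending and firing mirror the ones Prometheus uses.

```python
from dataclasses import dataclass


@dataclass
class AlertRule:
    """Hypothetical rule: alert when a value stays above a threshold long enough."""
    name: str
    threshold: float
    for_seconds: int   # how long the condition must hold before firing
    receiver: str      # where the notification should go


def evaluate(rule, samples):
    """samples: time-ordered list of (timestamp, value). Returns the alert state."""
    breach_start = None
    for timestamp, value in samples:
        if value > rule.threshold:
            if breach_start is None:
                breach_start = timestamp   # condition just became true
        else:
            breach_start = None            # condition cleared, start over
    if breach_start is None:
        return "inactive"
    held_for = samples[-1][0] - breach_start
    return "firing" if held_for >= rule.for_seconds else "pending"


def route(rule, state):
    """Only firing alerts are handed to a receiver."""
    if state == "firing":
        return f"notify {rule.receiver}: {rule.name}"
    return None
```

The pending state is what keeps a short spike from paging anyone: the condition has to hold for the whole `for_seconds` window before a notification is routed.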
Code Demo
“Alerting -> The Alert Manager”
Instrumenting your own code
Instrumenting your own code!
● Counter
○ A cumulative metric that represents a single numerical value that only ever goes up
● Gauge
○ Single numerical value that can arbitrarily go up and down
● Histogram
○ Samples observations (usually things like request durations or response sizes) and counts them in configurable buckets. It also provides a sum of all observed values
● Summary
○ Like a histogram, it samples observations and provides a total count and a sum of all observed values. In addition it calculates configurable quantiles over a sliding time window
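The behaviour of the first three metric types can be sketched in a few lines. These classes are hypothetical stand-ins written for illustration, not the API of any official client library, and the summary type (sliding-window quantiles) is left out to keep the sketch short.

```python
import bisect


class Counter:
    """Cumulative value that only ever goes up."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("a counter can only increase")
        self.value += amount


class Gauge:
    """Single value that can go up and down arbitrarily."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = value

    def inc(self, amount=1.0):
        self.value += amount

    def dec(self, amount=1.0):
        self.value -= amount


class Histogram:
    """Counts observations in configurable buckets and tracks their sum."""

    def __init__(self, buckets):
        self.bounds = sorted(buckets)
        self.counts = [0] * (len(self.bounds) + 1)  # last slot is the +Inf bucket
        self.sum = 0.0
        self.count = 0

    def observe(self, value):
        # Index of the first upper bound that is >= value.
        self.counts[bisect.bisect_left(self.bounds, value)] += 1
        self.sum += value
        self.count += 1

    def cumulative_buckets(self):
        """Return (upper_bound, observations <= upper_bound) pairs."""
        running = 0
        result = []
        for bound, count in zip(self.bounds + [float("inf")], self.counts):
            running += count
            result.append((bound, running))
        return result
```

A counter fits things like requests served, a gauge fits things like queue depth or memory in use, and a histogram fits request durations where the distribution matters more than a single number.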
Available Languages
● Official
○ Go, Java or Scala, Python, Ruby
● Unofficial
○ Bash, C++, Common Lisp, Elixir, Erlang, Haskell, Lua for Nginx, Lua for Tarantool, .NET / C#, Node.js, PHP, Rust
Prometheus Client Libraries: Spring Boot Example
Demo: Application metrics
Code Demo
“Application metrics”
Service Discovery
(Consul) Integration
Demo: Consul Integration
Service Discovery
Demo: Consul integration
Register the services with Consul and monitor them
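The idea behind the integration is that services register themselves and the monitoring side derives its scrape targets from the registry instead of from a static list. The sketch below uses a hypothetical in-memory `ServiceRegistry` as a stand-in for Consul; it makes no calls to the real Consul API, and in Prometheus itself this lookup is handled by the built-in Consul service discovery configuration.

```python
class ServiceRegistry:
    """Hypothetical in-memory stand-in for a service registry such as Consul."""

    def __init__(self):
        self._instances = {}

    def register(self, service, address, port):
        """Step 1: a service instance announces where it can be reached."""
        self._instances.setdefault(service, set()).add((address, port))

    def deregister(self, service, address, port):
        """An instance that goes away is removed from the registry."""
        self._instances.get(service, set()).discard((address, port))

    def scrape_targets(self):
        """Step 2: the monitoring side turns registrations into host:port targets."""
        return {
            service: sorted(f"{address}:{port}" for address, port in instances)
            for service, instances in self._instances.items()
            if instances
        }
```

New instances show up as targets as soon as they register, and instances that deregister drop out, so nobody has to edit the monitoring configuration when the application scales up or down.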
Code Demo
“Consul to the rescue”
That’s a wrap!
Questions?
Marco Pas
Software geek, hands-on Developer/Architect/DevOps Engineer
@marcopas
https://github.com/mpas/infrastructure-and-system-monitoring-using-prometheus