Improving Observability with Prometheus (Darkmira Tour PHP 2020)

Wellington F. Silva
Contact: @_wsilva
Nicks: wsilva, boina, tom, fisi*
Roles: father, husband, telecom technician, programmer, sysadmin, Docker community leader, instructor, writer, Zend Certified Engineer, Docker Certified Associate, Certified Kubernetes Administrator
* being deprecated

Agenda
• Observability
• Monitoring
• Prometheus
• Tips

Observability

Observability Definition
• https://dictionary.cambridge.org/us/dictionary/english/observability
• https://www.oxfordlearnersdictionaries.com/spellcheck/english/?q=observability

Observability Definition Cambridge:

Observability Definition Oxford:

Observability Definition: ¯\_(ツ)_/¯

Observability Definition: Observe + Ability

Observability
3 pillars:
• Metrics
• Logging
• Tracing
• Events (kind of new): MELT, or 4 golden signals

Observability advantages
• Better deployments
• Improved time to market
• Less toil
• Avoids premature optimisation
• Improved resource utilisation
• Lower costs

Observability disadvantages
• Demands effort in coding and configuring
• Could extend time to delivery
• Constantly neglected

Monitoring

Monitoring
• Subset of observability
• Shows the points where we start to dig
• Makes it easier and faster to find bottlenecks

Monitoring
What metrics should we track? ALL

Monitoring https://www.aeroflap.com.br/uma-analise-evolucao-do-boeing-737/

Monitoring
Issue: the more metrics we have, the harder they are to analyse. MELT becomes a mess.

Monitoring http://aeroplanewallpaper.blogspot.com

Monitoring
Focus more on the most important things.

Monitoring
RED Method:
• Rate
• Errors
• Duration
• Saturation (this joke again?)
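The three RED numbers can be sketched in plain PHP. This is only an illustration, assuming a hypothetical list of requests seen in one time window; in practice Prometheus derives these values from scraped counters and histograms. The function name `redSummary` and the array keys are made up for the example.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: summarise a window of requests into RED numbers.
 *
 * @param array<int, array{status: int, seconds: float}> $requests
 * @return array{rate: float, errorPercent: float, p99: float}
 */
function redSummary(array $requests, float $windowSeconds): array
{
    $total = count($requests);
    if ($total === 0 || $windowSeconds <= 0.0) {
        return ['rate' => 0.0, 'errorPercent' => 0.0, 'p99' => 0.0];
    }

    // Errors: share of requests that ended with a 5xx status.
    $errors = count(array_filter($requests, fn (array $r): bool => $r['status'] >= 500));

    // Duration: nearest-rank 99th percentile of the observed durations.
    $durations = array_column($requests, 'seconds');
    sort($durations);
    $rank = (int) ceil(0.99 * $total);
    $p99 = (float) $durations[$rank - 1];

    return [
        'rate' => $total / $windowSeconds, // requests per second
        'errorPercent' => 100.0 * $errors / $total,
        'p99' => $p99,
    ];
}

// Four requests observed during a 2-second window.
$summary = redSummary([
    ['status' => 200, 'seconds' => 0.12],
    ['status' => 200, 'seconds' => 0.30],
    ['status' => 500, 'seconds' => 1.50],
    ['status' => 200, 'seconds' => 0.08],
], 2.0);
```

With so few samples the 99th percentile is simply the slowest request; real percentiles need far more data to be meaningful.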

Monitoring
Google's SRE Book Way:
• SLI
• SLO
• SLA

Monitoring
SLI - Service Level Indicator
Depends on the team:
• ops: CPU, memory, disk I/O, networking, nodes available, pods running, messages on queue
• devs: response time, requests per second
• data engineers: time to run an ETL job, how much data is being processed, the freshness of the data

Monitoring
SLO - Service Level Objective
• We should involve the customer to help define it
• Breaches must alert the team
• Use realistic objectives
• Reevaluate the values periodically

Monitoring
SLA - Service Level Agreement
• Should be looser than the SLO, so that an SLO breach alerts the team before the SLA is breached
• Pay attention to the agreement and honor it
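The relation between the three levels can be sketched as a small check. This assumes a higher-is-better indicator such as availability in percent, with the SLO target set stricter than the SLA so that the SLO alert fires first. The function name and return values are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical check of a measured SLI (e.g. availability in percent)
 * against an internal SLO and a contractual SLA.
 */
function evaluateServiceLevel(float $sli, float $slo, float $sla): string
{
    if ($slo < $sla) {
        throw new InvalidArgumentException('The SLO should be at least as strict as the SLA.');
    }
    if ($sli < $sla) {
        return 'sla_breached'; // the agreement with the customer is broken
    }
    if ($sli < $slo) {
        return 'slo_breached'; // alert the team while the SLA still holds
    }
    return 'ok';
}
```

For example, with an SLO of 99.9 and an SLA of 99.5, a measured 99.7 returns `slo_breached`: the team is alerted, but the agreement is still honored.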

Prometheus

Prometheus
From the Greek Promēthéus, "forethought". He is a titan (second generation), son of Iapetus (himself a son of Uranus, born of the incestuous union of Uranus and Gaia) and brother of Atlas, Epimetheus and Menoetius. He was a defender of humanity, responsible for stealing Hestia's fire and giving it to mortals.

Prometheus
• Metrics platform
• Started in 2012 at SoundCloud
• Open-sourced and published in 2015
• Second project under CNCF (Cloud Native Computing Foundation)
• Can also fire and manage alerts
• Stores metrics in a time series database (TSDB)

Prometheus
• Pull-based model (scale the exporter)
• Good for telemetry metrics and statistical metrics
• Known alternatives: Graphite / collectd / Carbon, Zabbix (all push-based)

Prometheus
Disadvantages:
• Not easy to scale horizontally
• No query cache
• PromQL instead of regular SQL

Prometheus
Advantages:
• Written in Go
• HTTP-based communication
• Service discovery integration (Kubernetes, Swarm, Consul, AWS, GCP, etc.)
• Dashboard for alert management
• Dashboard for query debugging

Prometheus
Advantages:
• Multidimensional data model
• Easy to set up with Grafana
• PromQL (kind of functional style, powerful for calculations)

Tips
Start with https://github.com/endclothing/prometheus_client_php
Package jimdo/prometheus_client_php is abandoned

    $ composer require endclothing/prometheus_client_php

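Tips
To set up a counter

    $registry = \Prometheus\CollectorRegistry::getDefault();
    // Arguments: namespace, name, help text, label names.
    $counter = $registry->getOrRegisterCounter('demo', 'visitor_counter', 'it increases', ['type']);
    // A counter only goes up; add 3 to the series labelled type="blue".
    $counter->incBy(3, ['blue']);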

Tips
To set up a gauge

    $registry = \Prometheus\CollectorRegistry::getDefault();
    $gauge = $registry->getOrRegisterGauge('demo', 'score', 'it sets', ['type']);
    // A gauge can go up or down; set the current value for type="blue".
    $gauge->set(2.5, ['blue']);

Tips
To set up a histogram

    $registry = \Prometheus\CollectorRegistry::getDefault();
    // The last argument lists the bucket upper bounds.
    $histogram = $registry->getOrRegisterHistogram('demo', 'secs_bucket', 'it observes', ['type'], [0.1, 1, 2, 3.5, 4, 5, 6, 7, 8, 9]);
    // Record one observation of 3.5 for type="blue".
    $histogram->observe(3.5, ['blue']);
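To make the bucket list less abstract, here is a plain-PHP sketch of how observations are counted into cumulative "le" (less than or equal) buckets, which is how Prometheus exposes histograms. It is an illustration only, not the library's implementation; the function name `cumulativeBuckets` is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Illustrative only: count observations into cumulative "le" buckets.
 *
 * @param array<int, int|float> $bounds       upper bounds, ascending
 * @param array<int, float>     $observations
 * @return array<int|string, int>             upper bound => cumulative count
 */
function cumulativeBuckets(array $bounds, array $observations): array
{
    $counts = [];
    foreach ($bounds as $bound) {
        $counts[(string) $bound] = 0;
    }
    $counts['+Inf'] = 0; // catch-all bucket, always equal to the total count

    foreach ($observations as $value) {
        foreach ($bounds as $bound) {
            // An observation is counted in every bucket whose bound it fits under.
            if ($value <= $bound) {
                $counts[(string) $bound]++;
            }
        }
        $counts['+Inf']++;
    }
    return $counts;
}

// Same bounds as the slide, with three sample observations.
$buckets = cumulativeBuckets([0.1, 1, 2, 3.5, 4, 5, 6, 7, 8, 9], [3.5, 0.05, 12.0]);
```

Because buckets are cumulative, the observation of 3.5 is counted in the 3.5 bucket and in every larger one, while 12.0 only lands in `+Inf`.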

Tips
To show the metrics to be scraped

    use Prometheus\RenderTextFormat;

    $registry = \Prometheus\CollectorRegistry::getDefault();
    $renderer = new RenderTextFormat();
    $result = $renderer->render($registry->getMetricFamilySamples());

    // Serve the metrics in the text format that Prometheus scrapes.
    header('Content-type: ' . RenderTextFormat::MIME_TYPE);
    echo $result;
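For a feel of what the scraped page contains, the sketch below formats a single sample in the Prometheus text exposition format using plain PHP. It is not the library's renderer, and the metric name `demo_visitor_counter` assumes the namespace and name are joined with an underscore; `renderSample` is a hypothetical helper.

```php
<?php
declare(strict_types=1);

/**
 * Illustrative only: format one sample in the Prometheus text exposition
 * format. The real rendering is done by RenderTextFormat in the library.
 *
 * @param array<string, string> $labels label name => label value
 */
function renderSample(string $name, string $help, string $type, array $labels, float $value): string
{
    $pairs = [];
    foreach ($labels as $label => $labelValue) {
        // Escape backslashes, double quotes and newlines in label values.
        $escaped = str_replace(["\\", "\"", "\n"], ["\\\\", "\\\"", "\\n"], $labelValue);
        $pairs[] = $label . '="' . $escaped . '"';
    }
    $labelPart = $pairs === [] ? '' : '{' . implode(',', $pairs) . '}';

    return "# HELP {$name} {$help}\n"
        . "# TYPE {$name} {$type}\n"
        . "{$name}{$labelPart} {$value}\n";
}

$text = renderSample('demo_visitor_counter', 'it increases', 'counter', ['type' => 'blue'], 3.0);
```

Each metric family gets a `# HELP` and a `# TYPE` line, followed by one line per label combination.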

Tips
Start with the RED method. Set up the following query:

    (ud:itentity:rate_10m < bool 1000) * 100
    + (ud:error:percent_10m > bool 1.5) * 10
    + (ud:read:duration_p99_10m < bool 25) * 1
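The arithmetic of the query can be mirrored in plain PHP: each comparison yields 0 or 1, as PromQL's `bool` modifier does, and the weights 100, 10 and 1 place each flag in its own digit. The thresholds are copied from the query; the function and parameter names are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Illustrative mirror of the query: three 0/1 flags packed into one
 * three-digit number (hundreds = rate, tens = errors, units = duration).
 */
function redScore(float $rate, float $errorPercent, float $durationP99): int
{
    $rateFlag = $rate < 1000 ? 1 : 0;            // ud:itentity:rate_10m < bool 1000
    $errorFlag = $errorPercent > 1.5 ? 1 : 0;    // ud:error:percent_10m > bool 1.5
    $durationFlag = $durationP99 < 25 ? 1 : 0;   // ud:read:duration_p99_10m < bool 25

    return $rateFlag * 100 + $errorFlag * 10 + $durationFlag * 1;
}
```

Packing the flags this way lets a single time series drive one dashboard panel instead of three.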

Tips
Define a dashboard in Grafana that maps the following results:

    111 = x Rate, x Errors, x Duration
    110 = x Rate, x Errors
    101 = x Rate, x Duration
    100 = x Rate
    011 = x Errors, x Duration
    010 = x Errors
    001 = x Duration
    000 = Ok
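The same mapping can be expressed as a small decoder, shown here in plain PHP as a sketch. In Grafana this would typically be configured as value mappings rather than code; the function names are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Illustrative decoder for the three-digit result: hundreds = Rate,
 * tens = Errors, units = Duration. Returns the flagged signals.
 *
 * @return string[]
 */
function flaggedSignals(int $score): array
{
    $flags = [];
    if (intdiv($score, 100) % 10 === 1) {
        $flags[] = 'Rate';
    }
    if (intdiv($score, 10) % 10 === 1) {
        $flags[] = 'Errors';
    }
    if ($score % 10 === 1) {
        $flags[] = 'Duration';
    }
    return $flags;
}

/** Human-readable label for a score, matching the table above. */
function describeScore(int $score): string
{
    $flags = flaggedSignals($score);
    return $flags === [] ? 'Ok' : implode(', ', $flags);
}
```

For instance, `describeScore(101)` gives `Rate, Duration` and `describeScore(0)` gives `Ok`.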

Demo
Available at: https://github.com/wsilva/darkmira-prometheus-php-demo

Thank you!
Slides: https://speakerdeck.com/wsilva
