Improving Observability with Prometheus (Darkmira Tour PHP 2020)

Talk presented online on December 13th at Darkmira Tour PHP 2020 (https://php.darkmiratour.rocks/2020/schedule.html). Demo available at https://github.com/wsilva/darkmira-prometheus-php-demo

Wellington F. Silva

December 13, 2020

Transcript

  1. Improving
    Observability with
    Prometheus
    Darkmira Tour PHP 2020

  2. Wellington F. Silva
    contact:
    @_wsilva
    nicks:
    wsilva, boina, tom, fisi*
    Roles:
    father, husband, telecom technician,
    programmer, sysadmin,
    Docker community leader,
    instructor, writer, Zend
    Certified Engineer, Docker
    Certified Associate, Certified
    Kubernetes Administrator
    * in deprecation

  3. Agenda
    • Observability
    • Monitoring
    • Prometheus
    • Tips

  4. Observability

  5. Observability
    Definition
    • https://dictionary.cambridge.org/us/dictionary/english/observability
    • https://www.oxfordlearnersdictionaries.com/spellcheck/english/?q=observability

  6. Observability
    Definition Cambridge:

  7. Observability
    Definition Oxford:

  8. Observability
    Definition:
    ¯\_(ツ)_/¯

  9. Observability
    Definition:
    Observe + Ability

  15. Observability
    3 pillars:
    • Metrics
    • Logging
    • Tracing
    • Events (kind of new) - MELT, or 4 golden
    signals

  16. Observability
    advantages

  22. Observability
    • Better deployments
    • Improve time to market
    • Less toil
    • Avoid premature optimisation
    • Improve resource utilisation
    • Lower costs

  23. Observability
    disadvantages

  26. Observability
    • Demands coding and configuration effort
    • Could extend time to delivery
    • Constantly neglected

  29. Monitoring
    • Subset of observability
    • Shows the points where we should start to dig
    • Makes it easier and faster to find bottlenecks

  31. Monitoring
    What metrics should we track?
    ALL

  32. Monitoring
    https://www.aeroflap.com.br/uma-analise-evolucao-do-boeing-737/

  34. Monitoring
    Issue: the more metrics, the harder they are to analyse
    MELT becomes a mess

  35. Monitoring
    http://aeroplanewallpaper.blogspot.com

  36. Monitoring
    Focus more on the most important things.

  42. Monitoring
    RED Method:
    • Rate
    • Errors
    • Duration
    • Saturation (again this joke?)
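
As a rough illustration of the Rate, Errors and Duration signals, the sketch below computes them from a batch of request samples in plain PHP. The function name `redSummary`, the shape of the sample array, the choice to count HTTP 5xx responses as errors, and the nearest-rank p99 are all assumptions made for this example; in a real setup Prometheus would derive these values from scraped metrics.

```php
<?php
/**
 * Hypothetical helper: summarise request samples with the RED signals.
 *
 * @param array $requests      list of ['status' => int, 'seconds' => float]
 * @param float $windowSeconds length of the observation window in seconds
 * @return array               ['rate' => float, 'errorPercent' => float, 'p99' => float]
 */
function redSummary(array $requests, float $windowSeconds): array
{
    $total = count($requests);
    if ($total === 0) {
        return ['rate' => 0.0, 'errorPercent' => 0.0, 'p99' => 0.0];
    }

    $errors = 0;
    $durations = [];
    foreach ($requests as $request) {
        // Assumption: only server-side failures (5xx) count as errors.
        if ($request['status'] >= 500) {
            $errors++;
        }
        $durations[] = (float) $request['seconds'];
    }

    // Duration: nearest-rank 99th percentile over the sorted samples.
    sort($durations);
    $rank = (int) ceil(0.99 * $total);

    return [
        'rate' => $total / $windowSeconds,            // requests per second
        'errorPercent' => 100.0 * $errors / $total,   // share of failed requests
        'p99' => $durations[$rank - 1],
    ];
}

$summary = redSummary([
    ['status' => 200, 'seconds' => 0.1],
    ['status' => 200, 'seconds' => 0.2],
    ['status' => 500, 'seconds' => 0.3],
    ['status' => 200, 'seconds' => 0.4],
], 2.0);
print_r($summary);
```

With these four samples over a two-second window the summary reports 2 requests per second, 25% errors and a p99 of 0.4 seconds.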

  44. Monitoring
    Google’s SRE Book Way:
    • SLI
    • SLO
    • SLA

  49. Monitoring
    SLI - Service Level Indicator
    Depends on the team:
    • ops: cpu, memory, disk io, networking, nodes
    available, pods running, messages on queue
    • devs: response time, requests per second
    • data engineers: time to run an ETL job, how
    much data is being processed, the freshness
    of the data

  54. Monitoring
    SLO - Service Level Objectives
    • We should involve the customer to help define it
    • Breaches must alert the team
    • Use realistic objectives
    • Reevaluate the values periodically

  57. Monitoring
    SLA - Service Level Agreement
    • Should be more lenient than the SLO, so that an SLO
    breach alerts the team before the SLA is breached
    • Pay attention to the agreement and honor it
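
The alerting order between SLO and SLA can be sketched as a simple check. This example assumes an availability SLI (good requests divided by total requests) and hypothetical targets of 99.9% for the internal SLO and 99.5% for the contractual SLA; the function name and return strings are illustrative only.

```php
<?php
/**
 * Hypothetical helper: compare a measured availability SLI against
 * an internal SLO and a contractual SLA (both given as fractions).
 */
function evaluateAvailability(int $goodRequests, int $totalRequests, float $slo, float $sla): string
{
    if ($totalRequests === 0) {
        return 'no data';
    }

    $sli = $goodRequests / $totalRequests;

    // Check the contractual target first: it is the more severe breach.
    if ($sli < $sla) {
        return 'SLA breached';
    }
    // The SLO is stricter, so it trips while the SLA still holds.
    if ($sli < $slo) {
        return 'SLO breached: alert the team';
    }
    return 'ok';
}

// Assumed targets for illustration: SLO 99.9%, SLA 99.5%.
echo evaluateAvailability(99950, 100000, 0.999, 0.995), "\n"; // ok
echo evaluateAvailability(99700, 100000, 0.999, 0.995), "\n"; // SLO breached: alert the team
echo evaluateAvailability(99000, 100000, 0.999, 0.995), "\n"; // SLA breached
```

Because the SLO threshold sits above the SLA threshold, the team gets a warning window in which the agreement is still being honored.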

  58. Prometheus
    From the Greek Promēthéus,
    "forethought". He is a titan
    (second generation), son of
    Iapetus (son of Uranus; an
    incest between Uranus and
    Gaia) and brother of Atlas,
    Epimetheus and Menoetius.
    He was a defender of
    humanity, responsible for
    stealing Hestia's fire and giving
    it to mortals.

  64. Prometheus
    • Metrics platform
    • Started in 2012 at SoundCloud
    • Open-sourced and published in 2015
    • Second project under CNCF (Cloud Native
    Computing Foundation)
    • Can also fire and manage alerts
    • Stores metrics in a time series database (TSDB)

  67. Prometheus
    • Pull-based model (scale the exporter)
    • Good for telemetry metrics and statistical
    metrics
    • Known alternatives: Graphite / collectd /
    Carbon, Zabbix (all push-based)

  70. Prometheus
    Disadvantages:
    • Not easy to scale horizontally
    • No query cache
    • PromQL instead of regular SQL

  75. Prometheus
    Advantages:
    • Written in Go
    • HTTP-based communication
    • Service discovery integration (Kubernetes,
    Swarm, Consul, AWS, GCP, etc.)
    • Dashboard for alerts management
    • Dashboard for query debugging

  78. Prometheus
    Advantages:
    • Multidimensional data model
    • Easy to set up with Grafana
    • PromQL (somewhat functional in style, powerful for
    calculations)

  79. Tips
    Start with https://github.com/endclothing/prometheus_client_php
    Package jimdo/prometheus_client_php is abandoned
    $ composer require endclothing/prometheus_client_php

  80. Tips
    To set up a counter
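    $registry = \Prometheus\CollectorRegistry::getDefault();
    // Arguments: namespace, metric name, help text, label names
    $counter = $registry->getOrRegisterCounter('demo', 'visitor_counter', 'it increases', ['type']);
    // Increase the counter by 3 for the label value type="blue"
    $counter->incBy(3, ['blue']);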

  81. Tips
    To set up a gauge
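    $registry = \Prometheus\CollectorRegistry::getDefault();
    // Arguments: namespace, metric name, help text, label names
    $gauge = $registry->getOrRegisterGauge('demo', 'score', 'it sets', ['type']);
    // Set the gauge to 2.5 for the label value type="blue"
    $gauge->set(2.5, ['blue']);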

  82. Tips
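    To set up a histogram
    $registry = \Prometheus\CollectorRegistry::getDefault();
    // Arguments: namespace, metric name, help text, label names, bucket upper bounds
    $histogram = $registry->getOrRegisterHistogram('demo', 'secs_bucket', 'it observes', ['type'], [0.1, 1, 2, 3.5, 4, 5, 6, 7, 8, 9]);
    // Record one observation of 3.5 for the label value type="blue"
    $histogram->observe(3.5, ['blue']);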
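
To make the bucket list above more concrete, here is a minimal plain-PHP sketch of how Prometheus-style histogram buckets count observations: each bucket is cumulative and holds every observation less than or equal to its upper bound, plus a final `+Inf` bucket that holds all of them. The function `cumulativeBuckets` and the sample observations are hypothetical and independent of the client library.

```php
<?php
/**
 * Hypothetical helper: count observations into cumulative "le" buckets,
 * the way a Prometheus histogram exposes them.
 *
 * @param array $upperBounds  bucket upper bounds, e.g. [0.1, 1, 2, 3.5]
 * @param array $observations observed values
 * @return array              map of "le=<bound>" to cumulative count
 */
function cumulativeBuckets(array $upperBounds, array $observations): array
{
    sort($upperBounds);

    $counts = [];
    foreach ($upperBounds as $bound) {
        $counts[sprintf('le=%s', $bound)] = 0;
    }
    $counts['le=+Inf'] = 0;

    foreach ($observations as $value) {
        // An observation increments every bucket whose bound is >= the value.
        foreach ($upperBounds as $bound) {
            if ($value <= $bound) {
                $counts[sprintf('le=%s', $bound)]++;
            }
        }
        // The +Inf bucket always counts the observation.
        $counts['le=+Inf']++;
    }

    return $counts;
}

$buckets = [0.1, 1, 2, 3.5, 4, 5, 6, 7, 8, 9];
foreach (cumulativeBuckets($buckets, [0.05, 3.5, 3.5, 7]) as $label => $count) {
    echo $label, ' => ', $count, "\n";
}
```

For these four sample observations the `le=3.5` bucket reports 3 and the `le=+Inf` bucket reports 4, which is why bucket counts never decrease as the bound grows.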

  83. Tips
    To show the metrics to be scraped
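    $registry = \Prometheus\CollectorRegistry::getDefault();
    // Render every registered metric in the Prometheus text format
    $renderer = new \Prometheus\RenderTextFormat();
    $result = $renderer->render($registry->getMetricFamilySamples());
    // Serve the result with the content type Prometheus expects
    header('Content-type: ' . \Prometheus\RenderTextFormat::MIME_TYPE);
    echo $result;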
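
For orientation, the scraped output is plain text in the Prometheus exposition format: a `# HELP` line, a `# TYPE` line, and one sample line per label combination. The sketch below hand-renders a single counter sample to show that shape. It is not the library's renderer; `renderCounter` is a hypothetical function, and it assumes label values need no escaping.

```php
<?php
/**
 * Hypothetical helper: render one counter sample in the Prometheus
 * text exposition format. Assumes label values contain no quotes,
 * backslashes or newlines.
 */
function renderCounter(string $name, string $help, array $labels, float $value): string
{
    $pairs = [];
    foreach ($labels as $labelName => $labelValue) {
        $pairs[] = sprintf('%s="%s"', $labelName, $labelValue);
    }
    $labelPart = $pairs === [] ? '' : '{' . implode(',', $pairs) . '}';

    return "# HELP {$name} {$help}\n"
        . "# TYPE {$name} counter\n"
        . "{$name}{$labelPart} {$value}\n";
}

echo renderCounter('demo_visitor_counter', 'it increases', ['type' => 'blue'], 3);
```

The metric name here assumes the namespace and name are joined with an underscore, matching the counter example earlier in the deck.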

  84. Tips
    Start with the RED method
    Set up the following query:
    (ud:itentity:rate_10m < bool 1000) * 100 +
    (ud:error:percent_10m > bool 1.5) * 10 +
    (ud:read:duration_p99_10m < bool 25) * 1

  85. Tips
    Define a dashboard in Grafana that maps the following
    results:
    111 = x Rate, x Errors, x Duration
    110 = x Rate, x Errors
    101 = x Rate, x Duration
    100 = x Rate
    011 = x Errors, x Duration
    010 = x Errors
    001 = x Duration
    000 = Ok
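
The mapping above treats the query result as three flag digits: hundreds for Rate, tens for Errors, units for Duration. A minimal sketch of that decoding in plain PHP follows; the function names `decodeRedStatus` and `describeRedStatus` are hypothetical and exist only to illustrate the table.

```php
<?php
/**
 * Hypothetical helper: decode the three-digit status (e.g. 101) into
 * the RED signals that are flagged.
 */
function decodeRedStatus(int $status): array
{
    $flagged = [];
    if (intdiv($status, 100) % 10 === 1) {
        $flagged[] = 'Rate';       // hundreds digit
    }
    if (intdiv($status, 10) % 10 === 1) {
        $flagged[] = 'Errors';     // tens digit
    }
    if ($status % 10 === 1) {
        $flagged[] = 'Duration';   // units digit
    }
    return $flagged;
}

/** Human-readable label for a status, 'Ok' when nothing is flagged. */
function describeRedStatus(int $status): string
{
    $flagged = decodeRedStatus($status);
    return $flagged === [] ? 'Ok' : implode(', ', $flagged);
}

foreach ([111, 101, 10, 0] as $status) {
    printf("%03d = %s\n", $status, describeRedStatus($status));
}
```

Because each signal contributes a distinct power of ten, a single panel value is enough to tell which of the three checks tripped.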

  86. Demo
    Available at:
    https://github.com/wsilva/darkmira-prometheus-php-demo

  87. Thank you!
    Slides: https://speakerdeck.com/wsilva
