Improving Observability with Prometheus (Darkmira Tour PHP 2020)

Talk presented online on December 13th at Darkmira Tour PHP 2020 (https://php.darkmiratour.rocks/2020/schedule.html). Demo available at https://github.com/wsilva/darkmira-prometheus-php-demo

Wellington F. Silva

December 13, 2020

Transcript

  1. Improving Observability with Prometheus Darkmira Tour PHP 2020

  2. Wellington F. Silva contact: @_wsilva nicks: wsilva, boina, tom, fisi*

    Roles: father, husband, telecom technician, programmer, sysadmin, Docker community leader, instructor, writer, Zend Certified Engineer, Docker Certified Associate, Certified Kubernetes Administrator. * in deprecation
  3. Agenda • Observability • Monitoring • Prometheus • Tips

  4. Observability

  5. Observability Definition • https://dictionary.cambridge.org/us/dictionary/english/observability • https://www.oxfordlearnersdictionaries.com/spellcheck/english/?q=observability

  6. Observability Definition Cambridge:

  7. Observability Definition Oxford:

  8. Observability Definition: ¯\_(ツ)_/¯

  9. Observability Definition: Observe + Ability

  10. Observability 3 pillars:

  11. Observability 3 pillars: • Metrics

  12. Observability 3 pillars: • Metrics • Logging

  13. Observability 3 pillars: • Metrics • Logging • Tracing

  14. Observability 3 pillars: • Metrics • Logging • Tracing • Events

  15. Observability 3 pillars: • Metrics • Logging • Tracing • Events (kind of new) - MELT, or 4 golden signals

  16. Observability advantages

  17. Observability • Better deployments

  18. Observability • Better deployments • Improve time to market

  19. Observability • Better deployments • Improve time to market • Less toil

  20. Observability • Better deployments • Improve time to market • Less toil • Avoid premature optimisation

  21. Observability • Better deployments • Improve time to market • Less toil • Avoid premature optimisation • Improve resource utilisation

  22. Observability • Better deployments • Improve time to market • Less toil • Avoid premature optimisation • Improve resource utilisation • Lower costs

  23. Observability disadvantages

  24. Observability • Demands effort on coding and configuring

  25. Observability • Demands effort on coding and configuring • Could extend time to delivery

  26. Observability • Demands effort on coding and configuring • Could extend time to delivery • Constantly neglected

  27. Monitoring

  28. Monitoring • Subset of observability

  29. Monitoring • Subset of observability • Shows the points where we should start to dig

  30. Monitoring • Subset of observability • Shows the points where we should start to dig • Makes it easier and faster to find bottlenecks

  31. Monitoring What metrics should we track?

  32. Monitoring What metrics should we track? ALL

  33. Monitoring https://www.aeroflap.com.br/uma-analise-evolucao-do-boeing-737/

  34. Monitoring Issue: the more metrics, the more difficult they are to analyse

  35. Monitoring Issue: the more metrics, the more difficult they are to analyse. MELT becomes a mess

  36. Monitoring http://aeroplanewallpaper.blogspot.com

  37. Monitoring Focus more on the most important things.

  38. Monitoring RED Method:

  39. Monitoring RED Method: • Rate

  40. Monitoring RED Method: • Rate • Errors

  41. Monitoring RED Method: • Rate • Errors • Duration

  42. Monitoring RED Method: • Rate • Errors • Duration • Saturation

  43. Monitoring RED Method: • Rate • Errors • Duration • Saturation (again this joke?)

  44. Monitoring Google’s SRE Book Way:

  45. Monitoring Google’s SRE Book Way: • SLI • SLO • SLA

  46. Monitoring SLI - Service Level Indicators

  47. Monitoring SLI - Service Level Indicator Depends on the team:

  48. Monitoring SLI - Service Level Indicator Depends on the team:

    • ops: cpu, memory, disk io, networking, nodes available, pods running, messages on queue
  49. Monitoring SLI - Service Level Indicator Depends on the team:

    • ops: cpu, memory, disk io, networking, nodes available, pods running, messages on queue • devs: response time, requests per second
  50. Monitoring SLI - Service Level Indicator Depends on the team:

    • ops: cpu, memory, disk io, networking, nodes available, pods running, messages on queue • devs: response time, requests per second • data engineers: time to run an ETL job, how much data is being processed, the freshness of the data
  51. Monitoring SLO - Service Level Objectives

  52. Monitoring SLO - Service Level Objectives • We should involve the customer in defining it

  53. Monitoring SLO - Service Level Objectives • We should involve the customer in defining it • Breaches must alert the team

  54. Monitoring SLO - Service Level Objectives • We should involve the customer in defining it • Breaches must alert the team • Use realistic objectives

  55. Monitoring SLO - Service Level Objectives • We should involve the customer in defining it • Breaches must alert the team • Use realistic objectives • Reevaluate the values periodically

  56. Monitoring SLA - Service Level Agreement

  57. Monitoring SLA - Service Level Agreement • Should be looser than the SLO: an SLO breach must alert the team before the SLA is breached

  58. Monitoring SLA - Service Level Agreement • Should be looser than the SLO: an SLO breach must alert the team before the SLA is breached • Pay attention to the agreement and honor it

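
The relationship between SLO and SLA on the slides above can be sketched as a small check. This is only an illustration: the function name `classifyAvailability` and both threshold values are hypothetical, and it assumes an availability-style indicator where a larger percentage is better, so the SLO is the stricter of the two numbers and is crossed first.

```php
<?php
// Hypothetical thresholds, for illustration only (not taken from the talk).
const SLO_AVAILABILITY = 99.95; // internal objective, stricter
const SLA_AVAILABILITY = 99.90; // contractual commitment, looser

/**
 * Classify a measured availability (in percent) against the SLO and the SLA.
 * Returns 'ok', 'slo_breached' (alert the team) or 'sla_breached'.
 */
function classifyAvailability(
    float $measured,
    float $slo = SLO_AVAILABILITY,
    float $sla = SLA_AVAILABILITY
): string {
    if ($slo <= $sla) {
        // Without a gap between the two, the SLO cannot warn before the SLA.
        throw new InvalidArgumentException('The SLO must be stricter than the SLA.');
    }
    if ($measured < $sla) {
        return 'sla_breached';
    }
    if ($measured < $slo) {
        return 'slo_breached';
    }
    return 'ok';
}

// A measurement that falls between the two thresholds.
echo classifyAvailability(99.92), PHP_EOL; // prints "slo_breached"
```

The gap between the two thresholds is the window in which the team can react before the agreement itself is violated.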
  59. Prometheus

  60. Prometheus From the Greek Promēthéus, "forethought". He is a second-generation Titan, son of Iapetus (himself a son of the incestuous union of Uranus and Gaia) and brother of Atlas, Epimetheus and Menoetius. He was a defender of humanity, responsible for stealing Hestia's fire and giving it to mortals.

  61. Prometheus • Metrics platform

  62. Prometheus • Metrics platform • Started in 2012 at SoundCloud

  63. Prometheus • Metrics platform • Started in 2012 at SoundCloud • Open-sourced and published in 2015

  64. Prometheus • Metrics platform • Started in 2012 at SoundCloud • Open-sourced and published in 2015 • Second project under the CNCF (Cloud Native Computing Foundation)

  65. Prometheus • Metrics platform • Started in 2012 at SoundCloud • Open-sourced and published in 2015 • Second project under the CNCF (Cloud Native Computing Foundation) • Can also fire and manage alerts

  66. Prometheus • Metrics platform • Started in 2012 at SoundCloud • Open-sourced and published in 2015 • Second project under the CNCF (Cloud Native Computing Foundation) • Can also fire and manage alerts • Stores metrics in a time series database (TSDB)

  67. Prometheus • Pull-based model (scale the exporter)

  68. Prometheus • Pull-based model (scale the exporter) • Good for telemetry metrics and statistical metrics

  69. Prometheus • Pull-based model (scale the exporter) • Good for telemetry metrics and statistical metrics • Known alternatives: Graphite / collectd / Carbon, Zabbix (all push-based)

  70. Prometheus Disadvantages: • Not easy to scale horizontally

  71. Prometheus Disadvantages: • Not easy to scale horizontally • No query cache

  72. Prometheus Disadvantages: • Not easy to scale horizontally • No query cache • PromQL instead of regular SQL

  73. Prometheus Advantages: • Written in Go

  74. Prometheus Advantages: • Written in Go • HTTP-based communication

  75. Prometheus Advantages: • Written in Go • HTTP-based communication • Service discovery integration (Kubernetes, Swarm, Consul, AWS, GCP, etc.)

  76. Prometheus Advantages: • Written in Go • HTTP-based communication • Service discovery integration (Kubernetes, Swarm, Consul, AWS, GCP, etc.) • Dashboard for alert management

  77. Prometheus Advantages: • Written in Go • HTTP-based communication • Service discovery integration (Kubernetes, Swarm, Consul, AWS, GCP, etc.) • Dashboard for alert management • Dashboard for query debugging

  78. Prometheus Advantages: • Multidimensional data model

  79. Prometheus Advantages: • Multidimensional data model • Easy to set up with Grafana

  80. Prometheus Advantages: • Multidimensional data model • Easy to set up with Grafana • PromQL (somewhat functional in style, powerful for calculations)

  81. Prometheus

  82. Tips

  83. Tips Start with https://github.com/endclothing/prometheus_client_php (the package jimdo/prometheus_client_php is abandoned)

    $ composer require endclothing/prometheus_client_php
  84. Tips To set up a counter

    // arguments: namespace, name, help text, label names
    $registry = \Prometheus\CollectorRegistry::getDefault();
    $counter = $registry->getOrRegisterCounter('demo', 'visitor_counter', 'it increases', ['type']);
    $counter->incBy(3, ['blue']); // add 3 to the series labelled type="blue"
  85. Tips To set up a gauge

    $registry = \Prometheus\CollectorRegistry::getDefault();
    $gauge = $registry->getOrRegisterGauge('demo', 'score', 'it sets', ['type']);
    $gauge->set(2.5, ['blue']); // overwrite the current value for type="blue"
  86. Tips To set up a histogram

    // the last argument lists the upper bounds of the buckets
    $registry = \Prometheus\CollectorRegistry::getDefault();
    $histogram = $registry->getOrRegisterHistogram('demo', 'secs_bucket', 'it observes', ['type'], [0.1, 1, 2, 3.5, 4, 5, 6, 7, 8, 9]);
    $histogram->observe(3.5, ['blue']);
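
To make the bucket list on the histogram slide more concrete, here is a rough sketch of how Prometheus-style cumulative buckets count an observation. It does not use the client library; `cumulativeBuckets` is a hypothetical helper written only for this illustration. Each bucket counts every observation less than or equal to its upper bound (`le`), and an implicit `+Inf` bucket counts everything.

```php
<?php
/**
 * Count observations per cumulative bucket: a bucket with upper bound "le"
 * counts every observation that is <= le. A final "+Inf" bucket counts all.
 *
 * @param float[] $bounds       upper bounds in ascending order
 * @param float[] $observations observed values
 * @return array<int, array{le: string, count: int}>
 */
function cumulativeBuckets(array $bounds, array $observations): array
{
    $rows = [];
    foreach ($bounds as $bound) {
        $count = 0;
        foreach ($observations as $value) {
            if ($value <= $bound) {
                $count++;
            }
        }
        $rows[] = ['le' => (string) $bound, 'count' => $count];
    }
    // The catch-all bucket always equals the total number of observations.
    $rows[] = ['le' => '+Inf', 'count' => count($observations)];

    return $rows;
}

// Same bounds and same observed value as on the slide.
$bounds = [0.1, 1, 2, 3.5, 4, 5, 6, 7, 8, 9];
foreach (cumulativeBuckets($bounds, [3.5]) as $row) {
    printf("le=%s count=%d\n", $row['le'], $row['count']);
}
```

With a single observation of 3.5, the buckets 0.1, 1 and 2 stay at 0, while 3.5 and every larger bucket (including `+Inf`) show 1.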
  87. Tips To show the metrics to be scraped

    use Prometheus\RenderTextFormat;

    $registry = \Prometheus\CollectorRegistry::getDefault();
    $renderer = new RenderTextFormat();
    $result = $renderer->render($registry->getMetricFamilySamples());
    header('Content-type: ' . RenderTextFormat::MIME_TYPE); // send the header before any output
    echo $result;
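
For orientation, the text that Prometheus scrapes from such an endpoint looks roughly like the output of the sketch below. `renderCounterSample` is a hypothetical helper, not part of the client library, and it handles only a single counter sample. The metric name `demo_visitor_counter` assumes the library joins namespace and name with an underscore.

```php
<?php
/**
 * Render one counter sample in the Prometheus text exposition format.
 *
 * @param array<string, string> $labels label name => label value
 */
function renderCounterSample(string $name, string $help, array $labels, float $value): string
{
    $pairs = [];
    foreach ($labels as $key => $labelValue) {
        // Label values escape backslash, double quote and newline.
        $escaped = str_replace(["\\", "\"", "\n"], ["\\\\", "\\\"", "\\n"], (string) $labelValue);
        $pairs[] = sprintf('%s="%s"', $key, $escaped);
    }
    $labelPart = $pairs === [] ? '' : '{' . implode(',', $pairs) . '}';

    return "# HELP {$name} {$help}\n"
        . "# TYPE {$name} counter\n"
        . "{$name}{$labelPart} {$value}\n";
}

echo renderCounterSample('demo_visitor_counter', 'it increases', ['type' => 'blue'], 3);
// # HELP demo_visitor_counter it increases
// # TYPE demo_visitor_counter counter
// demo_visitor_counter{type="blue"} 3
```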
  88. Tips Start with the RED method. Set up the following query:

    (ud:itentity:rate_10m < bool 1000) * 100
      + (ud:error:percent_10m > bool 1.5) * 10
      + (ud:read:duration_p99_10m < bool 25) * 1
  89. Tips Define a dashboard in Grafana that maps the following results:

    111 = x Rate, x Errors, x Duration
    110 = x Rate, x Errors
    101 = x Rate, x Duration
    100 = x Rate
    011 = x Errors, x Duration
    010 = x Errors
    001 = x Duration
    000 = Ok
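
The mapping above treats the query result as three decimal digits: hundreds for Rate, tens for Errors, units for Duration. A minimal sketch of that decoding follows. The function name `decodeRedStatus` is hypothetical, and it assumes each digit is either 0 or 1, as the `bool` comparisons in the query produce.

```php
<?php
/**
 * Decode the combined RED status value into the list of flagged signals.
 * An empty list corresponds to "000 = Ok".
 *
 * @return string[]
 */
function decodeRedStatus(int $status): array
{
    $flagged = [];
    if (intdiv($status, 100) % 10 === 1) {
        $flagged[] = 'Rate';
    }
    if (intdiv($status, 10) % 10 === 1) {
        $flagged[] = 'Errors';
    }
    if ($status % 10 === 1) {
        $flagged[] = 'Duration';
    }

    return $flagged;
}

echo implode(', ', decodeRedStatus(101)), PHP_EOL; // prints "Rate, Duration"
```

Because each signal owns its own decimal place, a single panel value is enough to tell which of the three checks fired.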
  90. Demo

  91. Demo Available at: https://github.com/wsilva/darkmira-prometheus-php-demo

  92. Thank you! Slides: https://speakerdeck.com/wsilva