Improving Observability with Prometheus (Darkmira Tour PHP 2020)

Talk presented online on December 13th at Darkmira Tour PHP 2020 (https://php.darkmiratour.rocks/2020/schedule.html). Demo available at https://github.com/wsilva/darkmira-prometheus-php-demo.

Wellington F. Silva

December 13, 2020

Transcript

  1. Improving Observability with Prometheus Darkmira Tour PHP 2020

  2. Wellington F. Silva
    contact: @_wsilva
    nicks: wsilva, boina, tom, fisi*
    Roles: father, husband, telecom technician, programmer, sysadmin, Docker community leader, instructor, writer, Zend Certified Engineer, Docker Certified Associate, and Certified Kubernetes Administrator
    * in deprecation
  3. Agenda • Observability • Monitoring • Prometheus • Tips

  4. Observability

  5. Observability Definition • https://dictionary.cambridge.org/us/dictionary/english/observability • https://www.oxfordlearnersdictionaries.com/spellcheck/english/?q=observability

  6. Observability Definition Cambridge:

  7. Observability Definition Oxford:

  8. Observability Definition: ¯\_(ツ)_/¯

  9. Observability Definition: Observe + Ability

  15. Observability 3 pillars:
    • Metrics
    • Logging
    • Tracing
    • Events (kind of new) - MELT, or 4 golden signals
  16. Observability advantages

  22. Observability
    • Better deployments
    • Improve time to market
    • Less toil
    • Avoid premature optimisation
    • Improve resource utilisation
    • Lower costs
  23. Observability disadvantages

  26. Observability
    • Demands effort on coding and configuring
    • Could extend time to delivery
    • Constantly neglected
  27. Monitoring

  30. Monitoring
    • Subset of observability
    • Shows the points where we start to dig
    • Makes it easier and faster to find bottlenecks
  32. Monitoring What metrics should we track? ALL

  33. Monitoring https://www.aeroflap.com.br/uma-analise-evolucao-do-boeing-737/

  35. Monitoring Issue: the more metrics we track, the harder they are to analyse. MELT becomes a mess
  36. Monitoring http://aeroplanewallpaper.blogspot.com

  37. Monitoring Focus more on the most important things.

  43. Monitoring RED Method:
    • Rate
    • Errors
    • Duration
    • Saturation (again this joke?)
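
The three RED signals can be illustrated with a minimal plain-PHP sketch. It is not taken from the talk or the demo: the function name `redSummary`, the request shape (`status`, `seconds`) and the choice to count only 5xx responses as errors are assumptions made for illustration.

```php
<?php
/**
 * Summarise a window of requests using the RED method.
 * Each request is ['status' => int, 'seconds' => float] (hypothetical shape).
 */
function redSummary(array $requests, float $windowSeconds): array
{
    $total = count($requests);
    if ($total === 0 || $windowSeconds <= 0) {
        return ['rate' => 0.0, 'error_percent' => 0.0, 'p99_seconds' => 0.0];
    }

    // Errors: here any 5xx response counts as a failure (an assumption).
    $errors = count(array_filter($requests, fn (array $r): bool => $r['status'] >= 500));

    // Duration: nearest-rank 99th percentile of the observed latencies.
    $durations = array_column($requests, 'seconds');
    sort($durations);
    $rank = (int) ceil(0.99 * $total);

    return [
        'rate' => $total / $windowSeconds,           // requests per second
        'error_percent' => 100.0 * $errors / $total, // share of failed requests
        'p99_seconds' => (float) $durations[$rank - 1],
    ];
}
```

In a real setup these values would normally be computed by Prometheus from counters and histograms rather than in application code; the sketch only shows what each letter of RED measures.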
  45. Monitoring Google’s SRE Book Way:
    • SLI
    • SLO
    • SLA
  50. Monitoring SLI - Service Level Indicators. Depends on the team:
    • ops: CPU, memory, disk I/O, networking, nodes available, pods running, messages on queue
    • devs: response time, requests per second
    • data engineers: time to run an ETL job, how much data is being processed, the freshness of the data
  55. Monitoring SLO - Service Level Objectives
    • We should involve the customer to help define it
    • Breaches must alert the team
    • Use realistic objectives
    • Reevaluate the values periodically
  58. Monitoring SLA - Service Level Agreement
    • Should leave more margin than the SLO (the SLO is the stricter target), so that an SLO breach alerts the team before the SLA is breached
    • Pay attention to the agreement and honor it
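
A minimal sketch of how an SLI relates to the SLO and the SLA, assuming an availability-style indicator where higher is better and an internal SLO that is stricter than the contractual SLA. The function name `evaluateAvailability` and the example percentages are hypothetical.

```php
<?php
/**
 * Compare a measured SLI (e.g. availability in percent) with the SLO and SLA.
 * Assumes "higher is better" and that the SLO is at least as strict as the SLA.
 */
function evaluateAvailability(float $sli, float $slo, float $sla): string
{
    if ($sla > $slo) {
        throw new InvalidArgumentException('The SLO should be at least as strict as the SLA.');
    }
    if ($sli < $sla) {
        return 'sla_breach'; // the agreement with the customer is broken
    }
    if ($sli < $slo) {
        return 'slo_breach'; // alert the team while the SLA still holds
    }
    return 'ok';
}

// Example: SLO of 99.9 % and SLA of 99.5 % (illustrative numbers only).
echo evaluateAvailability(99.7, 99.9, 99.5), PHP_EOL; // prints "slo_breach"
```

The gap between the two thresholds is the window in which the team is alerted and can react before the agreement is violated.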
  59. Prometheus

  60. Prometheus From the Greek Promēthéus, "forethought". He is a second-generation Titan, son of Iapetus (a son of Uranus, born of the incest between Uranus and Gaia) and brother of Atlas, Epimetheus and Menoetius. He was a defender of humanity, responsible for stealing Hestia's fire and giving it to mortals.
  66. Prometheus
    • Metrics platform
    • Started in 2012 at SoundCloud
    • Open-sourced and published in 2015
    • Second project under CNCF (Cloud Native Computing Foundation)
    • Can also fire and manage alerts
    • Stores metrics in a time series database (TSDB)
  69. Prometheus
    • Pull-based model (scale the exporter)
    • Good for telemetry metrics and statistical metrics
    • Known alternatives: Graphite / collectd / Carbon, Zabbix (all push-based)
  72. Prometheus Disadvantages:
    • Not easy to scale horizontally
    • No query cache
    • PromQL instead of regular SQL
  77. Prometheus Advantages:
    • Written in Go
    • HTTP-based communication
    • Service discovery integration (Kubernetes, Swarm, Consul, AWS, GCP, etc.)
    • Dashboard for alert management
    • Dashboard for query debugging
  80. Prometheus Advantages:
    • Multidimensional data model
    • Easy to set up with Grafana
    • PromQL (kind of functional style, powerful for calculations)
  81. Prometheus

  82. Tips

  83. Tips Start with https://github.com/endclothing/prometheus_client_php
    Package jimdo/prometheus_client_php is abandoned
    $ composer require endclothing/prometheus_client_php
  84. Tips To set up a counter
    $registry = \Prometheus\CollectorRegistry::getDefault();
    // Arguments: namespace, name, help text, label names
    $counter = $registry->getOrRegisterCounter('demo', 'visitor_counter', 'it increases', ['type']);
    // Add 3 to the series with type="blue"
    $counter->incBy(3, ['blue']);
  85. Tips To set up a gauge
    $registry = \Prometheus\CollectorRegistry::getDefault();
    $gauge = $registry->getOrRegisterGauge('demo', 'score', 'it sets', ['type']);
    // Set the current value of the series with type="blue"
    $gauge->set(2.5, ['blue']);
  86. Tips To set up a histogram
    $registry = \Prometheus\CollectorRegistry::getDefault();
    // The last argument lists the upper bounds of the buckets
    $histogram = $registry->getOrRegisterHistogram('demo', 'secs_bucket', 'it observes', ['type'], [0.1, 1, 2, 3.5, 4, 5, 6, 7, 8, 9]);
    // Record one observation of 3.5 for type="blue"
    $histogram->observe(3.5, ['blue']);
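
Prometheus histogram buckets are cumulative: an observation is counted in every bucket whose upper bound is greater than or equal to the value, plus the implicit `+Inf` bucket. The plain-PHP sketch below illustrates that counting rule only. It is not the client library's implementation, and the function name `cumulativeBuckets` is hypothetical.

```php
<?php
/**
 * Count observations into cumulative buckets, the way Prometheus exposes them.
 * Returns a map of upper bound (as string) => number of observations <= bound.
 */
function cumulativeBuckets(array $bounds, array $observations): array
{
    sort($bounds);

    $counts = [];
    foreach ($bounds as $bound) {
        $counts[(string) $bound] = 0;
    }
    $counts['+Inf'] = 0;

    foreach ($observations as $value) {
        foreach ($bounds as $bound) {
            if ($value <= $bound) {
                $counts[(string) $bound]++; // every bucket at or above the value
            }
        }
        $counts['+Inf']++; // +Inf always counts every observation
    }

    return $counts;
}

// An observation of 3.5 lands in the 3.5 bucket and in every larger one.
print_r(cumulativeBuckets([0.1, 1, 2, 3.5, 4], [3.5]));
```

This is why bucket bounds should be chosen around the latencies that matter for the service: quantiles are later estimated from these cumulative counts.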
  87. Tips To show the metrics to be scraped
    use Prometheus\RenderTextFormat;

    $registry = \Prometheus\CollectorRegistry::getDefault();
    $renderer = new RenderTextFormat();
    // Render every registered metric in the Prometheus text format
    $result = $renderer->render($registry->getMetricFamilySamples());
    header('Content-type: ' . RenderTextFormat::MIME_TYPE);
    echo $result;
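
To give an idea of what the scraped endpoint returns, here is a hand-rolled sketch of the Prometheus text exposition format for a single sample. It is only an illustration in plain PHP; the function name `renderMetric` is hypothetical and this is not how the client library renders its output internally.

```php
<?php
/**
 * Render one sample in the Prometheus text exposition format:
 * a HELP line, a TYPE line, then "name{label="value"} value".
 */
function renderMetric(string $name, string $type, string $help, array $labels, float $value): string
{
    $pairs = [];
    foreach ($labels as $key => $labelValue) {
        // Label values escape backslash, double quote and newline.
        $escaped = str_replace(['\\', '"', "\n"], ['\\\\', '\\"', '\\n'], (string) $labelValue);
        $pairs[] = $key . '="' . $escaped . '"';
    }
    $labelPart = $pairs === [] ? '' : '{' . implode(',', $pairs) . '}';

    return "# HELP {$name} {$help}\n"
        . "# TYPE {$name} {$type}\n"
        . "{$name}{$labelPart} {$value}\n";
}

echo renderMetric('demo_visitor_counter', 'counter', 'it increases', ['type' => 'blue'], 3.0);
```

Because the format is plain text over HTTP, the endpoint can be checked with a browser or `curl` before pointing Prometheus at it.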
  88. Tips Start with the RED method. Set up the following query:
    (ud:itentity:rate_10m < bool 1000) * 100
    + (ud:error:percent_10m > bool 1.5) * 10
    + (ud:read:duration_p99_10m < bool 25) * 1
  89. Tips Define a dashboard in Grafana that maps the following results:
    111 = x Rate, x Errors, x Duration
    110 = x Rate, x Errors
    101 = x Rate, x Duration
    100 = x Rate
    011 = x Errors, x Duration
    010 = x Errors
    001 = x Duration
    000 = Ok
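
The mapping works because each comparison in the query yields 0 or 1 and is weighted by 100, 10 or 1, so every decimal digit of the result flags one signal. A minimal sketch of decoding that value in plain PHP, where the function name `decodeRedStatus` is hypothetical:

```php
<?php
/**
 * Decode the combined RED status value (0 to 111) into the failing signals.
 * Hundreds digit = Rate, tens digit = Errors, units digit = Duration.
 */
function decodeRedStatus(int $code): array
{
    $failing = [];
    if (intdiv($code, 100) % 10 === 1) {
        $failing[] = 'Rate';
    }
    if (intdiv($code, 10) % 10 === 1) {
        $failing[] = 'Errors';
    }
    if ($code % 10 === 1) {
        $failing[] = 'Duration';
    }
    return $failing; // an empty array means everything is Ok
}

echo implode(', ', decodeRedStatus(101)), PHP_EOL; // prints "Rate, Duration"
```

In Grafana the same decoding is usually done with value mappings on a single-stat style panel, so one query covers all three signals.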
  90. Demo

  91. Demo Available at: https://github.com/wsilva/darkmira-prometheus-php-demo

  92. Thank You ! Slides: https://speakerdeck.com/wsilva