Monitoring Wonderland

Paul Seiffert

March 13, 2019

Transcript

  1. MONITORING WONDERLAND

    HELP, WHAT IS HAPPENING?
  2. PAUL SEIFFERT

    Team Lead at Jimdo
    Cloud Infrastructure Engineer
    @seiffertp
  3. WONDERLAND

    • Jimdo’s internal PaaS that runs 400 services
    • 5000 Docker containers at a time
    • ~600 deployments a day
  4. WONDERLAND

    [Diagram; labels: AWS, other service providers, Wonderland, infrastructure automation, APIs, monitoring, logging, CLI tools, other tooling]
  5. WONDERLAND

    [Diagram; labels: Wonderland API, AWS ECS, EC2 instance, ECS agent, logging daemon, metric daemon]
  6. IMAGINE …

    • Your team is responsible for the software component that delivers 20 million customer websites
    • You are on call tonight
  7. PAGERDUTY CALLS

    • either because a health check failed
    • or because a metric exceeded a configured threshold
  8. HEALTH CHECKS, ALERTMANAGER, PROMETHEUS
  9. API HEALTH CHECKS

    • All services on Wonderland: Route53 health checks
    • Infrastructure components: Pingdom checks

    GET /health
    HTTP/1.1 200 OK
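
A health endpoint of the kind polled by such checks can be very small. The following is a minimal sketch using only Python's standard library. The names `health_response`, `HealthHandler`, and `make_server`, the port, and the response body are assumptions made for illustration and are not taken from Wonderland.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def health_response(path):
    """Return (status_code, body) for a request path."""
    if path == "/health":
        return 200, b"OK\n"
    return 404, b"not found\n"


class HealthHandler(BaseHTTPRequestHandler):
    """Serves GET /health; every other path gets a 404."""

    def do_GET(self):
        status, body = health_response(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Keep the sketch quiet; a real service would log requests.
        pass


def make_server(port=8080):
    # Port 8080 is an arbitrary choice for this sketch.
    return HTTPServer(("127.0.0.1", port), HealthHandler)


if __name__ == "__main__":
    make_server().serve_forever()
```

A real health check would usually also verify that the service can do its job (for example, reach its database) rather than only confirm that the process answers.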
  10. WORKER HEALTH CHECKS

    • Workers write a metric to the Prometheus Pushgateway after each processed message
    • For cron jobs, Wonderland automatically notifies cronitor.io about executions
    • Dead man’s switch: if no notification arrives within a certain time, an alert is created
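
The dead man's switch idea can be sketched in a few lines: the monitored job checks in after each run, and the monitor alerts once the silence lasts too long. This is an illustrative sketch only; the class name `DeadMansSwitch` and the injectable `clock` parameter are assumptions, and neither cronitor.io nor the Pushgateway is being modelled here.

```python
import time


class DeadMansSwitch:
    """Alerts when no check-in arrives within `timeout` seconds."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock  # injectable so the logic can be tested without waiting
        self.last_seen = clock()

    def notify(self):
        # Called by the worker or cron job after each successful run.
        self.last_seen = self.clock()

    def is_alerting(self):
        # True once the silence has lasted longer than the timeout.
        return self.clock() - self.last_seen > self.timeout
```

The important design property is that the alert fires on the absence of a signal, so a job that crashes, hangs, or never starts is detected without the job having to report its own failure.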
  11. SEMANTIC MONITORING / SYNTHETIC MONITORING

    Run tests against production periodically, monitor results, and alert on issues
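
As a rough sketch of what such a periodic test run could look like: each check exercises the production system through a `fetch` function and reports success or failure, and the runner collects the names of failed checks so that an alert can be raised. The function names, the `fetch(path) -> (status, body)` interface, and the homepage check are hypothetical and chosen only for illustration.

```python
def check_homepage(fetch):
    """One synthetic test: the page loads and contains a title element."""
    status, body = fetch("/")
    return status == 200 and "<title>" in body


def run_checks(checks, fetch):
    """Run every check and return the names of those that failed."""
    failed = []
    for name, check in checks.items():
        try:
            ok = check(fetch)
        except Exception:
            ok = False  # a crashing probe counts as a failure
        if not ok:
            failed.append(name)
    return failed
```

A scheduler would call `run_checks` at a fixed interval and alert when the returned list is not empty.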
  12. SERVICE DASHBOARD
  13. GRAFANA

    • Each service running on Wonderland automatically has a dashboard showing key metrics for debugging
    • Developers can create custom dashboards for more detailed analysis
    • Grafana pulls data from Prometheus instances
  14. PROMETHEUS

    • Semi-centralized metric system
    • Pull-based metric retrieval
    • On-the-fly calculation of derived metrics
  15. METRICS

    • Infrastructure metrics
    • System metrics
    • Application metrics
  16. INFRASTRUCTURE METRICS

    [Diagram; labels: Prometheus, CloudWatch exporter, AWS, custom exporters, Wonderland APIs]
  17. EXAMPLES

    aws_autoscaling_group_desired_capacity_average{
      auto_scaling_group_name="crims",
      job="cloudwatch_exporter"
    }

    aws_elb_request_count_sum{
      cluster="crims",
      job="wonderland_elb_exporter",
      service_name="web-prod"
    }
  18. SYSTEM METRICS

    [Diagram; labels: Prometheus, collectd, cAdvisor]
  19. EXAMPLES

    container_memory_rss{
      container_label_cluster="crims",
      container_label_container_name="web-prod--web",
      image="web-prod:abc123",
      instance="10.8.4.91:9104",
      job="crims_cadvisor_metrics"
    }

    collectd_memory{
      instance="10.8.4.42:9103",
      job="crims_collectd_metrics",
      memory="free"
    }
  20. APPLICATION METRICS

    [Diagram: Prometheus sends GET /metrics to Container A, Container B, …]
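
What a container returns for `GET /metrics` is plain text in the Prometheus exposition format. The sketch below renders one metric family by hand to show the shape of that response. The function name `render_metrics` and the sample labels are assumptions for illustration, label values are not escaped, and a real service would normally use an official Prometheus client library instead.

```python
def render_metrics(name, metric_type, samples):
    """Render one metric family in the Prometheus text exposition format.

    `samples` maps a tuple of (label, value) pairs to a numeric value.
    Label values are assumed to need no escaping in this sketch.
    """
    lines = [f"# TYPE {name} {metric_type}"]
    for labels, value in samples.items():
        if labels:
            label_str = ",".join(f'{k}="{v}"' for k, v in labels)
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_metrics("http_requests_total", "counter",
                         {(("method", "GET"),): 53}), end="")
```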
  21. SERVICE DISCOVERY

    [Diagram; components: Prometheus, Container A, Container B, …, Wonderland Service Discovery, Wonderland API; arrow labels: locate containers, update config and reload, scrape metrics]
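
The slide does not say how the Prometheus configuration is written. As one possible illustration, the sketch below turns a list of located containers into target groups in the JSON layout used by Prometheus' file-based service discovery (`file_sd_configs`). Using that mechanism is an assumption of this sketch, not something the talk states, and the container fields (`service`, `ip`, `port`) and the `service_name` label are hypothetical.

```python
import json


def build_targets(containers):
    """Group container addresses by service into file_sd-style target groups."""
    groups = {}
    for c in containers:
        groups.setdefault(c["service"], []).append(f'{c["ip"]}:{c["port"]}')
    return [
        {"targets": sorted(addrs), "labels": {"service_name": service}}
        for service, addrs in sorted(groups.items())
    ]


def render_file_sd(containers):
    """Serialize the target groups as JSON, ready to be written to a file."""
    return json.dumps(build_targets(containers), indent=2)
```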
  22. Automatically generated recording rules:

    http_requests_total{instance="10.8.3.101:80"} = 53
    http_requests_total{instance="10.8.3.102:80"} = 81
    http_requests_total{instance="10.8.3.103:80"} = 2
    ...

    job:http_requests_total:sum = sum(http_requests_total) without (instance) = 136
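
To make the aggregation concrete, the following sketch reproduces in plain Python what `sum ... without (instance)` does to the three samples shown: the `instance` label is dropped and series that then share the same remaining labels are added up. The helper name `sum_without` is made up for this illustration and is not part of Prometheus.

```python
from collections import defaultdict


def sum_without(samples, drop_label):
    """Sum sample values after removing one label, like `sum without (...)`."""
    totals = defaultdict(float)
    for labels, value in samples:
        # The remaining labels identify the aggregated series.
        key = tuple(sorted((k, v) for k, v in labels.items() if k != drop_label))
        totals[key] += value
    return dict(totals)


samples = [
    ({"instance": "10.8.3.101:80"}, 53),
    ({"instance": "10.8.3.102:80"}, 81),
    ({"instance": "10.8.3.103:80"}, 2),
]
print(sum_without(samples, "instance"))  # {(): 136.0}
```

The resulting series has no `instance` label at all, which is what makes it easy to select separately from the per-instance series.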

  23. FEDERATION

    [Diagram: the long-term Prometheus (32 days) scrapes filtered metrics from the short-term Prometheus (30 min)]

    'match[]':
      - '{job="application_metrics", instance=""}'
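
In PromQL, a matcher with an empty value such as `instance=""` selects series that do not carry that label, so this filter picks up only series without an `instance` label. Federation itself is an HTTP request to the `/federate` endpoint with one `match[]` parameter per selector. The sketch below builds such a URL; the host name `short-term-prometheus:9090` and the function name `federate_url` are hypothetical.

```python
from urllib.parse import urlencode


def federate_url(base_url, matchers):
    """Build a /federate URL with one match[] parameter per selector."""
    query = urlencode([("match[]", m) for m in matchers])
    return f"{base_url}/federate?{query}"


if __name__ == "__main__":
    print(federate_url(
        "http://short-term-prometheus:9090",  # hypothetical address
        ['{job="application_metrics", instance=""}'],
    ))
```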
  24. [Diagram: the long-term Prometheus scrapes filtered metrics from the short-term Prometheus]

    Short-term Prometheus:
    http_requests_total{instance="10.8.3.101:80"}
    http_requests_total{instance="10.8.3.102:80"}
    http_requests_total{instance="10.8.3.103:80"}
    ...
    job:http_requests_total:sum{}

    Long-term Prometheus:
    job:http_requests_total:sum{}
  25. SERVICE DASHBOARD
  26. LET’S TAKE A LOOK AT THE LOGS
  27. CENTRALISED LOGGING

    • Centralised logging is a must-have in a distributed system
    • It should be very easy to gather all information that concerns a service
  28. CENTRALISED LOGGING

    • Output of all services running on Wonderland is stored centrally
    • Optionally, logs are parsed with configurable formats

    $ cat wonderland.yaml
    ---
    components:
    - name: […]
      image: my-nginx-image
      logging:
        types:
        - access_log
        - error_log_nginx
  29. CENTRALISED LOGGING

    [Diagram: Docker → (fluentd protocol) → Logbeat → (lumberjack protocol) → Logz.io]

    Wonderland Logbeat
    • receives logs via the fluent protocol,
    • parses them,
    • adds metadata,
    • and streams them to our logging provider logz.io
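
The parse and enrich steps can be pictured with a small sketch: a raw line is matched against a format chosen by its log type, the extracted fields and some metadata are attached, and the result is emitted as a JSON document. Everything specific here is an assumption made for illustration: the simplified access-log layout, the field names, and the `process` function. Receiving via the fluent protocol and shipping via lumberjack are left out.

```python
import json
import re

# Assumed, simplified access-log layout: '<ip> "<method> <path>" <status>'.
ACCESS_LOG = re.compile(
    r'(?P<ip>\S+) "(?P<method>\S+) (?P<path>\S+)" (?P<status>\d{3})'
)


def process(line, log_type, metadata):
    """Parse one raw log line, attach metadata, and return a JSON document."""
    event = {"message": line, "type": log_type}
    if log_type == "access_log":
        match = ACCESS_LOG.match(line)
        if match:
            event.update(match.groupdict())
    event.update(metadata)  # e.g. service and container names
    return json.dumps(event, sort_keys=True)
```

Lines that do not match the configured format are passed through with only the raw message and metadata, so a parsing problem does not cause log loss.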
  30. 4:17 AM

    You find this log message of the service autoscaler:

    Unable to scale-out service “web-delivery”. Configured maximum number of instances reached.
  31. 4:17 AM

    You increase the maximum number of instances:

    $ cat wonderland.yaml
    […]
    auto-scaling:
      min-instances: 60
      max-instances: 150
  32. 2:00 PM

    In the PMA for last night’s incident, you create the action item to

    Monitor the number of instances of web-delivery to detect potential breaches of auto-scaling limits before they affect the system’s health
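
The condition behind such a monitor can be sketched as a comparison of the current instance count with the configured maximum. The function name `near_scaling_limit` and the 90% threshold are assumptions for illustration; the talk does not specify how the alert was implemented.

```python
def near_scaling_limit(current, maximum, threshold=0.9):
    """True when the instance count has reached `threshold` of the maximum."""
    if maximum <= 0:
        raise ValueError("maximum must be positive")
    return current / maximum >= threshold


if __name__ == "__main__":
    # With max-instances: 150, 140 running instances would trigger a warning.
    print(near_scaling_limit(140, 150))
```

Alerting below the hard limit leaves time to raise the maximum before the autoscaler runs out of headroom.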
  33. FURTHER READING / SOURCES

    • Beyer, Jones, Petoff & Murphy: Site Reliability Engineering
    • Susan Fowler: Production-Ready Microservices
    • Sam Newman: Building Microservices
    • Stripe / Increment: On-Call (https://increment.com/on-call/)
    • Mathias Lafeldt & Paul Seiffert: A Journey Through Wonderland (https://speakerdeck.com/mlafeldt/a-journey-through-wonderland)
  34. PHOTOS

    • Marcel Stockmann: https://www.flickr.com/photos/marcelstockmann/33068471286
    • Michael Theis: https://www.flickr.com/photos/huskyte/6931056896