

Embed a Supernanny in your microservices and understand what they are up to in production

Concepts, tools and best practices of observability in microservices architecture.

Matheus Moraes

June 06, 2019

Transcript

  1. Authors: Claudio Oliveira and Matheus Moraes

    Date: 06/05/2019. Embed a Supernanny in your microservices: understand what they are up to in production
  2. AGENDA

    ▰ Microservices
    ▰ North/South vs East/West traffic
    ▰ Monitoring
    ▰ Observability
      ◦ Logs
      ◦ Traces
      ◦ Metrics
    ▰ Kiali
    ▰ Demo
  3. whoami: I am Claudio de Oliveira. Book author, speaker, software architect and developer @sensedia. Spring, Java, microservices and Docker enthusiast.
  4. “The term "Microservice Architecture" ... there are certain common characteristics around organization around business capability...” https://martinfowler.com/articles/microservices.html
  5. “Monitoring is the practice of collecting signals, telemetry, traces, etc. and aggregating them, and matching them against some pre-defined criteria of system states we should carefully watch. When we find that one of our signals has crossed a threshold and may be heading toward a known bad state, we take action to remedy the system.” Christian Posta, https://www.manning.com/books/istio-in-action
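The threshold-based monitoring described in this quote can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not code from the talk: `ThresholdMonitor`, the signal names and the limits are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical threshold-based monitor: each rule is a pre-defined limit for one
// signal, and check() reports the signals whose current reading crossed it.
class ThresholdMonitor {
    // Pre-defined criteria: signal name -> maximum acceptable value.
    private final Map<String, Double> limits = new LinkedHashMap<>();

    void addRule(String signal, double maxValue) {
        limits.put(signal, maxValue);
    }

    // Returns breached signals in rule order; signals with no reading are skipped.
    List<String> check(Map<String, Double> readings) {
        List<String> breached = new ArrayList<>();
        for (Map.Entry<String, Double> rule : limits.entrySet()) {
            Double value = readings.get(rule.getKey());
            if (value != null && value > rule.getValue()) {
                breached.add(rule.getKey());
            }
        }
        return breached;
    }

    public static void main(String[] args) {
        ThresholdMonitor monitor = new ThresholdMonitor();
        monitor.addRule("disk_usage_ratio", 0.90);   // hypothetical limits
        monitor.addRule("queue_depth", 1000.0);

        Map<String, Double> readings = new LinkedHashMap<>();
        readings.put("disk_usage_ratio", 0.95);
        readings.put("queue_depth", 120.0);

        // Only disk usage is above its limit, so this prints [disk_usage_ratio].
        System.out.println(monitor.check(readings));
    }
}
```

Such a monitor only knows the failure modes someone wrote a rule for; a slow checkout for a single user, like the payment example quoted later in the deck, would not trip either rule.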
  6. “Observability on the other hand supposes up front that our systems are highly unpredictable and we cannot know all of the possible failure modes up front.” https://www.manning.com/books/istio-in-action
  7. “We need to collect much more data, even high-cardinality data like userIDs, requestIDs, source IPs, etc. where the entire set could be exponentially large.” https://www.manning.com/books/istio-in-action
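A rough estimate shows why high-cardinality fields are costly as metric labels: in a Prometheus-style system each unique combination of label values is its own time series, so the upper bound on series is the product of the distinct-value counts per label. The sketch below is illustrative only; `CardinalityEstimate`, the label names and the counts are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Upper bound on time series for one labelled metric: one series per unique
// combination of label values, i.e. the product of distinct values per label.
class CardinalityEstimate {
    static long maxSeries(Map<String, Long> distinctValuesPerLabel) {
        long total = 1;
        for (long count : distinctValuesPerLabel.values()) {
            total = Math.multiplyExact(total, count); // throws on overflow instead of wrapping
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Long> lowCardinality = new LinkedHashMap<>();
        lowCardinality.put("env", 3L);       // hypothetical counts
        lowCardinality.put("method", 4L);
        System.out.println(maxSeries(lowCardinality)); // 12

        Map<String, Long> withUserId = new LinkedHashMap<>(lowCardinality);
        withUserId.put("userId", 1_000_000L);
        System.out.println(maxSeries(withUserId)); // 12000000
    }
}
```

A single `userId` label multiplies the series count by the number of users, which is one reason such fields are usually recorded in logs and traces rather than as metric labels.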
  8. “a user goes to pay for the items in their cart and experiences a 10s delay choosing a payment option. All of the pre-defined metric thresholds (disk usage, queue depth, machine health, etc.) might be at acceptable levels.” https://www.manning.com/books/istio-in-action
  9. The 3 pillars of observability

    Metrics: Do I have a problem? (dashboarding, trending & problem detection)
    Traces: Where exactly is my problem? (cross-service debug & performance optimization)
    Logs: What is causing it? (root cause & forensics)
  10. sensedia.com

    [INFO ][2019-05-28 00:51:18][de4c1b04-9ca1][c.e.m.g.u.domain.service.PaymentsService] - finding user by id 140708
    [WARN ][2019-05-28 00:51:18][de4c1b04-9ca1][c.e.m.g.u.domain.service.PaymentsService] - user non-cached, calling user service
    [ERROR][2019-05-28 00:51:18][de4c1b04-9ca1][c.e.m.g.u.domain.service.PaymentsService] - error when calling /users/140708
    org.springframework.web.server.ResponseStatusException: 404 NOT_FOUND
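Every line in this log carries the same request id (`de4c1b04-9ca1`), which is what lets all lines of one request be grepped together. A minimal sketch of that idea, assuming a per-thread request id similar in spirit to the MDC (Mapped Diagnostic Context) of SLF4J and Logback; `RequestLog` and its methods are hypothetical, and the layout only mimics the slide.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical request-scoped log formatter: the request id is kept per thread
// and stamped on every line as [LEVEL][timestamp][requestId][logger] - message.
class RequestLog {
    private static final ThreadLocal<String> REQUEST_ID = new ThreadLocal<>();
    private static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    static void setRequestId(String id) { REQUEST_ID.set(id); }

    static void clear() { REQUEST_ID.remove(); }

    static String format(String level, LocalDateTime time, String logger, String message) {
        String id = REQUEST_ID.get();
        if (id == null) {
            id = "-"; // no request in scope on this thread
        }
        // %-5s pads the level to five characters, e.g. "INFO " and "ERROR".
        return String.format("[%-5s][%s][%s][%s] - %s",
                level, TS.format(time), id, logger, message);
    }

    public static void main(String[] args) {
        setRequestId("de4c1b04-9ca1");
        try {
            LocalDateTime now = LocalDateTime.of(2019, 5, 28, 0, 51, 18);
            String logger = "c.e.m.g.u.domain.service.PaymentsService";
            System.out.println(format("INFO", now, logger, "finding user by id 140708"));
            System.out.println(format("WARN", now, logger, "user non-cached, calling user service"));
        } finally {
            clear(); // avoid leaking the id to the next request served by this thread
        }
    }
}
```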
  11. The 3 pillars of observability (recap)
  12. We need to understand the application's behavior and be able to troubleshoot problems. https://microservices.io/patterns/observability/distributed-tracing.html
  13. Distributed tracing (https://microservices.io/patterns/observability/distributed-tracing.html):

    - Assign each external request a unique request id
    - Pass it to all services that are involved in handling the request
    - Include the request id in log messages
    - Record timing information, e.g. start and end time
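The four steps above can be sketched in-process to show how they fit together. This is a single-threaded illustration under stated assumptions, not a real tracer: the header name `X-Request-Id`, the `Span` class and `COLLECTOR` are hypothetical stand-ins (production systems use a tracer such as Zipkin or Jaeger and its propagation headers).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.function.Consumer;

// In-process sketch of the distributed tracing steps; all names are hypothetical.
class TracingSketch {
    static final String HEADER = "X-Request-Id";

    // Timing record for one unit of work, tagged with the shared request id.
    static final class Span {
        final String requestId;
        final String service;
        final long startNanos;
        long endNanos;

        Span(String requestId, String service, long startNanos) {
            this.requestId = requestId;
            this.service = service;
            this.startNanos = startNanos;
        }
    }

    // Stand-in for the external tool that stores spans for later analysis.
    static final List<Span> COLLECTOR = new ArrayList<>();

    // Step 1: reuse the incoming request id, or assign one at the system's edge.
    static String requestIdFrom(Map<String, String> incomingHeaders) {
        String id = incomingHeaders.get(HEADER);
        return id != null ? id : UUID.randomUUID().toString();
    }

    // Step 2: pass the same id to every downstream service via a header.
    static Map<String, String> outgoingHeaders(String requestId) {
        Map<String, String> headers = new HashMap<>();
        headers.put(HEADER, requestId);
        return headers;
    }

    // Steps 3 and 4: log with the request id and record start/end times.
    static void handle(String service, Map<String, String> incomingHeaders, Consumer<String> work) {
        String requestId = requestIdFrom(incomingHeaders);
        Span span = new Span(requestId, service, System.nanoTime());
        try {
            System.out.println("[" + requestId + "][" + service + "] handling request");
            work.accept(requestId);
        } finally {
            span.endNanos = System.nanoTime();
            COLLECTOR.add(span);
        }
    }

    public static void main(String[] args) {
        // An external request reaches "payments" without an id; payments calls "users".
        handle("payments", new HashMap<>(), id ->
                handle("users", outgoingHeaders(id), downstreamId -> { /* look up the user */ }));

        // Both spans share one request id, so the collector can stitch them together.
        for (Span s : COLLECTOR) {
            System.out.printf("%s %s took %d ns%n", s.requestId, s.service, s.endNanos - s.startNanos);
        }
    }
}
```

The inner `users` span finishes first, so it reaches the collector before the `payments` span that encloses it.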
  14. The solution should have minimal overhead, and an external tool is used to analyze the data. https://microservices.io/patterns/observability/distributed-tracing.html
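One common way to keep tracing overhead low, swapped in here for illustration since the slide does not name a technique, is sampling: only a fraction of requests is traced. Deriving the decision from the request id lets every service reach the same decision for the same request. `TraceSampler` is hypothetical.

```java
import java.util.UUID;

// Hypothetical head-based sampler: trace only a fraction of requests, deciding
// from the request id so every service reaches the same decision for a request.
class TraceSampler {
    private final double rate; // fraction of requests to trace, 0.0 to 1.0

    TraceSampler(double rate) {
        if (rate < 0.0 || rate > 1.0) {
            throw new IllegalArgumentException("rate must be between 0 and 1");
        }
        this.rate = rate;
    }

    boolean shouldSample(String requestId) {
        // String.hashCode() is specified by the Java platform, so the result is
        // the same in every JVM; masking the sign bit keeps the value non-negative.
        int bucket = (requestId.hashCode() & 0x7fffffff) % 10_000;
        return bucket < rate * 10_000;
    }

    public static void main(String[] args) {
        TraceSampler sampler = new TraceSampler(0.1); // aims to trace about 10% of requests
        int sampled = 0;
        for (int i = 0; i < 100_000; i++) {
            if (sampler.shouldSample(UUID.randomUUID().toString())) {
                sampled++;
            }
        }
        System.out.println("sampled " + sampled + " of 100000 requests");
    }
}
```

Because the decision is a pure function of the id, downstream services can reproduce it without coordination; propagating the decision alongside the id is another common option.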
  15. The 3 pillars of observability (recap)
  16. Counter (↑): cumulative, increasing metric

      payments_technology_total{env="prod",method="nfc",type="business"} 152.0

    Gauge (↑ ↓): single metric that goes up or down

      payments_hikaricp_connections_active{env="prod",type="infra"} 4.0

    Timer: samples observations and groups them into buckets

      payments_crypto_seconds_count{env="prod",type="infra"} 77.0
      payments_crypto_seconds_sum{env="prod",type="infra"} 34.97
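The three metric types on this slide can be sketched by hand to show what each one stores. These classes are hypothetical stand-ins printed in the Prometheus text style shown above; a Spring application would normally get real implementations from a metrics library rather than writing them.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.DoubleAdder;
import java.util.function.DoubleSupplier;

// Hand-rolled, hypothetical versions of the three metric types on the slide.
class MetricsSketch {
    // Counter: cumulative value that only goes up.
    static final class Counter {
        private final DoubleAdder value = new DoubleAdder();
        void increment() { value.add(1.0); }
        double get() { return value.sum(); }
    }

    // Gauge: reads a current value that can go up or down.
    static final class Gauge {
        private final DoubleSupplier source;
        Gauge(DoubleSupplier source) { this.source = source; }
        double get() { return source.getAsDouble(); }
    }

    // Timer: counts observations, sums their durations and keeps cumulative
    // bucket counts, one per upper bound ("le"); bounds must be ascending.
    static final class Timer {
        private final double[] bounds;
        private final AtomicLong[] buckets;
        private final AtomicLong observations = new AtomicLong();
        private final DoubleAdder sumSeconds = new DoubleAdder();

        Timer(double... bounds) {
            this.bounds = bounds.clone();
            this.buckets = new AtomicLong[bounds.length];
            for (int i = 0; i < bounds.length; i++) {
                buckets[i] = new AtomicLong();
            }
        }

        void record(double seconds) {
            observations.incrementAndGet();
            sumSeconds.add(seconds);
            for (int i = 0; i < bounds.length; i++) {
                if (seconds <= bounds[i]) {
                    buckets[i].incrementAndGet();
                }
            }
        }

        long count() { return observations.get(); }
        double sum() { return sumSeconds.sum(); }
        long bucket(int i) { return buckets[i].get(); }
    }

    static String line(String name, String labels, double value) {
        return name + "{" + labels + "} " + value;
    }

    public static void main(String[] args) {
        String infra = "env=\"prod\",type=\"infra\"";

        Counter payments = new Counter();
        payments.increment();

        AtomicLong activeConnections = new AtomicLong(4);
        Gauge connections = new Gauge(() -> activeConnections.get());

        Timer crypto = new Timer(0.1, 0.5, 1.0); // hypothetical bucket bounds in seconds
        crypto.record(0.25);
        crypto.record(0.5);

        System.out.println(line("payments_technology_total",
                "env=\"prod\",method=\"nfc\",type=\"business\"", payments.get()));
        System.out.println(line("payments_hikaricp_connections_active", infra, connections.get()));
        for (int i = 0; i < crypto.bounds.length; i++) {
            System.out.println(line("payments_crypto_seconds_bucket",
                    infra + ",le=\"" + crypto.bounds[i] + "\"", crypto.bucket(i)));
        }
        System.out.println(line("payments_crypto_seconds_count", infra, crypto.count()));
        System.out.println(line("payments_crypto_seconds_sum", infra, crypto.sum()));
    }
}
```

Dividing `_sum` by `_count` gives the mean duration; with the slide's values that is 34.97 / 77, roughly 0.45 seconds per crypto operation.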
  17. The 3 pillars of observability (recap)
  18. The Kiali project provides answers to the questions: What microservices are part of my Istio service mesh? How are they connected? How are they performing?
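Kiali answers "How are they connected?" by drawing a graph of the mesh from the request telemetry that Istio collects. The underlying idea can be sketched as aggregating per-request records into edges. This is a hypothetical in-memory illustration, not Kiali's implementation; the service names and the rule that any 4xx/5xx status is an error are assumptions.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical service graph: every observed call adds to the edge
// "source -> destination", tracking request and error counts.
class ServiceGraph {
    // Edge key mapped to {request count, error count}.
    private final Map<String, long[]> edges = new TreeMap<>();

    private static String key(String source, String destination) {
        return source + " -> " + destination;
    }

    void observe(String source, String destination, int httpStatus) {
        long[] stats = edges.computeIfAbsent(key(source, destination), k -> new long[2]);
        stats[0]++;
        if (httpStatus >= 400) { // assumption: any 4xx/5xx response counts as an error
            stats[1]++;
        }
    }

    long requests(String source, String destination) {
        long[] stats = edges.get(key(source, destination));
        return stats == null ? 0 : stats[0];
    }

    long errors(String source, String destination) {
        long[] stats = edges.get(key(source, destination));
        return stats == null ? 0 : stats[1];
    }

    // Prints edges in alphabetical order of their "source -> destination" key.
    void print() {
        edges.forEach((edge, stats) ->
                System.out.printf("%s: %d requests, %d errors%n", edge, stats[0], stats[1]));
    }

    public static void main(String[] args) {
        ServiceGraph graph = new ServiceGraph();
        graph.observe("gateway", "payments", 200);
        graph.observe("payments", "users", 404); // like the 404 in the log slide
        graph.observe("payments", "users", 200);
        graph.print();
    }
}
```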