Microservices Observability Zup Open Talks

In this talk we cover observability for microservices and how we can do canary releases using one of its pillars: metrics.

Transcript

  1. Microservices Observability

  2. Hello! I am Cláudio Oliveira • Technical Lead, API Team • Book Author • @luizalabs • Java, Golang, k8s & microservices
  3. Agenda • Metrics • Distributed Tracing • Logs • Progressive Delivery • Demos
  4. 4

  5. Glossary • Telemetry: how to collect the data that will provide observability (sensors) • Observability: monitoring, alerting and visualizations, distributed tracing and log aggregation
  6. Glossary • Monitoring: the practice of collecting signals, aggregating them, and matching them against some predefined criteria
  7. Microservices Drawbacks: Sh*** happens

  8. Fallacies of Distributed Computing

  9. Fallacies of Distributed Computing • The network is reliable • Latency is zero • Bandwidth is infinite
  10. Microservices imply “some” challenges • Understand how microservices connect to each other • Network latencies can be a bottleneck (intense IPC) • The network can be unreliable • Keep control of which instances are up and running • Non-functional requirements increase
  11. 11

  12. Metrics

  13. Metrics are the only way to get your job done
  14. RED pattern to monitor services

  15. R (Rate): the number of requests per second

  16. E (Errors): the number of failed requests per second

  17. D (Duration): the distribution of the amount of time each request takes
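The three RED signals can be computed from raw request records. The following is a minimal sketch, not the tooling used in the talk: the `Request` record shape and the `red_summary` function are hypothetical names, and the duration distribution is summarized here by its 50th and 95th percentiles only.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class Request:
    """One observed request (hypothetical record shape)."""
    duration_s: float  # how long the request took, in seconds
    failed: bool       # True if the request ended in an error


def red_summary(requests: list, window_s: float) -> dict:
    """Compute Rate, Errors and Duration for one time window."""
    durations = sorted(r.duration_s for r in requests)
    failures = sum(1 for r in requests if r.failed)
    summary = {
        "rate_per_s": len(requests) / window_s,  # R
        "errors_per_s": failures / window_s,     # E
        "duration_p50_s": None,                  # D
        "duration_p95_s": None,
    }
    if len(durations) >= 2:
        # quantiles(n=100) returns 99 cut points; index 49 is p50, index 94 is p95
        cuts = quantiles(durations, n=100, method="inclusive")
        summary["duration_p50_s"] = cuts[49]
        summary["duration_p95_s"] = cuts[94]
    return summary
```

Because every service is summarized with the same three numbers, the same dashboard and alert rules can be reused across services.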
  18. “The benefit of treating each service the same, from a monitoring perspective, is scalability in your operations teams.”
  19. Use case

  20. 20

  21. 21

  22. Distributed Tracing

  23. How does it work??? • Assign each external request a unique ID • Pass it to all services that are involved • Include the request ID in log messages • Record timing information, e.g. start and end time
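The four steps above can be sketched in a few lines. This is an illustration under assumptions, not the talk's implementation: the `X-Request-ID` header name, the `handle` helper, and the `orders` and `inventory` services are all hypothetical, and services are simulated as plain function calls instead of HTTP requests.

```python
import time
import uuid

REQUEST_ID_HEADER = "X-Request-ID"  # assumed header name, not from the talk


def ensure_request_id(headers: dict) -> dict:
    """Return a copy of headers that carries a request ID, creating one at the edge."""
    out = dict(headers)
    if REQUEST_ID_HEADER not in out:
        out[REQUEST_ID_HEADER] = str(uuid.uuid4())
    return out


def handle(service: str, headers: dict, downstream=()) -> list:
    """Simulate one service: keep the request ID, time the work, call downstream."""
    headers = ensure_request_id(headers)
    request_id = headers[REQUEST_ID_HEADER]
    start = time.time()
    records = []
    for call in downstream:
        # pass the same headers on, so the ID reaches every service involved
        records.extend(call(headers))
    end = time.time()
    # one record per service, tagged with the shared ID and its start/end time
    records.append({"service": service, "request_id": request_id,
                    "start": start, "end": end})
    return records


def inventory(headers):
    return handle("inventory", headers)


def orders(headers):
    return handle("orders", headers, downstream=[inventory])


records = orders({})  # the edge service receives a request without an ID
```

Every record in `records` carries the same `request_id`, which is what lets a tracing backend or a log search stitch the calls back together.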
  24. OpenTracing / OpenTelemetry • Cloud Native Computing Foundation (CNCF) • Standardizes the instrumentation of apps for distributed tracing
  25. OpenTracing / OpenTelemetry Concepts • A trace tells the story of a transaction • A span represents a single call • Distributed tracing systems collect the spans, and we can see the graph in a nice interface
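To make the trace and span concepts concrete, here is a simplified data model. It is a sketch only and deliberately not the OpenTelemetry API: the `Span` fields and the `render_trace` function are hypothetical, chosen to show how spans that share a trace ID and point at a parent span form the call graph a tracing UI displays.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """A single call inside a trace (simplified, hypothetical model)."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None for the root span of the trace
    name: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def render_trace(spans: list) -> list:
    """Return indented lines showing the call graph of one trace."""
    children = {}
    for span in spans:
        children.setdefault(span.parent_id, []).append(span)
    lines = []

    def walk(parent_id, depth):
        # visit the children of a span in the order they started
        for span in sorted(children.get(parent_id, []), key=lambda s: s.start_ms):
            lines.append(f"{'  ' * depth}{span.name} ({span.duration_ms} ms)")
            walk(span.span_id, depth + 1)

    walk(None, 0)
    return lines
```

Real tracing systems add context propagation, sampling and exporters on top of this basic parent/child structure.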
  26. 26

  27. Logs

  28. Use the 5 W's!!!! • who • what • when • where • why
  29. Use severity correctly!!! • INFO • DEBUG • WARNING • ERROR
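One way to apply both slides is to emit structured log lines that always answer the five questions and carry an explicit severity. This is a minimal sketch using Python's standard `logging` module; the `log_event` helper, the field names and the example values are assumptions, not something shown in the talk.

```python
import json
import logging
from datetime import datetime, timezone


def log_event(logger, level, who, what, where, why):
    """Emit one JSON log line that answers who, what, when, where and why."""
    payload = {
        "who": who,
        "what": what,
        "when": datetime.now(timezone.utc).isoformat(),
        "where": where,
        "why": why,
    }
    line = json.dumps(payload)
    logger.log(level, line)  # severity is chosen by the caller, per event
    return line


logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("bets")
log_event(logger, logging.ERROR, who="user-42", what="place bet failed",
          where="bets-service", why="payment gateway timeout")
```

JSON lines like this are easy for a log aggregator to index, so the same fields can be filtered across all services.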
  30. 30

  31. Aggregate logs: microservices are distributed systems

  32. 32

  33. 33

  34. What are my options to get observability done???

  35. There are two ways to solve this problem

  36. Before Service Mesh

  37. 37

  38. Things to think about

  39. Concerns about observability in the app • Increases the size of the application • Configuration must be done inside the application • Consumes the application's resources • More control to “customize” metrics and distributed tracing • There is no sidecar involved
  40. After Service Mesh

  41. 41

  42. Things to think about

  43. Concerns about observability with a sidecar • One more thing to care about • The control plane must configure the sidecars • Less intrusive • Developers can focus on business rules • It is a kind of industry standard today
  44. 44

  45. Progressive App Delivery with ArgoCD && Rollouts

  46. Progressive App Delivery • Roll out new features gradually • Avoid downtime as much as possible • Stateless applications are mandatory • Versions should be backwards compatible • Blue-green, canary release and others
  47. Argo CD is a declarative, GitOps continuous delivery tool for Kubernetes
  48. 48

  49. Argo Rollouts is a Kubernetes controller and set of CRDs which provide advanced deployment capabilities such as blue-green, canary, canary analysis, experimentation, and progressive delivery features to Kubernetes.
  50. But how does it connect to the observability stuff????

  51. 51

  52. It should be automated

  53. With everything covered by metrics, we can automate the release process

  54. HTTP calls with status code ~2.* should be more than 95% of all calls • If so: release is good to go!!! • Else: Ohhh sh****!!!
  55. 55

  56. Prometheus query: sum(irate(istio_requests_total{reporter="source",destination_service=~"bets-canary.istio.svc.cluster.local",response_code=~"2.*"}[2m])) / sum(irate(istio_requests_total{reporter="source",destination_service=~"bets-canary.istio.svc.cluster.local"}[2m]))
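The release rule from the previous slides (2xx calls must be more than 95% of all calls) boils down to a ratio and a threshold. The sketch below shows that decision in isolation; it is not how Argo Rollouts is configured. The function names are hypothetical, the two inputs stand for the numerator and denominator of the Prometheus query, and treating "no traffic" as a failed check is an assumption made here.

```python
def success_ratio(ok_rate: float, total_rate: float) -> float:
    """Share of requests that returned a 2xx status; 0.0 when there is no traffic."""
    if total_rate <= 0:
        return 0.0  # assumption: a canary with no traffic is not considered healthy
    return ok_rate / total_rate


def canary_decision(ok_rate: float, total_rate: float, threshold: float = 0.95) -> str:
    """Return "promote" when the success ratio is above the threshold, else "rollback"."""
    if success_ratio(ok_rate, total_rate) > threshold:
        return "promote"
    return "rollback"
```

For example, `canary_decision(97.0, 100.0)` gives `"promote"`, while `canary_decision(90.0, 100.0)` gives `"rollback"`.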

  57. 57

  58. Conclusions

  59. Always follow the industry standards. A homemade solution is not a good way.
  60. You can start simple and then evolve step by step
  61. Microservices without observability (monitoring, distributed tracing and log aggregation) are the worst thing in the world
  62. Microservices are effective for delivering software frequently. But THINK seriously about observability
  63. None
  64. Thanks! Any questions? You can find me on Twitter and LinkedIn • @claudioed