
Cloud Run + Observability

Yuki Ito
February 26, 2021

Transcript

  1. Cloud Run + Observability
    GCPUG Tokyo Observability, February 2021
    Yuki Ito
  2. Merpay / Mercari Architect Team Microservices Platform Yuki Ito X Asia Kauche
  3. None
  4. Agenda • Architecture • Observability • Goal • Approaches • Logging • Trace • Monitoring Dashboard
  5. Agenda • Architecture • Observability • Goal • Approaches • Logging • Trace • Monitoring Dashboard
  6. What is Cloud Run
    "Cloud Run is a managed compute platform that enables you to run stateless containers that are invocable via web requests... Cloud Run is serverless: it abstracts away all infrastructure management..."
    https://cloud.google.com/run/docs
  7. What is Cloud Run on GKE (Anthos) + Fully Managed

  8. What is Cloud Run on GKE (Anthos) + Fully Managed

  9. Architecture [diagram labels: Run, Scheduler, Pub/Sub, Mobile App, External Service, Web Hook, API, Customer API, Job API]
  10. Key Concepts • Everything runs on Run • Everything is an API
  11. Key Concepts Run vs Cloud Functions, e.g. [diagram labels: Trigger, Pub/Sub, Functions, Run, Firestore, Functions]
  12. Architecture [diagram labels: Run, Scheduler, Pub/Sub, Mobile App, External Service, Web Hook, API, Customer API, Job API]
  13. Architecture ✅ Using the same API interceptors ✅ Managed by API Definitions ✅ Using the same Monitoring environments
  14. Architecture Practical Cloud Run https://speakerdeck.com/110y/introduce-cloud-run https://www.youtube.com/watch?v=s_Y3dsUrux4

  15. Agenda • Architecture • Observability • Goal • Approaches • Logging • Trace • Monitoring Dashboard
  16. Observability What is Observability? 🤔

  17. Observability https://www.oreilly.com/library/view/observability-engineering/9781492076438/

  18. Observability
    "Put simply, our definition of observability for software systems is a measure of how well you can understand and explain any state your system can get into, no matter how novel or bizarre. ... If you can understand that bizarre or novel state without shipping new code, then you have observability."
    https://www.oreilly.com/library/view/observability-engineering/9781492076438/
  19. Observability
    "With observability tools, the best debugger on the team is typically the engineer who is most curious."
    https://www.oreilly.com/library/view/observability-engineering/9781492076438/
  20. Observability The best debugger is...
    Without Observability: the person who has been there the longest
    With Observability: the person who is most curious about the issue
  21. Goal ✅ Can understand unknown state without shipping new code ✅ Democratizing
  22. Ultimate Goal ✅ Enabling newcomers to understand system states on Day 1.
  23. Approaches • Logging • Trace • Monitoring Dashboard

  24. Approaches • Logging • Trace • Monitoring Dashboard

  25. Logging Cloud Run has two types of logs: • Request logs • Container logs https://cloud.google.com/run/docs/logging
  26. Logging Cloud Run has two types of logs, automatically sent to Cloud Logging: • Request logs • Container logs https://cloud.google.com/run/docs/logging
  27. Logging Cloud Run has two types of logs, automatically sent to Cloud Logging: • Request logs • Container logs https://cloud.google.com/run/docs/logging
  28. Logging Cloud Run generates Request Logs

  29. Logging Cloud Run generates Request Logs Not enough...

  30. Logging Cloud Run has two types of logs, automatically sent to Cloud Logging: • Request logs • Container logs https://cloud.google.com/run/docs/logging
  31. Logging Container (Application) logs

  32. Logging Container (Application) logs Structured Log

  33. Logging Structured logging
    "In Cloud Logging, structured logs refer to log entries that use the jsonPayload field to add structure to their payloads."
    https://cloud.google.com/logging/docs/structured-logging
  34. Logging Container (Application) logs, written to stdout
    { "message": "grpc request", "logger": "grpc.request_logger", "method": "/customer.v1.CustomerService/GetXXX", "level": "info", "timestamp": 1613885945098.689 }
  35. Logging Container (Application) logs Structured Log
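The structured log entry shown on the slides can be produced with the Go standard library alone. The following is a minimal sketch, not the speaker's implementation: the `entry` type and `formatEntry` helper are hypothetical names, and the fields simply mirror the JSON on the slide. It assumes that whatever is printed to stdout is collected as a container log.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"time"
)

// entry mirrors the fields of the JSON log line shown on the slide.
// The type name is hypothetical and exists only for this sketch.
type entry struct {
	Message   string  `json:"message"`
	Logger    string  `json:"logger"`
	Method    string  `json:"method"`
	Level     string  `json:"level"`
	Timestamp float64 `json:"timestamp"` // milliseconds since the Unix epoch
}

// formatEntry encodes e as a single-line JSON object.
func formatEntry(e entry) (string, error) {
	b, err := json.Marshal(e)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	line, err := formatEntry(entry{
		Message:   "grpc request",
		Logger:    "grpc.request_logger",
		Method:    "/customer.v1.CustomerService/GetXXX",
		Level:     "info",
		Timestamp: float64(time.Now().UnixNano()) / 1e6,
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// One JSON object per line on stdout.
	fmt.Println(line)
}
```

Emitting one JSON object per line keeps each entry parseable on its own.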

  36. Logging Structured logging go-logr / logr https://github.com/go-logr/logr

  37. Logging Let's talk about logging https://dave.cheney.net/2015/11/05/lets-talk-about-logging
    "I believe that there are only two things you should log: 1. Things that developers care about when they are developing or debugging software. 2. Things that users care about when using your software. Obviously these are debug and info levels, respectively."
  38. Logging go-logr / logr https://github.com/go-logr/logr
    type Logger interface {
        Info(msg string, keysAndValues ...interface{})
        Error(err error, msg string, keysAndValues ...interface{})
        // ...
    }
  39. Logging go-logr / logr https://github.com/go-logr/logr
    type Logger interface {
        Info(msg string, keysAndValues ...interface{})
        Error(err error, msg string, keysAndValues ...interface{})
        // ...
    }
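To illustrate how the two-method interface quoted above leads to structured output, here is a small sketch that re-declares those two methods locally and backs them with a JSON writer. It does not use the logr library itself: `stdoutLogger`, `fields`, and `emit` are hypothetical names, and how odd-length or non-string keys are handled is an assumption of this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// Logger repeats only the two methods quoted on the slide.
type Logger interface {
	Info(msg string, keysAndValues ...interface{})
	Error(err error, msg string, keysAndValues ...interface{})
}

// stdoutLogger is a hypothetical implementation that prints JSON lines.
type stdoutLogger struct{ name string }

// fields turns alternating key/value arguments into a map.
// Pairs with a non-string key and a trailing key without a value are dropped.
func fields(keysAndValues []interface{}) map[string]interface{} {
	m := make(map[string]interface{}, len(keysAndValues)/2+3)
	for i := 0; i+1 < len(keysAndValues); i += 2 {
		key, ok := keysAndValues[i].(string)
		if !ok {
			continue
		}
		m[key] = keysAndValues[i+1]
	}
	return m
}

// emit adds the common fields and writes one JSON object to stdout.
func (l stdoutLogger) emit(level, msg string, m map[string]interface{}) {
	m["level"] = level
	m["message"] = msg
	m["logger"] = l.name
	b, err := json.Marshal(m)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println(string(b))
}

func (l stdoutLogger) Info(msg string, keysAndValues ...interface{}) {
	l.emit("info", msg, fields(keysAndValues))
}

func (l stdoutLogger) Error(err error, msg string, keysAndValues ...interface{}) {
	m := fields(keysAndValues)
	if err != nil {
		m["error"] = err.Error()
	}
	l.emit("error", msg, m)
}

func main() {
	var log Logger = stdoutLogger{name: "grpc.request_logger"}
	log.Info("grpc request", "method", "/customer.v1.CustomerService/GetXXX")
	log.Error(fmt.Errorf("failed to get xxx"), "grpc request failed", "method", "/customer.v1.CustomerService/GetXXX")
}
```

Because there are only `Info` and `Error`, call sites carry context as key/value pairs rather than through format strings or extra levels.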
  40. Logging Wrapping Errors (Go 1.13 ~) https://blog.golang.org/go1.13-errors
    err := xxxRepository.Get(ctx, xxx)
    if err != nil {
        return "", fmt.Errorf("failed to get xxx: %w", err)
    }
    error: failed to verify access token: auth-client: failed to verify token: failed to get xxx: failed to access to the database due to ...
  41. AD https://gihyo.jp/magazine/SD/archive/2021/202101

  42. Logging go-logr / logr https://github.com/go-logr/logr

  43. Logging go-logr / logr https://github.com/go-logr/logr#why-structured-logging

  44. Logging Container (Application) logs Structured Log

  45. Logging Correlating Request Logs + Container Logs https://cloud.google.com/run/docs/logging#correlate-logs

  46. Logging Correlating

  47. Logging Correlating Request Logs Container Logs

  48. Logging Correlating

  49. Logging Container (Application) logs https://cloud.google.com/logging/docs/structured-logging
    { "message": "grpc request", "logger": "grpc.request_logger", "method": "/customer.v1.CustomerService/GetXXX", "level": "info", "timestamp": 1613885945098.689, "logging.googleapis.com/trace": "projects/.../traces/xxx" }
  50. Logging Container X-Cloud-Trace-Context: projects/.../traces/xxx Header
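The correlation step can be sketched as a small helper that derives the `logging.googleapis.com/trace` value from the incoming header. This is a sketch under assumptions: as far as the Cloud Run documentation describes it, the `X-Cloud-Trace-Context` header carries `TRACE_ID/SPAN_ID;o=OPTIONS`, and the log field expects `projects/PROJECT_ID/traces/TRACE_ID`. The function name `traceField` and the project ID `my-project` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// traceField builds the value for the "logging.googleapis.com/trace" log field
// from a project ID and the raw X-Cloud-Trace-Context header value.
// It reports false when either the project ID or the trace ID is missing.
func traceField(projectID, header string) (string, bool) {
	traceID := header
	// The trace ID is everything before the first "/" (span ID) or ";" (options).
	if i := strings.IndexAny(header, "/;"); i >= 0 {
		traceID = header[:i]
	}
	if projectID == "" || traceID == "" {
		return "", false
	}
	return fmt.Sprintf("projects/%s/traces/%s", projectID, traceID), true
}

func main() {
	// Example header value; in a server it would come from
	// r.Header.Get("X-Cloud-Trace-Context").
	header := "105445aa7843bc8bf206b12000100000/1;o=1"
	if v, ok := traceField("my-project", header); ok {
		fmt.Printf("%q: %q\n", "logging.googleapis.com/trace", v)
	}
}
```

Adding the returned value to every application log entry written while handling the request gives the container logs the same trace as the request log.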

  51. Logging Correlating

  52. Logging Correlating Request Logs Container Logs

  53. Approaches • Logging • Trace • Monitoring Dashboard

  54. Trace Cloud Trace

  55. Trace OpenTelemetry
    "OpenTelemetry is a collection of tools, APIs, and SDKs. You use it to instrument, generate, collect, and export telemetry data (metrics, logs, and traces) for analysis in order to understand your software's performance and behavior."
    https://opentelemetry.io/
  56. Trace 🎉 OpenTelemetry Specification v1.0.0, Tracing Edition https://medium.com/opentelemetry/opentelemetry-specification-v1-0-0-tracing-edition-72dd08936978
  57. Trace open-telemetry / opentelemetry-go

  58. Trace open-telemetry / opentelemetry-go

  59. Trace googleapis / google-cloud-go
    package trace
    import (
        "go.opencensus.io/trace"
        // ...
    )
    func StartSpan(ctx context.Context, name string) context.Context {
        ctx, _ = trace.StartSpan(ctx, name)
        return ctx
    }
  60. Trace open-telemetry / opentelemetry-go

  61. Trace OpenTelemetry

  62. Trace OpenCensus

  63. Trace OpenTelemetry + OpenCensus (Bridge)

  64. Trace open-telemetry / opentelemetry-go + GoogleCloudPlatform / opentelemetry-operations-go (Cloud Trace Exporter) https://github.com/GoogleCloudPlatform/opentelemetry-operations-go/blob/main/exporter/trace/README.md
  65. Trace

  66. Trace Just tracing is not enough...

  67. Trace Attributes

  68. Trace Attributes

  69. Trace Attributes

  70. Trace Events

  71. Trace Events

  72. Trace Events

  73. Trace Events

  74. Approaches • Logging • Trace • Monitoring Dashboard

  75. Monitoring Dashboard Cloud Monitoring

  76. Monitoring Dashboard Cloud Monitoring https://cloud.google.com/blog/products/management-tools/cloud-monitoring-improves-custom-dashboard-creation

  77. Monitoring Dashboard

  78. Monitoring Dashboard Markdown Text

  79. Monitoring Dashboard

  80. Monitoring Dashboard

  81. Monitoring Dashboard Cloud Logging https://cloud.google.com/logging/docs/release-notes#January_14_2021

  82. Monitoring Dashboard

  83. Monitoring Dashboard SLOs

  84. Monitoring Dashboard [diagram labels: Cloud Run, Logging, Monitoring, Metrics, Log]

  85. Monitoring Dashboard Log Based Metrics

  86. Monitoring Dashboard SLO: Success Rate

  87. Monitoring Dashboard

  88. Monitoring Dashboard MQL

  89. Monitoring Dashboard Monitoring Query Language https://cloud.google.com/blog/products/management-tools/introducing-monitoring-query-language-or-mql
    "MQL represents a decade of learnings and improvements on Google’s internal metric query language. The same language that powers advanced querying for internal Google production users, is now available to Google Cloud users as well."
  90. Monitoring Dashboard SLO: Success Rate

  91. Monitoring Dashboard Monitoring Query Language
    fetch cloud_run_revision
    | { t_error:
          metric 'logging.googleapis.com/user/CloudRunCustomerAPIErrorLogs'
          | align delta();
        t_all:
          metric 'logging.googleapis.com/user/CloudRunCustomerAPIAllLogs'
          | align delta() }
    | outer_join [0]
    | value [c'int lit *': if(t_all.value.CloudRunCustomerAPIAllLogs == 0, 100.0, (100 * (1 - (t_error.value.CloudRunCustomerAPIErrorLogs / t_all.value.CloudRunCustomerAPIAllLogs))))]
    Reference: https://cloud.google.com/monitoring/mql/reference
  92. Monitoring Dashboard Monitoring Query Language
    fetch cloud_run_revision
    | { t_error:
          metric 'logging.googleapis.com/user/CloudRunCustomerAPIErrorLogs'
          | align delta();
        t_all:
          metric 'logging.googleapis.com/user/CloudRunCustomerAPIAllLogs'
          | align delta() }
    | outer_join [0]
    | value [c'int lit *': if(t_all.value.CloudRunCustomerAPIAllLogs == 0, 100.0, (100 * (1 - (t_error.value.CloudRunCustomerAPIErrorLogs / t_all.value.CloudRunCustomerAPIAllLogs))))]
    Log-based metrics
    Reference: https://cloud.google.com/monitoring/mql/reference
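The arithmetic inside the query's `value` expression can be restated in Go to make the success-rate calculation easier to follow. This is only an illustration of the formula, not something that runs in Cloud Monitoring. The function name `successRate` is hypothetical, and its two arguments stand for the per-window counts of the error-log and all-log metrics.

```go
package main

import "fmt"

// successRate mirrors the MQL expression:
// if there were no logs at all in the window, report 100%;
// otherwise report 100 * (1 - errors / all).
func successRate(errorCount, allCount float64) float64 {
	if allCount == 0 {
		return 100.0
	}
	return 100 * (1 - errorCount/allCount)
}

func main() {
	fmt.Println(successRate(0, 0))    // no traffic in the window
	fmt.Println(successRate(25, 100)) // 25 error logs out of 100
}
```

Treating an empty window as 100% avoids a division by zero and keeps quiet periods from counting against the SLO.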
  93. Monitoring Dashboard [diagram labels: Cloud Run, Logging, Monitoring, Slack, Metrics, Log, Alert, 👨‍💻, Dashboard]
  94. Approaches • Logging • Trace • Monitoring Dashboard

  95. Architecture ✅ Using the same API interceptors ✅ Managed by API Definitions ✅ Using the same Monitoring environments