and aggregating them, and matching them against some pre-defined criteria of system states we should carefully watch. When we find that one of our signals has crossed a threshold and may be heading toward a known bad state, we take action to remedy the system.
Christian Posta, Istio in Action, https://www.manning.com/books/istio-in-action
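The threshold-matching step described in the quote can be sketched in a few lines. This is a minimal illustration, not the API of Istio or any monitoring tool; the `Threshold` type, the metric names and the limits are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """A pre-defined criterion (hypothetical): flag the metric when its value exceeds `limit`."""
    metric: str
    limit: float


def check_thresholds(signals, thresholds):
    """Return the names of metrics whose current value has crossed their threshold."""
    breached = []
    for t in thresholds:
        value = signals.get(t.metric)
        # A missing signal is not treated as a breach in this sketch.
        if value is not None and value > t.limit:
            breached.append(t.metric)
    return breached


if __name__ == "__main__":
    thresholds = [
        Threshold("disk_usage_ratio", 0.90),  # hypothetical metric and limit
        Threshold("queue_depth", 1000),       # hypothetical metric and limit
    ]
    print(check_thresholds({"disk_usage_ratio": 0.42, "queue_depth": 1500}, thresholds))  # ['queue_depth']
```

With the same thresholds, signals such as a disk usage of 0.42 and a queue depth of 10 produce no breach at all. That is the blind spot of the 10s payment-delay example: every threshold can be green while an individual user's request is slow.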
cart and experiences a 10s delay choosing a payment option. All of the pre-defined metric thresholds (disk usage, queue depth, machine health, etc.) might be at acceptable levels.
https://www.manning.com/books/istio-in-action
1. Metrics: Dashboarding, Trending & Problem Detection
2. Traces: Cross-Service Debug & Performance Optimization (Where exactly is my problem?)
3. Logs: Root Cause & Forensics (What is causing it?)
[WARN ][2019-05-28 00:51:18][de4c1b04-9ca1][c.e.m.g.u.domain.service.PaymentsService] - user non-cached, calling user service
[ERROR][2019-05-28 00:51:18][de4c1b04-9ca1][c.e.m.g.u.domain.service.PaymentsService] - error when calling /users/140708
org.springframework.web.server.ResponseStatusException: 404 NOT_FOUND
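Because both entries carry the same request id (de4c1b04-9ca1), the lines belonging to one request can be pulled together even when many requests interleave in the same log. A minimal sketch, assuming the bracketed layout seen in the sample (the regex is inferred from these two lines, not taken from the service's actual logging configuration); continuation lines such as the exception are attached to the preceding entry.

```python
import re
from collections import defaultdict

# Layout inferred from the sample lines: [LEVEL][timestamp][request-id][logger] - message
LOG_LINE = re.compile(
    r"^\[(?P<level>\w+)\s*\]\[(?P<ts>[^\]]+)\]\[(?P<request_id>[^\]]+)\]"
    r"\[(?P<logger>[^\]]+)\] - (?P<message>.*)$"
)


def group_by_request_id(lines):
    """Map each request id to the 'LEVEL: message' entries logged for it."""
    groups = defaultdict(list)
    last_id = None
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            last_id = match.group("request_id")
            groups[last_id].append(f"{match.group('level')}: {match.group('message')}")
        elif last_id is not None and line.strip():
            # Continuation line (e.g. an exception): attach it to the previous entry's request.
            groups[last_id].append(line.strip())
    return dict(groups)


if __name__ == "__main__":
    sample = [
        "[WARN ][2019-05-28 00:51:18][de4c1b04-9ca1][c.e.m.g.u.domain.service.PaymentsService]"
        " - user non-cached, calling user service",
        "[ERROR][2019-05-28 00:51:18][de4c1b04-9ca1][c.e.m.g.u.domain.service.PaymentsService]"
        " - error when calling /users/140708",
        "org.springframework.web.server.ResponseStatusException: 404 NOT_FOUND",
    ]
    for entry in group_by_request_id(sample)["de4c1b04-9ca1"]:
        print(entry)
```

In practice a log aggregator does this filtering by indexing the request id field; the sketch only shows why putting the id on every line makes that possible.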
that are involved, includes the request id in log messages, and records timing information, e.g. start and end time.
https://microservices.io/patterns/observability/distributed-tracing.html
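Within a single Python process, the pattern can be sketched with `contextvars` and a `logging.Filter`: each incoming request gets an id (or reuses one passed in by the caller), every log record is stamped with it, and the duration between start and end is logged when the request finishes. This is a sketch under stated assumptions: `handle_request`, `configure_logging` and the 12-character id format are hypothetical, and forwarding the id to downstream services (typically in a request header) as well as shipping the timings to a centralized service are left out.

```python
import contextvars
import logging
import time
import uuid

# Id of the request currently being handled; "-" when outside any request.
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Stamps every log record passing through the handler with the current request id."""

    def filter(self, record):
        record.request_id = request_id_var.get()
        return True


def configure_logging(stream=None):
    """Hypothetical setup: a handler whose format mirrors the [LEVEL][time][request-id][logger] layout."""
    handler = logging.StreamHandler(stream)  # stream=None means sys.stderr
    handler.addFilter(RequestIdFilter())
    handler.setFormatter(logging.Formatter(
        "[%(levelname)-5s][%(asctime)s][%(request_id)s][%(name)s] - %(message)s"
    ))
    root = logging.getLogger()
    root.addHandler(handler)
    root.setLevel(logging.INFO)
    return handler


def handle_request(work, incoming_id=None):
    """Hypothetical entry point: reuse the caller's id or assign a new one, then time the work."""
    request_id = incoming_id or uuid.uuid4().hex[:12]
    token = request_id_var.set(request_id)
    start = time.monotonic()
    try:
        return work()
    finally:
        elapsed = time.monotonic() - start
        # Logged before the id is reset, so the timing line carries the same request id.
        logging.getLogger("tracing").info("request finished in %.3fs", elapsed)
        request_id_var.reset(token)


if __name__ == "__main__":
    configure_logging()
    log = logging.getLogger("payments")
    handle_request(lambda: log.warning("user non-cached, calling user service"))
```

A context variable is used rather than a global so that concurrent requests handled by asyncio tasks each see their own id.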
Single metric that goes up or down:
payments_hikaricp_connections_active{env="prod",type="infra"} 4.0
Timer: records sample observations (count and sum, optionally histogram buckets):
payments_crypto_seconds_count{env="prod",type="infra"} 77.0
payments_crypto_seconds_sum{env="prod",type="infra"} 34.97
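A gauge is read directly, while a timer's mean duration is its sum divided by its count: here 34.97 s / 77 ≈ 0.454 s per observation. A minimal sketch that parses lines in the Prometheus text exposition format and computes that mean; it keys samples by metric name only and ignores labels, a simplification that holds for this single-series sample but not for real multi-series output.

```python
import re

# One sample line: metric_name{labels} value [timestamp]
SAMPLE_LINE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{[^}]*\})?\s+(?P<value>\S+)(?:\s+\S+)?$"
)


def parse_samples(text):
    """Return {metric_name: value}; labels are ignored (simplification for a single series)."""
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if match:
            samples[match.group("name")] = float(match.group("value"))
    return samples


def timer_mean_seconds(samples, base_name):
    """Mean observed duration of a timer: total time (_sum) divided by number of samples (_count)."""
    count = samples[f"{base_name}_count"]
    return samples[f"{base_name}_sum"] / count if count else 0.0


if __name__ == "__main__":
    exposition = '''
payments_hikaricp_connections_active{env="prod",type="infra"} 4.0
payments_crypto_seconds_count{env="prod",type="infra"} 77.0
payments_crypto_seconds_sum{env="prod",type="infra"} 34.97
'''
    samples = parse_samples(exposition)
    print(samples["payments_hikaricp_connections_active"])  # 4.0
    print(round(timer_mean_seconds(samples, "payments_crypto_seconds"), 3))  # 0.454
```

Because _count and _sum are cumulative, this is the mean since the application started. For a recent window, the usual PromQL form is rate(payments_crypto_seconds_sum[5m]) / rate(payments_crypto_seconds_count[5m]).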