of the total costs of software are incurred after launch.” • Facts and Fallacies of Software Engineering, Glass R (2002), Addison-Wesley, p-115 • Which Factors affect Software Projects maintenance costs more? Acta Informatica Medica
its health. ◦ HealthChecks ▪ Is the service running? ▪ Can I do more work? ◦ Metrics ▪ System ▪ Application ▪ Functional • Alerts ◦ Anomalous behaviors - How do you define an anomaly?
running? ◦ Can I do more work? • Methods ◦ Broadcast - Gossip Protocols (Cassandra) ◦ Register - Service Discovery ◦ Health endpoints - ELB, HAProxy, Nginx
- Judicious • Low Cardinality ◦ Keep a Watch! ◦ Don’t emit for users/orders. We use Logs for that! • Provide system summary • Questions: ▪ How many transactions failed? ▪ How many logins succeeded? ** https://queue.acm.org/detail.cfm?id=3309571 - Must Read
• Profile system usage at a span level. • Extract Insights Tools: • Google Dapper (https://ai.google/research/pubs/pub36356) • Twitter Zipkin (https://zipkin.io/) • Open Jaeger (https://www.jaegertracing.io/) • New Relic
a Quantifiable Goal for a service. • Measure the goal - Represents the User Experience/Delight Factor. • First step before writing a new service. Work backwards • Have as few SLO’s as possible. ◦ Represents the system behaviour.
- How to build observable Distributed systems? https://www.youtube.com/watch?v=ACL_YVPD3gw • Charity Majors - Observability for Emerging Infra: What Got You Here Won't Get You There" https://www.youtube.com/watch?v=1wjovFSCGhE • Caitee McAfree - Of the Order of Billions: Building Observability at Twitter https://www.youtube.com/watch?v=SC6XuD1tgcQ • https://eng.uber.com/observability-at-scale/