Automating Failure Testing Research at Internet Scale: https://people.ucsc.edu/~palvaro/socc16.pdf

Posts:
- Logging v. Instrumentation: https://peter.bourgon.org/blog/2016/02/07/logging-v-instrumentation.html
- Monitoring and Observability: https://medium.com/@copyconstruct/monitoring-and-observability-8417d1952e1c
- Monitoring in the Time of Cloud Native: https://medium.com/@copyconstruct/monitoring-in-the-time-of-cloud-native-c87c7a5bfa3e

Tools:
- Zipkin: https://zipkin.io/
- Brave: https://github.com/openzipkin/brave
- Kafka Interceptors: https://github.com/sysco-middleware/kafka-interceptors
- Spigo: https://github.com/adrianco/spigo
- Vizceral: https://github.com/Netflix/vizceral