Slide 1

Slide 1 text

No content

Slide 2

Slide 2 text

© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Release with Confidence - Observability for Microservices
Kevin Crawley, Developer Relations, Instana
Session ID

Slide 3

Slide 3 text

@notsureifkevin
$> whoami
● Developer Relations @ Instana
  ○ Education / Awareness
  ○ Product Focus on SRE Topics
  ○ Blogs / Talks / Webinars / etc.
● Principal SRE @ Single Music
  ○ Co-Owner and Consultant
  ○ Built Delivery Systems and Manage Infrastructure
  ○ Maintain Production Excellence
● 20 Years of Software Development Experience
  ○ Early Adoption of Docker (2014)
  ○ Docker Captain
  ○ GitLab Hero

Slide 4

Slide 4 text

Discussion Points
● What Is Observability?
● What Is Distributed Tracing?
● The Monitoring Landscape
● Observability in Action: Live Demo

Slide 5

Slide 5 text

Observability Theory and Reasoning

Slide 6

Slide 6 text

Observability Theory
Kalman, 1961 paper “On the General Theory of Control Systems”
● A system is observable if its internal state, and therefore the behavior of the entire system, can be determined by looking only at its inputs and outputs.
● Lesson: control theory is a well-documented approach that people can understand and adopt.
https://en.wikipedia.org/wiki/Control_theory
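For linear systems, Kalman's definition comes with a concrete test. The block below is a standard control-theory result added for illustration, not material from the talk: a linear time-invariant system with n state variables is observable exactly when its observability matrix O has full rank n. The two-state example (position and constant velocity, with outputs chosen for illustration) shows that measuring position lets you infer velocity, while measuring velocity alone never reveals position.

```latex
\dot{x} = A x + B u, \qquad y = C x, \qquad x \in \mathbb{R}^{n}

\mathcal{O} =
\begin{bmatrix} C \\ C A \\ \vdots \\ C A^{n-1} \end{bmatrix},
\qquad \text{observable} \iff \operatorname{rank} \mathcal{O} = n

% Example: n = 2, x = (position, velocity), constant velocity
A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}

C = \begin{bmatrix} 1 & 0 \end{bmatrix}
  \;\Rightarrow\;
  \mathcal{O} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},\ \operatorname{rank}\mathcal{O} = 2
  \quad \text{(observable)}

C = \begin{bmatrix} 0 & 1 \end{bmatrix}
  \;\Rightarrow\;
  \mathcal{O} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix},\ \operatorname{rank}\mathcal{O} = 1
  \quad \text{(not observable)}
```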

Slide 7

Slide 7 text

Observability should enable us to:
● Identify Patterns
● Assign Significance
● Aid Reasoning
● Guide Action

Slide 8

Slide 8 text

Why Does My Organization Need Observability?

Slide 9

Slide 9 text

Distributed Tracing Abstract
In 2010, Google published a technical report on its distributed tracing project, Dapper. In the abstract, the authors summarized why they built Dapper in the first place:
“Modern Internet services are often implemented as complex, large-scale distributed systems. These applications are constructed from collections of software modules that may be developed by different teams, perhaps in different programming languages, and could span many thousands of machines across multiple physical facilities. Tools that aid in understanding system behavior and reasoning about performance issues are invaluable in such an environment.”
Google Technical Report dapper-2010-1, April 2010, p. 1 - https://ai.google/research/pubs/pub36356

Slide 10

Slide 10 text

How to Visualize Distributed Interactions
● Every transaction (HTTP, messaging, RPC, etc.) has a custom trace header injected into it; each service that receives the transaction reads and forwards that header, and the resulting timing data is collected and processed by a system of record
● This data is visualized with a Gantt chart that shows the hierarchical structure and timing of every transaction that occurred after the initial trace was generated
Google Technical Report dapper-2010-1, April 2010, p. 3, fig. 2 - https://ai.google/research/pubs/pub36356
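The header propagation and Gantt view described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not Dapper's or Instana's implementation: it borrows the W3C Trace Context `traceparent` layout (`version-traceid-parentid-flags`) as the "custom header", keeps finished spans in an in-memory list that stands in for the system of record, and uses simulated millisecond timestamps and hypothetical service names (`frontend`, `service-a`, `service-b`) so the chart is deterministic.

```python
import secrets
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Span:
    name: str
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    start_ms: int
    end_ms: int = 0


# In-memory stand-in for the tracing backend ("system of record").
COLLECTED: List[Span] = []


def inject(span: Span, headers: Dict[str, str]) -> None:
    """Write the span's context into outgoing headers (W3C traceparent layout)."""
    headers["traceparent"] = f"00-{span.trace_id}-{span.span_id}-01"


def extract(headers: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Read (trace_id, parent_span_id) from incoming headers, if present."""
    value = headers.get("traceparent")
    if value is None:
        return None
    _version, trace_id, parent_id, _flags = value.split("-")
    return trace_id, parent_id


def start_span(name: str, start_ms: int, headers: Optional[Dict[str, str]] = None) -> Span:
    """Join the caller's trace if the headers carry context; otherwise start a new root trace."""
    ctx = extract(headers or {})
    trace_id, parent_id = ctx if ctx else (secrets.token_hex(16), None)
    return Span(name, trace_id, secrets.token_hex(8), parent_id, start_ms)


def finish(span: Span, end_ms: int) -> None:
    """Close the span and report it to the collector."""
    span.end_ms = end_ms
    COLLECTED.append(span)


def gantt(spans: List[Span], ms_per_char: int = 10) -> str:
    """Render spans as text bars: indent = depth in the tree, offset/width = timing."""
    by_id = {s.span_id: s for s in spans}

    def depth(span: Span) -> int:
        d = 0
        while span.parent_id in by_id:
            span = by_id[span.parent_id]
            d += 1
        return d

    t0 = min(s.start_ms for s in spans)
    rows = []
    for s in sorted(spans, key=lambda s: s.start_ms):
        offset = (s.start_ms - t0) // ms_per_char
        width = max(1, (s.end_ms - s.start_ms) // ms_per_char)
        label = ("  " * depth(s) + s.name).ljust(24)
        rows.append(f"{label}|{' ' * offset}{'#' * width}")
    return "\n".join(rows)


# Simulated request through three hypothetical services (times in ms).
root = start_span("frontend", start_ms=0)
to_a: Dict[str, str] = {}
inject(root, to_a)
svc = start_span("service-a", start_ms=10, headers=to_a)
to_b: Dict[str, str] = {}
inject(svc, to_b)
leaf = start_span("service-b", start_ms=20, headers=to_b)
finish(leaf, end_ms=80)
finish(svc, end_ms=90)
finish(root, end_ms=100)
print(gantt(COLLECTED))
```

Each service only needs the incoming header to attach its span to the right parent; the collector can then rebuild the tree from parent IDs and lay it out on a shared timeline.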

Slide 11

Slide 11 text

We Need More Than Just Distributed Tracing
● No longer treating services like Schrödinger's cat, unsure whether they are healthy until we look inside
● Much more context around events and transactions
● Actionable insights generated from aggregated, request-scoped events
https://peter.bourgon.org/blog/2017/02/21/metrics-tracing-and-logging.html
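One way to read "actionable insights generated from aggregated, request-scoped events" is: group events by the request they belong to, then aggregate across many requests. The Python sketch below is illustrative only; the `Event` shape, the service names, the durations, and the 300 ms "slow" threshold are hypothetical, and each `duration_ms` is assumed to be a service's own (exclusive) time so the values can be summed per request.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Event(NamedTuple):
    trace_id: str      # request scope: shared by every event of one request
    service: str
    duration_ms: int   # assumed exclusive time spent in this service


def per_request(events: List[Event]) -> Dict[str, List[Event]]:
    """Group events by trace ID so each request carries its full context."""
    grouped: Dict[str, List[Event]] = defaultdict(list)
    for event in events:
        grouped[event.trace_id].append(event)
    return dict(grouped)


def time_share_in_slow_requests(events: List[Event], slow_ms: int) -> Dict[str, float]:
    """Across requests slower than slow_ms, return each service's share of total time."""
    totals: Dict[str, int] = defaultdict(int)
    for request_events in per_request(events).values():
        if sum(e.duration_ms for e in request_events) < slow_ms:
            continue  # fast request: not part of the question being asked
        for e in request_events:
            totals[e.service] += e.duration_ms
    grand_total = sum(totals.values())
    return {svc: t / grand_total for svc, t in totals.items()} if grand_total else {}


events = [
    Event("t1", "api-gateway", 10), Event("t1", "customers-service", 20),
    Event("t2", "api-gateway", 10), Event("t2", "customers-service", 490),
    Event("t3", "api-gateway", 10), Event("t3", "customers-service", 390),
]
print(time_share_in_slow_requests(events, slow_ms=300))
```

In this made-up data set the aggregate points at `customers-service` as the place to look first, a decision-ready answer that a raw pile of individual events does not give you.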

Slide 12

Slide 12 text

What Do We Get From All This Data?
● A tremendous amount of telemetry, which is perfect for:
  ○ Aggregation
  ○ SLI/SLOs
  ○ Machine Learning
  ○ Performance Analysis
  ○ Debugging
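As an example of the SLI/SLO use above, request telemetry can be reduced to an availability SLI and compared against an SLO to get the remaining error budget. This Python sketch uses one common definition of a "good" request (non-5xx status and latency at or under a threshold); the `Request` type, the 300 ms threshold, the 99% SLO, and the request mix are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Request:
    latency_ms: float
    status: int


def availability_sli(requests: List[Request], latency_threshold_ms: float) -> float:
    """Fraction of requests that succeeded (non-5xx) within the latency threshold."""
    if not requests:
        return 1.0  # no traffic: treat as meeting the objective
    good = sum(1 for r in requests if r.status < 500 and r.latency_ms <= latency_threshold_ms)
    return good / len(requests)


def error_budget_remaining(sli: float, slo: float) -> float:
    """Share of the error budget left: 1.0 = untouched, 0.0 = spent, negative = overspent."""
    allowed_bad = 1.0 - slo
    actual_bad = 1.0 - sli
    return 1.0 - actual_bad / allowed_bad


requests = (
    [Request(120, 200)] * 995   # fast and successful
    + [Request(800, 200)] * 3   # successful but too slow
    + [Request(50, 503)] * 2    # server errors
)
sli = availability_sli(requests, latency_threshold_ms=300)
print(f"SLI={sli:.3f}, budget left={error_budget_remaining(sli, slo=0.99):.0%}")
# SLI=0.995, budget left=50%
```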

Slide 13

Slide 13 text

…or rather, a big ol’ Data Lake aHhghgh hhhghhh nnnng...

Slide 14

Slide 14 text

We need machines to reconstruct this data so we can easily make decisions on how to react!

Slide 15

Slide 15 text

Pet Clinic Microservice Demo
Kubernetes, REST, Kafka

Slide 16

Slide 16 text

Spring Pet Clinic - Architecture
● The original was a monolith, refactored into microservices by the community
  ○ Removed dependencies on Zuul, Hystrix, etc. to ease compatibility with K8s
  ○ Added a Notifications / Kafka service
  ○ Built a load testing script
  ○ Built deployment pipelines for K8s / GitLab
https://gitlab.com/opentracing-workshop/spring-petclinic-kubernetes

Slide 17

Slide 17 text

Spring Pet Clinic - Problem?
● The load script generates new customers while also calling the endpoint that loads all customers, so each response grows along with the customer table
  ○ https://gitlab.com/notsureifkevin/spring-petclinic-kubernetes/blob/master/scripts/spc-load/spc-load.py#L25-33
  ○ https://gitlab.com/notsureifkevin/spring-petclinic-kubernetes/blob/master/scripts/spc-load/spc-load.py#L18
● Pagination is non-existent (but we’ll deploy it)
  ○ https://gitlab.com/kc_wrhse/spring-petclinic-kubernetes/commit/7b52d3fcdfe20945cbc53e2269b12be4191e2777
● Live demonstration of how this is visualized and remediated using modern observability tools
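The linked commit presumably holds the actual change to the Spring (Java) service; this sketch does not reproduce it. As a language-neutral illustration of why pagination defuses this load pattern, the Python below uses a hypothetical in-memory customer list: the unpaginated call returns more rows every time customers are added, while a paginated call stays bounded by the page size. `list_all`, `list_page`, the response fields, and the `max_size` cap are hypothetical names, not the project's API.

```python
from typing import Dict, List


def list_all(owners: List[Dict]) -> List[Dict]:
    """Unpaginated: every call returns the whole table, so payloads grow with the data."""
    return list(owners)


def list_page(owners: List[Dict], page: int, size: int, max_size: int = 100) -> Dict:
    """Paginated: at most `size` rows per call, plus what the client needs to fetch more."""
    if page < 0 or size < 1:
        raise ValueError("page must be >= 0 and size >= 1")
    size = min(size, max_size)  # cap page size so a client cannot request everything
    start = page * size
    return {
        "items": owners[start:start + size],
        "page": page,
        "size": size,
        "total": len(owners),
        "has_next": start + size < len(owners),
    }


# Simulate the load pattern: add customers, then list them, three times over.
owners: List[Dict] = []
all_sizes, page_sizes = [], []
for _ in range(3):
    base = len(owners)
    owners.extend({"id": base + i, "name": f"owner-{base + i}"} for i in range(50))
    all_sizes.append(len(list_all(owners)))
    page_sizes.append(len(list_page(owners, page=0, size=20)["items"]))
print(all_sizes, page_sizes)  # [50, 100, 150] [20, 20, 20]
```

Returning `total` and `has_next` lets a client walk the remaining pages only when it actually needs them, instead of pulling the full table on every request.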

Slide 18

Slide 18 text


Slide 19

Slide 19 text


Slide 20

Slide 20 text


Slide 21

Slide 21 text


Slide 22

Slide 22 text

In Summary
● Microservices are HARD (ask Segment); instrument your services so these systems are easier to understand and manage
● Observability tools should help you understand how your systems are performing without creating additional work for your team
● Share your successes and lessons learned with the community!
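"Instrument your services" can start as small as wrapping an operation so that every call records its duration and outcome. The Python decorator below is a minimal, hand-rolled sketch; `instrumented`, `RECORDS`, and `find_owner` are hypothetical names, and the in-memory list stands in for whatever tracing or metrics backend the data would actually be shipped to. Observability tools with automatic instrumentation aim to remove even this much manual work.

```python
import functools
import time
from typing import Any, Callable, Dict, List, Optional

# Hypothetical in-process sink standing in for a real telemetry backend.
RECORDS: List[Dict[str, Any]] = []


def instrumented(operation: str) -> Callable:
    """Decorator factory: record duration and error type of every call."""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            error: Optional[str] = None
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                error = type(exc).__name__
                raise  # never swallow the caller's exception
            finally:
                RECORDS.append({
                    "operation": operation,
                    "duration_ms": (time.perf_counter() - start) * 1000,
                    "error": error,
                })
        return wrapper
    return decorator


@instrumented("find_owner")
def find_owner(owner_id: int) -> Dict[str, int]:
    if owner_id < 0:
        raise KeyError(owner_id)
    return {"id": owner_id}


find_owner(1)
try:
    find_owner(-1)
except KeyError:
    pass
print(RECORDS)
```

Recording in `finally` means failed calls are measured too, which is exactly the data you want when something starts going wrong.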

Slide 23

Slide 23 text

Thank you!
Kevin Crawley
Twitter: @notsureifkevin
Visit our booth #511 or schedule some time with us: http://bit.ly/instana-reinvent

Slide 24

Slide 24 text
