
Software Architecture NYC 2020 - Cloud Native Debugging Workshop

Solo.io
February 24, 2020

Microservices have been great for accelerating software innovation and delivery, but they also present new challenges: abstractions and automated orchestration at every layer can make pinpointing an issue feel like walking through a maze blindfolded. Existing tools weren't designed for distributed environments, and new tools need to leverage these abstraction layers to better observe, test, and troubleshoot issues.

Christian Posta walks you through Envoy Proxy and service mesh architecture for the L7 data plane, the key features in Envoy that help with debugging and troubleshooting, chaos engineering as a testing methodology for microservices, how to approach a testing and debugging framework for microservices, and new open source tools that address these areas. You'll explore a workflow to discover and resolve microservices issues: injecting experiments to stress-test applications, capturing requests in flight, recording and replaying them, and debugging them step by step without affecting production traffic.

Link to workshop details
https://conferences.oreilly.com/software-architecture/sa-ny/public/schedule/detail/79614

Transcript

  1. Christian Posta, Global Field CTO, Solo.io
    @christianposta | https://blog.christianposta.com | https://slideshare.net/ceposta
    Copyright © 2020
  2. Approximate flow of workshop
    01 Challenges of microservices, debugging
    02 Introduction to our lab environment
    03 Distributed tracing with a service mesh
    04 Debugging microservices
    05 Debugging in production with record and replay
    06 Proactive debugging with chaos experimentation
  3. Service mesh journey (diagram labeled "Innovation")
    • Modernize to microservices
    • Service mesh management
    • Any mesh, anywhere
    • Adaptive service mesh
  4. 2018 Top Women Entrepreneurs in Cloud Innovation (December 11, 2018)
    Seventh annual award honors women founders for outstanding accomplishments in cloud and emerging technologies; sponsored by Facebook, Intel, and Google.
    Award-winning innovation; key industry collaborations
  5. As we move to services architectures on cloud-native deployment platforms, we increase the complexity of the communication between our services.
  6. Cloud application networking challenges
    • Service discovery
    • Retries
    • Timeouts
    • Load balancing
    • Rate limiting
    • Thread bulkheading
    • Circuit breaking
  7. Cloud application networking challenges
    • Edge/DMZ routing
    • Surgical, fine-grained, per-request routing
    • A/B rollout
    • Traffic shaping
    • Request racing
    • Internal releases / dark launches
    • Request shadowing
    • Fault injection
  8. Cloud application networking challenges
    • Adaptive, zone-aware routing
    • Deadlines
    • Health checking
    • Stats/metrics collection
    • Logging
    • Distributed tracing
    • Security
  9. How do we begin to understand what's happening so we can debug?
  10. Decentralized, language-independent observability in the network: foundational technology to help solve these challenges in a cloud-native application architecture
  11. Envoy is to application networking what Kubernetes is to container deployment.
    http://envoyproxy.io
  12. Envoy implements:
    • Zone-aware, least-request load balancing
    • Circuit breaking
    • Outlier detection
    • Retries, retry policies
    • Timeouts (including budgets)
    • Traffic shadowing
    • Request racing
    • Rate limiting
    • Access logging, statistics collection
    • Many other features!
  13. Why Envoy?
    • C++
    • Built from the ground up for services environments
    • Large, diverse, vibrant community
    • Dynamic configuration model
    • Highly extensible (in C++; we'll come back to this)
    • Many out-of-the-box L7 filters (HTTP, HTTP/2, gRPC, Redis, MySQL, DynamoDB, Thrift, ZooKeeper, Kafka, et al.)
    • Incredible trove of telemetry and tracing out of the box
    • Very versatile deployment options (as we'll see)
  14. Service mesh technologies provide the following:
    • Service discovery / load balancing
    • Secure service-to-service communication
    • Traffic control / shaping / shifting
    • Policy / intention-based access control
    • Traffic metric collection
    • Service resilience
    • API / programmable interface
  15. Setting up the lab environment
  16. Consul service mesh
    connect = {
      proxy = {
        config = {
          upstreams = [
            {
              destination_name = "mysql"
              local_bind_port  = 8001
            }
          ]
        }
      }
    }
  17. Tracing with a service mesh
  18. [Diagram: Debugging in production. A cluster running Pods 1 to 4, backed by a DB and S3.]
  19. [Diagram: Debugging in production. The same cluster with a proxy (P) alongside each of Pods 1 to 4, backed by a DB and S3.]
  20. The problem
    A monolithic application consists of a single process, so an attached debugger allows viewing the complete state of the application at runtime.
    A microservices application consists of potentially hundreds of processes. Is it possible to get a complete view of the state of such an application?
  21. Demo: multi-language, distributed debugging with Squash
  22. -> ls -l /proc/self/ns
      total 0
      lrwxrwxrwx 1 idit idit 0 Dec 7 01:14 cgroup -> cgroup:[4026531835]
      lrwxrwxrwx 1 idit idit 0 Dec 7 01:14 ipc -> ipc:[4026531839]
      lrwxrwxrwx 1 idit idit 0 Dec 7 01:14 mnt -> mnt:[4026531840]
      lrwxrwxrwx 1 idit idit 0 Dec 7 01:14 net -> net:[4026532009]
      lrwxrwxrwx 1 idit idit 0 Dec 7 01:14 pid -> pid:[4026531836]
      lrwxrwxrwx 1 idit idit 0 Dec 7 01:14 pid_for_children -> pid:[4026531836]
      lrwxrwxrwx 1 idit idit 0 Dec 7 01:14 user -> user:[4026531837]
      lrwxrwxrwx 1 idit idit 0 Dec 7 01:14 uts -> uts:[4026531838]
    -> Inode of the mnt namespace (a unique identifier of the container's namespace), obtained via the CRI API call ExecSyncRequest.
    We need to translate the PID of the process (the application running in the container) to the host PID namespace so the debugger can attach.
    Diagram labels: Node; Namespace: ns-a; s-dlv; CRI; c1; Namespace: Squash
  23. Squash secure mode
    Diagram labels: Node; Namespace: ns-a; Namespace: squash; s-dlv; c1; CRD; Intent; squash
  24. Break: 3:00p – 3:30p
    When we come back: Debugging microservices lab
    NOTE: Make sure to charge your devices!
  25. [Diagram: Debugging in production. A cluster running Pods 1 to 4, backed by a DB and S3. Annotations: "Only header will be sent"; "Sampling".]
  26. [Diagram: Debugging in production. The same cluster with a proxy (P) alongside each of Pods 1 to 4, backed by a DB and S3. Annotations: "Only header will be sent"; "Sampling".]
  27. [Diagram: Debugging in production. A cluster with a proxy (P) for each pod, backed by a DB and S3.]
  28. [Diagram: Debugging in production. A cluster with a proxy (P) for each pod, backed by a DB and S3.]
  29. Getting traffic into your mesh: workflow-specific APIs for Envoy Proxy
  30. API gateway built on Envoy
    https://github.com/solo-io/gloo
  31. Gloo data plane and control plane (diagram)
    Components shown: external auth, rate limiting, Gloo filters, router, upstream, external auth server, rate limiting server, caching, data loss prevention, Lambda, NATS.io, transformation, web application firewall (WAF)
  32. API gateway built on Envoy (diagram)
    • Environment: secret, configuration
    • Data plane: upstream, gRPC-JSON transcoder, rate limiting, external auth, …, router
    • Control plane: configure and manage Envoy's plugins
  33. Gloo API Gateway
    • Unify backend APIs running in Kubernetes, VMs, physical machines, FaaS, etc.
    • Decentralized configuration: allow service teams to move fast
    • Declarative configuration
    • Provides a control plane for Envoy
    • Security (OAuth/OIDC, API key, TLS, SNI, OPA, HMAC, custom)
    • Kubernetes native / runs outside Kubernetes as well
    • Highly pluggable/extensible
    • "If you know Kubernetes, you know Gloo" (user quote)
  34. Docs: coming real soon …
    GitHub: coming real soon …
    Community: https://slack.solo.io
  35. Demo: Loop with service mesh
  36. Chaos engineering
    Think of a vaccine or a flu shot: you inject yourself with something harmful in order to prevent a future issue. Chaos engineering carefully injects this harm into your systems to test their ability to respond to it, "breaking things on purpose" in order to learn how to build more resilient systems.
  37. Problems with chaos engineering today?
    1) Language specific
    2) Code modification
  38. Network abstraction (diagram): east-west and north-south traffic among Services I to V
  39. Control experiment
    • Define experiments (a set of message delays and network faults)
    • Run at a set interval (e.g. every Friday at 9 p.m.)
    • Gather metrics and compare them to a baseline
    • Stop the experiment if a condition is reached
  40. Glooshot
    Glooshot allows you to perform chaos experiments at the service mesh level. Define error conditions in terms of failure modes such as:
    • Message delays
    • Network faults
    Run experiments until a stop condition is met. Glooshot interfaces with all major service meshes through the Service Mesh Interface (SMI).
  41. What to watch for: upcoming improvements to keep an eye out for
  42. WebAssembly shaking up the data plane
    https://github.com/envoyproxy/envoy-wasm
  43. WebAssembly shaking up the data plane
    https://webassemblyhub.io
  44. Thank you for coming out!
    @christianposta | https://blog.christianposta.com | https://slideshare.net/ceposta
  45. Links:
    • https://solo.io
    • https://slack.solo.io
    • https://gloo.solo.io
    • https://envoyproxy.io
    • https://istio.io
    • https://webassemblyhub.io
    • https://servicemeshhub.io
    • https://blog.christianposta.com
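
The Envoy feature list mentions least-request load balancing. Envoy's documentation describes the equal-weight case as sampling a small number of random hosts (two by default) and routing to the one with the fewest active requests. The Python sketch below illustrates that power-of-N-choices idea under stated assumptions: `Host` and `pick_least_request` are hypothetical names, sampling with replacement is a choice made for this sketch, and it is not Envoy's implementation.

```python
import random
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Host:
    name: str
    active_requests: int


def pick_least_request(
    hosts: Sequence[Host], rng: random.Random, choice_count: int = 2
) -> Host:
    """Power-of-N-choices selection (illustrative, not Envoy's code).

    Samples `choice_count` hosts uniformly at random, with replacement,
    and returns the sampled host with the fewest active requests.
    """
    if not hosts:
        raise ValueError("no hosts available")
    if choice_count < 1:
        raise ValueError("choice_count must be at least 1")
    best = rng.choice(hosts)
    for _ in range(choice_count - 1):
        candidate = rng.choice(hosts)
        if candidate.active_requests < best.active_requests:
            best = candidate
    return best


if __name__ == "__main__":
    pool = [Host("a", 5), Host("b", 1), Host("c", 9)]
    print(pick_least_request(pool, random.Random(42)).name)
```

Because only `choice_count` hosts are inspected per request, selection cost stays constant regardless of cluster size while still steering traffic away from busy hosts.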
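
The Squash slide showing `ls -l /proc/self/ns` describes translating a containerized process's PID into the host PID namespace so a debugger can attach. One plausible way to do that, sketched below in Python, is to read the mnt namespace inode from inside the container and then scan the host's `/proc/<pid>/ns/mnt` links for processes with the same inode. This assumes the code can see the host's `/proc` (for example, a privileged pod sharing the host PID namespace); the function names are hypothetical, and this is not necessarily how Squash implements it.

```python
import os
import re
from typing import Dict, List

# Namespace links look like "mnt:[4026531840]": type, then inode in brackets.
NS_LINK_RE = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_listing(listing: str) -> Dict[str, int]:
    """Parse `ls -l /proc/self/ns` output into {link_name: inode}.

    In the workshop flow this text would come from running the command inside
    the container (e.g. via a CRI ExecSync call); here it is just a string.
    """
    inodes: Dict[str, int] = {}
    for line in listing.splitlines():
        if "->" not in line:
            continue
        left, target = line.rsplit("->", 1)
        match = NS_LINK_RE.match(target.strip())
        if match and left.split():
            # The link name (cgroup, mnt, pid_for_children, ...) is the last
            # field before the arrow.
            inodes[left.split()[-1]] = int(match.group(2))
    return inodes


def host_pids_in_mnt_ns(mnt_inode: int, proc_root: str = "/proc") -> List[int]:
    """Return host PIDs whose mount namespace has the given inode.

    Assumes access to the host's /proc and permission to read other
    processes' namespace links.
    """
    pids: List[int] = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            target = os.readlink(os.path.join(proc_root, entry, "ns", "mnt"))
        except OSError:
            continue  # process exited, or permission denied
        match = NS_LINK_RE.match(target)
        if match and int(match.group(2)) == mnt_inode:
            pids.append(int(entry))
    return sorted(pids)
```

A match returns every process in that container's mount namespace; choosing which one to attach to (for example, by checking the executable name in `/proc/<pid>/comm`) is left out of the sketch.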
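
The production-debugging diagrams note that only a header is sent and that traffic is sampled. A common way to keep a sampling decision consistent across hops is to derive it deterministically from a request ID, so every proxy reaches the same answer without coordinating. The sketch below hashes the ID into one of 10,000 buckets; the hashing scheme and the `should_sample` name are assumptions for illustration, not Envoy's exact algorithm (Envoy does generate an `x-request-id` header, which would be a natural input).

```python
import hashlib


def should_sample(request_id: str, percent: float) -> bool:
    """Stable sampling decision derived from a request ID (illustrative).

    Every proxy that sees the same ID computes the same answer, so only the
    ID header needs to travel with the request.
    """
    if not 0.0 <= percent <= 100.0:
        raise ValueError("percent must be between 0 and 100")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return bucket < percent * 100  # e.g. 1.5% selects buckets 0..149
```

Because the buckets are nested, any request sampled at 10% is also sampled at 20%, which keeps decisions coherent if the rate is raised while debugging.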
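
The workshop description mentions capturing requests in flight, recording them, and replaying them without affecting production traffic, which the deck demonstrates with Loop. The in-process Python sketch below shows only the shape of that workflow: a hypothetical `Recorder` wraps a handler, keeps a copy of every request that produced a 5xx response (the trigger is an assumption), and later replays those requests against a different handler, such as a debug copy of the service. None of these names are Loop's API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class Request:
    method: str
    path: str
    body: bytes = b""


@dataclass(frozen=True)
class Response:
    status: int
    body: bytes = b""


Handler = Callable[[Request], Response]


@dataclass
class Recorder:
    """Wraps a production handler and keeps a copy of every request that
    produced a server error, so it can be replayed elsewhere later."""

    handler: Handler
    captured: List[Request] = field(default_factory=list)

    def __call__(self, request: Request) -> Response:
        response = self.handler(request)  # production traffic is unaffected
        if response.status >= 500:
            self.captured.append(request)
        return response

    def replay(self, target: Handler) -> List[Response]:
        """Send every captured request to `target` (e.g. a debug copy of the
        service) instead of production, returning the new responses."""
        return [target(req) for req in self.captured]
```

Keeping replay separate from capture is what lets the debugging session (breakpoints, step-through) happen on the debug copy without slowing the live path.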
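
The control-experiment and Glooshot slides describe a loop: inject message delays or network faults, gather metrics, compare them with a baseline, and stop once a condition is reached. The Python sketch below captures that loop with callbacks standing in for the mesh; `inject`, `observe`, `clear`, `Fault`, and `StopCondition` are hypothetical names, not Glooshot's API. Scheduling (e.g. every Friday at 9 p.m.) is assumed to happen outside the function, for instance in a cron job.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Fault:
    kind: str     # hypothetical labels: "delay" or "abort"
    value: float  # delay in milliseconds, or HTTP status for an abort


@dataclass(frozen=True)
class StopCondition:
    metric: str
    max_ratio: float  # stop once metric > baseline[metric] * max_ratio


def run_experiment(
    faults: List[Fault],
    baseline: Dict[str, float],
    stop: StopCondition,
    inject: Callable[[List[Fault]], None],
    observe: Callable[[], Dict[str, float]],
    clear: Callable[[], None],
    max_rounds: int,
) -> Optional[int]:
    """Run one chaos experiment.

    Injects `faults`, then samples metrics up to `max_rounds` times. Returns
    the round in which the stop condition tripped, or None if it never did.
    The faults are always cleared afterwards.
    """
    limit = baseline[stop.metric] * stop.max_ratio
    inject(faults)
    try:
        for round_no in range(1, max_rounds + 1):
            if observe()[stop.metric] > limit:
                return round_no
        return None
    finally:
        clear()  # roll back on early stop, on completion, or on error
```

The `try`/`finally` ensures the faults are removed whether the stop condition trips, the rounds run out, or `observe` raises, which matters when the experiment runs unattended.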