Monitoring and Debugging Containers

JBD
December 04, 2018

Transcript

  1. @rakyll monitoring and debugging containerized systems. Jaana B. Dogan, Google, jbd@google.com

  2. @rakyll me: overly frustrated engineer; 15+ years in networking systems; making systems more reliable

  3. @rakyll the new old monitoring? (maybe)

  4. @rakyll systems are growing... and you are not in control

  5. @rakyll bare metal, kernel, network stack, cloud stack, libraries, frameworks, your code

  6. @rakyll

  7. @rakyll complexity is inevitable

  8. @rakyll container

  9. @rakyll container

  10. @rakyll container container

  11. @rakyll container container

  12. @rakyll container container message queue

  13. @rakyll container container storage/database

  14. @rakyll container container load balancer location=us-west location=europe-central

  15. @rakyll host host container container load balancer

  16. @rakyll container container container container container orchestrated hot mess

  17. @rakyll areas of issues: lack of locality, networking, scheduling, dependencies

  18. @rakyll bare metal, kernel, network stack, cloud stack, libraries, frameworks, your code

  19. @rakyll “my job is done here”

  20. @rakyll after going to production... (1) monitor, (2) alert, (3) troubleshoot, (4) fix

  21. @rakyll

  22. @rakyll load balancer

  23. @rakyll load balancer critical path

  24. @rakyll discovering critical paths; making them reliable, then fast; making them debuggable

  25. @rakyll

  26. @rakyll Latency Numbers Every Programmer Should Know by Jeff Dean

  27. @rakyll

  28. @rakyll ping pong pongservice:6996 project: ping the pong server.

  29. @rakyll opencensus.io

  30. @rakyll not my team!

  31. @rakyll where is the source code?

  32. @rakyll who to page?

  33. @rakyll who to page?

  34. @rakyll give me the logs, runtime events, profiles...

  35. @rakyll

  36. @rakyll

  37. @rakyll

  38. @rakyll http://server:9999/tracez

  39. @rakyll challenges...

  40. @rakyll no wire standards

  41. @rakyll

  42. @rakyll traceparent: <version>-<traceid>-<spanid>-<opts> (example: traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01)

  43. @rakyll no export standards

  44. @rakyll areas of issues: locality, networking, scheduling, dependencies

  45. @rakyll fin jbd@google.com
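
The traceparent header on slide 42 has four dash-separated fields, so it can be split apart and rebuilt with very little code. The following is a minimal sketch, not the speaker's implementation. The field widths (2, 32, 16, and 2 lowercase hex characters) are an assumption read off the example value on the slide, and the names `TraceParent`, `parse_traceparent`, `format_traceparent`, and `child_of` are hypothetical helpers invented for illustration.

```python
import re
import secrets
from typing import NamedTuple, Optional

# Assumption: field widths are taken from the slide's example value
# (2 hex chars version, 32 trace ID, 16 span ID, 2 options).
_TRACEPARENT = re.compile(
    r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})"
)


class TraceParent(NamedTuple):
    """Hypothetical container for the four fields named on the slide."""
    version: str
    trace_id: str
    span_id: str
    opts: int


def parse_traceparent(value: str) -> Optional[TraceParent]:
    """Return the parsed fields, or None if the value does not match."""
    match = _TRACEPARENT.fullmatch(value.strip())
    if match is None:
        return None
    version, trace_id, span_id, opts = match.groups()
    return TraceParent(version, trace_id, span_id, int(opts, 16))


def format_traceparent(tp: TraceParent) -> str:
    """Rebuild the header value in <version>-<traceid>-<spanid>-<opts> form."""
    return f"{tp.version}-{tp.trace_id}-{tp.span_id}-{tp.opts:02x}"


def child_of(tp: TraceParent) -> TraceParent:
    """Keep the trace ID, but use a fresh random span ID for the next hop."""
    return tp._replace(span_id=secrets.token_hex(8))


if __name__ == "__main__":
    header = "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01"
    parent = parse_traceparent(header)
    if parent is not None:
        # The outgoing request carries the same trace ID with a new span ID.
        print(format_traceparent(child_of(parent)))
```

A service that receives the header would parse it, record its own span under the same trace ID, and send the value from `format_traceparent(child_of(parent))` on outgoing requests. This lets spans from separately owned services be stitched into one trace.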