Distributed Tracing - LISA2017

stellacotton
November 01, 2017

Transcript

  1. - From Theory to Practice -

  2. stella cotton | @practice_cactus

  3. Stella Cotton, Corey Donohoe, Thomas Balthazar, Yannick Schutz
  4. What is distributed tracing?
  5. Tracing requests across distributed system boundaries
  6. A Simple Use Case
  7. [Diagram: User; Orders; Items; Authentication; Monolithic Web Process; Web Request]
  8. [Diagram: User; Orders; Items; Web Request; Auth Process; Ecommerce Process; New app; Ruby app]
  9. [Diagram: Web Request; Auth Process; Ecommerce Process; Original Ruby app; Orders; Recommendations; Billing; Python????]
  10. Microservices! (j/k)
  11. Services (Micro or otherwise)
  12. [Diagram: Web Request; Auth Process; Ecommerce Process; Orders; Recommendations; Billing]
  13. <insert container joke>

  14. Why do we need distributed tracing?
  15. Internal services look like external APIs
  16. [Diagram: Web Request; Auth Process; Ecommerce Process; Orders; Recommendations; Billing] Why is this slow???? Blame data science?
  17. “You can’t tell a coherent macro story about your application by monitoring individual processes” - Ben Sigelman
  18. People are bad guessers
  19. How do you tell the story?
  20. Distributed Tracing!
  21. “Distributed tracing commoditizes knowledge” - Adrian Cole
  22. What’s Stopping You?
  23. Outside Your Language’s Wheelhouse
  24. Domain Specific Vocabulary
  25. Fractured Ecosystem
  26. Theory -> Practice
  27. The Basics
  28. Black Box Tracing
  29. Black Box Tracing
  30. Why might this not work for you?
  31. • Need lots of data
      • Delayed results
      • Can’t guarantee causality
  32. def my_cool_system
        service_1
        service_2
      end

      def service_1
        Rails.logger.info "Service 1"
        execute_async_job
      end

      def execute_async_job
        Rails.logger.info "Async Job"
      end

      def service_2
        Rails.logger.info "Service 2"
      end

      Aggregated Log:
        01-01-2001 01:01:01 Service 1
        01-01-2001 01:01:02 Async Job
        01-01-2001 01:01:03 Service 2
  33. def my_cool_system
        service_1
        service_2
      end

      def service_1
        Rails.logger.info "Service 1"
        execute_async_job
      end

      def execute_async_job
        sleep 15  # Simulate latency
        Rails.logger.info "Async Job"
      end

      def service_2
        Rails.logger.info "Service 2"
      end

      Aggregated Log:
        01-01-2001 01:01:01 Service 1
        01-01-2001 01:01:02 Service 2
        01-01-2001 01:01:17 Async Job  (Latency)
  34. White Box Tracing
  35. Metadata Propagation
  36. Realtime Analysis
  37. History Lesson
  38. Dapper
  39. Zipkin
  40. “Distributed Tracing”
  41. “So, you want to trace your distributed system? Key design insights from years of practical experience” Raja R. Sambasivan, Rodrigo Fonseca, Ilari Shafer, Gregory R. Ganger http://www.pdl.cmu.edu/PDL-FTP/SelfStar/CMU-PDL-14-102.pdf
  42. Tracing Your Applications
  43. Main Components:
      • Tracer
      • Transport
      • Collector
      • Storage
      • UI
  44. Tracing Requests
  45. Tracer: Lives inside your apps, does the tracing
  46. Trace: The story of a request’s journey through your system
  47. [Diagram: Web Request; Auth Process; Ecommerce Process; Orders; Recommendations; Billing] A trace tells this whole story
  48. Span: Each chapter in that story
  49. [Diagram: Web Request; Auth; Ecommerce; Orders; Recommendations; Billing] A span
  50. [Diagram: Web Request; Auth; Ecommerce; Orders; Recommendations; Billing] Trace id 123 (shown five times)
  51. [Diagram: Web Request; Auth; Ecommerce; Orders; Recommendations; Billing]
      Trace id 123, Parent id nil, Span id 1
  52. [Diagram: Web Request; Auth; Ecommerce; Orders; Recommendations; Billing]
      Trace id 123, Parent id nil, Span id 1
      Trace id 123, Parent id 1, Span id 2
  53. [Diagram: Web Request; Auth; Ecommerce; Orders; Recommendations; Billing]
      Trace id 123, Parent id nil, Span id 1
      Trace id 123, Parent id 1, Span id 2
      Trace id 123, Parent id 2, Span id 3
  54. A Trace is many Parent - Child Relationships
  55. Directed Acyclic Graph
  56. [Diagram: Web Request; Auth; Ecommerce; Orders; Recommendations; Billing]
      Trace id 123, Parent id nil, Span id 1
      Trace id 123, Parent id 1, Span id 2
      Trace id 123, Parent id 2, Span id 3
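The parent and child ids on these slides can be modelled as plain data. What follows is a minimal sketch, not how any particular tracer stores spans: the `Span` struct and the `render_tree` helper are hypothetical, and the span names are assumed, since the slides only give the ids.

```ruby
# Hypothetical span record; the field names are assumptions for illustration.
Span = Struct.new(:name, :trace_id, :parent_id, :span_id)

# Ids taken from the slides; which service owns which span is assumed.
spans = [
  Span.new("web request", 123, nil, 1),
  Span.new("auth",        123, 1,   2),
  Span.new("ecommerce",   123, 2,   3),
]

# Walk from the root span (parent id nil) and return one indented
# line per span, children nested under their parent.
def render_tree(spans)
  children = spans.group_by(&:parent_id)
  lines = []
  walk = lambda do |parent_id, depth|
    (children[parent_id] || []).each do |span|
      lines << "#{'  ' * depth}#{span.name} (span #{span.span_id})"
      walk.call(span.span_id, depth + 1)
    end
  end
  walk.call(nil, 0)
  lines
end

puts render_tree(spans)
```

Because every span carries the same trace id and points at exactly one parent, the tree can be rebuilt from the spans alone, whatever order they arrive in.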
  57. Annotations: Gives us richer insights into our spans
  58. Client Start 01:01:01
      Server Receive 01:01:02
      Server Send 01:01:03
      Client Receive 01:01:04
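A short worked example of what the four annotations make possible. This is a sketch under assumptions: the hash keys and the `span_timings` helper are hypothetical, the date is arbitrary (the slide gives only times of day), and it assumes the client and server clocks agree.

```ruby
# Timestamps from the slide; the date itself is an assumption.
annotations = {
  client_start:   Time.utc(2001, 1, 1, 1, 1, 1),
  server_receive: Time.utc(2001, 1, 1, 1, 1, 2),
  server_send:    Time.utc(2001, 1, 1, 1, 1, 3),
  client_receive: Time.utc(2001, 1, 1, 1, 1, 4),
}

# Hypothetical helper: split a span's duration into time spent in the
# server and time spent elsewhere (network, queues). Time - Time gives
# a Float number of seconds.
def span_timings(a)
  total  = a[:client_receive] - a[:client_start]
  server = a[:server_send] - a[:server_receive]
  { total: total, server: server, network: total - server }
end

timings = span_timings(annotations)
puts "total=#{timings[:total]}s server=#{timings[:server]}s network=#{timings[:network]}s"
```

With the slide's timestamps the call took 3 seconds in total, of which 1 second was spent in the server and 2 seconds in transit.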
  59. [Diagram: Web Request; Auth; Ecommerce; Orders; Recommendations; Billing]
      Trace id 123, Parent id nil, Span id 1
      Trace id 123, Parent id 1, Span id 2
      Trace id 123, Parent id 2, Span id 3
  60. [Diagram: Auth; Ecommerce] Server Receive; Client Send
      Trace id 123, Parent id 1, Span id 2
  61. [Diagram: Auth; Ecommerce] Server Send; Server Receive; Client Send
  62. [Diagram: Auth; Ecommerce] Server Send; Server Receive; Client Send; Client Receive
  63. Transporting the Data
  64. [Diagram: Web Request; Auth; Ecommerce; Orders; Recommendations; Billing; Collector; Storage; Transport]
      Trace id 123, Parent id nil, Span id 1
  65. Propagates ids in band
  66. Reports out of band
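"Out of band" reporting can be sketched with a queue and a background thread, so that sending span data never delays the response. This is an illustration only: `OutOfBandReporter` is a hypothetical class, and the delivery block stands in for whatever transport carries spans to the collector.

```ruby
# Hypothetical reporter: spans are queued on the request path and
# delivered by a background thread.
class OutOfBandReporter
  def initialize(&deliver)
    @queue = Queue.new
    @deliver = deliver
    @worker = Thread.new do
      # Queue#pop blocks until an item arrives; nil is the stop signal.
      while (span = @queue.pop)
        @deliver.call(span)
      end
    end
  end

  # Called on the request path: only enqueues, never does I/O.
  def report(span)
    @queue << span
  end

  # Deliver anything still queued, then stop the worker.
  def shutdown
    @queue << nil
    @worker.join
  end
end

delivered = []
reporter = OutOfBandReporter.new { |span| delivered << span }
reporter.report({ trace_id: 123, parent_id: nil, span_id: 1 })
reporter.report({ trace_id: 123, parent_id: 1, span_id: 2 })
reporter.shutdown
puts "delivered #{delivered.length} spans"
```

The ids themselves still travel in band, alongside the request, so the next service can attach its spans to the same trace.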

  67. Viewing the Data
  68. [Diagram: Web Request; Auth; Ecommerce; Orders; Recommendations; Billing]
  69. http://opentracing.io/documentation/
  70. (no text on slide)
  71. All happening inside the “resource allocation & provisioning”
  72. Parallel execution; Blocking execution
  73. A widening gap here could indicate queueing
  74. Tracing Incoming Requests
  75. Middleware

  76. def call(env)
        trace(env) do
          @app.call(env)
        end
      end

      def trace(env, &block)
        # tracing code (trace some stuff)
      end
  77. def call(env)
        trace(env) do
          @app.call(env)
        end
      end

      def trace(env, &block)
        span = Span.new("authentication", generate_span_id)
        span.record(SERVER_RECV)         # Received a request
        status, headers, body = yield    # Execute the app
      ensure
        span.record(SERVER_SEND)         # Sending back to the client
      end

      Non-pseudocode version: https://github.com/openzipkin/zipkin-ruby/blob/master/lib/zipkin-tracer/rack/zipkin-tracer.rb
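The pseudocode above can be made runnable without Rails or the rack gem, since a Rack app is just an object that responds to `call(env)`. This is a sketch, not the zipkin-ruby implementation: `DemoSpan`, `DemoTracingMiddleware` and the two annotation constants are hypothetical stand-ins, and keeping spans in an array on the middleware is for demonstration only (it is not thread safe).

```ruby
# Hypothetical annotation values and span class, standing in for a tracer library.
SERVER_RECV = "sr"
SERVER_SEND = "ss"

class DemoSpan
  attr_reader :name, :annotations

  def initialize(name)
    @name = name
    @annotations = []
  end

  # Store the annotation together with the time it was recorded.
  def record(value)
    @annotations << [value, Time.now]
  end
end

class DemoTracingMiddleware
  attr_reader :spans

  def initialize(app)
    @app = app
    @spans = []
  end

  def call(env)
    span = DemoSpan.new("#{env['REQUEST_METHOD']} #{env['PATH_INFO']}")
    span.record(SERVER_RECV)        # received a request
    begin
      @app.call(env)                # execute the app; its response is returned
    ensure
      span.record(SERVER_SEND)      # runs even if the app raises
      @spans << span
    end
  end
end

app = ->(_env) { [200, { "content-type" => "text/plain" }, ["ok"]] }
middleware = DemoTracingMiddleware.new(app)
status, _headers, _body = middleware.call({ "REQUEST_METHOD" => "GET", "PATH_INFO" => "/orders" })
puts "status=#{status} annotations=#{middleware.spans.first.annotations.map(&:first).inspect}"
```

The `ensure` matters: a request that blows up still gets its closing annotation, so failed requests show up in the trace instead of vanishing.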
  78. # config/initializers/tracing.rb
      # Use our middleware!
      Rails.application.config.middleware.use TracingRackMiddleware, {
        # some configuration
      }
  79. # config/initializers/tracing.rb
      Rails.application.config.middleware.use TracingRackMiddleware, {
        service_name: "SERVICE_DOMAIN_NAME",
        service_port: 443,
        sample_rate: ENV.fetch("ZIPKIN_SAMPLE_RATE", 0.1).to_f,  # Sample a portion of requests
        json_api_host: ENV["ZIPKIN_HOST"]
      }
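One way a `sample_rate` like the one above could be applied is a random draw per incoming request. This is a sketch of that idea, not what any specific tracer does internally; `ProbabilitySampler` is a hypothetical class, and the seeded `Random` is only there to make the demo repeatable.

```ruby
# Hypothetical sampler: records roughly `rate` of all requests.
class ProbabilitySampler
  def initialize(rate, random: Random.new)
    @rate = rate
    @random = random
  end

  # Random#rand with no argument returns a Float in [0, 1), so a rate
  # of 0.0 never samples and a rate of 1.0 always does.
  def sample?
    @random.rand < @rate
  end
end

# Same lookup as the slide: default to 10% when the variable is unset.
rate = ENV.fetch("ZIPKIN_SAMPLE_RATE", 0.1).to_f
sampler = ProbabilitySampler.new(rate, random: Random.new(42))
sampled = 10_000.times.count { sampler.sample? }
puts "sampled #{sampled} of 10000 requests at rate #{rate}"
```

In practice the decision is usually made once, at the edge, and passed downstream with the trace ids so that a trace is either recorded in full or not at all.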
  80. Tracing Outgoing Requests
  81. More Middleware!
  82. Faraday
  83. def call(env)
        trace!(env) do |env|
          @app.call(env)  # Execute our http client
        end
      end

      def trace!(env, &block)
        # some tracing
      end
  84. def call(env)
        trace!(env) do |env|
          @app.call(env)
        end
      end

      def trace!(env, &block)
        env = set_headers(env)                       # Manipulate the headers
        span = Span.new("external_call", 1234)
        span.record(Trace::Annotation::CLIENT_SEND)  # Using client instead of server
        status, headers, body = yield env
      ensure
        span.record(Trace::Annotation::CLIENT_RECV)
      end
  85. Each of these colors represents an instrumented application
  86. def call(env)
        trace!(env) do |env|
          @app.call(env)
        end
      end

      def trace!(env, &block)
        env = set_headers(env)
        span = Span.new("external_call", 1234)
        span.record(Trace::Annotation::CLIENT_SEND)  # Client Send
        status, headers, body = yield env
      ensure
        span.record(Trace::Annotation::CLIENT_RECV)  # Client Receive
      end
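The slides do not show `set_headers`. One plausible shape, assuming Zipkin's B3 header names, is below; the `tracing_headers` helper and its arguments are hypothetical. The current span becomes the parent of the outgoing call, which gets a fresh span id.

```ruby
require "securerandom"

# Build the headers for an outgoing request. The header names follow
# Zipkin's B3 convention; the helper itself is an assumption.
def tracing_headers(trace_id:, current_span_id:, sampled: true)
  {
    "X-B3-TraceId"      => trace_id,             # unchanged across the whole trace
    "X-B3-ParentSpanId" => current_span_id,      # the caller's span
    "X-B3-SpanId"       => SecureRandom.hex(8),  # new 16-character hex id for this call
    "X-B3-Sampled"      => sampled ? "1" : "0",  # pass the sampling decision along
  }
end

headers = tracing_headers(trace_id: "463ac35c9f6413ad", current_span_id: "a2fb4a1d1a96d312")
headers.each { |name, value| puts "#{name}: #{value}" }
```

The receiving service reads these headers in its own incoming middleware, which is how spans from separate applications end up in one trace.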

  87. def self.client
        Faraday.new(url: base_url) do |connection|
          connection.use TracingFaradayMiddleware  # Add our middleware
          connection.adapter Faraday.default_adapter
        end
      end
  88. Checklist
  89. Buy, Build, or Adopt
  90. Buy?
  91. Lightstep, TraceView… and more?
  92. Adopt an OSS Solution?
  93. Zipkin, Jaeger
  94. What about OpenTracing?
  95. Standardizes Instrumentation
  96. Where is OpenTracing at today?
  97. Interoperability is Still Messy
  98. “Language Support”
  99. Rinse and Repeat
  100. Build Your Own?
  101. What are other folks doing?
  102. End-to-End Tracing: Adoption and Use Cases. Jonathan Mace, Brown University https://cs.brown.edu/~jcmace/papers/mace2017survey.pdf
  103. • 15 using Zipkin
       • 9 using internal solutions
       • 1 using other OSS solution
       • 1 using paid solution
       Jonathan Mace, Brown University https://cs.brown.edu/~jcmace/papers/mace2017survey.pdf
  104. Infra Requirements and Limitations
  105. Dependency matrix of:
       - Tracer
       - Transport Layer
       - Collection Layer
       - Storage Layer
  106. Installing a Separate Agent

  107. Authentication
  108. Missing Authentication & Authorization
  109. Client Authorization
  110. Basic auth via htpasswd https://www.nginx.com/resources/admin-guide/restricting-access-auth-basic/
  111. by Corey Donohoe
  112. Browser Authentication
  113. bit.ly’s OAuth2 proxy https://github.com/bitly/oauth2_proxy
  114. by Corey Donohoe
  115. Giving people access

  116. Sensitive Data
  117. Custom Instrumentation
  118. What happens when data leaks?
  119. Is Everyone On Board?
  120. Get it on the Roadmap
  121. Open PRs
  122. Evaluating Distributed Tracing Solutions:
       • Should you buy, build or adopt?
       • What are your infrastructure requirements and limitations?
       • How is it authenticated?
       • Do you have sensitive data? What will you do if it leaks?
       • Is everyone on board?
  123. OMG, this is so much information
  124. Try out Docker Zipkin
  125. Thank you! Plant Illustrations designed by Natkacheva / Freepik