Distributed Tracing: From Theory to Practice

stellacotton

April 28, 2017

Transcript

  1. - From Theory to Practice -

  2. stella cotton | @practice_cactus

  3. stella cotton | @practice_cactus A Small Favor

  4. stella cotton | @practice_cactus Stella Cotton Corey Donohoe Thomas Balthazar

    Yannick Schutz Jon Roes
  5. stella cotton | @practice_cactus What is distributed tracing?

  6. stella cotton | @practice_cactus Tracing requests across distributed system boundaries
  7. stella cotton | @practice_cactus 
 
 Wait, distributed systems??

  8. stella cotton | @practice_cactus “A distributed system is a collection of independent computers that appear to its users as a single coherent system.” Andrew S. Tanenbaum and Maarten van Steen, Distributed Systems: Principles and Paradigms, Prentice Hall, Second Edition, 2007
  9. stella cotton | @practice_cactus 
 
 A Simple Use Case

  10. stella cotton | @practice_cactus User Orders Items Authentication Monolithic Web

    Process Web Request
  11. stella cotton | @practice_cactus User Orders Items Web Request Auth

    Process Ecommerce Process Still Rails, 
 but a new app Original app
  12. stella cotton | @practice_cactus Web Request Auth Process Ecommerce Process

    Original app Orders Recommendations Billing Python????
  13. stella cotton | @practice_cactus 
 
 Microservices! (j/k)

  14. stella cotton | @practice_cactus 
 
 Services (Micro or otherwise)

  15. stella cotton | @practice_cactus Web Request Auth Process Ecommerce Process

    Orders Recommendations Billing
  16. stella cotton | @practice_cactus 
 
 <insert container joke>

  17. stella cotton | @practice_cactus Why do we need distributed tracing?

  18. stella cotton | @practice_cactus Internal services look like external APIs
  19. stella cotton | @practice_cactus Web Request Auth Process Ecommerce Process

    Orders Recommendations Billing Why is this slow ???? Blame data science?
  20. stella cotton | @practice_cactus “You can’t tell a coherent macro story about your application by monitoring individual processes” Ben Sigelman
  21. stella cotton | @practice_cactus 
 
 People are bad guessers

  22. stella cotton | @practice_cactus How do you tell the story?
  23. stella cotton | @practice_cactus 
 
 Distributed Tracing!

  24. stella cotton | @practice_cactus “Distributed tracing commoditizes knowledge” - Adrian Cole
  25. stella cotton | @practice_cactus What’s Stopping You?

  26. stella cotton | @practice_cactus Outside the Ruby Wheelhouse
  27. stella cotton | @practice_cactus Domain Specific Vocabulary

  28. stella cotton | @practice_cactus 
 
 Fractured Ecosystem

  29. stella cotton | @practice_cactus Theory -> Practice

  30. stella cotton | @practice_cactus The Basics

  31. stella cotton | @practice_cactus 
 
 Black Box Tracing

  32. stella cotton | @practice_cactus 
 
 Black Box Tracing

  33. stella cotton | @practice_cactus Why might this not work for you?
  34. stella cotton | @practice_cactus • Need lots of data • Delayed results • Can’t guarantee causality
  35. stella cotton | @practice_cactus
    def my_cool_system
      service_1
      service_2
    end

    def service_1
      Rails.logger.info "Service 1"
      execute_async_job
    end

    def execute_async_job
      Rails.logger.info "Async Job"
    end

    def service_2
      Rails.logger.info "Service 2"
    end

    Aggregated Log:
    01-01-2001 01:01:01 Service 1
    01-01-2001 01:01:02 Async Job
    01-01-2001 01:01:03 Service 2
  36. stella cotton | @practice_cactus
    def my_cool_system
      service_1
      service_2
    end

    def service_1
      Rails.logger.info "Service 1"
      execute_async_job
    end

    def execute_async_job
      sleep 15 # Simulate latency
      Rails.logger.info "Async Job"
    end

    def service_2
      Rails.logger.info "Service 2"
    end

    Aggregated Log:
    01-01-2001 01:01:01 Service 1
    01-01-2001 01:01:02 Service 2
    01-01-2001 01:01:17 Async Job
  37. stella cotton | @practice_cactus 
 
 White Box Tracing

  38. stella cotton | @practice_cactus 
 
 Metadata Propagation
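
A minimal sketch of what metadata propagation buys over the aggregated log shown earlier. It reuses the `service_1`, `execute_async_job`, and `service_2` names from the earlier slides; the `LOG` array, the `log` helper, and the `trace_id` argument are hypothetical, added only for illustration. Each unit of work carries its request's id, so the events can be grouped per request no matter what order they land in the log.

```ruby
require "securerandom"

# Hypothetical in-memory stand-in for an aggregated log.
LOG = []

def log(trace_id, message)
  LOG << { trace_id: trace_id, message: message }
end

def service_1(trace_id)
  log(trace_id, "Service 1")
  # The id travels with the work, even when it finishes later on another thread.
  Thread.new { execute_async_job(trace_id) }
end

def execute_async_job(trace_id)
  sleep 0.05 # simulate latency
  log(trace_id, "Async Job")
end

def service_2(trace_id)
  log(trace_id, "Service 2")
end

def my_cool_system
  trace_id = SecureRandom.hex(8)
  job = service_1(trace_id)
  service_2(trace_id)
  job.join
  trace_id
end

first = my_cool_system
second = my_cool_system

# Filtering by id recovers one request's events, whatever order they were written in.
events = LOG.select { |entry| entry[:trace_id] == first }.map { |entry| entry[:message] }
puts events
```

Timestamps alone cannot say which "Async Job" line belongs to which request; the propagated id can.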

  39. stella cotton | @practice_cactus 
 
 Realtime Analysis

  40. stella cotton | @practice_cactus History Lesson

  41. stella cotton | @practice_cactus 
 
 Dapper

  42. stella cotton | @practice_cactus 
 
 Zipkin

  43. stella cotton | @practice_cactus 
 
 “Distributed Tracing”

  44. stella cotton | @practice_cactus “So, you want to trace your distributed system? Key design insights from years of practical experience” Raja R. Sambasivan, Rodrigo Fonseca, Ilari Shafer, Gregory R. Ganger http://www.pdl.cmu.edu/PDL-FTP/SelfStar/CMU-PDL-14-102.pdf
  45. stella cotton | @practice_cactus Tracing Your Applications

  46. stella cotton | @practice_cactus Main Components: • Tracer • Transport • Collector • Storage • UI
  47. stella cotton | @practice_cactus Tracing Requests

  48. stella cotton | @practice_cactus Tracer: Lives inside your apps, does the tracing
  49. stella cotton | @practice_cactus Trace: The story of a request’s journey through your system
  50. stella cotton | @practice_cactus Web Request Auth Process Ecommerce Process Orders Recommendations Billing
    A trace tells this whole story
  51. stella cotton | @practice_cactus Span: Each chapter in that story
  52. stella cotton | @practice_cactus Web Request Auth Ecommerce Orders Recommendations

    Billing A span
  53. stella cotton | @practice_cactus Web Request Auth Ecommerce Orders Recommendations

    Billing Trace id 123 Trace id 123 Trace id 123 Trace id 123 Trace id 123
  54. stella cotton | @practice_cactus Web Request Auth Ecommerce Orders Recommendations

    Billing Trace id 123 Parent id nil Span id 1
  55. stella cotton | @practice_cactus Web Request Auth Ecommerce Orders Recommendations

    Billing Trace id 123 Parent id nil Span id 1 Trace id 123 Parent id 1 Span id 2
  56. stella cotton | @practice_cactus Web Request Auth Ecommerce Orders Recommendations

    Billing Trace id 123 Parent id nil Span id 1 Trace id 123 Parent id 1 Span id 2 Trace id 123 Parent id 2 Span id 3
  57. stella cotton | @practice_cactus A Trace is many Parent - Child Relationships
  58. stella cotton | @practice_cactus 
 
 Directed Acyclic Graph
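
A minimal sketch of how the ids from the preceding slides form a graph. The `Span` struct and the `children_by_parent` and `walk` helpers are hypothetical, and the span names are assumed from the diagram labels; the trace, parent, and span ids are the ones shown on the slides.

```ruby
# Hypothetical span record mirroring the ids shown on the slides.
Span = Struct.new(:trace_id, :parent_id, :id, :name)

spans = [
  Span.new(123, nil, 1, "Web Request"),
  Span.new(123, 1, 2, "Auth"),
  Span.new(123, 2, 3, "Ecommerce")
]

# Group spans by parent id so each span's children can be looked up directly.
def children_by_parent(spans)
  spans.group_by(&:parent_id)
end

# Walk the graph depth first, starting from the root (the span with no parent).
def walk(index, parent_id = nil, depth = 0, &block)
  index.fetch(parent_id, []).each do |span|
    block.call(span, depth)
    walk(index, span.id, depth + 1, &block)
  end
end

lines = []
walk(children_by_parent(spans)) do |span, depth|
  lines << "#{'  ' * depth}#{span.name} (span #{span.id})"
end
puts lines
```

Because every span points only at its parent, the spans can arrive at the collector in any order and the structure can still be rebuilt.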

  59. stella cotton | @practice_cactus Web Request Auth Ecommerce Orders Recommendations

    Billing Trace id 123 Parent id nil Span id 1 Trace id 123 Parent id 1 Span id 2 Trace id 123 Parent id 2 Span id 3
  60. stella cotton | @practice_cactus Annotations: Gives us richer insights into our spans
  61. stella cotton | @practice_cactus
    Client Start 01:01:01
    Server Receive 01:01:02
    Server Send 01:01:03
    Client Receive 01:01:04
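
A small worked example of what these four annotations make possible, using the timestamps from the slide. The date is an arbitrary placeholder, the hash keys are hypothetical names, and the slide's "Client Start" is treated here as the client send annotation, which is an assumption.

```ruby
# Timestamps copied from the slide; the date itself is a placeholder.
annotations = {
  client_send:    Time.utc(2001, 1, 1, 1, 1, 1),
  server_receive: Time.utc(2001, 1, 1, 1, 1, 2),
  server_send:    Time.utc(2001, 1, 1, 1, 1, 3),
  client_receive: Time.utc(2001, 1, 1, 1, 1, 4)
}

# Total time the caller waited for the response.
total = annotations[:client_receive] - annotations[:client_send]

# Time the callee spent doing work.
server = annotations[:server_send] - annotations[:server_receive]

# Whatever remains was spent between the two: network, queueing, and so on.
in_transit = total - server

puts "total=#{total}s server=#{server}s in_transit=#{in_transit}s"
```

The comparison assumes the two machines' clocks are reasonably in sync; with skewed clocks only the two client-side and the two server-side differences are trustworthy.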
  62. stella cotton | @practice_cactus Web Request Auth Ecommerce Orders Recommendations

    Billing Trace id 123 Parent id nil Span id 1 Trace id 123 Parent id 1 Span id 2 Trace id 123 Parent id 2 Span id 3
  63. stella cotton | @practice_cactus Auth Ecommerce Server Receive Client Send Trace id 123 Parent id 1 Span id 2
  64. stella cotton | @practice_cactus Auth Ecommerce Server Send Server Receive Client Send
  65. stella cotton | @practice_cactus Auth Ecommerce Server Send Server Receive Client Send Client Receive
  66. stella cotton | @practice_cactus 
 
 Transporting the Data

  67. stella cotton | @practice_cactus Web Request Auth Ecommerce Orders Recommendations

    Billing Trace id 123 Parent id nil Span id 1 Collector Storage Transport
  68. stella cotton | @practice_cactus Propagates ids in band
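
A minimal sketch of in-band propagation, assuming the ids ride along as HTTP headers on the request itself. The header names are Zipkin's B3 headers; the `inject`, `extract`, `rack_key`, and `child_of` helpers are hypothetical and exist only for this illustration.

```ruby
require "securerandom"

# Zipkin's B3 header names, keyed by the id they carry.
B3_HEADERS = {
  trace_id: "X-B3-TraceId",
  span_id: "X-B3-SpanId",
  parent_id: "X-B3-ParentSpanId"
}.freeze

# Client side: copy the ids into the outgoing request headers.
def inject(headers, context)
  B3_HEADERS.each_with_object(headers.dup) do |(key, header), result|
    result[header] = context[key] if context[key]
  end
end

# Rack exposes a header such as X-B3-TraceId as HTTP_X_B3_TRACEID.
def rack_key(header)
  "HTTP_#{header.upcase.tr('-', '_')}"
end

# Server side: read the ids back out of the Rack env.
def extract(env)
  B3_HEADERS.transform_values { |header| env[rack_key(header)] }
end

# A downstream call gets a new span id whose parent is the current span.
def child_of(context)
  { trace_id: context[:trace_id], parent_id: context[:span_id], span_id: SecureRandom.hex(8) }
end

root = { trace_id: SecureRandom.hex(8), span_id: SecureRandom.hex(8), parent_id: nil }
outgoing = child_of(root)
headers = inject({ "Accept" => "application/json" }, outgoing)

# Simulate the env a Rack server would build from those headers.
env = headers.map { |name, value| [rack_key(name), value] }.to_h
received = extract(env)
```

The receiving service ends up with the same trace id, span id, and parent id the caller sent, which is what lets both sides annotate the same span.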

  69. stella cotton | @practice_cactus Reports out of band
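
A minimal sketch of out-of-band reporting, assuming the tracer hands finished spans to a background thread so the request path never waits on the collector. The `AsyncReporter` class and its block-based delivery are hypothetical; a real tracer would send the spans over its transport (HTTP, Kafka, and so on) instead of appending to an array.

```ruby
# Hypothetical reporter: the request thread only enqueues, and a background
# thread delivers spans to wherever the block sends them.
class AsyncReporter
  def initialize(&deliver)
    @queue = Queue.new
    @worker = Thread.new do
      # Queue#pop blocks until a span (or the stop marker) is available.
      while (span = @queue.pop) != :stop
        deliver.call(span)
      end
    end
  end

  # Called on the request path: cheap, never touches the network.
  def report(span)
    @queue << span
  end

  # Drain whatever is queued, then stop the worker.
  def shutdown
    @queue << :stop
    @worker.join
  end
end

delivered = []
reporter = AsyncReporter.new { |span| delivered << span }
reporter.report({ trace_id: 123, span_id: 1, name: "web request" })
reporter.report({ trace_id: 123, span_id: 2, name: "auth" })
reporter.shutdown
```

If the collector is slow or down, the cost shows up as a growing queue in the tracer, not as slower responses to users.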

  70. stella cotton | @practice_cactus 
 
 Viewing the Data

  71. stella cotton | @practice_cactus Web Request Auth Ecommerce Orders Recommendations

    Billing
  72. stella cotton | @practice_cactus http://opentracing.io/documentation/

  73. stella cotton | @practice_cactus

  74. stella cotton | @practice_cactus All happening inside the “resource allocation & provisioning”

  75. stella cotton | @practice_cactus Parallel execution Blocking execution

  76. stella cotton | @practice_cactus A widening gap here could indicate queueing
  77. stella cotton | @practice_cactus Tracing Incoming Requests

  78. stella cotton | @practice_cactus 
 
 Rack!

  79. stella cotton | @practice_cactus
    class RackApp
      # Responds to .call() and takes an environment hash
      def call(environment)
        # Returns: [status, header, body]
        [
          '200',
          {'Content-Type' => 'text/html'},
          ["Hello world"]
        ]
      end
    end
  80. stella cotton | @practice_cactus
    class TracingRackMiddleware
      # Initialize with our rack app
      def initialize(app)
        @app = app
      end

      def call(env)
        # Execute our rack app or the next middleware in the chain
        @app.call(env)
      end
    end
  81. stella cotton | @practice_cactus
    class TracingRackMiddleware
      def initialize(app)
        @app = app
      end

      def call(env)
        # Trace some stuff
        trace(env) do
          @app.call(env)
        end
      end
    end
  82. stella cotton | @practice_cactus
    class TracingRackMiddleware
      def initialize(app)
        @app = app
      end

      def call(env)
        trace(env) do
          # Execute our rack app
          @app.call(env)
        end
      end

      def trace(env, &block)
        span = Span.new("authentication", generate_span_id)
        span.record(SERVER_RECV) # Received a request
        status, headers, body = yield
      ensure
        span.record(SERVER_SEND) # Sending back to the client
      end
    end

    Non-pseudocode version: https://github.com/openzipkin/zipkin-ruby/blob/master/lib/zipkin-tracer/rack/zipkin-tracer.rb
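
The middleware pseudocode can be made runnable without any gems, since a Rack app is just an object that responds to `call`. This is a sketch under stated assumptions, not the zipkin-ruby implementation: the `Span` class is a hypothetical stand-in, the `spans:` keyword is added only so the result can be inspected, and the `"sr"` / `"ss"` values follow Zipkin's short names for server receive and server send.

```ruby
require "securerandom"

# Hypothetical stand-in for a tracer's span; not the zipkin-ruby class.
class Span
  attr_reader :name, :id, :annotations

  def initialize(name, id)
    @name = name
    @id = id
    @annotations = []
  end

  # Store the annotation together with a monotonic timestamp.
  def record(value)
    @annotations << [value, Process.clock_gettime(Process::CLOCK_MONOTONIC)]
  end
end

class TracingRackMiddleware
  SERVER_RECV = "sr"
  SERVER_SEND = "ss"

  def initialize(app, spans: [])
    @app = app
    @spans = spans
  end

  def call(env)
    trace(env) { @app.call(env) }
  end

  private

  def trace(env)
    span = Span.new(env["PATH_INFO"], SecureRandom.hex(8))
    span.record(SERVER_RECV)
    yield # the Rack triple from the app is the return value
  ensure
    # Runs even when the app raises, so the span is always closed.
    span.record(SERVER_SEND)
    @spans << span
  end
end

app = ->(_env) { [200, { "Content-Type" => "text/html" }, ["Hello world"]] }
spans = []
middleware = TracingRackMiddleware.new(app, spans: spans)
status, _headers, body = middleware.call({ "PATH_INFO" => "/orders" })
```

The `ensure` clause does not change the method's return value, so the app's `[status, headers, body]` triple passes through the middleware untouched.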
  83. stella cotton | @practice_cactus
    # config/initializers/tracing.rb
    # Use our middleware!
    Rails.application.config.middleware.use TracingRackMiddleware, {
      # some configuration
    }
  84. stella cotton | @practice_cactus
    # config/initializers/tracing.rb
    Rails.application.config.middleware.use TracingRackMiddleware, {
      service_name: "SERVICE_DOMAIN_NAME",
      service_port: 443,
      # Sample a portion of requests
      sample_rate: ENV.fetch("ZIPKIN_SAMPLE_RATE", 0.1).to_f,
      json_api_host: ENV["ZIPKIN_HOST"]
    }
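
A minimal sketch of what a `sample_rate` of 0.1 could mean in practice, assuming a simple probabilistic sampler. The `Sampler` class is hypothetical and is not how any particular tracer implements sampling; the injectable `random:` source is there only to make the example repeatable.

```ruby
# Hypothetical sampler: each new trace is recorded with probability `rate`.
class Sampler
  def initialize(rate, random: Random.new)
    @rate = rate.to_f
    @random = random
  end

  # Random#rand returns a float in [0, 1), so a rate of 0 never samples
  # and a rate of 1 always does.
  def sample?
    @random.rand < @rate
  end
end

rate = ENV.fetch("ZIPKIN_SAMPLE_RATE", 0.1).to_f
sampler = Sampler.new(rate, random: Random.new(42))
sampled = Array.new(10_000) { sampler.sample? }.count(true)
puts "sampled #{sampled} of 10000 requests"
```

The decision is usually made once, where the trace starts, and then propagated with the ids so that downstream services do not each roll their own dice and produce partial traces.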
  85. stella cotton | @practice_cactus Tracing Outgoing Requests

  86. stella cotton | @practice_cactus 
 
 More Middleware!

  87. stella cotton | @practice_cactus 
 
 Faraday

  88. stella cotton | @practice_cactus
    class TracingFaradayMiddleware
      def initialize(app)
        @app = app
      end

      def call(env)
        trace!(env) do |env|
          # Execute our http client
          @app.call(env)
        end
      end
    end
  89. stella cotton | @practice_cactus
    class TracingFaradayMiddleware
      def initialize(app)
        @app = app
      end

      def call(env)
        trace!(env) do |env|
          @app.call(env)
        end
      end

      def trace!(env, &block)
        env = set_headers(env) # Manipulate the headers
        span = Span.new("external_call", 1234)
        # Using client instead of server
        span.record(Trace::Annotation::CLIENT_SEND)
        yield env # Faraday hands back a response object
      ensure
        span.record(Trace::Annotation::CLIENT_RECV)
      end
    end
  90. stella cotton | @practice_cactus Each of these colors represents an instrumented application
  91. stella cotton | @practice_cactus
    class TracingFaradayMiddleware
      def initialize(app)
        @app = app
      end

      def call(env)
        trace!(env) do |env|
          @app.call(env)
        end
      end

      def trace!(env, &block)
        env = set_headers(env)
        span = Span.new("external_call", 1234)
        span.record(Trace::Annotation::CLIENT_SEND) # Client Send
        yield env # Faraday hands back a response object
      ensure
        span.record(Trace::Annotation::CLIENT_RECV) # Client Receive
      end
    end
  92. stella cotton | @practice_cactus
    def self.client
      Faraday.new(url: base_url) do |connection|
        # Add our middleware
        connection.use TracingFaradayMiddleware
        connection.adapter Faraday.default_adapter
      end
    end
  93. stella cotton | @practice_cactus Checklist

  94. stella cotton | @practice_cactus Buy, Build, or Adopt

  95. stella cotton | @practice_cactus 
 
 Buy?

  96. stella cotton | @practice_cactus Lightstep TraceView… and more?

  97. stella cotton | @practice_cactus 
 
 Adopt an OSS Solution?

  98. stella cotton | @practice_cactus 
 
 Zipkin

  99. stella cotton | @practice_cactus 
 
 What about OpenTracing?

  100. stella cotton | @practice_cactus 
 
 Standardizes Instrumentation

  101. stella cotton | @practice_cactus Where is OpenTracing at today?
  102. stella cotton | @practice_cactus 
 
 Interoperability is Still Messy

  103. stella cotton | @practice_cactus 
 
 “Ruby Support”

  104. stella cotton | @practice_cactus 
 
 Rinse and Repeat

  105. stella cotton | @practice_cactus 
 
 Build Your Own?

  106. stella cotton | @practice_cactus What are other folks doing?
  107. stella cotton | @practice_cactus End-to-End Tracing: Adoption and Use Cases, Jonathan Mace, Brown University https://cs.brown.edu/~jcmace/papers/mace2017survey.pdf
  108. stella cotton | @practice_cactus • 15 using Zipkin • 9 using internal solutions • 1 using other OSS solution • 1 using paid solution
    Jonathan Mace, Brown University https://cs.brown.edu/~jcmace/papers/mace2017survey.pdf
  109. stella cotton | @practice_cactus Infra Requirements and Limitations

  110. stella cotton | @practice_cactus Dependency matrix of:
    - Tracer
    - Transport Layer
    - Collection Layer
    - Storage Layer
  111. stella cotton | @practice_cactus 
 
 Installing a Separate Agent

  112. stella cotton | @practice_cactus Authentication

  113. stella cotton | @practice_cactus 
 
 Missing Authentication & Authorization

  114. stella cotton | @practice_cactus
    {
      "buildpacks": [
        {
          "url": "https://github.com/heroku/heroku-buildpack-apt"
        },
        {
          "url": "https://github.com/danp/heroku-buildpack-runit"
        }
      ]
    }
  115. stella cotton | @practice_cactus 
 
 Advanced Package Tool (Apt)

  116. stella cotton | @practice_cactus 
 
 Runit

  117. stella cotton | @practice_cactus 
 
 Client Authorization

  118. stella cotton | @practice_cactus 
 
 Basic auth via htpasswd

    https://www.nginx.com/resources/admin-guide/restricting-access-auth-basic/
  119. stella cotton | @practice_cactus by Corey Donohoe

  120. stella cotton | @practice_cactus
    # config/initializers/zipkin.rb
    # Our app’s configuration file
    Rails.application.config.middleware.use ZipkinTracer::RackHandler, {
      service_name: "test.example.com",
      service_port: 443,
      json_api_host: ENV["ZIPKIN_HOST"] # Where we’re sending traces
    }

    # Uses Basic Auth
    ENV["ZIPKIN_HOST"] = "https://username:password@my-zipkin.com"
  121. stella cotton | @practice_cactus 
 
 Browser Authentication

  122. stella cotton | @practice_cactus 
 
 bit.ly’s OAuth2 proxy https://github.com/bitly/oauth2_proxy

  123. stella cotton | @practice_cactus by Corey Donohoe

  124. stella cotton | @practice_cactus 
 
 Giving people access

  125. stella cotton | @practice_cactus Sensitive Data

  126. stella cotton | @practice_cactus 
 
 Custom Instrumentation

  127. stella cotton | @practice_cactus
    module Tracing
      module SQL
        # Mimic log method
        def log(sql, name = "SQL", binds = [], statement_name = nil)
          # Wrap all sql calls and record the sql statement
          ZipkinTracer::TraceClient.local_component_span("sql query") do |span|
            span.record_tag("query", sql.to_s)
            super
          end
        end
      end
    end

    # Monkey Patching with Prepend (the module must be defined first)
    class ::ActiveRecord::ConnectionAdapters::AbstractAdapter
      prepend Tracing::SQL
    end
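
Recording raw SQL is one way sensitive values end up in trace storage. A minimal sketch of one mitigation, assuming literal values are replaced with placeholders before the statement is recorded as a tag. The `SqlScrubber` module is hypothetical, and its regular expressions are a rough approximation, not a SQL parser.

```ruby
# Hypothetical scrubber: replaces literal values before the SQL is recorded.
module SqlScrubber
  # Single-quoted strings, allowing '' as an escaped quote.
  STRING_LITERAL = /'(?:[^']|'')*'/
  # Standalone numbers; digits inside identifiers such as users2 are left alone.
  NUMBER_LITERAL = /\b\d+(?:\.\d+)?\b/

  def self.scrub(sql)
    sql.to_s.gsub(STRING_LITERAL, "?").gsub(NUMBER_LITERAL, "?")
  end
end

scrubbed = SqlScrubber.scrub("SELECT * FROM users WHERE email = 'a@example.com' AND id = 42")
puts scrubbed
```

In the instrumentation above, the tag would then be recorded from the scrubbed string instead of `sql.to_s`. The shape of the query, which is what matters for spotting slow statements, is preserved.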
  128. stella cotton | @practice_cactus What happens when data leaks?
  129. stella cotton | @practice_cactus Is Everyone On Board?

  130. stella cotton | @practice_cactus Get it on the Roadmap
  131. stella cotton | @practice_cactus 
 
 Open PRs

  132. stella cotton | @practice_cactus Evaluating Distributed Tracing Solutions:
    • Should you buy, build or adopt?
    • What are your infrastructure requirements and limitations?
    • How is it authenticated?
    • Do you have sensitive data? What will you do if it leaks?
    • Is everyone on board?
  133. stella cotton | @practice_cactus OMG, this is so much information

  134. stella cotton | @practice_cactus 
 
 Try out Docker Zipkin

  135. stella cotton | @practice_cactus Come say hi at the Heroku Booth today, 3:30pm-4:30pm @practice_cactus
    Plant Illustrations designed by Natkacheva / Freepik