
a day in the life of a request


Igor Wiedler

March 23, 2019



Transcript

  1. a day in the life
    of a request


  2. View Slide

  3. hello!


  4. why is it slow?


  5. latency


  6. t0 t1


  7. Designs, Lessons and Advice from Building Large Distributed Systems, Jeff Dean


  8. What is in the tail?
    [Figure: plot with axes "Percentage of requests" and "Latency (ms)"; the tail of the curve is marked "?"]
    Measuring and Optimizing Tail Latency, Kathryn McKinley
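
One way to look at the tail numerically is to compute percentiles over recorded latencies. The sketch below uses the nearest-rank method on made-up sample data; the function name `percentile` and the sample values are illustrative assumptions, not taken from the talk.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile returns the nearest-rank p-th percentile (0 < p <= 100) of the
// given latencies. It sorts a copy, so the caller's slice is left untouched.
func percentile(latenciesMs []float64, p float64) float64 {
	if len(latenciesMs) == 0 {
		return math.NaN()
	}
	sorted := append([]float64(nil), latenciesMs...)
	sort.Float64s(sorted)
	rank := int(math.Ceil(p * float64(len(sorted)) / 100))
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

func main() {
	// Made-up sample: 98 fast requests and 2 slow ones.
	var samples []float64
	for i := 0; i < 98; i++ {
		samples = append(samples, 1.0)
	}
	samples = append(samples, 80.0, 100.0)

	fmt.Println("p50 (ms):", percentile(samples, 50))
	fmt.Println("p99 (ms):", percentile(samples, 99))
}
```

With data shaped like this, the median says little about the slow requests, which is why the higher percentiles are the ones worth watching.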


  9. Benchmarking "Hello, World!", Dick Sites


  10. Amdahl's law, Wikipedia
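
Amdahl's law bounds how much a request can speed up when only part of it gets faster: if a fraction p of the time is sped up by a factor s, the overall speedup is 1 / ((1 - p) + p / s). A minimal sketch, where the function name `amdahlSpeedup` and the 20% figure are illustrative assumptions:

```go
package main

import "fmt"

// amdahlSpeedup returns the overall speedup predicted by Amdahl's law when a
// fraction p of the total time (0 <= p <= 1) is sped up by a factor s.
func amdahlSpeedup(p, s float64) float64 {
	return 1 / ((1 - p) + p/s)
}

func main() {
	// Hypothetical request: 20% of its latency is spent in one component.
	p := 0.2
	fmt.Printf("component 10x faster: %.2fx overall\n", amdahlSpeedup(p, 10))
	// As s grows without bound, the speedup approaches 1 / (1 - p).
	fmt.Printf("upper bound:          %.2fx overall\n", 1/(1-p))
}
```

The practical reading is that optimizing a component that is off the critical path, or only a small share of it, barely moves end-to-end latency.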


  11. Example 2: Task Scheduling in Spark
    [Figure: activity of Driver, W1, W2 and W3 per window, comparing conventional profiling (% time) with SnailTrail's critical participation]
    SnailTrail, Hoffmann et al


  12. CPU Flame Graphs, Brendan Gregg


  13. Systems Performance by Brendan Gregg


  14. View Slide


  15. View Slide

  16. The Gantt Chart: A Working Tool of Management, Henry Wallace Clark


  17. Twitter Dot Com, Google Chrome


  18. Symfony


  19. Dapper, Google


  20. // Video is a placeholder type; the slide omits parameter types.
    func ProcessVideo(ctx context.Context, video *Video) {
        ctx, span := trace.StartSpan(ctx, "ProcessVideo")
        defer span.End()
        video.Process()
    }


  21. things this helps debug


  22. Travis CI


  23. func (rl *redisRateLimiter) RateLimit(...) {
        conn := rl.pool.Get()
        defer conn.Close()
        ctx, span := trace.StartSpan(ctx, "Redis.RateLimit")
        defer span.End()
        ...
    }


  24. tx0
    tx1
    tx2
    tx3
    tx4
    tx5
    ...
    blocked



  25. View Slide

  26. context propagation


  27. Dapper, Google


  28. X-Request-ID


  29. SELECT COUNT(*)
    FROM likes
    WHERE artist = 'CHVRCHES'


  30. SELECT COUNT(*)
    FROM likes
    WHERE artist = 'CHVRCHES'
    /*request_id:123e4567-e89b-12d3-a456-426655440000*/
    Marginalia, Basecamp
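
Marginalia is a Ruby library; the same idea can be sketched in Go by appending the request ID as a trailing SQL comment before the query is sent. The helper name `annotateQuery` and the allow-list pattern are assumptions made for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// safeID lists the characters allowed inside the comment. Rejecting anything
// else keeps an ID containing "*/" from closing the comment early.
var safeID = regexp.MustCompile(`^[A-Za-z0-9-]+$`)

// annotateQuery appends the request ID to the query as a trailing SQL comment.
// Queries with an empty or unsafe ID are returned unchanged.
func annotateQuery(query, requestID string) string {
	if !safeID.MatchString(requestID) {
		return query
	}
	return fmt.Sprintf("%s /*request_id:%s*/", query, requestID)
}

func main() {
	q := annotateQuery(
		"SELECT COUNT(*) FROM likes WHERE artist = 'CHVRCHES'",
		"123e4567-e89b-12d3-a456-426655440000",
	)
	fmt.Println(q)
}
```

Because the comment travels with the statement, the ID shows up wherever the database logs the query text, which lets a slow query be matched back to the request that issued it.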


  31. EXPLAIN ANALYZE
    SELECT COUNT(*)
    FROM likes
    WHERE artist = 'CHVRCHES'
    /*request_id:123e4567-e89b-12d3-a456-426655440000*/


  32. Aggregate
    Buffers: shared hit=74 read=41
    -> Index Only Scan using likes_artist_idx on likes
    Index Cond: (artist = 'CHVRCHES'::text)
    Heap Fetches: 10000
    Buffers: shared hit=74 read=41
    Planning Time: 0.344 ms
    Execution Time: 5.182 ms


  33. req, err := http.NewRequest("GET", serviceURL, nil)
    req.Header.Add("X-Request-ID", requestID)
    resp, err := client.Do(req)


  34. Canopy, Facebook


  35. sampling


  36. Dapper, Google


  37. sampling decision
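
One common way to make a sampling decision is to derive it deterministically from the trace ID, so that every service seeing the same ID keeps or drops the trace together. This is a sketch of that general approach, not necessarily what any system cited in the talk does; the function name `sampled` and the choice of FNV hashing are assumptions.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// sampled makes a deterministic decision from the trace ID alone, so every
// service that sees the same ID reaches the same answer. rate is in [0, 1].
func sampled(traceID string, rate float64) bool {
	h := fnv.New64a()
	h.Write([]byte(traceID)) // Write on a hash.Hash never returns an error
	// Keep the top 53 bits and scale them into [0, 1).
	return float64(h.Sum64()>>11)/float64(1<<53) < rate
}

func main() {
	kept := 0
	for i := 0; i < 10000; i++ {
		if sampled(fmt.Sprintf("trace-%d", i), 0.1) {
			kept++
		}
	}
	fmt.Printf("kept %d of 10000 traces at a 10%% rate\n", kept)
}
```

The alternative is to decide once at the edge and propagate the decision as a flag alongside the trace ID; both avoid ending up with traces that are only partially recorded.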


  38. Travis CI


  39. finding interesting
    traces


  40. Honeycomb


  41. LightStep


  42. group by customer


  43. happy path can also be
    interesting!


  44. visualization


  45. Jaeger, Uber


  46. where do we go from here?


  47. aggregation


  48. View Slide

  49. Canopy, Facebook


  50. Canopy, Facebook


  51. Pivot Tracing, Mace et al


  52. Pivot Tracing, Mace et al


  53. kernel tracing


  54. Systems Performance by Brendan Gregg


  55. Debugging Latency in Go 1.11, Jaana B. Dogan


  56. eBPF


  57. View Slide

  58. Performance Analysis of Cloud Applications, Google


  59. Performance Analysis of Cloud Applications, Google


  60. Benchmarking "Hello, World!", Dick Sites


  61. Benchmarking "Hello, World!", Dick Sites


  62. Go Dynamic Tools, Dmitry Vyukov, GopherCon 2015


  63. Visualization: Statemaps
    The Hurricane’s Butterfly, Bryan Cantrill


  64. Stacked statemaps across machines
    Visualizing Systems with Statemaps, Bryan Cantrill


  65. adaptively improving
    tail latency


  66. "long requests reveal
    themselves"
    ~ Kathryn McKinley


  67. The Tail: Longest 200 requests
    [Figure: stacked latency (ms) breakdown for the top 200 requests. Components: network and networking queueing time, idle time, CPU time, dispatch queueing time. Causes (network imperfections, OS imperfections, long requests, overload) are bracketed into "noise" and "not noise".]
    Measuring and Optimizing Tail Latency, Kathryn McKinley


  68. dealing with noise


  69. speeding up work


  70. recap
    • tail latency matters
    • tracing helps debug it


  71. OpenCensus


  72. the morning paper
    blog.acolyer.org


  73. • Dapper, a Large-Scale Distributed Systems Tracing Infrastructure from Google, 2010
    • Scuba: Diving into Data at Facebook from Facebook, 2013
    • Canopy: An End-to-End Performance Tracing And Analysis System from Facebook, 2017
    • Performance Analysis of Cloud Applications from Google, 2018
    • Systems Performance: Enterprise and the Cloud by Brendan Gregg, 2013
    • The Tail at Scale by Jeff Dean and Luiz André Barroso, 2013
    • Designs, Lessons and Advice from Building Large Distributed Systems by Jeff Dean, 2009
    • Data Center Computers: Modern Challenges in CPU Design by Dick Sites, 2015
    • Measuring and Optimizing Tail Latency by Kathryn McKinley, Strange Loop 2017
    • Benchmarking "Hello, World!" by Dick Sites, 2018
    • Pivot Tracing: Dynamic Causal Monitoring for Distributed Systems by Mace et al, 2015
    • RobinHood: Tail Latency Aware Caching by Berger et al, 2018
    • SnailTrail: Generalizing Critical Paths for Online Analysis of Distributed Dataflows by Hoffmann et al, 2018


  74. thanks!
    @igorwhilefalse
