a day in the life of a request

hello!

why is it slow?

latency

Designs, Lessons and Advice from Building Large Distributed Systems, Jeff Dean

What is in the tail?
[Figure: percentage of requests vs. latency (ms)]
Measuring and Optimizing Tail Latency, Kathryn McKinley

Benchmarking "Hello, World!", Dick Sites

Amdahl's law, Wikipedia
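To make the link between Amdahl's law and request latency concrete: if a fraction p of a request's time is spent in one component and that component is made s times faster, the overall speedup is 1 / ((1 - p) + p / s). A minimal Go sketch follows; the function name and the numbers are illustrative assumptions, not taken from the talk.

```go
package main

import "fmt"

// amdahlSpeedup returns the overall speedup when a fraction p (0..1) of the
// total time is made s times faster and the remaining 1-p is left unchanged.
func amdahlSpeedup(p, s float64) float64 {
	return 1 / ((1 - p) + p/s)
}

func main() {
	// Hypothetical request: 20% of its time is spent in one database call.
	// Making that call 10x faster speeds up the whole request by about 1.22x.
	fmt.Printf("%.2f\n", amdahlSpeedup(0.2, 10))
}
```

The takeaway for debugging latency is that optimizing a component only pays off in proportion to its share of the request, which is why it helps to know where the time goes first.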

Example 2: Task Scheduling in Spark
[Figure: driver and workers W1, W2, W3; % time per window, conventional profiling vs. SnailTrail critical participation]
SnailTrail, Hoffmann et al

CPU Flame Graphs, Brendan Gregg

Systems Performance by Brendan Gregg

<span>

The Gantt Chart: A Working Tool of Management, Henry Wallace Clark

Twitter Dot Com, Google Chrome

Symfony

Dapper, Google

func ProcessVideo(ctx, video) {
	ctx, span := trace.StartSpan(ctx, "ProcessVideo")
	defer span.End()
	video.Process()
}

things this helps debug

Travis CI

func (rl *redisRateLimiter) RateLimit(...) {
	conn := rl.pool.Get()
	defer conn.Close()
	ctx, span := trace.StartSpan(ctx, "Redis.RateLimit")
	defer span.End()
	...
}

tx0 tx1 tx2 tx3 tx4 tx5 ... blocked

</span>

context propagation

Dapper, Google

X-Request-ID

SELECT COUNT(*) FROM likes WHERE artist = 'CHVRCHES'

SELECT COUNT(*) FROM likes WHERE artist = 'CHVRCHES' /*request_id:123e4567-e89b-12d3-a456-426655440000*/
Marginalia, Basecamp
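Marginalia is a Rails library; the same idea can be sketched in Go as a small helper that appends the request ID to a query as a SQL comment. The name annotateQuery and the allow-list regexp are assumptions made for this example. Because the ID usually arrives in a client-controlled header, the sketch drops any ID that contains characters other than letters, digits and dashes, so the value cannot close the comment early.

```go
package main

import (
	"fmt"
	"regexp"
)

// safeID matches IDs made only of letters, digits and dashes, none of which
// can terminate a SQL block comment.
var safeID = regexp.MustCompile(`^[A-Za-z0-9-]+$`)

// annotateQuery appends a request_id comment to query. Unsafe or empty IDs
// are dropped rather than escaped, and the query is returned unchanged.
func annotateQuery(query, requestID string) string {
	if !safeID.MatchString(requestID) {
		return query
	}
	return fmt.Sprintf("%s /*request_id:%s*/", query, requestID)
}

func main() {
	q := "SELECT COUNT(*) FROM likes WHERE artist = 'CHVRCHES'"
	fmt.Println(annotateQuery(q, "123e4567-e89b-12d3-a456-426655440000"))
}
```

With the comment in place, the request ID shows up wherever the database logs the query text, such as a slow query log, which lets a slow query be matched back to the request that issued it.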

EXPLAIN ANALYZE SELECT COUNT(*) FROM likes WHERE artist = 'CHVRCHES' /*request_id:123e4567-e89b-12d3-a456-426655440000*/

Aggregate
  Buffers: shared hit=74 read=41
  ->  Index Only Scan using likes_artist_idx on likes
        Index Cond: (artist = 'CHVRCHES'::text)
        Heap Fetches: 10000
        Buffers: shared hit=74 read=41
Planning Time: 0.344 ms
Execution Time: 5.182 ms

req, err := http.NewRequest("GET", serviceURL, nil)
req.Header.Add("X-Request-ID", requestID)
resp, err := client.Do(req)

Canopy, Facebook

sampling

Dapper, Google

sampling decision
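One common way to handle the sampling decision is to make it once, where the request enters the system, and propagate it with the trace so that every downstream service agrees. The sketch below assumes that approach. TraceContext, decide, startTrace, and childOf are hypothetical names, and hashing the trace ID is just one simple way to get a stable decision.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// TraceContext is a hypothetical set of fields passed along with a request.
type TraceContext struct {
	TraceID string
	Sampled bool
}

// decide hashes the trace ID into [0, 1) and compares it with rate, so the
// result is stable for a given ID.
func decide(traceID string, rate float64) bool {
	h := fnv.New32a()
	h.Write([]byte(traceID))
	return float64(h.Sum32())/float64(1<<32) < rate
}

// startTrace is called once, at the edge, where the request enters the system.
func startTrace(traceID string, rate float64) TraceContext {
	return TraceContext{TraceID: traceID, Sampled: decide(traceID, rate)}
}

// childOf is called by downstream services: they inherit the decision
// instead of re-rolling, so a trace is recorded in full or not at all.
func childOf(parent TraceContext) TraceContext {
	return TraceContext{TraceID: parent.TraceID, Sampled: parent.Sampled}
}

func main() {
	sampled := 0
	for i := 0; i < 10000; i++ {
		root := startTrace(fmt.Sprintf("trace-%d", i), 0.01)
		if childOf(root).Sampled {
			sampled++
		}
	}
	fmt.Println("sampled traces out of 10000:", sampled)
}
```

If each service sampled independently instead, most traces would end up with missing spans, which defeats the purpose of following one request end to end.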

Travis CI

finding interesting traces

Honeycomb

LightStep

group by customer

happy path can also be interesting!

visualization

Jaeger, Uber

where do we go from here?

aggregation

Canopy, Facebook

Pivot Tracing, Mace et al

kernel tracing

Systems Performance by Brendan Gregg

Debugging Latency in Go 1.11, Jaana B. Dogan

Performance Analysis of Cloud Applications, Google

Benchmarking "Hello, World!", Dick Sites

Go Dynamic Tools, Dmitry Vyukov, GopherCon 2015

Visualization: Statemaps
The Hurricane’s Butterfly, Bryan Cantrill

Stacked statemaps across machines
Visualizing Systems with Statemaps, Bryan Cantrill

adaptively improving tail latency

"long requests reveal themselves" ~ Kathryn McKinley

The Tail: longest 200 requests
[Figure: latency (ms) of the top 200 requests, broken down into network and network queueing time, idle time, CPU time, and dispatch queueing time. Sources of tail latency, labelled as noise or not noise: network imperfections, OS imperfections, long requests, overload.]
Measuring and Optimizing Tail Latency, Kathryn McKinley

dealing with noise

speeding up work

recap
• tail latency matters
• tracing helps debug it

OpenCensus

the morning paper blog.acolyer.org

• Dapper, a Large-Scale Distributed Systems Tracing Infrastructure from Google, 2010
• Scuba: Diving into Data at Facebook from Facebook, 2016
• Canopy: An End-to-End Performance Tracing And Analysis System from Facebook, 2017
• Performance Analysis of Cloud Applications from Google, 2018
• Systems Performance: Enterprise and the Cloud by Brendan Gregg, 2013
• The Tail at Scale by Jeff Dean and Luiz André Barroso, 2013
• Designs, Lessons and Advice from Building Large Distributed Systems by Jeff Dean, 2009
• Data Center Computers: Modern Challenges in CPU Design by Dick Sites, 2015
• Measuring and Optimizing Tail Latency by Kathryn McKinley, Strange Loop 2017
• Benchmarking "Hello, World!" by Dick Sites, 2018
• Pivot Tracing: Dynamic Causal Monitoring for Distributed Systems by Mace et al, 2015
• RobinHood: Tail Latency Aware Caching by Berger et al, 2018
• SnailTrail: Generalizing Critical Paths for Online Analysis of Distributed Dataflows by Hoffmann et al, 2018

thanks! @igorwhilefalse

More Decks by Igor Wiedler