[Figure: latency (ms) of the top 200 requests, broken down into network and networking queueing time, idle time, CPU time, and dispatch queueing time (labeled on the plot as Network & other, Idle, CPU work, and Queuing at worker). Annotations group the causes of tail latency (network imperfections, OS imperfections, long requests, overload) into "noise" and "not noise". Source: Measuring and Optimizing Tail Latency, Kathryn McKinley]
2010
• Scuba: Diving into Data at Facebook from Facebook, 2016
• Canopy: An End-to-End Performance Tracing And Analysis System from Facebook, 2017
• Performance Analysis of Cloud Applications from Google, 2018
• Systems Performance: Enterprise and the Cloud by Brendan Gregg, 2013
• The Tail at Scale by Jeff Dean and Luiz André Barroso, 2013
• Designs, Lessons and Advice from Building Large Distributed Systems by Jeff Dean, 2009
• Data Center Computers: Modern Challenges in CPU Design by Dick Sites, 2015
• Measuring and Optimizing Tail Latency by Kathryn McKinley, Strange Loop 2017
• Benchmarking "Hello, World!" by Dick Sites, 2018
• Pivot Tracing: Dynamic Causal Monitoring for Distributed Systems by Mace et al., 2015
• RobinHood: Tail Latency Aware Caching by Berger et al., 2018
• SnailTrail: Generalizing Critical Paths for Online Analysis of Distributed Dataflows by Hoffmann et al., 2018