Flywheel: Google's Data Compression Proxy for the Mobile Web

Colin Scott

August 11, 2015

Transcript

  1. Flywheel: Google's Data Compression Proxy for the Mobile Web

    Victor Agababov, Michael Buettner, Victor Chudnovsky, Mark Cogan, Ben Greenstein, Shane McDaniel, Michael Piatek, Colin Scott*, Matt Welsh, Bolian Yin
    * Currently a graduate student at UC Berkeley
    [email protected] [email protected]
  2. Flywheel

    • Flywheel: proxy service that optimizes HTTP response size
    • Three years of deployment experience, part of Chrome for Android, iOS, Desktop
    • Currently serving millions of users & billions of requests per day
  3. What Flywheel Does

    • Transcode to WebP
    • Pick quality level
    • Minify CSS, JS
    • GZip text objects
    Total bytes: 9764 (original) vs. 5565 (optimized)
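
As a rough illustration of the "GZip text objects" step, the sketch below compresses a small, made-up HTML body with Python's standard gzip module and reports the byte reduction. The sample markup and the helper name `gzip_savings` are assumptions for illustration only; this is not Flywheel's implementation. The last lines apply the same arithmetic to the two byte totals quoted on the slide.

```python
import gzip


def gzip_savings(body: bytes, level: int = 6) -> tuple[int, int, float]:
    """Return (original size, gzipped size, percent saved) for a response body."""
    compressed = gzip.compress(body, compresslevel=level)
    saved = (1 - len(compressed) / len(body)) * 100
    return len(body), len(compressed), saved


# Hypothetical, highly repetitive HTML; real pages compress less predictably.
sample = b"<ul>" + b"<li class='item'>hello mobile web</li>" * 200 + b"</ul>"
original, compressed, saved = gzip_savings(sample)
print(f"{original} bytes -> {compressed} bytes ({saved:.1f}% saved)")

# Same arithmetic applied to the byte totals quoted on the slide.
slide_saved = (1 - 5565 / 9764) * 100
print(f"slide example: {slide_saved:.1f}% saved")
```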
  5. What lessons did we learn from building and operating Flywheel?

    Key lessons:
    • Highly challenging to maintain good performance
    • Tussles are pervasive, ongoing, & time-consuming
  6. Outline

    • Is a proxy really needed?
    • What’s hard about engineering Flywheel?
    • Does Flywheel meet our goals?
    • What can be learned?
  7. The web isn’t well optimized for mobile

    • Case in point: 42% of HTML response bytes not compressed
  8. Hard to keep up with best practices

    • New optimizations: WebP, SDCH, HTTP/2, ... rolled out as often as every 6 weeks
    • Heterogeneity of mobile devices increasing
    • Need: an optimizing compiler for the mobile web
    • Need: optimizing service for the mobile web
  9. Outline

    • Is a proxy really needed?
    • What’s hard about engineering Flywheel?
    • Does Flywheel meet our goals?
    • What can be learned?
  10. Challenge: trading off latency vs compression

    • Indirection through Flywheel often increases RTT (RTT’ > RTT)
    • Network latency is dominant performance factor (page size is not a dominant factor!)
    [Diagram label: HTTP Origin]
  11. Flywheel Design

    • Fetch router maintains connection affinity
    [Diagram labels: Google datacenter, Proxy, Cache, Optimization services, Fetch router, Fetch bots, GET /, HTTP Origin]
  12. Flywheel Design

    • Optimizations: image transcoding, GZip, minification …
    • Separate optimization services for isolation, provisioning
    [Diagram labels: Google datacenter, Proxy, Cache, Optimization services, Fetch router, Fetch bots, 200, HTTP Origin]
  13. Selective Proxying

    • Indirection through Flywheel often increases RTT
    • Fetch objects on critical path from origin
    [Diagram labels: GET /, HTTP Origin]
  14. Selective Proxying

    • Fetch objects on critical path from origin
    • Proxy objects that yield high data reduction
    [Diagram labels: GET /i.jpg, HTTP Origin]
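
The routing idea on the two Selective Proxying slides can be sketched as a simple per-request decision. Everything in this sketch is hypothetical: the `Request` fields, the list of content types, and the fallback to a direct fetch are illustrative assumptions, not Flywheel's actual policy.

```python
from dataclasses import dataclass

# Assumption: content types that tend to yield high data reduction when proxied.
HIGH_REDUCTION_TYPES = {"image/jpeg", "image/png", "image/gif"}


@dataclass
class Request:
    url: str
    content_type: str
    on_critical_path: bool  # hypothetical flag, e.g. set for the main document


def choose_route(req: Request) -> str:
    """Return "origin" for a direct fetch or "proxy" to go through the proxy."""
    if req.on_critical_path:
        # Avoid the extra round-trip time that indirection often adds.
        return "origin"
    if req.content_type in HIGH_REDUCTION_TYPES:
        # Large savings are expected, so the detour is more likely to pay off.
        return "proxy"
    # Assumed default for everything else: fetch directly.
    return "origin"


for r in [
    Request("http://example.com/", "text/html", True),
    Request("http://example.com/i.jpg", "image/jpeg", False),
]:
    print(r.url, "->", choose_route(r))
```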
  15. Outline

    • Is a proxy really needed?
    • What’s hard about engineering Flywheel?
    • Does Flywheel meet our goals?
    • What can be learned?
  16. Evaluation

    • This talk:
      • Primary goal: reduce web page size
      • Secondary goal: maintain good performance
    • Paper:
      • Fault tolerance
  17. Workload: Geographic Adoption

    Adoption highest in developing markets

    Country     Adoption
    Worldwide   10.5%
    Brazil      17%
    Russia      16.5%
    Indonesia   16.3%
    Mexico      15.5%
    USA         9.5%
  18. Workload: Page Footprints

    • 97% of bytes come from top 5% largest pages
    • Majority of pages are small
  19. Data Reduction

    Type         % of Bytes   Savings   Share of Benefit
    Total        100%         58%       -
    Images       74.12%       66.40%    85%
    HTML         9.64%        38.43%    6%
    JavaScript   9.10%        41.09%    6%
    CSS          1.81%        52.10%    2%
    Other        5.33%        9.23%     1%

    Savings = (1 - outgoing bytes / incoming bytes) * 100
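
The savings formula and the "Share of Benefit" column can be checked with a few lines of arithmetic. The sketch below assumes that a type's share of benefit is its contribution to total savings, that is, its share of bytes multiplied by its savings rate and divided by the total; the slide does not state this definition, but it matches the rounded figures shown.

```python
# Rows from the slide: (type, % of bytes, savings %).
ROWS = [
    ("Images", 74.12, 66.40),
    ("HTML", 9.64, 38.43),
    ("JavaScript", 9.10, 41.09),
    ("CSS", 1.81, 52.10),
    ("Other", 5.33, 9.23),
]


def savings_percent(incoming_bytes: int, outgoing_bytes: int) -> float:
    """Savings = (1 - outgoing / incoming) * 100."""
    return (1 - outgoing_bytes / incoming_bytes) * 100


# Assumed definition: each type contributes (share of bytes) * (its savings rate).
contributions = {name: pct * sav / 100 for name, pct, sav in ROWS}
total_savings = sum(contributions.values())
shares = {name: c / total_savings * 100 for name, c in contributions.items()}

print(f"total savings: {total_savings:.1f}%")
for name, share in shares.items():
    print(f"{name}: {share:.0f}% of benefit")
```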
  21. Data Reduction

    Table as on slide 19.
    Images are the bulk of bytes & savings
  22. Performance: Methodology

    • Goal: don’t degrade performance
    Compare:
    • Random sampling of Flywheel users
    • Holdback experimental group
  23. Simple Model of Page Load Time

    • Load time of subresource s_i = propagation delay + transmission delay + computation time
    • Page load time = Σ s_i on critical path
    • Critical path = longest chain of dependent subresources
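
A minimal sketch of this model, assuming a small made-up dependency graph: each subresource has a single load time (the three delay terms already summed), and the page load time is the largest total along any chain of dependencies. The resource names and timings are hypothetical.

```python
from functools import lru_cache

# Hypothetical per-subresource load times in seconds
# (propagation + transmission + computation, already summed).
LOAD_TIME = {"index.html": 0.5, "style.css": 0.25, "app.js": 0.5, "hero.jpg": 0.75}

# Hypothetical dependency graph: each resource maps to the resources it waits for.
DEPENDS_ON = {
    "index.html": [],
    "style.css": ["index.html"],
    "app.js": ["index.html"],
    "hero.jpg": ["app.js"],
}


@lru_cache(maxsize=None)
def finish_time(resource: str) -> float:
    """Time at which the resource is loaded, given its slowest dependency chain."""
    start = max((finish_time(dep) for dep in DEPENDS_ON[resource]), default=0.0)
    return start + LOAD_TIME[resource]


def page_load_time() -> float:
    """Sum of load times along the critical path (the longest dependent chain)."""
    return max(finish_time(r) for r in LOAD_TIME)


print(page_load_time())
```

In this example the critical path is index.html, then app.js, then hero.jpg; shrinking style.css would not change the result, which is why reducing bytes off the critical path does little for page load time.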
  24. Page Load Time

    Page load time in seconds, by quantile of page loads:

    Quantile   Flywheel   Holdback
    Median     2.21       2.08
    70th       3.78       3.68
    80th       5.37       5.38
    90th       8.95       9.22
    95th       13.89      14.61
    99th       36.13      39.38

    Flywheel improves performance only in the tail
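
To make "only in the tail" concrete, the sketch below computes the relative difference at each quantile from the slide's numbers. It assumes that the first series listed on the slide is Flywheel and the second is the holdback group, which is the reading consistent with the slide's conclusion.

```python
# Page load times in seconds by quantile, as read from the slide.
# Assumption: the first series listed is Flywheel and the second is the holdback.
QUANTILES = ["Median", "70th", "80th", "90th", "95th", "99th"]
FLYWHEEL = [2.21, 3.78, 5.37, 8.95, 13.89, 36.13]
HOLDBACK = [2.08, 3.68, 5.38, 9.22, 14.61, 39.38]


def improvement_percent(holdback: float, flywheel: float) -> float:
    """Positive when Flywheel loads faster than the holdback group."""
    return (holdback - flywheel) / holdback * 100


improvements = {
    q: improvement_percent(h, f) for q, h, f in zip(QUANTILES, HOLDBACK, FLYWHEEL)
}
for q, pct in improvements.items():
    print(f"{q}: {pct:+.1f}%")
```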
  25. Why is this?

    • Recall our workload:
      • Long tail of large pages: transmission delay dominant factor
      • Most pages are small: propagation delay dominant factor
    • Good way to understand propagation delay: time to first byte (TTFB)
  26. Time to First Byte

    Time to first byte in seconds, by quantile of requests:

    Quantile   Flywheel   Holdback
    Median     0.30       0.19
    70th       0.49       0.36
    80th       0.69       0.55
    90th       1.16       1.00
    95th       1.90       1.69
    99th       5.81       5.06

    Most users’ direct path to the origin is shorter than the indirect path
  27. Outline

    • Is a proxy really needed?
    • What’s hard about engineering Flywheel?
    • Does Flywheel meet our goals?
    • What can be learned?
  28. Disappointing Optimizations

    • Preconnect: open TCP connection with origin early
    • Increases reused connections from 73% to 80%
    • Yields less than 2% decrease in median page load time (PLT)
    • Already a strong tendency for connection affinity
    [Diagram: foo.com serves ..<head><script src="bar.com/js">..., which references bar.com]
  29. Disappointing Optimizations

    • Prefetch: request cacheable subresources early
    • Increases cache hit ratio from 22% to 32%
    • Yields less than 2% decrease in median page load time (PLT)
    • Cacheable items often aren’t on the critical path
    [Diagram: foo.com serves ..<head><script src="bar.com/js">...; GET /js is issued to bar.com]
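
To prefetch, a proxy first has to discover which subresources a response references. The sketch below shows one way to do that discovery with Python's standard html.parser module; the class name, the set of tags inspected, and the sample page are assumptions for illustration. Deciding whether a discovered URL is actually cacheable would depend on response headers and is not shown.

```python
from html.parser import HTMLParser


class SubresourceFinder(HTMLParser):
    """Collect URLs of scripts, stylesheets, and images referenced by a page."""

    def __init__(self) -> None:
        super().__init__()
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag in ("script", "img") and attr.get("src"):
            self.urls.append(attr["src"])
        elif tag == "link" and attr.get("rel") == "stylesheet" and attr.get("href"):
            self.urls.append(attr["href"])


# Hypothetical page served by foo.com that references a script on bar.com.
page = '<html><head><script src="http://bar.com/js"></script></head><body></body></html>'
finder = SubresourceFinder()
finder.feed(page)
print(finder.urls)
```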
  30. Lessons Learned

    Improving PLT is highly challenging
    • Compression doesn’t help (much)
    • Difficult to target critical path
  31. Conclusion

    • Flywheel shows it’s possible to provide 58% average HTTP data reduction at web scale
    • Data reduction is the easy part
    • Maintaining performance and accommodating tussles are the hard part
    [email protected] [email protected]
    lmgtfy.com/?q=Chrome+Data+Saver