Slide 1

Slide 1 text

Flywheel: Google's Data Compression Proxy for the Mobile Web
Victor Agababov, Michael Buettner, Victor Chudnovsky, Mark Cogan, Ben Greenstein, Shane McDaniel, Michael Piatek, Colin Scott*, Matt Welsh, Bolian Yin
* Currently a graduate student at UC Berkeley

Slide 2

Slide 2 text

Dominant Access Tech: Mobile
• Mobile devices are increasingly dominant
• Growth is greatest in emerging markets

Slide 3

Slide 3 text

Mobile Data is Expensive
Emerging markets have the highest costs!
Source: blog.jana.com

Slide 4

Slide 4 text

Flywheel
• Flywheel: a proxy service that optimizes HTTP response size
• Three years of deployment experience; part of Chrome for Android, iOS, and Desktop
• Currently serving millions of users and billions of requests per day

Slide 5

Slide 5 text

What Flywheel Does
[Figure: an example page shrinks from 9764 total bytes to 5565 total bytes]
• Transcode images to WebP
• Pick a quality level
• Minify CSS and JS
• GZip text objects

Slide 6

Slide 6 text


Slide 7

Slide 7 text

None of our optimizations are novel

Slide 8

Slide 8 text

What lessons did we learn from building and operating Flywheel?
Key lessons:
• Highly challenging to maintain good performance
• Tussles are pervasive, ongoing, and time-consuming

Slide 9

Slide 9 text

Outline
• Is a proxy really needed?
• What’s hard about engineering Flywheel?
• Does Flywheel meet our goals?
• What can be learned?

Slide 10

Slide 10 text

The web isn’t well optimized for mobile
• Case in point: 42% of HTML response bytes are not compressed
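Text formats such as HTML are highly redundant, which is why leaving them uncompressed wastes so many bytes. As a rough illustration, the sketch below gzips a hypothetical, deliberately repetitive HTML body with Python's standard `gzip` module and reports the fraction saved; real pages are less repetitive, so their savings will be smaller than this contrived example suggests.

```python
import gzip


def gzip_savings(body: bytes) -> float:
    """Fraction of bytes saved by gzip-compressing a response body."""
    compressed = gzip.compress(body)
    return 1 - len(compressed) / len(body)


# Hypothetical, highly repetitive HTML standing in for a real page.
items = "<li class='item'><a href='/product'>Product</a></li>\n" * 200
html = ("<html><body><ul>\n" + items + "</ul></body></html>\n").encode()

print(f"{len(html)} bytes, {gzip_savings(html):.0%} saved by gzip")
```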

Slide 11

Slide 11 text

Hard to keep up with best practices
• New optimizations (WebP, SDCH, HTTP/2, ...) are rolled out as often as every 6 weeks
• Heterogeneity of mobile devices is increasing
• Need: an optimizing compiler for the mobile web; more precisely, an optimizing service for the mobile web

Slide 12

Slide 12 text

Outline
• Is a proxy really needed?
• What’s hard about engineering Flywheel?
• Does Flywheel meet our goals?
• What can be learned?

Slide 13

Slide 13 text

Design Constraints
• Opt-in deployment model
• Transparent to users
• HTTP only (no HTTPS)

Slide 14

Slide 14 text

Challenge: trading off latency vs. compression
[Figure: the indirect path through Flywheel (RTT') versus the direct path to the HTTP origin (RTT), with RTT' > RTT]
• Indirection through Flywheel often increases RTT
• Network latency is the dominant performance factor (page size is not a dominant factor!)
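To see why latency can outweigh compression, consider a deliberately simple model: fetching an object costs one round trip plus its transmission time (TCP slow start, DNS, and server time are ignored). Proxying adds RTT' - RTT but shrinks the object by the savings fraction, so it only pays off above a break-even size. All numbers below are hypothetical.

```python
def fetch_time(rtt_s: float, size_bytes: float, bandwidth_bps: float) -> float:
    """One round trip (propagation) plus transmission delay; no slow start."""
    return rtt_s + size_bytes * 8 / bandwidth_bps  # bytes -> bits


def break_even_size(rtt_direct_s: float, rtt_proxy_s: float,
                    savings: float, bandwidth_bps: float) -> float:
    """Object size (bytes) at which proxying is exactly as fast as going direct.

    Proxying wins when savings * size * 8 / bandwidth > rtt_proxy - rtt_direct.
    """
    extra_rtt = rtt_proxy_s - rtt_direct_s
    return extra_rtt * bandwidth_bps / (8 * savings)


# Hypothetical: 100 ms direct RTT, 150 ms via proxy, 1 Mbit/s link, 50% savings.
size = break_even_size(0.100, 0.150, 0.5, 1_000_000)
print(f"break-even object size: {size:.0f} bytes")
```

Under these assumptions, any object smaller than the break-even size loads faster on the direct path, which is consistent with the slide's point that page size is not the dominant factor for most loads.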

Slide 15

Slide 15 text

Flywheel Design
[Figure: Flywheel architecture inside a Google datacenter (proxy, cache, fetch router, fetch bots, and multiple optimization services) between the client and the HTTP origin; a GET / request travels toward the origin]
• Fetch router maintains connection affinity
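Connection affinity means requests for the same origin keep landing on the same fetch bot, so that bot can reuse its open connections. The slides do not say how the fetch router achieves this; the sketch below assumes a simple stable hash of the origin hostname modulo the number of bots, and `pick_fetch_bot` is a hypothetical name.

```python
import hashlib


def pick_fetch_bot(origin_host: str, num_bots: int) -> int:
    """Deterministically map an origin to a fetch bot index (hypothetical policy).

    Routing every request for one origin to one bot lets that bot reuse
    its existing connections to the origin.
    """
    digest = hashlib.sha256(origin_host.lower().encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_bots


bots = 16
print(pick_fetch_bot("foo.com", bots), pick_fetch_bot("FOO.com", bots))
```

One design caveat: with plain modulo hashing, changing the number of bots remaps most origins and breaks affinity; a consistent-hashing scheme would limit that churn.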

Slide 16

Slide 16 text

Flywheel Design
[Figure: the same architecture, showing the 200 response returning from the HTTP origin]
• Optimizations: image transcoding, GZip, minification …
• Separate optimization services for isolation and provisioning
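The split into per-type optimization services can be sketched as a dispatch table keyed by MIME type. This is an illustrative assumption, not Flywheel's code: here the "services" are plain functions in one process, `image_service` is a placeholder where WebP transcoding would go, minification is not shown, and unknown types pass through unchanged. Each service returns the body plus the `Content-Encoding` it applied, and the result is discarded if it would not shrink the response.

```python
import gzip
from typing import Callable, Dict, Optional, Tuple

Result = Tuple[bytes, Optional[str]]  # (body, Content-Encoding to set, if any)


def gzip_service(body: bytes) -> Result:
    return gzip.compress(body), "gzip"


def image_service(body: bytes) -> Result:
    # Placeholder (hypothetical): a real service would transcode to WebP here.
    return body, None


SERVICES: Dict[str, Callable[[bytes], Result]] = {
    "text/html": gzip_service,
    "text/css": gzip_service,
    "application/javascript": gzip_service,
    "image/jpeg": image_service,
    "image/png": image_service,
}


def optimize(content_type: str, body: bytes) -> Result:
    """Route a response body to the service for its MIME type."""
    mime = content_type.split(";")[0].strip().lower()
    service = SERVICES.get(mime)
    if service is None:
        return body, None  # unknown type: pass through untouched
    optimized, encoding = service(body)
    if len(optimized) >= len(body):
        return body, None  # never make a response larger
    return optimized, encoding
```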

Slide 17

Slide 17 text

Selective Proxying
[Figure: the client sends GET / directly to the HTTP origin]
• Fetch objects on the critical path directly from the origin
• Indirection through Flywheel often increases RTT

Slide 18

Slide 18 text

Selective Proxying
[Figure: GET / goes directly to the HTTP origin, while GET /i.jpg goes through Flywheel]
• Fetch objects on the critical path directly from the origin
• Proxy objects that yield high data reduction
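A per-request routing decision in this spirit might look like the sketch below. It is hypothetical: the `Request` fields, the `SAVINGS_THRESHOLD` value, and how a client would know `expected_savings` are all assumptions. It also folds in the HTTP-only design constraint by never proxying HTTPS.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class Request:
    url: str
    on_critical_path: bool   # e.g. the main HTML document
    expected_savings: float  # predicted fraction of bytes the proxy would save


SAVINGS_THRESHOLD = 0.3  # hypothetical cutoff


def route(req: Request) -> str:
    """Return 'proxy' or 'direct' for one request (hypothetical policy)."""
    if urlsplit(req.url).scheme != "http":
        return "direct"  # design constraint: HTTP only, never HTTPS
    if req.on_critical_path:
        return "direct"  # avoid the extra RTT of indirection
    if req.expected_savings >= SAVINGS_THRESHOLD:
        return "proxy"   # worth the detour for the byte savings
    return "direct"


print(route(Request("http://example.com/", True, 0.4)))
print(route(Request("http://example.com/i.jpg", False, 0.6)))
```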

Slide 19

Slide 19 text

Challenge: Accommodating Tussles
• Need to be policy-neutral
• Mechanism: HTTP fallback (blockable canary requests)
[Figure: a CANARY request]
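The canary mechanism lets a network opt out simply by blocking a request, and the client then falls back to fetching directly. A minimal sketch, assuming the client issues the canary and expects a fixed response body: `EXPECTED_BODY`, `use_proxy`, and the fetcher callables are hypothetical, and the actual canary URL and protocol are not described in these slides.

```python
from typing import Callable, Tuple

EXPECTED_BODY = b"OK"  # hypothetical canary response


def use_proxy(fetch_canary: Callable[[], Tuple[int, bytes]]) -> bool:
    """Return True if the proxy may be used, False to fall back to direct HTTP."""
    try:
        status, body = fetch_canary()
    except OSError:  # connection refused, reset, or timed out
        return False
    # A blocked, rewritten, or unexpected response also triggers fallback.
    return status == 200 and body == EXPECTED_BODY


# Simulated outcomes for illustration.
def blocked_by_network() -> Tuple[int, bytes]:
    raise ConnectionRefusedError("canary blocked")


print(use_proxy(lambda: (200, b"OK")))  # canary got through
print(use_proxy(lambda: (403, b"")))    # network rejected it
print(use_proxy(blocked_by_network))    # network dropped it
```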

Slide 20

Slide 20 text

Outline
• Is a proxy really needed?
• What’s hard about engineering Flywheel?
• Does Flywheel meet our goals?
• What can be learned?

Slide 21

Slide 21 text

Evaluation
• This talk:
  • Primary goal: reduce web page size
  • Secondary goal: maintain good performance
• Paper:
  • Fault tolerance

Slide 22

Slide 22 text

Workload: Geographic Adoption
Adoption is highest in developing markets

| Country | Adoption |
|---|---|
| Worldwide | 10.5% |
| Brazil | 17% |
| Russia | 16.5% |
| Indonesia | 16.3% |
| Mexico | 15.5% |
| USA | 9.5% |

Slide 23

Slide 23 text

Workload: Page Footprints

Slide 24

Slide 24 text

Workload: Page Footprints
The majority of pages are small

Slide 25

Slide 25 text

Workload: Page Footprints
• 97% of bytes come from the top 5% largest pages
• The majority of pages are small
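A skew statistic like "97% of bytes come from the top 5% largest pages" is computed by sorting pages by size and summing the largest slice. The sketch below shows that computation on a made-up sample of page sizes; the sample is synthetic and is not Flywheel's measurement data.

```python
def top_share(page_sizes: list, top_fraction: float) -> float:
    """Fraction of all bytes contributed by the largest `top_fraction` of pages."""
    sizes = sorted(page_sizes, reverse=True)
    k = max(1, round(len(sizes) * top_fraction))
    return sum(sizes[:k]) / sum(sizes)


# Synthetic sample: 95 small pages of 10 KB and 5 large pages of 10 MB.
sizes = [10_000] * 95 + [10_000_000] * 5
print(f"top 5% of pages carry {top_share(sizes, 0.05):.1%} of bytes")
```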

Slide 26

Slide 26 text

Data Reduction

| Type | % of Bytes | Savings | Share of Benefit |
|---|---|---|---|
| Total | 100% | 58% | - |
| Images | 74.12% | 66.40% | 85% |
| HTML | 9.64% | 38.43% | 6% |
| JavaScript | 9.10% | 41.09% | 6% |
| CSS | 1.81% | 52.10% | 2% |
| Other | 5.33% | 9.23% | 1% |

Savings = (1 - outgoing bytes / incoming bytes) × 100%
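The columns of the Data Reduction table are related by simple arithmetic: each type's contribution to the total savings is its share of incoming bytes times its savings, and its "share of benefit" is that contribution divided by the total. The sketch below recomputes the 58% total and the per-type shares from the table's figures, and applies the savings formula to the 9764 to 5565 byte example page from the "What Flywheel Does" slide.

```python
# Figures from the Data Reduction table: (fraction of incoming bytes, savings).
TABLE = {
    "Images": (0.7412, 0.6640),
    "HTML": (0.0964, 0.3843),
    "JavaScript": (0.0910, 0.4109),
    "CSS": (0.0181, 0.5210),
    "Other": (0.0533, 0.0923),
}


def savings(incoming_bytes: int, outgoing_bytes: int) -> float:
    """Savings = (1 - outgoing / incoming) * 100, as a percentage."""
    return (1 - outgoing_bytes / incoming_bytes) * 100


# Bytes saved per type, as a fraction of all incoming bytes.
saved = {t: share * s for t, (share, s) in TABLE.items()}
total = sum(saved.values())

print(f"Total savings: {total:.0%}")
for t, v in saved.items():
    print(f"{t}: {v / total:.0%} of benefit")
print(f"Example page: {savings(9764, 5565):.1f}% saved")
```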

Slide 27

Slide 27 text


Slide 28

Slide 28 text

Data Reduction
Images are the bulk of bytes and savings

Slide 29

Slide 29 text

Reduction Across Users

Slide 30

Slide 30 text

Reduction Across Users
Median data reduction: 50%

Slide 31

Slide 31 text

Reduction Across Users

Slide 32

Slide 32 text

Reduction Across Users
Overall data reduction: 27%

Slide 33

Slide 33 text

Performance: Methodology
• Goal: don’t degrade performance
• Compare:
  • A random sample of Flywheel users
  • A holdback experimental group
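A holdback group works only if assignment is random with respect to user behavior and stable for each user over time. One common way to get both is to hash a user identifier into a bucket; the sketch below assumes that approach, with a hypothetical 1% holdback fraction and experiment name. The slides do not describe how Flywheel's holdback was actually assigned.

```python
import hashlib

HOLDBACK_FRACTION = 0.01  # hypothetical size of the holdback group


def in_holdback(user_id: str, experiment: str = "flywheel-holdback") -> bool:
    """Deterministically assign a user to the holdback group.

    Hashing (experiment, user) keeps each user's assignment stable across
    sessions while spreading users approximately uniformly over [0, 1].
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < HOLDBACK_FRACTION


held_back = sum(in_holdback(f"user{i}") for i in range(100_000))
print(f"{held_back} of 100000 users held back")
```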

Slide 34

Slide 34 text

Simple Model of Page Load Time
Load time of subresource s_i = propagation delay + transmission delay + computation time
Page load time = Σ (load time of s_i), over all subresources s_i on the critical path
Critical path = longest chain of dependent subresources
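The model can be made concrete by treating a page as a dependency graph: each subresource starts once all of its dependencies finish, and page load time is the finish time of the last one, i.e. the sum of load times along the critical path. The page below is hypothetical, and the model assumes unlimited parallelism. Note that the large `hero.jpg` is off the critical path, so compressing it would not change page load time here.

```python
from functools import lru_cache
from typing import Dict, List, Tuple

# (propagation, transmission, computation) in seconds, for a hypothetical page.
RESOURCES: Dict[str, Tuple[float, float, float]] = {
    "index.html": (0.10, 0.05, 0.02),
    "app.js": (0.10, 0.08, 0.05),
    "style.css": (0.10, 0.01, 0.01),
    "data.json": (0.10, 0.02, 0.00),
    "hero.jpg": (0.10, 0.20, 0.00),
}
# DEPENDS_ON[x] = subresources that must finish before x can start.
DEPENDS_ON: Dict[str, List[str]] = {
    "index.html": [],
    "app.js": ["index.html"],
    "style.css": ["index.html"],
    "data.json": ["app.js"],
    "hero.jpg": ["index.html"],
}


def load_time(name: str) -> float:
    prop, trans, comp = RESOURCES[name]
    return prop + trans + comp


@lru_cache(maxsize=None)
def finish_time(name: str) -> float:
    """Own load time, started after the slowest dependency finishes."""
    start = max((finish_time(d) for d in DEPENDS_ON[name]), default=0.0)
    return start + load_time(name)


def critical_path() -> List[str]:
    """Walk back from the last-finishing subresource via its slowest dependency."""
    node = max(RESOURCES, key=finish_time)
    path = [node]
    while DEPENDS_ON[node]:
        node = max(DEPENDS_ON[node], key=finish_time)
        path.append(node)
    return path[::-1]


page_load_time = max(finish_time(r) for r in RESOURCES)
print(f"PLT = {page_load_time:.2f} s via {' -> '.join(critical_path())}")
```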

Slide 35

Slide 35 text

Page Load Time
Page load time in seconds, by quantile of page loads:

| Quantile | Holdback (s) | Flywheel (s) |
|---|---|---|
| Median | 2.08 | 2.21 |
| 70th | 3.68 | 3.78 |
| 80th | 5.38 | 5.37 |
| 90th | 9.22 | 8.95 |
| 95th | 14.61 | 13.89 |
| 99th | 39.38 | 36.13 |

Flywheel improves performance only in the tail
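Expressing the gap at each quantile as a relative change makes "only in the tail" concrete: a positive value means Flywheel is slower than the holdback group at that quantile, a negative value means faster. The sketch uses the per-quantile values as read from the chart residue on this slide.

```python
QUANTILES = ["Median", "70th", "80th", "90th", "95th", "99th"]
HOLDBACK = [2.08, 3.68, 5.38, 9.22, 14.61, 39.38]  # seconds
FLYWHEEL = [2.21, 3.78, 5.37, 8.95, 13.89, 36.13]  # seconds

# Relative change in page load time with Flywheel, in percent.
changes = {q: (f - h) / h * 100 for q, h, f in zip(QUANTILES, HOLDBACK, FLYWHEEL)}

for q, c in changes.items():
    print(f"{q:>6}: {c:+.1f}%")
```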

Slide 36

Slide 36 text

Why is this?
• Recall our workload:
  • Most pages are small: propagation delay is the dominant factor
  • Long tail of large pages: transmission delay is the dominant factor
• A good way to understand propagation delay: time to first byte (TTFB)

Slide 37

Slide 37 text

Time to First Byte
Time to first byte in seconds, by quantile of requests:

| Quantile | Holdback (s) | Flywheel (s) |
|---|---|---|
| Median | 0.19 | 0.30 |
| 70th | 0.36 | 0.49 |
| 80th | 0.55 | 0.69 |
| 90th | 1.00 | 1.16 |
| 95th | 1.69 | 1.90 |
| 99th | 5.06 | 5.81 |

Most users’ direct path to the origin is shorter than the indirect path

Slide 38

Slide 38 text

Outline
• Is a proxy really needed?
• What’s hard about engineering Flywheel?
• Does Flywheel meet our goals?
• What can be learned?

Slide 39

Slide 39 text

Lessons Learned
Many performance optimizations that we expected to have an impact did not, at Web scale

Slide 40

Slide 40 text

Disappointing Optimizations
• Preconnect: open a TCP connection to the origin early
• Increases reused connections from 73% to 80%
• Yields less than a 2% decrease in median PLT
• Already a strong tendency for connection affinity
[Figure: early connections to origins foo.com and bar.com]

Slide 41

Slide 41 text

Disappointing Optimizations
• Prefetch: request cacheable subresources early
• Increases cache hit ratio from 22% to 32%
• Yields less than a 2% decrease in median PLT
• Cacheable items often aren’t on the critical path
[Figure: an early GET /js to foo.com, alongside bar.com]

Slide 42

Slide 42 text

Lessons Learned
Improving PLT is highly challenging
• Compression doesn’t help (much)
• Difficult to target the critical path

Slide 43

Slide 43 text

Lessons Learned
If you want widespread adoption, you must accommodate policy issues!

Slide 44

Slide 44 text

Lessons Learned
Many more measurement findings and lessons in the paper!

Slide 45

Slide 45 text

Conclusion
• Flywheel shows it’s possible to provide 58% average HTTP data reduction at Web scale
• Data reduction is the easy part
• Maintaining performance and accommodating tussles are the hard parts
lmgtfy.com/?q=Chrome+Data+Saver