Slide 1

Slide 1 text

The trouble with Distribution (and pugs)

Slide 2

Slide 2 text

Hi, I’m Inés Sombra! @Randommood

Slide 3

Slide 3 text

Globally Distributed & Highly available

Slide 4

Slide 4 text

Today’s Agenda
Conclusions & takeaways
Context setting
blah blah blah
Hard things
Random pug photos
Ines fast talking

Slide 5

Slide 5 text

TRADEOFFS

Slide 6

Slide 6 text

IMAGEOPTO

Slide 7

Slide 7 text

Meet Gordo

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

JPEG 1920 × 1080 px Max quality 2.3MB

Slide 10

Slide 10 text

A sample site
img-large img-thumb img-thumb img-thumb img-thumb img-xs img-xs

Slide 11

Slide 11 text

Making this site faster Many image sizes stored in your origin

Slide 12

Slide 12 text

Pre-processing Tradeoffs
Good open source options available for DIY
Scale and tailor pre-processing pipeline as you need
Setup, hosting, and operability burden on you
Increases number of stack components
Variability (1-2 vs all)
Costly & takes time

Slide 13

Slide 13 text

Crop with aspect ratio
http://www.fastly.io/gordo.jpg?crop=10:7,offset-x80&width=800
JPEG 800 × 560 px, Quality 85, 47 KB vs 2.3MB
Crop the image square & resize the width to 200px
http://www.fastly.io/gordo.jpg?crop=1:1&width=200
200 × 200 px, Quality 85, 6.93KB vs 2.3MB
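The pixel sizes on this slide follow from the crop ratio and the requested width. A minimal sketch of that arithmetic in Python; the function name `output_size` is hypothetical, the `offset-x` part of the query is ignored, and this is not a description of how the service itself computes sizes:

```python
from fractions import Fraction


def output_size(crop: str, width: int) -> tuple[int, int]:
    """Return (width, height) for an image cropped to `crop` (e.g. "10:7")
    and then resized to `width` pixels wide."""
    ratio_w, ratio_h = (int(part) for part in crop.split(":"))
    # Exact rational arithmetic avoids float rounding surprises.
    height = Fraction(width) * ratio_h / ratio_w
    return width, round(height)


print(output_size("10:7", 800))  # (800, 560)
print(output_size("1:1", 200))   # (200, 200)
```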

Slide 14

Slide 14 text

Hard Things

Slide 15

Slide 15 text

Centralized vs Distributed computation

Slide 16

Slide 16 text

Centralized vs Distributed computation

Slide 17

Slide 17 text

Centralized vs Distributed computation

Slide 18

Slide 18 text

Centralized vs Distributed computation

Slide 19

Slide 19 text

“A distributed system is a collection of independent entities that cooperate to solve a problem that cannot be individually solved”
Kshemkalyani & Singhal, “Distributed Computing: Principles, Algorithms, and Systems”

Slide 20

Slide 20 text

Meeting system Goals

Slide 21

Slide 21 text

Our System’s Goals
Store a single large original
Perform many transformations on the fly
Cache everything
Do this fast!

Slide 22

Slide 22 text

Our System’s Goals
Your origin & its images
gordo-thumb please!

Slide 23

Slide 23 text

Our System’s Goals
Your origin & its images
thanks!
2.3 MB 6.3 KB

Slide 24

Slide 24 text

Serial vs Parallel Computation

Slide 25

Slide 25 text

Serial vs Parallel Computation

Slide 26

Slide 26 text

Image Optimizers

Slide 27

Slide 27 text

Image Optimizers

Slide 28

Slide 28 text

Image Optimizers gordo-thumb please!

Slide 29

Slide 29 text

Image Optimizers gordo-thumb please! original gordo please!

Slide 30

Slide 30 text

Image Optimizers gordo-thumb please! there you go

Slide 31

Slide 31 text

Image Optimizers gordo-thumb please! gordo-thumb please!

Slide 32

Slide 32 text

Image Optimizers gordo-thumb please! Done!

Slide 33

Slide 33 text

Image Optimizers gordo-thumb please! there you go

Slide 34

Slide 34 text

Image Optimizers YASS!

Slide 35

Slide 35 text

Image Optimizers gordo-thumb please!

Slide 36

Slide 36 text

Image Optimizers YASS!

Slide 37

Slide 37 text

ImageOpto Tradeoffs
Parallel request processing
Fully distributed
Relying on CDN for targeted functionality
Stateless
We have to deal with different kinds of parallelization
Increased system complexity
No SPOF but many failure domains

Slide 38

Slide 38 text

Strategies from Literature
Break into subsystems
Randomness to make the worst-case & average-case the same
Only provide strong consistency for the subsystems that need it
1999
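A textbook instance of using randomness to make the worst case look like the average case is picking a quicksort pivot at random, so that no fixed input (such as already-sorted data) reliably triggers quadratic behaviour. This sketch is an illustration chosen here, not an example taken from the talk or from the literature it cites:

```python
import random
from typing import Optional


def quicksort(items: list[int], rng: Optional[random.Random] = None) -> list[int]:
    """Sort `items` using a uniformly random pivot at every level."""
    rng = rng or random.Random()
    if len(items) <= 1:
        return list(items)
    # A random pivot means an adversarial or pre-sorted input is no more
    # likely to produce unbalanced partitions than any other input.
    pivot = items[rng.randrange(len(items))]
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    return quicksort(smaller, rng) + equal + quicksort(larger, rng)


print(quicksort([3, 1, 2, 3]))  # [1, 2, 3, 3]
```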

Slide 39

Slide 39 text

Orthogonality / Composability
CDN for logging
CDN for authentication
CDN for request management
CDN for state management
CDN for purging
CDN for API translation
CDN for doing less work!

Slide 40

Slide 40 text

Meeting System Goals
Use orthogonality to meet your system’s goals
Graceful degradation under faults is a goal too
Simple but composable operations is a good design aesthetic
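For an image service, graceful degradation could take the following shape: if the transformation fails, serve the untouched original rather than an error. This is a sketch under assumptions; `fetch_original` and `transform` are hypothetical callables standing in for the origin fetch and the optimizer, and the talk does not say the real system behaves this way:

```python
from typing import Callable


def serve_image(
    path: str,
    fetch_original: Callable[[str], bytes],
    transform: Callable[[bytes], bytes],
) -> tuple[bytes, bool]:
    """Return (body, degraded). If the optimizer fails, fall back to the original."""
    original = fetch_original(path)
    try:
        return transform(original), False
    except Exception:
        # Treat the optimizer as its own failure domain: a fault there
        # costs extra bytes on the wire, not availability.
        return original, True
```

A caller can log or count the `degraded` flag so that the fallback stays visible instead of silently hiding optimizer faults.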

Slide 41

Slide 41 text

Pug Tradeoff #1 About 90 dB!

Slide 42

Slide 42 text

Global Visibility

Slide 43

Slide 43 text

Centralized vs Distributed insight vs ?

Slide 44

Slide 44 text

System Design & Visibility
Cost of decoding an image
[Diagram labels: Image Optimizers, Caches, Varnish]

Slide 45

Slide 45 text

System Design & Visibility

Slide 46

Slide 46 text

System Design & Visibility

Slide 47

Slide 47 text

System Design & Visibility
Decodes from an uncompressed format are faster
About 90% reduction in processing time by saving the decoded image into an uncompressed format
Let’s use this strategy to provide speedier transformations!
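The strategy on this slide, decoding once and reusing the decoded pixels for later transformations, can be sketched as follows. Everything here is hypothetical: `decode` stands in for an expensive image decoder and the cache is a plain in-process dict, whereas the caches in the talk are Varnish nodes.

```python
class BitmapCache:
    """Cache decoded bitmaps by URL so each original is decoded only once."""

    def __init__(self, decode):
        self._decode = decode
        self._bitmaps = {}
        self.decode_calls = 0  # counts how often the expensive step ran

    def bitmap(self, url: str, encoded: bytes):
        if url not in self._bitmaps:
            self.decode_calls += 1
            self._bitmaps[url] = self._decode(encoded)
        return self._bitmaps[url]


# Stand-in decoder: turns bytes into a list of "pixels".
cache = BitmapCache(decode=lambda data: list(data))
thumb = cache.bitmap("/gordo.jpg", b"\x01\x02\x03")[:2]  # first transformation
xs = cache.bitmap("/gordo.jpg", b"\x01\x02\x03")[:1]     # second reuses the bitmap
print(cache.decode_calls)  # 1
```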

Slide 48

Slide 48 text

System Design & Visibility
[Diagram labels: Image Optimizers, Caches, Varnish nodes]

Slide 49

Slide 49 text

System Design & Visibility
[Diagram labels: Edge Varnish, Shield Varnish, Varnish nodes, Image Optimizers, Origin]

Slide 50

Slide 50 text

System Design & Visibility
[Diagram labels: Edge Varnish, Shield Varnish, Varnish nodes, Image Optimizers, Origin]

Slide 51

Slide 51 text

System Design & Visibility
[Diagram labels: Edge Varnish, Shield Varnish, Varnish nodes, Image Optimizers, Origin]

Slide 52

Slide 52 text

System Design & Visibility
[Diagram labels: Edge Varnish, Shield Varnish, Varnish nodes, Image Optimizers, Origin]

Slide 53

Slide 53 text

System Design & Visibility
[Diagram labels: Edge Varnish, Shield Varnish, Varnish nodes, Image Optimizers, Origin]

Slide 54

Slide 54 text

System Design & Visibility
[Diagram labels: Edge Varnish, Shield Varnish, Varnish nodes, Image Optimizers, Origin]

Slide 55

Slide 55 text

System Design & Visibility
[Diagram labels: Edge Varnish, Shield Varnish, Varnish nodes, Image Optimizers, Origin]

Slide 56

Slide 56 text

System Design & Visibility
[Diagram labels: Edge Varnish, Shield Varnish, Varnish nodes, Image Optimizers, Origin]

Slide 57

Slide 57 text

System Design & Visibility
[Diagram labels: Edge Varnish, Shield Varnish, Varnish nodes, Image Optimizers, Origin]

Slide 58

Slide 58 text

System Design & Visibility
[Diagram labels: Edge Varnish, Shield Varnish, Varnish nodes, Image Optimizers, Origin]

Slide 59

Slide 59 text

System Design & Visibility
[Diagram labels: Edge Varnish, Shield Varnish, Varnish nodes, Image Optimizers, Origin]
YASS!

Slide 60

Slide 60 text

Bitmap Cache Tradeoffs
Faster requests due to decoding image once & caching it
Use shielding to increase the chance of getting a HIT for a given resource
Complex request cycle: “what happened in this request?” was difficult to answer
Starting with a non-trivial solution takes a lot of time
Unexpected interactions with purging & shielding

Slide 61

Slide 61 text

No content

Slide 62

Slide 62 text

Designing system Visibility (again)
[Diagram labels: Edge Varnish, Shield Varnish, Image Optimizers, Origin]

Slide 63

Slide 63 text

Designing system Visibility (again)
[Diagram labels: Edge Varnish, Shield Varnish, Image Optimizers, Origin]

Slide 64

Slide 64 text

Designing system Visibility (again)
[Diagram labels: Edge Varnish, Shield Varnish, Image Optimizers, Origin]

Slide 65

Slide 65 text

Designing system Visibility (again)
[Diagram labels: Edge Varnish, Shield Varnish, Image Optimizers, Origin]

Slide 66

Slide 66 text

Designing system Visibility (again)
[Diagram labels: Edge Varnish, Shield Varnish, Image Optimizers, Origin]

Slide 67

Slide 67 text

Designing system Visibility (again)
[Diagram labels: Edge Varnish, Shield Varnish, Image Optimizers, Origin]

Slide 68

Slide 68 text

Designing system Visibility (again)
[Diagram labels: Edge Varnish, Shield Varnish, Image Optimizers, Origin]

Slide 69

Slide 69 text

Designing system Visibility (again)
[Diagram labels: Edge Varnish, Shield Varnish, Image Optimizers, Origin]

Slide 70

Slide 70 text

Designing system Visibility (again)
[Diagram labels: Edge Varnish, Shield Varnish, Image Optimizers, Origin]
YASS!

Slide 71

Slide 71 text

No Bitmap Cache Tradeoffs
Fast enough requests
Simplified architecture
Simpler request path
Purging works!
Not the theoretical fastest
Needed a few new caching features
We wasted a lot of time & effort

Slide 72

Slide 72 text

Visibility & metadata

Slide 73

Slide 73 text

Visibility & metadata
Stripping image metadata reduces the file size
Smaller files are faster to transform & deliver
Metadata has quality impacting information
EXIF for image orientation
ICC profiles define color attributes / viewing requirements
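Because orientation lives in EXIF, stripping metadata without applying it first can leave an image sideways. The sketch below applies the eight EXIF Orientation values to a tiny pixel grid (a list of rows), as one would before dropping the tag. The grid representation and the function name are illustrative assumptions; a real pipeline would use an image library.

```python
def apply_exif_orientation(grid, orientation):
    """Return `grid` (a list of pixel rows) flipped/rotated so that it
    displays upright once the EXIF Orientation tag (1-8) is stripped."""
    flip_h = lambda g: [row[::-1] for row in g]       # mirror left-right
    flip_v = lambda g: g[::-1]                        # mirror top-bottom
    transpose = lambda g: [list(r) for r in zip(*g)]  # swap rows and columns
    steps = {
        1: lambda g: g,                             # already upright
        2: flip_h,
        3: lambda g: flip_v(flip_h(g)),             # rotate 180 degrees
        4: flip_v,
        5: transpose,
        6: lambda g: flip_h(transpose(g)),          # rotate 90 degrees clockwise
        7: lambda g: flip_v(flip_h(transpose(g))),  # flip across the anti-diagonal
        8: lambda g: flip_v(transpose(g)),          # rotate 90 degrees counter-clockwise
    }
    return [list(row) for row in steps[orientation](grid)]


print(apply_exif_orientation([[1, 2], [3, 4]], 6))  # [[3, 1], [4, 2]]
```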

Slide 74

Slide 74 text

Visibility & ICC metadata

Slide 75

Slide 75 text

“It’s slow”

Slide 76

Slide 76 text

“It’s slow” might mean: one or more of the systems involved in performing a request is slow… one or more of the parts of a pipeline of transformations is slow. “It’s slow” is hard, in part, because the problem statement doesn’t provide many clues to location of the flaw and, until the degradation becomes very obvious, you won’t receive as many resources (time, money, & tooling) to solve it.
Jeff Hodges, “Notes on Distributed Systems for Young Bloods”

Slide 77

Slide 77 text

System Visibility & debugging
[Diagram label: Image Optimizers]
Image size & format
Utilization of hardware resources: CPUs, RAM
Concurrency of used libraries
The network can make you slow
Things you didn’t know you had or that you were not doing well can make you slow

Slide 78

Slide 78 text

Tools in place
Logging in distributed systems: usefulness, verbosity, aggregation, etc.
System health & failure detectors
Inspection of code & libraries used
varnishlog, grep, & a whole lotta Jed

Slide 79

Slide 79 text

System Visibility & debugging

SCOPE  FORMAT       SIZE     AVG response before (ms)  AVG response after (ms)  Response time decrease
FR     WebP         250x250  95                        44                       54%
FR     WebP         72x72    80                        40                       50%
FR     WebP         641x641  200                       170                      15%
FR     JPEG         250x250  95                        150                      -58%
FR     JPEG         72x72    80                        40                       50%
FR     JPEG         641x641  200                       170                      15%
INTER  WebP + JPEG  All      750                       250                      67%
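The last column follows from the two averages as (before - after) / before. A short Python check of that arithmetic against the rows above; the function name is made up for illustration:

```python
def response_time_decrease(before_ms: float, after_ms: float) -> int:
    """Percentage decrease in response time, rounded to a whole percent.
    A negative result means the response got slower."""
    return round((before_ms - after_ms) / before_ms * 100)


# (before, after) pairs for the distinct rows of the table.
rows = [(95, 44), (80, 40), (200, 170), (95, 150), (750, 250)]
print([response_time_decrease(b, a) for b, a in rows])
# [54, 50, 15, -58, 67]
```

The JPEG 250x250 row is the one regression: 95 ms before versus 150 ms after gives -58%.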

Slide 80

Slide 80 text

No Global System Visibility
Ability to reason about your system’s goals & its dependencies is key
Tracking and fixing “slow” is an ongoing activity
Seemingly small amounts of performance variability in critical components quickly add up to create less than ideal conditions*
* Ilya Grigorik, “Building Fast & Resilient Web Applications”

Slide 81

Slide 81 text

Pug Tradeoff #2 99.999% Available

Slide 82

Slide 82 text

No content

Slide 83

Slide 83 text

Resilience

Slide 84

Slide 84 text

System Design & Visibility
[Diagram labels: Edge Varnish, Image Optimizers]

Slide 85

Slide 85 text

System Design & Visibility
[Diagram labels: Edge Varnish, Image Optimizers, X]

Slide 86

Slide 86 text

Resilience & Geographical distribution

Slide 87

Slide 87 text

Geographical Distribution Tradeoffs
Dedicated hardware-based ImageOpto POPs
Beefy machines with tons of cores & RAM
Fast network connectivity
Harder to dynamically grow the service with customer demand
Cannot accommodate customers with origins not in USA

Slide 88

Slide 88 text

Resilience & Geographical distribution
[Diagram labels: IO POP, IO GCP]

Slide 89

Slide 89 text

Resilience & Operability Image Optimizers

Slide 90

Slide 90 text

Resilience & Operability Image Optimizers

Slide 91

Slide 91 text

Resilience & Operability Image Optimizers

Slide 92

Slide 92 text

Resilience & Operability Image Optimizers

Slide 93

Slide 93 text

Resilience & Operability Image Optimizers

Slide 94

Slide 94 text

Resilience & Operability Image Optimizers

Slide 95

Slide 95 text

Resilience & Mixed Mode
Fastly-IO-Info Header
ETags are a function of the server handling a particular request
Use HTTP ETags to guard against output encoding changes
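One way to read this slide: if the ETag depends on the encoder that produced the bytes, a cache revalidating against a server running a different encoder will not mistake the two outputs for the same representation. A hedged sketch in Python; the hashing scheme, the `encoder_version` strings, and both function names are assumptions, not the actual ETag format used by the service:

```python
import hashlib
from typing import Optional


def make_etag(body: bytes, encoder_version: str) -> str:
    """Derive a strong ETag from the output bytes and the encoder that made them."""
    digest = hashlib.sha256(encoder_version.encode() + b"\0" + body).hexdigest()
    return f'"{digest[:16]}"'


def is_not_modified(if_none_match: Optional[str], current_etag: str) -> bool:
    """True when a conditional request may be answered with 304 Not Modified."""
    return if_none_match == current_etag


old = make_etag(b"thumb-bytes", "encoder-v1")
new = make_etag(b"thumb-bytes", "encoder-v2")
print(is_not_modified(old, new))  # False
```

During a mixed-mode rollout, a client holding the `encoder-v1` ETag would be sent the full response by an `encoder-v2` server instead of a 304.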

Slide 96

Slide 96 text

Resilience & Operability
Complex operations make systems less resilient & more incident-prone
New systems/functionality tend to shake out new bugs
Expect everything to be awful (always) so try to isolate your failure domains
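A common way to isolate a failure domain is a circuit breaker: after repeated failures, calls to the dependency are skipped for a cool-down period so that a sick component cannot drag its callers down with it. The talk does not name this pattern; the class below is a simplified sketch with made-up parameter names.

```python
import time


class CircuitBreaker:
    """Skip calls to a failing dependency until a cool-down has passed."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # time the breaker opened, or None if closed

    def call(self, func, *args):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                raise RuntimeError("circuit open: dependency skipped")
            # Cool-down elapsed: close the breaker and let calls through again.
            self._opened_at = None
            self._failures = 0
        try:
            result = func(*args)
        except Exception:
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = self._clock()
            raise
        self._failures = 0  # any success resets the consecutive-failure count
        return result
```

Callers catch the `RuntimeError` and fall back (for example, to a cached or unoptimized response) rather than queueing behind the broken dependency.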

Slide 97

Slide 97 text

Operability Tradeoffs
Pay-as-you-go investment model in system operability/resilience
Redundancies are key
Less to do the less your API does
Corners cut here will come back at the most inopportune time
Complexity sometimes is unavoidable
Cannot be bolted on

Slide 98

Slide 98 text

Ensuring System Resilience
Adding resilience may come at the cost of other desired goals (e.g. time, performance, simplicity, cost, etc.)
Dependencies are hard: customer setup, customer inputs, caching layer, libraries, and other systems. We have to be resilient to all of them
Designing for operability increases robustness

Slide 99

Slide 99 text

Parting Thoughts

Slide 100

Slide 100 text

It’s all about tradeoffs
Tradeoffs are made in context and should be revisited often
Goal tunnel vision may lead you to work harder on the wrong solution
A narrow API that grows later is great, especially in early phases

Slide 101

Slide 101 text

Our tradeoffs in hindsight
GA’ed in April
System evolving & growing
Operability, performance, & increasing resilience are key
Used by companies like Airbnb, Nordstrom Rack, Beatport, Gannett, LaRedoute, 1stdibs, Surfdome, and more!
www.fastly.com/io

Slide 102

Slide 102 text

tl;dr
DESIGN
Simple utilitarian design helps you meet system goals
Few composable operations & expand API later
Keep global system context in mind!
VISIBILITY
Ability to reason about what’s happening in your system is key
Use logging, request tracing, & system instrumentation
Many perspectives
RESILIENCE
Hardening system against failure domains
Have barriers to contain cascading failures
Operability design matters

Slide 103

Slide 103 text

Thank you!
github.com/Randommood/TroubleWithDistribution
@Randommood
Special thanks to: Tyler McMullen, Jed Denlea, Adam Thomason, Ian Fung, Joao Taveira, Ezekiel Templin, Ashok Lalwani, Matt Whiteley, Kyle Kingsbury, Peter Bourgon, Camille Fournier, Caitie McCaffrey, Lorenzo Saino, Elaine Greenberg, & Greg Bako.