Integrating Multiple CDN Providers: Our Experience at Etsy

Until recently, Etsy relied on a single content delivery network (CDN) to optimize delivery of cacheable content and to accelerate dynamic content for our community. Being tied to a single vendor for such a critical piece of site delivery infrastructure limits your flexibility.

These limits span several dimensions and can leave your site dependent on a single vendor’s availability. Diversifying your CDN providers puts control back in your hands, letting you balance performance, reliability and cost.

In this talk Marcus Barczak and Laurie Denness will present the journey Etsy has undertaken over the past 12 months to introduce two additional CDN providers into Etsy’s site delivery infrastructure. They will present:

* Etsy’s rationale for wanting to integrate multiple CDN providers
* The test methodology used for evaluating each provider against live production traffic
* Tooling and monitoring infrastructure built to support multiple providers
* The issues and challenges faced with a multiple vendor solution

The approach and techniques used in the design, evaluation and implementation of Etsy’s multi-CDN solution are applicable to a wide range of use cases. Attendees should leave this talk with ideas for establishing meaningful evaluation criteria, strategies for conducting A/B split testing in a live production environment, and techniques they can explore for implementing multiple CDNs within their own environment.

Marcus Barczak

November 14, 2013

Transcript

  1. @lozzd • @ickymettle Background
     ▪ First started using a single CDN in 2008
     ▪ Exponential Growth
     ▪ Start of 2012 began investigation into running multiple CDNs
  2. Why use a CDN?
     ▪ Goal: Consistently fast user experience globally
     ▪ Improve last mile performance by caching content close to the user
     ▪ Offload content delivery from origin infrastructure to the CDN provider
  3. Why use more than one CDN?
     ▪ Resilience - Eliminate single point of failure
     ▪ Flexibility - Balance traffic based on business requirements
     ▪ Cost - Manage provider costs
  4. The Plan
     1. Establish evaluation criteria
     2. Initial configuration and testing
     3. Test with production traffic
     4. Operationalising
  5. Performance
     ▪ Baseline Response Times
       - Should be within ±5% of our existing CDN provider’s response times
     ▪ Hit Ratios and Origin Offload
       - Provider should achieve equivalent or better origin offload performance and hit ratios
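The ±5% baseline criterion can be written down as a simple check. This is a minimal sketch, assuming each provider's response time has been summarised as one number (for example a median in milliseconds); the function name and the sample values are hypothetical and do not come from the talk.

```python
def within_baseline(candidate_ms: float, baseline_ms: float, tolerance: float = 0.05) -> bool:
    """Return True if the candidate is within +/- tolerance of the baseline."""
    if baseline_ms <= 0:
        raise ValueError("baseline_ms must be positive")
    return abs(candidate_ms - baseline_ms) / baseline_ms <= tolerance


# Hypothetical numbers, for illustration only.
print(within_baseline(103.0, 100.0))  # candidate is 3% slower than the baseline
print(within_baseline(110.0, 100.0))  # candidate is 10% slower than the baseline
```

Comparing the same summary statistic for both providers over the same time window keeps the comparison fair.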
  6. Configuration
     ▪ Complexity - how complex is the provider’s configuration system
     ▪ Self service - can you make changes directly or do they require professional services or other intervention
     ▪ Latency for changes - how quickly do changes propagate
  7. Culture
     ▪ Understand our culture
     ▪ Postmortems
     ▪ Access to technical staff
     ▪ Shared success
  8. Clean the house
     ▪ Managing caching TTLs from origin
       - CDNs honour the origin cache-control headers!

         <LocationMatch "\.(gif|jpg|jpeg|png|css|js)$">
             Header set Cache-Control "max-age=94670800"
         </LocationMatch>
  9. Clean the house
     ▪ Manage gzip compression from origin
       - Honoured by CDNs
       - Compression from origin to CDN

         ## mod_deflate compression - see OPS-1537 ##
         AddOutputFilterByType DEFLATE text/html text/plain text/css application/x-javascript [..]
  10. curl -i -H 'Host: img0.etsystatic.com' \
          global-ssl.fastly.net/someimage.jpg

      HTTP/1.1 200 OK
      Server: Apache
      Last-Modified: Sat, 09 Nov 2013 23:43:38 GMT
      Cache-Control: max-age=94670800
      [...]
      X-Served-By: cache-lo82-LHR
      X-Cache: MISS
      X-Cache-Hits: 0
  11. curl -i -H 'Host: img0.etsystatic.com' \
          global-ssl.fastly.net/someimage.jpg

      HTTP/1.1 200 OK
      Server: Apache
      Last-Modified: Sat, 09 Nov 2013 23:43:38 GMT
      Cache-Control: max-age=94670800
      [...]
      X-Served-By: cache-lo82-LHR
      X-Cache: HIT
      X-Cache-Hits: 1
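The two curl responses differ only in `X-Cache` and `X-Cache-Hits`: a MISS on the first request, then a HIT once the object is cached. A smoke test could assert that pattern automatically. The sketch below only parses header text of the kind shown in the slides and makes no network requests; the function names are hypothetical, and it assumes `X-Cache` carries a single value as in these examples.

```python
def parse_headers(raw: str) -> dict:
    """Parse 'Name: value' lines from a raw HTTP response head into a dict."""
    headers = {}
    for line in raw.splitlines()[1:]:  # skip the status line
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


def is_cache_hit(raw: str) -> bool:
    """True when the X-Cache header reports a HIT."""
    return parse_headers(raw).get("x-cache", "").upper().startswith("HIT")


first = "HTTP/1.1 200 OK\nCache-Control: max-age=94670800\nX-Cache: MISS\nX-Cache-Hits: 0\n"
second = "HTTP/1.1 200 OK\nCache-Control: max-age=94670800\nX-Cache: HIT\nX-Cache-Hits: 1\n"
print(is_cache_hit(first), is_cache_hit(second))
```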
  12. Mean Time To Curl
      ▪ No need to touch existing infrastructure
      ▪ Smoke test of functionality
      ▪ 10 minutes from configuration to curl
      ▪ New providers should be plug and play
  13. Testing with Production Traffic
      ▪ Images only at first
      ▪ Good test of caching performance
      ▪ Easy to test by swapping hostnames
      ▪ Made even easier with our A/B testing framework
  14. A/B Test Framework
      ▪ Fine grained control
      ▪ Enable test for specific users or groups
      ▪ Percentage of users
      ▪ All controlled via configuration in code
      ▪ Rapid and complete rollback
  15. Configure Mappings to CDNs

      $server_config["image"] = array(
          'akamai' => array(
              'img0-ak.etsystatic.com',
              'img1-ak.etsystatic.com',
          ),
          'edgecast' => array(
              'img0-ec.etsystatic.com',
              'img1-ec.etsystatic.com',
          ),
          'fastly' => array(
              'img0-f.etsystatic.com',
              'img1-f.etsystatic.com',
          ),
      );
  16. Test Controls

      $server_config['ab']['cdn'] = array(
          'enabled' => 'on',
          'weights' => array(
              'akamai' => 0.0,
              'edgecast' => 0.0,
              'fastly' => 0.0,
              'origin' => 100.0,
          ),
          'override' => 'cdn_diversity',
      );
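The slides do not show how the A/B framework turns these weights into a per-user decision. One common approach, sketched here as an assumption rather than Etsy's implementation, is to hash each user into a stable point in [0, 100) and walk the cumulative weights. The function names and the choice of SHA-256 are hypothetical.

```python
import hashlib


def bucket(user_id: str, test_name: str = "cdn") -> float:
    """Map a user deterministically to a point in [0, 100)."""
    digest = hashlib.sha256(f"{test_name}:{user_id}".encode()).hexdigest()
    return int(digest[:8], 16) / 0x100000000 * 100.0


def pick_variant(user_id: str, weights: dict) -> str:
    """Walk the cumulative weights and return the variant owning the user's bucket."""
    point = bucket(user_id)
    cumulative = 0.0
    for name, weight in weights.items():
        cumulative += weight
        if point < cumulative:
            return name
    raise ValueError("weights must sum to 100")


# Weights as in the slide: every user is still served from origin.
weights = {"akamai": 0.0, "edgecast": 0.0, "fastly": 0.0, "origin": 100.0}
print(pick_variant("user-123", weights))
```

Because the bucket depends only on the user and the test name, a user keeps the same variant as long as the weights are unchanged, and setting a provider's weight back to 0.0 removes it from rotation immediately.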
  17. Metrics and Monitoring
      ▪ Get more detail by pulling metrics in house
      ▪ Write script to pull data from API
      ▪ Create dashboards with data
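Later slides name Graphite as the metrics store, so a script of this kind has to turn vendor statistics into Graphite data points. The sketch below formats a flat dictionary in Graphite's plaintext protocol (one `path value timestamp` line per metric, conventionally sent to carbon on TCP port 2003). The vendor API call is replaced by a hypothetical dictionary, and the metric names and host are assumptions.

```python
import socket
import time


def to_graphite_lines(prefix: str, stats: dict, timestamp: int) -> str:
    """Render a flat dict of numbers as Graphite plaintext-protocol lines."""
    return "".join(
        f"{prefix}.{name} {value} {timestamp}\n" for name, value in sorted(stats.items())
    )


def send_to_graphite(payload: str, host: str = "localhost", port: int = 2003) -> None:
    """Send the payload to a carbon listener over TCP."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload.encode("ascii"))


# Stand-in for what a vendor API call might return.
sample = {"requests": 1200, "hits": 1100, "bandwidth_bytes": 52428800}
payload = to_graphite_lines("cdn.fastly", sample, int(time.time()))
print(payload, end="")  # call send_to_graphite(payload) to actually ship it
```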
  19. Testing Plan
      1. for c in $cdns; do rampup $c; done;
      2. Deliberately slow and steady
      3. Watch traffic increase
      4. Watch origin offload increase
      5. Watch performance
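The `rampup` step in the plan is shorthand; the slides do not define it. A minimal sketch of what a slow and steady ramp could look like, assuming traffic is moved from origin to one CDN in fixed increments. The step sizes and the function name are hypothetical.

```python
def ramp_steps(cdn: str, steps=(1.0, 5.0, 10.0, 25.0, 50.0)):
    """Yield weight maps that move traffic gradually from origin to one CDN."""
    for share in steps:
        yield {cdn: share, "origin": 100.0 - share}


for weights in ramp_steps("fastly"):
    # In practice each step would be deployed and the graphs watched before continuing.
    print(weights)
```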
  20. Downsides of this approach
      ▪ AB testing can’t be used for main site
      ▪ Exposing your test CNAMEs
      ▪ Especially if hotlinking is a concern
  21. Downsides of this approach
      ▪ Exposing your test CNAMEs
      ▪ Especially if hotlinking is a concern
  22. How do you know it’s broke?
      ▪ Check the graphs!
      ▪ Check with your community
      ▪ Keep support in the loop
  23. Etsy’s site partitioning
      ▪ Listing Images, Avatars: imgX.etsystatic.com
      ▪ Static Assets (js, css, fonts): site.etsystatic.com
      ▪ Dynamic HTML Content: www.etsy.com
  24. Balancing Traffic Using DNS
      ▪ Traffic Manager
      ▪ Extends DNS to dynamically return records based on rules
      ▪ Weighted round robin
  25. Balancing Traffic Using DNS

      [2589:~] $ dig +short www.etsy.com
      www.etsy.com.edgekey.net.
      e2463.b.akamaiedge.net.
      23.74.122.37
      [2589:~] $ dig +short www.etsy.com
      cs34.adn.edgecastcdn.net.
      93.184.219.54
      [2589:~] $ dig +short www.etsy.com
      global-ssl.fastly.net.
      185.31.19.184
      [2589:~] $ dig +short www.etsy.com
      etsy.com.
      38.123.123.123
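The same query returns a different CNAME on each run because the traffic manager picks a target according to its configured weights. The simulation below illustrates weighted selection over the four targets seen in the dig output. The weights are invented for the example, and the traffic manager's real selection algorithm is not described in the slides.

```python
import random
from collections import Counter

# CNAME targets taken from the dig output above; the weights are made up.
TARGETS = {
    "www.etsy.com.edgekey.net.": 40,
    "cs34.adn.edgecastcdn.net.": 30,
    "global-ssl.fastly.net.": 30,
    "etsy.com.": 0,
}


def resolve(rng: random.Random) -> str:
    """Return one target, chosen with probability proportional to its weight."""
    names = list(TARGETS)
    return rng.choices(names, weights=[TARGETS[n] for n in names], k=1)[0]


rng = random.Random(42)  # seeded so runs are repeatable
counts = Counter(resolve(rng) for _ in range(10_000))
for name, count in counts.most_common():
    print(f"{name:32} {count / 100:.1f}%")
```

Real clients see a coarser distribution than this, since resolvers cache each answer for the record's TTL.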
  27. Balancing Traffic Using DNS
      ▪ Rule updates typically made via web UI
      ▪ Can be slow and error prone
      ▪ Changes need to be applied to all three domains
      ▪ API available to make changes programmatically
  28. DNS balancing downsides
      ▪ Low TTLs for fast convergence
      ▪ More DNS lookups for users
      ▪ Not 100% instant or deterministic
      ▪ Mo QPS == Mo Money
  29. Whoopsie Page
      ▪ Static HTML delivered for 5xx errors
        - Branding
        - Translated error messages
        - Links to status page
  31. Failure Beacons
      1. 1x1 tracking pixel embedded in page

         [...]
         <img src="//failure.etsy.com/status/images/beacon.gif?beacon_source=fastly_origin_failure-etsy.com">
         </body>
         </html>
  32. Failure Beacons
      1. 1x1 tracking pixel embedded in page
      2. Request creates an access log line
  33. Failure Beacons
      1. 1x1 tracking pixel embedded in page
      2. Request creates an access log line
      3. Scrape them out minutely using logster

         self.reg = re.compile('^\S+(\s:)? (?P<remote_addr>[0-9\.]+),? [0-9\.,\- ]+ \[[^\]]+\] \"GET /status/images/beacon\.gif\?(beacon_)?source=(?P<source>\S+) HTTP/1\.\d\" \d+ [\d\-]+ \"(?P<referrer>[^\"]+)\" \"(?P<user_agent>[^\"]+)\" .*$')
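To see what the scraping step extracts, the pattern can be run against a log line. The sketch below uses the slide's regular expression with the spaces introduced by line wrapping removed, written as a raw string. The sample access log line is hypothetical, constructed only to satisfy the pattern, since the slides do not show Etsy's actual log format; `count_sources` is likewise an illustrative helper, not part of logster.

```python
import re
from collections import Counter

# Pattern from the slide, with the line-wrap spaces removed.
BEACON_RE = re.compile(
    r'^\S+(\s:)? (?P<remote_addr>[0-9\.]+),? [0-9\.,\- ]+ \[[^\]]+\] '
    r'\"GET /status/images/beacon\.gif\?(beacon_)?source=(?P<source>\S+) HTTP/1\.\d\" '
    r'\d+ [\d\-]+ \"(?P<referrer>[^\"]+)\" \"(?P<user_agent>[^\"]+)\" .*$'
)

# Hypothetical access log line; the real log format is not shown in the slides.
SAMPLE = (
    'web01 203.0.113.7, 10.0.0.1 - [31/Oct/2013:07:06:42 +0000] '
    '"GET /status/images/beacon.gif?beacon_source=fastly_origin_failure-etsy.com HTTP/1.1" '
    '200 43 "http://www.etsy.com/" "Mozilla/5.0" 0'
)


def count_sources(lines):
    """Count beacon hits per source, ignoring lines that do not match."""
    counts = Counter()
    for line in lines:
        match = BEACON_RE.match(line)
        if match:
            counts[match.group("source")] += 1
    return counts


print(count_sources([SAMPLE, "unrelated line"]))
```

The per-source counts are the kind of value a parser would then report once per minute.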
  34. Failure Beacons
      1. 1x1 tracking pixel embedded in page
      2. Request creates an access log line
      3. Scrape them out minutely using logster
      4. Logster posts event counts to Graphite
  36. Failure Beacons
      1. 1x1 tracking pixel embedded in page
      2. Request creates an access log line
      3. Scrape them out minutely using logster
      4. Logster posts event counts to Graphite
      5. Alert on Graphite graph in Nagios
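The final step, alerting from Nagios, comes down to comparing recent beacon counts against thresholds and reporting a state. Nagios plugins signal state through their exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The sketch below covers only that decision logic; fetching the series from Graphite is omitted, and the thresholds and function name are assumptions rather than values from the talk.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_beacons(counts_per_minute, warn=10, crit=50):
    """Return a (status, message) pair in the style of a Nagios plugin."""
    values = [v for v in counts_per_minute if v is not None]  # drop empty data points
    if not values:
        return UNKNOWN, "UNKNOWN - no data points"
    latest = values[-1]
    if latest >= crit:
        return CRITICAL, f"CRITICAL - {latest} failure beacons/min"
    if latest >= warn:
        return WARNING, f"WARNING - {latest} failure beacons/min"
    return OK, f"OK - {latest} failure beacons/min"


status, message = check_beacons([0, 2, None, 64])
print(message)
# A real plugin would finish with sys.exit(status).
```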
  38. Failure Beacons
      ▪ Optional extra debugging information

        [31/Oct/2013:07:06:42 +0000] "GET /status/images/beacon.gif?beacon_source=fastly_origin_failure-etsy.com&provider_error=Connection%20timed%20out&server_identity=cache-ny57-NYC HTTP/1.1"
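The extra debugging fields travel as ordinary query parameters, so they can be recovered with standard URL parsing. A small sketch using the request line from the slide; the helper name `beacon_fields` is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

REQUEST_LINE = (
    "GET /status/images/beacon.gif?beacon_source=fastly_origin_failure-etsy.com"
    "&provider_error=Connection%20timed%20out"
    "&server_identity=cache-ny57-NYC HTTP/1.1"
)


def beacon_fields(request_line: str) -> dict:
    """Extract the query parameters from an HTTP request line."""
    _method, target, _version = request_line.split(" ")
    query = urlsplit(target).query
    # parse_qs returns a list per key; each key appears once here.
    return {key: values[0] for key, values in parse_qs(query).items()}


fields = beacon_fields(REQUEST_LINE)
print(fields["provider_error"])
```

Percent-encoded values such as `Connection%20timed%20out` are decoded by `parse_qs`, which makes the provider's error message readable in logs and dashboards.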
  39. Tracking Requests to Origin

      GET / HTTP/1.1
      User-Agent: curl/7.24.0
      Accept: */*
      X-Forwarded-Host: www.etsy.com
      [...]
      X-CDN-Provider: edgecast
      [...]
      Host: www.etsy.com
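With each CDN tagging its origin requests via `X-CDN-Provider`, origin traffic can be broken down per provider. A minimal sketch, assuming requests are available as header dictionaries. Only the `edgecast` value appears in the slides; the `fastly` value and the `direct` label for untagged requests are assumptions.

```python
from collections import Counter


def provider_of(headers: dict) -> str:
    """Return the CDN named in X-CDN-Provider, or 'direct' if the header is absent."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return lowered.get("x-cdn-provider", "direct")


requests_seen = [
    {"Host": "www.etsy.com", "X-CDN-Provider": "edgecast"},
    {"Host": "www.etsy.com", "x-cdn-provider": "fastly"},  # header names are case-insensitive
    {"Host": "www.etsy.com"},
]
print(Counter(provider_of(h) for h in requests_seen))
```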
  41. Backend Monitoring
      ▪ Logster on CDN provider header
      ▪ Vendor APIs to bring data in house
  42. Backend Monitoring
      ▪ Vendor APIs to bring data in house
      ▪ Data in-house benefits include
        - Integration with our anomaly detection systems
        - Consistent and unified view of all CDN metrics
        - We control data retention period
  43. Awareness
      ▪ Over 100 engineers
      ▪ Deploying 60 times a day
      ▪ Correlating external and internal services
  44. Frontend Monitoring
      ▪ Performance is important to us
      ▪ Monitoring overall site performance
      ▪ Monitoring performance by CDN provider
      ▪ Real User Monitoring on key pages to track page performance
  45. Frontend Monitoring
      ▪ Performance is important to us
      ▪ Monitoring overall site performance
      ▪ Monitoring performance by CDN provider
      ▪ SOASTA mPulse on key pages to track real user page performance
  46. Debugging: What broke?
      ▪ MTTD/MTTR can be extremely low with this system
      ▪ But not always
  48. Debugging: What broke?
      ▪ Non-technical member base
      ▪ Confusing and time consuming
      ▪ Amazing support team
      ▪ Log as much information as possible
  49. Great success
      ▪ 12 months in, the benefits have far outweighed the few downsides
      ▪ We’re continuing to evolve the system
      ▪ We’ll be sure to share our experience with the community along the way
  50. Links/Open Source
      ▪ cdncontrol
        http://github.com/etsy/cdncontrol
        http://github.com/etsy/cdncontrol_ui
      ▪ logster
        http://github.com/etsy/logster
      ▪ CDN API to Graphite scripts
        http://github.com/lozzd/cdn_scripts