When dispatcher caching is not enough... (extended version)

Distributing content to a worldwide audience is not a trivial task. The goal is usually well known: keep your users happy and deliver the content they need as fast as you can.

There are at least two ways to achieve that. You can build (and manage!) your own solution (AEM/dispatcher farms spread across the globe) or put a CDN in front of your application stack. The first option may sound tempting, but on second thought you quickly realize it is too much hassle and you would rather go for a CDN. Regardless of the solution, the set of problems stays the same.

Back in the old days you could just cache (almost) everything, as your website was pretty much static, but today it is much more complicated. Your AEM stack is built from dynamic components that fetch data from 3rd-party apps, there's a search engine under the hood, and all crucial content is available to logged-in users only. To make matters worse, your resources are updated multiple times a day. Is it even possible to leverage a CDN for that type of website?

Have you ever tried to cache customized content that is available only to authenticated users? Or to authorize them at the edge? Or maybe you were crazy enough to put a CDN not only in front of content served from AEM publish, but also in front of your authoring instance? In my talk I'd like to show you how we integrated an AEM app that serves content to users all over the world with a heavily customizable content delivery network (Fastly).

This will be an extended version of the talk I gave at the ConnectCon conference this June. Expect more details and a rich live demo on stage!

Jakub Wądołowski

September 09, 2015

Transcript

  1. To be perfectly honest, initially it was rather like that…
     Image: www.flickr.com/photos/garryknight/5703519506
  2. The client
     - EU pharmaceutical company
     - 75 offices across the globe
     - Over 40 000 employees
     - Medical products available worldwide (180+ countries)
     Image: www.flickr.com/photos/worak/2258271659
  3. Requirements
     - Country specific brochureware websites for medical products
     - iPad app for sales representatives
     - Single point for content entry
     - Multiple integration points (SSO, user/device authentication, etc.)
     - CQ 5.5, upgrade to AEM 6.1 in progress
  4. Logical architecture
     - Single datacenter in London (Rackspace)
     - REST-like API for iPad app
     - Integrations with local and remote services
  5. “Our team in Argentina complains that the app feels slow. They can’t download presentations sometimes. Could you please investigate that?” Mr B.
     Image: www.flickr.com/photos/r4vi/8640618489
  6. Problems
     - Latency, latency, latency…
     - Way too high round trip times (RTT)
     - Timeouts
     - Broken streams
     - Connection resets
     - Poor Internet connections in some areas
  7. When initial excitement was gone…
     - Client-server problems became server-server ones
     - How are we going to sync all the changes (both ways)?
     - What about deployments?
     - Do we have enough licenses?
     - What’s the best way to implement content sharding?
     - How long will it take to implement all of these things?
  8. The road to CDN
     - We can’t just cache more on dispatcher
     - This is a very well known problem
     - Let’s use the right tool to solve the problem the right way
     - Content Delivery Network (CDN) is the way to go!
  9. CDN definition
     “(…) CDN is a large distributed system of servers deployed in multiple data centers across the Internet. The goal of a CDN is to serve content to end-users with high availability and high performance. CDNs serve a large fraction of the Internet content today (…).”, Wikipedia
  10. Why Fastly?
     - Pay-as-you-go model
     - Powered by Varnish
     - Highly customizable (ability to upload your own VCL)
     - 150 ms to purge – globally
     - ~5 sec to change a config through the web API
     - SSD powered servers connected to T1 networks
     - Real-time insight into what’s happening (graphs, logs, etc.)
     - Great support
  11. Logs and content structure
     - grep, awk, sed - all of these are your friends
     - Count your requests
     - Leverage the power of log monitoring tools (ELK, Splunk, etc.)
     - Plan your content structure carefully
  12. Request patterns
     - If it is a GET request and starts with /bin/myapp/v[1-2]/a_string.json then it is X
     - All requests to /content/something/*/_jcr_content.zip end with 302 to /some/path/to/file.zip
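
The request patterns on slide 12 can be turned into a small rule table that labels each request from the access logs. This is a minimal sketch, not the talk's actual tooling: the labels `api` and `presentation-redirect` are hypothetical, and `*` in the second pattern is assumed to match exactly one path segment.

```python
import re

# Hypothetical rule table: (HTTP method, URL regex, label).
# The paths mirror the slide's examples; the labels are invented for illustration.
RULES = [
    ("GET", re.compile(r"^/bin/myapp/v[1-2]/a_string\.json"), "api"),
    # Assumption: "*" on the slide stands for a single path segment.
    ("GET", re.compile(r"^/content/something/[^/]+/_jcr_content\.zip$"),
     "presentation-redirect"),
]


def classify(method, path):
    """Return the label of the first rule matching the request, else 'unclassified'."""
    for rule_method, pattern, label in RULES:
        if method == rule_method and pattern.search(path):
            return label
    return "unclassified"
```

Counting the labels returned by `classify` over a day of logs gives a rough picture of how much traffic each pattern accounts for before any caching rule is written.
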
  13. Content groups/buckets
     - Public content
     - Private content
     - Content available for authorized users only
  14. Varnish in 1 slide!
     - Reverse HTTP proxy
     - In-memory time based cache
     - Blazing-fast
     - Big “state” machine
     - Varnish Configuration Language (VCL)
     - Full control of HTTP flow
  15. General caching rules
     - Cacheable methods: GET, HEAD
     - Cacheable response codes:
       - 200, 203
       - 300, 301, 302
       - 404, 410
     - “Cache-Control: private” if not defined otherwise
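
The general rules on slide 15 read naturally as a single predicate. The sketch below is one interpretation, not the CDN's actual logic: it assumes that a response without a Cache-Control header is treated as private, and that a private response is not stored in the shared cache unless extra configuration says otherwise. The function name is hypothetical.

```python
CACHEABLE_METHODS = {"GET", "HEAD"}
CACHEABLE_STATUSES = {200, 203, 300, 301, 302, 404, 410}


def is_cacheable_by_default(method, status, cache_control=None):
    """Apply the slide's default rules to one request/response pair."""
    if method not in CACHEABLE_METHODS or status not in CACHEABLE_STATUSES:
        return False
    # Assumption: a missing Cache-Control header means "private".
    header = cache_control if cache_control is not None else "private"
    directives = {part.strip().lower() for part in header.split(",")}
    return "private" not in directives
```

Under these assumptions, `is_cacheable_by_default("GET", 200, "public, max-age=600")` is true, while the same response without the header is not cached.
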
  16. iPad – HTTP flows
     - 3 request types
       - REST API request
       - Presentation request (ZIP files)
       - Image request
  17. iPad app content
     - 2 content groups
       - Private
       - For all authorized users
     - 8 request patterns
     - TTL varies from 10 minutes to 7 days
     - 35/65 dynamic/static content (frequently changing JSON files vs PDFs/PNGs)
     - All REST API responses are private
  18. Private content
     - Private content is cacheable
     - What makes an HTTP response private?
       - It is tied to the user session; in other words, the HTTP request carries a unique authorization cookie
  19. Private cache
     - Varnish cache is a key-value store
     - Default key: req.url + req.http.host
     - req.url + req.http.host + sessionId = private cache space - voila!
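
The private cache idea on slide 19 can be illustrated with a plain dictionary keyed by a hash. This is a sketch of the concept in Python, not VCL and not the configuration used in the project; `cache_key` and its parameters are hypothetical names.

```python
import hashlib


def cache_key(url, host, session_id=None):
    """Build a cache key; adding a session ID yields a per-user cache space."""
    parts = [url, host]
    if session_id is not None:
        parts.append(session_id)
    # A newline separator keeps ("ab", "c") distinct from ("a", "bc").
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


# Two users asking for the same URL land in different cache slots.
cache = {}
cache[cache_key("/bin/myapp/v1/a_string.json", "example.com", "session-A")] = "response for A"
cache[cache_key("/bin/myapp/v1/a_string.json", "example.com", "session-B")] = "response for B"
```

The trade-off is cache size: every session gets its own copy of each private object, so this only pays off when the same user requests the same object more than once within the TTL.
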
  20. Dynamic content
     - Cache usually brings some trade-off
     - Updates won’t be instantaneous
       - TTL has to expire, or
       - a purge request has to be triggered
     - CDN is the way to go if you accept this delay
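
The two ways a cached object gets refreshed on slide 20, TTL expiry or an explicit purge, can be shown with a toy in-memory cache. This is an illustrative sketch only; `TtlCache` is a hypothetical class and has nothing to do with how Varnish stores objects internally.

```python
import time


class TtlCache:
    """Toy time-based cache: entries disappear on TTL expiry or explicit purge."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so expiry can be simulated
        self._entries = {}       # key -> (value, expires_at)

    def put(self, key, value, ttl_seconds):
        self._entries[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]   # stale: drop it and report a miss
            return None
        return value

    def purge(self, key):
        self._entries.pop(key, None)
```

Until one of the two events happens, `get` keeps returning the old value, which is exactly the delay the slide asks you to accept.
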
  21. Content purging
     - Fastly exposes purge REST API
     - Purge URL
     - Purge Key
       - Purge all assets marked with special “label”
       - https://www.fastly.com/blog/surrogate-keys-part-1
     - Purge All
     - Purge vs Soft Purge
       - https://www.fastly.com/blog/introducing-soft-purge
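
The purge variants on slide 21 map to a handful of HTTP requests. The sketch below only builds request descriptions and sends nothing. The endpoint shapes and header names (`Fastly-Key`, `Fastly-Soft-Purge`) follow Fastly's public purge API as commonly documented; treat them as assumptions and check the current documentation before relying on them. The function names, service ID and token values are hypothetical.

```python
API = "https://api.fastly.com"  # assumed API base URL


def purge_url_request(url, soft=False):
    """Purge a single URL: an HTTP PURGE sent to the URL itself."""
    headers = {"Fastly-Soft-Purge": "1"} if soft else {}
    return ("PURGE", url, headers)


def purge_key_request(service_id, surrogate_key, api_token, soft=False):
    """Purge every object tagged with the given surrogate key ("label")."""
    headers = {"Fastly-Key": api_token}
    if soft:
        # Soft purge marks objects as stale instead of evicting them.
        headers["Fastly-Soft-Purge"] = "1"
    return ("POST", f"{API}/service/{service_id}/purge/{surrogate_key}", headers)


def purge_all_request(service_id, api_token):
    """Purge everything cached for the service."""
    return ("POST", f"{API}/service/{service_id}/purge_all", {"Fastly-Key": api_token})
```

Each function returns a `(method, url, headers)` tuple that any HTTP client can send. Purging by key assumes the origin tagged its responses with a `Surrogate-Key` header beforehand, as described in the linked blog post.
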
  22. Speed boost
     - Presentation downloads
       - Europe: up to 21% faster
       - South America: up to 50% faster
       - APAC: up to 83% faster
     - API responses
       - Europe: up to 60% faster
       - South America: up to 40% faster
       - APAC: up to 55% faster
  23. Crimes against cacheability
     - Adding Set-Cookie to every response
     - Auth cookie is not revoked in the browser after logout
     - TBD
  24. “iPad app performance is much better now! But we still have some issues with authoring. It is really slow in some countries.” Mr B.
     Image: www.flickr.com/photos/r4vi/8640618489
  25. CDN in front of authoring?
     - I was rather skeptical
     - Way too dynamic to be considered cacheable?
     - What kind of improvement might we get? 5-10%? Is it worth it?
     - Don’t know how, but it has been decided to roll things out
  26. CDN + AEM Author
     - 3 content groups
     - 36 request patterns
     - TTL up to 14 days
     - Mostly dynamic + static web GUI resources
     - A lot of assets common for every logged in user

     Request pattern                                           Cacheable?
     /apps/cq/core/content/login/.*(png|jpg|css|js)$           YES
     /libs/cq/i18n/dict.en.json                                YES
     /etc/.*\.(png|woff|css|js|jpg|gif|ttf|svg|eot|swf|ico)$   YES
     /cf#/content/myapp/en/about.html                          NO
  27. Authorize at the edge
     - CDN knows nothing about user session
     - The goal is to cache common content for successfully authorized users
     - Authorize them at the edge!
  28. Auth tokens
     - 2nd auth cookie (token), readable by CDN
     - HMAC function
     - 2 auth cookies are tied together
     - Reference implementation: https://github.com/fastly/token-functions
     - Private key shared between AEM and CDN
     - CDN can evaluate user session without request to AEM
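
The token scheme on slide 28 can be sketched with Python's standard `hmac` module. This is not the referenced Fastly implementation and not the project's code: the token layout (`<expiry>_<signature>`), the signed message format and the function names are assumptions made for illustration. Signing the session cookie value is one way to tie the two cookies together.

```python
import hashlib
import hmac


def make_token(secret, session_id, expires_at):
    """Sign the session ID and expiry time with the shared secret (bytes)."""
    message = f"{session_id}|{expires_at}".encode("utf-8")
    signature = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return f"{expires_at}_{signature}"


def verify_token(secret, session_id, token, now):
    """Check the token at the edge, without any request to the origin."""
    expires_text, separator, _signature = token.partition("_")
    if not separator:
        return False
    try:
        expires_at = int(expires_text)
    except ValueError:
        return False
    if now >= expires_at:
        return False  # token expired
    expected = make_token(secret, session_id, expires_at)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), token.encode("utf-8"))
```

The origin would issue the token cookie at login with `make_token`, and the edge would call the equivalent of `verify_token` on each request, passing the session cookie value and the current Unix time.
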
  29. Crimes against cacheability
     - Adding Set-Cookie to every response
     - Auth cookie is not revoked in the browser after logout
     - “Vary: Cookie” usage
  30. AEM deployments
     - Does every deploy involve full CDN cache purge?
       - Nope!
     - iPad presentations are packaged in a ZIP file and versioned
     - Majority of authoring related cacheable assets stay untouched between deployments
  31. Summary
     - Traffic growth is no longer an issue
       - Over 2 TB monthly reaches CDN servers
       - ~5.5 million HTTP requests per month
       - just ~570 GB was passed through to AEM
     - License, budget and time savings
     - More than satisfying results
     - Very small changes in the AEM app itself
     - Happy client