The underappreciated power of content invalidation

Some say there are only two hard things in computer science: cache invalidation and naming things. The latter is a matter of personal taste, but keeping things up to date is something many of us have to deal with on a daily basis.

This is particularly important from a content delivery standpoint. Whenever you log into AEM to activate your carefully crafted pages, the expectation is straightforward: you want them delivered fast, regardless of the user’s location or device type. This is where a Content Delivery Network (CDN) comes into play.
Putting a CDN in front of AEM is fairly common these days. The most basic implementation caches only static resources: DAM assets, clientlibs and possibly pages that don’t change very often. You set a reasonably short TTL and call it a day. But what if that’s not enough? What if you don’t want to sacrifice dynamic content, and would rather keep objects cached for as long as they haven’t been updated in AEM?
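The difference between a short TTL and a long TTL backed by explicit invalidation shows up directly in the `Cache-Control` header. A minimal sketch, assuming the 30-day figure from the talk's plan ("Long TTLs (30+ days)") for the shared CDN cache and a hypothetical 5-minute TTL for everything else; `max-age` and `s-maxage` are standard HTTP directives, while the helper name and constants are made up for illustration.

```python
from datetime import timedelta
from typing import Optional

# Hypothetical values: 30 days follows the talk's "Long TTLs (30+ days)" plan,
# 5 minutes is only an illustrative short TTL.
SHORT_TTL = timedelta(minutes=5)
LONG_TTL = timedelta(days=30)


def cache_control(ttl: timedelta, shared_ttl: Optional[timedelta] = None) -> str:
    """Build a Cache-Control value (hypothetical helper).

    max-age applies to every cache, browsers included; s-maxage, when given,
    overrides it for shared caches such as a CDN.
    """
    parts = ["public", f"max-age={int(ttl.total_seconds())}"]
    if shared_ttl is not None:
        parts.append(f"s-maxage={int(shared_ttl.total_seconds())}")
    return ", ".join(parts)


print(cache_control(SHORT_TTL))            # public, max-age=300
print(cache_control(SHORT_TTL, LONG_TTL))  # public, max-age=300, s-maxage=2592000
```

Keeping the browser TTL short is one possible design choice: a CDN can be purged through its API, but copies already sitting in visitors' browsers cannot.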

We decided to give the “everything is cacheable” principle a try in a complex multi-AEM setup. It didn’t take us long to realize that content invalidation is as important as caching itself, if not more so. You can treat it roughly, but that will hurt the user experience: neither authors nor users want to wait when a change is needed now.

Jakub Wądołowski

September 12, 2018

Transcript

  1. APACHE SLING & FRIENDS TECH MEETUP, 10-12 SEPTEMBER 2018
    The underappreciated power of content invalidation
    Jakub Wądołowski, Cognifide (@jwadolowski)
  2. "There are only two hard things in Computer Science: cache invalidation and naming things." (Phil Karlton)
    https://flic.kr/p/c7a8iE
  3. What can be cached?
    ▪ HTTP response (body & headers)
    ▪ What makes a response cacheable?
    ▪ Static assets only?
    ▪ Body encoding matters
      ▪ gzip, deflate, br, …
    https://flic.kr/p/EWtcyL
  4. Caching layers
    ▪ AEM (custom in-memory cache)
    ▪ Dispatcher
    ▪ CDN
    ▪ Web browser
    https://flic.kr/p/Vh2TXE
  5. Dispatcher invalidation
    ▪ 3 primary techniques
      ▪ by folder level (statfile)
      ▪ TTLs
      ▪ resource only
    ▪ Extras
      ▪ refetching
      ▪ custom script
    https://flic.kr/p/mAYbSi
  8. Nothing fancy, huh?
    ▪ Plenty of possibilities
      ▪ handle related resources
      ▪ invalidate CDN’s cache
      ▪ push data to external systems
      ▪ …
    https://flic.kr/p/qZygAC
  9. The plan
    ▪ High hit ratio & cache coverage
    ▪ Long TTLs (30+ days)
    ▪ AEM content changes reflected quickly
    ▪ Precise invalidation
    https://flic.kr/p/7F4bHa
  10. Content view
    ▪ Dispatcher mirrors AEM content structure
    ▪ SEO-optimized customer-facing URLs
    https://flic.kr/p/7rstvj
  12. Invalidation APIs
    ▪ Multiple endpoints
      ▪ feature/authentication differences
      ▪ performance may vary
    ▪ Invalidation scope
      ▪ everything
      ▪ URL(s)
      ▪ tag
    https://flic.kr/p/dgfRgD
  13. DAM assets
    ▪ Sounds trivial?
    ▪ Renditions
    ▪ Alt texts
    ▪ Non-standard properties
    https://flic.kr/p/9d2MRh
  14. Alt text (2/3)
    ▪ Get all the pages that use a given asset
    ▪ Query Builder API
  15. Vanity URLs
    ▪ Each page can have a number of easy-to-remember URLs
    ▪ Those pages are cacheable too
    https://flic.kr/p/GDprQk
  16. Reusable components
    ▪ Common elements (e.g. header)
    ▪ SDI (Sling Dynamic Include) to the rescue
      ▪ SSI (Server Side Includes)
      ▪ ESI (Edge Side Includes)
    https://flic.kr/p/dgghVK
  19. Tag-based invalidation (1/4)
    ▪ Not everything can be wrapped into SSI/ESI
      ▪ Layout/templates
      ▪ HTML markup configs
    ▪ Is QueryBuilder the only option?
    https://flic.kr/p/25LNnfK
  21. Versioned clientlibs
    ▪ Generally a good idea
    ▪ Does it fit well into a CDN setup?
    https://flic.kr/p/9gTRKV
  23. We barely scratched the surface…
    ▪ API throttling
    ▪ Error handling
    ▪ Payload limits
    ▪ Non-obvious content references
    https://flic.kr/p/pSB1rM
  24. Lessons learned
    ▪ purge all is dangerous
    ▪ make invalidation as precise as possible
    ▪ plan SSI/ESI from the very beginning
    ▪ QueryBuilder calls may require extra indexes
    https://flic.kr/p/dgge8U