
The curious case of misrendered JSON

Have you ever felt that AEM is playing tricks on you? I bet you have. This is one of those stories: the more you dig in, the less you understand.

It was a classic headless setup: AEM's role was to expose content via a REST-like JSON API so that various applications could consume the data. Things were running smoothly until one day a misrendered JSON response showed up, with a set of mandatory properties simply missing from the object. The case was investigated thoroughly, but no one could even reproduce it. Nothing had changed at the JCR level, there had been no deployment in the meantime, and when you visited the exact same URL, all the data were correct. “Oh, that must have been a one-off incident,” someone said. The ticket got closed and life went on.

A week later a similar issue was reported. A different JSON object was broken this time, but at least you could reproduce it. Unfortunately, an hour later the problem magically went away. Time passed and a slightly different variant of the problem surfaced in production: you kept requesting the affected URL and the response alternated between completely valid JSON and its broken form. Your team hopped on a call to get to the bottom of it, but within minutes the problem vanished without a trace again.

Interested in what happened and where we ended up? That’s what this talk is about.

Jakub Wądołowski

September 25, 2023

Transcript

  1. EUROPE'S LEADING AEM DEVELOPER CONFERENCE, 25th – 27th SEPTEMBER 2023
     The curious case of misrendered JSON
     Jakub Wądołowski, diva-e (@jwadolowski)
  2. Project context
     ▪ e-commerce platform
     ▪ AEMaaCS as headless content API (JSON)*
     ▪ Sling Model Exporter
     ▪ Hyperlinked API (driven by content structure)
     ▪ Custom SPA Editor (WYSIWYG)
     https://flic.kr/p/xJi1En
  3. June 2021
     “Approx. 29% of AEM requests end with misrendered JSON and 200 status code”
     https://flic.kr/p/9bUbH3
  4. Missing JSON data reasons
     ▪ Content issues
     ▪ Deployment / caching issue
     ▪ AEM downtime / maintenance
     https://flic.kr/p/bJoDYk
  5. Troubleshooting phase
     ▪ Not reproducible on local AEM
     ▪ Debug loggers / headers
     ▪ Page (re)activation solves the problem
     ▪ Cache bypassing tricks
     ▪ Ongoing monitoring
     https://flic.kr/p/c5xUxS
  6. Analysis results
     ▪ Incomplete AEMaaCS logs (Loki 2.3 issue)
     ▪ No relevant log entries
     ▪ New endpoints with invalid JSON
     ▪ Bypassed cache == issue is gone*
     ▪ Warmup service goes first*
     https://flic.kr/p/dgfRgD
  7. AEMaaCS startup and warmup service
     ▪ AEMaaCS /systemready probe
     ▪ Warmup service
       ▪ goal: pre-populate dispatcher cache
       ▪ requests the most popular URLs (last 24h)
       ▪ Host header is taken into account
     https://flic.kr/p/xGJnBw
  8. New hope
     ▪ SKYOPS-16686 (~mid July 2021)
       ▪ dead end
     ▪ SKYOPS-17857 (~mid Aug 2021)
       ▪ delivered early Sep 2021
     https://flic.kr/p/8JhwbU
  9. Over the finish line
     ▪ AEMaaCS 2021.8.5755 includes the fix
     ▪ Dispatcher caching got re-enabled
     ▪ No JSON issues for 2 weeks: let’s celebrate!
     https://flic.kr/p/5bdFXt
  10. Asset data vanished
     ▪ “skyline-service-warmup” goes first again
     ▪ Affected 1 out of 2 publish instances
     ▪ Duration: ~6 hours
     ▪ (Re)activation fixes everything
     https://flic.kr/p/aH4q2
  11. Race conditions
     ▪ Publish isn’t fully ready when warmup starts
     ▪ Dynamic Media fault?
     ▪ Does “Accept-Encoding” matter?
     ▪ How does /systemready work?
     https://flic.kr/p/xL4C1D
  12. The naive loop
     ▪ (Re)start local AEM instance
     ▪ AEM state snapshots
       ▪ /systemready
       ▪ OSGi bundles/components/services
       ▪ misbehaving JSON
     https://flic.kr/p/deCibc
  13. AEM startup stages
     ▪ #1: /systemready 503, JSON 404
     ▪ #2: /systemready 503, JSON 200 + invalid data
     ▪ #3: /systemready 503, JSON 200 + correct data
     ▪ #4: /systemready 200, JSON 200 + correct data
     https://flic.kr/p/cmoAhY
  14. Healthchecks
     ▪ Let’s create a custom one!
     ▪ Extend the built-in healthcheck config
     https://flic.kr/p/2me55Vo
  15. Warning!
     ▪ Nonexistent component on the list
     ▪ Mind refactored / renamed components
     https://flic.kr/p/5nmVJy
  16. December 2021 / January 2022
     ▪ Component list fine-tuning
     ▪ It’s finally over!
     https://flic.kr/p/qPu4DZ
  17. Next steps
     ▪ SLING-11569 (released in Sling 2.11.0)
     ▪ AEMaaCS 2022.12.10488 (Sling 2.12.2)
     https://flic.kr/p/LA2Ta
  18. Asset link hurdles
     ▪ DM is not enabled everywhere
     ▪ Enabled DM implies blocked DAM access (404)
     ▪ /conf/global/settings/dms7enabled: bad idea!
     ▪ Per-asset DM detection (dam:scene7File)
     ▪ Dedicated option for DAM fallback
     https://flic.kr/p/CyXpjV
  19. June 2023
     ▪ JSON cutoff strikes back
     ▪ This time in Single Page Editor (AEM Author)
     ▪ Beware of AEMaaCS auto-upgrades
     https://flic.kr/p/dypRQZ
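Slide 7 only summarizes what the AEMaaCS warmup service does; its implementation belongs to Adobe and is not shown in the talk. As a purely hypothetical illustration of “the most popular URLs (last 24h), Host header taken into account”, the sketch below ranks (host, path) pairs from a made-up list of access-log records. The host is part of the key on the assumption that different hostnames map to different cache entries, so each of them would need its own warmup request.

```python
from collections import Counter
from datetime import datetime, timedelta


def pick_warmup_urls(records, now: datetime, top_n=100,
                     window=timedelta(hours=24)):
    """Return the top_n most requested (host, path) pairs within `window`.

    `records` is an iterable of (timestamp, host, path) tuples, e.g. parsed
    from access logs (hypothetical input format, not Adobe's).
    """
    cutoff = now - window
    # Count only requests from the last `window`; older traffic is ignored.
    counts = Counter(
        (host, path) for ts, host, path in records if ts >= cutoff
    )
    # The Host value is part of the key: the same path requested under two
    # hostnames yields two separate warmup entries.
    return [key for key, _ in counts.most_common(top_n)]
```

For example, a path requested twice under one hostname and once under another yields two warmup entries (the busier host first), while records older than 24 hours do not count at all.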
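Slide 12’s “naive loop” can be approximated with a small polling script. This is a hypothetical sketch rather than the script used in the talk: it assumes a local publish instance at http://localhost:4503 whose /systemready is reachable without credentials, a made-up endpoint /content/example/en.model.json, and a caller-supplied list of top-level properties that must be present. The OSGi bundle/component/service snapshots from the slide are left out.

```python
import json
import time
import urllib.error
import urllib.request

# Hypothetical local setup; adjust host, port and endpoint to your project.
BASE_URL = "http://localhost:4503"
JSON_URL = BASE_URL + "/content/example/en.model.json"


def fetch(url, timeout=5.0):
    """Return (status, body); status is None while the instance is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:  # 4xx/5xx, e.g. /systemready 503
        return err.code, ""
    except OSError:  # connection refused, timeout (URLError is an OSError)
        return None, ""


def missing_keys(body, required_keys):
    """Required top-level keys absent from a JSON body (all of them if unparsable)."""
    try:
        data = json.loads(body)
    except ValueError:
        return sorted(required_keys)
    if not isinstance(data, dict):
        return sorted(required_keys)
    return sorted(k for k in required_keys if k not in data)


def watch(required_keys, interval=1.0, max_snapshots=600):
    """Print one state snapshot per interval until everything looks healthy."""
    start = time.monotonic()
    for _ in range(max_snapshots):
        ready_status, _ = fetch(BASE_URL + "/systemready")
        json_status, body = fetch(JSON_URL)
        missing = missing_keys(body, required_keys) if json_status == 200 else None
        print(f"t+{time.monotonic() - start:6.1f}s systemready={ready_status} "
              f"json={json_status} missing={missing}")
        if ready_status == 200 and json_status == 200 and not missing:
            return
        time.sleep(interval)
```

Calling `watch([...])` with your own mandatory property names right after (re)starting the instance prints one line per second, which makes it easy to spot a window where the JSON endpoint already answers 200 but properties are still missing.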
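The four startup stages from slide 13 can be encoded as a small classifier. The interesting one is stage #2: the JSON endpoint already answers 200 while its data is still incomplete, and /systemready (503) is the only signal saying otherwise. The slides suggest that a client treating any 200 as good, such as a warmup run that starts before publish is fully ready (slide 11), can therefore end up caching broken output. The enum names and the helper below are made up for illustration; only the status codes come from the slide.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    """Startup stages as observed in the talk (numbering follows slide 13)."""
    NOT_SERVING = 1      # /systemready 503, JSON 404
    SERVING_INVALID = 2  # /systemready 503, JSON 200 + invalid data
    SERVING_VALID = 3    # /systemready 503, JSON 200 + correct data
    READY = 4            # /systemready 200, JSON 200 + correct data


def classify(systemready_status: int, json_status: int,
             json_valid: bool) -> Optional[Stage]:
    """Map one observation to a stage; None for combinations the slide doesn't list."""
    if systemready_status == 200:
        return Stage.READY if json_status == 200 and json_valid else None
    if systemready_status == 503:
        if json_status == 404:
            return Stage.NOT_SERVING
        if json_status == 200:
            return Stage.SERVING_VALID if json_valid else Stage.SERVING_INVALID
    return None


def unsafe_for_caching(stage: Optional[Stage]) -> bool:
    """Stage #2 is the trap: a 200 that a cache cannot tell apart from a good one."""
    return stage is Stage.SERVING_INVALID
```

Gating on the /systemready status rather than on the JSON status is what separates stage #2 from stage #4 here; stage #3 is harmless but still reports “not ready”.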
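Slides 14 to 16 describe extending the built-in healthcheck with a list of components that must be active, and warn about nonexistent or renamed components on that list. One way to guard against that is a build-time sanity check comparing the configured list with the components actually shipped. The helper below is hypothetical and not part of AEM or Sling: it assumes both inputs are plain component names (for Declarative Services components that is typically the implementation class name) and uses `difflib` to suggest the likely new name of a renamed entry.

```python
import difflib


def check_component_list(required, deployed):
    """Report names on a required-components list that no deployed component matches.

    `required`: names from the (hypothetical) healthcheck configuration.
    `deployed`: component names shipped in the current build, e.g. collected
    from the bundles' Declarative Services descriptors (assumption).
    Returns {missing_name: [closest deployed name]} so renames stand out.
    """
    deployed_set = set(deployed)
    candidates = sorted(deployed_set)
    problems = {}
    for name in required:
        if name not in deployed_set:
            # A refactored or renamed class usually leaves a near-identical name
            # behind; an empty list means nothing similar exists at all.
            problems[name] = difflib.get_close_matches(name, candidates, n=1, cutoff=0.8)
    return problems
```

Running such a check in CI would flag a stale entry before it reaches an environment, instead of discovering it through a healthcheck that never turns green.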
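Slide 18’s per-asset Dynamic Media detection with a separate DAM fallback option could look roughly like this. It is a sketch under assumptions: the asset’s metadata is passed in as a plain mapping, only the presence of `dam:scene7File` is checked, and `LinkOptions`, its fields and the DM base URL are hypothetical placeholders rather than AEM configuration.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LinkOptions:
    # Hypothetical settings; names and URL layout are illustrative only.
    dm_base_url: Optional[str]  # e.g. "https://dm.example.com/is/image/"
    dam_fallback: bool = True   # the "dedicated option for DAM fallback"


def asset_link(asset_path: str, metadata: Mapping[str, object],
               opts: LinkOptions) -> Optional[str]:
    """Pick a DM or DAM link per asset instead of relying on a global DM flag."""
    scene7_file = metadata.get("dam:scene7File")
    if scene7_file:
        # DM-published asset: the slides note that enabled DM blocks DAM
        # access (404), so this branch never falls back to the DAM path.
        return opts.dm_base_url + str(scene7_file) if opts.dm_base_url else None
    # Not in Dynamic Media: link to the DAM path only if the option allows it.
    return asset_path if opts.dam_fallback else None
```

Deciding per asset avoids flipping a global switch such as /conf/global/settings/dms7enabled, which the slide explicitly calls a bad idea.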