Debugging rendering problems at scale

BrightonSEO, July 2021. To better understand a website's content, search engines developed Web Rendering Services (WRS) and are now able to render pages more or less as a normal user's browser would. These Web Rendering Services are tightly connected to the other phases of the crawling, indexing and ranking pipeline: if rendering fails, it may affect all of them. In this session, Giacomo will guide you through why rendering can be a problem even for non-JavaScript pages, how to debug page rendering manually, the difference between understanding what a WRS can do and debugging problems on a website, and finally how to test pages at scale.

Giacomo Zecchini

July 23, 2021

Transcript

  1. Debugging rendering problems at scale. Giacomo Zecchini | Verve Search. SLIDESHARE.NET/GIACOMOZECCHINI @GIACOMOZECCHINI
  2. Hi, I’m Giacomo, Technical Director at Verve Search, with a technical background and previous experience in development. @giacomozecchini #brightonSEO
  3. Today we are going to talk about rendering errors, the challenges of debugging them at scale, and a new approach to solving these issues.
  4. The search engine's rendering process is very similar to Schrödinger's cat paradox. https://en.wikipedia.org/wiki/Schrödinger's_cat
  5. A hypothetical cat page may be considered simultaneously alive (correctly rendered) and dead (not correctly rendered).
  6. Search engines fetch web pages and pass them to web rendering services (WRS). https://developers.google.com/search/docs/guides/javascript-seo-basics
  7. Inside the web rendering services, pages are rendered much as they would be in a browser. https://developers.google.com/search/docs/guides/javascript-seo-basics
  8. The search engines can then extract all the information they need from the rendered pages. https://developers.google.com/search/docs/guides/javascript-seo-basics
  9. If you want to know more about this, I’d suggest watching Martin Splitt’s TechSEO Boost 2019 talk. https://www.youtube.com/watch?v=Qxd_d9m9vzo
  10. What happens inside the web rendering services is hidden from view, as if inside a closed box.
  11. You don’t know whether a page has been correctly rendered until you check it manually.
  12. Search engines are capable of rendering your pages, and most of the time the process works fine.
  13. A page is “not correctly rendered” when it is not possible for the WRS to get an asset, or when an error blocks the process.
  14. Fetch timeout. [Diagram: inside the search engine, the crawler, the WRS and the cache.] * This doesn’t seem very common, but it can happen. * Icons made by Freepik from www.flaticon.com
  15. HTTPS/HTTP mixed content: if your website has an HTTPS URL but one of the JavaScript files has an HTTP URL and the HTTPS version is not available, the script won't be used!
  16. Other causes: cache mismatch, user permission for specific features (e.g. geolocation), service worker registration, JavaScript syntax errors, etc.
  17. If the WRS can’t get your CSS, the page layout won’t be correct and you may also have Mobile Usability issues.
  18. If the WRS can’t get or execute your JS files correctly, your page may be blank or broken.
  19. In some cases, the WRS may need to render your page again, which means slower indexing.
  20. Manually checking pages one by one might work on very small websites.
  21. When you start having a lot of pages... that’s a problem!
  22. You can prioritise and group pages with similar HTML and resources together, but...
  23. ...the rendering of a page can fail regardless of what happens to other, similar pages.
  24. You still have to check pages manually to be 100% sure they are correctly rendered.
  25. Understanding what a Web Rendering Service can or can’t do is a one-time task.
  26. You can build a page with a specific feature and test it. If it works once, it will work again on other pages.
  27. When debugging issues, you are not focusing on a single feature but on correct rendering overall.
  28. But that’s not enough; you want more. For instance, JavaScript console messages are coalesced and not shown.
  29. Yes, you can get JavaScript console errors from the Mobile-Friendly Test or other live tests, but it’s not the same!
  30. The Mobile-Friendly Test and the other live tests bypass the cache, have shorter timeouts, and differ in a few other ways.
  31. I started my research by using some JavaScript to collect the information I needed and print it on the page, in a hidden <DIV>.
  32. A hidden DIV filled with JavaScript at rendering time:

          <html>
          …
          <div id="info" style="display:none"></div>
          …
          <script>
          …
          function getInformation(){
            // do stuff!
          }
          …
          var div = document.getElementById("info");
          var p = document.createElement("p");
          p.innerText = getInformation();
          div.appendChild(p);
          …
          </script>
          …
          </html>

      This prints the information you need in the DIV at rendering time; you can then read it in the crawled page HTML shown by Search Console's "View crawled page".
  33. But waiting for a page to be crawled, rendered and indexed again is time-consuming and doesn’t scale.
  34. It’s a nice way of discovering new things, but you still have to check every page manually.
  35. Then I thought of using 1x1 px images, appending errors or information to the URL: https://www.example.com/image.jpg?u=page_url&e=error
  36. The idea was to look in the server access log and find all the errors that occurred during rendering.
  37. The answer was always in front of my eyes: JavaScript + POST requests!
  38. Say hello to the shiny new Search Engine Rendering Errors Logging framework!
  39. [Diagram: the search engine (crawler, WRS, cache) and your website.] Search engines download the resources they need to render your pages, or use their cached copies.
  40. [Diagram: the search engine's crawler, a Chromium instance and the internet.] While rendering the website, the WRS executes JavaScript and downloads any additional resources the website might need or request.
  41. [Diagram: the search engine's crawler, a Chromium instance and a server receiving a POST request.] What if one of those JavaScript files sends a non-cacheable POST request to an external server?!
  42. There are multiple ways of sending POST requests in JS: the Fetch API (https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API/Using_Fetch), Navigator.sendBeacon() (https://developer.mozilla.org/en-US/docs/Web/API/Navigator/sendBeacon) and XMLHttpRequest.send() (https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest/send).
  43. The message (or beacon) contains the information you want to store in your database: { "page":"https://www.example.com", "timestamp": 1592568000000, "category": "Fetch", "error": "https://www.example.com/style.css" }
  44. Once stored, the messages form a table like this:

          TIME                | URL                                | CATEGORY   | ERROR
          25/10/1985 09:00:00 | https://www.example.com            | Fetch      | https://www.example.com/style.css
          21/10/2015 07:28:00 | https://www.example.com/about.html | Fetch      | https://www.example.com/app.js
          12/11/1955 06:38:00 | https://www.example.com            | Javascript | File: https://www.example.com/app.js Line: 3 Col: 2 Error: Uncaught ReferenceError: APP is not defined

      When you have everything in a database, you can query the tables and do all your analysis. You can also set up automatic alerts, etc.
  45. !! Warning !! Don’t use this code on your website; these are just (bad) examples.
  46. When the WRS executes the script, the function sends a message back to the server:

          <html>
          …
          <script>
            sendMessageToServer();
          </script>
          …
          </html>
  47. Debugging example #2: find out whether there is a problem downloading CSS or JS files.
  48. If there is an error and it's a CSS or JS load error, you can send a message back to the server. This works for HTTP/DNS/network errors, robots.txt, fetch timeouts, etc.

          <html>
          <head>
            <meta charset="utf-8">
            <meta http-equiv="X-UA-Compatible" content="IE=edge">
            <meta name="viewport" content="width=device-width, initial-scale=1">
            <script>
              …
              window.addEventListener('error', function(err) {
                if (isDownloadError(err)) {
                  sendMessageToServer(err);
                }
              }, true); // capture phase: resource load errors don't bubble up to window
              …
            </script>
            …
          </head>
          …
          </html>
  49. There are some products out there, but all of them focus on users, not on search engines.
  50. Search engines are different, and you need to solve different problems.
  51. Web performance issues: you don’t want to slow down the user experience with something you need only for search engines.
  52. Web performance issues: check the User-Agent and run the script only for search engines.
  53. Crawl budget: you don’t want to consume your crawl budget on these requests.
  54. Crawl budget: host your debugging server on a different domain or subdomain.
  55. There are many other possible problems; you just need to find a solution for each of them.
  56. The simpler a page is, the more likely it is to render correctly. The majority of pages are just fine.
  57. If you work on big or complex websites, you may encounter rendering problems...
  58. ...but if you use the right approach, you can cut down the time it takes to debug them.
  59. You can use this approach as a one-time debugging script to get more information, or as a monitoring system.