(Almost) Everything You Need To Know About Crawling, Indexing, and Especially Rendering in Google

Slides from my talk at Friends of Search 2022 in Amsterdam and Brussels where I spoke about crawling, indexing, and rendering in Google's search ecosystem.

Barry Adams

June 14, 2022

Transcript

  1. @badams #FOS22
    (Almost) Everything You Need To Know
    About Crawling, Indexing, and Rendering
    in Google
    Barry Adams
    June 2022

  2. @badams #FOS22
    What does Google do?

  3. @badams #FOS22
    Crawler Indexer Ranker
    Google Processes

  4. @badams #FOS22
    Crawler Indexer Ranker
    1. Crawler (Googlebot)

  5. @badams #FOS22
    Crawling: Discovery

  6. @badams #FOS22
    Crawling: Queue Management
    URL Deduplication
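
A minimal sketch of what URL deduplication in a crawl queue could look like. This is an illustration only, not a description of how Googlebot does it: the normalisation rules and the `IGNORED_PARAMS` list are assumptions chosen for the example.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical tracking parameters to ignore when comparing URLs.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign"}


def normalise_url(url: str) -> str:
    """Return a normalised form of a URL for duplicate detection."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    # Keep the port only when it is not the default for the scheme.
    default_port = (scheme == "http" and port == 80) or (scheme == "https" and port == 443)
    if port and not default_port:
        host = f"{host}:{port}"
    path = parts.path or "/"
    # Drop ignored parameters and sort the rest so parameter order does not matter.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    )
    # The fragment is dropped because it never reaches the server.
    return urlunsplit((scheme, host, path, urlencode(query), ""))


def deduplicate(urls):
    """Keep the first URL seen for each normalised form."""
    seen = set()
    unique = []
    for url in urls:
        key = normalise_url(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique
```

Sorting the query parameters treats `?a=1&b=2` and `?b=2&a=1` as the same page, which is a simplification that will not hold for every site.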

  7. @badams #FOS22
    Crawling: Queue Management
    Prioritisation & Scheduling

  8. @badams #FOS22
    Crawling: Fetch & Parse

  9. @badams #FOS22
    Crawl Politeness

  10. @badams #FOS22
    Optimise Crawling
    • Server Response Time

  11. @badams #FOS22
    GSC Crawl Stats

  12. @badams #FOS22
    Page Resource Load

  13. @badams #FOS22
    Googlebot & AdsBot

  14. @badams #FOS22
    Optimise Crawling
    • Serve correct HTTP status codes;
    ➢ 200 OK
    ➢ 301 / 302 Redirects
    ➢ 304 Not Modified
    ➢ 401 / 403 Permission Issues
    ➢ 404 / 410 Not Found/Gone
    ➢ 5xx Error
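
To audit the status codes listed above, a crawl log can be grouped into the same categories. The sketch below assumes a hypothetical log format of `(url, status_code)` pairs; the group labels are made up for this example.

```python
from collections import Counter


def classify_status(code: int) -> str:
    """Map an HTTP status code to one of the groups from the slide."""
    if code == 200:
        return "ok"
    if code in (301, 302):
        return "redirect"
    if code == 304:
        return "not_modified"
    if code in (401, 403):
        return "permission_issue"
    if code in (404, 410):
        return "not_found_or_gone"
    if 500 <= code <= 599:
        return "server_error"
    return "other"


def summarise(crawl_log):
    """Count responses per group from (url, status_code) pairs."""
    return Counter(classify_status(code) for _, code in crawl_log)
```

A high share of `redirect`, `not_found_or_gone`, or `server_error` responses in such a summary is a hint that crawl requests are being spent on URLs that return no indexable content.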

  15. @badams #FOS22
    Optimise Crawling
    • ALL resources consume crawl budget;
    ➢ Not just HTML pages
    ➢ Reduce HTTP requests per page
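
One way to see how many HTTP requests a page triggers is to count the external resources referenced in its HTML. This is a rough sketch using only Python's standard library; the set of tags it inspects is an assumption and it does not catch resources loaded from CSS or JavaScript.

```python
from html.parser import HTMLParser


class ResourceCounter(HTMLParser):
    """Collect URLs of external resources referenced in an HTML document."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in ("script", "img", "iframe") and attributes.get("src"):
            # Inline scripts have no src attribute and cost no extra request.
            self.resources.append(attributes["src"])
        elif tag == "link" and attributes.get("href"):
            rel = (attributes.get("rel") or "").lower()
            if "stylesheet" in rel:
                self.resources.append(attributes["href"])


def list_resources(html: str):
    """Return the external resource URLs found in the given HTML."""
    parser = ResourceCounter()
    parser.feed(html)
    parser.close()
    return parser.resources
```

Running `len(list_resources(html))` before and after bundling scripts or stylesheets gives a quick measure of how many requests per page were saved.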

  16. @badams #FOS22
    Optimise Crawling
    • ALL resources consume crawl budget;
    ➢ Not just HTML pages
    ➢ Reduce HTTP requests per page
    • AdsBot can consume crawl budget;
    ➢ Double-check your Google Ads campaigns

  17. @badams #FOS22
    Optimise Crawling
    • ALL resources consume crawl budget;
    ➢ Not just HTML pages
    ➢ Reduce HTTP requests per page
    • AdsBot can consume crawl budget;
    ➢ Double-check your Google Ads campaigns
    • Link equity (PageRank) impacts crawl budget;
    ➢ More link equity = more crawl budget

  18. @badams #FOS22
    2. Indexer
    Crawler Indexer Ranker

  19. @badams #FOS22
    Two Stages* of Indexing
    Crawler
    Indexer
    Ranker
    1
    2
    *At least – indexing is a collection of interconnected processes

  20. @badams #FOS22
    Indexing: HTML Lexer & Tokenizer
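
As an illustration of what lexing and tokenizing HTML means, the sketch below turns raw HTML into a flat list of tokens with Python's built-in `html.parser`. It is not Google's lexer; the token tuple format is an assumption made for this example.

```python
from html.parser import HTMLParser


class TokenCollector(HTMLParser):
    """Turn an HTML string into a flat list of start, end, and text tokens."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.tokens.append(("end", tag))

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace-only text between tags
            self.tokens.append(("text", text))


def tokenize(html: str):
    """Return the token list for the given HTML source."""
    collector = TokenCollector()
    collector.feed(html)
    collector.close()
    return collector.tokens
```

For `<p>Hello <a href="/x">link</a></p>` this yields a start token for `p`, the text `Hello`, a start token for `a` carrying its `href`, the text `link`, and the two end tokens. Note that this works on the HTML source as delivered, without executing any JavaScript.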

  21. @badams #FOS22
    Indexing: Selection

  22. @badams #FOS22
    Indexing: HTML Source

  23. @badams #FOS22
    Indexing: Rendering

  24. @badams #FOS22
    Indexing: Index Integrity
    Deduplication & Canonicalisation

  25. @badams #FOS22
    Rendering

  26. @badams #FOS22
    Evergreen Chrome

  27. @badams #FOS22
    What happens during Rendering in your Browser?
    [Diagram: HTML → HTML Parser → DOM Tree and CSS → CSS Parser → CSSOM, combined into the Render Tree, followed by Layout, Painting, and Display]

  28. @badams #FOS22
    JavaScript
    [Diagram: the browser rendering pipeline (HTML Parser, CSS Parser, DOM Tree, CSSOM, Render Tree, Layout, Painting, Display) with one JavaScript label added]

  29. @badams #FOS22
    JavaScript…
    [Diagram: the browser rendering pipeline (HTML Parser, CSS Parser, DOM Tree, CSSOM, Render Tree, Layout, Painting, Display) with two JavaScript labels added]

  30. @badams #FOS22
    JavaScript…
    [Diagram: the browser rendering pipeline (HTML Parser, CSS Parser, DOM Tree, CSSOM, Render Tree, Layout, Painting, Display) with three JavaScript labels added]

  31. @badams #FOS22
    JavaScript…
    [Diagram: the browser rendering pipeline (HTML Parser, CSS Parser, DOM Tree, CSSOM, Render Tree, Layout, Painting, Display) with four JavaScript labels added]

  32. @badams #FOS22
    Google’s Rendering as part of Indexing
    [Diagram: the browser rendering pipeline (HTML Parser, CSS Parser, DOM Tree, CSSOM, Render Tree, Layout, Painting, Display) with three JavaScript labels]

  33. @badams #FOS22
    Google does not perform actions

  34. @badams #FOS22
    Why Rendering?

  35. @badams #FOS22
    Raw HTML:

  36. @badams #FOS22
    Rendered DOM:

  37. @badams #FOS22
    Rendering allows Google to…
    • … load all meta data, content, and links on a webpage
    • … understand the page’s layout and content hierarchy
    • … evaluate the usability and quality of the webpage

  38. @badams #FOS22
    Rendering Issues

  39. @badams #FOS22
    Possible Rendering Issues in GSC

  40. @badams #FOS22
    Rendering Issues
    • Inaccessible Resources;
    ➢ Make sure all page resources can be crawled
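
A quick way to spot inaccessible resources is to test each resource URL against the site's robots.txt. The sketch below uses Python's standard `urllib.robotparser` on a robots.txt string supplied by the caller. One caveat: this parser does simple prefix matching and does not implement Google's wildcard rules, so treat the result as a first check only.

```python
from urllib.robotparser import RobotFileParser


def blocked_resources(robots_txt: str, resource_urls, user_agent: str = "Googlebot"):
    """Return the resource URLs that the given robots.txt disallows."""
    parser = RobotFileParser()
    # parse() takes the robots.txt content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return [url for url in resource_urls if not parser.can_fetch(user_agent, url)]
```

Feeding it the script, stylesheet, and image URLs of a page shows which of them a crawler that obeys robots.txt would be unable to fetch for rendering.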

  41. @badams #FOS22
    Rendering Issues
    • JavaScript inserts invalid HTML in the <head>;
    ➢ Tags that are not valid in the <head> break Google’s processing of meta tags

  42. @badams #FOS22
    Rendering Issues
    • JavaScript inserts invalid HTML in the <head>;
    ➢ Tags that are not valid in the <head> break Google’s processing of meta tags

  43. @badams #FOS22
    https://developers.google.com/search/docs/advanced/guidelines/valid-html
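
A simple check for invalid markup in the head section can be sketched as follows. The allowed-element set reflects the HTML metadata elements (`title`, `meta`, `link`, `script`, `style`, `base`, `noscript`, `template`); the class and function names are made up for this example, and `html.parser` only reads the markup as written rather than parsing it the way a browser would.

```python
from html.parser import HTMLParser

# Elements that may appear inside <head>.
ALLOWED_IN_HEAD = {"title", "meta", "link", "script", "style", "base", "noscript", "template"}


class HeadChecker(HTMLParser):
    """Record tags inside <head> that are not valid there."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.invalid = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif self.in_head and tag not in ALLOWED_IN_HEAD:
            self.invalid.append(tag)

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def invalid_head_tags(html: str):
    """Return the names of tags found in <head> that do not belong there."""
    checker = HeadChecker()
    checker.feed(html)
    checker.close()
    return checker.invalid
```

Run it against the rendered DOM rather than the raw source if the offending tags are inserted by JavaScript.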

  44. @badams #FOS22
    Rendering Issues
    • HTML vs Render mismatch;
    ➢ Different content in raw HTML vs fully rendered page
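
A mismatch between raw HTML and the rendered page can be checked by comparing what each version contains. The sketch below compares only the links in two HTML strings; how you obtain the rendered HTML (for example from a headless browser) is left out, and the function names are assumptions for this example.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href values of all <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.add(href)


def extract_links(html: str):
    """Return the set of link targets in the given HTML."""
    extractor = LinkExtractor()
    extractor.feed(html)
    extractor.close()
    return extractor.links


def link_mismatch(raw_html: str, rendered_html: str):
    """Report links present in only one of the two versions of a page."""
    raw = extract_links(raw_html)
    rendered = extract_links(rendered_html)
    return {
        "only_in_raw": sorted(raw - rendered),
        "only_in_rendered": sorted(rendered - raw),
    }
```

Links listed under `only_in_rendered` exist only after JavaScript has run, so they depend on rendering to be discovered. The same comparison can be extended to titles, canonical tags, or body text.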

  45. @badams #FOS22
    https://chrome.google.com/webstore/detail/view-rendered-source/ejgngohbdedoabanmclafpkoogegdpob

  46. @badams #FOS22
    SEO Crawlers Can Also Render

  47. @badams #FOS22
    Google Tools *ALWAYS* Render

  48. @badams #FOS22
    Optimise Rendering
    • Don’t rely on Google’s rendering;
    ➢ Use SSR & CDN caching
    • Minimise page weight;
    ➢ Fewer page resources = better use of crawl budget, faster load speed & CWV, and less chance of rendering issues
    • Optimise your HTML source;
    ➢ Think about where <script> tags exist and what they do when their code is executed

  49. @badams #FOS22
    Optimise Indexing
    • Optimise your page layouts;
    ➢ Prominent content & links are more valuable for users & Google
    • Improve internal linking;
    ➢ More PageRank = higher chance of indexing
    • Improve your content;
    ➢ Google has no obligation to index all your pages
    ➢ Make it worth Google’s while…

  50. @badams #FOS22
    Bypassing Rendering*
    with Edge SEO
    *sort of

  51. @badams #FOS22
    Edge SEO
    [Diagram: Your Webserver → Cloud CDNs → Users]

  52. @badams #FOS22
    Edge SEO
    [Diagram: Your Webserver → Cloud CDNs → Googlebot]

  53. @badams #FOS22
    Edge SEO
    [Diagram: Your Webserver → Cloud CDNs → Googlebot, with the label "Change your webpages here"]

  54. @badams #FOS22
    Edge SEO
    • CDNs store cached versions of your webpages;
    ➢ Global coverage with edge nodes worldwide
    ➢ Usually also results in faster crawling and better CWV
    • You manipulate your CDN cached pages;
    ➢ Cloud Workers enable a range of functionality
    • Googlebot crawls & indexes the changed CDN-cached pages;
    ➢ Your ‘original’ website remains unchanged
    ➢ Google only sees the changed CDN webpages
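
The idea of changing pages at the CDN layer can be sketched as a function that rewrites cached HTML before it is served. Real edge workers run on the CDN provider's own platform and API, which this does not model; the Python below only illustrates the transformation step, and `title_overrides` is a hypothetical mapping of URL paths to new titles.

```python
import re
from html import escape

TITLE_PATTERN = re.compile(r"<title>.*?</title>", re.IGNORECASE | re.DOTALL)


def rewrite_title(html: str, new_title: str) -> str:
    """Replace the first <title> element and leave the rest of the page untouched."""
    replacement = f"<title>{escape(new_title)}</title>"
    # A function is used as the replacement so backslashes in the title stay literal.
    return TITLE_PATTERN.sub(lambda match: replacement, html, count=1)


def edge_response(cached_html: str, path: str, title_overrides: dict) -> str:
    """Return the cached page, modified only if an override exists for this path."""
    new_title = title_overrides.get(path)
    if new_title is None:
        return cached_html
    return rewrite_title(cached_html, new_title)
```

The origin server's HTML is never modified here: the override is applied to the cached copy on its way out, which matches the point that the 'original' website remains unchanged.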

  55. @badams #FOS22
    Why Edge SEO?
    • Faster deployment;
    ➢ Bypass your developers’ lengthy queues
    ➢ ‘Ask forgiveness, not permission’
    ➢ No reliance on client-side JavaScript
    • No CMS constraints;
    ➢ Change pages directly regardless of your CMS capabilities
    • Testing;
    ➢ Perform narrow tests on specific site sections
    ➢ A/B testing for SEO

  56. @badams #FOS22
    SEO A/B Split Testing
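
In SEO split testing, pages rather than users are divided into a control group and a variant group. A minimal sketch of one way to do that assignment, assuming a stable hash of the URL path is an acceptable way to pick groups; the `salt` value and function names are hypothetical.

```python
import hashlib


def assign_bucket(url_path: str, salt: str = "experiment-1") -> str:
    """Assign a page to 'control' or 'variant' based on a stable hash of its path."""
    digest = hashlib.sha256(f"{salt}:{url_path}".encode("utf-8")).hexdigest()
    # The same path and salt always produce the same bucket.
    return "variant" if int(digest, 16) % 2 else "control"


def split_pages(url_paths, salt: str = "experiment-1"):
    """Group URL paths into control and variant lists."""
    groups = {"control": [], "variant": []}
    for path in url_paths:
        groups[assign_bucket(path, salt)].append(path)
    return groups
```

Because the assignment is deterministic, Googlebot sees the same version of a given page on every crawl, and changing the salt starts a fresh split for the next test.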

  57. @badams #FOS22
    SEO Split Testing Case Studies
    https://www.searchpilot.com/resources/newsletter/

  58. @badams #FOS22
    Barry Adams
    ➢ Doing SEO since 1998
    ➢ Specialist in Technical SEO & News SEO
    ➢ Newsletter: SEOforGoogleNews.com

  59. @badams #FOS22
    Thank You
    [email protected]
    @badams
