How Googlebot Renders

DDavydoff
September 20, 2019

Transcript

  1. Think like a bot,
    Rank like a boss:
    How Googlebot renders
    Jamie Alberico // Not a Robot
    SLIDESHARE.NET/JAMIEALBERICO
    @JAMMER_VOLTS

  2. Jamie Alberico
    My name means Usurper Elf King.
    I’m a Technical SEO, Search Advocate, & Wood Elf Druid.
    Oh yeah, and I’m Not a Robot.
    #brightonSEO @Jammer_Volts

  3. Technical SEOs: Class Details
    Masters of unlocking magic in everyday objects, Technical SEOs are extremely resourceful.
    They see magic as a complex system waiting to be decoded and controlled.
    Proficiencies (recommended): Chrome Developer Tools, Lighthouse, Google Search Console, web crawlers

  4. Our Technical SEO Quest
    To protect site visibility by delivering our content to Google’s index.
    To do this, we must pass through a powerful construct.

  5. What is Rendering?
    When Googlebot retrieves your pages, it runs your code and assesses your content to understand the layout or structure of your site.

  6. Rendering’s role in Rank
    “All information Google collects during the rendering process is then used to rank the quality and value of your site content against other sites and what people are searching for with Google Search.”
    How Google Search Works, Search Console Help Center

  7. Rendering
    Initial HTML (1st wave of indexing)
    Rendered HTML (2nd wave of indexing)

  8. Rendering Risks
    If Google cannot render the pages on your site, it becomes more difficult to understand your web content because we are missing key visual layout information about your web pages.
    As a result, the visibility of your site content in Google Search can suffer.

  9. Until 2018, we thought our quest looked like this:
    Crawl → Index → Rank

  10. Now, we know that Rendering is part of the process and that Google has two waves of indexing.
    [Diagram: Crawl, Index, Render, and Rank, split between a First Wave and a Second Wave]

  11. If Google can’t render content, we fail our quest
    [Diagram: Crawl → Index]

  12. Google’s Web Rendering Service (WRS)
    Insight Check

  13. Google Web Rendering Service
    Large Construct (legendary), lawful neutral
    Languages HTML, CSS, JavaScript, Images
    Skills Perception +12, Dexterity +10
    Senses Robots.txt, Robots directives

  14. Takes action using threads
    Each request is made by a thread. A thread is a single connection. It moves sequentially through each action, one at a time, until its task is complete.

  15. SEOs call this Crawl Budget
    “Simply put, [crawl budget] represents the number of
    simultaneous parallel connections Googlebot may use to
    crawl the site, as well as the time it has to wait between
    the fetches.”
    What Crawl Budget Means for Googlebot, Google Webmaster Blog
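The quoted definition combines two knobs: how many parallel connections Googlebot may open, and how long it waits between fetches. A rough back-of-the-envelope model follows. It assumes each connection fetches one URL at a time and then pauses; that simplification and the function name are ours, not a formula Google has published.

```python
def max_fetches_per_second(connections: int, fetch_seconds: float, wait_seconds: float) -> float:
    """Upper bound on fetch rate under a simplified crawl-budget model.

    Assumption (illustrative only): each of `connections` parallel connections
    fetches one URL at a time, taking `fetch_seconds`, then waits
    `wait_seconds` before its next fetch.
    """
    if connections < 1:
        raise ValueError("connections must be at least 1")
    cycle = fetch_seconds + wait_seconds  # time one connection spends per URL
    if cycle <= 0:
        raise ValueError("fetch_seconds + wait_seconds must be positive")
    return connections / cycle


# 4 connections, 0.5 s per fetch, 1.5 s pause: 4 / 2.0 = 2.0 fetches per second
rate = max_fetches_per_second(4, 0.5, 1.5)
print(rate, rate * 86_400)  # 2.0 172800.0 (per second, per day)
```

Under this model, a slow server (larger `fetch_seconds`) lowers the bound just as surely as fewer connections do.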

  16. Stateless
    ● Does not retain state across page loads
    ● Local Storage and Session Storage data are cleared across page loads
    ● HTTP Cookies are cleared across page loads
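To make the stateless trait concrete, here is a small simulation. The page logic, the `preferred_region` key, and the two browsing modes are hypothetical, and a Python dict stands in for Local Storage. A returning visitor keeps what an earlier page load stored, while a Googlebot-style visit starts every page load with empty storage and only ever sees the fallback.

```python
from typing import Dict, List

Storage = Dict[str, str]  # stand-in for the browser's Local Storage


def render_page(storage: Storage) -> str:
    """Hypothetical page script that personalizes content from stored state."""
    region = storage.get("preferred_region")  # None when nothing was stored
    if region is None:
        return "Default content"  # fallback that needs no prior state
    return f"Content for {region}"


def browse(page_loads: int, stateless: bool) -> List[str]:
    """Render several page loads, optionally clearing storage before each one."""
    storage: Storage = {}
    rendered: List[str] = []
    for _ in range(page_loads):
        if stateless:
            storage = {}  # Googlebot-style: nothing survives between page loads
        rendered.append(render_page(storage))
        storage["preferred_region"] = "EU"  # the page saves a preference
    return rendered


print(browse(2, stateless=False))  # ['Default content', 'Content for EU']
print(browse(2, stateless=True))   # ['Default content', 'Default content']
```

If the content that matters only shows up in the personalized branch, a stateless crawler never renders it.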

  17. Obedient
    Obeys HTML/HTML5 protocol
    Literal
    “Googlebot, go to the apothecary and buy a healing potion. If they have shields, buy 2.”
    Googlebot comes back with 2 potions.

  18. Politeness is priority 0
    Crawling is its main priority while making sure it doesn't
    degrade the experience of users visiting the site. We call
    this the "crawl rate limit," which limits the maximum
    fetching rate for a given site.
    What Crawl Budget Means for Googlebot, Google Webmaster Central Blog

  19. Multi-thread
    Googlebot can execute more than one request at a time if demand and server stability allow.
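Slides 14 and 19 describe threads that each work through one request at a time, with several running in parallel when the server can take it. A minimal sketch of that pattern with Python's `threading` and `queue` modules follows. The `polite_crawl` name, the caller-supplied `fetch` function, the connection count, and the fixed per-thread delay are illustrative assumptions, not Googlebot's actual scheduler.

```python
import queue
import threading
import time
from typing import Callable, Dict, List


def polite_crawl(
    urls: List[str],
    fetch: Callable[[str], str],
    connections: int = 2,
    delay_seconds: float = 0.1,
) -> Dict[str, str]:
    """Fetch `urls` using at most `connections` worker threads.

    Each thread takes one URL at a time, fetches it, then pauses for
    `delay_seconds` before taking the next one (a crude crawl rate limit).
    """
    work: queue.Queue = queue.Queue()
    for url in urls:
        work.put(url)

    results: Dict[str, str] = {}
    lock = threading.Lock()  # guards `results` across threads

    def worker() -> None:
        while True:
            try:
                url = work.get_nowait()
            except queue.Empty:
                return  # nothing left to fetch: this thread finishes
            body = fetch(url)
            with lock:
                results[url] = body
            time.sleep(delay_seconds)  # politeness pause before this thread's next request

    threads = [threading.Thread(target=worker) for _ in range(max(1, connections))]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


# A stand-in fetch function so the sketch runs without network access
pages = polite_crawl(["/a", "/b", "/c"], fetch=lambda url: f"<html>{url}</html>", delay_seconds=0.01)
print(sorted(pages))  # ['/a', '/b', '/c']
```

Raising `connections` increases throughput only as far as the server can keep up, which is exactly the trade-off the crawl rate limit manages.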

  20. Request URI
    Googlebot sends a request for content at a Uniform Resource Identifier (URI).
    Googlebot can discover a URL via a link or a submission.

  21. Read HTTP response and headers
    Q. Does the thing I asked for exist?
    A. HTTP Status Codes
    Q. Anything I should know before looking at this?
    A. Cache-Control and directives
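Both questions can be answered from the status line and headers before the body is examined. The sketch below summarizes a response; the function name and the returned fields are hypothetical, and the `X-Robots-Tag` parsing is deliberately simplified (it ignores user-agent-specific values such as `googlebot: noindex`).

```python
from typing import Dict, List, Mapping


def _directives(value: str) -> List[str]:
    """Split a comma-separated header value into lowercase directives."""
    return [part.strip().lower() for part in value.split(",") if part.strip()]


def read_response(status: int, headers: Mapping[str, str]) -> Dict[str, object]:
    """Summarize what a crawler learns before looking at the body."""
    by_name = {name.lower(): value for name, value in headers.items()}  # header names are case-insensitive
    robots = _directives(by_name.get("x-robots-tag", ""))
    return {
        "exists": 200 <= status < 300,    # Q1: did we get the thing we asked for?
        "redirect": 300 <= status < 400,  # it lives somewhere else
        # Q2: directives; "none" is shorthand for noindex, nofollow
        "noindex": "noindex" in robots or "none" in robots,
        "nofollow": "nofollow" in robots or "none" in robots,
        "cache_control": _directives(by_name.get("cache-control", "")),
    }


print(read_response(200, {"Cache-Control": "max-age=3600, public", "X-Robots-Tag": "noindex"}))
# {'exists': True, 'redirect': False, 'noindex': True, 'nofollow': False, 'cache_control': ['max-age=3600', 'public']}
```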

  22. Parse
    Download the response from the server

  23. Identify Resources
    Googlebot identifies the resources needed to complete the request and feeds them into the crawling queue.
    Use the Network tab in Chrome DevTools to see how many resources a page calls.
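The Network tab is the easiest way to see this, but the idea can also be sketched with Python's standard-library `html.parser`: walk the initial HTML, collect the scripts, stylesheets, and images it references, resolve them against the page URL, and push them onto a queue. The helper names are hypothetical, and a real crawler also discovers resources requested later by CSS and JavaScript, which this sketch does not see.

```python
from collections import deque
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin


class ResourceFinder(HTMLParser):
    """Collect the scripts, stylesheets, and images referenced by a page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.resources: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        attr = dict(attrs)
        url = None
        if tag in ("script", "img"):
            url = attr.get("src")
        elif tag == "link" and "stylesheet" in (attr.get("rel") or "").lower().split():
            url = attr.get("href")
        if url:
            self.resources.append(urljoin(self.base_url, url))  # resolve relative URLs


def identify_resources(html: str, base_url: str) -> List[str]:
    finder = ResourceFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.resources


page = (
    '<link rel="stylesheet" href="/main.css">'
    '<script src="app.js" async></script>'
    '<img src="https://cdn.example.com/hero.png">'
)
crawl_queue = deque(identify_resources(page, "https://www.example.com/page/"))
print(list(crawl_queue))
# ['https://www.example.com/main.css', 'https://www.example.com/page/app.js', 'https://cdn.example.com/hero.png']
```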

  24. Cache
    If the requested website implements a cache, a copy of the data is made or requested.

  25. Actions: WRS, web rendering service
    Googlebot queues pages for both crawling and rendering. It is not immediately obvious when a page is waiting for crawling and when it is waiting for rendering.
    WRS is the name used to represent the collective elements involved in Google’s rendering service. Many details are not publicly available.

  26. Web Rendering Service (WRS)
    Blink Browser Engine
    V8 JavaScript Engine
    Ignition, TurboFan, Liftoff
    Display backend
    Google Magic
    Chromium Headless Browser

  27. Actions: WRS process
    1. A URL is pulled from the crawl queue
    2. Googlebot requests the URL and downloads the initial HTML
    3. The initial HTML is passed to the processing stage, which extracts links
    4. Links go back on the crawl queue
    5. Once resources are crawled, the page queues for rendering

  28. Actions: WRS process
    6. When resources become available, the request moves from the render queue to the renderer
    7. The renderer passes the rendered HTML back to processing
    8. Processing indexes the content
    9. Processing extracts links from the rendered HTML and puts them into the crawl queue
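The nine steps on slides 27 and 28 can be simulated with two queues. In this sketch the site is a hypothetical dictionary mapping each URL to the links in its initial HTML and the links that only appear after rendering. As a simplifying assumption, rendering only starts once the crawl queue is empty, standing in for the delay between the two waves; the function and type names are illustrative.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

# Hypothetical site model: URL -> (links in initial HTML, links that appear only after rendering)
Site = Dict[str, Tuple[List[str], List[str]]]


def two_wave_crawl(site: Site, start: str) -> Dict[str, int]:
    """Return {url: wave}, where 1 = reachable from initial HTML, 2 = needed rendering."""
    crawl_queue: Deque[Tuple[str, int]] = deque([(start, 1)])
    render_queue: Deque[str] = deque()
    found: Dict[str, int] = {}

    while crawl_queue or render_queue:
        if crawl_queue:
            # Steps 1-5: pull a URL, fetch its initial HTML, queue its links, queue it for rendering
            url, wave = crawl_queue.popleft()
            if url in found or url not in site:
                continue  # already crawled, or unknown to this toy site
            found[url] = wave
            initial_links, _ = site[url]
            crawl_queue.extend((link, wave) for link in initial_links)
            render_queue.append(url)
        else:
            # Steps 6-9: render a page, then queue links found only in the rendered HTML
            url = render_queue.popleft()
            _, rendered_links = site[url]
            crawl_queue.extend((link, 2) for link in rendered_links)
    return found


site: Site = {
    "/": (["/about"], ["/products"]),  # /products is injected by JavaScript
    "/about": ([], []),
    "/products": (["/products/1"], []),
    "/products/1": ([], []),
}
print(two_wave_crawl(site, "/"))
# {'/': 1, '/about': 1, '/products': 2, '/products/1': 2}
```

Everything downstream of a JavaScript-only link inherits the wait for rendering, which is why `/products/1` lands in the second wave even though its own link is in plain HTML.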

  29. Chromium, headless browser
    ● Headless means that there is no GUI (visual representation)
    ● Used to load web pages and extract metadata
    ● Reading from and writing to the DOM
    ● Observing network events
    ● Capturing screenshots
    ● Inspecting worker scripts
    ● Recording Chrome Traces

  30. Blink, browser engine
    ● Allows for querying and manipulating the rendering engine settings (ex: mobile vs. desktop)
    ● Blink loves service workers. Blink may create multiple worker threads to run Web Workers, ServiceWorker, and Worklets

  31. Blink, browser engine
    Blink is responsible for 2 major elements:
    Memory heap: stores the result of script execution (Memory Heap results are added to the DOM.)
    Call stack: a last-in, first-out stack of sequential next steps (each entry in the call stack is called a stack frame)

  32. Blink, browser engine
    Local storage and Session storage are key-value stores in the browser. They hold strings, so JS objects must be serialized (for example with JSON.stringify) before they are stored.
    These keys are a weak point in your rendering offense against a stateless Googlebot.

  33. V8, JavaScript engine
    JavaScript is a single-threaded process, and each entry or execution step is a stack frame.
    Googlebot can opt to run simultaneous parallel connections.

  34. V8, JavaScript engine
    Each thread runs through a process of:
    1. Loading
    2. Parsing
    3. Compiling
    4. Executing

  35. V8, JavaScript engine
    ● Open-source JavaScript and WebAssembly engine
    ● Developed by Google & The Chromium Project
    ● Used in Node.js, Google Chrome, and Chromium web browsers

  36. V8’s components
    ● Ignition, a fast low-level register-based JavaScript
    interpreter written using the backend of TurboFan
    ● TurboFan, one of V8’s optimizing compilers
    ● Liftoff, a new baseline compiler for WebAssembly

  37. Optimized Rendering
    Roll with Advantage

  38. Parse content critical to
    user intent in initial HTML
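One way to audit this per template is to check whether the phrases that answer the user's intent appear in the initial HTML as text, rather than inside scripts that only produce them after rendering. The helper below is a hypothetical sketch built on Python's `html.parser`; it does a case-insensitive substring match on whitespace-collapsed text and does not execute JavaScript.

```python
from html.parser import HTMLParser
from typing import Dict, List


class InitialTextParser(HTMLParser):
    """Collect text a crawler can read from HTML without running scripts."""

    SKIPPED = ("script", "style")  # their contents are code, not page copy

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def critical_content_report(initial_html: str, critical_phrases: List[str]) -> Dict[str, bool]:
    """Map each critical phrase to whether it appears as text in the initial HTML."""
    parser = InitialTextParser()
    parser.feed(initial_html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split()).lower()  # collapse whitespace
    return {phrase: phrase.lower() in text for phrase in critical_phrases}


initial_html = (
    "<h1>Blue Widget</h1><div id='app'></div>"
    "<script>document.getElementById('app').textContent = 'In stock';</script>"
)
print(critical_content_report(initial_html, ["Blue Widget", "In stock"]))
# {'Blue Widget': True, 'In stock': False}
```

Here "In stock" only exists after the script runs, so it waits for the second wave of indexing.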

  39. [Diagram: Crawl, Index, Render; the HTML feeds the 1st wave of indexing, the rendered DOM feeds the 2nd wave of indexing]

  40. Critical = why the user came

  41. Define it for your site, by template

  42. Use clean, consistent signals
    Googlebot won’t see past a noindex directive in initial HTML
    to see an index placed in DOM.
    Duplicative content without a canonical in initial HTML is
    crawl waste until rendering.
    Inconsistent title tags and descriptions can result from
    overwriting the initial HTML with rendered HTML.
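To catch these mismatches, you can diff the signals in the initial HTML against the rendered HTML (for example, the HTML shown by a rendering test tool). The sketch below is hypothetical: it extracts the title, meta description, meta robots, and canonical with `html.parser`, then reports the three problems listed on the slide. It assumes one of each tag per page.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple


class SignalParser(HTMLParser):
    """Pull the title, meta description, meta robots, and canonical from HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.signals: Dict[str, Optional[str]] = {
            "title": None, "description": None, "robots": None, "canonical": None,
        }
        self._in_title = False

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        attr = {key: (value or "") for key, value in attrs}
        name = attr.get("name", "").lower()
        if tag == "title":
            self._in_title = True
            self.signals["title"] = ""
        elif tag == "meta" and name in ("description", "robots"):
            self.signals[name] = attr.get("content", "").strip()
        elif tag == "link" and "canonical" in attr.get("rel", "").lower().split():
            self.signals["canonical"] = attr.get("href", "")

    def handle_endtag(self, tag: str) -> None:
        if tag == "title":
            self._in_title = False

    def handle_data(self, data: str) -> None:
        if self._in_title:
            self.signals["title"] = (self.signals["title"] or "") + data


def extract_signals(html: str) -> Dict[str, Optional[str]]:
    parser = SignalParser()
    parser.feed(html)
    parser.close()
    return parser.signals


def compare_signals(initial_html: str, rendered_html: str) -> List[str]:
    """List inconsistencies between initial HTML and rendered HTML."""
    before, after = extract_signals(initial_html), extract_signals(rendered_html)
    warnings: List[str] = []
    if "noindex" in (before["robots"] or "").lower():
        warnings.append("noindex in initial HTML: a rendered change to index may never be seen")
    if before["canonical"] is None and after["canonical"] is not None:
        warnings.append("canonical only appears after rendering")
    for key in ("title", "description", "robots", "canonical"):
        if before[key] is not None and after[key] is not None and before[key] != after[key]:
            warnings.append(f"{key} changes between initial and rendered HTML")
    return warnings


initial = '<title>Widgets</title><meta name="robots" content="noindex">'
rendered = (
    '<title>Blue Widgets | Shop</title><meta name="robots" content="index, follow">'
    '<link rel="canonical" href="https://www.example.com/widgets">'
)
for warning in compare_signals(initial, rendered):
    print(warning)
```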

  43. Focus rendering efforts with nofollow
    Add a nofollow directive to resources that are not necessary or beneficial to the construction of the page.

  44. Mobile vs Desktop Rendering
    Layout matters for both. If you want to rank for position zero, remember that the content must be exposed on initial mobile load.

  45. Choose the rendering strategy that’s
    right for your business and stack.
    You don’t have to be 100% client-side, 100% server-side, or
    100% both (dynamic).
    Load what matters when it matters.

  47. Rendering Challenges
    Survival Check

  49. Rendering & Performance
    [Diagram: HTML → HTML Parser → DOM Tree; Style Sheets → CSS Parser → Style Rules; DOM Tree + Style Rules → Attachment → Render Tree → Layout → Painting → Display. TTFB and TTI are also marked.]

  51. More page resources require more rendering resources
    Each resource must be fetched independently before the page can be accurately rendered.
    This is a major part of the issue with client-side rendering.
    More client-side calls mean more blind spots for you.

  52. Excessive scripts run the risk of hitting thread/rest thresholds.
    This is most often observed as an “Other error”.

  53. Call Stacks have a maximum size
    While the Call Stack has functions to execute, the browser can’t actually do anything else; it’s blocked.

  54. Session and Local web storage limits
    5MB per object, and 50MB per system
    If your CSR resources are too large, you risk hitting the upper limit. Elements still in the queue once the limit is reached may not be considered by Googlebot.
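A rough way to see how close a payload gets to a storage limit is to measure it the way Web Storage holds it: as strings. This sketch serializes non-string values with JSON and counts UTF-16 bytes, since JavaScript strings are UTF-16. The 5MB budget is the slide's figure, used here as an assumption; browsers count and enforce quotas differently, and the helper names are hypothetical.

```python
import json
from typing import Any, Mapping

# Assumption: the slide's 5MB figure used as a budget; real quotas vary by browser.
ASSUMED_BUDGET_BYTES = 5 * 1024 * 1024


def estimate_storage_bytes(items: Mapping[str, Any]) -> int:
    """Estimate how many bytes key-value pairs occupy once stored as strings."""
    total = 0
    for key, value in items.items():
        # Web Storage only holds strings, so non-strings are JSON-serialized first
        stored = value if isinstance(value, str) else json.dumps(value, separators=(",", ":"))
        # JavaScript strings are UTF-16: count 2 bytes per code unit
        total += len(key.encode("utf-16-le")) + len(stored.encode("utf-16-le"))
    return total


def fits_in_budget(items: Mapping[str, Any], budget_bytes: int = ASSUMED_BUDGET_BYTES) -> bool:
    return estimate_storage_bytes(items) <= budget_bytes


cart = {"cart": {"items": [{"sku": "potion-01", "qty": 2}]}}
print(estimate_storage_bytes(cart), fits_in_budget(cart))
```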

  55. Load scripts & images without blocking
    Asynchronous calls are supported with async attributes

    Lazy load images in Chrome with native attributes
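To find candidates for these attributes, the hypothetical helper below scans HTML for external scripts that have neither `async` nor `defer`, and for images without `loading="lazy"`. Treat the output as a review list rather than a to-do list: module scripts are deferred by default (the sketch skips them), and lazy-loading images near the top of the page can delay them.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class BlockingAudit(HTMLParser):
    """Flag external scripts that block parsing and images that load eagerly."""

    def __init__(self) -> None:
        super().__init__()
        self.blocking_scripts: List[str] = []
        self.eager_images: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        names = {name for name, _ in attrs}  # boolean attributes such as async have value None
        values = {name: value or "" for name, value in attrs}
        if tag == "script" and values.get("src"):
            is_module = values.get("type", "").lower() == "module"  # module scripts defer by default
            if not ({"async", "defer"} & names) and not is_module:
                self.blocking_scripts.append(values["src"])
        elif tag == "img" and values.get("loading", "").lower() != "lazy":
            self.eager_images.append(values.get("src", ""))


def audit_html(html: str) -> BlockingAudit:
    audit = BlockingAudit()
    audit.feed(html)
    audit.close()
    return audit


report = audit_html(
    '<script src="/a.js"></script><script async src="/b.js"></script>'
    '<script type="module" src="/c.js"></script>'
    '<img src="/hero.jpg"><img src="/footer.jpg" loading="lazy">'
)
print(report.blocking_scripts, report.eager_images)  # ['/a.js'] ['/hero.jpg']
```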


  56. Broken Structured Data Markup

  57. Don’t trust document.write()
    Dynamic code (such as script elements containing document.write() calls) can add extra tokens, so the parsing process actually modifies the input.

  58. Render Testing
    Check for traps

  59. Test local/firewalled with tunneling
    SimpleHTTPServer (http.server in Python 3) provides a simple HTTP request handler you can use to serve pages locally for QA.
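From a shell, `python3 -m http.server 8000` serves the current directory. The same thing from a Python script is sketched below, using the standard library's `SimpleHTTPRequestHandler` and its `directory` argument (Python 3.7+). The `serve_directory` name is hypothetical, and the server binds to 127.0.0.1 only, so it is reachable from this machine, where a tunnel such as ngrok can then expose it.

```python
import functools
import threading
from http.server import HTTPServer, SimpleHTTPRequestHandler


def serve_directory(directory: str, port: int = 8000) -> HTTPServer:
    """Serve files from `directory` on localhost in a background thread.

    port=0 lets the OS pick a free port; read it from server.server_address.
    Call server.shutdown() and server.server_close() when QA is finished.
    """
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    server = HTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Running the server in a daemon thread keeps the calling script free to run checks against it, and the thread exits with the script.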

  60. Test local or firewalled with tunneling
    ngrok exposes that page on a publicly accessible URL

  64. Make Allies

  65. | ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄|
    DON'T BE AFRAID
    TO LEARN IN PUBLIC
    |___________|
    (\__/) ||
    (•ㅅ•) ||
    /   づ

  66. Resources
    ● Get started with Chrome Developer Tools
    ● HTML/HTML5 Parsing Standards
    ● Debugging your pages
    ● SimpleHTTPServer
    ● Ngrok
    ● Fix Search-related JavaScript problems
    ● TurboFan overview
    ● Liftoff overview
    ● Tame the Bots Portals
    ● Blink Rendering, life of a pixel
    ● The Rendering Critical Path
    ● JavaScript Sites in Search Working Group