How Googlebot Renders

DDavydoff
September 20, 2019

Transcript

  1. Think like a bot, Rank like a boss: How Googlebot renders. Jamie Alberico // Not a Robot. SLIDESHARE.NET/JAMIEALBERICO @JAMMER_VOLTS
  2. Jamie Alberico. My name means Usurper Elf King. I’m a Technical SEO, Search Advocate, & Wood Elf Druid. Oh yeah, and I’m Not a Robot. #brightonSEO @Jammer_Volts
  3. Class Details: Technical SEOs. Masters of unlocking magic in everyday objects, Technical SEOs are extremely resourceful. They see magic as a complex system waiting to be decoded and controlled. Proficiencies (recommended): Chrome Developer Tools, Lighthouse, Google Search Console, web crawlers.
  4. Our Technical SEO Quest: To protect site visibility by delivering our content to Google’s index. To do this, we must pass through a powerful construct.
  5. What is Rendering? When Googlebot retrieves your pages, Googlebot runs your code and assesses your content to understand the layout or structure of your site.
  6. Rendering’s role in Rank: All information Google collects during the rendering process is then used to rank the quality and value of your site content against other sites and what people are searching for with Google Search. (Source: How Google Search Works, Search Console Help Center)
  7. Rendering: Initial HTML (1st wave of indexing); Rendered HTML (2nd wave of indexing).
  8. Rendering Risks: If Google cannot render the pages on your site, it becomes more difficult to understand your web content because we are missing key visual layout information about your web pages. As a result, the visibility of your site content in Google Search can suffer.
  9. Until 2018, we thought our quest looked like this: Crawl → Index → Rank.
  10. Now, we know that Rendering is part of the process and that Google has two waves of indexing. Diagram labels: Crawl, Index, Render, Rank; First Wave, Second Wave.
  11. Google Web Rendering Service. Large Construct (legendary), lawful neutral. Languages: HTML, CSS, JavaScript, Images. Skills: Perception +12, Dexterity +10. Senses: Robots.txt, Robots directives.
  12. Takes action using threads: Each request is made by a thread. A thread is a single connection. It moves sequentially through each action, one at a time, until its task is complete. (Tabs: Features & Traits, Actions, Equipment)
  13. SEOs call this Crawl Budget: “Simply put, [crawl budget] represents the number of simultaneous parallel connections Googlebot may use to crawl the site, as well as the time it has to wait between the fetches.” (Source: What Crawl Budget Means for Googlebot, Google Webmaster Central Blog)
  14. Stateless
      • Does not retain state across page loads
      • Local Storage and Session Storage data are cleared across page loads
      • HTTP Cookies are cleared across page loads
  15. Obedient: Obeys HTML/HTML5 protocol. Literal: “Googlebot, go to the apothecary and buy a healing potion. If they have shields, buy 2.” Googlebot comes back with 2 potions.
  16. Politeness is priority 0: Crawling is its main priority while making sure it doesn't degrade the experience of users visiting the site. We call this the "crawl rate limit," which limits the maximum fetching rate for a given site. (Source: What Crawl Budget Means for Googlebot, Google Webmaster Central Blog)
  17. Multi-thread: Googlebot can execute more than one request at a time if demand and server stability allow.
  18. Request URI: Googlebot sends a request for content at a Uniform Resource Identifier (URI). Googlebot can discover a URL via link or submission.
  19. Read HTTP response and headers
      Q. Does the thing I asked for exist? A. HTTP Status Codes
      Q. Anything I should know before looking at this? A. Cache-Control and Directives
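The two questions on this slide can be sketched as a small Python helper. This is a minimal illustration, not Googlebot's actual logic: `triage_response` and the keys of the dict it returns are hypothetical names, and it only looks at the status code plus the `Location`, `Cache-Control`, and `X-Robots-Tag` headers. It also ignores user-agent-scoped values such as `googlebot: noindex`.

```python
from typing import Mapping


def triage_response(status: int, headers: Mapping[str, str]) -> dict:
    """Answer the slide's two questions for a single HTTP response.

    Hypothetical helper for illustration; not part of any Google tooling.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}

    # Q. Does the thing I asked for exist? A. The status code.
    exists = 200 <= status < 300
    is_redirect = status in (301, 302, 303, 307, 308)

    # Q. Anything I should know before looking at this? A. Headers.
    robots_header = lowered.get("x-robots-tag", "")
    directives = [d.strip().lower() for d in robots_header.split(",") if d.strip()]
    blocked = "noindex" in directives or "none" in directives

    return {
        "exists": exists,
        "redirect_target": lowered.get("location") if is_redirect else None,
        "cache_control": lowered.get("cache-control"),
        "robots_directives": directives,
        "indexable": exists and not blocked,
    }
```

For example, `triage_response(200, {"X-Robots-Tag": "noindex"})` reports a page that exists but is not indexable, which is the kind of signal worth checking before debugging anything in the rendered DOM.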
  20. Identify Resources: Googlebot identifies resources needed to complete the request. It feeds identified resources into the crawling queue. Use the Network tab to see how many resources a page calls.
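As a rough offline counterpart to the Network tab, the sketch below lists the external scripts, stylesheets, and images that a page's initial HTML references. It uses only Python's standard `html.parser`; `ResourceCollector` and `identify_resources` are hypothetical names, and the choice of tags is an assumption for illustration. It will not see resources requested later by JavaScript, which the Network tab does show.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceCollector(HTMLParser):
    """Collect (tag, absolute URL) pairs for resources named in the HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        url = None
        if tag in ("script", "img"):
            url = attr.get("src")
        elif tag == "link":
            # Only stylesheets are treated as render-relevant here.
            rel_values = (attr.get("rel") or "").lower().split()
            if "stylesheet" in rel_values:
                url = attr.get("href")
        if url:
            # Resolve relative references against the page URL.
            self.resources.append((tag, urljoin(self.base_url, url)))


def identify_resources(html, base_url):
    """Return the resources referenced by `html`, in document order."""
    collector = ResourceCollector(base_url)
    collector.feed(html)
    return collector.resources
```

Each entry in the returned list is one more fetch that has to succeed before the page can be rendered accurately, so the length of the list is a quick proxy for rendering cost.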
  21. Cache: If the requested website implements a cache, a copy of the data is made or requested.
  22. Actions: WRS, web rendering service. Googlebot queues pages for both crawling and rendering. It is not immediately obvious when a page is waiting for crawling and when it is waiting for rendering. WRS is the name used to represent the collective elements involved in Google’s rendering service. Many details are not publicly available.
  23. Web Rendering Service (WRS). Diagram labels: Blink Browser Engine, V8 Rendering Engine, Ignition, TurboFan, Liftoff, Display backend, Google Magic, Chromium Headless Browser.
  24. Actions: WRS process
      1. A URL is pulled from the crawl queue
      2. Googlebot requests the URL and downloads the initial HTML
      3. The initial HTML is passed to the processing stage, which extracts links
      4. Links go back on the crawl queue
      5. Once resources are crawled, the page queues for rendering
  25. Actions: WRS process (continued)
      6. When resources become available, the request moves from the render queue to the renderer
      7. The renderer passes the rendered HTML back to processing
      8. Processing indexes the content
      9. Processing extracts links from the rendered HTML and puts them into the crawl queue
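The nine steps above can be sketched as a toy simulation with two queues. This is a deliberately simplified model, not Google's implementation: `simulate_wrs`, the `site` dictionary shape, and the rule "render only when the crawl queue is empty" are all assumptions made to keep the example small.

```python
from collections import deque


def simulate_wrs(site, start_url):
    """Walk a toy site through the crawl and render queues.

    `site` maps each URL to the links found in its initial HTML and the
    extra links that only appear after rendering (hypothetical structure).
    Returns the ordered list of (event, url) tuples.
    """
    crawl_queue = deque([start_url])
    render_queue = deque()
    seen = {start_url}
    events = []

    def enqueue(links):
        # Steps 4 and 9: discovered links go back on the crawl queue.
        for link in links:
            if link in site and link not in seen:
                seen.add(link)
                crawl_queue.append(link)

    while crawl_queue or render_queue:
        if crawl_queue:
            url = crawl_queue.popleft()              # step 1
            page = site[url]                         # step 2: fetch initial HTML
            events.append(("crawl", url))
            enqueue(page["initial_links"])           # steps 3 and 4
            render_queue.append(url)                 # step 5
        else:
            url = render_queue.popleft()             # step 6
            page = site[url]
            events.append(("render", url))           # step 7
            events.append(("index", url))            # step 8
            enqueue(page["rendered_links"])          # step 9
    return events


toy_site = {
    "/": {"initial_links": ["/about"], "rendered_links": ["/products"]},
    "/about": {"initial_links": [], "rendered_links": []},
    "/products": {"initial_links": [], "rendered_links": []},
}
events = simulate_wrs(toy_site, "/")
```

In this toy run, `/products` is linked only from the rendered version of `/`, so it cannot be crawled until `/` has been through the render queue. That delay is the practical cost of exposing links only in client-side rendered content.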
  26. Equipment: Chromium, headless browser
      • Headless means that there is no GUI (visual representation)
      • Used to load web pages and extract metadata
      • Reading from and writing to the DOM
      • Observing network events
      • Capturing screenshots
      • Inspecting worker scripts
      • Recording Chrome Traces
  27. Equipment: Blink, browser engine
      • Allows for querying and manipulating the rendering engine settings (e.g. mobile vs. desktop)
      • Blink loves service workers. Blink may create multiple worker threads to run Web Workers, ServiceWorker, and Worklets
  28. Equipment: Blink, browser engine. Blink is responsible for 2 major elements:
      • Memory heap: stores the result of script execution (memory heap results are added to the DOM)
      • Call stack: queue of sequential next steps (each entry in the call stack is called a stack frame)
  29. Equipment: Blink, browser engine. Local storage and Session storage are key-value stores in the browser. They hold string values, so JS objects must be serialized (for example as JSON) before they are stored. These keys are a weak point in your rendering offense against a stateless Googlebot.
  30. Equipment: V8, JavaScript engine. JavaScript is a single-threaded process and each entry or execution step is a stack frame. Googlebot can opt to run simultaneous parallel connections.
  31. Equipment: V8, JavaScript engine. Each thread runs through a process of:
      1. Loading
      2. Parsing
      3. Compiling
      4. Executing
  32. Equipment: V8, JavaScript engine
      • Open-source JavaScript engine and WebAssembly engine
      • Developed by Google & The Chromium Project
      • Used in Node.js, Google Chrome, and Chromium web browsers
  33. Equipment: V8’s components
      • Ignition, a fast low-level register-based JavaScript interpreter written using the backend of TurboFan
      • TurboFan, one of V8’s optimizing compilers
      • Liftoff, a new baseline compiler for WebAssembly
  34. Diagram labels: Crawl, Index, Render; HTML (1st wave of indexing), DOM (2nd wave of indexing).
  35. Use clean, consistent signals: Googlebot won’t see past a noindex directive in the initial HTML to find an index directive placed in the DOM. Duplicate content without a canonical in the initial HTML is crawl waste until rendering. Inconsistent title tags and descriptions can result from overwriting the initial HTML with rendered HTML.
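One way to catch the noindex problem before Googlebot does is to inspect the raw, unrendered HTML. The sketch below is an assumption-laden illustration: `RobotsMetaParser` and `initial_html_blocks_indexing` are hypothetical names, and the check covers only `<meta name="robots">` and `<meta name="googlebot">` tags, not HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect robots directives from meta tags in unrendered HTML."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if name in ("robots", "googlebot"):
            content = attr.get("content") or ""
            for directive in content.split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def initial_html_blocks_indexing(initial_html):
    """Return True if the initial HTML already tells crawlers not to index."""
    parser = RobotsMetaParser()
    parser.feed(initial_html)
    # "none" is shorthand for "noindex, nofollow".
    return bool({"noindex", "none"} & parser.directives)
```

Run this against the server response (view-source, not the Elements panel). If it returns `True`, a script that later swaps the tag to `index` in the DOM is unlikely to help.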
  36. Focus rendering efforts with nofollow: If a resource is not necessary or beneficial to the construction of the page, add a nofollow directive to it.
  37. Mobile vs Desktop Rendering: Layout matters for both. If you want to rank for position zero, remember that the content must be exposed on initial mobile load.
  38. Choose the rendering strategy that’s right for your business and stack. You don’t have to be 100% client-side, 100% server-side, or 100% both (dynamic). Load what matters when it matters.
  39. Rendering & Performance. Diagram labels: DOM, HTML, Style Sheets, HTML Parser, CSS Parser, DOM Tree, Style Rules, Render Tree, Attachment, Layout, Painting, Display, TTFB, TTI.
  40. More page resources require more rendering resources: Each resource must be fetched independently before the page can be accurately rendered. This is a major part of the issue with client-side rendering. More client-side calls mean more blind spots for you.
  41. Excessive scripts run the risk of hitting thread/rest thresholds. This is most often observed as "Other error".
  42. Call Stacks have a maximum size: While the Call Stack has functions to execute, the browser can’t actually do anything else; it’s getting blocked.
  43. Session and Local web storage limits: 5MB per object, and 50MB per system. If your CSR resources are too large, you risk hitting the upper limit. Elements in queue once the limit is reached may not be considered by Googlebot.
  44. Load scripts & images without blocking: Asynchronous calls are supported with the async and defer attributes: <script src="myscript.js" async defer></script>. Lazy load images in Chrome with native attributes: <img src="the-traveler.jpg" loading="lazy">
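A quick way to audit a page against this advice is to scan its HTML for external scripts without `async` or `defer` and images without `loading="lazy"`. The following is a minimal sketch using Python's standard `html.parser`; `BlockingAudit` and `audit_blocking` are hypothetical names. Treat the image list as a prompt for review rather than a list of errors, since above-the-fold images are usually better loaded eagerly.

```python
from html.parser import HTMLParser


class BlockingAudit(HTMLParser):
    """Flag parser-blocking scripts and images that are not lazy-loaded."""

    def __init__(self):
        super().__init__()
        self.blocking_scripts = []
        self.eager_images = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "script" and attr.get("src"):
            # async and defer are boolean attributes: presence is what counts.
            # Module scripts are deferred by default, so they are skipped.
            non_blocking = (
                "async" in attr
                or "defer" in attr
                or (attr.get("type") or "").lower() == "module"
            )
            if not non_blocking:
                self.blocking_scripts.append(attr["src"])
        elif tag == "img" and attr.get("src"):
            if (attr.get("loading") or "").lower() != "lazy":
                self.eager_images.append(attr["src"])


def audit_blocking(html):
    """Return (blocking script URLs, eagerly loaded image URLs)."""
    audit = BlockingAudit()
    audit.feed(html)
    return audit.blocking_scripts, audit.eager_images
```

Passing the two tags from the slide through `audit_blocking` yields two empty lists, while a plain `<script src="a.js"></script>` shows up in the first list.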
  45. Don’t trust document.write(): Dynamic code (such as script elements containing document.write() calls) can add extra tokens, so the parsing process actually modifies the input.
  46. Test local/firewalled with tunneling: SimpleHTTPServer (http.server in Python 3) is a simple HTTP request handler for QA.
  47. Test local or firewalled with tunneling: ngrok exposes that page on a publicly accessible URL.
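The local half of this workflow can be sketched with Python 3's standard `http.server` module. This is a minimal example under stated assumptions: `serve_directory` and `self_check` are hypothetical helper names, the server binds to localhost only, and it is meant for QA rather than production use.

```python
import functools
import http.server
import tempfile
import threading
import urllib.request
from pathlib import Path


def serve_directory(directory, port=0):
    """Serve `directory` over HTTP on localhost in a background thread.

    port=0 asks the operating system for any free port.
    Returns the server object and the port it is listening on.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(directory)
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, server.server_address[1]


def self_check():
    """Serve a throwaway page from a temp directory and fetch it once."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "index.html").write_text("<h1>Hello, Googlebot</h1>", encoding="utf-8")
        server, port = serve_directory(tmp)
        try:
            url = f"http://127.0.0.1:{port}/index.html"
            with urllib.request.urlopen(url) as response:
                return response.status
        finally:
            server.shutdown()
            server.server_close()
```

From a terminal, `python3 -m http.server 8000` serves the current directory the same way. Running `ngrok http 8000` in a second terminal then gives that local port a public URL, which can be pasted into Google's testing tools.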
  48. Resources
      • Get started with Chrome Developer Tools
      • HTML/HTML5 Parsing Standards
      • Debugging your pages
      • SimpleHTTPServer
      • Ngrok
      • Fix Search-related JavaScript problems
      • TurboFan overview
      • Liftoff overview
      • Tame the Bots Portals
      • Blink Rendering, life of a pixel
      • The Rendering Critical Path
      • JavaScript Sites in Search Working Group