Slide 1

Slide 1 text

Think like a bot, Rank like a boss: How Googlebot renders. Jamie Alberico // Not a Robot. SLIDESHARE.NET/JAMIEALBERICO @JAMMER_VOLTS

Slide 2

Slide 2 text

Jamie Alberico My name means Usurper Elf King. I’m a Technical SEO, Search Advocate, & Wood Elf Druid. Oh yeah, and I’m Not a Robot. #brightonSEO @Jammer_Volts

Slide 3

Slide 3 text

Class Details: Technical SEOs Masters of unlocking magic in everyday objects, Technical SEOs are extremely resourceful. They see magic as a complex system waiting to be decoded and controlled. Proficiencies (recommended): Chrome Developer Tools, Lighthouse, Google Search Console, web crawlers #brightonSEO @Jammer_Volts

Slide 4

Slide 4 text

Our Technical SEO Quest To protect site visibility by delivering our content to Google’s index. To do this, we must pass through a powerful construct. #brightonSEO @Jammer_Volts

Slide 5

Slide 5 text

When Googlebot retrieves your pages, it runs your code and assesses your content to understand the layout or structure of your site. What is Rendering? #brightonSEO @Jammer_Volts

Slide 6

Slide 6 text

All information Google collects during the rendering process is then used to rank the quality and value of your site content against other sites and what people are searching for with Google Search. How Google Search Works, Search Console Help Center Rendering’s role in Rank #brightonSEO @Jammer_Volts

Slide 7

Slide 7 text

Initial HTML (1st wave of indexing) Rendered HTML (2nd Wave of indexing) Rendering #brightonSEO @Jammer_Volts

Slide 8

Slide 8 text

If Google cannot render the pages on your site, it becomes more difficult to understand your web content because we are missing key visual layout information about your web pages. As a result, the visibility of your site content in Google Search can suffer. Rendering Risks #brightonSEO @Jammer_Volts

Slide 9

Slide 9 text

Until 2018, we thought our quest looked like this: Crawl → Index → Rank #brightonSEO @Jammer_Volts

Slide 10

Slide 10 text

Now, we know that rendering is part of the process and that Google has two waves of indexing: Crawl → Index → Render → Rank, with a First Wave of indexing before rendering and a Second Wave after #brightonSEO @Jammer_Volts

Slide 11

Slide 11 text

If Google can’t render content, we fail our quest: we get no further than Crawl → Index #brightonSEO @Jammer_Volts

Slide 12

Slide 12 text

Google’s Web Rendering Service (WRS) Insight Check #brightonSEO @Jammer_Volts

Slide 13

Slide 13 text

Google Web Rendering Service Large Construct (legendary), lawful neutral Languages HTML, CSS, JavaScript, Images Skills Perception +12, Dexterity +10 Senses Robots.txt, Robots directives #brightonSEO @Jammer_Volts

Slide 14

Slide 14 text

Takes action using threads Each request is made by a thread. A thread is a single connection. It moves sequentially through each action, one at a time, until its task is complete. Features & Traits Actions Equipment #brightonSEO @Jammer_Volts

Slide 15

Slide 15 text

SEOs call this Crawl Budget “Simply put, [crawl budget] represents the number of simultaneous parallel connections Googlebot may use to crawl the site, as well as the time it has to wait between the fetches.” What Crawl Budget Means for Googlebot, Google Webmaster Blog Features & Traits Actions Equipment #brightonSEO @Jammer_Volts

Slide 16

Slide 16 text

Stateless ● Does not retain state across page loads ● Local Storage and Session Storage data are cleared across page loads ● HTTP Cookies are cleared across page loads Features & Traits Actions Equipment #brightonSEO @Jammer_Volts
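
If any of your content depends on values saved during an earlier visit, it simply is not there when Googlebot arrives. A minimal sketch (the storage key and helper below are hypothetical, not from the deck):

  // Hypothetical helper: stamp a currency onto prices already present in the HTML
  function renderPrices(currency) {
    document.querySelectorAll('[data-price]').forEach(function (el) {
      el.textContent = currency + ' ' + el.dataset.price;
    });
  }

  // localStorage is empty on every Googlebot load, so always fall back to a crawlable default
  var savedCurrency = window.localStorage.getItem('currency'); // null for a stateless crawler
  renderPrices(savedCurrency || 'GBP');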

Slide 17

Slide 17 text

Obedient Obeys the HTML/HTML5 standard Literal “Googlebot, go to the apothecary and buy a healing potion. If they have shields, buy 2.” Googlebot comes back with 2 potions. Features & Traits Actions Equipment #brightonSEO @Jammer_Volts

Slide 18

Slide 18 text

Politeness is priority 0 Crawling is its main priority while making sure it doesn't degrade the experience of users visiting the site. We call this the "crawl rate limit," which limits the maximum fetching rate for a given site. What Crawl Budget Means for Googlebot, Google Webmaster Central Blog Features & Traits Actions Equipment #brightonSEO @Jammer_Volts

Slide 19

Slide 19 text

Multi-thread Googlebot can execute more than one request at a time if demand and server stability allow. Features & Traits Actions Equipment #brightonSEO @Jammer_Volts

Slide 20

Slide 20 text

Request URI Googlebot sends a request for content at a uniform resource identifier (URI). Googlebot can discover a URL via a link or a submission. Features & Traits Actions Equipment #brightonSEO @Jammer_Volts

Slide 21

Slide 21 text

Read HTTP response and headers Q. Does the thing I asked for exist? A. HTTP status codes Q. Anything I should know before looking at this? A. Cache-Control and other directives Features & Traits Actions Equipment #brightonSEO @Jammer_Volts
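
You can ask those same two questions about your own pages from the DevTools console. A rough sketch, assuming a same-origin request (the path below is a placeholder):

  // Does the thing I asked for exist, and what should I know before looking at it?
  fetch('/healing-potions/', { method: 'HEAD' }) // placeholder path on your own site
    .then(function (res) {
      console.log('Status:', res.status); // HTTP status code
      console.log('Cache-Control:', res.headers.get('cache-control'));
      console.log('X-Robots-Tag:', res.headers.get('x-robots-tag'));
    });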

Slide 22

Slide 22 text

Parse Download the response from the server Features & Traits Equipment Actions #brightonSEO @Jammer_Volts

Slide 23

Slide 23 text

Identify Resources Googlebot identifies the resources needed to complete the request. It feeds identified resources into the crawling queue. Features & Traits Actions Equipment Use the Network tab to see how many resources a page calls #brightonSEO @Jammer_Volts

Slide 24

Slide 24 text

Cache If the requested website implements a cache, a copy of the data is made or requested Features & Traits Actions Equipment #brightonSEO @Jammer_Volts

Slide 25

Slide 25 text

Actions WRS, web rendering service Features & Traits Equipment Googlebot queues pages for both crawling and rendering. It is not immediately obvious when a page is waiting for crawling and when it is waiting for rendering. WRS is the name used to represent the collective elements involved in Google’s rendering service. Many details are not publicly available. #brightonSEO @Jammer_Volts

Slide 26

Slide 26 text

Web Rendering Service (WRS) Blink Browser Engine V8 JavaScript Engine Ignition TurboFan Liftoff Display backend Google Magic Chromium Headless Browser #brightonSEO @Jammer_Volts

Slide 27

Slide 27 text

Actions WRS process Features & Traits Equipment 1. A URL is pulled from the crawl queue 2. Googlebot requests the URL and downloads the initial HTML 3. The Initial HTML is passed to the processing stage which extracts links 4. Links go back on the crawl queue 5. Once resources are crawled, the page queues for rendering #brightonSEO @Jammer_Volts

Slide 28

Slide 28 text

Actions WRS process Features & Traits Equipment 6. When resources become available, the request moves from the render queue to the renderer 7. Renderer passes the rendered HTML back to processing 8. Processing indexes the content 9. Extracts links from the rendered HTML to put them into the crawl queue #brightonSEO @Jammer_Volts

Slide 29

Slide 29 text

Chromium, headless browser Equipment Actions Features & Traits ● Headless means that there is no GUI (visual representation) ● Used to load web pages and extract metadata ● reading from and writing to the DOM ● observing network events ● capturing screenshots ● inspecting worker scripts ● recording Chrome Traces #brightonSEO @Jammer_Volts
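
Google’s internal setup isn’t public, but you can watch a comparable headless Chromium session yourself with Puppeteer (not part of the deck; the URL is a placeholder). A rough sketch:

  // Load a page in headless Chromium and pull back the rendered HTML
  const puppeteer = require('puppeteer');

  (async () => {
    const browser = await puppeteer.launch(); // headless: no GUI
    const page = await browser.newPage();
    await page.goto('https://www.example.com/', { waitUntil: 'networkidle0' }); // placeholder URL
    const renderedHtml = await page.content(); // serialized DOM after scripts have run
    console.log(renderedHtml.length + ' characters of rendered HTML');
    await browser.close();
  })();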

Slide 30

Slide 30 text

Blink, browser engine ● Allows for querying and manipulating the rendering engine settings (ex: mobile vs. desktop) ● Blink loves service workers. Blink may create multiple worker threads to run Web Workers, ServiceWorker and Worklets Equipment Actions Features & Traits #brightonSEO @Jammer_Volts

Slide 31

Slide 31 text

Blink, browser engine Blink is responsible for 2 major elements: Memory heap: stores the result of script execution (Memory Heap results are added to DOM.) Call stack: queue of sequential next steps (Each entry in the call stack is called a Stack Frame.) Equipment Actions Features & Traits #brightonSEO @Jammer_Volts

Slide 32

Slide 32 text

Blink, browser engine Local Storage and Session Storage are key-value stores in the browser (values are saved as strings, so JS objects and functions must be serialized to live there). These keys are a weak point in your rendering offense against a stateless Googlebot. Equipment Actions Features & Traits #brightonSEO @Jammer_Volts

Slide 33

Slide 33 text

V8, JavaScript engine JavaScript is a single-threaded process and each entry or execution step is a stack frame. Googlebot can opt to run simultaneous parallel connections. Equipment Actions Features & Traits #brightonSEO @Jammer_Volts

Slide 34

Slide 34 text

V8, JavaScript engine Each thread runs through a process of: 1. Loading 2. Parsing 3. Compiling 4. Executing Equipment Actions Features & Traits #brightonSEO @Jammer_Volts

Slide 35

Slide 35 text

V8, JavaScript engine ● open-source JavaScript engine and WebAssembly engine ● developed by Google & The Chromium Project ● used in Node.js, Google Chrome, and Chromium web browsers Equipment Actions Features & Traits #brightonSEO @Jammer_Volts

Slide 36

Slide 36 text

V8’s components ● Ignition, a fast low-level register-based JavaScript interpreter written using the backend of TurboFan ● TurboFan, one of V8’s optimizing compilers ● Liftoff, a new baseline compiler for WebAssembly Equipment Actions Features & Traits #brightonSEO @Jammer_Volts

Slide 37

Slide 37 text

Optimized Rendering Roll with Advantage #brightonSEO @Jammer_Volts

Slide 38

Slide 38 text

Parse content critical to user intent in initial HTML #brightonSEO @Jammer_Volts

Slide 39

Slide 39 text

Crawl → Index → Render: the initial HTML feeds the 1st wave of indexing, the rendered DOM feeds the 2nd wave of indexing #brightonSEO @Jammer_Volts

Slide 40

Slide 40 text

Critical = why the user came #brightonSEO @Jammer_Volts

Slide 41

Slide 41 text

Define it for your site, by template #brightonSEO @Jammer_Volts

Slide 42

Slide 42 text

Use clean, consistent signals Googlebot won’t look past a noindex directive in the initial HTML to find an index directive placed in the DOM. Duplicative content without a canonical in the initial HTML is crawl waste until rendering. Inconsistent title tags and descriptions can result from overwriting the initial HTML with rendered HTML. #brightonSEO @Jammer_Volts
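
In practice that means shipping the critical directives in the initial HTML head rather than injecting them after render. A minimal sketch with placeholder values:

  <!-- Placeholder values: send these in the initial HTML, not via JavaScript after render -->
  <head>
    <title>Healing Potions | Example Apothecary</title>
    <meta name="description" content="Placeholder description that matches the rendered page.">
    <meta name="robots" content="index, follow">
    <link rel="canonical" href="https://www.example.com/potions/healing/">
  </head>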

Slide 43

Slide 43 text

Focus rendering efforts with nofollow If a resource is not necessary or beneficial to the construction of the page, add a nofollow directive to it. #brightonSEO @Jammer_Volts

Slide 44

Slide 44 text

Mobile vs Desktop Rendering Layout matters for both. If you want to rank for position zero, remember that the content must be exposed on initial mobile load. #brightonSEO @Jammer_Volts

Slide 45

Slide 45 text

Choose the rendering strategy that’s right for your business and stack. You don’t have to be 100% client-side, 100% server-side, or 100% both (dynamic). Load what matters when it matters. #brightonSEO @Jammer_Volts

Slide 46

Slide 46 text

#brightonSEO @Jammer_Volts

Slide 47

Slide 47 text

Rendering Challenges Survival Check #brightonSEO @Jammer_Volts

Slide 48

Slide 48 text

#brightonSEO @Jammer_Volts

Slide 49

Slide 49 text

Rendering & Performance: HTML → HTML Parser → DOM Tree; Style Sheets → CSS Parser → Style Rules; DOM Tree + Style Rules → Attachment → Render Tree → Layout → Painting → Display, spanning TTFB to TTI #brightonSEO @Jammer_Volts

Slide 50

Slide 50 text

#brightonSEO @Jammer_Volts

Slide 51

Slide 51 text

More page resources require more rendering resources Each resource must be fetched independently before the page can be accurately rendered. This is a major part of the issue with client-side rendering: more client-side calls mean more blind spots for you. #brightonSEO @Jammer_Volts

Slide 52

Slide 52 text

Excessive scripts run the risk of hitting thread/request thresholds. This is most often observed as an “Other error”. #brightonSEO @Jammer_Volts

Slide 53

Slide 53 text

Call Stacks have a maximum size While the Call Stack has functions to execute, the browser can’t actually do anything else — it’s getting blocked. #brightonSEO @Jammer_Volts
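
A toy illustration (not from the deck): every call pushes a stack frame, and while those frames pile up the main thread can do nothing else, until V8 finally refuses.

  // Unbounded recursion: each call adds a stack frame until the limit is hit
  function recurse(depth) {
    return recurse(depth + 1);
  }
  try {
    recurse(0);
  } catch (e) {
    console.log(e.message); // "Maximum call stack size exceeded" in V8
  }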

Slide 54

Slide 54 text

Session and Local web storage limits: 5MB per object and 50MB per system If your CSR resources are too large, you risk hitting the upper limit. Elements still queued once the limit is reached may not be considered by Googlebot. #brightonSEO @Jammer_Volts
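
A minimal sketch of hitting that ceiling (the key name and payload size are made up): once setItem throws, whatever you meant to store afterwards never lands.

  // Made-up key and oversized payload: the write fails once the origin's quota is exhausted
  try {
    var bigValue = 'x'.repeat(6 * 1024 * 1024); // several MB of characters, past a typical per-origin limit
    localStorage.setItem('renderCache', bigValue);
  } catch (e) {
    console.warn('Web storage quota exceeded; fall back to server-rendered content', e);
  }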

Slide 55

Slide 55 text

Load scripts & images without blocking Asynchronous script loading is supported with the async attribute. Lazy-load images in Chrome with the native loading attribute. #brightonSEO @Jammer_Volts
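
A minimal sketch of both attributes (the file paths are placeholders):

  <!-- The script downloads without blocking HTML parsing -->
  <script src="/assets/reviews-widget.js" async></script>

  <!-- Chrome's native lazy loading: the image is fetched only as it nears the viewport -->
  <img src="/assets/potion-shelf.jpg" loading="lazy" width="800" height="600" alt="Potion shelf">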

Slide 56

Slide 56 text

Broken Structured Data Markup #brightonSEO @Jammer_Volts

Slide 57

Slide 57 text

Don’t trust document.write( ) Dynamic code (such as script elements containing document.write() calls) can add extra tokens, so the parsing process actually modifies the input. #brightonSEO @Jammer_Volts
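
For structured data in particular, appending a JSON-LD node to the DOM is one safer pattern than writing it into the parser mid-stream. A sketch with placeholder values:

  <!-- Risky: document.write() hands extra tokens to a parser that is already running -->
  <script>
    document.write('<script type="application/ld+json">{"@type":"Organization","name":"Example"}<\/script>');
  </script>

  <!-- Safer sketch (placeholder values): build the same node and append it instead -->
  <script>
    var ld = document.createElement('script');
    ld.type = 'application/ld+json';
    ld.text = JSON.stringify({
      '@context': 'https://schema.org',
      '@type': 'Organization',
      'name': 'Example Apothecary'
    });
    document.head.appendChild(ld);
  </script>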

Slide 58

Slide 58 text

Render Testing Check for traps #brightonSEO @Jammer_Volts

Slide 59

Slide 59 text

Test local/firewalled with tunneling SimpleHTTPServer (http.server in Python 3) is a simple HTTP request handler, handy for QA #brightonSEO @Jammer_Volts

Slide 60

Slide 60 text

Test local or firewalled with tunneling ngrok exposes that page on a publicly accessible URL #brightonSEO @Jammer_Volts

Slide 61

Slide 61 text

#brightonSEO @Jammer_Volts

Slide 62

Slide 62 text

#brightonSEO @Jammer_Volts

Slide 63

Slide 63 text

#brightonSEO @Jammer_Volts

Slide 64

Slide 64 text

Make Allies #brightonSEO @Jammer_Volts

Slide 65

Slide 65 text

| ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄ ̄| DON'T BE AFRAID TO LEARN IN PUBLIC |___________| (\__/) || (•ㅅ•) || /   づ #brightonSEO @Jammer_Volts

Slide 66

Slide 66 text

Resources ● Get started with Chrome Developer Tools ● HTML/HTML5 Parsing Standards ● Debugging your pages ● SimpleHTTPServer ● Ngrok ● Fix Search-related JavaScript problems ● TurboFan overview ● Liftoff overview ● Tame the Bots Portals ● Blink Rendering, life of a pixel ● The Rendering Critical Path ● JavaScript Sites in Search Working Group #brightonSEO @Jammer_Volts