London | Berlin | New York | Toronto | Sydney | Singapore
100+ digital experts globally, with local expertise | Media managed internationally | $200M+ e-commerce experience | Since 2009
The Crawling, Indexation & Search Process
To understand indexation, we must know the whole journey: Googlebot Mobile crawls the page; newly crawled pages are indexed and added to Google's servers if allowed; the user searches and the request is sent to Google's servers; algorithms act as a filter over the index and select relevant results; the relevant results are listed in the browser.
Because It's the Reason We're All Here
Indexation of e-commerce brand pages can lead to KPI success: more organic traffic, more organic revenue, more SERP presence, better site authority, better user experience, and advantages over competitors.
Technical Issues Are the Main Culprits
The following SEO issues hinder a website's crawlability and indexability: pagination issues, sitemap issues, canonical issues, redirect issues, 4xx errors, robots.txt issues, robots directive issues, and internal link issues.
Remove 404 Pages & Low-Value OOS Pages
This saves crawl budget and alleviates the issue of Google perceiving the website as low quality. Remove "sold out" pages and redirect broken pages to the most relevant live page (sketch below).
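A minimal sketch of how removed or sold-out product URLs might be redirected to their closest relevant page in a Next.js setup (the framework mentioned later in the deck). The slug-to-category map and paths are illustrative assumptions, not END. Clothing's actual configuration.

```ts
// middleware.ts — hypothetical sketch: permanently redirect known sold-out product
// URLs to their closest relevant category page. A real build would source the
// mapping from the product catalogue rather than a hard-coded object.
import { NextRequest, NextResponse } from 'next/server';

const SOLD_OUT_REDIRECTS: Record<string, string> = {
  '/product/old-colourway-trainer': '/category/trainers', // assumed paths
};

export function middleware(request: NextRequest) {
  const destination = SOLD_OUT_REDIRECTS[request.nextUrl.pathname];
  if (destination) {
    // 308 keeps the redirect permanent, so Google consolidates signals on the target.
    return NextResponse.redirect(new URL(destination, request.url), 308);
  }
  return NextResponse.next();
}
```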
Connect Orphaned Pages Back to the Website
Pages with no internal links are harder to find: there are no links from the main domain page(s) for crawlers to follow to reach them, so link them back into the site structure (a detection sketch follows).
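A rough sketch of how orphan candidates can be spotted: pages that appear in the sitemap but are never linked internally. It assumes you already have both URL lists exported (for example from a crawler); the file names are illustrative.

```ts
// orphan-check.ts — pages in the sitemap that no internal link points to are
// orphan candidates. The two input files are assumed exports, one URL per line.
import { readFileSync } from 'node:fs';

const sitemapUrls = new Set(readFileSync('sitemap-urls.txt', 'utf8').split('\n').filter(Boolean));
const linkedUrls = new Set(readFileSync('internally-linked-urls.txt', 'utf8').split('\n').filter(Boolean));

const orphans = [...sitemapUrls].filter((url) => !linkedUrls.has(url));
console.log(`${orphans.length} orphan candidates`);
orphans.forEach((url) => console.log(url));
```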
Avoid Pagination & Facets Diluting Structure
Avoid creating too many pages, which impacts crawl budget. [Diagram: a paginated category showing the current page, user input, "back to page 1", "back 1 page", and the number of articles/products generated.]
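One common pattern for stopping facet combinations from multiplying crawlable URLs is to leave single-facet pages indexable while marking multi-facet combinations noindex. The sketch below is an illustration of that pattern only, not necessarily the approach taken here; it assumes a Next.js 13/14-style App Router page (where searchParams is a plain object) and invented facet parameter names.

```ts
// app/category/[slug]/page.tsx — hypothetical: index single-facet category pages,
// noindex URLs that combine several facets, to limit crawlable combinations.
import type { Metadata } from 'next';

type Props = {
  params: { slug: string };
  searchParams: Record<string, string | string[] | undefined>;
};

export async function generateMetadata({ searchParams }: Props): Promise<Metadata> {
  // Assumed facet parameters — replace with the site's real ones.
  const facetCount = ['colour', 'size', 'brand']
    .filter((key) => searchParams[key] !== undefined).length;
  return {
    robots: { index: facetCount <= 1, follow: true },
  };
}
```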
Be Wary of Crawl Depth in Site Hierarchy
Crawl depth > 4 can lead to a lack of page visibility. [Diagram: site hierarchy from crawl depth 1 through crawl depth 5.] Add category & product pages near the root domain.
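A rough sketch of how crawl depth can be measured with a breadth-first crawl from the homepage. The start URL is a placeholder and the regex-based link extraction is a simplification of what a proper crawler or HTML parser would do.

```ts
// crawl-depth.ts — breadth-first crawl: a page's depth is the minimum number of
// clicks from the homepage. Pages at depth > 4 deserve better internal linking.
const START_URL = 'https://www.example.com/'; // assumed start URL

async function crawlDepths(maxDepth = 5): Promise<Map<string, number>> {
  const depths = new Map<string, number>([[START_URL, 0]]);
  let frontier = [START_URL];

  for (let depth = 1; depth <= maxDepth; depth++) {
    const next: string[] = [];
    for (const url of frontier) {
      const html = await (await fetch(url)).text();
      // Naive same-site link extraction, for illustration only.
      for (const [, href] of html.matchAll(/href="(\/[^"#?]*)"/g)) {
        const absolute = new URL(href, START_URL).toString();
        if (!depths.has(absolute)) {
          depths.set(absolute, depth);
          next.push(absolute);
        }
      }
    }
    frontier = next;
  }
  return depths;
}
```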
Create an Optimised and Concise Sitemap
Only include important and working pages in sitemap(s).
Do:
➔ Include important pages like category & product pages.
➔ Have < 50,000 URLs per sitemap.
➔ Include only indexable pages that return a 200 status code.
➔ Group sitemap(s) by category, language, site area, etc.
Don't:
➔ Include low-value pages (T&Cs, etc.).
➔ Have > 50,000 URLs per sitemap.
➔ Include non-indexed pages (e.g. canonicalised, redirected, or noindex-tagged pages).
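A minimal sketch of generating a sitemap that only contains live, indexable category and product URLs, using Next.js's sitemap route convention. getIndexablePages() is a hypothetical helper standing in for a CMS/catalogue query, and the domain is a placeholder.

```ts
// app/sitemap.ts — emit only 200-status, indexable URLs; exclude anything
// canonicalised away, redirected, or marked noindex.
import type { MetadataRoute } from 'next';
import { getIndexablePages } from '@/lib/catalogue'; // hypothetical data source

export default async function sitemap(): Promise<MetadataRoute.Sitemap> {
  const pages = await getIndexablePages(); // assumed to pre-filter to indexable pages
  return pages.map((page) => ({
    url: `https://www.example.com${page.path}`, // assumed domain
    lastModified: page.updatedAt,
  }));
}
```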
Respect the Robots Directive Tags Address important product pages marked with incorrect tags
Directives Key:
Index - allows the page to be indexed.
Follow - allows links on the page to be crawled.
NoFollow - tells crawlers not to crawl links on the page.
NoIndex - tells crawlers not to index the page.
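A rough sketch of the kind of check this implies: fetch important product URLs and flag any that carry a noindex robots meta tag. The URL list is illustrative and the regex check is a simplification of what a crawler or HTML parser would do.

```ts
// robots-tag-audit.ts — flag important pages that are accidentally marked noindex.
const importantUrls = [
  'https://www.example.com/category/trainers',       // assumed URLs
  'https://www.example.com/product/example-trainer',
];

async function auditRobotsMeta(): Promise<void> {
  for (const url of importantUrls) {
    const html = await (await fetch(url)).text();
    const meta = html.match(/<meta[^>]+name=["']robots["'][^>]*>/i)?.[0] ?? '';
    if (/noindex/i.test(meta)) {
      console.warn(`Incorrect directive? ${url} is marked noindex`);
    }
  }
}

auditRobotsMeta();
```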
Tidy Your Robots.txt Instruction File
Check we're not disallowing any important pages or areas. Here we've contradicted ourselves and confused search engines by both allowing and disallowing crawling of any .php$ resources; only disallow the correct types of pages from Googlebot(s).
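A minimal sketch of serving a tidy robots.txt from Next.js using its robots route convention. The disallowed paths and domain are illustrative assumptions; the point is a single, non-contradictory set of rules per user agent.

```ts
// app/robots.ts — one clear rule set, no conflicting allow/disallow for the same pattern.
import type { MetadataRoute } from 'next';

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      {
        userAgent: '*',
        disallow: ['/checkout/', '/account/', '/*?sort='], // assumed low-value areas
      },
    ],
    sitemap: 'https://www.example.com/sitemap.xml', // assumed domain
  };
}
```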
Make Your Temporary Redirects Permanent
Pages behind temporary redirects are still used by Google for ranking, impacting the indexation & ranking of the new pages. [Diagram: HTTP → HTTPS redirect chains using 302 and 307 temporary redirects versus 301 and 308 permanent redirects.]
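A sketch of declaring redirects as permanent (308) rather than temporary in a Next.js config, assuming a version that supports a TypeScript config file. The paths are illustrative assumptions.

```ts
// next.config.ts — permanent: true issues a 308 instead of the temporary 307 default,
// so Google treats the destination as the page to index and rank.
import type { NextConfig } from 'next';

const config: NextConfig = {
  async redirects() {
    return [
      {
        source: '/old-campaign-page',      // assumed legacy URL
        destination: '/new-campaign-page', // assumed replacement
        permanent: true,
      },
    ];
  },
};

export default config;
```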
Use CDN & CMS Auto-Compression Features
To combat Core Web Vitals (CWV) issues such as LCP: the banner is pulled from a CDN that auto-compresses, which alleviates rendering/speed/size issues, but product images are pulled from a different CDN that doesn't auto-compress, and Next.js (a React framework for creating web applications) isn't auto-compressing responses with gzip, Brotli or deflate.
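A minimal sketch of enabling Next.js's built-in gzip compression when it serves traffic directly; in practice compression is often better handled at the CDN or reverse proxy, which can also serve Brotli.

```ts
// next.config.ts — enable gzip for rendered content and static files served by Next.js.
import type { NextConfig } from 'next';

const config: NextConfig = {
  compress: true,
};

export default config;
```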
Server-Side Rendering the Navigation
This will improve page speed and increase Google's chances of caching and crawling our navigation links. Googlebot mobile is struggling to access links in the menu; SSR will improve the chances of caching and speed up how content is loaded/rendered.
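A minimal sketch of a navigation rendered on the server (a React Server Component in the Next.js App Router), so the links exist as plain anchor tags in the initial HTML that Googlebot receives rather than being injected by client-side JavaScript. getNavLinks() is a hypothetical CMS helper.

```tsx
// app/components/MainNav.tsx — server-rendered navigation links.
import Link from 'next/link';
import { getNavLinks } from '@/lib/cms'; // hypothetical data source

export default async function MainNav() {
  const links = await getNavLinks(); // e.g. [{ href: '/category/trainers', label: 'Trainers' }]
  return (
    <nav>
      {links.map((link) => (
        <Link key={link.href} href={link.href}>
          {link.label}
        </Link>
      ))}
    </nav>
  );
}
```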
Render Important Page Resources First
This will allow Google to quickly find important page content: above-the-fold images are rendered first, and tracking resources (e.g. GTM) are loaded last. Sometimes this isn't possible, so SSR would help preload the resources that sit above images in the load order (e.g. JS).
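A sketch of one way to express that priority in Next.js: preload the above-the-fold hero image and defer the tag manager until after the page has loaded. The image path and GTM container ID are placeholders.

```tsx
// Hero.tsx — prioritise above-the-fold imagery, defer tracking scripts.
import Image from 'next/image';
import Script from 'next/script';

export default function Hero() {
  return (
    <>
      {/* `priority` preloads the hero image, helping LCP. */}
      <Image src="/hero-banner.jpg" alt="Seasonal campaign" width={1600} height={600} priority />

      {/* Tracking loads last, so it cannot block important page content. */}
      <Script
        src="https://www.googletagmanager.com/gtm.js?id=GTM-XXXXXXX"
        strategy="lazyOnload"
      />
    </>
  );
}
```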
We Have High-Level & Evergreen Checks (1/2)
Site searches can be used for indexation checks: one shows us if a URL is indexed and in Google; one shows us which URLs on the domain are indexed and ranking for the term "Hello World"; and one shows us if a specific page is indexed and in Google for the term "Hello World".
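Presumably these checks use Google's site: operator; for illustration (example.com standing in for the real domain), they would look roughly like:
➔ site:example.com/page — is this URL indexed?
➔ site:example.com "Hello World" — which URLs on the domain are indexed and ranking for that term?
➔ site:example.com/page "Hello World" — is this specific page indexed for that term?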
As Well As Prerequisite Checks (1/3)
By checking Google's cache, we can identify how visible URLs are. The navigation is heavily JavaScript-based and hasn't been cached by Google; this leads to navigation URLs becoming less visible, impacting crawling & indexation.
As Well As Prerequisite Checks (2/3)
Large elements on a page can also reduce link visibility: here the banner image is too large and potentially blocks the visibility of other page elements.
As Well As Prerequisite Checks (3/3)
Utilise the crawlers available to identify blockers, such as: x-robots-tag headers (noindex, nofollow, nosnippet); redirects, canonicals and error pages; and crawling issues (sitemaps, robots.txt, orphaned pages, crawl depth, etc.).
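A quick sketch of spot-checking the X-Robots-Tag response header on a list of URLs, since header-level noindex/nofollow directives are easy to miss in a page-source review. The URL list is illustrative.

```ts
// header-check.ts — surface any blocking directives sent in the X-Robots-Tag header.
const urlsToCheck = [
  'https://www.example.com/category/trainers', // assumed URLs
  'https://www.example.com/sale',
];

async function checkRobotsHeaders(): Promise<void> {
  for (const url of urlsToCheck) {
    const response = await fetch(url, { method: 'HEAD' });
    const robotsHeader = response.headers.get('x-robots-tag');
    if (robotsHeader && /noindex|nofollow|nosnippet/i.test(robotsHeader)) {
      console.warn(`${url} -> X-Robots-Tag: ${robotsHeader}`);
    }
  }
}

checkRobotsHeaders();
```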
Introduction to END. Clothing
A little context around our client before we dive into results: END. Clothing is a global fashion retailer selling designer brands; the site is JavaScript-heavy and struggles with being crawled; Tug identified this crawling blocker as the reason for the lack of indexability.
Case Study: Increase in Facet Indexation
UK facet page indexation increased by 64% MoM. Validation passed: Google resolved the canonical issue. A significant fix on 29th November resolved a duplicate-content issue in which a large volume of pages had no user-selected canonical.
Case Study: Increase in Indexation After Temporary Redirect Fixes
28% increase in events to pages between April & May 2023. [Chart annotations mark when the site speed & CWV fixes and the temporary redirect fixes went live.]
3 Key Points to Remember
➔ Basic technical checks and fixes still yield positive results.
➔ Use 3rd party tools to analyse Google interactions.
➔ Employ platform features for sitewide changes.