Slide 1

Slide 1 text

Thursday, 3 June 2021 | Online event: L’Événement Search Marketing, TECHNICAL SEO

Slide 2

Slide 2 text

Technical SEO in E-Commerce How to successfully master the biggest technical SEO challenges in online shopping/e-commerce Bastian Grimm, Peak Ace AG | @basgr

Slide 3

Slide 3 text

Why dedicate a whole session to online shopping?

Slide 4

Slide 4 text

pa.ag @peakaceag 4
No two shops are the same… Differences in industry and size require customised SEO strategies.
[Quadrant chart: type of domain (eCommerce, Publishing, Classifieds, Lead-gen, Brand, Other) plotted against number of URLs / scope (<1,000, <10,000, <100,000, <1,000,000, 1,000,000+). Which quadrant are you in?]
* Brand = e.g. Uber (selling on the site is not a priority) vs. eComm = e.g. Nike (online shop) or Emirates (ticket shop)

Slide 5

Slide 5 text

No two shops are the same… Online retailers with limited product ranges (<1,000 products) face different challenges than multi-range retailers.
[Same quadrant chart, with sub-types: eCommerce → single retailer, multi retailer, …; Publishing → special interest (e.g. health), daily newspaper, …; plus Classifieds, Lead-gen, Brand, Other. Which quadrant are you in?]
* Brand = e.g. Uber (selling on the site is not a priority) vs. eComm = e.g. Nike (online shop) or Emirates (ticket shop)

Slide 6

Slide 6 text

Focus: category and product detail pages. 10 tips, let's go!

Slide 7

Slide 7 text

#1 Indexing strategy: categories, sub-categories, pagination, etc.
Caused by/refers to: all types of overview/listing pages
Issue brief: categories that compete with sub-categories, or very deep pagination causing crawl and indexing problems
Issue categories: crawling inefficiencies, website quality
Suggested change/fix: a crawl/indexing strategy dependent on site size and page types
Comment: lots of variables to consider; the larger the site gets, the more complex this is to get right

Slide 8

Slide 8 text

No crawling and/or indexing strategy
Depending on a site's age, scope and volume, there can be a lot of URLs to deal with; carefully consider what you want to give to Googlebot:

Slide 9

Slide 9 text

Google released its own guide to managing crawl budget
Despite Google's framing, this is well worth a read for everyone, even though it's specifically tailored to “large” as well as “very rapidly changing” sites. Source: https://pa.ag/35MqZHX

Slide 10

Slide 10 text

Getting Google to crawl those important URLs
Significant cuts to the crawlable URL inventory led to the intended shift: after eliminating 15m+ unnecessary URLs, Google started crawling previously uncrawled URLs.
[Chart: crawled, un-crawled and total URLs (0–35m), Jul–Feb.] Source: Peak Ace AG

Slide 11

Slide 11 text

Keyword targeting: main categories vs. sub-categories
Which (sub)category should be found for the term “fresh fruit”? Pay close attention to clear terminology and differentiation:

Slide 12

Slide 12 text

Tons of unnecessary/unused sorting and/or filtering
If you have sorting options, ensure they're being used (analytics is your friend) – otherwise remove them and prevent them from being crawled (robots.txt / PRG)
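As a sketch of the idea, the crawl-control rule for sorting/display parameters can be as simple as a parameter blocklist. The parameter names below are hypothetical (a real shop's will differ), and the actual enforcement would live in robots.txt or a PRG pattern rather than application code:

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical sort/display parameter names - adjust to your shop system.
# robots.txt equivalent would be e.g.:  Disallow: /*?*sort=
BLOCKED_PARAMS = {"sort", "order", "per_page", "view"}

def is_crawlable(url: str) -> bool:
    """Return False for URLs carrying sorting/display parameters that
    should be kept away from Googlebot."""
    params = parse_qs(urlparse(url).query)
    return not (BLOCKED_PARAMS & params.keys())

print(is_crawlable("https://shop.example/gins"))                 # True
print(is_crawlable("https://shop.example/gins?sort=price_asc"))  # False
```
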

Slide 13

Slide 13 text

The “articles per page” filter/selection: don't bother
For each category listing, three times the number of URLs are generated – a crawling disaster. If left unchecked, this often leads to duplicate content. Client-side JavaScript would at least solve the crawling and indexing problems, but it is questionable whether this feature is actually used at all.
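A rough back-of-the-envelope model (with hypothetical numbers) shows how quickly an "items per page" selector inflates the crawlable URL inventory:

```python
# Rough model of URL inflation from an "items per page" selector:
# every paginated category page exists once per page-size option.
def listing_url_count(categories: int, avg_pages: int, page_size_options: int) -> int:
    return categories * avg_pages * page_size_options

# Hypothetical shop: 500 categories x 8 paginated pages x 3 page sizes (24/48/96)
print(listing_url_count(500, 8, 3))  # 12000 crawlable listing URLs
print(listing_url_count(500, 8, 1))  # 4000 with a single fixed page size
```
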

Slide 14

Slide 14 text

Pagination (for large websites) is essential!
Audisto's recommendation for the “correct” pagination, per objective (Source: https://pa.ag/3cjlgev):
▪ For lists with equally important items, choose the logarithmic or the “Ghostblock” pagination (equal PageRank across all pagination and item pages).
▪ For lists with a small number of important items, use the “Link first Pages”, “Neighbors” or “Fixed Block” pagination (most PageRank goes to the first pagination and item pages).
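The logarithmic scheme above can be sketched as follows. This illustrates the general idea (linking neighbours at exponentially growing distances so every page is reachable in few hops), not Audisto's exact algorithm:

```python
def logarithmic_pagination(current: int, total: int) -> list[int]:
    """Pages to link from `current` so any page is reachable in
    O(log n) hops: neighbours at powers of two in both directions,
    plus the first and last page. A sketch of the idea only."""
    links = {1, total, current}
    step = 1
    while step < total:
        links.add(max(1, current - step))    # backwards: -1, -2, -4, -8, ...
        links.add(min(total, current + step))  # forwards: +1, +2, +4, +8, ...
        step *= 2
    return sorted(links)

print(logarithmic_pagination(50, 100))
# [1, 18, 34, 42, 46, 48, 49, 50, 51, 52, 54, 58, 66, 82, 100]
```
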

Slide 15

Slide 15 text

This just got very, very interesting… So, what about pagination?

Slide 16

Slide 16 text

Hang on - what about “noindex, follow”?
Noindexed pages will effectively become nofollow as well, at least over time. Source: https://pa.ag/2EssNeV
Google's John Mueller says that, long term, noindex, follow will eventually equate to a noindex, nofollow directive as well […] eventually Google will stop going to the page because of the noindex, remove it from the index, and thus not be able to follow the links on that page.

Slide 17

Slide 17 text

Links from noindexed pages might be worthless
“Noindex = don't index. And if we completely drop it […], then we wouldn't use anything from there […] I wouldn't count on links from noindex pages being used.” Source: https://pa.ag/2TCiADY

Slide 18

Slide 18 text

#2 Content quality: thin & duplicate product pages
Caused by/refers to: product detail pages
Issue brief: automatically delivered product data, or products with hardly any differentiating features, quickly create (near-)duplicate content
Issue categories: website quality, duplicate content, thin content
Suggested change/fix: monitor content quality carefully, e.g. define noindex rules accordingly

Slide 19

Slide 19 text

Common causes of duplicate content #1
For Google, these examples are each two different URLs:
▪ https://pa.ag vs. https://www.pa.ag (non-www vs. www)
▪ http://pa.ag vs. https://pa.ag (HTTP vs. HTTPS)
▪ https://www.pa.ag/cars?colour=black&type=racing vs. https://www.pa.ag/cars?type=racing&colour=black (GET parameter order)
Dealing with duplication issues:
▪ 301 redirect: e.g. non-www vs. www, HTTP vs. HTTPS, casing (upper/lower), trailing slashes, index pages (index.php)
▪ noindex: e.g. white labelling, internal search result pages, work-in-progress content, PPC and other landing pages
▪ (self-referencing) canonicals: e.g. for tracking parameters, session IDs, printer-friendly versions, PDF vs. HTML, etc.
▪ 401/403 password protection: e.g. staging/development servers
▪ 404/410 gone: e.g. feed content that needs to go fast, other outdated, irrelevant or low-quality content
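The host/protocol and parameter-order variants above collapse once URLs are normalised consistently. A minimal sketch - the tracking-parameter list and the non-www/HTTPS policy are assumptions here; every shop sets its own (and server-side you would 301/canonicalise rather than rewrite in application code):

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical list of parameters that never change page content.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}

def normalise(url: str) -> str:
    """Collapse trivial duplicate variants: protocol, www, host casing,
    GET parameter order, tracking parameters and fragments."""
    parts = urlsplit(url)
    query = sorted((k, v) for k, v in parse_qsl(parts.query)
                   if k not in TRACKING_PARAMS)
    return urlunsplit((
        "https",                                     # force HTTPS
        parts.netloc.lower().removeprefix("www."),   # non-www (a policy choice)
        parts.path,
        urlencode(query),
        "",                                          # drop fragments
    ))

a = normalise("http://www.pa.ag/cars?colour=black&type=racing")
b = normalise("https://pa.ag/cars?type=racing&colour=black&utm_source=x")
print(a == b)  # True - both collapse to the same canonical URL
```
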

Slide 20

Slide 20 text

Common causes of duplicate content #2
For Google, these examples are each two different URLs:
▪ https://pa.ag/url-a/ vs. https://pa.ag/url-A/ (case sensitivity)
▪ https://pa.ag/url-b vs. https://pa.ag/url-b/ (trailing slashes)
▪ Taxonomy issues (the same content under Category A, B and C)
▪ Production server vs. staging/testing server

Slide 21

Slide 21 text

Content quality is still important in e-commerce!
Keep a close eye on the indexing of articles that are “very similar” in content: [Google SERP screenshot: about 2,410 results (0.37 seconds)]

Slide 22

Slide 22 text

Prevent inferior content from being indexed
In particular: automatically generated URLs like "no ratings for X" or "no comments for Y" often lead to lower quality content! Other kinds of bad or thin content:
▪ Content from (external) feeds (e.g. through white label solutions / partnerships, affiliate feeds etc.)
▪ Various "no results" pages (no comments for product A, no ratings for product B, no comments for article C etc.)
▪ Badly written content (e.g. grammatical errors)
▪ General: same content on different domains

Slide 23

Slide 23 text

#3 Handling multiple versions of a product (colour/size)
Caused by/refers to: product detail pages
Issue brief: demand is too low for the PDPs to be indexed in all their combinations; link equity/ranking potential is lost
Issue categories: duplicate content, crawl inefficiency, ranking issues
Suggested change/fix: consolidate to a default product whenever possible (e.g. the strongest-selling colour/size)
Comment: client-side JS or, at a minimum, canonical tags are needed

Slide 24

Slide 24 text

Exactly the same GEL-NIMBUS 22 in a different colour
Asics uses individual URLs for each available colour/size variation, e.g. gel-nimbus-22/p/1011A680-002.html vs. gel-nimbus-22/p/1011A680-022.html. Not enough people search for “asics Gel Nimbus 22 black 43” and “grey” respectively; demand is too low for the PDPs to be indexed in all their combinations, and link equity/ranking potential is lost/split.

Slide 25

Slide 25 text

One solution could be to canonicalise to a root product
A canonical tag is only a hint, not a directive; Google can choose to ignore it entirely. When using canonical tags, please be extra careful:
▪ There may only be one rel-canonical annotation per URL - only ONE!
▪ Use absolute URLs with protocol and subdomain
▪ Rel-canonical targets must actually work (no 4XX targets) - they need to serve an HTTP 200
▪ No canonical tag chaining - Google will ignore it!
▪ Maintain consistency: only one protocol (HTTP vs. HTTPS), either www or non-www, and consistent use of trailing slashes
▪ Etc.
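Several of these rules can be checked mechanically. A minimal sketch of such a validator - the input structure and field names are hypothetical; in practice the data would come from a crawler export:

```python
from urllib.parse import urlsplit

def canonical_problems(page_url: str, canonical_hrefs: list[str],
                       status_of: dict[str, int]) -> list[str]:
    """Check a page's rel-canonical annotations against the rules on
    the slide. `status_of` maps URLs to their HTTP status codes."""
    problems = []
    if len(canonical_hrefs) != 1:
        problems.append("exactly one rel-canonical per URL")
        return problems
    target = canonical_hrefs[0]
    if not urlsplit(target).scheme:
        problems.append("use absolute URLs incl. protocol")
    elif status_of.get(target) != 200:
        problems.append("canonical target must serve HTTP 200")
    return problems

print(canonical_problems(
    "https://pa.ag/p/nimbus-22-grey",
    ["https://pa.ag/p/nimbus-22"],       # canonicalise to the root product
    {"https://pa.ag/p/nimbus-22": 404},  # ...but the target is broken
))  # ['canonical target must serve HTTP 200']
```
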

Slide 26

Slide 26 text

Most efficient: minimising URL overhead
Improve crawl budget (and link equity) by consolidating to one URL. Salmon PDPs are rewarded with strong rankings: #848=15694 #848=15692

Slide 27

Slide 27 text

#4 One product, but reachable via multiple categories
Caused by/refers to: product detail pages
Issue brief: product detail pages are reachable via multiple URLs (because the category name is part of the PDP URL)
Issue categories: duplicate content, crawl inefficiencies, ranking issues
Suggested change/fix: category-independent product URLs
Comment: alternatively, define a default category to be used in the URL slug

Slide 28

Slide 28 text

Two different URLs serving the exact same product
This minimises the product's chances of ranking well; from a crawling perspective it isn't a good solution either - both URLs are crawled individually (e.g. under /international-gins/ and /most-popular/).

Slide 29

Slide 29 text

Solution: only ever use one URL per product!
A dedicated product directory - regardless of the category - is the best solution in most cases; it also often makes analysis easier. Alternative: consolidate all products within the document root.
Watch out: using canonical tags or noindex for products reachable via multiple URLs is possible, but inefficient in terms of crawling. Order of preference: reduction > noindex > canonical tag.

Slide 30

Slide 30 text

#5 Brand filter vs. branded category: /watches/breitling vs. /breitling/all
Caused by/refers to: category pages and their filters
Issue brief: a brand category vs. a category filtered by brand name, both targeting the exact same keyword set
Issue categories: keyword cannibalisation, crawl inefficiency
Suggested change/fix: canonicalise; prevent indexation of one URL variant
Comment: PRG pattern for large-scale scenarios (e.g. preventing an entire block of filtering from being crawled/indexed)

Slide 31

Slide 31 text

Another classic: brand filter vs. brand (category) page
If you index both, which one is supposed to rank for the generic branded term? One keyword, one URL: try to minimise internal competition as much as you can. Two (or more) pages targeting “Breitling watches” make it unnecessarily hard for Google to select the best result!
[Screenshots: the “watches” category filtered by brand “Breitling” vs. a dedicated “Breitling” showcase/brand page]

Slide 32

Slide 32 text

#6 Expired/(temporarily) out-of-stock product management
Caused by/refers to: product detail pages
Issue brief: PDPs for products that are (temporarily) out of stock can cause bad engagement metrics (e.g. high bounce rates)
Issue categories: engagement metrics, website quality, inefficient URLs
Suggested change/fix: implement an OOS strategy (redirects, info layer, disabling entirely (410), etc.)
Comment: a hugely complex topic, depending on the size and volatility of the inventory, and much more

Slide 33

Slide 33 text

Deal with your out-of-stock items - but not like M&S does!
Are they just temporarily unavailable (and for how long), or will they never come back? Also, what alternative versions are available? M&S keeps all of their out-of-stock pages indexed: [Google SERP screenshot: about 294,000 results (0.23 seconds)]

Slide 34

Slide 34 text

How to deal with OOS situations?
For non-deliverable products there is no single solution; often it comes down to a combination. Tip: use a dynamic info layer to inform users. OOS handling options:
▪ Redirect (internal search)
▪ Redirect (successor product)
▪ Redirect (same product, e.g. in a different colour)
▪ Redirect (similar products in other colours, sizes, etc.)
▪ Noindex (newsletter/lead gen)
▪ 410 error (only if you really want to delete!)

Slide 35

Slide 35 text

No exit strategy for paginated categories?
Categories with high churn need to deal with paginated pages coming and going (e.g. what happens when there aren't enough products to display a second page?) [Google SERP screenshot: about 3,065 results (0.28 seconds)]

Slide 36

Slide 36 text

#7 Facetted navigation, sorting & filtering (e.g. in categories)
Caused by/refers to: category pages that allow filtering and/or sorting
Issue brief: sorting/filtering/facet combinations multiplied across categories and sub-categories can lead to millions of (worthless) URLs
Issue categories: keyword cannibalisation, crawl inefficiency, thin content
Suggested change/fix: an individual indexing strategy (based on demand) per filter and facet; prevent crawling/indexing of sorting URLs
Comment: very difficult to get right; usually requires individual solutions

Slide 37

Slide 37 text

Issue: facetted navigation poorly controlled/implemented
“A facetted search is a restriction of the selection according to different properties, characteristics and/or values.” If Zalando allowed all of these options to become crawlable URLs, it would lead to millions and millions of useless URLs. Only allow crawling and indexing of URLs that target keywords and keyword combinations with actual search demand, and pay special attention to internal keyword cannibalisation.

Slide 38

Slide 38 text

Solution: Boots handles this excellently using client-side JS
From a user's perspective, using JavaScript for features such as filtering also feels much faster, since the perceived load time decreases (e.g. the filter state lives in a URL fragment: #facet:-100271105108108101116116101,-1046543&product)

Slide 39

Slide 39 text

#8 Structured data: schema.org
Caused by/refers to: product detail pages
Issue brief: Google needs machine-readable, structured data (basis: schema.org) to display some information directly in the search results, e.g. price and product availability; if this information is not available, the result preview is smaller (= one line is missing)
Issue categories: SERP snippet preview, SERP CTR
Suggested change/fix: implement schema.org markup on product detail pages (at minimum prices, availability, etc.), and ideally ratings too, if available

Slide 40

Slide 40 text

Rich snippets based on structured data
A valuable additional line in the SERP snippet for more attention: schema.org/Rating + AggregateRating, schema.org/Product + schema.org/Offers + schema.org/InStock

Slide 41

Slide 41 text

Label products and offers with schema.org
Schema.org markup for product details as well as price, stock & reviews
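A minimal schema.org/Product + Offer payload for a PDP might look like the sketch below. The product values are hypothetical; in the page itself the JSON-LD would be embedded in a script tag of type application/ld+json in the head:

```python
import json

# Hypothetical product values; the fields follow schema.org/Product,
# schema.org/Offer and schema.org/AggregateRating.
product_jsonld = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "GEL-NIMBUS 22",
    "aggregateRating": {
        "@type": "AggregateRating",
        "ratingValue": "4.6",
        "reviewCount": "128",
    },
    "offers": {
        "@type": "Offer",
        "price": "159.95",
        "priceCurrency": "EUR",
        "availability": "https://schema.org/InStock",
    },
}

# Embed in the PDP <head> as:
#   <script type="application/ld+json"> ... </script>
print(json.dumps(product_jsonld, indent=2))
```
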

Slide 42

Slide 42 text

Google discontinued the SDTT… yeah, I know - right?
Attention: the Rich Results Test does not show all types of structured data, only those that Google supports. Source: https://pa.ag/2DSKpzO Alternatives:
▪ Bing Webmaster Markup Validator: https://www.bing.com/toolbox/markup-validator
▪ Yandex Structured Data Validator: https://webmaster.yandex.com/tools/microtest/
▪ ClassySchema Structured Data Viewer: https://classyschema.org/Visualisation
▪ https://schemamarkup.net/
▪ https://www.schemaapp.com/tools/schema-paths/
▪ https://json-ld.org/playground/
▪ https://technicalseo.com/tools/schema-markup-generator/

Slide 43

Slide 43 text

Tip: Free Structured Data Helper from RYTE
Highlights syntax errors and missing required properties. All nested elements in one place for convenient in-line validation. Source: https://pa.ag/3b9CkU5

Slide 44

Slide 44 text

To avoid confusion: no schema mark-up here!
Schema.org mark-up is not used to show/generate this extended SERP snippet: so-called featured snippets are usually shown for non-transactional search queries, and schema.org mark-up is not mandatory for them. Also no schema.org mark-up here: Google extracts this information (“monthly leasing rate”) directly from the HTML mark-up.

Slide 45

Slide 45 text

#9 Discovery: XML sitemaps, etc.
Caused by/refers to: better/faster article indexing
Issue brief: sitemaps and crawl hubs for better internal linking, discovery and additional canonicalisation signals
Issue categories: crawl efficiency, internal linking
Suggested change/fix: establish a proper XML sitemap (creation) process; find the URLs that Google hits heavily and use them to link internally
Comment: poorly maintained XML sitemaps, e.g. containing broken or irrelevant URLs, can lead to significant crawl budget waste

Slide 46

Slide 46 text

Poorly maintained XML sitemaps
No redirects, no URLs that are blocked via robots.txt or meta robots, no URLs with a different canonical tag! To verify: Screaming Frog > Mode > List > Download XML Sitemap
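These hygiene rules translate directly into a sitemap build step. A sketch assuming a crawler export with hypothetical field names (`status`, `noindex`, `canonical`):

```python
from xml.sax.saxutils import escape

def sitemap_xml(crawl_data: list[dict]) -> str:
    """Build a sitemap containing only indexable, self-canonical,
    200-OK URLs - no redirects, no noindexed or canonicalised-away
    URLs. `crawl_data` mimics a crawler export."""
    urls = [r["url"] for r in crawl_data
            if r["status"] == 200
            and not r["noindex"]
            and r["canonical"] in ("", r["url"])]   # self- or no canonical
    entries = "\n".join(f"  <url><loc>{escape(u)}</loc></url>" for u in urls)
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            f"{entries}\n</urlset>")

rows = [
    {"url": "https://pa.ag/p/a", "status": 200, "noindex": False, "canonical": ""},
    {"url": "https://pa.ag/p/b", "status": 301, "noindex": False, "canonical": ""},
    {"url": "https://pa.ag/p/c", "status": 200, "noindex": True,  "canonical": ""},
]
print(sitemap_xml(rows))  # only /p/a makes it into the sitemap
```
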

Slide 47

Slide 47 text

Pages not in sitemaps - using DeepCrawl's source gap
Comparing (and overlaying) various crawl sources to identify hidden issues/potentials

Slide 48

Slide 48 text

#10 Web performance: maximum loading speed
Caused by/refers to: loading speed (entire website)
Issue brief: mobile versions in particular are often still extremely slow, but this is not the only area needing optimisation; most online shops are not fully optimised - images, external fonts, JavaScript and much more offer opportunities for performance gains
Issue categories: loading speed, engagement metrics
Suggested change/fix: a multifaceted topic with a significant number of individual optimisation possibilities, which depend to a large extent on the infrastructure, shop system, etc. - can only be solved successfully together with the IT team

Slide 49

Slide 49 text

Fast loading time plays an important role in overall user experience! Performance is about user experience!

Slide 50

Slide 50 text

Revisited: page speed already is a ranking factor
Sources: http://pa.ag/2iAmA4Y | http://pa.ag/2ERTPYY

Slide 51

Slide 51 text

User experience to become a Google ranking factor
Google announced a new ranking algorithm designed to judge web pages based on how users perceive the experience of interacting with them; if Google thinks your users will have a poor experience on your pages, it may not rank those pages as highly as it does now. The current Core Web Vitals set focuses on three aspects of user experience - loading, interactivity, and visual stability - and includes the following metrics: Source: https://pa.ag/3irantb

Slide 52

Slide 52 text

Optimising for Core Web Vitals such as LCP, FID and CLS?
An overview of the most common issues and respective fixes:
LCP is primarily affected by:
▪ Slow server response times
▪ Render-blocking JS/CSS
▪ Resource load times
▪ Client-side rendering
Optimising for LCP:
▪ Server response times & routing
▪ CDNs, caching & compression
▪ Optimise the critical rendering path
▪ Reduce blocking times (CSS, JS, fonts)
▪ Images (format, compression, etc.)
▪ Preloading & pre-rendering
▪ Instant loading based on PRPL
FID is primarily affected by:
▪ Third-party code
▪ JS execution time
▪ Main thread busyness
▪ Request count & transfer size
Optimising for FID:
▪ Reduce JS execution (defer/async)
▪ Code-split large JS bundles
▪ Break up long JS tasks (>50ms)
▪ Minimise unused polyfills
▪ Use web workers to run JS on a non-critical background thread
CLS is primarily affected by:
▪ Images without dimensions
▪ Ads, embeds and iframes without dimensions
▪ Web fonts (FOIT/FOUT)
Optimising for CLS:
▪ Always include size attributes on images, videos, iframes, etc.
▪ Reserve required space in advance
▪ Reduce dynamic injections

Slide 53

Slide 53 text

Client-side/front-end optimisation tasks
Use my checklist on SlideShare to double-check (all slides: http://pa.ag/iss18speed):
▪ Establish a content-first approach: progressive enhancement; prioritise visible, above-the-fold content: 14 kB (compressed).
▪ Reduce size: implement effective caching and compression.
▪ Whenever possible, use asynchronous requests.
▪ Decrease the size of CSS and JavaScript files (minify).
▪ Lean mark-up: no comments; use inline CSS/JS only where necessary or useful.
▪ Optimise images: reduce overhead for JPGs & PNGs (metadata, etc.), request properly sized images and try new formats.
▪ Minimise browser reflow & repaint.

Slide 54

Slide 54 text

Increasing crawled URLs due to faster load times
Slashing website load times (Lighthouse score ~36 to 70) led to >25% more URLs being crawled by Googlebot. [Chart: crawled URLs (0–400k) vs. average Lighthouse performance score, Nov–Apr.] Source: Peak Ace AG

Slide 55

Slide 55 text

Possible answers to some “what if…?” questions. Want some more?

Slide 56

Slide 56 text

What if I need to rename/delete categories?

Slide 57

Slide 57 text

Don't forget 301s when changing your structure
If the category name is "automatically" connected to the URL slug, redirect to the "new" name; when deleting a category, always have a redirect in place. The following applies: if an (old) URL was linked externally at some point (and that link still exists), the internal redirect (e.g. old category name > new category name) is now required forever.

Slide 58

Slide 58 text

Bulk test these things: redirects & other headers
HTTP status codes (errors, redirects, etc.) at scale, for free: https://httpstatus.io/

Slide 59

Slide 59 text

Fix those redirect chains, especially on large sites…
…as multiple requests waste valuable performance and crawl budget!
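Flattening chains is mechanical once you have a redirect map; a small sketch (the example paths are hypothetical):

```python
def flatten(redirects: dict[str, str]) -> dict[str, str]:
    """Resolve every redirect straight to its final target so each
    request costs exactly one hop (with a guard against loops)."""
    flat = {}
    for src in redirects:
        seen, target = {src}, redirects[src]
        while target in redirects and target not in seen:
            seen.add(target)
            target = redirects[target]
        flat[src] = target
    return flat

chains = {"/old-gins": "/international-gins",   # hop 1
          "/international-gins": "/gins",       # hop 2 - a chain!
          "/summer-sale": "/sale"}
print(flatten(chains))
# {'/old-gins': '/gins', '/international-gins': '/gins', '/summer-sale': '/sale'}
```
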

Slide 60

Slide 60 text

Don't be lazy: ensure code hygiene!
Remove internally linked redirects from templates and adjust them to "direct" linking:

Slide 61

Slide 61 text

Also fix (internally) linked error pages (e.g. 404)!
Adjust internal links in the code and check alternative references (canonical, sitemap, etc.); for URLs with traffic, external links or rankings => redirect. Quality signal!?

Slide 62

Slide 62 text

What if I have no time to write my own titles (or descriptions)?

Slide 63

Slide 63 text

At least try to use simple templates!
Google usually autogenerates the worst snippets; the same standard fallback page title directly qualifies the affected URLs as duplicates:

Slide 64

Slide 64 text

What if I need to use URL parameters for tracking?

Slide 65

Slide 65 text

For tracking, whenever possible use # instead of ?
Run GA tracking with fragments instead of GET parameters, or automatically remove parameters with a hitCallback (after the page view measurement). Source: https://pa.ag/2TuJMk5
If - for whatever reason - you need to use URL parameters, don't forget to implement canonical tags, and always test in GSC to ensure that Google actually uses them.

Slide 66

Slide 66 text

Never use parameterised tracking URLs for internal linking

Slide 67

Slide 67 text

Also: do not use your own parameters for tracking! Did I mention that parameters actually cause all sorts of problems, constantly?

Slide 68

Slide 68 text

URL parameter settings in Google Search Console
GSC also allows you to manually configure URL parameters and their effects; please note that this is "only" available for Google.

Slide 69

Slide 69 text

How should I manage my (internal) search?

Slide 70

Slide 70 text

Prevent crawling & indexing: POST requests & noindex
Prevent crawling and indexing of search results. A SERP within a SERP usually leads to a bad user experience / bad signals - Google sees it the same way: [Google SERP screenshot: about 663,000 results (0.92 seconds)]

Slide 71

Slide 71 text

How do I deal with personalisation in relation to the Googlebot?

Slide 72

Slide 72 text

Personalisation: good or bad - and what to consider?
Consider how variable internal linking, such as "last viewed" or "you might also like this article", affects the link graph. Use the non-personalised standard view for Googlebot; personalisation as a "layer on top" is unproblematic from an SEO point of view.

Slide 73

Slide 73 text

How do I make my (listing) pages faster?

Slide 74

Slide 74 text

Things are much easier now: loading="lazy"
Performance benefits paired with SEO friendliness (and no JS required). Tip: this now also works for iframes. The most recent versions of Chrome, Firefox and Edge already support this:

Slide 75

Slide 75 text

Care for the slides? www.pa.ag twitter.com/peakaceag facebook.com/peakaceag Take your career to the next level: jobs.pa.ag [email protected] Bastian Grimm [email protected]