Technical SEO in e-Commerce - Search Y 2021

Thursday 3 June 2021 | Online event The Search
Marketing Event TECHNICAL SEO

Technical SEO in E-Commerce How to successfully master the biggest
technical SEO challenges in online shopping/e-commerce Bastian Grimm, Peak Ace AG | @basgr

Why dedicate a whole session to online shopping?

pa.ag @peakaceag 4 No two shops are the same… Differences
in industry and size require customised SEO strategies. Which quadrant are you in? Type of domain: eCommerce, Publishing, Classifieds, Lead-gen, Brand*, Other. Number of URLs (scope): <1,000, <10,000, <100,000, <1,000,000, 1,000,000+. * Brand = e.g. Uber (not a priority to sell on the site) vs. eComm = e.g. Nike (online shop) or Emirates (ticket shop)

pa.ag @peakaceag 5 No two shops are the same… Online
retailers with limited product ranges (<1,000 products) face different challenges than multi-range retailers. Which quadrant are you in? Type of domain: eCommerce (Single retailer, Multi retailer, …), Publishing (Special interest (e.g. health), Daily newspaper, …), Classifieds, Lead-gen, Brand*, Other. Number of URLs (scope): <1,000, <10,000, <100,000, <1,000,000, 1,000,000+. * Brand = e.g. Uber (not a priority to sell on the site) vs. eComm = e.g. Nike (online shop) or Emirates (ticket shop)

Focus: Category and product detail pages 10 tips. Let‘s go!

pa.ag @peakaceag 7 #1 Indexing strategy: categories, sub-categories, pagination, etc.
Caused by/refers to: All types of overview/listing pages Issue brief: Categories that compete with subcategories or super deep paginations that cause crawl and indexing problems Issue categories: Crawling inefficiencies, website quality Suggested change/fix: Crawl/indexing strategy dependent on size/page types Comment: Loads of variables to consider; the larger the site gets, the more complex to get right

pa.ag @peakaceag 8 No crawling and/or indexing strategy Depending on
the age, scope and volume, there can be lots of URLs to deal with; carefully consider what you want to give to Googlebot:

pa.ag @peakaceag 9 Google released its own guide to managing
crawl budget Contrary to what Google says, this is well worth a read for everyone, even though it's specifically tailored to “large” as well as “very rapidly changing sites”: Source: https://pa.ag/35MqZHX

pa.ag @peakaceag 10 Getting Google to crawl those important URLs
Significant cuts to the crawlable URL inventory led to the intended shift: Google started crawling previously uncrawled URLs after 15m+ unnecessary URLs were eliminated. Source: Peak Ace AG [Chart: crawled URLs, un-crawled URLs and total URLs per month, Jul to Feb; y-axis from 0 to 35,000,000 URLs]

pa.ag @peakaceag 11 Keyword targeting: main categories vs sub-categories Which
(sub)category should be found for the term “fresh fruit"? Pay close attention to clear terminology and differentiation:

pa.ag @peakaceag 12 Tons of unnecessary/unused sorting and/or filtering If
you have sorting options, ensure they're being used (analytics is your friend) – otherwise remove them and prevent them from being crawled (robots.txt / PRG)

pa.ag @peakaceag 13 The “articles per page” filter/selection: don’t bother
For each category listing, three times the number of URLs are generated – this is a crawling disaster. And often, if left unchecked, this leads to duplicate content. Client-side JavaScript would at least solve the crawling and indexing problems, but it is questionable whether this feature is actually being used.

pa.ag @peakaceag 14 Pagination (for large websites) is essential! Recommendation
for the "correct" pagination (for each objective) from audisto: Source: https://pa.ag/3cjlgev For lists with equally important items, choose the logarithmic or the Ghostblock pagination (equal PR across all pagination and item pages). For lists with a small number of important items use the “Link first Pages”, “Neighbors”, “Fixed Block” pagination (most PR goes to the first pagination and item pages).

This just got very, very interesting… So, what about pagination?

pa.ag @peakaceag 16 Hang on - what about "noindex=nofollow"? Noindexing
pages should lead to nofollow - at least over time - as well Source: https://pa.ag/2EssNeV Google's John Mueller says that long term, noindex follow will eventually equate to a noindex, nofollow directive as well […] eventually Google will stop going to the page because of the noindex, remove it from the index, and thus not be able to follow the links on that page.

pa.ag @peakaceag 17 Links from noindex‘ed pages might be worthless
“Noindex = don’t index. And if we completely drop it […], then we wouldn’t use anything from there […] I wouldn’t count on links from noindex pages being used.” Source: https://pa.ag/2TCiADY

pa.ag @peakaceag 19 #2 Content quality: thin & duplicate product
pages Caused by/refers to: Product Detail Pages Issue brief: Automatically delivered product data or products with hardly any (differentiation) features quickly create (near) duplicate content. Issue categories: Website quality, duplicate content, thin content Suggested change/fix: Monitor content quality carefully, e.g. define noindex rules accordingly

pa.ag @peakaceag 20 Common causes of duplicate content #1 For
Google, these examples are each two different URLs: non-www vs. www: https://pa.ag and https://www.pa.ag; HTTP vs. HTTPS: http://pa.ag and https://pa.ag; URL GET-parameter order: https://www.pa.ag/cars?colour=black&type=racing and https://www.pa.ag/cars?type=racing&colour=black. Dealing with duplication issues: ▪ 301 redirect: e.g. non-www vs. www, HTTP vs. HTTPS, casing (upper/lower), trailing slashes, index pages (index.php) ▪ noindex: e.g. white labelling, internal search result pages, work-in-progress content, PPC and other landing pages ▪ (self-referencing) canonicals: e.g. for parameters used for tracking, session IDs, printer-friendly versions, PDF to HTML, etc. ▪ 403 password protect: e.g. staging/development servers ▪ 404/410 gone: e.g. feed-based content that needs to go fast, other outdated/irrelevant or low-quality content

pa.ag @peakaceag 21 Common causes of duplicate content #2 For
Google, these examples are each two different URLs: Case sensitivity: https://pa.ag/url-a/ and https://pa.ag/url-A/; Trailing slashes: https://pa.ag/url-b and https://pa.ag/url-b/; Taxonomy issues: Category A, Category B, Category C; Production server vs. staging / testing server
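The duplicate variants listed on the two slides above (host, protocol, casing, trailing slash, parameter order) can be collapsed with a small normalisation routine, for example to compute 301 targets. The following is only a sketch: the function name, the preferred host and the house rules (HTTPS only, lower-case paths, no trailing slash, sorted parameters) are assumptions for illustration, not something prescribed by the talk. Lower-casing paths in particular is only safe if the shop system treats paths as case-insensitive.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def normalise_url(url: str, preferred_host: str = "www.pa.ag") -> str:
    """Collapse common duplicate variants of a URL into one form.

    Assumed (hypothetical) house rules: HTTPS only, one preferred host,
    lower-case paths, no trailing slash except for the root, and
    alphabetically sorted GET parameters.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    bare_host = preferred_host[4:] if preferred_host.startswith("www.") else preferred_host
    # Treat the bare domain and the www host as the same site.
    if host in (bare_host, preferred_host):
        host = preferred_host
    # Lower-case the path and drop the trailing slash (keep "/" for the root).
    path = parts.path.lower().rstrip("/") or "/"
    # Sorting makes ?a=1&b=2 and ?b=2&a=1 identical.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    # The fragment is dropped because it is never sent to the server.
    return urlunsplit(("https", host, path, query, ""))
```

Every incoming URL whose normalised form differs from the requested one would then be a candidate for a 301 redirect to the normalised form.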

pa.ag @peakaceag 22 Content quality is still important in ecommerce!
Keep a close eye on the indexing of articles that are "very similar" in content: About 2.410 results (0,37 seconds)

pa.ag @peakaceag 23 Prevent inferior content from being indexed In
particular: automatically generated URLs like "no ratings for X" or "no comments for Y" often lead to lower quality content! Other kinds of bad or thin content: ▪ Content from (external) feeds (e.g. through white label solutions / partnerships, affiliate feeds etc.) ▪ Various "no results" pages (no comments for product A, no ratings for product B, no comments for article C etc.) ▪ Badly written content (e.g. grammatical errors) ▪ General: same content on different domains

pa.ag @peakaceag 24 #3 Handling multiple versions of a product
(colour/size) Caused by/refers to: Product detail pages Issue brief: Demand is too low for the PDPs being indexed (in all their combinations). Link equity/ranking potential is lost. Issue categories: Duplicate content, crawl inefficiency, ranking issues Suggested change/fix: Consolidate a default product whenever possible (e.g. strongest selling colour/size) Comment: Client-side JS or, at a minimum, canonical tags are needed

pa.ag @peakaceag 25 Exactly the same GEL-NIMBUS 22 in a
different colour Asics uses individual URLs for each of their available colour/size variations, e.g. gel-nimbus-22/p/1011A680-002.html and gel-nimbus-22/p/1011A680-022.html. Not enough people search for “asics Gel Nimbus 22 black 43” and “grey” respectively. Demand is too low for the PDPs being indexed (in all their combinations). Link equity/ranking potential is lost/split.

pa.ag @peakaceag 26 One solution could be to canonicalise to
a root product: A canonical tag is only a hint, not a directive. Google can choose to ignore it entirely. When using canonical tags, please be extra careful: ▪ There may only be one rel-canonical annotation per URL - only ONE! ▪ Use absolute URLs with protocols and subdomains ▪ Rel-canonical targets must actually work (no 4XX targets) – they need to serve a HTTP 200 ▪ No canonical tag chaining, Google will ignore this! ▪ Maintain consistency: only one protocol (HTTP vs. HTTPS), either www or non-www and consistent use of trailing slashes ▪ Etc.
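Two of the rules above (exactly one rel-canonical per URL, absolute URLs with protocol and host) can be checked offline on the HTML of a page. This is a minimal sketch using only the Python standard library; the class and function names are made up for illustration. It deliberately does not check whether the target returns HTTP 200 or whether canonicals are chained, since both would require fetching the target URLs.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urlsplit


class CanonicalCollector(HTMLParser):
    """Collects the href of every <link rel="canonical"> in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values:
            self.hrefs.append(attr.get("href") or "")


def canonical_problems(html: str) -> List[str]:
    """Return a list of human-readable problems; empty means both checks pass."""
    collector = CanonicalCollector()
    collector.feed(html)
    problems = []
    if len(collector.hrefs) != 1:
        problems.append(f"expected exactly 1 canonical tag, found {len(collector.hrefs)}")
    for href in collector.hrefs:
        parts = urlsplit(href)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append(f"canonical is not an absolute URL: {href!r}")
    return problems
```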

pa.ag @peakaceag 28 Most efficient: minimising URL overhead Improve crawl
budget (and link equity) by consolidating to one URL. Salmon PDPs are rewarded with strong rankings: #848=15694 #848=15692

pa.ag @peakaceag 29 #4 One product, but reachable via multiple
categories Caused by/refers to: Product detail pages Issue brief: Product detail pages are reachable via multiple URLs (due to the category name being part of the PDP URL) Issue categories: Duplicate content, crawl inefficiencies, ranking issues Suggested change/fix: Category-independent product URLs Comment: Alternatively, define a default category to be used in the URL slug

pa.ag @peakaceag 30 Two different URLs serving the exact same
product This minimises the chances of it ranking well; also from a crawling perspective, this isn‘t a good solution at all – both URLs would be crawled individually. international-gins most-popular

pa.ag @peakaceag 31 Solution: only ever use one URL per
product! A dedicated product directory - regardless of the category - is the best solution in most cases; it also often makes analysis easier. Alternative: Consolidate all products within the document root. Watch out: Using canonical tags or noindex for products with multiple results is possible - but inefficient in terms of crawling. reduction > noindex > canonical tag.

pa.ag @peakaceag 32 #5 Brand filter vs. branded category: /watches/breitling
vs. /breitling/all Caused by/refers to: Category pages and their filters Issue brief: A brand category that targets the exact same keyword set vs a category that allows filtering for a brand name Issue categories: Keyword cannibalisation, crawl inefficiency Suggested change/fix: Canonicalise, prevent indexation of one URL variant Comment: PRG pattern for large-scale scenarios (e.g. preventing an entire block of filtering from being crawled/indexed)

pa.ag @peakaceag 33 Another classic: brand filter vs brand (category)
page If you index both, which one is supposed to rank for the generic branded term? One keyword, one URL: try to minimise internal competition as much as you can. Two (or more) pages targeting "Breitling watches" make it unnecessarily hard for Google to select the best result! i Category ”watches“ filtered by brand ”Breitling“ Dedicated “Breitling” showcase/brand page

pa.ag @peakaceag 34 #6 Expired/(temp.) out of stock product management
Caused by/refers to: Product detail pages Issue brief: PDPs for products that are (temporarily) out of stock can cause bad engagement metrics (e.g. high bounce rates, etc.) Issue categories: Engagement metrics, website quality, inefficient URLs Suggested change/fix: Implement OOS strategy (redirects, info layer, disable (410) entirely, etc.) Comment: Hugely complex topic depending on the size, the volatility of the inventory, and much more

pa.ag @peakaceag 35 Deal with your out of stock items
- but not like M&S does! Are they just temporarily unavailable (and for how long) or will they never come back? Also, what alternative versions are available? About 294.000 results (0,23 seconds) M&S keeps all of their out of stock pages indexed: <meta name="robots" content="index, follow"> <link rel="canonical" href="[…]/chef-hard-anodised-28cm-saute-pan/p/p22467321">

pa.ag @peakaceag 36 How to deal with OOS situations? For
non-deliverable products, there is not only one solution. Often, it comes down to a combination. Tip: use dynamic infolayer to inform users. OOS-Handling REDIRECT (internal search) REDIRECT (successor) REDIRECT (similar products in other colours, sizes, etc.) 410 ERROR (only if you really want to delete!) REDIRECT (same product but e.g. in a different colour) NOINDEX (newsletter/lead gen)
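How these options combine depends on the shop, but the decision logic can be written down as a small rule set. The sketch below is one possible ordering and not the speaker's recommendation: the `Product` fields, the function name and the priority (keep live if returning, then successor, then variant, then 410) are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Product:
    url: str
    in_stock: bool
    returning: bool = False               # expected back in stock?
    successor_url: Optional[str] = None   # newer model, if any
    variant_url: Optional[str] = None     # same product, other colour/size


def oos_response(product: Product) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, redirect target or None) for a product URL."""
    if product.in_stock or product.returning:
        # Keep the page live; an info layer can tell users about availability.
        return 200, None
    if product.successor_url:
        return 301, product.successor_url
    if product.variant_url:
        return 301, product.variant_url
    # Nothing sensible to send users to: remove the URL for good.
    return 410, None
```

For example, a discontinued product with a known successor would yield `(301, successor_url)`, while a temporarily unavailable one stays on HTTP 200.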

pa.ag @peakaceag 37 No exit strategy for paginated categories? Categories
with high churn need to deal with paginated pages coming and going (e.g. what happens when there's not enough products to display a 2nd page?) About 3,065 results (0,28 seconds)

pa.ag @peakaceag 38 #7 Facetted navigation, sorting & filtering (e.g.
in categories) Caused by/refers to: Category pages that allow for filtering and/or sorting Issue brief: Various sorting/filtering/facets times categories and sub-categories can lead to millions of (worthless) URLs Issue categories: Keyword cannibalisation, crawl inefficiency, thin content Suggested change/fix: Individual indexing strategy (based on demand) per filter and facet, prevent crawling/indexing for sorting Comment: Very difficult to get right, usually requires individual solutions

pa.ag @peakaceag 39 Issue: facetted navigation poorly controlled/implemented “A facetted
search is a restriction of the selection according to different properties, characteristics and/or values.” If Zalando allowed all these options to become crawlable URLs, this would lead to millions and millions of useless URLs. Only allow crawling and indexing of URLs that have target keywords and keyword combinations with actual search demand. Pay special attention to internal keyword cannibalisation.
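The demand-based rule above can be expressed as a simple lookup: a facet combination only becomes a crawlable, indexable URL if keyword research shows search demand for it. This is a sketch under assumptions: the function name, the `min_searches` threshold and the demand figures in the usage example are hypothetical placeholders, not data from the talk.

```python
from typing import Dict, Tuple


def should_be_indexable(
    selected_facets: Tuple[str, ...],
    monthly_searches: Dict[Tuple[str, ...], int],
    min_searches: int = 100,  # hypothetical threshold, tune per shop
) -> bool:
    """True if this facet combination has enough search demand.

    Keys in `monthly_searches` are alphabetically sorted tuples, so the
    order in which a user clicks the filters does not matter.
    """
    key = tuple(sorted(selected_facets))
    return monthly_searches.get(key, 0) >= min_searches


# Usage with made-up demand numbers:
demand = {("black", "sneakers"): 2400, ("black", "size-43", "sneakers"): 10}
print(should_be_indexable(("sneakers", "black"), demand))
print(should_be_indexable(("sneakers", "black", "size-43"), demand))
```

Combinations that fail the check would then be kept out of the crawlable URL space, e.g. via client-side JS or the PRG pattern mentioned earlier.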

pa.ag @peakaceag 40 Solution: Boots handles this excellently using client-side
JS Also, from a user's perspective, using JavaScript for features such as filtering feels much faster, since the perceived load time decreases #facet:-100271105108108101116116101,-1046543&product

pa.ag @peakaceag 43 #8 Structured Data: schema.org Caused by/refers to:
Product detail pages Issue brief: Google needs machine-readable data in a structured form (basis: schema.org) to display some information directly in the search results, for example price and product availability. If this information is not available, the result preview is smaller (= one line is missing). Issue categories: SERP snippet preview, SERP CTR Suggested change/fix: Implement schema.org markup on product detail pages (min. prices, availability etc.); and ideally ratings too, if available.

pa.ag @peakaceag 44 Rich snippets based on structured data A
valuable additional line in the SERP Snippet for more attention: schema.org/Rating + AggregateRating schema.org/Product + schema.org/Offers + schema.org/InStock

pa.ag @peakaceag 45 Label products and offers with schema.org Schema.org
markup for product details as well as price, stock & reviews
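As an illustration of what such markup can look like, the following sketch renders a JSON-LD block for a product with price, availability and (optionally) ratings. The schema.org types and properties used (Product, Offer, AggregateRating, InStock, OutOfStock) exist in the vocabulary; the function name and all values in the usage example are placeholders, and a real shop would typically emit more properties (image, SKU, brand, etc.).

```python
import json

SCRIPT_OPEN = '<script type="application/ld+json">'
SCRIPT_CLOSE = "</script>"


def product_jsonld(name, price, currency, in_stock, rating=None, review_count=None):
    """Render a schema.org Product as a JSON-LD <script> element."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "offers": {
            "@type": "Offer",
            "price": f"{price:.2f}",
            "priceCurrency": currency,
            "availability": "https://schema.org/InStock"
            if in_stock
            else "https://schema.org/OutOfStock",
        },
    }
    # Only add ratings when the shop actually has them.
    if rating is not None and review_count:
        data["aggregateRating"] = {
            "@type": "AggregateRating",
            "ratingValue": rating,
            "reviewCount": review_count,
        }
    return SCRIPT_OPEN + json.dumps(data) + SCRIPT_CLOSE


# Placeholder values, not real product data:
print(product_jsonld("GEL-NIMBUS 22", 159.9, "EUR", True, rating=4.5, review_count=12))
```

The output should still be validated with one of the testing tools, since search engines only show rich results for the properties they support.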

pa.ag @peakaceag 46 Google discontinued the SDTT … yeah, I
know – right? Attention: The Rich Results Test does not show all kinds / types of structured data, but only those that Google supports. Source: https://pa.ag/2DSKpzO ▪ Bing Webmaster Markup Validator https://www.bing.com/toolbox/markup-validator ▪ Yandex Structured Data Validator https://webmaster.yandex.com/tools/microtest/ ▪ ClassySchema Structured Data Viewer: https://classyschema.org/Visualisation ▪ https://schemamarkup.net/ ▪ https://www.schemaapp.com/tools/schema-paths/ ▪ https://json-ld.org/playground/ ▪ https://technicalseo.com/tools/schema-markup-generator/

pa.ag @peakaceag 47 Tip: Free Structured Data Helper from RYTE
Highlights syntax errors and missing required properties. All nested elements in one place for convenient in-line validation: Source: https://pa.ag/3b9CkU5

pa.ag @peakaceag 48 To avoid confusion: no schema mark-up! Schema.org
mark-up is not being used to show/generate this extended SERP snippet: So-called featured snippets are usually shown for non-transactional search queries; schema.org mark-up is not mandatory. Also no schema.org mark-up, Google extracts this information (“monthly leasing rate”) directly from the HTML mark-up.

pa.ag @peakaceag 49 #9 Discovery: XML sitemaps, etc. Caused by/refers
to: Better/faster article indexing Issue brief: Sitemaps and crawl hubs for better internal linking, discovery and additional canonicalisation signals Issue categories: Crawl efficiency, internal linking Suggested change/fix: Establish a proper XML sitemap (creation) process, find the URLs that Google hits heavily and use them to link internally Comment: Poorly maintained XML sitemaps, e.g. containing broken / irrelevant URLs, can lead to significant crawl budget waste

pa.ag @peakaceag 50 Poorly maintained XML sitemaps No redirects, no
URLs that are blocked via robots.txt or meta robots, no URLs with a different canonical tag! ▪ Screaming Frog ▪ Mode > List ▪ Download XML Sitemap
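The hygiene rules above can be enforced in the sitemap generation process itself rather than audited afterwards. The sketch below assumes crawl data is already available per URL; the `CrawledUrl` record and both function names are hypothetical, and only the sitemap namespace is taken from the sitemaps.org protocol.

```python
from dataclasses import dataclass
from typing import Iterable, Optional
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


@dataclass
class CrawledUrl:
    url: str
    status: int                        # HTTP status code from the crawl
    blocked_by_robots_txt: bool = False
    noindex: bool = False              # meta robots / X-Robots-Tag
    canonical: Optional[str] = None    # None = no canonical tag found


def sitemap_worthy(page: CrawledUrl) -> bool:
    """Only 200 OK, crawlable, indexable, self-canonical URLs qualify."""
    return (
        page.status == 200
        and not page.blocked_by_robots_txt
        and not page.noindex
        and page.canonical in (None, page.url)
    )


def build_sitemap(pages: Iterable[CrawledUrl]) -> str:
    """Serialise all qualifying URLs as a sitemap <urlset>."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        if sitemap_worthy(page):
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = page.url
    return ET.tostring(urlset, encoding="unicode")
```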

pa.ag @peakaceag 51 Pages not in sitemaps – using DeepCrawl’s
source gap Comparing (and overlaying various crawl sources) to identify hidden issues/potentials

pa.ag @peakaceag 52 #10 Web performance: maximum loading speed Caused
by/refers to: Loading speed (entire website) Issue brief: Often mobile services in particular are still extremely slow, but this is not the only area where there is a need for optimisation. Most online shops are not fully optimised: Pictures, external fonts, JavaScripts and much more offer opportunities for performance gains. Issue categories: Loading speed, engagement metrics Suggested change/fix: Multifaceted topic with a significant number of individual optimisation possibilities, which depend to a large extent on the infrastructure, shop system etc. - can only be solved successfully together with the IT team.

Fast loading time plays an important role in overall user
experience! Performance is about user experience!

pa.ag @peakaceag 54 Revisited: page speed already is a ranking
factor Source: http://pa.ag/2iAmA4Y | http://pa.ag/2ERTPYY

pa.ag @peakaceag 55 User experience to become a Google ranking
factor Google announced a new ranking algorithm designed to judge web pages based on how users perceive the experience of interacting with a web page. That means if Google thinks your website users will have a poor experience on your pages, Google may not rank those pages as highly as they are now. The current Core Web Vitals set focuses on three aspects of user experience - loading, interactivity, and visual stability - and includes the following metrics: Source: https://pa.ag/3irantb

pa.ag @peakaceag 57 Optimising for Core Web Vitals such as
LCP, FID and CLS? An overview of the most common issues and respective fixes:
LCP is primarily affected by: ▪ Slow server response time ▪ Render blocking JS/CSS ▪ Resource load times ▪ Client-side rendering
Optimising for LCP: ▪ Server response times & routing ▪ CDNs, caching & compression ▪ Optimise critical rendering path ▪ Reduce blocking times (CSS, JS, fonts) ▪ Images (format, compression, etc.) ▪ Preloading & pre-rendering ▪ Instant loading based on PRPL
FID is primarily affected by: ▪ Third-party code ▪ JS execution time ▪ Main thread work/busyness ▪ Request count & transfer size
Optimising for FID: ▪ Reduce JS execution (defer/async) ▪ Code-split large JS bundles ▪ Break up JS long tasks (>50ms) ▪ Minimise unused polyfills ▪ Use web workers to run JS on a non-critical background thread
CLS is primarily affected by: ▪ Images without dimensions ▪ Ads, embeds and iframes without dimensions ▪ Web fonts (FOIT/FOUT)
Optimising for CLS: ▪ Always include size attributes on images, video, iframes, etc. ▪ Reserve required spaces in advance ▪ Reduce dynamic injections

pa.ag @peakaceag 58 Client-side/front-end optimisation tasks Use my checklist on
SlideShare to double check: All slides on SlideShare: http://pa.ag/iss18speed ▪ Establish a content-first approach: progressive enhancement, also prioritise visible, above the fold content: 14kB (compressed). ▪ Reduce size: implement effective caching and compression. ▪ Whenever possible, use asynchronous requests. ▪ Decrease the size of CSS and JavaScript files (minify). ▪ Lean mark-up: no comments, use inline CSS/JS only where necessary or useful. ▪ Optimise images: reduce overhead for JPGs & PNGs (metadata, etc.), request properly sized images and try new formats. ▪ Minimise browser reflow & repaint.

pa.ag @peakaceag 59 Increasing crawled URLs due to faster load
times Slashing website load times (Lighthouse score ~36 to 70) led to >25% more URLs being crawled by Googlebot: Source: Peak Ace AG [Chart: crawled URLs (0 to 400,000) and average Lighthouse performance score (0 to 80) per month, Nov to Apr]

Possible answers to some “what if…?“ questions Want some more?

What if I need to rename/delete categories?

pa.ag @peakaceag 62 Don‘t forget 301s when changing your structure
If the category name is "automatically" connected to the URL slug, redirect to the "new" name; when deleting a category, always have a redirect in place. The following applies: if an (old) URL was linked externally at some point (and that link still exists), the internal redirect (e.g. old category name > new category name) is now required forever.

pa.ag @peakaceag 63 Bulk test these things: redirects & other
headers HTTP status codes (errors, redirects, etc.) at scale, for free: httpstatus.io Check it out: https://httpstatus.io/

pa.ag @peakaceag 64 Fix those redirect chains, especially on large
sites… …as multiple requests waste valuable performance and crawl budget!
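Given an export of redirect rules (old URL to new URL), chains can be detected and flattened so that every old URL points straight at its final destination. This is a minimal sketch on an in-memory mapping; the function names and the `max_hops` limit are assumptions for illustration, and it does not issue any HTTP requests.

```python
from typing import Dict, List


def resolve_chain(url: str, redirects: Dict[str, str], max_hops: int = 10) -> List[str]:
    """Follow redirect rules from `url` and return every URL visited."""
    chain = [url]
    while chain[-1] in redirects:
        next_url = redirects[chain[-1]]
        if next_url in chain or len(chain) > max_hops:
            raise ValueError(f"redirect loop or overly long chain starting at {url}")
        chain.append(next_url)
    return chain


def flatten_redirects(redirects: Dict[str, str]) -> Dict[str, str]:
    """Rewrite every rule so it points directly at the final target."""
    return {source: resolve_chain(source, redirects)[-1] for source in redirects}


rules = {"/old-category": "/renamed-category", "/renamed-category": "/new-category"}
print(flatten_redirects(rules))
# {'/old-category': '/new-category', '/renamed-category': '/new-category'}
```

After flattening, each old URL needs exactly one request to reach its destination instead of one per hop.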

pa.ag @peakaceag 65 Don‘t be lazy: ensure code hygiene! Remove
internally linked redirects from templates and adjust them to “direct“ linking:

pa.ag @peakaceag 66 Also fix (internally) linked error pages (e.g.
404)! Adjust internal links in the code and check alternative references (canonical, sitemap, etc.); for traffic, ext. links and rankings => redirect. Quality signal!?

What if I have no time to write my own
titles (or descriptions)?

pa.ag @peakaceag 68 At least try to use simple templates!
Google usually autogenerates the worst snippets; the same standard fallback page title directly qualifies the affected URLs as duplicates:
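A simple template already avoids identical fallback titles across thousands of URLs. The sketch below picks the longest variant that fits a length budget; the function name, the shop name and the 60-character budget are assumptions (a common rule of thumb, not a documented Google limit).

```python
def product_title(name: str, brand: str, category: str,
                  shop: str = "Example Shop", max_len: int = 60) -> str:
    """Build a unique page title from product data, longest variant first."""
    candidates = [
        f"{brand} {name} | {category} | {shop}",
        f"{brand} {name} | {shop}",
        f"{brand} {name}",
    ]
    for title in candidates:
        if len(title) <= max_len:
            return title
    # Even the shortest variant is too long: cut it hard as a last resort.
    return candidates[-1][:max_len].rstrip()


print(product_title("GEL-NIMBUS 22", "ASICS", "Running Shoes"))
# ASICS GEL-NIMBUS 22 | Running Shoes | Example Shop
```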

What if I need to use URL parameters for tracking?

pa.ag @peakaceag 70 For tracking, whenever possible use # instead
of ? Run GA Tracking with fragments instead of GET parameters; or automatically remove parameters with hitCallback parameters (after page view measurement): Source: https://pa.ag/2TuJMk5 If, for whatever reason, you need to use URL parameters, don't forget to implement canonical tags and always test using GSC to ensure that Google actually uses them.
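If tracking parameters cannot be avoided, the canonical target for a tracked URL is simply the same URL without them. The sketch below strips a fixed set of parameters; the function name is made up, and the parameter list (the common `utm_*` campaign parameters plus `gclid`) is an assumption that should be extended with whatever a given shop actually uses.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed list of tracking parameters; extend as needed.
TRACKING_PARAMS = {
    "utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content", "gclid",
}


def canonical_for(url: str) -> str:
    """Return the URL without tracking parameters and without fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


print(canonical_for("https://www.pa.ag/cars?colour=black&utm_source=newsletter"))
# https://www.pa.ag/cars?colour=black
```

The result is what would go into the `href` of the canonical tag on the tracked URL.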

pa.ag @peakaceag 71 Never use parameterised tracking URLs for internal
linking

pa.ag @peakaceag 72 Also: do not use your own parameters
for tracking! Did I mention that parameters actually cause all sorts of problems, constantly?

pa.ag @peakaceag 74 URL parameter settings in Google Search Console
GSC also allows you to manually configure URL parameters and their effects; please note that this is "only" available for Google.

How should I manage my (internal) search?

pa.ag @peakaceag 81 Prevent crawling & indexing: POST-Req. & noindex
Prevent crawling and indexing of search results. SERP in SERP usually leads to a bad user experience / bad signals - Google sees it the same way: About 663,000 results (0,92 seconds)

How do I deal with personalisation in relation to the
Googlebot?

pa.ag @peakaceag 83 Personalisation: good or bad - and what
to consider? Consider variable internal linking, such as "last viewed" or "you might also like this article" in the link graph: Use the non-personalised standard view for the Googlebot; personalisation as "layer on top" is unproblematic from an SEO point of view.

How do I make my (listing) pages faster?

pa.ag @peakaceag 91 Things are much easier now: loading =
lazy Performance benefits paired with SEO friendliness (and no JS) simultaneously. Tip: This now also works for <iframe src="…" loading="lazy">. Most recent versions of Chrome, Firefox and Edge do support this already:

Care for the slides? www.pa.ag twitter.com/peakaceag facebook.com/peakaceag Take your career
to the next level: jobs.pa.ag Bastian Grimm
