Technical SEO Testing 2022 - SMX Advanced

Slide 1

Slide 1 text

@SPEAKERNAME/#SMX @peakaceag Technical SEO Testing 2022 Bastian Grimm, Peak Ace AG | @basgr Separating fact from fiction with the Peak Ace test lab

Slide 2

Slide 2 text

One of the biggest problems in SEO?

Slide 3

Slide 3 text

Misinformation!

Slide 4

Slide 4 text

“I’ve heard…” People incorrectly citing other parties, often without any context/deeper understanding of the issues at hand

Slide 5

Slide 5 text

Google says one thing… but then actually does another (or it’s so cryptic that you can’t do much with it)

Slide 6

Slide 6 text

Say hello to the Peak Ace SEO playground

Slide 7

Slide 7 text

How does the setup work and what does it do?
[Flowchart, in order of appearance:]
Pick case (Case A, Case B, Case C, … Case N)
headerOption present? If yes, apply headerOption ruleset: 0 = X-Robots-Tag: noindex; 1 = X-Robots-Tag: noindex, nofollow; 2 = Link: <https://xxx.com/>; rel="canonical"; 3 = …
Is there metaOption? If yes, apply metaOption ruleset: 0, 2, 3, 4, 5, 6, 7, 8, …
regenerateOption present? (Yes/No)
Page indexable? (Yes/No)
Generate unique text
Bot indexes the page
Store visit logs in DB
Session ends
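To make the header step of that flow a little more concrete, here is a minimal Python sketch. It is not the Peak Ace implementation: the names `HEADER_OPTIONS`, `apply_header_options` and `is_indexable` are made up for illustration, only the three header options shown on the slide are mapped, and the elided options ("3 …") are deliberately left out.

```python
# Hypothetical mapping of headerOption numbers to response headers, based on
# the three examples visible on the slide. Options 3 and above are elided on
# the slide and are intentionally not guessed here.
HEADER_OPTIONS = {
    0: ("X-Robots-Tag", "noindex"),
    1: ("X-Robots-Tag", "noindex, nofollow"),
    2: ("Link", '<https://xxx.com/>; rel="canonical"'),
}


def apply_header_options(options):
    """Return the (name, value) headers for the selected option numbers."""
    headers = []
    for option in options:
        if option not in HEADER_OPTIONS:
            raise KeyError(f"unknown headerOption: {option}")
        headers.append(HEADER_OPTIONS[option])
    return headers


def is_indexable(headers):
    """Treat a page as indexable unless an X-Robots-Tag header says noindex."""
    for name, value in headers:
        if name.lower() == "x-robots-tag":
            directives = [part.strip().lower() for part in value.split(",")]
            if "noindex" in directives:
                return False
    return True
```

Calling `is_indexable(apply_header_options([1]))` answers the "Page indexable?" question for a test case that only uses header option 1.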

Slide 8

Slide 8 text

Yeah… right! Let’s make this a bit more hands-on… essentially it’s a mini “SEO CMS”:

Slide 9

Slide 9 text


Slide 10

Slide 10 text

Test-specific mark-up/directives in <head>, e.g. JS, meta or canonical tags

Slide 11

Slide 11 text

The actual URL that serves the content – especially interesting for redirects, etc.

Slide 12

Slide 12 text

Unique content, in different languages, to test the actual indexing of a page

Slide 13

Slide 13 text

A JS-based tracker, using feature detection to log Googlebot requests

Slide 14

Slide 14 text

A couple of things you can do with this:
- Set up new HTML documents/tests with the click of a button
- Add an unlimited number of server-side headers, such as X-Robots-Tag, canonicals, hreflang, redirects, caching, etc.
- Add elements to the document <head>, for example meta robots, canonical or <script> tags to run JS
- Add unique content to the page, depending on the language you want to test for (sometimes, content generation has a valid use case)
- Add any type of HTML to the <body> / DOM
- Integrated bot tracking (JS for evergreen Googlebot + non-JS) by default
- Automatically generate output by using standard tags (e.g. <iframe>) as well as JavaScript (to ensure rendering is in play)
- And lots more…

Slide 15

Slide 15 text

Sound good? Interested in the slide deck and/or the GitHub repository (including all source code)? Get both at https://pa.ag/smxtesting22 (all free!)

Slide 16

Slide 16 text

Context matters: old domain vs new domain, mobile-first indexing vs non-mobile-first indexing, etc.

Slide 17

Slide 17 text

Warning: Draw your own conclusions! Isolated “SEO testing” is next to impossible; be aware that there may be other (external) signals at play that you can’t control

Slide 18

Slide 18 text

#1 Indexing: robots meta & X-Robots-Tag

Slide 19

Slide 19 text

Anything wrong with this? I looked at this client website the other day and something felt off…

Slide 20

Slide 20 text

It needs to be “content” instead of “value”! Using the “value” attribute on a meta element is actually invalid according to the W3C HTML specification.

Slide 21

Slide 21 text

Interestingly enough, Google doesn’t seem to care: it also uses the invalid “value” attribute to manage indexing.

Slide 22

Slide 22 text

What if you combined the “value” and “content” attributes? Google prefers the valid attribute over the invalid one; it takes “content” in this instance.

Slide 23

Slide 23 text

What if you change the element order? Order doesn’t matter – Google still takes the “content” attribute.
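The behaviour observed in these tests (the invalid “value” attribute is honoured, but “content” wins when both are present, regardless of order) can be modelled in a few lines of Python. This is only a sketch of the observed outcome, not Google's actual parser; `RobotsMetaParser` and `robots_directives` are names invented for this example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        # Prefer the valid "content" attribute; fall back to the invalid
        # "value" attribute only when "content" is missing.
        raw = attributes.get("content")
        if raw is None:
            raw = attributes.get("value")
        if raw is None:
            return
        for directive in raw.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.append(directive)


def robots_directives(html):
    """Return the robots directives found in the given HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives
```

With this model, `robots_directives('<meta name="robots" value="noindex">')` yields `["noindex"]`, while a tag carrying both `value="noindex"` and `content="index"` yields `["index"]` in either attribute order.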

Slide 24

Slide 24 text

Going down the rabbit hole… So, what about this one? – No… this can’t work, can it?

Slide 25

Slide 25 text

Google internally corrects “robot” to “robots”: to control indexing, Google also considers the invalid “robot” value.

Slide 26

Slide 26 text

What’s Google supposed to do with this one? Noindex (because it’s more restrictive) or index (because of the more precise UA)?

Slide 27

Slide 27 text

Google follows the most specific user-agent directive. It’s no surprise; this approach hasn’t changed for years.

Slide 28

Slide 28 text

But what if… you added an “X-Robots-Tag: noindex” header into the mix?

Slide 29

Slide 29 text

Header and meta robots directives combined

Slide 30

Slide 30 text

Header noindex vs meta robots index (for Googlebot): the generic “X-Robots-Tag: noindex” header (no specific UA) overrides the more specific robots meta tag for “Googlebot”. I found this quite surprising, since the Googlebot directive should supersede it; it appears that the header and meta indexing pipelines are somewhat separated?
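The two findings (the most specific user-agent meta tag wins among meta tags, but a generic X-Robots-Tag noindex header still overrides it) can be written down as a small decision function. This is a sketch of the test results as described here, not a documented Google rule; `effective_indexing` and its input shapes are assumptions made for illustration.

```python
def effective_indexing(meta_tags, header_directives=()):
    """Model the observed outcome for Googlebot.

    meta_tags maps a meta name ("robots" or "googlebot") to a list of
    directives; header_directives holds the X-Robots-Tag directives.
    """
    header = {directive.strip().lower() for directive in header_directives}
    # Observed in the test: a generic header noindex wins over everything.
    if "noindex" in header:
        return "noindex"
    # Among meta tags, the Googlebot-specific tag beats the generic one.
    chosen = meta_tags.get("googlebot")
    if chosen is None:
        chosen = meta_tags.get("robots", [])
    normalised = {directive.strip().lower() for directive in chosen}
    return "noindex" if "noindex" in normalised else "index"
```

For example, `effective_indexing({"robots": ["noindex"], "googlebot": ["index"]})` returns `"index"`, and adding `header_directives=["noindex"]` flips the result to `"noindex"`.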

Slide 31

Slide 31 text

Of course, GSC takes rendering into account as well! For example, adding directives using JS works as expected (but only in <head>)

Slide 32

Slide 32 text

#2 Web Components: custom elements, Shadow DOM & more

Slide 33

Slide 33 text

There seems to be a fair amount of confusion. A typical “SEO answer” about web components often looks like this: “Look, I’m not really sure how Googlebot deals with custom HTML elements – better to play it safe and rely on good old, standardised HTML only…”

Slide 34

Slide 34 text

No idea what Web Components is? Web Components is a suite of different technologies allowing you to create reusable custom elements – with their functionality encapsulated away from the rest of your code – and utilise them in your web apps. Source: http://webcomponents.github.io

Slide 35

Slide 35 text

In this example, we define our very own custom HTML element. In this example, we’re generating a component using the shadow DOM.

Slide 36

Slide 36 text

No issues using Web Components whatsoever! Content which lives in a web component, such as a custom HTML element, will be indexed properly. Essentially, it will be flattened into the main HTML. Content is created in an element which is part of the Shadow DOM. Content which is part of the…

Slide 37

Slide 37 text

#3 CSS content-visibility: enables the user agent to skip an element's rendering work

Slide 38

Slide 38 text

Content-visibility, a new CSS property to boost rendering: content-visibility enables the user agent to skip an element's rendering work, including layout & painting, until it is needed – and therefore makes the initial load much faster! Source: https://pa.ag/2Wxn399

Slide 39

Slide 39 text

Content exists in the HTML mark-up but is set to “content-visibility: hidden”. Accordingly, the content will not be rendered or displayed at all.

Slide 40

Slide 40 text

Whether it’s “auto” or “hidden”, the content will be found. Even though these elements are skipped at rendering due to their content-visibility settings, the URL is returned for both test phrases (content-visibility: hidden and content-visibility: auto).

Slide 41

Slide 41 text

#4 iFrames: including content from a second URL into a parent URL

Slide 42

Slide 42 text

According to BuiltWith (top 1m), iFrames are still a thing. Source: https://pa.ag/2l8qDaN

Slide 43

Slide 43 text

Revisited: parent URL + iFrame. Parent page: area in the yellow square. iFramed content (from a 2nd URL): within the red highlighted square.

Slide 44

Slide 44 text

It appears that regular iFrames are dangerous these days. iFrame content will be attributed to its parent URL post-render; the parent page can now be found for content from within the iFrame. This phrase is originally taken from within the iFrame, not from the parent URL.

Slide 45

Slide 45 text

Post-render, the parent page can now be found for content within the iFrame. To make it simple: this URL… can now rank for content from this 2nd URL!

Slide 46

Slide 46 text

Page-level quality? What about all that 3rd party content people feed in?

Slide 47

Slide 47 text

Still not convinced? We ran some follow-up tests, because: links!

Slide 48

Slide 48 text

Added two additional links (1 internal, 1 external) to the iFrame URL

Slide 49

Slide 49 text

Naturally, the GSC HTML displays the links. Again, they’re flattened into the DOM of the parent URL.

Slide 50

Slide 50 text

GSC’s “Top linked pages” report is also helpful. The parent URL appears as the linking page for bastiangrimm.com – however, this URL doesn’t have any links in its HTML mark-up.

Slide 51

Slide 51 text

So what can you do?

Slide 52

Slide 52 text

Of course, you can noindex/robots.txt the frame content. If you do, auto-generated meta descriptions will lack any iFrame content, and the GSC rendered HTML won’t show the in-lined content (from the iFrame) either.

Slide 53

Slide 53 text

Content from/in “hidden” frames won’t be indexed either! Similar to noindexed frames, a meta description does not appear in their SERPs. When the iFrame tag uses a display:none annotation, the content is not in-lined into the rendered DOM. The same applies when “hidden” is applied via JS: no in-lining into the rendered DOM.

Slide 54

Slide 54 text

What about this new robots tag? In early 2022, Google released “indexifembedded” specifically for iFrame usage. Source: https://pa.ag/3KGvMfN
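As a rough illustration of how the tag is used: according to Google's announcement, “indexifembedded” only has an effect in combination with “noindex” and can be sent either as an X-Robots-Tag header or as a robots meta tag. The helper names below are made up for this sketch.

```python
def embedded_robots_directives(allow_when_embedded):
    """Directives for a page that should not be indexed on its own."""
    directives = ["noindex"]
    if allow_when_embedded:
        # indexifembedded only takes effect alongside noindex: the page
        # stays out of the index, but its content may be indexed when it
        # is embedded in another page via an iFrame.
        directives.append("indexifembedded")
    return ",".join(directives)


def as_header(directives, user_agent="googlebot"):
    """Return the directives as an X-Robots-Tag (name, value) pair."""
    return ("X-Robots-Tag", f"{user_agent}:{directives}")


def as_meta(directives, user_agent="googlebot"):
    """Return the directives as a robots meta tag for the given crawler."""
    return f'<meta name="{user_agent}" content="{directives}">'
```

`as_header(embedded_robots_directives(True))` produces the header pair `("X-Robots-Tag", "googlebot:noindex,indexifembedded")` for the framed URL.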

Slide 55

Slide 55 text

X-Frame-Options header: use it if you want to prevent someone from loading (and ranking for) your content in an iFrame
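A minimal sketch of the response headers involved. X-Frame-Options accepts DENY or SAMEORIGIN; the Content-Security-Policy frame-ancestors directive is the newer equivalent and is not mentioned on the slide, so treat its inclusion here as an addition. The function name `anti_framing_headers` is hypothetical.

```python
def anti_framing_headers(policy="sameorigin"):
    """Return response headers that restrict who may frame the page."""
    policies = {
        # Nobody may embed the page in a frame.
        "deny": ("DENY", "frame-ancestors 'none'"),
        # Only pages from the same origin may embed it.
        "sameorigin": ("SAMEORIGIN", "frame-ancestors 'self'"),
    }
    if policy not in policies:
        raise ValueError(f"unsupported policy: {policy}")
    x_frame_options, csp = policies[policy]
    return {
        "X-Frame-Options": x_frame_options,
        "Content-Security-Policy": csp,
    }
```

The returned dictionary can be merged into whatever header structure your server or framework uses.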

Slide 56

Slide 56 text

#5 Longform content: can pages actually become “too long”?

Slide 57

Slide 57 text

You all recall this, I presume? Source: http://pa.ag/2A5630t

Slide 58

Slide 58 text

In case you need it: Still true for desktop; for smartphone it’s fixed at ~1,700px in height, no scroll

Slide 59

Slide 59 text

GSC is really only a preview! And here’s some further “proof” of that…

Slide 60

Slide 60 text

GSC screenshot vs post-rendered (live) viewport: pushing the iFrame below 15,000 pixels, so that GSC cuts it off in its preview, still results in post-rendered content being found, just like in the first test. The GSC preview doesn’t show any text content, as this content is only shown “below” a 15k pixel div container; the GSC rendered HTML does indeed show the container and, of course, the test phrase was returned as well.

Slide 61

Slide 61 text

The “More Info” tab is really awesome – use it! It can really help with troubleshooting and debugging, so make good use of it. This is the same as/similar to your Chrome Developer Console.

Slide 62

Slide 62 text

#6 CSS selectors: ever heard of .class::before and .class::after?

Slide 63

Slide 63 text

What are CSS selectors and how do they work? ::before creates a pseudo-element that is the first child of the matched element. Source: https://pa.ag/2QRr9aH

Slide 64

Slide 64 text

Content that lives in the HTML mark-up vs content that lives in a CSS selector such as ::before

Slide 65

Slide 65 text

Again, the GSC preview shows what it would look like. Googlebot seems to treat this identically to Chrome on desktop/smartphone; the rendered DOM remains unchanged (to be expected, since ::before is a pseudo-element that lives in the CSS, not in the HTML).

Slide 66

Slide 66 text

Content from within CSS selectors won’t be indexed. Whether Googlebot renders the URL or not, the content will not be found. Content that lives in the HTML mark-up will be found and indexed, as expected. Content that lives in a CSS selector such as ::before won’t be indexed.

Slide 67

Slide 67 text

Why should you care? Maybe you have to display certain content that gets classified as “boilerplate” (e.g. shipping info) or you want to create a certain content footprint?

Slide 68

Slide 68 text

#7 User-agent client hints: API-based access to information about a user's browser – or a crawler’s features

Slide 69

Slide 69 text

The User-Agent string is messy, like, very messy. Over the decades, this string has accrued a variety of details about the client making the request, as well as cruft due to backwards compatibility: Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
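To show how much string-picking the classic approach requires, here is a small Python sketch that pulls the Chrome major version and the Googlebot token out of the UA string quoted above. The function name `parse_user_agent` is made up for this example, and a matching token alone does not prove the request really came from Google, since UA strings are trivially spoofed.

```python
import re

# The Googlebot smartphone UA string quoted on the slide.
GOOGLEBOT_SMARTPHONE_UA = (
    "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 "
    "Mobile Safari/537.36 (compatible; Googlebot/2.1; "
    "+http://www.google.com/bot.html)"
)


def parse_user_agent(user_agent):
    """Extract a few fields from a User-Agent string with regexes."""
    chrome = re.search(r"Chrome/(\d+)\.", user_agent)
    googlebot = re.search(r"Googlebot/(\d+\.\d+)", user_agent)
    return {
        "chrome_major": int(chrome.group(1)) if chrome else None,
        "googlebot_version": googlebot.group(1) if googlebot else None,
        "mobile": "Mobile" in user_agent,
    }
```

For the string above, `parse_user_agent(GOOGLEBOT_SMARTPHONE_UA)` reports Chrome major version 93, Googlebot version "2.1" and a mobile client.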

Slide 70

Slide 70 text

The UA string will be frozen, client hints to take over. User-Agent Client Hints are a new expansion to the Client Hints API and enable developers to access information about a user's browser – or a crawler’s features. Source: https://pa.ag/3AiiUaI

Slide 71

Slide 71 text

It’s never too early to start testing these things: Googlebot (running Chrome >89) apparently already populates those CH headers.
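If you want to log these headers in your own test setup, a sketch like the following could be a starting point. It reads the low-entropy hints Sec-CH-UA, Sec-CH-UA-Mobile and Sec-CH-UA-Platform from a header dictionary. The example values are illustrative and not taken from real Googlebot logs, and the function names are assumptions.

```python
import re


def parse_sec_ch_ua(value):
    """Parse a Sec-CH-UA value into a {brand: major_version} dict."""
    pairs = re.findall(r'"([^"]*)";v="([^"]*)"', value)
    return {brand: version for brand, version in pairs}


def read_client_hints(headers):
    """Read the default User-Agent Client Hints from request headers."""
    lowered = {name.lower(): value for name, value in headers.items()}
    platform = lowered.get("sec-ch-ua-platform", "").strip('"')
    return {
        "brands": parse_sec_ch_ua(lowered.get("sec-ch-ua", "")),
        # Sec-CH-UA-Mobile is a structured boolean: "?1" true, "?0" false.
        "mobile": lowered.get("sec-ch-ua-mobile") == "?1",
        "platform": platform or None,
    }


# Illustrative header values only, not captured from Googlebot.
example_headers = {
    "Sec-CH-UA": '"Chromium";v="93", " Not;A Brand";v="99"',
    "Sec-CH-UA-Mobile": "?1",
    "Sec-CH-UA-Platform": '"Android"',
}
```

`read_client_hints(example_headers)` returns the brands with their major versions, the mobile flag and the platform name, which is far less fragile than regex-matching the UA string.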

Slide 72

Slide 72 text

#8 Redirect fun: redirect chains, 301 vs 302 vs JS

Slide 73

Slide 73 text

Redirect chains are bad – avoid them! But what if you have to use them?

Slide 74

Slide 74 text

Up to 5 hops, they’ll show you the final destination: GSC shows the content from the final “destination” of a URL in a redirect chain

Slide 75

Slide 75 text

For 30x chains, GSC cuts you/the preview off after 5 hops. Behaviour seems to be in sync with Google’s statements concerning this: “In general, what happens is Googlebot will follow five 301s in a row, then if we can’t reach the destination page, then we will try again the next time.” Source: https://pa.ag/2XdvKVr
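The hop limit is easy to reason about with a tiny simulation. The sketch below follows a redirect map (a plain dictionary standing in for real 30x responses, so no network access is needed) and gives up after a configurable number of hops. The limit of five mirrors the behaviour described above; the function name and return shape are invented for illustration.

```python
def follow_redirects(start, redirects, max_hops=5):
    """Follow a chain in a {source: target} map.

    Returns (last_url, hops_taken, reached_destination).
    """
    url = start
    hops = 0
    seen = {start}
    while url in redirects:
        if hops == max_hops:
            # Hop budget used up before reaching a non-redirecting URL.
            return url, hops, False
        url = redirects[url]
        hops += 1
        if url in seen:
            # Redirect loop: stop instead of spinning forever.
            return url, hops, False
        seen.add(url)
    return url, hops, True
```

A chain of exactly five redirects still reaches its destination with `max_hops=5`, whereas a sixth redirect makes the function stop at the fifth hop and report that the destination was not reached.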

Slide 76

Slide 76 text

Using JS, you could use 10 hops & it still seems to work. However, I’m not saying using 10 redirects in a row is a great idea. They might not pass the same equity (if any) and are super sloooooow!

Slide 77

Slide 77 text

Glad you asked, yes – they’ll even index the destination! Again, this is the content from the URL after 10x JS redirects have been executed

Slide 78

Slide 78 text

Yeah, I really like to break things… GSC gave up when I tried to go for 15 hops… still, I wonder why the limit is different from server-side redirects – maybe render timeout?

Slide 79

Slide 79 text

THANK YOU! SEE YOU AT THE NEXT SMX! Care for the slides? https://pa.ag/smxtesting22 Any questions? twitter.com/peakaceag, facebook.com/peakaceag, www.pa.ag. Take your career to the next level: jobs.pa.ag