Technical SEO Testing 2022 - SMX Advanced

Sometimes, technical SEO boils down to reading tea leaves: What could Google mean by this or that statement? How does that compare to what I heard from this or that person, or to what has worked in my own experience? It’s time for hard facts. Together, we'll debunk myths, challenge your misconceptions, and soak up fresh inspiration for creating your own SEO testing playground. Bastian shows you how to pinpoint common indexing errors, examines how well Googlebot actually understands hidden/inactive content, and explains why iframes are actually pretty important for evaluating page quality.

Bastian Grimm

November 07, 2022

  1. @SPEAKERNAME/#SMX @peakaceag Technical SEO Testing 2022

    Separating fact from fiction with the Peak Ace test lab. Bastian Grimm, Peak Ace AG | @basgr
  2. One of the biggest problems in SEO?

  3. Misinformation!

  4. “I’ve heard…”

    People incorrectly citing other parties, often without any context or deeper understanding of the issues at hand
  5. Google says one thing…

    … but then actually does another (or it’s so cryptic that you can’t do much with it)
  6. Say hello to the Peak Ace SEO playground

  7. How does the setup work and what does it do?

    The flowchart: pick a case (Case A, Case B, Case C … Case N), then:
    - headerOption present? If yes, apply the headerOption ruleset:
        0: X-Robots-Tag: noindex
        1: X-Robots-Tag: noindex, nofollow
        2: Link: <https://xxx.com/>; rel="canonical"
        3: …
    - metaOption present? If yes, apply the metaOption ruleset:
        0: <meta name="robots" content="noindex, follow" />
        2: <meta name="robots" value="noindex, follow" />
        3: <meta name="robots" value="noindex, follow" content="noindex, follow" />
        4: <meta name="robots" content="noindex, follow" />
        5: <meta name="robot" content="noindex, follow" />
        6: <meta name="robots" content="noindex" /><meta name="googlebot" content="noindex" />
        7: <meta name="googlebot" content="unavailable_after: 1 Jan 1970 00:00:00 GMT" />
        8: …
    - regenerateOption present? If yes, generate unique text.
    - Page indexable? If yes, the bot indexes the page.
    - Store the visit logs in the DB; the session ends.
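
A minimal Python sketch of how one test case could be assembled from the option IDs in the flowchart above. The dictionaries hold only a subset of the rulesets, `xxx.com` stays the slide's placeholder, and all function and variable names are hypothetical; the playground's actual source code is in the GitHub repository linked on slide 15 and may be structured quite differently.

```python
import secrets

# Subset of the rulesets listed on the slide. Option IDs follow the slide;
# the data layout itself is an assumption, not the playground's real schema.
HEADER_OPTIONS = {
    0: [("X-Robots-Tag", "noindex")],
    1: [("X-Robots-Tag", "noindex, nofollow")],
    2: [("Link", '<https://xxx.com/>; rel="canonical"')],
}

META_OPTIONS = {
    0: '<meta name="robots" content="noindex, follow" />',
    2: '<meta name="robots" value="noindex, follow" />',
    3: '<meta name="robots" value="noindex, follow" content="noindex, follow" />',
    5: '<meta name="robot" content="noindex, follow" />',
    6: '<meta name="robots" content="noindex" /><meta name="googlebot" content="noindex" />',
    7: '<meta name="googlebot" content="unavailable_after: 1 Jan 1970 00:00:00 GMT" />',
}


def unique_phrase(prefix="pa"):
    """Random token that is very unlikely to exist anywhere else on the web."""
    return f"{prefix}{secrets.token_hex(8)}"


def build_test_case(header_option=None, meta_option=None, phrase=None):
    """Return (headers, html, phrase) for a single test URL.

    Without an explicit phrase, a fresh one is generated (the "regenerate"
    step), so a search for it should only ever return this test page.
    """
    phrase = phrase or unique_phrase()
    headers = [("Content-Type", "text/html; charset=utf-8")]
    if header_option is not None:
        headers += HEADER_OPTIONS[header_option]
    head = META_OPTIONS[meta_option] if meta_option is not None else ""
    html = (
        "<!DOCTYPE html>\n<html><head><title>Test case</title>"
        f"{head}</head><body><p>{phrase}</p></body></html>"
    )
    return headers, html, phrase
```

For example, `build_test_case(header_option=1, meta_option=5)` yields a page sending `X-Robots-Tag: noindex, nofollow` plus the misspelled `robot` meta tag.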
  8. Yeah… right! Let’s make this a bit more hands-on…

    Essentially, it’s a mini “SEO CMS”:
  9.

  10. Test-specific mark-up/directives in <head>, e.g. JS, meta or canonical tags
  11. The actual URL that serves the content – especially interesting for redirects, etc.
  12. Unique content, in different languages, to test the actual indexing of a page
  13. A JS-based tracker, using feature detection to log Googlebot requests
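
The playground's tracker runs in JavaScript and relies on feature detection. As a complementary server-side sketch (a different technique, swapped in here for illustration), Googlebot hits can be filtered from an access log by the `Googlebot/x.y` user-agent token. This assumes the combined log format. User-agent strings can be spoofed, so Google recommends verifying genuine Googlebot requests via a reverse DNS lookup.

```python
import re

# The "Googlebot/x.y" token found in Googlebot's user-agent strings.
GOOGLEBOT_TOKEN = re.compile(r"Googlebot/(\d+\.\d+)")
# The request part of a combined-log-format line, e.g. "GET /path HTTP/1.1".
REQUEST = re.compile(r'"(?:GET|HEAD|POST) (\S+) HTTP/[\d.]+"')


def googlebot_hits(log_lines):
    """Yield (path, googlebot_version) for every log line claiming to be Googlebot."""
    for line in log_lines:
        request = REQUEST.search(line)
        agent = GOOGLEBOT_TOKEN.search(line)
        if request and agent:
            yield request.group(1), agent.group(1)
```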
  14. A couple of things you can do with this

    - Set up new HTML documents/tests with the click of a button
    - Add an unlimited number of server-side headers, such as X-Robots-Tag, canonicals, hreflang, redirects, caching, etc.
    - Add elements to the document <head>, for example meta robots, canonical or <script> tags to run JS
    - Add unique content to the page, depending on the language you want to test for (sometimes, content generation has a valid use case)
    - Add any type of HTML to the <body>/DOM
    - Integrated bot tracking (JS for evergreen Googlebot + non-JS) by default
    - Automatically generate output using standard tags (e.g. <iframe>) as well as JavaScript (to ensure rendering is in play)
    - And lots more…
  15. Sound good?

    Interested in the slide deck and/or the GitHub repository (including all source code)? > https://pa.ag/smxtesting22 (all free!)
  16. Context matters

    Old domain vs new domain, mobile-first indexing vs non-mobile-first indexing, etc.
  17. Warning: Draw your own conclusions!

    Isolated “SEO testing” is next to impossible; be aware that other (external) signals you can’t control may be at play.
  18. #1 Indexing: Robots meta & X-Robots tags

  19. Anything wrong with this?

    I looked at this client website the other day and something felt off… <meta name="robots" value="noindex, follow" />
  20. It needs to be “content” instead of “value”!

    Using the “value” attribute is actually invalid according to the W3C HTML specification; the correct mark-up is: <meta name="robots" content="noindex, follow" />
  21. Interestingly enough, Google doesn’t seem to care

    Google also utilises the invalid “value” attribute to manage indexing: <meta name="robots" value="noindex, follow" />
  22. What if you combined “value” and “content” attributes?

    Google prefers the valid attribute over the invalid one and takes “content” in this instance: <meta name="robots" value="noindex, follow" content="index, follow" />
  23. What if you change the attribute order?

    Order doesn’t matter – Google still takes the “content” attribute: <meta name="robots" content="index, follow" value="noindex, follow" />
  24. Going down the rabbit hole…

    So, what about this one? No… this can’t work, can it? <meta name="robot" content="noindex, follow" />
  25. Google internally corrects “robot” to “robots”

    To control indexing, Google also considers the invalid “robot” value: <meta name="robot" content="noindex, follow" />
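
A small Python model of the parsing behaviour observed in slides 21–25: the `content` attribute wins over the invalid `value` attribute regardless of attribute order, and `name="robot"` is treated as `robots`. This sketches the test results under those assumptions; it is not Google's implementation, and the class and function names are made up.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect robots meta directives per agent name ("robots" or "googlebot")."""

    # Names treated as the generic robots agent; "robot" was honoured in the tests.
    GENERIC_NAMES = {"robots", "robot"}

    def __init__(self):
        super().__init__()
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)  # HTMLParser already lower-cases attribute names
        name = (attributes.get("name") or "").strip().lower()
        if name in self.GENERIC_NAMES:
            name = "robots"
        elif name != "googlebot":
            return
        # "content" takes precedence over the invalid "value", whatever the order.
        raw = attributes.get("content", attributes.get("value"))
        if raw is not None:
            self.directives[name] = [d.strip().lower() for d in raw.split(",")]


def parse_robots_meta(html):
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives
```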
  26. What’s Google supposed to do with this one?

    Noindex (because it’s more restrictive) or index (because of the more precise UA)? <meta name="robots" content="noindex" /> <meta name="googlebot" content="index" />
  27. Google considers the most specific user agent directive

    It’s no surprise; this approach hasn’t changed for years: <meta name="robots" content="noindex" /> <meta name="googlebot" content="index" />
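
Modelled as code, the result above means a crawler-specific tag replaces the generic one instead of being merged with it. This is a sketch reflecting this test's outcome only; the input format (agent name mapped to a list of directives) and the function names are assumptions.

```python
def effective_meta_directives(directives_by_agent, crawler="googlebot"):
    """Directives a crawler obeys, per the observed result: a meta tag naming
    the crawler wins over the generic "robots" tag, even if the generic tag
    is more restrictive."""
    if crawler in directives_by_agent:
        return directives_by_agent[crawler]
    return directives_by_agent.get("robots", [])


def meta_allows_indexing(directives):
    # "none" is shorthand for "noindex, nofollow".
    return not {d.lower() for d in directives} & {"noindex", "none"}
```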
  28. But, what if…

    … you added an “X-Robots-Tag: noindex” header into the mix?
  29. Header and meta robots directives combined

  30. Header noindex vs meta robots index (for Googlebot)

    The generic X-Robots-Tag (no specific UA) overrides the more specific robots meta tag for “Googlebot”: <meta name="robots" content="noindex" /> <meta name="googlebot" content="index" /> + X-Robots-Tag: noindex. I found this quite surprising, since the Googlebot directive should take precedence; it appears that the header and meta indexing pipelines are somewhat separate?
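
The observed precedence can be sketched as two separate checks, with the header evaluated on its own before the meta tags are consulted. This is a hedged model of one test result with hypothetical function names; user-agent-prefixed header values (such as `googlebot: noindex`) are deliberately left out.

```python
def header_directives(x_robots_values):
    """Split generic (un-prefixed) X-Robots-Tag header values into directives."""
    return {
        d.strip().lower()
        for value in x_robots_values
        for d in value.split(",")
        if d.strip()
    }


def page_indexable(x_robots_values, googlebot_meta_directives):
    """Observed precedence: a generic header noindex beats a Googlebot-specific
    meta "index", so the header is checked first and independently."""
    blocking = {"noindex", "none"}
    if header_directives(x_robots_values) & blocking:
        return False
    return not {d.lower() for d in googlebot_meta_directives} & blocking
```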
  31. Of course, GSC also takes rendering into account!

    For example, adding directives using JS works as expected (but only in <head>)
  32. #2 Web Components Custom elements, Shadow DOM &

  33. There seems to be a fair amount of confusion

    A typical “SEO answer” about web components often looks like this: Look, I’m not really sure how Googlebot deals with custom HTML elements – better to play it safe and rely on good old, standardised HTML only…
  34. No idea what Web Components is?

    Web Components is a suite of different technologies allowing you to create reusable custom elements, with their functionality encapsulated away from the rest of your code, and utilise them in your web apps. Source: http://webcomponents.github.io
  35. In this example, we define <custom-component>, our very own HTML element

    The component is generated using the shadow DOM.
  36. No issues using Web Components whatsoever!

    Content that lives in a web component, such as a custom HTML element, will be indexed properly. Essentially, it is flattened into the main HTML: both the content created in an element that is part of the shadow DOM and the content that is part of the <custom-component> itself.
  37. #3 CSS content-visibility

    Enables the user agent to skip an element's rendering work
  38. content-visibility, a new CSS property to boost rendering

    content-visibility enables the user agent to skip an element's rendering work, including layout & painting, until it is needed, and therefore makes the initial load much faster! Source: https://pa.ag/2Wxn399
  39. Content exists in the HTML mark-up but is set to “content-visibility:hidden”

    As a result, the content will not be rendered or displayed at all.
  40. Whether it’s “auto” or “hidden”, the content will be found

    Even though these elements are skipped during rendering due to their content-visibility settings, the URL is returned for both test phrases (content-visibility:hidden and content-visibility:auto).
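
A sketch of how a content-visibility test page could be generated, with one unique phrase per property value so that each can be searched for separately. The markup, the phrase format and the function name are assumptions, not the playground's actual template.

```python
import secrets


def content_visibility_test_page():
    """Return (html, phrases) with one unique phrase per content-visibility value."""
    phrases = {mode: f"cv{mode}{secrets.token_hex(6)}" for mode in ("hidden", "auto")}
    sections = "\n".join(
        f'<section style="content-visibility: {mode}"><p>{phrase}</p></section>'
        for mode, phrase in phrases.items()
    )
    html = (
        "<!DOCTYPE html>\n<html><head><title>content-visibility test</title></head>"
        f"<body>\n{sections}\n</body></html>"
    )
    return html, phrases
```

Searching for each phrase afterwards tells you whether the corresponding section made it into the index.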
  41. #4 iFrames

    Including content from a second URL in a parent URL
  42. According to BuiltWith (top 1m), iFrames are still a thing

    Source: https://pa.ag/2l8qDaN
  43. Revisited: parent URL + iFrame

    Parent page: area in the yellow square; iFramed content (from a second URL): within the red highlighted square. <iframe src="URL"></iframe>
  44. It appears that regular iFrames are dangerous these days

    iFrame content will be attributed to its parent URL post-render; the parent page can now be found for content from within the iFrame. The phrase searched for is taken from within the iFrame, not from the parent URL.
  45. Post-render, the parent page can now be found for content within the iFrame

    To put it simply: this URL… can now rank for content from this second URL!
  46. Page-level quality?

    What about all that third-party content people feed in?
  47. Still not convinced? We ran some follow-up tests, because: links!
  48. Added two additional links (1 internal, 1 external) to the iFrame URL
  49. Naturally, the rendered HTML in GSC displays the links

    Again, they’re flattened into the DOM of the parent URL
  50. GSC’s “Top linked pages” report is also helpful

    The parent URL appears as the linking page for bastiangrimm.com, even though this URL doesn’t have any links in its HTML mark-up.
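
A toy Python model of the flattening seen in slides 44–50: each `<iframe>` is replaced by the `<body>` of the framed document, so both its text and its links end up attributed to the parent. The regex approach and the `fetch` callable are simplifications for illustration only; a real renderer builds a DOM rather than splicing strings.

```python
import re

# Matches an empty <iframe ... src="..."></iframe> element (toy pattern).
IFRAME = re.compile(r'<iframe\b[^>]*\bsrc="([^"]+)"[^>]*>\s*</iframe>', re.IGNORECASE)
BODY = re.compile(r"<body[^>]*>(.*)</body>", re.IGNORECASE | re.DOTALL)


def flatten_iframes(parent_html, fetch):
    """Inline every framed document's <body> into the parent HTML.

    fetch is a hypothetical callable that maps a URL to that URL's HTML.
    """
    def inline(match):
        child_html = fetch(match.group(1))
        body = BODY.search(child_html)
        return body.group(1) if body else child_html

    return IFRAME.sub(inline, parent_html)
```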
  51. So what can you do?

  52. Of course, you can noindex the frame content or block it via robots.txt

    If you do, auto-generated meta descriptions will lack any iFrame content, and the GSC rendered HTML doesn’t show the inlined content (from the iFrame) either.
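
To check that a robots.txt rule actually covers the framed URLs, Python's standard `urllib.robotparser` can evaluate it offline. The `/embed/` path, the rules and the helper name are hypothetical examples.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt on the host that serves the framed content.
ROBOTS_TXT = """\
User-agent: *
Disallow: /embed/
"""


def crawler_may_fetch(url, user_agent="Googlebot", robots_txt=ROBOTS_TXT):
    """Evaluate robots.txt rules locally, without any network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```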
  53. Content from/in “hidden” frames won’t be indexed either!

    Similar to noindexed frames, the frame content doesn’t show up in the auto-generated meta description in the SERPs. If the <iframe> tag uses display:none, its content is not inlined into the rendered DOM; the same happens when “hidden” is applied via JS.
  54. What about this new robots tag?

    In early 2022, Google released “indexifembedded” specifically for iFrame usage. Source: https://pa.ag/3KGvMfN
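
A minimal WSGI sketch of serving framed content with the new directive. Per Google's documentation, `indexifembedded` only takes effect in combination with `noindex`; the equivalent meta tag is `<meta name="googlebot" content="noindex, indexifembedded" />`. The app and its content are hypothetical.

```python
def embed_only_app(environ, start_response):
    """Hypothetical widget URL: not indexed on its own, but its content may be
    indexed when embedded in another page via an iFrame."""
    body = b"<!DOCTYPE html><html><body><p>Widget content</p></body></html>"
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # indexifembedded is ignored unless noindex is present as well.
        ("X-Robots-Tag", "googlebot: noindex, indexifembedded"),
    ])
    return [body]
```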
  55. X-Frame-Options header

    If you want to prevent someone from loading (and ranking for) your content in an iFrame
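
A sketch of WSGI middleware that adds the header to every response. `DENY` blocks all framing, while `SAMEORIGIN` still allows your own pages to frame the content. That Googlebot's renderer honours the header the way Chrome does is an assumption based on the slide's recommendation, not something verified here; the middleware name is hypothetical.

```python
def deny_framing(app, policy="SAMEORIGIN"):
    """Wrap a WSGI app so each response carries X-Frame-Options.

    policy is "DENY" (no framing at all) or "SAMEORIGIN" (same-origin only).
    """
    def wrapped(environ, start_response):
        def start(status, headers, exc_info=None):
            # Replace any X-Frame-Options the inner app may have set.
            headers = [(k, v) for k, v in headers if k.lower() != "x-frame-options"]
            headers.append(("X-Frame-Options", policy))
            return start_response(status, headers, exc_info)

        return app(environ, start)

    return wrapped
```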
  56. #5 Longform content Can pages actually become “too

  57. You all recall this, I presume? Source: http://pa.ag/2A5630t

  58. In case you need it:

    Still true for desktop; for smartphone, it’s fixed at ~1,700px in height, with no scrolling
  59. GSC is really only a preview!

    And here’s some further “proof” of that…
  60. GSC screenshot vs post-rendered (live) viewport

    Pushing the iFrame below 15,000 pixels, so that GSC cuts it off in its preview, still results in the post-rendered content being found, just like in the first test. The GSC preview doesn’t show any text content, since this content only appears “below” a 15k-pixel div container; the GSC rendered HTML does indeed show the container, and of course, the test phrase was returned as well.
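
A sketch of the parent page used for this kind of test: a 15,000px spacer pushes the `<iframe>` out of the GSC screenshot. `frame_url` is a placeholder and the markup is assumed, not taken from the playground.

```python
def below_cutoff_page(frame_url, offset_px=15000):
    """Parent page whose iFrame starts below a tall spacer div."""
    return (
        "<!DOCTYPE html>\n<html><head><title>Long page test</title></head><body>"
        f'<div style="height: {offset_px}px"></div>'
        f'<iframe src="{frame_url}"></iframe>'
        "</body></html>"
    )
```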
  61. The “More Info” tab is really awesome – use it!

    It can really help with troubleshooting and debugging, so make good use of it. It’s the same as, or similar to, your Chrome Developer Console.
  62. #6 CSS selectors Ever heard of .class::before and

  63. What are CSS selectors and how do they work?

    ::before creates a pseudo-element that is the first child of the matched element. Source: https://pa.ag/2QRr9aH
  64. Content that lives in the HTML mark-up vs content that lives in a CSS selector such as ::before
  65. Again, the GSC preview shows what it would look like

    Googlebot seems to treat this identically to Chrome on desktop/smartphone; the rendered DOM remains unchanged (to be expected, since it’s a pseudo-element).
  66. Content from within CSS selectors won’t be indexed

    Whether Googlebot renders the URL or not, the content will not be found. Content that lives in the HTML mark-up will be found and indexed, as expected; content that lives in a CSS selector such as ::before won’t be indexed.
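
The reason is visible even without rendering: text set via `content:` on `::before` never becomes a DOM text node. This standard-library sketch collects DOM text while skipping `<style>` and `<script>`; it is a simplified stand-in for the rendered-DOM check, not a reproduction of Googlebot, and the names are hypothetical.

```python
from html.parser import HTMLParser


class DomText(HTMLParser):
    """Collect text nodes, ignoring the bodies of <style> and <script>."""

    SKIP = {"style", "script"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def dom_text(html):
    parser = DomText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```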
  67. Why should you care?

    Maybe you have to display certain content that gets classified as “boilerplate” (e.g. shipping info), or you want to create a certain content footprint?
  68. #7 User-agent client hints

    API-based access to information about a user's browser, or a crawler’s features
  69. The User-Agent string is messy, like, very messy

    Over the decades, this string has accrued a variety of details about the client making the request, as well as cruft due to backwards compatibility: Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
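
To show how much guesswork the legacy string forces on servers, here is a Python sketch that pulls the Chrome version, the Googlebot version and the mobile flag out of the example string above using substring heuristics. The regexes are assumptions that work for this string, not a general UA parser.

```python
import re


def parse_googlebot_ua(ua):
    """Extract a few fields from a Googlebot UA string via heuristics."""
    chrome = re.search(r"Chrome/([\d.]+)", ua)
    bot = re.search(r"Googlebot/([\d.]+)", ua)
    return {
        "chrome": chrome.group(1) if chrome else None,
        "googlebot": bot.group(1) if bot else None,
        "mobile": " Mobile " in ua,
    }
```

Every field here depends on token positions and spelling conventions that no specification guarantees, which is exactly the problem client hints address.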
  70. The UA string will be frozen; client hints will take over

    User-Agent Client Hints are a new expansion to the Client Hints API and enable developers to access information about a user's browser, or a crawler’s features. Source: https://pa.ag/3AiiUaI
  71. It’s never too early to start testing these things

    Googlebot (running Chrome >89) apparently already populates those CH headers:
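
By contrast, client hints arrive as structured headers. This sketch parses a `Sec-CH-UA` brand list and shows an `Accept-CH` response header requesting additional high-entropy hints. The sample header value is illustrative, not one captured from Googlebot, and the simple regex ignores edge cases that a full Structured Field Values parser would handle.

```python
import re

# One brand entry, e.g.: "Google Chrome";v="104"
BRAND = re.compile(r'"([^"]*)"\s*;\s*v="([^"]*)"')

# Response header asking the client to send these hints on subsequent requests.
ACCEPT_CH = (
    "Accept-CH",
    "Sec-CH-UA-Full-Version-List, Sec-CH-UA-Platform-Version, Sec-CH-UA-Model",
)


def parse_sec_ch_ua(value):
    """Return {brand: version} from a Sec-CH-UA header value."""
    return {brand.strip(): version for brand, version in BRAND.findall(value)}
```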
  72. #8 Redirect fun

    Redirect chains: 301 vs 302 vs JS
  73. Redirect chains are bad – avoid them!

    But what if you have to use them?
  74. Up to 5 hops, they’ll show you the final destination

    GSC shows the content from the final “destination” of a URL in a redirect chain
  75. For 30x chains, GSC cuts you (or rather, the preview) off after 5 hops

    This behaviour seems to be in sync with Google’s statements on the matter: “In general, what happens is Googlebot will follow five 301s in a row, then if we can’t reach the destination page, then we will try again the next time.” Source: https://pa.ag/2XdvKVr
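
The five-hop behaviour can be modelled as a bounded loop over a redirect map. A minimal sketch: the `redirects` dictionary stands in for real HTTP responses, and the loop detection is an addition for safety rather than something the slides describe.

```python
def resolve_redirects(url, redirects, max_hops=5):
    """Follow server-side redirects for at most max_hops hops.

    redirects maps a URL to its Location target. Returns the final URL, or
    None if the destination isn't reached within max_hops or a loop occurs
    (per the quote, Googlebot would then retry on a later crawl).
    """
    seen = {url}
    for _ in range(max_hops):
        if url not in redirects:
            return url  # no further redirect: destination reached
        url = redirects[url]
        if url in seen:
            return None  # redirect loop
        seen.add(url)
    return None if url in redirects else url
```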
  76. Using JS, you could use 10 hops & it still seems to work

    However, I’m not saying using 10 redirects in a row is a great idea. They might not pass the same equity (if any) and are super sloooooow!
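
A sketch of how a JavaScript redirect chain like the 10-hop test could be generated. The paths are hypothetical, and `window.location.replace()` is just one way to redirect client-side; the slides don't say which method the test used.

```python
def js_redirect_chain(hops, prefix="/js-hop-"):
    """Return {path: html} for a chain of `hops` client-side redirects.

    Each intermediate page only redirects to the next one via JS; the last
    page carries the destination content.
    """
    pages = {}
    for i in range(hops):
        target = f"{prefix}{i + 1}"
        pages[f"{prefix}{i}"] = (
            "<!DOCTYPE html><html><head>"
            f'<script>window.location.replace("{target}");</script>'
            "</head><body></body></html>"
        )
    pages[f"{prefix}{hops}"] = (
        "<!DOCTYPE html><html><body><p>Destination content</p></body></html>"
    )
    return pages
```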
  77. Glad you asked: yes, they’ll even index the destination!

    Again, this is the content from the URL after 10 JS redirects have been executed
  78. Yeah, I really like to break things…

    GSC gave up when I tried to go for 15 hops… still, I wonder why the limit differs from the one for server-side redirects. Maybe a render timeout?
  79. THANK YOU! SEE YOU AT THE NEXT SMX!

    Care for the slides? https://pa.ag/smxtesting22 | Any questions? Email us! | Take your career to the next level: jobs.pa.ag | twitter.com/peakaceag | facebook.com/peakaceag | www.pa.ag