Technical SEO Testing 2022 - SMX Advanced

Sometimes, technical SEO boils down to reading tea leaves: What could Google mean by this or that statement? How does that compare to what I heard from this or that person, or to what has worked in my own experience? It’s time for hard facts. Together, we'll debunk myths, challenge your misconceptions, and soak up fresh inspiration for creating your own SEO testing playground. Bastian shows you how to pinpoint common indexing errors, examines how well Googlebot actually understands hidden/inactive content, and explains why iframes are actually pretty important when evaluating page quality.

Bastian Grimm

November 07, 2022

Transcript

  1. @SPEAKERNAME/#SMX
    @peakaceag
    Technical SEO Testing 2022
    Bastian Grimm, Peak Ace AG | @basgr
    Separating fact from fiction with the Peak Ace test lab

  2. One of the biggest problems in SEO?

  3. Misinformation!

  4. “I’ve heard…”
    People incorrectly citing other parties, often without any context or deeper understanding of the issues at hand

  5. Google says one thing…
    … but then actually does another (or it’s so cryptic that you can’t do much with it)

  6. Say hello to the Peak Ace SEO playground

  7. How does the setup work and what does it do?
    Flowchart:
    Pick a case: Case A, Case B, Case C … Case N
    headerOption present? If yes, apply the headerOption ruleset:
      0 X-Robots-Tag: noindex
      1 X-Robots-Tag: noindex, nofollow
      2 Link: <https://xxx.com/>; rel="canonical"
      3 …
    Is there a metaOption? If yes, apply the metaOption ruleset (meta tag variants 0 to 8, e.g. … content="noindex, follow" /> and … 1 Jan 1970 00:00:00 GMT" />)
    regenerateOption present? If yes, generate unique text
    Page indexable? If yes, the bot indexes the page
    Store visit logs in DB
    Session ends
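
The header branch of this flowchart can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the playground’s actual code: `PlaygroundCase`, `HEADER_RULESET` and `build_response` are hypothetical names, only the three legible header rules are modelled, and the “unique text” is simply a phrase derived from the case name.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, Optional

# Header rules as shown on the flowchart slide (rule "3 …" and the metaOption
# rules are not legible, so they are not modelled here).
HEADER_RULESET = {
    0: ("X-Robots-Tag", "noindex"),
    1: ("X-Robots-Tag", "noindex, nofollow"),
    2: ("Link", '<https://xxx.com/>; rel="canonical"'),
}


@dataclass
class PlaygroundCase:
    """One test page in the playground (hypothetical structure)."""

    name: str
    header_option: Optional[int] = None
    regenerate: bool = False
    headers: Dict[str, str] = field(default_factory=dict)
    body: str = ""


def build_response(case: PlaygroundCase) -> PlaygroundCase:
    """Apply the headerOption ruleset and, if requested, regenerate unique text."""
    if case.header_option is not None:
        name, value = HEADER_RULESET[case.header_option]
        case.headers[name] = value
    if case.regenerate:
        # Stand-in for "generate unique text": a phrase tied to the case name
        # that can later be searched for to check whether the page got indexed.
        case.body = f"Unique test phrase for {case.name}"
    return case
```

For example, `build_response(PlaygroundCase("case-a", header_option=1, regenerate=True))` yields a page carrying `X-Robots-Tag: noindex, nofollow` plus a searchable test phrase.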

  8. Yeah… right!
    Let’s make this a bit more hands-on… essentially, it’s a mini “SEO CMS”:

  9. (No text on this slide)

  10. Test-specific mark-up/directives in the <head>, e.g. JS, meta or canonical tags

  11. The actual URL that serves the content – especially interesting for redirects, etc.

  12. Unique content, in different languages, to test the actual indexing of a page

  13. A JS-based tracker, using feature detection to log Googlebot requests

  14. A couple of things you can do with this
    Set up new HTML documents/tests with the click of a button
    Add an unlimited number of server-side headers, such as X-Robots-Tag, canonicals, hreflang, redirects, caching, etc.
    Add elements to the document <head>, for example meta robots, canonical or <script> tags to run JS
    Add unique content to the page, depending on the language you want to test for (sometimes, content generation has a valid use case)
    Add any type of HTML to the <body> / DOM
    Integrated bot tracking (JS for evergreen Googlebot + non-JS) by default
    Automatically generate output by using standard tags (e.g. <iframe>) as well as JavaScript (to ensure rendering is in play)
    And lots more…

  15. Sound good?
    Interested in the slide deck and/or the GitHub repository (including all source code)? > https://pa.ag/smxtesting22 (all free!)

  16. Context matters
    Old domain vs new domain, mobile-first indexing vs non-mobile-first indexing, etc.

  17. Warning: Draw your own conclusions!
    Isolated “SEO testing” is next to impossible; be aware that there may be other (external) signals at play that you can’t control

  18. #1 Indexing
    Robots meta tags & X-Robots-Tag headers

  19. Anything wrong with this?
    I looked at this client website the other day and something felt off…

  20. It needs to be “content” instead of “value”!
    Using the “value” attribute on a meta element is actually invalid according to the W3C HTML specification:
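
If you want to catch this on your own sites, a small linter helps. A minimal sketch using Python’s built-in `html.parser`; `MetaRobotsLinter` and `lint` are hypothetical names, and it only inspects `<meta name="robots">` and `<meta name="googlebot">` tags:

```python
from __future__ import annotations

from html.parser import HTMLParser


class MetaRobotsLinter(HTMLParser):
    """Collect problems with robots-style <meta> tags in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.problems: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags (<meta ... />) also end up here, because the
        # default handle_startendtag() calls handle_starttag().
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = (attr_map.get("name") or "").lower()
        if name not in {"robots", "googlebot"}:
            return
        if "value" in attr_map:
            self.problems.append(f'name="{name}" uses invalid "value" attribute')
        if "content" not in attr_map:
            self.problems.append(f'name="{name}" is missing the "content" attribute')


def lint(html: str) -> list[str]:
    linter = MetaRobotsLinter()
    linter.feed(html)
    return linter.problems
```

Running `lint('<meta name="robots" value="noindex" />')` reports both the invalid attribute and the missing `content` attribute.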

  21. Interestingly enough, Google doesn’t seem to care
    Google also utilises the invalid “value” attribute to manage indexing:

  22. What if you combine “value” and “content” attributes?
    Google prefers the valid attribute over the invalid one and takes “content” in this instance:

  23. What if you change the element order?
    Order doesn’t matter – Google still takes the “content” attribute:

  24. Going down the rabbit hole…
    So, what about this one? – No… this can’t work, can it?

  25. Google internally corrects “robot” to “robots”
    To control indexing, Google also considers the invalid “robot” value:
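
The behaviour observed in slides 21 to 25 boils down to a reader that prefers “content”, falls back to “value”, ignores attribute order and treats “robot” as “robots”. This sketch only mimics those test results; it is not Google’s parser, and `RobotsMetaReader` / `read_robots_meta` are hypothetical names:

```python
from __future__ import annotations

from html.parser import HTMLParser

# Name normalisation observed in the test: "robot" is read as "robots".
NAME_ALIASES = {"robot": "robots"}


class RobotsMetaReader(HTMLParser):
    """Collect robots directives per meta name, mimicking the observed behaviour."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: dict[str, set[str]] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)  # attribute order plays no role from here on
        name = (attr_map.get("name") or "").strip().lower()
        name = NAME_ALIASES.get(name, name)
        if name not in {"robots", "googlebot"}:
            return
        # "content" wins when present; the invalid "value" is only a fallback.
        raw = attr_map.get("content")
        if raw is None:
            raw = attr_map.get("value") or ""
        tokens = {t.strip().lower() for t in raw.split(",") if t.strip()}
        self.directives.setdefault(name, set()).update(tokens)


def read_robots_meta(html: str) -> dict[str, set[str]]:
    reader = RobotsMetaReader()
    reader.feed(html)
    return reader.directives
```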

  26. What’s Google supposed to do with this one?
    Noindex (because it’s more restrictive) or index (because of the more precise UA)?

  27. Google considers the most specific user agent directive
    It’s no surprise; this approach hasn’t changed for years:

  28. But what if…
    … you added an “X-Robots-Tag: noindex” header into the mix?

  29. Header and meta robots directives combined

  30. Header noindex vs meta robots index (for Googlebot)
    The generic X-Robots-Tag (no specific UA) overrides the more specific robots meta tag for “Googlebot”:
    meta robots “index” for Googlebot + X-Robots-Tag: noindex
    I found this quite surprising, since the Googlebot directive should supersede it; could it be that header and meta directives are processed in somewhat separate indexing pipelines?
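
Put together, the precedence we observed (most specific meta tag first, but a generic X-Robots-Tag noindex beats a Googlebot-specific “index”) could be modelled like this. A hypothetical sketch based only on these test results, not on documented Google behaviour; UA-prefixed header values such as `googlebot: noindex` are deliberately not parsed:

```python
from __future__ import annotations

NOINDEX_TOKENS = {"noindex", "none"}  # "none" is shorthand for "noindex, nofollow"


def _tokens(value: str) -> set[str]:
    return {t.strip().lower() for t in value.split(",") if t.strip()}


def effective_index_state(meta: dict[str, set[str]], x_robots_tag: str | None = None) -> str:
    """Return "index" or "noindex" for Googlebot, following the observed results.

    1. A generic X-Robots-Tag containing noindex wins, even over
       <meta name="googlebot" content="index">. UA-prefixed values such as
       "googlebot: noindex" are not parsed and are therefore ignored here.
    2. Otherwise the most specific meta tag applies: "googlebot" before "robots".
    3. With no directive at all, the page is indexable.
    """
    if x_robots_tag and _tokens(x_robots_tag) & NOINDEX_TOKENS:
        return "noindex"
    for name in ("googlebot", "robots"):
        tokens = meta.get(name)
        if tokens:
            return "noindex" if set(tokens) & NOINDEX_TOKENS else "index"
    return "index"
```

The meta dictionary has the same shape as the output of a robots meta reader (name mapped to a set of directives), so the two steps can be chained.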

  31. Of course, GSC also takes rendering into account!
    For example: adding directives using JS works as expected (but only in the <head>)

  32. #2 Web Components
    Custom elements, Shadow DOM & more

  33. There seems to be a fair amount of confusion
    A typical “SEO answer” about web components often looks like this:
    “Look, I’m not really sure how Googlebot deals with custom HTML elements – better to play it safe and rely on good old, standardised HTML only…”

  34. No idea what Web Components is?
    Web Components is a suite of different technologies allowing you to create reusable custom elements – with their functionality encapsulated away from the rest of your code – and utilise them in your web apps.
    Source: http://webcomponents.github.io

  35. In this example, we define component>, our very own HTML element.
    Here, we’re generating a component using the shadow DOM.

  36. No issues using Web Components whatsoever!
    Content which lives in a web component, such as a custom HTML element, will be indexed properly. Essentially, it will be flattened into the main HTML:
    Content is created in an element which is part of the Shadow DOM
    Content which is part of the

  37. #3 CSS Content-Visibility
    Enables the user agent to skip an element's rendering work

  38. content-visibility, a new CSS property to boost rendering performance
    content-visibility enables the user agent to skip an element's rendering work, including layout & painting, until it is needed, and therefore makes the initial load much faster!
    Source: https://pa.ag/2Wxn399

  39. Content exists in the HTML mark-up but is set to “content-visibility:hidden”
    As a result, the content will not be rendered or displayed at all.

  40. Whether it’s “auto” or “hidden”, the content will be found
    Even though these elements are skipped during rendering due to their content-visibility settings, the URL is returned for both test phrases:
    content-visibility:hidden
    content-visibility:auto

  41. #4 iFrames
    Including content from a second URL into a parent URL

  42. According to BuiltWith (top 1m), iFrames are still a thing:
    Source: https://pa.ag/2l8qDaN

  43. Revisited: parent URL + iFrame
    Parent page: the area in the yellow square
    iFramed content (from a 2nd URL): within the red highlighted square

  44. It appears that regular iFrames are dangerous these days
    iFrame content will be attributed to its parent URL post-render; the parent page can now be found for content from within the iFrame:
    This phrase originates from within the iFrame, not from the parent URL

  45. Post-render, the parent page can now be found for content within the iFrame:
    To make it simple: this URL… … can now rank for content from this 2nd URL!

  46. Page-level quality?
    What about all that 3rd party content people pull in?
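
A first step towards answering that question is knowing which third-party iFrames your pages pull in. A minimal sketch, assuming you already have the page HTML; `IframeCollector` and `third_party_iframes` are hypothetical helpers, and “third party” here simply means a different hostname (so subdomains count as third party too):

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class IframeCollector(HTMLParser):
    """Collect the absolute src URLs of all <iframe> elements."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.sources: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:  # iFrames without a src (e.g. filled via JS) are skipped
                self.sources.append(urljoin(self.base_url, src))


def third_party_iframes(html: str, page_url: str) -> list[str]:
    """Return iFrame URLs whose hostname differs from the page's own hostname."""
    collector = IframeCollector(page_url)
    collector.feed(html)
    page_host = urlparse(page_url).hostname
    return [src for src in collector.sources if urlparse(src).hostname != page_host]
```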

  47. Still not convinced?
    We ran some follow-up tests, because: links!

  48. Added two additional links (1 internal, 1 external) to the iFrame URL

  49. Naturally, the rendered HTML in GSC displays the links:
    Again, they’re flattened into the DOM of the parent URL

  50. GSC’s “Top linked pages” report is also helpful
    The parent URL appears as the linking page for bastiangrimm.com – however, this URL doesn’t have any links in its HTML mark-up.

  51. So what can you do?

  52. Of course, you can noindex the frame content or block it via robots.txt
    If you do, auto-generated meta descriptions will lack any iFrame content, and the rendered HTML in GSC doesn’t show the inlined content (from the iFrame) either:

  53. Content from/in “hidden” frames won’t be indexed either!
    Similar to noindexed frames, the iFrame content does not appear in the meta descriptions shown in the SERPs
    The iFrame tag uses a display:none declaration; the content is not inlined into the rendered DOM
    No inlining into the rendered DOM due to “hidden” applied via JS

  54. What about this new robots tag?
    In early 2022, Google released “indexifembedded” specifically for content embedded via iFrames:
    Source: https://pa.ag/3KGvMfN
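
`indexifembedded` only has an effect when it is combined with `noindex`: the embedded URL itself stays out of the index, while Google may index its content as part of the page that embeds it. A small sketch for building and sanity-checking such a header; `embed_only_headers` and `indexifembedded_is_effective` are hypothetical names:

```python
from __future__ import annotations


def _tokens(value: str) -> set[str]:
    return {t.strip().lower() for t in value.split(",") if t.strip()}


def embed_only_headers(index_when_embedded: bool = True) -> dict[str, str]:
    """Headers for a URL that should stay out of the index on its own.

    With index_when_embedded=True, "indexifembedded" is added so Google may
    still index the content as part of a page that embeds this URL.
    """
    directives = ["noindex"]
    if index_when_embedded:
        directives.append("indexifembedded")
    return {"X-Robots-Tag": ", ".join(directives)}


def indexifembedded_is_effective(x_robots_tag: str) -> bool:
    """indexifembedded only does something when noindex is present as well."""
    tokens = _tokens(x_robots_tag)
    return "indexifembedded" in tokens and "noindex" in tokens
```

The check is useful for audits: an `X-Robots-Tag: indexifembedded` on its own is a no-op, which is easy to miss.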

  55. X-Frame-Options header
    If you want to prevent someone from loading (and ranking for) your content in an iFrame
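
To check whether a URL can be framed at all, you can inspect its response headers. A rough sketch, assuming the headers are already available as a dict; in current browsers a CSP `frame-ancestors` directive takes precedence over `X-Frame-Options`, and whether Google’s renderer honours these headers exactly like Chrome is an assumption here. `framing_policy` is a hypothetical name:

```python
from __future__ import annotations


def framing_policy(headers: dict[str, str]) -> str:
    """Classify whether a response may be loaded in a cross-origin iFrame.

    Returns "blocked", "same-origin only", "listed origins only" or "allowed".
    """
    normalized = {k.lower(): v for k, v in headers.items()}

    # CSP frame-ancestors takes precedence over X-Frame-Options when present.
    for directive in normalized.get("content-security-policy", "").split(";"):
        parts = directive.split()
        if parts and parts[0].lower() == "frame-ancestors":
            sources = [p.lower() for p in parts[1:]]
            if not sources or sources == ["'none'"]:
                return "blocked"
            if "*" in sources:
                return "allowed"
            if sources == ["'self'"]:
                return "same-origin only"
            return "listed origins only"

    xfo = normalized.get("x-frame-options", "").strip().upper()
    if xfo == "DENY":
        return "blocked"
    if xfo == "SAMEORIGIN":
        return "same-origin only"
    return "allowed"
```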

  56. #5 Longform content
    Can pages actually become “too long”?

  57. You all recall this, I presume?
    Source: http://pa.ag/2A5630t

  58. In case you need it:
    Still true for desktop; for smartphone, it’s fixed at ~1,700px in height, with no scrolling

  59. GSC is really only a preview!
    And here’s some further “proof” of that…

  60. GSC screenshot vs post-rendered (live) viewport
    Pushing the iFrame below 15,000 pixels, so that GSC cuts it off in its preview, still results in the post-rendered content being found, just like in the first test:
    GSC preview doesn’t show any text content
    This content is only shown “below” a 15k pixel div container; GSC’s rendered HTML does indeed show the container, and of course, the test phrase was returned as well.

  61. The “More Info” tab is really awesome – use it!
    It can really help with troubleshooting and debugging, so make good use of it
    This is the same as (or similar to) your Chrome Developer Console

  62. #6 CSS selectors
    Ever heard of .class::before and .class::after?

  63. What are CSS selectors and how do they work?
    ::before creates a pseudo-element that is the first child of the matched element
    Source: https://pa.ag/2QRr9aH

  64. Content that lives in the HTML mark-up
    Content that lives in a CSS selector such as ::before

  65. Again, the GSC preview shows what it would look like:
    Googlebot seems to treat this identically to Chrome on desktop/smartphone; the rendered DOM remains unchanged (to be expected, since it’s a pseudo-element):
    HTML / CSS

  66. Content from within CSS selectors won’t be indexed
    Whether Googlebot renders the URL or not, the content will not be found
    Content that lives in the HTML mark-up will be found and indexed, as expected
    Content that lives in a CSS selector such as ::before won’t be indexed.

  67. Why should you care?
    Maybe you have to display certain content that would get classified as “boilerplate” (e.g. shipping info), or you want to create a certain content footprint?
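
To find text that currently lives only in `::before`/`::after` rules (and therefore, per the test above, won’t be indexed), a rough audit script can help. A regex-based sketch, not a real CSS parser: edge cases such as comments or escaped quotes are not handled, and `pseudo_element_text` is a hypothetical name:

```python
from __future__ import annotations

import re

# "selector::before { ... }" / "::after" rule blocks (legacy single-colon syntax too).
RULE_RE = re.compile(r"([^{}]+?::?(?:before|after))\s*\{([^}]*)\}", re.IGNORECASE)
# A quoted string value of the content property.
CONTENT_RE = re.compile(r"content\s*:\s*(['\"])(.*?)\1", re.IGNORECASE | re.DOTALL)


def pseudo_element_text(css: str) -> list[tuple[str, str]]:
    """Return (selector, text) pairs for non-empty textual ::before/::after content."""
    found = []
    for selector, body in RULE_RE.findall(css):
        match = CONTENT_RE.search(body)
        if match and match.group(2).strip():
            found.append((selector.strip(), match.group(2)))
    return found
```

Empty values such as `content: ""` (typical for icons and clearfix hacks) are skipped, so the output focuses on actual text.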

  68. #7 User-agent client hints
    API-based access to information about a user's browser – or a crawler’s features

  69. The User-Agent string is messy, like, very messy:
    Over the decades, this string has accrued a variety of details about the client making the request, as well as cruft due to backwards compatibility:
    Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

  70. The UA string will be frozen, and client hints will take over
    User-Agent Client Hints are an extension of the Client Hints API and enable developers to access information about a user's browser – or a crawler’s features:
    Source: https://pa.ag/3AiiUaI

  71. It’s never too early to start testing these things:
    Googlebot (running Chrome >89) apparently already populates those client hint headers:
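
If you want to log these hints server-side, the `Sec-CH-UA` header value is a list of quoted brand names, each with a `v` (version) parameter. A simplified Python parser, regex-based rather than a full Structured Fields (RFC 8941) implementation, so escape sequences are kept as-is; the example value in the tests is illustrative, not a captured Googlebot request, and the function names are hypothetical:

```python
from __future__ import annotations

import re

# One list member of Sec-CH-UA: a quoted brand followed by ;v="<version>".
BRAND_RE = re.compile(r'"((?:[^"\\]|\\.)*)"\s*;\s*v\s*=\s*"((?:[^"\\]|\\.)*)"')


def parse_sec_ch_ua(value: str) -> dict[str, str]:
    """Map brand name -> significant version from a Sec-CH-UA header value."""
    return dict(BRAND_RE.findall(value))


def major_version(value: str, brand: str) -> int | None:
    """Major version for a given brand, or None if the brand is absent."""
    version = parse_sec_ch_ua(value).get(brand)
    return int(version.split(".")[0]) if version else None
```

Brands such as `" Not A;Brand"` are intentionally meaningless “GREASE” entries; they are kept in the result rather than filtered out.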

  72. #8 Redirect fun
    Redirect chains: 301 vs 302 vs JS

  73. Redirect chains are bad – avoid them!
    But what if you have to use them?

  74. For up to 5 hops, GSC shows you the final destination
    GSC shows the content from the final “destination” of a URL in a redirect chain

  75. For 30x chains, GSC cuts you/the preview off after 5 hops:
    This behaviour seems to be in sync with Google’s statements on the matter:
    “In general, what happens is Googlebot will follow five 301s in a row, then if we can’t reach the destination page, then we will try again the next time.”
    Source: https://pa.ag/2XdvKVr
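
The five-hop behaviour can be simulated offline, e.g. to flag chains in a crawl export that Googlebot would not resolve in one go. A minimal sketch, assuming the redirect map is already known rather than fetched live; `resolve_chain` is hypothetical and models only server-side 30x hops, not JS redirects:

```python
from __future__ import annotations

# Per the Google statement quoted on the slide: five redirects in a row per attempt.
MAX_HOPS = 5


def resolve_chain(start: str, redirects: dict[str, str], max_hops: int = MAX_HOPS) -> tuple[str | None, list[str]]:
    """Follow server-side redirects starting at `start`.

    `redirects` maps a URL to its Location target (a stand-in for real HTTP
    responses). Returns (final_url, path); final_url is None when the chain
    needs more than `max_hops` hops or loops back on itself.
    """
    path = [start]
    current = start
    for _ in range(max_hops):
        target = redirects.get(current)
        if target is None:
            return current, path  # no further redirect: destination reached
        if target in path:
            return None, path + [target]  # redirect loop
        path.append(target)
        current = target
    # Hop budget used up: only accept the URL if it does not redirect again.
    if current in redirects:
        return None, path
    return current, path
```

Passing `max_hops=10` would roughly mirror the JS redirect test that follows, but that figure is an observation from a single test, not a documented limit.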

  76. Using JS, you could use 10 hops & it still seems to work
    However, I’m not saying using 10 redirects in a row is a great idea. They might not pass the same equity (if any) and are super sloooooow!

  77. Glad you asked, yes – they’ll even index the destination!
    Again, this is the content from the URL after 10 JS redirects have been executed

  78. Yeah, I really like to break things…
    GSC gave up when I tried to go for 15 hops… still, I wonder why the limit is different from server-side redirects – maybe a render timeout?

  79. twitter.com/peakaceag
    facebook.com/peakaceag
    www.pa.ag
    Take your career to the next level: jobs.pa.ag
    THANK YOU!
    SEE YOU AT THE NEXT SMX!
    Care for the slides? Any questions?
    https://pa.ag/smxtesting22