The Technical Monitoring Compendium - SMX Munich Virtual 2021

My talk from SMX Munich 2021, titled "The Technical Monitoring Compendium", covering everything you need to know about quality control for your website, including tools, processes and much more!

Bastian Grimm

November 07, 2022

Transcript

  1. The Technical
    Monitoring Compendium
    Everything you need to know about quality control
    for your website
    Bastian Grimm, Peak Ace AG | @basgr

  2. All the way back to 2008… yep, that's 13 years ago!
    To set the scene,
    let's go back in time

  3. pa.ag
    @peakaceag
    3
    Back then, I used to explain SEO to C-suites like this:
    Yes, that’s an original (wonderfully ugly 4:3 layout) slide from way back then…
    1. Build an (optimised) site that’s easy for crawlers
    to understand. Plain and simple HTML wins!
    2. New content daily (quantity over quality –
    if it’s readable, it’ll do!)
    3. Most importantly: links, links, and more links!
    Quality doesn’t matter - a link is a link, isn’t it?

  4. The three cornerstones of SEO – 2021 edition
    On-page
    ▪ Technical: ensure crawl- & renderability, optimise architecture, intl. targeting and linking.
    ▪ Content: provide unique, holistic coverage of relevant topics for your readership.
    Off-page
    ▪ Trust: “Get people to talk about us.” External linking, citations, brand mentions & PR
    User Experience

  5. There's a broad range of tools available
    From very simple “one-off availability testing” to large-scale, continuous
    monitoring with trend and comparison reporting:
    … and many, many more!

  6. How to keep track of things at every level of your domain
    Domain-wide monitoring

  7. Monitoring on host-, domain- and server-level
    The following is usually checked on a global level, once per domain:
    ▪ robots.txt (availability & changes)
    ▪ 404, 410 & 503 error pages (proper status returned)
    ▪ Domain name (expiration)
    ▪ XML sitemap (availability & changes)
    ▪ Nameserver & MX records (hosting/backend changes)
    ▪ IP address (changes)
    ▪ SSL certificate (expiration)
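
    A minimal sketch of two of these host-level checks, assuming you fetch the file and look up the expiry date yourself: spotting a changed robots.txt (or XML sitemap) via a content fingerprint, and warning before a certificate or domain expires. The function names and the 30-day threshold are illustrative assumptions, not taken from any of the tools mentioned in this deck.

```python
import hashlib
from datetime import datetime, timezone


def fingerprint(body: bytes) -> str:
    """Return a stable SHA-256 fingerprint of a fetched file (e.g. robots.txt)."""
    return hashlib.sha256(body).hexdigest()


def has_changed(previous_fingerprint: str, body: bytes) -> bool:
    """True if the freshly fetched body differs from the stored fingerprint."""
    return fingerprint(body) != previous_fingerprint


def days_until(expiry: datetime, now: datetime) -> int:
    """Whole days left until a certificate or domain expires (negative if past)."""
    return (expiry - now).days


def needs_expiry_alert(expiry: datetime, now: datetime, warn_days: int = 30) -> bool:
    """Hypothetical rule: alert once fewer than `warn_days` days remain."""
    return days_until(expiry, now) < warn_days
```

    Storing only the fingerprint keeps the check cheap; keep the full file as well if you also want to show what changed.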

  8. Example: Little Warden's host-level checks

  9. Tailor to your own needs (and preferences)
    You might have a different sitemap (index) URL, or want to change the intervals for certain
    (expiry) notification checks:

  10. Notification pops up (depending on your tool of choice)
    By default, almost all tools rely on email; however, some also allow other methods such
    as Slack (more on that later):
    Alert detected on peakace.agency
    Detected on Feb 19, 2021
    View alert details on
    monitoring platform
    Type of issue, affected
    domain/URLs & issue details
    Monitoring project
    name/identifier

  11. Monitoring availability (e.g. HTTP 200) is often not enough
    For your robots.txt file, you need to know when its contents have changed:
    RYTE’s robots.txt history feature allows you
    to seamlessly roll back to older versions.
    RYTE sends a handy
    notification telling
    you that this specific
    line has been added
    to your robots.txt file
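
    To report which line was added, rather than just that something changed, one option is a line diff between the stored and the current version. This is a sketch using Python's difflib; it is not how RYTE implements its notifications, and added_and_removed is a hypothetical helper name.

```python
import difflib


def added_and_removed(old: str, new: str):
    """Return (added_lines, removed_lines) between two versions of a text file."""
    added, removed = [], []
    for line in difflib.ndiff(old.splitlines(), new.splitlines()):
        if line.startswith("+ "):
            added.append(line[2:])
        elif line.startswith("- "):
            removed.append(line[2:])
        # Lines starting with "  " (unchanged) or "? " (hints) are ignored.
    return added, removed


if __name__ == "__main__":
    before = "User-agent: *\nDisallow: /tmp/\n"
    after = "User-agent: *\nDisallow: /tmp/\nDisallow: /\n"
    print(added_and_removed(before, after))
```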

  12. Beyond validation: monitor URL inventory changes
    You might also want to know if certain URLs dropped out of your XML sitemap
    Customize severity as needed
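
    A rough sketch of such an inventory comparison, assuming you keep the previous copy of the sitemap: parse both versions and compare the sets of <loc> URLs. The helper names are hypothetical, and sitemap index files (which list other sitemaps) are not handled here.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> set:
    """Collect all <loc> values from a urlset sitemap."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    }


def inventory_changes(old_xml: str, new_xml: str) -> dict:
    """Return URLs that dropped out of, or were added to, the sitemap."""
    old_urls, new_urls = sitemap_urls(old_xml), sitemap_urls(new_xml)
    return {
        "dropped": sorted(old_urls - new_urls),
        "added": sorted(new_urls - old_urls),
    }
```

    The size of the "dropped" list is a natural input for a severity rule, e.g. a warning for a handful of URLs and a critical alert for a large share of the inventory.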

  13. Monitoring your HTML mark-up and server headers
    for changes and sending notifications
    SEO-centred monitoring

  14. Default must-have HTML mark-up monitoring
    As a minimum, check for title, meta description and canonical tag (if used):
    Whoops… looks like someone
    forgot to change the subdomain to
    WWW for the production server… ;)
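
    As an illustration of this minimum check, here is a sketch that extracts title, meta description and canonical from a page's HTML with Python's built-in parser and compares them to expected values. The class and function names are made up for this example, and the expected values would come from your own baseline.

```python
from html.parser import HTMLParser


class HeadTags(HTMLParser):
    """Collects <title>, meta description and rel=canonical from HTML."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = None
        self.canonical = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content")
        elif tag == "link" and "canonical" in (attrs.get("rel") or "").lower().split():
            self.canonical = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_head_tags(html: str) -> dict:
    parser = HeadTags()
    parser.feed(html)
    return {
        "title": parser.title.strip(),
        "description": parser.description,
        "canonical": parser.canonical,
    }


def changed_fields(expected: dict, html: str) -> dict:
    """Map each deviating field to an (expected, found) pair."""
    found = extract_head_tags(html)
    return {k: (v, found.get(k)) for k, v in expected.items() if v != found.get(k)}
```

    A canonical pointing at a staging or non-www host, as in the example on this slide, would show up as a single entry in the result of changed_fields.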

  15. Even for those “basic” checks, customisation is crucial
    Maybe you don’t just want to know when “something” changed, but precisely what’s
    new. Depending on the tool, you have a variety of validation options available:

  16. Also, we need to talk about (and monitor) indexability
    Note: just looking at the robots meta directive might not always be enough:
    Ensure you also check for
    (accidental) blocking of URLs
    either through robots.txt or
    X-Robots headers.
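
    The three signals can be combined in one check. The sketch below assumes you already have the page's HTML, its response headers and the robots.txt body; indexability_issues is a hypothetical name, and the regex-based meta lookup is a simplification that expects the name attribute before content.

```python
import re
from urllib.robotparser import RobotFileParser

META_ROBOTS = re.compile(
    r'<meta[^>]+name=["\']robots["\'][^>]+content=["\']([^"\']*)["\']',
    re.IGNORECASE,
)


def indexability_issues(url, html, headers, robots_txt, user_agent="Googlebot"):
    """Return a list of reasons why `url` may not get crawled or indexed."""
    issues = []

    # 1. robots.txt: a disallowed URL cannot be crawled at all.
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(user_agent, url):
        issues.append("blocked by robots.txt")

    # 2. X-Robots-Tag response header (header names are case-insensitive).
    lowered = {k.lower(): v for k, v in headers.items()}
    if "noindex" in lowered.get("x-robots-tag", "").lower():
        issues.append("noindex via X-Robots-Tag header")

    # 3. Robots meta tag in the HTML mark-up.
    match = META_ROBOTS.search(html)
    if match and "noindex" in match.group(1).lower():
        issues.append("noindex via robots meta tag")

    return issues
```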

  17. Speaking of headers: monitoring HTTP status codes
    Because you want to know when (old) redirects stop working, or relevant URLs all of a
    sudden become broken and return a 4xx error:
    This should have been an HTTP
    200 status code – so maybe
    someone redirected this by
    accident?

  18. GA, GSC & GTM tags? Twitter cards & OpenGraph?
    Different types of schema.org mark-up?
    Our very own (custom) HTML tags or
    presence of scripts?
    But what about monitoring XYZ?

  19. Use regular expressions to check for anything you want
    In reality, it doesn’t really matter
    what a tool can monitor for you
    as long as you can make use of
    RegEx or XPath.

  20. “A regular expression (shortened as regex or regexp) is a
    sequence of characters that specifies a search pattern.”
    RegEx… RegWhat?

  21. A practical example: finding GSC verification tags
    Scenario: you have a website and you want/need to find the GSC verification tag(s)
    within the HTML mark-up:
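
    One possible pattern for this scenario, shown in Python. It assumes the usual form of the tag, a meta element named google-site-verification with the token in its content attribute, and that name comes before content; the pattern on the original slide is not in the transcript, so this is a stand-in rather than the author's expression.

```python
import re

# Captures the token from <meta name="google-site-verification" content="...">.
GSC_TAG = re.compile(
    r'<meta\s+name=["\']google-site-verification["\']\s+content=["\']([^"\']+)["\']',
    re.IGNORECASE,
)


def find_gsc_tokens(html: str) -> list:
    """Return every GSC verification token found in the mark-up."""
    return GSC_TAG.findall(html)


if __name__ == "__main__":
    sample = '<head><meta name="google-site-verification" content="abc123" /></head>'
    print(find_gsc_tokens(sample))
```

    An empty result can then trigger an alert that the verification tag went missing, while more than one token may point to leftover verifications worth reviewing.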

  22. regexr.com for building and testing stuff on the fly
    More: https://regexr.com

  23. Use regular expressions to check for anything you want
    Don’t know how to write RegEx? Check out this fantastic guide for marketers by Annie
    Cushing with tons of real-world examples:
    More: https://pa.ag/30uPiak

  24. As an SEO, I certainly want to know when my link graph
    undergoes a significant change such as this one
    Tip: monitor the site's main
    navigation using RegEx

  25. xpather.com does the same for XML Path Language
    XPath uses path expressions to select nodes in an HTML/XML document and allows you
    to navigate through the document:
    More: http://xpather.com
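
    To make the idea of path expressions concrete, here is a small sketch using the limited XPath subset in Python's standard library. It assumes well-formed (XHTML-like) mark-up; real-world HTML usually needs a more forgiving parser. The id values main-nav and page-content are example names only.

```python
import xml.etree.ElementTree as ET


def nav_links(page: str, nav_id: str = "main-nav") -> list:
    """Return the href of every link inside the <nav> with the given id."""
    root = ET.fromstring(page)
    return [a.get("href") for a in root.findall(f".//nav[@id='{nav_id}']//a")]


def has_element_with_id(page: str, element_id: str) -> bool:
    """True if any element in the document carries the given id."""
    root = ET.fromstring(page)
    return root.find(f".//*[@id='{element_id}']") is not None
```

    Comparing the output of nav_links between two crawls is one simple way to notice when the main navigation, and with it the internal link graph, has changed.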

  26. Large-scale monitoring over time: now the fun starts
    Spotting anomalies and understanding trends by comparing crawl data at scale:

  27. The RYTE dashboard provides a handy issue timeline
    This also makes it very easy to see whether crucial issues were tackled in time, and whether
    there's a pattern (e.g. caused by certain types of deployments)

  28. Smart segmentation to make things more tangible
    Especially for large sites, this really can help to understand the impact straight away:
    ContentKing Weekly Report
    MAR 1, 2021 – MAR 7, 2021
    ContentKing sends a weekly report to update you on detected changes and issues.
    Want to customise which websites are included in the report? Configure your email setting here.
    ContentKing provides
    a health score per
    segment, allowing you
    to easily prioritize fixes

  29. ContentKing does continuous crawling
    This allows for some really cool stuff such as “live discovery”, e.g. when links break,
    including their recovery, without you having to re-run your crawl
    Alert detected on peakace.agency
    Detected on Mar 1, 2021
    Alert resolved on peakace.agency
    Resolved on Mar 2, 2021

  30. Enterprise-level monitoring: crawl depth comparison
    Especially for large sites, a spike of 100k+ pages losing all internal links is something I
    certainly would want to be notified about:

  31. Some ideas for additional monitoring and checks
    we'd recommend for anyone in e-commerce
    Monitoring in e-commerce

  32. Legal texts such as imprint, terms, etc.
    Ensure you are compliant by serving links / pages for imprint, relevant terms (e.g.
    shipping, pricing, etc.) and necessary opt-outs (tracking) or other legal texts:
    LeanKoala offers a variety of
    default “legal” checks; of course,
    these can be customised and/or
    extended as needed.

  33. Even better: go all-in on GDPR compliance monitoring
    RYTE has a very handy GDPR compliance report showing external scripts on a website
    that are active prior to the user giving their consent:

  34. Change of available inventory (e.g. in categories)
    Monitor for categories and/or other listing pages and ensure they have a certain
    minimum number of products available at all times:

  35. Monitor the same for your filters / faceted navigation!
    And while you’re at it:

  36. Lost categories vs new categories
    Especially in enterprise e-commerce setups, it’s quite common to have a dedicated
    “shop management” team responsible for maintaining the category tree…

  37. Monitoring for product details: pricing, availability, etc.
    Sure, these aren’t purely SEO-related elements, but they’re still crucial. Some ideas:
    URL
    ✓ Clean-URL & HTTP 200 status
    ✓ Self-canonicalised
    ✓ Matches schema.org breadcrumb
    Product recommendations
    ✓ Element is present
    ✓ Minimum number of
    elements present
    ✓ Schema.org mark-up for
    related products
    Benefits & trust
    ✓ Elements present
    Legal & payment info
    ✓ Necessary legal elements / links
    (imprint, privacy) are present
    ✓ Payment information available
    HTTPS
    ✓ Valid SSL certificate
    ✓ Check corresponding non-HTTPS
    URL for 301 redirect
    Price
    ✓ Amount > 0,00 €
    ✓ Currency according to geo setup
    Shipping info
    ✓ Elements present
    Product description
    ✓ Description present
    ✓ Length OK (again, define threshold)
    ✓ If present: check for internal links or
    structured elements such as
    Product
    ✓ Title present
    ✓ Length OK (define threshold)
    ✓ Correct HTML mark-up, e.g.
    ✓ Schema.org product mark-up
    ✓ Match availability w/ indexing rules
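
    A few of these checks, sketched as one function over data already extracted from a product page. The dictionary keys and the length thresholds are assumptions for illustration; as the slide says, you define the thresholds yourself.

```python
def product_page_issues(page, min_title=10, max_title=70, min_description=100):
    """Return a list of failed checks for one product page.

    `page` is a dict with hypothetical keys: url, canonical, status,
    title, price, description.
    """
    issues = []

    if page.get("status") != 200:
        issues.append("status is not 200")

    if page.get("canonical") != page.get("url"):
        issues.append("not self-canonicalised")

    title = (page.get("title") or "").strip()
    if not title:
        issues.append("title missing")
    elif not min_title <= len(title) <= max_title:
        issues.append("title length out of range")

    price = page.get("price")
    if price is None or price <= 0:
        issues.append("price missing or not above 0")

    if len((page.get("description") or "").strip()) < min_description:
        issues.append("description missing or too short")

    return issues
```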

  38. For example: monitor their availability and
    dynamically adjust your pricing
    Why not spy on your competitors too?

  39. Detecting and monitoring apparent duplication within
    your website to prevent negative performance
    Canonicalisation monitoring

  40. Most common causes of duplicate content
    E.g. for Google, these examples are each two different URLs:
    ▪ Production server vs staging / testing server
    ▪ Case sensitivity: https://pa.ag/url-A/ vs https://pa.ag/url-a/
    ▪ Trailing slashes: https://pa.ag/url-b vs https://pa.ag/url-b/
    ▪ non-www vs www: https://pa.ag vs https://www.pa.ag
    ▪ HTTP vs HTTPS: http://pa.ag vs https://pa.ag

  41. Most common causes of duplicate content
    E.g. for Google, these examples are each two different URLs:
    ▪ Case sensitivity: https://pa.ag/url-A/ vs https://pa.ag/url-a/
    ▪ Trailing slashes: https://pa.ag/url-b vs https://pa.ag/url-b/
    ▪ non-www vs www: https://pa.ag vs https://www.pa.ag
    ▪ HTTP vs HTTPS: http://pa.ag vs https://pa.ag
    Dealing with duplication issues
    ▪ 301 redirect: e.g. non-www vs www, HTTP vs HTTPS,
    casing (upper/lower), trailing slashes, index pages
    (index.php)
    ▪ noindex: e.g. white labelling, internal search result pages,
    work-in-progress content, PPC- and other landing pages
    ▪ (Self-referencing) canonicals: e.g. for parameters used for
    tracking, session IDs, printer friendly version, PDF to
    HTML, etc.
    ▪ 403 password protect: e.g. staging-/development servers
    ▪ 404/410 gone: e.g. feed-based content that needs to go fast,
    other outdated/irrelevant or low-quality content
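
    For the 301 cases, a monitoring check needs to know the one preferred form of each URL. The sketch below assumes a policy of HTTPS, a www host, lowercase paths and a trailing slash; that policy, the host www.example.com and the function names are assumptions for this example, so adapt them to your own URL structure.

```python
from urllib.parse import urlsplit, urlunsplit


def preferred_url(url: str, host: str = "www.example.com") -> str:
    """Normalise a URL to the assumed preferred form."""
    parts = urlsplit(url)
    path = parts.path.lower() or "/"
    if not path.endswith("/"):
        path += "/"
    # Scheme and host are forced; the query string is kept, fragments are dropped.
    return urlunsplit(("https", host, path, parts.query, ""))


def needs_redirect(url: str, host: str = "www.example.com") -> bool:
    """True if `url` is a variant that should 301 to its preferred form."""
    return preferred_url(url, host) != url
```

    A monitor can then request each variant and verify that it answers with a 301 whose Location header equals preferred_url(variant).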

  42. Advanced URL checks for top-notch canonicalisation
    This only really works well if you’ve cleaned your URL structure beforehand:

  43. And production environments, e.g. in other data centres
    or reachable via additional host names
    Don't forget your
    staging server(s)

  44. Different types of staging/test servers are possible
    Make sure the server is locked down properly to ensure your content doesn’t get
    indexed in advance – and set up monitoring accordingly
    Methodology: noindex (meta tag/header)
      Pros:
      ▪ External tools can access without separate access rules
      ▪ URLs are definitely not indexed
      Cons:
      ▪ Indexing rules cannot be tested fully (all noindex)
      ▪ Waste of crawl budget
    Methodology: robots.txt
      Pros:
      ▪ External tools can access without separate access rules
      ▪ No crawl budget is wasted
      Cons:
      ▪ Indexing rules cannot be tested fully (only with robots.txt override)
      ▪ If linked, test URLs may appear in the index (without title/metas).
    Methodology: password secured (.htaccess)
      Pros:
      ▪ No crawl budget is wasted
      ▪ URLs are definitely not indexed
      ▪ Everything can be tested properly
      Cons:
      ▪ External tools must be able to handle password authentication.
    Methodology: IP-based access
      Pros:
      ▪ No crawl budget is wasted
      ▪ URLs are definitely not indexed
      ▪ Everything can be tested properly
      Cons:
      ▪ External tools must be able to handle IP-based authentication.
    Methodology: VPN
      Pros:
      ▪ Completely safe!
      Cons:
      ▪ So safe, only a few tools can handle it!

  45. Monitoring your staging/test server URLs
    Depending on your server’s setup, you need to either check for the HTTP response code
    or (non-) indexability:
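
    Both variants fit into one small rule. The sketch below treats a staging URL as protected if the response is an authentication/authorisation error or the page is marked noindex; the status codes 401 and 403 and the function name are assumptions for illustration, and the meta tag lookup expects the name attribute before content.

```python
import re

NOINDEX_META = re.compile(
    r'<meta[^>]+name=["\']robots["\'][^>]+content=["\'][^"\']*noindex',
    re.IGNORECASE,
)


def staging_is_protected(status: int, headers: dict, html: str = "") -> bool:
    """True if the staging URL is locked down or at least not indexable."""
    if status in (401, 403):
        return True  # password or IP-based protection is in place
    lowered = {k.lower(): v for k, v in headers.items()}
    if "noindex" in lowered.get("x-robots-tag", "").lower():
        return True  # noindex sent as a response header
    return bool(NOINDEX_META.search(html))  # noindex in the mark-up
```

    Unlike most checks, this one should alert when the server answers with a plain, indexable HTTP 200.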

  46. I’m not a fan of geo redirects, but
    sometimes they’re necessary (e.g. for legal reasons)
    Geo redirect monitoring

  47. Don't automatically redirect users without giving options
    Better just let the user pick the suggested/relevant international website instead:

  48. Geo redirects how-to (e.g. if you need to, due to licensing)
    If the user is guided to a special language folder based on their IP, the redirect needs to
    be temporary (302 or 307), otherwise caching issues will come up:

  49. Oh and btw: don't do this, either…
    Disney wastes loads of link equity by relying on JS-redirects:

  50. Geo redirect monitoring: because this always “breaks”
    Ensure proper redirects are in place according to the request’s origin:
    Requesting the URL using an IP address located
    in the following geographical region:
    www.domain.com
    www.domain.com/de/
    e.g. 302 redirect
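
    The evaluation side of such a check can be sketched as follows. It assumes the request itself was sent from the relevant region (e.g. through a proxy, which is outside this sketch) and that you recorded the status code and Location header. The country-to-URL mapping and the function name are made-up examples.

```python
# Hypothetical expectations: country code -> URL the user should be redirected to.
EXPECTED_TARGETS = {
    "DE": "https://www.domain.com/de/",
}


def geo_redirect_issues(country, status, location, expected=EXPECTED_TARGETS):
    """Return problems with a geo redirect observed for one country."""
    issues = []
    if status not in (302, 307):
        issues.append(f"expected a temporary redirect (302/307), got {status}")
    target = expected.get(country)
    if location != target:
        issues.append(f"expected Location {target}, got {location}")
    return issues
```

    A 301 in place of the 302 is flagged even when the target is correct, since a permanent redirect is what causes the caching issues.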

  51. How to check geo redirects e.g. with Little Warden
    Ensure proper redirects are in place according to the request’s origin:

  52. Moving beyond HTML mark-up checks
    Visual change monitoring

  53. You’re all aware of this by now, right?
    Google renders almost every URL; but why?
    Source: https://pa.ag/3t0RVgv
    According to W3Techs,
    JavaScript is used […] by
    97.1% of all websites.
    Rendered preview of any given website,
    including “executed” JavaScript

  54. So you can't either: otherwise, on JS-heavy websites you
    couldn't do any form of content monitoring/checks
    (as the content isn't part of the regular mark-up)
    Google doesn’t rely on
    just the HTML mark-up

  55. visualping.io - website change detection & alerts
    Select and relax: visualping lets you know when the page (or selected area) changes
    What we like about the tool / key features:
    ▪ Lets you specify %-change (any, tiny, medium, major)
    ▪ Proxy functionality for geo specific monitoring
    ▪ Down to checking for changes every 5 minutes
    ▪ Rendering capabilities, so also great for JS-heavy sites
    ▪ Loads more!

  56. hexowatch.com - monitor any website for visual changes
    Your AI sidekick to monitor any website for changes to visuals, content, source code,
    technology, availability, or price.

  57. No matter what you want and need to monitor,
    there's a solution for you
    More monitoring tools

  58. Uptimerobot.com - continuous uptime checks every 5 min
    Simple, yet efficient uptime / availability monitoring of domains and/or URLs
    More: https://uptimerobot.com/#features
    What we like about the tool / key features:
    ▪ Super simple interface, 1 minute and you’re good
    to go
    ▪ “TV mode” to run monitoring on large screens
    ▪ Allows custom port- and service monitoring
    ▪ Pre-defined maintenance windows to automatically
    pause and re-enable monitoring
    ▪ Seamless integration with status pages

  59. Pingbreak.com - free BETA uptime monitor
    Pingbreak relies on Twitter as its default communication channel (account required) but
    allows alerting to Slack, Discord, Mattermost, Telegram and custom alert services:
    More: https://pingbreak.com
    What we like about the tool / key features:
    ▪ Entirely free
    ▪ Unlimited monitoring of websites, at 1 minute
    intervals
    ▪ Webhook support allows alerting to almost any
    service you can think of

  60. Testomato.com - website content & uptime monitoring
    Very affordable solution (incl. uptime monitoring + API, pricing starts at $49 monthly)
    More: https://www.testomato.com
    What we like about the tool / key features:
    ▪ Robust and easy-to-use interface
    ▪ Nice email reports, including direct notification to
    3rd party services such as Slack
    ▪ Errors are automatically re-tested/checked, also
    from other (geo-) locations, meaning fewer false
    alarms
    ▪ Specific checks for server headers, redirects, etc
    can be set up with one click
    ▪ Extremely simple setup, not only for URL checks
    but also for specific content monitoring tasks

  61. Testomato.com - website content & uptime monitoring
    Setting up checks is super simple and very visual; totally doable for SEOs without any
    coding skills:
    Very easy setup, e.g. custom matching of HTML elements using
    XPath; in this case we’re monitoring a domain to make sure it
    contains a certain element with the “id” of “page-content”.

  62. Leankoala.com - monitoring meets testing
    With Leankoala, you can move beyond just simple testing; it comes with 30+ tools that
    check for different characteristics to ensure a website is functioning correctly.
    More: https://www.leankoala.com/en/features

  63. fluxguard.com - enterprise change monitoring
    Monitor website changes, detect content, code and design edits (includes
    Lighthouse/cookie/network activity changes). Expensive ($10K per quarter)!
    More: https://fluxguard.com/features

  64. Need to publish monitoring info to your customers?
    Try out either status.io or Atlassian Statuspage to make monitoring info available to the
    public:
    More: https://status.io & https://pa.ag/3rCpTaS

  65. Email is great, but maybe not enough? Connecting
    monitoring systems to your “working environment”
    Notifications & alerts

  66. Zapier.com – connect your apps and automate workflows
    Zapier allows you to push data from one software system to another, without writing
    custom code. You can also create multi-step Zaps.
    Check out this tutorial: https://pa.ag/2PTbQja

  67. integromat.com - complex automation made easy
    Harder to use, yet allows for more complex workflows; a really strong Zapier alternative:
    More: https://www.integromat.com

  68. tray.io - enterprise level automation solution
    More: https://tray.io

  69. Pipedream.com - testing webhooks & APIs made easy
    More: https://pipedream.com
    Webhooks allow you
    to send real-time data
    from one application
    to another whenever
    a given event occurs.
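
    In practice, a webhook is just an HTTP POST with a JSON body. The sketch below builds such a request for a monitoring alert using only Python's standard library. The {"text": ...} payload shape is the one Slack's incoming webhooks accept; the webhook URL and the helper names are placeholders.

```python
import json
from urllib.request import Request, urlopen


def build_alert_request(webhook_url: str, domain: str, issue: str) -> Request:
    """Prepare a JSON POST that announces an alert for `domain`."""
    payload = {"text": f"Alert detected on {domain}: {issue}"}
    return Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_alert(webhook_url: str, domain: str, issue: str) -> int:
    """Send the alert and return the HTTP status of the receiving service."""
    request = build_alert_request(webhook_url, domain, issue)
    with urlopen(request, timeout=10) as response:
        return response.status
```

    Pointing webhook_url at a request inspector first, such as the Pipedream endpoint mentioned on this slide, is an easy way to check the payload before wiring it to a real channel.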

  70. Monitoring 3rd party security services / providers
    Protect your website

  71. Google Safe Browsing powers e.g. warnings in Chrome
    “Deceptive site ahead” is a warning in the Chrome browser that can protect you
    from phishing, scams, and malware-laden sites

  72. Google Safe Browsing site status monitoring
    To ensure your domain doesn't show a warning in Google Chrome and Search:
    Try it for your domain: https://pa.ag/3cjsuQn

  73. Monitoring this only works with a tool for visual checks
    Google’s transparency report website was built using the popular JavaScript framework
    AngularJS, therefore you can’t simply check the HTML mark-up:
    Tools like visualping render
    the target URL’s content and
    therefore can also monitor
    websites built, for example,
    entirely in JavaScript.

  74. Other relevant players on the market
    Norton Safe Web, Web of Trust, Avira Browser Safety, BitDefender Traffic Light, etc.
    Source: https://pa.ag/3rwTgv6
    These services usually power browser extensions via API. If your domain is flagged as
    “suspicious”, the extension (if installed) might prevent other activity, e.g. a click in Google’s
    SERP which would normally lead a visitor to your website.

  75. There are tons of smart things to monitor that aren’t
    necessarily SEO-related at their core; here are some ideas:
    Monitoring beyond SEO

  76. Setting up custom alerts directly in Google Analytics
    (Technical) issues can also be measured using GA, with no external monitoring needed
    to send those alerts at all:
    Certain measurements need to be set up via Google Tag Manager: 404 errors, JS errors etc.

  77. GTM & Cloud: combine both to measure faulty tracking
    Monitor your very own marketing tags and ensure they run properly by using GTM
    callback functionality, which passes the data to Google Cloud:
    How to
    1. Set up Google Tag Manager Monitoring template
    2. Choose website / marketing tracking tags to be monitored
    3. Define request URL to talk to Google Cloud Functions
    (GET request endpoint)
    4. Send Google Tag Manager Callback data to Google Cloud
    5. Connect Cloud function to send data to BigQuery table
    6. Evaluate failing tracking tags in Google BigQuery

  78. Analyse, check and review faulty marketing tracking
    Using BigQuery you can review both failing and successful tracking tags – which in turn
    also means you can spot pages with no tracking at all:

  79. Web performance is crucial - monitor accordingly!
    The current Core Web Vitals set focuses on three aspects of user experience -
    loading, interactivity, and visual stability - and includes the following metrics/thresholds:
    Source: https://pa.ag/3irantb
    LCP measures loading performance. To provide a good UX, LCP should occur within 2.5 seconds.
    FID measures interactivity. To provide a good UX, pages should have an FID under 100 milliseconds.
    CLS measures visual stability. To provide a good UX, pages should maintain a CLS of less than 0.1.
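
    The three "good" thresholds above translate directly into an alerting rule. This sketch flags every metric that misses its threshold; treating a value exactly on the threshold as passing is an assumption here, and the function name is illustrative.

```python
# "Good" thresholds as listed above: LCP in seconds, FID in milliseconds, CLS unitless.
GOOD_THRESHOLDS = {"LCP": 2.5, "FID": 100, "CLS": 0.1}


def failing_metrics(measured: dict) -> list:
    """Return the names of all measured metrics above their 'good' threshold."""
    return [
        name
        for name, limit in GOOD_THRESHOLDS.items()
        if name in measured and measured[name] > limit
    ]


if __name__ == "__main__":
    print(failing_metrics({"LCP": 3.1, "FID": 80, "CLS": 0.05}))
```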

  80. SpeedCurve: all you need in #webperf monitoring
    By far the most comprehensive toolset on the market allowing you to monitor ANY
    metric you deem relevant, not only for yourself but also for your competitors:
    More: https://speedcurve.com

  81. Peak Ace 🖤 SMX Munich → https://pa.ag/smx21

  82. Care for the slides? www.pa.ag
    twitter.com/peakaceag
    facebook.com/peakaceag
    Take your career to the next level: jobs.pa.ag
    Email us: [email protected]
    Bastian Grimm
    [email protected]