The Technical Monitoring Compendium - SMX Munich Virtual 2021

My talk from SMX Munich 2021 titled The Technical Monitoring Compendium covering everything you need to know about quality control for your website including tools, processes and much more!

Bastian Grimm

November 07, 2022

Transcript

  1. The Technical Monitoring Compendium Everything you need to know about

    quality control for your website Bastian Grimm, Peak Ace AG | @basgr
  2. All the way back to 2008… yep, that's 13 years

    ago! To set the scene, let‘s go back in time
  3. pa.ag @peakaceag 3 Back then, I used to explain SEO

    to C-suites like this: Yes, that’s an original (wonderfully ugly 4:3 layout) slide from way back then… 1. Build an (optimised) site that’s easy for crawlers to understand. Plain and simple HTML wins! 2. New content daily (quantity over quality – if it’s readable, it’ll do!) 3. Most importantly: links, links, and more links! Quality doesn’t matter - a link is a link, isn’t it?
  4. pa.ag @peakaceag 4 The three cornerstones of SEO – 2021

    edition
    ▪ Technical (on-page): Ensure crawl- & renderability, optimise architecture, intl. targeting and linking.
    ▪ Content (on-page): Provide unique, holistic coverage of relevant topics for your readership.
    ▪ Trust (off-page): “Get people to talk about us.” External linking, citations, brand mentions & PR
    ▪ User Experience
  5. pa.ag @peakaceag 5 There's a broad range of tools available

    From very simple “one-off availability testing” to large-scale, continuous monitoring with trend and comparison reporting: … and many, many more!
  6. How to keep track of things at every level of

    your domain Domain-wide monitoring
  7. pa.ag @peakaceag 7 Monitoring on host-, domain- and server-level The

    following is usually checked on a global level, once per domain:
    ▪ robots.txt (availability & changes)
    ▪ XML sitemap (availability & changes)
    ▪ 404, 410 & 503 error pages (proper status returned)
    ▪ Domain name (expiration)
    ▪ SSL certificate (expiration)
    ▪ Nameserver & MX records (hosting/backend changes)
    ▪ IP address (changes)
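As an illustration of the expiration checks in this list, here is a minimal Python sketch for the SSL certificate case. The function names and the 30-day warning window are assumptions made for this example; the tools shown in the talk handle this for you.

```python
import ssl

SECONDS_PER_DAY = 86_400


def days_until_expiry(not_after: str, now: float) -> float:
    """Days between `now` (a Unix timestamp) and a certificate's notAfter date."""
    return (ssl.cert_time_to_seconds(not_after) - now) / SECONDS_PER_DAY


def expiry_alert(not_after: str, now: float, warn_days: int = 30) -> bool:
    """True if the certificate expires within `warn_days` (hypothetical threshold)."""
    return days_until_expiry(not_after, now) <= warn_days
```

The `not_after` string uses the format found in the `notAfter` field returned by `ssl.SSLSocket.getpeercert()`; `now` would typically be `time.time()`.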
  8. pa.ag @peakaceag 9 Tailor to your own needs (and preferences)

    You might have a different sitemap (index) URL or feel like changing intervals for certain (expiry) notification checks though:
  9. pa.ag @peakaceag 10 Notification pops up (depending on your tool of choice)

    By default, almost all tools rely on email; however, some also allow other methods such as Slack (more on that later). Example alert: “Alert detected on peakace.agency. Detected on Feb 19, 2021”. The notification shows: monitoring project name/identifier; type of issue, affected domain/URLs & issue details; a link to view the alert details on the monitoring platform.
  10. pa.ag @peakaceag 11 Monitoring availability (e.g. http 200) is often

    not enough For your robots.txt file, you need to know when its contents have changed: RYTE sends a handy notification telling you that a specific line has been added to your robots.txt file. RYTE’s robots.txt history feature allows you to seamlessly roll back to older versions.
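To make the idea of content-level change detection concrete, here is a minimal sketch that compares two stored versions of a robots.txt file and reports added and removed lines. It is not how RYTE implements its feature; the function name is an assumption for this example.

```python
import difflib


def robots_txt_changes(old: str, new: str) -> dict:
    """Return the lines added to and removed from a robots.txt file."""
    added, removed = [], []
    for line in difflib.ndiff(old.splitlines(), new.splitlines()):
        if line.startswith("+ "):
            added.append(line[2:])
        elif line.startswith("- "):
            removed.append(line[2:])
    return {"added": added, "removed": removed}
```

A monitoring job would store the last fetched version and alert whenever either list is non-empty, for example when a `Disallow: /` line suddenly appears.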
  11. pa.ag @peakaceag 12 Beyond validation: monitor URL inventory changes You

    might also want to know if certain URLs dropped out of your XML sitemap. Customise severity as needed.
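A rough sketch of such a URL inventory check: parse two snapshots of an XML sitemap and list the URLs that disappeared. The function names are assumptions for this example, and the snapshots are expected to be plain `urlset` sitemaps following the sitemaps.org schema (a sitemap index would need one extra level of fetching).

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> set:
    """Collect all <loc> values from a sitemap document."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text}


def dropped_urls(previous_xml: str, current_xml: str) -> list:
    """URLs present in the previous snapshot but missing from the current one."""
    return sorted(sitemap_urls(previous_xml) - sitemap_urls(current_xml))
```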
  12. Monitoring your HTML mark-up and server headers for changes and

    sending notifications SEO-centred monitoring
  13. pa.ag @peakaceag 14 Default must-have HTML mark-up monitoring As a

    minimum, check for title, meta description and canonical tag (if used): Whoops… looks like someone forgot to change the subdomain to WWW for the production server… ;)
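A minimal sketch of this kind of check, using only Python's standard library: extract title, meta description and canonical from the mark-up and compare them with the values you expect. The class and function names are assumptions for this example.

```python
from html.parser import HTMLParser


class HeadTagParser(HTMLParser):
    """Collects <title>, meta description and canonical link from HTML."""

    def __init__(self):
        super().__init__()
        self.found = {"title": None, "description": None, "canonical": None}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.found["description"] = attrs.get("content")
        elif tag == "link" and "canonical" in (attrs.get("rel") or "").lower().split():
            self.found["canonical"] = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.found["title"] = (self.found["title"] or "") + data


def changed_tags(expected: dict, html: str) -> dict:
    """Map each differing tag to an (expected, found) pair."""
    parser = HeadTagParser()
    parser.feed(html)
    parser.close()
    return {k: (v, parser.found.get(k)) for k, v in expected.items() if v != parser.found.get(k)}
```

With an expected canonical on the www host and a page that still points to the non-www host, `changed_tags` returns only the canonical entry, which is the situation described on this slide.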
  14. pa.ag @peakaceag 15 Even for those “basic“ checks, customisation is

    crucial Maybe you don’t just want to know when “something” changed, but precisely what’s new. Depending on the tool, you have a variety of validation options available:
  15. pa.ag @peakaceag 16 Also, we need to talk about (and

    monitor) indexability Note: just looking at the robots meta directive might not always be enough. Ensure you also check for (accidental) blocking of URLs through either robots.txt or X-Robots headers.
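The three signals can be combined in one check. The sketch below is an assumption-laden illustration: the function name is made up, the inputs are passed in as already-fetched strings, and Python's `urllib.robotparser` uses simple prefix matching, so it does not reproduce every detail of how Google interprets robots.txt.

```python
from urllib.robotparser import RobotFileParser


def indexability_blockers(url, robots_txt, x_robots_tag, meta_robots, user_agent="Googlebot"):
    """List which signals (if any) would keep `url` out of the index."""
    blockers = []

    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch(user_agent, url):
        blockers.append("robots.txt")

    # X-Robots-Tag is an HTTP response header; meta robots lives in the HTML.
    if "noindex" in (x_robots_tag or "").lower():
        blockers.append("x-robots-tag")
    if "noindex" in (meta_robots or "").lower():
        blockers.append("meta robots")
    return blockers
```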
  16. pa.ag @peakaceag 17 Speaking of headers: monitoring HTTP status codes

    Because you want to know when (old) redirects stop working, or relevant URLs all of a sudden become broken and return a 4xx error: This should have been an http 200 status code – so maybe someone redirected this by accident?
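A minimal sketch of a status code check, assuming you maintain a mapping of URLs to the status codes you expect. `fetch_status` and `status_mismatches` are hypothetical names; the request is sent with `http.client`, which does not follow redirects, so a 301 is reported as a 301.

```python
import http.client
from urllib.parse import urlsplit


def fetch_status(url: str, timeout: float = 10) -> int:
    """Send a HEAD request and return the raw status code (no redirect following)."""
    parts = urlsplit(url)
    conn_cls = http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()


def status_mismatches(expected: dict, fetch=fetch_status) -> dict:
    """Map each URL whose status differs to an (expected, observed) pair."""
    mismatches = {}
    for url, want in expected.items():
        got = fetch(url)
        if got != want:
            mismatches[url] = (want, got)
    return mismatches
```

Passing `fetch` as a parameter keeps the comparison logic testable without network access.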
  17. GA, GSC & GTM tags? Twitter cards & OpenGraph? Different

    types of schema.org mark-up? Our very own (custom) HTML tags or presence of scripts? But what about monitoring XYZ?
  18. pa.ag @peakaceag 19 Use regular expressions to check for anything

    you want In reality, it doesn’t really matter what a tool can monitor for you out of the box, as long as you can make use of RegEx or XPath.
  19. “A regular expression (shortened as regex or regexp) is a

    sequence of characters that specifies a search pattern.” RegEx… RegWhat?
  20. pa.ag @peakaceag 21 A practical example: finding GSC verification tags

    Scenario: you have a website and you want/need to find the GSC verification tag(s) within the HTML mark-up:
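One possible way to approach this scenario in Python, as a sketch rather than the pattern shown on the slide: find every `<meta>` tag, keep those named `google-site-verification`, and pull out the `content` value. The function name is an assumption; matching per tag keeps the pattern independent of attribute order.

```python
import re

META_TAG = re.compile(r"<meta\b[^>]*>", re.IGNORECASE)
GSC_NAME = re.compile(r'name\s*=\s*["\']google-site-verification["\']', re.IGNORECASE)
CONTENT = re.compile(r'content\s*=\s*["\']([^"\']*)["\']', re.IGNORECASE)


def find_gsc_tokens(html: str) -> list:
    """Return the content values of all GSC verification meta tags."""
    tokens = []
    for tag in META_TAG.findall(html):
        if GSC_NAME.search(tag):
            match = CONTENT.search(tag)
            if match:
                tokens.append(match.group(1))
    return tokens
```

A monitoring rule would then alert when the list is empty or when an unexpected token shows up.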
  21. pa.ag @peakaceag 23 Use regular expressions to check for anything

    you want Don’t know how to write RegEx? Check out this fantastic guide for marketers by Annie Cushing with tons of real-world examples: More: https://pa.ag/30uPiak
  22. As an SEO, I certainly want to know when my

    link graph undergoes a significant change such as this one. Tip: monitor the site’s main navigation using RegEx
  23. pa.ag @peakaceag 25 xpather.com does the same for XML Path

    Language XPath uses path expressions to select nodes in an HTML/XML document and allows you to navigate through the document: More: http://xpather.com
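A small sketch of a path expression in action, using the limited XPath subset supported by Python's standard `xml.etree.ElementTree`. It only works on well-formed mark-up; real-world HTML usually needs a more lenient parser. The function name and the sample mark-up are assumptions for this example.

```python
import xml.etree.ElementTree as ET


def xpath_matches(markup: str, path: str) -> list:
    """Return all elements in well-formed mark-up that match the path expression."""
    root = ET.fromstring(markup)
    return root.findall(path)


sample = (
    "<html><body>"
    "<div id='header'/>"
    "<div id='page-content'><p>Hi</p></div>"
    "</body></html>"
)

# Select every <div> anywhere in the document whose id is 'page-content'.
content_divs = xpath_matches(sample, ".//div[@id='page-content']")
```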
  24. pa.ag @peakaceag 26 Large-scale monitoring over time: now the fun

    starts Spotting anomalies and understanding trends by comparing crawl data at scale:
  25. pa.ag @peakaceag 27 The RYTE dashboard provides a handy issue

    timeline This also makes it very easy to see whether crucial issues were tackled in time, and whether there's a pattern (e.g. caused by certain types of deployments)
  26. pa.ag @peakaceag 28 Smart segmentation to make things more tangible

    Especially for large sites, this can really help you understand the impact straight away: ContentKing sends a weekly report (here: Mar 1, 2021 – Mar 7, 2021) to update you on detected changes and issues, and provides a health score per segment, allowing you to easily prioritise fixes
  27. pa.ag @peakaceag 29 ContentKing does continuous crawling

    This allows for some really cool stuff such as “live discovery”, e.g. when links break (including their recovery), without you having to re-run your crawl: Alert detected on peakace.agency (detected on Mar 1, 2021); alert resolved on peakace.agency (resolved on Mar 2, 2021)
  28. pa.ag @peakaceag 30 Enterprise-level monitoring: crawl depth comparison Especially for

    large sites, a spike of 100k+ pages losing all internal links is something I certainly would want to be notified about:
  29. Some ideas for additional monitoring and checks we‘d recommend for

    anyone in e-commerce Monitoring in e-commerce
  30. pa.ag @peakaceag 32 Legal texts such as imprint, terms, etc.

    Ensure you are compliant by serving links / pages for imprint, respective terms (e.g. shipping, pricing, etc.) and necessary opt-outs (tracking) or other legal texts: LeanKoala offers a variety of default “legal” checks; of course, these can be customised and/or extended as needed.
  31. pa.ag @peakaceag 33 Even better: go all-in on GDPR compliance

    monitoring RYTE has a very handy GDPR compliance report showing external scripts on a website that are active prior to the user giving their consent:
  32. pa.ag @peakaceag 34 Change of available inventory (e.g. in categories)

    Monitor for categories and/or other listing pages and ensure they have a certain minimum number of products available at all times:
  33. pa.ag @peakaceag 36 Lost categories vs new categories Especially in

    enterprise e-commerce setups, it’s quite common to have a dedicated “shop management” team responsible for maintaining the category tree…
  34. pa.ag @peakaceag 37 Monitoring for product details: pricing, availability, etc.

    Sure, these aren’t purely SEO-related elements – they’re still crucial. Some ideas:
    ▪ URL: clean URL & HTTP 200 status; self-canonicalised; matches schema.org breadcrumb
    ▪ HTTPS: valid SSL certificate; check corresponding non-HTTPS URL for 301 redirect
    ▪ Product: title present; length OK (define threshold); correct HTML mark-up, e.g. <h1>; schema.org product mark-up; match availability w/ indexing rules
    ▪ Product description: description present; length OK (again, define threshold); if present: check for internal links or structured elements such as <li>
    ▪ Price: amount > 0,00 €; currency according to geo setup
    ▪ Shipping info: elements present
    ▪ Benefits & trust: elements present
    ▪ Product recommendations: element is present; minimum number of elements present; schema.org mark-up for related products
    ▪ Legal & payment info: necessary legal elements / links (imprint, privacy) are present; payment information available
  35. Detecting and monitoring apparent duplication within your website to prevent

    negative performance Canonicalisation monitoring
  36. pa.ag @peakaceag 40 Most common causes of duplicate content E.g.

    for Google, these examples are each two different URLs:
    ▪ Case sensitivity: https://pa.ag/url-A/ vs https://pa.ag/url-a/
    ▪ Trailing slashes: https://pa.ag/url-b vs https://pa.ag/url-b/
    ▪ non-www vs www: https://pa.ag vs https://www.pa.ag
    ▪ HTTP vs HTTPS: http://pa.ag vs https://pa.ag
    ▪ Production server vs staging / testing server
  37. pa.ag @peakaceag 41 Most common causes of duplicate content E.g.

    for Google, these examples are each two different URLs:
    ▪ Case sensitivity: https://pa.ag/url-A/ vs https://pa.ag/url-a/
    ▪ Trailing slashes: https://pa.ag/url-b vs https://pa.ag/url-b/
    ▪ non-www vs www: https://pa.ag vs https://www.pa.ag
    ▪ HTTP vs HTTPS: http://pa.ag vs https://pa.ag
    Dealing with duplication issues:
    ▪ 301 redirect: e.g. non-www vs www, HTTP vs HTTPS, casing (upper/lower), trailing slashes, index pages (index.php)
    ▪ noindex: e.g. white labelling, internal search result pages, work-in-progress content, PPC and other landing pages
    ▪ (Self-referencing) canonicals: e.g. for parameters used for tracking, session IDs, printer-friendly versions, PDF to HTML, etc.
    ▪ 403 password protect: e.g. staging/development servers
    ▪ 404/410 gone: e.g. content from feeds that needs to go fast, other outdated/irrelevant or low-quality content
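To monitor the 301 cases above, it helps to generate the duplicate variants of a canonical URL and then verify that each one redirects to it. The sketch below only builds the list of variants (scheme, www, casing and trailing slash); the function name is an assumption, and the redirect check itself is left to your monitoring tool.

```python
from urllib.parse import urlsplit, urlunsplit


def duplicate_variants(canonical_url: str) -> list:
    """Build URL variants that should all 301-redirect to `canonical_url`."""
    scheme, host, path, query, _ = urlsplit(canonical_url)
    path = path or "/"

    schemes = {"http", "https"}
    hosts = {host, host[4:] if host.startswith("www.") else "www." + host}
    paths = {path, path.upper(), path.lower()}
    if path != "/":
        # Add the counterpart with/without trailing slash for every casing.
        paths |= {p.rstrip("/") if p.endswith("/") else p + "/" for p in set(paths)}

    variants = {
        urlunsplit((s, h, p, query, ""))
        for s in schemes for h in hosts for p in paths
    }
    variants.discard(urlunsplit((scheme, host, path, query, "")))
    return sorted(variants)
```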
  38. pa.ag @peakaceag 42 Advanced URL checks for top-notch canonicalisation This

    only really works well if you’ve cleaned your URL structure beforehand:
  39. And production environments, e.g. in other data centres or reachable

    via additional host names Don‘t forget your staging server(s)
  40. pa.ag @peakaceag 44 Different types of staging/test servers are possible

    Make sure the server is locked down properly to ensure your content doesn’t get indexed in advance, and set up monitoring accordingly:
    ▪ noindex (meta tag/header). Pros: external tools can access without separate access rules; URLs are definitely not indexed. Cons: indexing rules cannot be tested fully (all noindex); waste of crawl budget.
    ▪ robots.txt. Pros: external tools can access without separate access rules; no crawl budget is wasted. Cons: indexing rules cannot be tested fully (only with robots.txt override); if linked, test URLs may appear in the index (without title/metas).
    ▪ Password secured (.htaccess). Pros: no crawl budget is wasted; URLs are definitely not indexed; everything can be tested properly. Cons: external tools must be able to handle password authentication.
    ▪ IP-based access. Pros: no crawl budget is wasted; URLs are definitely not indexed; everything can be tested properly. Cons: external tools must be able to handle IP-based authentication.
    ▪ VPN. Pros: completely safe! Cons: so safe, only a few tools can handle it!
  41. pa.ag @peakaceag 45 Monitoring your staging/test server URLs Depending on

    your server’s setup, you need to check either the HTTP response code or (non-)indexability:
  42. I’m not a fan of geo redirects, but sometimes they’re

    necessary (e.g. for legal reasons) Geo redirect monitoring
  43. pa.ag @peakaceag 47 Don‘t automatically redirect users without giving options

    Better just let the user pick the suggested/relevant international website instead:
  44. pa.ag @peakaceag 48 Geo redirects how-to (e.g. if you need

    to, due to licensing) If the user is guided to a special language folder based on their IP, the redirect needs to be temporary (302 or 307); otherwise caching issues will come up:
  45. pa.ag @peakaceag 49 Oh and btw: don‘t do this, either…

    Disney wastes loads of link equity by relying on JS-redirects:
  46. pa.ag @peakaceag 50 Geo redirect monitoring: because this always “breaks“

    Ensure proper redirects are in place according to the request’s origin: requesting the URL using an IP address located in a given geographical region should lead from www.domain.com to www.domain.com/de/ (e.g. via a 302 redirect)
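Once you have the response for a request sent from the relevant region (for example through a proxy, which is outside the scope of this sketch), the validation itself is simple. The function name is an assumption; the 302/307 requirement follows the how-to slide above.

```python
TEMPORARY_REDIRECTS = {302, 307}


def geo_redirect_issues(status, location, expected_location):
    """Return a list of problems with a geo redirect response (empty if fine)."""
    issues = []
    if status not in TEMPORARY_REDIRECTS:
        issues.append(f"expected a temporary redirect (302 or 307), got {status}")
    if location != expected_location:
        issues.append(f"expected Location {expected_location!r}, got {location!r}")
    return issues
```

Here `status` is the HTTP status code and `location` the value of the `Location` response header (or `None` if the header is missing).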
  47. pa.ag @peakaceag 51 How to check geo redirects e.g. with

    Little Warden Ensure proper redirects are in place according to the request’s origin:
  48. pa.ag @peakaceag 53 You’re all aware of this by now,

    right? Google renders almost every URL; but why? According to W3Techs, JavaScript is used […] by 97.1% of all websites. Source: https://pa.ag/3t0RVgv Screenshot: rendered preview of any given website, including “executed” JavaScript
  49. Google doesn’t rely on just the HTML mark-up

    So you can’t either – otherwise, on JS-heavy websites you couldn’t do any form of content monitoring/checks (as the content isn’t part of the regular mark-up)
  50. pa.ag @peakaceag 55 visualping.io - website change detection & alerts

    Select and relax: visualping lets you know when the page (or selected area) changes What we like about the tool / key features: ▪ Lets you specify %-change (any, tiny, medium, major) ▪ Proxy functionality for geo specific monitoring ▪ Down to checking for changes every 5 minutes ▪ Rendering capabilities, so also great for JS-heavy sites ▪ Loads more!
  51. pa.ag @peakaceag 56 hexowatch.com - monitor any website for visual

    changes Your AI sidekick to monitor any website for changes to visuals, content, source code, technology, availability, or price.
  52. No matter what you want and need to monitor, there's

    a solution for you More monitoring tools
  53. pa.ag @peakaceag 58 Uptimerobot.com - continuous uptime checks every 5

    min Simple, yet efficient uptime / availability monitoring of domains and/or URLs More: https://uptimerobot.com/#features What we like about the tool / key features: ▪ Super simple interface, 1 minute and you’re good to go ▪ “TV mode” to run monitoring on large screens ▪ Allows custom port- and service monitoring ▪ Pre-defined maintenance windows to automatically pause and re-enable monitoring ▪ Seamless integration with status pages
  54. pa.ag @peakaceag 59 Pingbreak.com - free BETA uptime monitor Pingbreak

    relies on Twitter as its default communication channel (account required) but allows alerting to Slack, Discord, Mattermost, Telegram and custom alert services: More: https://pingbreak.com What we like about the tool / key features: ▪ Entirely free ▪ Unlimited monitoring of websites, at 1 minute intervals ▪ Webhook support allows alerting to almost any service you can think of
  55. pa.ag @peakaceag 60 Testomato.com - website content & uptime monitoring

    Very affordable solution (incl. uptime monitoring + API, pricing starts at $49 monthly) More: https://www.testomato.com What we like about the tool / key features: ▪ Robust and easy-to-use interface ▪ Nice email reports, including direct notification to 3rd party services such as Slack ▪ Errors are automatically re-tested/checked, also from other (geo-) locations, meaning fewer false alarms ▪ Specific checks for server headers, redirects, etc. can be set up with one click ▪ Extremely simple setup, not only for URL checks but also for specific content monitoring tasks
  56. pa.ag @peakaceag 61 Testomato.com - website content & uptime monitoring

    Setting up checks is super simple and very visual; totally doable for SEOs without any coding skills: Very easy setup, e.g. custom matching of HTML elements using XPath. In this case, we’re checking that a page contains a certain <div> with the “id” of “page-content”.
  57. pa.ag @peakaceag 62 Leankoala.com - monitoring meets testing With Leankoala,

    you can move beyond just simple testing; it comes with 30+ tools that check for different characteristics to ensure a website is functioning correctly. More: https://www.leankoala.com/en/features
  58. pa.ag @peakaceag 63 fluxguard.com - enterprise change monitoring Monitor website

    changes, detect content, code and design edits (includes Lighthouse/cookie/network activity changes). Expensive ($10K per quarter)! More: https://fluxguard.com/features
  59. pa.ag @peakaceag 64 Need to publish monitoring info to your

    customers? Try out either status.io or Atlassian Statuspage to make monitoring info available to the public: More: https://status.io & https://pa.ag/3rCpTaS
  60. Email is great, but maybe not enough? Connecting monitoring systems

    to your “working environment” Notifications & alerts
  61. pa.ag @peakaceag 66 Zapier.com – connect your apps and automate

    workflows Zapier allows you to push data from one software system to another, without writing custom code. You can also create multi-step Zaps. Check out this tutorial: https://pa.ag/2PTbQja
  62. pa.ag @peakaceag 67 integromat.com - complex automation made easy Harder

    to use, yet allows for more complex workflows; a really strong Zapier alternative: More: https://www.integromat.com
  63. pa.ag @peakaceag 69 Pipedream.com - testing webhooks & APIs made

    easy More: https://pipedream.com Webhooks allow you to send real-time data from one application to another whenever a given event occurs.
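As a sketch of what sending such an event looks like, the snippet below builds a JSON POST request for a webhook. The `{"text": ...}` payload shape is the one Slack incoming webhooks accept; the function name, message wording and webhook URL are assumptions for this example.

```python
import json
import urllib.request


def build_alert_request(webhook_url: str, domain: str, issue: str) -> urllib.request.Request:
    """Prepare a JSON POST request announcing a monitoring alert."""
    payload = {"text": f"Alert detected on {domain}: {issue}"}
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` would actually send it; building it separately makes it easy to inspect the payload first, for instance with a webhook testing service.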
  64. pa.ag @peakaceag 71 Google Safe Browsing powers e.g. warnings in

    Chrome “Deceptive site ahead” is a warning in the Chrome browser that can protect you from phishing, scams, and malware-laden sites
  65. pa.ag @peakaceag 72 Google Safe Browsing site status monitoring To

    ensure your domain doesn‘t show a warning in Google Chrome and Search: Try it for your domain: https://pa.ag/3cjsuQn
  66. pa.ag @peakaceag 73 Monitoring this only works with a tool

    for visual checks Google’s transparency report website was built using the popular JavaScript framework AngularJS, therefore you can’t simply check the HTML mark-up: Tools like visualping render the target URL’s content and therefore can also monitor websites built, for example, entirely in JavaScript.
  67. pa.ag @peakaceag 74 Other relevant players on the market Norton

    Safe Web, Web of Trust, Avira Browser Safety, BitDefender Traffic Light, etc. Source: https://pa.ag/3rwTgv6 These services usually power browser extensions via API. If your domain is flagged as “suspicious”, the extension (if installed) might block further activity, e.g. a click in Google’s SERP which would normally lead a visitor to your website.
  68. There’s tons of smart things to monitor that aren’t necessarily

    SEO-related at their core; here are some ideas: Monitoring beyond SEO
  69. pa.ag @peakaceag 76 Setting up custom alerts directly in Google

    Analytics (Technical) issues can also be measured using GA, with no external monitoring needed to send those alerts at all. Certain measurements need to be set up via Google Tag Manager: 404 errors, JS errors, etc.
  70. pa.ag @peakaceag 77 GTM & Cloud: combine both to measure

    faulty tracking Monitor your very own marketing tags and ensure they run properly by using GTM callback functionality, which passes the data to Google Cloud. How to:
    1. Set up Google Tag Manager Monitoring template
    2. Choose website / marketing tracking tags to be monitored
    3. Define request URL to talk to Google Cloud Functions (GET request endpoint)
    4. Send Google Tag Manager callback data to Google Cloud
    5. Connect Cloud Function to send data to BigQuery table
    6. Evaluate failing tracking tags in Google BigQuery
  71. pa.ag @peakaceag 78 Analyse, check and review faulty marketing tracking

    Using BigQuery you can review both failing and successful tracking tags – which in turn also means you can spot pages with no tracking at all:
  72. pa.ag @peakaceag 79 Web performance is crucial - monitor accordingly!

    The current Core Web Vitals set focuses on three aspects of user experience - loading, interactivity, and visual stability - and includes the following metrics/thresholds:
    ▪ LCP measures loading performance. To provide a good UX, LCP should occur within 2.5 seconds.
    ▪ FID measures interactivity. To provide a good UX, pages should have an FID under 100 milliseconds.
    ▪ CLS measures visual stability. To provide a good UX, pages should maintain a CLS of less than 0.1.
    Source: https://pa.ag/3irantb
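These thresholds translate directly into an alerting rule. A minimal sketch, in which the metric keys and the function name are made up for this example, values exactly on the limit are treated as passing (an assumption), and a missing measurement is reported as failing:

```python
# Limits taken from the thresholds listed on this slide.
CORE_WEB_VITALS_LIMITS = {
    "lcp_seconds": 2.5,
    "fid_milliseconds": 100,
    "cls": 0.1,
}


def failing_vitals(measurements: dict) -> list:
    """Return the names of metrics that exceed their limit or are missing."""
    return sorted(
        name
        for name, limit in CORE_WEB_VITALS_LIMITS.items()
        if measurements.get(name, float("inf")) > limit
    )
```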
  73. pa.ag @peakaceag 80 SpeedCurve: all you need in #webperf monitoring

    By far the most comprehensive toolset on the market allowing you to monitor ANY metric you deem relevant, not only for yourself but also for your competitors: More: https://speedcurve.com