The Technical Monitoring Compendium - SMX Munich Virtual 2021
My talk from SMX Munich 2021, titled The Technical Monitoring Compendium, covering everything you need to know about quality control for your website, including tools, processes and much more!
Back in the day, SEO was pitched to C-suites like this (yes, that's an original, wonderfully ugly 4:3-layout slide from way back then…):
1. Build an (optimised) site that's easy for crawlers to understand. Plain and simple HTML wins!
2. New content daily (quantity over quality – if it's readable, it'll do!)
3. Most importantly: links, links, and more links! Quality doesn't matter – a link is a link, isn't it?
The modern edition:
▪ Technical (on-page): ensure crawl- & renderability, optimise architecture, international targeting and linking.
▪ Content (on-page): provide unique, holistic coverage of relevant topics for your readership.
▪ Trust (off-page): "Get people to talk about us." External linking, citations, brand mentions & PR.
▪ User Experience
The following is usually checked on a global level, once per domain (a small script sketch follows below):
▪ robots.txt (availability & changes)
▪ 404, 410 & 503 error pages (proper status returned)
▪ Domain name (expiration)
▪ XML sitemap (availability & changes)
▪ Nameserver & MX records (hosting/backend changes)
▪ IP address (changes)
▪ SSL certificate (expiration)
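If you want to script a few of these checks yourself, here is a minimal sketch; the domain, the test URL and the 14-day expiry threshold are placeholders, not from the talk, and it only needs the requests package.

```python
# Minimal sketch of a few once-per-domain checks: robots.txt availability,
# proper 404 handling and SSL certificate expiry. Placeholder domain/thresholds.
import socket
import ssl
import time

import requests

DOMAIN = "www.example.com"  # placeholder – swap in your own domain

# 1) robots.txt should be reachable and return HTTP 200
r = requests.get(f"https://{DOMAIN}/robots.txt", timeout=10)
assert r.status_code == 200, f"robots.txt returned {r.status_code}"

# 2) a non-existent URL should return a proper 404/410, not a "soft 404"
r = requests.get(f"https://{DOMAIN}/this-should-not-exist-12345", timeout=10)
assert r.status_code in (404, 410), f"error page returned {r.status_code}"

# 3) SSL certificate should not expire within the next 14 days
ctx = ssl.create_default_context()
with socket.create_connection((DOMAIN, 443), timeout=10) as sock:
    with ctx.wrap_socket(sock, server_hostname=DOMAIN) as tls:
        cert = tls.getpeercert()
days_left = (ssl.cert_time_to_seconds(cert["notAfter"]) - time.time()) / 86400
assert days_left > 14, f"SSL certificate expires in {days_left:.0f} days"

print("All domain-level checks passed.")
```

Run it from cron (or a CI job) and pipe failures into whichever alerting channel you already use.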
A notification pops up (depending on your tool of choice). By default, almost all tools rely on email, but some also allow other methods such as Slack; more on that later. The alert details shown on the monitoring platform typically include the type of issue, the affected domain/URLs, further issue details, and the monitoring project name/identifier.
Knowing the file is available is not enough: for your robots.txt you also need to understand when its contents have changed. RYTE's robots.txt history feature allows you to seamlessly roll back to older versions, and RYTE sends a handy notification telling you exactly which line has been added to your robots.txt file.
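The same idea, tool-agnostic, as a bare-bones sketch: fetch the file, diff it against the last known version on disk and print exactly which lines changed. The URL and the snapshot file name are placeholders.

```python
# Detect robots.txt content changes by diffing against a stored baseline.
import difflib
import pathlib

import requests

ROBOTS_URL = "https://www.example.com/robots.txt"  # placeholder
SNAPSHOT = pathlib.Path("robots_snapshot.txt")     # last known good version

current = requests.get(ROBOTS_URL, timeout=10).text
previous = SNAPSHOT.read_text() if SNAPSHOT.exists() else ""

if current != previous:
    # Show exactly which lines were added or removed, like the notification above.
    for line in difflib.unified_diff(previous.splitlines(), current.splitlines(),
                                     fromfile="previous", tofile="current", lineterm=""):
        print(line)
    SNAPSHOT.write_text(current)  # keep the new version as the baseline
else:
    print("robots.txt unchanged.")
```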
As a minimum, check for title, meta description and canonical tag (if used). Whoops… looks like someone forgot to change the subdomain to WWW for the production server… ;)
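A hedged sketch of exactly that check: pull title, meta description and canonical from a page and verify the canonical points at the production (www) host. The URL, the expected host and the use of BeautifulSoup ("pip install requests beautifulsoup4") are my assumptions.

```python
# Check title, meta description and canonical host on a single page.
import requests
from bs4 import BeautifulSoup

URL = "https://www.example.com/some-page/"   # placeholder
EXPECTED_CANONICAL_HOST = "www.example.com"  # production host

soup = BeautifulSoup(requests.get(URL, timeout=10).text, "html.parser")

title = soup.title.string.strip() if soup.title and soup.title.string else None
description_tag = soup.find("meta", attrs={"name": "description"})
description = description_tag.get("content") if description_tag else None
canonical_tag = soup.find("link", rel="canonical")
canonical = canonical_tag.get("href") if canonical_tag else None

assert title, "Missing <title>"
assert description, "Missing meta description"
assert canonical and EXPECTED_CANONICAL_HOST in canonical, (
    f"Canonical points somewhere unexpected: {canonical}"  # e.g. a staging subdomain
)
print(title, description, canonical, sep="\n")
```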
This is crucial: maybe you don't just want to know that "something" changed, but precisely what's new. Depending on the tool, you have a variety of validation options available:
Monitor indexability. Note: just looking at a robots meta directive might not always be enough – ensure you also check for (accidental) blocking of URLs, either through robots.txt or X-Robots-Tag headers.
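A rough sketch of that "full" indexability check covering all three layers at once – meta robots, X-Robots-Tag header and robots.txt. The URL is a placeholder and it assumes requests plus beautifulsoup4.

```python
# Check meta robots, X-Robots-Tag header AND robots.txt for one URL.
import urllib.robotparser

import requests
from bs4 import BeautifulSoup

URL = "https://www.example.com/important-page/"  # placeholder

resp = requests.get(URL, timeout=10)

# 1) meta robots directive in the HTML
soup = BeautifulSoup(resp.text, "html.parser")
meta = soup.find("meta", attrs={"name": "robots"})
meta_noindex = bool(meta and "noindex" in meta.get("content", "").lower())

# 2) X-Robots-Tag HTTP header
header_noindex = "noindex" in resp.headers.get("X-Robots-Tag", "").lower()

# 3) robots.txt blocking
rp = urllib.robotparser.RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")
rp.read()
blocked_by_robots = not rp.can_fetch("Googlebot", URL)

assert not (meta_noindex or header_noindex or blocked_by_robots), (
    f"Indexability problem: meta={meta_noindex}, header={header_noindex}, "
    f"robots.txt blocked={blocked_by_robots}"
)
print("URL is indexable.")
```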
Because you want to know when (old) redirects stop working, or when relevant URLs all of a sudden break and return a 4xx error. This should have been an HTTP 200 status code – so maybe someone redirected it by accident?
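In script form, that boils down to comparing actual status codes against the ones you expect; the URLs and expected codes below are placeholders.

```python
# Compare actual HTTP status codes against expectations, so you notice when a
# 200 suddenly becomes a 301/404 (or an old 301 stops working).
import requests

EXPECTED = {
    "https://www.example.com/": 200,
    "https://www.example.com/old-category/": 301,   # legacy redirect that must keep working
    "https://www.example.com/retired-product/": 410,
}

for url, expected_status in EXPECTED.items():
    actual = requests.get(url, allow_redirects=False, timeout=10).status_code
    flag = "OK   " if actual == expected_status else "ALERT"
    print(f"{flag} {url}: expected {expected_status}, got {actual}")
```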
Use RegEx to match exactly the URL patterns you want. Don't know how to write RegEx? Check out this fantastic guide for marketers by Annie Cushing, with tons of real-world examples. More: https://pa.ag/30uPiak
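As a tiny, purely illustrative example (made-up URL patterns, not from the guide): only keep alerts for the sections you actually care about.

```python
# Filter monitored URLs down to the patterns you care about.
import re

PATTERN = re.compile(r"^https://www\.example\.com/(shop|category)/.+")

urls = [
    "https://www.example.com/shop/red-shoes/",
    "https://www.example.com/blog/some-post/",
]
for url in urls:
    print(url, "-> monitored" if PATTERN.match(url) else "-> ignored")
```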
Viewing issues on a timeline also makes it very easy to see whether crucial issues were tackled in time, and whether there's a pattern (e.g. caused by certain types of deployments).
Especially for large sites, this really helps you understand the impact straight away: ContentKing sends a weekly report to update you on detected changes and issues (you can customise which websites are included via your email settings), and it provides a health score per segment, allowing you to easily prioritise fixes.
ContentKing also does some really cool stuff such as "live discovery", e.g. when links break, including the respective recovery – without you having to re-run your crawl (for example, an alert detected on peakace.agency on Mar 1, 2021 and resolved on Mar 2, 2021).
Ensure you are compliant by serving links/pages for imprint, respective terms (e.g. shipping, pricing, etc.) and necessary opt-outs (tracking) or other legal texts. LeanKoala offers a variety of default "legal" checks; of course, these can be customised and/or extended as needed.
Sure, these aren't purely SEO-related elements – they're still crucial. Some ideas (a script sketch covering a few of them follows below):
▪ URL: ✓ clean URL & HTTP 200 status ✓ self-canonicalised ✓ matches schema.org breadcrumb
▪ Product recommendations: ✓ element is present ✓ minimum number of elements present ✓ schema.org mark-up for related products
▪ Benefits & trust: ✓ elements present
▪ Legal & payment info: ✓ necessary legal elements/links (imprint, privacy) are present ✓ payment information available
▪ HTTPS: ✓ valid SSL certificate ✓ check corresponding non-HTTPS URL for 301 redirect
▪ Price: ✓ amount > 0,00 € ✓ currency according to geo setup
▪ Shipping info: ✓ elements present
▪ Product description: ✓ description present ✓ length OK (again, define threshold) ✓ if present: check for internal links or structured elements such as <li>
▪ Product: ✓ title present ✓ length OK (define threshold) ✓ correct HTML mark-up, e.g. <h1> ✓ schema.org product mark-up ✓ match availability with indexing rules
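Here's a hedged sketch of three of those checks (<h1> present, schema.org Product mark-up, price > 0). The URL is a placeholder, and it assumes the product data is exposed as JSON-LD – adapt it to your own templates if you use microdata instead.

```python
# Spot-check a product page: <h1>, schema.org Product JSON-LD, price > 0.
import json

import requests
from bs4 import BeautifulSoup

URL = "https://www.example.com/product/123/"  # placeholder
soup = BeautifulSoup(requests.get(URL, timeout=10).text, "html.parser")

# Title / <h1>
h1 = soup.find("h1")
assert h1 and h1.get_text(strip=True), "Missing or empty <h1>"

# Schema.org Product mark-up (assuming JSON-LD)
product_data = None
for script in soup.find_all("script", type="application/ld+json"):
    try:
        data = json.loads(script.string or "")
    except json.JSONDecodeError:
        continue
    if isinstance(data, dict) and data.get("@type") == "Product":
        product_data = data
assert product_data, "No schema.org Product JSON-LD found"

# Price > 0 (taken from the structured data to avoid brittle CSS selectors)
offers = product_data.get("offers", {})
price = float(offers.get("price", 0)) if isinstance(offers, dict) else 0
assert price > 0, f"Suspicious price: {price}"
print("Product checks passed:", h1.get_text(strip=True), price)
```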
For Google, these examples are each two different URLs:
▪ https://pa.ag/url-A/ vs https://pa.ag/url-a/ (case sensitivity)
▪ https://pa.ag/url-b vs https://pa.ag/url-b/ (trailing slashes)
▪ https://pa.ag vs https://www.pa.ag (non-www vs www)
▪ http://pa.ag vs https://pa.ag (HTTP vs HTTPS)
The same logic applies to your production server vs. a staging/testing server. Dealing with duplication issues (a quick redirect check follows below):
▪ 301 redirect: e.g. non-www vs www, HTTP vs HTTPS, casing (upper/lower), trailing slashes, index pages (index.php)
▪ noindex: e.g. white labelling, internal search result pages, work-in-progress content, PPC and other landing pages
▪ (Self-referencing) canonicals: e.g. for parameters used for tracking, session IDs, printer-friendly versions, PDF to HTML, etc.
▪ 403 password protection: e.g. staging/development servers
▪ 404/410 gone: e.g. feed content that needs to go fast, other outdated, irrelevant or low-quality content
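To catch regressions in the 301 setup, a small sketch like this can verify that the common duplicate variants all redirect to the one canonical version (the URLs are placeholders; trailing-slash checks work the same way on deeper paths).

```python
# Verify that duplicate URL variants 301 to the canonical homepage.
import requests

CANONICAL = "https://www.example.com/"
VARIANTS = [
    "http://www.example.com/",            # HTTP vs HTTPS
    "https://example.com/",               # non-www vs www
    "https://www.example.com/INDEX.PHP",  # casing / index pages
]

for variant in VARIANTS:
    resp = requests.get(variant, timeout=10)  # follows redirects
    first_hop = resp.history[0].status_code if resp.history else None
    ok = first_hop == 301 and resp.url == CANONICAL
    print("OK   " if ok else "ALERT", variant, "->", first_hop, resp.url)
```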
Make sure the server is locked down properly to ensure your content doesn't get indexed in advance – and set up monitoring accordingly (see the sketch below). Methodologies with their pros and cons:
▪ noindex (meta tag/header) – Pros: external tools can access without separate access rules; URLs are definitely not indexed. Cons: indexing rules cannot be tested fully (everything is noindex); waste of crawl budget.
▪ robots.txt – Pros: external tools can access without separate access rules; no crawl budget is wasted. Cons: indexing rules cannot be tested fully (only with a robots.txt override); if linked, test URLs may appear in the index (without title/metas).
▪ Password protection (.htaccess) – Pros: no crawl budget is wasted; URLs are definitely not indexed; everything can be tested properly. Cons: external tools must be able to handle password authentication.
▪ IP-based access – Pros: no crawl budget is wasted; URLs are definitely not indexed; everything can be tested properly. Cons: external tools must be able to handle IP-based authentication.
▪ VPN – Pros: completely safe! Cons: so safe that only a few tools can handle it!
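A minimal sketch of that monitoring (hostnames are placeholders): production should answer with 200, while the locked-down staging server should demand authentication (401) or at least refuse access (403) – anything else means the lockdown broke.

```python
# Check that staging stays locked down while production stays reachable.
import requests

CHECKS = {
    "https://www.example.com/": (200,),          # production
    "https://staging.example.com/": (401, 403),  # staging behind .htaccess / IP allow-list
}

for url, allowed in CHECKS.items():
    status = requests.get(url, allow_redirects=False, timeout=10).status_code
    print("OK   " if status in allowed else "ALERT", url, status)
```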
Only redirect users based on their location if you have to (e.g. due to licensing). If the user is guided to a special language folder based on their IP, the redirect needs to be temporary (302 or 307), otherwise caching issues will come up:
Ensure proper redirects are in place according to the request's origin: e.g. requesting www.domain.com from an IP address located in the respective geographical region (here: Germany) should trigger a 302 redirect to www.domain.com/de/.
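Here's a hedged sketch of how you might confirm the redirect is temporary (302/307) rather than a cached 301. Note that the request has to originate from the region in question – e.g. run the check from a server there, or through a proxy; the proxy URL below is a pure placeholder.

```python
# Verify the geo redirect uses a temporary status code and the right target.
import requests

URL = "https://www.domain.com/"
GERMAN_PROXY = {"https": "http://user:pass@proxy.example.com:8080"}  # placeholder

resp = requests.get(URL, allow_redirects=False, timeout=10, proxies=GERMAN_PROXY)
location = resp.headers.get("Location", "")

assert resp.status_code in (302, 307), f"Geo redirect uses {resp.status_code} – should be 302/307"
assert location.rstrip("/").endswith("/de"), f"Unexpected redirect target: {location}"
print("Geo redirect looks fine:", resp.status_code, location)
```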
Google renders almost every URL – but why? According to W3Techs, JavaScript is used […] by 97.1% of all websites (source: https://pa.ag/3t0RVgv). So what you really want is a rendered preview of any given website, including "executed" JavaScript.
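If you want to script a rendered check yourself, a sketch with Playwright works (assumptions: "pip install playwright" plus "playwright install chromium"; URL and selector are placeholders). The point is to inspect the DOM after JavaScript ran, not the raw HTML.

```python
# Render a page with headless Chromium and check for a JS-injected element.
from playwright.sync_api import sync_playwright

URL = "https://www.example.com/"  # placeholder
SELECTOR = "#page-content"        # element that only exists after JS rendering

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto(URL, wait_until="networkidle")
    element = page.query_selector(SELECTOR)
    print("Rendered element found:", bool(element))
    browser.close()
```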
Select and relax: visualping lets you know when the page (or a selected area) changes. What we like about the tool / key features:
▪ Lets you specify %-change (any, tiny, medium, major)
▪ Proxy functionality for geo-specific monitoring
▪ Down to checking for changes every 5 minutes
▪ Rendering capabilities, so also great for JS-heavy sites
▪ Loads more!
Simple, yet efficient uptime/availability monitoring of domains and/or URLs. More: https://uptimerobot.com/#features What we like about the tool / key features:
▪ Super simple interface, 1 minute and you're good to go
▪ "TV mode" to run monitoring on large screens
▪ Allows custom port and service monitoring
▪ Pre-defined maintenance windows to automatically pause and re-enable monitoring
▪ Seamless integration with status pages
Pingbreak relies on Twitter as its default communication channel (account required) but allows alerting to Slack, Discord, Mattermost, Telegram and custom alert services. More: https://pingbreak.com What we like about the tool / key features:
▪ Entirely free
▪ Unlimited monitoring of websites, at 1-minute intervals
▪ Webhook support allows alerting to almost any service you can think of
Very affordable solution (incl. uptime monitoring + API, pricing starts at $49 monthly). More: https://www.testomato.com What we like about the tool / key features:
▪ Robust and easy-to-use interface
▪ Nice email reports, including direct notifications to third-party services such as Slack
▪ Errors are automatically re-tested/checked, also from other (geo) locations, meaning fewer false alarms
▪ Specific checks for server headers, redirects, etc. can be set up with one click
▪ Extremely simple setup, not only for URL checks but also for specific content monitoring tasks
Setting up checks is super simple and very visual – totally doable for SEOs without any coding skills. Setup is very easy, e.g. custom matching of HTML elements using XPath: in this case we're actually monitoring a page for the presence of a certain <div> with the id "page-content".
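If you'd rather replicate that XPath check in a script of your own, a quick sketch could look like this (the URL is a placeholder; it assumes requests and lxml are installed).

```python
# Check that the page contains a <div> with id="page-content" via XPath.
import requests
from lxml import html

URL = "https://www.example.com/"  # placeholder

tree = html.fromstring(requests.get(URL, timeout=10).content)
matches = tree.xpath('//div[@id="page-content"]')
print("Element present" if matches else "ALERT: element missing")
```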
With LeanKoala you can move beyond just simple testing; it comes with 30+ tools that check for different characteristics to ensure a website is functioning correctly. More: https://www.leankoala.com/en/features
Want to share monitoring information with your customers? Try out either status.io or Atlassian Statuspage to make monitoring info available to the public. More: https://status.io & https://pa.ag/3rCpTaS
Automate your workflows: Zapier allows you to push data from one software system to another without writing custom code. You can also create multi-step Zaps. Check out this tutorial: https://pa.ag/2PTbQja
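And if a tool has no native Zapier integration, you can usually still feed it in via a "Webhooks by Zapier" catch hook – a hedged sketch below; the hook URL and the payload fields are placeholders (Zapier generates your actual hook URL when you add the trigger).

```python
# Push a custom alert into a Zap via a Webhooks by Zapier catch hook.
import requests

ZAPIER_HOOK = "https://hooks.zapier.com/hooks/catch/123456/abcdef/"  # placeholder

payload = {
    "project": "peakace.agency monitoring",
    "issue": "robots.txt changed",
    "url": "https://www.example.com/robots.txt",
}
resp = requests.post(ZAPIER_HOOK, json=payload, timeout=10)
print(resp.status_code, resp.text)
```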
Another use case for visual checks: Google's transparency report website was built using the popular JavaScript framework AngularJS, so you can't simply check the raw HTML mark-up. Tools like visualping render the target URL's content and can therefore also monitor websites built, for example, entirely in JavaScript.
Keep an eye on services such as Safe Web, Web of Trust, Avira Browser Safety, BitDefender Traffic Light, etc. (source: https://pa.ag/3rwTgv6). These services usually power browser extensions via API, and if your domain is flagged as "suspicious", such an extension (if installed) might prevent activity, e.g. a click in Google's SERP that would normally lead a visitor to your website.
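One way to monitor the same "is my domain flagged?" question programmatically is Google's Safe Browsing Lookup API (v4) – not one of the extensions named above, just an example of a reputation check you can script. The API key and URL below are placeholders; you need a key from Google Cloud.

```python
# Query the Google Safe Browsing Lookup API (v4) for a single URL.
import requests

API_KEY = "YOUR_API_KEY"                # placeholder
CHECK_URL = "https://www.example.com/"  # placeholder

endpoint = f"https://safebrowsing.googleapis.com/v4/threatMatches:find?key={API_KEY}"
body = {
    "client": {"clientId": "monitoring-sketch", "clientVersion": "1.0"},
    "threatInfo": {
        "threatTypes": ["MALWARE", "SOCIAL_ENGINEERING", "UNWANTED_SOFTWARE"],
        "platformTypes": ["ANY_PLATFORM"],
        "threatEntryTypes": ["URL"],
        "threatEntries": [{"url": CHECK_URL}],
    },
}
matches = requests.post(endpoint, json=body, timeout=10).json().get("matches", [])
print("ALERT: domain flagged!" if matches else "No Safe Browsing matches.")
```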
(Technical) issues can also be measured using Google Analytics, with no external monitoring needed to send those alerts at all. Certain measurements, such as 404 errors and JS errors, need to be set up via Google Tag Manager.
Detect faulty tracking: monitor your very own marketing tags and ensure they run properly by using GTM's callback functionality, which passes the data to Google Cloud (see the sketch below). How to:
1. Set up the Google Tag Manager Monitoring template
2. Choose the website/marketing tracking tags to be monitored
3. Define the request URL to talk to Google Cloud Functions (GET request endpoint)
4. Send Google Tag Manager callback data to Google Cloud
5. Connect the Cloud Function to send data to a BigQuery table
6. Evaluate failing tracking tags in Google BigQuery
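A very rough sketch of steps 3–5: an HTTP Cloud Function (Python runtime) that accepts the GET request fired by the GTM monitoring callback and streams the received query parameters into BigQuery. The project/dataset/table ID and the table schema (received_at TIMESTAMP, payload STRING) are my assumptions – the exact parameters depend on how you configured the template.

```python
# HTTP Cloud Function: store whatever the GTM callback sent as one BigQuery row.
import json
from datetime import datetime, timezone

from google.cloud import bigquery  # pip install google-cloud-bigquery

TABLE_ID = "my-project.gtm_monitoring.tag_status"  # placeholder

def gtm_monitor(request):
    """HTTP entry point for the GET request sent by the GTM monitor callback."""
    row = {
        "received_at": datetime.now(timezone.utc).isoformat(),
        "payload": json.dumps(dict(request.args)),  # all query parameters as JSON text
    }
    errors = bigquery.Client().insert_rows_json(TABLE_ID, [row])
    return ("error", 500) if errors else ("ok", 200)
```

From there, a scheduled query (or a simple SELECT) over the table surfaces tags that stopped firing.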
The current Core Web Vitals set focuses on three aspects of user experience – loading, interactivity, and visual stability – and includes the following metrics/thresholds (source: https://pa.ag/3irantb; a quick check sketch follows below):
▪ LCP measures loading performance. To provide a good UX, LCP should occur within 2.5 seconds.
▪ FID measures interactivity. To provide a good UX, pages should have an FID under 100 milliseconds.
▪ CLS measures visual stability. To provide a good UX, pages should maintain a CLS of less than 0.1.
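A hedged sketch for monitoring those thresholds via the PageSpeed Insights v5 API, which returns CrUX field data per metric: here we simply check that LCP, FID and CLS all land in the "FAST" bucket (i.e. within the thresholds above). The URL is a placeholder; an API key is optional for low volumes but recommended.

```python
# Pull CrUX field data from the PageSpeed Insights API and report the buckets.
import requests

URL = "https://www.example.com/"  # placeholder
API = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"

data = requests.get(API, params={"url": URL, "strategy": "MOBILE"}, timeout=60).json()
metrics = data.get("loadingExperience", {}).get("metrics", {})

for key in ("LARGEST_CONTENTFUL_PAINT_MS",
            "FIRST_INPUT_DELAY_MS",
            "CUMULATIVE_LAYOUT_SHIFT_SCORE"):
    category = metrics.get(key, {}).get("category", "NO DATA")
    print(f"{key}: {category}")  # FAST / AVERAGE / SLOW
```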
By far the most comprehensive toolset on the market, allowing you to monitor ANY metric you deem relevant – not only for yourself but also for your competitors. More: https://speedcurve.com