The Technical Monitoring Compendium - SMX Munich Virtual 2021
My talk from SMX Munich 2021, titled The Technical Monitoring Compendium, covering everything you need to know about quality control for your website, including tools, processes, and much more!
Back then, I used to explain SEO to C-suites like this (yes, that's an original, wonderfully ugly 4:3 slide from way back then):
1. Build an (optimised) site that's easy for crawlers to understand. Plain and simple HTML wins!
2. New content daily (quantity over quality – if it's readable, it'll do!)
3. Most importantly: links, links, and more links! Quality doesn't matter – a link is a link, isn't it?
The three cornerstones of SEO – 2021 edition:
- Technical: ensure crawl- & renderability, optimise architecture, international targeting and linking.
- Content: provide unique, holistic coverage of relevant topics for your readership.
- Off-page ("Get people to talk about us."): external linking, citations, brand mentions & PR.
Together, these build trust and support the user experience.
There's a broad range of tools available, from very simple one-off availability testing to large-scale, continuous monitoring with trend and comparison reporting … and many, many more!
Monitoring on host, domain and server level – the following is usually checked on a global level, once per domain (a minimal monitoring sketch follows the list):
- robots.txt (availability & changes)
- 404, 410 & 503 error pages (proper status returned)
- Domain name (expiration)
- XML sitemap (availability & changes)
- Nameserver & MX records (hosting/backend changes)
- IP address (changes)
- SSL certificate (expiration)
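As a minimal sketch of two of these checks – a robots.txt change fingerprint and SSL certificate expiry – here's what a DIY version could look like using only the Python standard library (the domain is a placeholder; in practice you'd persist the hash between runs and compare):

```python
import hashlib
import socket
import ssl
import urllib.request
from datetime import datetime, timezone

DOMAIN = "example.com"  # hypothetical domain; swap in your own

def robots_txt_fingerprint(domain: str) -> str:
    """Fetch robots.txt and return a content hash to diff between runs."""
    with urllib.request.urlopen(f"https://{domain}/robots.txt", timeout=10) as resp:
        return hashlib.sha256(resp.read()).hexdigest()

def ssl_days_remaining(domain: str) -> int:
    """Return the number of days until the SSL certificate expires."""
    ctx = ssl.create_default_context()
    with ctx.wrap_socket(socket.create_connection((domain, 443), timeout=10),
                         server_hostname=domain) as sock:
        expires = datetime.strptime(
            sock.getpeercert()["notAfter"],  # e.g. "Jun  1 12:00:00 2025 GMT"
            "%b %d %H:%M:%S %Y %Z").replace(tzinfo=timezone.utc)
    return (expires - datetime.now(timezone.utc)).days

if __name__ == "__main__":
    print("robots.txt hash:", robots_txt_fingerprint(DOMAIN))
    print("SSL days remaining:", ssl_days_remaining(DOMAIN))
```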
Tailor to your own needs (and preferences): you might have a different sitemap (index) URL, or want to change the intervals for certain (expiry) notification checks:
A notification pops up (depending on your tool of choice). By default, almost all tools rely on email, but some also allow other methods such as Slack; more on that later. A typical alert contains the monitoring project name/identifier, the type of issue, the affected domain/URLs with issue details, and a link to view the alert details on the monitoring platform (e.g. "Alert detected on peakace.agency – Detected on Feb 19, 2021").
Monitoring availability (e.g. HTTP 200) is often not enough. For your robots.txt file, you need to understand when its contents have changed: RYTE sends a handy notification telling you which specific line has been added to your robots.txt file, and RYTE's robots.txt history feature allows you to seamlessly roll back to older versions.
Beyond validation: monitor URL inventory changes. You might also want to know if certain URLs dropped out of your XML sitemap; customise the severity as needed.
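A minimal sketch of such an inventory diff, assuming a single (non-index) sitemap at a hypothetical URL; a sitemap index file would need one extra level of iteration, and in practice you'd load the previous snapshot from disk or a database:

```python
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_URL = "https://example.com/sitemap.xml"  # hypothetical
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(url: str) -> set:
    """Fetch an XML sitemap and return the set of <loc> URLs it lists."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        tree = ET.fromstring(resp.read())
    return {loc.text.strip() for loc in tree.findall(".//sm:loc", NS)}

previous = set()  # in a real setup: the last run's snapshot
current = sitemap_urls(SITEMAP_URL)
print("URLs dropped from sitemap:", previous - current)
print("URLs newly added:", current - previous)
```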
Default must-have HTML mark-up monitoring. As a minimum, check for the title, meta description and canonical tag (if used). Whoops… looks like someone forgot to change the subdomain to WWW for the production server… ;)
Even for those "basic" checks, customisation is crucial. Maybe you don't just want to know when "something" changed, but precisely what's new. Depending on the tool, you have a variety of validation options available:
Also, we need to talk about (and monitor) indexability. Note: (potentially) just looking at the robots meta directive might not always be enough: ensure you also check for (accidental) blocking of URLs through either robots.txt or X-Robots-Tag headers.
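A minimal sketch checking all three places where indexability can break – the robots meta tag, the X-Robots-Tag header, and robots.txt. The URL and user agent are placeholders, and the meta regex assumes the common name-before-content attribute order:

```python
import re
import urllib.request
import urllib.robotparser
from urllib.parse import urlsplit

URL = "https://example.com/some-page/"  # hypothetical
UA = "Googlebot"

req = urllib.request.Request(URL, headers={"User-Agent": UA})
with urllib.request.urlopen(req, timeout=10) as resp:
    html = resp.read().decode("utf-8", errors="replace")
    x_robots = resp.headers.get("X-Robots-Tag") or "(none)"

# Robots meta tag in the HTML source (simplified attribute matching).
meta = re.search(
    r'<meta[^>]+name=["\']robots["\'][^>]+content=["\']([^"\']+)',
    html, re.IGNORECASE)

# robots.txt rules for this URL, evaluated via the stdlib parser.
parts = urlsplit(URL)
rp = urllib.robotparser.RobotFileParser(f"{parts.scheme}://{parts.netloc}/robots.txt")
rp.read()

print("meta robots:  ", meta.group(1) if meta else "(none)")
print("X-Robots-Tag: ", x_robots)
print("robots.txt allows crawling:", rp.can_fetch(UA, URL))
```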
Speaking of headers: monitoring HTTP status codes. Because you want to know when (old) redirects stop working, or relevant URLs all of a sudden break and return a 4xx error. (In the example shown: this should have been an HTTP 200 status code – so maybe someone redirected it by accident?)
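A minimal sketch comparing live status codes against an expectation map, so you notice when an old 301 silently turns into a 404 (the URLs are placeholders):

```python
import urllib.error
import urllib.request

EXPECTED = {
    "https://example.com/": 200,
    "https://example.com/old-page/": 301,  # legacy redirect that must keep working
}

class FirstHop(urllib.request.HTTPRedirectHandler):
    """Don't follow redirects, so we see the first hop's status code."""
    def redirect_request(self, *args, **kwargs):
        return None

opener = urllib.request.build_opener(FirstHop())
for url, expected in EXPECTED.items():
    try:
        actual = opener.open(urllib.request.Request(url, method="HEAD"),
                             timeout=10).status
    except urllib.error.HTTPError as e:
        actual = e.code  # 3xx/4xx/5xx responses surface here
    print(f"{'OK' if actual == expected else 'ALERT'}: {url} "
          f"expected {expected}, got {actual}")
```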
But what about monitoring XYZ? GA, GSC & GTM tags? Twitter cards & OpenGraph? Different types of schema.org mark-up? Our very own (custom) HTML tags, or the presence of scripts?
Use regular expressions to check for anything you want. In reality, it doesn't really matter what a tool can monitor out of the box, as long as you can make use of RegEx or XPath.
A practical example: finding GSC verification tags. Scenario: you have a website and you want/need to find the GSC verification tag(s) within the HTML mark-up (a sketch follows):
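A minimal sketch of this regex approach: pulling Google Search Console verification tags out of a page's HTML (the sample HTML snippet is made up for illustration):

```python
import re

html = """
<head>
  <meta name="google-site-verification" content="abc123exampletoken" />
</head>
"""

# The GSC tag is a meta element named "google-site-verification";
# we capture the value of its content attribute.
pattern = re.compile(
    r'<meta[^>]+name=["\']google-site-verification["\']'
    r'[^>]+content=["\']([^"\']+)["\']',
    re.IGNORECASE,
)

for token in pattern.findall(html):
    print("GSC verification token found:", token)
```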
Use regular expressions to check for anything you want. Don't know how to write RegEx? Check out this fantastic guide for marketers by Annie Cushing, with tons of real-world examples: https://pa.ag/30uPiak
As an SEO, I certainly want to know when my link graph undergoes a significant change such as this one Tip: monitor the sites‘ main navigation using RegEx
xpather.com does the same for XML Path Language. XPath uses path expressions to select nodes in an HTML/XML document and allows you to navigate through it (a sketch follows). More: http://xpather.com
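A minimal XPath counterpart to the regex example above, using the third-party lxml library (pip install lxml); the sample HTML and the nav id are made up. It also shows the navigation-monitoring tip from the previous slide – counting main-nav links so you can alert when the link graph shrinks:

```python
from lxml import html

doc = html.fromstring("""
<html><head>
  <meta name="google-site-verification" content="abc123exampletoken"/>
</head>
<body><nav id="main-nav"><a href="/products/">Products</a></nav></body></html>
""")

# Select the verification token via an attribute predicate...
tokens = doc.xpath('//meta[@name="google-site-verification"]/@content')
# ...and count main-navigation links, e.g. to alert when the nav shrinks.
nav_links = doc.xpath('//nav[@id="main-nav"]//a/@href')

print("GSC tokens:", tokens)
print("Main nav links:", len(nav_links), nav_links)
```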
The RYTE dashboard provides a handy issue timeline. This also makes it very easy to see whether crucial issues were tackled in time, and whether there's a pattern (e.g. caused by certain types of deployments).
Smart segmentation to make things more tangible. Especially for large sites, this really can help you understand the impact straight away: ContentKing sends a weekly report (e.g. "ContentKing Weekly Report, MAR 1, 2021 – MAR 7, 2021") to update you on detected changes and issues, and provides a health score per segment, allowing you to easily prioritise fixes. Want to customise which websites are included in the report? Configure your email settings.
ContentKing crawls continuously. This allows for some really cool stuff such as "live discovery", e.g. when links break, including their respective recovery – without you having to re-run your crawl (e.g. "Alert detected on peakace.agency – Mar 1, 2021" followed by "Alert resolved – Mar 2, 2021").
Enterprise-level monitoring: crawl depth comparison. Especially for large sites, a spike of 100k+ pages losing all internal links is something I'd certainly want to be notified about:
Legal texts such as imprint, terms, etc. Ensure you are compliant by serving links/pages for your imprint, the respective terms (e.g. shipping, pricing, etc.) and necessary opt-outs (tracking) or other legal texts. LeanKoala offers a variety of default "legal" checks; of course, these can be customised and/or extended as needed.
Even better: go all-in on GDPR compliance monitoring. RYTE has a very handy GDPR compliance report showing external scripts on a website that are active before the user gives their consent:
Change of available inventory (e.g. in categories). Monitor categories and/or other listing pages and ensure they have a certain minimum number of products available at all times (a sketch follows):
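A minimal sketch of such a minimum-inventory check: count the product tiles on a category page and alert below a threshold. The URL, the tile class name and the threshold are all assumptions about your shop's mark-up (lxml is third-party: pip install lxml):

```python
import urllib.request
from lxml import html

CATEGORY_URL = "https://example.com/category/shoes/"  # hypothetical
MIN_PRODUCTS = 12  # define your own threshold

with urllib.request.urlopen(CATEGORY_URL, timeout=10) as resp:
    doc = html.fromstring(resp.read())

# Assumes each product tile is rendered with class "product-tile".
tiles = doc.xpath('//div[contains(@class, "product-tile")]')
status = "OK" if len(tiles) >= MIN_PRODUCTS else "ALERT"
print(f"{status}: {len(tiles)} products found (minimum {MIN_PRODUCTS})")
```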
Lost categories vs. new categories. Especially in enterprise e-commerce setups, it's quite common to have a dedicated "shop management" team responsible for maintaining the category tree…
Monitoring for product details: pricing, availability, etc. Sure, these aren't purely SEO-related elements – they're still crucial. Some ideas (a minimal check sketch follows the list):
- URL: clean URL & HTTP 200 status; self-canonicalised; matches the schema.org breadcrumb
- HTTPS: valid SSL certificate; the corresponding non-HTTPS URL 301-redirects
- Product: title present; length OK (define a threshold); correct HTML mark-up; schema.org product mark-up; availability matches the indexing rules
- Price: amount > €0.00; currency according to the geo setup
- Product description: description present; length OK (again, define a threshold); if present, check for internal links or structured elements
- Product recommendations: element is present; minimum number of elements present; schema.org mark-up for related products
- Shipping info: elements present
- Benefits & trust: elements present
- Legal & payment info: necessary legal elements/links (imprint, privacy) are present; payment information available
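A minimal sketch covering three of these product-detail checks – HTTP 200, a non-empty title, and a schema.org Product price above zero in JSON-LD. The URL is hypothetical, and the JSON-LD parsing is deliberately simplified (real mark-up can nest @graph structures or use offer arrays):

```python
import json
import re
import urllib.request

URL = "https://example.com/product/123"  # hypothetical

req = urllib.request.Request(URL, headers={"User-Agent": "monitoring-bot"})
with urllib.request.urlopen(req, timeout=10) as resp:
    assert resp.status == 200, f"unexpected status {resp.status}"
    page = resp.read().decode("utf-8", errors="replace")

assert re.search(r"<title>\s*\S", page), "missing or empty <title>"

# Look for JSON-LD blocks and check the Product offer price.
price_ok = False
for block in re.findall(
        r'<script[^>]+application/ld\+json[^>]*>(.*?)</script>', page, re.DOTALL):
    try:
        data = json.loads(block)
    except json.JSONDecodeError:
        continue
    if isinstance(data, dict) and data.get("@type") == "Product":
        offers = data.get("offers") or {}
        if isinstance(offers, dict) and float(offers.get("price", 0) or 0) > 0:
            price_ok = True

assert price_ok, "no Product offer with a price > 0 found"
print("product page checks passed")
```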
Most common causes of duplicate content. For Google, each of these examples is two different URLs:
- HTTP vs HTTPS: http://pa.ag vs https://pa.ag
- non-www vs www: https://pa.ag vs https://www.pa.ag
- Trailing slashes: https://pa.ag/url-b vs https://pa.ag/url-b/
- Case sensitivity: https://pa.ag/url-A/ vs https://pa.ag/url-a/
- Production server vs staging/testing server
Dealing with duplication issues:
- 301 redirect: e.g. non-www vs www, HTTP vs HTTPS, casing (upper/lower), trailing slashes, index pages (index.php)
- noindex: e.g. white labelling, internal search result pages, work-in-progress content, PPC and other landing pages
- (Self-referencing) canonicals: e.g. parameters used for tracking, session IDs, printer-friendly versions, PDF-to-HTML, etc.
- 403 password protection: e.g. staging/development servers
- 404/410 gone: e.g. feed-based content that needs to go fast, other outdated/irrelevant or low-quality content
Advanced URL checks for top-notch canonicalisation. This only really works well if you've cleaned up your URL structure beforehand (a sketch follows):
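A minimal sketch checking that the common duplicate-content variants from the previous slide all 301 to one canonical URL (domain and canonical target are placeholders):

```python
import urllib.error
import urllib.request

CANONICAL = "https://www.example.com/url-b/"
VARIANTS = [
    "http://www.example.com/url-b/",   # HTTP vs HTTPS
    "https://example.com/url-b/",      # non-www vs www
    "https://www.example.com/url-b",   # trailing slash
    "https://www.example.com/URL-B/",  # casing
]

class FirstHop(urllib.request.HTTPRedirectHandler):
    """Stop after the first response so we can inspect the redirect itself."""
    def redirect_request(self, *args, **kwargs):
        return None

opener = urllib.request.build_opener(FirstHop())
for variant in VARIANTS:
    try:
        opener.open(variant, timeout=10)
        print(f"ALERT: {variant} returned no redirect at all")
    except urllib.error.HTTPError as e:
        target = e.headers.get("Location", "(none)")
        ok = e.code == 301 and target == CANONICAL
        print(f"{'OK' if ok else 'ALERT'}: {variant} -> {e.code} {target}")
```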
Different types of staging/test server setups are possible. Make sure the server is locked down properly to ensure your content doesn't get indexed in advance – and set up monitoring accordingly:
- noindex (meta tag/header) – Pros: external tools can access without separate access rules; URLs are definitely not indexed. Cons: indexing rules cannot be tested fully (everything is noindex); wastes crawl budget.
- robots.txt – Pros: external tools can access without separate access rules; no crawl budget is wasted. Cons: indexing rules cannot be tested fully (only with a robots.txt override); if linked, test URLs may appear in the index (without titles/metas).
- Password protection (.htaccess) – Pros: no crawl budget is wasted; URLs are definitely not indexed; everything can be tested properly. Cons: external tools must be able to handle password authentication.
- IP-based access – Pros: no crawl budget is wasted; URLs are definitely not indexed; everything can be tested properly. Cons: external tools must be able to handle IP-based authentication.
- VPN – Pros: completely safe! Cons: so safe, only a few tools can handle it!
Monitoring your staging/test server URLs. Depending on your server's setup, you need to check either the HTTP response code or the (non-)indexability (a sketch follows):
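A minimal sketch for a staging host locked down via .htaccess: anonymous requests should answer 401/403, and anything else should trigger an alert (the host is a placeholder):

```python
import urllib.error
import urllib.request

STAGING_URL = "https://staging.example.com/"  # hypothetical

try:
    resp = urllib.request.urlopen(STAGING_URL, timeout=10)
    print(f"ALERT: staging answered {resp.status} – it may be publicly reachable!")
except urllib.error.HTTPError as e:
    if e.code in (401, 403):
        print(f"OK: staging is locked down ({e.code})")
    else:
        print(f"ALERT: unexpected status {e.code}")
```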
Don't automatically redirect users without giving them options. Better to just let the user pick the suggested/relevant international website instead:
Geo redirects how-to (e.g. if you need them due to licensing). If the user is guided to a specific language folder based on their IP, the redirect needs to be temporary (302 or 307); otherwise, caching issues will come up:
Geo redirect monitoring: because this always "breaks". Ensure proper redirects are in place according to the request's origin – e.g. requesting www.domain.com from an IP address located in a given geographical region should trigger a 302 redirect to www.domain.com/de/ (a sketch follows).
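A minimal sketch of a geo-redirect check. Since geo-targeting depends on the request's origin IP, this assumes you have (hypothetical) regional HTTP proxies available, one per market you want to verify:

```python
import urllib.error
import urllib.request

REGION_PROXIES = {"de": "http://de-proxy.example.com:8080"}  # hypothetical
EXPECTED = {"de": ("https://www.example.com/de/", 302)}
START_URL = "https://www.example.com/"

class FirstHop(urllib.request.HTTPRedirectHandler):
    """Inspect the first redirect instead of following it."""
    def redirect_request(self, *args, **kwargs):
        return None

for region, proxy in REGION_PROXIES.items():
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy, "https": proxy}), FirstHop())
    try:
        opener.open(START_URL, timeout=10)
        print(f"ALERT ({region}): no redirect issued at all")
    except urllib.error.HTTPError as e:
        target, code = EXPECTED[region]
        location = e.headers.get("Location", "(none)")
        ok = e.code == code and location == target
        print(f"{'OK' if ok else 'ALERT'} ({region}): {e.code} -> {location}")
```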
You're all aware of this by now, right? Google renders almost every URL – but why? According to W3Techs, JavaScript is used […] by 97.1% of all websites (source: https://pa.ag/3t0RVgv). Shown: a rendered preview of a given website, including "executed" JavaScript.
Google doesn't rely on just the HTML mark-up – so you can't either. Otherwise, on JS-heavy websites, you couldn't do any form of content monitoring/checks at all (as the content isn't part of the regular mark-up).
visualping.io – website change detection & alerts. Select and relax: visualping lets you know when a page (or a selected area of it) changes. What we like about the tool / key features:
- Lets you specify the %-change threshold (any, tiny, medium, major)
- Proxy functionality for geo-specific monitoring
- Checks for changes as often as every 5 minutes
- Rendering capabilities, so also great for JS-heavy sites
- Loads more!
hexowatch.com – monitor any website for visual changes. Your AI sidekick to monitor any website for changes to visuals, content, source code, technology, availability, or price.
uptimerobot.com – continuous uptime checks every 5 minutes. Simple, yet efficient uptime/availability monitoring of domains and/or URLs. More: https://uptimerobot.com/#features
What we like about the tool / key features:
- Super simple interface – 1 minute and you're good to go
- "TV mode" to run monitoring on large screens
- Allows custom port and service monitoring
- Pre-defined maintenance windows to automatically pause and re-enable monitoring
- Seamless integration with status pages
pingbreak.com – free BETA uptime monitor. Pingbreak relies on Twitter as its default communication channel (account required) but also allows alerting to Slack, Discord, Mattermost, Telegram and custom alert services. More: https://pingbreak.com
What we like about the tool / key features:
- Entirely free
- Unlimited monitoring of websites, at 1-minute intervals
- Webhook support allows alerting to almost any service you can think of
testomato.com – website content & uptime monitoring. Very affordable solution (incl. uptime monitoring + API; pricing starts at $49 monthly). More: https://www.testomato.com
What we like about the tool / key features:
- Robust and easy-to-use interface
- Nice email reports, including direct notification to 3rd-party services such as Slack
- Errors are automatically re-tested/checked, also from other (geo-)locations, meaning fewer false alarms
- Specific checks for server headers, redirects, etc. can be set up with one click
- Extremely simple setup, not only for URL checks but also for specific content monitoring tasks
testomato.com – website content & uptime monitoring. Setting up checks is super simple and very visual; totally doable for SEOs without any coding skills. Very easy setup, e.g. custom matching of HTML elements using XPath – in this case, we're actually monitoring a page for an element with the id "page-content".
leankoala.com – monitoring meets testing. With Leankoala, you can move beyond just simple testing; it comes with 30+ tools that check for different characteristics to ensure a website is functioning correctly. More: https://www.leankoala.com/en/features
Need to publish monitoring info to your customers? Try out either status.io or Atlassian Statuspage to make monitoring info available to the public. More: https://status.io & https://pa.ag/3rCpTaS
zapier.com – connect your apps and automate workflows. Zapier allows you to push data from one software system to another, without writing custom code. You can also create multi-step Zaps. Check out this tutorial: https://pa.ag/2PTbQja
integromat.com – complex automation made easy. Harder to use, yet allows for more complex workflows; a really strong Zapier alternative. More: https://www.integromat.com
pipedream.com – testing webhooks & APIs made easy. Webhooks allow you to send real-time data from one application to another whenever a given event occurs. More: https://pipedream.com
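A minimal sketch of the webhook pattern itself: whenever a monitoring check fails, POST a JSON payload to an endpoint such as a Pipedream or Slack-style webhook URL (the endpoint below is a placeholder):

```python
import json
import urllib.request

WEBHOOK_URL = "https://example.pipedream.net/"  # hypothetical endpoint

def send_alert(event: str, url: str, detail: str) -> int:
    """POST an alert payload to the webhook and return the HTTP status."""
    payload = json.dumps({"event": event, "url": url, "detail": detail}).encode()
    req = urllib.request.Request(
        WEBHOOK_URL, data=payload, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status

# Example: fire an alert when a status-code check fails.
print(send_alert("status_mismatch", "https://example.com/old-page/",
                 "expected 301, got 404"))
```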
Google Safe Browsing powers, among other things, warnings in Chrome. "Deceptive site ahead" is a warning in the Chrome browser that can protect users from phishing, scams, and malware-laden sites.
Google Safe Browsing site status monitoring – to ensure your domain doesn't show a warning in Google Chrome and Search. Try it for your domain: https://pa.ag/3cjsuQn
Monitoring this page only works with a tool that does visual checks. Google's transparency report website was built using the popular JavaScript framework AngularJS, so you can't simply check the HTML mark-up. Tools like visualping render the target URL's content and can therefore also monitor websites built entirely in JavaScript, for example.
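As an alternative to visually diffing the transparency-report page, you can also query the Safe Browsing Lookup API (v4) directly – a minimal sketch, assuming you have an API key from the Google Cloud console (the key and the checked URL are placeholders):

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY"  # placeholder
ENDPOINT = f"https://safebrowsing.googleapis.com/v4/threatMatches:find?key={API_KEY}"

body = {
    "client": {"clientId": "monitoring-sketch", "clientVersion": "1.0"},
    "threatInfo": {
        "threatTypes": ["MALWARE", "SOCIAL_ENGINEERING", "UNWANTED_SOFTWARE"],
        "platformTypes": ["ANY_PLATFORM"],
        "threatEntryTypes": ["URL"],
        "threatEntries": [{"url": "https://example.com/"}],  # your domain here
    },
}
req = urllib.request.Request(
    ENDPOINT, data=json.dumps(body).encode(),
    headers={"Content-Type": "application/json"})
with urllib.request.urlopen(req, timeout=10) as resp:
    result = json.load(resp)

# An empty response object means no threats are currently listed.
print("ALERT:" if result.get("matches") else "OK:", result or "no matches")
```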
Other relevant players on the market: Norton Safe Web, Web of Trust, Avira Browser Safety, BitDefender Traffic Light, etc. These services usually power browser extensions via API, and if your domain is flagged as "suspicious", such an extension (if installed) might prevent activity, e.g. a click in Google's SERP that would normally lead a visitor to your website. Source: https://pa.ag/3rwTgv6
Setting up custom alerts directly in Google Analytics. (Technical) issues can also be measured using GA, with no external monitoring needed to send those alerts at all. Note: certain measurements (404 errors, JS errors, etc.) need to be set up via Google Tag Manager.
GTM & Cloud: combine both to measure faulty tracking. Monitor your very own marketing tags and ensure they run properly by using GTM's callback functionality, which passes the data to Google Cloud. How to (a sketch of the receiving Cloud Function follows the list):
1. Set up the Google Tag Manager Monitoring template
2. Choose the website/marketing tracking tags to be monitored
3. Define the request URL to talk to Google Cloud Functions (GET request endpoint)
4. Send Google Tag Manager callback data to Google Cloud
5. Connect the Cloud Function to send data to a BigQuery table
6. Evaluate failing tracking tags in Google BigQuery
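A minimal sketch of the middle steps: an HTTP-triggered Google Cloud Function (Python runtime) that receives the GTM monitoring callback as a GET request and writes the tag status to BigQuery. The dataset/table name and the query-parameter names are assumptions for illustration, not the official template's schema:

```python
from google.cloud import bigquery  # pip install google-cloud-bigquery

TABLE = "my-project.gtm_monitoring.tag_events"  # hypothetical table

def gtm_monitor(request):
    """Cloud Functions HTTP entry point (functions-framework signature)."""
    client = bigquery.Client()
    row = {
        "container_id": request.args.get("containerId"),
        "tag_name": request.args.get("tag"),
        "status": request.args.get("status"),  # e.g. "success" / "failure"
        "page": request.args.get("page"),
    }
    # Streaming insert; returns a list of errors (empty on success).
    errors = client.insert_rows_json(TABLE, [row])
    return ("error", 500) if errors else ("ok", 200)
```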
Analyse, check and review faulty marketing tracking. Using BigQuery, you can review both failing and successful tracking tags – which in turn also means you can spot pages with no tracking at all:
Web performance is crucial – monitor accordingly! The current Core Web Vitals set focuses on three aspects of user experience – loading, interactivity, and visual stability – and includes the following metrics/thresholds (source: https://pa.ag/3irantb; a field-data monitoring sketch follows the list):
- LCP (Largest Contentful Paint) measures loading performance. To provide a good UX, LCP should occur within 2.5 seconds.
- FID (First Input Delay) measures interactivity. To provide a good UX, pages should have an FID under 100 milliseconds.
- CLS (Cumulative Layout Shift) measures visual stability. To provide a good UX, pages should maintain a CLS of less than 0.1.
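A minimal sketch pulling field (CrUX) Core Web Vitals from the PageSpeed Insights API and flagging breaches of the thresholds above. No API key is needed for occasional requests; the page URL is a placeholder, and the script assumes the page actually has CrUX field data:

```python
import json
import urllib.parse
import urllib.request

PAGE = "https://example.com/"  # hypothetical
API = ("https://www.googleapis.com/pagespeedonline/v5/runPagespeed?"
       + urllib.parse.urlencode({"url": PAGE, "strategy": "mobile"}))

with urllib.request.urlopen(API, timeout=60) as resp:
    # "loadingExperience" holds the CrUX field data for this URL.
    metrics = json.load(resp)["loadingExperience"]["metrics"]

THRESHOLDS = {  # "good" thresholds from the slide above
    "LARGEST_CONTENTFUL_PAINT_MS": 2500,
    "FIRST_INPUT_DELAY_MS": 100,
    "CUMULATIVE_LAYOUT_SHIFT_SCORE": 10,  # API reports CLS * 100
}
for name, limit in THRESHOLDS.items():
    p75 = metrics[name]["percentile"]
    print(f"{'OK' if p75 <= limit else 'ALERT'}: {name} p75={p75} (limit {limit})")
```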
SpeedCurve: all you need in #webperf monitoring. By far the most comprehensive toolset on the market, allowing you to monitor ANY metric you deem relevant – not only for yourself but also for your competitors. More: https://speedcurve.com
Care for the slides? www.pa.ag | twitter.com/peakaceag | facebook.com/peakaceag
Take your career to the next level: jobs.pa.ag
Email us: [email protected]
Bastian Grimm, [email protected]