Slide 1

Slide 1 text

The Technical Monitoring Compendium Everything you need to know about quality control for your website Bastian Grimm, Peak Ace AG | @basgr

Slide 2

Slide 2 text

All the way back to 2008… yep, that's 13 years ago! To set the scene, let's go back in time

Slide 3

Slide 3 text

pa.ag @peakaceag 3 Back then, I used to explain SEO to C-suites like this: Yes, that’s an original (wonderfully ugly 4:3 layout) slide from way back then… 1. Build an (optimised) site that’s easy for crawlers to understand. Plain and simple HTML wins! 2. New content daily (quantity over quality – if it’s readable, it’ll do!) 3. Most importantly: links, links, and more links! Quality doesn’t matter - a link is a link, isn’t it?

Slide 4

Slide 4 text

pa.ag @peakaceag 4 The three cornerstones of SEO – 2021 edition Technical (on-page): ensure crawl- & renderability, optimise architecture, intl. targeting and linking. Content (on-page): provide unique, holistic coverage of relevant topics for your readership. Trust (off-page): “get people to talk about us” via external linking, citations, brand mentions & PR. All three ultimately serve the user experience.

Slide 5

Slide 5 text

pa.ag @peakaceag 5 There's a broad range of tools available From very simple “one-off availability testing” all the way to large-scale, continuous monitoring with trend and comparison reporting: … and many, many more!

Slide 6

Slide 6 text

How to keep track of things at every level of your domain Domain-wide monitoring

Slide 7

Slide 7 text

pa.ag @peakaceag 7 Monitoring on host-, domain- and server-level The following is usually checked on a global level, once per domain: robots.txt (availability & changes) 404, 410 & 503 error pages (proper status returned) Domain name (expiration) XML sitemap (availability & changes) Nameserver & MX records (hosting/backend changes) IP address (changes) SSL certificate (expiration)
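To make this concrete, here is a minimal Python sketch (not part of any particular tool) covering three of these checks: SSL certificate expiry, robots.txt availability and proper 404 handling. It assumes the `requests` library and a placeholder domain.

```python
# Sketch only: check SSL expiry, robots.txt availability and 404 handling
# for one domain. Requires `requests`; DOMAIN is a placeholder.
import socket
import ssl
import time

import requests

DOMAIN = "www.example.com"  # hypothetical - replace with your own domain


def days_until_cert_expiry(host, port=443):
    """Return the number of days until the SSL certificate expires."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    return (ssl.cert_time_to_seconds(cert["notAfter"]) - time.time()) / 86400


def returns_status(url, expected):
    """True if the URL returns the status code we expect."""
    return requests.get(url, timeout=10).status_code == expected


print(f"SSL cert expires in {days_until_cert_expiry(DOMAIN):.0f} days")
print("robots.txt reachable:", returns_status(f"https://{DOMAIN}/robots.txt", 200))
# A made-up URL should return a proper 404, not a "soft" 200
print("404 handling OK:", returns_status(f"https://{DOMAIN}/does-not-exist-123", 404))
```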

Slide 8

Slide 8 text

pa.ag @peakaceag 8 Example: Little Warden's host-level checks

Slide 9

Slide 9 text

pa.ag @peakaceag 9 Tailor to your own needs (and preferences) You might have a different sitemap (index) URL or feel like changing intervals for certain (expiry) notification checks though:

Slide 10

Slide 10 text

pa.ag @peakaceag 10 Notification pops up (depending on your tool of choice) By default, almost all tools rely on email; however, some also support other channels such as Slack (more on that later). Screenshot annotations: alert detected on peakace.agency (Feb 19, 2021), the monitoring project name/identifier, the type of issue, the affected domain/URLs & issue details, and a link to view the alert details on the monitoring platform.

Slide 11

Slide 11 text

pa.ag @peakaceag 11 Monitoring availability (e.g. HTTP 200) is often not enough For your robots.txt file, you need to understand when its contents have changed: RYTE's robots.txt history feature allows you to seamlessly roll back to older versions. RYTE sends a handy notification telling you that this specific line has been added to your robots.txt file
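If you prefer rolling your own, a change check along these lines can be approximated in a few lines of Python. This sketch assumes `requests`, a placeholder robots.txt URL and a local cache file standing in for a proper data store and alerting backend.

```python
# Sketch only: alert when robots.txt content changes. Requires `requests`;
# ROBOTS_URL is a placeholder and the local cache file stands in for a real
# data store / alerting backend.
import difflib
import pathlib

import requests

ROBOTS_URL = "https://www.example.com/robots.txt"  # hypothetical URL
CACHE = pathlib.Path("robots_last_seen.txt")

current = requests.get(ROBOTS_URL, timeout=10).text
previous = CACHE.read_text() if CACHE.exists() else ""

if current != previous:
    diff = "\n".join(difflib.unified_diff(
        previous.splitlines(), current.splitlines(),
        fromfile="previous", tofile="current", lineterm=""))
    print(f"robots.txt changed!\n{diff}")  # replace print() with your alerting
    CACHE.write_text(current)
```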

Slide 12

Slide 12 text

pa.ag @peakaceag 12 Beyond validation: monitor URL inventory changes You might also want to know if certain URLs dropped out of your XML sitemap (customise the severity as needed):
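A minimal sketch of such an inventory diff, assuming `requests`, a placeholder sitemap URL and a local JSON cache between runs:

```python
# Sketch only: diff the URL inventory of an XML sitemap between two runs.
# Requires `requests`; SITEMAP_URL is a placeholder, CACHE a simple JSON file.
import json
import pathlib
import xml.etree.ElementTree as ET

import requests

SITEMAP_URL = "https://www.example.com/sitemap.xml"  # hypothetical URL
CACHE = pathlib.Path("sitemap_urls.json")
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

root = ET.fromstring(requests.get(SITEMAP_URL, timeout=10).content)
current = {loc.text.strip() for loc in root.findall(".//sm:loc", NS)}
previous = set(json.loads(CACHE.read_text())) if CACHE.exists() else set()

print("URLs added:  ", sorted(current - previous))
print("URLs dropped:", sorted(previous - current))
CACHE.write_text(json.dumps(sorted(current)))
```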

Slide 13

Slide 13 text

Monitoring your HTML mark-up and server headers for changes and sending notifications SEO-centred monitoring

Slide 14

Slide 14 text

pa.ag @peakaceag 14 Default must-have HTML mark-up monitoring As a minimum, check for title, meta description and canonical tag (if used): Whoops… looks like someone forgot to change the subdomain to WWW for the production server… ;)
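A minimal do-it-yourself version of these checks, assuming `requests` and `lxml`; the URL and the expected values in the assertions are placeholders:

```python
# Sketch only: extract title, meta description and canonical and compare
# them against what you expect. Requires `requests` and `lxml`; the URL and
# the assertions are placeholders.
import requests
from lxml import html

URL = "https://www.example.com/"  # hypothetical URL

doc = html.fromstring(requests.get(URL, timeout=10).content)

title = (doc.xpath("//title/text()") or [""])[0].strip()
description = (doc.xpath("//meta[@name='description']/@content") or [""])[0]
canonical = (doc.xpath("//link[@rel='canonical']/@href") or [""])[0]

print("title:      ", title)
print("description:", description)
print("canonical:  ", canonical)

# e.g. catch the "whoops" above: the canonical should point at the www host
assert canonical.startswith("https://www."), "canonical points to the wrong host?"
assert 10 <= len(title) <= 65, "title missing or outside the expected length"
```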

Slide 15

Slide 15 text

pa.ag @peakaceag 15 Even for those “basic” checks, customisation is crucial Maybe you don’t just want to know when “something” changed, but precisely what’s new. Depending on the tool, you have a variety of validation options available:

Slide 16

Slide 16 text

pa.ag @peakaceag 16 Also, we need to talk about (and monitor) indexability Note: (potentially) just looking at the robots meta directive might not always be enough: ensure you also check for (accidental) blocking of URLs, either through robots.txt or X-Robots-Tag headers.
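A minimal sketch covering all three layers (robots meta tag, X-Robots-Tag header, robots.txt), assuming `requests` and `lxml` with a placeholder URL:

```python
# Sketch only: check indexability on all three layers. Requires `requests`
# and `lxml`; URL is a placeholder.
from urllib import robotparser
from urllib.parse import urlparse

import requests
from lxml import html

URL = "https://www.example.com/some-page/"  # hypothetical URL

resp = requests.get(URL, timeout=10)
doc = html.fromstring(resp.content)

meta_robots = ",".join(doc.xpath("//meta[@name='robots']/@content")).lower()
x_robots = resp.headers.get("X-Robots-Tag", "").lower()

rp = robotparser.RobotFileParser()
rp.set_url(f"{urlparse(URL).scheme}://{urlparse(URL).netloc}/robots.txt")
rp.read()

print("noindex via meta tag:    ", "noindex" in meta_robots)
print("noindex via X-Robots-Tag:", "noindex" in x_robots)
print("blocked via robots.txt:  ", not rp.can_fetch("Googlebot", URL))
```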

Slide 17

Slide 17 text

pa.ag @peakaceag 17 Speaking of headers: monitoring HTTP status codes Because you want to know when (old) redirects stop working, or when relevant URLs all of a sudden break and return a 4xx error: This should have been an HTTP 200 status code – so maybe someone redirected this by accident?
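A minimal sketch of such a status/redirect check, assuming `requests`; the URLs and expected redirect targets are placeholders:

```python
# Sketch only: verify that URLs still return the status code (and redirect
# target) you expect. Requires `requests`; URLs/targets are placeholders.
import requests

EXPECTATIONS = {
    "https://www.example.com/": {"status": 200},
    "https://www.example.com/old-page/": {
        "status": 301,
        "location": "https://www.example.com/new-page/",
    },
}

for url, expected in EXPECTATIONS.items():
    resp = requests.get(url, allow_redirects=False, timeout=10)
    ok = resp.status_code == expected["status"]
    if "location" in expected:
        ok = ok and resp.headers.get("Location") == expected["location"]
    print(f"{url} -> {resp.status_code} {'OK' if ok else 'ALERT!'}")
```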

Slide 18

Slide 18 text

GA, GSC & GTM tags? Twitter cards & OpenGraph? Different types of schema.org mark-up? Our very own (custom) HTML tags or presence of scripts? But what about monitoring XYZ?

Slide 19

Slide 19 text

pa.ag @peakaceag 19 Use regular expressions to check for anything you want In reality, it doesn’t really matter what a tool can monitor for you as long as you can make use of RegEx or XPath.

Slide 20

Slide 20 text

“A regular expression (shortened as regex or regexp) is a sequence of characters that specifies a search pattern.” RegEx… RegWhat?

Slide 21

Slide 21 text

pa.ag @peakaceag 21 A practical example: finding GSC verification tags Scenario: you have a website and you want/need to find the GSC verification tag(s) within the HTML mark-up:
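For illustration, a minimal Python version of this scenario; it assumes `requests` and a placeholder URL, and the pattern assumes the name attribute comes before the content attribute, so treat it as a starting point rather than a universal solution:

```python
# Sketch only: pull the GSC verification token(s) out of the HTML with a
# regular expression. Requires `requests`; the URL is a placeholder and the
# pattern assumes name="..." comes before content="...".
import re

import requests

html_source = requests.get("https://www.example.com/", timeout=10).text

pattern = (r'<meta[^>]+name=["\']google-site-verification["\']'
           r'[^>]+content=["\']([^"\']+)["\']')
tokens = re.findall(pattern, html_source, flags=re.IGNORECASE)

print("GSC verification token(s):", tokens or "none found - time for an alert?")
```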

Slide 22

Slide 22 text

pa.ag @peakaceag 22 regexr.com for building and testing stuff on the fly More: https://regexr.com

Slide 23

Slide 23 text

pa.ag @peakaceag 23 Use regular expressions to check for anything you want Don’t know how to write RegEx? Check out this fantastic guide for marketers by Annie Cushing with tons of real-world examples: More: https://pa.ag/30uPiak

Slide 24

Slide 24 text

As an SEO, I certainly want to know when my link graph undergoes a significant change such as this one Tip: monitor the site's main navigation using RegEx

Slide 25

Slide 25 text

pa.ag @peakaceag 25 xpather.com does the same for XML Path Language XPath uses path expressions to select nodes in an HTML/XML document and allows you to navigate through the document: More: http://xpather.com
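The same lookup expressed as XPath instead of RegEx, as a minimal sketch using `requests` + `lxml`; the URL is a placeholder and the //nav expression for the navigation tip from the earlier slide is an assumption about the site's mark-up:

```python
# Sketch only: the same check via XPath instead of RegEx. Requires `requests`
# and `lxml`; the URL and the //nav expression are assumptions about the site.
import requests
from lxml import html

doc = html.fromstring(requests.get("https://www.example.com/", timeout=10).content)

# GSC verification token(s) via XPath
print(doc.xpath("//meta[@name='google-site-verification']/@content"))

# and, per the earlier tip, a snapshot of the main navigation's link targets
print(doc.xpath("//nav//a/@href"))
```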

Slide 26

Slide 26 text

pa.ag @peakaceag 26 Large-scale monitoring over time: now the fun starts Spotting anomalies and understanding trends by comparing crawl data at scale:

Slide 27

Slide 27 text

pa.ag @peakaceag 27 The RYTE dashboard provides a handy issue timeline This also makes it very easy to see whether crucial issues were tackled in time, and whether there's a pattern (e.g. caused by certain types of deployments)

Slide 28

Slide 28 text

pa.ag @peakaceag 28 Smart segmentation to make things more tangible Especially for large sites, this really helps you understand the impact straight away: ContentKing sends a weekly report (e.g. “ContentKing Weekly Report, Mar 1, 2021 – Mar 7, 2021”) to update you on detected changes and issues, and provides a health score per segment, allowing you to easily prioritise fixes.

Slide 29

Slide 29 text

pa.ag @peakaceag 29 ContentKing does continuous crawling This allows for some really cool stuff such as “live discovery”, e.g. when links break, including their recovery – without you having to re-run your crawl: Alert detected on peakace.agency (Mar 1, 2021) / Alert resolved on peakace.agency (Mar 2, 2021)

Slide 30

Slide 30 text

pa.ag @peakaceag 30 Enterprise-level monitoring: crawl depth comparison Especially for large sites, a spike of 100k+ pages losing all internal links is something I certainly would want to be notified about:

Slide 31

Slide 31 text

Some ideas for additional monitoring and checks we'd recommend for anyone in e-commerce Monitoring in e-commerce

Slide 32

Slide 32 text

pa.ag @peakaceag 32 Legal texts such as imprint, terms, etc. Ensure you are compliant by serving links/pages for the imprint, the respective terms (e.g. shipping, pricing, etc.) and necessary opt-outs (tracking) or other legal texts: LeanKoala offers a variety of default “legal” checks; of course, these can be customised and/or extended as needed.

Slide 33

Slide 33 text

pa.ag @peakaceag 33 Even better: go all-in on GDPR compliance monitoring RYTE has a very handy GDPR compliance report showing external scripts on a website that are active prior to the user giving their consent:

Slide 34

Slide 34 text

pa.ag @peakaceag 34 Change of available inventory (e.g. in categories) Monitor categories and/or other listing pages and ensure they have a certain minimum number of products available at all times:
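A minimal sketch of such a threshold check, assuming `requests` + `lxml`; the category URL, the product-tile XPath and the threshold are hypothetical and depend entirely on your shop's mark-up:

```python
# Sketch only: alert when a listing page drops below a minimum product count.
# Requires `requests` + `lxml`; URL, XPath and threshold are hypothetical.
import requests
from lxml import html

CATEGORY_URL = "https://shop.example.com/category/shoes/"   # hypothetical
PRODUCT_XPATH = "//div[contains(@class, 'product-tile')]"   # hypothetical
MIN_PRODUCTS = 10

doc = html.fromstring(requests.get(CATEGORY_URL, timeout=10).content)
count = len(doc.xpath(PRODUCT_XPATH))

if count < MIN_PRODUCTS:
    print(f"ALERT: only {count} products found on {CATEGORY_URL}")
else:
    print(f"OK: {count} products found")
```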

Slide 35

Slide 35 text

Monitor the same for your filters / faceted navigation! And while you're at it:

Slide 36

Slide 36 text

pa.ag @peakaceag 36 Lost categories vs new categories Especially in enterprise e-commerce setups, it’s quite common to have a dedicated “shop management” team responsible for maintaining the category tree…

Slide 37

Slide 37 text

pa.ag @peakaceag 37 Monitoring for product details: pricing, availability, etc. Sure, these aren't purely SEO-related elements – they're still crucial. Some ideas (a couple of them are sketched in code below):
URL: ✓ clean URL & HTTP 200 status ✓ self-canonicalised ✓ matches schema.org breadcrumb
Product: ✓ title present ✓ length OK (define threshold) ✓ correct HTML mark-up
Product description: ✓ description present ✓ length OK (again, define threshold) ✓ if present: check for internal links or structured elements
Product recommendations: ✓ element is present ✓ minimum number of elements present ✓ schema.org mark-up for related products
Price: ✓ amount > 0,00 € ✓ currency according to geo setup
Shipping info: ✓ elements present
Benefits & trust: ✓ elements present
Legal & payment info: ✓ necessary legal elements/links (imprint, privacy) are present ✓ payment information available
HTTPS: ✓ valid SSL certificate ✓ check corresponding non-HTTPS URL for 301 redirect
Schema.org: ✓ product mark-up present ✓ match availability with indexing rules
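As referenced above, a minimal sketch of two of these checks (price > 0 and schema.org Product mark-up present), assuming `requests` + `lxml`, JSON-LD mark-up and a hypothetical product URL; the handling of the "offers" object is deliberately simplified:

```python
# Sketch only: two of the product checks above - price > 0 and schema.org
# Product mark-up present. Requires `requests` + `lxml` and assumes JSON-LD;
# the product URL is hypothetical and "offers" handling is simplified.
import json

import requests
from lxml import html

PRODUCT_URL = "https://shop.example.com/product/123/"  # hypothetical URL

resp = requests.get(PRODUCT_URL, timeout=10)
doc = html.fromstring(resp.content)

blocks = [json.loads(s) for s in
          doc.xpath("//script[@type='application/ld+json']/text()")]
products = [b for b in blocks if isinstance(b, dict) and b.get("@type") == "Product"]

assert resp.status_code == 200, "product page not reachable"
assert products, "no schema.org Product mark-up found"
price = float(products[0].get("offers", {}).get("price", 0))
assert price > 0, "price is missing or zero"
print(f"OK: Product mark-up present, price = {price}")
```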

Slide 38

    Slide 38 text

    For example: monitor their availability and dynamically adjust your pricing Why not spy on your competitors too?

    Slide 39

    Slide 39 text

    Detecting and monitoring apparent duplication within your website to prevent negative performance Canonicalisation monitoring

    Slide 40

    Slide 40 text

pa.ag @peakaceag 40 Most common causes of duplicate content E.g. for Google, these examples are each two different URLs:
Case sensitivity: https://pa.ag/url-A/ vs https://pa.ag/url-a/
Trailing slashes: https://pa.ag/url-b vs https://pa.ag/url-b/
non-www vs www: https://pa.ag vs https://www.pa.ag
HTTP vs HTTPS: http://pa.ag vs https://pa.ag
Staging/testing server vs production server

    Slide 41

    Slide 41 text

pa.ag @peakaceag 41 Most common causes of duplicate content E.g. for Google, these examples are each two different URLs: case sensitivity (https://pa.ag/url-A/ vs https://pa.ag/url-a/), trailing slashes (https://pa.ag/url-b vs https://pa.ag/url-b/), non-www vs www (https://pa.ag vs https://www.pa.ag), HTTP vs HTTPS (http://pa.ag vs https://pa.ag). Dealing with duplication issues:
▪ 301 redirect: e.g. non-www vs www, HTTP vs HTTPS, casing (upper/lower), trailing slashes, index pages (index.php)
▪ noindex: e.g. white labelling, internal search result pages, work-in-progress content, PPC and other landing pages
▪ (Self-referencing) canonicals: e.g. parameters used for tracking, session IDs, printer-friendly versions, PDF to HTML, etc.
▪ 403 password protection: e.g. staging/development servers
▪ 404/410 gone: e.g. feed content that needs to go fast, other outdated, irrelevant or low-quality content

    Slide 42

    Slide 42 text

pa.ag @peakaceag 42 Advanced URL checks for top-notch canonicalisation This only really works well if you've cleaned up your URL structure beforehand:
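A minimal sketch of such URL-variant checks, assuming `requests`; every common variant should 301 to the canonical version (the URLs simply reuse the examples from the previous slides):

```python
# Sketch only: every common variant of a URL should 301 to the canonical
# version. Requires `requests`; URLs reuse the examples from the slides.
import requests

CANONICAL = "https://www.pa.ag/url-a/"
VARIANTS = [
    "http://www.pa.ag/url-a/",   # HTTP vs HTTPS
    "https://pa.ag/url-a/",      # non-www vs www
    "https://www.pa.ag/url-A/",  # case sensitivity
    "https://www.pa.ag/url-a",   # trailing slash
]

for variant in VARIANTS:
    resp = requests.get(variant, allow_redirects=False, timeout=10)
    ok = (resp.status_code == 301 and
          resp.headers.get("Location") == CANONICAL)
    print(f"{variant} -> {resp.status_code} {'OK' if ok else 'CHECK!'}")
```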

    Slide 43

    Slide 43 text

And production environments, e.g. in other data centres or reachable via additional host names Don't forget your staging server(s)

    Slide 44

    Slide 44 text

pa.ag @peakaceag 44 Different types of staging/test servers are possible Make sure the server is locked down properly to ensure your content doesn't get indexed in advance – and set up monitoring accordingly:
▪ noindex (meta tag/header) – Pros: external tools can access without separate access rules; URLs are definitely not indexed. Cons: indexing rules cannot be tested fully (all noindex); waste of crawl budget.
▪ robots.txt – Pros: external tools can access without separate access rules; no crawl budget is wasted. Cons: indexing rules cannot be tested fully (only with robots.txt override); if linked, test URLs may appear in the index (without title/metas).
▪ Password secured (.htaccess) – Pros: no crawl budget is wasted; URLs are definitely not indexed; everything can be tested properly. Cons: external tools must be able to handle password authentication.
▪ IP-based access – Pros: no crawl budget is wasted; URLs are definitely not indexed; everything can be tested properly. Cons: external tools must be able to handle IP-based authentication.
▪ VPN – Pros: completely safe! Cons: so safe, only a few tools can handle it!

    Slide 45

    Slide 45 text

pa.ag @peakaceag 45 Monitoring your staging/test server URLs Depending on your server's setup, you need to either check the HTTP response code or (non-)indexability:
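A minimal sketch of both options, assuming `requests` and a hypothetical staging host; the noindex check against the response body is deliberately rough:

```python
# Sketch only: a staging host should either be locked down (401/403) or at
# least be noindexed. Requires `requests`; the staging URL is hypothetical
# and the noindex check on the body is deliberately rough.
import requests

STAGING_URL = "https://staging.example.com/"  # hypothetical URL

resp = requests.get(STAGING_URL, timeout=10)

if resp.status_code in (401, 403):
    print("OK: staging server is password/IP protected")
elif ("noindex" in resp.headers.get("X-Robots-Tag", "").lower()
      or "noindex" in resp.text.lower()):
    print("OK: staging server is reachable but set to noindex")
else:
    print("ALERT: staging server might be crawlable and indexable!")
```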

    Slide 46

    Slide 46 text

    I’m not a fan of geo redirects, but sometimes they’re necessary (e.g. for legal reasons) Geo redirect monitoring

    Slide 47

    Slide 47 text

pa.ag @peakaceag 47 Don't automatically redirect users without giving them options Better to just let users pick the suggested/relevant international website instead:

    Slide 48

    Slide 48 text

pa.ag @peakaceag 48 Geo redirects how-to (e.g. if you need them due to licensing) If the user is guided to a specific language folder based on their IP, the redirect needs to be temporary (302 or 307), otherwise caching issues will come up:

    Slide 49

    Slide 49 text

pa.ag @peakaceag 49 Oh, and by the way: don't do this either… Disney wastes loads of link equity by relying on JS redirects:

    Slide 50

    Slide 50 text

pa.ag @peakaceag 50 Geo redirect monitoring: because this always “breaks” Ensure proper redirects are in place according to the request's origin: e.g. requesting www.domain.com from an IP address located in the respective geographical region should trigger a 302 redirect to www.domain.com/de/
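A minimal sketch of such a check, assuming `requests` and access to a proxy located in the target region (the proxy address is a placeholder); we expect a temporary redirect into the language folder:

```python
# Sketch only: request the URL through a proxy in the target region and
# expect a temporary redirect into the language folder. Requires `requests`;
# the proxy address is a placeholder for a geo proxy you have access to.
import requests

URL = "https://www.domain.com/"
EXPECTED_TARGET = "https://www.domain.com/de/"
GEO_PROXY = {"https": "http://de-proxy.example.net:8080"}  # hypothetical proxy

resp = requests.get(URL, proxies=GEO_PROXY, allow_redirects=False, timeout=15)

ok = (resp.status_code in (302, 307) and
      resp.headers.get("Location") == EXPECTED_TARGET)
print(f"{resp.status_code} -> {resp.headers.get('Location')} {'OK' if ok else 'ALERT!'}")
```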

    Slide 51

    Slide 51 text

    pa.ag @peakaceag 51 How to check geo redirects e.g. with Little Warden Ensure proper redirects are in place according to the request’s origin:

    Slide 52

    Slide 52 text

    Moving beyond HTML mark-up checks Visual change monitoring

    Slide 53

    Slide 53 text

    pa.ag @peakaceag 53 You’re all aware of this by now, right? Google renders almost every URL; but why? Source: https://pa.ag/3t0RVgv According to W3Techs, JavaScript is used […] by 97.1% of all websites. Rendered preview of any given website, including “executed” JavaScript

    Slide 54

    Slide 54 text

So you can't either – otherwise, on JS-heavy websites you couldn't do any form of content monitoring/checks (as the content isn't part of the regular mark-up) Google doesn't rely on just the HTML mark-up

    Slide 55

    Slide 55 text

    pa.ag @peakaceag 55 visualping.io - website change detection & alerts Select and relax: visualping lets you know when the page (or selected area) changes What we like about the tool / key features: ▪ Lets you specify %-change (any, tiny, medium, major) ▪ Proxy functionality for geo specific monitoring ▪ Down to checking for changes every 5 minutes ▪ Rendering capabilities, so also great for JS-heavy sites ▪ Loads more!

    Slide 56

    Slide 56 text

    pa.ag @peakaceag 56 hexowatch.com - monitor any website for visual changes Your AI sidekick to monitor any website for changes to visuals, content, source code, technology, availability, or price.

    Slide 57

    Slide 57 text

    No matter what you want and need to monitor, there's a solution for you More monitoring tools

    Slide 58

    Slide 58 text

    pa.ag @peakaceag 58 Uptimerobot.com - continuous uptime checks every 5 min Simple, yet efficient uptime / availability monitoring of domains and/or URLs More: https://uptimerobot.com/#features What we like about the tool / key features: ▪ Super simple interface, 1 minute and you’re good to go ▪ “TV mode” to run monitoring on large screens ▪ Allows custom port- and service monitoring ▪ Pre-defined maintenance windows to automatically pause and re-enable monitoring ▪ Seamless integration with status pages

    Slide 59

    Slide 59 text

    pa.ag @peakaceag 59 Pingbreak.com - free BETA uptime monitor Pingbreak relies on Twitter as its default communication channel (account required) but allows alerting to Slack, Discord, Mattermost, Telegram and custom alert services: More: https://pingbreak.com What we like about the tool / key features: ▪ Entirely free ▪ Unlimited monitoring of websites, at 1 minute intervals ▪ Webhook support allows alerting to almost any service you can think of

    Slide 60

    Slide 60 text

    pa.ag @peakaceag 60 Testomato.com - website content & uptime monitoring Very affordable solution (incl. uptime monitoring + API, pricing starts at $49 monthly) More: https://www.testomato.com What we like about the tool / key features: ▪ Robust and easy-to-use interface ▪ Nice email reports, including direct notification to 3rd party services such as Slack ▪ Errors are automatically re-tested/checked, also from other (geo-) locations, meaning fewer false alarms ▪ Specific checks for server headers, redirects, etc can be set up with one click ▪ Extremely simple setup, not only for URL checks but also for specific content monitoring tasks

    Slide 61

    Slide 61 text

pa.ag @peakaceag 61 Testomato.com - website content & uptime monitoring Setting up checks is super simple and very visual; totally doable for SEOs without any coding skills: very easy setup, e.g. custom matching of HTML elements using XPath; in this case we're actually monitoring that the page contains a certain element with the “id” of “page-content”.

    Slide 62

    Slide 62 text

    pa.ag @peakaceag 62 Leankoala.com - monitoring meets testing With Leankoala, you can move beyond just simple testing; it comes with 30+ tools that check for different characteristics to ensure a website is functioning correctly. More: https://www.leankoala.com/en/features

    Slide 63

    Slide 63 text

pa.ag @peakaceag 63 fluxguard.com - enterprise change monitoring Monitor website changes; detect content, code and design edits (including Lighthouse, cookie and network activity changes). Expensive ($10K per quarter)! More: https://fluxguard.com/features

    Slide 64

    Slide 64 text

    pa.ag @peakaceag 64 Need to publish monitoring info to your customers? Try out either status.io or Atlassian Statuspage to make monitoring info available to the public: More: https://status.io & https://pa.ag/3rCpTaS

    Slide 65

    Slide 65 text

    Email is great, but maybe not enough? Connecting monitoring systems to your “working environment” Notifications & alerts

    Slide 66

    Slide 66 text

    pa.ag @peakaceag 66 Zapier.com – connect your apps and automate workflows Zapier allows you to push data from one software system to another, without writing custom code. You can also create multi-step Zaps. Check out this tutorial: https://pa.ag/2PTbQja

    Slide 67

    Slide 67 text

    pa.ag @peakaceag 67 integromat.com - complex automation made easy Harder to use, yet allows for more complex workflows; a really strong Zapier alternative: More: https://www.integromat.com

    Slide 68

    Slide 68 text

    pa.ag @peakaceag 68 tray.io - enterprise level automation solution More: https://tray.io

    Slide 69

    Slide 69 text

    pa.ag @peakaceag 69 Pipedream.com - testing webhooks & APIs made easy More: https://pipedream.com Webhooks allow you to send real-time data from one application to another whenever a given event occurs.
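For illustration, a minimal sketch of pushing a monitoring alert to a webhook endpoint (e.g. one generated by Pipedream or a Slack incoming webhook), assuming `requests` and a placeholder webhook URL:

```python
# Sketch only: push a monitoring alert to a webhook endpoint. Requires
# `requests`; the webhook URL is a placeholder for your own endpoint
# (e.g. a Pipedream source or a Slack incoming webhook).
import requests

WEBHOOK_URL = "https://eoXXXXXXXX.m.pipedream.net"  # placeholder endpoint

payload = {
    "source": "seo-monitoring",
    "severity": "high",
    "message": "robots.txt changed on www.example.com",
}

resp = requests.post(WEBHOOK_URL, json=payload, timeout=10)
print("Webhook delivered:", resp.status_code)
```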

    Slide 70

    Slide 70 text

    Monitoring 3rd party security services / providers Protect your website

    Slide 71

    Slide 71 text

pa.ag @peakaceag 71 Google Safe Browsing powers e.g. the warnings in Chrome “Deceptive site ahead” is a warning in the Chrome browser that can protect you from phishing, scams, and malware-laden sites

    Slide 72

    Slide 72 text

pa.ag @peakaceag 72 Google Safe Browsing site status monitoring To ensure your domain doesn't show a warning in Google Chrome and Search: Try it for your domain: https://pa.ag/3cjsuQn
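If you want to check this programmatically rather than via the web form, here is a minimal sketch using the Google Safe Browsing Lookup API (v4); it assumes `requests` and an API key from Google Cloud, and the field values should be verified against the current API documentation:

```python
# Sketch only: check your own URL(s) against the Safe Browsing Lookup API v4.
# Requires `requests` and an API key; field values should be double-checked
# against the official documentation.
import requests

API_KEY = "YOUR_API_KEY"  # placeholder
ENDPOINT = f"https://safebrowsing.googleapis.com/v4/threatMatches:find?key={API_KEY}"

body = {
    "client": {"clientId": "seo-monitoring", "clientVersion": "1.0"},
    "threatInfo": {
        "threatTypes": ["MALWARE", "SOCIAL_ENGINEERING", "UNWANTED_SOFTWARE"],
        "platformTypes": ["ANY_PLATFORM"],
        "threatEntryTypes": ["URL"],
        "threatEntries": [{"url": "https://www.example.com/"}],  # your URL(s)
    },
}

matches = requests.post(ENDPOINT, json=body, timeout=10).json().get("matches", [])
print("ALERT: flagged by Safe Browsing!" if matches else "OK: no threats listed")
```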

    Slide 73

    Slide 73 text

pa.ag @peakaceag 73 Monitoring this only works with a tool for visual checks Google's Transparency Report site was built with the popular JavaScript framework AngularJS, so you can't simply check the HTML mark-up: tools like visualping render the target URL's content and can therefore also monitor websites built, for example, entirely in JavaScript.

    Slide 74

    Slide 74 text

pa.ag @peakaceag 74 Other relevant players on the market Norton Safe Web, Web of Trust, Avira Browser Safety, BitDefender Traffic Light, etc. Source: https://pa.ag/3rwTgv6 These services usually power browser extensions via API; if your domain is flagged as “suspicious”, such an extension (if installed) might block activity that would normally lead a visitor to your website, e.g. a click on your result in Google's SERPs.

    Slide 75

    Slide 75 text

There are tons of smart things to monitor that aren't necessarily SEO-related at their core; here are some ideas: Monitoring beyond SEO

    Slide 76

    Slide 76 text

pa.ag @peakaceag 76 Setting up custom alerts directly in Google Analytics (Technical) issues can also be measured using GA, with no external monitoring needed to send those alerts at all: certain measurements (404 errors, JS errors, etc.) need to be set up via Google Tag Manager.

    Slide 77

    Slide 77 text

pa.ag @peakaceag 77 GTM & Cloud: combine both to measure faulty tracking Monitor your very own marketing tags and ensure they fire properly by using GTM's callback functionality, which passes the data on to Google Cloud. How to (step 5 is sketched below):
1. Set up the Google Tag Manager Monitoring template
2. Choose the website/marketing tracking tags to be monitored
3. Define the request URL to talk to Google Cloud Functions (GET request endpoint)
4. Send Google Tag Manager callback data to Google Cloud
5. Connect the Cloud Function to send data to a BigQuery table
6. Evaluate failing tracking tags in Google BigQuery
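As referenced in step 5 above, here is a minimal sketch of an HTTP Cloud Function that receives the GTM monitoring callback and streams it into BigQuery; it assumes the `google-cloud-bigquery` client library, and the table ID and parameter names are hypothetical and must match whatever your GTM monitoring template actually sends:

```python
# Sketch only: HTTP Cloud Function that writes GTM monitoring callbacks to
# BigQuery. Requires `google-cloud-bigquery`; table ID and parameter names
# ("eventName", "tagStatus", "pageUrl") are hypothetical.
from google.cloud import bigquery

TABLE_ID = "my-project.gtm_monitoring.tag_status"  # hypothetical table

client = bigquery.Client()


def gtm_monitor(request):
    """HTTP Cloud Function entry point (request is a flask.Request)."""
    row = {
        "event_name": request.args.get("eventName", ""),
        "tag_status": request.args.get("tagStatus", ""),
        "page_url": request.args.get("pageUrl", ""),
    }
    errors = client.insert_rows_json(TABLE_ID, [row])  # streaming insert
    return ("error", 500) if errors else ("ok", 200)
```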

    Slide 78

    Slide 78 text

    pa.ag @peakaceag 78 Analyse, check and review faulty marketing tracking Using BigQuery you can review both failing and successful tracking tags – which in turn also means you can spot pages with no tracking at all:

    Slide 79

    Slide 79 text

pa.ag @peakaceag 79 Web performance is crucial - monitor accordingly! The current Core Web Vitals set focuses on three aspects of user experience - loading, interactivity, and visual stability - and includes the following metrics/thresholds: Source: https://pa.ag/3irantb LCP measures loading performance. To provide a good UX, LCP should occur within 2.5 seconds. FID measures interactivity. To provide a good UX, pages should have an FID under 100 milliseconds. CLS measures visual stability. To provide a good UX, pages should maintain a CLS of less than 0.1.
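A minimal sketch for pulling the CrUX field data behind these three metrics via the PageSpeed Insights API and comparing them against the thresholds above; it assumes `requests`, and the exact response fields should be double-checked against the current API documentation:

```python
# Sketch only: fetch field data for LCP, FID and CLS via the PageSpeed
# Insights API and compare against the Core Web Vitals thresholds.
# Requires `requests`; response field names should be verified in the docs.
import requests

API = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"
resp = requests.get(API, params={"url": "https://www.example.com/",  # your URL
                                 "strategy": "mobile"}, timeout=60).json()

metrics = resp.get("loadingExperience", {}).get("metrics", {})
lcp_ms = metrics.get("LARGEST_CONTENTFUL_PAINT_MS", {}).get("percentile")
fid_ms = metrics.get("FIRST_INPUT_DELAY_MS", {}).get("percentile")
cls = metrics.get("CUMULATIVE_LAYOUT_SHIFT_SCORE", {}).get("percentile")

print("LCP OK:", lcp_ms is not None and lcp_ms <= 2500)
print("FID OK:", fid_ms is not None and fid_ms <= 100)
print("CLS OK:", cls is not None and cls / 100 <= 0.1)  # CLS appears to be reported * 100
```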

    Slide 80

    Slide 80 text

    pa.ag @peakaceag 80 SpeedCurve: all you need in #webperf monitoring By far the most comprehensive toolset on the market allowing you to monitor ANY metric you deem relevant, not only for yourself but also for your competitors: More: https://speedcurve.com

    Slide 81

    Slide 81 text

    pa.ag @peakaceag 81 Peak Ace 🖤 SMX Munich → https://pa.ag/smx21

    Slide 82

    Slide 82 text

    Care for the slides? www.pa.ag twitter.com/peakaceag facebook.com/peakaceag Take your career to the next level: jobs.pa.ag Email us: [email protected] Bastian Grimm [email protected]