Slide 1

Slide 1 text

Five tech SEO hacks you never knew existed Bastian Grimm, Peak Ace AG | @basgr

Slide 2

Slide 2 text

I couldn’t choose just five … First of all: I lied!

Slide 3

Slide 3 text

Instead, I’m going to help you.

Slide 4

Slide 4 text

To prove to management that they need your SEO program I’m going to help you

Slide 5

Slide 5 text

To demonstrate SEO ROI with simple “feature test” deployments I am going to help you

Slide 6

Slide 6 text

To fix legacy systems without having to beg for development resources I am going to help you

Slide 7

Slide 7 text

To easily build a proof-of-concept rollout I am going to help you

Slide 8

Slide 8 text

But how? Sound good? Great!

Slide 9

Slide 9 text

Or as some people call it: “edge SEO”. Ever heard of it? Serverless SEO

Slide 10

Slide 10 text

Using Workers to overcome challenges and limitations with popular CMS and ecommerce platforms.

Slide 11

Slide 11 text

Wait… what? Because: in 2021, you won’t need server access for SEO anymore!

Slide 12

Slide 12 text

Before we talk about Workers, we need to talk HTTP requests – and CDNs: Establishing a common ground

Slide 13

Slide 13 text

A (very) simplified request lifecycle: your computer / your browser → DNS server (e.g. to translate domain <> IP) → web server aka “origin server” → database server (in most cases)

Slide 14

Slide 14 text

If you’re not familiar with the term CDN: “A content delivery network (CDN) is a globally distributed network of servers deployed in multiple data centers around the globe.” Let's introduce a CDN to the mix

Slide 15

Slide 15 text

Using a CDN, all requests will pass through “edge servers”. When we ignore DNS, databases etc. for a minute, this is what it would look like for the very first request, ever: peakace.js is not cached on the edge server yet, so the request for peakace.js is forwarded to the origin server, peakace.js is delivered from the origin server in the response – and gets cached on the edge server.

Slide 16

Slide 16 text

Using a CDN, all requests will pass through “edge servers”. When we ignore DNS, databases etc. for a minute, this is what it would look like for the second request (independent of user): peakace.js is already cached on the edge server, so it is delivered straight from the edge – the origin server is not contacted at all.

Slide 17

Slide 17 text

Especially for global businesses, CDNs can be a great help. Use CDNPerf.com to find the one that suits you best, depending on where you are and which regions/countries you serve most. This will positively impact TTFB! Give it a try: https://www.cdnperf.com/

Slide 18

Slide 18 text

CDNs at a glance Some of the most popular CDN providers out there

Slide 19

Slide 19 text

Back in Sep. 2017, Cloudflare introduced their “Workers”, which ultimately became publicly available in March 2018: Source: https://blog.cloudflare.com/introducing-cloudflare-workers/

Slide 20

Slide 20 text

Workers use the V8 JavaScript engine built by Google and run globally on Cloudflare's edge servers. A typical Worker script executes in <1ms – that’s fast! So… what’s a Worker?

Slide 21

Slide 21 text

… using the latest standard language features You can execute any JavaScript…

Slide 22

Slide 22 text

…directly from your Worker, or forward them elsewhere Respond to requests…

Slide 23

Slide 23 text

Also, you can do multiple requests, in series or parallel and combine the results Send requests to 3rd-party servers

Slide 24

Slide 24 text

Intercept and modify HTTP request and response URLs, status, headers, and body content. Seriously though, this is WILD!

Slide 25

Slide 25 text

Some of the potential use-cases could be to:

Slide 26

Slide 26 text

301s, 302s – or even geo-specific ones, if needed Implement redirects

Slide 27

Slide 27 text

Adding/changing X-robots or even hreflang annotations Modify HTTP headers

Slide 28

Slide 28 text

Overwrite the full file, add or remove single directives Modify robots.txt

Slide 29

Slide 29 text

Inject, change or even remove robots meta-annotations Modify meta directives

Slide 30

Slide 30 text

Create unique page titles or meta descriptions when needed Update page titles & descriptions

Slide 31

Slide 31 text

… or schema.org mark-up Implement hreflang

Slide 32

Slide 32 text

Essentially, you can do almost anything – because you have full access to the request and response objects! Inject/remove (body) content

Slide 33

Slide 33 text

However, does this only work with Cloudflare? Similar implementations are also available with other CDN providers: Lambda@Edge (AWS), Compute@Edge (Fastly), EdgeWorkers (Akamai), Cloudflare Workers

Slide 34

Slide 34 text

But today it’s all about Cloudflare, because: The top 3 providers (CF, AWS, Akamai) have 89% of all customers; Cloudflare alone is used by 81% of all sites that rely on a CDN (according to W3Techs): Source: https://pa.ag/2U9kvAh

Slide 35

Slide 35 text

A practical and hands-on guide to setting up and running Cloudflare Workers for your SEO Excited? Let’s go!

Slide 36

Slide 36 text

Go create your own (free) account over at cloudflare.com Once your account is activated, you can add your first site/domain: Add your domain name - it can be registered anywhere (as long as you can change the DNS at your current provider)

Slide 37

Slide 37 text

To play with it, the free account + $0 plan is sufficient: This is good enough for testing things out…!

Slide 38

Slide 38 text

Next, you’ll get to see the current DNS configuration Yours should look a little like this: at least two records, one for the root domain, one for the www sub-domain, both pointing to the IP address of your hosting provider: On to the next screen!

Slide 39

Slide 39 text

Now, CF will show you which nameservers to use instead: Nameservers with the current provider, in my case nsX.inwx.de My new nameservers with Cloudflare to be used instead

Slide 40

Slide 40 text

Switching existing nameservers over to Cloudflare At my hosting provider, it looks like this: My new nameservers Cloudflare told me to use instead (see prev. screen)

Slide 41

Slide 41 text

Switch back to tell them you’re all set: Nameservers with the current provider, in my case nsX.inwx.de My new nameservers with Cloudflare, to be used instead

Slide 42

Slide 42 text

Cloudflare is going to email you when things are ready: Beware, this can take up to 24hrs depending on the registrars and nameservers: Your CF dashboard should look like this after the successful NS change.

Slide 43

Slide 43 text

Speaking of nameservers – are you already using 1.1.1.1? Cloudflare runs the fastest DNS resolver available. Why wouldn’t you use it? More: https://pa.ag/3zueHRX

Slide 44

Slide 44 text

Really impatient? Purge cache (e.g. A records) on 1.1.1.1

Slide 45

Slide 45 text

Can’t wait – or just want to check DNS records? Free tool recommendation: MxToolbox > DNS Lookup Source: https://pa.ag/3vuBObV

Slide 46

Slide 46 text

So far, so good – let’s talk Workers now.

Slide 47

Slide 47 text

A Worker, in its simplest form: This function defines triggers for a Worker script to execute. In this case, we intercept the request and send a (custom) response. Our custom response is defined here, for now we simply: (6) log the request object (7) fetch the requested URL from the origin server (8) log the response object (10) send the (unmodified) response back to the client
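The code itself is only shown as a screenshot on the slide, so here is a minimal sketch of such a pass-through Worker (service-worker syntax, as used at the time); the steps mirror the description above, although the line numbers referenced on the slide won’t match:

addEventListener('fetch', event => {
  // Trigger: intercept every incoming request and answer with our own response
  event.respondWith(handleRequest(event.request))
})

async function handleRequest(request) {
  console.log('Incoming request:', request.url)           // log the request object
  const response = await fetch(request)                   // fetch the requested URL from the origin server
  console.log('Origin responded with:', response.status)  // log the response object
  return response                                         // send the (unmodified) response back to the client
}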

Slide 48

Slide 48 text

Cloudflare Workers Playground: cloudflareworkers.com Test-drive Cloudflare Workers; create/edit and see the results live:

Slide 49

Slide 49 text

Cloudflare Workers Playground: cloudflareworkers.com Test-drive Cloudflare Workers; create/edit and see the results live:

Slide 50

Slide 50 text

Let's build our own first Worker / custom handleRequest:

Slide 51

Slide 51 text

Let’s test-drive this on the Workers Playground:

Slide 52

Slide 52 text

Let's live-deploy your first Worker! Enough testing…

Slide 53

Slide 53 text

Let’s live-deploy the Worker to Cloudflare's edge servers Select your domain > Workers > Manage Workers

Slide 54

Slide 54 text

Here’s how to add a Worker You’ll be redirected from the “all Workers” overview to the following form: Give your Worker a unique name Copy & paste the Worker code you just tested on the Playground

Slide 55

Slide 55 text

Confirm deployment and assign routing Go back to > Workers > Add route

Slide 56

Slide 56 text

Comparison: left (original), right (Worker-enabled) Double-check live! Also, don’t fall victim to caching; use “Disable Cache” (see: Network tab) in Chrome Dev Tools to be sure you’re seeing the latest version.

Slide 57

Slide 57 text

Let’s have some SEO fun then? So much for theory…

Slide 58

Slide 58 text

Please understand that all scripts and source code are meant as examples only. Ensure you know what you’re doing when using them in a production environment! Warning: (maybe) not production-ready!

Slide 59

Slide 59 text

301 / 302 / bulk redirects & proxy passthrough 1. Redirects

Slide 60

Slide 60 text

Redirects on the edge using the Response API To execute any type of HTTP redirect, we need to use the Response Runtime API which – conveniently – also provides a static method called “redirect()”: let response = new Response(body, options) – or just: return Response.redirect(destination, status) Source: https://pa.ag/3gvXYoL

Slide 61

Slide 61 text

The Cloudflare Workers docs are a solid starting point: More: https://pa.ag/3gNd8Gn

Slide 62

Slide 62 text

Different types of implementations at a glance (#18): 302 redirect, (#22): 301 redirect, (#26): a reverse proxy call and (#31-36): multiple redirects, selecting a single destination from a map based on a URL parameter:
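The slide shows these variants as a screenshot; a rough sketch of that routing logic could look like this (the paths, destination URLs and the ?id= parameter are illustrative, not the exact values from the slide):

const redirectMap = new Map([
  ['1', 'https://www.example.com/destination-one/'],
  ['2', 'https://www.example.com/destination-two/'],
])

async function handleRequest(request) {
  const url = new URL(request.url)

  if (url.pathname === '/redirects/302') {
    return Response.redirect('https://www.example.com/temporary-target/', 302)   // temporary redirect
  }
  if (url.pathname === '/redirects/301') {
    return Response.redirect('https://www.example.com/permanent-target/', 301)   // permanent redirect
  }
  if (url.pathname === '/redirects/proxy') {
    return fetch('https://www.example.com/')   // reverse proxy: content changes, URL in the browser stays the same
  }
  if (url.pathname === '/redirects/map') {
    const destination = redirectMap.get(url.searchParams.get('id'))
    if (destination) {
      return Response.redirect(destination, 301)   // pick a single destination from the map via ?id=
    }
  }
  return fetch(request)   // everything else passes through to the origin untouched
}

addEventListener('fetch', event => event.respondWith(handleRequest(event.request)))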

Slide 63

Slide 63 text

A quick overview to see how things are working… Source: https://httpstatus.io Screenshot annotations: “Correct, in fact this is not a redirect” / “ID is not configured in redirectMap”

Slide 64

Slide 64 text

To “reverse proxy” a request, you can use the Fetch API It provides an interface for (asynchronously) fetching resources via HTTP requests inside of a Worker script: Source: https://pa.ag/3wpS3YT const response = await fetch(URL, options) Asynchronous tasks, such as fetch, are not executed at the top level in a Worker script and must be executed within a FetchEvent handler.

Slide 65

Slide 65 text

return await fetch("https://example.com") Easily “migrate” a blog hosted on a sub-domain to a sub-folder on your main domain – without actually moving it. Great tutorial: https://pa.ag/2Tw7LD8 Screenshot: content shown from example.com, request sent from bastiangrimm.dev
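A sketch of how such a sub-domain-to-sub-folder setup could look (the hostnames and the /blog/ prefix are assumptions for illustration):

async function handleRequest(request) {
  const url = new URL(request.url)

  if (url.pathname.startsWith('/blog/')) {
    // Serve example.com/blog/* from the blog sub-domain, without a redirect
    const upstream = 'https://blog.example.com' + url.pathname.replace(/^\/blog/, '')
    return fetch(new Request(upstream, request))
  }
  return fetch(request)   // all other paths go to the origin as usual
}

addEventListener('fetch', event => event.respondWith(handleRequest(event.request)))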

Slide 66

Slide 66 text

Verifying that this all happens “on the edge”: Zoom into any of the response headers for an originally requested URL such as bastiangrimm.dev/redirects/302:

Slide 67

Slide 67 text

Safeguards, monitoring, serving a default file, etc. 2. robots.txt

Slide 68

Slide 68 text

Which version would you like to wake up to? Preventing “SEO heart attacks” by using a Worker to monitor and safeguard your robots.txt file is one of many use cases that are super easy to implement. Screenshots: this is how I uploaded the robots.txt file to my test server vs. what the Worker running in the background changed the output to

Slide 69

Slide 69 text

Preventing a global “Disallow: /” in robots.txt (#5-6): define defaults, (#15-16): if robots.txt returns 200, read its content, (#19-24): replace if a global disallow exists, (#27-29): return default “allow all” if the file doesn’t exist
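The actual script is only visible as an image on the slide; a sketch along those lines (routed to /robots.txt; the regex and fallback content are assumptions) could be:

const ALLOW_ALL = 'User-agent: *\nDisallow:'   // default: allow everything

async function handleRequest(request) {
  const response = await fetch(request)

  if (response.status === 200) {
    let body = await response.text()   // read the current robots.txt content

    // If a global "Disallow: /" slipped in, neutralise it
    if (/^Disallow:\s*\/\s*$/m.test(body)) {
      body = body.replace(/^Disallow:\s*\/\s*$/m, 'Disallow:')
    }
    return new Response(body, { headers: { 'content-type': 'text/plain' } })
  }

  // File doesn't exist (or origin errors out): serve the default "allow all"
  return new Response(ALLOW_ALL, { headers: { 'content-type': 'text/plain' } })
}

addEventListener('fetch', event => event.respondWith(handleRequest(event.request)))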

Slide 70

Slide 70 text

So, let’s do some “dynamic serving”, shall we? Static files become dynamic

Slide 71

Slide 71 text

For demonstration purposes only: UA-based delivery (#10): get User-Agent, (#16-17): add dynamic Sitemap-link if UA contains “googlebot”
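A sketch of that UA-based robots.txt delivery (the sitemap URL is a placeholder):

async function handleRequest(request) {
  const userAgent = (request.headers.get('User-Agent') || '').toLowerCase()   // get the User-Agent
  const response = await fetch(request)
  let body = await response.text()

  // Append a Sitemap line only when the requester identifies as Googlebot
  if (userAgent.includes('googlebot')) {
    body += '\nSitemap: https://www.example.com/sitemap-for-googlebot.xml'
  }
  return new Response(body, response)
}

addEventListener('fetch', event => event.respondWith(handleRequest(event.request)))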

Slide 72

Slide 72 text

Live-test & compare robots.txt using technicalseo.com Left screen shows bastiangrimm.dev/robots.txt being requested using a Googlebot User-Agent string, right screen is the default output: Free testing tool: https://technicalseo.com/tools/robots-txt/

Slide 73

Slide 73 text

Some systems cause endless headaches for SEOs – routing them through Cloudflare and using a Worker works very well! Easily overwrite files which are “not meant” to be changed?

Slide 74

Slide 74 text

Modifying robots HTML directives on the fly 3. Robots meta tags

Slide 75

Slide 75 text

Say hello to the HTMLRewriter class! The HTMLRewriter allows you to build comprehensive and expressive HTML parsers inside of a Cloudflare Workers application: Source: https://pa.ag/2RTpqEt new HTMLRewriter() .on("*", new ElementHandler()) .onDocument(new DocumentHandler())

Slide 76

Slide 76 text

Let's give it a try and work with robots meta tags first (#24-25): pass meta tags to the ElementHandler, (#9-11): if it’s the generic robots meta tag, set it to “index,nofollow”, (#14-16): if it’s a bot-specific one, add another directive for bingbot:
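Since the slide’s code is only a screenshot (and the exact meta tag names didn’t survive the export), here is a sketch of that kind of handler; the robots/googlebot/bingbot names below are assumptions:

class ElementHandler {
  element(element) {
    const name = element.getAttribute('name')

    if (name === 'robots') {
      // overwrite the generic robots directive
      element.setAttribute('content', 'index,nofollow')
    } else if (name === 'googlebot') {
      // add an extra, bot-specific directive for bingbot right after it
      element.after('<meta name="bingbot" content="noindex,nofollow">', { html: true })
    }
  }
}

async function handleRequest(request) {
  const response = await fetch(request)
  // pass all <meta> tags of the HTML response to the handler above
  return new HTMLRewriter().on('meta', new ElementHandler()).transform(response)
}

addEventListener('fetch', event => event.respondWith(handleRequest(event.request)))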

Slide 77

Slide 77 text

I mean, this should be clear – but just in case: Verifying presence of Worker-modified robots meta directives via GSC

Slide 78

Slide 78 text

If you want to work with/on every HTML element… This selector would pass every HTML element to your ElementHandler. By using element.tagName, you could then identify which element has been passed along: return new HTMLRewriter() .on("*", new ElementHandler()) .transform(response)

Slide 79

Slide 79 text

Of course, updating, changing or entirely replacing both elements is also possible! 4. Title and meta description

Slide 80

Slide 80 text

Using element selectors in HTMLRewriter Often, you only want to process very specific elements, e.g. meta tags – but not all of them. Maybe it’s just the meta description you care about? new HTMLRewriter() .on('meta[name="description"]', new ElementHandler()) .transform(response) More on selectors: https://pa.ag/35xw073

Slide 81

Slide 81 text

Updating or replacing titles and descriptions is easy… (#10): forced overwrite, (#14-22): conditional changes to the meta description Element selectors are super powerful yet easy to use:
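As a sketch of what such a Worker could look like (the replacement texts and the length check are made-up examples, not what the slide shows):

class TitleHandler {
  element(element) {
    // forced overwrite of the page title
    element.setInnerContent('My new, Worker-generated page title')
  }
}

class DescriptionHandler {
  element(element) {
    const current = element.getAttribute('content') || ''
    // conditional change: only touch descriptions that are empty or very short
    if (current.length < 50) {
      element.setAttribute('content', 'A more useful, hand-crafted meta description goes here.')
    }
  }
}

async function handleRequest(request) {
  const response = await fetch(request)
  return new HTMLRewriter()
    .on('title', new TitleHandler())
    .on('meta[name="description"]', new DescriptionHandler())
    .transform(response)
}

addEventListener('fetch', event => event.respondWith(handleRequest(event.request)))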

Slide 82

Slide 82 text

Maybe you missed some during your last migration? 5. Rewriting links
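The deck doesn’t include a code slide for this one, but with HTMLRewriter it follows the same pattern – a sketch that rewrites leftover links still pointing at an old host (old and new hostnames are placeholders):

class LinkHandler {
  element(element) {
    const href = element.getAttribute('href')
    // Rewrite links that still point at the pre-migration host
    if (href && href.startsWith('http://old.example.com')) {
      element.setAttribute('href', href.replace('http://old.example.com', 'https://www.example.com'))
    }
  }
}

async function handleRequest(request) {
  const response = await fetch(request)
  return new HTMLRewriter().on('a[href]', new LinkHandler()).transform(response)
}

addEventListener('fetch', event => event.respondWith(handleRequest(event.request)))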

Slide 83

Slide 83 text

Maybe you should have listened to me in the first place!? Check out my presentation over at SlideShare: Slides: http://pa.ag/migration_search_y

Slide 85

Slide 85 text

Tell Google about localised versions of your page 6. Deploying hreflang

Slide 86

Slide 86 text

HTTP hreflang annotations on the edge We’ve just had plenty of HTML, so let’s use HTTP headers instead – of course both ways work just fine though:
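A sketch of how hreflang could be added via HTTP Link headers (the alternate URLs are placeholders):

async function handleRequest(request) {
  const originResponse = await fetch(request)
  // Re-create the response so its headers become mutable
  const response = new Response(originResponse.body, originResponse)

  response.headers.append('Link', '<https://www.example.com/en/>; rel="alternate"; hreflang="en"')
  response.headers.append('Link', '<https://www.example.com/de/>; rel="alternate"; hreflang="de"')
  response.headers.append('Link', '<https://www.example.com/>; rel="alternate"; hreflang="x-default"')

  return response
}

addEventListener('fetch', event => event.respondWith(handleRequest(event.request)))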

Slide 87

Slide 87 text

Verify, e.g. using Chrome Developer Console: Network > %URL (disable cache) > Headers > Response Headers

Slide 88

Slide 88 text

Before you ask: X-Robots-Tag directives are also possible… … and the same is true for rel-canonical annotations via HTTP header:
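Same approach, just different headers – a sketch (the directive values and canonical URL are illustrative):

async function handleRequest(request) {
  const originResponse = await fetch(request)
  const response = new Response(originResponse.body, originResponse)

  // X-Robots-Tag directive via HTTP header
  response.headers.set('X-Robots-Tag', 'noindex, nofollow')
  // rel="canonical" annotation via HTTP Link header
  response.headers.append('Link', '<https://www.example.com/canonical-version/>; rel="canonical"')

  return response
}

addEventListener('fetch', event => event.respondWith(handleRequest(event.request)))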

Slide 89

Slide 89 text

You can serve “proper” under-maintenance pages directly from your Cloudflare Worker 7. Serving HTTP 503s

Slide 90

Slide 90 text

Combining an HTTP 503 error with a Retry-After header Retry-After indicates how long the UA should wait before making a follow-up request: “The server is currently unable to handle the request due to a temporary overloading or maintenance of the server […]. If known, the length of the delay MAY be indicated in a Retry-After header.”
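A sketch of a maintenance Worker along those lines (the HTML body and the 3600-second retry window are placeholders):

const MAINTENANCE_HTML = '<!doctype html><title>Down for maintenance</title><h1>Back soon!</h1>'

async function handleRequest(request) {
  return new Response(MAINTENANCE_HTML, {
    status: 503,
    headers: {
      'content-type': 'text/html; charset=utf-8',
      'Retry-After': '3600',   // tell clients/bots to retry in an hour
    },
  })
}

addEventListener('fetch', event => event.respondWith(handleRequest(event.request)))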

Slide 91

Slide 91 text

Just in case I’ve somehow not made my point yet – you can do REALLY cool stuff and have control over the full HTML response – so adding content is easy. 8. Injecting content

Slide 92

Slide 92 text

Replacing, prepending, appending … whatever you like?

Slide 93

Slide 93 text

You could also (dynamically) read from an external feed Feeding in content from other sources is simple; the example below shows reading a JSON feed, parsing the input and injecting it into the body of the target page:
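A sketch of that idea (the feed URL and its JSON shape – a headline and a text field – are assumptions):

class BodyHandler {
  constructor(snippet) {
    this.snippet = snippet
  }
  element(element) {
    // append the fetched content at the end of the <body>
    element.append(this.snippet, { html: true })
  }
}

async function handleRequest(request) {
  // read and parse the external JSON feed
  const feed = await fetch('https://www.example.com/feed.json')
  const data = await feed.json()
  const snippet = `<section><h2>${data.headline}</h2><p>${data.text}</p></section>`

  const response = await fetch(request)
  return new HTMLRewriter().on('body', new BodyHandler(snippet)).transform(response)
}

addEventListener('fetch', event => event.respondWith(handleRequest(event.request)))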

Slide 94

Slide 94 text

One of the key challenges when using CDNs: logfiles are literally everywhere – and a lot of requests don’t even make it to the origin server… 9. Collecting logfiles

Slide 95

Slide 95 text

Cloudflare provides extensive possibilities for logfiles What I really love about this: direct integration with Google Cloud products! Note: you need the Enterprise plan for this. More: https://pa.ag/3gnj8GF

Slide 96

Slide 96 text

Peak Ace log file auditing stack. Interested? > hi@pa.ag Log files are stored in Google Cloud Storage, processed in Dataprep, exported to BigQuery and visualised in Data Studio via the BigQuery Connector. (The diagram also shows GSC API v3 and GA API v4 data being brought in via the Google Apps Script API.)

Slide 97

Slide 97 text

New to logfile auditing? No worries, I’ve got you covered: Check out my presentation over at SlideShare: Slides: http://pa.ag/slides

Slide 98

Slide 98 text

Yeah… actually this is how it all started; and it’s still (one of) the most powerful tools to use for it! 10. Web performance

Slide 99

Slide 99 text

Add native lazy loading for images to your HTML mark-up Keep in mind: you don’t want to lazy load all of your images (e.g. not the hero image); also, if you’re using iframes, you might want to pass “iframe” to the HTMLRewriter:
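A sketch of the lazy-loading rewrite (the data-critical opt-out attribute is an invented convention for skipping e.g. the hero image):

class LazyLoadHandler {
  element(element) {
    // skip images explicitly marked as critical, e.g. the hero image
    if (element.getAttribute('data-critical') === 'true') return
    element.setAttribute('loading', 'lazy')
  }
}

async function handleRequest(request) {
  const response = await fetch(request)
  // use 'img, iframe' as the selector if iframes should be lazy loaded too
  return new HTMLRewriter().on('img', new LazyLoadHandler()).transform(response)
}

addEventListener('fetch', event => event.respondWith(handleRequest(event.request)))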

Slide 100

Slide 100 text

Cleaning up HTML code for performance reasons E.g. by removing unwanted pre*-stages, or by adding async/defer to JS calls: More clean-up Worker scripts: https://gist.github.com/Nooshu
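A sketch of such a clean-up Worker (which rel values to strip, and whether defer is safe for your scripts, depends entirely on the site):

class ResourceHintCleanup {
  element(element) {
    const rel = element.getAttribute('rel') || ''
    // remove unwanted pre* resource hints injected by the CMS
    if (['prefetch', 'preconnect', 'preload'].includes(rel)) {
      element.remove()
    }
  }
}

class ScriptHandler {
  element(element) {
    // defer render-blocking scripts – beware of execution-order dependencies!
    if (!element.hasAttribute('async') && !element.hasAttribute('defer')) {
      element.setAttribute('defer', '')
    }
  }
}

async function handleRequest(request) {
  const response = await fetch(request)
  return new HTMLRewriter()
    .on('link[rel]', new ResourceHintCleanup())
    .on('script[src]', new ScriptHandler())
    .transform(response)
}

addEventListener('fetch', event => event.respondWith(handleRequest(event.request)))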

Slide 101

Slide 101 text

A detailed guide on how to cache HTML with CF Workers More: https://pa.ag/3xk8rdt

Slide 102

Slide 102 text

You can use Workers to fix broken tracking, allow for better accessibility, and much more. And tons of other things…

Slide 103

Slide 103 text

Some stuff to make your (Worker) life just a bit easier… Tool recommendations

Slide 104

Slide 104 text

Sloth: an advanced CF Worker Code Generator & CMS A very handy (and free) UI to manage Workers for changing robots.txt, titles & descriptions, redirects, hreflang, and much more: Check it out: https://sloth.cloud

Slide 105

Slide 105 text

Tool recommendation: Lil Redirector “Lil Redirector works by persisting and querying redirects inside of Workers KV, and includes an administrator UI for creating, modifying, and deleting redirects.” More: https://pa.ag/3q3EZGx

Slide 106

Slide 106 text

Workers KV – wait, what? Source: https://pa.ag/3vmTiXB Workers KV is a global, low-latency, key-value data store. It supports exceptionally high read volumes […] Workers KV is generally good for use-cases where you need to write relatively infrequently, but read quickly and frequently. It is optimised for these high-read applications.

Slide 107

Slide 107 text

Web Scraper based on Cloudflare Workers “Web Scraper makes it effortless to scrape websites. You provide a URL & CSS selector, and it will return you JSON containing the text contents of the matching elements.” More: https://pa.ag/3woCv7T

Slide 108

Slide 108 text

Technically not a tool, but a very comprehensive guide: More: https://pa.ag/3xnWDqy

Slide 109

Slide 109 text

Downsides, risks – and more… We need to talk responsibility

Slide 110

Slide 110 text

“With great power comes great responsibility.” [This] dates back to the time of the French Revolution – at least, if you believe Wikipedia, that is… Source: https://pa.ag/35nQSx6

Slide 111

Slide 111 text

You could essentially change everything you wanted.

Slide 112

Slide 112 text

Great summary over at ContentKing, well worth a read: What are the downsides […] What risks are involved? Source: https://pa.ag/3xhYUUk

Slide 113

Slide 113 text

10 million requests are included; every additional 1 million currently costs $0.50 – not crazy expensive, but in larger-scale setups it certainly means additional costs. Risk of costs

Slide 114

Slide 114 text

This might interfere with current processes; at the very least, ensure Workers become part of a standardised process (e.g. deployment). PCI compliance

Slide 115

Slide 115 text

The underlying codebase might do/require something that could accidentally be overwritten on the edge. Potential conflict in code

Slide 116

Slide 116 text

Additional modifications on the edge could result in massive debugging effort. Again: proper documentation and processes are crucial! Potential to introduce frontend bugs

Slide 117

Slide 117 text

Always synchronise your activities with relevant stakeholders!

Slide 118

Slide 118 text

… there is also evil Where there is good…

Slide 119

Slide 119 text

Yep, you can do evil things with Workers for sure: Source: https://pa.ag/3cFq0Nq

Slide 120

Slide 120 text

Dynamically creating links to “Baccarat Sites” “[…] at the CF Workers management area, there was a suspicious Worker listed called hang. It had been set to run on any URL route requests to the website:” Source: https://pa.ag/3cFq0Nq After further investigation [by Sucuri], it was found that the website was actually loading SEO spam content through Cloudflare’s Workers service. This service allows someone to load external third-party JavaScript that’s not on their website’s hosting server.

Slide 121

Slide 121 text

The suspicious “hang” Worker injection in detail: Source: https://pa.ag/3cFq0Nq
▪ The JavaScript Worker first checks the HTTP request’s user-agent and identifies whether it contains Google/Googlebot or naver within the string text.
▪ If the user-agent string contains either of these keywords, then the JavaScript makes a request to the malicious domain naverbot[.]live to generate the SEO spam links to be injected into the victim’s website.
▪ After this step, the Worker then injects the retrieved SEO spam link data right before the final closing body tag.

Slide 122

Slide 122 text

If you’re now wondering how to distribute Workers…? Source: https://pa.ag/3zq0Mwd

Slide 123

Slide 123 text

Use at least two-factor authentication with Cloudflare While you’re at it:

Slide 124

Slide 124 text

Care for the slides? www.pa.ag twitter.com/peakaceag facebook.com/peakaceag Take your career to the next level: jobs.pa.ag hi@pa.ag Bastian Grimm bg@pa.ag