A hands-on guide on how to use "Edge SEO" to drive your SEO program forward, including a detailed explanation of how to set up, run and create Cloudflare Workers for SEO tasks. Presented at SEOkomm 2021 in Salzburg, Austria.
pa.ag @peakaceag 6 A (very) simplified request lifecycle: your computer (your browser), a DNS server (e.g. to translate domain<>IP), a web server (aka “origin server”) and, in most cases, a database server.
Let's introduce a CDN to the mix. If you’re not familiar with the term CDN: “A content delivery network (CDN) is a globally distributed network of servers deployed in multiple data centers around the globe.”
Using a CDN, all requests will pass through “edge servers”. When we ignore DNS, databases etc. for a minute, this is what it would look like for the first request, ever: peakace.js is not cached on the edge server yet, so the request for peakace.js is passed on to the origin server. peakace.js is delivered from the origin server and gets cached on the edge server.
Using a CDN, all requests will pass through “edge servers”. When we ignore DNS, databases etc. for a minute, this is what it would look like for the second request (independent of user): peakace.js is cached on the edge server, so the request for peakace.js is answered there and peakace.js is delivered from the edge server.
Back in Sep. 2017, Cloudflare introduced their “Workers”, which ultimately became publicly available in March 2018. Source: https://blog.cloudflare.com/introducing-cloudflare-workers/
So… what’s a Worker? Workers use the V8 JavaScript engine built by Google and run globally on Cloudflare's edge servers. A typical Worker script executes in <1ms. That’s fast!
However, does this only work with Cloudflare? Similar implementations are also available with other CDN providers: Lambda@Edge, Compute@Edge, Edge Workers, Cloudflare Workers.
But today it’s all about Cloudflare, because the top 3 providers (CF, AWS, Akamai) have 89% of all customers; Cloudflare alone is used by 81% of all sites that rely on a CDN (according to W3Techs). Source: https://pa.ag/2U9kvAh
Go create your own (free) account over at cloudflare.com. Once your account is activated, you can add your first site/domain: add your domain name. It can be registered anywhere, as long as you can change the DNS at your current provider.
Next, you’ll get to see the current DNS configuration. Yours should look a little like this: at least two records, one for the root domain and one for the www sub-domain, both pointing to the IP address of your hosting provider. On to the next screen!
Now, CF will show you which nameservers to use instead. The screen lists the nameservers with the current provider (in my case nsX.inwx.de) and my new nameservers with Cloudflare, to be used instead.
Switching existing nameservers over to Cloudflare. At my hosting provider, it looks like this: I enter the new nameservers Cloudflare told me to use instead (see prev. screen).
Switch back to tell them you’re all set. The screen again shows the nameservers with the current provider (in my case nsX.inwx.de) and my new nameservers with Cloudflare, to be used instead.
Cloudflare is going to email you when things are ready. Beware, this can take up to 24hrs depending on the registrars and nameservers. Your CF dashboard should look like this after the successful NS change.
A Worker, in its simplest form: one function defines the trigger for a Worker script to execute. In this case, we intercept the request and send a (custom) response. Our custom response is defined separately; for now we simply (6) log the request object, (7) fetch the requested URL from the origin server, (8) log the response object and (10) send the (unmodified) response back to the client.
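The slide's code is only available as a screenshot, so here is a minimal sketch of such a pass-through Worker in the Service Worker syntax. The function name handleRequest is an assumption, and the typeof guard is only there so the file also loads outside the Workers runtime.

```javascript
// Custom response: for now the request is passed through unmodified.
async function handleRequest(request) {
  console.log(request)                    // log the request object
  const response = await fetch(request)   // fetch the requested URL from the origin server
  console.log(response)                   // log the response object
  return response                         // send the (unmodified) response back to the client
}

// Trigger: run handleRequest for every incoming request.
// The guard is an addition so the script can be loaded in other runtimes too.
if (typeof addEventListener === 'function') {
  addEventListener('fetch', event => {
    event.respondWith(handleRequest(event.request))
  })
}
```

Every later example only changes what happens between the fetch and the return.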
Here’s how to add a Worker: you’ll be redirected from the “all Workers” overview to the following mask. Give your Worker a unique name, then copy & paste the Worker code you just tested on the Playground.
Double-check live! Comparison: left (original) vs right (Worker-enabled). Also, don’t fall victim to caching: use “Disable Cache” (see the Network tab) in Chrome Dev Tools to be sure you’re seeing the latest version.
Warning: (maybe) not production-ready! Please understand that all scripts / source codes are meant as examples only. Ensure you know what you’re doing when using them in a production environment!
Redirects on the edge using the Response API. To execute any type of HTTP redirect, we need to use the Response Runtime API which, conveniently, also provides a static method called “redirect()”: let response = new Response(body, options), or just: return Response.redirect(destination, status). Source: https://pa.ag/3gvXYoL
Different types of implementations at a glance: (#18) a 302 redirect, (#22) a 301 redirect, (#26) a reverse proxy call and (#31-36) multiple redirects, selecting a single destination from a map based on a URL parameter.
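A rough sketch of how the redirect variants could be wired up with Response.redirect(). Only the /redirects/302 path appears in the deck; the other paths, the "id" parameter name and all destinations are assumptions for illustration. The reverse proxy case is left out here.

```javascript
// Hypothetical destinations, keyed by the value of an "id" URL parameter
const redirectMap = new Map([
  ['1', 'https://example.com/first'],
  ['2', 'https://example.com/second'],
])

// Pure routing decision, kept separate from the Response so it is easy to test
function resolveRedirect(requestUrl) {
  const url = new URL(requestUrl)
  if (url.pathname === '/redirects/302') {
    return { destination: 'https://example.com/temporary', status: 302 }
  }
  if (url.pathname === '/redirects/301') {
    return { destination: 'https://example.com/permanent', status: 301 }
  }
  if (url.pathname === '/redirects/map') {
    const destination = redirectMap.get(url.searchParams.get('id'))
    if (destination) return { destination, status: 301 }
  }
  return null // no rule matched, e.g. the ID is not configured in redirectMap
}

async function handleRequest(request) {
  const rule = resolveRedirect(request.url)
  return rule
    ? Response.redirect(rule.destination, rule.status)
    : fetch(request) // no redirect: pass the request through to the origin
}

if (typeof addEventListener === 'function') {
  addEventListener('fetch', event => event.respondWith(handleRequest(event.request)))
}
```

Keeping the lookup in a plain function means the rules can be checked without deploying anything.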
A quick overview to see how things are working… Source: https://httpstatus.io. Screenshot annotations: “Correct, in fact this is not a redirect” and “ID is not configured in redirectMap”.
To “reverse proxy” a request, you can use the Fetch API. It provides an interface for (asynchronously) fetching resources via HTTP requests inside of a Worker script: const response = await fetch(URL, options). Asynchronous tasks, such as fetch, are not executed at the top level in a Worker script and must be executed within a FetchEvent handler. Source: https://pa.ag/3wpS3YT
Easily “migrate” a blog hosted on a sub-domain to a sub-folder on your main domain, without actually moving it: return await fetch("https://example.com"). In the demo, the request is sent from bastiangrimm.dev and the content shown comes from example.com. Great tutorial: https://pa.ag/2Tw7LD8
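To make the sub-domain to sub-folder idea concrete, here is a small sketch. The host blog.example.com and the /blog prefix are assumptions, not taken from the slides, and a real migration would also need to rewrite links inside the proxied HTML.

```javascript
// Assumptions: the blog lives on blog.example.com and should appear under /blog
const BLOG_ORIGIN = 'https://blog.example.com'
const BLOG_PREFIX = '/blog'

// Map a public URL to the upstream blog URL, or return null for non-blog URLs
function toUpstreamUrl(requestUrl) {
  const url = new URL(requestUrl)
  const isBlog = url.pathname === BLOG_PREFIX || url.pathname.startsWith(BLOG_PREFIX + '/')
  if (!isBlog) return null
  const path = url.pathname.slice(BLOG_PREFIX.length) || '/'
  return BLOG_ORIGIN + path + url.search
}

async function handleRequest(request) {
  const upstream = toUpstreamUrl(request.url)
  // fetch() runs inside the handler, not at the top level of the script
  return upstream
    ? fetch(new Request(upstream, request)) // reverse proxy: the URL in the browser stays the same
    : fetch(request)
}

if (typeof addEventListener === 'function') {
  addEventListener('fetch', event => event.respondWith(handleRequest(event.request)))
}
```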
Verifying that this all happens “on the edge”: zoom into any of the response headers for an originally requested URL such as bastiangrimm.dev/redirects/302:
Which version would you like to wake up to? Preventing “SEO heart attacks” by using a Worker to monitor and safeguard your robots.txt file is one of many use-cases that are super easy to implement. Compare how I uploaded the robots.txt file to my test server vs what the Worker running in the background changed the output to.
Preventing a global “Disallow: /” in robots.txt: (#5-6) define defaults, (#15-16) if robots.txt returns 200, read its content, (#19-24) replace if a global disallow exists, (#27-29) return a default “allow all” if the file doesn’t exist.
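The steps above might be sketched like this. The default "allow all" content and the line-based check are assumptions. As a simplification, the check ignores which User-agent group the directive belongs to, so treat it as a starting point rather than a full robots.txt parser.

```javascript
// Default served when the file is missing or blocks everything (assumed content)
const DEFAULT_ROBOTS = 'User-agent: *\nAllow: /\n'

// True if any line is exactly "Disallow: /" (case-insensitive, comments ignored)
function hasGlobalDisallow(body) {
  return body
    .split(/\r?\n/)
    .some(line => /^disallow:\s*\/$/i.test(line.split('#')[0].trim()))
}

// Decide what to serve based on the origin's status code and body
function safeguardRobots(status, body) {
  if (status !== 200) return DEFAULT_ROBOTS // file doesn't exist
  return hasGlobalDisallow(body) ? DEFAULT_ROBOTS : body
}

async function handleRequest(request) {
  if (new URL(request.url).pathname !== '/robots.txt') return fetch(request)
  const originResponse = await fetch(request)
  const body = originResponse.status === 200 ? await originResponse.text() : ''
  return new Response(safeguardRobots(originResponse.status, body), {
    headers: { 'Content-Type': 'text/plain; charset=utf-8' },
  })
}

if (typeof addEventListener === 'function') {
  addEventListener('fetch', event => event.respondWith(handleRequest(event.request)))
}
```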
For demonstration purposes only: UA-based delivery. (#10) get the User-Agent, (#16-17) add a dynamic Sitemap link if the UA contains “googlebot”.
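Again for demonstration purposes only, a sketch of the User-Agent check. The sitemap URL is a placeholder and the helper name robotsForUserAgent is an assumption. Note that a User-Agent string can be spoofed by anyone.

```javascript
// Hypothetical sitemap location
const SITEMAP_URL = 'https://example.com/sitemap.xml'

// Append a Sitemap line to robots.txt only if the UA contains "googlebot"
function robotsForUserAgent(body, userAgent, sitemapUrl) {
  const isGooglebot = (userAgent || '').toLowerCase().includes('googlebot')
  if (!isGooglebot) return body
  const separator = body.endsWith('\n') ? '' : '\n'
  return `${body}${separator}Sitemap: ${sitemapUrl}\n`
}

async function handleRequest(request) {
  if (new URL(request.url).pathname !== '/robots.txt') return fetch(request)
  const userAgent = request.headers.get('User-Agent') // get User-Agent
  const originResponse = await fetch(request)
  const body = await originResponse.text()
  return new Response(robotsForUserAgent(body, userAgent, SITEMAP_URL), {
    status: originResponse.status,
    headers: { 'Content-Type': 'text/plain; charset=utf-8' },
  })
}

if (typeof addEventListener === 'function') {
  addEventListener('fetch', event => event.respondWith(handleRequest(event.request)))
}
```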
Live-test & compare robots.txt using technicalseo.com. Left screen shows bastiangrimm.dev/robots.txt being requested using a Googlebot User-Agent string, right screen is the default output. Free testing tool: https://technicalseo.com/tools/robots-txt/ Or use…
Easily overwrite files which are “not meant” to be changed? Some systems cause endless headaches for SEOs; routing them through Cloudflare and using a Worker works very well!
Say hello to the HTMLRewriter class! The HTMLRewriter allows you to build comprehensive and expressive HTML parsers inside of a Cloudflare Workers application: new HTMLRewriter().on("*", new ElementHandler()).onDocument(new DocumentHandler()). Source: https://pa.ag/2RTpqEt
Let's give it a try and work with <head> and <meta> first: (#24-25) pass the tags to ElementHandler, (#9-11) if it’s a robots <meta> tag, set it to “index,nofollow”, (#14-16) if it’s <head>, add another directive for bingbot.
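A minimal sketch of such an element handler. Which elements the slide targets and the content of the bingbot directive are assumptions here; the Element methods used (getAttribute, setAttribute, append) are part of the HTMLRewriter API. HTMLRewriter itself only exists in the Workers runtime, so rewriteRobots() has to be called from a Worker's fetch handler.

```javascript
class ElementHandler {
  element(element) {
    const tag = element.tagName.toLowerCase()
    // Robots meta tag: overwrite its directive
    if (tag === 'meta' && element.getAttribute('name') === 'robots') {
      element.setAttribute('content', 'index,nofollow')
    }
    // <head>: append an extra directive for bingbot (the value is an assumed example)
    if (tag === 'head') {
      element.append('<meta name="bingbot" content="noarchive">', { html: true })
    }
  }
}

// Pass both tag types to the handler and stream the rewritten response
function rewriteRobots(response) {
  return new HTMLRewriter()
    .on('meta', new ElementHandler())
    .on('head', new ElementHandler())
    .transform(response)
}
```

Inside a Worker this would typically end with: return rewriteRobots(await fetch(request)).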
If you want to work with/on every HTML element… This selector would pass every HTML element to your ElementHandler. By using element.tagName, you could then identify which element has been passed along: return new HTMLRewriter().on("*", new ElementHandler()).transform(response)
Using element selectors in HTMLRewriter. Often, you only want to process very specific elements, e.g. <meta> tags, but not all of them. Maybe it’s just the meta description you care about? new HTMLRewriter().on('meta[name="description"]', new ElementHandler()).transform(response). More on selectors: https://pa.ag/35xw073
Updating or replacing titles and descriptions is easy… Element selectors are super powerful yet easy to use: (#10) forced overwrite, (#14-22) conditional changes to the meta description.
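One way this could look, as a sketch. The new title, the fallback description and the condition (only replace an empty description) are all assumptions, since the slide's code is not part of the transcript.

```javascript
// Hypothetical replacement values
const NEW_TITLE = 'New title, set on the edge'
const FALLBACK_DESCRIPTION = 'Fallback description, set on the edge.'

class TitleHandler {
  element(element) {
    element.setInnerContent(NEW_TITLE) // forced overwrite
  }
}

class DescriptionHandler {
  element(element) {
    const current = (element.getAttribute('content') || '').trim()
    // conditional change: only touch the description if it is empty
    if (current === '') element.setAttribute('content', FALLBACK_DESCRIPTION)
  }
}

// Must run inside the Workers runtime, where HTMLRewriter is available
function rewriteMeta(response) {
  return new HTMLRewriter()
    .on('title', new TitleHandler())
    .on('meta[name="description"]', new DescriptionHandler())
    .transform(response)
}
```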
HTMLRewriter listening to <a> and <img> tags: (#29-30) passing href/src attributes to a class which (#20) replaces oldURL with newUrl and (#16-18) ensures https-availability. Based on: https://pa.ag/35llTSo
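A sketch of an attribute rewriter along those lines. The two host names are placeholders and the class name AttributeRewriter is an assumption.

```javascript
// Hypothetical hosts
const OLD_URL = 'old.example.com'
const NEW_URL = 'new.example.com'

class AttributeRewriter {
  constructor(attributeName) {
    this.attributeName = attributeName
  }
  element(element) {
    const value = element.getAttribute(this.attributeName)
    if (!value) return
    element.setAttribute(
      this.attributeName,
      value
        .replace(/^http:\/\//, 'https://') // ensure https
        .replace(OLD_URL, NEW_URL)         // replace the old host with the new one
    )
  }
}

// Must run inside the Workers runtime, where HTMLRewriter is available
function rewriteLinks(response) {
  return new HTMLRewriter()
    .on('a', new AttributeRewriter('href'))
    .on('img', new AttributeRewriter('src'))
    .transform(response)
}
```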
HTTP hreflang annotations on the edge. We’ve just had plenty of HTML, so let’s use HTTP headers instead. Of course, both ways work just fine.
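As a sketch, hreflang annotations can be sent in an HTTP Link header with rel="alternate". The list of alternates below is made up for illustration, and a real setup would need a set of alternates per URL.

```javascript
// Hypothetical language versions of one page
const HREFLANG_ALTERNATES = [
  { hreflang: 'en', href: 'https://example.com/en/' },
  { hreflang: 'de', href: 'https://example.com/de/' },
  { hreflang: 'x-default', href: 'https://example.com/' },
]

// Build the value of the Link header
function buildHreflangHeader(alternates) {
  return alternates
    .map(({ href, hreflang }) => `<${href}>; rel="alternate"; hreflang="${hreflang}"`)
    .join(', ')
}

async function handleRequest(request) {
  const originResponse = await fetch(request)
  // Copy the response so its headers become mutable
  const response = new Response(originResponse.body, originResponse)
  response.headers.append('Link', buildHreflangHeader(HREFLANG_ALTERNATES))
  return response
}

if (typeof addEventListener === 'function') {
  addEventListener('fetch', event => event.respondWith(handleRequest(event.request)))
}
```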
Combining HTTP 503 error with a Retry-After header. Retry-After indicates how long the UA should wait before making a follow-up request: “The server is currently unable to handle the request due to a temporary overloading or maintenance of the server […]. If known, the length of the delay MAY be indicated in a Retry-After header.”
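A small sketch of a maintenance response served from the edge. The one-hour delay and the HTML body are assumptions; Retry-After accepts either a number of seconds or an HTTP date.

```javascript
// Assumed maintenance window: ask clients to come back in one hour
const RETRY_AFTER_SECONDS = 3600

// Build the options for a 503 response with a Retry-After header
function maintenanceInit(retryAfterSeconds) {
  return {
    status: 503,
    statusText: 'Service Unavailable',
    headers: {
      'Retry-After': String(retryAfterSeconds),
      'Content-Type': 'text/html; charset=utf-8',
    },
  }
}

function handleRequest() {
  return new Response('<h1>Down for maintenance</h1>', maintenanceInit(RETRY_AFTER_SECONDS))
}

if (typeof addEventListener === 'function') {
  addEventListener('fetch', event => event.respondWith(handleRequest(event.request)))
}
```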
8. Injecting content. Just in case I’ve somehow not made my point yet: you can do REALLY cool stuff and have control over the full HTML response, so adding content is easy.
You could also (dynamically) read from an external feed. Feeding in content from other sources is simple; the example shows reading a JSON feed, parsing the input and injecting it into the target page.
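A sketch of how feed content might be injected. The feed URL, the feed shape (an array of objects with title and url) and the choice of <body> as the target element are assumptions, because the transcript does not preserve them.

```javascript
// Hypothetical JSON feed returning [{ "title": "...", "url": "..." }, ...]
const FEED_URL = 'https://example.com/feed.json'

// Escape text before placing it into HTML
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
}

// Turn the parsed feed items into a list of links
function renderItems(items) {
  const listItems = items
    .map(item => `<li><a href="${escapeHtml(item.url)}">${escapeHtml(item.title)}</a></li>`)
    .join('')
  return `<ul>${listItems}</ul>`
}

class InjectHandler {
  constructor(html) {
    this.html = html
  }
  element(element) {
    element.append(this.html, { html: true }) // add as the last child of the element
  }
}

async function handleRequest(request) {
  // Request the page and the feed in parallel
  const [originResponse, feedResponse] = await Promise.all([fetch(request), fetch(FEED_URL)])
  const items = await feedResponse.json()
  return new HTMLRewriter()
    .on('body', new InjectHandler(renderItems(items)))
    .transform(originResponse)
}

if (typeof addEventListener === 'function') {
  addEventListener('fetch', event => event.respondWith(handleRequest(event.request)))
}
```

A production version should also handle a slow or failing feed, since this sketch makes every page view depend on it.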
Add native lazy loading for images to your HTML mark-up. Keep in mind: you don’t want to lazy load all of your images (e.g. not the hero image); also, if you’re using iframes, you might want to pass “iframe” to the HTMLRewriter:
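A sketch of a lazy-loading handler. Skipping the first image as a stand-in for "the hero image" is a crude heuristic and purely an assumption; a class or data attribute on the hero image would be more reliable.

```javascript
class LazyLoadHandler {
  // skipFirst: number of leading elements to leave alone (assumed: 1 hero image)
  constructor(skipFirst = 0) {
    this.skipFirst = skipFirst
    this.seen = 0
  }
  element(element) {
    this.seen += 1
    if (this.seen <= this.skipFirst) return
    // Respect an explicit loading attribute if the template already sets one
    if (!element.hasAttribute('loading')) element.setAttribute('loading', 'lazy')
  }
}

// Create fresh handlers per response so the counters start at zero.
// Must run inside the Workers runtime, where HTMLRewriter is available.
function addLazyLoading(response) {
  return new HTMLRewriter()
    .on('img', new LazyLoadHandler(1))
    .on('iframe', new LazyLoadHandler())
    .transform(response)
}
```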
Cleaning up HTML code for performance reasons, e.g. by removing unwanted pre*-stages, or by adding async/defer to JS calls. More clean-up Worker scripts: https://gist.github.com/Nooshu
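A sketch of two such clean-ups. Which hints count as unwanted is site-specific, so removing rel="prefetch" links is just an assumed example, and deferring every external script can change execution order, so test before relying on it.

```javascript
// Remove the matched element from the document
class RemoveElementHandler {
  element(element) {
    element.remove()
  }
}

// Add defer to external scripts that are neither async nor deferred yet
class DeferScriptHandler {
  element(element) {
    if (!element.hasAttribute('async') && !element.hasAttribute('defer')) {
      element.setAttribute('defer', '')
    }
  }
}

// Must run inside the Workers runtime, where HTMLRewriter is available
function cleanUpHtml(response) {
  return new HTMLRewriter()
    .on('link[rel="prefetch"]', new RemoveElementHandler())
    .on('script[src]', new DeferScriptHandler())
    .transform(response)
}
```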
“With great power comes great responsibility.” [This] dates back to the time of the French Revolution. At least, if you believe Wikipedia that is… Source: https://pa.ag/35nQSx6
Great summary over at ContentKing, well worth a read: What are the downsides […] What risks are involved? Source: https://pa.ag/3xhYUUk
Risk of costs: 10 million requests are included; every additional 1 million currently costs $0.50. That is not crazy expensive, but in larger-scale setups it certainly means additional costs.
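To get a feel for the numbers, a tiny estimate based only on the figures from the slide. It assumes billing per started million requests and ignores any base fee, so check the current Cloudflare pricing before relying on it.

```javascript
// Figures from the slide
const INCLUDED_REQUESTS = 10_000_000
const USD_PER_EXTRA_MILLION = 0.5

// Estimated extra cost for requests beyond the included volume
// (assumption: every started million is billed)
function estimateOverageUsd(monthlyRequests) {
  const extraRequests = Math.max(0, monthlyRequests - INCLUDED_REQUESTS)
  return Math.ceil(extraRequests / 1_000_000) * USD_PER_EXTRA_MILLION
}

// 25 million requests: 15 extra millions x $0.50 = $7.50
console.log(estimateOverageUsd(25_000_000))
```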
PCI compliance: This might interfere with current processes; at the very least, make sure Workers become part of a standardised process (e.g. deployment).
Potential to introduce frontend bugs: Additional modifications on the edge could result in massive debugging. Again: proper documentation and processes are crucial!
Dynamically creating links to “Baccarat Sites”: “[…] at the CF Workers management area, there was a suspicious Worker listed called hang. It had been set to run on any URL route requests to the website:” After further investigation [by Sucuri], it was found that the website was actually loading SEO spam content through Cloudflare’s Workers service. This service allows someone to load external third-party JavaScript that’s not on their website’s hosting server. Source: https://pa.ag/3cFq0Nq
The suspicious “hang” Worker injection in detail: ▪ The JavaScript Worker first checks for the HTTP request’s user-agent and identifies whether it contains Google/Googlebot or naver within the string text. ▪ If the user-agent string contains either of these keywords, then the JavaScript makes a request to the malicious domain naverbot[.]live to generate the SEO spam links to be injected into the victim’s website. ▪ After this step, the Worker then injects the retrieved SEO spam link data right before the final Source: https://pa.ag/3cFq0Nq
Care for the slides? www.pa.ag twitter.com/peakaceag facebook.com/peakaceag Take your career to the next level: jobs.pa.ag Bastian Grimm