Serverless SEO / Edge SEO - SEOkomm 2021

Bastian Grimm, Peak Ace AG | @basgr Serverless SEO a.k.a.
“Edge SEO” - ein Hands-on-Guide.

Cloudflare Worker

In 2022 braucht ihr für SEO nicht mehr unbedingt einen
Serverzugang! Access not needed!

Daher: Folien auf Englisch … Ich bin nur Ersatz!

Before we talk about Workers, we need to talk HTTP
requests – and CDNs: Establishing a common ground

pa.ag @peakaceag 6 A (very) simplified request lifecycle Your computer
Your browser Database server (in most cases) DNS server e.g. to translate domain<>IP Web server aka “origin server”

If you’re not familiar with the term CDN: “A content
delivery network (CDN) is a globally distributed network of servers deployed in multiple data centers around the globe.” Let's introduce a CDN to the mix

pa.ag @peakaceag 8 Using a CDN, all requests will pass
through “edge servers“ When we ignore DNS, databases etc for a minute, this is what it would look like: First request, ever. peakace.js is not cached on edge server yet Origin server Request: peakace.js Request: peakace.js peakace.js delivered from origin server Response: peakace.js peakace.js gets cached on edge server

pa.ag @peakaceag 9 Using a CDN, all requests will pass
through “edge servers“ When we ignore DNS, databases etc for a minute, this is what it would look like: Origin server Request: peakace.js peakace.js delivered from edge server peakace.js is cached on edge server Second request (independent of user)

pa.ag @peakaceag 10 CDNs at a glance Some of the
most popular CDN providers out there

pa.ag @peakaceag 11 Back in Sep. 2017, Cloudflare introduced their
“Workers“ Which ultimately became publicly available in March 2018: Source: https://blog.cloudflare.com/introducing-cloudflare-workers/

Workers use the V8 JavaScript engine built by Google and
run globally on Cloudflare's edge servers. A typical Worker script executes in <1ms – that’s fast! So… what‘s a Worker?

… using the latest standard language features You can execute
any JavaScript…

…directly from your Worker, or forward them elsewhere Respond to
requests…

Also, you can do multiple requests, in series or parallel
and combine the results Send requests to 3rd-party servers

Intercept and modify HTTP request and response URLs, status, headers,
and body content. Seriously though, this is WILD!

Some of the potential use-cases could be to:

301s, 302s – or even geo-specific ones, if needed Implement
redirects

Adding/changing X-robots or even hreflang annotations Modify HTTP headers

Overwrite the full file, add or remove single directives Modify
robots.txt

Inject, change or even remove robots meta-annotations Modify meta directives

Create unique page titles or meta descriptions when needed Update
page titles & descriptions

… or schema.org mark-up Implement hreflang

Essentially, you can do almost anything – because you have
full access to the request and response objects! Inject/remove (body) content

pa.ag @peakaceag 25 However, does this only work with Cloudflare?
Similar implementations are also available with other CDN providers: Lambda@Edge Compute@Edge Edge Workers Cloudflare Workers

pa.ag @peakaceag 26 But today it‘s all about Cloudflare, because:
The top 3 providers (CF, AWS, Akamai) have 89% of all customers; Cloudflare alone is used by 81% of all sites that rely on a CDN (according to W3Techs): Source: https://pa.ag/2U9kvAh

A practical and hands-on guide to setting up and running
Cloudflare Workers for your SEO Excited? Let‘s go!

pa.ag @peakaceag 28 Go create your own (free) account over
at cloudflare.com Once your account is activated, you can add your first site/domain: Add your domain name - it can be registered anywhere (as long as you can change the DNS at your current provider)

pa.ag @peakaceag 29 To play with it, the free account
+ $0 plan is sufficient: This is good enough for testing things out…!

pa.ag @peakaceag 30 Next, you‘ll get to see the current
DNS configuration Yours should look a little like this: at least two records, one for the root-domain, one for the www sub-domain, both pointing to the IP address of your hosting provider: On to the next screen!

pa.ag @peakaceag 31 Now, CF will show you which nameservers
to use instead: Nameservers with the current provider, in my case nsX.inwx.de My new nameservers with Cloudflare to be used instead

pa.ag @peakaceag 32 Switching existing nameservers over to Cloudflare At
my hosting provider, it looks like this: My new nameservers Cloudflare told me to use instead (see prev. screen)

pa.ag @peakaceag 33 Switch back to tell them you’re all
set: Nameservers with the current provider, in my case nsX.inwx.de My new nameservers with Cloudflare, to be used instead

pa.ag @peakaceag 34 Cloudflare is going to email you when
things are ready: Beware, this can take up to 24hrs depending on the registrars and nameservers: Your CF dashboard should look like this after the successful NS change.

So far, so good – let‘s talk Workers now.

pa.ag @peakaceag 36 A Worker, in its simplest form: This
function defines triggers for a Worker script to execute. In this case, we intercept the request and send a (custom) response. Our custom response is defined here, for now we simply: (6) log the request object (7) fetch the requested URL from the origin server (8) log the response object (10) send the (unmodified) response back to the client

pa.ag @peakaceag 37 Cloudflare Workers Playground: cloudflareworkers.com Test-drive Cloudflare Workers;
create/edit and see the results live:

pa.ag @peakaceag 38 Cloudflare Workers Playground: cloudflareworkers.com Test-drive Cloudflare Workers;
create/edit and see the results live:

pa.ag @peakaceag 39 Let's build our own first Worker /
custom handleRequest:

pa.ag @peakaceag 40 Let‘s test-drive this on the Workers Playground:

Let's live-deploy your first Worker! Enough testing…

pa.ag @peakaceag 42 Let‘s live-deploy the Worker to Cloudflare's edge
servers Select your domain > Workers > Manage Workers

pa.ag @peakaceag 43 Here’s how to add a Worker You‘ll
be redirected from the “all Workers“ overview to the following mask: Give your Worker a unique name Copy & paste the Workers code you just tested on the Playground

pa.ag @peakaceag 44 Confirm deployment and assign routing Go back
to > Workers > Add route 1 2

pa.ag @peakaceag 45 Comparison: left (original), right (Worker-enabled) Double-check live!
Also, don‘t fall victim to caching, use “Disable Cache“ (see: Network tab) in Chrome Dev Tools to be sure you‘re seeing the latest version: vs

Let’s have some SEO fun then? So much for theory…

Please understand that all scripts / source codes are meant
as examples only. Ensure you know what you’re doing when using them in a production environment! Warning: (maybe) not production-ready!

301 / 302 / bulk redirects & proxy passthrough 1.
Redirects

pa.ag @peakaceag 49 Redirects on the edge using the Response
API To execute any type of HTTP redirect, we need to use the Response Runtime API which – conveniently – also provides a static method called “redirect()”: Source: https://pa.ag/3gvXYoL let response = new Response(body, options) return Response.redirect(destination, status) or just:

pa.ag @peakaceag 50 The Cloudflare Workers Docs is a solid
starting point: More: https://pa.ag/3gNd8Gn

pa.ag @peakaceag 51 Different types of implementations at a glance
(#18): 302 redirect, (#22): 301 redirect, (#26): a reverse proxy call and (#31-36): multiple redirects, selecting a single destination from a map based on a URL parameter:

pa.ag @peakaceag 52 A quick overview to see how things
are working… Source: https://httpstatus.io Correct, in fact this is not a redirect ID is not configured in redirectMap

pa.ag @peakaceag 53 To “reverse proxy” a request, you can
use the Fetch API It provides an interface for (asynchronously) fetching resources via HTTP requests inside of a Worker script: Source: https://pa.ag/3wpS3YT const response = await fetch(URL, options) Asynchronous tasks, such as fetch, are not executed at the top level in a Worker script and must be executed within a FetchEvent handler.

pa.ag @peakaceag 54 return await fetch(“https://example.com ”) Easily “migrate” a
blog hosted on a sub-domain to a sub-folder on your main domain – without actually moving it. Great tutorial: https://pa.ag/2Tw7LD8 Content shown from example.com Request sent from bastiangrimm.dev

pa.ag @peakaceag 55 Verifying that this all happens “on the
edge“: Zoom into any of the response headers for an originally requested URL such as bastiangrimm.dev/redirects/302:

Safeguards, monitoring, serving a default file, etc. 2. robots.txt

pa.ag @peakaceag 57 Which version would you like to wake
up to? Preventing “SEO heart attacks“ using a Worker to monitor and safeguard your robots.txt file is one of many use-cases that are super easy to do: This is how I uploaded the robots.txt file to my test server This is what the Worker running in the background changed the output to vs

pa.ag @peakaceag 58 Preventing a global “Disallow: /“ in robots.txt
(#5-6): define defaults, (#15-16): if robots.txt returns 200, read its content, (#19-24): replace if global disallow exists, (#27-29): return default “allow all” if file doesn’t exist

So, let’s do some “dynamic serving“ shall we? Static files
become dynamic

pa.ag @peakaceag 60 For demonstration purposes only: UA-based delivery (#10):
get User-Agent, (#16-17): add dynamic Sitemap-link if UA contains “googlebot”

pa.ag @peakaceag 61 vs Live-test & compare robots.txt using technicalseo.com
Left screen shows bastiangrimm.dev/robots.txt being requested using a Googlebot User-Agent string, right screen is the default output: Free testing tool: https://technicalseo.com/tools/robots-txt/ Or use…

Some systems cause endless headaches for SEOs – routing them
through Cloudflare and using a Worker works very well! Easily overwrite files which are “not meant” to be changed?

Modifying robots HTML directives on the fly 3. Robots meta
tags

pa.ag @peakaceag 64 Say hello to the HTMLRewriter class! The
HTMLRewriter allows you to build comprehensive and expressive HTML parsers inside of a Cloudflare Workers application: Source: https://pa.ag/2RTpqEt new HTMLRewriter() .on("*", new ElementHandler()) .onDocument(new DocumentHandler())

pa.ag @peakaceag 65 Let's give it a try and work
with <head> and <meta> first (#24-25): pass tags to ElementHandler, (#9-11): if it’s <meta name=“robots”>, set it to “index,nofollow”, (#14-16): if it’s <head>, add another directive for bingbot:

pa.ag @peakaceag 66 I mean, this should be clear –
but just in case: Verifying presence of Worker-modified robots meta directives via GSC

pa.ag @peakaceag 67 If you want to work with/on every
HTML element… This selector would pass every HTML element to your ElementHandler. By using element.tagName, you could then identify which element has been passed along: return new HTMLRewriter() .on("*", new ElementHandler()) .transform(response)

Of course, updating, changing or entirely replacing both elements is
also possible! 4. Title and meta description

pa.ag @peakaceag 69 Using element selectors in HTMLRewriter Often, you
want only to process very specific elements, e.g. <meta> tags – but not all of them. Maybe it’s just the meta description you care about? new HTMLRewriter() .on('meta[name="description"]', new ElementHandler()) .transform(response) More on selectors: https://pa.ag/35xw073

pa.ag @peakaceag 70 Updating or replacing titles and descriptions is
easy… (#10): forced <title> overwrite, (#14-22): conditional changes to the meta description Element selectors are super powerful yet easy to use:

Maybe you missed some during your last migration? 5. Rewriting
links

pa.ag @peakaceag 72 HTMLRewriter listening to <a> and <img> tags
(#29-30): Passing href/src attributes to a class which (#20): replaces oldURL with newUrl and (#16-18): ensures https-availability Based on: https://pa.ag/35llTSo

Tell Google about localised versions of your page 6. Deploying
hreflang

pa.ag @peakaceag 74 HTTP hreflang annotations on the edge We‘ve
just had plenty of HTML, so let‘s use HTTP headers instead – of course both ways work just fine though:

pa.ag @peakaceag 75 Verify, e.g. using Chrome Developer Console: Network
> %URL (disable cache) > Headers > Response Headers

pa.ag @peakaceag 76 Before you ask: X-Robots directives are also
possible… … and the same is true for X-Robots Rel-Canonical annotations:

You can serve “proper“ under maintenance pages directly from your
Cloudflare Worker 7. Serving HTTP 503s

pa.ag @peakaceag 78 Combining HTTP 503 error with a Retry-After
header Retry-After indicates how long the UA should wait before making a follow-up request: The server is currently unable to handle the request due to a temporary overloading or maintenance of the server […]. If known, the length of the delay MAY be indicated in a Retry-After header.

Just in case I’ve somehow not made my point yet
– you can do REALLY cool stuff and have control over the full HTML response – so adding content is easy. 8. Injecting content

pa.ag @peakaceag 80 Replacing, prepending, appending … whatever you like?

pa.ag @peakaceag 81 You could also (dynamically) read from an
external feed Feeding in content from other sources is simple; below shows reading a JSON feed, parsing the input and inject to the <h1> of the target page:

Yeah… actually this is how it all started; and still
it‘s (one of) the most powerful tools to use for it! 9. Web performance

pa.ag @peakaceag 83 Add native lazy loading for images to
your HMTL mark-up Keep in mind: you don‘t want to lazy load all of your images (e.g. not the hero image); also, if you‘re using iframes, you might want to pass “iframe“ to the HTMLRewriter:

pa.ag @peakaceag 84 Cleaning up HTML code for performance reasons
E.g. by removing unwanted pre*-stages, or by adding async/defer to JS calls: More clean-up Worker scripts: https://gist.github.com/Nooshu

pa.ag @peakaceag 85 A detailed guide on how to cache
HTML with CF Workers More: https://pa.ag/3xk8rdt

You can use Workers to fix broken tracking, allow for
better accessibility, and much more. And tons of other things…

Downsides, risks – and more… We need to talk responsibility

pa.ag @peakaceag 88 [This] dates back to the time of
the French Revolution At least, if you believe Wikipedia that is… Source: https://pa.ag/35nQSx6 With great power comes great responsibility.

You could essentially change everything you wanted.

pa.ag @peakaceag 90 Great summary over at ContentKing, well worth
a read: What are the downsides […] What risks are involved? Source: https://pa.ag/3xhYUUk

10 million requests are included, every 1 million currently costs
$0.50 extra - not crazy expensive, but in larger-scale setups certainly means additional costs. Risk of costs

This might interfere with current processes, or at the very
least, ensure Workers become part of a standardised process (e.g. deployment). PCI compliance

The underlying codebase might do/require something that could accidentally be
overwritten on the edge Potential conflict in code

Additional modifications on the edge could result in massive debugging.
Again: proper documentation and processes are crucial! Potential to introduce frontend bugs

Always synchronise your activities with relevant stakeholders!

… there is also evil Where there is good…

pa.ag @peakaceag 97 Yep, you can do evil things with
Workers for sure: Source: https://pa.ag/3cFq0Nq

pa.ag @peakaceag 98 Dynamically creating links to “Baccarat Sites” “[…]
at the CF Workers management area, there was a suspicious Worker listed called hang. It had been set to run on any URL route requests to the website:” Source: https://pa.ag/3cFq0Nq After further investigation [by sucuri], it was found that the website was actually loading SEO spam content through Cloudflare’s Workers service. This service allows someone to load external third-party JavaScript that’s not on their website’s hosting server.

pa.ag @peakaceag 99 The suspicious “hang” Worker injection in detail:
Source: https://pa.ag/3cFq0Nq ▪ The JavaScript Worker first checks for the HTTP request’s user-agent and identifies whether it contains Google/Googlebot or naver within the string text. ▪ If the user-agent string contains either of these keywords, then the JavaScript makes a request to the malicious domain naverbot[.]live to generate the SEO spam links to be injected into the victim’s website. ▪ After this step, the Worker then injects the retrieved SEO spam link data right before the final </body> tag on the infected website’s HTML source. ▪ The malicious JavaScript can also be triggered if the user-agent matches a crawler that is entirely separate from Googlebot: naver.

pa.ag @peakaceag 100 If you‘re now wondering how to distribute
Workers…? Source: https://pa.ag/3zq0Mwd

Use at least two-factor authentication with Cloudflare While you‘re at
it:

Care for the slides? www.pa.ag twitter.com/peakaceag facebook.com/peakaceag Take your career
to the next level: jobs.pa.ag [email protected] Bastian Grimm [email protected]

Serverless SEO / Edge SEO - SEOkomm 2021

Serverless SEO / Edge SEO - SEOkomm 2021

More Decks by Bastian Grimm

Other Decks in Marketing & SEO

Featured

Transcript