Serverless SEO - SMX Advanced Europe 2021

Five tech SEO hacks you never knew existed
Bastian Grimm, Peak Ace AG | @basgr

First of all: I lied! I couldn’t choose just five…

Instead, I’m going to help you:
▪ prove to management that they need your SEO program
▪ demonstrate SEO ROI with simple “feature test” deployments
▪ fix legacy systems without having to beg for development resources
▪ easily build a proof-of-concept rollout

Sound good? Great! But how?

Serverless SEO: ever heard of it? Or, as some people call it, “edge SEO”.

Using Workers to overcome challenges and limitations with popular CMS and ecommerce platforms.

Wait… what? Because in 2021, you won’t need server access for SEO anymore!

Establishing a common ground: before we talk about Workers, we need to talk about HTTP requests and CDNs.

pa.ag @peakaceag
A (very) simplified request lifecycle
[Diagram: your computer and browser, a DNS server (e.g. to translate domain <> IP), the web server aka “origin server” and, in most cases, a database server]

Let’s introduce a CDN to the mix. If you’re not familiar with the term: “A content delivery network (CDN) is a globally distributed network of servers deployed in multiple data centers around the globe.”

Using a CDN, all requests will pass through “edge servers”
Ignoring DNS, databases etc. for a minute, this is what the first request ever looks like: peakace.js is not cached on the edge server yet, so the request for peakace.js is passed on to the origin server, which delivers the file. The response is sent back to the user, and peakace.js gets cached on the edge server.

On the second request (independent of user), peakace.js is already cached on the edge server, so it is delivered from there without having to go back to the origin server.

Especially for global businesses, CDNs can be a great help
Use CDNPerf.com to find the one that suits you best, depending on where you are and which regions/countries you serve most. This will positively impact TTFB! Give it a try: https://www.cdnperf.com/

CDNs at a glance: some of the most popular CDN providers out there

Back in Sep. 2017, Cloudflare introduced their “Workers”, which ultimately became publicly available in March 2018. Source: https://blog.cloudflare.com/introducing-cloudflare-workers/

So… what’s a Worker? Workers use the V8 JavaScript engine built by Google and run globally on Cloudflare’s edge servers. A typical Worker script executes in <1ms, that’s fast!

You can execute any JavaScript… using the latest standard language features.

Respond to requests… directly from your Worker, or forward them elsewhere.

Send requests to 3rd-party servers: you can also make multiple requests, in series or in parallel, and combine the results.

Seriously though, this is WILD! You can intercept and modify HTTP request and response URLs, status, headers and body content.

Some of the potential use-cases could be to:

Implement redirects: 301s, 302s or even geo-specific ones, if needed

Modify HTTP headers: add or change X-Robots-Tag or even hreflang annotations

Modify robots.txt: overwrite the full file, or add or remove single directives

Modify meta directives: inject, change or even remove robots meta annotations

Update page titles & descriptions: create unique page titles or meta descriptions when needed

Implement hreflang… or schema.org mark-up

Inject/remove (body) content: essentially, you can do almost anything, because you have full access to the request and response objects!

However, does this only work with Cloudflare? No, similar implementations are also available with other CDN providers: Lambda@Edge (AWS), Compute@Edge (Fastly), EdgeWorkers (Akamai) and, of course, Cloudflare Workers.

But today it’s all about Cloudflare, because the top 3 providers (CF, AWS, Akamai) have 89% of all customers; Cloudflare alone is used by 81% of all sites that rely on a CDN (according to W3Techs). Source: https://pa.ag/2U9kvAh

Excited? Let’s go! A practical, hands-on guide to setting up and running Cloudflare Workers for your SEO

Go create your own (free) account over at cloudflare.com. Once your account is activated, you can add your first site/domain. Add your domain name; it can be registered anywhere (as long as you can change the DNS at your current provider).

To play with it, the free account + $0 plan is sufficient. This is good enough for testing things out!

Next, you’ll get to see the current DNS configuration. Yours should look a little like this: at least two records, one for the root domain and one for the www sub-domain, both pointing to the IP address of your hosting provider. On to the next screen!

Now, CF will show you which nameservers to use instead.
[Screenshot: nameservers with the current provider, in my case nsX.inwx.de, and the new Cloudflare nameservers to be used instead]

Switching existing nameservers over to Cloudflare: at my hosting provider, it looks like this.
[Screenshot: the new nameservers Cloudflare told me to use instead (see previous screen)]

Switch back to Cloudflare to tell them you’re all set.

Cloudflare is going to email you when things are ready. Beware: this can take up to 24 hours, depending on the registrars and nameservers. After the successful NS change, your CF dashboard should look like this.

Speaking of nameservers: are you already using 1.1.1.1? Cloudflare runs the fastest DNS resolver available. Why wouldn’t you use it? More: https://pa.ag/3zueHRX

Really impatient? Purge the cached DNS records (e.g. A records) on 1.1.1.1.

Can’t wait, or just want to check DNS records? Free tool recommendation: MxToolbox > DNS Lookup. Source: https://pa.ag/3vuBObV

So far, so good: let’s talk Workers now.

A Worker, in its simplest form: the first function defines the trigger for the Worker script to execute; in this case, we intercept the request and send a (custom) response. Our custom response is defined in the second function. For now, we simply (6) log the request object, (7) fetch the requested URL from the origin server, (8) log the response object and (10) send the (unmodified) response back to the client.
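For reference, here is a minimal sketch of that simplest Worker in the service-worker syntax (addEventListener) that Workers used at the time. It logs, fetches from the origin and hands back the unmodified response; the typeof guard around addEventListener is only there so the file can also be loaded outside the Workers runtime, e.g. for testing.

```javascript
// Minimal Worker: log the request, fetch it from the origin,
// log the response and pass it back to the client unmodified.
async function handleRequest(request) {
  console.log("request:", request.method, request.url);
  // Forward the incoming request to the origin server as-is
  const response = await fetch(request);
  console.log("response:", response.status);
  return response;
}

// Register the handler only when running inside the Workers runtime
if (typeof addEventListener === "function") {
  addEventListener("fetch", (event) => {
    event.respondWith(handleRequest(event.request));
  });
}
```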

Cloudflare Workers Playground: cloudflareworkers.com
Test-drive Cloudflare Workers; create/edit and see the results live.


Let’s build our own first Worker with a custom handleRequest function:

Let’s test-drive this on the Workers Playground.

Enough testing… let’s live-deploy your first Worker!

Let’s live-deploy the Worker to Cloudflare’s edge servers: select your domain > Workers > Manage Workers

Here’s how to add a Worker: you’ll be redirected from the “all Workers” overview to the following mask. Give your Worker a unique name, then copy & paste the Worker code you just tested on the Playground.

Confirm the deployment and assign routing: go back to Workers > Add route

Double-check live! Comparison: left (original), right (Worker-enabled). Also, don’t fall victim to caching: use “Disable cache” (see Network tab) in Chrome DevTools to be sure you’re seeing the latest version.

So much for theory… let’s have some SEO fun!

Warning: (maybe) not production-ready! Please understand that all scripts/source code are meant as examples only. Ensure you know what you’re doing when using them in a production environment!

1. Redirects
301 / 302 / bulk redirects & proxy passthrough

Redirects on the edge using the Response API
To execute any type of HTTP redirect, we need to use the Response Runtime API, which conveniently also provides a static method called “redirect()”:
let response = new Response(body, options)
or just:
return Response.redirect(destination, status)
Source: https://pa.ag/3gvXYoL

The Cloudflare Workers docs are a solid starting point. More: https://pa.ag/3gNd8Gn

Different types of implementations at a glance: (#18) 302 redirect, (#22) 301 redirect, (#26) a reverse proxy call and (#31-36) multiple redirects, selecting a single destination from a map based on a URL parameter.
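A minimal sketch of the redirect variants, assuming hypothetical paths (/redirects/302, /redirects/301, /redirects/map) and made-up target URLs. redirectMap and the ID-based lookup mirror the slide, but the concrete entries and the "id" parameter name are assumptions; the reverse proxy call is left out here.

```javascript
// Hypothetical ID -> destination map for bulk redirects
const redirectMap = new Map([
  ["1", "https://example.com/new-page-1"],
  ["2", "https://example.com/new-page-2"],
]);

async function handleRequest(request) {
  const url = new URL(request.url);

  if (url.pathname === "/redirects/302") {
    // Temporary redirect
    return Response.redirect("https://example.com/temporary-target", 302);
  }
  if (url.pathname === "/redirects/301") {
    // Permanent redirect
    return Response.redirect("https://example.com/permanent-target", 301);
  }
  if (url.pathname === "/redirects/map") {
    // Pick a single destination based on the (assumed) ?id= parameter
    const destination = redirectMap.get(url.searchParams.get("id"));
    if (destination) {
      return Response.redirect(destination, 301);
    }
  }
  // Everything else, including unknown IDs, goes to the origin unchanged
  return fetch(request);
}

if (typeof addEventListener === "function") {
  addEventListener("fetch", (event) => {
    event.respondWith(handleRequest(event.request));
  });
}
```

Unknown or missing IDs fall through to the origin instead of producing a redirect.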

A quick overview to see how things are working (source: https://httpstatus.io). Where no redirect shows up, that’s correct: in one case the call is in fact not a redirect, in the other the ID is not configured in redirectMap.

To “reverse proxy” a request, you can use the Fetch API. It provides an interface for (asynchronously) fetching resources via HTTP requests inside of a Worker script:
const response = await fetch(URL, options)
Asynchronous tasks, such as fetch, are not executed at the top level in a Worker script and must be executed within a FetchEvent handler. Source: https://pa.ag/3wpS3YT

Easily “migrate” a blog hosted on a sub-domain to a sub-folder on your main domain, without actually moving it:
return await fetch("https://example.com")
[Screenshot: request sent from bastiangrimm.dev, content shown from example.com]
Great tutorial: https://pa.ag/2Tw7LD8
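A sketch of the sub-domain-to-sub-folder proxy, assuming the blog lives on a hypothetical blog.example.com and should appear under /blog/ on the main domain. Links, canonicals and sitemaps inside the proxied HTML would still point to the sub-domain; rewriting those is not covered here.

```javascript
// Hypothetical setup: the blog origin and the sub-folder it should appear under
const BLOG_HOST = "blog.example.com";
const PREFIX = "/blog";

// Map /blog/some-post on the main domain to blog.example.com/some-post (null = not a blog URL)
function toBlogUrl(requestUrl) {
  const url = new URL(requestUrl);
  if (url.pathname !== PREFIX && !url.pathname.startsWith(PREFIX + "/")) {
    return null;
  }
  url.hostname = BLOG_HOST;
  url.pathname = url.pathname.slice(PREFIX.length) || "/";
  return url.toString();
}

async function handleRequest(request) {
  const target = toBlogUrl(request.url);
  if (target === null) {
    // Everything else goes to the main origin
    return fetch(request);
  }
  // Same method, headers and body, new URL: the visitor keeps seeing the main domain
  return fetch(new Request(target, request));
}

if (typeof addEventListener === "function") {
  addEventListener("fetch", (event) => {
    event.respondWith(handleRequest(event.request));
  });
}
```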

Verifying that this all happens “on the edge”: zoom into the response headers for any originally requested URL, such as bastiangrimm.dev/redirects/302.

2. robots.txt
Safeguards, monitoring, serving a default file, etc.

Which version would you like to wake up to? Preventing “SEO heart attacks” by using a Worker to monitor and safeguard your robots.txt file is one of many use-cases that are super easy to do.
[Screenshots: left, the robots.txt file as I uploaded it to my test server; right, the output after the Worker running in the background changed it]

Preventing a global “Disallow: /” in robots.txt
(#5-6): define defaults, (#15-16): if robots.txt returns 200, read its content, (#19-24): replace it if a global disallow exists, (#27-29): return a default “allow all” if the file doesn’t exist
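A sketch of that safeguard. DEFAULT_ROBOTS, the regular expression and the status handling are assumptions; note that it also neutralises intentional site-wide blocks for individual user agents, so a real setup would need to be more selective.

```javascript
// Hypothetical default served when no robots.txt exists
const DEFAULT_ROBOTS = "User-agent: *\nAllow: /\n";

// Matches a line consisting only of "Disallow: /" (blocks the whole site)
const GLOBAL_DISALLOW = /^[ \t]*Disallow:[ \t]*\/[ \t]*$/gim;

function sanitizeRobots(text) {
  // An empty "Disallow:" allows everything, neutralising the site-wide block
  return text.replace(GLOBAL_DISALLOW, "Disallow:");
}

function robotsResponse(body) {
  return new Response(body, {
    status: 200,
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}

async function handleRequest(request) {
  const url = new URL(request.url);
  if (url.pathname !== "/robots.txt") return fetch(request);

  const origin = await fetch(request);
  if (origin.status === 200) return robotsResponse(sanitizeRobots(await origin.text()));
  // File doesn't exist: serve the "allow all" default instead
  if (origin.status === 404) return robotsResponse(DEFAULT_ROBOTS);
  // Anything else (e.g. 5xx) is passed through untouched
  return origin;
}

if (typeof addEventListener === "function") {
  addEventListener("fetch", (event) => {
    event.respondWith(handleRequest(event.request));
  });
}
```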

Static files become dynamic: so, let’s do some “dynamic serving”, shall we?

For demonstration purposes only: UA-based delivery
(#10): get the User-Agent, (#16-17): add a dynamic Sitemap link if the UA contains “googlebot”
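A sketch of the UA-based variant, assuming a hypothetical SITEMAP_URL. The User-Agent header is trivial to spoof, so, as on the slide, treat this as a demonstration rather than something to rely on.

```javascript
// Hypothetical sitemap that only Googlebot should be pointed to
const SITEMAP_URL = "https://example.com/sitemap-googlebot.xml";

function robotsForUserAgent(robotsText, userAgent) {
  const isGooglebot = (userAgent || "").toLowerCase().includes("googlebot");
  if (!isGooglebot) return robotsText;
  // Make sure the Sitemap directive starts on its own line
  const separator = robotsText === "" || robotsText.endsWith("\n") ? "" : "\n";
  return `${robotsText}${separator}Sitemap: ${SITEMAP_URL}\n`;
}

async function handleRequest(request) {
  const url = new URL(request.url);
  if (url.pathname !== "/robots.txt") return fetch(request);

  const origin = await fetch(request);
  if (origin.status !== 200) return origin;
  const body = robotsForUserAgent(await origin.text(), request.headers.get("User-Agent"));
  return new Response(body, {
    status: 200,
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}

if (typeof addEventListener === "function") {
  addEventListener("fetch", (event) => {
    event.respondWith(handleRequest(event.request));
  });
}
```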

Live-test & compare robots.txt using technicalseo.com
The left screen shows bastiangrimm.dev/robots.txt requested with a Googlebot User-Agent string; the right screen shows the default output. Free testing tool: https://technicalseo.com/tools/robots-txt/

Easily overwrite files which are “not meant” to be changed? Some systems cause endless headaches for SEOs; routing them through Cloudflare and using a Worker works very well!

3. Robots meta tags
Modifying robots HTML directives on the fly

Say hello to the HTMLRewriter class! The HTMLRewriter allows you to build comprehensive and expressive HTML parsers inside of a Cloudflare Workers application:
new HTMLRewriter() .on("*", new ElementHandler()) .onDocument(new DocumentHandler())
Source: https://pa.ag/2RTpqEt

Let’s give it a try and work with <head> and <meta> first: (#24-25): pass tags to ElementHandler, (#9-11): if it’s <meta name="robots">, set it to “index,nofollow”, (#14-16): if it’s <head>, add another directive for bingbot.
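A sketch of the robots meta example. HTMLRewriter only exists inside the Workers runtime, so handleRequest can’t run elsewhere, but the handler itself is plain JavaScript. The bingbot directive value (noindex) is a hypothetical example, not taken from the slide.

```javascript
class ElementHandler {
  element(element) {
    if (element.tagName === "meta") {
      const name = (element.getAttribute("name") || "").toLowerCase();
      if (name === "robots") {
        // Overwrite the existing robots meta directive
        element.setAttribute("content", "index,nofollow");
      }
    } else if (element.tagName === "head") {
      // Append an extra, crawler-specific directive (hypothetical value) at the end of <head>
      element.append('<meta name="bingbot" content="noindex">', { html: true });
    }
  }
}

async function handleRequest(request) {
  const response = await fetch(request);
  const contentType = response.headers.get("Content-Type") || "";
  // Only HTML documents are worth parsing
  if (!contentType.includes("text/html")) return response;
  // HTMLRewriter is a global of the Workers runtime
  const handler = new ElementHandler();
  return new HTMLRewriter().on("head", handler).on("meta", handler).transform(response);
}

if (typeof addEventListener === "function") {
  addEventListener("fetch", (event) => {
    event.respondWith(handleRequest(event.request));
  });
}
```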

I mean, this should be clear, but just in case: verifying the presence of Worker-modified robots meta directives via GSC

If you want to work with/on every HTML element… This selector passes every HTML element to your ElementHandler; by using element.tagName, you can then identify which element has been passed along:
return new HTMLRewriter() .on("*", new ElementHandler()) .transform(response)

4. Title and meta description
Of course, updating, changing or entirely replacing both elements is also possible!

Using element selectors in HTMLRewriter
Often, you only want to process very specific elements, e.g. <meta> tags, but not all of them. Maybe it’s just the meta description you care about?
new HTMLRewriter() .on('meta[name="description"]', new ElementHandler()) .transform(response)
More on selectors: https://pa.ag/35xw073

Updating or replacing titles and descriptions is easy… Element selectors are super powerful yet easy to use: (#10): forced <title> overwrite, (#14-22): conditional changes to the meta description
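A sketch with one handler per selector: the title is always overwritten, the description only when it is empty or too long. NEW_TITLE, FALLBACK_DESCRIPTION and the 160-character threshold are assumptions for illustration, not official limits.

```javascript
// Hypothetical values, for illustration only
const NEW_TITLE = "Serverless SEO with Cloudflare Workers";
const FALLBACK_DESCRIPTION =
  "Learn how Cloudflare Workers can change titles, meta tags and headers on the edge.";
const MAX_DESCRIPTION_LENGTH = 160; // assumed threshold

class TitleHandler {
  element(element) {
    // Forced overwrite: replace whatever is inside <title> (text is HTML-escaped)
    element.setInnerContent(NEW_TITLE);
  }
}

class DescriptionHandler {
  element(element) {
    const content = (element.getAttribute("content") || "").trim();
    // Conditional change: only touch empty or overly long descriptions
    if (content === "" || content.length > MAX_DESCRIPTION_LENGTH) {
      element.setAttribute("content", FALLBACK_DESCRIPTION);
    }
  }
}

// Call from a Worker's fetch handler; HTMLRewriter is a Workers runtime global
function rewriteTitleAndDescription(response) {
  return new HTMLRewriter()
    .on("title", new TitleHandler())
    .on('meta[name="description"]', new DescriptionHandler())
    .transform(response);
}
```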

5. Rewriting links
Maybe you missed some during your last migration?

Maybe you should have listened to me in the first place!? Check out my presentation over at SlideShare: http://pa.ag/migration_search_y

HTMLRewriter listening to <a> and <img> tags
(#29-30): passing href/src attributes to a class which (#20) replaces oldURL with newUrl and (#16-18) ensures https availability. Based on: https://pa.ag/35llTSo
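A sketch of the link rewriter, assuming hypothetical OLD_ORIGIN/NEW_ORIGIN values. Upgrading every remaining http:// URL to https:// assumes those hosts actually serve HTTPS; checking that is not part of this sketch.

```javascript
// Hypothetical migration: old and new origin
const OLD_ORIGIN = "http://old.example.com";
const NEW_ORIGIN = "https://www.example.com";

class AttributeRewriter {
  constructor(attributeName) {
    this.attributeName = attributeName;
  }
  element(element) {
    const value = element.getAttribute(this.attributeName);
    if (!value) return;
    let updated = value;
    // Swap the old origin for the new one (prefix match only)
    if (updated === OLD_ORIGIN || updated.startsWith(OLD_ORIGIN + "/")) {
      updated = NEW_ORIGIN + updated.slice(OLD_ORIGIN.length);
    }
    // Upgrade remaining absolute http:// URLs, assuming those hosts serve HTTPS
    if (updated.startsWith("http://")) {
      updated = "https://" + updated.slice("http://".length);
    }
    if (updated !== value) element.setAttribute(this.attributeName, updated);
  }
}

// Call from a Worker's fetch handler; HTMLRewriter is a Workers runtime global
function rewriteLinks(response) {
  return new HTMLRewriter()
    .on("a", new AttributeRewriter("href"))
    .on("img", new AttributeRewriter("src"))
    .transform(response);
}
```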

6. Deploying hreflang
Tell Google about localised versions of your page

HTTP hreflang annotations on the edge
We’ve just had plenty of HTML, so let’s use HTTP headers instead (of course, both ways work just fine):
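A sketch that sends hreflang annotations as an HTTP Link header. The URL cluster is a hypothetical example; every URL in a cluster needs the same set of annotations, which is why both / and /de/ map to the same list.

```javascript
// Hypothetical hreflang cluster, shared by all URLs in it
const CLUSTER = [
  { hreflang: "en", href: "https://example.com/" },
  { hreflang: "de", href: "https://example.com/de/" },
  { hreflang: "x-default", href: "https://example.com/" },
];
const HREFLANG_ALTERNATES = { "/": CLUSTER, "/de/": CLUSTER };

function buildHreflangLinkHeader(alternates) {
  return alternates
    .map(({ hreflang, href }) => `<${href}>; rel="alternate"; hreflang="${hreflang}"`)
    .join(", ");
}

async function handleRequest(request) {
  const response = await fetch(request);
  const alternates = HREFLANG_ALTERNATES[new URL(request.url).pathname];
  if (!alternates) return response;
  // Re-wrap the response: headers of a fetched response are immutable
  const modified = new Response(response.body, response);
  modified.headers.append("Link", buildHreflangLinkHeader(alternates));
  return modified;
}

if (typeof addEventListener === "function") {
  addEventListener("fetch", (event) => {
    event.respondWith(handleRequest(event.request));
  });
}
```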

Verify, e.g. using the Chrome Developer Console: Network > %URL (disable cache) > Headers > Response Headers

Before you ask: X-Robots-Tag directives are also possible… and the same is true for rel-canonical annotations in the HTTP header.
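A sketch of both header types, assuming hypothetical rules: everything under /internal/ gets X-Robots-Tag: noindex, and a PDF points to its HTML version via a rel="canonical" Link header.

```javascript
// Hypothetical rules, for illustration only
const CANONICALS = {
  "/downloads/whitepaper.pdf": "https://example.com/whitepaper/",
};
const NOINDEX_PREFIX = "/internal/";

function addRobotsHeaders(requestUrl, response) {
  const { pathname } = new URL(requestUrl);
  // Re-wrap the response so its headers become mutable
  const modified = new Response(response.body, response);
  if (pathname.startsWith(NOINDEX_PREFIX)) {
    // Like a robots meta noindex, but works for any file type
    modified.headers.set("X-Robots-Tag", "noindex");
  }
  const canonical = CANONICALS[pathname];
  if (canonical) {
    modified.headers.append("Link", `<${canonical}>; rel="canonical"`);
  }
  return modified;
}

async function handleRequest(request) {
  return addRobotsHeaders(request.url, await fetch(request));
}

if (typeof addEventListener === "function") {
  addEventListener("fetch", (event) => {
    event.respondWith(handleRequest(event.request));
  });
}
```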

7. Serving HTTP 503s
You can serve “proper” under-maintenance pages directly from your Cloudflare Worker

Combining an HTTP 503 error with a Retry-After header
Retry-After indicates how long the UA should wait before making a follow-up request: “The server is currently unable to handle the request due to a temporary overloading or maintenance of the server […]. If known, the length of the delay MAY be indicated in a Retry-After header.”
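A sketch of a maintenance response, assuming a hard-coded MAINTENANCE_MODE flag and a one-hour delay; in practice you would probably toggle this via an environment variable or KV instead.

```javascript
// Hypothetical switch and delay
const MAINTENANCE_MODE = true;
const RETRY_AFTER_SECONDS = 3600;

function maintenanceResponse() {
  const html = "<!doctype html><title>Down for maintenance</title><h1>We'll be back shortly.</h1>";
  return new Response(html, {
    status: 503,
    headers: {
      "Content-Type": "text/html; charset=utf-8",
      // Retry-After accepts either delay-seconds or an HTTP-date
      "Retry-After": String(RETRY_AFTER_SECONDS),
    },
  });
}

async function handleRequest(request) {
  return MAINTENANCE_MODE ? maintenanceResponse() : fetch(request);
}

if (typeof addEventListener === "function") {
  addEventListener("fetch", (event) => {
    event.respondWith(handleRequest(event.request));
  });
}
```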

8. Injecting content
Just in case I’ve somehow not made my point yet: you can do REALLY cool stuff and have control over the full HTML response, so adding content is easy.

Replacing, prepending, appending… whatever you like?

You could also (dynamically) read from an external feed
Feeding in content from other sources is simple; the example on the slide reads a JSON feed, parses the input and injects it into the <h1> of the target page.
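A sketch, assuming a hypothetical FEED_URL that returns JSON with a headline field. The page and the feed are fetched in parallel, and setInnerContent escapes the text by default, so feed data can’t inject markup.

```javascript
// Hypothetical feed returning e.g. {"headline": "..."}
const FEED_URL = "https://feeds.example.com/headline.json";

async function loadHeadline() {
  const res = await fetch(FEED_URL);
  if (!res.ok) return null;
  const data = await res.json();
  return typeof data.headline === "string" ? data.headline : null;
}

class HeadlineHandler {
  constructor(text) {
    this.text = text;
  }
  element(element) {
    // Replace the <h1> content; the text is HTML-escaped by default
    element.setInnerContent(this.text);
  }
}

async function handleRequest(request) {
  // Fetch page and feed in parallel; a broken feed must not break the page
  const [response, headline] = await Promise.all([
    fetch(request),
    loadHeadline().catch(() => null),
  ]);
  if (!headline) return response;
  // HTMLRewriter is a Workers runtime global
  return new HTMLRewriter().on("h1", new HeadlineHandler(headline)).transform(response);
}

if (typeof addEventListener === "function") {
  addEventListener("fetch", (event) => {
    event.respondWith(handleRequest(event.request));
  });
}
```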

9. Collecting logfiles
One of the key challenges when using CDNs: logfiles are literally everywhere, and a lot of requests don’t even make it to the origin server…

Cloudflare provides extensive possibilities for logfiles
What I really love about this: direct integration with Google Cloud products! Note: you need the Enterprise plan for this. More: https://pa.ag/3gnj8GF

Peak Ace log file auditing stack. Interested?
Log files are stored in Google Cloud Storage, processed in Dataprep, exported to BigQuery and visualised in Data Studio via the BigQuery Connector.
[Diagram: data flow between log files, GSC (API v3), GA (API v4), Google Apps Script API, Google Dataprep, Google BigQuery and Google Data Studio]

New to logfile auditing? No worries, I’ve got you covered: check out my presentation over at SlideShare: http://pa.ag/slides

10. Web performance
Yeah… actually, this is how it all started, and it’s still (one of) the most powerful tools to use for it!

Add native lazy loading for images to your HTML mark-up
Keep in mind: you don’t want to lazy load all of your images (e.g. not the hero image); also, if you’re using iframes, you might want to pass “iframe” to the HTMLRewriter as well.
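A sketch of native lazy loading. Skipping the first image (assumed to be the hero) and respecting an existing loading attribute are assumptions, so adjust SKIP_FIRST_IMAGES to your templates. A new handler is created per response, so the counter applies per page.

```javascript
// Assumed: the first image on the page is the hero image
const SKIP_FIRST_IMAGES = 1;

class LazyLoadHandler {
  constructor(skip = 0) {
    this.skip = skip;
    this.seen = 0;
  }
  element(element) {
    this.seen += 1;
    // Leave above-the-fold images and explicit loading choices untouched
    if (this.seen <= this.skip || element.hasAttribute("loading")) return;
    element.setAttribute("loading", "lazy");
  }
}

// Call from a Worker's fetch handler; HTMLRewriter is a Workers runtime global
function addLazyLoading(response) {
  return new HTMLRewriter()
    .on("img", new LazyLoadHandler(SKIP_FIRST_IMAGES))
    .on("iframe", new LazyLoadHandler())
    .transform(response);
}
```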

Cleaning up HTML code for performance reasons
E.g. by removing unwanted pre* resource hints (such as prefetch), or by adding async/defer to JS calls. More clean-up Worker scripts: https://gist.github.com/Nooshu
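A sketch of both clean-ups. Which hints count as unwanted is an assumption, and adding defer changes execution order relative to inline scripts that depend on the deferred file, so test before rolling this out.

```javascript
// Assumed list of resource hints to strip
const UNWANTED_HINTS = new Set(["prefetch", "prerender"]);

class ScriptHandler {
  element(element) {
    // Only external classic scripts without async/defer are touched
    if (!element.hasAttribute("src")) return;
    if (element.hasAttribute("async") || element.hasAttribute("defer")) return;
    // Module scripts are deferred by default anyway
    if ((element.getAttribute("type") || "").toLowerCase() === "module") return;
    element.setAttribute("defer", "");
  }
}

class ResourceHintHandler {
  element(element) {
    const rel = (element.getAttribute("rel") || "").toLowerCase().trim();
    if (UNWANTED_HINTS.has(rel)) element.remove();
  }
}

// Call from a Worker's fetch handler; HTMLRewriter is a Workers runtime global
function cleanUpHtml(response) {
  return new HTMLRewriter()
    .on("script", new ScriptHandler())
    .on("link[rel]", new ResourceHintHandler())
    .transform(response);
}
```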

A detailed guide on how to cache HTML with CF Workers. More: https://pa.ag/3xk8rdt

And tons of other things… You can use Workers to fix broken tracking, allow for better accessibility, and much more.

Tool recommendations
Some stuff to make your (Worker) life just a bit easier…

Sloth: an advanced CF Worker code generator & CMS
A very handy (and free) UI to manage Workers for changing robots.txt, titles & descriptions, redirects, hreflang, and much more. Check it out: https://sloth.cloud

Tool recommendation: Lil Redirector
“Lil Redirector works by persisting and querying redirects inside of Workers KV, and includes an administrator UI for creating, modifying, and deleting redirects.” More: https://pa.ag/3q3EZGx

Workers KV – wait, what?
“Workers KV is a global, low-latency, key-value data store. It supports exceptionally high read volumes […] Workers KV is generally good for use-cases where you need to write relatively infrequently, but read quickly and frequently. It is optimised for these high-read applications.” Source: https://pa.ag/3vmTiXB
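Not Lil Redirector’s actual code, but a minimal sketch of the same idea: the requested path is looked up in a KV namespace and redirected if a destination exists. REDIRECTS is a hypothetical KV binding name; in service-worker syntax, bindings are exposed as globals.

```javascript
// kv is any object with an async get(key) method, e.g. a Workers KV namespace.
// Keys are paths such as "/old-page", values are absolute destination URLs.
async function handleRequest(request, kv) {
  const { pathname } = new URL(request.url);
  const destination = await kv.get(pathname);
  if (destination) {
    return Response.redirect(destination, 301);
  }
  return fetch(request);
}

if (typeof addEventListener === "function") {
  addEventListener("fetch", (event) => {
    // REDIRECTS: hypothetical KV namespace binding
    event.respondWith(handleRequest(event.request, REDIRECTS));
  });
}
```

Because KV is optimised for frequent reads and infrequent writes, a redirect table that changes rarely but is checked on every request fits that profile well.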

Web Scraper based on Cloudflare Workers
“Web Scraper makes it effortless to scrape websites. You provide a URL & CSS selector, and it will return you JSON containing the text contents of the matching elements.” More: https://pa.ag/3woCv7T

Technically not a tool, but a very comprehensive guide. More: https://pa.ag/3xnWDqy

We need to talk about responsibility: downsides, risks and more…

“With great power comes great responsibility.” [This] dates back to the time of the French Revolution. At least, if you believe Wikipedia, that is… Source: https://pa.ag/35nQSx6

You could essentially change everything you wanted.

Great summary over at ContentKing, well worth a read: what are the downsides […] what risks are involved? Source: https://pa.ag/3xhYUUk

Risk of costs
10 million requests are included; every additional 1 million currently costs $0.50. Not crazy expensive, but in larger-scale setups it certainly means additional costs.
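A back-of-the-envelope calculation with the figures from the slide. It only models the per-request overage, not base plan fees, and assumes linear pricing for partial millions.

```javascript
// Figures from the slide: 10M requests included, $0.50 per additional million
const INCLUDED_REQUESTS = 10_000_000;
const PRICE_PER_MILLION_USD = 0.5;

function estimateOverageUsd(monthlyRequests) {
  const extra = Math.max(0, monthlyRequests - INCLUDED_REQUESTS);
  return (extra / 1_000_000) * PRICE_PER_MILLION_USD;
}

console.log(estimateOverageUsd(25_000_000)); // 7.5
```

So 25 million requests a month would add roughly $7.50 on top of whatever the plan itself costs.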

PCI compliance
This might interfere with current processes or, at the very least, requires Workers to become part of a standardised process (e.g. deployment).

Potential conflict in code
The underlying codebase might do/require something that could accidentally be overwritten on the edge.

Potential to introduce frontend bugs
Additional modifications on the edge could result in massive debugging efforts. Again: proper documentation and processes are crucial!

Always synchronise your activities with relevant stakeholders!

Where there is good… there is also evil.

Yep, you can do evil things with Workers for sure. Source: https://pa.ag/3cFq0Nq

Dynamically creating links to “Baccarat sites”
“[…] at the CF Workers management area, there was a suspicious Worker listed called hang. It had been set to run on any URL route requests to the website.” After further investigation [by Sucuri], it was found that the website was actually loading SEO spam content through Cloudflare’s Workers service. This service allows someone to load external third-party JavaScript that’s not on their website’s hosting server. Source: https://pa.ag/3cFq0Nq

The suspicious “hang” Worker injection in detail (source: https://pa.ag/3cFq0Nq):
▪ The JavaScript Worker first checks the HTTP request’s user-agent and identifies whether it contains Google/Googlebot or naver within the string text.
▪ If the user-agent string contains either of these keywords, the JavaScript makes a request to the malicious domain naverbot[.]live to generate the SEO spam links to be injected into the victim’s website.
▪ After this step, the Worker injects the retrieved SEO spam link data right before the final </body> tag in the infected website’s HTML source.
▪ The malicious JavaScript can also be triggered if the user-agent matches a crawler that is entirely separate from Googlebot: naver.

If you’re now wondering how to distribute Workers…? Source: https://pa.ag/3zq0Mwd

While you’re at it: use at least two-factor authentication with Cloudflare!

Care for the slides? www.pa.ag | twitter.com/peakaceag | facebook.com/peakaceag
Take your career to the next level: jobs.pa.ag
Bastian Grimm
