
Serverless SEO / Edge SEO - SEOkomm 2021


A hands-on guide on how to use "Edge SEO" to drive your SEO program forward, including a detailed explanation of how to set up, create, and run Cloudflare Workers for SEO tasks. Presented at SEOkomm 2021 in Salzburg, Austria.


Transcript

  1. Bastian Grimm, Peak Ace AG | @basgr Serverless SEO a.k.a.

    “Edge SEO” - a hands-on guide.
  2. Cloudflare Worker

  3. In 2022, you no longer necessarily need server access

    for SEO! Access not needed!
  4. Therefore: slides in English … I'm just the stand-in!

  5. Before we talk about Workers, we need to talk HTTP

    requests – and CDNs: Establishing a common ground
  6. pa.ag @peakaceag 6 A (very) simplified request lifecycle

    (diagram): your computer / your browser → DNS server (e.g. to translate domain <> IP) → web server aka “origin server” → database server (in most cases)
  7. If you’re not familiar with the term CDN: “A content

    delivery network (CDN) is a globally distributed network of servers deployed in multiple data centers around the globe.” Let's introduce a CDN to the mix
  8. pa.ag @peakaceag 8 Using a CDN, all requests will pass

    through “edge servers”. When we ignore DNS, databases etc. for a minute, this is what it would look like (diagram, first request ever; peakace.js is not yet cached on the edge server): the edge server receives the request for peakace.js, requests peakace.js from the origin server, peakace.js is delivered from the origin server, the response is passed back to the client, and peakace.js gets cached on the edge server.
  9. pa.ag @peakaceag 9 Using a CDN, all requests will pass

    through “edge servers”. Second request (independent of user): peakace.js is already cached on the edge server, so it is delivered straight from the edge server without contacting the origin.
  10. pa.ag @peakaceag 10 CDNs at a glance Some of the

    most popular CDN providers out there
  11. pa.ag @peakaceag 11 Back in Sep. 2017, Cloudflare introduced their

    “Workers”, which ultimately became publicly available in March 2018: Source: https://blog.cloudflare.com/introducing-cloudflare-workers/
  12. Workers use the V8 JavaScript engine built by Google and

    run globally on Cloudflare's edge servers. A typical Worker script executes in <1ms – that’s fast! So… what‘s a Worker?
  13. … using the latest standard language features You can execute

    any JavaScript…
  14. …directly from your Worker, or forward them elsewhere Respond to

    requests…
  15. Also, you can do multiple requests, in series or parallel

    and combine the results Send requests to 3rd-party servers
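
    As a minimal sketch of that idea (not code from the deck; the endpoint URLs and the shape of the combined JSON are made up for illustration), a Worker could fan out two fetches with Promise.all and merge the answers:

    addEventListener('fetch', event => {
      event.respondWith(handleRequest(event.request))
    })

    // Fetch two third-party endpoints in parallel and combine the results
    async function handleRequest(request) {
      const [weatherRes, ratesRes] = await Promise.all([
        fetch('https://api.example.com/weather'), // hypothetical endpoint
        fetch('https://api.example.com/rates')    // hypothetical endpoint
      ])

      const combined = {
        weather: await weatherRes.json(),
        rates: await ratesRes.json()
      }

      return new Response(JSON.stringify(combined), {
        headers: { 'content-type': 'application/json' }
      })
    }
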
  16. Intercept and modify HTTP request and response URLs, status, headers,

    and body content. Seriously though, this is WILD!
  17. Some of the potential use-cases could be to:

  18. 301s, 302s – or even geo-specific ones, if needed Implement

    redirects
  19. Adding/changing X-robots or even hreflang annotations Modify HTTP headers

  20. Overwrite the full file, add or remove single directives Modify

    robots.txt
  21. Inject, change or even remove robots meta-annotations Modify meta directives

  22. Create unique page titles or meta descriptions when needed Update

    page titles & descriptions
  23. … or schema.org mark-up Implement hreflang

  24. Essentially, you can do almost anything – because you have

    full access to the request and response objects! Inject/remove (body) content
  25. pa.ag @peakaceag 25 However, does this only work with Cloudflare?

    Similar implementations are also available with other CDN providers: Lambda@Edge Compute@Edge Edge Workers Cloudflare Workers
  26. pa.ag @peakaceag 26 But today it‘s all about Cloudflare, because:

    The top 3 providers (CF, AWS, Akamai) have 89% of all customers; Cloudflare alone is used by 81% of all sites that rely on a CDN (according to W3Techs): Source: https://pa.ag/2U9kvAh
  27. A practical and hands-on guide to setting up and running

    Cloudflare Workers for your SEO Excited? Let‘s go!
  28. pa.ag @peakaceag 28 Go create your own (free) account over

    at cloudflare.com Once your account is activated, you can add your first site/domain: Add your domain name - it can be registered anywhere (as long as you can change the DNS at your current provider)
  29. pa.ag @peakaceag 29 To play with it, the free account

    + $0 plan is sufficient: This is good enough for testing things out…!
  30. pa.ag @peakaceag 30 Next, you‘ll get to see the current

    DNS configuration Yours should look a little like this: at least two records, one for the root-domain, one for the www sub-domain, both pointing to the IP address of your hosting provider: On to the next screen!
  31. pa.ag @peakaceag 31 Now, CF will show you which nameservers

    to use instead: Nameservers with the current provider, in my case nsX.inwx.de My new nameservers with Cloudflare to be used instead
  32. pa.ag @peakaceag 32 Switching existing nameservers over to Cloudflare At

    my hosting provider, it looks like this: My new nameservers Cloudflare told me to use instead (see prev. screen)
  33. pa.ag @peakaceag 33 Switch back to tell them you’re all

    set: Nameservers with the current provider, in my case nsX.inwx.de My new nameservers with Cloudflare, to be used instead
  34. pa.ag @peakaceag 34 Cloudflare is going to email you when

    things are ready: Beware, this can take up to 24hrs depending on the registrars and nameservers: Your CF dashboard should look like this after the successful NS change.
  35. So far, so good – let‘s talk Workers now.

  36. pa.ag @peakaceag 36 A Worker, in its simplest form: This

    function defines triggers for a Worker script to execute. In this case, we intercept the request and send a (custom) response. Our custom response is defined here, for now we simply: (6) log the request object (7) fetch the requested URL from the origin server (8) log the response object (10) send the (unmodified) response back to the client
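
    The slide refers to the listing by line numbers; reconstructed as a sketch (not the exact code shown on the slide), the simplest form looks roughly like this:

    addEventListener('fetch', event => {
      // trigger: intercept the request and answer with our custom response
      event.respondWith(handleRequest(event.request))
    })

    async function handleRequest(request) {
      console.log(request)                  // log the request object
      const response = await fetch(request) // fetch the requested URL from the origin server
      console.log(response)                 // log the response object
      return response                       // send the (unmodified) response back to the client
    }
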
  37. pa.ag @peakaceag 37 Cloudflare Workers Playground: cloudflareworkers.com Test-drive Cloudflare Workers;

    create/edit and see the results live:
  38. pa.ag @peakaceag 38 Cloudflare Workers Playground: cloudflareworkers.com Test-drive Cloudflare Workers;

    create/edit and see the results live:
  39. pa.ag @peakaceag 39 Let's build our own first Worker /

    custom handleRequest:
  40. pa.ag @peakaceag 40 Let‘s test-drive this on the Workers Playground:

  41. Let's live-deploy your first Worker! Enough testing…

  42. pa.ag @peakaceag 42 Let‘s live-deploy the Worker to Cloudflare's edge

    servers Select your domain > Workers > Manage Workers
  43. pa.ag @peakaceag 43 Here’s how to add a Worker You‘ll

    be redirected from the “all Workers” overview to the following screen: give your Worker a unique name, then copy & paste the Worker code you just tested on the Playground
  44. pa.ag @peakaceag 44 Confirm deployment and assign routing Go back

    to > Workers > Add route
  45. pa.ag @peakaceag 45 Comparison: left (original), right (Worker-enabled) Double-check live!

    Also, don't fall victim to caching: use “Disable cache” (see the Network tab) in Chrome DevTools to be sure you're seeing the latest version
  46. Let’s have some SEO fun then? So much for theory…

  47. Please understand that all scripts and source code are meant

    as examples only. Ensure you know what you’re doing when using them in a production environment! Warning: (maybe) not production-ready!
  48. 301 / 302 / bulk redirects & proxy passthrough 1.

    Redirects
  49. pa.ag @peakaceag 49 Redirects on the edge using the Response

    API To execute any type of HTTP redirect, we need to use the Response Runtime API which – conveniently – also provides a static method called “redirect()”: Source: https://pa.ag/3gvXYoL let response = new Response(body, options) return Response.redirect(destination, status) or just:
  50. pa.ag @peakaceag 50 The Cloudflare Workers Docs is a solid

    starting point: More: https://pa.ag/3gNd8Gn
  51. pa.ag @peakaceag 51 Different types of implementations at a glance

    (#18): 302 redirect, (#22): 301 redirect, (#26): a reverse proxy call and (#31-36): multiple redirects, selecting a single destination from a map based on a URL parameter:
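
    A rough sketch covering those four cases (paths, destinations and the redirectMap entries are placeholders, not the deck's original code):

    // hypothetical bulk-redirect map, selected via ?id=
    const redirectMap = new Map([
      ['1', 'https://example.com/landing-a'],
      ['2', 'https://example.com/landing-b']
    ])

    addEventListener('fetch', event => {
      event.respondWith(handleRequest(event.request))
    })

    async function handleRequest(request) {
      const url = new URL(request.url)

      if (url.pathname === '/redirects/302') {
        return Response.redirect('https://example.com/temporary', 302)
      }
      if (url.pathname === '/redirects/301') {
        return Response.redirect('https://example.com/permanent', 301)
      }
      if (url.pathname === '/redirects/proxy') {
        // reverse proxy: serve content from another URL while keeping the requested URL
        return fetch('https://example.com/')
      }
      if (url.pathname === '/redirects/bulk') {
        const destination = redirectMap.get(url.searchParams.get('id'))
        if (destination) {
          return Response.redirect(destination, 301)
        }
      }

      // everything else: pass through to the origin
      return fetch(request)
    }
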
  52. pa.ag @peakaceag 52 A quick overview to see how things

    are working… Source: https://httpstatus.io (screenshot annotations: “correct, in fact this is not a redirect”; “ID is not configured in redirectMap”)
  53. pa.ag @peakaceag 53 To “reverse proxy” a request, you can

    use the Fetch API It provides an interface for (asynchronously) fetching resources via HTTP requests inside of a Worker script: Source: https://pa.ag/3wpS3YT const response = await fetch(URL, options) Asynchronous tasks, such as fetch, are not executed at the top level in a Worker script and must be executed within a FetchEvent handler.
  54. pa.ag @peakaceag 54 return await fetch("https://example.com") Easily “migrate” a

    blog hosted on a sub-domain to a sub-folder on your main domain – without actually moving it. Great tutorial: https://pa.ag/2Tw7LD8 Content shown from example.com Request sent from bastiangrimm.dev
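
    A sketch of that pattern, assuming a hypothetical blog.example.com that should appear under example.com/blog/ (hostnames and the /blog prefix are placeholders):

    addEventListener('fetch', event => {
      event.respondWith(handleRequest(event.request))
    })

    async function handleRequest(request) {
      const url = new URL(request.url)

      if (url.pathname.startsWith('/blog')) {
        // serve the sub-domain's content without redirecting the browser
        url.hostname = 'blog.example.com'
        url.pathname = url.pathname.replace(/^\/blog/, '') || '/'
        return fetch(new Request(url.toString(), request))
      }

      return fetch(request)
    }
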
  55. pa.ag @peakaceag 55 Verifying that this all happens “on the

    edge“: Zoom into any of the response headers for an originally requested URL such as bastiangrimm.dev/redirects/302:
  56. Safeguards, monitoring, serving a default file, etc. 2. robots.txt

  57. pa.ag @peakaceag 57 Which version would you like to wake

    up to? Preventing “SEO heart attacks“ using a Worker to monitor and safeguard your robots.txt file is one of many use-cases that are super easy to do: This is how I uploaded the robots.txt file to my test server This is what the Worker running in the background changed the output to vs
  58. pa.ag @peakaceag 58 Preventing a global “Disallow: /“ in robots.txt

    (#5-6): define defaults, (#15-16): if robots.txt returns 200, read its content, (#19-24): replace if global disallow exists, (#27-29): return default “allow all” if file doesn’t exist
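
    Reconstructed as a sketch along those lines (the default directives are assumptions; the Worker would be routed to /robots.txt only):

    // default served when the origin file is missing or broken
    const allowAll = 'User-agent: *\nDisallow:'
    // a line consisting of a global "Disallow: /"
    const globalDisallow = /^Disallow:\s*\/\s*$/m

    addEventListener('fetch', event => {
      event.respondWith(handleRequest(event.request))
    })

    async function handleRequest(request) {
      const response = await fetch(request)

      if (response.status === 200) {
        let body = await response.text()
        if (globalDisallow.test(body)) {
          body = allowAll // replace an accidental site-wide block
        }
        return new Response(body, { headers: { 'content-type': 'text/plain' } })
      }

      // robots.txt missing on the origin: serve the default "allow all"
      return new Response(allowAll, { headers: { 'content-type': 'text/plain' } })
    }
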
  59. So, let’s do some “dynamic serving“ shall we? Static files

    become dynamic
  60. pa.ag @peakaceag 60 For demonstration purposes only: UA-based delivery (#10):

    get User-Agent, (#16-17): add dynamic Sitemap-link if UA contains “googlebot”
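
    A sketch of that UA-based variant (the sitemap URL is a placeholder):

    addEventListener('fetch', event => {
      event.respondWith(handleRequest(event.request))
    })

    async function handleRequest(request) {
      const userAgent = (request.headers.get('user-agent') || '').toLowerCase()
      const response = await fetch(request)
      let body = await response.text()

      // only requests identifying as Googlebot get the extra Sitemap reference
      if (userAgent.includes('googlebot')) {
        body += '\nSitemap: https://example.com/sitemap-googlebot.xml'
      }

      return new Response(body, { headers: { 'content-type': 'text/plain' } })
    }
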
  61. pa.ag @peakaceag 61 Live-test & compare robots.txt using technicalseo.com

    Left screen shows bastiangrimm.dev/robots.txt being requested using a Googlebot User-Agent string, right screen is the default output: Free testing tool: https://technicalseo.com/tools/robots-txt/ Or use…
  62. Some systems cause endless headaches for SEOs – routing them

    through Cloudflare and using a Worker works very well! Easily overwrite files which are “not meant” to be changed?
  63. Modifying robots HTML directives on the fly 3. Robots meta

    tags
  64. pa.ag @peakaceag 64 Say hello to the HTMLRewriter class! The

    HTMLRewriter allows you to build comprehensive and expressive HTML parsers inside of a Cloudflare Workers application: Source: https://pa.ag/2RTpqEt new HTMLRewriter() .on("*", new ElementHandler()) .onDocument(new DocumentHandler())
  65. pa.ag @peakaceag 65 Let's give it a try and work

    with <head> and <meta> first (#24-25): pass tags to ElementHandler, (#9-11): if it’s <meta name=“robots”>, set it to “index,nofollow”, (#14-16): if it’s <head>, add another directive for bingbot:
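
    A sketch following that description (the directives match the slide text; selectors and the handler name are assumptions):

    class ElementHandler {
      element(element) {
        if (element.tagName === 'meta' && element.getAttribute('name') === 'robots') {
          // overwrite the existing robots directive
          element.setAttribute('content', 'index,nofollow')
        }
        if (element.tagName === 'head') {
          // add a separate directive just for bingbot
          element.append('<meta name="bingbot" content="noindex">', { html: true })
        }
      }
    }

    addEventListener('fetch', event => {
      event.respondWith(handleRequest(event.request))
    })

    async function handleRequest(request) {
      const response = await fetch(request)
      return new HTMLRewriter()
        .on('head', new ElementHandler())
        .on('meta', new ElementHandler())
        .transform(response)
    }
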
  66. pa.ag @peakaceag 66 I mean, this should be clear –

    but just in case: Verifying presence of Worker-modified robots meta directives via GSC
  67. pa.ag @peakaceag 67 If you want to work with/on every

    HTML element… This selector would pass every HTML element to your ElementHandler. By using element.tagName, you could then identify which element has been passed along: return new HTMLRewriter() .on("*", new ElementHandler()) .transform(response)
  68. Of course, updating, changing or entirely replacing both elements is

    also possible! 4. Title and meta description
  69. pa.ag @peakaceag 69 Using element selectors in HTMLRewriter Often, you

    want only to process very specific elements, e.g. <meta> tags – but not all of them. Maybe it’s just the meta description you care about? new HTMLRewriter() .on('meta[name="description"]', new ElementHandler()) .transform(response) More on selectors: https://pa.ag/35xw073
  70. pa.ag @peakaceag 70 Updating or replacing titles and descriptions is

    easy… (#10): forced <title> overwrite, (#14-22): conditional changes to the meta description Element selectors are super powerful yet easy to use:
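
    A sketch along those lines (the replacement texts and the length threshold for the conditional change are made up for illustration):

    class TitleHandler {
      element(element) {
        // forced overwrite of the <title>
        element.setInnerContent('New page title served from the edge')
      }
    }

    class DescriptionHandler {
      element(element) {
        const current = element.getAttribute('content') || ''
        // conditional change: only touch descriptions that are empty or very short
        if (current.length < 50) {
          element.setAttribute('content', 'A more useful meta description, injected by a Worker.')
        }
      }
    }

    addEventListener('fetch', event => {
      event.respondWith(handleRequest(event.request))
    })

    async function handleRequest(request) {
      const response = await fetch(request)
      return new HTMLRewriter()
        .on('title', new TitleHandler())
        .on('meta[name="description"]', new DescriptionHandler())
        .transform(response)
    }
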
  71. Maybe you missed some during your last migration? 5. Rewriting

    links
  72. pa.ag @peakaceag 72 HTMLRewriter listening to <a> and <img> tags

    (#29-30): Passing href/src attributes to a class which (#20): replaces oldURL with newUrl and (#16-18): ensures https-availability Based on: https://pa.ag/35llTSo
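
    A sketch of such an attribute rewriter (the oldUrl/newUrl values are placeholders):

    const oldUrl = 'old-domain.example'
    const newUrl = 'www.example.com'

    class AttributeRewriter {
      constructor(attributeName) {
        this.attributeName = attributeName
      }
      element(element) {
        let value = element.getAttribute(this.attributeName)
        if (!value) return
        value = value.replace('http://', 'https://') // ensure https
        value = value.replace(oldUrl, newUrl)        // swap the old hostname
        element.setAttribute(this.attributeName, value)
      }
    }

    addEventListener('fetch', event => {
      event.respondWith(handleRequest(event.request))
    })

    async function handleRequest(request) {
      const response = await fetch(request)
      return new HTMLRewriter()
        .on('a', new AttributeRewriter('href'))
        .on('img', new AttributeRewriter('src'))
        .transform(response)
    }
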
  73. Tell Google about localised versions of your page 6. Deploying

    hreflang
  74. pa.ag @peakaceag 74 HTTP hreflang annotations on the edge We‘ve

    just had plenty of HTML, so let‘s use HTTP headers instead – of course both ways work just fine though:
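
    A sketch of the HTTP-header variant using Link response headers (the alternate URLs and language codes are placeholders):

    addEventListener('fetch', event => {
      event.respondWith(handleRequest(event.request))
    })

    async function handleRequest(request) {
      const response = await fetch(request)

      // clone the response so its headers become mutable
      const newResponse = new Response(response.body, response)

      newResponse.headers.append('Link',
        '<https://example.com/en/>; rel="alternate"; hreflang="en"')
      newResponse.headers.append('Link',
        '<https://example.com/de/>; rel="alternate"; hreflang="de"')

      return newResponse
    }
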
  75. pa.ag @peakaceag 75 Verify, e.g. using Chrome Developer Console: Network

    > %URL (disable cache) > Headers > Response Headers
  76. pa.ag @peakaceag 76 Before you ask: X-Robots directives are also

    possible… … and the same is true for X-Robots Rel-Canonical annotations:
  77. You can serve proper “under maintenance” pages directly from your

    Cloudflare Worker 7. Serving HTTP 503s
  78. pa.ag @peakaceag 78 Combining HTTP 503 error with a Retry-After

    header Retry-After indicates how long the UA should wait before making a follow-up request: The server is currently unable to handle the request due to a temporary overloading or maintenance of the server […]. If known, the length of the delay MAY be indicated in a Retry-After header.
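
    A sketch of a maintenance Worker combining both (the HTML body and the retry interval are placeholders):

    addEventListener('fetch', event => {
      event.respondWith(handleRequest(event.request))
    })

    async function handleRequest(request) {
      const body =
        '<html><body><h1>Down for maintenance</h1><p>Please check back soon.</p></body></html>'

      return new Response(body, {
        status: 503,
        headers: {
          'content-type': 'text/html',
          'retry-after': '3600' // ask crawlers to come back in an hour
        }
      })
    }
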
  79. Just in case I’ve somehow not made my point yet

    – you can do REALLY cool stuff and have control over the full HTML response – so adding content is easy. 8. Injecting content
  80. pa.ag @peakaceag 80 Replacing, prepending, appending … whatever you like?

  81. pa.ag @peakaceag 81 You could also (dynamically) read from an

    external feed Feeding in content from other sources is simple; the example below reads a JSON feed, parses the input and injects it into the <h1> of the target page:
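
    A sketch of that idea (the feed URL and the "headline" field name are assumptions):

    class HeadlineHandler {
      constructor(headline) {
        this.headline = headline
      }
      element(element) {
        // replace the existing <h1> text with the value from the feed
        element.setInnerContent(this.headline)
      }
    }

    addEventListener('fetch', event => {
      event.respondWith(handleRequest(event.request))
    })

    async function handleRequest(request) {
      // read and parse the external JSON feed
      const feedResponse = await fetch('https://example.com/feed.json')
      const feed = await feedResponse.json()

      const response = await fetch(request)
      return new HTMLRewriter()
        .on('h1', new HeadlineHandler(feed.headline))
        .transform(response)
    }
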
  82. Yeah… actually this is how it all started; and still

    it‘s (one of) the most powerful tools to use for it! 9. Web performance
  83. pa.ag @peakaceag 83 Add native lazy loading for images to

    your HTML mark-up Keep in mind: you don't want to lazy load all of your images (e.g. not the hero image); also, if you're using iframes, you might want to pass “iframe” to the HTMLRewriter:
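
    A sketch using HTMLRewriter to add loading="lazy" to <img> tags (the data-no-lazy opt-out attribute is a made-up convention for skipping e.g. the hero image):

    class LazyLoadHandler {
      element(element) {
        // skip images explicitly flagged as above-the-fold
        if (element.hasAttribute('data-no-lazy')) return
        element.setAttribute('loading', 'lazy')
      }
    }

    addEventListener('fetch', event => {
      event.respondWith(handleRequest(event.request))
    })

    async function handleRequest(request) {
      const response = await fetch(request)
      return new HTMLRewriter()
        .on('img', new LazyLoadHandler())
        // .on('iframe', new LazyLoadHandler()) // iframes can be handled the same way
        .transform(response)
    }
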
  84. pa.ag @peakaceag 84 Cleaning up HTML code for performance reasons

    E.g. by removing unwanted pre* resource hints (preload, prefetch etc.), or by adding async/defer to JS calls: More clean-up Worker scripts: https://gist.github.com/Nooshu
  85. pa.ag @peakaceag 85 A detailed guide on how to cache

    HTML with CF Workers More: https://pa.ag/3xk8rdt
  86. You can use Workers to fix broken tracking, allow for

    better accessibility, and much more. And tons of other things…
  87. Downsides, risks – and more… We need to talk responsibility

  88. pa.ag @peakaceag 88 [This] dates back to the time of

    the French Revolution At least, if you believe Wikipedia that is… Source: https://pa.ag/35nQSx6 With great power comes great responsibility.
  89. You could essentially change everything you wanted.

  90. pa.ag @peakaceag 90 Great summary over at ContentKing, well worth

    a read: What are the downsides […] What risks are involved? Source: https://pa.ag/3xhYUUk
  91. 10 million requests are included; every additional 1 million

    currently costs $0.50 extra - not crazy expensive, but in larger-scale setups this certainly means additional costs. Risk of costs
  92. This might interfere with current processes; at the very

    least, ensure Workers become part of a standardised process (e.g. deployment). PCI compliance
  93. The underlying codebase might do/require something that could accidentally be

    overwritten on the edge Potential conflict in code
  94. Additional modifications on the edge could result in massive debugging effort.

    Again: proper documentation and processes are crucial! Potential to introduce frontend bugs
  95. Always synchronise your activities with relevant stakeholders!

  96. … there is also evil Where there is good…

  97. pa.ag @peakaceag 97 Yep, you can do evil things with

    Workers for sure: Source: https://pa.ag/3cFq0Nq
  98. pa.ag @peakaceag 98 Dynamically creating links to “Baccarat Sites” “[…]

    at the CF Workers management area, there was a suspicious Worker listed called hang. It had been set to run on any URL route requests to the website:” Source: https://pa.ag/3cFq0Nq After further investigation [by sucuri], it was found that the website was actually loading SEO spam content through Cloudflare’s Workers service. This service allows someone to load external third-party JavaScript that’s not on their website’s hosting server.
  99. pa.ag @peakaceag 99 The suspicious “hang” Worker injection in detail:

    Source: https://pa.ag/3cFq0Nq ▪ The JavaScript Worker first checks for the HTTP request’s user-agent and identifies whether it contains Google/Googlebot or naver within the string text. ▪ If the user-agent string contains either of these keywords, then the JavaScript makes a request to the malicious domain naverbot[.]live to generate the SEO spam links to be injected into the victim’s website. ▪ After this step, the Worker then injects the retrieved SEO spam link data right before the final </body> tag on the infected website’s HTML source. ▪ The malicious JavaScript can also be triggered if the user-agent matches a crawler that is entirely separate from Googlebot: naver.
  100. pa.ag @peakaceag 100 If you‘re now wondering how to distribute

    Workers…? Source: https://pa.ag/3zq0Mwd
  101. Use at least two-factor authentication with Cloudflare While you‘re at

    it:
  102. Care for the slides? www.pa.ag twitter.com/peakaceag facebook.com/peakaceag Take your career

    to the next level: jobs.pa.ag hi@pa.ag Bastian Grimm bg@pa.ag