Serverless SEO / Edge SEO - SEOkomm 2021

A hands-on guide on how to use "Edge SEO" to drive your SEO program forward, including a detailed explanation of how to set up, create and run Cloudflare Workers for SEO tasks. Presented at SEOkomm 2021 in Salzburg, Austria.

Transcript

  1. Serverless SEO a.k.a. “Edge SEO”: a hands-on guide

    Bastian Grimm, Peak Ace AG | @basgr
  2. Establishing a common ground

    Before we talk about Workers, we need to talk about HTTP requests and CDNs.
  3. pa.ag @peakaceag: A (very) simplified request lifecycle

    Components shown: your computer (your browser); DNS server, e.g. to translate domain<>IP; web server, aka “origin server”; database server (in most cases).
  4. Let's introduce a CDN to the mix

    If you’re not familiar with the term CDN: “A content delivery network (CDN) is a globally distributed network of servers deployed in multiple data centers around the globe.”
  5. Using a CDN, all requests will pass through “edge servers”

    When we ignore DNS, databases etc. for a minute, this is what it would look like. First request, ever: peakace.js is not cached on the edge server yet. The request for peakace.js is passed on to the origin server, peakace.js is delivered from the origin server, and the response gets cached on the edge server.
  6. Using a CDN, all requests will pass through “edge servers”

    When we ignore DNS, databases etc. for a minute, this is what it would look like. Second request (independent of user): peakace.js is cached on the edge server, so peakace.js is delivered from the edge server rather than the origin server.
  7. CDNs at a glance

    Some of the most popular CDN providers out there
  8. Back in Sep. 2017, Cloudflare introduced their “Workers”

    Which ultimately became publicly available in March 2018. Source: https://blog.cloudflare.com/introducing-cloudflare-workers/
  9. So… what’s a Worker?

    Workers use the V8 JavaScript engine built by Google and run globally on Cloudflare's edge servers. A typical Worker script executes in <1ms. That’s fast!
  10. Send requests to 3rd-party servers

    Also, you can do multiple requests, in series or parallel, and combine the results.
  11. Seriously though, this is WILD!

    Intercept and modify HTTP request and response URLs, status, headers, and body content.
  12. Inject/remove (body) content

    Essentially, you can do almost anything, because you have full access to the request and response objects!
  13. However, does this only work with Cloudflare?

    Similar implementations are also available with other CDN providers: Lambda@Edge (AWS), Compute@Edge (Fastly), Edge Workers (Akamai) and Cloudflare Workers.
  14. But today it’s all about Cloudflare, because:

    The top 3 providers (CF, AWS, Akamai) have 89% of all customers; Cloudflare alone is used by 81% of all sites that rely on a CDN (according to W3Techs). Source: https://pa.ag/2U9kvAh
  15. Excited? Let’s go!

    A practical and hands-on guide to setting up and running Cloudflare Workers for your SEO
  16. Go create your own (free) account over at cloudflare.com

    Once your account is activated, you can add your first site/domain. Add your domain name; it can be registered anywhere (as long as you can change the DNS at your current provider).
  17. To play with it, the free account + $0 plan is sufficient

    This is good enough for testing things out…!
  18. Next, you’ll get to see the current DNS configuration

    Yours should look a little like this: at least two records, one for the root domain and one for the www sub-domain, both pointing to the IP address of your hosting provider. On to the next screen!
  19. Now, CF will show you which nameservers to use instead

    Nameservers with the current provider, in my case nsX.inwx.de. My new nameservers with Cloudflare, to be used instead.
  20. Switching existing nameservers over to Cloudflare

    At my hosting provider, it looks like this: my new nameservers, which Cloudflare told me to use instead (see prev. screen).
  21. Switch back to tell them you’re all set

    Nameservers with the current provider, in my case nsX.inwx.de. My new nameservers with Cloudflare, to be used instead.
  22. Cloudflare is going to email you when things are ready

    Beware, this can take up to 24hrs depending on the registrars and nameservers. Your CF dashboard should look like this after the successful NS change.
  23. A Worker, in its simplest form

    This function defines triggers for a Worker script to execute. In this case, we intercept the request and send a (custom) response. Our custom response is defined here; for now we simply: (6) log the request object, (7) fetch the requested URL from the origin server, (8) log the response object, (10) send the (unmodified) response back to the client.
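The slide's code screenshot is not part of this transcript. A minimal sketch of such a pass-through Worker, assuming the service-worker syntax used at the time, could look like this; the function name `handleRequest` and the guard around `addEventListener` are assumptions, and the line numbers quoted on the slide will not match this sketch.

```javascript
// Pass-through Worker: the response is returned exactly as the origin sent it.
async function handleRequest(request) {
  console.log(request);                  // log the request object
  const response = await fetch(request); // fetch the requested URL from the origin server
  console.log(response);                 // log the response object
  return response;                       // send the (unmodified) response back to the client
}

// Register the trigger only where a global addEventListener exists,
// i.e. inside the Workers runtime (assumption: keeps the file loadable elsewhere).
if (typeof addEventListener === 'function') {
  addEventListener('fetch', (event) => {
    event.respondWith(handleRequest(event.request));
  });
}
```

The later sketches in this transcript only show `handleRequest` and its helpers; the `fetch` listener would be registered the same way.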
  24. Let’s live-deploy the Worker to Cloudflare's edge servers

    Select your domain > Workers > Manage Workers
  25. Here’s how to add a Worker

    You’ll be redirected from the “all Workers” overview to the following screen: give your Worker a unique name, then copy & paste the Worker code you just tested on the Playground.
  26. Double-check live!

    Comparison: left (original) vs right (Worker-enabled). Also, don’t fall victim to caching; use “Disable cache” (see: Network tab) in Chrome DevTools to be sure you’re seeing the latest version.
  27. Warning: (maybe) not production-ready!

    Please understand that all scripts / source codes are meant as examples only. Ensure you know what you’re doing when using them in a production environment!
  28. Redirects on the edge using the Response API

    To execute any type of HTTP redirect, we need to use the Response Runtime API which, conveniently, also provides a static method called “redirect()”: let response = new Response(body, options) or just: return Response.redirect(destination, status). Source: https://pa.ag/3gvXYoL
  29. The Cloudflare Workers Docs are a solid starting point

    More: https://pa.ag/3gNd8Gn
  30. Different types of implementations at a glance

    (#18): 302 redirect, (#22): 301 redirect, (#26): a reverse proxy call and (#31-36): multiple redirects, selecting a single destination from a map based on a URL parameter.
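The original script is only visible as a screenshot. The following is a rough sketch of the four cases, assuming hypothetical paths (`/redirects/301`, `/redirects/proxy`, `/redirects/map`), a hypothetical `id` parameter and placeholder destinations on example.com; only `/redirects/302` and the name `redirectMap` appear in the slides. The routing decision sits in a plain function so that it can be checked without a Workers runtime.

```javascript
// Hypothetical map of ?id= values to destinations (illustrative only).
const redirectMap = new Map([
  ['1', 'https://example.com/first'],
  ['2', 'https://example.com/second'],
]);

// Decide what to do for a URL; returns a plain description of the action.
function resolveRoute(requestUrl) {
  const url = new URL(requestUrl);
  switch (url.pathname) {
    case '/redirects/302':
      return { type: 'redirect', status: 302, destination: 'https://example.com/temporary' };
    case '/redirects/301':
      return { type: 'redirect', status: 301, destination: 'https://example.com/permanent' };
    case '/redirects/proxy':
      return { type: 'proxy', destination: 'https://example.com/' };
    case '/redirects/map': {
      // Unknown or missing IDs are not redirected at all.
      const destination = redirectMap.get(url.searchParams.get('id'));
      return destination
        ? { type: 'redirect', status: 301, destination }
        : { type: 'passthrough' };
    }
    default:
      return { type: 'passthrough' };
  }
}

async function handleRequest(request) {
  const route = resolveRoute(request.url);
  if (route.type === 'redirect') return Response.redirect(route.destination, route.status);
  if (route.type === 'proxy') return fetch(route.destination); // reverse proxy call
  return fetch(request); // everything else goes to the origin untouched
}
```

Keeping the decision separate from the `Response` construction is a design choice of this sketch, not something the slides prescribe.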
  31. A quick overview to see how things are working…

    Source: https://httpstatus.io. Annotations on the screenshot: “Correct, in fact this is not a redirect” and “ID is not configured in redirectMap”.
  32. To “reverse proxy” a request, you can use the Fetch API

    It provides an interface for (asynchronously) fetching resources via HTTP requests inside of a Worker script: const response = await fetch(URL, options). Asynchronous tasks, such as fetch, are not executed at the top level in a Worker script and must be executed within a FetchEvent handler. Source: https://pa.ag/3wpS3YT
  33. return await fetch("https://example.com")

    Easily “migrate” a blog hosted on a sub-domain to a sub-folder on your main domain, without actually moving it. Great tutorial: https://pa.ag/2Tw7LD8. Request sent from bastiangrimm.dev; content shown from example.com.
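To illustrate the sub-domain to sub-folder idea, here is a hedged sketch. The hostnames `blog.example.com` and `www.example.com` and the `/blog` prefix are assumptions, not taken from the talk or the linked tutorial.

```javascript
const BLOG_ORIGIN = 'https://blog.example.com'; // hypothetical sub-domain hosting the blog
const BLOG_PREFIX = '/blog';                    // hypothetical sub-folder on the main domain

// Translate a main-domain URL under /blog into the matching sub-domain URL.
// Returns null for URLs that are not part of the blog.
function toUpstreamUrl(requestUrl) {
  const url = new URL(requestUrl);
  const isBlog = url.pathname === BLOG_PREFIX || url.pathname.startsWith(BLOG_PREFIX + '/');
  if (!isBlog) return null;
  const upstream = new URL(BLOG_ORIGIN);
  upstream.pathname = url.pathname.slice(BLOG_PREFIX.length) || '/';
  upstream.search = url.search;
  return upstream.toString();
}

async function handleRequest(request) {
  const upstreamUrl = toUpstreamUrl(request.url);
  if (!upstreamUrl) return fetch(request);
  // Re-issue the request against the sub-domain, keeping method, headers and body.
  return fetch(new Request(upstreamUrl, request));
}
```

Internal links inside the proxied HTML would still point to the sub-domain unless they are rewritten as well.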
  34. Verifying that this all happens “on the edge”

    Zoom into any of the response headers for an originally requested URL such as bastiangrimm.dev/redirects/302:
  35. Which version would you like to wake up to?

    Preventing “SEO heart attacks”: using a Worker to monitor and safeguard your robots.txt file is one of many use cases that are super easy to do. This is how I uploaded the robots.txt file to my test server vs. this is what the Worker running in the background changed the output to.
  36. Preventing a global “Disallow: /” in robots.txt

    (#5-6): define defaults, (#15-16): if robots.txt returns 200, read its content, (#19-24): replace if global disallow exists, (#27-29): return default “allow all” if file doesn’t exist
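A minimal sketch of these steps, under stated assumptions: the default file content and the regular expression are illustrative choices, and the pattern replaces every line that is exactly `Disallow: /`, regardless of which user-agent group it belongs to.

```javascript
// Default "allow all" robots.txt (assumed wording).
const DEFAULT_ROBOTS = 'User-agent: *\nAllow: /\n';

// Matches a whole line that is exactly "Disallow: /" (case-insensitive).
const GLOBAL_DISALLOW = /^[ \t]*disallow:[ \t]*\/[ \t]*$/gim;

// Given the origin's status and body, return the robots.txt that should be served.
function safeguardRobots(status, body) {
  if (status !== 200) return DEFAULT_ROBOTS;        // file missing: serve the default
  return body.replace(GLOBAL_DISALLOW, 'Allow: /'); // neutralise a global disallow
}

async function handleRequest(request) {
  const url = new URL(request.url);
  if (url.pathname !== '/robots.txt') return fetch(request);

  const originResponse = await fetch(request);
  const body = originResponse.status === 200 ? await originResponse.text() : '';
  return new Response(safeguardRobots(originResponse.status, body), {
    status: 200,
    headers: { 'Content-Type': 'text/plain; charset=utf-8' },
  });
}
```

Path-specific rules such as `Disallow: /admin/` are left untouched, because the pattern requires the line to end right after the slash.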
  37. For demonstration purposes only: UA-based delivery

    (#10): get User-Agent, (#16-17): add dynamic Sitemap link if UA contains “googlebot”
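As a rough illustration of UA-based delivery (demonstration only, as the slide stresses), the check could be sketched as follows. The function name and the sitemap URL are hypothetical.

```javascript
// Append a Sitemap line to robots.txt only when the User-Agent contains "googlebot".
function robotsForUserAgent(robotsTxt, userAgent, sitemapUrl) {
  const isGooglebot = (userAgent || '').toLowerCase().includes('googlebot');
  if (!isGooglebot) return robotsTxt;
  const separator = robotsTxt.endsWith('\n') ? '' : '\n';
  return `${robotsTxt}${separator}Sitemap: ${sitemapUrl}\n`;
}

async function handleRequest(request) {
  const url = new URL(request.url);
  if (url.pathname !== '/robots.txt') return fetch(request);

  const originResponse = await fetch(request);
  const userAgent = request.headers.get('User-Agent'); // get User-Agent
  const body = robotsForUserAgent(
    await originResponse.text(),
    userAgent,
    `${url.origin}/sitemap.xml`, // hypothetical sitemap location
  );
  return new Response(body, { headers: { 'Content-Type': 'text/plain; charset=utf-8' } });
}
```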
  38. Live-test & compare robots.txt using technicalseo.com

    Left screen shows bastiangrimm.dev/robots.txt being requested using a Googlebot User-Agent string, right screen is the default output. Free testing tool: https://technicalseo.com/tools/robots-txt/ Or use…
  39. Easily overwrite files which are “not meant” to be changed?

    Some systems cause endless headaches for SEOs; routing them through Cloudflare and using a Worker works very well!
  40. Say hello to the HTMLRewriter class!

    The HTMLRewriter allows you to build comprehensive and expressive HTML parsers inside of a Cloudflare Workers application: new HTMLRewriter().on("*", new ElementHandler()).onDocument(new DocumentHandler()). Source: https://pa.ag/2RTpqEt
  41. Let's give it a try and work with <head> and <meta> first

    (#24-25): pass tags to ElementHandler, (#9-11): if it’s <meta name="robots">, set it to “index,nofollow”, (#14-16): if it’s <head>, add another directive for bingbot.
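A sketch of what such a handler might look like. The bingbot directive value (`noindex`) is an assumption, since the slide does not state it, and `HTMLRewriter` only exists inside the Workers runtime, so it is used solely within `rewrite()`.

```javascript
class ElementHandler {
  element(element) {
    // <meta name="robots">: force the directive to "index,nofollow"
    if (element.tagName === 'meta' && element.getAttribute('name') === 'robots') {
      element.setAttribute('content', 'index,nofollow');
    }
    // <head>: append an additional, bingbot-specific directive (value is hypothetical)
    if (element.tagName === 'head') {
      element.append('<meta name="bingbot" content="noindex">', { html: true });
    }
  }
}

// Pass <meta> and <head> tags to the handler and stream the modified response.
function rewrite(response) {
  return new HTMLRewriter()
    .on('meta', new ElementHandler())
    .on('head', new ElementHandler())
    .transform(response);
}
```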
  42. I mean, this should be clear, but just in case:

    Verifying presence of Worker-modified robots meta directives via GSC
  43. If you want to work with/on every HTML element…

    This selector would pass every HTML element to your ElementHandler. By using element.tagName, you could then identify which element has been passed along: return new HTMLRewriter().on("*", new ElementHandler()).transform(response)
  44. Section 4: Title and meta description

    Of course, updating, changing or entirely replacing both elements is also possible!
  45. Using element selectors in HTMLRewriter

    Often, you only want to process very specific elements, e.g. <meta> tags, but not all of them. Maybe it’s just the meta description you care about? new HTMLRewriter().on('meta[name="description"]', new ElementHandler()).transform(response). More on selectors: https://pa.ag/35xw073
  46. Updating or replacing titles and descriptions is easy…

    Element selectors are super powerful yet easy to use: (#10): forced <title> overwrite, (#14-22): conditional changes to the meta description
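A possible shape for these two handlers is sketched below. The class names, the replacement texts and the conditions (empty description, or longer than 155 characters) are assumptions for illustration; the slide does not spell out which conditions the original script used.

```javascript
// Forced overwrite: whatever the <title> was, replace its content.
class TitleHandler {
  constructor(title) {
    this.title = title;
  }
  element(element) {
    element.setInnerContent(this.title);
  }
}

// Conditional change: only touch the description if it is empty or too long.
class DescriptionHandler {
  constructor(fallback, maxLength = 155) {
    this.fallback = fallback;
    this.maxLength = maxLength;
  }
  element(element) {
    const current = (element.getAttribute('content') || '').trim();
    if (current.length === 0) {
      element.setAttribute('content', this.fallback);
    } else if (current.length > this.maxLength) {
      element.setAttribute('content', current.slice(0, this.maxLength - 3) + '...');
    }
  }
}

function rewrite(response) {
  return new HTMLRewriter()
    .on('title', new TitleHandler('My new title'))
    .on('meta[name="description"]', new DescriptionHandler('A fallback description.'))
    .transform(response);
}
```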
  47. HTMLRewriter listening to <a> and <img> tags

    (#29-30): passing href/src attributes to a class which (#20): replaces oldURL with newUrl and (#16-18): ensures https availability. Based on: https://pa.ag/35llTSo
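The approach can be sketched roughly like this. The class name `AttributeRewriter` and the hostnames are placeholders, and upgrading `http://` to `https://` is one simple reading of "ensures https availability".

```javascript
class AttributeRewriter {
  constructor(attributeName, oldUrl, newUrl) {
    this.attributeName = attributeName;
    this.oldUrl = oldUrl;
    this.newUrl = newUrl;
  }
  element(element) {
    const value = element.getAttribute(this.attributeName);
    if (!value) return;
    const updated = value
      .replace(/^http:\/\//, 'https://')  // upgrade plain http URLs
      .replace(this.oldUrl, this.newUrl); // swap the old host for the new one
    element.setAttribute(this.attributeName, updated);
  }
}

function rewrite(response) {
  const oldUrl = 'old.example.com'; // hypothetical
  const newUrl = 'www.example.com'; // hypothetical
  return new HTMLRewriter()
    .on('a', new AttributeRewriter('href', oldUrl, newUrl))
    .on('img', new AttributeRewriter('src', oldUrl, newUrl))
    .transform(response);
}
```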
  48. HTTP hreflang annotations on the edge

    We’ve just had plenty of HTML, so let’s use HTTP headers instead. Of course, both ways work just fine though:
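One way to send hreflang annotations as an HTTP `Link` header is sketched below. The helper name and the language/URL pairs are hypothetical.

```javascript
// Build a Link header value such as:
// <https://example.com/en/>; rel="alternate"; hreflang="en", <...>; rel="alternate"; hreflang="de"
function buildHreflangLinkHeader(alternates) {
  return Object.entries(alternates)
    .map(([lang, href]) => `<${href}>; rel="alternate"; hreflang="${lang}"`)
    .join(', ');
}

async function handleRequest(request) {
  const originResponse = await fetch(request);
  // Copy the response so that its headers become mutable.
  const response = new Response(originResponse.body, originResponse);
  response.headers.set('Link', buildHreflangLinkHeader({
    en: 'https://example.com/en/', // hypothetical alternates
    de: 'https://example.com/de/',
  }));
  return response;
}
```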
  49. Verify, e.g. using Chrome Developer Console

    Network > %URL (disable cache) > Headers > Response Headers
  50. Before you ask: X-Robots directives are also possible…

    … and the same is true for rel-canonical annotations sent via HTTP header:
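Both annotations are plain response headers, so a sketch can stay short. The helper name `seoHeaders` and the example values are assumptions; note that `headers.set` replaces any `Link` header the origin already sent.

```javascript
// Return the header name/value pairs to add to a response.
function seoHeaders({ robots, canonical }) {
  const pairs = [];
  if (robots) pairs.push(['X-Robots-Tag', robots]);
  if (canonical) pairs.push(['Link', `<${canonical}>; rel="canonical"`]);
  return pairs;
}

async function handleRequest(request) {
  const originResponse = await fetch(request);
  const response = new Response(originResponse.body, originResponse); // mutable copy
  const extra = seoHeaders({
    robots: 'noindex, nofollow',                // hypothetical directive
    canonical: 'https://example.com/preferred', // hypothetical canonical URL
  });
  for (const [name, value] of extra) response.headers.set(name, value);
  return response;
}
```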
  51. Combining HTTP 503 error with a Retry-After header

    Retry-After indicates how long the UA should wait before making a follow-up request: “The server is currently unable to handle the request due to a temporary overloading or maintenance of the server […]. If known, the length of the delay MAY be indicated in a Retry-After header.”
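A minimal sketch of a maintenance response served from the edge. The helper name, the one-hour delay and the placeholder HTML are assumptions.

```javascript
// Build the init object for a 503 response with a Retry-After delay in seconds.
function maintenanceResponseInit(retryAfterSeconds) {
  return {
    status: 503,
    statusText: 'Service Unavailable',
    headers: {
      'Retry-After': String(retryAfterSeconds),
      'Content-Type': 'text/html; charset=utf-8',
    },
  };
}

async function handleRequest() {
  // Ask clients and crawlers to come back in an hour (hypothetical value).
  return new Response('<h1>Down for maintenance</h1>', maintenanceResponseInit(3600));
}
```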
  52. Section 8: Injecting content

    Just in case I’ve somehow not made my point yet: you can do REALLY cool stuff and have control over the full HTML response, so adding content is easy.
  53. You could also (dynamically) read from an external feed

    Feeding in content from other sources is simple; this example reads a JSON feed, parses the input and injects it into the <h1> of the target page:
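The feed used in the talk is not part of the transcript, so this sketch assumes a hypothetical feed URL and a hypothetical JSON shape of `{ "items": [{ "title": "..." }] }`.

```javascript
const FEED_URL = 'https://example.com/feed.json'; // hypothetical feed location

// Pick the headline from the parsed feed; returns null if the shape is unexpected.
function headlineFromFeed(feed) {
  const first = feed && Array.isArray(feed.items) ? feed.items[0] : undefined;
  return first && typeof first.title === 'string' ? first.title : null;
}

class HeadlineHandler {
  constructor(text) {
    this.text = text;
  }
  element(element) {
    // Leave the <h1> alone if the feed gave us nothing usable.
    if (this.text) element.setInnerContent(this.text);
  }
}

async function handleRequest(request) {
  // Fetch the page and the feed in parallel, then combine the results.
  const [originResponse, feedResponse] = await Promise.all([fetch(request), fetch(FEED_URL)]);
  const headline = headlineFromFeed(await feedResponse.json());
  return new HTMLRewriter()
    .on('h1', new HeadlineHandler(headline))
    .transform(originResponse);
}
```

`setInnerContent` treats its argument as text by default, so feed content is escaped rather than injected as raw HTML.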
  54. Section 9: Web performance

    Yeah… actually this is how it all started, and it’s still (one of) the most powerful tools to use for it!
  55. Add native lazy loading for images to your HTML mark-up

    Keep in mind: you don’t want to lazy load all of your images (e.g. not the hero image); also, if you’re using iframes, you might want to pass “iframe” to the HTMLRewriter:
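A sketch of such a handler. Treating the first image in the document as the hero image is an assumption made for illustration; a real page may need a more specific selector.

```javascript
class LazyLoadHandler {
  // skipFirst: number of leading elements to leave eager (e.g. the hero image)
  constructor(skipFirst = 1) {
    this.skipFirst = skipFirst;
    this.seen = 0;
  }
  element(element) {
    this.seen += 1;
    if (this.seen <= this.skipFirst) return; // keep early images eager
    if (!element.hasAttribute('loading')) {
      element.setAttribute('loading', 'lazy'); // respect an existing loading attribute
    }
  }
}

function rewrite(response) {
  // Create fresh handlers per response so the counters start at zero each time.
  return new HTMLRewriter()
    .on('img', new LazyLoadHandler(1))
    .on('iframe', new LazyLoadHandler(0))
    .transform(response);
}
```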
  56. Cleaning up HTML code for performance reasons

    E.g. by removing unwanted pre* resource hints, or by adding async/defer to JS calls. More clean-up Worker scripts: https://gist.github.com/Nooshu
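Both clean-ups could be sketched as follows. The class names and the list of unwanted `rel` values are assumptions, and these are not the scripts from the linked gists.

```javascript
// Add "defer" to external scripts that are neither async nor deferred yet.
class ScriptDeferHandler {
  element(element) {
    const external = element.hasAttribute('src');
    const alreadyNonBlocking = element.hasAttribute('async') || element.hasAttribute('defer');
    if (external && !alreadyNonBlocking) element.setAttribute('defer', '');
  }
}

// Remove <link> elements whose rel value is on the unwanted list.
class ResourceHintRemover {
  constructor(unwanted) {
    this.unwanted = new Set(unwanted);
  }
  element(element) {
    const rel = (element.getAttribute('rel') || '').toLowerCase();
    if (this.unwanted.has(rel)) element.remove();
  }
}

function rewrite(response) {
  return new HTMLRewriter()
    .on('script', new ScriptDeferHandler())
    .on('link[rel]', new ResourceHintRemover(['prefetch', 'prerender'])) // hypothetical list
    .transform(response);
}
```

Blindly deferring scripts can change execution order, so this should be tested per template before use.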
  57. A detailed guide on how to cache HTML with CF Workers

    More: https://pa.ag/3xk8rdt
  58. And tons of other things…

    You can use Workers to fix broken tracking, allow for better accessibility, and much more.
  59. With great power comes great responsibility.

    [This] dates back to the time of the French Revolution. At least, if you believe Wikipedia, that is… Source: https://pa.ag/35nQSx6
  60. Great summary over at ContentKing, well worth a read

    “What are the downsides […] What risks are involved?” Source: https://pa.ag/3xhYUUk
  61. Risk of costs

    10 million requests are included; every additional 1 million currently costs $0.50. That is not crazy expensive, but in larger-scale setups it certainly means additional costs.
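To get a feel for the numbers on this slide, the overage can be estimated with a few lines. This assumes linear, prorated billing of the extra requests and ignores any base fee, neither of which the slide states.

```javascript
// Estimate the monthly overage in USD from the figures on the slide:
// 10 million requests included, $0.50 per additional million (prorated: an assumption).
function monthlyOverageUsd(requests, included = 10_000_000, pricePerMillion = 0.5) {
  const extra = Math.max(0, requests - included);
  return (extra / 1_000_000) * pricePerMillion;
}

console.log(monthlyOverageUsd(5_000_000));   // 0   (within the included volume)
console.log(monthlyOverageUsd(250_000_000)); // 120 (240 million extra requests)
```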
  62. PCI compliance

    This might interfere with current processes; at the very least, ensure Workers become part of a standardised process (e.g. deployment).
  63. Potential conflict in code

    The underlying codebase might do/require something that could accidentally be overwritten on the edge.
  64. Potential to introduce frontend bugs

    Additional modifications on the edge could result in massive debugging. Again: proper documentation and processes are crucial!
  65. Yep, you can do evil things with Workers for sure

    Source: https://pa.ag/3cFq0Nq
  66. Dynamically creating links to “Baccarat Sites”

    “[…] at the CF Workers management area, there was a suspicious Worker listed called hang. It had been set to run on any URL route requests to the website:” After further investigation [by Sucuri], it was found that the website was actually loading SEO spam content through Cloudflare’s Workers service. This service allows someone to load external third-party JavaScript that’s not on their website’s hosting server. Source: https://pa.ag/3cFq0Nq
  67. The suspicious “hang” Worker injection in detail

    ▪ The JavaScript Worker first checks for the HTTP request’s user-agent and identifies whether it contains Google/Googlebot or naver within the string text.
    ▪ If the user-agent string contains either of these keywords, then the JavaScript makes a request to the malicious domain naverbot[.]live to generate the SEO spam links to be injected into the victim’s website.
    ▪ After this step, the Worker then injects the retrieved SEO spam link data right before the final </body> tag on the infected website’s HTML source.
    ▪ The malicious JavaScript can also be triggered if the user-agent matches a crawler that is entirely separate from Googlebot: naver.

    Source: https://pa.ag/3cFq0Nq
  68. If you’re now wondering how to distribute Workers…?

    Source: https://pa.ag/3zq0Mwd