Serverless SEO - SMX Advanced Europe 2021

My talk from SMX Advanced 2021 outlining how to use Cloudflare Workers to overcome challenges and limitations with popular CMS and ecommerce platforms.

Bastian Grimm

November 07, 2022

Transcript

  1. Five tech SEO hacks you never knew existed
    Bastian Grimm, Peak Ace AG | @basgr

  2. First of all: I lied!
    I couldn’t choose just five …

  3. Instead, I’m going to help you.

  4. I’m going to help you
    To prove to management that they need your SEO program

  5. I am going to help you
    To demonstrate SEO ROI with simple “feature test” deployments

  6. I am going to help you
    To fix legacy systems without having to beg for development resources

  7. I am going to help you
    To easily build a proof-of-concept rollout

  8. Sound good? Great!
    But how?

  9. Ever heard of it? Serverless SEO
    Or as some people call it: “edge SEO”.

  10. Using Workers to overcome challenges
    and limitations with popular CMS and
    ecommerce platforms.

  11. Wait… what?
    Because: in 2021, you won’t need
    server access for SEO anymore!

  12. Establishing a common ground
    Before we talk about Workers, we need to talk HTTP
    requests – and CDNs:

  13. A (very) simplified request lifecycle
    pa.ag | @peakaceag
    [Diagram labels: your computer / your browser; DNS server (e.g. to translate domain<>IP); web server (aka “origin server”); database server (in most cases)]

  14. Let's introduce a CDN to the mix
    If you’re not familiar with the term CDN:
    “A content delivery network (CDN) is a globally distributed
    network of servers deployed in multiple data centers
    around the globe.”

  15. Using a CDN, all requests will pass through “edge servers”
    When we ignore DNS, databases etc. for a minute, this is what it would look like:
    First request, ever:
    ▪ Request: peakace.js arrives at the edge server; peakace.js is not cached on the edge server yet
    ▪ Request: peakace.js is passed on to the origin server
    ▪ Response: peakace.js is delivered from the origin server
    ▪ peakace.js gets cached on the edge server

  16. Using a CDN, all requests will pass through “edge servers”
    When we ignore DNS, databases etc. for a minute, this is what it would look like:
    Second request (independent of user):
    ▪ Request: peakace.js arrives at the edge server
    ▪ peakace.js is cached on the edge server
    ▪ peakace.js is delivered from the edge server (no request to the origin server)
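
The two request flows above can be modelled in a few lines. This is only a sketch under simplifying assumptions: a single edge server, an in-memory `Map` as its cache, and a stand-in `fetchFromOrigin` function (hypothetical, not a real CDN API) that counts origin hits.

```javascript
// Minimal in-memory model of one edge server's cache (illustration only).
const edgeCache = new Map();

// Stand-in for the origin server; counts how often it is actually contacted.
let originHits = 0;
function fetchFromOrigin(path) {
  originHits += 1;
  return `contents of ${path}`;
}

// Serve from the edge cache if possible, otherwise ask the origin and cache the result.
function requestViaEdge(path) {
  if (edgeCache.has(path)) {
    return { body: edgeCache.get(path), servedFrom: "edge" };
  }
  const body = fetchFromOrigin(path);
  edgeCache.set(path, body);
  return { body, servedFrom: "origin" };
}

const first = requestViaEdge("/peakace.js");  // cache miss: goes to the origin
const second = requestViaEdge("/peakace.js"); // cache hit: origin is not contacted
console.log(first.servedFrom, second.servedFrom, originHits);
```

Real CDNs add expiry, cache keys and per-location caches on top of this, none of which is modelled here.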

  17. Especially for global businesses, CDNs can be a great help
    Use CDNPerf.com to find the one that suits you best, depending on where you are and
    which regions/countries you serve most. This will positively impact TTFB!
    Give it a try: https://www.cdnperf.com/

  18. CDNs at a glance
    Some of the most popular CDN providers out there

  19. Back in Sep. 2017, Cloudflare introduced their “Workers”
    Which ultimately became publicly available in March 2018:
    Source: https://blog.cloudflare.com/introducing-cloudflare-workers/

  20. So… what’s a Worker?
    Workers use the V8 JavaScript engine built by Google
    and run globally on Cloudflare's edge servers.
    A typical Worker script executes in <1ms – that’s fast!

  21. You can execute any JavaScript…
    … using the latest standard language features

  22. Respond to requests…
    … directly from your Worker, or forward them elsewhere

  23. Send requests to 3rd-party servers
    Also, you can do multiple requests, in series or parallel
    and combine the results

  24. Seriously though, this is WILD!
    Intercept and modify HTTP request and response
    URLs, status, headers, and body content.

  25. Some of the potential use-cases
    could be to:

  26. Implement redirects
    301s, 302s – or even geo-specific ones, if needed

  27. Modify HTTP headers
    Adding/changing X-Robots or even hreflang annotations

  28. Modify robots.txt
    Overwrite the full file, add or remove single directives

  29. Modify meta directives
    Inject, change or even remove robots meta-annotations

  30. Update page titles & descriptions
    Create unique page titles or
    meta descriptions when needed

  31. Implement hreflang
    … or schema.org mark-up

  32. Inject/remove (body) content
    Essentially, you can do almost anything – because you
    have full access to the request and response objects!

  33. However, does this only work with Cloudflare?
    Similar implementations are also available with other CDN providers:
    Lambda@Edge, Compute@Edge, Edge Workers, Cloudflare Workers

  34. But today it’s all about Cloudflare, because:
    The top 3 providers (CF, AWS, Akamai) have 89% of all customers; Cloudflare alone is
    used by 81% of all sites that rely on a CDN (according to W3Techs):
    Source: https://pa.ag/2U9kvAh

  35. Excited? Let’s go!
    A practical and hands-on guide to setting up and running
    Cloudflare Workers for your SEO

  36. Go create your own (free) account over at cloudflare.com
    Once your account is activated, you can add your first site/domain:
    Add your domain name - it can be registered anywhere (as long as you can change the
    DNS at your current provider)

  37. To play with it, the free account + $0 plan is sufficient:
    This is good enough for testing things out…!

  38. Next, you’ll get to see the current DNS configuration
    Yours should look a little like this: at least two records, one for the root-domain,
    one for the www sub-domain, both pointing to the IP address of your hosting provider:
    On to the next screen!

  39. Now, CF will show you which nameservers to use instead:
    Nameservers with the current provider, in my case nsX.inwx.de
    My new nameservers with Cloudflare to be used instead

  40. Switching existing nameservers over to Cloudflare
    At my hosting provider, it looks like this:
    My new nameservers Cloudflare told me to use instead (see prev. screen)

  41. Switch back to tell them you’re all set:
    Nameservers with the current provider, in my case nsX.inwx.de
    My new nameservers with Cloudflare, to be used instead

  42. Cloudflare is going to email you when things are ready:
    Beware, this can take up to 24hrs depending on the registrars and nameservers:
    Your CF dashboard should look like this after the successful NS change.

  43. Speaking of nameservers – are you already using 1.1.1.1?
    Cloudflare runs the fastest DNS resolver available. Why wouldn’t you use it?
    More: https://pa.ag/3zueHRX

  44. Really impatient? Purge cache (e.g. A records) on 1.1.1.1

  45. Can’t wait – or just want to check DNS records?
    Free tool recommendation: MxToolbox > DNS Lookup
    Source: https://pa.ag/3vuBObV

  46. So far, so good – let’s talk
    Workers now.

  47. A Worker, in its simplest form:
    This function defines triggers for a Worker script to execute. In this
    case, we intercept the request and send a (custom) response.
    Our custom response is defined here, for now we simply:
    (6) log the request object
    (7) fetch the requested URL from the origin server
    (8) log the response object
    (10) send the (unmodified) response back to the client
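
The slide's code is not part of this transcript, so the following is a reconstruction of what such a pass-through Worker typically looks like, not the author's exact script (line numbers will not match the references above). It uses the service-worker syntax with `addEventListener("fetch", …)`; the `typeof` guard is an addition so the snippet can also be loaded outside the Workers runtime.

```javascript
// Our custom response: log the request, fetch the URL from the origin,
// log the response and hand it back to the client unmodified.
async function handleRequest(request) {
  console.log("Got request", request.url);
  const response = await fetch(request);
  console.log("Got response", response.status);
  return response;
}

// Trigger definition: run handleRequest for every incoming request.
// The guard only matters when this file is loaded outside a Worker.
if (typeof addEventListener === "function") {
  addEventListener("fetch", (event) => {
    event.respondWith(handleRequest(event.request));
  });
}
```

Because nothing is modified yet, deploying this should not change what visitors see; it is a safe starting point for the examples that follow.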

  48. Cloudflare Workers Playground: cloudflareworkers.com
    Test-drive Cloudflare Workers; create/edit and see the results live:

  49. Cloudflare Workers Playground: cloudflareworkers.com
    Test-drive Cloudflare Workers; create/edit and see the results live:

  50. Let's build our own first Worker / custom handleRequest:

  51. Let’s test-drive this on the Workers Playground:

  52. Enough testing…
    Let's live-deploy your first Worker!

  53. Let’s live-deploy the Worker to Cloudflare's edge servers
    Select your domain > Workers > Manage Workers

  54. Here’s how to add a Worker
    You’ll be redirected from the “all Workers” overview to the following form:
    Give your Worker a unique name
    Copy & paste the Workers code you just tested on the Playground

  55. Confirm deployment and assign routing
    Go back to > Workers > Add route

  56. Comparison: left (original), right (Worker-enabled)
    Double-check live! Also, don’t fall victim to caching, use “Disable Cache” (see: Network
    tab) in Chrome Dev Tools to be sure you’re seeing the latest version:

  57. So much for theory…
    Let’s have some SEO fun then?

  58. Warning: (maybe) not production-ready!
    Please understand that all scripts / source code are meant as
    examples only. Ensure you know what you’re doing when using
    them in a production environment!

  59. 1. Redirects
    301 / 302 / bulk redirects & proxy passthrough

  60. Redirects on the edge using the Response API
    To execute any type of HTTP redirect, we need to use the Response Runtime API which
    – conveniently – also provides a static method called “redirect()”:
    let response = new Response(body, options)
    or just:
    return Response.redirect(destination, status)
    Source: https://pa.ag/3gvXYoL
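
To make the two forms concrete, here is a small sketch that builds the same redirect both ways. The destination URL and the function names `redirectLong` / `redirectShort` are placeholders chosen for illustration.

```javascript
// Long form: construct the redirect by hand with a status and a Location header.
function redirectLong(destination, status) {
  return new Response(null, { status, headers: { Location: destination } });
}

// Short form: let the static helper build the same kind of response.
function redirectShort(destination, status) {
  return Response.redirect(destination, status);
}

const manual = redirectLong("https://example.com/new-page", 301);
const helper = redirectShort("https://example.com/new-page", 301);
console.log(manual.status, manual.headers.get("Location"));
console.log(helper.status, helper.headers.get("Location"));
```

Inside a Worker, either response would simply be returned from the request handler. `Response.redirect()` only accepts redirect status codes (301, 302, 303, 307, 308) and throws for anything else.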

  61. The Cloudflare Workers Docs are a solid starting point:
    More: https://pa.ag/3gNd8Gn

  62. Different types of implementations at a glance
    (#18): 302 redirect, (#22): 301 redirect, (#26): a reverse proxy call and (#31-36): multiple
    redirects, selecting a single destination from a map based on a URL parameter:
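
Since the referenced script is not included in the transcript, here is one possible way to sketch those four cases. All paths, destinations and the `id` parameter name are assumptions for illustration, and the line numbers above refer to the original slide, not to this snippet.

```javascript
// Hypothetical map: value of the ?id= parameter -> redirect destination.
const redirectMap = new Map([
  ["1", "https://example.com/landing-page-1"],
  ["2", "https://example.com/landing-page-2"],
]);

// Decide what to do with a URL; null means "pass through to the origin".
function resolveRoute(requestUrl) {
  const url = new URL(requestUrl);
  switch (url.pathname) {
    case "/redirects/302":
      return { type: "redirect", destination: "https://example.com/", status: 302 };
    case "/redirects/301":
      return { type: "redirect", destination: "https://example.com/", status: 301 };
    case "/redirects/proxy":
      return { type: "proxy", destination: "https://example.com/" };
    case "/redirects/bulk": {
      // Pick a single destination from the map; unknown IDs are not redirected.
      const destination = redirectMap.get(url.searchParams.get("id"));
      return destination ? { type: "redirect", destination, status: 301 } : null;
    }
    default:
      return null;
  }
}

async function handleRequest(request) {
  const route = resolveRoute(request.url);
  if (route === null) return fetch(request);                   // serve the origin as-is
  if (route.type === "proxy") return fetch(route.destination); // reverse proxy call
  return Response.redirect(route.destination, route.status);
}

if (typeof addEventListener === "function") {
  addEventListener("fetch", (event) => event.respondWith(handleRequest(event.request)));
}
```

Keeping the routing decision in a plain function (`resolveRoute`) makes it easy to check the rules without deploying anything.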

  63. A quick overview to see how things are working…
    Correct, in fact this is not a redirect
    ID is not configured in redirectMap
    Source: https://httpstatus.io

  64. To “reverse proxy” a request, you can use the Fetch API
    It provides an interface for (asynchronously) fetching resources via HTTP requests inside
    of a Worker script:
    const response = await fetch(URL, options)
    Asynchronous tasks, such as fetch, are not executed
    at the top level in a Worker script and must be
    executed within a FetchEvent handler.
    Source: https://pa.ag/3wpS3YT

  65. return await fetch("https://example.com")
    Easily “migrate” a blog hosted on a sub-domain to a sub-folder on your main domain –
    without actually moving it.
    Request sent from bastiangrimm.dev
    Content shown from example.com
    Great tutorial: https://pa.ag/2Tw7LD8
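
A minimal sketch of the sub-domain-to-sub-folder idea, assuming a blog on `blog.example.com` that should appear under `/blog` on the main domain (both names are hypothetical). The path mapping lives in its own function so it can be checked in isolation.

```javascript
const BLOG_PREFIX = "/blog";                    // hypothetical sub-folder on the main domain
const BLOG_ORIGIN = "https://blog.example.com"; // hypothetical sub-domain hosting the blog

// Map a main-domain URL to the blog's real URL, or return null if it is not a blog URL.
function toUpstreamUrl(requestUrl) {
  const url = new URL(requestUrl);
  const isBlogPath =
    url.pathname === BLOG_PREFIX || url.pathname.startsWith(BLOG_PREFIX + "/");
  if (!isBlogPath) return null;
  const rest = url.pathname.slice(BLOG_PREFIX.length) || "/";
  return BLOG_ORIGIN + rest + url.search;
}

async function handleRequest(request) {
  const upstream = toUpstreamUrl(request.url);
  // Blog URLs are fetched from the sub-domain; the visitor's address bar does not change.
  // This sketch issues a plain GET and does not forward the original method or headers.
  return upstream ? fetch(upstream) : fetch(request);
}

if (typeof addEventListener === "function") {
  addEventListener("fetch", (event) => event.respondWith(handleRequest(event.request)));
}
```

In a real migration, links, canonicals and assets inside the proxied HTML usually need rewriting as well, which this sketch leaves out.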

  66. Verifying that this all happens “on the edge”:
    Zoom into any of the response headers for an originally requested URL such as
    bastiangrimm.dev/redirects/302:

  67. 2. robots.txt
    Safeguards, monitoring, serving a default file, etc.

  68. Which version would you like to wake up to?
    Preventing “SEO heart attacks” using a Worker to monitor and safeguard your robots.txt
    file is one of many use-cases that are super easy to do:
    This is how I uploaded the robots.txt file to my test server
    vs
    This is what the Worker running in the background changed the output to

  69. Preventing a global “Disallow: /” in robots.txt
    (#5-6): define defaults, (#15-16): if robots.txt returns 200, read its content, (#19-24):
    replace if global disallow exists, (#27-29): return default “allow all” if file doesn’t exist
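
The steps listed above could be sketched roughly as follows. This is an approximation rather than the original script: the default file content, the regular expression and the function name `safeguardRobots` are assumptions, and the check does not look at which `User-agent` group the directive belongs to.

```javascript
// Default "allow all" robots.txt served when the file is missing or blocks everything.
const DEFAULT_ROBOTS = "User-agent: *\nAllow: /\n";

// Matches a line that is exactly "Disallow: /" (case-insensitive, optional whitespace).
const GLOBAL_DISALLOW = /^[ \t]*disallow:[ \t]*\/[ \t]*\r?$/im;

// Decide which robots.txt body to serve, given the origin's status and body.
function safeguardRobots(status, body) {
  if (status !== 200) return DEFAULT_ROBOTS;            // file does not exist
  if (GLOBAL_DISALLOW.test(body)) return DEFAULT_ROBOTS; // site-wide block detected
  return body;                                           // file looks fine, keep it
}

async function handleRequest(request) {
  const url = new URL(request.url);
  if (url.pathname !== "/robots.txt") return fetch(request);

  const originResponse = await fetch(request);
  const body = originResponse.status === 200 ? await originResponse.text() : "";
  return new Response(safeguardRobots(originResponse.status, body), {
    status: 200,
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}

if (typeof addEventListener === "function") {
  addEventListener("fetch", (event) => event.respondWith(handleRequest(event.request)));
}
```

A production version would probably also alert someone when the safeguard kicks in, so the underlying file gets fixed.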

  70. Static files become dynamic
    So, let’s do some “dynamic serving” shall we?

  71. For demonstration purposes only: UA-based delivery
    (#10): get User-Agent, (#16-17): add dynamic Sitemap-link if UA contains “googlebot”
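
As with the slide, the following is for demonstration only. It sketches the User-Agent check as a pure function; the sitemap URL and the name `robotsForUserAgent` are placeholders, and in a Worker the User-Agent would come from `request.headers.get("User-Agent")`.

```javascript
const SITEMAP_URL = "https://www.example.com/sitemap.xml"; // hypothetical sitemap location

// Append a Sitemap line to robots.txt only when the User-Agent contains "googlebot".
function robotsForUserAgent(robotsTxt, userAgent) {
  const isGooglebot = (userAgent || "").toLowerCase().includes("googlebot");
  if (!isGooglebot) return robotsTxt;
  const separator = robotsTxt.endsWith("\n") ? "" : "\n";
  return `${robotsTxt}${separator}Sitemap: ${SITEMAP_URL}\n`;
}

const base = "User-agent: *\nAllow: /\n";
console.log(robotsForUserAgent(base, "Mozilla/5.0 (compatible; Googlebot/2.1)"));
console.log(robotsForUserAgent(base, "Mozilla/5.0 (X11; Linux x86_64)"));
```

User-Agent strings are trivially spoofed, so anything that must only reach the real Googlebot needs stronger verification than this substring check.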

  72. Live-test & compare robots.txt using technicalseo.com
    Left screen shows bastiangrimm.dev/robots.txt being requested using a Googlebot
    User-Agent string, right screen is the default output:
    Free testing tool: https://technicalseo.com/tools/robots-txt/

  73. Easily overwrite files which are
    “not meant” to be changed?
    Some systems cause endless headaches for SEOs –
    routing them through Cloudflare and using a Worker
    works very well!

  74. 3. Robots meta tags
    Modifying robots HTML directives on the fly

  75. Say hello to the HTMLRewriter class!
    The HTMLRewriter allows you to build comprehensive and expressive HTML parsers
    inside of a Cloudflare Workers application:
    new HTMLRewriter()
      .on("*", new ElementHandler())
      .onDocument(new DocumentHandler())
    Source: https://pa.ag/2RTpqEt

  76. Let's give it a try and work with <head> and <meta> first
    (#24-25): pass <head> and <meta> tags to ElementHandler, (#9-11): if it’s <meta name="robots">, set it to
    “index,nofollow”, (#14-16): if it’s <head>, add another directive for bingbot:
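
The HTML tag names on this slide were lost when the transcript was extracted, so the sketch below assumes the handler deals with `<meta name="robots">` and `<head>`. The bingbot directive value (`noindex`) is likewise an assumption. The element methods used (`tagName`, `getAttribute`, `setAttribute`, `append`) are part of the HTMLRewriter element API.

```javascript
class ElementHandler {
  element(element) {
    // Existing robots meta tag: overwrite its directives.
    if (element.tagName === "meta" && element.getAttribute("name") === "robots") {
      element.setAttribute("content", "index,nofollow");
    }
    // <head>: append an additional, bot-specific directive (value is an assumption).
    if (element.tagName === "head") {
      element.append('<meta name="bingbot" content="noindex">', { html: true });
    }
  }
}

async function handleRequest(request) {
  const response = await fetch(request);
  // HTMLRewriter is only available inside the Workers runtime.
  return new HTMLRewriter()
    .on("head", new ElementHandler())
    .on("meta", new ElementHandler())
    .transform(response);
}

if (typeof addEventListener === "function") {
  addEventListener("fetch", (event) => event.respondWith(handleRequest(event.request)));
}
```

If the page has no robots meta tag at all, this handler only adds the bingbot tag; injecting a missing robots tag would need an extra check.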

  77. I mean, this should be clear – but just in case:
    Verifying presence of Worker-modified robots meta directives via GSC

  78. If you want to work with/on every HTML element…
    This selector would pass every HTML element to your ElementHandler. By using
    element.tagName, you could then identify which element has been passed along:
    return new HTMLRewriter()
      .on("*", new ElementHandler())
      .transform(response)

  79. 4. Title and meta description
    Of course, updating, changing or entirely replacing both
    elements is also possible!

  80. Using element selectors in HTMLRewriter
    Often, you want only to process very specific elements, e.g. <meta> tags – but not all
    of them. Maybe it’s just the meta description you care about?
    new HTMLRewriter()
      .on('meta[name="description"]', new ElementHandler())
      .transform(response)
    More on selectors: https://pa.ag/35xw073

  81. Updating or replacing titles and descriptions is easy…
    (#10): forced overwrite, (#14-22): conditional changes to the meta description
    Element selectors are super powerful yet easy to use:
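
A sketch of both variants, assuming a forced title overwrite and a "replace the description only if it is too short" condition. The texts, the length threshold and the handler names are made up for illustration; `setInnerContent`, `getAttribute` and `setAttribute` are HTMLRewriter element methods.

```javascript
const NEW_TITLE = "Hypothetical page title set on the edge";
const FALLBACK_DESCRIPTION = "Hypothetical fallback description written by the Worker.";
const MIN_DESCRIPTION_LENGTH = 50; // arbitrary threshold for this example

// Forced overwrite: whatever the origin sent, the title becomes NEW_TITLE.
class TitleHandler {
  element(element) {
    element.setInnerContent(NEW_TITLE);
  }
}

// Conditional change: only replace descriptions that are missing or very short.
class DescriptionHandler {
  element(element) {
    const current = (element.getAttribute("content") || "").trim();
    if (current.length < MIN_DESCRIPTION_LENGTH) {
      element.setAttribute("content", FALLBACK_DESCRIPTION);
    }
  }
}

async function handleRequest(request) {
  const response = await fetch(request);
  return new HTMLRewriter()
    .on("title", new TitleHandler())
    .on('meta[name="description"]', new DescriptionHandler())
    .transform(response);
}

if (typeof addEventListener === "function") {
  addEventListener("fetch", (event) => event.respondWith(handleRequest(event.request)));
}
```

In practice the new texts would more likely come from a lookup keyed by URL than from constants.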

  82. 5. Rewriting links
    Maybe you missed some during your last migration?

  83. Maybe you should have listened to me in the first place!?
    Check out my presentation over at SlideShare:
    Slides: http://pa.ag/migration_search_y

  84. HTMLRewriter listening to <a> and <img> tags
    (#29-30): Passing href/src attributes to a class which (#20): replaces oldURL with newUrl
    and (#16-18): ensures https-availability
    Based on: https://pa.ag/35llTSo
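
A sketch of such an attribute rewriter, assuming the handled elements are `<a>` (href) and `<img>` (src) and that "ensures https-availability" means upgrading rewritten links to `https://`. Host names are placeholders, and only links pointing at the new host are upgraded here.

```javascript
const OLD_HOST = "old.example.com"; // hypothetical host left over from a migration
const NEW_HOST = "www.example.com"; // hypothetical current host

class AttributeRewriter {
  constructor(attributeName) {
    this.attributeName = attributeName;
  }

  element(element) {
    const value = element.getAttribute(this.attributeName);
    if (!value) return;

    // Replace the old host with the new one.
    let rewritten = value.replace(OLD_HOST, NEW_HOST);
    // Upgrade plain-http links to our own host to https.
    if (rewritten.startsWith(`http://${NEW_HOST}`)) {
      rewritten = "https://" + rewritten.slice("http://".length);
    }
    if (rewritten !== value) {
      element.setAttribute(this.attributeName, rewritten);
    }
  }
}

async function handleRequest(request) {
  const response = await fetch(request);
  return new HTMLRewriter()
    .on("a", new AttributeRewriter("href"))
    .on("img", new AttributeRewriter("src"))
    .transform(response);
}

if (typeof addEventListener === "function") {
  addEventListener("fetch", (event) => event.respondWith(handleRequest(event.request)));
}
```

This fixes what crawlers and users see, but the links in the underlying CMS stay broken, so it is best treated as a stopgap.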

  85. 6. Deploying hreflang
    Tell Google about localised versions of your page

  86. HTTP hreflang annotations on the edge
    We’ve just had plenty of HTML, so let’s use HTTP headers instead – of course both ways
    work just fine though:
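
hreflang annotations can be sent in an HTTP `Link` header of the form `<URL>; rel="alternate"; hreflang="xx"`. The sketch below builds such a header and attaches it to the origin response; the language/URL pairs and the function name `buildHreflangHeader` are placeholders.

```javascript
// Hypothetical language/URL pairs; a real setup would derive these per page.
const ALTERNATES = [
  { hreflang: "en", href: "https://www.example.com/en/" },
  { hreflang: "de", href: "https://www.example.com/de/" },
  { hreflang: "x-default", href: "https://www.example.com/" },
];

// Build the value of an HTTP Link header carrying hreflang annotations.
function buildHreflangHeader(alternates) {
  return alternates
    .map(({ hreflang, href }) => `<${href}>; rel="alternate"; hreflang="${hreflang}"`)
    .join(", ");
}

async function handleRequest(request) {
  const originResponse = await fetch(request);
  // Copy the response so its headers become mutable, then add the annotations.
  const response = new Response(originResponse.body, originResponse);
  response.headers.append("Link", buildHreflangHeader(ALTERNATES));
  return response;
}

if (typeof addEventListener === "function") {
  addEventListener("fetch", (event) => event.respondWith(handleRequest(event.request)));
}

console.log(buildHreflangHeader(ALTERNATES));
```

`append` is used instead of `set` so that any `Link` header the origin already sends is kept.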

  87. Verify, e.g. using Chrome Developer Console:
    Network > %URL (disable cache) > Headers > Response Headers

  88. Before you ask: X-Robots directives are also possible…
    … and the same is true for X-Robots Rel-Canonical annotations:

  89. 7. Serving HTTP 503s
    You can serve “proper” under maintenance pages directly
    from your Cloudflare Worker

  90. Combining HTTP 503 error with a Retry-After header
    Retry-After indicates how long the UA should wait before making a follow-up request:
    “The server is currently unable to handle the request due to a temporary overloading or
    maintenance of the server […]. If known, the length of the delay MAY be indicated in
    a Retry-After header.”
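
A minimal sketch of a maintenance response built directly in the Worker. The HTML and the one-hour delay are placeholders; `Retry-After` is given in seconds here (an HTTP date is also allowed).

```javascript
// Build a "proper" maintenance response: status 503 plus a Retry-After hint.
function maintenanceResponse(retryAfterSeconds) {
  const html =
    "<!doctype html><title>Maintenance</title><h1>We will be right back</h1>";
  return new Response(html, {
    status: 503,
    statusText: "Service Unavailable",
    headers: {
      "Content-Type": "text/html; charset=utf-8",
      "Retry-After": String(retryAfterSeconds),
    },
  });
}

async function handleRequest(request) {
  // During maintenance, answer every request from the edge without touching the origin.
  return maintenanceResponse(3600);
}

const example = maintenanceResponse(3600);
console.log(example.status, example.headers.get("Retry-After"));
```

To switch maintenance mode on and off, the Worker route can be enabled or removed in the dashboard, so no origin deployment is needed.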

  91. 8. Injecting content
    Just in case I’ve somehow not made my point yet – you
    can do REALLY cool stuff and have control over the full
    HTML response – so adding content is easy.

  92. Replacing, prepending, appending … whatever you like?

  93. You could also (dynamically) read from an external feed
    Feeding in content from other sources is simple; the example shows reading a JSON feed,
    parsing the input and injecting it into the target page:

  94. 9. Collecting logfiles
    One of the key challenges when using CDNs:
    logfiles are literally everywhere – and a lot of requests
    don’t even make it to the origin server…

  95. Cloudflare provides extensive possibilities for logfiles
    What I really love about this: direct integration with Google Cloud products!
    Note: you need the Enterprise plan for this.
    More: https://pa.ag/3gnj8GF

  96. Peak Ace log file auditing stack. Interested? > [email protected]
    Log files are stored in Google Cloud Storage, processed in Dataprep, exported to
    BigQuery and visualised in Data Studio via BigQuery Connector.
    [Diagram labels, steps numbered 1–8: Log files; GA (API v4); GSC (API v3); Google Apps Script; Import / API; Google Dataprep; Data transmission; Google BigQuery; Display data; Google Data Studio]

  97. New to logfile auditing? No worries, I got you covered:
    Check out my presentation over at SlideShare:
    Slides: http://pa.ag/slides

  98. 10. Web performance
    Yeah… actually this is how it all started; and still it’s
    (one of) the most powerful tools to use for it!

  99. Add native lazy loading for images to your HTML mark-up
    Keep in mind: you don’t want to lazy load all of your images (e.g. not the hero image);
    also, if you’re using iframes, you might want to pass “iframe” to the HTMLRewriter:
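
One way this could look, assuming the hero image is marked with a `hero` class (a made-up convention for this example). Images that already declare a `loading` attribute are left alone.

```javascript
class LazyLoadHandler {
  element(element) {
    // Skip the hero image; identifying it via a "hero" class is an assumption.
    const classes = (element.getAttribute("class") || "").split(/\s+/);
    if (classes.includes("hero")) return;
    // Respect an explicit loading attribute set by the origin.
    if (element.hasAttribute("loading")) return;
    element.setAttribute("loading", "lazy");
  }
}

async function handleRequest(request) {
  const response = await fetch(request);
  return new HTMLRewriter()
    .on("img", new LazyLoadHandler())
    .on("iframe", new LazyLoadHandler()) // optional: lazy load iframes as well
    .transform(response);
}

if (typeof addEventListener === "function") {
  addEventListener("fetch", (event) => event.respondWith(handleRequest(event.request)));
}
```

Above-the-fold images other than the hero may need the same exemption, which depends on the page template.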

  100. Cleaning up HTML code for performance reasons
    E.g. by removing unwanted pre*-stages, or by adding async/defer to JS calls:
    More clean-up Worker scripts: https://gist.github.com/Nooshu

  101. A detailed guide on how to cache HTML with CF Workers
    More: https://pa.ag/3xk8rdt

  102. And tons of other things…
    You can use Workers to fix broken tracking, allow for
    better accessibility, and much more.

  103. Tool recommendations
    Some stuff to make your (Worker) life just a bit easier…

  104. Sloth: an advanced CF Worker Code Generator & CMS
    A very handy (and free) UI to manage Workers for changing robots.txt, titles &
    descriptions, redirects, hreflang, and much more:
    Check it out: https://sloth.cloud

  105. Tool recommendation: Lil Redirector
    “Lil Redirector works by persisting and querying redirects inside of Workers KV, and
    includes an administrator UI for creating, modifying, and deleting redirects.”
    More: https://pa.ag/3q3EZGx

  106. Workers KV – wait, what?
    “Workers KV is a global, low-latency, key-value data store. It supports exceptionally
    high read volumes […] Workers KV is generally good for use-cases where you need
    to write relatively infrequently, but read quickly and frequently. It is optimised for
    these high-read applications.”
    Source: https://pa.ag/3vmTiXB

  107. Web Scraper based on Cloudflare Workers
    “Web Scraper makes it effortless to scrape websites. You provide a URL & CSS selector,
    and it will return you JSON containing the text contents of the matching elements.”
    More: https://pa.ag/3woCv7T

  108. Technically not a tool, but a very comprehensive guide:
    More: https://pa.ag/3xnWDqy

  109. We need to talk responsibility
    Downsides, risks – and more…

  110. [This] dates back to the time of the French Revolution
    At least, if you believe Wikipedia that is…
    “With great power comes great responsibility.”
    Source: https://pa.ag/35nQSx6

  111. You could essentially change
    everything you wanted.

  112. Great summary over at ContentKing, well worth a read:
    What are the downsides […] What risks are involved?
    Source: https://pa.ag/3xhYUUk

  113. Risk of costs
    10 million requests are included; every additional 1 million currently
    costs $0.50 - not crazy expensive, but in larger-scale
    setups it certainly means additional costs.
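
As a back-of-the-envelope aid, the pricing rule above can be turned into a tiny calculator. It assumes pro-rata billing per request beyond the included volume and ignores any base fee of the plan; the actual billing granularity may differ.

```javascript
const INCLUDED_REQUESTS = 10_000_000;  // requests included, as stated above
const PRICE_PER_EXTRA_MILLION = 0.5;   // USD per additional 1 million requests

// Estimated extra cost in USD for a given monthly request volume.
function estimateExtraCost(monthlyRequests) {
  const extra = Math.max(0, monthlyRequests - INCLUDED_REQUESTS);
  return (extra / 1_000_000) * PRICE_PER_EXTRA_MILLION;
}

console.log(estimateExtraCost(8_000_000));   // 0
console.log(estimateExtraCost(25_000_000));  // 7.5
console.log(estimateExtraCost(500_000_000)); // 245
```

Every request that matches a Worker route counts, including bot traffic and static assets, which is why narrow route patterns help keep the volume down.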

  114. PCI compliance
    This might interfere with current processes; at the very
    least, make sure Workers become part of a standardised
    process (e.g. deployment).

  115. Potential conflict in code
    The underlying codebase might do/require something
    that could accidentally be overwritten on the edge

  116. Potential to introduce frontend bugs
    Additional modifications on the edge could result in
    massive debugging. Again: proper documentation and
    processes are crucial!

  117. Always synchronise your activities
    with relevant stakeholders!

  118. Where there is good…
    … there is also evil

  119. Yep, you can do evil things with Workers for sure:
    Source: https://pa.ag/3cFq0Nq

  120. Dynamically creating links to “Baccarat Sites”
    “[…] at the CF Workers management area, there was a suspicious Worker listed called
    hang. It had been set to run on any URL route requests to the website:”
    “After further investigation [by sucuri], it was found that the website was actually
    loading SEO spam content through Cloudflare’s Workers service. This service allows
    someone to load external third-party JavaScript that’s not on their website’s
    hosting server.”
    Source: https://pa.ag/3cFq0Nq

  121. The suspicious “hang” Worker injection in detail:
    ▪ The JavaScript Worker first checks for the HTTP request’s user-agent and identifies
      whether it contains Google/Googlebot or naver within the string text.
    ▪ If the user-agent string contains either of these keywords, then the JavaScript makes
      a request to the malicious domain naverbot[.]live to generate the SEO spam links to be
      injected into the victim’s website.
    ▪ After this step, the Worker then injects the retrieved SEO spam link data right before
      the final
    Source: https://pa.ag/3cFq0Nq

  122. If you’re now wondering how to distribute Workers…?
    Source: https://pa.ag/3zq0Mwd

  123. While you’re at it:
    Use at least two-factor authentication with Cloudflare

  124. Care for the slides? www.pa.ag
    twitter.com/peakaceag
    facebook.com/peakaceag
    Take your career to the next level: jobs.pa.ag
    [email protected]
    Bastian Grimm
    [email protected]