Serverless SEO / Edge SEO - SEOkomm 2021

A hands-on guide to using "Edge SEO" to drive your SEO program forward, including a detailed explanation of how to set up, create and run Cloudflare Workers for SEO tasks. Presented at SEOkomm 2021 in Salzburg, Austria.

Transcript

  1. Bastian Grimm, Peak Ace AG | @basgr
    Serverless SEO
    a.k.a. “Edge SEO”: a hands-on guide.

  2. Cloudflare Worker

  3. Access not needed!
    In 2022, you no longer necessarily need server access
    for SEO!

  4. I'm only the stand-in!
    Hence: slides in English …

  5. Establishing a common ground
    Before we talk about Workers, we need to talk HTTP
    requests – and CDNs:

  6. A (very) simplified request lifecycle
    Diagram labels: your computer, your browser, DNS server (e.g. to translate
    domain<>IP), web server (aka “origin server”), database server (in most cases)
    pa.ag | @peakaceag

  7. Let's introduce a CDN to the mix
    If you’re not familiar with the term CDN:
    “A content delivery network (CDN) is a globally distributed
    network of servers deployed in multiple data centers
    around the globe.”

  8. Using a CDN, all requests will pass through “edge servers”
    When we ignore DNS, databases etc. for a minute, this is what it would look like:
    First request, ever: peakace.js is not cached on the edge server yet
    Request: peakace.js (client to edge server), Request: peakace.js (edge server to origin server)
    Response: peakace.js is delivered from the origin server
    peakace.js gets cached on the edge server

  9. Using a CDN, all requests will pass through “edge servers”
    When we ignore DNS, databases etc. for a minute, this is what it would look like:
    Second request (independent of user): peakace.js is cached on the edge server
    Request: peakace.js
    peakace.js is delivered from the edge server

  10. CDNs at a glance
    Some of the most popular CDN providers out there

  11. Back in Sep. 2017, Cloudflare introduced their “Workers”
    Which ultimately became publicly available in March 2018:
    Source: https://blog.cloudflare.com/introducing-cloudflare-workers/

  12. So… what’s a Worker?
    Workers use the V8 JavaScript engine built by Google
    and run globally on Cloudflare's edge servers.
    A typical Worker script executes in <1ms – that’s fast!

  13. You can execute any JavaScript…
    … using the latest standard language features

  14. Respond to requests…
    … directly from your Worker, or forward them elsewhere

  15. Send requests to 3rd-party servers
    Also, you can make multiple requests, in series or in parallel,
    and combine the results

  16. Seriously though, this is WILD!
    Intercept and modify HTTP request and response
    URLs, status, headers, and body content.

  17. Some of the potential use-cases
    could be to:

  18. Implement redirects
    301s, 302s – or even geo-specific ones, if needed

  19. Modify HTTP headers
    Adding/changing X-Robots-Tag or even hreflang annotations

  20. Modify robots.txt
    Overwrite the full file, add or remove single directives

  21. Modify meta directives
    Inject, change or even remove robots meta-annotations

  22. Update page titles & descriptions
    Create unique page titles or
    meta descriptions when needed

  23. Implement hreflang
    … or schema.org mark-up

  24. Inject/remove (body) content
    Essentially, you can do almost anything – because you
    have full access to the request and response objects!

  25. However, does this only work with Cloudflare?
    Similar implementations are also available with other CDN providers:
    Lambda@Edge, Compute@Edge, Edge Workers, Cloudflare Workers

  26. But today it’s all about Cloudflare, because:
    The top 3 providers (CF, AWS, Akamai) have 89% of all customers; Cloudflare alone is
    used by 81% of all sites that rely on a CDN (according to W3Techs):
    Source: https://pa.ag/2U9kvAh

  27. Excited? Let’s go!
    A practical and hands-on guide to setting up and running
    Cloudflare Workers for your SEO

  28. Go create your own (free) account over at cloudflare.com
    Once your account is activated, you can add your first site/domain:
    Add your domain name. It can be registered anywhere, as long as you
    can change the DNS at your current provider.

  29. To play with it, the free account + $0 plan is sufficient:
    This is good enough for testing things out…!

  30. Next, you’ll get to see the current DNS configuration
    Yours should look a little like this: at least two records, one for the root domain and
    one for the www sub-domain, both pointing to the IP address of your hosting provider:
    On to the next screen!

  31. Now, CF will show you which nameservers to use instead:
    Nameservers with the current provider, in my case nsX.inwx.de
    My new nameservers with Cloudflare, to be used instead

  32. Switching existing nameservers over to Cloudflare
    At my hosting provider, it looks like this:
    My new nameservers, which Cloudflare told me to use instead (see prev. screen)

  33. Switch back to tell them you’re all set:
    Nameservers with the current provider, in my case nsX.inwx.de
    My new nameservers with Cloudflare, to be used instead

  34. Cloudflare is going to email you when things are ready:
    Beware, this can take up to 24hrs depending on the registrars and nameservers:
    Your CF dashboard should look like this after the successful NS change.

  35. So far, so good – let‘s talk
    Workers now.

  36. A Worker, in its simplest form:
    This function defines triggers for a Worker script to execute. In this
    case, we intercept the request and send a (custom) response.
    Our custom response is defined here; for now we simply:
    (6) log the request object
    (7) fetch the requested URL from the origin server
    (8) log the response object
    (10) send the (unmodified) response back to the client
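
The code screenshot for this slide is not part of the transcript. The steps listed on it could look roughly like the sketch below, assuming the Service Worker style syntax (`addEventListener("fetch", …)`). The `handleRequest` name is borrowed from slide 39; the `fetcher` parameter is a hypothetical addition so the function can be exercised outside the Workers runtime.

```javascript
// Minimal pass-through Worker (sketch). The registration is guarded so the
// file also loads outside the Workers runtime, e.g. in Node for testing.
if (typeof addEventListener === "function") {
  addEventListener("fetch", (event) => {
    event.respondWith(handleRequest(event.request));
  });
}

// `fetcher` is a hypothetical parameter added for testability; inside a
// Worker it defaults to the global fetch().
async function handleRequest(request, fetcher = fetch) {
  console.log("request:", request.method, request.url); // log the request
  const response = await fetcher(request); // fetch the URL from the origin
  console.log("response status:", response.status); // log the response
  return response; // send the unmodified response back to the client
}
```

Because the response is returned untouched, deploying this should not change what visitors see; it only proves that requests are flowing through the Worker.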

  37. Cloudflare Workers Playground: cloudflareworkers.com
    Test-drive Cloudflare Workers; create/edit and see the results live:

  38. Cloudflare Workers Playground: cloudflareworkers.com
    Test-drive Cloudflare Workers; create/edit and see the results live:

  39. Let's build our own first Worker / custom handleRequest:

  40. Let’s test-drive this on the Workers Playground:

  41. Enough testing…
    Let's live-deploy your first Worker!

  42. Let’s live-deploy the Worker to Cloudflare's edge servers
    Select your domain > Workers > Manage Workers

  43. Here’s how to add a Worker
    You’ll be redirected from the “all Workers” overview to the following form:
    Give your Worker a unique name
    Copy & paste the Worker code you just tested on the Playground

  44. Confirm deployment and assign routing
    Go back to > Workers > Add route

  45. Comparison: left (original), right (Worker-enabled)
    Double-check live! Also, don’t fall victim to caching: use “Disable cache” (see the Network
    tab) in Chrome DevTools to be sure you’re seeing the latest version:

  46. So much for theory…
    Let’s have some SEO fun then?

  47. Warning: (maybe) not production-ready!
    Please understand that all scripts / source code are meant as
    examples only. Ensure you know what you’re doing when using
    them in a production environment!

  48. 1. Redirects
    301 / 302 / bulk redirects & proxy passthrough

  49. Redirects on the edge using the Response API
    To execute any type of HTTP redirect, we need to use the Response Runtime API which
    – conveniently – also provides a static method called “redirect()”:
    let response = new Response(body, options)
    or just:
    return Response.redirect(destination, status)
    Source: https://pa.ag/3gvXYoL

  50. The Cloudflare Workers docs are a solid starting point:
    More: https://pa.ag/3gNd8Gn

  51. Different types of implementations at a glance
    (#18): 302 redirect, (#22): 301 redirect, (#26): a reverse proxy call and (#31-36): multiple
    redirects, selecting a single destination from a map based on a URL parameter:
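
The four variants named on this slide can be sketched as follows. The original code is only visible as a screenshot, so the paths, the `id` parameter and all destinations here are hypothetical; only `redirectMap` (mentioned on slide 52) and `Response.redirect()` come from the deck.

```javascript
// Hypothetical IDs and destinations, for illustration only.
const redirectMap = new Map([
  ["1", "https://example.com/first-target"],
  ["2", "https://example.com/second-target"],
]);

// Pure routing decision: what should the Worker do for a given URL?
function resolveRoute(requestUrl) {
  const url = new URL(requestUrl);
  switch (url.pathname) {
    case "/redirects/302":
      return { action: "redirect", status: 302, destination: "https://example.com/" };
    case "/redirects/301":
      return { action: "redirect", status: 301, destination: "https://example.com/" };
    case "/redirects/proxy":
      return { action: "proxy", destination: "https://example.com/" };
    case "/redirects/bulk": {
      const destination = redirectMap.get(url.searchParams.get("id"));
      // Unknown or missing IDs are passed through to the origin instead.
      return destination
        ? { action: "redirect", status: 301, destination }
        : { action: "passthrough" };
    }
    default:
      return { action: "passthrough" };
  }
}

async function handleRequest(request) {
  const route = resolveRoute(request.url);
  if (route.action === "redirect") {
    return Response.redirect(route.destination, route.status);
  }
  if (route.action === "proxy") {
    return fetch(route.destination); // content is served without a redirect
  }
  return fetch(request);
}
```

Keeping the decision in a plain function makes it easy to check a redirect list before it goes live, since a wrong entry on the edge affects every visitor at once.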

  52. A quick overview to see how things are working…
    Correct, in fact this is not a redirect
    ID is not configured in redirectMap
    Source: https://httpstatus.io

  53. To “reverse proxy” a request, you can use the Fetch API
    It provides an interface for (asynchronously) fetching resources via HTTP requests inside
    of a Worker script:
    const response = await fetch(URL, options)
    Asynchronous tasks, such as fetch, are not executed at the top level in a Worker
    script and must be executed within a FetchEvent handler.
    Source: https://pa.ag/3wpS3YT

  54. return await fetch("https://example.com")
    Easily “migrate” a blog hosted on a sub-domain to a sub-folder on your main domain –
    without actually moving it.
    Request sent from bastiangrimm.dev; content shown from example.com
    Great tutorial: https://pa.ag/2Tw7LD8
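
For the sub-domain to sub-folder case, most of the work is mapping the public URL to the URL that is fetched behind the scenes. A minimal sketch, assuming a hypothetical blog on `blog.example.com` that should appear under `/blog` on the main domain (both names are placeholders, not taken from the deck):

```javascript
// Hypothetical setup: the blog lives on a sub-domain but should be visible
// under /blog on the main domain.
const BLOG_ORIGIN = "https://blog.example.com";
const BLOG_PREFIX = "/blog";

// Maps a public URL to the upstream URL; returns null for non-blog paths.
function toUpstreamUrl(requestUrl) {
  const url = new URL(requestUrl);
  const isBlogPath =
    url.pathname === BLOG_PREFIX || url.pathname.startsWith(BLOG_PREFIX + "/");
  if (!isBlogPath) return null;
  const upstreamPath = url.pathname.slice(BLOG_PREFIX.length) || "/";
  return BLOG_ORIGIN + upstreamPath + url.search;
}

async function handleRequest(request) {
  const upstream = toUpstreamUrl(request.url);
  // new Request(url, request) keeps method, headers and body of the original.
  return upstream ? fetch(new Request(upstream, request)) : fetch(request);
}
```

Links, canonicals and assets inside the proxied HTML still point at the sub-domain unless they are rewritten as well, so a setup like this usually needs a link-rewriting step on top.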

  55. Verifying that this all happens “on the edge”:
    Zoom into any of the response headers for an originally requested URL such as
    bastiangrimm.dev/redirects/302:

  56. 2. robots.txt
    Safeguards, monitoring, serving a default file, etc.

  57. Which version would you like to wake up to?
    Preventing “SEO heart attacks”: using a Worker to monitor and safeguard your robots.txt
    file is one of many use-cases that are super easy to implement:
    This is how I uploaded the robots.txt file to my test server
    vs
    This is what the Worker running in the background changed the output to

  58. Preventing a global “Disallow: /” in robots.txt
    (#5-6): define defaults, (#15-16): if robots.txt returns 200, read its content, (#19-24):
    replace if global disallow exists, (#27-29): return default “allow all” if file doesn’t exist
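
The safeguard logic described on this slide can be sketched like this. It is a reconstruction from the slide text, not the original script; the default file content and the function names are assumptions.

```javascript
// Default served when the origin has no usable robots.txt ("allow all").
const DEFAULT_ROBOTS = "User-agent: *\nDisallow:\n";

// True if any line is exactly "Disallow: /" (case-insensitive; trailing
// comments are ignored).
function hasGlobalDisallow(body) {
  return body
    .split(/\r?\n/)
    .some((line) => /^disallow:\s*\/$/i.test(line.split("#")[0].trim()));
}

// Decides which robots.txt body to serve for a given origin status and body.
function safeguardRobots(status, body) {
  if (status !== 200) return DEFAULT_ROBOTS;
  return hasGlobalDisallow(body) ? DEFAULT_ROBOTS : body;
}

async function handleRequest(request) {
  if (new URL(request.url).pathname !== "/robots.txt") return fetch(request);
  const originResponse = await fetch(request);
  const body = originResponse.status === 200 ? await originResponse.text() : "";
  return new Response(safeguardRobots(originResponse.status, body), {
    status: 200,
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
```

Note that this sketch does not look at `User-agent` groups: a `Disallow: /` aimed at a single bot would also trigger the replacement, which may or may not be what you want.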

  59. Static files become dynamic
    So, let’s do some “dynamic serving”, shall we?

  60. For demonstration purposes only: UA-based delivery
    (#10): get User-Agent, (#16-17): add dynamic Sitemap link if UA contains “googlebot”
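
A sketch of the User-Agent check described here, again reconstructed from the slide text only. The sitemap URL is a placeholder and the function names are assumptions.

```javascript
// Hypothetical sitemap URL, used only for this illustration.
const BOT_SITEMAP = "https://example.com/sitemap-googlebot.xml";

// Appends a Sitemap line when the User-Agent contains "googlebot".
function robotsForUserAgent(robotsBody, userAgent) {
  const isGooglebot = (userAgent || "").toLowerCase().includes("googlebot");
  if (!isGooglebot) return robotsBody;
  const separator = robotsBody.endsWith("\n") ? "" : "\n";
  return robotsBody + separator + "Sitemap: " + BOT_SITEMAP + "\n";
}

async function handleRequest(request) {
  if (new URL(request.url).pathname !== "/robots.txt") return fetch(request);
  const originResponse = await fetch(request);
  const body = await originResponse.text();
  const userAgent = request.headers.get("User-Agent");
  return new Response(robotsForUserAgent(body, userAgent), {
    status: originResponse.status,
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
```

The User-Agent string is trivially spoofed, which is one reason the slide labels this as a demonstration rather than a recommendation.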

  61. Live-test & compare robots.txt using technicalseo.com
    Left screen shows bastiangrimm.dev/robots.txt being requested using a Googlebot
    User-Agent string, right screen is the default output:
    Free testing tool: https://technicalseo.com/tools/robots-txt/
    Or use…

  62. Easily overwrite files which are “not meant” to be changed?
    Some systems cause endless headaches for SEOs –
    routing them through Cloudflare and using a Worker
    works very well!

  63. 3. Robots meta tags
    Modifying robots HTML directives on the fly

  64. Say hello to the HTMLRewriter class!
    The HTMLRewriter allows you to build comprehensive and expressive HTML parsers
    inside of a Cloudflare Workers application:
    new HTMLRewriter()
    .on("*", new ElementHandler())
    .onDocument(new DocumentHandler())
    Source: https://pa.ag/2RTpqEt

  65. Let's give it a try and work with […] and […] first
    (#24-25): pass […] tags to ElementHandler, (#9-11): if it’s […], set it to
    “index,nofollow”, (#14-16): if it’s […], add another directive for bingbot:
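
The tag names on this slide were lost in the transcript, so the following is only a guess at the shape of such a handler. It assumes the two elements are the robots `<meta>` tag and the `<head>`, and that the extra bingbot directive is added as its own meta tag; the directive value is made up. `getAttribute`, `setAttribute`, `append` and `tagName` are part of the HTMLRewriter element API.

```javascript
// Element handler in the shape HTMLRewriter expects: an object with an
// element() method that receives every matched element.
class RobotsHandler {
  element(element) {
    const tag = element.tagName.toLowerCase();
    if (tag === "meta" && element.getAttribute("name") === "robots") {
      element.setAttribute("content", "index,nofollow");
    }
    if (tag === "head") {
      // Assumption: the bingbot directive is appended as a separate tag.
      element.append('<meta name="bingbot" content="noindex">', { html: true });
    }
  }
}

async function handleRequest(request) {
  const response = await fetch(request);
  // HTMLRewriter is a global in the Workers runtime only.
  return new HTMLRewriter()
    .on("meta", new RobotsHandler())
    .on("head", new RobotsHandler())
    .transform(response);
}
```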

  66. I mean, this should be clear – but just in case:
    Verifying presence of Worker-modified robots meta directives via GSC

  67. If you want to work with/on every HTML element…
    This selector would pass every HTML element to your ElementHandler. By using
    element.tagName, you could then identify which element has been passed along:
    return new HTMLRewriter()
    .on("*", new ElementHandler())
    .transform(response)

  68. 4. Title and meta description
    Of course, updating, changing or entirely replacing both
    elements is also possible!

  69. Using element selectors in HTMLRewriter
    Often, you only want to process very specific elements, e.g. <meta> tags – but not all
    of them. Maybe it’s just the meta description you care about?
    new HTMLRewriter()
    .on('meta[name="description"]', new ElementHandler())
    .transform(response)
    More on selectors: https://pa.ag/35xw073

  70. Updating or replacing titles and descriptions is easy…
    (#10): forced overwrite, (#14-22): conditional changes to the meta description
    Element selectors are super powerful yet easy to use:
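
A sketch of the two cases named on this slide: a forced title overwrite and a conditional description change. The replacement texts and the length threshold are invented for illustration; the condition used in the original script is not visible in the transcript.

```javascript
// Hypothetical replacement values and threshold.
const NEW_TITLE = "Edge-generated page title";
const FALLBACK_DESCRIPTION = "Edge-generated meta description for this page.";
const MIN_DESCRIPTION_LENGTH = 50;

// Forced overwrite: replaces whatever title the origin sent.
class TitleHandler {
  element(element) {
    element.setInnerContent(NEW_TITLE);
  }
}

// Conditional change: only touches descriptions that are empty or too short.
class DescriptionHandler {
  element(element) {
    const current = (element.getAttribute("content") || "").trim();
    if (current.length < MIN_DESCRIPTION_LENGTH) {
      element.setAttribute("content", FALLBACK_DESCRIPTION);
    }
  }
}

async function handleRequest(request) {
  const response = await fetch(request);
  return new HTMLRewriter()
    .on("title", new TitleHandler())
    .on('meta[name="description"]', new DescriptionHandler())
    .transform(response);
}
```

A page without any meta description has no element to match, so covering that case would need an additional handler on the `head` element.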

  71. 5. Rewriting links
    Maybe you missed some during your last migration?

  72. HTMLRewriter listening to <a> and <img> tags
    (#29-30): passing href/src attributes to a class which (#20): replaces oldURL with newUrl
    and (#16-18): ensures https-availability
    Based on: https://pa.ag/35llTSo
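
A sketch of such an attribute rewriter, assuming it is applied to `a[href]` and `img[src]`. The host names are placeholders, and limiting the https upgrade to the site's own host is a design choice of this sketch, not something the slide states.

```javascript
// Hypothetical hosts: links to OLD_HOST were missed during a migration.
const OLD_HOST = "old.example.com";
const NEW_HOST = "www.example.com";

function rewriteUrl(value) {
  const replaced = value.split(OLD_HOST).join(NEW_HOST);
  // Upgrade to https only for our own host, not for third-party links.
  return replaced.startsWith("http://" + NEW_HOST)
    ? "https://" + replaced.slice("http://".length)
    : replaced;
}

// One handler class, configured with the attribute it should rewrite.
class AttributeRewriter {
  constructor(attributeName) {
    this.attributeName = attributeName;
  }
  element(element) {
    const value = element.getAttribute(this.attributeName);
    if (value) {
      element.setAttribute(this.attributeName, rewriteUrl(value));
    }
  }
}

async function handleRequest(request) {
  const response = await fetch(request);
  return new HTMLRewriter()
    .on("a", new AttributeRewriter("href"))
    .on("img", new AttributeRewriter("src"))
    .transform(response);
}
```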

  73. 6. Deploying hreflang
    Tell Google about localised versions of your page

  74. HTTP hreflang annotations on the edge
    We’ve just had plenty of HTML, so let’s use HTTP headers instead – of course, both ways
    work just fine:
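
hreflang over HTTP is expressed through the `Link` response header, one `rel="alternate"` entry per language version. A minimal sketch with placeholder URLs (the real script and its URL list are not in the transcript):

```javascript
// Hypothetical language versions of a single page.
const ALTERNATES = [
  { hreflang: "en", href: "https://example.com/en/" },
  { hreflang: "de", href: "https://example.com/de/" },
  { hreflang: "x-default", href: "https://example.com/" },
];

// Builds one comma-separated Link header value from the list above.
function buildHreflangLinkHeader(alternates) {
  return alternates
    .map(({ hreflang, href }) => `<${href}>; rel="alternate"; hreflang="${hreflang}"`)
    .join(", ");
}

async function handleRequest(request) {
  const originResponse = await fetch(request);
  // Headers on a fetched response are immutable, so copy the response first.
  const response = new Response(originResponse.body, originResponse);
  response.headers.append("Link", buildHreflangLinkHeader(ALTERNATES));
  return response;
}
```

In practice the list would be looked up per URL, and every language version has to send the same full set of annotations for them to be valid.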

  75. Verify, e.g. using Chrome Developer Console:
    Network > %URL (disable cache) > Headers > Response Headers

  76. Before you ask: X-Robots-Tag directives are also possible…
    … and the same is true for rel-canonical annotations sent as HTTP headers:
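
Both annotations are plain response headers, so they can be set the same way as hreflang. The rules in this sketch (noindex for PDFs, canonical to the parameter-free URL) are hypothetical examples, not taken from the slide.

```javascript
// Returns the SEO headers to add for a URL, based on two made-up rules.
function seoHeadersFor(requestUrl) {
  const url = new URL(requestUrl);
  const headers = {};
  if (url.pathname.toLowerCase().endsWith(".pdf")) {
    headers["X-Robots-Tag"] = "noindex";
  }
  if (url.search) {
    // Point parameterised URLs at their parameter-free version.
    headers["Link"] = `<${url.origin}${url.pathname}>; rel="canonical"`;
  }
  return headers;
}

async function handleRequest(request) {
  const originResponse = await fetch(request);
  const response = new Response(originResponse.body, originResponse);
  for (const [name, value] of Object.entries(seoHeadersFor(request.url))) {
    // set() overwrites an existing header of the same name.
    response.headers.set(name, value);
  }
  return response;
}
```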

  77. 7. Serving HTTP 503s
    You can serve “proper” under-maintenance pages directly
    from your Cloudflare Worker

  78. Combining HTTP 503 error with a Retry-After header
    Retry-After indicates how long the UA should wait before making a follow-up request:
    “The server is currently unable to handle the request due to a temporary
    overloading or maintenance of the server […]. If known, the length of the
    delay MAY be indicated in a Retry-After header.”
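
A maintenance response along these lines could be built as follows. The retry interval and the HTML are placeholders, and the `Cache-Control` header is an addition of this sketch to keep the error page out of caches.

```javascript
// Hypothetical maintenance window: ask clients to come back in one hour.
const RETRY_AFTER_SECONDS = 3600;

// Builds the init object (status + headers) for the maintenance response.
function maintenanceInit(retryAfterSeconds) {
  return {
    status: 503,
    statusText: "Service Unavailable",
    headers: {
      "Content-Type": "text/html; charset=utf-8",
      "Retry-After": String(retryAfterSeconds),
      "Cache-Control": "no-store",
    },
  };
}

async function handleRequest(request) {
  const html =
    "<!doctype html><title>Maintenance</title><h1>We will be back shortly</h1>";
  // The origin is never contacted while this Worker is active on the route.
  return new Response(html, maintenanceInit(RETRY_AFTER_SECONDS));
}
```

`Retry-After` also accepts an HTTP date instead of a number of seconds, which can be handy when the end of the maintenance window is known.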

  79. 8. Injecting content
    Just in case I’ve somehow not made my point yet – you
    can do REALLY cool stuff and have control over the full
    HTML response – so adding content is easy.

  80. Replacing, prepending, appending … whatever you like?

  81. You could also (dynamically) read from an external feed
    Feeding in content from other sources is simple; the slide shows reading a JSON feed,
    parsing the input and injecting it into the […] of the target page:
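
A sketch of feed-driven injection. The feed URL, the feed shape (`[{ title, url }]`) and the choice of `body` as the target element are all assumptions, since the slide's code and the target element name did not survive in the transcript.

```javascript
// Escapes text before it is placed into HTML, since the feed is external.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Assumed feed shape: [{ "title": "...", "url": "..." }, ...]
function renderFeed(items) {
  const listItems = items
    .map((item) => `<li><a href="${escapeHtml(item.url)}">${escapeHtml(item.title)}</a></li>`)
    .join("");
  return `<ul class="edge-feed">${listItems}</ul>`;
}

class InjectHandler {
  constructor(html) {
    this.html = html;
  }
  element(element) {
    element.append(this.html, { html: true }); // insert before the closing tag
  }
}

async function handleRequest(request) {
  // Page and feed are requested in parallel, then combined.
  const [pageResponse, feedResponse] = await Promise.all([
    fetch(request),
    fetch("https://example.com/feed.json"), // hypothetical feed URL
  ]);
  const items = await feedResponse.json();
  return new HTMLRewriter()
    .on("body", new InjectHandler(renderFeed(items)))
    .transform(pageResponse);
}
```

If the feed is slow or down, every page view suffers with it, so a production version would want a timeout and a fallback to the unmodified page.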

  82. 9. Web performance
    Yeah… actually this is how it all started; and it’s still
    (one of) the most powerful tools to use for it!

  83. Add native lazy loading for images to your HTML mark-up
    Keep in mind: you don’t want to lazy load all of your images (e.g. not the hero image);
    also, if you’re using iframes, you might want to pass “iframe” to the HTMLRewriter:
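
Native lazy loading is just a `loading="lazy"` attribute, so the handler is short. How the hero image is excluded is an assumption of this sketch: it presumes a hypothetical `hero` class on that image.

```javascript
class LazyLoadHandler {
  element(element) {
    // Respect explicit choices, e.g. loading="eager" set by the template.
    if (!element.getAttribute("loading")) {
      element.setAttribute("loading", "lazy");
    }
  }
}

async function handleRequest(request) {
  const response = await fetch(request);
  return new HTMLRewriter()
    // Assumption: hero images carry a (hypothetical) "hero" class.
    .on("img:not(.hero)", new LazyLoadHandler())
    .on("iframe", new LazyLoadHandler())
    .transform(response);
}
```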

  84. Cleaning up HTML code for performance reasons
    E.g. by removing unwanted pre* resource hints (such as preload or prefetch), or by adding async/defer to JS calls:
    More clean-up Worker scripts: https://gist.github.com/Nooshu
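
Two clean-up handlers of the kind described here, as a sketch. Which hints count as "unwanted" depends on the site; removing all `prefetch` links is just an example, and the selectors are assumptions.

```javascript
// Removes the matched element entirely (used here for resource hints).
class RemoveHandler {
  element(element) {
    element.remove();
  }
}

// Adds "defer" to external scripts that declare neither async nor defer.
class DeferHandler {
  element(element) {
    if (!element.hasAttribute("async") && !element.hasAttribute("defer")) {
      element.setAttribute("defer", "");
    }
  }
}

async function handleRequest(request) {
  const response = await fetch(request);
  return new HTMLRewriter()
    // Assumption: prefetch hints are the unwanted ones on this site.
    .on('link[rel="prefetch"]', new RemoveHandler())
    .on("script[src]", new DeferHandler())
    .transform(response);
}
```

Deferring scripts changes when they execute, so inline code that depends on them can break; this is worth testing on the Playground before it goes anywhere near production.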

  85. A detailed guide on how to cache HTML with CF Workers
    More: https://pa.ag/3xk8rdt

  86. And tons of other things…
    You can use Workers to fix broken tracking, allow for
    better accessibility, and much more.

  87. We need to talk responsibility
    Downsides, risks – and more…

  88. [This] dates back to the time of the French Revolution
    At least, if you believe Wikipedia, that is…
    “With great power comes great responsibility.”
    Source: https://pa.ag/35nQSx6

  89. You could essentially change
    everything you wanted.

  90. Great summary over at ContentKing, well worth a read:
    What are the downsides […] What risks are involved?
    Source: https://pa.ag/3xhYUUk

  91. Risk of costs
    10 million requests are included; every additional 1 million currently
    costs $0.50. Not crazy expensive, but in larger-scale
    setups it certainly means additional costs.
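
To put the pricing into numbers, here is a small estimate using only the two figures from this slide. It assumes overage is billed pro rata per request and ignores any base fee, both of which are simplifications.

```javascript
// Figures from the slide; pro-rata billing is an assumption of this sketch.
const INCLUDED_REQUESTS = 10000000;
const PRICE_PER_MILLION_USD = 0.5;

// Estimated extra cost per month for requests beyond the included volume.
function extraCostUsd(monthlyRequests) {
  const billable = Math.max(0, monthlyRequests - INCLUDED_REQUESTS);
  return (billable / 1000000) * PRICE_PER_MILLION_USD;
}

console.log(extraCostUsd(250000000)); // 240 million billable requests
```

With 250 million requests per month this comes to $120 on top, because 240 million requests exceed the included volume.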

  92. PCI compliance
    This might interfere with current processes; at the very
    least, ensure Workers become part of a standardised
    process (e.g. deployment).

  93. Potential conflict in code
    The underlying codebase might do/require something
    that could accidentally be overwritten on the edge

  94. Potential to introduce frontend bugs
    Additional modifications on the edge could result in
    massive debugging effort. Again: proper documentation and
    processes are crucial!

  95. Always synchronise your activities
    with relevant stakeholders!

  96. Where there is good…
    … there is also evil

  97. Yep, you can do evil things with Workers for sure:
    Source: https://pa.ag/3cFq0Nq

  98. Dynamically creating links to “Baccarat Sites”
    “[…] at the CF Workers management area, there was a suspicious Worker listed called
    hang. It had been set to run on any URL route requests to the website:”
    “After further investigation [by sucuri], it was found that the website was actually
    loading SEO spam content through Cloudflare’s Workers service. This service allows
    someone to load external third-party JavaScript that’s not on their website’s
    hosting server.”
    Source: https://pa.ag/3cFq0Nq

  99. The suspicious “hang” Worker injection in detail:
    ▪ The JavaScript Worker first checks for the HTTP request’s user-agent and identifies
      whether it contains Google/Googlebot or naver within the string text.
    ▪ If the user-agent string contains either of these keywords, then the JavaScript makes
      a request to the malicious domain naverbot[.]live to generate the SEO spam links to
      be injected into the victim’s website.
    ▪ After this step, the Worker then injects the retrieved SEO spam link data right
      before the final […]
    Source: https://pa.ag/3cFq0Nq

  100. If you’re now wondering how to distribute Workers…?
    Source: https://pa.ag/3zq0Mwd

  101. While you’re at it:
    Use at least two-factor authentication with Cloudflare

  102. Care for the slides? www.pa.ag
    twitter.com/peakaceag
    facebook.com/peakaceag
    Take your career to the next level: jobs.pa.ag
    [email protected]
    Bastian Grimm
    [email protected]
