
Save Yourself From A Disaster


The only certain thing is that it’s not a matter of IF there’ll be a disaster but rather WHEN, so it’s better not to be caught off guard. I’ll guide you through the details of each step I took to make my websites disaster-proof, while keeping my cloud spending on a tight leash (so you can do this too).

Fabio Cicerchia

May 26, 2021


Transcript


  2. Hello!
    I AM FABIO CICERCHIA
    SW & Cloud Engineer @
    You can find me at: @fabiocicerchia


  3. https://www.ovh.com/world/news/press/cpl1787.fire-our-strasbourg-site


  4. What can we learn from the latest major cloud incident (i.e. the
    OVH datacenter fire)?
    Do not put all your eggs in one basket!


  5. The only certain thing is that it's not a matter of
    IF there'll be a disaster, but rather WHEN.
    So it's better not to be caught off guard.


  6. https://slate.com/technology/2014/08/shark-attacks-threaten-google-s-undersea-internet-cables-video.html


  7. https://www.reddit.com/r/DataHoarder/comments/bccfl6/forklift_accident/ekqwycj/


  8. https://www.zdnet.com/article/company-shuts-down-because-of-ransomware-leaves-300-without-jobs-just-before-holidays/
    https://www.cybersecurity-insiders.com/ransomware-might-likely-force-travelex-into-bankruptcy/
    https://www.bankinfosecurity.com/hospital-ransomware-attacks-surge-so-now-what-a-8987


  9. https://blog.cloudflare.com/how-verizon-and-a-bgp-optimizer-knocked-large-parts-of-the-internet-offline-today/


  10. https://www.wired.com/story/far-right-extremist-allegedly-plotted-blow-up-amazon-data-centers/


  11. https://www.reddit.com/r/cscareerquestions/comments/6ez8ag/accidentally_destroyed_production_database_on/


  12. https://betterprogramming.pub/how-a-cache-stampede-caused-one-of-facebooks-biggest-outages-dbb964ffc8ed


  13. https://twitter.com/fabiocicerchia/status/1338465077998071809


  14. https://twitter.com/gitlabstatus/status/826591961444384768


  15. So here are the details of each step I took to make my
    website disaster-proof while keeping my cloud spending on
    a tight leash (so you could do this too).
    Shit happens, deal with it!
    Better safe than sorry!


  16. I'm running a bunch of very small websites (with a very simple
    infrastructure topology)
    and I wanted to put something into practice on a budget.
    So, I've decided to go multi-cloud.


  17. Evolution
    I started with an infrastructure that looked like this:


  18. Evolution
    Then, I ended up with something like this:


  19. OUTLINE
    This is the outline plan I followed to upgrade my infrastructure:
    1. Secure the Database
    2. Secure the Storage
    3. Redundancy of Database
    4. Redundancy of Storage
    5. Redundancy of Web Servers
    6. Redundancy of DNS
    7. Billing Impact
    8. Manual Configurations
    9. Disaster Recovery Plan
    10. Play with Providers


  20. DISCLAIMER
    I wrote this ebook: https://leanpub.com/savefromdisaster


  21. SAVE YOURSELF
    FROM A DISASTER #1
    Secure the
    Database


  22. #1: Database - Backups
    Start doing DB backups (with mysqldump or xtrabackup) and define a policy for RTO and RPO, so
    you'll know how much loss is acceptable (there's always some loss, even if minimal).
    RTO defines how long the infrastructure can be down, and RPO defines how much data you can
    afford to lose (i.e. how old the latest backup is).
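To make the RPO idea concrete, here is a minimal Python sketch (not from the deck) that flags when the newest backup is older than the RPO allows. The function name and the sample timestamps are assumptions for illustration only.

```python
from datetime import datetime, timedelta, timezone


def rpo_breached(last_backup: datetime, now: datetime, rpo: timedelta) -> bool:
    """Return True when the newest backup is older than the RPO allows."""
    return now - last_backup > rpo


# Hypothetical example: the RPO is 1 hour and the last backup finished 90 minutes ago.
now = datetime(2021, 5, 26, 12, 0, tzinfo=timezone.utc)
last = now - timedelta(minutes=90)
print(rpo_breached(last, now, timedelta(hours=1)))  # True
```

A check like this could run from cron and send an alert, so a silently failing backup job doesn't go unnoticed until the day you need it.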


  23. #1: Database - Rotation
    To rotate the DB backups we could simply use logrotate.
    We could start with a basic daily backup rotation (or any interval you have defined as RPO):
    /var/backups/daily/alldb.sql.gz {
        notifempty
        daily
        rotate 7
        nocompress
        create 640 root adm
        dateext
        dateformat -%Y%m%d-%s
        postrotate
            mysqldump -u$USER -p$PASSWD --single-transaction --all-databases | gzip -9f > /var/backups/daily/alldb.sql.gz
        endscript
    }
    This will create the rotated DB backups on the same server where logrotate is running (most likely
    the same DB instance). We have seen that this is very wrong, so you must always store the backups
    somewhere else (and also offline).
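To illustrate what a retention of 7 means in practice, here is a small Python sketch of the same idea: keep the newest N dated backups and treat the rest as expired. This is not how logrotate is implemented; the function name and the simplified `-YYYYMMDD` suffix are assumptions for the example.

```python
from __future__ import annotations


def split_by_retention(backups: list[str], keep: int = 7) -> tuple[list[str], list[str]]:
    """Split dated backup names into (kept, expired), newest first.

    Assumes every name ends in a sortable date suffix such as -YYYYMMDD.
    """
    ordered = sorted(backups, reverse=True)
    return ordered[:keep], ordered[keep:]


# Ten hypothetical daily dumps, 1st to 10th of May 2021.
names = [f"alldb.sql.gz-202105{day:02d}" for day in range(1, 11)]
kept, expired = split_by_retention(names, keep=7)
print(len(kept), kept[0])  # 7 alldb.sql.gz-20210510
print(expired)             # the three oldest dumps (3rd, 2nd, 1st of May)
```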


  24. #1: Database - Remote Storage
    With a simple change, we can upload to an AWS S3 bucket (using a cold storage class, as the backups are rarely accessed):
        lastaction
            BUCKET="..."
            REGION="eu-west-1"
            # $FORMAT (the date suffix of the rotated file) is not defined on this slide
            aws s3 sync /var/backups/daily "s3://$BUCKET/daily/" --region $REGION --exclude "*" --include "*.gz-$FORMAT*" \
                --storage-class GLACIER
        endscript


  25. #1: Database - Local Storage
    Just run rsync (better if scheduled) to download the backups locally to an external hard drive:
    rsync -e "ssh -i $HOME/.ssh/id_rsa" --progress -auv @:/var/backups ./path/to/backups
    There you go: you now have backups on-site (for a faster restore), remote on another provider (for more
    reliability), and offline (for more peace of mind).


  26. #1: Database - Security
    Remember the good practices and do not forget about GDPR: the backups must be stored encrypted
    at rest (using a key instead of a plain password).


  27. #1: Database - Restore
    Once everything is backed up, you need to think about how to restore the dump properly, or at least
    switch the connection to the other node. I'll cover this in the Disaster Recovery Plan post.


  28. SAVE YOURSELF
    FROM A DISASTER #2
    Secure the
    Storage


  29. #2: Storage - Backups
    Let's back them up on an external (cloud) storage disk.
    Why not offline? Because the burden of re-uploading all the files stored in a shared folder (which are
    usually not so few) would make the restore process very slow.


  30. #2: Storage - Option #1: Remote VM
    Let's use a simple cronjob every hour to sync the whole shared folder to a remote location:
    rsync -auv --progress /path/to/shared/folder :/path/to/shared/folder
    Some providers offer pluggable storage, which would be perfect to detach and reattach to
    another node (only if using the same provider). Alternatively, the folder on the VM could be exported and
    mounted over NFS (with some performance degradation).
    By using cheap storage you could cut the cost compared with a cloud-native one. Some providers
    offer 2TB for ~$10/month, like TransIP or AlphaVPS. If you combine them you'll end up with a
    slightly higher cost (than using only one) but definitely greater redundancy.


  31. #2: Storage - Option #2: Cloud-Native Storage
    Still, with a simple cronjob we could sync the whole shared folder to an S3 bucket (using cold storage
    access):
    aws s3 sync --storage-class GLACIER /path/to/shared/folder s3:///
    Sending data into AWS S3 is free, but taking it out costs roughly an extra $0.09 per GB, so if
    you have lots of data you might want to consider this very carefully: 1TB of data could
    cost you ~$23/month to store, plus ~$90 to restore.
    A cheaper provider for Cloud-Native Storage is Scaleway with ~0.002€/GB/month (1TB = ~€2.5).
    File permissions are lost when saving to AWS S3, so when restoring you need to
    double-check that they are correct.
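The restore figure is simple arithmetic, sketched here in Python. The $0.09/GB rate comes from the slide; treating 1TB as 1000GB and the function name are assumptions for the example, and real pricing varies by region and tier.

```python
GB_PER_TB = 1000  # assumption: decimal terabyte


def restore_cost_usd(size_gb: float, egress_per_gb: float = 0.09) -> float:
    """One-off cost of pulling size_gb out of the bucket at a flat per-GB rate."""
    return size_gb * egress_per_gb


one_tb_restore = round(restore_cost_usd(1 * GB_PER_TB), 2)
print(one_tb_restore)  # 90.0
```

Running the same function with your own data size before choosing a provider gives a quick idea of what a full restore would cost on the day of the disaster.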


  32. #2: Storage - Restore
    Once everything is backed up, you need to think about how to restore the data properly, or at least
    switch the access on-the-fly. I'll cover this in the Disaster Recovery Plan post.


  33. SAVE YOURSELF
    FROM A DISASTER #3
    Redundancy of
    Database


  34. #3: Database - Redundancy
    Create a cluster with at least a primary/secondary structure. 3 nodes are
    recommended, so we'll have the flexibility to do planned maintenance without suffering and/or
    affecting the performance of the whole cluster.


  35. #3: Database - Spin up a secondary node
    Create another VM somewhere else (better if in another availability zone/region/provider), then
    configure a MySQL/MariaDB/Percona/... instance and plug it in as a secondary node.
    We can even give it fewer resources and make it the write-only node (if we have little write
    activity; otherwise make it the read-only one).


  36. #3: Database - Balancing requests
    I prefer to use something like HAProxy as a TCP load balancer or (even better) ProxySQL (which
    has a nice query caching capability). I'd go with ProxySQL load balancing the 2 nodes created, then just
    change the database connection string in the application and the setup is done (we could even partition
    the queries and define which node they should be sent to).
    In my case, a primary/secondary topology would be more than enough, but I went for a primary/primary
    configuration (you can follow a simple tutorial or a more structured configuration) without balancing
    (because each web node accesses its local DB instance).
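The idea of partitioning queries between nodes can be sketched in a few lines of Python. This is only an illustration of read/write splitting, not ProxySQL's rule engine; the node names and the prefix list are hypothetical, and a real setup must also handle cases like `SELECT ... FOR UPDATE`.

```python
READ_PREFIXES = ("select", "show")  # assumption: statements we treat as read-only


def pick_node(query: str, primary: str = "db-primary", secondary: str = "db-secondary") -> str:
    """Send read-only statements to the secondary, everything else to the primary."""
    statement = query.lstrip().lower()
    return secondary if statement.startswith(READ_PREFIXES) else primary


print(pick_node("SELECT * FROM wp_posts"))              # db-secondary
print(pick_node("UPDATE wp_posts SET post_title='x'"))  # db-primary
```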


  37. #3: Database - Security
    Replication must happen over a secure connection, so you need to generate a certificate and use it.


  38. SAVE YOURSELF
    FROM A DISASTER #4
    Redundancy of
    Storage


  39. #4: Storage - Distributed Storage
    We could use a distributed filesystem like Ceph, DRBD, GlusterFS, or ZFS, but then it wouldn't be
    on a budget, and the complexity introduced by those tools would need to be addressed properly. I will
    not cover it here due to the cost of the extra nodes and extra configuration needed - your time has a
    cost too (but if your filesystem changes frequently this is your only option).


  40. #4: Storage - Ad-Hoc Solutions
    ● How to build a Ceph Distributed Storage Cluster on CentOS 7
    ● How to Setup DRBD to Replicate Storage on Two CentOS 7 Servers
    ● How To Create a Redundant Storage Pool Using GlusterFS on Ubuntu 18.04
    ● An Introduction to the Z File System (ZFS) for Linux


  41. #4: Storage - Quick & Dirty: Cross Sync
    Let's use a simple cronjob every hour to sync the whole shared folder to all remote locations.
    Server #1:
    rsync -e "ssh -i $HOME/.ssh/somekey" -auv --progress /path/to/shared/folder/ [email protected]:/path/to/shared/folder
    Server #2:
    rsync -e "ssh -i $HOME/.ssh/somekey" -auv --progress /path/to/shared/folder/ [email protected]:/path/to/shared/folder
    Remember, this is not a proper distributed solution. rsync may look old-fashioned, but it has
    saved me many times. This approach is not feasible for "real-time" synchronization; it is only suited to
    (very) infrequent changes. Distributed filesystems like GlusterFS (or Ceph, or DRBD) are solutions for
    the long run.
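With more than two servers, writing one rsync line per peer by hand gets error-prone. As a rough sketch, the commands could be generated from a list of peers; the flags mirror the ones on the slide, while the peer names, the key path and the function name are hypothetical.

```python
from __future__ import annotations


def rsync_commands(local_path: str, peers: list[str], key: str = "~/.ssh/somekey") -> list[list[str]]:
    """Build one rsync command (as an argument list) per remote peer."""
    remote_path = local_path.rstrip("/")
    return [
        ["rsync", "-e", f"ssh -i {key}", "-auv", "--progress", local_path, f"{peer}:{remote_path}"]
        for peer in peers
    ]


# Hypothetical peers; replace with the other servers in your setup.
cmds = rsync_commands("/path/to/shared/folder/", ["sync@server2.example", "sync@server3.example"])
for cmd in cmds:
    print(" ".join(cmd))
```

Each argument list can then be handed to `subprocess.run(cmd, check=True)` from an hourly cron job.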


  42. #4: Storage - Security
    Remember to secure the connection between each host and the others (e.g. with a firewall).


  43. SAVE YOURSELF
    FROM A DISASTER #5
    Redundancy of
    Web Servers


  44. #5: Web Servers - Duplicate VM
    Nowadays many cloud providers (and virtualization platforms) let you take a
    snapshot of a VM and then restore/clone it. I won't cover it in this tutorial as it would increase the overall
    cost of the infrastructure. Still, sometimes (depending on the application) cloning the VM can save
    a lot of time compared to the other method I'm proposing below.


  45. #5: Web Servers - Docker
    We live in 2021: everyone is running containers and wishing they had a k8s cluster to play with. So, let's
    convert the simple applications into containers; there are a lot of ready-made images on Docker
    Hub.


  46. #5: Web Servers - Docker Swarm
    Let's start nice and easy, with Docker Swarm (which eliminates the extra complexity of Kubernetes) on
    ONE node (then we can scale out as much as we like).
    First, set up your nodes. I'm going to use standard images for my dockerized infrastructure, no custom
    images (for now - I've got pretty simple configurations). I've picked Bitnami images, as they cover a lot
    of scenarios and provide pre-packaged images for most of the popular server software (more reasons
    why to pick them).
    If you really want to use custom images you could publish them publicly for free on Docker Hub
    (which recently introduced some limitations) or on Canister. After the announcement from Docker Hub about
    pull rate limits, AWS decided to offer public repositories (and they are almost free if you don't
    exceed 500GB/month when not logged in or 5TB/month when logged in).


  47. #5: Web Servers - Docker Compose
    This is an example of a WordPress website configured with docker-compose:
    version: "3.9"
    services:
      wordpress:
        image: wordpress:5.7.0
        ports:
          - 8000:80
        deploy:
          replicas: 1
          restart_policy:
            condition: on-failure
        extra_hosts:
          - "host.docker.internal:host-gateway"
        environment:
          WORDPRESS_DB_HOST: host.docker.internal:3306
          WORDPRESS_DB_USER: ***
          WORDPRESS_DB_PASSWORD: ***
          WORDPRESS_DB_NAME: ***
        volumes:
          - /path/to/wp-content:/var/www/html/wp-content
        healthcheck:
          test: ["CMD", "curl", "-f", "http://localhost"]
          interval: 30s
          timeout: 10s
          retries: 3


  48. #5: Web Servers - Ingress
    When using Docker Swarm with lots of containers and services (each bound to a dedicated port), you'll
    need an ingress system to route the requests to the right service. You could use one of the 2 most popular
    solutions: Nginx or Traefik.


  49. #5: Web Servers - Ingress
    I decided to use a simple bitnami/nginx with a custom config (pretty straightforward proxy):
    version: "3.9"
    services:
      client:
        image: bitnami/nginx:1.19.8
        ports:
          - 80:8080
          - 443:8443
        deploy:
          replicas: 2
          restart_policy:
            condition: on-failure
        extra_hosts:
          - "host.docker.internal:host-gateway"
        volumes:
          - /root/docker-compose/nginx/lb.conf:/opt/bitnami/nginx/conf/server_blocks/lb.conf:ro
          - /etc/letsencrypt:/etc/letsencrypt


  50. #5: Web Servers - TLS Termination
    This is the tricky part. If you have already bought the certificates (e.g. from SSLs) you're good for 1 year
    (at least). If you don't want to buy them and want to rely on Let's Encrypt, you'll need to be ready to
    sweat a bit to set it up. Setting it up on one node is pretty simple, but if you need to replicate it on
    multiple nodes then you need to get creative.
    One possible solution is to have a primary node that generates (or renews) the certificate(s)
    and then copies them to the other servers:
    rsync -e "ssh -i $HOME/.ssh/somekey" -auv --progress /etc/letsencrypt/ [email protected]:/etc/letsencrypt
    rsync -e "ssh -i $HOME/.ssh/somekey" -auv --progress /etc/letsencrypt/ [email protected]:/etc/letsencrypt


  51. #5: Web Servers - Kubernetes
    Kubernetes is more complex than Swarm and requires more time to configure, but once that's done there
    need be no vendor lock-in for you (as many providers offer managed k8s), and it is also more
    extensible.
    If you already have a Docker Swarm cluster and want to migrate, try following these guides:
    ● From Docker-Swarm to Kubernetes – the Easy Way!
    ● Translate a Docker Compose File to Kubernetes Resources
    Remember to either use a dockerized database or rely on cloud-native managed solutions.


  52. SAVE YOURSELF
    FROM A DISASTER #6
    Redundancy of
    DNS


  53. #6: DNS - DNS Round-Robin
    This is not really a practical solution because whenever one server is down the traffic will still be routed
    to that server, and your customers will be affected. If you have 2 A records pointing to 2 different
    servers you could potentially lose 50% of your traffic.
    If you need to quickly remove the unresponsive server (manually), you need to take into account
    the DNS TTL. If it is set to a high value (like 24h or - even worse - a week) you cannot do anything to
    change that, other than wait. There are pros and cons to setting either a low or a high TTL.
    Usually, the DNS propagation time is around 24 hours, but it could also be around 72 hours. This is
    because ISPs can override the TTL you have specified, so the time for your changes to propagate can be
    longer than expected.
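The 50% figure follows from a simple ratio, sketched below in Python. It assumes resolvers spread requests evenly across the A records and that clients don't retry another record on failure; the function name is made up for the example.

```python
def failing_share(total_records: int, down_records: int) -> float:
    """Fraction of requests landing on a dead server under plain DNS round-robin."""
    if total_records <= 0 or not 0 <= down_records <= total_records:
        raise ValueError("need 0 <= down_records <= total_records and total_records > 0")
    return down_records / total_records


print(failing_share(2, 1))  # 0.5
```

With 3 A records and one server down, the same reasoning gives roughly a third of the traffic, and it stays that way until the stale record's TTL expires.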


  54. #6: DNS - Secondary DNS
    By having multiple nameservers you can have a fallback in case your DNS provider is having issues (very
    unlikely but possible).
    Generally, you need to manually keep the records aligned between the two providers. Sometimes
    the DNS provider will let you manage those records by pulling the data from your
    primary provider or by giving you API access so you can do it programmatically.
    RFC 1035 (Domain Names - Implementation And Specification), in fact, proposes having more than
    one nameserver configured.


  55. #6: DNS - Secondary DNS
    First of all, you need to verify that your registrar has a good DNS management panel. Some
    services that offer such functionality are, for example, FreeDNS (premium version ~$5/year),
    DNSMadeEasy, and many more. Cloudflare can act as Secondary DNS but the setup seems quite long;
    DNSimple has out-of-the-box integration with it (but you cannot use any of the functionality offered by
    CF - which makes it a bit of a loss).
    I went with PremiumDNS which claims that it "keeps your website running, even when flooded with
    traffic. It secures the very deepest level of the Domain Name System (DNS), preventing Distributed
    Denial of Service (DDoS) attacks, and giving you 100% uptime, guaranteed."
    A great point is that you can buy it even for a 3rd-party domain.
    You can check the nameservers by running:
    dig +short NS example.com
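To check whether the nameservers really belong to more than one provider, the output of that command can be grouped by domain. The Python sketch below is a naive illustration: it groups by the last two labels of each hostname (which breaks for suffixes like `co.uk`), and the sample hostnames and function name are made up.

```python
from __future__ import annotations


def ns_providers(dig_output: str) -> set[str]:
    """Group nameserver hostnames by their last two labels (naive provider guess)."""
    providers = set()
    for line in dig_output.splitlines():
        host = line.strip().rstrip(".").lower()
        if host:
            providers.add(".".join(host.split(".")[-2:]))
    return providers


# Hypothetical output of `dig +short NS example.com`.
sample = "ns1.provider-a.example.\nns2.provider-a.example.\nns1.provider-b.example.\n"
print(len(ns_providers(sample)))  # 2
```

If the result contains a single entry, all your nameservers sit with one provider and there is no DNS fallback yet.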


  56. #6: DNS - Inspecting TTLs
    PremiumDNS has a TTL of 30m on the NS records, so you can be unavailable for roughly that amount of
    time (provided the ISP is not overriding the TTL). Cloudflare has a TTL of 6 hours on the NS records.


  57. #6: DNS - Manual Switch
    When everything goes south (DNS issues do happen sometimes) and you know you are going
    to be affected for too long, the last resort is to manually change the authoritative name servers
    registered on the domain (you can do that via your registrar) and point them to a fallback DNS (you
    could set up an offline clone of your records in Cloudflare).


  58. SAVE YOURSELF
    FROM A DISASTER #7
    Billing
    Impact


  59. #7: Billing - Original Cost
    ● DigitalOcean VPS: $5/mo x 12 = $60
    ● Setup hours: $0
    Monthly Cost: $5
    Annual Cost: $60


  60. #7: Billing - Upgrade Cost
    ● PremiumDNS: $2.88/yr x 5 domains = $14.4/yr
    ● DigitalOcean VPS: $5/mo x 12 = $60
    ● Hetzner VPS: €3.04/mo x 12 = €36.48 ($43.07)
    ● AWS S3: ~$0.07 x 365 = $26
    ● Setup hours: $? (fill in the cost of your time to follow this guide)
    Monthly Cost: ~$12
    Annual Cost: $143.47+
    That's more than 2x the original price, you might say, and you wouldn't be far wrong. Obviously, for
    different original prices, it won't necessarily be 2x.
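The totals can be double-checked with a few lines of Python. All figures are the ones from the slide (with Hetzner already converted to dollars and S3 rounded to $26); the variable names are just for the example, and setup hours are left out.

```python
# Annual costs in USD, as listed on the slide.
upgrade_costs_usd = {
    "PremiumDNS (5 domains)": 2.88 * 5,
    "DigitalOcean VPS": 5 * 12,
    "Hetzner VPS": 43.07,
    "AWS S3": 26.0,
}
original_annual_usd = 60

annual = round(sum(upgrade_costs_usd.values()), 2)
monthly = round(annual / 12, 2)
ratio = round(annual / original_annual_usd, 2)
print(annual, monthly, ratio)  # 143.47 11.96 2.39
```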


  61. #7: Billing - Cost Optimization
    Note: The following optimizations will only show up on the annual bill; if you take action immediately
    you won't reach the 2x cost.
    Let's start cutting what we know is not really necessary for our domains.
    First, we need to visualize the spending.
    Hetzner and DigitalOcean are the 2 biggest chunks of our spending (which was predictable). I'll try to
    cover some scenarios to optimize the cost.


  62. #7: Billing - Cost Optimization


  63. #7: Billing - Reduce Backup Retention
    My retention for AWS S3 was the following:
    ● 24 hourly backups
    ● 31 daily backups
    ● 12 weekly backups
    ● 3 monthly backups
    I had 70 of them, which were taking 3GB of space and costing just a few cents per year.


  64. #7: Billing - Avoid the Static Clone
    This is the most space-consuming item on AWS S3 since it is a mirror of your websites. It is not necessary as
    we have redundancy on our services; it was done just as a last resort in case everything burns down, so at
    least we could serve the static content to the users and let them access the information (even though they
    cannot interact with the dynamic part of the website).


  65. #7: Billing - Replace DigitalOcean
    We could replace our $60 spending with an additional Hetzner VPS (in another region) and move from
    $96.96 to $86.14 (saving $10/yr).


  66. #7: Billing - Replace DigitalOcean
    The concern is that, even if we have a VPS in Germany and another in Finland, we are relying on ONE
    provider (I know, there's vendor lock-in - but I have IaC fully configured so I can switch provider in a
    matter of minutes):


  67. #7: Billing - DNS Fallback
    If you don't have very important, or profitable, applications/websites, and given how rarely a DNS
    provider goes down, you might want to save some money on this. If you have many websites the cost can
    quickly become high, even if it's a few dollars per domain.
    If you make money, it's advisable to have a fallback DNS (or a premium service with guaranteed 100%
    uptime), especially given the low cost (and impact on your bill).


  68. #7: Billing - Sum Up
    I don't run critical (or very profitable) applications, so I can give up (for the moment) on having multiple
    cloud providers for strong HA, in favour of saving some money.
    This will increase my spending from $60 to $88 ($112* without the free credits), which is not optimal
    (compared to the initial figure): it's an extra $28/year (~$2.5/month - just a couple of coffees) to have
    peace of mind.
    *Note: I've got some free credits on AWS so the backups are free (at least for some time - not
    forever).


  69. #7: Billing - Sum Up


  70. SAVE YOURSELF
    FROM A DISASTER #8
    Manual
    Configurations


  71. #8: Manual Configs - Tools
    Ansible and Terraform are must-haves in our toolbox: these two will be your best friends in documenting
    the infrastructure and making everything replicable, so you can scale up/out easily.
    Those 2 tools are vendor-agnostic, so they can work with any provider and save you from locking yourself
    into a provider-specific tool like AWS CloudFormation / CDK.
    Other tools for provisioning are Puppet, Chef and SaltStack.
    Remember to always keep the Infrastructure as Code up-to-date, and avoid any configuration drift
    whatsoever.


  72. #8: Manual Configs - Creating VMs
    For creating the infrastructure we'll use Terraform.
    This is an example of how to create a new VM (or a Droplet, as DigitalOcean calls it):
    # Create a web server
    resource "digitalocean_droplet" "web" {
      image      = "ubuntu-20-04-x64"
      name       = "web-1"
      region     = "fra1"
      size       = "s-1vcpu-1gb"
      monitoring = true
      ssh_keys   = [digitalocean_ssh_key.default.fingerprint]
      depends_on = [
        digitalocean_ssh_key.default,
      ]
    }
    Just like that, we could simply copy & paste it to create many others (even though it is best practice to
    use the count argument).


  73. #8: Manual Configs - Provisioning
    ---
    - name: "Initial Provisioning"
      hosts: all
      become: true
      vars_files:
        - ../vars/init.yml
      roles:
        - oefenweb.swapfile
        - oefenweb.apt
        - ahuffman.resolv
        - ajsalminen.hosts
        - geerlingguy.ntp
        - geerlingguy.firewall
        - dev-sec.os-hardening
        - dev-sec.ssh-hardening
        - uzer.crontab
      tasks:
        - name: Add user manager
          ansible.builtin.user:
            name: "manager"
            shell: /bin/bash
            generate_ssh_key: yes
            ssh_key_type: rsa
            ssh_key_bits: 4096
        - name: Allow manager to have passwordless sudo
          lineinfile:
            dest: /etc/sudoers
            state: present
            insertafter: '^root'
            line: 'manager ALL=(ALL) NOPASSWD: ALL'
            validate: 'visudo -cf %s'
        - name: "Logrotate Configs"
          copy:
            src: "{{ item.src }}"
            dest: "{{ item.dst }}"
          with_items: "{{ app_logrotate_config_items }}"
        - name: Set the policy for the INPUT chain to DROP
          ansible.builtin.iptables:
            chain: INPUT
            policy: DROP


  74. SAVE YOURSELF
    FROM A DISASTER #9
    Disaster
    Recovery Plan


  75. #9: DRP - Plan Ahead
    Try to answer the following questions honestly:
    ● What are your weaknesses?
    ● What are your SPOFs?
    ● What if the DNS provider goes down?
    ○ How do we switch name servers?
    ● What will you do if your HDD fails?
    ● What if you get hit by ransomware?
    ○ How do we make sure we never have to pay a ransom?
    ● What needs to be restored?
    ● Do we need to point the DB to a fallback node?
    ● How do we restore the backups?
    ○ Where are the backups stored?
    ○ Who can access them?
    ● How do we serve static content when everything is lost?
    These are just some questions to help you get your head around the Disaster Recovery Plan you'll outline.


  76. #9: DRP - Possible Failures
    ● Application
    ● Network
    ● Data Center
    ● Citywide
    ● Regional
    ● National
    ● Multinational


  77. #9: DRP - Outline
    What are the RTO and RPO for your plan?
    ● RTO, Recovery Time Objective: the time you have to bring the service back online before
    the disruption becomes unacceptable for your users.
    ● RPO, Recovery Point Objective: the maximum span of time for which data may be lost (a
    backup every hour gives an RPO of 1h).


  78. SAVE YOURSELF
    FROM A DISASTER #10
    Play with
    Providers


  79. #10: Play with Providers
    For example, now that you have everything in containers you could migrate to Kubernetes, or maybe to
    cloud-native solutions for containers like AWS ECS, GCP GCE, Azure ACI, or even to serverless (since
    AWS lets you serve traffic from a Docker image).


  80. #10: Play with Providers
    Is AWS better, or are Azure or GCP more convenient? Don't fall into vendor lock-in: mix them up.
    Yeah, but then they don't play along out-of-the-box. Who cares? Make them work FOR you. You might
    need to spend some more time to get it right, but in the long run (it's always about the long run - if you
    only focus on now, just don't experiment and run for cover) you'll get the benefit of ALL the services they
    can offer you.


  81. #10: Play with Providers
    It doesn't have to be a big player: great solutions can also come from non-mainstream providers. I've had
    experience with bare-metal, AWS, Contabo, Hetzner, DigitalOcean, Aruba, TransIP, Scaleway, Linode,
    FlareVM, Heroku, OVH.


  82. THANK YOU


  83. QUESTIONS?


  84. 2 LITTLE GIFTS
    ——————————————————————————————
    FREE ebook:
    https://leanpub.com/savefromdisaster
    ——————————————————————————————
    GITHUB REPO:
    https://github.com/fabiocicerchia/save-from-disaster
