To The Cloud

Originally presented at an internal event for BPDTS, this presentation covers how to think about deploying to the cloud, for people who may be more familiar with on-premises infrastructure.

The Scale Factory

September 25, 2019

Transcript

  1. WHO AM I?_ 18 years of experience. Previously: Lead Developer & Systems Administrator, Trutap; Senior Network and Systems Engineer, DSVR. Now: Founder/CTO/CEO, The Scale Factory
  2. FOUR KEY METRICS_ Lead Time; Deployment Frequency; Mean Time To Recovery (MTTR); Change Fail Percentage
  3. Aspect of Software Delivery Performance*, by performance level (Elite / High / Medium / Low)
    Deployment frequency: For the primary application or service you work on, how often does your organization deploy code to production or release it to end users?
      Elite: On-demand (multiple deploys per day)
      High: Between once per day and once per week
      Medium: Between once per week and once per month
      Low: Between once per month and once every six months
    Lead time for changes: For the primary application or service you work on, what is your lead time for changes (i.e., how long does it take to go from code committed to code successfully running in production)?
      Elite: Less than one day
      High: Between one day and one week
      Medium: Between one week and one month
      Low: Between one month and six months
    Time to restore service: For the primary application or service you work on, how long does it generally take to restore service when a service incident or a defect that impacts users occurs (e.g., unplanned outage or service impairment)?
      Elite: Less than one hour
      High: Less than one day (a)
      Medium: Less than one day (a)
      Low: Between one week and one month
    Change failure rate: For the primary application or service you work on, what percentage of changes to production or released to users result in degraded service (e.g., lead to service impairment or service outage) and subsequently require remediation (e.g., require a hotfix, rollback, fix forward, patch)?
      Elite: 0-15% (b, c)
      High: 0-15% (b, d)
      Medium: 0-15% (c, d)
      Low: 46-60%
  4. ACCELERATE STATE OF DEVOPS REPORT_ 2019 Findings: “Cloud continues to be a differentiator for elite performers and drives high performance. The use of cloud, as defined by NIST Special Publication 800-145, is predictive of software delivery performance and availability. The highest performing teams were 24 times more likely than low performers to execute on all five capabilities of cloud computing.”
  5. WHY THE CLOUD?_ Increased agility; Unparalleled scalability; Enterprise-grade services as standard; AWS run data centres better than you do; Lower cost?
  6. SOMEBODY ELSE’S PROBLEM_ “An SEP is something we can't see, or don't see, or our brain doesn't let us see, because we think that it's somebody else's problem.... The brain just edits it out, it's like a blind spot. If you look at it directly you won't see it unless you know precisely what it is. Your only hope is to catch it by surprise out of the corner of your eye.”
  7. ON PREM_ [Diagram: HARDWARE and a HYPERVISOR hosting four VMs, each with VIRTUAL HW, a KERNEL and three APPs. The stack is labelled "Infrastructure" and "Applications".]
  8. CLOUD_ [Diagram: HARDWARE and a HYPERVISOR hosting four VMs, each with VIRTUAL HW, a KERNEL and three APPs. The stack is labelled "Somebody Else's Problem" and "Applications".]
  9. MAIN CHANGES_ APIs for everything; You don’t control network addressing; Security is a shared responsibility; Failure modes are different; Billing is more complex; Backups are easier; You can easily refactor infrastructure
  10. resource "aws_instance" "web" {
        instance_type = "t2.micro"
        # Lookup the correct AMI based on the region
        # we specified
        ami = "${lookup(var.aws_amis, var.aws_region)}"
        # The name of our SSH keypair we created above.
        key_name = "${aws_key_pair.auth.id}"
        # Our Security group to allow HTTP and SSH access
        vpc_security_group_ids = ["${aws_security_group.default.id}"]
        subnet_id = "${aws_subnet.default.id}"
        provisioner "remote-exec" {
          inline = [
            "sudo apt-get -y update",
            "sudo apt-get -y install nginx",
            "sudo service nginx start",
          ]
        }
      }
  11. {
        "variables": {
          "aws_access_key": "",
          "aws_secret_key": ""
        },
        "builders": [{
          "type": "amazon-ebs",
          "access_key": "{{user `aws_access_key`}}",
          "secret_key": "{{user `aws_secret_key`}}",
          "region": "us-east-1",
          "source_ami_filter": {
            "filters": {
              "virtualization-type": "hvm",
              "name": "ubuntu/images/*ubuntu-xenial-16.04-amd64-server-*",
              "root-device-type": "ebs"
            },
            "owners": ["099720109477"],
            "most_recent": true
          },
          "instance_type": "t2.micro",
          "ssh_username": "ubuntu",
          "ami_name": "packer-example {{timestamp}}"
        }]
      }
  12. user { 'jtopper':
        ensure => present,
        uid    => '1000',
        gid    => '1000',
        shell  => '/bin/bash',
        home   => '/home/jtopper'
      }
      service { 'httpd':
        ensure => 'running'
      }
  13. [Wardley map. Vertical axis: Value Chain (Visible to Invisible). Horizontal axis: Evolution (Genesis, Custom, Product, Commodity). Components: Customer, MySQL, Compute, Storage, Data Centre, Power, HA Scripts, Monitoring, Config Mgmt, Networking.]
  14. n=1

  15. n=6

  16. n=3

  17. [Diagram: VPC (eu-west-1) spanning three AZs (eu-west-1a, eu-west-1b, eu-west-1c), with three Public Subnets and three Private Subnets.]
  18. n=6

  19. n=1

  20. PLAN FOR FAILURE_ Build a list of potential failure scenarios; Understand how your platform will react; Game days (tabletop); Mitigate / Document; Game days (live failure injection)
  21. SOFTWARE DEPLOYMENT: CI/CD_ Syntax checks; Linting; Unit Tests; Static Security Scanning; Build Artefact; Dynamic Security Scanning
  22. CENTRAL LOGGING_ Structured logs; Add relevant IDs (transaction, user, etc.); Make dashboards available; Represent “events” on dashboards
  23. TO THE CLOUD! RECAP_ Embrace the SEP field; Embrace DevOps, Agile & SRE thinking; Unlearn some old ways; Build your solutions to be “cloud native”
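
Slide 2 names four key metrics. As a rough illustration only, they could be derived from a list of deployment records along the following lines. The `Deployment` record, its field names and the `four_key_metrics` function are hypothetical and do not come from the deck; using the median for lead time and the mean for recovery time is also an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean, median
from typing import List, Optional


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it was running in production
    failed: bool = False                    # did it degrade service?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def _hours(start: datetime, end: datetime) -> float:
    return (end - start).total_seconds() / 3600


def four_key_metrics(deployments: List[Deployment], period_days: float) -> dict:
    """Summarise the four metrics over an observation window of period_days."""
    if not deployments or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")

    lead_hours = [_hours(d.committed_at, d.deployed_at) for d in deployments]
    failures = [d for d in deployments if d.failed]
    recovery_hours = [
        _hours(d.deployed_at, d.restored_at)
        for d in failures
        if d.restored_at is not None
    ]

    return {
        "deploys_per_day": len(deployments) / period_days,
        "median_lead_time_hours": median(lead_hours),
        # None when nothing failed (or nothing has been restored yet)
        "mttr_hours": mean(recovery_hours) if recovery_hours else None,
        "change_fail_percentage": 100 * len(failures) / len(deployments),
    }
```

The resulting numbers can then be compared by hand against the Elite / High / Medium / Low bands on slide 3.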
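
Slide 17 shows a VPC in eu-west-1 with public and private subnets across three AZs. One way to plan addresses for such a layout is sketched below with Python's standard `ipaddress` module. The `10.0.0.0/16` VPC range, the `/20` subnet size and the `plan_subnets` helper are all assumptions made for illustration; the deck does not give any CIDR ranges.

```python
import ipaddress
from itertools import islice

AZS = ["eu-west-1a", "eu-west-1b", "eu-west-1c"]  # AZ names as shown on slide 17
TIERS = ("public", "private")


def plan_subnets(vpc_cidr: str, azs=AZS, new_prefix: int = 20) -> list:
    """Carve one public and one private subnet per AZ out of the VPC range."""
    vpc = ipaddress.ip_network(vpc_cidr)
    needed = len(TIERS) * len(azs)
    # Take only as many equally sized blocks as the layout needs.
    blocks = list(islice(vpc.subnets(new_prefix=new_prefix), needed))
    if len(blocks) < needed:
        raise ValueError(f"{vpc_cidr} cannot hold {needed} /{new_prefix} subnets")

    plan = []
    block_iter = iter(blocks)
    for tier in TIERS:
        for az in azs:
            plan.append({"tier": tier, "az": az, "cidr": str(next(block_iter))})
    return plan


if __name__ == "__main__":
    for subnet in plan_subnets("10.0.0.0/16"):  # hypothetical VPC range
        print(subnet["tier"], subnet["az"], subnet["cidr"])
```

The output of a plan like this could feed the `aws_subnet` resources that the Terraform on slide 10 refers to.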
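
Slide 22 recommends structured logs that carry relevant IDs. A minimal sketch of that idea with Python's standard `logging` module follows, emitting one JSON object per log line. The `JsonFormatter` class, the `build_logger` helper and the `transaction_id` / `user_id` field names are illustrative assumptions, not something the deck prescribes.

```python
import json
import logging
import sys

# ID fields copied into the log entry when the caller supplies them via `extra`.
ID_FIELDS = ("transaction_id", "user_id")


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        for field in ID_FIELDS:
            if hasattr(record, field):
                entry[field] = getattr(record, field)
        return json.dumps(entry)


def build_logger(name: str, stream=sys.stdout) -> logging.Logger:
    """Return a logger that writes JSON lines to the given stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers = [handler]
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger("checkout")
    log.info("payment accepted",
             extra={"transaction_id": "tx-123", "user_id": "u-42"})
```

Because every line is machine-readable and tagged with the same IDs, a central log store can filter by transaction or user and plot such events on dashboards.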