Building an AMI Factory with Open Source Tools

DevOpsDays Portland - 2013

Jeremy Carroll

November 06, 2013

Transcript

  1. - Bake: produce a complete virtual machine offline, before first use.
    - Fry: produce a skeleton virtual machine by booting a basic VM, and then applying configuration.
    - Tradeoffs in both approaches.
  2. - You see a lot of appliance VMs on EC2 public images.
    - Make it easy. Press start, get application.
    - Bake once, use many times. A ‘network effect’ over time: do something once, then reuse it.
    - You can always ‘fry’ something later as well. Example: launching an instance, then using Puppet or another configuration management tool for live management.
  3. Why?
    - Makes things more efficient. Bake your base roles and classes to reduce work at instance launch time.
    - Example: your package repository server is probably sad. Every machine launch installs the same packages over and over.
    - Operations and Development love the system. Not much more complicated than launching an instance.
    - You will have a better time.
  4. `Fry` Method - On Demand System Provisioning
    • Puppet run at boot
    • Many dependent services - 3rd party network calls
    • Boot times vary. Usually around 10-15 minutes
    - Provisioning systems at boot is slow with on-demand provisioning. It can take 15+ minutes for a system to become fully ready.
    - For us, autoscaling was a primary driver.
    - Autoscaling is a way of using the EC2 APIs as a lightweight ‘state’ machine: always keep ‘x’ of these running in multiple availability zones.
  5. AutoScale - In Production
    - Feels like this picture. When it works, it’s like being a rockstar. Taming the beast.
    - Orchestration of legend. Lots of moving parts, thousands of times per day.
    - But it has tradeoffs as it pertains to AutoScaling services. We still provision instances on demand for persistent services at this time.
    - Previous system was ‘fry’ based. It had Just Enough Operating System (JEoS), and was provisioned via Puppet at boot.
    - Anybody who checked bugs in to Puppet trunk broke autoscaling.
    - Increased load on dependent services (Puppet, Apt, internal APIs, etc.). Have to provision for spikiness (launching hundreds of instances in case of failure).
    - Spot pricing story: lost an entire AZ due to a run on spot, and had to relaunch the entire datacenter for the application.
  6. Reliability
    - Previous system was fragile. Patterns to increase reliability:
    - Decouple from lower-SLA systems to increase availability.
    - The application can continue to function without the remote system.
    - A system that does not require any third-party services can have an improved SLA.
    - Reduce or eliminate the dependency on the CMDB. Eliminate or reduce third-party network calls.
  7. Image Creation Tools Used
    • Packer.io (https://github.com/mitchellh/packer)
    • Jenkins (https://github.com/jenkinsci/jenkins)
    • BATS (https://github.com/sstephenson/bats)
    • Puppet (https://github.com/puppetlabs/puppet)
    • CloudInit (https://help.ubuntu.com/community/CloudInit)
    Build Pipeline, Testing, Configuration Management, Runtime Modifications
    - Tool availability is increasing: Aminator, BoxGrinder (kinda stale), Packer, or in-house scripts with EC2 tools (bundle-volume).
    - Looked at Aminator, but it was EBS specific. A lot of patterns gleaned from contribution.
    - Packer felt right due to its multi-cloud approach and its ability to deal with both EBS and instance-store systems.
    - Puppet for deterministic configuration management. Insert your CMDB here.
    - Packer concurrency was a must-have feature: building multiple images at a time, due to the 'launch and provision via SSH' style of system.
    - Jenkins for workflow. Really just a scheduler, so resource utilization on the Jenkins slaves is low.
    - Cloud-init for run-time modifications. Packer builds can modify launch-time operations such as 'Do not register in Route53', 'Which puppet role to apply', and 'What BATS tests to run'.
    - BATS for testing due to simplicity. Just write bash; exit codes are your friend.
  8. Image Evolution
    - Images have to start somewhere.
    - Foundation AMI to start. Can use Ubuntu cloud images, or roll your own. Just the base operating system with patches, not much else.
    - Golden AMI to hous
  9. How do we create these images? Packer.io
    • Packer is a tool for creating identical machine images for multiple platforms from a single source configuration.
    • Written in Go
    • Supports a lot of platforms: EC2, DigitalOcean, OpenStack, VirtualBox, VMware
    • A lot better than scripts + ec2-bundle-volume
    • Some bugs, but it’s getting better all the time
    - Supports parallel builds. Ex: multiple builds at one time.
    - DSL for describing images.
    - Looked at Aminator (too EBS specific). BoxGrinder was CentOS only, and almost abandonware.
    - Does one job really well.
  10. Packer.json
        {
          "provisioners": [
            {
              "type": "shell",
              "scripts": [
                "provision.sh"
              ]
            }
          ],
          "builders": [
            {
              "access_key": "{{user `ec2_access_key`}}",
              "source_ami": "ami-1234567",
              "account_id": "1234",
              "bundle_destination": "/mnt",
              "region": "{{user `region`}}",
              "tags": {
                "application": "myapp",
                "environment": "test",
                "release": "precise",
                "host": "packerci-slave001",
                "owner": "root",
                "ancestor": "ami-34567890"
              },
              "user_data": "#cloud-config\nrole: myapp",
              "x509_key_path": "{{user `ec2_private_key`}}",
              "instance_type": "{{user `size`}}",
              "x509_upload_path": "/mnt",
              "x509_cert_path": "{{user `ec2_cert`}}",
              "iam_instance_profile": "provisioning",
              "ami_name": "{{user `application`}}-{{user `version`}}.{{user `build_number`}}-{{user `architecture`}}-{{user `user_timestamp`}}-{{user `type`}}",
              "ami_description": "store=amazon-instance,ancestor_name=golden-12.04-precise-amd64-201311010411-instance_store,ancestor_id=ami-12345678,version=0.1,env=test,app=myapp,release=precise",
              "secret_key": "{{user `ec2_secret_key`}}",
              "security_group_id": "sg-12345678",
              "type": "amazon-instance",
              "s3_bucket": "mybucket",
              "ssh_timeout": "15m"
            }
          ]
        }
    - Example of packer.json output. JSON configuration with a lot of options.
    - Has variable substitution for environment variables.
    - Supports a lot of options, such as ‘instance-store’, EBS, or chroot builds.
    - We call a single script ‘provision.sh’ when the instance launches.
    - Script waits for CloudInit to drop a file telling the system that puppet has finished.
    - We then use our tests at this point to determine if we have a successful build. Puppet has converged, etc.
    - If successful, the script then cleans up temporary data: SSH keys, log files, /etc/hosts if managed, CloudInit data.
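The "wait for CloudInit to drop a file" step in the notes above could look roughly like the following. This is a minimal sketch, not the author's provision.sh: the function name, the default timeout, and the exit-code mapping are assumptions. The only detail taken from the deck is that the status file contains "ok" or "failed" (see the CloudInit handler on slide 16).

```python
import os
import time


def wait_for_puppet_status(path, timeout=900.0, poll_interval=5.0):
    """Poll for a status file and map its content to an exit code.

    Returns 0 when the file says "ok", 1 when it says anything else,
    and 2 when no content shows up before the timeout (hypothetical
    mapping; the deck only says that exit codes drive the build).
    """
    deadline = time.monotonic() + timeout
    while True:
        status = ""
        if os.path.exists(path):
            with open(path) as handle:
                status = handle.read().strip()
        if status:
            # An empty file is treated as "still being written".
            return 0 if status == "ok" else 1
        if time.monotonic() >= deadline:
            return 2
        time.sleep(poll_interval)
```

A provisioning script would exit with the returned code, so that a non-zero value makes the build tool abort before bundling.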
  11. Packer Wrapper
    - Packer does not know anything about our image management systems.
    - What’s the current ‘Golden’ AMI I need to use to create a machine in us-east-1 with an EBS store, on raring, etc.?
    - Jenkins calls this script with the build parameters to generate a usable packer.json file.
    - We also put the business logic for creating the packer file here, so individuals do not need to know all the syntax.
    - What tags do I need to create on this AMI? What’s the AMI naming convention, etc.?
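A wrapper of this kind might be sketched as below. Everything here is illustrative: the function names and the parameter keys (`store`, `timestamp`, and so on) are hypothetical, and the builder is cut down to a handful of fields, so a real `amazon-instance` build would still need the credential, bucket, and X.509 settings shown in the packer.json slide. The naming convention mirrors the `ami_name` pattern from that slide.

```python
import json

# Assumption: store type -> Packer builder type. Both builder names exist in
# Packer; the keys on the left are hypothetical parameter values.
BUILDER_TYPES = {"instance_store": "amazon-instance", "ebs": "amazon-ebs"}


def build_packer_template(params, golden_ami):
    """Return a Packer template dict for one build.

    `params` holds the build parameters passed in by the CI job and
    `golden_ami` is the image ID returned by the image query.
    """
    ami_name = (
        "{application}-{version}.{build_number}-"
        "{architecture}-{timestamp}-{store}"
    ).format(**params)
    builder = {
        "type": BUILDER_TYPES[params["store"]],
        "region": params["region"],
        "source_ami": golden_ami,
        "ami_name": ami_name,
        "tags": {
            "application": params["application"],
            "environment": params["environment"],
            "release": params["release"],
            "ancestor": golden_ami,
        },
        "user_data": "#cloud-config\nrole: {0}".format(params["application"]),
    }
    return {
        "provisioners": [{"type": "shell", "scripts": ["provision.sh"]}],
        "builders": [builder],
    }


def write_template(template, path):
    """Write the template as JSON so Packer can consume it."""
    with open(path, "w") as handle:
        json.dump(template, handle, indent=2, sort_keys=True)
```

Keeping the naming and tagging rules in one function means a job only supplies parameters and never edits JSON by hand.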
  12. Image Query
    - Currently using Edda to store information. It is a cache of the EC2 API.
    - Enriches data structures to allow for dynamic querying on a variety of tags. Ex: prod vs dev vs test.
    - Query for certain types of images: EBS vs instance store, different regions (us-west vs us-east), etc.
    - Can use Boto with EC2 tag filters as well. Sort by version number and take the highest one. That’s your build.
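The "filter by tags, sort by version, take the highest" selection can be sketched without any AWS calls. The image records below are plain dicts with a hypothetical shape (`id` plus `tags`), standing in for whatever Edda or Boto returns; the function names are illustrative. Versions are compared numerically, so `0.10` sorts above `0.9`.

```python
def parse_version(text):
    """Turn a dotted version string such as "0.10" into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def latest_image(images, **wanted_tags):
    """Return the matching image with the highest version tag, or None.

    `images` is a list of dicts shaped like
    {"id": "ami-...", "tags": {"version": "0.1", ...}} (assumed shape).
    """
    matches = [
        image for image in images
        if all(image["tags"].get(key) == value
               for key, value in wanted_tags.items())
    ]
    if not matches:
        return None
    return max(matches, key=lambda image: parse_version(image["tags"]["version"]))
```

For example, `latest_image(images, environment="prod", store="ebs")` would return the newest production EBS image, or `None` if nothing matches.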
  13. The Factory
    - It all starts with Jenkins. Can build on a schedule, on a code commit, etc.
    - The packer wrapper creates a JSON template, then a machine is booted inside EC2 and provisioning starts.
    - Integration testing is tricky, and depends on the service.
    - For our stateless autoscaling services, we sometimes build nightly and deploy a canary for testing. Once reviewed (automatically or manually), the image can be moved to production.
    - Tags and metadata about images are important. They allow us to filter images and move them along the pipeline.
  14. So What Just Happened?
    - The whole process: take the latest golden AMI with user-data.
    - Run puppet on the machine.
    - Using the provision.sh script, run tests that return exit codes. Packer looks at the exit codes and decides whether to abort the build or continue.
    - If successful, prepare the image for bundling.
  15. Parameterized Build Pipeline Jenkins
    • Takes parameters to launch a build
    • Uses ‘Packer Wrapper’ to query for Golden AMI
    • Triggers downstream jobs on success / failure
    • If the AMI builds successfully, trigger a test run.
    - Jenkins job as a parameterized build pipeline.
    - Create parameters for the type of image to create.
    - Want a new image? Copy any current job and modify the parameters. A little cumbersome, but it works.
    - Jenkins will do multi-step builds. Create the AMI. Launch the AMI. Do some testing (Ex: does the code work?). Then tag the image as production, etc.
  16. CloudInit Handlers
        import os

        from cloudinit import util

        # `puppet` and `do_puppet_run` are helpers that are not shown on the slide.


        def handle(name, cfg, cloud, log, args):
            # puppet:
            #   run_on_boot: true
            #   shutdown_on_error: true
            #   shutdown_timer: 5
            config = {}
            # Create puppet object
            p = puppet.puppet()
            if 'puppet' in cfg:
                if isinstance(cfg['puppet'], dict):
                    config = cfg['puppet']
            # Puppet CloudInit result file
            puppet_status = os.path.join(cloud.get_cpath('data'), 'puppet')
            run_on_boot = util.get_cfg_option_bool(config, 'run_on_boot', True)
            shutdown_on_error = util.get_cfg_option_bool(config, 'shutdown_on_error', True)
            if run_on_boot:
                try:
                    log.info('Puppet: running puppet')
                    do_puppet_run(p, log)
                    log.info('Puppet: convergence successful')
                    util.write_file(puppet_status, "ok\n", 0o644)
                except Exception:
                    log.error('Puppet: failed to converge')
                    util.write_file(puppet_status, "failed\n", 0o644)
                    if shutdown_on_error:
                        log.error('Puppet: shutting down instance')
    - CloudInit allows runtime modification.
    - Example of a CloudInit handler for puppet.
    - Any Python you can write can take user-data in the form of strings, booleans, etc.
    - We use it to turn features on and off. Ex: create Route53 DNS entries on boot, or choose which puppet group to provision.
    - Should you shut down or terminate if puppet cannot converge, or have a bad box out there for a while?
    - Phone home to notify a system that we are ready for operation. You name it.
  17. Puppet Configuration Management
    • Still use a configuration management product to manage state on baked images. Repeatable process. Deterministic.
    • Works with CloudInit to pull down the modules required to make this image. Not much different than the `Fry` model
    • Beware: Dynamic Variables. IP Addresses. Hostnames
    - Beware of dynamic, fact-driven templating.
    - The system you launch will not have the same IP address, hostname, availability zone, etc.
    - Still need run-time modification of these attributes. Can run something lightweight on boot, or use CloudInit to manage these types of files.
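As an illustration of "something lightweight on boot", the sketch below re-renders a config file from facts that are only known at launch. It is an assumption-heavy example: the template text, the fact names, and the function name are all hypothetical, and how the facts are gathered (instance metadata, CloudInit, `socket.gethostname()`) is left to the caller.

```python
from string import Template


def render_runtime_config(template_text, facts):
    """Fill $placeholders in a baked template with launch-time facts.

    `facts` is a dict such as {"hostname": ..., "ip_address": ...}.
    Template.substitute raises KeyError when a placeholder has no value,
    so a missing fact fails loudly instead of producing a stale file.
    """
    return Template(template_text).substitute(facts)
```

The baked image would ship the template only, and the rendered file would be written on first boot, so hostnames or IP addresses from the build machine never end up in the image.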
  18. BATS
        #!/usr/bin/env bats

        @test "addition using bc" {
          result="$(echo 2+2 | bc)"
          [ "$result" -eq 4 ]
        }

        @test "addition using dc" {
          result="$(echo 2 2+p | dc)"
          [ "$result" -eq 4 ]
        }
    - Drop dead simple.
    - Use exit codes and bash.
    - For example, when testing Puppet, we parse last_run_summary.yaml and determine if we have failed resources / classes.
    - Can look for other things, such as ‘Does mount point exist’, ‘is package installed’, ‘is service running and has network port’, ‘can query for web health interface’. Packer works with exit codes, so this works well.
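The last_run_summary.yaml check mentioned above could be expressed as follows. This sketch works on an already-parsed summary (a dict) and leaves loading the YAML to the caller. The key names `resources.failed`, `resources.failed_to_restart`, and `events.failure` are what the Puppet summary files I have seen contain, but treat them as an assumption. Treating a summary without a `resources` section as a failed run is also an assumption.

```python
def puppet_run_failed(summary):
    """Return True when a parsed last_run_summary indicates a failed run."""
    resources = summary.get("resources")
    if not resources:
        # Assumption: no resources section means the run never applied a catalog.
        return True
    events = summary.get("events") or {}
    failures = (
        resources.get("failed", 0)
        + resources.get("failed_to_restart", 0)
        + events.get("failure", 0)
    )
    return failures > 0


def exit_code(summary):
    """Map the check to a shell-style exit code for the build tool."""
    return 1 if puppet_run_failed(summary) else 0
```

A test script would call `sys.exit(exit_code(summary))`, which fits the exit-code-driven flow described in the notes.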
  19. Lots, and lots of appliances. We Have Appliances!
    • EC2 console becomes pretty much unusable
    • Console does not encapsulate your business rules and image management practices
    • What’s the latest Golden AMI for 12.04?
    - Wish we had known about this before starting to roll this out.
    - Users have a hard time finding images when you have an explosion of them.
    - Not so simple to launch a new instance now. Need something that can query your metadata and tags about image state.
  20. Tools
    - Internal tool. Still a work in progress.
    - We have command line tools at this time, but not many web interfaces.
    - Influenced the cloud image finder on Canonical website.
    - Filter for the images that you would like. Give a big red launch button.
    - We need more work on a unified tool that represents our business logic. Ex: Asgard.
  21. Janitor Your Packer Builds Image Management
    • Packer only builds images. It does not attempt to manage them in any way. After they're built, it is up to you to launch or destroy them as you see fit.
    • Process like Netflix ‘JanitorMonkey’ to clean up unused images / snapshots. Tags used heavily to influence process
    • Only images not in use by any ASG launch configuration, and not a ‘Golden’ AMI
    • https://github.com/Netflix/SimianArmy/blob/master/src/main/java/com/netflix/simianarmy/aws/janitor/rule/ami/UnusedImageRule.java
    - Not so simple. Lots of subtle business rules.
    - When did an instance last launch with this AMI? Track this information. Clean up after <X> days.
    - Cannot delete an AMI that is part of an ASG, or a Base AMI.
    - Multi-region or multi-account cleanup. Look at Janitor Monkey for lots of good examples.
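The cleanup rules in these notes can be written down as a small predicate. This is a sketch in the spirit of those rules, not a port of Janitor Monkey's `UnusedImageRule`. The record shape, the `role` tag, and the 30-day default are hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: protected images are marked with a hypothetical "role" tag.
PROTECTED_ROLES = ("golden", "base")


def is_deletable(image, in_use_ami_ids, now, max_idle_days=30):
    """Decide whether an image is a cleanup candidate.

    `image` is a dict with "id", "tags", "created" (datetime) and an
    optional "last_launched" (datetime). `in_use_ami_ids` is the set of
    AMI IDs referenced by any ASG launch configuration.
    """
    if image["id"] in in_use_ami_ids:
        return False  # still referenced by an ASG launch configuration
    if image["tags"].get("role") in PROTECTED_ROLES:
        return False  # never delete Golden or Base images
    # Fall back to the creation time when the image was never launched.
    last_used = image.get("last_launched") or image["created"]
    return now - last_used > timedelta(days=max_idle_days)
```

Passing `now` in explicitly keeps the rule testable and lets one sweep apply the same cutoff to every region or account.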
  22. Future Work
    • EBS images for speed of creation
    • Generic ‘Application AMI’
    • Would allow for ‘Just add your configuration’ images. Ex: Jetty.
    • More work on Janitor process
    • Business workflow UI (Ex: Asgard)
    - The Foundation AMI is the core operating system you will build from. Barely any modifications at this stage.
    - An example would be the Ubuntu Cloud Images.
    - Modifications to the Foundation AMI then create our ‘Base’ AMI for an application stack.
    - Example: Foundation + Jetty = Jetty Base AMI. It can then be used to install ‘Jetty’ applications.
    - Example 2: Foundation + Hadoop + HBase RegionServer = RegionServer Base AMI.
    - Adding code + configuration creates an ‘application AMI’. You can launch these instances and they will go to work on launch.