CloudInit: The Good Parts

Cloud-Init is the de facto industry standard for early-stage initialization of virtual machines in the cloud, but few engineers are familiar with everything that it has to offer.

All Linux virtual machines have Cloud-Init in their boot phase, whether they're as small as a t3.nano instance in AWS, as large as a Standard_HB60rs in Azure, or an on-premises OpenStack instance. Originally designed for Ubuntu in EC2, Cloud-Init provides at-first-boot configuration support across most Linux distributions and all major clouds.

Many operators are familiar with supplying a shell script via user-data when provisioning their compute resources, but Cloud-Init has a massive amount of other functionality that is, more often than not, left untapped.

In this talk, Event Store co-founder James Nugent explores that untapped potential by looking through some of the features of Cloud-Init and how to take advantage of them to improve the operability and resilience of your cloud operations.

James Nugent

August 05, 2019


  1. Why Cloud-Init?
    • The de facto industry standard for early-stage initialisation of virtual machines in the cloud running Unix-derived operating systems.
    • Used to specialise a generic operating system image at runtime by provisioning a given set of configuration.
    • Originally developed by Canonical for configuring Ubuntu Linux running in Amazon EC2.
    • Now prevalent across all major clouds and most Unix-based operating systems.
  2. Why Cloud-Init?
    • Building a new machine image for each role a virtual machine must play in your infrastructure can be costly in terms of time:
      • The cycle of booting a machine, customising it, and imaging can take anywhere from 5 minutes to over an hour.
      • For rapidly evolving software, building an image for each version of a program increases the amount of time it takes to get that version to production.
    • Correctly constrained, runtime customisation can provide many of the benefits of an image-based workflow, with different trade-offs.
    • Image-based workflows can still make use of Cloud-Init in the image building process!
  3. Configuring Cloud-Init
    • The cloud-init package is installed in the operating system images supplied by most clouds. On systemd-based Linux, cloud-init.service usually runs at boot, as a oneshot service.
    • When started with the init sub-command, cloud-init runs the commands defined in a sequence of modules to specialise the operating system installation for the intended purpose.
    • Configuration comes from two sources:
      • Cloud provider-supplied metadata
      • User-supplied configuration
  4. #cloud-config Schema
    • #cloud-config is a complex YAML schema, whose valid components are affected by which modules are installed.
    • Documentation is somewhat hit-and-miss. Most of the information is in the docs, somewhere. All of the information is in the (Python) source code of cloud-init.
    • Unless referring constantly to the code, writing the configuration files can be an iterative process of trial and error until you have a sufficiently large collection of reusable sections which you can cargo-cult into doing what you want.
    • cloud-init(1) has limited built-in schema validation functionality, but most modern editors will do just as well here with a YAML plugin.
  5. Host SSH Keys
    • There is no built-in module for specifying host keys at first boot, so we’ll need to build this ourselves.
    • Breaking this task down, we’ll need to do a few different things:
      • Generate some known host keys, and get them to the virtual machine
      • Move the keys into /etc/ssh before the first time the ssh.service unit starts
    • To do this correctly, it’s necessary for us to dig into the cloud-init default configuration to see what order modules run in, and choose the correct places to insert our logic.
  6. Host SSH Keys
    • Cloud-init runs in three phases:
      • Init - essential configuration that must be done early on
      • Config - configuration that doesn’t affect other stages of boot
      • Final - configuration that must be run as late as possible
    • The configuration (by default) lives in /etc/cloud/cloud.cfg, and is YAML.
    • One of the pieces of configuration set in cloud.cfg is which modules run in which phase.
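
The phase-to-module mapping can be pictured as three ordered lists. The sketch below assumes the key names cloud_init_modules, cloud_config_modules and cloud_final_modules as they commonly appear in a stock cloud.cfg. The module lists are an abbreviated, illustrative subset, not a copy of any real file, and the actual contents vary by distribution and cloud-init version. The helpers phase_of and runs_before are hypothetical names introduced only for this illustration.

```python
# Illustrative, abbreviated subset of the module lists found in a stock
# /etc/cloud/cloud.cfg. Real files list many more modules, and the exact
# contents differ between distributions and cloud-init versions.
MODULES_BY_PHASE = {
    "cloud_init_modules": ["bootcmd", "write-files", "growpart", "resizefs", "ssh"],
    "cloud_config_modules": ["locale", "ntp", "timezone", "runcmd"],
    "cloud_final_modules": ["scripts-user", "phone-home", "final-message"],
}


def phase_of(module, modules_by_phase=MODULES_BY_PHASE):
    """Return (phase, position) for a module, or None if it is not listed."""
    for phase, modules in modules_by_phase.items():
        if module in modules:
            return phase, modules.index(module)
    return None


def runs_before(first, second, modules_by_phase=MODULES_BY_PHASE):
    """True if `first` is listed ahead of `second` (hypothetical helper)."""
    phases = list(modules_by_phase)  # dicts preserve insertion order
    a = phase_of(first, modules_by_phase)
    b = phase_of(second, modules_by_phase)
    if a is None or b is None:
        raise ValueError("module not listed in any phase")
    # Compare by phase first, then by position within the phase.
    return (phases.index(a[0]), a[1]) < (phases.index(b[0]), b[1])


if __name__ == "__main__":
    print(phase_of("write-files"))
    print(runs_before("write-files", "ssh"))
```

In this sketch, write-files is listed ahead of ssh within the init phase, which is the kind of ordering the host key task depends on. On a real machine, check the module lists in the installed cloud.cfg rather than relying on this subset.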
  7. More about write-files
    • File content needs to be provided embedded in the YAML configuration file.
    • If we wanted to provide files from a remote source, we could use a script to download them and verify their checksums.
    • Properties we can set for each file are:
      • Content (in one of a variety of encodings)
      • Path
      • Owner
      • Mode
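
A minimal sketch of generating a write_files entry, assuming the per-file keys path, owner, permissions, encoding and content, with "b64" as the encoding name for base64 content. The file path and bytes are placeholders, and the function names are hypothetical. The body is rendered as JSON purely to avoid a third-party YAML dependency; this assumes the YAML parser accepts a JSON-style document, which is worth confirming against your cloud-init version before relying on it.

```python
import base64
import json


def write_files_entry(path, content, owner="root:root", permissions="0600"):
    """Build one write_files entry with base64-encoded content."""
    return {
        "path": path,
        "owner": owner,
        "permissions": permissions,  # the file mode, as an octal string
        "encoding": "b64",
        "content": base64.b64encode(content).decode("ascii"),
    }


def render_cloud_config(entries):
    """Render user-data: the #cloud-config header followed by a JSON body.

    JSON is used only to keep this sketch free of third-party packages; it
    relies on the assumption that JSON-style documents parse as YAML.
    """
    body = json.dumps({"write_files": entries}, indent=2)
    return "#cloud-config\n" + body + "\n"


if __name__ == "__main__":
    # Placeholder bytes, not a real key. The path is hypothetical.
    entry = write_files_entry("/etc/example/placeholder.key", b"not-a-real-key\n")
    print(render_cloud_config([entry]))
```

Base64 encoding keeps binary or whitespace-sensitive content intact inside the configuration document, at the cost of making it unreadable at a glance.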
  8. Other Use Cases
    • Some use cases we didn’t look at today, but which are easy enough to accomplish:
      • Change the filesystem types for attached volumes (e.g. to XFS)
      • Configure a package repository (yum or apt) and install packages at boot
      • Install Docker, pull and run an image from Docker Hub
      • Run Chef or Puppet in standalone mode on boot, after downloading or installing the configuration from a package
      • Write out an /etc/machine-role to use as a base for accessing SSM Parameter Store trees
      • Join a node to a Serf or Consul cluster
      • Post a notification to Slack using the “phone home” module
  9. Summary
    • Cloud-init packs in a huge amount of functionality, but it’s not necessarily very discoverable.
    • It is worth learning at least the basics if:
      • You want a runtime-specialisation-based workflow
      • You work across a diverse range of clouds or operating systems and want a common configuration tool