
Fancy Containers (Velocity SF 2018)

Abby Fuller
June 14, 2018

Transcript

  1. What we’re going to talk about
     • First things first
     • You’ve deployed your cluster: now what?
     • Speeding up deployments
     • Cleaning up after yourself
     • Fun with user-data
     • A little bit on monitoring/observability/debugging
  2. In a lot of cases, the first problem that crops up is deployments. As in, they’re SLOW. Lots of reasons this happens!
  3. Check your image sizes
     Larger images mean slower deploys (slower to build/push/pull). Image size is determined (mostly) by the number of layers your image has, and how large those layers are. A few tips (sketched below):
     • Use shared base images wherever possible
     • Limit the data written to the container layer
     • Chain RUN statements
     • Prevent cache misses at build for as long as possible
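To make the RUN-chaining tip concrete, here is a minimal sketch (the package name and image tag are placeholders I picked, not from the talk): chaining install and cleanup into one RUN keeps the package cache from being baked into its own layer.

# Chained RUN: install and cleanup happen in the same layer,
# so the yum cache never ends up in the final image
cat > Dockerfile <<'EOF'
FROM amazonlinux:2
RUN yum install -y unzip && \
    yum clean all && \
    rm -rf /var/cache/yum
EOF
docker build -t myapp:latest .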
  4. Smaller images mean understanding the cache
     Starting from the parent image, Docker will look at each following instruction to see if it matches the cached version. Only ADD and COPY will look at checksums of the files for a match; for every other instruction, only the string of the command is used, not the contents of any files. Once the cache is broken, every subsequent layer is built again (example below).
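A hedged sketch of the ordering side of this (assuming a Python app with a requirements.txt; the filenames are illustrative): put the instructions whose inputs rarely change first, so the cache only breaks at the layers that actually changed.

cat > Dockerfile <<'EOF'
FROM python:3.9-slim
WORKDIR /app
# COPY compares file checksums: this layer (and the pip install under it)
# only rebuilds when requirements.txt actually changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Source code changes constantly, so copy it last to break the cache as late as possible
COPY . .
EOF
docker build -t myapp .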
  5. How quickly your images are pulling from your registry is important for speeding up your deploys: smaller, lighter images can build and push more quickly during your CI/CD process, and pull more quickly during your deploys.
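Two stock Docker commands are enough to see where the weight is (the image name is a placeholder):

# Overall image size
docker images myapp
# Per-layer breakdown, with the instruction that created each layer
docker history myapp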
  6. “Secret”* ASG settings
     There are some less intuitive settings (UX!) that can impact a) how quickly your group can autoscale, and b) how long it takes to mark your containers healthy.
  7. What are these?
     • Default cooldown: how long the autoscaling group will wait before evaluating the scaling rule again
     • Health check grace period: the amount of time, in seconds, that the autoscaling group waits before health checking newly launched instances
     Both can be tuned from the CLI (sketch below).
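A hedged sketch of tuning both with the AWS CLI (the group name and values are made up, not recommendations from the talk):

# Shorter cooldown: the group re-evaluates scaling policies sooner after an action.
# Shorter grace period: new instances get health checked sooner after launch.
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name my-ecs-asg \
  --default-cooldown 120 \
  --health-check-grace-period 60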
  8. Since services running through ECS and EKS are backed by autoscaling groups, changing these settings can cut down the amount of time it takes your service to a) scale, and b) become healthy after starting.
  9. Advanced health check settings can also influence your deployments! This happens through a few settings: healthy and unhealthy thresholds determine how many times your health check endpoint is tried before your container is declared healthy or unhealthy. The shorter you can make the interval between these checks, the faster your deployment will be. Be careful though: you need to give your application enough time to actually pass (sketch below).
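As a sketch, for an ALB target group via the AWS CLI (the ARN and numbers are placeholders): with these values, two fast, successful checks in a row and the target counts as healthy.

aws elbv2 modify-target-group \
  --target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-tg/0123456789abcdef \
  --health-check-interval-seconds 10 \
  --healthy-threshold-count 2 \
  --unhealthy-threshold-count 2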
  10. Garbage collection for containers is used to remove things like dangling/untagged images, containers, and volumes. Most orchestration platforms (like ECS/EKS, or Kubernetes) will do some garbage collection for you, but it’s not always enough. Too many unused images, containers, and volumes will:
     • Steal your disk space
     • Wake you up
     • Cost you cash money
  11. Tune the image cleanup parameters with ECS
     • With ECS, you can tune the options you have available for garbage collection.
     • If you want to clean up more aggressively, you can look at setting these ecs-agent options* (sketch below):
       ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION
       ECS_IMAGE_CLEANUP_INTERVAL
       ECS_NUM_IMAGES_DELETE_PER_CYCLE
     * Don’t know how to set ecs-agent options? We’ll talk about it in a second!
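For example, a hedged sketch of /etc/ecs/ecs.config (the values are illustrative, not the talk’s recommendations): shorten the wait before stopped tasks are cleaned up, run image cleanup more often, and delete more images per cycle.

cat >> /etc/ecs/ecs.config <<'EOF'
ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION=10m
ECS_IMAGE_CLEANUP_INTERVAL=10m
ECS_NUM_IMAGES_DELETE_PER_CYCLE=10
EOF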
  12. Tune the image cleanup parameters with kubelet
     • Kubelet also has a cleanup function!
     • It can clean up images and containers, controlled through kubelet flags (sketch below)
     • Image collection is based on disk usage
     • Container collection is based on flags (or the defaults):
       • minimum-container-ttl-duration
       • maximum-dead-containers-per-container
       • maximum-dead-containers
     Just like with ECS, be careful not to collect so aggressively that you lose useful containers.
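A hedged sketch of what those flags look like on the kubelet command line (the values are illustrative; the image-gc thresholds are the disk-usage knobs mentioned above, and newer kubelet versions move these settings into the kubelet config file):

# ...plus whatever flags your cluster already passes to the kubelet
kubelet \
  --minimum-container-ttl-duration=1m \
  --maximum-dead-containers-per-container=1 \
  --maximum-dead-containers=100 \
  --image-gc-high-threshold=85 \
  --image-gc-low-threshold=75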
  13. What happens when the defaults aren’t enough? You can:
     A) Tune (like we just talked about)
     B) Use a 3rd-party tool (like Spotify’s docker-gc)
     C) All of the above
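And as a blunt manual fallback (an assumption on my part, not something the talk prescribes), Docker’s own prune can be run by hand or from cron:

# Removes stopped containers, unused networks, unused images, and unused volumes.
# -a = all unused images (not just dangling ones), -f = no prompt.
# Aggressive: anything not in use by a running container is fair game, so time it carefully.
docker system prune -af --volumes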
  14. It’s also worth mentioning that unused images (and volumes and containers, you know the drill) aren’t the only way to lose your disk space. logrotate is also configurable: rotate away more files, or more often, or both. With Ubuntu, you can do that in /etc/logrotate.conf. For example, you could change monthly to daily (sketch below).
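A hedged sketch of that change (this assumes /etc/logrotate.conf currently contains a "monthly" line and a "rotate 4" line; both the assumption and the replacement values are illustrative):

# Rotate daily instead of monthly, and keep a week's worth of rotated logs
sudo sed -i 's/^monthly$/daily/' /etc/logrotate.conf
sudo sed -i 's/^rotate 4$/rotate 7/' /etc/logrotate.conf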
  15. *Configuring the ecs-agent, a belated footnote
     For ECS, you can customize quite a bit through the ecs-agent. This is good for things like customizing image cleanup, and changing how your instances interact with Docker and AWS (for example, changing Docker flags, or resource usage). The full list of options is available here. Options can be set in /etc/ecs/ecs.config.
  16. Sometimes, you might want to bring your own AMI to ECS or EKS (or Fargate, but that’s not possible yet). With EKS, this is easy! Just select your own when you start the cluster. For ECS, this is a little trickier. Fun fact: this is the icon for AMI.
  17. The easiest way to use ECS is with the ECS-optimized AMI, but life is too short to not have custom AMIs. A few steps to registering your own AMI with ECS:
     0. Make sure your AMI has the right requirements!
     1. Install ecs-agent
     2. Install the Docker daemon (if it’s not already installed)
     3. Register the instance with the cluster
     4. Optional (but good): some sort of init process to manage the ecs-agent
     Most of these options can be set in EC2 user-data (more on that in a sec).
  18. Your instance needs a few things before you can use it. If you’re using Amazon Linux, you need the correct role, and Docker installed. If you’re not using Amazon Linux, you have a few more steps to follow: documentation is here.
  19. Install and start the ecs-agent and Docker daemon:
     $ yum install -y ecs-init
     $ service docker start
     $ start ecs
     Full instructions here.
  20. Register your instance with the cluster
     With Amazon Linux, in /etc/ecs/ecs.config:
     ECS_ENABLE_TASK_IAM_ROLE=true
     ECS_ENABLE_TASK_IAM_ROLE_NETWORK_HOST=true
     ECS_LOGFILE=/log/ecs-agent.log
     ECS_AVAILABLE_LOGGING_DRIVERS=["json-file","awslogs"]
     ECS_LOGLEVEL=info
     ECS_CLUSTER=default
     Full instructions here.
  21. All of those options we just talked about (plus many more) can be scripted in EC2 user-data. To start with, here is a walkthrough on bootstrapping ECS container instances with user-data.
  22. You can do all kinds of stuff here, both with shell scripts and cloud-init directives. Or MIME multi-part files. Wild. User-data is good for everything from starting services, to configuring your environment, to enabling options and flags that aren’t supported in the UI. Scripts run at boot, and you can modify them through the EC2 Console (sketch below).
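Putting the last few slides together, a hedged sketch of a user-data shell script for an ECS container instance (the cluster name and the Docker flag are placeholders I picked, not the talk’s):

#!/bin/bash
# Join a (placeholder) cluster and enable task IAM roles before the agent starts
cat >> /etc/ecs/ecs.config <<'EOF'
ECS_CLUSTER=my-cluster
ECS_ENABLE_TASK_IAM_ROLE=true
EOF
# Sneak in a Docker daemon option the console doesn't expose; it takes effect
# the next time the Docker daemon (re)starts
echo 'OPTIONS="${OPTIONS} --log-opt max-size=10m"' >> /etc/sysconfig/docker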
  23. Speaking of unsupported flags, what happens when I want to use a Docker flag that’s not supported in the UI?
     $ echo "WHATEVER_FLAG=hi" >> /etc/sysconfig/docker
     An actual, candid reaction from the ECS team when I told them I do this.
  24. If it doesn’t make sense to you at 3am, it’s probably not helping much: a memoir.
  25. Just a few things to think about:
     • Reduce the noise
     • Page only on issues that require immediate, emergency attention
     • You need all of it: both monitoring AND observability (more than just logs!)
     • Make sure you’re asking and answering the right questions.
  26. Hear from the experts: Charity Majors (@mipsytipsy), on Twitter and on the honeycomb.io blog. And a great article from Cindy Sridharan (@copyconstruct) on the differences between monitoring and observability.