Slide 1

Slide 1 text

Ben Whaley | @iAmTheWhaley … And You Thought You Knew EC2

Slide 2

Slide 2 text

Instance Types

Family  Intended use
C5      Compute-optimized with Intel Skylake*
R4      Memory-optimized, up to 488GiB**
I3      High I/O - Watch for sharp corners
F1      FPGA
P2      GPU - 2x perf of G2 at 1.5x the cost
D2      Dense Storage, up to 48TiB HDD & 10 Gbps network

* Coming soon?
** Only 244GiB currently

Slide 3

Slide 3 text

4 EBS Volume Types

Type  Description               I/O      Throughput  Cost
io1   Provisioned IOPS SSD      Highest  High        Highest
gp2   General purpose SSD       High     Low(ish)    Medium
st1   Throughput-optimized HDD  Low      Highest     Low
sc1   Non-optimized HDD         Low      Medium      Lowest

Slide 4

Slide 4 text

Miscellaneous improvements & features
• IPv6!
• VPC endpoints for DynamoDB, S3
• Lightsail - EC2 quick start
• Elastic GPUs
• New regions
  • 2016 - Ohio, Canada, London, Mumbai, Seoul
  • Ningxia, Paris, Stockholm

Slide 5

Slide 5 text

EC2 Systems Manager
Superpowers for EC2 instances and on-premises systems.
• Remote command execution with Run Commands
• Controlled secrets and configuration data with the Parameter Store
• Periodic tasks with the State Manager and Maintenance Windows
• Stepwise Automation workflows for initializing nodes
• Collect and query Inventory and Patch status

Slide 6

Slide 6 text

Systems Manager Essentials
• The SSM agent
  • Open source (Golang) executable for Linux and Windows
  • Available for cloud and on-premises systems
  • Assign IAM role with permissions to interface with SSM API
  • Install at boot or on existing systems
  • Polls for commands to execute
• All actions recorded in CloudTrail (i.e. immutable audit trail)
• Trigger SNS, Lambda from Systems Manager events
• Store command history and output to S3
• Fine-grained access control to Run Commands
• Integration with Config to track changes over time

Slide 7

Slide 7 text

Systems Manager Documents
JSON Schema describing actions for the systems manager

    {
      "schemaVersion": "2.0",
      "description": "Run a script",
      "parameters": {
        "commands": {
          "type": "String",
          "description": "Commands to run"
        }
      },
      "mainSteps": [
        {
          "action": "aws:runShellScript",
          "name": "runShellScript",
          "inputs": {
            "runCommand": "{{ commands }}"
          }
        }
      ]
    }
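Hand-typed JSON picks up stray "smart quotes" easily, and those make a document invalid. One way around that is to build the document as a data structure and serialize it. The following is a minimal sketch under that assumption; the function names build_run_script_document and to_json are hypothetical and not part of any AWS SDK.

```python
import json


def build_run_script_document(description="Run a script"):
    """Return a dict mirroring the Systems Manager document shown on the slide."""
    return {
        "schemaVersion": "2.0",
        "description": description,
        "parameters": {
            "commands": {"type": "String", "description": "Commands to run"}
        },
        "mainSteps": [
            {
                "action": "aws:runShellScript",
                "name": "runShellScript",
                "inputs": {"runCommand": "{{ commands }}"},
            }
        ],
    }


def to_json(document):
    """Serialize the document; json.dumps only ever emits straight quotes."""
    return json.dumps(document, indent=2)


if __name__ == "__main__":
    print(to_json(build_run_script_document()))
```

The serialized text could then be supplied as the content of a new document; how it is uploaded is left out of this sketch.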

Slide 8

Slide 8 text

Quick & dirty SSM demo:
1. Scale up ASG of managed instances
2. Run a command on remote agents
3. Store and retrieve a parameter

[Diagram: Workstations, Systems Manager, and ssm-agent on EC2 Instances in a private subnet of a Virtual Private Cloud]
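The three demo steps might look roughly like this with boto3. This is a sketch, not the demo's actual code: the helper names are hypothetical, and the clients are passed in as arguments (for example `boto3.client("autoscaling")` and `boto3.client("ssm")`) so the functions can be exercised without AWS credentials. It assumes the stock AWS-RunShellScript document and a SecureString parameter.

```python
def scale_up(autoscaling, group_name, desired_capacity):
    """Step 1: raise the desired capacity of the Auto Scaling group."""
    autoscaling.set_desired_capacity(
        AutoScalingGroupName=group_name,
        DesiredCapacity=desired_capacity,
    )


def run_command(ssm, instance_ids, commands):
    """Step 2: send shell commands to managed instances; returns the command ID."""
    response = ssm.send_command(
        InstanceIds=list(instance_ids),
        DocumentName="AWS-RunShellScript",
        Parameters={"commands": list(commands)},
    )
    return response["Command"]["CommandId"]


def store_parameter(ssm, name, value):
    """Step 3a: write an encrypted value to the Parameter Store."""
    ssm.put_parameter(Name=name, Value=value, Type="SecureString", Overwrite=True)


def fetch_parameter(ssm, name):
    """Step 3b: read the value back, decrypted."""
    response = ssm.get_parameter(Name=name, WithDecryption=True)
    return response["Parameter"]["Value"]
```

Passing the clients in, rather than creating them inside each function, is a design choice for testability; the instance must run the SSM agent with a suitable IAM role for step 2 to reach it.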

Slide 9

Slide 9 text

Five Ways to Provision Instances
1. Userdata

Slide 10

Slide 10 text

Five Ways to Provision Instances
2. Bake an AMI
(Alternative: half-baked)

Slide 11

Slide 11 text

Five Ways to Provision Instances
3. Configuration management runs at boot, registers with server, converges a configuration

Slide 12

Slide 12 text

Five Ways to Provision Instances
4. Autoscaling lifecycle hook -> CloudWatch event -> Run Command -> execute provisioning documents
(Alternative: CloudWatch event -> Lambda)
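For the Lambda alternative, the function receives the lifecycle event from CloudWatch, runs a provisioning document on the new instance, and later tells the Auto Scaling group to proceed. The sketch below assumes the event carries the usual lifecycle-action fields under "detail"; the function names and the "my-provisioning-document" default are hypothetical, and the clients are passed in (for example `boto3.client("ssm")`, `boto3.client("autoscaling")`).

```python
def start_provisioning(event, ssm, document_name="my-provisioning-document"):
    """Run a provisioning document on the instance named in a lifecycle event.

    document_name is a placeholder for whatever document holds the
    provisioning steps. Returns the Run Command ID.
    """
    instance_id = event["detail"]["EC2InstanceId"]
    response = ssm.send_command(
        InstanceIds=[instance_id],
        DocumentName=document_name,
    )
    return response["Command"]["CommandId"]


def finish_lifecycle_action(event, autoscaling, succeeded=True):
    """Release the instance from its lifecycle hook once provisioning ends."""
    detail = event["detail"]
    result = "CONTINUE" if succeeded else "ABANDON"
    autoscaling.complete_lifecycle_action(
        LifecycleHookName=detail["LifecycleHookName"],
        AutoScalingGroupName=detail["AutoScalingGroupName"],
        LifecycleActionToken=detail["LifecycleActionToken"],
        LifecycleActionResult=result,
        InstanceId=detail["EC2InstanceId"],
    )
    return result
```

The two steps are kept separate because the command runs asynchronously; in practice the second call would be made only after the command is known to have finished.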

Slide 13

Slide 13 text

Five Ways to Provision Instances
5. AWS::CloudFormation::Init

Slide 14

Slide 14 text

via fbrnc.net

Slide 15

Slide 15 text

Application Load Balancer
• Almost as good as HAProxy or NGINX. Almost.
• Host- and path-based routing
• Additional metrics (# active connections, total traffic)
• Improved health checks
• Websockets
• HTTP/2
• Integration with X-Ray (adds X-Amzn-Trace-Id header)
• Integration with ECS
• Integration with WAF

Slide 16

Slide 16 text

ELBs
Problems:
• Rotating IP addresses
• Traffic surges
One solution: Route 53 health checks with autoscaling lifecycle events

Slide 17

Slide 17 text

[Diagram, two panels. Couple of clients? ELB (IP 1, IP 2): "Come at me, bro." Moar clients? ELB (IP 1, IP 2): "Feeling… weak… must… scale up."]

Slide 18

Slide 18 text

[Diagram: … moments later. ELB IP 3 and ELB IP 4 appear alongside ELB IP 1 and ELB IP 2. Captions: "I sense a disturbance in the force… as if hundreds of clients suddenly cried out in pain" and "Bueller? Bueller?"]

Slide 19

Slide 19 text

[Diagram: Clients look up the A record for myservice.example.com in Route53, which returns IP 1 and IP 2 and runs health checks. One client connects to IP 1 (Node with IP 1), another to IP 2 (Node with IP 2). Both nodes belong to an ASG.]

Slide 20

Slide 20 text

[Diagram: Moar clients! Many clients connect to IP 1 and IP 2 (Node with IP 1 and Node with IP 2 in the ASG).]
1. Autoscale via custom CloudWatch metric
2. New node boots with IP 3
3. Autoscaling lifecycle hook adds IP 3 to myservice.example.com A record
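The record update in the lifecycle-hook step can be sketched as a Route 53 change batch. This is an illustration under simplifying assumptions: a single plain A record holding every node IP, a hypothetical helper name, and an arbitrary 60-second TTL. A setup that ties per-node health checks to DNS answers would need one record set per node instead.

```python
def build_a_record_upsert(name, ips, ttl=60):
    """Build a ChangeBatch that sets the A record to the given set of IPs."""
    unique_ips = sorted(set(ips))  # drop duplicates, keep the output stable
    return {
        "Comment": "Updated by autoscaling lifecycle hook",
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": name,
                    "Type": "A",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": ip} for ip in unique_ips],
                },
            }
        ],
    }


if __name__ == "__main__":
    current = ["10.0.0.1", "10.0.0.2"]  # example addresses only
    batch = build_a_record_upsert("myservice.example.com", current + ["10.0.0.3"])
    print(batch)
```

The resulting dict would be passed as the ChangeBatch argument of the Route 53 client's change_resource_record_sets call, together with the hosted zone ID. A short TTL matters here because clients keep using cached answers until it expires.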

Slide 21

Slide 21 text

[Diagram: Clients now connect to IP 1, IP 2, and IP 3. The ASG contains Node with IP 1, Node with IP 2, and Node with IP 3.]

Slide 22

Slide 22 text

Pros: Solves problems
Cons: Causes problems

Slide 23

Slide 23 text

EC2 Container Service Updates
• ECR - best registry to use for AWS container workloads
• Support for volumes
• CloudWatch metrics for CPU and memory utilization across the cluster (set alarms for autoscaling)
• IAM roles for ECS tasks
• Blox allows custom schedulers (github.com/blox/blox)
• 3rd party tooling (Convox, Empire)
• Integration with ALB
• Run tasks on a schedule
• Execute tasks in response to CloudWatch events

Slide 24

Slide 24 text

Curated Pro Tips

Slide 25

Slide 25 text

Autoscaling groups of size one for self-healing and resilience.
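A size-one group simply pins minimum, maximum, and desired capacity to 1, so a failed instance is replaced automatically. As a rough sketch, the arguments for the Auto Scaling client's create_auto_scaling_group call could be assembled like this; the helper name and the example identifiers are hypothetical, and an existing launch configuration is assumed.

```python
def single_instance_asg(name, launch_configuration, subnet_ids):
    """Build create_auto_scaling_group arguments for a self-healing singleton."""
    return {
        "AutoScalingGroupName": name,
        "LaunchConfigurationName": launch_configuration,
        "MinSize": 1,
        "MaxSize": 1,
        "DesiredCapacity": 1,
        # Listing several subnets lets the replacement land in another AZ.
        "VPCZoneIdentifier": ",".join(subnet_ids),
        "HealthCheckType": "EC2",
    }


if __name__ == "__main__":
    print(single_instance_asg("bastion", "bastion-lc", ["subnet-aaa", "subnet-bbb"]))
```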

Slide 26

Slide 26 text

Many EC2 IAM actions do not support resource-level permissions. Exercise caution.

Slide 27

Slide 27 text

Use the BurstBalance CloudWatch metric to monitor I/O credit balance for gp2, st1, sc1 EBS volumes.
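One way to act on this metric is a CloudWatch alarm per volume. The sketch below builds arguments for the CloudWatch client's put_metric_alarm call. The helper name, the 20 percent threshold, and the three five-minute evaluation periods are illustrative assumptions, not recommendations from the talk.

```python
def burst_balance_alarm(volume_id, threshold_percent=20.0, sns_topic_arn=None):
    """Build put_metric_alarm arguments that fire when BurstBalance runs low.

    BurstBalance is reported as a percentage of the credit bucket remaining.
    """
    kwargs = {
        "AlarmName": f"ebs-burst-balance-low-{volume_id}",
        "Namespace": "AWS/EBS",
        "MetricName": "BurstBalance",
        "Dimensions": [{"Name": "VolumeId", "Value": volume_id}],
        "Statistic": "Minimum",
        "Period": 300,
        "EvaluationPeriods": 3,
        "Threshold": threshold_percent,
        "ComparisonOperator": "LessThanThreshold",
    }
    if sns_topic_arn:
        # Optional notification target; omitted when no topic is supplied.
        kwargs["AlarmActions"] = [sns_topic_arn]
    return kwargs
```

With a boto3 CloudWatch client named cloudwatch, this would be used as `cloudwatch.put_metric_alarm(**burst_balance_alarm("vol-..."))`, with a real volume ID in place of the placeholder.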

Slide 28

Slide 28 text

Network throughput increases substantially with instance type. (Don't forget to enable enhanced networking)

Slide 29

Slide 29 text

Explicitly request SSD ephemeral disks when desired. Otherwise you may not get them.

Slide 30

Slide 30 text

Require MFA for SSH access.

Slide 31

Slide 31 text

Improve your SSH experience with ControlPersist.
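ControlPersist keeps a master connection open in the background so later sessions to the same host reuse it and skip the handshake. A small sketch that renders the relevant ~/.ssh/config stanza follows; the helper name, the ControlPath pattern, and the 10-minute default are illustrative choices.

```python
def control_persist_stanza(host_pattern="*", persist="10m"):
    """Render an ssh_config stanza enabling connection sharing."""
    lines = [
        f"Host {host_pattern}",
        "    ControlMaster auto",
        # %r, %h, %p expand to remote user, host, and port.
        "    ControlPath ~/.ssh/cm-%r@%h:%p",
        f"    ControlPersist {persist}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(control_persist_stanza("*.example.com", "5m"), end="")
```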

Slide 32

Slide 32 text

Use ssh -D and the SwitchyOmega Chrome extension for convenient access to services in a private network.

Slide 33

Slide 33 text

Running multiple apps per instance? Use AssumeRole to assign granular permissions to each app.
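In practice each app calls STS AssumeRole for its own narrowly scoped role and uses the temporary credentials it gets back. A minimal sketch, assuming the instance role is allowed to assume the per-app role; the helper name is hypothetical and the STS client is passed in (for example `boto3.client("sts")`).

```python
def credentials_for_app(sts, role_arn, app_name):
    """Assume the app's role and return keyword arguments for boto3.Session."""
    response = sts.assume_role(RoleArn=role_arn, RoleSessionName=app_name)
    creds = response["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }
```

The returned dict can be unpacked into `boto3.Session(**kwargs)`, giving that app a session limited to its own role. The credentials expire, so a long-running app would need to refresh them.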

Slide 34

Slide 34 text

Protect the EC2 metadata and userdata.

Slide 35

Slide 35 text

Enable Fail2ban to block IPs with failed login attempts.

Slide 36

Slide 36 text

Use Linux >= 4.4 for best results.

Slide 37

Slide 37 text

Capture userdata output to a file.

    #!/bin/bash -x
    exec > /var/log/userdata.log 2>&1

Slide 38

Slide 38 text

Coming this August!

Slide 39

Slide 39 text

Thanks! Ben Whaley @iAmTheWhaley