Ben Whaley | @iAmTheWhaley
… And You Thought You Knew EC2
Slide 2
Slide 2 text
Instance Types
Family  Intended use
C5      Compute-optimized with Intel Skylake*
R4      Memory-optimized, up to 488GiB**
I3      High I/O - Watch for sharp corners
F1      FPGA
P2      GPU - 2x perf of G2 at 1.5x the cost
D2      Dense Storage, up to 48TiB HDD & 10Gbps network
* Coming soon?
** Only 244GiB currently
Slide 3
Slide 3 text
4 EBS Volume Types
Type  Description               I/O      Throughput  Cost
io1   Provisioned IOPS SSD      Highest  High        Highest
gp2   General purpose SSD       High     Low(ish)    Medium
st1   Throughput-optimized HDD  Low      Highest     Low
sc1   Non-optimized HDD         Low      Medium      Lowest
Slide 4
Slide 4 text
Miscellaneous improvements & features
• IPv6!
• VPC endpoints for DynamoDB, S3
• Lightsail - EC2 quick start
• Elastic GPUs
• New regions
• 2016 - Ohio, Canada, London, Mumbai, Seoul
• Ningxia, Paris, Stockholm
Slide 5
Slide 5 text
EC2 Systems Manager
Superpowers for EC2 instances and on-premises systems.
• Remote command execution with Run Command
• Controlled secrets and configuration data with the Parameter Store
• Periodic tasks with the State Manager and Maintenance Windows
• Stepwise Automation workflows for initializing nodes
• Collect and query Inventory and Patch status
Slide 6
Slide 6 text
Systems Manager Essentials
• The SSM agent
• Open source (Golang) executable for Linux and Windows
• Available for cloud and on-premises systems
• Assign IAM role with permissions to interface with SSM API
• Install at boot or on existing systems
• Polls for commands to execute
• All actions recorded in CloudTrail (i.e. an immutable audit trail)
• Trigger SNS, Lambda from Systems Manager events
• Store command history and output to S3
• Fine-grained access control to Run Command
• Integration with Config to track changes over time
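As a rough sketch of remote command execution, the AWS CLI can send the stock AWS-RunShellScript document to every managed instance carrying a given tag. The helper name run_on_tagged and the tag key and value are hypothetical; the sketch assumes a configured AWS CLI and instances that already run the agent with a suitable IAM role.

```shell
# Sketch: run one shell command on all managed instances with a given tag.
# run_on_tagged is a hypothetical helper name, not part of the AWS CLI.
run_on_tagged() {
  local tag_key="$1" tag_value="$2" command="$3"
  # Shorthand syntax is used for --parameters; commands containing commas
  # or quotes are safer passed in the CLI's JSON form instead.
  aws ssm send-command \
    --document-name "AWS-RunShellScript" \
    --targets "Key=tag:${tag_key},Values=${tag_value}" \
    --parameters "commands=${command}" \
    --comment "ad hoc command via Run Command" \
    --query "Command.CommandId" \
    --output text
}

# Example (not executed here): run_on_tagged Role web uptime
```

The function prints the command ID, which can then be used to look up per-instance output.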
Slide 7
Slide 7 text
Systems Manager Documents
JSON schema describing actions for Systems Manager
{
  "schemaVersion":"2.0",
  "description":"Run a script",
  "parameters":{
    "commands":{
      "type":"String",
      "description":"Commands to run"
    }
  },
  "mainSteps":[
    {
      "action":"aws:runShellScript",
      "name":"runShellScript",
      "inputs":{
        "runCommand":"{{ commands }}"
      }
    }
  ]
}
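A document like the one above has to be registered before Run Command can use it. A minimal sketch with the AWS CLI follows; the helper name register_document, the document name, and the file name in the example are placeholders, and the JSON is assumed to be saved locally.

```shell
# Sketch: register a custom Command document from a local JSON file.
# register_document is a hypothetical helper name.
register_document() {
  local name="$1" json_file="$2"
  aws ssm create-document \
    --name "$name" \
    --document-type "Command" \
    --content "file://${json_file}"
}

# Example (not executed here): register_document RunAScript run-script.json
```

Once registered, the document name can be passed to send-command in place of a stock document.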
Slide 8
Slide 8 text
Quick & dirty SSM demo:
1. Scale up ASG of managed instances
2. Run a command on remote agents
3. Store and retrieve a parameter
[Diagram: Workstations, Systems Manager, and EC2 Instances running ssm-agent in a Private subnet of a Virtual Private Cloud]
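Step 3 of the demo might look roughly like this with the AWS CLI. The helper names store_secret and fetch_secret and the parameter name in the example are hypothetical; SecureString is one of several parameter types and is an assumption here.

```shell
# Sketch: write an encrypted value to the Parameter Store and read it back.
# store_secret and fetch_secret are hypothetical helper names.
store_secret() {
  local name="$1" value="$2"
  aws ssm put-parameter \
    --name "$name" \
    --value "$value" \
    --type "SecureString" \
    --overwrite
}

fetch_secret() {
  local name="$1"
  aws ssm get-parameter \
    --name "$name" \
    --with-decryption \
    --query "Parameter.Value" \
    --output text
}

# Example (not executed here):
#   store_secret /demo/db-password 's3cr3t'
#   fetch_secret /demo/db-password
```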
Slide 9
Slide 9 text
Five Ways to Provision Instances
1. Userdata
Slide 10
Slide 10 text
Five Ways to Provision Instances
2. Bake an AMI
(Alternative: half-baked)
Slide 11
Slide 11 text
Five Ways to Provision Instances
3. Configuration management runs at boot, registers with server, converges a configuration
Slide 12
Slide 12 text
Five Ways to Provision Instances
4. Autoscaling lifecycle hook -> CloudWatch event -> Run Command -> execute provisioning documents
(Alternative: CloudWatch event -> Lambda)
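The two ends of that chain can be sketched with the AWS CLI: a launch hook that holds new instances in a pending state, and the call that releases an instance once provisioning has finished. The helper names, the 600-second timeout, and the ABANDON default are assumptions for illustration; the CloudWatch event rule and Run Command target in the middle are not shown.

```shell
# Sketch: pause newly launched instances until provisioning reports back.
# add_launch_hook and finish_launch are hypothetical helper names.
add_launch_hook() {
  local asg="$1" hook="$2"
  aws autoscaling put-lifecycle-hook \
    --auto-scaling-group-name "$asg" \
    --lifecycle-hook-name "$hook" \
    --lifecycle-transition "autoscaling:EC2_INSTANCE_LAUNCHING" \
    --heartbeat-timeout 600 \
    --default-result "ABANDON"
}

# Called at the end of provisioning to let the instance enter service.
finish_launch() {
  local asg="$1" hook="$2" instance_id="$3"
  aws autoscaling complete-lifecycle-action \
    --auto-scaling-group-name "$asg" \
    --lifecycle-hook-name "$hook" \
    --instance-id "$instance_id" \
    --lifecycle-action-result "CONTINUE"
}
```

With ABANDON as the default, an instance whose provisioning never completes is terminated rather than put into service.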
Slide 13
Slide 13 text
Five Ways to Provision Instances
5. AWS::CloudFormation::Init
Slide 14
Slide 14 text
via fbrnc.net
Slide 15
Slide 15 text
Application Load Balancer
• Almost as good as HAProxy or NGINX. Almost.
• Host- and path-based routing
• Additional metrics (# active connections, total traffic)
• Improved health checks
• Websockets
• HTTP/2
• Integration with X-Ray (adds X-Amzn-Trace-Id header)
• Integration with ECS
• Integration with WAF
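Path-based routing, for instance, comes down to adding a rule to an existing listener. A minimal sketch with the AWS CLI; add_path_rule is a hypothetical helper name, and the ARNs, pattern, and priority are supplied by the caller.

```shell
# Sketch: forward requests matching a path pattern to a target group.
# add_path_rule is a hypothetical helper name.
add_path_rule() {
  local listener_arn="$1" target_group_arn="$2" pattern="$3" priority="$4"
  aws elbv2 create-rule \
    --listener-arn "$listener_arn" \
    --priority "$priority" \
    --conditions "Field=path-pattern,Values=${pattern}" \
    --actions "Type=forward,TargetGroupArn=${target_group_arn}"
}

# Example (not executed here):
#   add_path_rule "$LISTENER_ARN" "$API_TARGET_GROUP_ARN" '/api/*' 10
```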
Slide 16
Slide 16 text
ELBs
Problems:
• Rotating IP addresses
• Traffic surges
One solution:
Route53 health checks with autoscaling lifecycle events
Slide 17
Slide 17 text
[Diagram: two scenes, each with clients connecting to ELB IP 1 and ELB IP 2. Moar clients? "Feeling… weak… must… scale up." Couple of clients? "Come at me, bro."]
Slide 18
Slide 18 text
[Diagram, "… moments later": ELB IP 3 and ELB IP 4 appear alongside ELB IP 1 and ELB IP 2. Captions: "I sense a disturbance in the force… as if hundreds of clients suddenly cried out in pain" and "Bueller? Bueller?"]
Slide 19
Slide 19 text
[Diagram: clients look up the A record for myservice.example.com; Route53 answers with IP 1 and IP 2. One client connects to the node with IP 1, another to the node with IP 2. The nodes sit in an ASG and are covered by health checks.]
Slide 20
Slide 20 text
[Diagram: Moar clients! Many clients connect to IP 1 and IP 2, the two nodes in the ASG.]
1. Autoscale via custom CloudWatch metric
2. New node boots with IP 3
3. Autoscaling lifecycle hook adds IP 3 to myservice.example.com A record
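Step 3 could be handled by rewriting the A record with the full set of node IPs each time the group changes. The sketch below builds the Route53 change batch in plain shell and submits it with the AWS CLI. The helper names, the TTL, and the choice of a single multi-value record are assumptions; the slides do not say how the records were actually laid out.

```shell
# Sketch: build a change batch that sets an A record to the given IPs.
# a_record_change_batch and publish_a_record are hypothetical helper names.
a_record_change_batch() {
  local name="$1" ttl="$2"
  shift 2
  local records="" ip
  for ip in "$@"; do
    # Append each IP as a resource record, comma-separated after the first.
    records="${records:+${records},}{\"Value\":\"${ip}\"}"
  done
  printf '{"Changes":[{"Action":"UPSERT","ResourceRecordSet":{"Name":"%s","Type":"A","TTL":%s,"ResourceRecords":[%s]}}]}\n' \
    "$name" "$ttl" "$records"
}

# Usage: publish_a_record ZONE_ID NAME TTL IP [IP...]
publish_a_record() {
  local zone_id="$1"
  shift
  aws route53 change-resource-record-sets \
    --hosted-zone-id "$zone_id" \
    --change-batch "$(a_record_change_batch "$@")"
}

# Example (not executed here):
#   publish_a_record Z123EXAMPLE myservice.example.com 30 10.0.0.1 10.0.0.2 10.0.0.3
```

A short TTL keeps clients from holding on to addresses of nodes that have since left the group.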
Slide 21
Slide 21 text
[Diagram: clients now connect to IP 1, IP 2, and IP 3, each served by its own node in the ASG.]
Slide 22
Slide 22 text
Pros: Solves problems
Cons: Causes problems
Slide 23
Slide 23 text
EC2 Container Service Updates
• ECR - best registry to use for AWS container workloads
• Support for volumes
• CloudWatch metrics for CPU and memory utilization
across the cluster (set alarms for autoscaling)
• IAM roles for ECS tasks
• Blox allows custom schedulers (github.com/blox/blox)
• 3rd party tooling (Convox, Empire)
• Integration with ALB
• Run tasks on a schedule
• Execute tasks in response to CloudWatch events
Slide 24
Slide 24 text
Curated Pro Tips
Slide 25
Slide 25 text
Autoscaling groups of size one for
self-healing and resilience.
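A group of size one is simply an Auto Scaling group with minimum, maximum, and desired capacity all set to 1, so a failed instance is replaced automatically. A minimal sketch; create_singleton_asg is a hypothetical helper name, and an existing launch configuration and subnet list are assumed.

```shell
# Sketch: an Auto Scaling group that always keeps exactly one instance alive.
# create_singleton_asg is a hypothetical helper name.
create_singleton_asg() {
  local asg="$1" launch_config="$2" subnets="$3"
  aws autoscaling create-auto-scaling-group \
    --auto-scaling-group-name "$asg" \
    --launch-configuration-name "$launch_config" \
    --min-size 1 --max-size 1 --desired-capacity 1 \
    --vpc-zone-identifier "$subnets"
}

# Example (not executed here):
#   create_singleton_asg bastion bastion-lc "subnet-aaaa,subnet-bbbb"
```

Listing subnets in more than one availability zone lets the replacement launch elsewhere if a zone is unavailable.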
Slide 26
Slide 26 text
Many EC2 IAM actions do not support
resource-level permissions. Exercise caution.
Slide 27
Slide 27 text
Use the BurstBalance CloudWatch metric to
monitor I/O credit balance for gp2, st1, sc1
EBS volumes.
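One way to act on this metric is a CloudWatch alarm per volume. In the sketch below the helper name, the 20 percent threshold, the 5-minute period, and the three evaluation periods are all assumptions to adjust; BurstBalance is reported as a percentage in the AWS/EBS namespace.

```shell
# Sketch: alarm when a volume's burst credit balance stays low.
# alarm_on_low_burst_balance is a hypothetical helper name.
alarm_on_low_burst_balance() {
  local volume_id="$1" sns_topic_arn="$2"
  aws cloudwatch put-metric-alarm \
    --alarm-name "low-burst-balance-${volume_id}" \
    --namespace "AWS/EBS" \
    --metric-name "BurstBalance" \
    --dimensions "Name=VolumeId,Value=${volume_id}" \
    --statistic Minimum \
    --period 300 \
    --evaluation-periods 3 \
    --threshold 20 \
    --comparison-operator LessThanThreshold \
    --alarm-actions "$sns_topic_arn"
}

# Example (not executed here):
#   alarm_on_low_burst_balance vol-0123456789abcdef0 "$ALERTS_TOPIC_ARN"
```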
Slide 28
Slide 28 text
Network throughput increases substantially
with instance type.
(Don't forget to enable enhanced networking)
Slide 29
Slide 29 text
Explicitly request SSD ephemeral disks when
desired. Otherwise you may not get them.
Slide 30
Slide 30 text
Require MFA for SSH access.
Slide 31
Slide 31 text
Improve your SSH experience
with ControlPersist.
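ControlPersist keeps a master connection open in the background so later sessions to the same host reuse it instead of authenticating again. A small sketch using command-line options; the wrapper name ssh_persist, the socket path, and the 10-minute lifetime are assumptions, and the same three options can live in ~/.ssh/config instead.

```shell
# Sketch: ssh wrapper that shares one persistent connection per host.
# ssh_persist is a hypothetical helper name.
ssh_persist() {
  ssh -o ControlMaster=auto \
      -o ControlPath="${HOME}/.ssh/cm-%r@%h:%p" \
      -o ControlPersist=10m \
      "$@"
}

# Example (not executed here): ssh_persist admin@bastion.example.com
```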
Slide 32
Slide 32 text
Use ssh -D and the SwitchyOmega
Chrome extension for convenient
access to services in a private network.
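The ssh side of this is a dynamic (SOCKS) forward through a host that can reach the private network. A minimal sketch; open_socks_proxy is a hypothetical helper name and port 1080 is an assumed default that the browser extension would then be pointed at.

```shell
# Sketch: open a local SOCKS proxy that tunnels through a bastion host.
# open_socks_proxy is a hypothetical helper name.
open_socks_proxy() {
  local bastion="$1" port="${2:-1080}"
  # -N: no remote command, forwarding only. -D: dynamic port forwarding.
  ssh -N -D "$port" "$bastion"
}

# Example (not executed here): open_socks_proxy admin@bastion.example.com
```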
Slide 33
Slide 33 text
Running multiple apps per instance?
Use AssumeRole to assign granular
permissions to each app.
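One possible shape for this: the instance role is allowed only to assume narrower per-app roles, and each app is started with temporary credentials for its own role. The wrapper below is a sketch under that assumption; with_role, the role ARN, and the session name are placeholders.

```shell
# Sketch: run a command with temporary credentials for a specific role.
# with_role is a hypothetical helper name.
with_role() {
  local role_arn="$1" session="$2"
  shift 2
  local creds key secret token
  # Text output yields the three values on one tab-separated line.
  creds=$(aws sts assume-role \
    --role-arn "$role_arn" \
    --role-session-name "$session" \
    --query "Credentials.[AccessKeyId,SecretAccessKey,SessionToken]" \
    --output text) || return 1
  read -r key secret token <<EOF
$creds
EOF
  # The credentials are set only in the environment of the launched command.
  AWS_ACCESS_KEY_ID="$key" \
  AWS_SECRET_ACCESS_KEY="$secret" \
  AWS_SESSION_TOKEN="$token" \
    "$@"
}

# Example (not executed here):
#   with_role arn:aws:iam::123456789012:role/app1 app1 ./start-app1
```

Temporary credentials expire, so a long-running app would need them refreshed; that part is not covered here.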
Slide 34
Slide 34 text
Protect the EC2 metadata and userdata.
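One common approach on Linux, offered here as an assumption rather than the speaker's specific recommendation, is a firewall rule that lets only root reach the metadata address. Note that this also stops non-root processes from fetching instance role credentials, so it does not suit every setup. restrict_metadata_to_root is a hypothetical helper name and must run as root.

```shell
# Sketch: drop traffic to the instance metadata service from non-root users.
# restrict_metadata_to_root is a hypothetical helper name.
restrict_metadata_to_root() {
  iptables -A OUTPUT -d 169.254.169.254 \
    -m owner ! --uid-owner 0 \
    -j DROP
}

# Example (not executed here, requires root): restrict_metadata_to_root
```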
Slide 35
Slide 35 text
Enable Fail2ban to block IPs
with failed login attempts.
Slide 36
Slide 36 text
Use Linux >= 4.4 for best results.
Slide 37
Slide 37 text
Capture userdata output to a file.
#!/bin/bash -x
exec > /var/log/userdata.log 2>&1
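To see what the exec line does, the sketch below applies the same redirect inside a subshell and writes to a temporary file instead of /var/log/userdata.log. The temporary file is a stand-in chosen for illustration; everything printed after the exec, on stdout or stderr, ends up in the log.

```shell
# Sketch: the same redirect, pointed at a temporary file for demonstration.
log_file="$(mktemp)"   # stand-in for /var/log/userdata.log
(
  # From here on, stdout and stderr of this subshell go to the log file.
  exec > "$log_file" 2>&1
  echo "installing packages"
  echo "a warning on stderr" >&2
  echo "done"
)
# Nothing was printed to the terminal; the log holds all three lines.
```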