Choosing Compute and Event Services for AWS Migrations

Eóin Shanaghy
@eoins

This is about…
Compute and event service decisions
Choosing between: EC2, ECS, Fargate, Lambda, EKS, …
Traits and trade-offs
Selecting from SQS, SNS, Kinesis, EventBridge
A migration story

About
Eoin Shanaghy (@eoins)
CTO, fourTheorem
fourtheorem.com/careers
aiasaservicebook.com

Decision making
1. Intuition is misleading
2. Use real-world experience of services to evaluate them
3. Start with the simplest and go from there – try them all
4. The answer may differ for every workload
5. Be open – share all the pros and cons

What is Compute?
Compute workload before migration ≠ compute workload after migration
REDUCE: frameworks, container orchestration, self-managed storage and messaging, data access ceremony
RETAIN: business rules, computation, views on data and reports

Some assumptions
We want to optimise for:
§ Simplicity
§ Fast adoption
§ Minimal care and maintenance
We prefer:
§ Pay for what you use
We don’t mind:
§ Cloud service “lock-in”
§ Services that are constantly evolving

Compute Factors
Simplicity – 1 to 5
Scalability – how high?
Scalability – how fast?
Limits
Unique features
Workload suitability
(Cultural fit and comfort level not included as a factor)

EC2
Simplicity: 1 – lots of configuration and maintenance: OS, security, storage, networking
Scalability:
1. Scales at instance level
2. Auto-scaling and alarms
3. No (theoretical) limit
4. Scaling speed depends on config (generally very fast!)
Unique features:
1. Control at OS level
2. Instance type suited to exact needs (CPU/GPU/storage/network)
Limits: soft vCPU limits
Workload suitability: the relatively rare cases that need this level of optimisation, e.g. running your own Elasticsearch cluster
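EC2 On-Demand quotas are expressed in vCPUs rather than instance counts, so how many instances you can launch depends on instance size. A minimal sketch of that arithmetic, assuming a hypothetical 64-vCPU quota (your account's real value is in Service Quotas); the helper names are illustrative only.

```python
def max_instances(vcpu_quota: int, vcpus_per_instance: int) -> int:
    """How many instances of one size fit within a vCPU-based quota."""
    if vcpus_per_instance <= 0:
        raise ValueError("vcpus_per_instance must be positive")
    return vcpu_quota // vcpus_per_instance


def remaining_vcpus(vcpu_quota: int, running_vcpus: list[int]) -> int:
    """vCPU headroom left after the instances that are already running."""
    return vcpu_quota - sum(running_vcpus)


QUOTA = 64  # hypothetical quota, not an AWS default
print(max_instances(QUOTA, 4))    # e.g. m5.xlarge (4 vCPUs): 16 instances
print(max_instances(QUOTA, 16))   # e.g. m5.4xlarge (16 vCPUs): 4 instances
print(remaining_vcpus(QUOTA, [4, 4, 16]))
```

Raising the soft limit is a quota request, so it is worth checking before a migration relies on scaling out large instance types.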

Lambda
Simplicity: 4 – on demand, but tooling and configuration are required
Scalability:
1. 1,000 concurrent executions – soft limit
2. Concurrency scaling bursts to 3,000*, then 500/minute
3. Depends on the trigger!
Unique features:
1. Integrates with everything else
2. Destinations
3. DLQ, retries
4. Reserved concurrency
5. Asynchronous workloads
6. Stateless
Limits:
1. 128MB to 3008MB RAM (1792MB = 1 vCPU)
2. 250MB code
3. 512MB /tmp storage
4. 15-minute maximum execution time
Workload suitability: almost anything. Split the workload so each invocation works on less data.
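The scaling figures above describe a burst-then-linear model: an immediate burst, then a fixed increase per minute, always capped by the account concurrency limit. A minimal sketch of that model using the numbers from this slide as defaults; they are parameters because burst size varies by region and AWS has changed these quotas over time, and the function names are hypothetical.

```python
import math


def lambda_concurrency_after(minutes: int, burst: int = 3000, per_minute: int = 500,
                             account_limit: int = 1000) -> int:
    """Concurrency available `minutes` after a sudden spike (burst-then-linear model)."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    # Immediate burst, then a fixed increase per minute, never above the account limit.
    return min(burst + per_minute * minutes, account_limit)


def minutes_to_reach(target: int, burst: int = 3000, per_minute: int = 500) -> int:
    """Whole minutes until `target` concurrency, assuming the account limit allows it."""
    if target <= burst:
        return 0
    return math.ceil((target - burst) / per_minute)


print(lambda_concurrency_after(0))                        # 1000: the soft limit bites first
print(lambda_concurrency_after(4, account_limit=10_000))  # 3000 + 4 x 500 = 5000
print(minutes_to_reach(10_000))                           # 14
```

With the default 1,000 soft limit the burst never comes into play, which is why raising that limit is usually the first step for spiky workloads.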

Fargate
Simplicity: 3 – no instances, but a cluster, service and task definition must be configured
Scalability*:
1. 100 concurrent tasks – soft limit
2. Bursts to 10 containers in the first second!
3. Then roughly 1 container every ~3 seconds
Unique features:
1. No servers
2. Supports Spot
Limits:
1. 4 vCPU
2. 30GB RAM
3. 20GB ephemeral storage
Workload suitability: container-based workloads requiring dynamic scaling
* https://www.vladionescu.me/posts/scaling-containers-in-aws.html
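To compare this with Lambda's burst, the Fargate figures can be turned into a rough time-to-capacity estimate. A minimal sketch, assuming the rates quoted above hold exactly (they are measurements from the linked article, not guarantees) and that scaling starts from zero tasks; the function name is hypothetical.

```python
def fargate_seconds_to_reach(tasks: int, initial_burst: int = 10,
                             seconds_per_task: float = 3.0, task_limit: int = 100) -> float:
    """Approximate seconds to launch `tasks` Fargate tasks from zero."""
    if tasks > task_limit:
        raise ValueError(f"{tasks} tasks exceeds the {task_limit}-task limit")
    if tasks <= 0:
        return 0.0
    if tasks <= initial_burst:
        return 1.0  # the initial burst arrives within the first second
    # After the burst, roughly one more task every `seconds_per_task` seconds.
    return 1.0 + (tasks - initial_burst) * seconds_per_task


print(fargate_seconds_to_reach(10))   # 1.0
print(fargate_seconds_to_reach(100))  # 271.0 seconds, about 4.5 minutes
```

Reaching the full 100 tasks takes minutes rather than seconds, which matters if the workload arrives as a sudden nightly spike.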

ECS
Simplicity: 2 – instances required
Scalability:
1. 2000 containers per cluster (soft limit)
2. Speed depends on number and size of instances
Unique features: custom AMI
Limits: 2000 containers per cluster (soft limit)
Workload suitability: container workloads requiring EC2 instance control

Also... AWS Batch
Simplicity: 3 – managed EC2
Unique features:
1. Job scheduler
2. Priority levels
Limits: 20 dependencies per job
Workload suitability: compute-intensive batch processing, high-performance computing
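Because an AWS Batch job's dependencies are referenced by the IDs of already-submitted jobs, dependent jobs have to be submitted in dependency order. A minimal sketch that checks the 20-dependency limit and computes a valid submission order with Python's `graphlib`; the job names and the `submission_order` helper are hypothetical, and the actual submission calls are left out.

```python
from graphlib import TopologicalSorter

# Per-job dependency limit quoted on the AWS Batch slide.
MAX_DEPENDENCIES = 20


def submission_order(jobs: dict[str, list[str]]) -> list[str]:
    """Return job names so that every job comes after the jobs it depends on.

    Raises ValueError if a job exceeds the dependency limit, and
    graphlib.CycleError (a ValueError subclass) if the graph has a cycle.
    """
    for name, deps in jobs.items():
        if len(deps) > MAX_DEPENDENCIES:
            raise ValueError(
                f"{name} has {len(deps)} dependencies; the limit is {MAX_DEPENDENCIES}"
            )
    # TopologicalSorter takes a mapping of node -> predecessors.
    return list(TopologicalSorter(jobs).static_order())


# Hypothetical nightly pipeline: load inputs, run two model stages, then report.
pipeline = {
    "load-inputs": [],
    "price-models": ["load-inputs"],
    "risk-models": ["load-inputs"],
    "report": ["price-models", "risk-models"],
}
print(submission_order(pipeline))
```

A job that needs more than 20 upstream jobs can usually be restructured, for example by adding an intermediate job that depends on a subset of them.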

EKS
Simplicity: 1/2 – Kubernetes plus EC2 or Fargate
Scalability:
1. 2000 containers per cluster (soft limit)
2. Speed depends on number and size of instances; scale-up on EC2 is fast
Unique features: Kubernetes
Limits:
1. 2000 containers per cluster (soft limit)
2. Fixed cost per cluster
Workload suitability: Kubernetes fans

CodeBuild
Simplicity: 4.5
Scalability: up to 20-60 concurrent ‘builds’
Unique features:
1. Least amount of configuration for a container workload
2. Caching
3. Configurable language runtimes
Limits: max 145 GB memory, 72 vCPUs
Workload suitability: ad-hoc, low-volume scripting tasks requiring a container

Event Factors
Simplicity – 1 to 5
Capacity
Sending and delivery methods
Limits (message size)
Latency
Delivery guarantees

SQS
Simplicity: 3.5 – some tuning of visibility timeouts
Type: queue
Features:
1. FIFO
2. DLQs
Delivery guarantees:
1. At-least-once (exactly-once processing for FIFO)
2. Best-effort ordering for non-FIFO queues
Limits:
1. 256KB per message
2. SendMessageBatch – maximum 10 messages
3. Very slow to scale with Lambda – adds up to 60 concurrent instances per minute
4. Otherwise “infinitely” scalable
Suitability: general use
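Sending in batches cuts the number of API calls, but each SendMessageBatch call is bounded by the 10-entry limit and by the 256KB size limit, which SQS applies to the total payload of the batch as well as to each message. A minimal sketch that groups message bodies into valid batch entries; it counts only body bytes (message attributes also count toward the real limit), and `build_batches` is a hypothetical helper.

```python
# Limits quoted on the SQS slide.
MAX_BATCH_ENTRIES = 10        # SendMessageBatch accepts at most 10 entries
MAX_BATCH_BYTES = 256 * 1024  # 256KB, applied here to the batch's total body size


def build_batches(bodies: list[str]) -> list[list[dict]]:
    """Group message bodies into SendMessageBatch-sized lists of entries."""
    batches: list[list[dict]] = []
    current: list[dict] = []
    current_bytes = 0
    for i, body in enumerate(bodies):
        size = len(body.encode("utf-8"))
        if size > MAX_BATCH_BYTES:
            raise ValueError(f"message {i} is {size} bytes, over the {MAX_BATCH_BYTES}-byte limit")
        # Start a new batch when this one is full by count or would exceed the byte budget.
        if current and (len(current) == MAX_BATCH_ENTRIES
                        or current_bytes + size > MAX_BATCH_BYTES):
            batches.append(current)
            current, current_bytes = [], 0
        # Id only has to be unique within a batch; the input index is a simple choice.
        current.append({"Id": str(i), "MessageBody": body})
        current_bytes += size
    if current:
        batches.append(current)
    return batches


print([len(b) for b in build_batches([f"msg-{i}" for i in range(25)])])
```

With boto3, each batch would be passed as `Entries` to `send_message_batch(QueueUrl=..., Entries=batch)`. Check the `Failed` list in every response, because individual entries can fail even when the call itself succeeds.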

EventBridge
Simplicity: 5 – zero configuration, no infrastructure to provision
Type: pub/sub
Features:
1. Pattern-matching rules for targets
2. Third-party SaaS providers
Delivery guarantees: at-least-once
Limits:
1. Performance (latency) is variable – 500ms to 100s!
2. 300 rules, 5 targets per rule
Suitability: general-purpose pub/sub with zero configuration
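EventBridge rules route events by matching them against JSON patterns. A simplified sketch of the core matching idea: every field named in the pattern must be present, leaf values are lists of allowed values, nested objects are matched recursively, and fields the pattern does not mention are ignored. This is not the full EventBridge semantics (no prefix, numeric or anything-but matching), and the source and detail-type names are hypothetical.

```python
from typing import Any


def matches(pattern: dict[str, Any], event: dict[str, Any]) -> bool:
    """Simplified EventBridge-style matching: exact values and nested fields only."""
    for key, expected in pattern.items():
        if key not in event:
            return False  # every field named in the pattern must be present
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the corresponding part of the event.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        else:
            # Leaf: a list of allowed values. If the event field is itself a list,
            # it matches when any of its elements is allowed.
            candidates = actual if isinstance(actual, list) else [actual]
            if not any(value in expected for value in candidates):
                return False
    return True  # fields not named in the pattern are ignored


rule_pattern = {
    "source": ["com.example.models"],
    "detail-type": ["ModelRunCompleted"],
    "detail": {"status": ["SUCCEEDED", "PARTIAL"]},
}
event = {
    "source": "com.example.models",
    "detail-type": "ModelRunCompleted",
    "detail": {"status": "SUCCEEDED", "runId": "r-42"},
}
print(matches(rule_pattern, event))
```

Because routing lives in the rule rather than in consumer code, adding a new target for the same events needs no change to the publisher.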

SNS
Simplicity: 3.5
Type: pub/sub
Features:
1. Email and SMS integration
2. Topic and subscription resources
3. DLQs
Delivery guarantees: combine with SQS to guarantee delivery (at-least-once)
Limits:
1. Publish only one message at a time
2. 256KB per message
Suitability:
1. Lower-latency pub/sub
2. Many subscribers
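Combining SNS with SQS gives each subscriber its own durable copy of every message, so a slow consumer does not hold back the others. A toy in-memory model of that fan-out shape, purely illustrative: `ToyTopic` is hypothetical and has none of SNS's delivery, retry or filtering behaviour.

```python
from collections import deque


class ToyTopic:
    """In-memory stand-in for an SNS topic whose subscribers are SQS queues."""

    def __init__(self) -> None:
        self.queues: dict[str, deque] = {}

    def subscribe(self, name: str) -> deque:
        # Each subscriber gets its own queue, so consumers never compete for messages.
        queue: deque = deque()
        self.queues[name] = queue
        return queue

    def publish(self, message: str) -> None:
        # One message per publish call; every subscribed queue receives a copy.
        for queue in self.queues.values():
            queue.append(message)


topic = ToyTopic()
pricing = topic.subscribe("pricing")  # hypothetical consumers
audit = topic.subscribe("audit")
topic.publish("run-1-started")
topic.publish("run-1-finished")
print(len(pricing), len(audit))
```

In AWS, the equivalent wiring is one `subscribe(TopicArn=..., Protocol="sqs", Endpoint=queue_arn)` call per queue, plus a queue access policy that allows the topic to send to it.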

Kinesis Data Streams
Simplicity: 2 – requires calculated provisioning
Type: streaming
Features:
1. Large volume
2. Shards must be provisioned for required capacity
3. Integration with Kinesis Analytics, Firehose
Delivery guarantees:
1. Consumers get guaranteed ordering per shard
2. Events can be re-read – up to 7 days
Limits:
1. Write 1MB or 1,000 records per second per shard
2. Read 2MB per second per shard
3. Put 500 records at a time
4. Get up to 10,000 records at a time
5. Re-sharding limits apply
Suitability:
1. Anything with large volumes: streaming analytics, clickstreams
2. Small number of consumers

Kinesis integration with Lambda
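The "calculated provisioning" for Kinesis Data Streams comes from the per-shard limits listed on the Kinesis slide: enough shards for write bandwidth, for write record count, and for read bandwidth. A minimal sketch, assuming every consumer is a standard (shared-throughput) consumer that reads all data; enhanced fan-out consumers get their own 2MB/s per shard and should not be counted. No headroom for spikes is included, and the function name is hypothetical.

```python
import math

# Per-shard limits quoted on the Kinesis Data Streams slide.
WRITE_MB_PER_SHARD = 1.0
WRITE_RECORDS_PER_SHARD = 1000
READ_MB_PER_SHARD = 2.0


def shards_needed(write_mb_per_s: float, records_per_s: float, consumers: int = 1) -> int:
    """Minimum shard count for a write load and a number of shared-throughput consumers."""
    # Standard consumers each read every record and share the per-shard read limit.
    read_mb_per_s = write_mb_per_s * consumers
    return max(
        1,
        math.ceil(write_mb_per_s / WRITE_MB_PER_SHARD),
        math.ceil(records_per_s / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_mb_per_s / READ_MB_PER_SHARD),
    )


print(shards_needed(3, 2500))               # 3: write-bound
print(shards_needed(3, 2500, consumers=4))  # 6: read-bound
print(shards_needed(0.5, 5000))             # 5: bound by record count
```

The second case shows why Kinesis suits a small number of consumers: each extra standard consumer raises the read requirement and therefore the shard count.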

Customer story
o On-premises compute cluster migration
o Critical daily processing of models, in C++ and Python
o Nightly batch processing of financial models
o The win came from scalability (parallelising the workload) and efficient orchestration (Lambda and a Redis scheduler)
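The win in this story came from parallelising the workload. A minimal sketch of the split, fan-out and combine shape, assuming the work can be divided into independent units; `run_unit` is a hypothetical stand-in for one piece of model computation, and the thread pool stands in for many Lambda invocations or Fargate tasks. Because of Python's GIL, threads will not speed up CPU-bound work locally; the point is the structure, not local performance.

```python
from concurrent.futures import ThreadPoolExecutor


def split(items: list[int], unit_size: int) -> list[list[int]]:
    """Divide the workload into small, independent units."""
    return [items[i:i + unit_size] for i in range(0, len(items), unit_size)]


def run_unit(unit: list[int]) -> int:
    # Hypothetical stand-in for one small piece of model computation.
    return sum(x * x for x in unit)


def run_parallel(items: list[int], unit_size: int = 100, workers: int = 8) -> int:
    units = split(items, unit_size)
    # The pool plays the role of many concurrent cloud workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(run_unit, units))
    # Combine the partial results once every unit has finished.
    return sum(partial_results)


workload = list(range(1000))
print(len(split(workload, 100)), run_parallel(workload))
```

Smaller units fit within per-invocation limits such as Lambda's memory and execution time, and let the platform scale out instead of up.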

Customer story: iterations
Compute: Batch, Fargate, Lambda -> Fargate + Lambda
Events: SQS, EventBridge, Kinesis -> SQS + Kinesis
Processing time: 90 minutes -> 15 minutes
Code: Python with some C++ -> Python
Data: monolithic data in an RDBMS -> small data units, S3 for data

Customer story: evolved state
ElastiCache Redis, DynamoDB, Fargate, Lambda, Step Functions, SQS, Kinesis Data Streams, S3

Recap
1. Intuition is misleading
2. Use real-world experience of services to evaluate them
3. Start with the simplest – try them all
4. Serverless (simplest) first means more agility
5. Divide workloads into small units. Run anywhere at scale.

Thank You
@eoins
fourtheorem.com/careers