Choosing Compute and Event Services for AWS Migrations

A cheat-sheet-style presentation with tactics and data to help you choose services when migrating workloads to AWS. It guides you towards the simplest, most effective, most serverless outcome.

Presented at the AWS Summer School organised by AWS User Groups Dublin on July 28th 2020.

Eoin Shanaghy

July 28, 2020

Transcript

  1. This is about…

    Compute and event service decisions
    Choosing between: EC2, ECS, Fargate, Lambda, EKS, …
    Traits and trade-offs
    Selecting from SQS, SNS, Kinesis, EventBridge
    A migration story
  2. Decision making

    1. Intuition is misleading
    2. Use real-world experience of services to evaluate
    3. Start with the simplest and go from there – try them all
    4. The answer may differ for every workload
    5. Be open – share all the pros and cons
  3. What is Compute?

    Compute workload before migration ≠ compute workload after migration
    [Diagram. Labels: Frameworks; Container orchestration; Self-managed storage and messaging; Data access ceremony; Business rules; Computation; Views on data and reports. Annotations: REDUCE, RETAIN.]
  4. Some assumptions

    We want to optimise for: simplicity; fast adoption; minimal care and maintenance
    We prefer: pay for what you use
    We don’t mind: cloud service “lock-in”; constantly evolving
  5. Compute Factors

    Simplicity: 1 to 5 (5 is simplest)
    Scalability: how high?
    Scalability: how fast?
    Limits
    Unique features
    Workload suitability
    Cultural fit and comfort level are not included as factors.
  6. EC2

    Simplicity: 1. Lots of configuration and maintenance: OS, security, storage, networking.
    Scalability: scales at instance level; auto-scaling and alarms; no (theoretical) limit; scaling speed depends on config (generally very fast!)
    Unique features: control at OS level; instance type suited to exact needs (CPU/GPU/storage/network)
    Limits: soft vCPU limits
    Workload suitability: workloads needing a relatively rare level of optimisation; running your own Elasticsearch cluster
  7. Lambda

    Simplicity: 4. On demand. Tooling and configuration required.
    Scalability: 1,000 concurrency (soft limit); concurrency scaling bursts to 3,000*, then 500/minute; depends on the trigger!
    Unique features: integrates with everything else; destinations; DLQ, retries; reserved concurrency; asynchronous workloads; stateless
    Limits: 128MB to 3008MB RAM (1792MB = 1 vCPU); 250MB code; 512MB storage; 15 minute execution
    Workload suitability: almost anything. Split the workload to work on less data.
  8. Fargate

    Simplicity: 3. No instances, but cluster, service and task definition must be configured.
    Scalability: 100 concurrency (soft limit); bursts to 10 containers in the first second!; 1 container every ~3 seconds after
    Unique features: no servers; supports Spot
    Limits: 4 vCPU; 30GB RAM; 20GB ephemeral storage
    Workload suitability: container-based workload requiring dynamic scaling
    * https://www.vladionescu.me/posts/scaling-containers-in-aws.html
  9. ECS

    Simplicity: 2. Instances required.
    Scalability: 2000 containers per cluster (soft limit); speed depends on number and size of instances
    Unique features: custom AMI
    Limits: 2000 containers per cluster (soft limit)
    Workload suitability: container workloads requiring EC2 instance control

    also... AWS Batch

    Simplicity: 3. Managed EC2.
    Unique features: job scheduler; priority levels
    Limits: 20 dependencies
    Workload suitability: compute-intensive batch process, high-performance computing
  10. EKS

    Simplicity: 1/2. Kubernetes + EC2 or Fargate.
    Scalability: 2000 containers per cluster (soft limit); speed depends on number and size of instances; scale-up on EC2 is fast
    Unique features: Kubernetes
    Limits: 2000 containers per cluster (soft limit); fixed cost for a cluster
    Workload suitability: Kubernetes fans
  11. CodeBuild

    Simplicity: 4.5
    Scalability: up to 20-60 concurrent ‘builds’
    Unique features: least amount of configuration for a container workload; caching; configurable language runtimes
    Limits: max 145 GB memory, 72 vCPUs
    Workload suitability: ad-hoc, low-volume scripting tasks requiring a container
  12. Event Factors

    Simplicity: 1 to 5 (5 is simplest)
    Capacity
    Sending and delivery methods
    Limits (message size)
    Latency
    Delivery guarantees
  13. SQS

    Simplicity: 3.5. Some tuning of visibility timeouts.
    Type: queue
    Features: FIFO; DLQs
    Delivery guarantees: at-least-once (exactly once for FIFO); best-effort ordering for non-FIFO queues
    Limits: 256KB in a message; SendMessageBatch takes a maximum of 10 messages; very slow to scale with Lambda (60 concurrent instances per minute); otherwise “infinitely” scalable
    Suitability: general use
  14. EventBridge

    Simplicity: 5. Zero configuration, no infrastructure to provision.
    Type: pub/sub
    Features: pattern matching rules for targets; third-party SaaS providers
    Delivery guarantees: at-least-once
    Limits: performance (latency) is variable, 500ms -> 100s!; 300 rules, 5 targets per rule
    Suitability: general-purpose pub/sub with zero configuration
  15. SNS

    Simplicity: 3.5
    Type: pub/sub
    Features: email and SMS integration; topic and subscription resources; DLQs
    Delivery guarantees: combine with SQS to guarantee delivery (at-least-once)
    Limits: publish only one message at a time; 256KB
    Suitability: lower-latency pub/sub; many subscribers
  16. Kinesis Data Streams

    Simplicity: 2. Requires calculated provisioning.
    Type: streaming
    Features: large volume; shards must be provisioned for required capacity; integration with Kinesis Analytics, Firehose
    Delivery guarantees: consumers get guaranteed ordering per shard; events can be re-read for up to 7 days
    Limits: 1MB or 1000 records written per second per shard; read 2MB per second per shard; put 500 records at a time; get up to 10,000 records at a time; re-sharding limits apply
    Suitability: anything with large volumes, streaming analytics, clickstreams; small number of consumers
    Kinesis integration with Lambda
  17. Customer story

    On-premises compute cluster migration
    Critical daily processing of models. C++ and Python.
    Nightly batch processing of financial models
    Win came from scalability (parallelising workload) and efficient orchestration (Lambda and Redis scheduler)
  18. Customer story: iterations

    [Diagram. Labels as extracted:]
    COMPUTE: Batch, Fargate, Lambda, Fargate, Lambda +
    EVENTS: SQS, EventBridge, Kinesis, SQS, Kinesis +
    90 minutes; 15 minutes
    Python with some C++; monolithic data; RDBMS
    Python; small data units; S3 for data
  19. Recap

    1. Intuition is misleading
    2. Use real-world experience of services to evaluate
    3. Start with the simplest - try them all
    4. Serverless (simplest) first means more agility
    5. Divide workloads into small units. Run anywhere at scale.
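The scaling figures quoted on slides 7 and 8 can be turned into a rough ramp-up estimate. The sketch below is not from the talk: the function names are hypothetical, the defaults are simply the slide values (Lambda bursts to 3,000 then adds 500 per minute; Fargate starts 10 containers in the first second, then about one every 3 seconds), and it assumes the relevant soft limits have already been raised above the target. Current AWS quotas may differ, so treat the outputs as order-of-magnitude guidance only.

```python
import math


def lambda_minutes_to_reach(target: int, burst: int = 3000, per_minute: int = 500) -> int:
    """Whole minutes before `target` concurrent executions are available.

    Defaults are the slide-7 figures (assumption: they still apply).
    """
    if target <= burst:
        return 0  # covered by the initial burst
    return math.ceil((target - burst) / per_minute)


def fargate_seconds_to_reach(target: int, burst: int = 10,
                             seconds_per_container: float = 3.0) -> float:
    """Approximate seconds before `target` containers are running.

    Defaults are the slide-8 figures; "~3 seconds" is itself approximate.
    """
    if target <= 0:
        return 0.0
    if target <= burst:
        return 1.0  # within the first-second burst
    return 1.0 + (target - burst) * seconds_per_container


if __name__ == "__main__":
    print(lambda_minutes_to_reach(5000))   # 4
    print(fargate_seconds_to_reach(100))   # 271.0
```

Under these assumptions, 5,000 concurrent Lambda executions take about 4 minutes to become available, while 100 Fargate containers take about four and a half minutes, which is one way to make the "how fast?" factor concrete.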
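The SQS limits on slide 13 (256KB per message, at most 10 messages per SendMessageBatch call) mean a producer has to group messages before sending. A minimal sketch of that grouping follows. It deliberately avoids any AWS SDK calls: `batch_messages` is a hypothetical helper, and it assumes the 256KB ceiling applies to the combined payload of a batch as well as to each message. Verify both numbers against current SQS quotas before relying on them.

```python
from typing import Iterable, Iterator, List

MAX_BATCH_COUNT = 10          # slide 13: SendMessageBatch maximum
MAX_BATCH_BYTES = 256 * 1024  # slide 13: 256KB (assumed to cover the whole batch)


def batch_messages(bodies: Iterable[str],
                   max_count: int = MAX_BATCH_COUNT,
                   max_bytes: int = MAX_BATCH_BYTES) -> Iterator[List[str]]:
    """Yield lists of message bodies that respect the count and size limits."""
    batch: List[str] = []
    batch_bytes = 0
    for body in bodies:
        size = len(body.encode("utf-8"))
        if size > max_bytes:
            raise ValueError(f"message of {size} bytes exceeds the {max_bytes}-byte limit")
        # Close the current batch if adding this message would break a limit.
        if batch and (len(batch) >= max_count or batch_bytes + size > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(body)
        batch_bytes += size
    if batch:
        yield batch


if __name__ == "__main__":
    sizes = [len(b) for b in batch_messages(["x"] * 25)]
    print(sizes)  # [10, 10, 5]
```

Each yielded list could then be passed to whatever send call the producer uses; oversized payloads are rejected early rather than split, since splitting is a workload-specific decision.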
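Slide 16 describes Kinesis Data Streams as requiring "calculated provisioning". One way to do that calculation from the per-shard limits on the slide is sketched below. The function name and parameters are hypothetical, and the sketch assumes every consumer reads the full stream from the shared 2MB per second per-shard read limit (no enhanced fan-out), which is also why the slide recommends a small number of consumers.

```python
import math

WRITE_MB_PER_SHARD = 1.0     # slide 16: 1MB written per second per shard
WRITE_RECORDS_PER_SHARD = 1000  # slide 16: 1000 records written per second per shard
READ_MB_PER_SHARD = 2.0      # slide 16: 2MB read per second per shard


def shards_required(write_mb_per_s: float, records_per_s: float, consumers: int = 1) -> int:
    """Smallest shard count that satisfies all three per-shard limits.

    Assumes each of `consumers` reads everything that is written.
    """
    by_write_bytes = math.ceil(write_mb_per_s / WRITE_MB_PER_SHARD)
    by_write_records = math.ceil(records_per_s / WRITE_RECORDS_PER_SHARD)
    by_read_bytes = math.ceil(write_mb_per_s * consumers / READ_MB_PER_SHARD)
    return max(1, by_write_bytes, by_write_records, by_read_bytes)


if __name__ == "__main__":
    print(shards_required(3.5, 2500))              # 4 (write bandwidth dominates)
    print(shards_required(0.1, 5000))              # 5 (record rate dominates)
    print(shards_required(1.0, 500, consumers=5))  # 3 (read bandwidth dominates)
```

The three calls show each limit becoming the binding constraint in turn: many small records push the count up through the record rate, and adding consumers pushes it up through read bandwidth even when the write volume is modest.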