[Figure: data store types (In-Memory, SQL, NoSQL, Search, Object Storage, Archive Storage, Graph) compared by request rate, cost/GB, latency, and data volume, each on a low-to-high scale]
- CPU, memory, disk
- Connection pooling
- Schema changes: ALTER TABLE
- Schema validation
Reliability
- High availability: master-slave failover
- Data loss
Performance efficiency
- Slow SQL queries caused by poor database indexing
- Write-heavy workloads (number of writes >>> number of reads)
Cost optimization
- Management of the RDBMS
- Compute (instance) and storage cost
Security
- Encryption of data at rest and in transit
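The performance-efficiency item about slow queries caused by poor indexing can be observed directly in a query planner. A minimal sketch using Python's built-in `sqlite3` module, with SQLite standing in for a managed RDBMS; the `orders` table, its columns, and the index name are hypothetical. Before the index exists the planner reports a full table scan; after `CREATE INDEX` it searches the index instead.

```python
import sqlite3

# In-memory database with a hypothetical orders table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, i * 1.5) for i in range(10_000)],
)

QUERY = "SELECT * FROM orders WHERE customer_id = ?"


def query_plan(sql, params=()):
    """Return the 'detail' text of EXPLAIN QUERY PLAN (last column of each row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


before = query_plan(QUERY, (42,))  # detail mentions SCAN: every row is read
conn.execute("CREATE INDEX idx_orders_customer_id ON orders (customer_id)")
after = query_plan(QUERY, (42,))   # detail mentions SEARCH ... USING INDEX

print(before)
print(after)
```

The exact wording of the plan text varies between SQLite versions, so the comments only describe the keywords to look for.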
Model
• Data schema
• Access patterns
  - SQL supported
  - Put/Get (key, value)
  - Range query
  - Join query
• Serverless or managed service
• Current skill set
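Access patterns decide which store fits. As an illustrative sketch (a hypothetical in-memory class, not any AWS API), a hash map alone serves Put/Get by exact key, but a range query also needs the keys kept in sorted order, which is roughly the role a sort key or B-tree index plays in a real database.

```python
from bisect import bisect_left, insort
from typing import Any


class SortedKeyValueStore:
    """Hypothetical store supporting both Put/Get and range queries."""

    def __init__(self) -> None:
        self._items: dict[str, Any] = {}  # hash map: O(1) average Put/Get
        self._keys: list[str] = []        # sorted key list: enables range scans

    def put(self, key: str, value: Any) -> None:
        if key not in self._items:
            insort(self._keys, key)       # keep keys ordered on first insert
        self._items[key] = value

    def get(self, key: str) -> Any:
        return self._items.get(key)

    def range_query(self, start: str, end: str) -> list[tuple[str, Any]]:
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect_left(self._keys, start)
        hi = bisect_left(self._keys, end)
        return [(k, self._items[k]) for k in self._keys[lo:hi]]


store = SortedKeyValueStore()
store.put("2024-01-03", 30)
store.put("2024-01-01", 10)
store.put("2024-01-02", 20)
print(store.get("2024-01-02"))                        # 20
print(store.range_query("2024-01-01", "2024-01-03"))  # [('2024-01-01', 10), ('2024-01-02', 20)]
```

A pure Put/Get workload would not need the sorted list at all, and join queries would point toward a SQL store instead.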
supported by big data frameworks (Spark, Hive, Presto, etc.)
• Decouple storage and compute
• No need to run compute clusters for storage (unlike HDFS)
• Can run transient Amazon EMR clusters with Amazon EC2 Spot Instances
• Multiple & heterogeneous analysis clusters and services can use the same data
• Designed for 99.999999999% durability
• No need to pay for data replication within a region
• Secure: SSL, client/server-side encryption at rest
• Low cost
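The 99.999999999% ("11 nines") durability figure is easier to read as an expected-loss calculation. A minimal sketch, assuming durability is the annual probability that a single object survives and that object losses are independent: expected objects lost per year = N × (1 − durability). `Fraction` keeps the arithmetic exact, avoiding floating-point rounding on such small numbers.

```python
from fractions import Fraction

# Assumption: durability = annual per-object survival probability.
DURABILITY = Fraction("0.99999999999")


def expected_annual_object_loss(num_objects: int, durability: Fraction = DURABILITY) -> Fraction:
    """Expected number of objects lost per year: N * (1 - durability)."""
    return num_objects * (1 - durability)


loss = expected_annual_object_loss(10_000_000)
print(loss)      # 1/10000 objects per year
print(1 / loss)  # 10000, i.e. on average one object lost every 10,000 years
```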
Full

| | AWS Glue | (unlabeled) | Amazon Athena | Amazon Redshift |
|---|---|---|---|---|
| Languages | API/SQL | SQL | SQL | SQL |
| Data store | S3 (Glue), S3/HDFS (Spark) | S3/HDFS | S3 | Local |
| Use case | Transformation | SQL queries for S3/HDFS | Serverless SQL queries for S3 | Fully featured SQL database |
| Performance | | | | |
[Diagram: real-time dashboard architecture with three numbered options (1, 2, 3), built from Kinesis Data Streams, Kinesis Data Analytics, Lambda functions, Amazon RDS, DynamoDB, ElastiCache, Elasticsearch Service with Kibana, EMR, and QuickSight]
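In the real-time dashboard pipeline above, a Lambda function is one way to consume Kinesis Data Streams. Lambda delivers a batch of records in which each payload is base64-encoded under `record["kinesis"]["data"]`. A minimal sketch, assuming (hypothetically) JSON payloads with a `page` field: the handler decodes the batch and counts views per page. Pushing the counts to a serving store such as ElastiCache or Elasticsearch Service is left out.

```python
import base64
import json
from collections import Counter


def handler(event, context):
    """Decode a batch of Kinesis records and count events per page (hypothetical schema)."""
    counts = Counter()
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        counts[payload["page"]] += 1
    # A real function would write these aggregates to the dashboard's data store.
    return dict(counts)


def _fake_record(obj):
    """Build a test record shaped like a Kinesis event record."""
    return {"kinesis": {"data": base64.b64encode(json.dumps(obj).encode()).decode()}}


event = {"Records": [_fake_record({"page": "/home"}),
                     _fake_record({"page": "/cart"}),
                     _fake_record({"page": "/home"})]}
print(handler(event, None))  # {'/home': 2, '/cart': 1}
```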
[Diagram: architecture combining Kinesis Data Streams, Lambda functions, Elasticsearch Service with Kibana, S3, ElastiCache, API Gateway, a time-based event trigger, Personalize, EMR, and QuickSight]
→ Store → Process → Store → Analyze → Answers
• Use the right tool for the job
  - Data structure, latency, throughput, access patterns
• Leverage managed and serverless services
  - Scalable/elastic, available, reliable, secure, no/low admin
• Use log-centric design patterns
  - Immutable logs (data lake), materialized views
• Be cost-conscious
  - Big data ≠ big cost
• Work backwards
  - Design from consumption back to collection
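The log-centric principle keeps an immutable, append-only log as the source of truth and derives materialized views from it. A minimal in-memory sketch; the `Event` schema (account and amount) and the balance view are hypothetical. Because the log is never modified, the view can be dropped and rebuilt at any time by replaying it, and other views can be derived from the same log.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One immutable log entry (hypothetical schema)."""
    account: str
    amount: int


class EventLog:
    """Append-only log: entries are only appended, never updated or deleted."""

    def __init__(self) -> None:
        self._entries: list[Event] = []

    def append(self, event: Event) -> None:
        self._entries.append(event)

    def replay(self):
        # Yield entries in the order they were appended.
        yield from self._entries


def materialize_balances(log: EventLog) -> dict[str, int]:
    """Derive a materialized view (current balance per account) by replaying the log."""
    view: dict[str, int] = {}
    for e in log.replay():
        view[e.account] = view.get(e.account, 0) + e.amount
    return view


log = EventLog()
log.append(Event("alice", 100))
log.append(Event("bob", 50))
log.append(Event("alice", -30))
print(materialize_balances(log))  # {'alice': 70, 'bob': 50}
```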