Slide 1

Slide 1 text

Presto on K8s with Ahana & AWS
Gary Stafford, Solutions Architect - AWS (LinkedIn: GaryStafford, Twitter: GaryStafford)
Dipti Borkar, Co-Founder & CPO - Ahana (LinkedIn: DiptiBorkar, Twitter: dborkar)
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Slide 2

Slide 2 text

Amazon Elastic Kubernetes Service (Amazon EKS)

Slide 3

Slide 3 text

Amazon Elastic Kubernetes Service (EKS)
• Runs upstream Kubernetes (K8s)
• Certified Kubernetes conformant
• Current EKS versions include 1.15, 1.16, 1.17 (default), and 1.18
• Each minor version is supported for approximately nine months after it is first released

Slide 4

Slide 4 text

Amazon EKS Architecture
• Fully managed Kubernetes
• An EKS cluster spans two VPCs:
  • A VPC managed by AWS that hosts the Kubernetes control plane
  • A VPC managed by the customer that hosts the Kubernetes worker nodes (EC2 instances)

Slide 5

Slide 5 text

Amazon EKS Control Plane
[Diagram: control plane VPC with an NLB and an ELB in front of an API Servers Auto Scaling group and an etcd Auto Scaling group, spread across Availability Zones 1, 2, and 3]
• Automatically manages the availability and scalability of the Kubernetes control plane nodes
• Responsible for starting and stopping containers
• Schedules containers on VMs
• Stores cluster data
• Automatically detects and replaces unhealthy control plane nodes

Slide 6

Slide 6 text

Amazon EKS Cluster

Slide 7

Slide 7 text

Provisioning an EKS Cluster

Slide 8

Slide 8 text

Ahana Cloud `In-VPC` Deployment Methodology

Slide 9

Slide 9 text

Ahana Cloud `In-VPC` Deployment
• SaaS
  • Fully managed (`as-a-service`)
  • Loss of control of your data
• DIY
  • You build it, you own it
  • Retain full control of your data
• Ahana Cloud
  • Fully managed
  • Retain full control of your data

Slide 10

Slide 10 text

Amazon Virtual Private Cloud (Amazon VPC)

Slide 11

Slide 11 text

Amazon Virtual Private Cloud (VPC)
• Ahana deploys Presto to an Amazon EKS cluster within a VPC
• A VPC is a logically isolated section of the AWS Cloud where you can launch AWS resources
• Uses a private IPv4 address range (CIDR block)
• Lives within an AWS account and within an AWS Region
• Divided into one or more public or private subnets
• Access is controlled using security groups and network access control lists (NACLs)
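To make the CIDR-block and subnet bullets concrete, here is a minimal Python sketch, using only the standard `ipaddress` module, that carves a VPC block into equal-sized subnets. The 10.0.0.0/16 block and the six-subnet layout are hypothetical. The /16 to /28 size limits and the five addresses AWS reserves in every subnet reflect our understanding of the Amazon VPC documentation, so check current limits before relying on them.

```python
import ipaddress
from itertools import islice

# AWS reserves five IPv4 addresses in every subnet: the first four and the last one.
AWS_RESERVED_PER_SUBNET = 5


def plan_subnets(vpc_cidr: str, new_prefix: int, count: int) -> list[tuple[str, int]]:
    """Carve `count` equal-sized subnets out of a VPC CIDR block.

    Returns (subnet CIDR, usable addresses after the AWS reservation) pairs.
    """
    vpc = ipaddress.ip_network(vpc_cidr)
    # IPv4 VPC and subnet CIDR blocks must both be between /16 and /28.
    if not 16 <= vpc.prefixlen <= 28:
        raise ValueError(f"VPC block {vpc} must be between /16 and /28")
    if not vpc.prefixlen <= new_prefix <= 28:
        raise ValueError(f"subnet prefix /{new_prefix} must be between /{vpc.prefixlen} and /28")
    # Take only the first `count` subnets instead of materializing all of them.
    subnets = list(islice(vpc.subnets(new_prefix=new_prefix), count))
    if len(subnets) < count:
        raise ValueError(f"{vpc} has room for only {len(subnets)} /{new_prefix} subnets")
    return [(str(s), s.num_addresses - AWS_RESERVED_PER_SUBNET) for s in subnets]


if __name__ == "__main__":
    # Hypothetical layout: one public and one private /24 subnet in each of three AZs.
    for cidr, usable in plan_subnets("10.0.0.0/16", 24, 6):
        print(cidr, usable)
```

Each /24 holds 256 addresses, so after the reservation 251 remain usable per subnet; a /28 would leave only 11.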

Slide 12

Slide 12 text

Delegating Access to your AWS Resources

Slide 13

Slide 13 text

Assumed IAM Role
• Gives a third party access to your AWS resources (delegated access)
• AWS Security Token Service (STS) provides temporary, limited-privilege credentials
• The third party is required to provide an External ID when it assumes the role
• The External ID is used for programmatic access through the AWS CLI (Ahana)
• The maximum session duration can be set to as long as 12 hours (default 1 hour)
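A sketch of how the third party (here, Ahana) might assemble an STS AssumeRole request that honors the External ID and session-length rules above. It only validates and builds the keyword arguments, so it runs without AWS credentials. The role ARN and session name in the usage note are hypothetical, and the 900-second minimum is our understanding of the STS limit rather than something stated on the slide.

```python
# Limits on STS AssumeRole session length, in seconds.
MIN_SESSION_SECONDS = 900        # 15 minutes: the smallest DurationSeconds STS accepts
DEFAULT_SESSION_SECONDS = 3600   # 1 hour: default role maximum and default request duration
MAX_SESSION_SECONDS = 43200      # 12 hours: the largest maximum a role can be configured with


def build_assume_role_params(
    role_arn: str,
    session_name: str,
    external_id: str,
    duration_seconds: int = DEFAULT_SESSION_SECONDS,
    role_max_session_seconds: int = DEFAULT_SESSION_SECONDS,
) -> dict:
    """Validate and assemble keyword arguments for an STS AssumeRole call."""
    if not DEFAULT_SESSION_SECONDS <= role_max_session_seconds <= MAX_SESSION_SECONDS:
        raise ValueError("a role's maximum session duration must be 1 to 12 hours")
    if not MIN_SESSION_SECONDS <= duration_seconds <= role_max_session_seconds:
        raise ValueError("duration must be between 15 minutes and the role's maximum")
    if not external_id:
        raise ValueError("this role's trust policy requires an External ID")
    return {
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        "ExternalId": external_id,
        "DurationSeconds": duration_seconds,
    }
```

With boto3 installed and credentials for the third-party account, the result could be passed as `boto3.client("sts").assume_role(**params)`, for example with a hypothetical role ARN such as `arn:aws:iam::111122223333:role/ExampleAhanaRole`.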

Slide 14

Slide 14 text

IAM Role Trust Relationship
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::123456789012:root" },
      "Action": "sts:AssumeRole",
      "Condition": { "StringEquals": { "sts:ExternalId": "aBc123DeF56" } }
    }
  ]
}
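The trust relationship above can also be generated programmatically. This Python sketch rebuilds the same JSON from an account ID and an External ID, plus a deliberately simplified check showing why a caller without the matching External ID is rejected. `toy_allows_assume_role` is a hypothetical helper for illustration only; real policy evaluation is performed by AWS IAM and covers far more cases.

```python
import json


def build_trust_policy(trusted_account_id: str, external_id: str) -> dict:
    """Trust policy letting a third-party account assume this role with an External ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


def toy_allows_assume_role(policy: dict, caller_arn: str, external_id: str) -> bool:
    """Toy check for this one statement shape only; real IAM evaluation is far richer."""
    caller_account = caller_arn.split(":")[4]  # arn:partition:service:region:ACCOUNT:resource
    for stmt in policy["Statement"]:
        trusted_account = stmt["Principal"]["AWS"].split(":")[4]
        required_id = stmt["Condition"]["StringEquals"]["sts:ExternalId"]
        if (
            stmt["Effect"] == "Allow"
            and stmt["Action"] == "sts:AssumeRole"
            and caller_account == trusted_account
            and external_id == required_id
        ):
            return True
    return False


if __name__ == "__main__":
    policy = build_trust_policy("123456789012", "aBc123DeF56")
    print(json.dumps(policy, indent=2))
```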

Slide 15

Slide 15 text

Delegating Access to your AWS Resources

Slide 16

Slide 16 text

Demo

Slide 17

Slide 17 text

Demo
• Presto cluster coordinator
  • (1) r5.4xlarge*: 16 vCPU, 128 GiB memory (8 GiB per vCPU)
• Presto cluster worker nodes
  • (3) r5.2xlarge*: 8 vCPU, 64 GiB memory (8 GiB per vCPU)
• Three data sources (Presto federated query)
  • AWS Glue Data Catalog / Amazon S3 object storage
  • Amazon RDS for PostgreSQL (RDBMS)
  • Amazon Redshift cloud data warehouse
• Dataset: Kaggle Movie Ratings (27M rows)
* Memory optimized: accelerated performance for workloads that process large data sets in memory
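A quick way to sanity-check the demo sizing is to total the cluster's resources. This sketch hard-codes only the instance figures listed above; `InstanceType` and `cluster_totals` are hypothetical names. In Presto the coordinator typically plans and schedules queries while the workers execute them, so the workers-only total is the more relevant number for query-processing capacity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int
    memory_gib: int

    @property
    def gib_per_vcpu(self) -> float:
        return self.memory_gib / self.vcpus


# Sizes as listed on the demo slide.
R5_4XLARGE = InstanceType("r5.4xlarge", vcpus=16, memory_gib=128)
R5_2XLARGE = InstanceType("r5.2xlarge", vcpus=8, memory_gib=64)


def cluster_totals(nodes: list[tuple[InstanceType, int]]) -> tuple[int, int]:
    """Sum vCPUs and memory (GiB) over (instance type, node count) pairs."""
    vcpus = sum(itype.vcpus * count for itype, count in nodes)
    memory = sum(itype.memory_gib * count for itype, count in nodes)
    return vcpus, memory


coordinator = [(R5_4XLARGE, 1)]
workers = [(R5_2XLARGE, 3)]

if __name__ == "__main__":
    print("whole cluster:", cluster_totals(coordinator + workers))  # (40, 320)
    print("workers only:", cluster_totals(workers))  # (24, 192)
```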

Slide 18

Slide 18 text

Ahana Cloud Walkthrough

Slide 19

Slide 19 text

Ahana Cloud Summary
Ahana Cloud for Presto: Point & Query Cloud Service
Gives you Presto as a cloud data warehouse in an open, disaggregated stack
• Managed Presto in-VPC in the user's account
• Built-in metadata catalog, data lake, and Apache Superset
• Start, stop, restart, resize, terminate: end-to-end cluster life cycle management
• Amazon sources: S3, RDS/MySQL, RDS/Postgres, Elasticsearch, Redshift
• Highly available and scalable, running in containers on Kubernetes across AZs
• Flexible analytics stack with BYO (bring your own) metadata, data source, BI tool, or notebook

Slide 20

Slide 20 text

Ahana Cloud for Presto
• Ahana Cloud account: Ahana Console (control plane)
  • Cluster orchestration, consolidated logging, security & access, billing & support
  • The Ahana Console oversees and manages every Presto cluster
• Customer cloud account: in-VPC Presto clusters (compute plane)
  • Ad hoc cluster 1, test cluster 2, through prod cluster N
  • Data sources: Glue, S3, RDS, Elasticsearch
  • In-VPC orchestration of Presto clusters, where metadata, monitoring, and data sources reside

Slide 21

Slide 21 text

Ahana Cloud – Seamless Cluster Operations
[Diagram: compute plane running two clusters, ReportingProd and DataEnggJobs, each a coordinator plus workers, with metastores; user data plane: SumUp's Redshift, MySQL, Postgres, MongoDB (SSL/HTTPS). Operations shown: create 4-node cluster; create 2-node cluster; add data source & auto-restart; resize (scale up/down); stop ($0 when stopped); start cluster with saved config & data sources attached.]
Amazon EMR does not offer:
▪ Click-button cluster restart, stop and start, or automatic restarts for catalog changes
▪ Preservation of cluster and data source configurations and metastores
▪ Automatic upgrade of restarted clusters to the latest Presto version
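The operations on this slide describe a small cluster lifecycle. The following Python model is purely hypothetical (it is not the Ahana Cloud API, and it does not attempt to mirror the console's behavior in detail); it only encodes the rules the slide states: adding a data source triggers a restart, a stopped cluster runs no Presto nodes, and a restarted cluster keeps its configuration and data sources.

```python
from dataclasses import dataclass, field


@dataclass
class PrestoCluster:
    """Hypothetical model of the lifecycle operations on the slide (not an Ahana API)."""

    name: str
    workers: int
    data_sources: list[str] = field(default_factory=list)
    state: str = "RUNNING"
    restarts: int = 0

    def _require(self, expected: str) -> None:
        if self.state != expected:
            raise RuntimeError(f"{self.name} is {self.state}, expected {expected}")

    def add_data_source(self, source: str) -> None:
        self._require("RUNNING")
        self.data_sources.append(source)
        self.restarts += 1  # adding a catalog triggers an automatic restart

    def resize(self, workers: int) -> None:
        self._require("RUNNING")
        if workers < 1:
            raise ValueError("a cluster needs at least one worker")
        self.workers = workers

    def stop(self) -> None:
        self._require("RUNNING")
        self.state = "STOPPED"  # configuration and data sources are kept

    def start(self) -> None:
        self._require("STOPPED")
        self.state = "RUNNING"  # comes back with saved config and data sources attached

    def terminate(self) -> None:
        if self.state == "TERMINATED":
            raise RuntimeError(f"{self.name} is already terminated")
        self.state = "TERMINATED"

    @property
    def running_presto_nodes(self) -> int:
        """Coordinator plus workers while running; zero when stopped or terminated."""
        return 1 + self.workers if self.state == "RUNNING" else 0


if __name__ == "__main__":
    cluster = PrestoCluster("ReportingProd", workers=3)  # hypothetical size
    cluster.add_data_source("postgresql")
    cluster.resize(5)
    cluster.stop()
    print(cluster.running_presto_nodes)  # 0
    cluster.start()
    print(cluster.running_presto_nodes, cluster.data_sources)  # 6 ['postgresql']
```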

Slide 22

Slide 22 text

Ahana Cloud – Cost Estimates
[Diagram: in-VPC Presto clusters (compute plane): ad hoc cluster 1, test cluster 2, prod cluster N; data sources: Glue, S3, RDS, Elasticsearch]
AWS usage costs for the Ahana compute plane
• Fixed costs: < $5/day in US-East-1
  • AWS EKS cluster: 10 cents/hour
  • AWS ELB: 2 cents/hour
  • AWS RDS PostgreSQL: 2 cents/hour
  • Instance for Apache Superset: 4 cents/hour
  • AWS EBS: 10 cents/GB/month
• Flexible costs for Presto clusters: standard EC2 pricing based on instance type
  • Example: r5.xlarge at 24 cents/hour
Ahana Cloud costs
• Pay as you go (PAYGO) on your AWS bill
• Priced hourly based on instance type
  • Example: r5.xlarge at 15 cents/hour
• FREE in Early Access
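The fixed-cost figures above add up as follows. This sketch uses integer cents to avoid floating-point rounding, takes every price straight from the slide (prices change, so treat them as illustrative), and leaves EBS out of the daily total because it is billed per GB-month. The 4-node r5.xlarge cluster in the usage lines is a hypothetical example.

```python
# Figures copied from the slide, in US cents per hour (US-East-1).
FIXED_CENTS_PER_HOUR = {
    "EKS cluster": 10,
    "ELB": 2,
    "RDS PostgreSQL": 2,
    "Apache Superset instance": 4,
}
EBS_CENTS_PER_GB_MONTH = 10          # billed by storage, not by the hour
EC2_R5_XLARGE_CENTS_PER_HOUR = 24    # standard EC2 price for r5.xlarge
AHANA_R5_XLARGE_CENTS_PER_HOUR = 15  # Ahana Cloud fee for r5.xlarge


def fixed_cents_per_day() -> int:
    """Always-on compute-plane cost per day, excluding EBS storage."""
    return sum(FIXED_CENTS_PER_HOUR.values()) * 24


def presto_cents_per_hour(nodes: int, early_access: bool = False) -> int:
    """Hourly cost of an all-r5.xlarge Presto cluster: EC2 plus the Ahana fee."""
    ahana_fee = 0 if early_access else AHANA_R5_XLARGE_CENTS_PER_HOUR * nodes
    return EC2_R5_XLARGE_CENTS_PER_HOUR * nodes + ahana_fee


if __name__ == "__main__":
    print(f"fixed: ${fixed_cents_per_day() / 100:.2f}/day")  # fixed: $4.32/day
    print(presto_cents_per_hour(4), presto_cents_per_hour(4, early_access=True))  # 156 96
```

With the slide's numbers, the always-on charges come to 18 cents per hour, or $4.32 per day, which is consistent with the "< $5/day" estimate.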

Slide 23

Slide 23 text

Ahana Cloud – Reference Architecture
• Distributed SQL engine with proven scalability
• Interactive ANSI SQL queries
• Query data where it lives with federated connectors (no ETL)
• High concurrency
• Separation of compute and storage

Slide 24

Slide 24 text

Per-Cluster Access Control to Data Sources
Highly flexible, granular access controls
• Each cluster can be created with a different AWS IAM role
• Each IAM role can be configured with access to different S3 buckets, allowing clusters to be isolated
[Diagram: compute plane with Clusters 1, 2, and 3 and their metastores; AWS IAM Role A and AWS IAM Role B; S3 Bucket A and Bucket B]
Reference: https://aws.amazon.com/blogs/security/how-to-restrict-amazon-s3-bucket-access-to-a-specific-iam-role/
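Per-cluster isolation comes down to giving each cluster's IAM role a policy scoped to its own bucket. This Python sketch builds such identity-based policies; the role and bucket names are hypothetical, and the action list is a minimal assumption (a real Presto and Glue setup may need more permissions). As we read it, the referenced AWS blog post approaches the same goal from the bucket-policy side, so the two techniques can complement each other.

```python
import json


def bucket_access_policy(bucket: str, read_only: bool = True) -> dict:
    """Identity-based policy granting one IAM role access to a single S3 bucket."""
    object_actions = ["s3:GetObject"]
    if not read_only:
        object_actions += ["s3:PutObject", "s3:DeleteObject"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # ListBucket applies to the bucket ARN itself
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {   # object-level actions apply to the objects inside the bucket
                "Sid": "ObjectAccess",
                "Effect": "Allow",
                "Action": object_actions,
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


# Hypothetical mapping: each cluster's role gets exactly one bucket.
ROLE_BUCKETS = {"role-a": "bucket-a", "role-b": "bucket-b"}
POLICIES = {role: bucket_access_policy(bucket) for role, bucket in ROLE_BUCKETS.items()}

if __name__ == "__main__":
    print(json.dumps(POLICIES["role-a"], indent=2))
```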