0 to Presto in 30 minutes with AWS and Ahana Cloud

Ahana
December 10, 2020

We will share how Ahana Cloud, the first managed service for PrestoDB, simplifies Presto deployment and management on AWS running in-VPC on Kubernetes.

Transcript

  1. Presto on K8s with Ahana & AWS

    Gary Stafford, Solution Architect - AWS. Email: [email protected] LinkedIn: GaryStafford Twitter: GaryStafford
    Dipti Borkar, Co-Founder & CPO - Ahana. Email: [email protected] LinkedIn: DiptiBorkar Twitter: dborkar
    © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  2. Amazon Elastic Kubernetes Service (Amazon EKS)
  3. Amazon Elastic Kubernetes Service (EKS)

    • Runs upstream Kubernetes (K8s)
    • Certified Kubernetes conformant
    • Current EKS versions include 1.15, 1.16, 1.17 (default), 1.18
    • Each minor version is supported for approximately nine months after it is first released
  4. Amazon EKS Architecture

    • Fully-managed Kubernetes
    • EKS cluster consists of two VPCs
      • VPC managed by AWS that hosts Kubernetes control plane
      • VPC managed by customers that hosts Kubernetes worker nodes (EC2s)
  5. Amazon EKS Control Plane

    • Automatically manages availability and scalability of the Kubernetes control plane nodes
    • Responsible for starting and stopping containers
    • Scheduling containers on VMs
    • Storing cluster data
    • Automatically detecting and replacing unhealthy nodes
    [Diagram labels: VPC; NLB; ELB; etcd (ASG); API Servers (ASG); Availability Zones 1, 2, 3]
  6. Amazon EKS Cluster
  7. Provisioning EKS Cluster
  8. Ahana Cloud `In-VPC` Deployment Methodology
  9. Ahana Cloud `In-VPC` Deployment

    • SaaS
      • Fully-managed (`as-a-service`)
      • Loss of control of your data
    • DIY
      • You build it, you own it
      • Retain full control of your data
    • Ahana Cloud
      • Fully-managed
      • Retain full control of your data
  10. Amazon Virtual Private Cloud (Amazon VPC)
  11. Amazon Virtual Private Cloud (VPC)

    • Ahana deploys Presto to Amazon EKS cluster within a VPC
    • Logically isolated section of the AWS Cloud where you can launch AWS resources
    • Private IPv4 address range (CIDR block)
    • Within an AWS Account and within an AWS Region
    • Divided into one or more public or private subnets
    • Control access using security groups and network access control lists (NACLs)
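The CIDR-block and subnet bullets can be illustrated with a small sketch using Python's standard `ipaddress` module. The `10.0.0.0/16` range and the subnet labels are assumptions chosen for illustration; the deck does not state which ranges Ahana Cloud uses.

```python
import ipaddress

# Hypothetical VPC CIDR block (not from the deck); any private range works the same way.
vpc = ipaddress.ip_network("10.0.0.0/16")

# Carve the VPC range into /24 subnets and keep the first four.
subnets = list(vpc.subnets(new_prefix=24))[:4]

# Illustrative labels only. In AWS, a subnet is "public" or "private" because of
# its route table (a route to an internet gateway or not), not because of its CIDR.
layout = {
    "public-a": subnets[0],
    "public-b": subnets[1],
    "private-a": subnets[2],
    "private-b": subnets[3],
}

for name, net in layout.items():
    print(f"{name}: {net} ({net.num_addresses} addresses)")
```

Each subnet is a non-overlapping slice of the VPC range, which is why security groups and NACLs can then be scoped per subnet.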
  12. Delegating Access to your AWS Resources
  13. Assumed IAM Role

    • Gives a third party access to your AWS resources (delegated access)
    • AWS Security Token Service (STS) provides temporary, limited-privilege credentials
    • The third party is required to provide an External ID when the role is assumed
    • External ID used for programmatic access through the AWS CLI (Ahana)
    • Session duration is configurable up to 12 hours (default 1 hour)
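A minimal sketch of what assuming such a role might look like from the third party's side, using the boto3 STS client's `assume_role` call. The role ARN, session name, and helper names are hypothetical; the duration bounds mirror the slide (default 1 hour, up to 12 hours), and the role's own maximum session duration setting may cap the request further.

```python
from typing import Any, Dict

MIN_SESSION_SECONDS = 900        # lowest DurationSeconds that STS AssumeRole accepts
MAX_SESSION_SECONDS = 12 * 3600  # 12-hour ceiling mentioned on the slide


def build_assume_role_params(
    role_arn: str,
    external_id: str,
    session_name: str = "example-session",  # hypothetical session name
    duration_seconds: int = 3600,           # default of 1 hour
) -> Dict[str, Any]:
    """Build keyword arguments for STS AssumeRole, validating the duration."""
    if not MIN_SESSION_SECONDS <= duration_seconds <= MAX_SESSION_SECONDS:
        raise ValueError("duration_seconds must be between 900 and 43200")
    return {
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        "ExternalId": external_id,
        "DurationSeconds": duration_seconds,
    }


def assume_role(params: Dict[str, Any]) -> Dict[str, Any]:
    """Call STS and return the temporary credentials (needs boto3 and AWS access)."""
    import boto3  # imported lazily so the builder above works without boto3 installed

    sts = boto3.client("sts")
    return sts.assume_role(**params)["Credentials"]
```

For example, `build_assume_role_params("arn:aws:iam::111122223333:role/ExampleDelegatedRole", "aBc123DeF56")` yields a one-hour request; the ARN here is a placeholder, not a value from the deck.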
  14. IAM Role Trust Relationship

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": { "AWS": "arn:aws:iam::123456789012:root" },
          "Action": "sts:AssumeRole",
          "Condition": {
            "StringEquals": { "sts:ExternalId": "aBc123DeF56" }
          }
        }
      ]
    }
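The trust relationship on this slide can also be generated programmatically, which helps when the account ID and External ID differ per customer. This is a sketch: `build_trust_policy` is a hypothetical helper, and the values passed in are the sample ones from the slide.

```python
import json
from typing import Any, Dict


def build_trust_policy(trusted_account_id: str, external_id: str) -> Dict[str, Any]:
    """Return a trust policy letting the given account assume the role,
    but only when it supplies the expected External ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# Sample values copied from the slide.
policy = build_trust_policy("123456789012", "aBc123DeF56")
print(json.dumps(policy, indent=2))
```

The `Condition` block is what enforces the External ID: an `sts:AssumeRole` call from the trusted account without the matching ID is denied.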
  15. Delegating Access to your AWS Resources
  16. Demo

    • Presto Cluster Coordinator
      • (1) r5.4xlarge*: 16 vCPU, 128 GiB Memory (8:1 ratio)
    • Presto Cluster Worker Nodes
      • (3) r5.2xlarge*: 8 vCPU, 64 GiB Memory (8:1 ratio)
    • Three Data Sources (Presto Federated Query)
      • AWS Glue Data Catalog / Amazon S3 Object Storage
      • Amazon RDS for PostgreSQL RDBMS
      • Amazon Redshift cloud data warehouse
    • Dataset: Kaggle Movie Ratings (27M rows)
    * Accelerated performance for workloads that process large data sets in memory
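To make "Presto Federated Query" concrete, here is a sketch that assembles one SQL statement joining tables from two different catalogs. Presto addresses tables as `catalog.schema.table`; every catalog, schema, table, and column name below is hypothetical, since the transcript does not record the demo's actual names.

```python
def build_federated_query(ratings_table: str, movies_table: str, limit: int = 10) -> str:
    """Compose a Presto query that joins two tables living in different catalogs.

    Table names are trusted constants here; do not interpolate user input this way.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    return (
        "SELECT m.title, avg(r.rating) AS avg_rating, count(*) AS num_ratings\n"
        f"FROM {ratings_table} AS r\n"
        f"JOIN {movies_table} AS m ON r.movieid = m.movieid\n"
        "GROUP BY m.title\n"
        "ORDER BY num_ratings DESC\n"
        f"LIMIT {limit}"
    )


# Hypothetical names: ratings in S3 via a Glue-backed catalog, movies in PostgreSQL.
query = build_federated_query("glue.movies.ratings", "postgres.public.movies")
print(query)
```

The point of the sketch is that the join spans catalogs inside a single statement, so no ETL step is needed to bring the two sources together first.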
  17. Ahana Cloud Summary

    • Gives you Presto as a Cloud Data Warehouse in an open, disaggregated stack
    • Managed Presto in-VPC in user account
    • Built-in metadata catalog, data lake, Apache Superset
    • Start, stop, restart, resize, terminate: end-to-end cluster life cycle management
    • Amazon sources: S3, RDS/MySQL, RDS/Postgres, Elasticsearch, Redshift
    • Highly available & scalable running in containers on Kubernetes across AZs
    • Flexible analytics stack with BYO: metadata, data source, BI tool or notebook
    Ahana Cloud for Presto: Point & Query Cloud Service
  18. Ahana Cloud for Presto

    • Ahana Console (Control Plane), in the Ahana Cloud Account
      • Cluster orchestration, consolidated logging, security & access, billing & support
      • Ahana console oversees and manages every Presto cluster
    • In-VPC Presto Clusters (Compute Plane), in the Customer Cloud Account
      • Ad hoc cluster 1, test cluster 2, prod cluster N
      • Glue, S3, RDS, Elasticsearch
      • In-VPC orchestration of Presto clusters, where metadata, monitoring, and data sources reside
  19. Ahana Cloud – Seamless Cluster Operations

    [Diagram: COMPUTE PLANE with Cluster: ReportingProd and Cluster: DataEnggJobs, each a coordinator plus workers and a metastore, scaling up/down; USER DATA PLANE with SumUp’s Redshift, MySQL, Postgres, MongoDB (SSL / HTTPS)]
    • Operations shown
      • Create 4 node cluster
      • Add data source & auto-restart
      • Create 2 node cluster
      • Re-size
      • Stop ($0 when stopped)
      • Start cluster w/ saved config & data sources attached
    • AWS EMR, by comparison
      • Does not allow click-button restart, stop & start, or auto-restarts for catalog changes
      • Does not preserve cluster & data source configs and metastores
      • Does not auto-upgrade restarted clusters to the latest Presto version
  20. Ahana Cloud – Cost estimates

    • AWS Usage Costs for Ahana Compute Plane
      • Fixed Costs: < $5 / day in US-East-1
        • AWS EKS Cluster: 10 cents / hour
        • AWS ELB: 2 cents / hour
        • AWS RDS PostgreSQL: 2 cents / hour
        • Instance for Apache Superset: 4 cents / hour
        • AWS EBS: 10 cents / GB / month
      • Flexible Costs for Presto Clusters
        • Standard EC2 pricing based on instance type
        • Example: r5.xlarge, 24 cents / hour
    • Ahana Cloud Costs
      • Pay As You Go (PAYGO) on your AWS Bill
      • Priced hourly based on instance type
      • Example: r5.xlarge, 15 cents / hour
      • FREE in Early Access
    [Diagram: In-VPC Presto Clusters (Compute Plane): ad hoc cluster 1, test cluster 2, prod cluster N; Glue, S3, RDS, Elasticsearch]
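The "< $5 / day" figure follows from the hourly rates on this slide, as the arithmetic sketch below shows. EBS is left out because the slide gives only a per-GB rate and no volume size; the 4-node cluster at the end is an assumed example, not a configuration from the deck.

```python
HOURS_PER_DAY = 24

# Fixed hourly costs from the slide, in cents (US-East-1).
fixed_cents_per_hour = {
    "EKS cluster": 10,
    "ELB": 2,
    "RDS PostgreSQL": 2,
    "Apache Superset instance": 4,
}

# EBS (10 cents / GB / month) is excluded: the slide does not state a volume size.
fixed_cents_per_day = sum(fixed_cents_per_hour.values()) * HOURS_PER_DAY
print(f"Fixed cost per day: ${fixed_cents_per_day / 100:.2f}")


def presto_cluster_cents_per_hour(nodes: int, ec2_cents: int = 24, ahana_cents: int = 15) -> int:
    """Hourly cost in cents for a cluster of r5.xlarge nodes: EC2 plus Ahana PAYGO.

    The Ahana charge is listed as free during Early Access; pass ahana_cents=0 for that case.
    """
    return nodes * (ec2_cents + ahana_cents)


# Assumed example: a 4-node r5.xlarge cluster.
print(f"4-node cluster per hour: ${presto_cluster_cents_per_hour(4) / 100:.2f}")
```

The fixed services add up to 18 cents an hour, or $4.32 a day, which leaves a little headroom under $5 for EBS storage.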
  21. Ahana Cloud – Reference Architecture

    • Distributed SQL engine with proven scalability
    • Interactive ANSI SQL queries
    • Query data where it lives with Federated Connectors (no ETL)
    • High concurrency
    • Separation of compute and storage
  22. Per Cluster Access Control to Data Sources

    Highly flexible, granular access controls
    • Each cluster can be created with different AWS IAM Roles
    • Each IAM Role can be configured to have access to different S3 buckets, allowing clusters to be isolated
    [Diagram: Clusters 1, 2, and 3 and their metastores in the COMPUTE PLANE, connected through AWS IAM Role A and AWS IAM Role B to Bucket A and Bucket B]
    Reference: https://aws.amazon.com/blogs/security/how-to-restrict-amazon-s3-bucket-access-to-a-specific-iam-role/
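A sketch of how per-role bucket isolation might be expressed as IAM permission policies, one per role. The role names, bucket names, role-to-bucket pairing, and read-only action list are all assumptions for illustration; the slide does not specify them.

```python
import json
from typing import Any, Dict


def build_bucket_read_policy(bucket: str) -> Dict[str, Any]:
    """Return an IAM policy granting read-only access to a single S3 bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {   # Reading applies to the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


# Hypothetical pairing: each role sees exactly one bucket.
role_policies = {
    "ExampleRoleA": build_bucket_read_policy("example-bucket-a"),
    "ExampleRoleB": build_bucket_read_policy("example-bucket-b"),
}

print(json.dumps(role_policies["ExampleRoleA"], indent=2))
```

A cluster created with `ExampleRoleA` would then have no permission on `example-bucket-b`, which is the isolation the slide describes.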