Slide 1

Slide 1 text

0 to Presto in 30 minutes with AWS & Ahana Cloud
Gary Stafford, Senior Solution Architect - AWS | LinkedIn: GaryStafford | Twitter: GaryStafford
Dipti Borkar, Co-Founder & CPO - Ahana | LinkedIn: DiptiBorkar | Twitter: dborkar
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Slide 2

Slide 2 text

Agenda
• Intro to Amazon Elastic Kubernetes Service (Amazon EKS)
• Ahana Cloud’s In-VPC Deployment Methodology
• Demo time!
• Intro to Presto & Ahana Cloud
• Demo time, round 2!

Slide 3

Slide 3 text

Amazon Elastic Kubernetes Service (Amazon EKS)

Slide 4

Slide 4 text

Amazon Elastic Kubernetes Service (EKS)
• Runs upstream Kubernetes (K8s)
• Certified Kubernetes conformant
• Current EKS versions: 1.17 through 1.21 (1.21 is the default)
• Minor versions are supported for approximately nine months after release
• Upgrade versions easily through the AWS Console or AWS CLI
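Amazon EKS upgrades a cluster's control plane one minor version at a time, so going from 1.17 to 1.21 takes four successive updates. The Python sketch below computes that path and requests each step. It assumes a boto3 EKS client (`boto3.client("eks")`) and uses its `update_cluster_version` and `describe_update` calls. The cluster name is hypothetical. Worker node groups are upgraded separately and are not covered here.

```python
import time
from typing import List


def upgrade_path(current: str, target: str) -> List[str]:
    """Return the minor versions to step through, e.g. 1.17 -> 1.21 gives 1.18..1.21."""
    cur_major, cur_minor = (int(part) for part in current.split("."))
    tgt_major, tgt_minor = (int(part) for part in target.split("."))
    if cur_major != tgt_major or tgt_minor <= cur_minor:
        raise ValueError(f"cannot upgrade from {current} to {target}")
    return [f"{cur_major}.{minor}" for minor in range(cur_minor + 1, tgt_minor + 1)]


def wait_for_update(eks_client, cluster_name: str, update_id: str, poll_seconds: float) -> None:
    """Poll DescribeUpdate until the control-plane update finishes."""
    while True:
        update = eks_client.describe_update(name=cluster_name, updateId=update_id)["update"]
        if update["status"] == "Successful":
            return
        if update["status"] in ("Failed", "Cancelled"):
            raise RuntimeError(f"update {update_id} ended with status {update['status']}")
        time.sleep(poll_seconds)  # status is still "InProgress"


def upgrade_control_plane(eks_client, cluster_name: str, current: str, target: str,
                          poll_seconds: float = 60.0) -> List[str]:
    """Upgrade one minor version at a time; return the update IDs in order."""
    update_ids = []
    for version in upgrade_path(current, target):
        response = eks_client.update_cluster_version(name=cluster_name, version=version)
        update_id = response["update"]["id"]
        # Each step must complete before the next minor version can be requested.
        wait_for_update(eks_client, cluster_name, update_id, poll_seconds)
        update_ids.append(update_id)
    return update_ids
```

For example, `upgrade_control_plane(boto3.client("eks"), "demo-cluster", "1.17", "1.21")` would issue four updates in sequence. The client is passed in rather than created inside the function, so the sequencing logic can be exercised with a stub.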

Slide 5

Slide 5 text

Amazon Elastic Kubernetes Service (EKS)
An EKS cluster consists of two Amazon Virtual Private Clouds (Amazon VPCs):
• A VPC managed by AWS that hosts the Kubernetes control plane
• A VPC managed by the customer that hosts the Kubernetes data plane, consisting of worker nodes (Amazon EC2)

Slide 6

Slide 6 text

Amazon EKS Control Plane (Managed by AWS)
[Diagram: control-plane VPC spanning Availability Zones 1, 2, and 3, with an NLB, an ELB, an etcd Auto Scaling group (ASG), and an API server ASG]
• Automatically manages the availability and scalability of the Kubernetes control plane nodes
• Starts and stops containers
• Schedules containers on VMs
• Stores cluster data
• Automatically detects and replaces unhealthy nodes

Slide 7

Slide 7 text

Ahana Cloud’s In-VPC Deployment Methodology

Slide 8

Slide 8 text

Ahana Cloud In-VPC Deployment Methodology
• SaaS
  • Fully managed
  • Loss of control of your data
• DIY
  • You build it, you own it
  • Retain full control of your data
• Ahana Cloud
  • Fully managed
  • Retain full control of your data

Slide 9

Slide 9 text

Securely Delegating Access to Ahana

Slide 10

Slide 10 text

Securely Delegating Access to Ahana
• Delegate secure access to Ahana for specific AWS resources
• Ahana is required to provide a unique External ID when it assumes the role
• The External ID is used for programmatic access through the AWS CLI
• Access by Ahana can be logged, audited, changed, and removed at any time
• The CloudFormation template can be inspected and approved in advance
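From the trusted account's side, the External ID is just another parameter on the STS AssumeRole call. The AWS CLI equivalent is `aws sts assume-role --role-arn ... --role-session-name ... --external-id ...`. Below is a minimal Python sketch that assumes a boto3 STS client. The role ARN and session name are hypothetical, and this is not Ahana's actual code.

```python
from typing import Dict


def assume_delegated_role(sts_client, role_arn: str, external_id: str,
                          session_name: str = "ahana-session") -> Dict[str, str]:
    """Assume a cross-account role, presenting the External ID the trust policy requires.

    STS rejects the call with AccessDenied if external_id does not match the
    sts:ExternalId condition in the role's trust policy.
    """
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,  # recorded in CloudTrail, which helps auditing
        ExternalId=external_id,
        DurationSeconds=3600,          # temporary credentials expire after one hour
    )
    creds = response["Credentials"]
    # Key names match the keyword arguments accepted by boto3.Session(...).
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }
```

For example, `boto3.Session(**assume_delegated_role(boto3.client("sts"), "arn:aws:iam::111122223333:role/ExampleAhanaRole", "aBc123DeF56"))` would yield a session scoped to the delegated role. Because the credentials are temporary, revoking the role or changing its trust policy cuts off access once they expire.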

Slide 11

Slide 11 text

IAM Role Trust Relationship
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "aBc123DeF56"
        }
      }
    }
  ]
}
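A trust relationship like this one can also be generated programmatically, which makes it easy to pair each delegated role with a fresh, hard-to-guess External ID. Below is a minimal Python sketch. The account ID is the same placeholder used in the policy. Minting the External ID with `secrets.token_urlsafe` is an assumption for illustration, not Ahana's documented method.

```python
import json
import secrets
from typing import Optional


def build_trust_policy(trusted_account_id: str, external_id: Optional[str] = None) -> dict:
    """Build a trust policy letting another account assume the role only with an External ID."""
    if external_id is None:
        # URL-safe characters (letters, digits, '-', '_') are all valid in an External ID.
        external_id = secrets.token_urlsafe(16)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    # Reproduces the example policy shown on the slide.
    print(json.dumps(build_trust_policy("123456789012", "aBc123DeF56"), indent=2))
```

The generated document could then be embedded in the CloudFormation template, so it can be reviewed and approved before deployment.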

Slide 12

Slide 12 text

Securely Delegating Access to Ahana

Slide 13

Slide 13 text

Securely Delegating Access to Ahana

Slide 14

Slide 14 text

Securely Delegating Access to Ahana

Slide 15

Slide 15 text

Demo

Slide 16

Slide 16 text

Demo
• Presto cluster coordinator
  • (1) r5.4xlarge*: 16 vCPU, 128 GiB memory (8 GiB per vCPU)
• Presto cluster worker nodes
  • (3) r5.2xlarge*: 8 vCPU, 64 GiB memory each (8 GiB per vCPU)
• Three data sources (Presto federated query)
  • AWS Glue Data Catalog / Amazon S3 object storage
  • Amazon RDS for PostgreSQL (RDBMS)
  • Amazon Redshift cloud data warehouse
• Dataset: Kaggle movie ratings (27M rows)
* R5 instances: accelerated performance for workloads that process large data sets in memory
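In a federated query, Presto addresses every table as `catalog.schema.table`, so one statement can join data from different sources. The Python sketch below only assembles such a query for the movie ratings demo. The catalog names (`hive` for the Glue/S3 source, `postgresql` for RDS), schemas, tables, and columns are all hypothetical. Real names depend on how the catalogs and dataset were configured.

```python
def qualified(catalog: str, schema: str, table: str) -> str:
    """Presto's fully qualified table name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


def top_rated_movies_sql(lake_catalog: str = "hive",
                         rdbms_catalog: str = "postgresql",
                         min_ratings: int = 1000,
                         limit: int = 10) -> str:
    """Join ratings in the S3 data lake with movie titles in PostgreSQL (hypothetical names)."""
    ratings = qualified(lake_catalog, "movielens", "ratings")  # Glue Data Catalog / S3 table
    movies = qualified(rdbms_catalog, "public", "movies")       # Amazon RDS for PostgreSQL table
    return (
        "SELECT m.title, avg(r.rating) AS avg_rating, count(*) AS num_ratings\n"
        f"FROM {ratings} AS r\n"
        f"JOIN {movies} AS m ON r.movieid = m.movieid\n"
        "GROUP BY m.title\n"
        f"HAVING count(*) >= {int(min_ratings)}\n"
        "ORDER BY avg_rating DESC\n"
        f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    print(top_rated_movies_sql())
```

The resulting string could be submitted through any Presto client, for example the `presto-python-client` package's DB-API interface (`prestodb.dbapi.connect(...)`, then `cursor.execute(sql)`). A Redshift table would be referenced the same way, through its own catalog name.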

Slide 17

Slide 17 text

Ahana Cloud for Presto For Open Data Lake Analytics

Slide 18

Slide 18 text

At a Glance
• Ahana, the company
• Ahana Cloud is a SaaS managed service for querying data lakes
• Simplifies SQL analytics on cloud data lakes like S3
Team: Ahana Cloud, database, and Presto experts
• Steven Mih, Cofounder & CEO
• Dipti Borkar, Cofounder & Chief Products Officer
• Dave Simmen, Cofounder & Chief Technical Officer
Awards
• 2021 DBTA Best Data 100
• 2021 Stevie Best Startup
• 2021 Coolest Analytics
• 2021 Top 10 Hot Big Data
• 2020 Datanami Best Big Data Startup
• 2020 Open Source 100

Slide 19

Slide 19 text

The Next Data Warehouse is SQL on Open Data Lakes
[Diagram: Data Warehouse (1-10 TB): data -> SQL query processing -> reporting & dashboarding, compared with Open Data Lake Analytics (1 TB -> PB): cloud data lake -> SQL query processing -> reporting & dashboarding]

Slide 20

Slide 20 text

Challenges with SQL on Open Data Lakes
Alternatives get very expensive as data volumes grow
▪ Cloud data warehouse costs grow much faster than compute engine costs
▪ Serverless options like Amazon Athena charge per query (based on data scanned) and get expensive
"Do it yourself" approach is complicated
§ Big data skills in platform teams are limited
§ Presto is complicated and very time-consuming to operate
Presto on AWS, like Amazon Athena, has limited capabilities and doesn't scale
▪ Limited concurrency of 20 queries per account
▪ No visibility into cluster logs or query logs, and no flexibility or control over scale
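A little arithmetic shows why per-query pricing grows with data volume. Amazon Athena bills by the amount of data each query scans, so monthly cost scales linearly with query count and scan size. A provisioned cluster, by contrast, has a roughly fixed cost. The Python sketch below assumes a price of $5 per TB scanned (check current regional pricing) and a hypothetical fixed monthly cluster cost. Neither figure comes from the slide.

```python
def monthly_scan_cost(queries_per_day: float, tb_scanned_per_query: float,
                      price_per_tb: float = 5.0, days: int = 30) -> float:
    """Monthly cost of pay-per-scan querying (price_per_tb is an assumed rate)."""
    return queries_per_day * days * tb_scanned_per_query * price_per_tb


def break_even_queries_per_day(fixed_monthly_cost: float, tb_scanned_per_query: float,
                               price_per_tb: float = 5.0, days: int = 30) -> float:
    """Daily query volume above which a fixed-cost cluster becomes cheaper."""
    return fixed_monthly_cost / (days * tb_scanned_per_query * price_per_tb)
```

Under these assumptions, 1,000 queries a day that each scan 0.1 TB come to 1,000 × 30 × 0.1 × $5 = $15,000 a month. A hypothetical $6,000-a-month cluster breaks even at 400 such queries a day.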

Slide 21

Slide 21 text

Presto & the Presto Community

Slide 22

Slide 22 text

Open Source Presto Overview
• Distributed SQL query engine
• Created at Facebook
• ANSI SQL on databases and data lakes
• Designed to be interactive and to access petabytes of data
• Open source, hosted at https://github.com/prestodb

Slide 23

Slide 23 text

Presto Users

Slide 24

Slide 24 text

Ahana Cloud for Presto
Ahana Cloud Account: Ahana Console (Control Plane)
• Cluster orchestration
• Consolidated logging
• Security & access
• Billing & support
The Ahana console oversees and manages every Presto cluster.
Customer Cloud Account: In-VPC Presto Clusters (Compute Plane)
• Ad hoc cluster 1, test cluster 2, prod cluster N
• Glue, S3, RDS, Elasticsearch
In-VPC orchestration of Presto clusters, where metadata, monitoring, and data sources reside.

Slide 25

Slide 25 text

Ahana Cloud Architecture

Slide 26

Slide 26 text

Ahana Cloud Walkthrough

Slide 27

Slide 27 text

Use Cases

Slide 28

Slide 28 text

Use Cases
Emerging use cases:
• Data lakehouse analytics
• Reporting & dashboarding
• Interactive querying
• Transformation using SQL (ETL)
• Federated access across data sources
• SQL data science
• Customer-facing app analytics

Slide 29

Slide 29 text

3x Better Price/Performance
3x faster SQL data transformation jobs at the same price
§ Enable data platform engineers in minutes vs. days
§ Fully integrated & pre-configured
§ No ETL, in-place analytics
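Price/performance can be read as work done per dollar. If a job runs on infrastructure with the same hourly price but finishes three times faster, each run costs a third as much, which is where a 3x figure comes from. Below is a small Python illustration; the runtimes and hourly price are hypothetical.

```python
def job_cost(runtime_hours: float, price_per_hour: float) -> float:
    """Cost of one job run on infrastructure billed by the hour."""
    return runtime_hours * price_per_hour


def price_performance_gain(baseline_hours: float, new_hours: float,
                           price_per_hour: float) -> float:
    """How many times more jobs per dollar the faster setup delivers at the same hourly price."""
    return job_cost(baseline_hours, price_per_hour) / job_cost(new_hours, price_per_hour)
```

With the hourly price held constant it cancels out, so the gain reduces to the speedup: a 3-hour job that drops to 1 hour gives a gain of 3.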

Slide 30

Slide 30 text

Case Study: Securonix
[Diagram: Securonix NextGen SIEM, cluster, AWS S3 data lake, Glue metastore]
§ Securonix provides security information and event management (SIEM) software
§ They use Ahana for in-app SQL analytics on data in AWS S3 for threat hunting
§ They ingest billions of events per day, which are stored in S3
§ With Ahana Cloud, they saw 3x better price performance compared with Presto on AWS

Slide 31

Slide 31 text

How Ahana Cloud Works
• Sign up: https://app.ahana.cloud/signup
• About 30 minutes to create the compute plane
• Create Presto clusters in your account

Slide 32

Slide 32 text

Ahana Cloud for Presto - Summary
§ Brings SQL to AWS S3 with an open data lake
§ Presto compute brought to your data, in your VPC, in your account
§ Fully managed Presto cluster life cycle, including idle-time management
§ Query AWS databases: RDS/MySQL, RDS/Postgres, Elasticsearch, Redshift
§ Cloud-native and highly available, running on Kubernetes
§ Bring your own:
  § BI tool / data science notebook
  § Metadata catalog
  § Transaction manager
Easy to use | 3x price performance | Open & flexible

Slide 33

Slide 33 text

12/17/20