0 to Presto in 30 minutes with AWS and Ahana Cloud

Ahana
October 20, 2021

Transcript

  1. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    0 to Presto in 30 minutes with AWS & Ahana Cloud
    Gary Stafford
    Senior Solution Architect - AWS
    Email: [email protected]
    LinkedIn: GaryStafford
    Twitter: GaryStafford
    Dipti Borkar
    Co-Founder & CPO - Ahana
    Email: [email protected]
    LinkedIn: DiptiBorkar
    Twitter: dborkar

  2. Agenda
    • Intro to Amazon Elastic Kubernetes Service (Amazon EKS)
    • Ahana Cloud’s In-VPC Deployment Methodology
    • Demo time!
    • Intro to Presto & Ahana Cloud
    • Demo time round 2!

  3. Amazon Elastic Kubernetes Service (Amazon EKS)

  4. Amazon Elastic Kubernetes Service (EKS)
    • Runs upstream Kubernetes (K8s)
    • Certified Kubernetes conformant
    • Current EKS versions include 1.17 - 1.21 (default)
    • Minor versions are supported for approximately nine months after release
    • Easily upgrade versions through the AWS Console or CLI

  5. Amazon Elastic Kubernetes Service (EKS)
    An EKS cluster consists of two Amazon Virtual Private Clouds (Amazon VPCs):
    • A VPC managed by AWS that hosts the Kubernetes control plane
    • A VPC managed by the customer that hosts the Kubernetes data plane, consisting of worker nodes (Amazon EC2)

  6. Amazon EKS Control Plane (Managed by AWS)
    [Diagram: a VPC spanning Availability Zones 1, 2, and 3, containing an NLB, an ELB, an etcd ASG, and an API Servers ASG]
    • Automatically manages the availability and scalability of the Kubernetes control plane nodes, which are responsible for:
        • Starting and stopping containers
        • Scheduling containers on VMs
        • Storing cluster data
    • Automatically detects and replaces unhealthy nodes

  7. Ahana Cloud’s In-VPC Deployment Methodology

  8. Ahana Cloud In-VPC Deployment Methodology
    • SaaS
        • Fully-managed
        • Loss of control of your data
    • DIY
        • You build it, you own it
        • Retain full control of your data
    • Ahana Cloud
        • Fully-managed
        • Retain full control of your data

  9. Securely Delegating Access to Ahana

  10. Securely Delegating Access to Ahana
    • Delegate secure access to Ahana for specific AWS resources
    • Ahana is required to provide a unique External ID when the role is assumed
    • The External ID is used for programmatic access through the AWS CLI
    • Access by Ahana can be logged, audited, changed, and removed at any time
    • The CloudFormation template can be inspected and approved in advance

  11. IAM Role Trust Relationship
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": "arn:aws:iam::123456789012:root"
          },
          "Action": "sts:AssumeRole",
          "Condition": {
            "StringEquals": {
              "sts:ExternalId": "aBc123DeF56"
            }
          }
        }
      ]
    }
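
The trust relationship above can be sketched in Python to show how the External ID gates role assumption. This is a minimal illustration, not the IAM policy evaluation engine: the function names `build_trust_policy` and `would_allow_assume_role` are hypothetical, the check assumes every statement carries an External ID condition, and the account ID and External ID are the placeholder values from the slide.

```python
import json


def build_trust_policy(trusted_account_id: str, external_id: str) -> dict:
    """Build a trust policy with the same shape as the one on the slide."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


def would_allow_assume_role(policy: dict, caller_account_id: str, external_id: str) -> bool:
    """Simplified local check (hypothetical helper, NOT real IAM evaluation).

    Returns True only if an Allow statement names the caller's account root
    as principal and its StringEquals condition matches the External ID.
    """
    caller_arn = f"arn:aws:iam::{caller_account_id}:root"
    for stmt in policy["Statement"]:
        if stmt.get("Effect") != "Allow" or stmt.get("Action") != "sts:AssumeRole":
            continue
        if stmt.get("Principal", {}).get("AWS") != caller_arn:
            continue
        expected = stmt.get("Condition", {}).get("StringEquals", {}).get("sts:ExternalId")
        if expected == external_id:
            return True
    return False


policy = build_trust_policy("123456789012", "aBc123DeF56")
print(json.dumps(policy, indent=2))
print(would_allow_assume_role(policy, "123456789012", "aBc123DeF56"))
print(would_allow_assume_role(policy, "123456789012", "wrong-id"))
```

The second call fails the check because the External ID does not match, which is the property that lets the customer tie the delegated role to one specific third party.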

  12. Securely Delegating Access to Ahana

  13. Securely Delegating Access to Ahana

  14. Securely Delegating Access to Ahana

  15. Demo

  16. Demo
    • Presto Cluster Coordinator
        • (1) r5.4xlarge*: 16 vCPU, 128 GiB Memory (8:1 memory-to-vCPU ratio)
    • Presto Cluster Worker Nodes
        • (3) r5.2xlarge*: 8 vCPU, 64 GiB Memory (8:1 memory-to-vCPU ratio)
    • Three Data Sources (Presto Federated Query)
        • AWS Glue Data Catalog / Amazon S3 Object Storage
        • Amazon RDS for PostgreSQL RDBMS
        • Amazon Redshift cloud data warehouse
    • Dataset: Kaggle Movie Ratings (27M rows)
    * Accelerated performance for workloads that process large data sets in memory
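
The sizing figures for the demo cluster can be checked with a little arithmetic. The sketch below uses only the instance counts, vCPUs, and memory stated on the slide; the `NodeGroup` class and its field names are hypothetical helpers introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeGroup:
    role: str
    instance_type: str
    count: int
    vcpus: int        # vCPUs per instance
    memory_gib: int   # GiB of memory per instance

    @property
    def memory_per_vcpu(self) -> float:
        """GiB of memory per vCPU (the ratio quoted on the slide)."""
        return self.memory_gib / self.vcpus


# Values taken from the demo slide.
demo_cluster = [
    NodeGroup("coordinator", "r5.4xlarge", count=1, vcpus=16, memory_gib=128),
    NodeGroup("worker", "r5.2xlarge", count=3, vcpus=8, memory_gib=64),
]

total_vcpus = sum(g.count * g.vcpus for g in demo_cluster)
total_memory_gib = sum(g.count * g.memory_gib for g in demo_cluster)

for g in demo_cluster:
    print(f"{g.role}: {g.count} x {g.instance_type}, {g.memory_per_vcpu:.0f} GiB per vCPU")
print(f"cluster total: {total_vcpus} vCPU, {total_memory_gib} GiB")
```

Both node groups work out to 8 GiB per vCPU, and the cluster as a whole has 40 vCPUs and 320 GiB of memory.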

  17. Ahana Cloud for Presto for Open Data Lake Analytics

  18. At A Glance
    • Ahana - The Company
    • Ahana Cloud is a SaaS managed service to query data lakes
    • Simplifies SQL analytics on cloud data lakes like S3
    Team Ahana: Cloud, Database & Presto Experts
    • Steven Mih, Cofounder, CEO
    • Dipti Borkar, Cofounder, Chief Products Officer
    • Dave Simmen, Cofounder, Chief Technical Officer
    Awards
    • 2021 DBTA Best Data 100
    • 2021 Stevie Best Startup
    • 2021 Coolest Analytics
    • 2021 Top 10 Hot Big Data
    • 2020 Datanami Best Big Data Startup
    • 2020 Open Source 100

  19. The Next Data Warehouse is SQL on Open Data Lakes
    [Diagram labels: Data; Data Warehouse; Cloud Data Lake; Open Data Lake Analytics; SQL Query Processing; Reporting & Dashboarding; 1-10 TB; 1 TB -> PB]

  20. Challenges with SQL on Open Data Lakes
    Alternatives get very expensive for growing data volumes
    ▪ Cloud data warehouse costs grow much faster than compute engine costs
    ▪ Serverless options like AWS Athena charge per query and get expensive
    “Do it yourself” approach is complicated
    ▪ Big data skills in platform teams are limited
    ▪ Presto is complicated and operationally very time-consuming
    Presto on AWS, like AWS Athena, has limited capabilities and doesn’t scale
    ▪ Limited concurrency of 20 per account
    ▪ No visibility into cluster logs or query logs, and no flexibility / control on scale
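
The point about per-query charging can be made concrete with a break-even calculation. This is a rough sketch under stated assumptions: the function name is hypothetical, and both dollar figures are made-up placeholders, not actual AWS or Ahana prices.

```python
def break_even_queries_per_hour(cluster_cost_per_hour: float, cost_per_query: float) -> float:
    """Queries per hour above which a fixed-cost cluster is cheaper than per-query pricing."""
    if cost_per_query <= 0:
        raise ValueError("cost_per_query must be positive")
    return cluster_cost_per_hour / cost_per_query


# Hypothetical placeholder prices, for illustration only.
cluster_cost_per_hour = 5.00   # fixed hourly cost of an always-on cluster
cost_per_query = 0.25          # average charge per query on a serverless service

threshold = break_even_queries_per_hour(cluster_cost_per_hour, cost_per_query)
print(f"break-even at {threshold:.0f} queries per hour")
```

Below the threshold, per-query pricing is cheaper; above it, the fixed-cost cluster wins, and the gap widens as query volume grows.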

  21. Presto & the Presto Community

  22. Open Source Presto Overview
    • Distributed SQL query engine
    • Created at
    • ANSI SQL on Databases, Data lakes
    • Designed to be interactive & access petabytes of data
    • Open source, hosted at https://github.com/prestodb
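
As an illustration of ANSI SQL spanning databases and data lakes, the sketch below assembles a single Presto query that joins tables from three catalogs, in the spirit of the demo's three data sources. Presto addresses tables as `catalog.schema.table`; every catalog, schema, table, and column name here is a hypothetical placeholder, and the code only builds the SQL text rather than connecting to a cluster.

```python
def qualified(catalog: str, schema: str, table: str) -> str:
    """Return a fully qualified Presto table name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


def build_federated_query(ratings: str, movies: str, users: str) -> str:
    """Build one SQL statement that joins tables living in different catalogs."""
    return (
        "SELECT m.title, avg(r.rating) AS avg_rating, count(*) AS num_ratings\n"
        f"FROM {ratings} AS r\n"
        f"JOIN {movies} AS m ON r.movie_id = m.movie_id\n"
        f"JOIN {users} AS u ON r.user_id = u.user_id\n"
        "GROUP BY m.title\n"
        "ORDER BY num_ratings DESC\n"
        "LIMIT 10"
    )


# Hypothetical names: one table per data source.
sql = build_federated_query(
    ratings=qualified("glue", "movielens", "ratings"),    # data lake (Glue catalog over S3)
    movies=qualified("postgres", "public", "movies"),     # RDS for PostgreSQL
    users=qualified("redshift", "public", "users"),       # Redshift
)
print(sql)
```

The resulting statement could be submitted through any Presto client; the engine resolves each catalog to its connector and performs the join itself.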

  23. Presto Users

  24. Ahana Cloud for Presto
    Ahana Console (Control Plane), in the Ahana Cloud Account
    • Cluster orchestration
    • Consolidated logging
    • Security & access
    • Billing & support
    The Ahana console oversees and manages every Presto cluster.
    In-VPC Presto Clusters (Compute Plane), in the Customer Cloud Account
    • Ad hoc cluster 1
    • Test cluster 2
    • Prod cluster N
    • Glue, S3, RDS, Elasticsearch
    In-VPC orchestration of Presto clusters, where metadata, monitoring, and data sources reside.

  25. Ahana Cloud Architecture

  26. Ahana Cloud Walkthrough

  27. Use Cases

  28. Use Cases
    Use cases, including emerging use cases:
    • Data Lakehouse analytics
    • Reporting & dashboarding
    • Interactive querying
    • Transformation using SQL (ETL)
    • Federated access across data sources
    • SQL Data Science
    • Customer-facing app analytics

  29. 3x Better Price/Performance
    3x faster SQL data transformation jobs at same price
    ▪ Enable data platform engineers in minutes vs. days
    ▪ Fully integrated & pre-configured
    ▪ No ETL, in-place analytics

  30. Case study: Securonix
    [Diagram labels: NextGen SIEM; Cluster; AWS S3 Data Lake; Glue Metastore]
    ▪ Securonix provides security information and event management (SIEM) software
    ▪ They use Ahana for in-app SQL analytics on data from AWS S3 for threat hunting
    ▪ They pull in billions of events per day that get stored in S3
    ▪ With Ahana Cloud, they saw 3x better price performance compared with Presto on AWS

  31. How does Ahana Cloud work?
    ~30 mins to create the compute plane
    https://app.ahana.cloud/signup
    Create Presto Clusters in your account

  32. Ahana Cloud for Presto - Summary
    ▪ Brings SQL on AWS S3 with an open data lake
    ▪ Presto compute brought to your data in your VPC in your account
    ▪ Fully managed Presto cluster life cycle including idle-time management
    ▪ Query AWS DBs - RDS/MySQL, RDS/Postgres, Elasticsearch, Redshift
    ▪ Cloud-native and highly available running on Kubernetes
    ▪ Bring your own
        ▪ BI tool / Data Science Notebook
        ▪ Metadata Catalog
        ▪ Transaction Manager
    Easy to use, 3x Price Performance, Open & Flexible

  33. 12/17/20