Presto on AWS: Exploring different Presto services

Ahana
September 16, 2021

Presto is a widely adopted distributed SQL engine for data lake analytics. Running Presto in the cloud brings many benefits, including performance, price, and scale. On AWS, there are a few services you can use to run Presto: EMR Presto, Amazon Athena, and Ahana Cloud.

In this webinar, Asif will discuss these three approaches, the pros and cons of each, and how to determine which service is best for your use case. He’ll cover:

• A quick overview of EMR Presto, Athena, and Ahana
• Benefits and limitations of each
• How to pick the best approach based on your needs

If you’re using or evaluating Presto today, register to learn more about running Presto in the cloud.

Transcript

  1. Exploring Different Presto Services on AWS
    Asif Kazi
    Principal Solutions Engineer
    Email: [email protected]

  2. Agenda
    • What is Presto?
    • AWS Presto Options
    • Managed Presto Offering (Ahana)
    • Demo
    • Picking the Right Approach

  3. What is Presto?

  4. You’ve All Heard of Presto: It’s Exploding
    Presto is the de facto SQL engine
    Source: https://db-engines.com/en/ranking_trend/relational+dbms
    [Chart: Spark SQL vs. Presto ranking trend]

  5. So, What is Presto (PrestoDB)?
    • Open source, distributed MPP SQL query engine
    • Query in place
    • Federated querying
    • ANSI SQL compliant
    • Designed from the ground up for fast analytic queries against data of any size
    • Originally developed at Facebook
    • SQL-on-anything
        • Hive/HDFS, S3
        • Parquet, ORC, Avro, JSON, CSV/delimited, etc.
        • Relational databases (MySQL, PostgreSQL, SQL Server, etc.)
        • NoSQL (Cassandra, Redis, Phoenix/HBase, etc.)
        • Many more

  6. Presto Users

  7. Community By The Numbers
    • 100K+ Docker Hub downloads (last 6 months)
    • 331 contributors
    • 12K+ GitHub stars
    • 1700+ Slack members
    • 1800+ Meetup members
    Ahana Company Confidential

  8. The Next Data Warehouse is Open Data Lake Analytics
    [Diagram: a Data Warehouse (1-10 TB) compared with a Cloud Data Lake / Open Source Data Warehouse (1TB -> PB); in each, data flows through SQL query processing to reporting & dashboarding]

  9. AWS Presto Options

  10. Overview of AWS Presto Offerings in the Cloud
    DIY - Presto AMIs (EC2)
    ▪ Self-managed
    ▪ Extremely complex cluster setup and integration with data sources
    ▪ DevOps / SRE cycles and expertise required
    Amazon EMR Presto
    ▪ Partially managed approach
    ▪ Config-file based integration required for everything
    ▪ No pre-packaged integrations like Superset / HMS / AWS S3
    ▪ DevOps / SRE cycles and expertise required
    Amazon Athena
    ▪ Primarily built for S3, very few other connectors
    ▪ Concurrent query limit of 20 per account*
    ▪ No visibility into cluster logs or query logs, and no control
    ▪ Pay-per-query pricing at $5.00 per TB scanned can be unpredictable and expensive
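
The $5.00 per TB scanned pricing above makes Athena's bill a direct function of bytes scanned. The following is a minimal sketch of that arithmetic. It assumes binary units (1 TB = 1024^4 bytes) and per-query billing that rounds up to the nearest MB with a 10 MB minimum. Both assumptions should be checked against current AWS Athena pricing before you rely on the numbers. The function names and the example workload are hypothetical.

```python
PRICE_PER_TB_USD = 5.00      # per-TB-scanned price quoted on the slide
BYTES_PER_MB = 1024 ** 2     # assumption: binary units
MB_PER_TB = 1024 ** 2        # assumption: 1 TB = 1024 * 1024 MB
MIN_MB_PER_QUERY = 10        # assumption: per-query billing minimum


def query_cost_usd(bytes_scanned: int) -> float:
    """Cost of one query: bytes rounded up to whole MB, with a per-query minimum."""
    billed_mb = (bytes_scanned + BYTES_PER_MB - 1) // BYTES_PER_MB  # integer ceiling
    billed_mb = max(billed_mb, MIN_MB_PER_QUERY)
    return billed_mb / MB_PER_TB * PRICE_PER_TB_USD


def monthly_cost_usd(bytes_per_query: int, queries_per_day: int, days: int = 30) -> float:
    """Projected monthly bill for a steady workload of identical queries."""
    return query_cost_usd(bytes_per_query) * queries_per_day * days


if __name__ == "__main__":
    # Hypothetical workload: 50 queries a day, each scanning 100 GB.
    print(f"${monthly_cost_usd(100 * 1024 ** 3, 50):,.2f}")  # $732.42
```

Under these assumptions, 50 queries a day at 100 GB each costs about $732 a month. The cost grows linearly with bytes scanned, so it is hard to predict unless scan sizes are known in advance.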

  11. AWS Managed Presto
    Amazon EMR

  12. What is Amazon EMR?
    • Amazon’s managed Hadoop solution
    • Runs various distributed processing frameworks: MapReduce, Spark, Presto
    • Great for running custom applications
    • Requires big data knowledge and expertise, plus SRE effort, to manage and operate the cluster

  13. Benefits of EMR
    • Full-fledged data lake
    • More than just Presto
        • Runs custom applications: AI/ML, data engineering, NoSQL/HBase, Spark
    • Integrates with AWS Glue
    • Runs a more up-to-date Presto version than Athena
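
On EMR, the Glue integration above and the config-file based integration noted on slide 10 are supplied as configuration classifications when the cluster is created. The following is a minimal sketch. It assumes a PrestoDB-based EMR release and the `presto-connector-hive` classification with `hive.metastore` set to `glue`, which is how the EMR documentation describes pointing Presto at the Glue Data Catalog. Verify the classification name for your EMR release, because Trino-based releases use different names. The helper function is hypothetical. The resulting list has the shape accepted by the `Configurations` parameter of boto3's EMR `run_job_flow` call and by `aws emr create-cluster --configurations`.

```python
import json


def presto_glue_configurations() -> list:
    """Hypothetical helper: EMR configuration classifications that point
    Presto's Hive connector at the AWS Glue Data Catalog."""
    return [
        {
            "Classification": "presto-connector-hive",
            "Properties": {"hive.metastore": "glue"},
        }
    ]


if __name__ == "__main__":
    # Print the JSON; save it to a file and pass it to
    # `aws emr create-cluster --configurations file://<path>`.
    print(json.dumps(presto_glue_configurations(), indent=2))
```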

  14. Disadvantages of EMR
    • Power tool for power users
    • Complex to manage and operate
    • TCO generally high for simple workloads
        • Personnel
        • Operational costs
        • Resources

  15. Serverless Presto
    Amazon Athena

  16. What is Amazon Athena?
    • Amazon’s serverless, Presto-based service
    • Query Amazon S3 using standard SQL
    • Two engine versions:
        • Athena Engine 1: based on Presto 0.172 (GA Nov 2016)
        • Athena Engine 2: based on Presto 0.217 (GA Nov 2020)
    • Federated querying through AWS Lambda (Engine 2 only)
    • Integrated out of the box with the AWS Glue Data Catalog

  17. Benefits of Athena
    • Easy to get started, serverless
    • Out-of-the-box integration with Glue
    • Cost effective for low usage
        • Infrequent use
        • Small to medium data volumes
        • Not too many concurrent users
    • Quick and easy tool for intermittent querying, data discovery, and browsing

  18. Limitations
    • Shared regional service
        • Frequent queuing
        • Competing for the same resources with other customers
        • Inconsistent performance
    • Various size, scale, and feature limitations*
    • Cannot really tune it
    • Black box
        • No ability to tune underlying resources
        • Lack of visibility into underlying errors
        • No query plan or insight into what a query is doing
    • Gets expensive very quickly for large data volumes
        • Pay $5 per TB scanned
    • Federated connector architecture is also serverless
        • Warm-up times
    • Queries must be batched artificially to work around limitations
    • Significantly behind the latest Presto version (0.260)
    Error messages shown on the slide:
    • "Encountered too many errors talking to a worker node. The node may have crashed or be under too much load."
    • "EXCEEDED_MEMORY_LIMIT: Query exceeded local memory limit"
    • "INTERNAL_ERROR_QUERY_ENGINE"
    • "Query exhausted resources at this scale factor"
    • "Please post error message on our forum or contact customer support with Query Id:"
    • "QueryExecutionStatus: QUEUED"
    Callouts: "Too Many Parallel Queries"; "30 min DML Query Timeout, 25 Concurrent Queries max, No Explain, Limited Connectors"
    * Some limits are soft while others are hard
    https://docs.aws.amazon.com/athena/latest/ug/other-notable-limitations.html
    https://docs.aws.amazon.com/athena/latest/ug/performance-tuning.html
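
Batching queries to work around the limits usually means capping how many queries a client submits at once. This keeps the client under the account's concurrent-query limit (the slides cite 20 and 25). The following is a minimal sketch that uses a thread pool sized to that cap. `run_query` is an assumption: it stands in for whatever submits one query and waits for its result, such as a wrapper around the Athena API. `FakeEngine` is a hypothetical stand-in used only to show that the cap holds.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def run_bounded(queries: Sequence[str], run_query: Callable[[str], str],
                max_concurrent: int) -> List[str]:
    """Run every query with at most `max_concurrent` in flight; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(run_query, queries))


class FakeEngine:
    """Hypothetical stand-in for a query service; records peak concurrency."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def run_query(self, sql: str) -> str:
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        time.sleep(0.01)  # simulate query latency
        with self._lock:
            self.active -= 1
        return f"done: {sql}"


if __name__ == "__main__":
    engine = FakeEngine()
    queries = [f"SELECT {i}" for i in range(100)]
    results = run_bounded(queries, engine.run_query, max_concurrent=20)
    print(len(results), "results; peak concurrency:", engine.peak)
```

A thread pool is sufficient here because each task spends most of its time waiting on the remote service. The cap holds submissions back on the client side instead of leaving them to sit in the service's QUEUED state.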

  19. Ahana Managed Presto

  20. Ahana Cloud for Presto - Managed Service for AWS
    Simplifies Open Data Lake Analytics
    • Gets data platform engineers up and running in minutes instead of days
    • Fully integrated & pre-configured
    • No ETL, in-place analytics
    [Diagram: Presto cluster with an AWS S3 data lake and a Glue metastore]

  21. Ahana Cloud for Presto - Managed Service for AWS
    NextGen SIEM

  22. 3x Better Price/Performance VS

  23. Ahana Cloud for Presto
    Ahana Cloud Account: Ahana Console (Control Plane)
    • Cluster orchestration
    • Consolidated logging
    • Security & access
    • Billing & support
    • The Ahana console oversees and manages every Presto cluster
    Customer Cloud Account: In-VPC Presto Clusters (Compute Plane)
    • Ad hoc cluster 1, test cluster 2, ..., prod cluster N
    • Data sources and metadata: Glue, S3, RDS, Elasticsearch
    • In-VPC orchestration of Presto clusters, where metadata, monitoring, and data sources reside

  24. Benefits of Ahana
    • Zero to Presto in 30 minutes: easy to get started, point and click
    • Reliability, availability, and scalability: containers running on Kubernetes (K8s) across availability zones
    • Full control of your deployment: balance performance, cost, and convenience
        • Size clusters based on your needs (scale up/out and scale down/in)
        • Start, stop, or delete clusters as needed
        • Dedicate or share clusters depending on your business priorities
    • Consistent performance at high concurrency and scale
        • Optional data lake caching for an additional performance boost
    • Data catalog agnostic
        • Bring your own catalog, use the Ahana-managed Hive Metastore (HMS), or use the out-of-the-box integration with Glue and Lake Formation
    • Visibility and control: see what your queries are doing
        • Detailed logging and query performance statistics

  25. How Carbon uses PrestoDB in the Cloud with Ahana to Power its Real-time Customer Dashboards
    Jordan Hoggart, Data Engineer at Carbon

  26. Upcoming Enhancements
    • Apache Ranger: centrally define, administer, and manage security policies across platforms
    • RaptorX: disaggregates storage from compute for low latency, providing a unified, cheap, fast, and scalable solution for OLAP and interactive use cases
    Roadmap:
    • Disaggregated Coordinator (a.k.a. Fireball): scales out the coordinator horizontally and revamps the RPC stack
    • C++ Worker: a native C++ worker for better performance

  27. See Ahana in Action
    Demo Time

  28. Picking the Right Approach

  29. Making the right choice for your workload
    Criteria compared across Athena, Ahana, and EMR:
    • Ease of operations
    • Ease of use
    • Supportability / visibility
    • Query federation
    • Performance consistency
    • Cost effectiveness: small to medium workloads
    • Cost effectiveness: large to extra-large workloads
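
The two cost-effectiveness rows come down to a break-even point between paying per TB scanned and paying for a cluster by the hour. The following is a minimal sketch of that arithmetic. It assumes the $5.00 per TB price from the slides and a hypothetical all-in cluster cost per hour. The $10/hour and 200 hours/month below are illustrative placeholders, not quoted prices for Ahana or EMR. The sketch ignores per-query minimums and any storage or licensing costs. The function names are hypothetical.

```python
ATHENA_PRICE_PER_TB_USD = 5.00  # price quoted on the slides


def break_even_tb_per_month(cluster_cost_per_hour: float, cluster_hours_per_month: float,
                            price_per_tb: float = ATHENA_PRICE_PER_TB_USD) -> float:
    """TB scanned per month at which per-TB pricing costs the same as the cluster."""
    if price_per_tb <= 0:
        raise ValueError("price_per_tb must be positive")
    return cluster_cost_per_hour * cluster_hours_per_month / price_per_tb


def cheaper_option(tb_scanned_per_month: float, cluster_cost_per_hour: float,
                   cluster_hours_per_month: float) -> str:
    """Which pricing model costs less for a given monthly scan volume (ties go to per-TB)."""
    threshold = break_even_tb_per_month(cluster_cost_per_hour, cluster_hours_per_month)
    return "pay-per-TB" if tb_scanned_per_month <= threshold else "cluster"


if __name__ == "__main__":
    # Hypothetical cluster: $10/hour, running 200 hours a month.
    print(break_even_tb_per_month(10.0, 200))  # 400.0
    print(cheaper_option(50, 10.0, 200))       # pay-per-TB
    print(cheaper_option(1000, 10.0, 200))     # cluster
```

Below the break-even volume, per-TB pricing is cheaper. This is consistent with the earlier point that Athena is cost effective for small, infrequent workloads. Above the break-even volume, under these assumptions, a cluster sized to the workload and stopped when idle costs less.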

  30. In Summary
    • Ahana Cloud is:
        • The easiest cloud managed service for Presto
        • A highly scalable, cost-effective, managed Presto service
        • Based on the open source PrestoDB project
    • Ahana works closely with the Presto community and contributes features and fixes back to the project
    • Ahana Cloud is available on the AWS Marketplace
    • Sign up for a 14-day free trial, with free 1-hour onboarding: https://ahana.io/sign-up

  31. Thank you!
    Stay Up-to-Date with Ahana
    Website: https://ahana.io/
    Blogs: https://ahana.io/blog/
    Twitter: @ahanaio
