Presto on AWS: Exploring different Presto services

Ahana
September 16, 2021

Presto is a widely adopted distributed SQL engine for data lake analytics. Running Presto in the cloud comes with many benefits: performance, price, and scale are just a few. On AWS, you can run Presto through several services, including EMR Presto, Amazon Athena, and Ahana Cloud.

In this webinar, Asif will discuss these 3 approaches, the pros and cons of each, and how to determine which service is best for your use case. He’ll cover:

• Quick overview of EMR Presto, Athena, and Ahana
• Benefits and limitations of each
• How to pick the best approach based on your needs

If you’re using or evaluating Presto today, register to learn more about running Presto in the cloud.

Transcript

  1. Agenda
    • What is Presto?
    • AWS Presto Options
    • Managed Presto Offering (Ahana)
    • Demo
    • Picking the Right Approach
  2. You’ve All Heard of Presto – It's Exploding
    Presto is De-Facto SQL Engine
    [Chart: ranking trend, Spark SQL vs. Presto]
    https://db-engines.com/en/ranking_trend/relational+dbms
  3. So, What is Presto (PrestoDB)?
    • Open source, distributed MPP SQL query engine
    • Query in Place
    • Federated Querying
    • ANSI SQL Compliant
    • Designed ground up for fast analytic queries against data of any size
    • Originally developed at Facebook
    • SQL-On-Anything
      ▪ Hive/HDFS, S3
      ▪ Parquet, ORC, Avro, JSON, CSV/Delimited etc.
      ▪ Relational Database (MySQL, PostgreSQL, SQL Server etc.)
      ▪ NoSQL (Cassandra, Redis, Phoenix/HBase etc.)
      ▪ Many More
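To make "Federated Querying" and "SQL-On-Anything" concrete: Presto addresses every table as `catalog.schema.table`, so a single statement can join data from different sources. The Python sketch below only assembles such a statement as text. The catalog, schema, table, and column names (`hive.sales.orders`, `mysql.crm.customers`, `customer_id`, `region`) are hypothetical and would depend on how a given cluster's connectors are configured.

```python
def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Return a Presto-style fully qualified table name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


def build_federated_query(orders_table: str, customers_table: str) -> str:
    """Build one SQL statement that joins tables living in two different catalogs."""
    return (
        "SELECT c.region, count(*) AS order_count\n"
        f"FROM {orders_table} AS o\n"
        f"JOIN {customers_table} AS c ON o.customer_id = c.customer_id\n"
        "GROUP BY c.region"
    )


# Hypothetical names: an S3-backed Hive catalog and a MySQL catalog.
orders = qualified_name("hive", "sales", "orders")
customers = qualified_name("mysql", "crm", "customers")
federated_sql = build_federated_query(orders, customers)
print(federated_sql)
```

The query text itself is ordinary SQL; the only Presto-specific part is the catalog prefix on each table name, which is what lets one engine reach both the data lake and the relational database.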
  4. Community By The Numbers
    • 100K+ Docker Hub Downloads (last 6 months)
    • 331 Contributors
    • 12K+ GitHub Stars
    • 1700+ Slack Members
    • 1800+ Meetup Members
    Ahana Company Confidential
  5. The Next Data Warehouse is Open Data Lake Analytics
    [Diagram labels: Data Warehouse (Data, SQL Query Processing, 1-10 TB); Cloud Data Lake (Open Source Data Warehouse, SQL Query Processing, 1TB -> PB); Reporting & Dashboarding on both]
  6. Overview of AWS Presto Offerings in the Cloud
    DIY - Presto AMIs (EC2)
      ▪ Self Managed
      ▪ Extremely complex cluster setup and integration with data sources
      ▪ Devops / SRE cycles and expertise required
    Amazon EMR Presto
      ▪ Partially managed approach
      ▪ Config-file based integration required for everything
      ▪ No pre-packaged integrations like Superset / HMS / AWS S3
      ▪ Devops / SRE cycles and expertise required
    Amazon Athena
      ▪ Primarily built for S3, very few other connectors
      ▪ Concurrent query limit of 20 per account*
      ▪ No visibility into cluster logs, query logs, no control
      ▪ Pay-per-Query can be unpredictable & expensive at $5.00 per TB scanned
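The per-TB pricing quoted for Athena lends itself to a quick back-of-the-envelope check. The Python sketch below multiplies data scanned by the $5.00 per TB figure from the slide. The two workload profiles are hypothetical, and the function is a simplification that ignores any per-query minimums, rounding, or discounts a real bill might include.

```python
PRICE_PER_TB_USD = 5.00  # per-TB-scanned price quoted on the slide


def scan_cost_usd(tb_scanned_per_query: float, queries_per_day: int, days: int = 30) -> float:
    """Estimate pay-per-query cost: TB scanned per query x queries x days x price."""
    return tb_scanned_per_query * queries_per_day * days * PRICE_PER_TB_USD


# Hypothetical workloads, for illustration only.
light = scan_cost_usd(tb_scanned_per_query=0.01, queries_per_day=50)   # about 10 GB per query
heavy = scan_cost_usd(tb_scanned_per_query=2.0, queries_per_day=200)   # 2 TB per query

print(f"light workload: ${light:,.2f} per 30 days")
print(f"heavy workload: ${heavy:,.2f} per 30 days")
```

Because cost scales linearly with bytes scanned rather than with cluster size, the same pricing that is cheap for occasional small queries grows quickly once queries scan terabytes many times a day.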
  7. What is Amazon EMR?
    • Amazon’s managed Hadoop solution
    • Running various distributed processing frameworks - MapReduce, Spark, Presto
    • Great for running custom applications
    • Requires big data knowledge and expertise along with SRE to manage and operate the cluster
  8. Benefits of EMR
    • Full-fledged Data Lake
    • More than just Presto
    • Running custom applications - AI/ML, Data Engineering, NoSQL/HBase, Spark
    • Integrates with Glue
    • More up-to-date than Athena for Presto
  9. Disadvantages of EMR
    • Power Tool for Power Users
    • Complex to Manage and Operate
    • TCO generally high for simple workloads
    • Personnel
    • Operational Costs
    • Resources
  10. What is Amazon Athena?
    • Amazon’s serverless Presto-based service
    • Query Amazon S3 using standard SQL
    • Two engine versions:
      ▪ Athena Engine 1 – based on Presto version 0.172 (Nov 2016 GA)
      ▪ Athena Engine 2 – based on Presto version 0.217 (Nov 2020 GA)
    • Availability of federated querying using Lambda (Engine 2 only)
    • Out-of-the-box integrated with AWS Glue Data Catalog
  11. Benefits of Athena
    • Easy to get started, serverless
    • Out-of-the-box integration with Glue
    • Cost effective for low usage
    • Infrequent use
    • Small to medium sized data volumes
    • Not too many concurrent users
    • Quick and Easy tool for intermittent querying, data discovery, browsing
  12. Limitations
    • Shared regional service
    • Frequent queuing
    • Competing for the same resources with other customers
    • Inconsistent performance
    • Various size, scale and feature limitations*
    • Cannot really tune it
    • Black Box
    • No ability to tune underlying resources
    • Lack of visibility into underlying errors
    • No Query plan or insights into what query is doing
    • Gets expensive very quickly for large data volumes
    • Pay $5 per TB scanned
    • Federated connector architecture is also serverless
    • Warm up times
    • Artificially need to batch queries to work around limitations
    • Significantly behind on latest Presto version (0.260)
    [Error messages and callouts shown on the slide:]
      Encountered too many errors talking to a worker node. The node may have crashed or be under too much load.
      EXCEEDED_MEMORY_LIMIT: Query exceeded local memory limit
      INTERNAL_ERROR_QUERY_ENGINE
      Query exhausted resources at this scale factor
      Please post error message on our forum or contact customer support with Query Id:
      Too Many Parallel Queries
      QueryExecutionStatus: QUEUED
      30 min DML Query Timeout, 25 Concurrent Queries max, No Explain, Limited Connectors
    * Some limits are soft while others are hard
    https://docs.aws.amazon.com/athena/latest/ug/other-notable-limitations.html
    https://docs.aws.amazon.com/athena/latest/ug/performance-tuning.html
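The point about batching queries to stay under a concurrency cap can be sketched on the client side. The Python example below uses a thread pool whose size equals the cap, so no more than that many queries are in flight at once. `run_query` is a hypothetical stand-in that only sleeps; it is not a real Athena or Presto client call. The limit of 25 is the figure quoted on the slide and is treated here as a configurable assumption.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ConcurrencyTracker:
    """Records how many calls are active at once, so the cap can be checked."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def __enter__(self) -> "ConcurrencyTracker":
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        return self

    def __exit__(self, *exc_info) -> bool:
        with self._lock:
            self.active -= 1
        return False


tracker = ConcurrencyTracker()


def run_query(sql: str) -> str:
    """Hypothetical stand-in for submitting a query and waiting for its result."""
    with tracker:
        time.sleep(0.01)  # simulate query latency
        return f"done: {sql}"


def run_with_limit(queries: list[str], max_concurrent: int) -> list[str]:
    """Run all queries, never more than max_concurrent at a time, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(run_query, queries))


queries = [f"SELECT {i}" for i in range(100)]
results = run_with_limit(queries, max_concurrent=25)
print(f"{len(results)} queries finished, peak concurrency {tracker.peak}")
```

A fixed-size pool is the simplest way to express the workaround; the trade-off is that total wall-clock time grows with the number of batches, which is the "artificial" cost the slide refers to.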
  13. Ahana Cloud for Presto - Managed Service for AWS
    Simplifies Open Data Lake Analytics
    • Enables data platform engineers in minutes vs. days
    • Fully integrated & pre-configured
    • No ETL, in-place analytics
    [Diagram labels: Cluster, AWS S3 Data Lake, Glue Metastore]
  14. Ahana Cloud for Presto - Managed Service for AWS
    Simplifies Open Data Lake Analytics
    • Enables data platform engineers in minutes vs. days
    • Fully integrated & pre-configured
    • No ETL, in-place analytics
    [Diagram labels: Cluster, AWS S3 Data Lake, Glue Metastore, NextGen SIEM]
  15. Ahana Cloud for Presto
    [Architecture diagram]
    Ahana Cloud Account: Ahana Console (Control Plane)
      ▪ CLUSTER ORCHESTRATION
      ▪ CONSOLIDATED LOGGING
      ▪ SECURITY & ACCESS
      ▪ BILLING & SUPPORT
      Ahana console oversees and manages every Presto cluster
    Customer Cloud Account: In-VPC Presto Clusters (Compute Plane)
      ▪ AD HOC CLUSTER 1
      ▪ TEST CLUSTER 2
      ▪ PROD CLUSTER N
      ▪ Glue, S3, RDS, Elasticsearch
      In-VPC orchestration of Presto clusters, where metadata, monitoring, and data sources reside
  16. Benefits of Ahana
    • Zero to presto in 30 mins - easy to get started, point and click
    • Reliability, availability and scalability running containers on K8s across AZs
    • Full control of your deployment - Balance performance, cost and convenience
    • Size clusters based on your needs (scale-up/out and scale-down/in)
    • Start/Stop/Delete clusters as needed
    • Dedicate or share clusters depending upon your business priorities
    • Consistent Performance at high concurrency and scale
    • Optional Data Lake caching for additional performance boosting
    • Data catalog agnostic
    • Bring your own, Ahana managed HMS, Out-of-the-box integration with Glue and Lakeformation
    • Visibility and Control - see what your queries are doing
    • Detailed logging and query performance statistics
  17. How Carbon uses PrestoDB in the Cloud with Ahana to Power its Real-time Customer Dashboards
    Jordan Hoggart, Data Engineer at Carbon
  18. Upcoming Enhancements
    • Apache Ranger - centrally define, administer and manage security policies across platforms
    • RaptorX – Disaggregates the storage from compute for low latency to provide a unified, cheap, fast, and scalable solution to OLAP and interactive use cases
    Roadmap:
    • Disaggregated Coordinator (a.k.a. Fireball) – Scale out the coordinator horizontally and revamp the RPC stack
    • C++ Worker: native C++ worker for better performance
  19. Making the right choice for your workload
    [Comparison table; the per-cell ratings are not captured in this transcript]
    Columns: Athena, Ahana, EMR
    Rows (your workload):
      ▪ Ease of Operations
      ▪ Ease of Use
      ▪ Supportability / Visibility
      ▪ Query Federation
      ▪ Performance Consistency
      ▪ Cost Effectiveness - Small - Medium Workloads
      ▪ Cost Effectiveness - Large - XLarge Workloads
  20. In Summary
    • Ahana Cloud is:
      ▪ The easiest Cloud Managed Service for Presto
      ▪ Highly scalable, cost-effective, managed Presto service
      ▪ Based on the open source PrestoDB project
    • Ahana works closely with the Presto community and contributes features and fixes back to the project
    • Ahana Cloud is available on the AWS Marketplace
    • Sign-up for a 14-day free trial here with free 1-hour on-boarding: https://ahana.io/sign-up