Picking the right approach for Presto on AWS: Comparing Serverless vs. Managed Service

Ahana
March 30, 2021

In this webinar we’ll discuss two approaches: a serverless approach (AWS Athena) and a managed service approach (Ahana Cloud), along with key considerations when deciding which is right for you.


Transcript

  1. Serverless vs. Managed Presto
    March 30th 2021
    Asif Kazi, Principal Solutions Engineer
    Email: asif@ahana.io
  2. Agenda
    • What is Presto?
    • Serverless Presto (Athena)
    • Managed Presto (Ahana)
    • Demo
    • Picking the Right Approach
  3. Overview: Serverless vs. Managed Service for Presto
    Managed Service
      • Managed software clusters
      • All point and click, no manual changes
      • Performance: 10X faster, consistently
      • Scale: unlimited scale out of concurrent queries
      • Costs: linear, instance-based
    Serverless
      • No installed software
      • Simple, just submit queries
      • Performance: non-deterministic
      • Scale: limits on concurrent queries
      • Costs: $5/TB scanned can be unpredictable and costly
  4. What is Presto (PrestoDB)?
    • Open source, distributed MPP SQL query engine
      • Query in Place
      • Federated Querying
      • ANSI SQL Compliant
    • Designed from the ground up for fast analytic queries against data of any size
    • Originally developed at Facebook
    • SQL-On-Anything
      • Hive/HDFS, S3
      • Parquet, ORC, Avro, JSON, CSV/Delimited etc.
      • Relational databases (MySQL, PostgreSQL, SQL Server etc.)
      • NoSQL (Cassandra, Redis, Phoenix/HBase etc.)
      • Apache Kafka
      • and many more through its pluggable connector architecture
  5. Presto: One of the Fastest Growing Open Source Projects in Data Analytics
    Business Needs
      • Data-driven decision making
      • Businesses need more data to iterate over
    Technology Trends
      • Disaggregation of Storage and Compute
      • The rise of data lakes
  6. Common Presto Use Cases
    Interactive ad hoc querying
      With Presto connectors and their in-place execution, platform teams can quickly provide access to the datasets that analysts are interested in. Along with that access comes the power of Presto to run queries in seconds instead of hours. Interactive exploration of any dataset, residing anywhere.
    Reporting & dashboarding
      Query data across multiple sources to build reports and dashboards for internal/external self-service analytics/BI.
    Transformation using SQL (ETL)
      Aggregate terabytes of data across multiple data sources and run efficient ETL queries.
    Data lake analytics
      Query data directly on a data lake without transformation. This covers any type of data in your data lake, both structured and unstructured.
    Federated querying across multiple data sources
      Query data across many different data sources including databases, data lakes, and lake houses.
  7. Serverless Presto: Amazon Athena

  8. What is Amazon Athena?
    • Amazon's serverless Presto-based service
    • Query Amazon S3 using standard SQL
    • Two engine versions:
      • Athena Engine 1: based on Presto version 0.172 (Nov 2016 GA)
      • Athena Engine 2: based on Presto version 0.217 (Nov 2020 GA)
    • Availability of federated querying using Lambda (Engine 2 only)
    • Out-of-the-box integration with AWS Glue Data Catalog
  9. Benefits of Athena
    • Easy to get started, serverless
    • Out-of-the-box integration with Glue
    • Cost effective for low usage
      • Infrequent use
      • Small to medium sized data volumes
      • Not too many concurrent users
    • Quick and easy tool for intermittent querying, data discovery, browsing
  10. Limitations
    • Shared regional service
      • Frequent queuing
      • Competing for the same resources with other customers
      • Inconsistent performance
    • Various size, scale and feature limitations*
    • Black box
      • No ability to tune underlying resources
      • Lack of visibility into underlying errors
      • No query plan or insights into what a query is doing
    • Gets expensive very quickly for large data volumes
      • Pay $5 per TB scanned
    • Federated connector architecture is also serverless
      • Warm up times
    • Artificially need to batch queries to work around limitations
    • Significantly behind on latest Presto version (0.249)

    Example error messages and statuses shown on the slide:
      • Encountered too many errors talking to a worker node. The node may have crashed or be under too much load.
      • EXCEEDED_MEMORY_LIMIT: Query exceeded local memory limit
      • INTERNAL_ERROR_QUERY_ENGINE
      • Query exhausted resources at this scale factor
      • Please post error message on our forum or contact customer support with Query Id:
      • Too Many Parallel Queries
      • QueryExecutionStatus: QUEUED

    Limits called out on the slide: 30 min DML query timeout, 25 concurrent queries max, no EXPLAIN, limited connectors

    * Some limits are soft while others are hard
      https://docs.aws.amazon.com/athena/latest/ug/other-notable-limitations.html
      https://docs.aws.amazon.com/athena/latest/ug/performance-tuning.html
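The "batch queries to work around limitations" point can be illustrated with a small throttling sketch. This is only an illustration under stated assumptions: `run_batched` and `FakeEngine` are hypothetical names, `FakeEngine` stands in for a real query client so the example runs locally, and the cap of 25 is simply the figure quoted on the slide (actual quotas may differ per account and region).

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Concurrency cap quoted on the slide; treat it as an assumption, not a guaranteed quota.
MAX_CONCURRENT = 25


def run_batched(queries, run_query, max_concurrent=MAX_CONCURRENT):
    """Run every query with at most `max_concurrent` in flight.

    `run_query` is any callable that takes a SQL string and returns a result.
    Results are returned in the same order as `queries`.
    """
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(run_query, queries))


class FakeEngine:
    """Hypothetical stand-in for a query service; records peak concurrency."""

    def __init__(self):
        self._lock = threading.Lock()
        self.in_flight = 0
        self.peak = 0

    def run(self, sql):
        with self._lock:
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
        time.sleep(0.01)  # simulate query latency
        with self._lock:
            self.in_flight -= 1
        return f"done: {sql}"


if __name__ == "__main__":
    engine = FakeEngine()
    statements = [f"SELECT {i}" for i in range(100)]
    results = run_batched(statements, engine.run)
    print(len(results), "queries finished; peak concurrency:", engine.peak)
```

A client-side cap like this avoids tripping the concurrency limit, at the price of longer end-to-end time for large batches, which is the trade-off the slide is pointing at.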
  11. Managed Presto: Ahana Cloud on AWS

  12. Ahana At A Glance
    • First PrestoDB based company
    • Named Best Big Data Startup of 2020 by datanami
    • Named CRN Top 10 Big Data Startup of 2020
    • Investment from Google Ventures
    • Team of experts in cloud, database, and Presto
    • Premier member of
  13. Ahana Cloud for Presto
    Ahana Cloud Account
      • Ahana Console (Control Plane): cluster orchestration, consolidated logging, security & access, billing & support
      • Ahana console oversees and manages every Presto cluster
    Customer Cloud Account
      • In-VPC Presto Clusters (Compute Plane): ad hoc cluster 1, test cluster 2, prod cluster N
      • Data sources and metadata: Glue, S3, RDS, Elasticsearch
      • In-VPC orchestration of Presto clusters, where metadata, monitoring, and data sources reside
  14. Benefits of Ahana
    • Zero to Presto in 30 mins: easy to get started, point and click
    • Reliability, availability and scalability running containers on K8s across AZs
    • Full control of your deployment
      • Balance performance, cost and convenience
      • Size clusters based on your needs (scale-up/out and scale-down/in)
      • Start/Stop/Delete clusters as needed
      • Dedicate or share clusters depending upon your business priorities
    • Consistent performance at high concurrency and scale
    • Cost effective
    • Data catalog agnostic
      • Bring your own, Ahana managed HMS, out-of-the-box integration with Glue and Lake Formation
    • Visibility and control: see what your queries are doing
      • Detailed logging and query performance statistics
    • Optional Data Lake caching for additional performance boosting
  15. Up to 85% latency reduction for concurrent workloads
    Data Lake caching
    Mixed Workload 6
      • Combination of 4 queries
      • Q2 x 10 times, Q3 x 7 times, Q1 x 12 times
      • 650 billion rows
      • Average time of 10 executions
  16. Up to 60% cost reduction per query
    Cost Effectiveness
    Mixed Workload
      • Ahana cost per instance hour
      • Athena cost per data scanned
  17. How Carbon uses PrestoDB in the Cloud with Ahana to Power its Real-time Customer Dashboards
    Jordan Hoggart, Data Engineer at Carbon
  18. Getting Better than Athena Performance
    • Managed to get a good approximation for 5 queries
    • Categorisation and Demographic breakdown were tougher

    Cluster Name   Worker config     $/hr*
    Athena         ?                 ?
    Ahana          10 x c5.2xlarge   $3.40

    * based on EC2 on-demand hourly price
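To make the two pricing models easier to compare, here is a rough break-even sketch. It uses only the two figures from the deck ($5 per TB scanned for Athena, $3.40 per hour for the 10 x c5.2xlarge cluster) and assumes the hourly figure covers EC2 only, as the slide footnote states. Service fees, storage, and data transfer are deliberately ignored, and the function names are hypothetical.

```python
# Figures taken from the slides; everything else here is an illustrative assumption.
ATHENA_USD_PER_TB = 5.00     # pay-per-scan price quoted in the deck
CLUSTER_USD_PER_HOUR = 3.40  # 10 x c5.2xlarge, EC2 on-demand only (per the slide footnote)


def athena_cost(tb_scanned, usd_per_tb=ATHENA_USD_PER_TB):
    """Cost of scanning `tb_scanned` terabytes under per-scan pricing."""
    return tb_scanned * usd_per_tb


def cluster_cost(hours, usd_per_hour=CLUSTER_USD_PER_HOUR):
    """Cost of keeping the cluster running for `hours` hours."""
    return hours * usd_per_hour


def break_even_tb_per_hour(usd_per_hour=CLUSTER_USD_PER_HOUR,
                           usd_per_tb=ATHENA_USD_PER_TB):
    """Scan rate (TB/hour) above which the fixed-price cluster is cheaper."""
    return usd_per_hour / usd_per_tb


if __name__ == "__main__":
    rate = break_even_tb_per_hour()
    print(f"Break-even scan rate: about {rate:.2f} TB per cluster hour")
    # Example: an 8-hour day in which queries scan 10 TB in total.
    print(f"Per-scan pricing: ${athena_cost(10):.2f}")
    print(f"Cluster pricing:  ${cluster_cost(8):.2f}")
```

Under these assumptions, a workload that scans more than roughly 0.68 TB per hour of cluster time favors the instance-based model, while lighter or intermittent use favors per-scan pricing.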
  19. Latest Presto Features and Upcoming Performance Enhancements
    • Apache Ranger: centrally define, administer and manage security policies across platforms
    • Project Aria: PrestoDB can now push down entire expressions to the data source for some file formats like ORC
    • RaptorX: disaggregates the storage from compute for low latency to provide a unified, cheap, fast, and scalable solution to OLAP and interactive use cases
    Roadmap:
    • Disaggregated Coordinator (a.k.a. Fireball): scale out the coordinator horizontally and revamp the RPC stack
    • C++ Worker: native C++ worker for better performance
  20. See Ahana in Action: Demo Time

  21. Picking the Right Approach

  22. Making the right choice for your workload
    Comparison columns: Athena, Ahana
    Your workload
      • Low-Mid volume, infrequent usage
      • Medium-High volume, frequent usage
      • High concurrency is required
      • Large number of disparate federated sources
      • Long running queries
      • Differences in workload priorities
      • Consistency in performance is important
      • Cost effectiveness is important
  23. In Summary
    • Ahana Cloud is:
      • The easiest Cloud Managed Service for Presto
      • Highly scalable, cost-effective, managed Presto service
      • Based on the open source PrestoDB project
    • Ahana works closely with the Presto community and contributes features and fixes back to the project
    • Ahana frequently validates and incorporates the open-source improvements into the managed platform
    • Ahana Cloud is available on the AWS Marketplace
    • Sign up for a 14-day free trial with free 1-hour onboarding: https://ahana.io/sign-up
  24. Thank you! Stay Up-to-Date with Ahana
    Website: https://ahana.io/
    Blogs: https://ahana.io/blog/
    Twitter: @ahanaio
  25. How to get involved with Presto
    • Join the Slack channel! prestodb.slack.com
    • Write a blog for prestodb.io! prestodb.io/blog
    • Join the virtual meetup group & present! meetup.com/prestodb
    • Contribute to the project! github.com/prestodb