Hands-on Virtual Lab - Querying Data in S3 With Presto

Ahana
January 27, 2023

Transcript

  1. Objective for Today
    Over the next 90 minutes you will:
    • Explore and understand Presto
    • Get a walk-through of creating a new cluster using Ahana
    • Query data sitting in S3 using Presto
    • Convert data to different formats using Presto
    • Run federated queries/joins across multiple sources, combining data in S3 and RDS/MySQL
  2. Agenda
    1) Understand the Technology (15-20 mins)
       a) What is Presto?
       b) What is Ahana Cloud?
    2) Getting your hands dirty (60 mins)
    3) Summary and Close Out (5 mins)
  3. What is Presto?
    • Open source, distributed MPP SQL query engine
    • Query in place
    • Federated querying
    • ANSI SQL compliant
    • Designed from the ground up for fast analytic queries against data of any size
    • Originally developed at Facebook
    • Proven on petabytes of data
    • SQL-on-Anything
    • Federated, pluggable architecture to support many connectors
    • Open source, hosted on GitHub: https://github.com/prestodb
  4. Presto – It’s Exploding
    • Presto is De-Facto SQL Engine
    • [Chart: Spark SQL vs. Presto] https://db-engines.com/en/ranking_trend/relational+dbms
    • “As one of our earliest members, Ahana has been strong supporters of the Presto Foundation since its launch in 2019.”
  5. Presto Use Cases
    • Data Lakehouse analytics
    • Reporting & dashboarding
    • Interactive ad hoc querying
    • Transformation using SQL (ETL)
    • Federated querying across data sources
  6. Scalable Architecture
    • Two roles: coordinator and worker
    • Easy scale up and scale down
    • Scale up to 1000 workers
    • Validated at web-scale companies
    [Diagram: a Presto cluster with a coordinator, existing workers, and new workers, connected to a data source]
  7. Scalable Architecture
    [Diagram labels]
    • BI Tools/Notebooks/Clients: Presto CLI, Looker, JDBC, Superset, Tableau, Jupyter, ... (send SQL, receive result sets)
    • Presto Coordinator: Parser/analyzer, Planner, Scheduler, Metadata API, Data Location API
    • Workers: exchange data via Data Shuffle; each reads through a Presto Connector
    • Any Database, Data Stream, or Storage: HDFS, Object Stores (S3), MySQL, Elasticsearch, Kafka, ...
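The coordinator/worker split on the two architecture slides can be illustrated with a toy scatter-gather sketch. This is not Presto's implementation: the function names (`make_splits`, `worker_partial_aggregate`, `run_toy_query`) are hypothetical, threads stand in for worker nodes, and the "query" is a simple average. It only shows the idea that a coordinator hands out splits, workers compute partial results, and the partials are merged into a final answer.

```python
from concurrent.futures import ThreadPoolExecutor


def make_splits(rows, n_splits):
    """Coordinator step: divide the input rows into roughly equal splits."""
    return [rows[i::n_splits] for i in range(n_splits)]


def worker_partial_aggregate(split):
    """Worker step: compute a partial (count, sum) over one split."""
    return len(split), sum(split)


def run_toy_query(rows, n_workers):
    """Schedule one split per worker, then merge the partial aggregates."""
    splits = make_splits(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(worker_partial_aggregate, splits))
    total_count = sum(count for count, _ in partials)
    total_sum = sum(subtotal for _, subtotal in partials)
    return total_sum / total_count  # final AVG over all rows


rows = list(range(1, 101))
avg_with_2_workers = run_toy_query(rows, n_workers=2)
avg_with_8_workers = run_toy_query(rows, n_workers=8)
```

The result is the same for any worker count; only the amount of work per worker changes, which is the property that makes scaling a cluster up or down transparent to the query.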
  8. Presto Connector Data Model
    • Connector: Driver for a data source.
      Example: HDFS, AWS S3, Cassandra, MySQL, SQL Server, Kafka
    • Catalog: Contains schemas from a data source specified by the connector
    • Schemas: Namespace to organize tables.
    • Tables: Set of unordered rows organized into columns with types.
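The catalog, schema, and table levels above are what make up a fully qualified table name in a Presto query (`catalog.schema.table`). The following is a minimal sketch of that naming scheme; the `TableRef` class and the names `hive`, `default`, and `orders` are hypothetical, since catalog names depend on how a given cluster is configured. The `SHOW CATALOGS`, `SHOW SCHEMAS FROM`, and `SHOW TABLES FROM` statements are standard Presto SQL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableRef:
    """A table addressed by the three levels of the connector data model."""
    catalog: str
    schema: str
    table: str

    def qualified(self) -> str:
        # Presto addresses tables as catalog.schema.table
        return f"{self.catalog}.{self.schema}.{self.table}"


def discovery_statements(ref: TableRef) -> list:
    """Statements that walk the hierarchy from catalogs down to rows."""
    return [
        "SHOW CATALOGS",
        f"SHOW SCHEMAS FROM {ref.catalog}",
        f"SHOW TABLES FROM {ref.catalog}.{ref.schema}",
        f"SELECT * FROM {ref.qualified()} LIMIT 10",
    ]


# Hypothetical names, for illustration only
orders = TableRef(catalog="hive", schema="default", table="orders")
statements = discovery_statements(orders)
```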
  9. Presto Hive Connector – Data File Types
    • Supported file types: ORC, Parquet, Avro, RCFile, CSV, SequenceFile, JSON, Text
    • No data ingestion/duplication/movement needed
    • Query data in place
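Converting a table from one file format to another is typically done in Presto with a `CREATE TABLE ... AS SELECT` statement that sets the Hive connector's `format` table property. The sketch below only builds such a statement as a string; the helper name `build_convert_statement` and the table names are hypothetical, and the format whitelist is deliberately limited to two values rather than the connector's full list.

```python
# Subset of formats for this sketch; the Hive connector accepts more.
SKETCH_FORMATS = {"ORC", "PARQUET"}


def build_convert_statement(source: str, target: str, file_format: str) -> str:
    """Build a CTAS statement that rewrites `source` into `target` in a new format."""
    fmt = file_format.upper()
    if fmt not in SKETCH_FORMATS:
        raise ValueError(f"format not covered by this sketch: {file_format}")
    return (
        f"CREATE TABLE {target} "
        f"WITH (format = '{fmt}') "
        f"AS SELECT * FROM {source}"
    )


# Hypothetical table names: a CSV-backed table rewritten as Parquet
sql = build_convert_statement(
    source="hive.default.orders_csv",
    target="hive.default.orders_parquet",
    file_format="parquet",
)
```

The generated statement would then be submitted through the Presto CLI or any client; the source table stays untouched and the new table is written in the requested format.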
  10. Ahana Cloud for Presto Managed Service
    • Enables data platform engineers in minutes vs. days
    • Fully integrated & pre-configured
    • No ETL, in-place analytics
  11. Ahana Cloud for Presto
    [Diagram labels]
    • Ahana Cloud Account: Ahana console oversees and manages every Presto cluster
      Ahana Console (Control Plane): Cluster Orchestration, Consolidated Logging, Security & Access, Billing & Support
    • Customer Cloud Account: In-VPC orchestration of Presto clusters, where metadata, monitoring, and data sources reside
      In-VPC Presto Clusters (Compute Plane): Ad Hoc Cluster 1, Test Cluster 2, Prod Cluster N
      Glue, S3, RDS, Elasticsearch
  12. Ahana Cloud – Reference Architecture
    • Distributed SQL engine with proven scalability
    • Interactive ANSI SQL queries
    • Query data where it lives with Federated Connectors (no ETL)
    • High concurrency
    • Separation of compute and storage
    • Cost Management Features
  13. Things to try later...
    1. Scale up and scale down your cluster
    2. Check your cluster’s PrestoDB Console when running SQL
    3. Try queries with more/fewer workers; how does performance change?
    4. Try partitioned datasets
  14. Conclusion
    In this hands-on workshop you have:
    1. Learned about Presto and Ahana Cloud
    2. Seen how to effortlessly create and manage Presto clusters
    3. Run fast federated SQL queries combining datasets from S3 and MySQL
    4. Run Presto queries via Python
    5. Created a simple BI dashboard using Superset
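A federated query run from Python could look roughly like the sketch below. Everything specific in it is an assumption: the catalogs `hive` and `mysql`, the schemas, tables, and columns are hypothetical, and `fetch_all` is an illustrative helper that works with any Python DB-API connection rather than a function from the workshop.

```python
# Hypothetical catalogs/tables: `hive` maps to files in S3, `mysql` to an RDS/MySQL database.
FEDERATED_SQL = """
SELECT c.customer_name, SUM(o.total) AS revenue
FROM hive.sales.orders AS o
JOIN mysql.crm.customers AS c
  ON o.customer_id = c.customer_id
GROUP BY c.customer_name
ORDER BY revenue DESC
LIMIT 10
"""


def fetch_all(connection, sql):
    """Run one statement on a DB-API connection and return all result rows."""
    cursor = connection.cursor()
    cursor.execute(sql)
    return cursor.fetchall()


# With the presto-python-client package installed, a connection could be
# created along these lines (host, port, and user depend on your cluster):
#
#   import prestodb
#   conn = prestodb.dbapi.connect(
#       host="<cluster endpoint>", port=443, user="<user>",
#       catalog="hive", schema="sales", http_scheme="https",
#   )
#   rows = fetch_all(conn, FEDERATED_SQL)
```

Because both tables are addressed by their fully qualified names, the join runs in a single statement without first copying the MySQL data into S3 or vice versa.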
  15. Next Steps for You...
    • Ahana Cloud is available on the AWS Marketplace
    • Sign up for a 14-day free trial here: https://ahana.io/sign-up
  16. How to get involved with Presto
    • Join the Slack channel! prestodb.slack.com
    • Write a blog for prestodb.io! prestodb.io/blog
    • Join the virtual meetup group & present! meetup.com/prestodb
    • Contribute to the project! github.com/prestodb