Hands-on Virtual Lab - Querying Data in S3 With Presto

Slide 1

Slide 1 text

HANDS-ON VIRTUAL LAB
Query Data in S3 Using Presto
Jan 26, 2023

Slide 2

Slide 2 text

WELCOME TO THE AHANA HANDS-ON VIRTUAL LAB
Your Lab Guide: Rohan Pednekar

Slide 3

Slide 3 text

Objective for Today
Over the next 90 minutes you will:
● Explore and understand Presto
● Get a walk-through of creating a new cluster using Ahana
● Query data sitting in S3 using Presto
● Convert data to different formats using Presto
● Run federated queries/joins across multiple sources, combining data in S3 and RDS/MySQL

Slide 4

Slide 4 text

Agenda
1) Understand the Technology (15-20 mins)
   a) What is Presto?
   b) What is Ahana Cloud?
2) Getting your hands dirty (60 mins)
3) Summary and Close Out (5 mins)

Slide 5

Slide 5 text

Understanding the Technology: Presto

Slide 6

Slide 6 text

What is Presto?
• Open source, distributed MPP SQL query engine
• Query in Place
• Federated Querying
• ANSI SQL Compliant
• Designed from the ground up for fast analytic queries against data of any size
• Originally developed at Facebook
• Proven on petabytes of data
• SQL-On-Anything
• Federated pluggable architecture to support many connectors
• Open source, hosted on GitHub
• https://github.com/prestodb

Slide 7

Slide 7 text

Presto Overview
[Diagram: a Presto cluster with one coordinator and four workers]

Slide 8

Slide 8 text

Presto – It’s Exploding
Presto is De-Facto SQL Engine
[Chart: Spark SQL vs. Presto, source: https://db-engines.com/en/ranking_trend/relational+dbms]
“As one of our earliest members, Ahana has been strong supporters of the Presto Foundation since its launch in 2019.”

Slide 9

Slide 9 text

Presto Users

Slide 10

Slide 10 text

Presto Use Cases
• Data Lakehouse analytics
• Reporting & dashboarding
• Interactive ad hoc querying
• Transformation using SQL (ETL)
• Federated querying across data sources

Slide 11

Slide 11 text

Presto Architecture

Slide 12

Slide 12 text

What makes Presto different?
• Scalable Architecture
• Pluggable Connectors
• Performance

Slide 13

Slide 13 text

Scalable Architecture
• Two roles - coordinator and worker
• Easy scale up and scale down
• Scale up to 1000 workers
• Validated at web-scale companies
[Diagram: a Presto cluster with a coordinator, three workers, and two new workers being added, connected to a data source]

Slide 14

Slide 14 text

Scalable Architecture
[Diagram: query flow through a Presto cluster]
• BI Tools/Notebooks/Clients: Presto CLI, Looker, JDBC, Superset, Tableau, Jupyter, ...
• Clients send SQL to the Presto Coordinator and receive Result Sets
• Presto Coordinator: Parser/analyzer, Planner, Scheduler
• Metadata API and Data Location API (between the coordinator and the Presto Connector)
• Workers: exchange intermediate data via Data Shuffle
• Presto Connectors: reach any database, data stream, or storage: HDFS, Object Stores (S3), MySQL, ElasticSearch, Kafka, ...
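The coordinator/worker split above can be illustrated with a toy scatter-gather sketch. This is not Presto's implementation: the function names, the `status` column, and the use of threads as stand-ins for workers are all assumptions made for illustration. It only shows the general idea of a coordinator dividing work into splits, workers computing partial results, and the partial results being merged.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def make_splits(rows, n_splits):
    """Coordinator side: divide the input rows into roughly equal splits."""
    return [rows[i::n_splits] for i in range(n_splits)]


def worker_partial_count(split):
    """Worker side: partial GROUP BY status, COUNT(*) over one split."""
    return Counter(row["status"] for row in split)


def coordinator_run(rows, n_workers=3):
    """Schedule one task per split, then merge the partial counts."""
    splits = make_splits(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(worker_partial_count, splits))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


# Hypothetical sample data standing in for a table scan.
rows = [{"status": s} for s in
        ["ok", "ok", "error", "ok", "error", "ok", "timeout"]]
result = coordinator_run(rows)
```

Changing `n_workers` changes how the rows are divided but not the merged result, which is the property that lets a cluster scale workers up or down without changing query answers.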

Slide 15

Slide 15 text

Pluggable Presto Connectors

Slide 16

Slide 16 text

Presto Connector Data Model
• Connector: Driver for a data source.
  • Example: HDFS, AWS S3, Cassandra, MySQL, SQL Server, Kafka
• Catalog: Contains schemas from a data source specified by the connector
• Schemas: Namespace to organize tables.
• Tables: Set of unordered rows organized into columns with types.
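In SQL, this hierarchy shows up as a three-part table name, `catalog.schema.table`. The sketch below builds such names and a few statements around them. The catalog, schema, and table names (`hive`, `mysql`, `lab`, `orders`, `customers`) are hypothetical placeholders, not names from the lab.

```python
def qualified_name(catalog, schema, table):
    """Return the fully qualified name Presto uses to address a table."""
    return f"{catalog}.{schema}.{table}"


def select_sample(catalog, schema, table, limit=10):
    """Build a simple query against a fully qualified table."""
    return f"SELECT * FROM {qualified_name(catalog, schema, table)} LIMIT {limit}"


def show_tables(catalog, schema):
    """Build a statement listing the tables in one schema of one catalog."""
    return f"SHOW TABLES FROM {catalog}.{schema}"


# Hypothetical names: one table reached through a Hive (S3) catalog,
# another through a MySQL catalog.
s3_orders = qualified_name("hive", "lab", "orders")
mysql_customers = qualified_name("mysql", "lab", "customers")
sample_sql = select_sample("hive", "lab", "orders", limit=5)
```

Because every table is addressed through its catalog, a single query can reference tables from different connectors side by side.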

Slide 17

Slide 17 text

Presto Hive Connector – Data File Types
• Supported File Types
  • ORC
  • Parquet
  • Avro
  • RCFile
  • CSV
  • SequenceFile
  • JSON
  • Text
• No data ingestion/duplication/movement needed
• Query data in-place
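Converting a dataset from one file type to another is typically done with a `CREATE TABLE ... AS SELECT` statement that sets the Hive connector's `format` table property. The sketch below only builds the SQL text. The table names are hypothetical, and the `ALLOWED_FORMATS` set is an assumed, deliberately partial list; check the Hive connector documentation for the values your Presto version accepts.

```python
# Assumed subset of format values; not an exhaustive or authoritative list.
ALLOWED_FORMATS = {"ORC", "PARQUET", "AVRO", "JSON", "TEXTFILE"}


def ctas_convert(source, target, fmt="PARQUET"):
    """Build a CTAS statement that rewrites `source` into `target` as `fmt`."""
    fmt = fmt.upper()
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format in this sketch: {fmt}")
    return (
        f"CREATE TABLE {target}\n"
        f"WITH (format = '{fmt}')\n"
        f"AS SELECT * FROM {source}"
    )


# Hypothetical tables: a CSV-backed table rewritten as Parquet.
convert_sql = ctas_convert("hive.lab.orders_csv", "hive.lab.orders_parquet")
```

The source files stay where they are; the statement writes a new copy in the target format under the new table's location.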

Slide 18

Slide 18 text

Introducing Ahana Cloud
Fully-Managed Presto Service

Slide 19

Slide 19 text

Ahana Cloud for Presto Managed Service
• Enables data platform engineers in minutes vs. days
• Fully integrated & pre-configured
• No ETL, in-place analytics

Slide 20

Slide 20 text

Ahana Cloud for Presto
[Diagram: control plane and compute plane]
Ahana Cloud Account: Ahana Console (Control Plane)
• Cluster orchestration
• Consolidated logging
• Security & access
• Billing & support
Ahana console oversees and manages every Presto cluster
Customer Cloud Account: In-VPC Presto Clusters (Compute Plane)
• Ad hoc cluster 1
• Test cluster 2
• Prod cluster N
• Glue, S3, RDS, Elasticsearch
In-VPC orchestration of Presto clusters, where metadata, monitoring, and data sources reside

Slide 21

Slide 21 text

Ahana Cloud – Reference Architecture
• Distributed SQL engine with proven scalability
• Interactive ANSI SQL queries
• Query data where it lives with Federated Connectors (no ETL)
• High concurrency
• Separation of compute and storage
• Cost Management Features

Slide 22

Slide 22 text

Getting Your Hands Dirty

Slide 23

Slide 23 text

Wrapping Up: Conclusion and Things to Try

Slide 24

Slide 24 text

Things to try later...
1. Scale Up and Scale Down your cluster
2. Check your cluster’s PrestoDB Console when running SQL
3. Try queries with more/fewer workers; how does performance change?
4. Try partitioned datasets
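For the partitioned-dataset item, one possible starting point is a CTAS statement that sets the Hive connector's `partitioned_by` table property. The sketch below builds that SQL and moves the partition columns to the end of the select list, since the Hive connector expects partition columns last. The table and column names are hypothetical.

```python
def partitioned_ctas(source, target, columns, partition_cols, fmt="PARQUET"):
    """Build a CTAS statement that writes `target` partitioned by `partition_cols`."""
    missing = [c for c in partition_cols if c not in columns]
    if missing:
        raise ValueError(f"partition columns not in column list: {missing}")
    # Partition columns go last in the select list.
    data_cols = [c for c in columns if c not in partition_cols]
    ordered = data_cols + list(partition_cols)
    part_array = ", ".join(f"'{c}'" for c in partition_cols)
    return (
        f"CREATE TABLE {target}\n"
        f"WITH (format = '{fmt}', partitioned_by = ARRAY[{part_array}])\n"
        f"AS SELECT {', '.join(ordered)} FROM {source}"
    )


# Hypothetical example: repartition an orders table by order_date.
partition_sql = partitioned_ctas(
    source="hive.lab.orders_csv",
    target="hive.lab.orders_by_day",
    columns=["order_date", "order_id", "amount"],
    partition_cols=["order_date"],
)
```

Queries that filter on the partition column can then skip files belonging to other partitions, which is worth comparing against the unpartitioned table.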

Slide 25

Slide 25 text

Conclusion
In this hands-on workshop you have:
1. Learned about Presto and Ahana Cloud
2. Seen how to effortlessly create and manage Presto clusters
3. Run fast federated SQL queries combining datasets from S3 and MySQL
4. Run Presto queries via Python
5. Created a simple BI dashboard using Superset
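As a reminder of what items 3 and 4 look like together, here is a minimal sketch. The federated query text uses hypothetical catalogs, tables, and columns. The `run_query` helper works with any Python DB-API 2.0 connection; with the `presto-python-client` package that connection would come from `prestodb.dbapi.connect(...)`, but the demo below uses an in-memory SQLite database purely as a stand-in so the snippet runs without a cluster.

```python
import sqlite3  # stand-in only; a real run would use a Presto connection

# Hypothetical federated join: orders in S3 (hive catalog), customers in MySQL.
FEDERATED_SQL = """
SELECT c.name, SUM(o.amount) AS total
FROM hive.lab.orders AS o
JOIN mysql.lab.customers AS c ON o.customer_id = c.customer_id
GROUP BY c.name
ORDER BY total DESC
"""


def run_query(conn, sql):
    """Execute `sql` on a DB-API connection and return rows as dicts."""
    cur = conn.cursor()
    cur.execute(sql)
    columns = [d[0] for d in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]


# Against a cluster (assumed host/user values), the connection would be:
#   import prestodb
#   conn = prestodb.dbapi.connect(host="<coordinator-host>", port=8080,
#                                 user="<user>", catalog="hive", schema="lab")
#   rows = run_query(conn, FEDERATED_SQL)

# Local stand-in demo of the helper itself.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,)])
rows = run_query(conn, "SELECT x FROM t ORDER BY x")
```

Returning rows as dictionaries keeps the helper independent of the driver, so the same function can feed a notebook or a dashboard script.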

Slide 26

Slide 26 text

Next Steps for You...
• Ahana Cloud is available on the AWS Marketplace
• Sign up for a 14-day free trial here: https://ahana.io/sign-up

Slide 27

Slide 27 text

How to get involved with Presto
• Join the Slack channel! prestodb.slack.com
• Write a blog for prestodb.io! prestodb.io/blog
• Join the virtual meetup group & present! meetup.com/prestodb
• Contribute to the project! github.com/prestodb

Slide 28

Slide 28 text

Questions?

Slide 29

Slide 29 text

Thank you!
Stay Up-to-Date with Ahana
Website: https://ahana.io/
Blogs: https://ahana.io/blog/
Twitter: @ahanaio