Objective for Today
Over the next 90 minutes you will:
● Explore and understand Presto
● Get a walk-through of creating a new cluster using Ahana
● Query data sitting in S3 using Presto
● Convert data to different formats using Presto
● Get a walk-through of enabling Apache Ranger for Presto
● Run SQL queries to explore Ranger policies
● Run federated queries/joins across multiple sources, combining data in S3 and RDS/MySQL

Agenda
1) Understand the Technology (40 mins)
   a) What is Presto?
   b) What is Ahana Cloud?
   c) Apache Ranger Integration
2) Getting your hands dirty (40 mins)
3) Summary and Close Out (10 mins)

What is Presto?
• Open source, distributed MPP SQL query engine
• Query in Place
• Federated Querying
• ANSI SQL Compliant
• Designed from the ground up for fast analytic queries against data of any size
• Originally developed at Facebook
• Proven on petabytes of data
• SQL-On-Anything
• Federated, pluggable architecture to support many connectors
• Open source, hosted under the Linux Foundation
• https://github.com/prestodb

Presto – It’s Exploding
• Presto is the de facto SQL engine
• [Chart: Spark SQL vs. Presto ranking trend. Source: https://db-engines.com/en/ranking_trend/relational+dbms]
“As one of our earliest members, Ahana has been strong supporters of the Presto Foundation since its launch in 2019.”

Presto Use Cases
• Data Lakehouse analytics
• Reporting & dashboarding
• Interactive ad hoc querying
• Transformation using SQL (ETL)
• Federated querying across data sources

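The "Transformation using SQL" use case is what the format-conversion exercise relies on. A minimal sketch, assuming the Hive connector and a CREATE TABLE AS statement with its `format` table property: the helper below only builds the SQL text. The function name `ctas_convert` and the catalog, schema, and table names are hypothetical placeholders, not names from the workshop environment.

```python
# A subset of the storage formats accepted by the Presto Hive connector.
SUPPORTED_FORMATS = {"PARQUET", "ORC", "AVRO", "JSON", "TEXTFILE"}


def ctas_convert(source: str, target: str, fmt: str = "PARQUET") -> str:
    """Build a CREATE TABLE AS statement that rewrites `source` in a new format."""
    fmt = fmt.upper()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    return (
        f"CREATE TABLE {target}\n"
        f"WITH (format = '{fmt}')\n"
        f"AS SELECT * FROM {source}"
    )


# Hypothetical table names: a CSV-backed table rewritten as Parquet.
print(ctas_convert("ahana_hive.default.customer_csv",
                   "ahana_hive.default.customer_parquet"))
```

The generated statement can be pasted into the Presto CLI or any SQL client connected to the cluster.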
Scalable Architecture
• Two roles: coordinator and worker
• Easy scale up and scale down
• Scales up to 1000 workers
• Validated at web-scale companies
[Diagram: a Presto cluster with one coordinator and several workers, new workers being added, and a data source]

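To make the two roles concrete, here is a rough sketch of how a self-managed open-source Presto node is told which role to play through its `config.properties` file. Ahana Cloud generates this configuration for you, so this is purely illustrative. The helper name `node_config` and the discovery host are assumptions, and memory settings are omitted.

```python
def node_config(is_coordinator: bool, discovery_uri: str, port: int = 8080) -> str:
    """Render a minimal config.properties for a coordinator or a worker."""
    lines = [f"coordinator={'true' if is_coordinator else 'false'}"]
    if is_coordinator:
        # The coordinator plans and schedules queries but does not run splits
        # itself, and it hosts the discovery service that workers register with.
        lines.append("node-scheduler.include-coordinator=false")
        lines.append("discovery-server.enabled=true")
    lines.append(f"http-server.http.port={port}")
    lines.append(f"discovery.uri={discovery_uri}")
    return "\n".join(lines)


# Hypothetical discovery endpoint; every node points at the coordinator.
uri = "http://coordinator.example:8080"
print(node_config(True, uri))
print()
print(node_config(False, uri))
```

Scaling up then amounts to starting more nodes with the worker configuration; they announce themselves to the coordinator through the discovery URI.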
Scalable Architecture
[Diagram: Presto architecture]
• BI tools/notebooks/clients: Presto CLI, Looker, JDBC, Superset, Tableau, Jupyter, ... (send SQL, receive result sets)
• Presto coordinator: parser/analyzer, planner, scheduler, Metadata API, Data Location API
• Workers: data shuffle between workers, Presto connectors to the data sources
• Any database, data stream, or storage: HDFS, object stores (S3), MySQL, Elasticsearch, Kafka, ...

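The client-to-coordinator path in the diagram can be sketched with the coordinator's HTTP interface: a client POSTs the SQL text to `/v1/statement` and then follows the `nextUri` links in each JSON response until the result set is complete. This is a simplified sketch using only the Python standard library. The function names are hypothetical, authentication and TLS are left out, and in practice a maintained client (the Presto CLI, JDBC driver, or a Python client library) is the better choice.

```python
import json
import urllib.request


def build_statement_request(coordinator_url, sql, user, catalog=None, schema=None):
    """Build (but do not send) the initial POST that submits a query."""
    headers = {"X-Presto-User": user}
    if catalog:
        headers["X-Presto-Catalog"] = catalog
    if schema:
        headers["X-Presto-Schema"] = schema
    return urllib.request.Request(
        url=coordinator_url.rstrip("/") + "/v1/statement",
        data=sql.encode("utf-8"),
        headers=headers,
        method="POST",
    )


def run_query(coordinator_url, sql, user):
    """Submit SQL and yield result rows, following nextUri until done."""
    request = build_statement_request(coordinator_url, sql, user)
    while request is not None:
        with urllib.request.urlopen(request) as response:
            payload = json.load(response)
        if "error" in payload:
            raise RuntimeError(payload["error"].get("message", "query failed"))
        # Each response may carry a batch of rows; early ones often carry none.
        yield from payload.get("data", [])
        next_uri = payload.get("nextUri")
        request = (
            urllib.request.Request(next_uri, headers={"X-Presto-User": user})
            if next_uri
            else None
        )


# Usage against a reachable coordinator (hypothetical URL and user):
# for row in run_query("http://localhost:8080", "SHOW CATALOGS", "workshop"):
#     print(row)
```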
Presto Connector Data Model
• Connector: Driver for a data source
  • Examples: HDFS, AWS S3, Cassandra, MySQL, SQL Server, Kafka
• Catalog: Contains schemas from a data source specified by the connector
• Schema: Namespace to organize tables
• Table: Set of unordered rows organized into columns with types

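In practice this hierarchy shows up as fully qualified table names of the form `catalog.schema.table`. The sketch below splits such a name and lists the statements you could run to walk from catalogs down to columns. The catalog, schema, and table names are hypothetical, and the simple parser does not handle quoted identifiers that contain dots.

```python
from typing import NamedTuple


class TableName(NamedTuple):
    """A fully qualified Presto table name: catalog.schema.table."""
    catalog: str
    schema: str
    table: str

    def __str__(self) -> str:
        return f"{self.catalog}.{self.schema}.{self.table}"


def parse_table_name(qualified: str) -> TableName:
    """Split 'catalog.schema.table' into its three parts."""
    parts = qualified.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected catalog.schema.table, got {qualified!r}")
    return TableName(*parts)


def exploration_statements(name: TableName) -> list[str]:
    """Statements that walk the hierarchy from catalogs down to columns."""
    return [
        "SHOW CATALOGS",
        f"SHOW SCHEMAS FROM {name.catalog}",
        f"SHOW TABLES FROM {name.catalog}.{name.schema}",
        f"DESCRIBE {name}",
    ]


orders = parse_table_name("hive.sales.orders")  # hypothetical names
for statement in exploration_statements(orders):
    print(statement)
```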
Ahana Cloud for Presto Managed Service
• Gets data platform engineers up and running in minutes vs. days
• Fully integrated & pre-configured
• No ETL, in-place analytics

Ahana Cloud – Reference Architecture
• Distributed SQL engine with proven scalability
• Interactive ANSI SQL queries
• Query data where it lives with Federated Connectors (no ETL)
• High concurrency
• Separation of compute and storage
• Cost Management Features

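"Query data where it lives" means a single statement can reference tables from more than one catalog. The sketch below shows what such a federated join might look like, with one table backed by files in S3 and one in RDS/MySQL. All catalog, schema, table, and column names are hypothetical, and the small helper only inspects the SQL text to show that two catalogs are involved.

```python
import re

# Hypothetical federated join: orders live in S3 (Hive catalog),
# customers live in RDS/MySQL (MySQL catalog).
FEDERATED_QUERY = """
SELECT c.name, SUM(o.totalprice) AS total_spent
FROM ahana_hive.default.orders AS o
JOIN mysql_rds.sales.customers AS c
  ON o.custkey = c.custkey
GROUP BY c.name
ORDER BY total_spent DESC
LIMIT 10
""".strip()


def catalogs_referenced(sql: str) -> list[str]:
    """Return the catalogs named in FROM/JOIN clauses of fully qualified tables."""
    pattern = r"\b(?:FROM|JOIN)\s+(\w+)\.\w+\.\w+"
    return sorted(set(re.findall(pattern, sql, flags=re.IGNORECASE)))


print(catalogs_referenced(FEDERATED_QUERY))  # ['ahana_hive', 'mysql_rds']
```

Presto plans this as one query: each connector reads from its own source, and the join runs on the workers, so no data has to be copied into a common store first.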
Why Apache Ranger
● An open-source authorization solution
● Cloud agnostic
● Fine-grained access control
● Audit support
● Secured with SSL
● Easy to configure with Ahana Cloud

Ranger Plugin Architecture
• Extended to reuse the Hive Ranger plugin
• Centralized, fine-grained access control with column-level and row-level policies across all clusters
• Supports centralized auditing of user access
• Secured with SSL
• Simplified integration with Ahana Cloud
• Enable, monitor, and manage comprehensive data security across the data lake for user-triggered Hive or Glue catalog queries

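To illustrate what a column-level policy decides, here is a deliberately simplified toy model. It is not Ranger's policy schema or its evaluation engine: the `ColumnPolicy` class, the `denied_columns` helper, and all user, table, and column names are hypothetical. The only idea it demonstrates is that access is denied unless some policy grants the column to the user.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnPolicy:
    """Toy stand-in for a column-level policy; not Ranger's actual schema."""
    table: str
    columns: frozenset
    users: frozenset


def denied_columns(policies, user, table, requested):
    """Return the requested columns that no policy grants to this user."""
    granted = set()
    for policy in policies:
        if policy.table == table and user in policy.users:
            granted |= policy.columns
    return sorted(set(requested) - granted)


policies = [
    ColumnPolicy("sales.customers", frozenset({"custkey", "name"}),
                 frozenset({"analyst"})),
    ColumnPolicy("sales.customers", frozenset({"custkey", "name", "phone"}),
                 frozenset({"admin"})),
]

print(denied_columns(policies, "analyst", "sales.customers", ["name", "phone"]))  # ['phone']
print(denied_columns(policies, "admin", "sales.customers", ["name", "phone"]))    # []
```

With the real integration, the equivalent check happens inside the Ranger plugin when a query is submitted, and the decision is recorded in the central audit log.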
Conclusion
In this hands-on workshop you have:
1. Learned about Presto and Ahana Cloud
2. Seen how to effortlessly create and manage Presto clusters
3. Run fast federated SQL queries combining datasets from S3 and MySQL
4. Run Presto queries with Apache Ranger enabled

Things to try later...
1. Scale up and scale down your cluster
2. Try queries with more/fewer workers; how does performance change?
3. Fine-grained access control with Lake Formation
4. Table formats for transactions, schema evolution, and table versioning with the Apache Hudi, Apache Iceberg, and Delta Lake connectors

Next Steps for You...
• Ahana Cloud is available on the AWS Marketplace
• Sign up for a 14-day free trial here: https://ahana.io/sign-up
• Community Edition: free-forever service

How to get involved with Presto
• Join the Slack channel! prestodb.slack.com
• Write a blog for prestodb.io! prestodb.io/blog
• Join the virtual meetup group & present! meetup.com/prestodb
• Contribute to the project! github.com/prestodb