SQL on the Data Lake: Using open source Presto to unlock the value of your data lake

Part 2: SQL on the Data Lake: Using open source Presto to unlock the value of your data lake

2 Today’s Speaker
Dipti Borkar, Cofounder, Chief Product Officer and Chief Evangelist, Ahana
Dipti is a Cofounder, CPO & Chief Evangelist of Ahana, with over 15 years of experience in distributed data and database technology, including relational, NoSQL and federated systems. She is also the Presto Foundation Outreach Chairperson. Prior to Ahana, Dipti held VP roles at Alluxio, Kinetica and Couchbase. At Alluxio she was Vice President of Products, and at Couchbase she held several leadership positions, including VP, Product Marketing; Head of Global Technical Sales; and Head of Product Management. Earlier, Dipti managed development teams at IBM DB2 Distributed, where she started her career as a database software engineer. Dipti holds an M.S. in Computer Science from UC San Diego and an MBA from the Haas School of Business at UC Berkeley.
© 2021 Enterprise Management Associates, Inc.

3 The Next Data Warehouse is Open Data Lake Analytics
Diagram: data flows into SQL query processing on two platforms.
• Data Warehouse: SQL query processing over 1-10 TB
• Cloud Data Lake: SQL query processing over 1 TB to PB
• Open Data Lake Analytics workloads: Reporting & Dashboarding, Data Science, In-data lake transformation

4 Massive Data Lake Analytics Market Opportunity
Diagram labels:
• Data sources: Operational Data Stores, Third Party Data, Machine Learning, Semi-/unstructured data, Streaming & IoT Data
• ETL / ELT data engineering
• Data Warehouse: SQL structured workloads, 1-10 TB (storage, compute)
• Data Lake: query & processing, 1 TB to PB (storage, compute)
• Data Virtualization / Federated Access
• SQL Query Processing = Insights, consumed by Reporting, Dashboards, Visualizations, Notebooks and Custom Apps

5 At A Glance
• Distributed SQL query engine to get insights from data lakes and databases
• Created at
• Lightning-fast for querying petabytes of data
• Open source: https://github.com/prestodb
• Hosted under the Presto Foundation in the Linux Foundation
• 250K+ Docker Hub downloads (last 6 months), 331 contributors, 12K+ GitHub stars, 1800+ Slack members, 1800+ Meetup members

6 Presto aka prestoDB: The de facto engine for data platform teams
Business Needs
• Data-driven decision making
• Businesses need more data to iterate over
Technology Trends
• Disaggregation of storage and compute
• The rise of data lakes

7 What is Presto?
• Distributed SQL query engine
• ANSI SQL on databases and data lakes
• Designed to be interactive
• Designed to be federated
• Access to petabytes of data
• Open source, hosted in the Linux Foundation under the Presto Foundation: https://github.com/prestodb
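Presto's interactive, federated access is exposed to clients over HTTP. The Python sketch below shows the general shape of that exchange, based on our reading of the prestodb client protocol:

1. The client POSTs the SQL text to the coordinator's /v1/statement endpoint, with X-Presto-User, X-Presto-Catalog and X-Presto-Schema headers.
2. It then follows the nextUri in each JSON response, collecting any data rows.
3. It stops when no nextUri remains.

This is not an official client. The coordinator URL, user, catalog and schema defaults are placeholders. The HTTP call is injectable, so the paging logic can be exercised without a running cluster.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, List, Optional

# Signature of the HTTP helper: (method, url, body, headers) -> decoded JSON
Fetch = Callable[[str, str, Optional[bytes], Dict[str, str]], Dict[str, Any]]


def http_fetch(method: str, url: str, body: Optional[bytes],
               headers: Dict[str, str]) -> Dict[str, Any]:
    """Send one HTTP request with the standard library and decode the JSON reply."""
    request = urllib.request.Request(url, data=body, headers=headers, method=method)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def run_query(sql: str,
              coordinator: str = "http://localhost:8080",  # placeholder address
              user: str = "analyst",                        # placeholder user
              catalog: str = "hive",                        # placeholder catalog
              schema: str = "default",                      # placeholder schema
              fetch: Fetch = http_fetch) -> List[List[Any]]:
    """Submit SQL to a Presto coordinator and collect every result row."""
    headers = {
        "X-Presto-User": user,
        "X-Presto-Catalog": catalog,
        "X-Presto-Schema": schema,
    }
    # The first request carries the statement text as the POST body.
    page = fetch("POST", coordinator.rstrip("/") + "/v1/statement",
                 sql.encode("utf-8"), headers)
    rows: List[List[Any]] = []
    while True:
        if "error" in page:
            raise RuntimeError(page["error"].get("message", "query failed"))
        # Early pages may carry no rows; "data" appears once results are available.
        rows.extend(page.get("data", []))
        next_uri = page.get("nextUri")
        if not next_uri:  # no nextUri left: the query has finished
            return rows
        page = fetch("GET", next_uri, None, headers)
```

Against a real coordinator, a call such as run_query("SELECT 1", coordinator="http://your-coordinator:8080") would return the rows as lists. For production work, a maintained client such as the presto-python-client package is the safer choice.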

8 Presto Use Cases
• Data lakehouse analytics
• Reporting & dashboarding
• Interactive querying
• Transformation using SQL (ETL)
• Federated querying across data sources
• Data science
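For the "Transformation using SQL (ETL)" use case, an in-lake transformation is commonly written in Presto as CREATE TABLE ... AS SELECT. The result is written back to the lake through the Hive connector. The small Python helper below only assembles such a statement as text; you would still submit it with a Presto client.

Assumptions to note:
• The table names, the PARQUET format and the ds partition column are illustrative.
• The format and partitioned_by table properties are those of the Hive connector.

```python
from typing import Sequence


def build_ctas(target_table: str, select_sql: str,
               file_format: str = "PARQUET",
               partitioned_by: Sequence[str] = ()) -> str:
    """Assemble a Presto CREATE TABLE AS SELECT for a Hive-connector table.

    With the Hive connector, partition columns must be the last columns
    of the SELECT list.
    """
    properties = [f"format = '{file_format}'"]
    if partitioned_by:
        quoted = ", ".join(f"'{column}'" for column in partitioned_by)
        properties.append(f"partitioned_by = ARRAY[{quoted}]")
    return (f"CREATE TABLE {target_table}\n"
            f"WITH ({', '.join(properties)})\n"
            f"AS\n{select_sql}")


# Hypothetical example: aggregate raw orders into a curated, partitioned table.
DAILY_REVENUE_SQL = build_ctas(
    "hive.curated.daily_revenue",
    "SELECT region, sum(amount) AS revenue, ds\n"
    "FROM hive.raw.orders\n"
    "WHERE status = 'COMPLETE'\n"
    "GROUP BY region, ds",
    partitioned_by=["ds"],
)

print(DAILY_REVENUE_SQL)
```

The partition column ds is deliberately placed last in the SELECT list, because the Hive connector expects partition columns at the end of the table schema.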

9 Interactive – Reporting and Dashboarding

10 Interactive – Data Science

11 Interactive – Federated

12 Batch – Transformation, cleansing etc.

13 Data Lakehouse

14 At A Glance: Ahana, The Company
• Ahana Cloud is a SaaS managed service for querying data lakes
• Simplifies SQL analytics on cloud data lakes like S3
• Team Ahana includes experts in cloud, database & Presto:
Steven Mih, Cofounder, CEO
Dipti Borkar, Cofounder, Chief Product Officer
Dave Simmen, Cofounder, Chief Technical Officer

16 How Ahana Cloud Works
• ~30 mins to create the compute plane
• Create Presto clusters in your account
• Sign up: https://app.ahana.cloud/signup

17 Ahana Cloud – Reference Architecture
• Distributed SQL engine with proven scalability
• Interactive ANSI SQL queries
• Query data where it lives with federated connectors (no ETL)
• High concurrency
• Separation of compute and storage
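In self-managed Presto, each federated connector is configured as a catalog. A catalog is a properties file under etc/catalog/, and the file name becomes the catalog name; tables are then addressed as catalog.schema.table. The Python sketch below writes two such files:
• one for data lake tables behind a Hive metastore (connector.name=hive-hadoop2);
• one for a MySQL database (connector.name=mysql).

It also shows a query that joins across the two catalogs without first copying data. Every hostname, port, credential, schema and table name is a placeholder assumption. A managed service such as Ahana Cloud may set up catalogs differently.

```python
import tempfile
from pathlib import Path
from typing import Dict, List

# Placeholder catalog definitions: replace hosts, ports and credentials.
CATALOGS: Dict[str, Dict[str, str]] = {
    # Data lake tables whose metadata lives in a Hive metastore
    "hive": {
        "connector.name": "hive-hadoop2",
        "hive.metastore.uri": "thrift://metastore.example.net:9083",
    },
    # An operational MySQL database, queried where it lives
    "mysql": {
        "connector.name": "mysql",
        "connection-url": "jdbc:mysql://mysql.example.net:3306",
        "connection-user": "presto",
        "connection-password": "change-me",
    },
}

# Hypothetical federated query: lake orders joined with CRM customers in MySQL.
FEDERATED_QUERY = """\
SELECT c.segment, count(*) AS order_count
FROM hive.sales.orders AS o
JOIN mysql.crm.customers AS c ON o.customer_id = c.id
GROUP BY c.segment"""


def write_catalogs(etc_dir: Path, catalogs: Dict[str, Dict[str, str]]) -> List[Path]:
    """Write one <catalog>.properties file per catalog into etc_dir/catalog."""
    catalog_dir = etc_dir / "catalog"
    catalog_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, properties in catalogs.items():
        path = catalog_dir / f"{name}.properties"
        path.write_text("".join(f"{key}={value}\n" for key, value in properties.items()))
        written.append(path)
    return written


if __name__ == "__main__":
    # Write into a throwaway directory and show what the files would contain.
    with tempfile.TemporaryDirectory() as tmp:
        for path in write_catalogs(Path(tmp), CATALOGS):
            print(f"--- {path.name}")
            print(path.read_text(), end="")
    print(FEDERATED_QUERY)
```

In a self-managed cluster, the same catalog files are typically deployed to the coordinator and to every worker.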

Questions
