Ahana
August 28, 2021

SQL on the Data Lake, Using open source Presto to unlock the value of your data lake

In this webinar, Dipti Borkar will discuss why open source Presto has quickly become the de facto query engine for the data lake. Presto enables ad hoc data discovery: you can use SQL to run queries whenever you want, wherever your data resides. With Presto, you can unlock the value of your data lake.

Transcript

  1. Part 2: SQL on the Data Lake: Using open source Presto to unlock the value of your data lake
  2. Today’s Speaker

    Dipti Borkar, Cofounder, Chief Product Officer and Chief Evangelist, Ahana

    Dipti is a Cofounder, CPO & Chief Evangelist of Ahana with over 15 years of experience in distributed data and database technology, including relational, NoSQL and federated systems. She is also the Presto Foundation Outreach Chairperson. Prior to Ahana, Dipti held VP roles at Alluxio, Kinetica and Couchbase. At Alluxio, she was Vice President of Products, and at Couchbase she held several leadership positions, including VP, Product Marketing, Head of Global Technical Sales and Head of Product Management. Earlier in her career, Dipti managed development teams at IBM DB2 Distributed, where she started her career as a database software engineer. Dipti holds an M.S. in Computer Science from UC San Diego and an MBA from the Haas School of Business at UC Berkeley.

    © 2021 Enterprise Management Associates, Inc.
  3. The Next Data Warehouse is Open Data Lake Analytics

    [Diagram labels: Data, SQL Query Processing, Data Warehouse, 1-10 TB; Cloud Data Lake, SQL Query Processing, 1 TB -> PB; Open Data Lake Analytics: Reporting & Dashboarding, Data Science, In-data lake transformation]
  4. Massive Data Lake Analytics Market Opportunity

    [Diagram labels: Data Warehouse; Operational Data Stores; Third Party Data; Machine Learning; Semi- | unstructured; Data Virtualization / Federated Access; Streaming & IoT Data; SQL Query Processing = Insights; ETL; ELT; Data Engg; Storage; Compute; 1-10 TB; Query & Processing; SQL; Structured Workloads; 1 TB -> PB; Data Lake; Reporting; Dashboards; Visualizations; Notebooks; Custom Apps]
  5. At A Glance

    • Distributed SQL query engine to get insights from data lakes and databases
    • Created at [logo]
    • Lightning-fast for querying on petabytes of data
    • Open source: https://github.com/prestodb
    • Hosted under [logo]

    250K+ Docker Hub Downloads (last 6 months)
    331 Contributors
    12K+ GitHub Stars
    1800+ Slack Members
    1800+ Meetup Members
  6. Presto aka prestoDB: The de facto engine for data platform teams

    Business Needs
    • Data-driven decision making
    • Businesses need more data to iterate over

    Technology Trends
    • Disaggregation of Storage and Compute
    • The rise of data lakes
  7. What is Presto?

    • Distributed SQL query engine
    • ANSI SQL on databases and data lakes
    • Designed to be interactive
    • Designed to be federated
    • Access to petabytes of data
    • Open source, hosted in the Linux Foundation under the Presto Foundation: https://github.com/prestodb
  8. Presto Use Cases

    • Data Lakehouse analytics
    • Reporting & dashboarding
    • Interactive querying use cases
    • Transformation using SQL (ETL)
    • Federated querying across data sources
    • Data Science
  9. At A Glance: Ahana - The Company

    • Ahana Cloud is a SaaS managed service to query data lakes
    • Simplifies SQL analytics on cloud data lakes like S3

    Team Ahana includes experts in Cloud, Database & Presto
    • Steven Mih, Cofounder, CEO
    • Dipti Borkar, Cofounder, Chief Products Officer
    • Dave Simmen, Cofounder, Chief Technical Officer
  10. (no transcribed text)

  11. How Ahana Cloud works

    • ~30 mins to create the compute plane: https://app.ahana.cloud/signup
    • Create Presto clusters in your account
  12. Ahana Cloud – Reference Architecture

    • Distributed SQL engine with proven scalability
    • Interactive ANSI SQL queries
    • Query data where it lives with Federated Connectors (no ETL)
    • High concurrency
    • Separation of compute and storage
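
The slides describe Presto as federated: a single ANSI SQL statement can read from several data sources without first copying the data. A minimal Python sketch of what such a query might look like is shown here. It only builds the query text and does not connect to a cluster. Presto addresses tables as catalog.schema.table; the catalog names (hive, mysql), schemas, tables, and columns below are hypothetical and do not come from the deck.

```python
# Sketch only: builds the text of a federated Presto query.
# It does not open a connection or run anything against a cluster.
# All catalog, schema, table, and column names are hypothetical.


def qualified(catalog: str, schema: str, table: str) -> str:
    """Return a Presto-style fully qualified table name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


def federated_join_sql() -> str:
    """Build one SQL statement that joins tables from two different catalogs."""
    # Hypothetical data lake table, e.g. files in object storage behind a Hive catalog.
    orders = qualified("hive", "sales", "orders")
    # Hypothetical operational database table behind a MySQL catalog.
    customers = qualified("mysql", "crm", "customers")
    return (
        "SELECT c.region, sum(o.amount) AS revenue\n"
        f"FROM {orders} AS o\n"
        f"JOIN {customers} AS c ON o.customer_id = c.id\n"
        "GROUP BY c.region\n"
        "ORDER BY revenue DESC"
    )


if __name__ == "__main__":
    print(federated_join_sql())
```

The resulting text could be submitted through any Presto client. Which catalogs are actually available depends on how the cluster's connectors are configured, so the names above would need to be replaced with real ones.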