Hands-on Virtual Lab - Querying Data in S3 With Presto

Ahana
January 27, 2023

Transcript

  1. HANDS-ON VIRTUAL LAB
    Query Data in S3 Using Presto
    Jan 26 2023

  2. WELCOME TO THE AHANA HANDS-ON VIRTUAL LAB
    Your Lab Guide: Rohan Pednekar

  3. Objective for Today
    Over the next 90 minutes you will:
    ● Explore and understand Presto
    ● Get a walk-through of creating a new cluster using Ahana
    ● Query data sitting in S3 using Presto
    ● Convert data to different formats using Presto
    ● Run federated queries/joins across multiple sources, combining data in S3 and RDS/MySQL

  4. Agenda
    1) Understand the Technology (15-20 mins)
    a) What is Presto?
    b) What is Ahana Cloud?
    2) Getting your hands dirty (60 mins)
    3) Summary and Close Out (5 mins)

  5. Understanding The Technology
    Presto

  6. What is Presto?
    • Open source, distributed MPP SQL query engine
    • Query in Place
    • Federated Querying
    • ANSI SQL Compliant
    • Designed ground up for fast analytic queries against data of any size
    • Originally developed at Facebook
    • Proven on petabytes of data
    • SQL-On-Anything
    • Federated pluggable architecture to support many connectors
    • Open source, hosted on GitHub
    • https://github.com/prestodb

  7. Presto Overview
    [Diagram: a Presto cluster made up of one coordinator and four workers]

  8. Presto – It’s Exploding
    Presto is the de facto SQL engine
    https://db-engines.com/en/ranking_trend/relational+dbms
    Spark SQL vs. Presto
    “As one of our earliest members, Ahana has been strong supporters of the
    Presto Foundation since its launch in 2019.”

  9. Presto Users

  10. Presto Use Cases
    • Data lakehouse analytics
    • Reporting & dashboarding
    • Interactive ad hoc querying
    • Transformation using SQL (ETL)
    • Federated querying across data sources

  11. Presto Architecture

  12. What makes Presto different?
    • Scalable Architecture
    • Pluggable Connectors
    • Performance

  13. Scalable Architecture
    • Two roles: coordinator and worker
    • Easy scale up and scale down
    • Scale up to 1000 workers
    • Validated at web-scale companies
    [Diagram: a Presto cluster with one coordinator and three workers reading from a data source, with two new workers being added]

  14. Scalable Architecture
    [Diagram labels, grouped by component]
    • BI Tools/Notebooks/Clients: Presto CLI, Looker, JDBC, Superset, Tableau, Jupyter, ...
    • Between clients and coordinator: SQL in, result sets out
    • Presto Coordinator: Parser/analyzer, Planner, Scheduler, Metadata API, Data Location API
    • Workers (three shown): exchange data through data shuffles, each with a Presto Connector
    • Any Database, Data Stream, or Storage: HDFS, Object Stores (S3), MySQL, ElasticSearch, Kafka, ...

  15. Pluggable Presto Connectors

  16. Presto Connector Data Model
    • Connector: Driver for a data source.
    • Example: HDFS, AWS S3, Cassandra, MySQL, SQL Server, Kafka
    • Catalog: Contains schemas from a data source
    specified by the connector
    • Schemas: Namespace to organize tables.
    • Tables: Set of unordered rows organized into columns
    with types.
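
A rough way to picture this hierarchy is that every table is addressed as catalog.schema.table, and shorter names are filled in from the session's default catalog and schema. The Python sketch below illustrates only that naming rule; the `QualifiedTable` class, the `resolve` helper, and the catalog, schema, and table names are hypothetical and are not part of Presto or of the lab material. It also ignores quoted identifiers that contain dots.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualifiedTable:
    catalog: str  # named connector instance, e.g. "hive" or "mysql" (hypothetical)
    schema: str   # namespace inside the catalog
    table: str    # rows organized into typed columns

    def __str__(self) -> str:
        return f"{self.catalog}.{self.schema}.{self.table}"


def resolve(name: str, default_catalog: str, default_schema: str) -> QualifiedTable:
    """Expand a 1-, 2-, or 3-part table name using the session defaults."""
    parts = name.split(".")
    if not all(parts) or len(parts) > 3:
        raise ValueError(f"invalid table name: {name!r}")
    if len(parts) == 3:
        return QualifiedTable(*parts)
    if len(parts) == 2:
        return QualifiedTable(default_catalog, *parts)
    return QualifiedTable(default_catalog, default_schema, parts[0])


print(resolve("orders", "hive", "lab"))              # hive.lab.orders
print(resolve("sales.customers", "hive", "lab"))     # hive.sales.customers
print(resolve("mysql.crm.accounts", "hive", "lab"))  # mysql.crm.accounts
```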

  17. Presto Hive Connector – Data File Types
    • Supported File Types
    • ORC
    • Parquet
    • Avro
    • RCFile
    • CSV
    • SequenceFile
    • JSON
    • Text
    • No data ingestion/duplication/movement needed
    • Query data in-place
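
One of the lab objectives is converting data between formats. With the Hive connector this is commonly done with a CREATE TABLE ... AS SELECT statement that sets the `format` table property. The Python sketch below only builds such a statement as a string. The helper name and the table names are hypothetical, and the set of accepted format values is deliberately limited to ones we are confident about; the exact spelling for other file types should be checked against the Hive connector documentation for your Presto version.

```python
# Format values this sketch accepts; other file types listed on the slide may
# use different property spellings, so they are left out on purpose.
KNOWN_FORMATS = {"ORC", "PARQUET", "AVRO", "JSON", "TEXTFILE"}


def convert_format_sql(source: str, target: str, file_format: str) -> str:
    """Build a CTAS statement that rewrites `source` into `target` as `file_format`."""
    fmt = file_format.upper()
    if fmt not in KNOWN_FORMATS:
        raise ValueError(f"format not covered by this sketch: {file_format!r}")
    return (
        f"CREATE TABLE {target}\n"
        f"WITH (format = '{fmt}')\n"
        f"AS SELECT * FROM {source}"
    )


# Hypothetical table names: a CSV-backed table rewritten as Parquet.
statement = convert_format_sql("hive.lab.trips_csv", "hive.lab.trips_parquet", "parquet")
print(statement)
```

The generated statement can be pasted into the Presto CLI or sent through any client; nothing here contacts a cluster.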

  18. Introducing Ahana Cloud
    Fully-Managed Presto Service

  19. Ahana Cloud for Presto Managed Service
    • Gets data platform engineers up and running in minutes vs. days
    • Fully integrated & pre-configured
    • No ETL, in-place analytics

  20. Ahana Cloud for Presto
    Ahana Cloud Account: Ahana Console (Control Plane)
    • Cluster orchestration
    • Consolidated logging
    • Security & access
    • Billing & support
    Ahana console oversees and manages every Presto cluster
    Customer Cloud Account: In-VPC Presto Clusters (Compute Plane)
    • Ad hoc cluster 1
    • Test cluster 2
    • Prod cluster N
    • Glue, S3, RDS, Elasticsearch
    In-VPC orchestration of Presto clusters, where metadata, monitoring, and data sources reside

  21. Ahana Cloud – Reference Architecture
    • Distributed SQL engine with proven scalability
    • Interactive ANSI SQL queries
    • Query data where it lives with Federated Connectors (no ETL)
    • High concurrency
    • Separation of compute and storage
    • Cost Management Features

  22. Getting Your Hands Dirty

  23. Wrapping Up
    Conclusion and Things to Try

  24. Things to try later...
    1. Scale up and scale down your cluster
    2. Check your cluster’s PrestoDB Console when running SQL
    3. Try queries with more/fewer workers; how does performance change?
    4. Try partitioned datasets
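
To compare performance across worker counts, it helps to time the same query several times at each cluster size and compare medians rather than single runs. The Python sketch below is one possible harness, not part of the lab material: `time_query` and `fake_runner` are hypothetical names, the table name is made up, and the stand-in runner only sleeps so the example executes without a cluster.

```python
import statistics
import time
from typing import Callable, List


def time_query(run_query: Callable[[str], object], sql: str, repeats: int = 3) -> float:
    """Return the median wall-clock seconds over `repeats` runs of `sql`."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query(sql)  # should submit the query and fetch every row
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def fake_runner(sql: str) -> list:
    """Stand-in for a real client call; replace with code that queries Presto."""
    time.sleep(0.01)
    return []


seconds = time_query(fake_runner, "SELECT count(*) FROM hive.lab.trips", repeats=3)
print(f"median: {seconds:.3f}s")
```

Run the harness once per cluster size and keep the query text identical between runs, so that the worker count is the only variable that changes.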

  25. Conclusion
    In this hands-on workshop you have learned:
    1. About Presto and Ahana Cloud
    2. How to effortlessly create and manage Presto clusters
    3. How to run fast federated SQL queries combining datasets from S3 and MySQL
    4. How to run Presto queries via Python
    5. How to create a simple BI dashboard using Superset
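
As a reminder of what a federated query looks like from Python, the sketch below builds a join between a table in an S3-backed catalog and a table in a MySQL catalog, and includes a small helper that runs SQL on any DB-API 2.0 connection. The catalog, schema, table, and column names are hypothetical and do not come from the lab, and the connection call shown in the comment is an assumption about the presto-python-client package that should be checked against its documentation.

```python
def federated_join_sql(s3_table: str, mysql_table: str, key: str) -> str:
    """Build a join across two catalogs, counting S3 rows per MySQL key."""
    return (
        f"SELECT c.{key}, count(*) AS order_count\n"
        f"FROM {s3_table} AS o\n"
        f"JOIN {mysql_table} AS c ON o.{key} = c.{key}\n"
        f"GROUP BY c.{key}"
    )


def run_query(connection, sql: str) -> list:
    """Execute `sql` on a DB-API 2.0 connection and return all rows."""
    cursor = connection.cursor()
    cursor.execute(sql)
    return cursor.fetchall()


# Hypothetical names: "hive" is an S3-backed catalog, "mysql" an RDS/MySQL one.
sql = federated_join_sql("hive.lab.orders", "mysql.crm.customers", "customer_id")
print(sql)

# Assumption, not executed here: with the presto-python-client package a
# connection is typically opened along these lines, then passed to run_query.
#   import prestodb
#   conn = prestodb.dbapi.connect(host="<cluster endpoint>", port=443, user="<user>")
#   rows = run_query(conn, sql)
```

Presto resolves each table through its own connector, so the join needs no special syntax beyond the catalog prefix on each table name.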

  26. Next Steps for You...
    • Ahana Cloud is available on the AWS Marketplace
    • Sign up for a 14-day free trial here: https://ahana.io/sign-up

  27. How to get involved with Presto
    Join the Slack channel! prestodb.slack.com
    Write a blog for prestodb.io! prestodb.io/blog
    Join the virtual meetup group & present! meetup.com/prestodb
    Contribute to the project! github.com/prestodb

  28. Questions?

  29. Thank you!
    Stay Up-to-Date with Ahana
    Website: https://ahana.io/
    Blogs: https://ahana.io/blog/
    Twitter: @ahanaio