Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Presto 101: An introduction to Presto, the open source distributed SQL query engine

Ahana
September 02, 2021

Presto 101: An introduction to Presto, the open source distributed SQL query engine

In this session, Dipti will introduce the Presto technology and share why it’s becoming so popular – in fact, companies like Facebook, Uber, Twitter, Alibaba, and much more use Presto for interactive ad hoc queries, reporting & dashboarding data lake analytics, and much more. We’ll also show a quick demo on getting Presto running in AWS.

Ahana

September 02, 2021
Tweet

More Decks by Ahana

Other Decks in Technology

Transcript

  1. Dipti Borkar
    Co-Founder, CPO, Chief Evangelist | Ahana
    Chair | Presto Foundation | Community
    Presto 101 : An
    introduction
    to Presto, an open
    source distributed
    SQL engine

    View Slide

  2. 2
    Agenda
    • What is Presto?
    • History of federation
    • Introduction to Presto
    • What makes Presto different?
    • Scalable architecture
    • Flexible Connectors
    • Performance
    • The life of a query

    View Slide

  3. 3
    Technology Cycles Rhyme: Data Federation
    Cloud & Big Data
    Composite Software (founded 2001)
    Garlic Paper by Laura Haas (2002) DB2 Federated
    Google File System Paper (2003)
    MapReduce paper (2006) Spark Paper (2010)
    Too many Data Sources, No one uber schema
    80’s 90’s 2000’s 2010’s 2020’s
    FDBMS Challenges RDBMS
    FDBMS Paper by McCleod / Heimbigner (1985)
    FDBMS Paper by Sheth / Larson (1990)
    OLTP to DLW Wins
    Data Warehouse becomes the source of truth
    Star schema becomes sacred
    SQL Federation Makes Comeback
    Dremel Paper (2010) Drill paper (2012)
    SQL ++ paper (2014) Couchbase SQL++ engine (2018)
    Presto paper (2019), PartiQL (2019)
    New Cloud DW
    w/ Data Lakes
    Based on SQL
    Self Service Platforms which
    enable Self-Service Analytics

    View Slide

  4. 4
    Presto aka prestoDB : The de facto engine
    for data platform teams
    Business Needs
    Data-driven decision making
    Businesses need more data to
    iterate over
    Technology Trends
    Disaggregation of Storage and
    Compute
    The rise of data lakes

    View Slide

  5. 5
    What is Presto?
    • Distributed SQL query engine
    • ANSI SQL on Databases, Data lakes
    • Designed to be interactive
    • Access to petabytes of data
    • Opensource, hosted on github
    • https://github.com/prestodb

    View Slide

  6. 6
    Presto Foundation: Community Driven

    View Slide

  7. 7
    Presto Overview
    Presto
    Cluster
    Coordinator Worker Worker Worker Worker

    View Slide

  8. 8
    Common Questions
    • Is Presto a database?
    • How is it related to Hadoop?
    • How is it different from a
    data warehouse?

    View Slide

  9. 9
    Sample Presto deployment stack & use cases
    • Ad hoc
    • BI tools
    • Dashboard
    • A/B testing
    • ETL/scheduled job
    • Online service
    Client: Zeppelin, Tableau, JDBC, DBVis
    Front-end
    Spark SQL
    Hive
    Presto
    Analytics SQL Tools
    Hive Metadata Store
    Dataset Discovery
    GCS
    AWS S3
    HDFS
    Data Storage

    View Slide

  10. 10
    What makes Presto different?
    Scalable
    Architecture
    Pluggable
    Connectors
    Performance

    View Slide

  11. 11
    Scalable Architecture
    • Two roles - coordinator
    and worker
    • Easy scale up and
    scale down
    • Scale up to 1000 workers
    • Validated at web scaled
    companies
    New Worker
    New Worker
    Worker
    Worker
    Worker
    Coordinator
    Data
    Source
    Presto Cluster

    View Slide

  12. 12
    Scalable Architecture
    Parser/analyzer
    Worker
    Worker
    Worker
    Metadata API
    Planner Scheduler
    Data Location API
    Data Shuffle
    Data Shuffle
    Presto
    Connector
    Presto Coordinator
    BI Tools/Notebooks/Clients
    Presto CLI
    Looker
    JDBC
    Superset
    ...
    Tableau
    Jupyter
    Result
    Sets
    SQL
    Any
    Database,
    Data Stream,
    or Storage
    HDFS
    Object Stores (S3)
    MySQL
    ElasticSearch
    Kafka
    ...
    Presto
    Connector
    Presto
    Connector

    View Slide

  13. 13
    Pluggable Presto Connectors

    View Slide

  14. 14
    Presto Connector Data Model
    • Connector: Driver for a data source.
    • Example: HDFS, AWS S3, Cassandra, MySQL, SQL Server, Kafka
    • Catalog: Contains schemas from a data source
    specified by the connector
    • Schemas: Namespace to organize tables.
    • Tables: Set of unordered rows organized into columns
    with types.

    View Slide

  15. 15
    Presto Hive Connector for
    Object stores & Files systems
    HDFS/S3/GCS
    Worker
    Worker
    Coordinator
    Hive Metastore
    Thrift API
    Split
    Split

    View Slide

  16. 16
    Presto Hive Connector – Access Control
    HDFS
    GCS
    Presto has to be ‘proxy user’ which means
    it will read/write the data on behalf of you.
    The storage will handle the access control
    Presto
    Presto
    Auth Impersonation
    OAuth Token OAuth Token

    View Slide

  17. 17
    Presto Hive Connector – Data File Types
    • Supported File Types
    • ORC
    • Parquet
    • Avro
    • RCFile
    • No data ingestion needed
    • SequenceFile
    • JSON
    • Text

    View Slide

  18. 18
    Presto Druid Connector for real-time analytics
    Druid
    HDFS
    Presto
    Presto
    SQL SQL
    SQL 2. Read Files
    1. Get Segments

    View Slide

  19. 19
    Why Presto is Fast
    In-Memory
    Processing
    Pull
    Model
    Columnar
    storage &
    execution

    View Slide

  20. 20
    The Life of a Query – Simple Scan
    Output
    Filter [discount = 0]
    Scan [Orders]
    SELECT *
    FROM orders
    WHERE discount = 0

    View Slide

  21. 21
    The Life of a Query – Join and Aggregation
    SELECT
    orders.orderkey, SUM(tax)
    FROM orders
    LEFT JOIN lineitem
    ON orders.orderkey = lineitem.orderkey
    WHERE discount = 0
    GROUP BY orders.orderkey
    This example is from Presto: SQL on
    Everything
    https://research.fb.com/publications/
    presto-sql-on-everything/

    View Slide

  22. 22
    Logical Plan
    Aggregate [SUM(tax)]
    LeftJoin [ON orderkey]
    Filter [discount=0]
    Scan [lineitem]
    Scan [orders]

    View Slide

  23. 23
    Parser
    Planner
    Optimizer
    Scheduler
    Coordinator
    Hive Metastore
    S3
    (ORC,
    Parquet, files..
    Execution
    Hive
    Connector
    Worker
    Execution
    Hive
    Connector
    Worker
    Execution
    Hive
    Connector
    Worker
    SerializedPage
    Pages
    AST Logical
    Plan
    TaskUpdates (has Splits)
    Query
    Execution
    Hive
    Connector
    Worker
    SerializedPage
    SerializedPage
    SerializedPage
    Deeper Look

    View Slide

  24. 24
    Another Query
    SELECT
    SUM(extendedprice),
    suppkey
    FROM lineitem
    WHERE shipdate < DATE '1992-03-01'
    GROUP BY suppkey

    View Slide

  25. 25
    Query Creation
    @POST
    @Path("/v1/statement")
    @Produces(APPLICATION_JSON)
    public Response postStatement(
    String statement,
    @HeaderParam(X_FORWARDED_PROTO) String
    xForwardedProto,
    @Context HttpServletRequest servletRequest,
    @Context UriInfo uriInfo)
    {

    SessionContext sessionContext = new
    HttpRequestSessionContext(servletRequest, sqlParserOptions);
    Query query = new Query(statement, sessionContext,
    dispatchManager, queryResultsProvider, 0);
    queries.put(query.getQueryId(), query);

    }
    @GET
    @Path("/v1/statement/queued/{queryId}/{token}")
    @Produces(APPLICATION_JSON)
    public void getStatus(
    @PathParam("queryId") QueryId queryId,
    @PathParam("token") long token,
    @QueryParam("slug") String slug,
    @QueryParam("maxWait") Duration maxWait,
    @HeaderParam(X_FORWARDED_PROTO) String xForwardedProto,
    @Context UriInfo uriInfo,
    @Suspended AsyncResponse asyncResponse)
    {
    Query query = getQuery(queryId, slug);
    // wait for query to be dispatched, up to the wait timeout
    ListenableFuture> futureStateChange = addTimeout(
    query.waitForDispatched(), // DispatchManager creates
    DispatchQuery and start waiting for resources
    …);
    // when state changes, fetch the next result
    ListenableFuture queryResultsFuture = transformAsync(…);
    bindAsyncResponse(asyncResponse, queryResultsFuture,
    responseExecutor);
    }

    View Slide

  26. 26
    Parsing
    • DispatchManager invokes the Parser
    • Grammer defined in com/facebook/presto/sql/parser/SqlBase.g4
    • ANTLR generates
    • SqlBaseLexer
    • SqlBaseParser
    • SqlBaseVisitor
    • SqlBaseBaseVisitor
    • etc
    • Parser calls AstBuilder(parsingOptions).visit(tree);
    • Parsed objects (SELECT, WHERE, c1, etc.) 🡪 Node (Select, ComparisonExpression, SingleColumn, etc.)
    • Returns Query (Multiple trees of Node)
    QueryPreparer
    SqlParser
    Lexer Parser AstBuilder
    CommonTokenStre
    am
    ParserRuleContext
    tree Query
    String
    sql
    PreparedQuer
    y

    View Slide

  27. 27
    Planning
    SELECT
    SUM(extendedprice),
    suppkey
    FROM lineitem
    WHERE shipdate < DATE '1992-03-01'
    GROUP BY suppkey
    OutputNode
    ProjectNode
    AggregationNode
    ProjectNode
    ProjectNode
    FilterNode
    TableScanNode
    ProjectNode
    ProjectNode
    ProjectNode
    All 17
    columns
    ("shipdate" < date
    '1992-03-01')
    suppkey,
    extendedprice
    suppkey,
    extendedprice
    SqlQueryExecution
    Analyzer LogicalPlanner
    Analysis
    &
    Statemen
    t
    PlanNode
    PreparedQuer
    y
    suppkey,
    sum_9
    suppkey_3,
    sum_2
    suppkey_5,
    sum_4
    expr_6,
    expr
    expr_8,
    expr_7
    All 17
    columns

    View Slide

  28. 28
    Optimizing
    PlanOptimizer
    PlanNode
    Tree
    Hive
    Metastore
    Optimized PlanNode
    Tree
    OutputNode
    ProjectNode
    ExchangeNode
    ProjectNode
    AggregationNode
    ProjectNode
    FilterNode
    AggregationNode
    ExchangeNode
    ExchangeNode
    ("shipdate" < date
    '1992-03-01')
    suppkey,
    sum_9
    suppkey,
    sum_9
    TableScanNode
    suppkey, extendedprice,
    shipdate
    suppkey,
    extendedprice
    suppkey,
    sum_9
    suppkey, sum_9,
    hash(suppkey)
    suppkey,
    sum
    suppkey,
    sum
    suppkey,
    sum
    suppkey, sum_9,
    hashvalue
    suppkey, extendedprice,
    shipdate
    apply rule
    PlanNode
    Tree
    OutputNode
    ProjectNode
    AggregationNode
    ProjectNode
    ProjectNode
    FilterNode
    TableScanNode
    ProjectNode
    ProjectNode
    ProjectNode
    All 17
    columns
    ("shipdate" < date
    '1992-03-01')
    suppkey,
    extendedprice
    suppkey,
    extendedprice
    suppkey,
    sum_9
    suppkey_3,
    sum_2
    suppkey_5,
    sum_4
    expr_6,
    expr
    expr_8,
    expr_7
    All 17
    columns
    Group
    7
    Group
    6
    Group
    8
    Group
    9
    Group
    0
    Group
    1
    Group
    2
    Group
    3
    Group
    4
    Group
    5
    Rules
    • IterativeOptimizer
    • ProjectExpressionRewrite
    • AddExchanges
    • PruneUnreferencedOutputs
    • ApplyConnectorOptimization
    • HiveFilterPushdown
    • PushdownSubfields
    • …
    Memo
    LogicalPlann
    er

    View Slide

  29. 29
    Scheduling: Node Selection
    • Suppose
    • 1 coordinator
    • 4 workers: W1, W2, W3, W4
    • exchange.client-threads (Repartition fanout): 2
    • We have 3 PlanFragments
    • Fragment 2 : all 4 workers
    • Fragment 1 : 2 random workers
    • Fragment 0: On coordinator Only

    View Slide

  30. 30
    Execution
    • Threads
    • task.max-worker-threads
    • Number of worker threads a task
    will ever use, e.g. 100.
    • Driver
    • Bound on to a CPU thread
    • 1 second slot run time
    • Yield for other drivers (possibly from
    other tasks from other queries) for fair
    sharing
    • No prioritized thread scheduling?
    • Query with more fragments/stages
    would have higher priority
    FilterAndProjectOp
    ExchangeOp
    TaskOutputOp
    HashAggregationOp(
    F)
    LocalExchangeSinkO
    p
    LocalExchangeSourc
    eOp
    Drive
    r
    Drive
    r
    Task
    1
    HashAggregationOp(
    P)
    ScanFilterAndProject
    FilterAndProjectOp
    Task
    2
    PartitionedOutputOp
    Driver
    ExchangeOp
    Task
    0
    TaskOutputOp
    Driver

    View Slide

  31. 31
    Limitations & Updates
    Memory
    Limitation
    (some spill-over support)
    Fault
    Tolerance
    Single
    Coordinator

    View Slide

  32. Ahana Overview

    View Slide

  33. 33

    View Slide

  34. 34
    Managing Presto Remains Complex
    Hadoop complexity
    ▪ /etc/presto/config.properties
    ▪ /etc/presto/node.properties
    ▪ /etc/presto/jvm.config
    Many hidden parameters –
    difficult to tune
    Just the query engine
    ▪ No built-in catalog – users
    need to manage Hive
    metastore or AWS Glue
    ▪ No datalake S3 integration
    Poor out-of-box perf
    ▪ No tuning
    ▪ No high-performance
    indexing
    ▪ Basic optimizations for even
    for common queries

    View Slide

  35. 35
    How Ahana Cloud works?
    ~ 30 mins to create the compute plane
    https://app.ahana.cloud/signup Create Presto Clusters in your account

    View Slide

  36. 36
    Ahana Cloud – Reference Architecture
    • Distributed SQL engine with
    proven scalability
    • Interactive ANSI SQL queries
    • Query data where it lives with
    Federated Connectors (no ETL)
    • High concurrency
    • Separation of compute and
    storage

    View Slide

  37. Questions

    View Slide