Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Presto 101: An introduction to Presto, the open source distributed SQL query engine

00101a1274d1f92977f4e442ef73be86?s=47 Ahana
September 02, 2021

Presto 101: An introduction to Presto, the open source distributed SQL query engine

In this session, Dipti will introduce the Presto technology and share why it’s becoming so popular – in fact, companies like Facebook, Uber, Twitter, Alibaba, and much more use Presto for interactive ad hoc queries, reporting & dashboarding data lake analytics, and much more. We’ll also show a quick demo on getting Presto running in AWS.



September 02, 2021


  1. Dipti Borkar Co-Founder, CPO, Chief Evangelist | Ahana Chair |

    Presto Foundation | Community Presto 101 : An introduction to Presto, an open source distributed SQL engine
  2. 2 Agenda • What is Presto? • History of federation

    • Introduction to Presto • What makes Presto different? • Scalable architecture • Flexible Connectors • Performance • The life of a query
  3. 3 Technology Cycles Rhyme: Data Federation Cloud & Big Data

    Composite Software (founded 2001) Garlic Paper by Laura Haas (2002) DB2 Federated Google File System Paper (2003) MapReduce paper (2006) Spark Paper (2010) Too many Data Sources, No one uber schema 80’s 90’s 2000’s 2010’s 2020’s FDBMS Challenges RDBMS FDBMS Paper by McCleod / Heimbigner (1985) FDBMS Paper by Sheth / Larson (1990) OLTP to DLW Wins Data Warehouse becomes the source of truth Star schema becomes sacred SQL Federation Makes Comeback Dremel Paper (2010) Drill paper (2012) SQL ++ paper (2014) Couchbase SQL++ engine (2018) Presto paper (2019), PartiQL (2019) New Cloud DW w/ Data Lakes Based on SQL Self Service Platforms which enable Self-Service Analytics
  4. 4 Presto aka prestoDB : The de facto engine for

    data platform teams Business Needs Data-driven decision making Businesses need more data to iterate over Technology Trends Disaggregation of Storage and Compute The rise of data lakes
  5. 5 What is Presto? • Distributed SQL query engine •

    ANSI SQL on Databases, Data lakes • Designed to be interactive • Access to petabytes of data • Opensource, hosted on github • https://github.com/prestodb
  6. 6 Presto Foundation: Community Driven

  7. 7 Presto Overview Presto Cluster Coordinator Worker Worker Worker Worker

  8. 8 Common Questions • Is Presto a database? • How

    is it related to Hadoop? • How is it different from a data warehouse?
  9. 9 Sample Presto deployment stack & use cases • Ad

    hoc • BI tools • Dashboard • A/B testing • ETL/scheduled job • Online service Client: Zeppelin, Tableau, JDBC, DBVis Front-end Spark SQL Hive Presto Analytics SQL Tools Hive Metadata Store Dataset Discovery GCS AWS S3 HDFS Data Storage
  10. 10 What makes Presto different? Scalable Architecture Pluggable Connectors Performance

  11. 11 Scalable Architecture • Two roles - coordinator and worker

    • Easy scale up and scale down • Scale up to 1000 workers • Validated at web scaled companies New Worker New Worker Worker Worker Worker Coordinator Data Source Presto Cluster
  12. 12 Scalable Architecture Parser/analyzer Worker Worker Worker Metadata API Planner

    Scheduler Data Location API Data Shuffle Data Shuffle Presto Connector Presto Coordinator BI Tools/Notebooks/Clients Presto CLI Looker JDBC Superset ... Tableau Jupyter Result Sets SQL Any Database, Data Stream, or Storage HDFS Object Stores (S3) MySQL ElasticSearch Kafka ... Presto Connector Presto Connector
  13. 13 Pluggable Presto Connectors

  14. 14 Presto Connector Data Model • Connector: Driver for a

    data source. • Example: HDFS, AWS S3, Cassandra, MySQL, SQL Server, Kafka • Catalog: Contains schemas from a data source specified by the connector • Schemas: Namespace to organize tables. • Tables: Set of unordered rows organized into columns with types.
  15. 15 Presto Hive Connector for Object stores & Files systems

    HDFS/S3/GCS Worker Worker Coordinator Hive Metastore Thrift API Split Split
  16. 16 Presto Hive Connector – Access Control HDFS GCS Presto

    has to be ‘proxy user’ which means it will read/write the data on behalf of you. The storage will handle the access control Presto Presto Auth Impersonation OAuth Token OAuth Token
  17. 17 Presto Hive Connector – Data File Types • Supported

    File Types • ORC • Parquet • Avro • RCFile • No data ingestion needed • SequenceFile • JSON • Text
  18. 18 Presto Druid Connector for real-time analytics Druid HDFS Presto

    Presto SQL SQL SQL 2. Read Files 1. Get Segments
  19. 19 Why Presto is Fast In-Memory Processing Pull Model Columnar

    storage & execution
  20. 20 The Life of a Query – Simple Scan Output

    Filter [discount = 0] Scan [Orders] SELECT * FROM orders WHERE discount = 0
  21. 21 The Life of a Query – Join and Aggregation

    SELECT orders.orderkey, SUM(tax) FROM orders LEFT JOIN lineitem ON orders.orderkey = lineitem.orderkey WHERE discount = 0 GROUP BY orders.orderkey This example is from Presto: SQL on Everything https://research.fb.com/publications/ presto-sql-on-everything/
  22. 22 Logical Plan Aggregate [SUM(tax)] LeftJoin [ON orderkey] Filter [discount=0]

    Scan [lineitem] Scan [orders]
  23. 23 Parser Planner Optimizer Scheduler Coordinator Hive Metastore S3 (ORC,

    Parquet, files.. Execution Hive Connector Worker Execution Hive Connector Worker Execution Hive Connector Worker SerializedPage Pages AST Logical Plan TaskUpdates (has Splits) Query Execution Hive Connector Worker SerializedPage SerializedPage SerializedPage Deeper Look
  24. 24 Another Query SELECT SUM(extendedprice), suppkey FROM lineitem WHERE shipdate

    < DATE '1992-03-01' GROUP BY suppkey
  25. 25 Query Creation @POST @Path("/v1/statement") @Produces(APPLICATION_JSON) public Response postStatement( String

    statement, @HeaderParam(X_FORWARDED_PROTO) String xForwardedProto, @Context HttpServletRequest servletRequest, @Context UriInfo uriInfo) { … SessionContext sessionContext = new HttpRequestSessionContext(servletRequest, sqlParserOptions); Query query = new Query(statement, sessionContext, dispatchManager, queryResultsProvider, 0); queries.put(query.getQueryId(), query); … } @GET @Path("/v1/statement/queued/{queryId}/{token}") @Produces(APPLICATION_JSON) public void getStatus( @PathParam("queryId") QueryId queryId, @PathParam("token") long token, @QueryParam("slug") String slug, @QueryParam("maxWait") Duration maxWait, @HeaderParam(X_FORWARDED_PROTO) String xForwardedProto, @Context UriInfo uriInfo, @Suspended AsyncResponse asyncResponse) { Query query = getQuery(queryId, slug); // wait for query to be dispatched, up to the wait timeout ListenableFuture<?> futureStateChange = addTimeout( query.waitForDispatched(), // DispatchManager creates DispatchQuery and start waiting for resources …); // when state changes, fetch the next result ListenableFuture<Response> queryResultsFuture = transformAsync(…); bindAsyncResponse(asyncResponse, queryResultsFuture, responseExecutor); }
  26. 26 Parsing • DispatchManager invokes the Parser • Grammer defined

    in com/facebook/presto/sql/parser/SqlBase.g4 • ANTLR generates • SqlBaseLexer • SqlBaseParser • SqlBaseVisitor • SqlBaseBaseVisitor • etc • Parser calls AstBuilder(parsingOptions).visit(tree); • Parsed objects (SELECT, WHERE, c1, etc.) 🡪 Node (Select, ComparisonExpression, SingleColumn, etc.) • Returns Query (Multiple trees of Node) QueryPreparer SqlParser Lexer Parser AstBuilder CommonTokenStre am ParserRuleContext tree Query String sql PreparedQuer y
  27. 27 Planning SELECT SUM(extendedprice), suppkey FROM lineitem WHERE shipdate <

    DATE '1992-03-01' GROUP BY suppkey OutputNode ProjectNode AggregationNode ProjectNode ProjectNode FilterNode TableScanNode ProjectNode ProjectNode ProjectNode All 17 columns ("shipdate" < date '1992-03-01') suppkey, extendedprice suppkey, extendedprice SqlQueryExecution Analyzer LogicalPlanner Analysis & Statemen t PlanNode PreparedQuer y suppkey, sum_9 suppkey_3, sum_2 suppkey_5, sum_4 expr_6, expr expr_8, expr_7 All 17 columns
  28. 28 Optimizing PlanOptimizer PlanNode Tree Hive Metastore Optimized PlanNode Tree

    OutputNode ProjectNode ExchangeNode ProjectNode AggregationNode ProjectNode FilterNode AggregationNode ExchangeNode ExchangeNode ("shipdate" < date '1992-03-01') suppkey, sum_9 suppkey, sum_9 TableScanNode suppkey, extendedprice, shipdate suppkey, extendedprice suppkey, sum_9 suppkey, sum_9, hash(suppkey) suppkey, sum suppkey, sum suppkey, sum suppkey, sum_9, hashvalue suppkey, extendedprice, shipdate apply rule PlanNode Tree OutputNode ProjectNode AggregationNode ProjectNode ProjectNode FilterNode TableScanNode ProjectNode ProjectNode ProjectNode All 17 columns ("shipdate" < date '1992-03-01') suppkey, extendedprice suppkey, extendedprice suppkey, sum_9 suppkey_3, sum_2 suppkey_5, sum_4 expr_6, expr expr_8, expr_7 All 17 columns Group 7 Group 6 Group 8 Group 9 Group 0 Group 1 Group 2 Group 3 Group 4 Group 5 Rules • IterativeOptimizer • ProjectExpressionRewrite • AddExchanges • PruneUnreferencedOutputs • ApplyConnectorOptimization • HiveFilterPushdown • PushdownSubfields • … Memo LogicalPlann er
  29. 29 Scheduling: Node Selection • Suppose • 1 coordinator •

    4 workers: W1, W2, W3, W4 • exchange.client-threads (Repartition fanout): 2 • We have 3 PlanFragments • Fragment 2 : all 4 workers • Fragment 1 : 2 random workers • Fragment 0: On coordinator Only
  30. 30 Execution • Threads • task.max-worker-threads • Number of worker

    threads a task will ever use, e.g. 100. • Driver • Bound on to a CPU thread • 1 second slot run time • Yield for other drivers (possibly from other tasks from other queries) for fair sharing • No prioritized thread scheduling? • Query with more fragments/stages would have higher priority FilterAndProjectOp ExchangeOp TaskOutputOp HashAggregationOp( F) LocalExchangeSinkO p LocalExchangeSourc eOp Drive r Drive r Task 1 HashAggregationOp( P) ScanFilterAndProject FilterAndProjectOp Task 2 PartitionedOutputOp Driver ExchangeOp Task 0 TaskOutputOp Driver
  31. 31 Limitations & Updates Memory Limitation (some spill-over support) Fault

    Tolerance Single Coordinator
  32. Ahana Overview

  33. 33

  34. 34 Managing Presto Remains Complex Hadoop complexity ▪ /etc/presto/config.properties ▪

    /etc/presto/node.properties ▪ /etc/presto/jvm.config Many hidden parameters – difficult to tune Just the query engine ▪ No built-in catalog – users need to manage Hive metastore or AWS Glue ▪ No datalake S3 integration Poor out-of-box perf ▪ No tuning ▪ No high-performance indexing ▪ Basic optimizations for even for common queries
  35. 35 How Ahana Cloud works? ~ 30 mins to create

    the compute plane https://app.ahana.cloud/signup Create Presto Clusters in your account
  36. 36 Ahana Cloud – Reference Architecture • Distributed SQL engine

    with proven scalability • Interactive ANSI SQL queries • Query data where it lives with Federated Connectors (no ETL) • High concurrency • Separation of compute and storage
  37. Questions