Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Presto 101: An introduction to Presto, the open source distributed SQL query engine

September 02, 2021

Presto 101: An introduction to Presto, the open source distributed SQL query engine

In this session, Dipti will introduce the Presto technology and share why it’s becoming so popular – in fact, companies like Facebook, Uber, Twitter, Alibaba, and much more use Presto for interactive ad hoc queries, reporting & dashboarding data lake analytics, and much more. We’ll also show a quick demo on getting Presto running in AWS.


September 02, 2021

More Decks by Ahana

Other Decks in Technology


  1. Dipti Borkar Co-Founder, CPO, Chief Evangelist | Ahana Chair |

    Presto Foundation | Community Presto 101 : An introduction to Presto, an open source distributed SQL engine
  2. 2 Agenda • What is Presto? • History of federation

    • Introduction to Presto • What makes Presto different? • Scalable architecture • Flexible Connectors • Performance • The life of a query
  3. 3 Technology Cycles Rhyme: Data Federation Cloud & Big Data

    Composite Software (founded 2001) Garlic Paper by Laura Haas (2002) DB2 Federated Google File System Paper (2003) MapReduce paper (2006) Spark Paper (2010) Too many Data Sources, No one uber schema 80’s 90’s 2000’s 2010’s 2020’s FDBMS Challenges RDBMS FDBMS Paper by McCleod / Heimbigner (1985) FDBMS Paper by Sheth / Larson (1990) OLTP to DLW Wins Data Warehouse becomes the source of truth Star schema becomes sacred SQL Federation Makes Comeback Dremel Paper (2010) Drill paper (2012) SQL ++ paper (2014) Couchbase SQL++ engine (2018) Presto paper (2019), PartiQL (2019) New Cloud DW w/ Data Lakes Based on SQL Self Service Platforms which enable Self-Service Analytics
  4. 4 Presto aka prestoDB : The de facto engine for

    data platform teams Business Needs Data-driven decision making Businesses need more data to iterate over Technology Trends Disaggregation of Storage and Compute The rise of data lakes
  5. 5 What is Presto? • Distributed SQL query engine •

    ANSI SQL on Databases, Data lakes • Designed to be interactive • Access to petabytes of data • Opensource, hosted on github • https://github.com/prestodb
  6. 8 Common Questions • Is Presto a database? • How

    is it related to Hadoop? • How is it different from a data warehouse?
  7. 9 Sample Presto deployment stack & use cases • Ad

    hoc • BI tools • Dashboard • A/B testing • ETL/scheduled job • Online service Client: Zeppelin, Tableau, JDBC, DBVis Front-end Spark SQL Hive Presto Analytics SQL Tools Hive Metadata Store Dataset Discovery GCS AWS S3 HDFS Data Storage
  8. 11 Scalable Architecture • Two roles - coordinator and worker

    • Easy scale up and scale down • Scale up to 1000 workers • Validated at web scaled companies New Worker New Worker Worker Worker Worker Coordinator Data Source Presto Cluster
  9. 12 Scalable Architecture Parser/analyzer Worker Worker Worker Metadata API Planner

    Scheduler Data Location API Data Shuffle Data Shuffle Presto Connector Presto Coordinator BI Tools/Notebooks/Clients Presto CLI Looker JDBC Superset ... Tableau Jupyter Result Sets SQL Any Database, Data Stream, or Storage HDFS Object Stores (S3) MySQL ElasticSearch Kafka ... Presto Connector Presto Connector
  10. 14 Presto Connector Data Model • Connector: Driver for a

    data source. • Example: HDFS, AWS S3, Cassandra, MySQL, SQL Server, Kafka • Catalog: Contains schemas from a data source specified by the connector • Schemas: Namespace to organize tables. • Tables: Set of unordered rows organized into columns with types.
  11. 15 Presto Hive Connector for Object stores & Files systems

    HDFS/S3/GCS Worker Worker Coordinator Hive Metastore Thrift API Split Split
  12. 16 Presto Hive Connector – Access Control HDFS GCS Presto

    has to be ‘proxy user’ which means it will read/write the data on behalf of you. The storage will handle the access control Presto Presto Auth Impersonation OAuth Token OAuth Token
  13. 17 Presto Hive Connector – Data File Types • Supported

    File Types • ORC • Parquet • Avro • RCFile • No data ingestion needed • SequenceFile • JSON • Text
  14. 18 Presto Druid Connector for real-time analytics Druid HDFS Presto

    Presto SQL SQL SQL 2. Read Files 1. Get Segments
  15. 20 The Life of a Query – Simple Scan Output

    Filter [discount = 0] Scan [Orders] SELECT * FROM orders WHERE discount = 0
  16. 21 The Life of a Query – Join and Aggregation

    SELECT orders.orderkey, SUM(tax) FROM orders LEFT JOIN lineitem ON orders.orderkey = lineitem.orderkey WHERE discount = 0 GROUP BY orders.orderkey This example is from Presto: SQL on Everything https://research.fb.com/publications/ presto-sql-on-everything/
  17. 23 Parser Planner Optimizer Scheduler Coordinator Hive Metastore S3 (ORC,

    Parquet, files.. Execution Hive Connector Worker Execution Hive Connector Worker Execution Hive Connector Worker SerializedPage Pages AST Logical Plan TaskUpdates (has Splits) Query Execution Hive Connector Worker SerializedPage SerializedPage SerializedPage Deeper Look
  18. 25 Query Creation @POST @Path("/v1/statement") @Produces(APPLICATION_JSON) public Response postStatement( String

    statement, @HeaderParam(X_FORWARDED_PROTO) String xForwardedProto, @Context HttpServletRequest servletRequest, @Context UriInfo uriInfo) { … SessionContext sessionContext = new HttpRequestSessionContext(servletRequest, sqlParserOptions); Query query = new Query(statement, sessionContext, dispatchManager, queryResultsProvider, 0); queries.put(query.getQueryId(), query); … } @GET @Path("/v1/statement/queued/{queryId}/{token}") @Produces(APPLICATION_JSON) public void getStatus( @PathParam("queryId") QueryId queryId, @PathParam("token") long token, @QueryParam("slug") String slug, @QueryParam("maxWait") Duration maxWait, @HeaderParam(X_FORWARDED_PROTO) String xForwardedProto, @Context UriInfo uriInfo, @Suspended AsyncResponse asyncResponse) { Query query = getQuery(queryId, slug); // wait for query to be dispatched, up to the wait timeout ListenableFuture<?> futureStateChange = addTimeout( query.waitForDispatched(), // DispatchManager creates DispatchQuery and start waiting for resources …); // when state changes, fetch the next result ListenableFuture<Response> queryResultsFuture = transformAsync(…); bindAsyncResponse(asyncResponse, queryResultsFuture, responseExecutor); }
  19. 26 Parsing • DispatchManager invokes the Parser • Grammer defined

    in com/facebook/presto/sql/parser/SqlBase.g4 • ANTLR generates • SqlBaseLexer • SqlBaseParser • SqlBaseVisitor • SqlBaseBaseVisitor • etc • Parser calls AstBuilder(parsingOptions).visit(tree); • Parsed objects (SELECT, WHERE, c1, etc.) 🡪 Node (Select, ComparisonExpression, SingleColumn, etc.) • Returns Query (Multiple trees of Node) QueryPreparer SqlParser Lexer Parser AstBuilder CommonTokenStre am ParserRuleContext tree Query String sql PreparedQuer y
  20. 27 Planning SELECT SUM(extendedprice), suppkey FROM lineitem WHERE shipdate <

    DATE '1992-03-01' GROUP BY suppkey OutputNode ProjectNode AggregationNode ProjectNode ProjectNode FilterNode TableScanNode ProjectNode ProjectNode ProjectNode All 17 columns ("shipdate" < date '1992-03-01') suppkey, extendedprice suppkey, extendedprice SqlQueryExecution Analyzer LogicalPlanner Analysis & Statemen t PlanNode PreparedQuer y suppkey, sum_9 suppkey_3, sum_2 suppkey_5, sum_4 expr_6, expr expr_8, expr_7 All 17 columns
  21. 28 Optimizing PlanOptimizer PlanNode Tree Hive Metastore Optimized PlanNode Tree

    OutputNode ProjectNode ExchangeNode ProjectNode AggregationNode ProjectNode FilterNode AggregationNode ExchangeNode ExchangeNode ("shipdate" < date '1992-03-01') suppkey, sum_9 suppkey, sum_9 TableScanNode suppkey, extendedprice, shipdate suppkey, extendedprice suppkey, sum_9 suppkey, sum_9, hash(suppkey) suppkey, sum suppkey, sum suppkey, sum suppkey, sum_9, hashvalue suppkey, extendedprice, shipdate apply rule PlanNode Tree OutputNode ProjectNode AggregationNode ProjectNode ProjectNode FilterNode TableScanNode ProjectNode ProjectNode ProjectNode All 17 columns ("shipdate" < date '1992-03-01') suppkey, extendedprice suppkey, extendedprice suppkey, sum_9 suppkey_3, sum_2 suppkey_5, sum_4 expr_6, expr expr_8, expr_7 All 17 columns Group 7 Group 6 Group 8 Group 9 Group 0 Group 1 Group 2 Group 3 Group 4 Group 5 Rules • IterativeOptimizer • ProjectExpressionRewrite • AddExchanges • PruneUnreferencedOutputs • ApplyConnectorOptimization • HiveFilterPushdown • PushdownSubfields • … Memo LogicalPlann er
  22. 29 Scheduling: Node Selection • Suppose • 1 coordinator •

    4 workers: W1, W2, W3, W4 • exchange.client-threads (Repartition fanout): 2 • We have 3 PlanFragments • Fragment 2 : all 4 workers • Fragment 1 : 2 random workers • Fragment 0: On coordinator Only
  23. 30 Execution • Threads • task.max-worker-threads • Number of worker

    threads a task will ever use, e.g. 100. • Driver • Bound on to a CPU thread • 1 second slot run time • Yield for other drivers (possibly from other tasks from other queries) for fair sharing • No prioritized thread scheduling? • Query with more fragments/stages would have higher priority FilterAndProjectOp ExchangeOp TaskOutputOp HashAggregationOp( F) LocalExchangeSinkO p LocalExchangeSourc eOp Drive r Drive r Task 1 HashAggregationOp( P) ScanFilterAndProject FilterAndProjectOp Task 2 PartitionedOutputOp Driver ExchangeOp Task 0 TaskOutputOp Driver
  24. 33

  25. 34 Managing Presto Remains Complex Hadoop complexity ▪ /etc/presto/config.properties ▪

    /etc/presto/node.properties ▪ /etc/presto/jvm.config Many hidden parameters – difficult to tune Just the query engine ▪ No built-in catalog – users need to manage Hive metastore or AWS Glue ▪ No datalake S3 integration Poor out-of-box perf ▪ No tuning ▪ No high-performance indexing ▪ Basic optimizations for even for common queries
  26. 35 How Ahana Cloud works? ~ 30 mins to create

    the compute plane https://app.ahana.cloud/signup Create Presto Clusters in your account
  27. 36 Ahana Cloud – Reference Architecture • Distributed SQL engine

    with proven scalability • Interactive ANSI SQL queries • Query data where it lives with Federated Connectors (no ETL) • High concurrency • Separation of compute and storage