Presto 101: An introduction to Presto, the open source distributed SQL query engine

Slide 1

Slide 1 text

Dipti Borkar Co-Founder, CPO, Chief Evangelist | Ahana Chair | Presto Foundation | Community Presto 101 : An introduction to Presto, an open source distributed SQL engine

Slide 2

Slide 2 text

2 Agenda • What is Presto? • History of federation • Introduction to Presto • What makes Presto different? • Scalable architecture • Flexible Connectors • Performance • The life of a query

Slide 3

Slide 3 text

3 Technology Cycles Rhyme: Data Federation Cloud & Big Data Composite Software (founded 2001) Garlic Paper by Laura Haas (2002) DB2 Federated Google File System Paper (2003) MapReduce paper (2006) Spark Paper (2010) Too many Data Sources, No one uber schema 80’s 90’s 2000’s 2010’s 2020’s FDBMS Challenges RDBMS FDBMS Paper by McCleod / Heimbigner (1985) FDBMS Paper by Sheth / Larson (1990) OLTP to DLW Wins Data Warehouse becomes the source of truth Star schema becomes sacred SQL Federation Makes Comeback Dremel Paper (2010) Drill paper (2012) SQL ++ paper (2014) Couchbase SQL++ engine (2018) Presto paper (2019), PartiQL (2019) New Cloud DW w/ Data Lakes Based on SQL Self Service Platforms which enable Self-Service Analytics

Slide 4

Slide 4 text

4 Presto aka prestoDB : The de facto engine for data platform teams Business Needs Data-driven decision making Businesses need more data to iterate over Technology Trends Disaggregation of Storage and Compute The rise of data lakes

Slide 5

Slide 5 text

5 What is Presto? • Distributed SQL query engine • ANSI SQL on Databases, Data lakes • Designed to be interactive • Access to petabytes of data • Opensource, hosted on github • https://github.com/prestodb

Slide 6

Slide 6 text

6 Presto Foundation: Community Driven

Slide 7

Slide 7 text

7 Presto Overview Presto Cluster Coordinator Worker Worker Worker Worker

Slide 8

Slide 8 text

8 Common Questions • Is Presto a database? • How is it related to Hadoop? • How is it different from a data warehouse?

Slide 9

Slide 9 text

9 Sample Presto deployment stack & use cases • Ad hoc • BI tools • Dashboard • A/B testing • ETL/scheduled job • Online service Client: Zeppelin, Tableau, JDBC, DBVis Front-end Spark SQL Hive Presto Analytics SQL Tools Hive Metadata Store Dataset Discovery GCS AWS S3 HDFS Data Storage

Slide 10

Slide 10 text

10 What makes Presto different? Scalable Architecture Pluggable Connectors Performance

Slide 11

Slide 11 text

11 Scalable Architecture • Two roles - coordinator and worker • Easy scale up and scale down • Scale up to 1000 workers • Validated at web scaled companies New Worker New Worker Worker Worker Worker Coordinator Data Source Presto Cluster

Slide 12

Slide 12 text

12 Scalable Architecture Parser/analyzer Worker Worker Worker Metadata API Planner Scheduler Data Location API Data Shufﬂe Data Shufﬂe Presto Connector Presto Coordinator BI Tools/Notebooks/Clients Presto CLI Looker JDBC Superset ... Tableau Jupyter Result Sets SQL Any Database, Data Stream, or Storage HDFS Object Stores (S3) MySQL ElasticSearch Kafka ... Presto Connector Presto Connector

Slide 13

Slide 13 text

13 Pluggable Presto Connectors

Slide 14

Slide 14 text

14 Presto Connector Data Model • Connector: Driver for a data source. • Example: HDFS, AWS S3, Cassandra, MySQL, SQL Server, Kafka • Catalog: Contains schemas from a data source speciﬁed by the connector • Schemas: Namespace to organize tables. • Tables: Set of unordered rows organized into columns with types.

Slide 15

Slide 15 text

15 Presto Hive Connector for Object stores & Files systems HDFS/S3/GCS Worker Worker Coordinator Hive Metastore Thrift API Split Split

Slide 16

Slide 16 text

16 Presto Hive Connector – Access Control HDFS GCS Presto has to be ‘proxy user’ which means it will read/write the data on behalf of you. The storage will handle the access control Presto Presto Auth Impersonation OAuth Token OAuth Token

Slide 17

Slide 17 text

17 Presto Hive Connector – Data File Types • Supported File Types • ORC • Parquet • Avro • RCFile • No data ingestion needed • SequenceFile • JSON • Text

Slide 18

Slide 18 text

18 Presto Druid Connector for real-time analytics Druid HDFS Presto Presto SQL SQL SQL 2. Read Files 1. Get Segments

Slide 19

Slide 19 text

19 Why Presto is Fast In-Memory Processing Pull Model Columnar storage & execution

Slide 20

Slide 20 text

20 The Life of a Query – Simple Scan Output Filter [discount = 0] Scan [Orders] SELECT * FROM orders WHERE discount = 0

Slide 21

Slide 21 text

21 The Life of a Query – Join and Aggregation SELECT orders.orderkey, SUM(tax) FROM orders LEFT JOIN lineitem ON orders.orderkey = lineitem.orderkey WHERE discount = 0 GROUP BY orders.orderkey This example is from Presto: SQL on Everything https://research.fb.com/publications/ presto-sql-on-everything/

Slide 22

Slide 22 text

22 Logical Plan Aggregate [SUM(tax)] LeftJoin [ON orderkey] Filter [discount=0] Scan [lineitem] Scan [orders]

Slide 23

Slide 23 text

23 Parser Planner Optimizer Scheduler Coordinator Hive Metastore S3 (ORC, Parquet, files.. Execution Hive Connector Worker Execution Hive Connector Worker Execution Hive Connector Worker SerializedPage Pages AST Logical Plan TaskUpdates (has Splits) Query Execution Hive Connector Worker SerializedPage SerializedPage SerializedPage Deeper Look

Slide 24

Slide 24 text

24 Another Query SELECT SUM(extendedprice), suppkey FROM lineitem WHERE shipdate < DATE '1992-03-01' GROUP BY suppkey

Slide 25

Slide 25 text

25 Query Creation @POST @Path("/v1/statement") @Produces(APPLICATION_JSON) public Response postStatement( String statement, @HeaderParam(X_FORWARDED_PROTO) String xForwardedProto, @Context HttpServletRequest servletRequest, @Context UriInfo uriInfo) { … SessionContext sessionContext = new HttpRequestSessionContext(servletRequest, sqlParserOptions); Query query = new Query(statement, sessionContext, dispatchManager, queryResultsProvider, 0); queries.put(query.getQueryId(), query); … } @GET @Path("/v1/statement/queued/{queryId}/{token}") @Produces(APPLICATION_JSON) public void getStatus( @PathParam("queryId") QueryId queryId, @PathParam("token") long token, @QueryParam("slug") String slug, @QueryParam("maxWait") Duration maxWait, @HeaderParam(X_FORWARDED_PROTO) String xForwardedProto, @Context UriInfo uriInfo, @Suspended AsyncResponse asyncResponse) { Query query = getQuery(queryId, slug); // wait for query to be dispatched, up to the wait timeout ListenableFuture futureStateChange = addTimeout( query.waitForDispatched(), // DispatchManager creates DispatchQuery and start waiting for resources …); // when state changes, fetch the next result ListenableFuture queryResultsFuture = transformAsync(…); bindAsyncResponse(asyncResponse, queryResultsFuture, responseExecutor); }

Slide 26

Slide 26 text

26 Parsing • DispatchManager invokes the Parser • Grammer deﬁned in com/facebook/presto/sql/parser/SqlBase.g4 • ANTLR generates • SqlBaseLexer • SqlBaseParser • SqlBaseVisitor • SqlBaseBaseVisitor • etc • Parser calls AstBuilder(parsingOptions).visit(tree); • Parsed objects (SELECT, WHERE, c1, etc.) 🡪 Node (Select, ComparisonExpression, SingleColumn, etc.) • Returns Query (Multiple trees of Node) QueryPreparer SqlParser Lexer Parser AstBuilder CommonTokenStre am ParserRuleContext tree Query String sql PreparedQuer y

Slide 27

Slide 27 text

27 Planning SELECT SUM(extendedprice), suppkey FROM lineitem WHERE shipdate < DATE '1992-03-01' GROUP BY suppkey OutputNode ProjectNode AggregationNode ProjectNode ProjectNode FilterNode TableScanNode ProjectNode ProjectNode ProjectNode All 17 columns ("shipdate" < date '1992-03-01') suppkey, extendedprice suppkey, extendedprice SqlQueryExecution Analyzer LogicalPlanner Analysis & Statemen t PlanNode PreparedQuer y suppkey, sum_9 suppkey_3, sum_2 suppkey_5, sum_4 expr_6, expr expr_8, expr_7 All 17 columns

Slide 28

Slide 28 text

28 Optimizing PlanOptimizer PlanNode Tree Hive Metastore Optimized PlanNode Tree OutputNode ProjectNode ExchangeNode ProjectNode AggregationNode ProjectNode FilterNode AggregationNode ExchangeNode ExchangeNode ("shipdate" < date '1992-03-01') suppkey, sum_9 suppkey, sum_9 TableScanNode suppkey, extendedprice, shipdate suppkey, extendedprice suppkey, sum_9 suppkey, sum_9, hash(suppkey) suppkey, sum suppkey, sum suppkey, sum suppkey, sum_9, hashvalue suppkey, extendedprice, shipdate apply rule PlanNode Tree OutputNode ProjectNode AggregationNode ProjectNode ProjectNode FilterNode TableScanNode ProjectNode ProjectNode ProjectNode All 17 columns ("shipdate" < date '1992-03-01') suppkey, extendedprice suppkey, extendedprice suppkey, sum_9 suppkey_3, sum_2 suppkey_5, sum_4 expr_6, expr expr_8, expr_7 All 17 columns Group 7 Group 6 Group 8 Group 9 Group 0 Group 1 Group 2 Group 3 Group 4 Group 5 Rules • IterativeOptimizer • ProjectExpressionRewrite • AddExchanges • PruneUnreferencedOutputs • ApplyConnectorOptimization • HiveFilterPushdown • PushdownSubfields • … Memo LogicalPlann er

Slide 29

Slide 29 text

29 Scheduling: Node Selection • Suppose • 1 coordinator • 4 workers: W1, W2, W3, W4 • exchange.client-threads (Repartition fanout): 2 • We have 3 PlanFragments • Fragment 2 : all 4 workers • Fragment 1 : 2 random workers • Fragment 0: On coordinator Only

Slide 30

Slide 30 text

30 Execution • Threads • task.max-worker-threads • Number of worker threads a task will ever use, e.g. 100. • Driver • Bound on to a CPU thread • 1 second slot run time • Yield for other drivers (possibly from other tasks from other queries) for fair sharing • No prioritized thread scheduling? • Query with more fragments/stages would have higher priority FilterAndProjectOp ExchangeOp TaskOutputOp HashAggregationOp( F) LocalExchangeSinkO p LocalExchangeSourc eOp Drive r Drive r Task 1 HashAggregationOp( P) ScanFilterAndProject FilterAndProjectOp Task 2 PartitionedOutputOp Driver ExchangeOp Task 0 TaskOutputOp Driver

Slide 31

Slide 31 text

31 Limitations & Updates Memory Limitation (some spill-over support) Fault Tolerance Single Coordinator

Slide 32

Slide 32 text

Ahana Overview

Slide 33

Slide 33 text

Slide 34

Slide 34 text

34 Managing Presto Remains Complex Hadoop complexity ▪ /etc/presto/config.properties ▪ /etc/presto/node.properties ▪ /etc/presto/jvm.config Many hidden parameters – difficult to tune Just the query engine ▪ No built-in catalog – users need to manage Hive metastore or AWS Glue ▪ No datalake S3 integration Poor out-of-box perf ▪ No tuning ▪ No high-performance indexing ▪ Basic optimizations for even for common queries

Slide 35

Slide 35 text

35 How Ahana Cloud works? ~ 30 mins to create the compute plane https://app.ahana.cloud/signup Create Presto Clusters in your account

Slide 36

Slide 36 text

36 Ahana Cloud – Reference Architecture • Distributed SQL engine with proven scalability • Interactive ANSI SQL queries • Query data where it lives with Federated Connectors (no ETL) • High concurrency • Separation of compute and storage

Slide 37

Slide 37 text

Questions