Level 101 for Presto: What is PrestoDB?

Level 101 for Presto: SQL on Everything
Part 1 of the Tech Talk Series for Presto
What is PrestoDB? What’s the difference?
Beinan Wang, Sr. Software Engineer, Twitter
Dipti Borkar, Co-Founder & CPO, Ahana

Presto 101 Outline
• What is Presto?
• How are we using Presto?
• What made Presto different?
  ◦ Scalable architecture
  ◦ Flexible Connectors
  ◦ Performance
• The life of a query

What is Presto?
• Distributed SQL query engine
  ◦ ANSI SQL on Hadoop, Kafka, Druid, etc.
  ◦ Designed to be interactive
  ◦ Access to petabytes of data
• Open source, hosted on GitHub
  ◦ https://github.com/prestodb
• Open question:
  ◦ Is Presto a database?

How are we using Presto?
• Ad hoc queries
• BI tools
• Dashboards
• A/B testing
• ETL / scheduled jobs
• Online services*

What made Presto different?
• Scalable architecture
• Pluggable Connectors
• Performance

Scalable architecture
• Two roles: coordinator and worker
• Easy to scale up and down
  ◦ Scales up to 1000 workers*
  ◦ Fits web-scale companies

Pluggable Presto Connectors

Presto Connector Data Model
• Connector: Driver for a data source.
  ◦ Examples: HDFS, Cassandra, Kafka, SQL Server
• Catalog: Contains the schemas from a data source specified by the connector.
• Schema: Namespace to organize tables.
• Table: Set of unordered rows organized into columns with types.
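
The catalog / schema / table hierarchy above is what a fully qualified Presto table name of the form catalog.schema.table spells out. The following is a minimal sketch of that decomposition. The helper `parse_table_name`, the default catalog and schema, and the example names are assumptions made for illustration; none of them come from the deck or from a Presto client library.

```python
from typing import NamedTuple


class QualifiedName(NamedTuple):
    catalog: str  # a configured data source, backed by one connector
    schema: str   # namespace inside the catalog
    table: str    # table inside the schema


def parse_table_name(name: str,
                     default_catalog: str = "hive",
                     default_schema: str = "default") -> QualifiedName:
    """Split 'catalog.schema.table', filling in session defaults for missing parts.

    Hypothetical helper: quoted identifiers that contain dots are not handled.
    """
    parts = name.split(".")
    if not 1 <= len(parts) <= 3 or not all(parts):
        raise ValueError(f"not a valid table name: {name!r}")
    if len(parts) == 3:
        return QualifiedName(*parts)
    if len(parts) == 2:
        return QualifiedName(default_catalog, *parts)
    return QualifiedName(default_catalog, default_schema, parts[0])
```

For example, `parse_table_name("hive.web.clicks")` yields catalog `hive`, schema `web`, and table `clicks`, while a bare `orders` falls back to the assumed session defaults.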

Presto Hive Connector

Presto Hive Connector -- Access Control

Presto Hive Connector -- Data File Types
• Supported File Types
  ◦ ORC
  ◦ Parquet
  ◦ Avro
  ◦ RCFile
  ◦ SequenceFile
  ◦ JSON
  ◦ Text
• No data ingestion needed

Presto Druid Connector

Why Presto is Fast
• In-memory processing
• Pull model
• Columnar storage and execution
• Bytecode generation
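
Two of the points above, the pull model and columnar execution, can be illustrated with a toy operator pipeline. This is only a sketch of the general idea, not Presto's operator code: the operator names, the `Page` type, and the sample values are all made up for illustration. Each operator is a generator, so data moves only when the downstream operator asks for the next page, and each page stores its rows column by column.

```python
from typing import Dict, Iterator, List

# One page is a batch of rows stored column by column (illustrative type).
Page = Dict[str, List[float]]


def scan(pages: List[Page]) -> Iterator[Page]:
    """Source operator: hands out a page only when downstream asks for one."""
    for page in pages:
        yield page


def filter_zero_discount(source: Iterator[Page]) -> Iterator[Page]:
    """Keeps the rows whose discount is 0, working on whole columns."""
    for page in source:
        keep = [i for i, d in enumerate(page["discount"]) if d == 0]
        yield {name: [col[i] for i in keep] for name, col in page.items()}


def sum_tax(source: Iterator[Page]) -> float:
    """Sink operator: pulls pages from upstream until it is exhausted."""
    return sum(sum(page["tax"]) for page in source)


# Made-up sample data, two pages of a hypothetical lineitem table.
pages = [
    {"discount": [0, 0.1, 0], "tax": [1.0, 2.0, 3.0]},
    {"discount": [0.05, 0], "tax": [4.0, 5.0]},
]
total = sum_tax(filter_zero_discount(scan(pages)))
```

Nothing is read from `pages` until `sum_tax` starts iterating, which is the sense in which the sink pulls data through the pipeline.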

The Life of a Query -- Simple Scan
SELECT *
FROM orders
WHERE discount = 0
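
As a rough illustration of how a scan like this can be spread over workers, the sketch below cuts a table into splits, lets a pool of threads stand in for worker nodes, and has a coordinator-like function gather the filtered rows. The function names, the split size, and the sample rows are assumptions made for this example; the real scheduling in Presto is considerably more involved.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

Row = Dict[str, float]


def make_splits(rows: List[Row], split_size: int) -> List[List[Row]]:
    """Cut the table into fixed-size chunks that can be scanned independently."""
    return [rows[i:i + split_size] for i in range(0, len(rows), split_size)]


def scan_split(split: List[Row]) -> List[Row]:
    """Work done by one 'worker': apply WHERE discount = 0 to its split."""
    return [row for row in split if row["discount"] == 0]


def run_simple_scan(rows: List[Row], workers: int = 2,
                    split_size: int = 2) -> List[Row]:
    """Coordinator-like driver: hand out splits, then gather the results."""
    splits = make_splits(rows, split_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the order of the input splits.
        partial_results = list(pool.map(scan_split, splits))
    return [row for part in partial_results for row in part]


# Made-up rows for a hypothetical orders table.
orders = [
    {"orderkey": 1, "discount": 0},
    {"orderkey": 2, "discount": 0.1},
    {"orderkey": 3, "discount": 0},
    {"orderkey": 4, "discount": 0},
    {"orderkey": 5, "discount": 0.2},
]
result = run_simple_scan(orders)
```

Because each split is filtered independently, adding workers only changes how many splits run at once, not the result.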

The Life of a Query -- Join and Aggregation
SELECT orders.orderkey, SUM(tax)
FROM orders
LEFT JOIN lineitem ON orders.orderkey = lineitem.orderkey
WHERE discount = 0
GROUP BY orders.orderkey
This example is from "Presto: SQL on Everything": https://research.fb.com/publications/presto-sql-on-everything/
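
To see what the join-and-aggregation query above returns, it can be run against a tiny in-memory dataset. The sketch below uses Python's built-in sqlite3 module as a stand-in for Presto; the table contents are made up, the column layout is an assumption (tax and discount are taken to live in lineitem), and an ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (orderkey INTEGER);
    CREATE TABLE lineitem (orderkey INTEGER, tax REAL, discount REAL);
    -- Made-up sample data: order 3 has no line items at all.
    INSERT INTO orders VALUES (1), (2), (3);
    INSERT INTO lineitem VALUES
        (1, 0.5, 0.0),
        (1, 0.25, 0.0),
        (2, 1.0, 0.1),
        (2, 2.0, 0.0);
""")

rows = conn.execute("""
    SELECT orders.orderkey, SUM(tax)
    FROM orders
    LEFT JOIN lineitem ON orders.orderkey = lineitem.orderkey
    WHERE discount = 0
    GROUP BY orders.orderkey
    ORDER BY orders.orderkey  -- added for a stable result order
""").fetchall()

for orderkey, total_tax in rows:
    print(orderkey, total_tax)
```

With this data, order 3 does not appear in the result: the LEFT JOIN produces a row for it with a NULL discount, and the WHERE clause then filters that row out.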

Logical Plan -- do NOT join two big tables
This example is from "Presto: SQL on Everything": https://research.fb.com/publications/presto-sql-on-everything/

Limitations
• Memory Limitation
• Fault Tolerance
• Single Point of Failure: Coordinator

Time for a demo!
• Local Setup: Query TPC-DS
• Cloud Setup: Query S3 / Parquet

Docker Sandbox for Presto https://hub.docker.com/r/ahanaio/prestodb-sandbox

AWS Sandbox AMI for Presto https://ahana.io/tutorials/aws-sandbox/

Join the Presto Community
• Request a new feature or file a bug: github.com/prestodb/presto
• Slack: prestodb.slack.com
• Twitter: @prestodb

Stay up-to-date with Ahana
• URL: ahana.io
• Twitter: @ahanaio

Q&A
