Level 101 for Presto: What is PrestoDB?

Ahana
July 28, 2020

Presto is a widely adopted federated SQL engine for querying across multiple data sources. With Presto, you can run ad hoc queries on data in place. For today’s “data hacker”, Presto helps shorten both time to discovery and the time it takes to do ad hoc analysis.

In Level 101, you’ll get an overview of Presto, including:

A high-level overview of Presto and its most common use cases
The problems it solves and why you should use it
A live, hands-on demo of getting Presto running on Docker
Real-world example: How Twitter uses Presto at scale


Transcript

  1. Level 101 for Presto
     SQL on Everything
     Part 1 of the Tech Talk Series for Presto
     What is PrestoDB? What’s the difference?
     Beinan Wang, Sr. Software Engineer, Twitter
     Dipti Borkar, Co-Founder & CPO, Ahana
  2. Presto 101 Outline
     • What is Presto?
     • How are we using Presto?
     • What made Presto different?
       ◦ Scalable architecture
       ◦ Flexible Connectors
       ◦ Performance
     • The life of a query
  3. What is Presto?
     • Distributed SQL query engine
       ◦ ANSI SQL on Hadoop, Kafka, Druid, etc.
       ◦ Designed to be interactive
       ◦ Access to petabytes of data
     • Open source, hosted on GitHub
       ◦ https://github.com/prestodb
     • Open question:
       ◦ Is Presto a database?
  4. How are we using Presto?
     • Ad hoc queries
     • BI tools
     • Dashboards
     • A/B testing
     • ETL/scheduled jobs
     • Online services *
  5. Scalable architecture
     • Two roles -- coordinator and worker
     • Easy to scale up and scale down
       ◦ Scales up to 1000 workers*
       ◦ Fits web-scale companies
  6. Presto Connector Data Model
     • Connector: Driver for a data source.
       ◦ Examples: HDFS, Cassandra, Kafka, SQL Server
     • Catalog: Contains the schemas from a data source specified by the connector.
     • Schema: Namespace to organize tables.
     • Table: Set of unordered rows organized into columns with types.
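The hierarchy on this slide (connector, catalog, schema, table) is what lets a query address a table by a fully qualified `catalog.schema.table` name. Below is a minimal Python sketch of that lookup. It is a toy model for illustration, not Presto's implementation; the `Catalog` class, the `resolve` helper, and all catalog, schema, table, and column names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Toy catalog: the schemas of one data source, reached through a connector."""
    connector: str                                # connector name, e.g. "hive"
    schemas: dict = field(default_factory=dict)   # schema -> {table -> {column -> type}}


def resolve(catalogs, qualified_name):
    """Resolve 'catalog.schema.table' to (connector name, column types)."""
    catalog_name, schema_name, table_name = qualified_name.split(".")
    catalog = catalogs[catalog_name]
    return catalog.connector, catalog.schemas[schema_name][table_name]


# Hypothetical catalogs; every name below is illustrative only.
catalogs = {
    "hive": Catalog("hive", {"sales": {"orders": {"orderkey": "bigint", "totalprice": "double"}}}),
    "kafka": Catalog("kafka", {"events": {"clicks": {"user_id": "bigint", "url": "varchar"}}}),
}

connector, columns = resolve(catalogs, "hive.sales.orders")
print(connector, list(columns))
```

Because each catalog carries its own connector, one query can name tables from different catalogs, which is the basis for querying several data sources together.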
  7. Presto Hive Connector -- Data File Types
     • Supported File Types
       ◦ ORC
       ◦ Parquet
       ◦ Avro
       ◦ RCFile
       ◦ SequenceFile
       ◦ JSON
       ◦ Text
     • No data ingestion needed
  8. Why Presto is Fast
     • In-Memory processing
     • Pull model
     • Columnar storage and execution
     • Bytecode generation
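To illustrate the "columnar storage and execution" bullet, here is a small Python sketch that contrasts a row layout with a column layout. It is a toy illustration of the general idea, not how Presto stores or processes data; the sample rows and the helper names `to_columns` and `sum_tax_without_discount` are assumptions made for the example.

```python
# Row-oriented layout: every row carries all of its fields.
# The sample data is made up for illustration.
rows = [
    {"orderkey": 1, "discount": 0.0, "tax": 0.5, "comment": "a"},
    {"orderkey": 2, "discount": 0.1, "tax": 0.25, "comment": "b"},
    {"orderkey": 3, "discount": 0.0, "tax": 0.125, "comment": "c"},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def sum_tax_without_discount(columns):
    """Read only the two columns the computation needs."""
    return sum(
        tax
        for discount, tax in zip(columns["discount"], columns["tax"])
        if discount == 0
    )


columns = to_columns(rows)
print(sum_tax_without_discount(columns))
```

In the column layout the aggregation never touches `orderkey` or `comment`, which is the intuition behind reading only the columns a query references.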
  9. The Life of a Query -- Simple Scan
       SELECT *
       FROM orders
       WHERE discount = 0
  10. The Life of a Query -- Join and Aggregation
        SELECT orders.orderkey, SUM(tax)
        FROM orders
        LEFT JOIN lineitem ON orders.orderkey = lineitem.orderkey
        WHERE discount = 0
        GROUP BY orders.orderkey
      This example is from Presto: SQL on Everything
      https://research.fb.com/publications/presto-sql-on-everything/
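As a rough illustration of what the join-and-aggregation query computes, the Python sketch below evaluates the same logic over tiny in-memory tables. It assumes `discount` and `tax` are columns of `lineitem` (as in the TPC-H schema), uses made-up integer values, and the helpers `left_join` and `run_query` are hypothetical; it shows the query's semantics only, not how Presto plans or executes it.

```python
from collections import defaultdict

# Made-up sample data; assumes discount and tax live in lineitem.
orders = [{"orderkey": 1}, {"orderkey": 2}, {"orderkey": 3}]
lineitem = [
    {"orderkey": 1, "discount": 0, "tax": 2},
    {"orderkey": 1, "discount": 0, "tax": 3},
    {"orderkey": 2, "discount": 5, "tax": 4},
]


def left_join(left, right, key):
    """Yield joined rows; unmatched left rows get None for the right-side columns."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    right_columns = [c for c in right[0] if c != key]
    for l in left:
        matches = index.get(l[key])
        if matches:
            for r in matches:
                yield {**r, **l}
        else:
            yield {**l, **{c: None for c in right_columns}}


def run_query(orders, lineitem):
    """SELECT orderkey, SUM(tax) ... WHERE discount = 0 GROUP BY orderkey."""
    totals = defaultdict(int)
    for row in left_join(orders, lineitem, "orderkey"):
        # None == 0 is False, mirroring that NULL = 0 is not true in SQL.
        if row["discount"] == 0:
            totals[row["orderkey"]] += row["tax"]
    return dict(totals)


print(run_query(orders, lineitem))  # {1: 5}
```

Order 3 has no line items, so its `discount` is missing after the left join and the `WHERE` filter drops it; with this data the filter on the right-hand table makes the left join behave like an inner join.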
  11. Logical Plan -- do NOT join two big tables
      This example is from Presto: SQL on Everything
      https://research.fb.com/publications/presto-sql-on-everything/
  12. Q&A

  13. Join the Presto Community
      • Request a new feature or file a bug: github.com/prestodb/presto
      • Slack: prestodb.slack.com
      • Twitter: @prestodb
      Stay up-to-date with Ahana
      • URL: ahana.io
      • Twitter: @ahanaio