Slide 1

Slide 1 text

Apache Drill: interactive, ad-hoc query at scale. Michael Hausenblas, Chief Data Engineer EMEA, MapR. Cloud East 2013, Cambridge, UK, 2013-05-24

Slide 2

Slide 2 text

#cloudeast #meme #geekhumor @Hr @rbin @andypiper #unicorn #cloudcomputing @TashaDrew

Slide 3

Slide 3 text

Which workloads do you encounter in your environment? http://www.flickr.com/photos/kevinomara/2866648330/ licensed under CC BY-NC-ND 2.0

Slide 4

Slide 4 text

Batch processing … for recurring tasks such as large-scale data mining, ETL offloading/data warehousing → the batch layer in the Lambda architecture

Slide 5

Slide 5 text

OLTP … user-facing eCommerce transactions, real-time messaging at scale (FB), time-series processing, etc. → the serving layer in the Lambda architecture

Slide 6

Slide 6 text

Stream processing … to handle stream sources such as social media feeds or sensor data (mobile phones, RFID, weather stations, etc.) → the speed layer in the Lambda architecture

Slide 7

Slide 7 text

Search/Information Retrieval … retrieval of items from unstructured documents (plain text, etc.), semi-structured data formats (JSON, etc.), as well as data stores (MongoDB, CouchDB, etc.)

Slide 8

Slide 8 text

But what about interactive, ad-hoc query at scale? http://www.flickr.com/photos/9479603@N02/4144121838/ licensed under CC BY-NC-ND 2.0

Slide 9

Slide 9 text

Impala: interactive query (?), low-latency

Slide 10

Slide 10 text

Use Case: Marketing Campaign
• Jane, a marketing analyst
• Determine target segments
• Data from different sources

Slide 11

Slide 11 text

Use Case: Logistics
• Supplier tracking and performance
• Queries
  – Shipments from supplier 'ACM' in the last 24h
  – Shipments in region 'US' not from 'ACM'

SUPPLIER_ID   NAME                  REGION
ACM           ACME Corp             US
GAL           GotALot Inc           US
BAP           Bits and Pieces Ltd   Europe
ZUP           Zu Pli                Asia

{ "shipment": 100123, "supplier": "ACM", "timestamp": "2013-02-01", "description": "first delivery today" },
{ "shipment": 100124, "supplier": "BAP", "timestamp": "2013-02-02", "description": "hope you enjoy it" }
…
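A minimal SQL sketch of the two queries above, assuming the JSON shipments are exposed as a table named shipments and the supplier list as suppliers (both table names and the timestamp arithmetic are illustrative assumptions, not Drill-specific syntax):

  -- shipments from supplier 'ACM' in the last 24 hours
  SELECT s.shipment, s."timestamp", s.description
  FROM shipments s
  WHERE s.supplier = 'ACM'
    AND s."timestamp" >= CURRENT_TIMESTAMP - INTERVAL '1' DAY;

  -- shipments in region 'US' that are not from 'ACM'
  SELECT s.shipment, sup.name, sup.region
  FROM shipments s
  JOIN suppliers sup ON sup.supplier_id = s.supplier
  WHERE sup.region = 'US'
    AND s.supplier <> 'ACM';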

Slide 12

Slide 12 text

Use Case: Crime Detection
• Online purchases
• Fraud, bilking, etc.
• Batch-generated overview
• Modes
  – Explorative
  – Alerts

Slide 13

Slide 13 text

Requirements
• Support for different data sources
• Support for different query interfaces
• Low-latency/real-time
• Ad-hoc queries
• Scalable, reliable

Slide 14

Slide 14 text

And now for something completely different …

Slide 15

Slide 15 text

Google's Dremel
http://research.google.com/pubs/pub36632.html
Sergey Melnik, Andrey Gubarev, Jing Jing Long, Geoffrey Romer, Shiva Shivakumar, Matt Tolton, Theo Vassilakis, Proc. of the 36th Int'l Conf on Very Large Data Bases (2010), pp. 330-339
"Dremel is a scalable, interactive ad-hoc query system for analysis of read-only nested data. By combining multi-level execution trees and columnar data layout, it is capable of running aggregation queries over trillion-row tables in seconds. The system scales to thousands of CPUs and petabytes of data, and has thousands of users at Google. …"

Slide 16

Slide 16 text

Google's Dremel: multi-level execution trees, columnar data layout

Slide 17

Slide 17 text

Google's Dremel: nested data + schema, column-striped representation, map nested data to tables

Slide 18

Slide 18 text

Google's Dremel: experiments (datasets & query performance)

Slide 19

Slide 19 text

Back to Apache Drill …

Slide 20

Slide 20 text

Apache Drill: key facts
• Inspired by Google's Dremel
• Standard SQL 2003 support
• Pluggable data sources
• Nested data is a first-class citizen
• Schema is optional
• Community-driven, open, hundreds involved
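To make the "nested data, optional schema" bullets concrete, here is a hedged sketch of querying a raw JSON file in place, without declaring a schema first; the dfs.`…` path notation and the dotted field paths are assumptions for illustration, not verbatim syntax of this release:

  -- query nested JSON documents directly; fields are discovered at read time
  SELECT d.type,
         d.batters.batter    -- nested array, addressed by its path
  FROM dfs.`/data/donuts.json` d
  WHERE d.ppu < 1.00;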

Slide 21

Slide 21 text

High-level Architecture

Slide 22

Slide 22 text

Principled Query Execution
• Source query: what we want to do (analyst friendly)
• Logical plan: what we want to do (language agnostic, computer friendly)
• Physical plan: how we want to do it (the best way we can tell)
• Execution plan: where we want to do it

Slide 23

Slide 23 text

Principled Query Execution
Pipeline: Source Query → Parser → Logical Plan → Optimizer → Physical Plan → Execution
Source-query dialects (parser API): SQL 2003, DrQL, MongoQL, DSL; optimizer inputs: topology, CF, etc.; data sources via the scanner API
Logical plan snippet:
  query: [
    { @id: "log", op: "sequence", do: [
      { op: "scan", source: "logs" },
      { op: "filter", condition: "x > 3" },
  …

Slide 24

Slide 24 text

Wire-level Architecture
• Each node runs a Drillbit to maximize data locality
• Coordination, query planning, execution, etc., are distributed
• Any node can act as endpoint for a query (the foreman)
[Diagram: four nodes, each running a Drillbit co-located with the storage process]

Slide 25

Slide 25 text

Wire-level Architecture
• Curator/ZooKeeper for ephemeral cluster membership information
• Distributed cache (Hazelcast) for metadata, locality information, etc.
[Diagram: four Drillbit nodes, each with a distributed cache, coordinated via Curator/ZooKeeper]

Slide 26

Slide 26 text

Wire-level Architecture
• Originating Drillbit acts as foreman: manages query execution, scheduling, locality information, etc.
• Streaming data communication avoiding SerDe
[Diagram: as on the previous slide, four Drillbit nodes with distributed cache and Curator/ZooKeeper]

Slide 27

Slide 27 text

Wire-level Architecture
The foreman becomes the root of the multi-level execution tree; the leaves activate their storage engine interface.
[Diagram: execution tree spanning the nodes, coordinated via Curator/ZooKeeper]

Slide 28

Slide 28 text

On the shoulders of giants …
• Jackson for JSON SerDe for metadata
• Typesafe HOCON for configuration and module management
• Netty 4 as core RPC engine, protobuf for communication
• Vanilla Java, Larray and Netty ByteBuf for off-heap large data structures
• Hazelcast for distributed cache
• Netflix Curator on top of ZooKeeper for service registry
• Optiq for SQL parsing and cost optimization
• Parquet (http://parquet.io) as native columnar format
• Janino for expression compilation
• ASM for bytecode manipulation
• Yammer Metrics for metrics
• Guava extensively
• Carrot HPPC for primitive collections

Slide 29

Slide 29 text

Key features
• Full SQL (ANSI SQL 2003)
• Nested data as a first-class citizen
• Optional schema
• Extensibility points …

Slide 30

Slide 30 text

Extensibility Points
• Source query → parser API
• Custom operators, UDFs → logical plan
• Serving tree, CF, topology → physical plan/optimizer
• Data sources & formats → scanner API
Pipeline: Source Query → Parser → Logical Plan → Optimizer → Physical Plan → Execution

Slide 31

Slide 31 text

… and Hadoop?
• How does it differ from Hive, Cascading, etc.?
• Complementary use cases*
• … use Apache Drill to
  – find records with a specified condition
  – aggregate under dynamic conditions
• … use MapReduce for
  – data mining with multiple iterations
  – ETL
*) https://cloud.google.com/files/BigQueryTechnicalWP.pdf
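A rough SQL illustration of the two Drill-friendly bullets above; the clickstream table and its columns are made up for the example:

  -- find records with a specified condition
  SELECT *
  FROM clickstream
  WHERE user_id = '12345';

  -- aggregation under a dynamic, ad-hoc condition
  SELECT region, COUNT(*) AS visits
  FROM clickstream
  WHERE event_date >= DATE '2013-05-01'
    AND referrer LIKE '%cloudeast%'
  GROUP BY region;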

Slide 32

Slide 32 text

User  Interfaces  

Slide 33

Slide 33 text

User Interfaces
• API: DrillClient
  – Encapsulates endpoint discovery
  – Supports logical and physical plan submission, query cancellation, query status
  – Supports streaming return of results
• JDBC driver, converting JDBC calls into DrillClient communication
• REST proxy for DrillClient

Slide 34

Slide 34 text

User  Interfaces  

Slide 35

Slide 35 text

LET’S  GET  OUR  HANDS  DIRTY…  

Slide 36

Slide 36 text

Basic Demo
https://cwiki.apache.org/confluence/display/DRILL/Demo+HowTo

data source: donuts.json
  {
    "id": "0001",
    "type": "donut",
    "ppu": 0.55,
    "batters": {
      "batter": [
        { "id": "1001", "type": "Regular" },
        { "id": "1002", "type": "Chocolate" },
  …

logical plan: simple_plan.json
  query: [
    { op: "sequence", do: [
      { op: "scan", ref: "donuts", source: "local-logs", selection: { data: "activity" } },
      { op: "filter", expr: "donuts.ppu < 2.00" },
  …

result: out.json
  { "sales" : 700.0,  "typeCount" : 1, "quantity" : 700, "ppu" : 1.0 }
  { "sales" : 109.71, "typeCount" : 2, "quantity" : 159, "ppu" : 0.69 }
  { "sales" : 184.25, "typeCount" : 2, "quantity" : 335, "ppu" : 0.55 }
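For orientation only, the aggregated output above corresponds roughly to the following SQL. This restatement is an assumption: the demo drives the reference interpreter with the logical plan rather than SQL, and the sales and quantity fields are presumed to live in the elided part of donuts.json.

  SELECT SUM(d.sales)           AS sales,
         COUNT(DISTINCT d.type) AS typeCount,
         SUM(d.quantity)        AS quantity,
         d.ppu                  AS ppu
  FROM donuts d
  WHERE d.ppu < 2.00
  GROUP BY d.ppu;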

Slide 37

Slide 37 text

SELECT
  t.cf1.name       AS name,
  SUM(t.cf1.sales) AS total_sales
FROM m7://cluster1/sales t
GROUP BY name
ORDER BY total_sales DESC
LIMIT 10;

Slide 38

Slide 38 text

sequence: [
  { op: scan, storageengine: m7, selection: { table: sales } },
  { op: project, projections: [
      { ref: name,  expr: cf1.name },
      { ref: sales, expr: cf1.sales } ] },
  { op: segment, ref: by_name, exprs: [name] },
  { op: collapsingaggregate, target: by_name, carryovers: [name],
    aggregations: [ { ref: total_sales, expr: sum(sales) } ] },
  { op: order, ordering: [ { order: desc, expr: total_sales } ] },
  { op: store, storageengine: screen }
]

Slide 39

Slide 39 text

{ @id: 1, pop: m7scan, cluster: def, table: sales, cols: [cf1.name, cf2.name] }
{ @id: 2, op: hash-random-exchange, input: 1, expr: 1 }
{ @id: 3, op: sorting-hash-aggregate, input: 2, grouping: 1, aggr: [sum(2)], carry: [1], sort: ~aggr[0] }
{ @id: 4, op: screen, input: 3 }

Slide 40

Slide 40 text

Execution Plan
• Break the physical plan into fragments
• Determine the degree of parallelization for each task based on estimated costs
• Assign particular nodes based on affinity, load and topology

Slide 41

Slide 41 text

BE  A  PART  OF  IT!  

Slide 42

Slide 42 text

Status
• Heavy development by multiple organizations
• Available
  – Logical plan (ADSP)
  – Reference interpreter
  – Basic SQL parser
  – Basic demo

Slide 43

Slide 43 text

Status
May 2013
• Full SQL support (+JDBC)
• Physical plan
• In-memory compressed data interfaces
• Distributed execution

Slide 44

Slide 44 text

Status
May 2013
• HBase and MySQL storage engines
• WebUI client

Slide 45

Slide 45 text

Contributing
Contributions appreciated (not only code drops) …
• Test data & test queries
• Use case scenarios (textual/SQL queries)
• Documentation
• Further schedule
  – Alpha Q2
  – Beta Q3

Slide 46

Slide 46 text

Kudos to …
• Julian Hyde, Pentaho
• Lisen Mu, XingCloud
• Tim Chen, Microsoft
• Chris Merrick, RJMetrics
• David Alves, UT Austin
• Sree Vaadi, SSS
• Jacques Nadeau, MapR
• Ted Dunning, MapR

Slide 47

Slide 47 text

Engage!
• Follow @ApacheDrill on Twitter
• Sign up for the mailing lists (user | dev): http://incubator.apache.org/drill/mailing-lists.html
• Standing G+ hangouts every Tuesday at 5pm GMT: http://j.mp/apache-drill-hangouts
• Keep an eye on http://drill-user.org/