Slide 1

Slide 1 text

Apache Drill: interactive, ad-hoc query at scale. Michael Hausenblas, Chief Data Engineer EMEA, MapR. Cloud East 2013, Cambridge, UK, 2013-05-24

Slide 2

Slide 2 text

#cloudeast #meme #geekhumor @Hr @rbin @andypiper #unicorn #cloudcomputing @TashaDrew

Slide 3

Slide 3 text

Which workloads do you encounter in your environment? http://www.flickr.com/photos/kevinomara/2866648330/ licensed under CC BY-NC-ND 2.0

Slide 4

Slide 4 text

Batch processing … for recurring tasks such as large-scale data mining, ETL offloading/data warehousing → the batch layer in the Lambda architecture

Slide 5

Slide 5 text

OLTP … user-facing eCommerce transactions, real-time messaging at scale (FB), time-series processing, etc. → the serving layer in the Lambda architecture

Slide 6

Slide 6 text

Stream processing … to handle stream sources such as social media feeds or sensor data (mobile phones, RFID, weather stations, etc.) → the speed layer in the Lambda architecture

Slide 7

Slide 7 text

Search/Information Retrieval … retrieval of items from unstructured documents (plain text, etc.), semi-structured data formats (JSON, etc.), as well as data stores (MongoDB, CouchDB, etc.)

Slide 8

Slide 8 text

But what about interactive, ad-hoc query at scale? http://www.flickr.com/photos/9479603@N02/4144121838/ licensed under CC BY-NC-ND 2.0

Slide 9

Slide 9 text

Impala: interactive query (?), low-latency

Slide 10

Slide 10 text

Use Case: Marketing Campaign
• Jane, a marketing analyst
• Determine target segments
• Data from different sources

Slide 11

Slide 11 text

Use Case: Logistics
• Supplier tracking and performance
• Queries
  – Shipments from supplier 'ACM' in the last 24h
  – Shipments in region 'US' not from 'ACM'

SUPPLIER_ID   NAME                  REGION
ACM           ACME Corp             US
GAL           GotALot Inc           US
BAP           Bits and Pieces Ltd   Europe
ZUP           Zu Pli                Asia

{ "shipment": 100123, "supplier": "ACM", "timestamp": "2013-02-01", "description": "first delivery today" },
{ "shipment": 100124, "supplier": "BAP", "timestamp": "2013-02-02", "description": "hope you enjoy it" }
…
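A minimal SQL sketch of the two queries above, assuming the JSON shipments are exposed as a table named shipments and the supplier list as suppliers (both table names and the timestamp arithmetic are illustrative assumptions, not Drill-specific syntax):

  -- shipments from supplier 'ACM' in the last 24 hours
  SELECT s.shipment, s."timestamp", s.description
  FROM shipments s
  WHERE s.supplier = 'ACM'
    AND s."timestamp" >= CURRENT_TIMESTAMP - INTERVAL '1' DAY;

  -- shipments in region 'US' that are not from 'ACM'
  SELECT s.shipment, sup.name, sup.region
  FROM shipments s
  JOIN suppliers sup ON sup.supplier_id = s.supplier
  WHERE sup.region = 'US'
    AND s.supplier <> 'ACM';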

Slide 12

Slide 12 text

Use Case: Crime Detection
• Online purchases
• Fraud, bilking, etc.
• Batch-generated overview
• Modes
  – Explorative
  – Alerts

Slide 13

Slide 13 text

Requirements
• Support for different data sources
• Support for different query interfaces
• Low-latency/real-time
• Ad-hoc queries
• Scalable, reliable

Slide 14

Slide 14 text

And now for something completely different …

Slide 15

Slide 15 text

Google's Dremel
http://research.google.com/pubs/pub36632.html
Sergey Melnik, Andrey Gubarev, Jing Jing Long, Geoffrey Romer, Shiva Shivakumar, Matt Tolton, Theo Vassilakis, Proc. of the 36th Int'l Conf on Very Large Data Bases (2010), pp. 330-339
"Dremel is a scalable, interactive ad-hoc query system for analysis of read-only nested data. By combining multi-level execution trees and columnar data layout, it is capable of running aggregation queries over trillion-row tables in seconds. The system scales to thousands of CPUs and petabytes of data, and has thousands of users at Google. …"

Slide 16

Slide 16 text

Google's Dremel: multi-level execution trees, columnar data layout

Slide 17

Slide 17 text

Google's Dremel: nested data + schema, column-striped representation, map nested data to tables

Slide 18

Slide 18 text

Google's Dremel: experiments (datasets & query performance)

Slide 19

Slide 19 text

Back to Apache Drill …

Slide 20

Slide 20 text

Apache Drill: key facts
• Inspired by Google's Dremel
• Standard SQL 2003 support
• Pluggable data sources
• Nested data is a first-class citizen
• Schema is optional
• Community-driven, open, hundreds involved
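To make the "nested data, optional schema" bullets concrete, here is a hedged sketch of querying a raw JSON file in place, without declaring a schema first; the dfs.`…` path notation and the dotted field paths are assumptions for illustration, not verbatim syntax of this release:

  -- query nested JSON documents directly; fields are discovered at read time
  SELECT d.type,
         d.batters.batter    -- nested array, addressed by its path
  FROM dfs.`/data/donuts.json` d
  WHERE d.ppu < 1.00;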

Slide 21

Slide 21 text

High-level Architecture

Slide 22

Slide 22 text

Principled Query Execution
• Source query: what we want to do (analyst friendly)
• Logical plan: what we want to do (language agnostic, computer friendly)
• Physical plan: how we want to do it (the best way we can tell)
• Execution plan: where we want to do it

Slide 23

Slide 23 text

Principled Query Execution
Pipeline: Source Query → Parser → Logical Plan → Optimizer → Physical Plan → Execution
Source-query dialects (parser API): SQL 2003, DrQL, MongoQL, DSL; optimizer inputs: topology, CF, etc.; data sources via the scanner API
Logical plan snippet:
  query: [
    { @id: "log", op: "sequence", do: [
      { op: "scan", source: "logs" },
      { op: "filter", condition: "x > 3" },
  …

Slide 24

Slide 24 text

Wire-level Architecture
• Each node runs a Drillbit to maximize data locality
• Coordination, query planning, execution, etc., are distributed
• Any node can act as endpoint for a query (the foreman)
[Diagram: four nodes, each running a Drillbit co-located with the storage process]

Slide 25

Slide 25 text

Wire-level Architecture
• Curator/ZooKeeper for ephemeral cluster membership information
• Distributed cache (Hazelcast) for metadata, locality information, etc.
[Diagram: four Drillbit nodes, each with a distributed cache, coordinated via Curator/ZooKeeper]

Slide 26

Slide 26 text

Wire-level Architecture
• Originating Drillbit acts as foreman: manages query execution, scheduling, locality information, etc.
• Streaming data communication avoiding SerDe
[Diagram: as on the previous slide, four Drillbit nodes with distributed cache and Curator/ZooKeeper]

Slide 27

Slide 27 text

Wire-level Architecture
The foreman becomes the root of the multi-level execution tree; the leaves activate their storage engine interface.
[Diagram: execution tree spanning the nodes, coordinated via Curator/ZooKeeper]

Slide 28

Slide 28 text

On the shoulders of giants …
• Jackson for JSON SerDe for metadata
• Typesafe HOCON for configuration and module management
• Netty 4 as core RPC engine, protobuf for communication
• Vanilla Java, Larray and Netty ByteBuf for off-heap large data structures
• Hazelcast for distributed cache
• Netflix Curator on top of ZooKeeper for service registry
• Optiq for SQL parsing and cost optimization
• Parquet (http://parquet.io) as native columnar format
• Janino for expression compilation
• ASM for bytecode manipulation
• Yammer Metrics for metrics
• Guava extensively
• Carrot HPPC for primitive collections

Slide 29

Slide 29 text

Key features
• Full SQL (ANSI SQL 2003)
• Nested data as a first-class citizen
• Optional schema
• Extensibility points …

Slide 30

Slide 30 text

Extensibility Points
• Source query → parser API
• Custom operators, UDFs → logical plan
• Serving tree, CF, topology → physical plan/optimizer
• Data sources & formats → scanner API
Pipeline: Source Query → Parser → Logical Plan → Optimizer → Physical Plan → Execution

Slide 31

Slide 31 text

… and Hadoop?
• How does it differ from Hive, Cascading, etc.?
• Complementary use cases*
• … use Apache Drill to
  – find records with a specified condition
  – aggregate under dynamic conditions
• … use MapReduce for
  – data mining with multiple iterations
  – ETL
*) https://cloud.google.com/files/BigQueryTechnicalWP.pdf
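A rough SQL illustration of the two Drill-friendly bullets above; the clickstream table and its columns are made up for the example:

  -- find records with a specified condition
  SELECT *
  FROM clickstream
  WHERE user_id = '12345';

  -- aggregation under a dynamic, ad-hoc condition
  SELECT region, COUNT(*) AS visits
  FROM clickstream
  WHERE event_date >= DATE '2013-05-01'
    AND referrer LIKE '%cloudeast%'
  GROUP BY region;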

Slide 32

Slide 32 text

User  Interfaces  

Slide 33

Slide 33 text

User Interfaces
• API: DrillClient
  – Encapsulates endpoint discovery
  – Supports logical and physical plan submission, query cancellation, query status
  – Supports streaming return of results
• JDBC driver, converting JDBC calls into DrillClient communication
• REST proxy for DrillClient

Slide 34

Slide 34 text

User  Interfaces  

Slide 35

Slide 35 text

LET’S  GET  OUR  HANDS  DIRTY…  

Slide 36

Slide 36 text

Basic Demo
https://cwiki.apache.org/confluence/display/DRILL/Demo+HowTo

data source: donuts.json
  {
    "id": "0001",
    "type": "donut",
    "ppu": 0.55,
    "batters": {
      "batter": [
        { "id": "1001", "type": "Regular" },
        { "id": "1002", "type": "Chocolate" },
  …

logical plan: simple_plan.json
  query: [
    { op: "sequence", do: [
      { op: "scan", ref: "donuts", source: "local-logs", selection: { data: "activity" } },
      { op: "filter", expr: "donuts.ppu < 2.00" },
  …

result: out.json
  { "sales" : 700.0,  "typeCount" : 1, "quantity" : 700, "ppu" : 1.0 }
  { "sales" : 109.71, "typeCount" : 2, "quantity" : 159, "ppu" : 0.69 }
  { "sales" : 184.25, "typeCount" : 2, "quantity" : 335, "ppu" : 0.55 }
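For orientation only, the aggregated output above corresponds roughly to the following SQL. This restatement is an assumption: the demo drives the reference interpreter with the logical plan rather than SQL, and the sales and quantity fields are presumed to live in the elided part of donuts.json.

  SELECT SUM(d.sales)           AS sales,
         COUNT(DISTINCT d.type) AS typeCount,
         SUM(d.quantity)        AS quantity,
         d.ppu                  AS ppu
  FROM donuts d
  WHERE d.ppu < 2.00
  GROUP BY d.ppu;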

Slide 37

Slide 37 text

SELECT
  t.cf1.name       AS name,
  SUM(t.cf1.sales) AS total_sales
FROM m7://cluster1/sales t
GROUP BY name
ORDER BY total_sales DESC
LIMIT 10;

Slide 38

Slide 38 text

sequence: [
  { op: scan, storageengine: m7, selection: { table: sales } },
  { op: project, projections: [
      { ref: name,  expr: cf1.name },
      { ref: sales, expr: cf1.sales } ] },
  { op: segment, ref: by_name, exprs: [name] },
  { op: collapsingaggregate, target: by_name, carryovers: [name],
    aggregations: [ { ref: total_sales, expr: sum(sales) } ] },
  { op: order, ordering: [ { order: desc, expr: total_sales } ] },
  { op: store, storageengine: screen }
]

Slide 39

Slide 39 text

{ @id: 1, pop: m7scan, cluster: def, table: sales, cols: [cf1.name, cf2.name] }
{ @id: 2, op: hash-random-exchange, input: 1, expr: 1 }
{ @id: 3, op: sorting-hash-aggregate, input: 2, grouping: 1, aggr: [sum(2)], carry: [1], sort: ~aggr[0] }
{ @id: 4, op: screen, input: 3 }

Slide 40

Slide 40 text

Execution Plan
• Break the physical plan into fragments
• Determine the degree of parallelization for each task based on estimated costs
• Assign particular nodes based on affinity, load and topology

Slide 41

Slide 41 text

BE  A  PART  OF  IT!  

Slide 42

Slide 42 text

Status
• Heavy development by multiple organizations
• Available
  – Logical plan (ADSP)
  – Reference interpreter
  – Basic SQL parser
  – Basic demo

Slide 43

Slide 43 text

Status
May 2013
• Full SQL support (+JDBC)
• Physical plan
• In-memory compressed data interfaces
• Distributed execution

Slide 44

Slide 44 text

Status
May 2013
• HBase and MySQL storage engines
• WebUI client

Slide 45

Slide 45 text

Contributing
Contributions appreciated (not only code drops) …
• Test data & test queries
• Use case scenarios (textual/SQL queries)
• Documentation
• Further schedule
  – Alpha Q2
  – Beta Q3

Slide 46

Slide 46 text

Kudos to …
• Julian Hyde, Pentaho
• Lisen Mu, XingCloud
• Tim Chen, Microsoft
• Chris Merrick, RJMetrics
• David Alves, UT Austin
• Sree Vaadi, SSS
• Jacques Nadeau, MapR
• Ted Dunning, MapR

Slide 47

Slide 47 text

Engage!
• Follow @ApacheDrill on Twitter
• Sign up for the mailing lists (user | dev): http://incubator.apache.org/drill/mailing-lists.html
• Standing G+ hangouts every Tuesday at 5pm GMT: http://j.mp/apache-drill-hangouts
• Keep an eye on http://drill-user.org/