Slide 1

Slide 2

Slide 3

* Revolute

Slide 4

Alex Boisvert

Slide 5

Work @

Slide 6

SQL-like query language (for Big Data)

Slide 7

(Scala) embedded DSL

Slide 8

Inspired by
* Apache Hive
* ScalaQuery
* Cascalog

Slide 9

Familiar + Type-safe + Expressive + Interactive

Slide 10

“Thin Layer” on top of Cascading.

Slide 11

A Taste of Revolute →

Slide 12

object Persons extends Table[(String, Int, String)] {
  def name   = column[String]("name")
  def age    = column[Int]("age")
  def gender = column[String]("gender")
  def *      = name ~ age ~ gender
}
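The `~` operator above combines columns into a projection whose Scala type mirrors the table's row type. The slide doesn't show how Revolute implements it, so what follows is only a minimal standalone sketch of the idea; `Column`, `Projection2`, `Projection3` and `ProjectionSketch` are hypothetical names, not Revolute's API.

```scala
object ProjectionSketch {
  // Hypothetical: a column that remembers the Scala type of its values.
  final case class Column[T](name: String) {
    // `a ~ b` pairs two columns into a two-column projection.
    def ~[U](other: Column[U]): Projection2[T, U] = Projection2(this, other)
  }

  final case class Projection2[A, B](a: Column[A], b: Column[B]) {
    // `(a ~ b) ~ c` extends the projection to three columns.
    def ~[C](c: Column[C]): Projection3[A, B, C] = Projection3(a, b, c)
    def names: List[String] = List(a.name, b.name)
  }

  final case class Projection3[A, B, C](a: Column[A], b: Column[B], c: Column[C]) {
    def names: List[String] = List(a.name, b.name, c.name)
  }

  // Hypothetical stand-ins for the Persons columns.
  val name   = Column[String]("name")
  val age    = Column[Int]("age")
  val gender = Column[String]("gender")

  // `~` is left-associative, so this is (name ~ age) ~ gender. The type
  // annotation compiles only because the inferred projection type matches
  // the (String, Int, String) row type.
  val star: Projection3[String, Int, String] = name ~ age ~ gender
}
```

Changing the annotation on `star` to, say, `Projection3[String, String, String]` would be rejected at compile time, which illustrates the kind of checking an embedded, type-safe DSL can offer.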

Slide 13

-- SQL
select p.* from Persons p

↓

/* Revolute */
for {
  p <- Persons
} yield p.*

Slide 14

/* output multiple fields */
for {
  p <- Persons
} yield p.name ~ p.age

Slide 15

/* filtering */
for {
  p <- Persons
  if (p.age > 21)
} yield p.name ~ p.age
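These queries rely on Scala's `for`-comprehension syntax, which the compiler rewrites into `map`, `flatMap` and `withFilter` calls. A DSL can supply its own versions of those methods so that a comprehension builds a query instead of iterating over data. The sketch below uses a plain in-memory `List` (the `Person` case class and sample rows are hypothetical) to show that a comprehension with a guard and its desugared form agree.

```scala
object Desugaring {
  // Hypothetical in-memory stand-in for the Persons table.
  final case class Person(name: String, age: Int, gender: String)

  val persons = List(
    Person("alice", 34, "f"),
    Person("bob", 19, "m"),
    Person("carol", 25, "f")
  )

  // The filtering query, with a plain tuple in place of `~`.
  val sugared: List[(String, Int)] =
    for {
      p <- persons
      if p.age > 21
    } yield (p.name, p.age)

  // Roughly what the compiler turns the comprehension above into.
  val desugared: List[(String, Int)] =
    persons.withFilter(p => p.age > 21).map(p => (p.name, p.age))
}
```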

Slide 16

/* combinators for complex expressions */
for {
  p <- Persons
  if (p.age > 21) && (p.gender === "m")
} yield p.name ~ p.age
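Operators such as `===` and `&&` in this filter work on column expressions rather than on plain values, so the library can inspect the whole predicate and translate it for the underlying engine. `===` is needed because `==` is defined on every Scala object and always returns a `Boolean`. A minimal sketch of that idea follows; `Expr`, `Col`, `Lit` and `toSql` are hypothetical names, not Revolute's API.

```scala
import scala.language.implicitConversions

object ExprSketch {
  // A tiny predicate AST: column references, literals and operators.
  sealed trait Expr {
    def ===(other: Expr): Expr = Eq(this, other)
    def >(other: Expr): Expr   = Gt(this, other)
    def &&(other: Expr): Expr  = And(this, other)
  }
  final case class Col(name: String)     extends Expr
  final case class Lit(value: Any)       extends Expr
  final case class Eq(l: Expr, r: Expr)  extends Expr
  final case class Gt(l: Expr, r: Expr)  extends Expr
  final case class And(l: Expr, r: Expr) extends Expr

  // Let plain Scala values appear as operands.
  implicit def intLit(i: Int): Expr       = Lit(i)
  implicit def stringLit(s: String): Expr = Lit(s)

  // Render a predicate as a SQL-like WHERE clause.
  def toSql(e: Expr): String = e match {
    case Col(n)         => n
    case Lit(s: String) => s"'$s'"
    case Lit(v)         => v.toString
    case Eq(l, r)       => s"(${toSql(l)} = ${toSql(r)})"
    case Gt(l, r)       => s"(${toSql(l)} > ${toSql(r)})"
    case And(l, r)      => s"${toSql(l)} AND ${toSql(r)}"
  }

  val age    = Col("age")
  val gender = Col("gender")
  // Builds a tree; nothing is evaluated against data here.
  val predicate: Expr = (age > 21) && (gender === "m")
}
```

Because the predicate is a data structure, one query definition can in principle be compiled to different back ends.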

Slide 17

/* How much time are people spending per location? */
for {
  Join(p, l) <- (Persons innerJoin Locations) on (_.name is _.name)
  _          <- Query.groupBy(p.name ~ l.city)
  time       <- TimeSpent(l.timestamp)
} yield p.name ~ l.city ~ time
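To make the semantics of this query concrete, here is the same computation over in-memory collections. The slide doesn't define `Locations` or `TimeSpent`, so this sketch assumes each location row is `(name, city, timestamp)` with timestamps in seconds, and that `TimeSpent` means the span between the first and last timestamp in each group. Both are illustrative assumptions, as are the sample rows.

```scala
object TimePerLocation {
  final case class Person(name: String, age: Int, gender: String)
  // Assumed shape of a Locations row; timestamp in seconds.
  final case class Location(name: String, city: String, timestamp: Long)

  val persons = List(Person("alice", 34, "f"), Person("bob", 19, "m"))
  val locations = List(
    Location("alice", "Paris", 100L),
    Location("alice", "Paris", 400L),
    Location("alice", "Rome", 900L),
    Location("bob", "Paris", 50L),
    Location("bob", "Paris", 80L)
  )

  // Assumed meaning of TimeSpent: last timestamp minus first in the group.
  def timeSpent(timestamps: Seq[Long]): Long = timestamps.max - timestamps.min

  val result: Map[(String, String), Long] =
    (for {
      p <- persons
      l <- locations
      if p.name == l.name // inner join on name
    } yield (p.name, l.city, l.timestamp))
      .groupBy { case (name, city, _) => (name, city) } // group by name ~ city
      .map { case (key, rows) => key -> timeSpent(rows.map(_._3)) }
}
```

With these sample rows, `result` maps `("alice", "Paris")` to 300, `("alice", "Rome")` to 0 and `("bob", "Paris")` to 30.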

Slide 18

// context provides table bindings for taps/sinks
//
// e.g. Traffic table → HDFS or S3
//      Partner table → MySQL
//      Summary table → Google Spreadsheet
flow(context) {
  val myQuery = for { … } yield …

  insert {
    for {
      (partner, segment, views) <- myQuery
      if segment in Set("a", "b", "c")
    } yield partner ~ views
  } into Summary
}
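The `context` keeps query logic separate from where the data lives, so the same flow could read `Traffic` from HDFS in production and from memory in a test. Revolute's actual binding API isn't shown here, so this in-memory sketch uses hypothetical `Tap`, `MemoryTap`, `Context` and `flow` definitions. Because `myQuery` is elided on the slide, the sketch filters `Traffic` rows directly and ignores the `Partner` binding; `Set(...).contains(...)` stands in for `segment in Set(...)`.

```scala
import scala.collection.mutable.ListBuffer

object FlowSketch {
  type Row = Map[String, Any]

  // Hypothetical abstraction over a source/sink (HDFS, S3, MySQL, ...).
  trait Tap {
    def read(): Seq[Row]
    def write(rows: Seq[Row]): Unit
  }

  // In-memory tap, handy for local runs and tests.
  final class MemoryTap(initial: Seq[Row] = Nil) extends Tap {
    private val buffer = ListBuffer.empty[Row]
    buffer ++= initial
    def read(): Seq[Row] = buffer.toList
    def write(rows: Seq[Row]): Unit = buffer ++= rows
  }

  // Binds logical table names to concrete taps.
  final case class Context(tables: Map[String, Tap])

  // Reads Traffic, keeps segments a/b/c, writes partner and views to Summary.
  def flow(context: Context): Unit = {
    val traffic = context.tables("Traffic").read()
    val summary = for {
      row <- traffic
      if Set[Any]("a", "b", "c").contains(row("segment"))
    } yield Map[String, Any]("partner" -> row("partner"), "views" -> row("views"))
    context.tables("Summary").write(summary)
  }
}
```

Swapping a `MemoryTap` for a tap backed by real storage would leave `flow` unchanged, which is the point of binding tables through a context.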

Slide 19

* Query
  - one or more tables joined together (“join”)
  - field selection and function application (“select”)
  - one or more filters (“where”)
  - grouping and sorting (“group by”, “sort by”)
  - aggregation based on groupings (“count()”, …)
* Nest & chain queries
* 1-1, 1-0/1, 1-N mappings
* Null, Option & PartialFunction filtering

… and (eventually) more awesome.
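The last bullet mentions filtering through `Option` values and partial functions. One plausible reading, sketched here on plain Scala collections rather than Revolute, is that a row is dropped when an `Option` is `None` or when a `PartialFunction` is undefined for it, much as `flatMap` and `collect` behave on standard collections. The `Row` type, sample data and `adultCity` function are hypothetical.

```scala
object FilteringSketch {
  final case class Row(name: String, age: Int, city: Option[String])

  val rows = List(
    Row("alice", 34, Some("Paris")),
    Row("bob", 19, Some("Rome")),
    Row("carol", 25, None)
  )

  // Option filtering: rows whose city is None are dropped.
  val withCity: List[(String, String)] =
    rows.flatMap(r => r.city.map(c => (r.name, c)))

  // PartialFunction filtering: only rows matched by a case are kept.
  val adultCity: PartialFunction[Row, (String, String)] = {
    case Row(name, age, Some(city)) if age > 21 => (name, city)
  }
  val adultsWithCity: List[(String, String)] = rows.collect(adultCity)
}
```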

Slide 20

aboisvert / revolute @ github