Cascalog

Short demo talk on Cascalog at the Hadoop User Group in Munich

αλεx π

May 22, 2013

Transcript

  1. Setting expectations
     • This is not a guide
     • And not a tutorial
     • Doesn’t claim to be complete
     • Mostly to give you an idea
     • And encourage you to explore further
     Thursday, May 23, 13
  2. How much time do you spend writing logic that a framework should take care of?
  3. Hadoop + Java: composable, but too verbose
     Pig, Hive: too concrete, lacking abstraction and composition
  4. • Clear, declarative syntax
     • Inner and outer joins
     • Aggregators
     • Functions
     • Subqueries, composition
     • Sorting
     • Performant
  5. Casca-WHAT?
     • Built on top of Hadoop (MapReduce)
     • Cascading (tuples, workflows, job execution)
     • Written in Clojure
     • Datalog (logic programming)
  6. Sources and sinks
     • HDFS (go figure)
     • Cassandra
     • MongoDB
     • SQL data sources
     • File system
     • Memory sources
  7. Exact match of the second element in a tuple:

     (?<- (stdout) [?person] (age ?person 25))
  8. Predicate match via a fn call (younger-than? is the predicate):

     (defn younger-than? [limit age]
       (< age limit))

     (?<- (stdout) [?person ?age]
          (age ?person ?age)
          (younger-than? 32 ?age))
  9. Benefits
     • Query language is the same as the application language
     • Subqueries, reusability
     • Ad-hoc querying
     • Cascading underneath, so taps for all DBs work
     • Reuse application logic
     • Text editor integration
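
The two queries from slides 7 and 8 can be tried without a cluster. The sketch below is a minimal version under stated assumptions: the namespace name, the sample rows in `age`, and the wrapper functions `people-aged-25` and `people-under-32` are hypothetical and not taken from the talk. It also swaps `?<-` with the `(stdout)` sink for `??<-`, which runs the query and returns the matching tuples as a sequence, so the results can be inspected at the REPL.

```clojure
(ns cascalog-demo.sketch
  (:use cascalog.api))

;; Hypothetical in-memory source standing in for the `age` dataset
;; used on the slides: each tuple is [person age].
(def age
  [["alice" 28]
   ["bob"   33]
   ["david" 25]
   ["emily" 25]])

;; Plain Clojure function used as a predicate, as on slide 8.
;; Its `age` parameter is a number and shadows the dataset var
;; only inside the function body.
(defn younger-than? [limit age]
  (< age limit))

;; Slide 7: exact match on the second element of each tuple.
(defn people-aged-25 []
  (??<- [?person]
        (age ?person 25)))

;; Slide 8: keep only tuples for which the predicate returns true.
(defn people-under-32 []
  (??<- [?person ?age]
        (age ?person ?age)
        (younger-than? 32 ?age)))
```

With the sample rows above, `(people-aged-25)` should match david and emily, and `(people-under-32)` should match everyone except bob. Replacing `??<-` with `?<-` and adding `(stdout)` as the first argument gives the form shown on the slides.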