EuroClojure 2012: Introduction to Cascalog

Cascalog is a data processing library for Clojure that mixes functional and logic programming. It makes data processing code concise and easier to grasp and reason about.

Because Cascalog is just a Clojure library, all of Clojure's features are only a keystroke away, so there is no need to learn another custom language (as with Pig or Hive). This also allows for powerful abstraction and composition capabilities.

This is the slide deck of my presentation at EuroClojure 2012 (http://euroclojure.com/2012). The slides are mainly about my motivations for using Cascalog. The library itself was presented in a live coding environment. As soon as the recording is available, I'll add a link to it here.

Keywords: Hadoop, Clojure, Cascalog, Pig, Hive, Logic Programming

Stefan Hübner

May 31, 2012

Transcript

  1. Hadoop • Batch Processing • (Very) Large Scale • Distributed Filesystem • Parallel Computation • Fault-Tolerant
  4. Hadoop MapReduce API • Tedious and verbose • Hard to test • Hard to refactor
  5. Pig and Hive • Define their own query language • Custom operations in Java, Python, ... • Non-intuitive integration
  6. Star Trek I - "The Motion Picture", Paramount Pictures: "Is that really all? Is there nothing more?"
  7. Cascalog • Abstraction at each layer • Cascalog: variables and logic • Cascading: tuples, data workflows • Hadoop: key/value pairs, simple aggregation • (slide (c) Nathan Marz, reproduced with permission)
  8. Queries

         (<-                    ; defines a query
           [?person]            ; output variables
           (age ?person ?age)   ; generator with two variables
           (< ?age 30))         ; filter
  9. Queries

         (<- [?person]
           (age ?person ?age)   ; generator with two variables
           (< ?age 30))         ; filter

     Predicates