Berlin Buzzwords 2012: Introduction to Cascalog

These are the slides for my talk "Introduction to Cascalog: Functional Data Processing for Hadoop". http://berlinbuzzwords.de/sessions/introducing-cascalog-functional-data-processing-hadoop

The talk features a live demonstration of Cascalog. Check out the video on Vimeo: https://vimeo.com/album/1978224/video/43804713

Stefan Hübner

June 18, 2012

Transcript

  1. Hadoop • Batch Processing • (Very) Large Scale • Distributed Filesystem • Parallel Computation • Fault-Tolerant
  3. Hadoop MapReduce API • Tedious and verbose • Hard to test • Hard to refactor
  4. Pig and Hive • Define their own query language • Custom operations in Java, Python, ... • Non-intuitive integration
  5. Cascalog
    Abstraction provided by each layer:
    • Cascalog: variables and logic
    • Cascading: tuples, data workflows
    • Hadoop: key/value pairs, simple aggregation
    Slide (c) Nathan Marz, reproduced with permission
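The layering on the Cascalog slide can be made concrete with a small Python sketch. It is not Cascalog, Cascading, or Hadoop code, and every function name in it is hypothetical. `map_reduce` simulates the lowest layer locally: each record is turned into key/value pairs, values are grouped by key (the shuffle), and a simple aggregation runs once per key. `count_by` is an invented higher-level helper in which the caller only names the value to group on and the mapper and reducer are derived automatically, loosely mirroring how Cascading and Cascalog generate MapReduce jobs from higher-level descriptions.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, Iterator, List, Tuple


# Lowest layer (hypothetical local simulation, not Hadoop's Java API):
# records become key/value pairs, are grouped by key, then aggregated.
def map_reduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterator[Tuple[Hashable, int]]],
    reducer: Callable[[List[int]], int],
) -> Dict[Hashable, int]:
    groups: Dict[Hashable, List[int]] = defaultdict(list)
    for record in records:  # map phase: one mapper call per record
        for key, value in mapper(record):
            groups[key].append(value)  # shuffle: collect values by key
    # reduce phase: one reducer call per distinct key
    return {key: reducer(values) for key, values in groups.items()}


# Word count written by hand against the low-level key/value model.
def word_count_mapper(line: str) -> Iterator[Tuple[str, int]]:
    for word in line.split():
        yield word, 1


# Hypothetical higher-level helper: the caller only says which values
# to group on; the mapper and the summing reducer are derived here.
def count_by(
    records: Iterable[str],
    extract: Callable[[str], Iterable[Hashable]],
) -> Dict[Hashable, int]:
    return map_reduce(records, lambda r: ((v, 1) for v in extract(r)), sum)


if __name__ == "__main__":
    lines = ["the cat", "the dog"]
    print(map_reduce(lines, word_count_mapper, sum))
    print(count_by(lines, str.split))
```

Both calls produce the same word counts; the difference is only in how much of the key/value plumbing the caller has to write, which is the point the slide makes about moving up from Hadoop to Cascading and Cascalog.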