Slide 1

Slide 1 text

Introduction to Cascalog
Stefan Hübner, Nokia Berlin
Berlin Buzzwords 2012

Slide 2

Slide 2 text

Nokia Maps
• Address & POI Search
• Category Search
• Nearby Recommendations

Slide 3

Slide 3 text

Hadoop
• Batch Processing
• (Very) Large Scale
• Distributed Filesystem
• Parallel Computation
• Fault-Tolerant
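As a rough illustration of the batch, key/value style of computation that Hadoop runs in parallel, here is a minimal in-memory sketch of the map, shuffle and reduce phases for a word count. It is plain Python, not the Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions made for this example, and nothing here is distributed or fault-tolerant.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Aggregate the values of each key; here a simple sum."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Chain the three phases (hypothetical helper for this sketch)."""
    return reduce_phase(shuffle(map_phase(lines)))


# Made-up sample input.
counts = word_count(["hello hadoop", "hello cascalog"])
print(counts)
```

On a real cluster the map and reduce steps would run on many machines over files in the distributed filesystem; the sketch only shows the shape of the data flow.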

Slide 5

Slide 5 text

Hadoop MapReduce API
• Tedious and verbose
• Hard to test
• Hard to refactor

Slide 6

Slide 6 text

Hadoop MapReduce API
• Tedious and verbose
• Hard to test
• Hard to refactor

Slide 7

Slide 7 text

Pig and Hive
• Define their own query language
• Custom operations in Java, Python, ...
• Non-intuitive integration

Slide 8

Slide 8 text

(())

Slide 9

Slide 9 text

Cascalog
Abstraction
• Cascalog: Variables and logic
• Cascading: Tuples, data workflows
• Hadoop: Key/value pairs, simple aggregation
slide (c) Nathan Marz, reproduced with permission
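To make the difference between the "tuples, data workflows" level and the "variables and logic" level more concrete, here is a small Python analogy. It is not Cascalog or Cascading syntax; the relations `AGE` and `GENDER`, the function names and the sample data are all hypothetical. The first function spells out filter, join and projection steps by hand, while the second only states constraints over variables and lets the shared variable act as the join.

```python
# Hypothetical sample relations, each a list of tuples.
AGE = [("alice", 28), ("bob", 33), ("chris", 40), ("david", 25)]
GENDER = [("alice", "f"), ("bob", "m"), ("chris", "m"), ("david", "m")]


def young_men_tuples(age, gender, limit=30):
    """Tuple-workflow style: explicit filter, join and projection steps."""
    young = [(p, a) for p, a in age if a < limit]            # filter
    gender_by_person = dict(gender)                           # build join index
    joined = [(p, a, gender_by_person[p])                     # join on person
              for p, a in young if p in gender_by_person]
    return [(p, a) for p, a, g in joined if g == "m"]         # filter + project


def young_men_logic(age, gender, limit=30):
    """Logic style: list the constraints; matching persons form the join."""
    return [
        (person, a)
        for person, a in age
        for other, g in gender
        if person == other and g == "m" and a < limit
    ]


print(young_men_tuples(AGE, GENDER))
print(young_men_logic(AGE, GENDER))
```

Both functions return the same rows. The second reads closer to a statement of what is wanted, which is the kind of abstraction the top layer of the slide refers to.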

Slide 10

Slide 10 text

Thank you!
Stefan Hübner, @sthuebner
http://knowyourmeme.com/memes/cereal-guy