Slide 1

Slide 1 text

Introduction to Cascalog • Stefan Hübner, Nokia Berlin • EuroClojure 2012

Slide 2

Slide 2 text

Nokia Maps • Address & POI Search • Category Search • Nearby Recommendations

Slide 3

Slide 3 text

Hadoop • Batch Processing • (Very) Large Scale • Distributed Filesystem • Parallel Computation • Fault-Tolerant

Slide 6

Slide 6 text

Hadoop MapReduce API • Tedious and verbose • Hard to test • Hard to refactor

Slide 8

Slide 8 text

Pig and Hive • Define their own query language • Custom operations in Java, Python, ... • Non-intuitive integration

Slide 9

Slide 9 text

(())

Slide 10

Slide 10 text

V'Ger's question (image: Star Trek I - "The Motion Picture", Paramount Pictures)

Slide 11

Slide 11 text

"Is that really all there is? Is there nothing more?" (image: Star Trek I - "The Motion Picture", Paramount Pictures)

Slide 12

Slide 12 text

Cascalog • Cascalog: variables and logic • Cascading: tuples, data workflows • Hadoop: key/value pairs, simple aggregation (layers listed from highest to lowest level of abstraction; slide (c) Nathan Marz, reproduced with permission)

Slide 13

Slide 13 text

Queries

(<-                     ; defines a query
  [?person]             ; output variables
  (age ?person ?age)    ; generator with two variables
  (< ?age 30))          ; filter
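The query above can be tried at a REPL. The following is a minimal sketch, assuming Cascalog 1.x is on the classpath. The `age` dataset is made-up sample data (an in-memory vector of tuples standing in for a real Hadoop source), and `??<-` is used here to define and run the query in one step, returning the result tuples as a sequence.

```clojure
(use 'cascalog.api)

;; Hypothetical sample data: an in-memory generator of [person age] tuples.
(def age
  [["alice" 28]
   ["bob"   33]
   ["chris" 40]
   ["david" 25]])

;; Define and execute the query, collecting the result tuples in memory.
;; Each result tuple has one field, ?person, as declared in the output vector.
(def young-people
  (??<- [?person]
        (age ?person ?age)   ; generator: binds ?person and ?age per tuple
        (< ?age 30)))        ; filter: keeps tuples where ?age is below 30
```

With this sample data the result holds one tuple each for alice and david; the order of the tuples is not guaranteed.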

Slide 14

Slide 14 text

Queries

(<- [?person]
    (age ?person ?age)    ; generator with two variables
    (< ?age 30))          ; filter

Predicates

Slide 15

Slide 15 text

Predicates • Functions • Filters • Aggregators • Generators
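The four kinds of predicates listed above can be sketched in two small queries. This is an illustrative example, assuming Cascalog 1.x with its `cascalog.ops` namespace available; the `age` dataset and the variable names `?next` and `?count` are made up for the example.

```clojure
(use 'cascalog.api)
(require '[cascalog.ops :as c])

;; Hypothetical sample data, used as a generator of [person age] tuples.
(def age
  [["alice" 28] ["bob" 33] ["chris" 40] ["david" 25]])

;; Generator, function, and filter in one query.
(def next-year
  (??<- [?person ?next]
        (age ?person ?age)     ; generator: emits tuples
        (inc ?age :> ?next)    ; function: binds a new output variable
        (< ?age 30)))          ; filter: keeps tuples for which it is true

;; Generator, filter, and aggregator: count the people under 30.
(def under-30
  (??<- [?count]
        (age _ ?age)           ; _ ignores the person field
        (< ?age 30)
        (c/count ?count)))     ; aggregator: one count over all kept tuples
```

Plain Clojure functions such as `inc` and `<` serve as functions and filters here: a predicate with an output variable after `:>` acts as a function, one without acts as a filter.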

Slide 17

Slide 17 text

Thank you! Stefan Hübner, @sthuebner http://knowyourmeme.com/memes/cereal-guy