

AppsFlyer
November 06, 2014

AppsFlyer presenting: Cascalog, MapReduce for The Code Craftsman

This presentation describes AppsFlyer's work with Hadoop in the Clojure production environment.



Transcript

  1. Why Cascalog?

    - We already know and love Clojure
    - Same tools: test in the REPL
    - Custom operations are ordinary functions (no UDFs)
  2. Relations and Tuples

    Relational Model [diagram: Relation 1 and Relation 2, each holding tuples t1, t2, t3, t4]
  3. Generators

    - Cascalog taps: Hadoop and local file systems
    - Clojure sequences: [["alice" 28] ["bob" 33] ["chris" 40] ["david" 25] ["emily" 25] ["george" 31]]
    - Cascalog queries: defined using <-
  4. The AppsFlyer flow: Data Collection

    - Components: Kafka, Secor, AWS S3
    - Continuously saves Kafka topics to HDFS/S3 according to a scheme
  5. The AppsFlyer flow: Data Processing

    - Lemur: (1) spins up a Hadoop cluster, (2) submits steps
    - Steps: (1) data processing (using Cascalog), (2) export of processed data (using Apache Sqoop) to PostgreSQL
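The points from slides 1 and 3 (a Clojure sequence used as a generator, a query defined with `<-`, and a custom operation that is just an ordinary function) can be sketched roughly as follows. This is a minimal illustration, not code from the talk: it assumes Cascalog 2.x is on the classpath, and the namespace `example.people`, the function `decade`, the query `under-30`, and the helper `run-under-30` are hypothetical names introduced here.

```clojure
(ns example.people
  (:require [cascalog.api :refer [<- ??-]]))

;; Generator: the Clojure sequence from slide 3, one [name age] tuple per person.
(def people
  [["alice" 28] ["bob" 33] ["chris" 40]
   ["david" 25] ["emily" 25] ["george" 31]])

;; Hypothetical custom operation: an ordinary Clojure function, no UDF wrapper.
(defn decade
  "Rounds an age down to the start of its decade, e.g. 28 becomes 20."
  [age]
  (* 10 (quot age 10)))

;; <- only defines the query; nothing is executed at this point.
(def under-30
  (<- [?name ?decade]
      (people ?name ?age)        ; bind each tuple from the generator
      (< ?age 30)                ; plain Clojure predicate used as a filter
      (decade ?age :> ?decade))) ; plain function used as a map operation

;; ??- executes the query and returns one result seq per query passed in.
(defn run-under-30 []
  (first (??- under-30)))
```

Calling `(run-under-30)` in the REPL should return the tuples for the people younger than 30; the order of the result tuples is not guaranteed. Swapping `people` for a Cascalog tap would leave the query body unchanged, which is the sense in which sequences and taps are both generators.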