AppsFlyer presenting: Cascalog, MapReduce for The Code Craftsman
AppsFlyer
November 06, 2014
Programming
This presentation describes AppsFlyer's work with Hadoop in a Clojure production environment.
Transcript
Cascalog
AppsFlyer: mobile app conversion tracking. We use Clojure; we used Python previously.
Challenges: user retention; cohort analysis; ~0.5 TB of data collected daily.
What is Cascalog? A Clojure DSL for writing Hadoop jobs, built on top of Cascading.
Clojure: a modern Lisp for the JVM.
Why Cascalog? We already know and love Clojure. Same tools: test in the REPL. Custom operations are ordinary functions (no UDFs).
Why Cascalog? Cascalog is Datalog. It fits our use cases well. Simple (once you know it).
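The point about custom operations being ordinary functions can be sketched as follows. This is a minimal illustration, assuming Cascalog 2.x is on the classpath and the query runs in local mode; the `decade` function and the `people-by-decade` wrapper are hypothetical names, not taken from the talk. The `age` data reuses the sample tuples from the Generators slide.

```clojure
;; Assumption: Cascalog 2.x is available on the classpath.
(require '[cascalog.api :refer :all])

;; Sample tuples from the Generators slide: [name age].
(def age [["alice" 28] ["bob" 33] ["chris" 40]
          ["david" 25] ["emily" 25] ["george" 31]])

;; A plain Clojure function (hypothetical). It can be tried in the REPL
;; on its own, without Hadoop or Cascalog being involved.
(defn decade [years]
  (* 10 (quot years 10)))

;; The same function used as an operation inside a query.
;; `:>` binds the function's return value to ?decade.
(defn people-by-decade []
  (??<- [?person ?decade]
        (age ?person ?years)
        (decade ?years :> ?decade)))
```

Calling `(decade 28)` in the REPL checks the function by itself; `(people-by-decade)` runs it over every tuple.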
Relations and Tuples (relational model). [Figure: Relation 1 and Relation 2, each containing tuples t1 to t4.]
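The relational model on this slide can be tried out with nothing but Clojure's standard `clojure.set` namespace. The sketch below is not Cascalog and is not from the talk; the `gender` relation and its values are invented for illustration.

```clojure
(require '[clojure.set :as set])

;; A relation is a set of tuples; here each tuple is a map of attribute -> value.
(def age #{{:person "alice" :age 28}
           {:person "bob"   :age 33}
           {:person "david" :age 25}})

;; Hypothetical second relation, invented for this example.
(def gender #{{:person "alice" :gender "f"}
              {:person "bob"   :gender "m"}})

;; Natural join: tuples are matched on the attribute both relations share (:person).
;; "david" has no match in `gender`, so he does not appear in the result.
(def joined (set/join age gender))

;; Selection keeps only the tuples that satisfy a predicate.
(def under-30 (set/select #(< (:age %) 30) joined))
```

The join key is never named explicitly: it follows from the attribute names the two relations have in common, which is the same idea Cascalog applies to shared logic variables.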
Query Anatomy: the order of predicates does not matter; grouping is implicit.
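A small query makes both properties concrete. This is a sketch under assumptions: Cascalog 2.x on the classpath (where the built-in aggregators live in `cascalog.logic.ops`; 1.x used `cascalog.ops`), local mode, and a hypothetical wrapper name `people-per-age`. The data is the sample from the Generators slide.

```clojure
;; Assumption: Cascalog 2.x namespaces.
(require '[cascalog.api :refer :all]
         '[cascalog.logic.ops :as c])

;; Sample tuples from the Generators slide: [name age].
(def age [["alice" 28] ["bob" 33] ["chris" 40]
          ["david" 25] ["emily" 25] ["george" 31]])

;; Count how many people share each age.
(defn people-per-age []
  (??<- [?years ?count]    ; output variables
        (c/count ?count)   ; aggregator, deliberately listed before the generator
        (age _ ?years)))   ; generator; `_` ignores the name field
```

There is no GROUP BY clause: because `?years` is an output variable that the aggregator does not produce, it becomes the grouping key. Swapping the two predicates leaves the result unchanged.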
Why Cascalog? Composition; joins are implicit.
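Both ideas can be shown in a few lines. The sketch assumes Cascalog 2.x in local mode; the `follows` relation and the names `young`, `ages-of-followers` and `young-followers` are hypothetical and not from the talk.

```clojure
;; Assumption: Cascalog 2.x is available on the classpath.
(require '[cascalog.api :refer :all])

;; Sample tuples from the Generators slide: [name age].
(def age [["alice" 28] ["bob" 33] ["chris" 40]
          ["david" 25] ["emily" 25] ["george" 31]])

;; Hypothetical relation invented for this example: [person followed-person].
(def follows [["alice" "bob"] ["bob" "david"] ["emily" "george"]])

;; Implicit join: ?person appears in both generators,
;; so the two sources are joined on that field.
(defn ages-of-followers []
  (??<- [?person ?years ?followed]
        (age ?person ?years)
        (follows ?person ?followed)))

;; Composition: a query defined with <- can be used as a generator
;; inside another query.
(def young
  (<- [?person]
      (age ?person ?years)
      (< ?years 30)))

(defn young-followers []
  (??<- [?person ?followed]
        (young ?person)
        (follows ?person ?followed)))
```

People without an entry in `follows` drop out of both results, since the implicit join behaves like an inner join.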
Generators: Cascalog taps (Hadoop and local file systems); Clojure sequences, e.g. [["alice" 28] ["bob" 33] ["chris" 40] ["david" 25] ["emily" 25] ["george" 31]]; Cascalog queries, defined using <-.
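As a rough sketch of how a Clojure sequence and a `<-` query fit together, assuming Cascalog 2.x in local mode: `<-` only defines a query, and a separate call such as `??-` runs it. The names `under-30` and `run-under-30` are hypothetical.

```clojure
;; Assumption: Cascalog 2.x is available on the classpath.
(require '[cascalog.api :refer :all])

;; Generator 1: a plain Clojure sequence of tuples (the data from the slide).
(def age [["alice" 28] ["bob" 33] ["chris" 40]
          ["david" 25] ["emily" 25] ["george" 31]])

;; Generator 2: a query defined with <-. Nothing is executed at this point.
(def under-30
  (<- [?person ?years]
      (age ?person ?years)
      (< ?years 30)))

;; ??- executes the given queries and returns one result seq per query.
(defn run-under-30 []
  (first (??- under-30)))
```

In a real job the in-memory sequence would typically be replaced by a tap, for example one created with `hfs-textline` for a path on HDFS or the local file system, and the results would be written to a sink with `?-` instead of being returned to the REPL.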
Demo
The AppsFlyer flow, data collection: Kafka -> Secor -> AWS S3. Secor continuously saves Kafka topics to HDFS/S3 according to a scheme.
The AppsFlyer flow, data processing: Lemur (1) spins up a Hadoop cluster and (2) submits steps. The steps are (1) data processing (using Cascalog) and (2) export of the processed data (using Apache Sqoop) to PostgreSQL.
Questions?