AppsFlyer
November 06, 2014
AppsFlyer presenting: Cascalog, MapReduce for The Code Craftsman
This presentation describes AppsFlyer's work with Hadoop in a Clojure production environment.
Transcript
Cascalog
AppsFlyer
- We use Clojure
- Used Python previously
- Mobile app conversion tracking
Challenges
- User retention
- Cohort analysis
- ~0.5TB of data collected daily
What is Cascalog?
- A Clojure DSL for writing Hadoop jobs
- On top of Cascading
Clojure
- Modern LISP for the JVM
Why Cascalog?
- We already know and love Clojure
- Same tools: test in the REPL
- Custom operations are ordinary functions (no UDFs)
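
The "ordinary functions" point can be sketched as follows. This is a minimal illustration, not code from the talk: it assumes Cascalog 2.x on the classpath, and the `people` data and the `decade` function are hypothetical names introduced here. The function is defined with plain `defn`, can be called directly in the REPL, and is then used as an operation inside a query.

```clojure
(ns example.custom-op
  (:require [cascalog.api :refer [??<-]]))

;; Hypothetical sample data: [name age] tuples in a plain Clojure vector.
(def people [["alice" 28] ["bob" 33] ["david" 25]])

;; An ordinary Clojure function. It can be tested in the REPL without Hadoop.
(defn decade [age]
  (* 10 (quot age 10)))

;; The same function used as a map operation inside a query.
;; ??<- defines the query, runs it, and returns the result tuples.
(defn people-by-decade []
  (??<- [?name ?decade]
        (people ?name ?age)
        (decade ?age :> ?decade)))
```

Calling `(decade 28)` in the REPL checks the logic by itself; calling `(people-by-decade)` runs the whole query in local mode.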
Why Cascalog?
- Cascalog is Datalog
- Fits our use cases well
- Simple (once you know it)
Relations and Tuples
[Figure: Relational Model. Relation 1 and Relation 2, each containing tuples t1 to t4]
Query Anatomy
- Order does not matter
- Grouping is implicit
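
The two properties above can be illustrated with a small sketch. It assumes Cascalog 2.x, where the aggregators live in `cascalog.logic.ops` (in 1.x the namespace is `cascalog.ops`); the query names and the `run` helper are hypothetical. Both queries list the same predicates in a different order, and neither contains an explicit group-by: `?age` is an output variable not produced by the aggregator, so the count is computed per age.

```clojure
(ns example.query-anatomy
  (:require [cascalog.api :refer [<- ??-]]
            [cascalog.logic.ops :as c]))

(def age [["alice" 28] ["bob" 33] ["chris" 40]
          ["david" 25] ["emily" 25] ["george" 31]])

;; Output variables first, then predicates: generator, filter, aggregator.
(def young-count-per-age
  (<- [?age ?count]
      (age _ ?age)
      (< ?age 30)
      (c/count ?count)))

;; The same predicates in a different order describe the same query.
(def young-count-per-age-reordered
  (<- [?age ?count]
      (< ?age 30)
      (c/count ?count)
      (age _ ?age)))

;; Hypothetical helper: execute one query and return its result tuples.
(defn run [query]
  (first (??- query)))
```

Running both with `(run young-count-per-age)` and `(run young-count-per-age-reordered)` should give the same set of tuples; tuple order in the result is not guaranteed.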
Why Cascalog? Composition
- Joins are implicit
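
Composition and implicit joins can be sketched like this, assuming Cascalog 2.x. The `gender` data and the query names are hypothetical. A query defined with `<-` is itself a generator, so it can be used inside another query, and reusing the variable `?person` in two generators is what expresses the join.

```clojure
(ns example.composition
  (:require [cascalog.api :refer [<- ??-]]))

(def age [["alice" 28] ["bob" 33] ["chris" 40]
          ["david" 25] ["emily" 25] ["george" 31]])

;; Hypothetical second relation: [name gender] tuples.
(def gender [["alice" "f"] ["bob" "m"] ["chris" "m"] ["emily" "f"]])

;; A query is just a value; nothing runs until it is executed.
(def young
  (<- [?person]
      (age ?person ?age)
      (< ?age 30)))

;; Composition: `young` is used as a generator.
;; Implicit join: ?person appears in both generators, so they are
;; joined on it (an inner join, since ?person is non-nullable).
(def young-with-gender
  (<- [?person ?gender]
      (young ?person)
      (gender ?person ?gender)))

;; Execute the composed query and return its tuples.
(defn run-young-with-gender []
  (first (??- young-with-gender)))
```

Because the join is an inner join, a person who is missing from `gender` (here "david") does not appear in the output.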
Generators
- Cascalog taps
  - Hadoop and local file systems
- Clojure sequences
  - [["alice" 28] ["bob" 33] ["chris" 40] ["david" 25] ["emily" 25] ["george" 31]]
- Cascalog queries
  - Defined using <-
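
The three kinds of generator listed above can be sketched side by side. This assumes Cascalog 2.x; the function and query names are hypothetical, and the file path is supplied by the caller rather than taken from the talk. `lfs-textline` reads a local file (`hfs-textline` is the Hadoop file system counterpart), and each line becomes a one-field tuple.

```clojure
(ns example.generators
  (:require [cascalog.api :refer [<- ??- lfs-textline]]
            [cascalog.logic.ops :as c]))

;; 1. A Clojure sequence as a generator.
(def age [["alice" 28] ["bob" 33] ["chris" 40]
          ["david" 25] ["emily" 25] ["george" 31]])

;; 2. A tap as a generator: count the lines of a local text file.
(defn line-count-query [path]
  (let [lines (lfs-textline path)]
    (<- [?count]
        (lines ?line)
        (c/count ?count))))

;; 3. A query (defined using <-) as a generator for another query.
(def adults
  (<- [?person ?age]
      (age ?person ?age)
      (>= ?age 30)))

(def adult-count
  (<- [?count]
      (adults _ _)
      (c/count ?count)))

;; Execute one query and return its result tuples.
(defn run [query]
  (first (??- query)))
```

All three are consumed the same way inside a query: the generator is applied to the variables that name its fields.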
Demo
The AppsFlyer flow: Data Collection
- Kafka -> Secor -> AWS S3
- Continuously saves Kafka topics to HDFS/S3 according to a scheme
The AppsFlyer flow: Data Processing
- Lemur
  - Spins up Hadoop cluster
  - Submit steps
- Step 1: Data processing (using Cascalog)
- Step 2: Export processed data (using Apache Sqoop) to Postgresql
Questions?