AppsFlyer presenting: Cascalog, MapReduce for The Code Craftsman
This presentation describes AppsFlyer's work with Hadoop in a Clojure production environment.
AppsFlyer
November 06, 2014
Transcript
Cascalog
AppsFlyer: mobile app conversion tracking. We use Clojure (we used Python previously).
Challenges: user retention, cohort analysis, ~0.5 TB of data collected daily.
What is Cascalog? A Clojure DSL for writing Hadoop jobs, built on top of Cascading.
Clojure: a modern Lisp for the JVM.
Why Cascalog? We already know and love Clojure. Same tools: test in the REPL. Custom operations are ordinary functions (no UDFs).
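To illustrate the "ordinary functions" point, here is a minimal sketch, not taken from the talk. It assumes Cascalog is on the classpath and that a plain `defn` can be used directly as a map operation inside a query. The `people` data and the `age-bracket` function are hypothetical names introduced for this example.

```clojure
(ns example.custom-op
  (:require [cascalog.api :refer :all]))

;; Hypothetical sample data: [name age] tuples.
(def people [["alice" 28] ["bob" 33]])

;; An ordinary Clojure function. It can be called and tested at the REPL
;; without starting any Hadoop job.
(defn age-bracket [age]
  (if (< age 30) "under-30" "30-plus"))

;; The same function used as an operation inside a query.
;; `:>` binds the function's return value to the output variable ?bracket.
;; `??<-` defines the query, runs it, and returns the result tuples.
(def brackets
  (??<- [?name ?bracket]
        (people ?name ?age)
        (age-bracket ?age :> ?bracket)))
```

Because `age-bracket` is just a function, `(age-bracket 28)` can be checked on its own before the query is ever run.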
Why Cascalog? Cascalog is Datalog. Fits our use cases well. Simple (once you know it).
Relations and Tuples (diagram: the relational model, showing Relation 1 and Relation 2, each holding tuples t1 to t4).
Query Anatomy: order does not matter; grouping is implicit.
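The two points above can be sketched as follows. This is an illustrative example rather than the query shown on the slide. It assumes Cascalog 2.x, where the built-in aggregators live in `cascalog.logic.ops` (in Cascalog 1.x the namespace is `cascalog.ops`), and it reuses the name/age data that appears later in the deck.

```clojure
(ns example.anatomy
  (:require [cascalog.api :refer :all]
            [cascalog.logic.ops :as c]))

(def people [["alice" 28] ["bob" 33] ["chris" 40]
             ["david" 25] ["emily" 25] ["george" 31]])

;; Anatomy: output variables first, then predicates.
;; The aggregator is deliberately listed before the generator to show
;; that predicate order does not matter.
;; There is no GROUP BY clause: because ?age is an output variable next
;; to an aggregator, the count is computed per distinct ?age.
(def age-counts
  (??<- [?age ?count]
        (c/count ?count)      ; aggregator
        (people _ ?age)))     ; generator; `_` ignores the name field
```

Since david and emily share the age 25, that group has a count of 2, while every other age has a count of 1.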
Why Cascalog? Composition; joins are implicit.
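A small sketch of both ideas, under the assumption that a query defined with `<-` can itself serve as a generator in another query. The `ages` and `cities` data sets and the city values are hypothetical and only serve the illustration.

```clojure
(ns example.joins
  (:require [cascalog.api :refer :all]))

;; Hypothetical data sets sharing a "name" field.
(def ages   [["alice" 28] ["bob" 33] ["david" 25]])
(def cities [["alice" "Tel Aviv"] ["bob" "Haifa"]])

;; Composition: a query is a value that can be named and reused.
(def young
  (<- [?name]
      (ages ?name ?age)
      (< ?age 30)))

;; Implicit join: ?name appears in both generators, so the two sources
;; are inner-joined on it. No explicit join keyword is needed.
(def young-with-city
  (??<- [?name ?city]
        (young ?name)
        (cities ?name ?city)))
```

Here david is under 30 but has no entry in `cities`, so the inner join drops him and only alice remains.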
Generators: Cascalog taps (Hadoop and local file systems); Clojure sequences, e.g. [["alice" 28] ["bob" 33] ["chris" 40] ["david" 25] ["emily" 25] ["george" 31]]; Cascalog queries, defined using <-.
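The three generator kinds listed above could be used roughly as follows. This is a hedged sketch, assuming Cascalog's `hfs-textline` tap and the `??-` executor behave as documented. The names `under-30` and `lines-from` are introduced here for illustration, and any path passed to `lines-from` is up to the caller.

```clojure
(ns example.generators
  (:require [cascalog.api :refer :all]))

;; 1. A Clojure sequence as a generator (the data from the slide).
(def people [["alice" 28] ["bob" 33] ["chris" 40]
             ["david" 25] ["emily" 25] ["george" 31]])

;; 2. A query defined with `<-`. Nothing runs yet; the query is just a
;;    value, and it can be used as a generator in other queries.
(def under-30
  (<- [?name ?age]
      (people ?name ?age)
      (< ?age 30)))

;; `??-` executes queries and returns one result sequence per query.
(def under-30-result (first (??- under-30)))

;; 3. A tap as a generator: each line of a text file becomes a 1-tuple.
;;    Defined as a function so that no file is touched until it is run.
(defn lines-from [path]
  (<- [?line]
      ((hfs-textline path) ?line)))
```

To print results instead of returning them, `(?- (stdout) under-30)` runs the same query against a stdout sink.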
Demo
The AppsFlyer flow, data collection: Kafka, Secor, AWS S3. Continuously saves Kafka topics to HDFS/S3 according to a scheme.
The AppsFlyer flow, data processing: Lemur (1) spins up a Hadoop cluster and (2) submits steps. The steps: (1) data processing (using Cascalog), (2) export of processed data (using Apache Sqoop) to PostgreSQL.
Questions?