AppsFlyer
November 06, 2014
AppsFlyer presenting: Cascalog, MapReduce for The Code Craftsman
This presentation describes AppsFlyer's work with Hadoop in a Clojure production environment.
Transcript
Cascalog
AppsFlyer: We use Clojure. Used Python previously. Mobile app conversion tracking.
Challenges: User retention. Cohort analysis. ~0.5TB of data collected daily.
What is Cascalog? A Clojure DSL for writing Hadoop jobs, built on top of Cascading.
Clojure: A modern Lisp for the JVM.
Why Cascalog? We already know and love Clojure. Same tools: test in the REPL. Custom operations are ordinary functions (no UDFs).
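The point about custom operations can be illustrated with a minimal sketch. It assumes Cascalog is on the classpath with Hadoop available in local mode. The `example.ops` namespace, the `decade` helper and the `people-by-decade` wrapper are hypothetical names made up for this illustration, not code from the talk.

```clojure
(ns example.ops
  (:require [cascalog.api :refer [??<-]]))

;; Small in-memory dataset of [name age] tuples.
(def age
  [["alice" 28] ["bob" 33] ["chris" 40]
   ["david" 25] ["emily" 25] ["george" 31]])

;; Hypothetical helper: an ordinary function, testable on its own in the REPL.
(defn decade [years]
  (* 10 (quot years 10)))

;; The same function used as an operation inside a query:
;; :> binds its return value to the output variable ?decade.
(defn people-by-decade []
  (set (??<- [?person ?decade]
             (age ?person ?age)
             (decade ?age :> ?decade))))
```

Because `decade` is a plain function, `(decade 28)` can be checked in the REPL before any query runs.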
Why Cascalog? Cascalog is Datalog. Fits our use cases well. Simple (once you know it).
Relations and Tuples: the relational model. [Figure: Relation 1 and Relation 2, each containing tuples t1, t2, t3, t4.]
Query Anatomy: Order does not matter. Grouping is implicit.
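A small sketch of both properties, assuming Cascalog 2.x (where the built-in aggregators live in `cascalog.logic.ops`; in 1.x the namespace is `cascalog.ops`) and Hadoop available in local mode. The `example.anatomy` namespace and the `people-per-age` wrapper are hypothetical names. The aggregator is listed before the generator on purpose, and no grouping clause is written: the query groups by `?age` because it is the only output variable not produced by an aggregator.

```clojure
(ns example.anatomy
  (:require [cascalog.api :refer [??<-]]
            [cascalog.logic.ops :as c]))

;; Small in-memory dataset of [name age] tuples.
(def age
  [["alice" 28] ["bob" 33] ["chris" 40]
   ["david" 25] ["emily" 25] ["george" 31]])

;; Count people per age. Predicate order is deliberately "backwards".
(defn people-per-age []
  (set (??<- [?age ?count]
             (c/count ?count)   ; aggregator
             (age _ ?age))))    ; generator; _ ignores the name field
```

The result is wrapped in a set because the order of returned tuples is not something to rely on.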
Why Cascalog? Composition. Joins are implicit.
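As a rough sketch of composition and implicit joins, the example below reuses one query as the input of another and joins it with a second relation simply by repeating the variable `?person`. The `gender` data, the `example.joins` namespace and the function names are assumptions made for this illustration. It assumes Cascalog on the classpath with Hadoop in local mode.

```clojure
(ns example.joins
  (:require [cascalog.api :refer [<- ??<-]]))

;; Small in-memory dataset of [name age] tuples.
(def age
  [["alice" 28] ["bob" 33] ["chris" 40]
   ["david" 25] ["emily" 25] ["george" 31]])

;; Hypothetical second relation of [name gender] tuples, not from the talk.
(def gender
  [["alice" "f"] ["bob" "m"] ["chris" "m"] ["emily" "f"]])

;; Composition: a query defined with <- can feed another query.
(def under-30
  (<- [?person ?age]
      (age ?person ?age)
      (< ?age 30)))

;; ?person appears in both predicates, so the relations are inner-joined on it.
(defn young-with-gender []
  (set (??<- [?person ?age ?gender]
             (under-30 ?person ?age)
             (gender ?person ?gender))))
```

"david" is under 30 but has no row in `gender`, so the inner join drops him.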
Generators: Cascalog taps (Hadoop and local file systems). Clojure sequences, e.g. [["alice" 28] ["bob" 33] ["chris" 40] ["david" 25] ["emily" 25] ["george" 31]]. Cascalog queries, defined using <-.
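Two of the three generator kinds can be shown with a short sketch: the Clojure sequence from the slide, and a query defined with `<-`. The `example.generators` namespace and the names `thirty-and-over` and `run-thirty-and-over` are hypothetical. The sketch assumes Cascalog on the classpath with Hadoop in local mode. File-system taps such as `hfs-textline` are left out so that the example needs no input files.

```clojure
(ns example.generators
  (:require [cascalog.api :refer [<- ??-]]))

;; Generator kind 1: a plain Clojure sequence of tuples (the slide's data).
(def age
  [["alice" 28] ["bob" 33] ["chris" 40]
   ["david" 25] ["emily" 25] ["george" 31]])

;; Generator kind 2: a query. <- only defines it; nothing runs yet.
(def thirty-and-over
  (<- [?person]
      (age ?person ?age)
      (>= ?age 30)))

;; ??- executes queries and returns one sequence of tuples per query.
(defn run-thirty-and-over []
  (set (first (??- thirty-and-over))))
```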
Demo
The AppsFlyer flow: Data Collection (Kafka, Secor, AWS S3). Continuously saves Kafka topics to HDFS/S3 according to a scheme.
The AppsFlyer flow: Data Processing. Lemur spins up a Hadoop cluster and submits steps: data processing (using Cascalog), then export of processed data (using Apache Sqoop) to PostgreSQL.
Questions?