AppsFlyer
November 06, 2014
Programming
AppsFlyer presenting: Cascalog, MapReduce for The Code Craftsman
This presentation describes AppsFlyer's work with Hadoop in its Clojure production environment.
Transcript
Cascalog
AppsFlyer: Mobile app conversion tracking. We use Clojure. Used Python previously.
Challenges: User retention. Cohort analysis. ~0.5 TB of data collected daily.
What is Cascalog? A Clojure DSL for writing Hadoop jobs, built on top of Cascading.
Clojure: A modern Lisp for the JVM.
Why Cascalog? We already know and love Clojure. Same tools: test in the REPL. Custom operations are ordinary functions (no UDFs).
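As a rough illustration of "custom operations are ordinary functions", the sketch below uses two plain defn functions inside a query. It assumes Cascalog 2.x with the Hadoop jars on the classpath (local mode); the helper names over-thirty? and shout are hypothetical, and the sample tuples are borrowed from the Generators slide later in the talk.

```clojure
;; Sketch only: assumes Cascalog 2.x and Hadoop available for local-mode runs.
(use 'cascalog.api)
(require '[clojure.string :as str])

;; Sample tuples borrowed from the talk's Generators slide.
(def people [["alice" 28] ["bob" 33] ["chris" 40]
             ["david" 25] ["emily" 25] ["george" 31]])

;; Hypothetical helpers: plain Clojure functions that can be tried in the REPL.
(defn over-thirty? [age] (> age 30))
(defn shout [name] (str/upper-case name))

;; A function used without output variables acts as a filter;
;; with :> its return value is bound to a new logic variable.
;; ??<- runs the query and returns the result tuples as a Clojure seq.
(def loud-names
  (??<- [?loud]
        (people ?person ?age)
        (over-thirty? ?age)
        (shout ?person :> ?loud)))
```

Because the helpers are ordinary functions, they can be unit-tested on their own, e.g. (over-thirty? 31), without running a Hadoop job.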
Why Cascalog? Cascalog is Datalog. Fits our use cases well. Simple (once you know it).
Relations and Tuples (relational model). [Figure: Relation 1 and Relation 2, each holding tuples t1 to t4.]
Query Anatomy: The order of predicates does not matter. Grouping is implicit.
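The two points on this slide can be sketched with a small aggregation. This is an illustration, not the query shown in the talk: it assumes Cascalog 2.x (where the built-in operations live in cascalog.logic.ops; 1.x used cascalog.ops) and reuses the sample tuples from the Generators slide.

```clojure
;; Sketch only: assumes Cascalog 2.x and Hadoop available for local-mode runs.
(use 'cascalog.api)
(require '[cascalog.logic.ops :as c]) ; assumed 2.x namespace; 1.x: cascalog.ops

(def people [["alice" 28] ["bob" 33] ["chris" 40]
             ["david" 25] ["emily" 25] ["george" 31]])

;; c/count is an aggregator. There is no explicit "group by":
;; the remaining output variable (?age) becomes the grouping key.
(def counts-a
  (??<- [?age ?count]
        (people _ ?age)
        (< ?age 30)
        (c/count ?count)))

;; Same query with the filter written before the generator.
;; Predicates are declarative constraints, so the result should not change.
(def counts-b
  (??<- [?age ?count]
        (< ?age 30)
        (people _ ?age)
        (c/count ?count)))
```

The order of the returned tuples is not guaranteed, so compare the two results as sets rather than as sequences.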
Why Cascalog? Composition. Joins are implicit.
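A minimal sketch of both ideas, assuming Cascalog 2.x in local mode: a query defined with <- is reused as a generator in another query (composition), and the two sources are joined simply because they share the variable ?person. The city relation and its values are hypothetical and exist only for this example.

```clojure
;; Sketch only: assumes Cascalog 2.x and Hadoop available for local-mode runs.
(use 'cascalog.api)

(def age [["alice" 28] ["bob" 33] ["chris" 40]
          ["david" 25] ["emily" 25] ["george" 31]])

;; Hypothetical second relation, invented for this illustration.
(def city [["alice" "tel-aviv"] ["bob" "haifa"]
           ["david" "tel-aviv"] ["harry" "eilat"]])

;; <- only defines a query; nothing runs yet.
(def under-30
  (<- [?person]
      (age ?person ?a)
      (< ?a 30)))

;; Composition: under-30 is used as a generator.
;; Implicit join: ?person appears in both generators, which makes this
;; an inner join on that field; no join keyword is needed.
(def young-with-city
  (??<- [?person ?city]
        (under-30 ?person)
        (city ?person ?city)))
```

Since the join is an inner join, people without a row in both relations (emily and harry here) are dropped from the result.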
Generators: Cascalog taps (Hadoop and local file systems). Clojure sequences, e.g. [["alice" 28] ["bob" 33] ["chris" 40] ["david" 25] ["emily" 25] ["george" 31]]. Cascalog queries, defined using <-.
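To make the three generator kinds concrete, here is a small sketch assuming Cascalog 2.x in local mode. It uses the slide's vector as an in-memory generator, defines a query with <-, and then executes it with ??-. The names age and under-30 are chosen for the example.

```clojure
;; Sketch only: assumes Cascalog 2.x and Hadoop available for local-mode runs.
(use 'cascalog.api)

;; A plain Clojure sequence of tuples works as a generator.
;; A file-backed tap such as (hfs-textline "some/path") could be used
;; instead; that path is a placeholder, not one from the talk.
(def age [["alice" 28] ["bob" 33] ["chris" 40]
          ["david" 25] ["emily" 25] ["george" 31]])

;; <- defines a query without executing it. The resulting query is
;; itself a generator and can feed other queries.
(def under-30
  (<- [?person ?age]
      (age ?person ?age)
      (< ?age 30)))

;; ??- executes one or more queries and returns one seq of tuples per
;; query, so the result for a single query is taken with first.
(def under-30-tuples (first (??- under-30)))
```

For interactive exploration, (?- (stdout) under-30) prints the tuples instead of returning them.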
Demo
The AppsFlyer flow, data collection: Kafka -> Secor -> AWS S3. Secor continuously saves Kafka topics to HDFS/S3 according to a scheme.
The AppsFlyer flow, data processing: Lemur (1) spins up a Hadoop cluster and (2) submits steps. Steps: (1) data processing (using Cascalog), (2) export of processed data to PostgreSQL (using Apache Sqoop).
Questions?