Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
AppsFlyer presenting: Cascalog, MapReduce for T...
Search
Sponsored
·
SiteGround - Reliable hosting with speed, security, and support you can count on.
→
AppsFlyer
November 06, 2014
Programming
0
960
AppsFlyer presenting: Cascalog, MapReduce for The Code Craftsman
This presentation describe AppsFlyer's work with Hadoop in the Clojure production environment.
AppsFlyer
November 06, 2014
Tweet
Share
More Decks by AppsFlyer
See All by AppsFlyer
Processing 15 Billion events a day without breaking the bank - ReversimX ILTechTalks
appsflyer
0
500
Journey to the Real-Time Analytics in Extreme Growth
appsflyer
0
310
10 Real problems & solutions in your build and deploy process
appsflyer
0
150
DevOps paradigm in R&D day-to-day
appsflyer
0
160
Building a Mobile Backend to Evolve
appsflyer
0
120
Ido Barkan
appsflyer
1
160
Sometimes, Druid is not the best solution for a business use case
appsflyer
1
440
Processing 8 Billion Daily Events in Real Time!
appsflyer
1
130
React Performance
appsflyer
1
230
Other Decks in Programming
See All in Programming
AIによるイベントストーミング図からのコード生成 / AI-powered code generation from Event Storming diagrams
nrslib
2
1.9k
カスタマーサクセス業務を変革したヘルススコアの実現と学び
_hummer0724
0
700
副作用をどこに置くか問題:オブジェクト指向で整理する設計判断ツリー
koxya
1
610
Raku Raku Notion 20260128
hareyakayuruyaka
0
180
Basic Architectures
denyspoltorak
0
680
AI によるインシデント初動調査の自動化を行う AI インシデントコマンダーを作った話
azukiazusa1
1
730
FOSDEM 2026: STUNMESH-go: Building P2P WireGuard Mesh Without Self-Hosted Infrastructure
tjjh89017
0
170
AIと一緒にレガシーに向き合ってみた
nyafunta9858
0
240
それ、本当に安全? ファイルアップロードで見落としがちなセキュリティリスクと対策
penpeen
7
3.9k
AI時代の認知負荷との向き合い方
optfit
0
160
Implementation Patterns
denyspoltorak
0
290
インターン生でもAuth0で認証基盤刷新が出来るのか
taku271
0
190
Featured
See All Featured
Deep Space Network (abreviated)
tonyrice
0
49
Bioeconomy Workshop: Dr. Julius Ecuru, Opportunities for a Bioeconomy in West Africa
akademiya2063
PRO
1
54
Intergalactic Javascript Robots from Outer Space
tanoku
273
27k
Producing Creativity
orderedlist
PRO
348
40k
Navigating Algorithm Shifts & AI Overviews - #SMXNext
aleyda
0
1.1k
How To Speak Unicorn (iThemes Webinar)
marktimemedia
1
380
Abbi's Birthday
coloredviolet
1
4.7k
Avoiding the “Bad Training, Faster” Trap in the Age of AI
tmiket
0
76
The Organizational Zoo: Understanding Human Behavior Agility Through Metaphoric Constructive Conversations (based on the works of Arthur Shelley, Ph.D)
kimpetersen
PRO
0
240
Dominate Local Search Results - an insider guide to GBP, reviews, and Local SEO
greggifford
PRO
0
78
Discover your Explorer Soul
emna__ayadi
2
1.1k
What’s in a name? Adding method to the madness
productmarketing
PRO
24
3.9k
Transcript
Cascalog
AppsFlyer We use Clojure Used python previously Mobile app conversion
tracking
Challenges User Retention Cohort analysis ~0.5TB of data collected daily
What is Cascalog? A Clojure DSL for writing Hadoop jobs
On top of Cascading
Clojure Modern LISP for the JVM
Why Cascalog? We already know and love Clojure Same tools
- test in the REPL Custom operations are ordinary functions (no UDFs)
Why Cascalog? Cascalog is Datalog Fits our use cases well
Simple (once you know it )
Relations and Tuples Relation 1 Relation 2 t 1 t
2 t 3 t 4 t 1 t 2 t 3 t 4 Relational Model
Query Anatomy
Query Anatomy Order does not matter Grouping is implicit
Why Cascalog? Composition
Why Cascalog? Composition Joins are implicit
Generators Cascalog taps Hadoop and local file systems Clojure sequences
[["alice" 28] ["bob" 33] ["chris" 40] ["david" 25] ["emily" 25] ["george" 31]] Cascalog queries Defined using <-
Demo
The AppsFlyer flow Kafka Secor AWS S3 Data Collection Continuously
saves Kafka topics to HDFS/S3 according to a scheme
The AppsFlyer flow Data Processing Lemur Spins up Hadoop cluster
Submit steps Data processing (using cascalog) Export processed data (using Apache Sqoop) Postgresql 1 2 1 2
Questions ?