AppsFlyer presenting: Cascalog, MapReduce for The Code Craftsman
This presentation describes AppsFlyer's work with Hadoop in a Clojure production environment.
AppsFlyer
November 06, 2014
Transcript
Cascalog
AppsFlyer: we use Clojure; used Python previously; mobile app conversion tracking
Challenges: user retention; cohort analysis; ~0.5 TB of data collected daily
What is Cascalog? A Clojure DSL for writing Hadoop jobs, on top of Cascading
Clojure: a modern Lisp for the JVM
Why Cascalog? We already know and love Clojure; same tools (test in the REPL); custom operations are ordinary functions (no UDFs)
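The "ordinary functions" point can be illustrated with a minimal sketch. It is not taken from the slides: it assumes Cascalog 2.x on the classpath, and the `decade` function and the sample data are hypothetical.

```clojure
(ns example.custom-ops
  (:require [cascalog.api :refer :all]))

;; Hypothetical in-memory data: [person age] tuples.
(def age [["alice" 28] ["bob" 33] ["chris" 40]])

;; An ordinary Clojure function. It can be tried in the REPL on its own,
;; e.g. (decade 33), without any Hadoop machinery.
(defn decade [years]
  (* 10 (quot years 10)))

;; The same function used as a map operation inside a query.
;; :> binds the function's result to the output variable ?decade.
(?<- (stdout)
     [?person ?decade]
     (age ?person ?age)
     (decade ?age :> ?decade))
```

Because `decade` is a plain top-level function, the same definition serves both unit tests and the query; no separate UDF class has to be written or registered.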
Why Cascalog? Cascalog is Datalog; fits our use cases well; simple (once you know it)
Relations and Tuples: [figure: Relation 1 and Relation 2, each holding tuples t1 to t4; the relational model]
Query Anatomy
Query Anatomy: order does not matter; grouping is implicit
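The query shown on this slide did not survive in the transcript. As a stand-in, here is a minimal sketch of the parts of a query, assuming Cascalog 2.x (where the built-in aggregators live in `cascalog.logic.ops`; in 1.x the namespace was `cascalog.ops`). The `follows` data is hypothetical.

```clojure
(ns example.query-anatomy
  (:require [cascalog.api :refer :all]
            [cascalog.logic.ops :as c]))

;; Hypothetical in-memory data: [follower followed] tuples.
(def follows [["alice" "bob"] ["alice" "chris"] ["bob" "chris"]])

(?<- (stdout)              ; sink: where the result tuples go
     [?person ?count]      ; output variables
     (c/count ?count)      ; aggregator predicate
     (follows ?person _))  ; generator predicate; _ ignores the second field
```

The aggregator is deliberately listed before the generator to show that predicate order does not matter. There is no explicit group-by: `?person` is an output variable that is not produced by the aggregator, so it becomes the grouping key, and the count is computed per person.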
Why Cascalog? Composition
Why Cascalog? Composition; joins are implicit
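Composition and implicit joins can be sketched as follows. This is an illustration under assumptions (Cascalog 2.x; the `age` and `gender` data and the `young` query are hypothetical), not the code from the talk.

```clojure
(ns example.composition
  (:require [cascalog.api :refer :all]))

;; Hypothetical in-memory data.
(def age    [["alice" 28] ["bob" 33] ["chris" 40]])
(def gender [["alice" "f"] ["bob" "m"] ["chris" "m"]])

;; <- only defines a query; nothing runs yet.
(def young
  (<- [?person]
      (age ?person ?age)
      (< ?age 30)))

;; Composition: the query `young` is used as a generator in another query.
;; Implicit join: ?person appears in both generators, so the two sources
;; are joined on that variable without any explicit join clause.
(?<- (stdout)
     [?person ?gender]
     (young ?person)
     (gender ?person ?gender))
```

Since `young` is just a value, it can be passed to functions, reused in several queries, and tested in isolation.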
Generators: Cascalog taps (Hadoop and local file systems); Clojure sequences, e.g. [["alice" 28] ["bob" 33] ["chris" 40] ["david" 25] ["emily" 25] ["george" 31]]; Cascalog queries (defined using <-)
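The three kinds of generators listed on this slide can be sketched side by side. The sequence is the one from the slide; everything else (the names `people`, `adults`, `lines-from`, and any path passed in) is an assumption for illustration, using Cascalog 2.x.

```clojure
(ns example.generators
  (:require [cascalog.api :refer :all]))

;; 1. A Clojure sequence as a generator (data as shown on the slide).
(def people [["alice" 28] ["bob" 33] ["chris" 40]
             ["david" 25] ["emily" 25] ["george" 31]])

;; 2. A query defined with <- is itself a generator.
(def adults
  (<- [?name]
      (people ?name ?age)
      (>= ?age 30)))

;; 3. A tap as a generator. hfs-textline reads lines through Hadoop's
;;    file system API; lfs-textline is the local file system counterpart.
;;    The path is supplied by the caller and is hypothetical.
(defn lines-from [path]
  (let [lines (hfs-textline path)]
    (<- [?line]
        (lines ?line))))

;; ?- executes an already defined query against a sink.
(?- (stdout) adults)
```

All three forms are interchangeable inside a query, which is what makes it practical to develop against small in-memory sequences in the REPL and switch to taps for production data.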
Demo
The AppsFlyer flow: Data Collection. Kafka, Secor, AWS S3. Secor continuously saves Kafka topics to HDFS/S3 according to a scheme
The AppsFlyer flow: Data Processing. Lemur: 1. spins up a Hadoop cluster; 2. submits steps. Steps: 1. data processing (using Cascalog); 2. export of processed data (using Apache Sqoop). PostgreSQL
Questions?