AppsFlyer
November 06, 2014
AppsFlyer presenting: Cascalog, MapReduce for The Code Craftsman
This presentation describes AppsFlyer's work with Hadoop in its Clojure production environment.
Transcript
Cascalog
AppsFlyer: We use Clojure (used Python previously). Mobile app conversion tracking.
Challenges: user retention; cohort analysis; ~0.5 TB of data collected daily.
What is Cascalog? A Clojure DSL for writing Hadoop jobs, built on top of Cascading.
Clojure: a modern Lisp for the JVM.
Why Cascalog? We already know and love Clojure. Same tools: test in the REPL. Custom operations are ordinary functions (no UDFs).
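
To make the "ordinary functions" point concrete, here is a minimal sketch that could be pasted into a REPL. It assumes Cascalog 2.x is on the classpath; the `decade` function and the `people` sample data are illustrative and not taken from AppsFlyer's code.

```clojure
;; Sketch only: assumes Cascalog 2.x is available on the classpath.
(require '[cascalog.api :refer [??<-]])

;; Sample [name age] tuples; a plain Clojure sequence can feed a query.
(def people
  [["alice" 28] ["bob" 33] ["chris" 40]
   ["david" 25] ["emily" 25] ["george" 31]])

;; An ordinary Clojure function (hypothetical helper), nothing Hadoop-specific.
(defn decade
  "Rounds an age down to the start of its decade, e.g. 28 -> 20."
  [age]
  (* 10 (quot age 10)))

;; Inside a query, `decade` behaves as a map operation:
;; it reads ?age and binds its result to ?decade via :>.
;; ??<- runs the query and returns the tuples as a Clojure sequence.
(def people-by-decade
  (??<- [?name ?decade]
        (people ?name ?age)
        (decade ?age :> ?decade)))
```

Because `decade` is just a function, it can be unit-tested on its own, for example `(decade 28)`, before it is ever used in a Hadoop job.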
Why Cascalog? Cascalog is Datalog. Fits our use cases well. Simple (once you know it).
Relations and Tuples (relational model). [Diagram: Relation 1 and Relation 2, each holding tuples t1 to t4]
Query Anatomy: Order does not matter. Grouping is implicit.
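
Both properties can be seen in one small query. The sketch below assumes Cascalog 2.x, where the built-in aggregators live in `cascalog.logic.ops` (the 1.x releases used `cascalog.ops`); the `people` data is illustrative.

```clojure
;; Sketch only: assumes Cascalog 2.x is available on the classpath.
(require '[cascalog.api :refer [??<-]]
         '[cascalog.logic.ops :as c])

;; Sample [name age] tuples.
(def people
  [["alice" 28] ["bob" 33] ["chris" 40]
   ["david" 25] ["emily" 25] ["george" 31]])

;; Count how many people share each age.
(def people-per-age
  (??<- [?age ?count]           ; output variables
        (c/count ?count)        ; aggregator, deliberately listed before the generator
        (people ?name ?age)))   ; generator binding ?name and ?age
```

There is no GROUP BY clause: because `?age` appears in the output next to an aggregator, Cascalog groups by it. The aggregator is written before the generator to show that predicate order is not significant.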
Why Cascalog? Composition. Joins are implicit.
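
A minimal sketch of both ideas, assuming Cascalog 2.x. The `follows` relation and the query names are hypothetical and only there to give the join something to work on.

```clojure
;; Sketch only: assumes Cascalog 2.x is available on the classpath.
(require '[cascalog.api :refer [<- ??<-]])

;; Sample [name age] tuples.
(def people
  [["alice" 28] ["bob" 33] ["chris" 40]
   ["david" 25] ["emily" 25] ["george" 31]])

;; Hypothetical second relation: [follower followed].
(def follows
  [["alice" "bob"] ["bob" "chris"] ["emily" "alice"]])

;; <- defines a query without running it, so it can be reused.
(def young
  (<- [?name]
      (people ?name ?age)
      (< ?age 30)))

;; Composition: `young` is used as a generator inside another query.
;; Implicit join: ?name appears in both generators, so tuples are
;; matched on it (an inner join) without any explicit join clause.
(def young-follows
  (??<- [?name ?followed]
        (young ?name)
        (follows ?name ?followed)))
```

People under 30 with no entry in `follows` drop out of the result, as expected for an inner join.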
Generators: Cascalog taps (Hadoop and local file systems). Clojure sequences, e.g. [["alice" 28] ["bob" 33] ["chris" 40] ["david" 25] ["emily" 25] ["george" 31]]. Cascalog queries (defined using <-).
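
As a rough illustration, the sequence from the slide can be queried directly in a REPL. This sketch assumes Cascalog 2.x; the var names `age` and `aged-25` are illustrative.

```clojure
;; Sketch only: assumes Cascalog 2.x is available on the classpath.
(require '[cascalog.api :refer [??<-]])

;; The Clojure sequence from the slide, used as a generator.
(def age
  [["alice" 28] ["bob" 33] ["chris" 40]
   ["david" 25] ["emily" 25] ["george" 31]])

;; Putting the constant 25 in the second position keeps only the
;; tuples whose second field equals 25.
(def aged-25
  (??<- [?name]
        (age ?name 25)))
```

The same query shape should work when the in-memory sequence is replaced by a tap such as one created with `hfs-textline`, although a text-line tap yields one field per line, so the line would need to be parsed into fields first.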
Demo
The AppsFlyer flow, data collection: Kafka -> Secor -> AWS S3. Secor continuously saves Kafka topics to HDFS/S3 according to a scheme.
The AppsFlyer flow, data processing: Lemur (1) spins up a Hadoop cluster and (2) submits steps. Steps: (1) data processing (using Cascalog); (2) export of processed data to PostgreSQL (using Apache Sqoop).
Questions?