AppsFlyer
November 06, 2014
AppsFlyer presenting: Cascalog, MapReduce for The Code Craftsman
This presentation describes AppsFlyer's work with Hadoop in its Clojure production environment.
Transcript
Cascalog
AppsFlyer: mobile app conversion tracking. We use Clojure (we previously used Python).
Challenges: user retention and cohort analysis, with ~0.5 TB of data collected daily.
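Cohort retention is the kind of metric these challenges refer to. The slide does not define it, so the sketch below assumes day-N retention: the fraction of an install-day cohort that is active exactly N days after installing. It is plain Clojure (not Cascalog), and the install and activity tuples are hypothetical, with days as integers:

```clojure
;; Hypothetical data: [user install-day] and [user active-day], days as integers.
(def installs [["u1" 0] ["u2" 0] ["u3" 1]])
(def activity [["u1" 1] ["u2" 0] ["u3" 2] ["u1" 3]])

(defn retention
  "Maps each install-day cohort to the fraction of its users who were
  active exactly n days after installing (assumed definition)."
  [installs activity n]
  (let [active  (set activity)              ; #{[user day] ...} for fast lookup
        cohorts (group-by second installs)] ; install-day -> [[user day] ...]
    (into (sorted-map)
          (for [[day members] cohorts]
            (let [retained (count (filter (fn [[user d]]
                                            (contains? active [user (+ d n)]))
                                          members))]
              [day (/ retained (count members))])))))

(retention installs activity 1) ;=> {0 1/2, 1 1}
```

At ~0.5 TB a day, the same logic needs a distributed job. The sketch only pins down the arithmetic.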
What is Cascalog? A Clojure DSL for writing Hadoop jobs, built on top of Cascading.
Clojure: a modern Lisp for the JVM.
Why Cascalog? We already know and love Clojure, and we keep the same tools: we test in the REPL. Custom operations are ordinary Clojure functions (no UDFs).
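Because a custom operation is just a function, it can be checked in the REPL before it ever touches a cluster. The example below is a minimal sketch: `age-bucket` is a hypothetical operation, and the predicate form shown in the comment is only an approximation of how it would be used in a Cascalog query:

```clojure
;; Hypothetical custom operation: map an age to a decade label.
;; It is an ordinary Clojure function, so it can be checked in the REPL
;; without Hadoop. In a Cascalog query it would appear as a predicate
;; along the lines of (age-bucket ?age :> ?bucket).
(defn age-bucket [age]
  (str (* 10 (quot age 10)) "s"))

(age-bucket 28) ;=> "20s"
(age-bucket 40) ;=> "40s"
```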
Why Cascalog? Cascalog is Datalog: it fits our use cases well, and it is simple (once you know it).
Relations and Tuples (Relational Model): [diagram: Relation 1 and Relation 2, each containing tuples t1, t2, t3 and t4]
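In the relational model, a relation is a set of tuples. Clojure's set literals make this concrete. The two relations below are hypothetical, and the snippet only illustrates set semantics, not how Cascalog stores data on Hadoop:

```clojure
;; Two hypothetical relations, each a set of tuples (vectors).
(def relation-1 #{["alice" 28] ["bob" 33]})
(def relation-2 #{["bob" 33] ["alice" 28]})

;; Tuple order inside a relation is irrelevant, so these are the same relation.
(= relation-1 relation-2)                 ;=> true
;; Adding a tuple that is already present changes nothing.
(count (conj relation-1 ["alice" 28]))    ;=> 2
;; Membership is the basic question a relation answers.
(contains? relation-1 ["bob" 33])         ;=> true
```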
Query Anatomy: the order of predicates does not matter, and grouping is implicit.
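Implicit grouping means that an output variable not produced by an aggregator becomes the grouping key. The sketch below reproduces that result in plain Clojure on the tuples from the generators slide. It is an emulation, not Cascalog itself. The query in the comment is approximate, with `c/count` standing for Cascalog's count aggregator:

```clojure
;; Data from the generators slide, as [person age] tuples.
(def age [["alice" 28] ["bob" 33] ["chris" 40] ["david" 25] ["emily" 25] ["george" 31]])

;; Plain-Clojure sketch of what a Cascalog query along the lines of
;;   (<- [?age ?count] (age _ ?age) (c/count ?count))
;; computes: ?age is not produced by the aggregator, so it implicitly
;; becomes the grouping key.
(defn count-by-age [tuples]
  (into (sorted-map)
        (for [[a group] (group-by second tuples)]
          [a (count group)])))

(count-by-age age) ;=> {25 2, 28 1, 31 1, 33 1, 40 1}
```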
Why Cascalog? Composition: joins are implicit.
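An implicit join happens when two predicates share a variable. Clojure's `clojure.set/join` performs the analogous natural join on maps that share a key, which makes the idea easy to try. The relations here are hypothetical, and the Cascalog query in the comment is approximate:

```clojure
(require '[clojure.set :as set])

;; Hypothetical relations as sets of maps. The shared key :person plays the
;; role of the shared variable ?person in a Cascalog query such as
;;   (<- [?person ?age ?gender] (age ?person ?age) (gender ?person ?gender))
(def age-rel    #{{:person "alice" :age 28} {:person "bob" :age 33}})
(def gender-rel #{{:person "alice" :gender "f"} {:person "chris" :gender "m"}})

;; With two arguments, clojure.set/join is a natural join on the shared keys.
;; Like an inner join, people missing from either relation drop out.
(= (set/join age-rel gender-rel)
   #{{:person "alice" :age 28 :gender "f"}}) ;=> true
```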
Generators: Cascalog taps (Hadoop and local file systems), Clojure sequences such as [["alice" 28] ["bob" 33] ["chris" 40] ["david" 25] ["emily" 25] ["george" 31]], and Cascalog queries (defined using <-).
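A Clojure sequence used as a generator binds one tuple at a time to the query's variables. The sketch below emulates a simple filtering query over the slide's data with a `for` comprehension. The function name is hypothetical, and the Cascalog query in the comment is an approximation:

```clojure
;; The in-memory tuples from the slide; a plain Clojure sequence like this
;; can serve as a generator.
(def age [["alice" 28] ["bob" 33] ["chris" 40] ["david" 25] ["emily" 25] ["george" 31]])

;; Plain-Clojure sketch of a query along the lines of
;;   (<- [?person] (age ?person ?age) (< ?age 30))
;; Each tuple binds ?person and ?age, and (< ?age 30) filters them.
(defn younger-than [tuples limit]
  (into (sorted-set)
        (for [[person a] tuples
              :when (< a limit)]
          person)))

(younger-than age 30) ;=> #{"alice" "david" "emily"}
```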
Demo
The AppsFlyer flow, data collection: Kafka, Secor, AWS S3. Secor continuously saves Kafka topics to HDFS/S3 according to a scheme.
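The slide does not show the scheme itself. As an illustration only, the sketch below builds a date-partitioned object key from a topic, a partition, a starting offset and a message timestamp. The key layout (prefix, topic, a `dt=` day directory, then partition and zero-padded offset) and the name `object-key` are assumptions, not Secor's actual configuration:

```clojure
(import '(java.time Instant ZoneOffset)
        '(java.time.format DateTimeFormatter))

;; Formats an Instant as a UTC calendar day, e.g. 2014-11-06.
(def day-fmt
  (.withZone (DateTimeFormatter/ofPattern "yyyy-MM-dd") ZoneOffset/UTC))

(defn object-key
  "Builds a hypothetical date-partitioned storage key for a batch of messages
  read from one Kafka topic partition, starting at the given offset."
  [prefix topic partition offset ^Instant ts]
  (format "%s/%s/dt=%s/%d_%010d"
          prefix topic (.format day-fmt ts) partition offset))

(object-key "raw" "installs" 3 12345 (Instant/parse "2014-11-06T10:15:30Z"))
;=> "raw/installs/dt=2014-11-06/3_0000012345"
```

Partitioning by day means a batch job can read just the directories for the dates it needs.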
The AppsFlyer flow, data processing: Lemur spins up a Hadoop cluster and submits steps: (1) data processing using Cascalog, then (2) export of the processed data to PostgreSQL using Apache Sqoop.
Questions?