Ido Barkan
Using Druid: Analyzing web access logs for 8 billion events per day
AppsFlyer
July 27, 2016
Transcript
Ido Barkan: Analyzing web access logs for 8 billion events per day
5xx Errors
AppsFlyer gets around 8B web events per day.
Micro Services Architecture Real Time Attr.
AWS Elastic load balancer Log entry
A log line
2016-02-06T12:51:54.201846Z Appsflyer-web 139.162.156.169:50435 10.10.8.90:6555 0.000021 0.001916 0.00001 200 200 780 2 "POST https://track.appsflyer.com:443/... HTTP/1.1" "Dalvik/1.6.0 (Linux; U; Android 4.4.4; SM-J110H Build/KTU84P)" ECDHE-RSA-AES128-SHA TLSv1
$ head -1 195229424603_elasticloadbalancing_eu-west-1_appsflyer-web_.log | wc -c
331
Total: 300-1500 bytes => sub sampling of 1/10 => 223 GB daily approx.
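The daily volume estimate can be roughly reproduced from the numbers on this slide. The sketch below assumes, since the deck does not say so, that the figure uses the 300-byte lower bound per line and binary gigabytes (2^30 bytes). Under those assumptions the lower bound comes out at about 223.5, which matches the quoted 223 GB.

```python
# Back-of-the-envelope check of the daily log volume quoted on the slide.
EVENTS_PER_DAY = 8_000_000_000   # "around 8B web events per day"
SAMPLE_EVERY = 10                # sub-sampling of 1/10
BYTES_PER_LINE_LOW = 300         # lower bound of a log line, from the slide
BYTES_PER_LINE_HIGH = 1500       # upper bound of a log line, from the slide
GIB = 2 ** 30                    # assumption: "GB" on the slide means 2^30 bytes


def daily_volume_gib(bytes_per_line):
    """Sampled daily volume in GiB for a given average line size."""
    sampled_lines = EVENTS_PER_DAY // SAMPLE_EVERY
    return sampled_lines * bytes_per_line / GIB


low = daily_volume_gib(BYTES_PER_LINE_LOW)
high = daily_volume_gib(BYTES_PER_LINE_HIGH)
print(f"sampled volume: {low:.1f} to {high:.1f} GiB per day")
```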
What was missing?
No transparency of incoming web requests.
• # of error (400 / 500) responses grouped by app
• # of events grouped by app
• # of events grouped by response code
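Questions like these map naturally onto Druid's native groupBy query. The following is a minimal sketch of how such a query body could be built. The data source name `elb_access_logs`, the dimension names `app_id` and `elb_status_code`, the rolled-up `count` metric, and the time interval are all assumptions for illustration; the deck does not show the actual schema.

```python
import json


def group_by_query(dimension, errors_only=False):
    """Build a Druid native groupBy query that counts events per dimension value."""
    query = {
        "queryType": "groupBy",
        "dataSource": "elb_access_logs",  # hypothetical data source name
        "granularity": "all",
        "intervals": ["2016-02-06T00:00:00Z/2016-02-07T00:00:00Z"],  # example day
        "dimensions": [dimension],
        # Assumes ingestion rolled events up into a metric called "count".
        "aggregations": [
            {"type": "longSum", "name": "events", "fieldName": "count"}
        ],
    }
    if errors_only:
        # Keep only 4xx and 5xx responses (hypothetical dimension name).
        query["filter"] = {
            "type": "regex",
            "dimension": "elb_status_code",
            "pattern": "^[45][0-9][0-9]$",
        }
    return query


errors_by_app = group_by_query("app_id", errors_only=True)
events_by_app = group_by_query("app_id")
events_by_code = group_by_query("elb_status_code")
print(json.dumps(errors_by_app, indent=2))
```

A body like this would typically be sent as an HTTP POST with a JSON content type to the broker's `/druid/v2/` endpoint.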
What wasn’t missing?
• No single-event granularity: only analytics
• No fancy enterprise features (role-based access, alerts etc.)
Possible solutions
1. Our own ELK: will not hold the volume
2. SaaS-based ELK (logz.io, loggly...): expensive and gives more than we want.
Data flow: Log to bucket → Trigger Lambda → Druid sink service
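The Lambda step of this flow could look roughly like the sketch below: it reads the S3 notification event and works out which log objects were just written. The `forward_to_druid_sink` function is a hypothetical placeholder, since the deck does not describe the sink service's interface; a real handler would fetch the object and hand its lines to that service.

```python
from urllib.parse import unquote_plus


def s3_objects_from_event(event):
    """Return (bucket, key) pairs for every S3 record in a notification event."""
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(s3["object"]["key"])
        objects.append((bucket, key))
    return objects


def forward_to_druid_sink(bucket, key):
    """Hypothetical placeholder for handing a log object to the Druid sink service."""
    return {"bucket": bucket, "key": key}


def handler(event, context):
    """Lambda entry point: forward every newly written log object."""
    return [forward_to_druid_sink(b, k) for b, k in s3_objects_from_event(event)]
```

Keeping the event parsing in its own function makes it easy to exercise locally with a hand-written event dictionary, without any AWS access.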
Druid configured naively
• 3 data nodes (historical+RT)
• 1 master (coordinator)
• 1 broker
• No data duplication
• 7d data retention
• Only 5 machines
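The "no data duplication" and "7d data retention" points above are the kind of thing Druid expresses through coordinator load and drop rules. This is a sketch of one rule set that would give that behavior, not the configuration actually used in the talk, which the deck does not show.

```python
import json

# Rules are evaluated in order; the first matching rule wins.
retention_rules = [
    # Keep the most recent 7 days loaded, with a single replica (no duplication).
    {
        "type": "loadByPeriod",
        "period": "P7D",
        "tieredReplicants": {"_default_tier": 1},
    },
    # Drop everything older than that.
    {"type": "dropForever"},
]

print(json.dumps(retention_rules, indent=2))
```

Rules like these are normally set per data source through the coordinator console or its rules HTTP API.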
Basic log processing
2016-02-06T12:51:54.201846Z Appsflyer-web 139.162.156.169:50435 10.10.8.90:6555 0.000021 0.001916 0.00001 200 200 780 2 "POST https://track.appsflyer.com:443/... HTTP/1.1" "Dalvik/1.6.0 (Linux; U; Android 4.4.4; SM-J110H Build/KTU84P)" ECDHE-RSA-AES128-SHA TLSv1
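As an illustration of what basic processing of such a line might involve, here is a minimal parsing sketch. The field names follow the classic ELB access log layout; the parser itself is an assumption and not the code used in the talk. It also assumes numeric status codes and byte counts, as in the sample line.

```python
import shlex

# Field order of a classic ELB access log entry.
ELB_FIELDS = [
    "timestamp", "elb", "client", "backend",
    "request_processing_time", "backend_processing_time",
    "response_processing_time", "elb_status_code", "backend_status_code",
    "received_bytes", "sent_bytes", "request", "user_agent",
    "ssl_cipher", "ssl_protocol",
]

SAMPLE_LINE = (
    '2016-02-06T12:51:54.201846Z Appsflyer-web 139.162.156.169:50435 '
    '10.10.8.90:6555 0.000021 0.001916 0.00001 200 200 780 2 '
    '"POST https://track.appsflyer.com:443/... HTTP/1.1" '
    '"Dalvik/1.6.0 (Linux; U; Android 4.4.4; SM-J110H Build/KTU84P)" '
    'ECDHE-RSA-AES128-SHA TLSv1'
)


def parse_elb_line(line):
    """Split one ELB log line into a dict of named fields."""
    # shlex keeps the double-quoted request and user agent as single tokens.
    values = shlex.split(line)
    if len(values) != len(ELB_FIELDS):
        raise ValueError(f"expected {len(ELB_FIELDS)} fields, got {len(values)}")
    record = dict(zip(ELB_FIELDS, values))
    for name in ("elb_status_code", "backend_status_code",
                 "received_bytes", "sent_bytes"):
        record[name] = int(record[name])
    # The request field has the form "METHOD URL PROTOCOL".
    record["method"], record["url"], record["protocol"] = record["request"].split(" ", 2)
    return record


record = parse_elb_line(SAMPLE_LINE)
print(record["method"], record["elb_status_code"], record["user_agent"])
```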
Demo! • druidquery • caravel
Thank you
[email protected]
Questions?
We are hiring!