Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Ido Barkan
Search
AppsFlyer
July 27, 2016
Technology
1
140
Ido Barkan
Using Druid Analyzing web access logs for 8 billion events per day
AppsFlyer
July 27, 2016
Tweet
Share
More Decks by AppsFlyer
See All by AppsFlyer
Processing 15 Billion events a day without breaking the bank - ReversimX ILTechTalks
appsflyer
0
490
Journey to the Real-Time Analytics in Extreme Growth
appsflyer
0
300
10 Real problems & solutions in your build and deploy process
appsflyer
0
140
DevOps paradigm in R&D day-to-day
appsflyer
0
140
Building a Mobile Backend to Evolve
appsflyer
0
100
Sometimes, Druid is not the best solution for a business use case
appsflyer
1
430
Processing 8 Billion Daily Events in Real Time!
appsflyer
1
120
React Performance
appsflyer
1
210
Real-time analytics with Druid at Appsflyer
appsflyer
0
360
Other Decks in Technology
See All in Technology
三視点LLMによる複数観点レビュー
mhlyc
0
230
「Chatwork」のEKS環境を支えるhelmfileを使用したマニフェスト管理術
hanayo04
1
400
スタックチャン家庭用アシスタントへの道
kanekoh
0
120
データ駆動経営の道しるべ:プロダクト開発指標の戦略的活用法
ham0215
1
100
データ戦略部門 紹介資料
sansan33
PRO
1
3.3k
アクセスピークを制するオートスケール再設計: 障害を乗り越えKEDAで実現したリソース管理の最適化
myamashii
1
670
Four Keysから始める信頼性の改善 - SRE NEXT 2025
ozakikota
0
420
Data Engineering Study#30 LT資料
tetsuroito
1
200
Delegating the chores of authenticating users to Keycloak
ahus1
0
190
SREのためのeBPF活用ステップアップガイド
egmc
2
1.3k
Introduction to Sansan for Engineers / エンジニア向け会社紹介
sansan33
PRO
5
39k
20250708オープンエンドな探索と知識発見
sakana_ai
PRO
4
1k
Featured
See All Featured
A better future with KSS
kneath
238
17k
Why You Should Never Use an ORM
jnunemaker
PRO
58
9.5k
Fashionably flexible responsive web design (full day workshop)
malarkey
407
66k
Speed Design
sergeychernyshev
32
1k
Git: the NoSQL Database
bkeepers
PRO
430
65k
How GitHub (no longer) Works
holman
314
140k
The World Runs on Bad Software
bkeepers
PRO
70
11k
Faster Mobile Websites
deanohume
308
31k
Docker and Python
trallard
45
3.5k
Building Better People: How to give real-time feedback that sticks.
wjessup
367
19k
Java REST API Framework Comparison - PWX 2021
mraible
31
8.7k
The Invisible Side of Design
smashingmag
301
51k
Transcript
Ido Barkan Analyzing web access logs for 8 billion events
per day
5xx Errors
Appsflyer gets around 8B web events per day.
Micro Services Architecture Real Time Attr.
AWS Elastic load balancer Log entry
A Log line 2016-02-06T12:51:54.201846Z Appsflyer-web 139.162.156.169:50435 10.10.8.90:6555 0.000021 0.001916 0.00001
200 200 780 2 "POST https://track. appsflyer.com:443/... HTTP/1.1" "Dalvik/1.6.0 (Linux; U; Android 4.4.4; SM-J110H Build/KTU84P)" ECDHE-RSA-AES128-SHA TLSv1 $ head -1 195229424603_elasticloadbalancing_eu-west-1_appsflyer-web_.log | wc -c 331 Total: 300-1500 bytes =>sub sampling of 1/10 => 223 GB daily approx.
What was missing? No transparency of incoming web requests. ?
# error (400 / 500) responses grouped by app ? # of events grouped by app ? # of events grouped by response code
What wasn’t missing? ! No single event granularity- only analytics
! No fancy enterprise features (role-based access, alerts etc.)
Possible solutions 1. Our own ELK- will not hold the
volume 2. SaaS based ELK (logz.io, loggly...)- expensive and gives more than we want.
Data flow Log to bucket Trigger Lambda Druid sink service
Druid configured naively • 3 data nodes (historical+RT) • 1
master (coordinator) • 1 broker • No data duplication • 7d data retention • Only 5 machines
Basic log processing 2016-02-06T12:51:54.201846Z Appsflyer-web 139.162.156.169:50435 10.10.8.90:6555 0.000021 0.001916 0.00001
200 200 780 2 "POST https://track.appsflyer.com:443/... HTTP/1.1" "Dalvik/1.6.0 (Linux; U; Android 4.4.4; SM-J110H Build/KTU84P)" ECDHE-RSA-AES128-SHA TLSv1
Demo! • druidquery • caravel
Thank you
[email protected]
Questions?
Thank you
[email protected]
We are hiring!