How to scale a Logging Infrastructure
Paul Stack
June 03, 2015
Logging infrastructure using ELK + Kafka
Transcript
How do you scale a logging infrastructure to accept a billion messages a day?
Paul Stack
http://twitter.com/stack72
mail: [email protected]
About Me
• Infrastructure Engineer for a cool startup :)
• Reformed ASP.NET / C# Developer
• DevOps Extremist
• Conference Junkie
Background
The project was to replace the legacy ‘logging solution’
Iteration 0: A developer created a single box with ELK all in one jar
Time to make it production ready now
Iteration 1: Using Redis as the input mechanism for Logstash
Enter Apache Kafka
“Kafka is a distributed publish-subscribe messaging system that is designed to be fast, scalable, and durable”
Source: Cloudera Blog
Introduction to Kafka
• Kafka is made up of ‘topics’, ‘producers’, ‘consumers’ and ‘brokers’
• Communication is via TCP
• Backed by ZooKeeper
Kafka Topics
Source: http://kafka.apache.org/documentation.html
Kafka Producers
• Producers are responsible for choosing which topic to publish data to
• The producer is also responsible for choosing which partition to write to
• Partition choice can be round robin or driven by a partition function
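The two partition strategies named on this slide can be sketched in a few lines. This is a minimal illustration, not Kafka client code: the function names are hypothetical, and `zlib.crc32` is only a stand-in for whatever hash a real client library uses.

```python
import itertools
import zlib


def round_robin_partitioner(num_partitions):
    """Return a chooser that cycles through partitions and ignores the key."""
    counter = itertools.count()

    def choose(_key=None):
        return next(counter) % num_partitions

    return choose


def key_hash_partitioner(num_partitions):
    """Return a chooser that always maps the same key to the same partition."""

    def choose(key):
        # crc32 is stable across runs (Python's built-in hash() of str is not).
        # It is an assumption for this sketch, not the hash Kafka clients use.
        return zlib.crc32(key.encode("utf-8")) % num_partitions

    return choose


rr = round_robin_partitioner(3)
print([rr() for _ in range(5)])

by_host = key_hash_partitioner(3)
print(by_host("web-01") == by_host("web-01"))
```

Round robin spreads load evenly but gives no grouping. A key-based function keeps all messages for one key (for example one host) in a single partition, which matters once ordering comes into play.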
Kafka Consumers
• Consumption can be done via:
  • queuing
  • pub-sub
Kafka Consumers
• Kafka consumer group
• Strong ordering
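A small simulation may help show how consumer groups give both queuing and pub-sub. It assumes each partition is handed to exactly one consumer within a group, so ordering holds per partition. The topic contents, group names, and consumer names are all hypothetical.

```python
from collections import defaultdict


def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer in the group (round robin)."""
    assignment = defaultdict(list)
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return dict(assignment)


def deliver(topic, groups):
    """Simulate delivery.

    topic:  {partition_id: [messages in order]}
    groups: {group_name: [consumer names]}
    Returns {group_name: {consumer: [messages]}}.
    """
    result = {}
    for group, consumers in groups.items():
        assignment = assign_partitions(sorted(topic), consumers)
        result[group] = {
            consumer: [msg for part in parts for msg in topic[part]]
            for consumer, parts in assignment.items()
        }
    return result


topic = {0: ["a1", "a2"], 1: ["b1", "b2"]}
groups = {
    "indexers": ["logstash-1", "logstash-2"],  # two consumers share the work
    "archiver": ["s3-writer"],                 # a second group sees everything
}
for group, per_consumer in deliver(topic, groups).items():
    print(group, per_consumer)
```

Within one group the messages are split between consumers (queuing). Every group receives the full stream independently (pub-sub). Order is preserved inside a partition, not across the whole topic.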
https://github.com/opentable/puppet-exhibitor
Iteration 2: Introduction of Kafka
Iteration 3: Further ‘Improvements’ to the cluster layout
The Numbers
• Logs kept in ES for 30 days, then archived
• 12 billion documents active in ES
• ES space was about 25-30 TB in EBS volumes
• Average doc size ~1.2 KB
• V-Day 2015: ~750M docs collected without failure
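A quick back-of-the-envelope check ties these figures together. The inputs come from the slides. The decimal units (1 TB = 1e9 KB) and the reading of the gap between raw and stored size are assumptions; the slides do not break down how the 25-30 TB is made up.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Target from the talk title: a billion messages a day.
target_per_day = 1_000_000_000
avg_rate = target_per_day / SECONDS_PER_DAY  # average messages per second

# 12 billion active documents over a 30-day retention window.
active_docs = 12_000_000_000
retention_days = 30
avg_docs_per_day = active_docs / retention_days

# Raw payload size, assuming decimal units: 1 TB = 1e9 KB.
avg_doc_kb = 1.2
raw_tb = active_docs * avg_doc_kb / 1e9

print(f"average rate for 1B/day: {avg_rate:,.0f} msg/s")
print(f"average docs per day in ES: {avg_docs_per_day:,.0f}")
print(f"raw document payload: {raw_tb:.1f} TB")
print(f"reported 25-30 TB is {25 / raw_tb:.1f}x to {30 / raw_tb:.1f}x the raw payload")
```

The raw payload comes to roughly 14.4 TB, so the reported 25-30 TB is about twice that. Replicas and index overhead would be one plausible explanation, though the slides do not say. The average is also only an average: peak rates would be higher.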
What about metrics and monitoring?
Monitoring - Nagios
• Alerts on:
  • ES cluster
  • ZooKeeper (zK) and Kafka nodes
  • Logstash / Redis nodes
https://github.com/stack72/nagios-elasticsearch
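As a rough idea of what an ES cluster alert can look like, the sketch below maps the `status` field of an Elasticsearch cluster health response (green, yellow, red) onto the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). It is not taken from the plugin linked above. The function name, the sample payload, and the choice to treat yellow as WARNING are assumptions.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# Assumed mapping; some teams treat yellow as OK on single-node clusters.
STATUS_TO_EXIT = {"green": OK, "yellow": WARNING, "red": CRITICAL}


def check_cluster_health(health):
    """Turn a parsed cluster health response into (exit_code, message)."""
    status = health.get("status")
    code = STATUS_TO_EXIT.get(status, UNKNOWN)
    return code, f"{LABELS[code]} - cluster status is {status}"


# Hypothetical payload; a real check would fetch it from the cluster health API.
sample = {"cluster_name": "logging", "status": "yellow", "number_of_nodes": 3}
code, message = check_cluster_health(sample)
print(code, message)
```

A real plugin would print the message and then call `sys.exit(code)` so Nagios can pick up the state.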
Metrics - Kafka Offset Monitor
https://github.com/opentable/KafkaOffsetMonitor
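The core number an offset monitor tracks is consumer lag: the newest offset in each partition minus the offset the consumer group has committed. The sketch below computes it from two plain dictionaries. The offsets are made-up sample values, and how they are collected in practice is left out.

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Lag per partition = newest offset in the log - group's committed offset.

    A partition with no committed offset is treated as fully unread (an
    assumption of this sketch).
    """
    return {
        partition: end - committed_offsets.get(partition, 0)
        for partition, end in log_end_offsets.items()
    }


# Hypothetical offsets for a three-partition topic.
log_end = {0: 1500, 1: 1480, 2: 1510}
committed = {0: 1500, 1: 1400, 2: 1210}

lag = consumer_lag(log_end, committed)
print(lag)
print("total lag:", sum(lag.values()))
```

Lag that keeps growing means the consumers (here, the indexing tier) are not keeping up with producers, which is the signal to add consumers or partitions.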
Metrics - ElasticSearch
Visibility Rocks!
So what would I do differently?
Questions?
Paul Stack @stack72