Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
How to scale a Logging Infrastructure
Search
Paul Stack
June 03, 2015
Technology
0
200
How to scale a Logging Infrastructure
Logging infrastructure using ELK + Kafka
Paul Stack
June 03, 2015
Tweet
Share
More Decks by Paul Stack
See All by Paul Stack
Infrastructure as Software
stack72
0
92
Mirror, Mirror on the way, what is the vainest metric of them all?
stack72
1
2.4k
Continuously Delivering Infrastructure to the Cloud
stack72
0
220
DevOops 2016
stack72
0
130
The Quest for Infrastructure Management 2.0
stack72
0
170
The Biggest Trick Consultants Ever Pulled was Telling The World Continuous Delivery is Easy
stack72
1
150
The Transition from Product to Infrastructure
stack72
0
86
Continuous Delivery - the missing parts
stack72
0
1k
Windows: Having its ass kicked by puppet and powershell
stack72
0
160
Other Decks in Technology
See All in Technology
Google系サービスで文字起こしから勝手にカレンダーを埋めるエージェントを作った話
risatube
0
190
visionOS 開発向けの MCP / Skills をつくり続けることで XR の探究と学習を最大化
karad
1
840
楽しく学ぼう!ネットワーク入門
shotashiratori
1
480
生成AI活用でQAエンジニアにどのような仕事が生まれるか/Support Required of QA Engineers for Generative AI
goyoki
1
270
Mitigating geopolitical risks with local-first software and atproto
ept
0
120
スケールアップ企業でQA組織が機能し続けるための組織設計と仕組み〜ボトムアップとトップダウンを両輪としたアプローチ〜
tarappo
1
170
Sansanでの認証基盤内製化と移行
sansantech
PRO
0
590
AlloyDB 奮闘記
hatappi
0
160
身体を持ったパーソナルAIエージェントの 可能性を探る開発
yokomachi
1
130
形式手法特論:SMT ソルバで解く認可ポリシの静的解析 #kernelvm / Kernel VM Study Tsukuba No3
ytaka23
1
580
Go標準パッケージのI/O処理をながめる
matumoto
0
230
AI実装による「レビューボトルネック」を解消する仕様駆動開発(SDD)/ ai-sdd-review-bottleneck
rakus_dev
0
160
Featured
See All Featured
Learning to Love Humans: Emotional Interface Design
aarron
275
41k
Ethics towards AI in product and experience design
skipperchong
2
230
Leo the Paperboy
mayatellez
4
1.5k
A brief & incomplete history of UX Design for the World Wide Web: 1989–2019
jct
1
320
Pawsitive SEO: Lessons from My Dog (and Many Mistakes) on Thriving as a Consultant in the Age of AI
davidcarrasco
0
89
Evolution of real-time – Irina Nazarova, EuRuKo, 2024
irinanazarova
9
1.2k
How STYLIGHT went responsive
nonsquared
100
6k
Building the Perfect Custom Keyboard
takai
2
710
Become a Pro
speakerdeck
PRO
31
5.9k
State of Search Keynote: SEO is Dead Long Live SEO
ryanjones
0
160
Breaking role norms: Why Content Design is so much more than writing copy - Taylor Woolridge
uxyall
0
210
コードの90%をAIが書く世界で何が待っているのか / What awaits us in a world where 90% of the code is written by AI
rkaga
60
43k
Transcript
How do you scale a logging infrastructure to accept a
billion messages a day? Paul Stack http://twitter.com/stack72 mail:
[email protected]
About Me Infrastructure Engineer for a cool startup :) Reformed
ASP.NET / C# Developer DevOps Extremist Conference Junkie
Background Project was to replace the legacy ‘logging solution’
Iteration 0: A Developer created a single box with the
ELK all in 1 jar
Time to make it production ready now
None
Iteration 1: Using Redis as the input mechanism for LogStash
None
None
Enter Apache Kafka
“Kafka is a distributed publish- subscribe messaging system that is
designed to be fast, scalable, and durable” Source: Cloudera Blog
Introduction to Kafka • Kafka is made up of ‘topics’,
‘producers’, ‘consumers’ and ‘brokers’ • Communication is via TCP • Backed by Zookeeper
Kafka Topics Source: http://kafka.apache.org/documentation.html
Kafka Producers • Producers are responsible to chose what topic
to publish data to • The producer is responsible for choosing a partition to write to • Can be handled round robin or partition functions
Kafka Consumers • Consumption can be done via: • queuing
• pub-sub
Kafka Consumers • Kafka consumer group • Strong ordering
Kafka Consumers • Strong ordering
https://github.com/opentable/puppet-exhibitor
None
Iteration 2 Introduction of Kafka
None
None
Iteration 3 Further ‘Improvements’ to the cluster layout
None
The Numbers • Logs kept in ES for 30 days
then archived • 12 billion documents active in ES • ES space was about 25 - 30TB in EBS volumes • Average Doc Size ~ 1.2KB • V-Day 2015: ~750M docs collected without failure
What about metrics and monitoring?
Monitoring - Nagios • Alerts on • ES Cluster •
zK and Kafka Nodes • Logstash / Redis nodes
None
https://github.com/stack72/nagios-elasticsearch
Metrics - Kafka Offset Monitor
https://github.com/opentable/KafkaOffsetMonitor
Metrics - ElasticSearch
None
None
None
Visibility Rocks!
None
So what would I do differently?
Questions?
Paul Stack @stack72