How to scale a Logging Infrastructure
Paul Stack
June 03, 2015
Logging infrastructure using ELK + Kafka
Transcript
How do you scale a logging infrastructure to accept a billion messages a day?
Paul Stack
http://twitter.com/stack72
mail: [email protected]
About Me
• Infrastructure Engineer for a cool startup :)
• Reformed ASP.NET / C# Developer
• DevOps Extremist
• Conference Junkie
Background
• The project was to replace the legacy ‘logging solution’
Iteration 0: A developer created a single box with ELK all in one jar
Time to make it production ready now
Iteration 1: Using Redis as the input mechanism for LogStash
Enter Apache Kafka
“Kafka is a distributed publish-subscribe messaging system that is designed to be fast, scalable, and durable”
Source: Cloudera Blog
Introduction to Kafka
• Kafka is made up of ‘topics’, ‘producers’, ‘consumers’ and ‘brokers’
• Communication is via TCP
• Backed by ZooKeeper
Kafka Topics
Source: http://kafka.apache.org/documentation.html
Kafka Producers
• Producers are responsible for choosing which topic to publish data to
• The producer is also responsible for choosing which partition to write to
• Partition choice can be handled round robin or by a partition function
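The two partition-selection strategies named on this slide can be sketched in a few lines. This is an illustration only, not Kafka client code: the function names are hypothetical, and `zlib.crc32` is an assumed stand-in for whatever hash a real producer library uses.

```python
import itertools
import zlib


def round_robin_partitioner(num_partitions):
    """Return a chooser that cycles through partitions 0..num_partitions-1."""
    counter = itertools.cycle(range(num_partitions))
    return lambda _key=None: next(counter)


def keyed_partitioner(num_partitions):
    """Return a chooser that always maps the same key to the same partition."""
    def choose(key):
        # crc32 is used only because it is stable across runs; it is an
        # illustrative choice, not the hash Kafka clients use.
        return zlib.crc32(key.encode("utf-8")) % num_partitions
    return choose


rr = round_robin_partitioner(3)
print([rr() for _ in range(5)])  # [0, 1, 2, 0, 1]

by_host = keyed_partitioner(3)
print(by_host("web-01") == by_host("web-01"))  # True
```

Round robin spreads load evenly, while a keyed function keeps all messages for one key (here a hypothetical host name) in a single partition, which matters for the ordering guarantees covered on the consumer slides.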
Kafka Consumers
• Consumption can be done via:
  • queuing
  • pub-sub
Kafka Consumers
• Kafka consumer group
• Strong ordering
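How consumer groups give both queuing and pub-sub can be shown with a small simulation. Within one group each partition is read by exactly one consumer, so the group shares the work like a queue; every group receives every partition, so separate groups behave like pub-sub subscribers. The group and consumer names below are hypothetical, and the round-robin assignment is a simplification of the strategies a real Kafka client offers.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over the consumers of one group, round robin."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2, 3]
# Two independent groups: both see all four partitions (pub-sub),
# while members inside a group split them (queuing).
groups = {
    "indexers": ["logstash-a", "logstash-b"],
    "archiver": ["archiver-1"],
}
for name, members in groups.items():
    print(name, assign_partitions(partitions, members))
# indexers {'logstash-a': [0, 2], 'logstash-b': [1, 3]}
# archiver {'archiver-1': [0, 1, 2, 3]}
```

Because a partition has a single reader per group, messages within that partition are consumed in the order they were written; ordering across partitions is not guaranteed.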
https://github.com/opentable/puppet-exhibitor
Iteration 2: Introduction of Kafka
Iteration 3: Further ‘Improvements’ to the cluster layout
The Numbers
• Logs kept in ES for 30 days, then archived
• 12 billion documents active in ES
• ES space was about 25-30 TB in EBS volumes
• Average doc size ~1.2 KB
• V-Day 2015: ~750M docs collected without failure
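A quick back-of-the-envelope check ties these figures together. The sketch below uses only the numbers from the slide plus the billion-messages-a-day target from the title; it assumes decimal units (1 TB = 10^9 KB) and an even spread of documents across the 30-day window, neither of which the deck states.

```python
DOCS_ACTIVE = 12e9        # documents active in ES (from the slide)
AVG_DOC_KB = 1.2          # average document size in KB (from the slide)
RETENTION_DAYS = 30       # days kept in ES before archiving (from the slide)
TARGET_PER_DAY = 1e9      # "a billion messages a day" (from the title)

# Assumption: decimal units, so 1 TB = 1e9 KB.
raw_tb = DOCS_ACTIVE * AVG_DOC_KB / 1e9
# Assumption: documents arrive evenly over the retention window.
docs_per_day = DOCS_ACTIVE / RETENTION_DAYS
target_per_sec = TARGET_PER_DAY / 86_400

print(f"raw document volume: {raw_tb:.1f} TB")                  # 14.4 TB
print(f"average ingest: {docs_per_day / 1e6:.0f}M docs/day")    # 400M docs/day
print(f"target rate: {target_per_sec:,.0f} msgs/sec")           # 11,574 msgs/sec
```

The raw volume comes out at roughly half of the 25-30 TB reported on disk; the deck does not break down where the remainder goes.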
What about metrics and monitoring?
Monitoring - Nagios
• Alerts on:
  • ES cluster
  • ZooKeeper (zK) and Kafka nodes
  • Logstash / Redis nodes
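As a rough idea of what an ES cluster alert can look like, here is a minimal Nagios-style check. It is a sketch, not the plugin the talk links to: it relies on the standard Nagios exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and the Elasticsearch `_cluster/health` endpoint, and the mapping of yellow to WARNING and red to CRITICAL is an assumed policy. The function names and the command-line argument are hypothetical.

```python
import json
import sys
import urllib.request

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = ["OK", "WARNING", "CRITICAL", "UNKNOWN"]


def status_to_exit_code(status):
    """Map an ES cluster status string to a Nagios exit code (assumed policy)."""
    return {"green": OK, "yellow": WARNING, "red": CRITICAL}.get(status, UNKNOWN)


def check_cluster(base_url, timeout=5):
    """Query <base_url>/_cluster/health and return (exit_code, message)."""
    try:
        url = f"{base_url}/_cluster/health"
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = json.load(resp).get("status")
    except (OSError, ValueError) as exc:
        # Network failures and malformed JSON both end up as UNKNOWN.
        return UNKNOWN, f"UNKNOWN - {exc}"
    code = status_to_exit_code(status)
    return code, f"{LABELS[code]} - cluster status is {status}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python check_es.py http://localhost:9200
    exit_code, message = check_cluster(sys.argv[1])
    print(message)
    sys.exit(exit_code)
```

Nagios reads the first line of output as the status text and the process exit code as the state, which is why the check returns both.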
https://github.com/stack72/nagios-elasticsearch
Metrics - Kafka Offset Monitor
https://github.com/opentable/KafkaOffsetMonitor
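The key number an offset monitor tracks is consumer lag: how far a consumer group's committed offset trails the latest offset written to each partition. A minimal sketch of that calculation follows; the offsets are made-up sample values, and treating a partition with no commit as offset 0 is an assumption made for illustration.

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Per-partition lag: messages written but not yet consumed by the group."""
    return {
        partition: end - committed_offsets.get(partition, 0)
        for partition, end in log_end_offsets.items()
    }


# Hypothetical sample offsets for a three-partition topic.
log_end = {0: 1_500, 1: 1_480, 2: 1_510}
committed = {0: 1_500, 1: 1_200, 2: 1_505}

lag = consumer_lag(log_end, committed)
print(lag)                # {0: 0, 1: 280, 2: 5}
print(sum(lag.values()))  # 285
```

A lag that keeps growing on one partition suggests the consumer reading it cannot keep up, which is the signal to add consumers or investigate the downstream indexer.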
Metrics - ElasticSearch
Visibility Rocks!
So what would I do differently?
Questions?
Paul Stack @stack72