Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
How to scale a Logging Infrastructure
Search
Paul Stack
June 03, 2015
Technology
0
180
How to scale a Logging Infrastructure
Logging infrastructure using ELK + Kafka
Paul Stack
June 03, 2015
Tweet
Share
More Decks by Paul Stack
See All by Paul Stack
Infrastructure as Software
stack72
0
71
Mirror, Mirror on the way, what is the vainest metric of them all?
stack72
1
2.3k
Continuously Delivering Infrastructure to the Cloud
stack72
0
190
DevOops 2016
stack72
0
120
The Quest for Infrastructure Management 2.0
stack72
0
140
The Biggest Trick Consultants Ever Pulled was Telling The World Continuous Delivery is Easy
stack72
1
120
The Transition from Product to Infrastructure
stack72
0
61
Continuous Delivery - the missing parts
stack72
0
950
Windows: Having its ass kicked by puppet and powershell
stack72
0
130
Other Decks in Technology
See All in Technology
“日本一のM&A企業”を支える、少人数SREの効率化戦略 / SRE NEXT 2025
genda
1
280
AIエージェントが書くのなら直接CloudFormationを書かせればいいじゃないですか何故AWS CDKを使う必要があるのさ
watany
19
7.6k
P2P通信の標準化 WebRTCを知ろう
faithandbrave
1
420
振り返りTransit Gateway ~VPCをいい感じでつなげるために~
masakiokuda
4
210
マルチプロダクト環境におけるSREの役割 / SRE NEXT 2025 lunch session
sugamasao
1
760
ソフトウェアQAがハードウェアの人になったの
mineo_matsuya
3
220
AIでテストプロセス自動化に挑戦する
sakatakazunori
1
550
Deep Security Conference 2025:生成AI時代のセキュリティ監視 /dsc2025-genai-secmon
mizutani
4
3k
QuickSight SPICE の効果的な運用戦略~S3 + Athena 構成での実践ノウハウ~/quicksight-spice-s3-athena-best-practices
emiki
0
290
セキュアなAI活用のためのLiteLLMの可能性
tk3fftk
1
360
対話型音声AIアプリケーションの信頼性向上の取り組み
ivry_presentationmaterials
3
1.1k
All About Sansan – for New Global Engineers
sansan33
PRO
1
1.2k
Featured
See All Featured
Fantastic passwords and where to find them - at NoRuKo
philnash
51
3.3k
GitHub's CSS Performance
jonrohan
1031
460k
What's in a price? How to price your products and services
michaelherold
246
12k
Docker and Python
trallard
45
3.5k
[Rails World 2023 - Day 1 Closing Keynote] - The Magic of Rails
eileencodes
35
2.4k
Thoughts on Productivity
jonyablonski
69
4.7k
XXLCSS - How to scale CSS and keep your sanity
sugarenia
248
1.3M
The Cost Of JavaScript in 2023
addyosmani
51
8.6k
Into the Great Unknown - MozCon
thekraken
40
1.9k
The Cult of Friendly URLs
andyhume
79
6.5k
Connecting the Dots Between Site Speed, User Experience & Your Business [WebExpo 2025]
tammyeverts
8
340
Visualizing Your Data: Incorporating Mongo into Loggly Infrastructure
mongodb
47
9.6k
Transcript
How do you scale a logging infrastructure to accept a
billion messages a day? Paul Stack http://twitter.com/stack72 mail: paul@paulstack.co.uk
About Me Infrastructure Engineer for a cool startup :) Reformed
ASP.NET / C# Developer DevOps Extremist Conference Junkie
Background Project was to replace the legacy ‘logging solution’
Iteration 0: A Developer created a single box with the
ELK all in 1 jar
Time to make it production ready now
None
Iteration 1: Using Redis as the input mechanism for LogStash
None
None
Enter Apache Kafka
“Kafka is a distributed publish- subscribe messaging system that is
designed to be fast, scalable, and durable” Source: Cloudera Blog
Introduction to Kafka • Kafka is made up of ‘topics’,
‘producers’, ‘consumers’ and ‘brokers’ • Communication is via TCP • Backed by Zookeeper
Kafka Topics Source: http://kafka.apache.org/documentation.html
Kafka Producers • Producers are responsible to chose what topic
to publish data to • The producer is responsible for choosing a partition to write to • Can be handled round robin or partition functions
Kafka Consumers • Consumption can be done via: • queuing
• pub-sub
Kafka Consumers • Kafka consumer group • Strong ordering
Kafka Consumers • Strong ordering
https://github.com/opentable/puppet-exhibitor
None
Iteration 2 Introduction of Kafka
None
None
Iteration 3 Further ‘Improvements’ to the cluster layout
None
The Numbers • Logs kept in ES for 30 days
then archived • 12 billion documents active in ES • ES space was about 25 - 30TB in EBS volumes • Average Doc Size ~ 1.2KB • V-Day 2015: ~750M docs collected without failure
What about metrics and monitoring?
Monitoring - Nagios • Alerts on • ES Cluster •
zK and Kafka Nodes • Logstash / Redis nodes
None
https://github.com/stack72/nagios-elasticsearch
Metrics - Kafka Offset Monitor
https://github.com/opentable/KafkaOffsetMonitor
Metrics - ElasticSearch
None
None
None
Visibility Rocks!
None
So what would I do differently?
Questions?
Paul Stack @stack72