Lock in $30 Savings on PRO—Offer Ends Soon! ⏳
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
How to scale a Logging Infrastructure
Search
Paul Stack
June 03, 2015
Technology
0
190
How to scale a Logging Infrastructure
Logging infrastructure using ELK + Kafka
Paul Stack
June 03, 2015
Tweet
Share
More Decks by Paul Stack
See All by Paul Stack
Infrastructure as Software
stack72
0
82
Mirror, Mirror on the way, what is the vainest metric of them all?
stack72
1
2.4k
Continuously Delivering Infrastructure to the Cloud
stack72
0
210
DevOops 2016
stack72
0
130
The Quest for Infrastructure Management 2.0
stack72
0
160
The Biggest Trick Consultants Ever Pulled was Telling The World Continuous Delivery is Easy
stack72
1
130
The Transition from Product to Infrastructure
stack72
0
75
Continuous Delivery - the missing parts
stack72
0
980
Windows: Having its ass kicked by puppet and powershell
stack72
0
150
Other Decks in Technology
See All in Technology
小さな判断で育つ、大きな意思決定力 / 20251204 Takahiro Kinjo
shift_evolve
PRO
1
580
20251209_WAKECareer_生成AIを活用した設計・開発プロセス
syobochim
5
1.3k
生成AI・AIエージェント時代、データサイエンティストは何をする人なのか?そして、今学生であるあなたは何を学ぶべきか?
kuri8ive
2
2.1k
Karate+Database RiderによるAPI自動テスト導入工数をCline+GitLab MCPを使って2割削減を目指す! / 20251206 Kazuki Takahashi
shift_evolve
PRO
1
470
AWS re:Invent 2025で見たGrafana最新機能の紹介
hamadakoji
0
130
今からでも間に合う!速習Devin入門とその活用方法
ismk
1
430
意外とあった SQL Server 関連アップデート + Database Savings Plans
stknohg
PRO
0
290
[CMU-DB-2025FALL] Apache Fluss - A Streaming Storage for Real-Time Lakehouse
jark
0
110
pmconf2025 - 他社事例を"自社仕様化"する技術_iRAFT法
daichi_yamashita
0
780
コミューンのデータ分析AIエージェント「Community Sage」の紹介
fufufukakaka
0
440
プロダクトマネジメントの分業が生む「デリバリーの渋滞」を解消するTPMの越境
recruitengineers
PRO
3
720
Haskell を武器にして挑む競技プログラミング ─ 操作的思考から意味モデル思考へ
naoya
0
200
Featured
See All Featured
Templates, Plugins, & Blocks: Oh My! Creating the theme that thinks of everything
marktimemedia
31
2.6k
Become a Pro
speakerdeck
PRO
31
5.7k
RailsConf 2023
tenderlove
30
1.3k
JavaScript: Past, Present, and Future - NDC Porto 2020
reverentgeek
52
5.7k
Cheating the UX When There Is Nothing More to Optimize - PixelPioneers
stephaniewalter
285
14k
"I'm Feeling Lucky" - Building Great Search Experiences for Today's Users (#IAC19)
danielanewman
231
22k
A Modern Web Designer's Workflow
chriscoyier
698
190k
How STYLIGHT went responsive
nonsquared
100
5.9k
Understanding Cognitive Biases in Performance Measurement
bluesmoon
32
2.7k
Design and Strategy: How to Deal with People Who Don’t "Get" Design
morganepeng
132
19k
Art, The Web, and Tiny UX
lynnandtonic
303
21k
Improving Core Web Vitals using Speculation Rules API
sergeychernyshev
21
1.3k
Transcript
How do you scale a logging infrastructure to accept a
billion messages a day? Paul Stack http://twitter.com/stack72 mail:
[email protected]
About Me Infrastructure Engineer for a cool startup :) Reformed
ASP.NET / C# Developer DevOps Extremist Conference Junkie
Background Project was to replace the legacy ‘logging solution’
Iteration 0: A Developer created a single box with the
ELK all in 1 jar
Time to make it production ready now
None
Iteration 1: Using Redis as the input mechanism for LogStash
None
None
Enter Apache Kafka
“Kafka is a distributed publish- subscribe messaging system that is
designed to be fast, scalable, and durable” Source: Cloudera Blog
Introduction to Kafka • Kafka is made up of ‘topics’,
‘producers’, ‘consumers’ and ‘brokers’ • Communication is via TCP • Backed by Zookeeper
Kafka Topics Source: http://kafka.apache.org/documentation.html
Kafka Producers • Producers are responsible to chose what topic
to publish data to • The producer is responsible for choosing a partition to write to • Can be handled round robin or partition functions
Kafka Consumers • Consumption can be done via: • queuing
• pub-sub
Kafka Consumers • Kafka consumer group • Strong ordering
Kafka Consumers • Strong ordering
https://github.com/opentable/puppet-exhibitor
None
Iteration 2 Introduction of Kafka
None
None
Iteration 3 Further ‘Improvements’ to the cluster layout
None
The Numbers • Logs kept in ES for 30 days
then archived • 12 billion documents active in ES • ES space was about 25 - 30TB in EBS volumes • Average Doc Size ~ 1.2KB • V-Day 2015: ~750M docs collected without failure
What about metrics and monitoring?
Monitoring - Nagios • Alerts on • ES Cluster •
zK and Kafka Nodes • Logstash / Redis nodes
None
https://github.com/stack72/nagios-elasticsearch
Metrics - Kafka Offset Monitor
https://github.com/opentable/KafkaOffsetMonitor
Metrics - ElasticSearch
None
None
None
Visibility Rocks!
None
So what would I do differently?
Questions?
Paul Stack @stack72