Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
How to scale a Logging Infrastructure
Search
Paul Stack
June 03, 2015
Technology
0
190
How to scale a Logging Infrastructure
Logging infrastructure using ELK + Kafka
Paul Stack
June 03, 2015
Tweet
Share
More Decks by Paul Stack
See All by Paul Stack
Infrastructure as Software
stack72
0
75
Mirror, Mirror on the way, what is the vainest metric of them all?
stack72
1
2.3k
Continuously Delivering Infrastructure to the Cloud
stack72
0
200
DevOops 2016
stack72
0
130
The Quest for Infrastructure Management 2.0
stack72
0
150
The Biggest Trick Consultants Ever Pulled was Telling The World Continuous Delivery is Easy
stack72
1
130
The Transition from Product to Infrastructure
stack72
0
66
Continuous Delivery - the missing parts
stack72
0
970
Windows: Having its ass kicked by puppet and powershell
stack72
0
140
Other Decks in Technology
See All in Technology
CREが作る自己解決サイクルSlackワークフローに組み込んだAIによる社内ヘルプデスク改革 #cre_meetup
bengo4com
0
340
20251027_マルチエージェントとは
almondo_event
1
440
Biz職でもDifyでできる! 「触らないAIワークフロー」を実現する方法
igarashikana
7
3.5k
アウトプットから始めるOSSコントリビューション 〜eslint-plugin-vueの場合〜 #vuefes
bengo4com
3
1.8k
現場の壁を乗り越えて、 「計装注入」が拓く オブザーバビリティ / Beyond the Field Barriers: Instrumentation Injection and the Future of Observability
aoto
PRO
1
610
頭部ふわふわ浄酔器
uyupun
0
110
入院医療費算定業務をAIで支援する:包括医療費支払い制度とDPCコーディング (公開版)
hagino3000
0
110
AWS re:Invent 2025事前勉強会資料 / AWS re:Invent 2025 pre study meetup
kinunori
0
340
AIでデータ活用を加速させる取り組み / Leveraging AI to accelerate data utilization
okiyuki99
1
480
混合雲環境整合異質工作流程工具運行關鍵業務 Job 的經驗分享
yaosiang
0
190
OCIjp_Oracle AI World_Recap
shinpy
1
180
ゼロコード計装導入後のカスタム計装でさらに可観測性を高めよう
sansantech
PRO
1
460
Featured
See All Featured
The MySQL Ecosystem @ GitHub 2015
samlambert
251
13k
Fantastic passwords and where to find them - at NoRuKo
philnash
52
3.5k
How to train your dragon (web standard)
notwaldorf
97
6.3k
Embracing the Ebb and Flow
colly
88
4.9k
Into the Great Unknown - MozCon
thekraken
40
2.1k
Creating an realtime collaboration tool: Agile Flush - .NET Oxford
marcduiker
34
2.3k
Rails Girls Zürich Keynote
gr2m
95
14k
The Cost Of JavaScript in 2023
addyosmani
55
9.1k
Code Review Best Practice
trishagee
72
19k
個人開発の失敗を避けるイケてる考え方 / tips for indie hackers
panda_program
116
20k
BBQ
matthewcrist
89
9.9k
The Language of Interfaces
destraynor
162
25k
Transcript
How do you scale a logging infrastructure to accept a
billion messages a day? Paul Stack http://twitter.com/stack72 mail:
[email protected]
About Me Infrastructure Engineer for a cool startup :) Reformed
ASP.NET / C# Developer DevOps Extremist Conference Junkie
Background Project was to replace the legacy ‘logging solution’
Iteration 0: A Developer created a single box with the
ELK all in 1 jar
Time to make it production ready now
None
Iteration 1: Using Redis as the input mechanism for LogStash
None
None
Enter Apache Kafka
“Kafka is a distributed publish- subscribe messaging system that is
designed to be fast, scalable, and durable” Source: Cloudera Blog
Introduction to Kafka • Kafka is made up of ‘topics’,
‘producers’, ‘consumers’ and ‘brokers’ • Communication is via TCP • Backed by Zookeeper
Kafka Topics Source: http://kafka.apache.org/documentation.html
Kafka Producers • Producers are responsible to chose what topic
to publish data to • The producer is responsible for choosing a partition to write to • Can be handled round robin or partition functions
Kafka Consumers • Consumption can be done via: • queuing
• pub-sub
Kafka Consumers • Kafka consumer group • Strong ordering
Kafka Consumers • Strong ordering
https://github.com/opentable/puppet-exhibitor
None
Iteration 2 Introduction of Kafka
None
None
Iteration 3 Further ‘Improvements’ to the cluster layout
None
The Numbers • Logs kept in ES for 30 days
then archived • 12 billion documents active in ES • ES space was about 25 - 30TB in EBS volumes • Average Doc Size ~ 1.2KB • V-Day 2015: ~750M docs collected without failure
What about metrics and monitoring?
Monitoring - Nagios • Alerts on • ES Cluster •
zK and Kafka Nodes • Logstash / Redis nodes
None
https://github.com/stack72/nagios-elasticsearch
Metrics - Kafka Offset Monitor
https://github.com/opentable/KafkaOffsetMonitor
Metrics - ElasticSearch
None
None
None
Visibility Rocks!
None
So what would I do differently?
Questions?
Paul Stack @stack72