Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
How to scale a Logging Infrastructure
Search
Sponsored
·
Ship Features Fearlessly
Turn features on and off without deploys. Used by thousands of Ruby developers.
→
Paul Stack
June 03, 2015
Technology
0
200
How to scale a Logging Infrastructure
Logging infrastructure using ELK + Kafka
Paul Stack
June 03, 2015
Tweet
Share
More Decks by Paul Stack
See All by Paul Stack
Infrastructure as Software
stack72
0
88
Mirror, Mirror on the way, what is the vainest metric of them all?
stack72
1
2.4k
Continuously Delivering Infrastructure to the Cloud
stack72
0
220
DevOops 2016
stack72
0
130
The Quest for Infrastructure Management 2.0
stack72
0
160
The Biggest Trick Consultants Ever Pulled was Telling The World Continuous Delivery is Easy
stack72
1
140
The Transition from Product to Infrastructure
stack72
0
81
Continuous Delivery - the missing parts
stack72
0
990
Windows: Having its ass kicked by puppet and powershell
stack72
0
150
Other Decks in Technology
See All in Technology
SREのプラクティスを用いた3領域同時 マネジメントへの挑戦 〜SRE・情シス・セキュリティを統合した チーム運営術〜
coconala_engineer
2
620
今日から始めるAmazon Bedrock AgentCore
har1101
4
400
GitHub Issue Templates + Coding Agentで簡単みんなでIaC/Easy IaC for Everyone with GitHub Issue Templates + Coding Agent
aeonpeople
1
200
データ民主化のための LLM 活用状況と課題紹介(IVRy の場合)
wxyzzz
2
690
Digitization部 紹介資料
sansan33
PRO
1
6.8k
こんなところでも(地味に)活躍するImage Modeさんを知ってるかい?- Image Mode for OpenShift -
tsukaman
0
120
Context Engineeringが企業で不可欠になる理由
hirosatogamo
PRO
3
530
予期せぬコストの急増を障害のように扱う――「コスト版ポストモーテム」の導入とその後の改善
muziyoshiz
1
1.7k
AI駆動PjMの理想像 と現在地 -実践例を添えて-
masahiro_okamura
1
110
Tebiki Engineering Team Deck
tebiki
0
24k
データの整合性を保ちたいだけなんだ
shoheimitani
8
3.1k
CDK対応したAWS DevOps Agentを試そう_20260201
masakiokuda
1
230
Featured
See All Featured
Designing Experiences People Love
moore
144
24k
Improving Core Web Vitals using Speculation Rules API
sergeychernyshev
21
1.4k
Why Your Marketing Sucks and What You Can Do About It - Sophie Logan
marketingsoph
0
72
Mobile First: as difficult as doing things right
swwweet
225
10k
Dominate Local Search Results - an insider guide to GBP, reviews, and Local SEO
greggifford
PRO
0
77
The Invisible Side of Design
smashingmag
302
51k
Building Flexible Design Systems
yeseniaperezcruz
330
40k
Ruling the World: When Life Gets Gamed
codingconduct
0
140
The Impact of AI in SEO - AI Overviews June 2024 Edition
aleyda
5
730
SERP Conf. Vienna - Web Accessibility: Optimizing for Inclusivity and SEO
sarafernandez
1
1.3k
Tell your own story through comics
letsgokoyo
1
810
Jess Joyce - The Pitfalls of Following Frameworks
techseoconnect
PRO
1
64
Transcript
How do you scale a logging infrastructure to accept a
billion messages a day? Paul Stack http://twitter.com/stack72 mail:
[email protected]
About Me Infrastructure Engineer for a cool startup :) Reformed
ASP.NET / C# Developer DevOps Extremist Conference Junkie
Background Project was to replace the legacy ‘logging solution’
Iteration 0: A Developer created a single box with the
ELK all in 1 jar
Time to make it production ready now
None
Iteration 1: Using Redis as the input mechanism for LogStash
None
None
Enter Apache Kafka
“Kafka is a distributed publish- subscribe messaging system that is
designed to be fast, scalable, and durable” Source: Cloudera Blog
Introduction to Kafka • Kafka is made up of ‘topics’,
‘producers’, ‘consumers’ and ‘brokers’ • Communication is via TCP • Backed by Zookeeper
Kafka Topics Source: http://kafka.apache.org/documentation.html
Kafka Producers • Producers are responsible to chose what topic
to publish data to • The producer is responsible for choosing a partition to write to • Can be handled round robin or partition functions
Kafka Consumers • Consumption can be done via: • queuing
• pub-sub
Kafka Consumers • Kafka consumer group • Strong ordering
Kafka Consumers • Strong ordering
https://github.com/opentable/puppet-exhibitor
None
Iteration 2 Introduction of Kafka
None
None
Iteration 3 Further ‘Improvements’ to the cluster layout
None
The Numbers • Logs kept in ES for 30 days
then archived • 12 billion documents active in ES • ES space was about 25 - 30TB in EBS volumes • Average Doc Size ~ 1.2KB • V-Day 2015: ~750M docs collected without failure
What about metrics and monitoring?
Monitoring - Nagios • Alerts on • ES Cluster •
zK and Kafka Nodes • Logstash / Redis nodes
None
https://github.com/stack72/nagios-elasticsearch
Metrics - Kafka Offset Monitor
https://github.com/opentable/KafkaOffsetMonitor
Metrics - ElasticSearch
None
None
None
Visibility Rocks!
None
So what would I do differently?
Questions?
Paul Stack @stack72