Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
How to scale a Logging Infrastructure
Search
Sponsored
·
Ship Features Fearlessly
Turn features on and off without deploys. Used by thousands of Ruby developers.
→
Paul Stack
June 03, 2015
Technology
0
200
How to scale a Logging Infrastructure
Logging infrastructure using ELK + Kafka
Paul Stack
June 03, 2015
Tweet
Share
More Decks by Paul Stack
See All by Paul Stack
Infrastructure as Software
stack72
0
88
Mirror, Mirror on the way, what is the vainest metric of them all?
stack72
1
2.4k
Continuously Delivering Infrastructure to the Cloud
stack72
0
220
DevOops 2016
stack72
0
130
The Quest for Infrastructure Management 2.0
stack72
0
160
The Biggest Trick Consultants Ever Pulled was Telling The World Continuous Delivery is Easy
stack72
1
140
The Transition from Product to Infrastructure
stack72
0
81
Continuous Delivery - the missing parts
stack72
0
990
Windows: Having its ass kicked by puppet and powershell
stack72
0
150
Other Decks in Technology
See All in Technology
StrandsとNeptuneを使ってナレッジグラフを構築する
yakumo
1
100
ファインディの横断SREがTakumi byGMOと取り組む、セキュリティと開発スピードの両立
rvirus0817
1
1.3k
M&A 後の統合をどう進めるか ─ ナレッジワーク × Poetics が実践した組織とシステムの融合
kworkdev
PRO
1
420
レガシー共有バッチ基盤への挑戦 - SREドリブンなリアーキテクチャリングの取り組み
tatsukoni
0
210
AzureでのIaC - Bicep? Terraform? それ早く言ってよ会議
torumakabe
1
500
Context Engineeringが企業で不可欠になる理由
hirosatogamo
PRO
3
530
フルカイテン株式会社 エンジニア向け採用資料
fullkaiten
0
10k
Introduction to Sansan for Engineers / エンジニア向け会社紹介
sansan33
PRO
6
68k
MCPでつなぐElasticsearchとLLM - 深夜の障害対応を楽にしたい / Bridging Elasticsearch and LLMs with MCP
sashimimochi
0
150
Azure Durable Functions で作った NL2SQL Agent の精度向上に取り組んだ話/jat08
thara0402
0
160
Bill One 開発エンジニア 紹介資料
sansan33
PRO
4
17k
変化するコーディングエージェントとの現実的な付き合い方 〜Cursor安定択説と、ツールに依存しない「資産」〜
empitsu
4
1.3k
Featured
See All Featured
Why You Should Never Use an ORM
jnunemaker
PRO
61
9.7k
Collaborative Software Design: How to facilitate domain modelling decisions
baasie
0
140
The Invisible Side of Design
smashingmag
302
51k
Git: the NoSQL Database
bkeepers
PRO
432
66k
Unlocking the hidden potential of vector embeddings in international SEO
frankvandijk
0
170
Why Our Code Smells
bkeepers
PRO
340
58k
Chasing Engaging Ingredients in Design
codingconduct
0
110
Claude Code のすすめ
schroneko
67
210k
The Psychology of Web Performance [Beyond Tellerrand 2023]
tammyeverts
49
3.3k
Breaking role norms: Why Content Design is so much more than writing copy - Taylor Woolridge
uxyall
0
160
Impact Scores and Hybrid Strategies: The future of link building
tamaranovitovic
0
200
The Web Performance Landscape in 2024 [PerfNow 2024]
tammyeverts
12
1k
Transcript
How do you scale a logging infrastructure to accept a
billion messages a day? Paul Stack http://twitter.com/stack72 mail:
[email protected]
About Me Infrastructure Engineer for a cool startup :) Reformed
ASP.NET / C# Developer DevOps Extremist Conference Junkie
Background Project was to replace the legacy ‘logging solution’
Iteration 0: A Developer created a single box with the
ELK all in 1 jar
Time to make it production ready now
None
Iteration 1: Using Redis as the input mechanism for LogStash
None
None
Enter Apache Kafka
“Kafka is a distributed publish- subscribe messaging system that is
designed to be fast, scalable, and durable” Source: Cloudera Blog
Introduction to Kafka • Kafka is made up of ‘topics’,
‘producers’, ‘consumers’ and ‘brokers’ • Communication is via TCP • Backed by Zookeeper
Kafka Topics Source: http://kafka.apache.org/documentation.html
Kafka Producers • Producers are responsible to chose what topic
to publish data to • The producer is responsible for choosing a partition to write to • Can be handled round robin or partition functions
Kafka Consumers • Consumption can be done via: • queuing
• pub-sub
Kafka Consumers • Kafka consumer group • Strong ordering
Kafka Consumers • Strong ordering
https://github.com/opentable/puppet-exhibitor
None
Iteration 2 Introduction of Kafka
None
None
Iteration 3 Further ‘Improvements’ to the cluster layout
None
The Numbers • Logs kept in ES for 30 days
then archived • 12 billion documents active in ES • ES space was about 25 - 30TB in EBS volumes • Average Doc Size ~ 1.2KB • V-Day 2015: ~750M docs collected without failure
What about metrics and monitoring?
Monitoring - Nagios • Alerts on • ES Cluster •
zK and Kafka Nodes • Logstash / Redis nodes
None
https://github.com/stack72/nagios-elasticsearch
Metrics - Kafka Offset Monitor
https://github.com/opentable/KafkaOffsetMonitor
Metrics - ElasticSearch
None
None
None
Visibility Rocks!
None
So what would I do differently?
Questions?
Paul Stack @stack72