How to scale a Logging Infrastructure
Paul Stack
June 03, 2015
Logging infrastructure using ELK + Kafka
Transcript
How do you scale a logging infrastructure to accept a billion messages a day?
Paul Stack
http://twitter.com/stack72
mail: [email protected]
About Me
• Infrastructure Engineer for a cool startup :)
• Reformed ASP.NET / C# Developer
• DevOps Extremist
• Conference Junkie
Background
• Project was to replace the legacy ‘logging solution’
Iteration 0: A developer created a single box with the ELK all-in-one jar
Time to make it production ready now
Iteration 1: Using Redis as the input mechanism for Logstash
Enter Apache Kafka
“Kafka is a distributed publish-subscribe messaging system that is designed to be fast, scalable, and durable”
Source: Cloudera Blog
Introduction to Kafka
• Kafka is made up of ‘topics’, ‘producers’, ‘consumers’ and ‘brokers’
• Communication is via TCP
• Backed by ZooKeeper
Kafka Topics
Source: http://kafka.apache.org/documentation.html
Kafka Producers
• Producers are responsible for choosing which topic to publish data to
• The producer is also responsible for choosing which partition to write to
• Partition choice can be round robin or made by a partition function
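The two partition-selection strategies named on this slide can be sketched roughly as follows. This is an illustrative sketch only: the function names are hypothetical, and the CRC32 hash is an assumption chosen for the example, not the hash any particular Kafka client uses.

```python
import itertools
import zlib


def round_robin_partitioner(num_partitions):
    """Return a function that cycles through partitions 0..num_partitions-1."""
    counter = itertools.cycle(range(num_partitions))
    return lambda key=None: next(counter)


def keyed_partitioner(num_partitions):
    """Return a function that always maps the same key to the same partition."""
    def choose(key):
        # crc32 is stable across runs, unlike Python's built-in hash() for str.
        # (Assumption for illustration; real clients use their own hash.)
        return zlib.crc32(key.encode("utf-8")) % num_partitions
    return choose


rr = round_robin_partitioner(3)
round_robin_choices = [rr() for _ in range(6)]  # spreads load evenly

by_host = keyed_partitioner(3)
same_partition = by_host("web-01") == by_host("web-01")  # same key, same partition
```

Round robin spreads load evenly; a key-based function keeps all messages for one key (here a hypothetical host name) in one partition, which matters when per-key ordering is needed.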
Kafka Consumers
• Consumption can be done via:
  • queuing
  • pub-sub
Kafka Consumers
• Kafka consumer group
• Strong ordering
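How consumer groups give both queuing and pub-sub can be shown with a small in-memory simulation. It assumes the usual Kafka behaviour: each partition is read by exactly one consumer within a group, every group sees every message, and ordering holds within a partition. The group and consumer names are hypothetical, and nothing here talks to a real broker.

```python
from collections import defaultdict


def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer in the group (round robin)."""
    assignment = defaultdict(list)
    for i, partition in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(partition)
    return dict(assignment)


def deliver(topic, groups):
    """topic: {partition: [messages in order]}; groups: {group: [consumers]}."""
    received = {}
    for group, consumers in groups.items():
        assignment = assign_partitions(topic.keys(), consumers)
        # Each consumer reads its partitions in order, so per-partition
        # ordering is preserved.
        received[group] = {
            consumer: [msg for p in parts for msg in topic[p]]
            for consumer, parts in assignment.items()
        }
    return received


topic = {0: ["a1", "a2"], 1: ["b1", "b2"]}
groups = {"indexers": ["logstash-1", "logstash-2"], "archiver": ["archive-1"]}
received = deliver(topic, groups)
```

Within `indexers` the two consumers split the partitions between them (queuing), while `archiver` independently receives every message (pub-sub).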
https://github.com/opentable/puppet-exhibitor
Iteration 2: Introduction of Kafka
Iteration 3: Further ‘improvements’ to the cluster layout
The Numbers
• Logs kept in ES for 30 days, then archived
• 12 billion documents active in ES
• ES space was about 25-30 TB in EBS volumes
• Average doc size ~1.2 KB
• V-Day 2015: ~750M docs collected without failure
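A quick back-of-the-envelope check of these figures, using only the numbers on the slide plus the billion-a-day target from the title. Decimal units (1 TB = 10^9 KB) are an assumption.

```python
DOCS_ACTIVE = 12_000_000_000     # documents active in ES
AVG_DOC_KB = 1.2                 # average document size in KB
RETENTION_DAYS = 30              # days kept in ES before archiving
TARGET_PER_DAY = 1_000_000_000   # "a billion messages a day"
SECONDS_PER_DAY = 86_400

# Raw payload size, ignoring replicas and index overhead.
raw_tb = DOCS_ACTIVE * AVG_DOC_KB / 1e9

# Average daily intake implied by the active document count.
avg_docs_per_day = DOCS_ACTIVE / RETENTION_DAYS

# Sustained rate needed to absorb the target volume.
target_per_second = TARGET_PER_DAY / SECONDS_PER_DAY

print(f"raw data: {raw_tb:.1f} TB")
print(f"average intake: {avg_docs_per_day:,.0f} docs/day")
print(f"target rate: {target_per_second:,.0f} msgs/sec")
```

The raw payload comes to roughly 14.4 TB against the 25-30 TB reported on disk. The slides do not break down the difference; replica shards and index overhead would be a plausible explanation, but that is a guess.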
What about metrics and monitoring?
Monitoring - Nagios
• Alerts on
  • ES cluster
  • ZooKeeper (zK) and Kafka nodes
  • Logstash / Redis nodes
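An ES cluster alert of this kind typically maps the cluster health status onto Nagios plugin exit codes. The sketch below is a hypothetical illustration of that mapping, not the code from the repository linked in this deck. It assumes the health response is JSON with a `status` field of `green`, `yellow` or `red`, and it takes the response body as a string so it runs without a live cluster.

```python
import json

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

STATUS_TO_EXIT = {"green": OK, "yellow": WARNING, "red": CRITICAL}
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_cluster_health(health_json):
    """Map an Elasticsearch cluster health response to (exit_code, message)."""
    try:
        status = json.loads(health_json)["status"]
    except (ValueError, KeyError, TypeError):
        return UNKNOWN, "UNKNOWN - could not parse cluster health response"
    code = STATUS_TO_EXIT.get(status, UNKNOWN)
    return code, f"{LABELS[code]} - cluster status is {status}"


code, message = check_cluster_health('{"cluster_name": "logs", "status": "yellow"}')
print(message)
```

A real plugin would fetch the body from the cluster's health endpoint and finish with `sys.exit(code)` so Nagios can read the result.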
https://github.com/stack72/nagios-elasticsearch
Metrics - Kafka Offset Monitor
https://github.com/opentable/KafkaOffsetMonitor
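The core metric an offset monitor tracks is consumer lag: how far a consumer group's committed offset trails the newest offset in each partition. A minimal sketch of that calculation follows, with made-up offsets and a hypothetical function name. Treating a partition with no committed offset as offset 0 is a simplifying assumption.

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Per-partition lag: newest offset in the log minus the group's committed offset."""
    return {
        partition: end - committed_offsets.get(partition, 0)
        for partition, end in log_end_offsets.items()
    }


# Illustrative offsets only; partition 2 has nothing committed yet.
log_end = {0: 1500, 1: 1200, 2: 900}
committed = {0: 1500, 1: 1100}

lag = consumer_lag(log_end, committed)
total_lag = sum(lag.values())
print(lag, total_lag)
```

Lag that keeps growing means the consumers (here, the Logstash indexers) are not keeping up with the producers.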
Metrics - ElasticSearch
Visibility Rocks!
So what would I do differently?
Questions?
Paul Stack @stack72