Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
How to scale a Logging Infrastructure
Search
Paul Stack
June 03, 2015
Technology
0
170
How to scale a Logging Infrastructure
Logging infrastructure using ELK + Kafka
Paul Stack
June 03, 2015
Tweet
Share
More Decks by Paul Stack
See All by Paul Stack
Infrastructure as Software
stack72
0
60
Mirror, Mirror on the way, what is the vainest metric of them all?
stack72
1
2.3k
Continuously Delivering Infrastructure to the Cloud
stack72
0
170
DevOops 2016
stack72
0
110
The Quest for Infrastructure Management 2.0
stack72
0
130
The Biggest Trick Consultants Ever Pulled was Telling The World Continuous Delivery is Easy
stack72
1
100
The Transition from Product to Infrastructure
stack72
0
57
Continuous Delivery - the missing parts
stack72
0
910
Windows: Having its ass kicked by puppet and powershell
stack72
0
120
Other Decks in Technology
See All in Technology
新機能VPCリソースエンドポイント機能検証から得られた考察
duelist2020jp
0
220
Turing × atmaCup #18 - 1st Place Solution
hakubishin3
0
470
Oracle Cloud Infrastructure:2024年12月度サービス・アップデート
oracle4engineer
PRO
0
170
サイボウズフロントエンドエキスパートチームについて / FrontendExpert Team
cybozuinsideout
PRO
5
38k
開発生産性向上! 育成を「改善」と捉えるエンジニア育成戦略
shoota
1
300
終了の危機にあった15年続くWebサービスを全力で存続させる - phpcon2024
yositosi
1
760
サーバレスアプリ開発者向けアップデートをキャッチアップしてきた #AWSreInvent #regrowth_fuk
drumnistnakano
0
190
Wvlet: A New Flow-Style Query Language For Functional Data Modeling and Interactive Data Analysis - Trino Summit 2024
xerial
1
110
Wantedly での Datadog 活用事例
bgpat
1
430
20241220_S3 tablesの使い方を検証してみた
handy
3
360
LINE Developersプロダクト(LIFF/LINE Login)におけるフロントエンド開発
lycorptech_jp
PRO
0
120
非機能品質を作り込むための実践アーキテクチャ
knih
3
1k
Featured
See All Featured
Designing Dashboards & Data Visualisations in Web Apps
destraynor
229
52k
Evolution of real-time – Irina Nazarova, EuRuKo, 2024
irinanazarova
5
440
The Psychology of Web Performance [Beyond Tellerrand 2023]
tammyeverts
45
2.2k
How To Stay Up To Date on Web Technology
chriscoyier
789
250k
Automating Front-end Workflow
addyosmani
1366
200k
Responsive Adventures: Dirty Tricks From The Dark Corners of Front-End
smashingmag
251
21k
Learning to Love Humans: Emotional Interface Design
aarron
273
40k
[Rails World 2023 - Day 1 Closing Keynote] - The Magic of Rails
eileencodes
33
1.9k
How GitHub (no longer) Works
holman
311
140k
The Illustrated Children's Guide to Kubernetes
chrisshort
48
48k
What's in a price? How to price your products and services
michaelherold
243
12k
The Success of Rails: Ensuring Growth for the Next 100 Years
eileencodes
44
6.9k
Transcript
How do you scale a logging infrastructure to accept a
billion messages a day? Paul Stack http://twitter.com/stack72 mail:
[email protected]
About Me Infrastructure Engineer for a cool startup :) Reformed
ASP.NET / C# Developer DevOps Extremist Conference Junkie
Background Project was to replace the legacy ‘logging solution’
Iteration 0: A Developer created a single box with the
ELK all in 1 jar
Time to make it production ready now
None
Iteration 1: Using Redis as the input mechanism for LogStash
None
None
Enter Apache Kafka
“Kafka is a distributed publish- subscribe messaging system that is
designed to be fast, scalable, and durable” Source: Cloudera Blog
Introduction to Kafka • Kafka is made up of ‘topics’,
‘producers’, ‘consumers’ and ‘brokers’ • Communication is via TCP • Backed by Zookeeper
Kafka Topics Source: http://kafka.apache.org/documentation.html
Kafka Producers • Producers are responsible to chose what topic
to publish data to • The producer is responsible for choosing a partition to write to • Can be handled round robin or partition functions
Kafka Consumers • Consumption can be done via: • queuing
• pub-sub
Kafka Consumers • Kafka consumer group • Strong ordering
Kafka Consumers • Strong ordering
https://github.com/opentable/puppet-exhibitor
None
Iteration 2 Introduction of Kafka
None
None
Iteration 3 Further ‘Improvements’ to the cluster layout
None
The Numbers • Logs kept in ES for 30 days
then archived • 12 billion documents active in ES • ES space was about 25 - 30TB in EBS volumes • Average Doc Size ~ 1.2KB • V-Day 2015: ~750M docs collected without failure
What about metrics and monitoring?
Monitoring - Nagios • Alerts on • ES Cluster •
zK and Kafka Nodes • Logstash / Redis nodes
None
https://github.com/stack72/nagios-elasticsearch
Metrics - Kafka Offset Monitor
https://github.com/opentable/KafkaOffsetMonitor
Metrics - ElasticSearch
None
None
None
Visibility Rocks!
None
So what would I do differently?
Questions?
Paul Stack @stack72