Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
How to scale a Logging Infrastructure
Search
Paul Stack
June 03, 2015
Technology
0
180
How to scale a Logging Infrastructure
Logging infrastructure using ELK + Kafka
Paul Stack
June 03, 2015
Tweet
Share
More Decks by Paul Stack
See All by Paul Stack
Infrastructure as Software
stack72
0
75
Mirror, Mirror on the way, what is the vainest metric of them all?
stack72
1
2.3k
Continuously Delivering Infrastructure to the Cloud
stack72
0
190
DevOops 2016
stack72
0
120
The Quest for Infrastructure Management 2.0
stack72
0
150
The Biggest Trick Consultants Ever Pulled was Telling The World Continuous Delivery is Easy
stack72
1
120
The Transition from Product to Infrastructure
stack72
0
63
Continuous Delivery - the missing parts
stack72
0
960
Windows: Having its ass kicked by puppet and powershell
stack72
0
140
Other Decks in Technology
See All in Technology
RSCの時代にReactとフレームワークの境界を探る
uhyo
9
2.8k
Kubernetes における cgroup v2 でのOut-Of-Memory 問題の解決
pfn
PRO
0
460
Automating Web Accessibility Testing with AI Agents
maminami373
0
880
Grafana MCPサーバーによるAIエージェント経由でのGrafanaダッシュボード動的生成
hamadakoji
1
1.2k
『FailNet~やらかし共有SNS~』エレベーターピッチ
yokomachi
1
200
異業種出身エンジニアが気づいた、転向して十数年経っても変わらない自分の武器とは
macnekoayu
0
280
生成AI時代のデータ基盤
shibuiwilliam
4
3.1k
オブザーバビリティが広げる AIOps の世界 / The World of AIOps Expanded by Observability
aoto
PRO
0
290
個人CLAUDE.md紹介と設定から学んだこと/introduce-my-claude-md
shibayu36
0
190
エニグモ_会社紹介資料(エンジニア職種向け).pdf
enigmo_hr
0
2.2k
MCPで変わる Amebaデザインシステム「Spindle」の開発
spindle
PRO
3
2.6k
Kiroと学ぶコンテキストエンジニアリング
oikon48
6
8.3k
Featured
See All Featured
ピンチをチャンスに:未来をつくるプロダクトロードマップ #pmconf2020
aki_iinuma
126
53k
Building Better People: How to give real-time feedback that sticks.
wjessup
368
19k
Let's Do A Bunch of Simple Stuff to Make Websites Faster
chriscoyier
507
140k
Unsuck your backbone
ammeep
671
58k
Building Flexible Design Systems
yeseniaperezcruz
328
39k
StorybookのUI Testing Handbookを読んだ
zakiyama
30
6.1k
Large-scale JavaScript Application Architecture
addyosmani
512
110k
Helping Users Find Their Own Way: Creating Modern Search Experiences
danielanewman
29
2.8k
For a Future-Friendly Web
brad_frost
179
9.9k
Cheating the UX When There Is Nothing More to Optimize - PixelPioneers
stephaniewalter
284
13k
[RailsConf 2023] Rails as a piece of cake
palkan
56
5.8k
個人開発の失敗を避けるイケてる考え方 / tips for indie hackers
panda_program
111
20k
Transcript
How do you scale a logging infrastructure to accept a
billion messages a day? Paul Stack http://twitter.com/stack72 mail:
[email protected]
About Me Infrastructure Engineer for a cool startup :) Reformed
ASP.NET / C# Developer DevOps Extremist Conference Junkie
Background Project was to replace the legacy ‘logging solution’
Iteration 0: A Developer created a single box with the
ELK all in 1 jar
Time to make it production ready now
None
Iteration 1: Using Redis as the input mechanism for LogStash
None
None
Enter Apache Kafka
“Kafka is a distributed publish- subscribe messaging system that is
designed to be fast, scalable, and durable” Source: Cloudera Blog
Introduction to Kafka • Kafka is made up of ‘topics’,
‘producers’, ‘consumers’ and ‘brokers’ • Communication is via TCP • Backed by Zookeeper
Kafka Topics Source: http://kafka.apache.org/documentation.html
Kafka Producers • Producers are responsible to chose what topic
to publish data to • The producer is responsible for choosing a partition to write to • Can be handled round robin or partition functions
Kafka Consumers • Consumption can be done via: • queuing
• pub-sub
Kafka Consumers • Kafka consumer group • Strong ordering
Kafka Consumers • Strong ordering
https://github.com/opentable/puppet-exhibitor
None
Iteration 2 Introduction of Kafka
None
None
Iteration 3 Further ‘Improvements’ to the cluster layout
None
The Numbers • Logs kept in ES for 30 days
then archived • 12 billion documents active in ES • ES space was about 25 - 30TB in EBS volumes • Average Doc Size ~ 1.2KB • V-Day 2015: ~750M docs collected without failure
What about metrics and monitoring?
Monitoring - Nagios • Alerts on • ES Cluster •
zK and Kafka Nodes • Logstash / Redis nodes
None
https://github.com/stack72/nagios-elasticsearch
Metrics - Kafka Offset Monitor
https://github.com/opentable/KafkaOffsetMonitor
Metrics - ElasticSearch
None
None
None
Visibility Rocks!
None
So what would I do differently?
Questions?
Paul Stack @stack72