• Required to inspect log files to troubleshoot issues
• No in-house tool to view and search through production log files
• Splunk is very powerful, but very costly
• Build an in-house tool for our specific needs
• ElasticSearch, Logstash, and Kibana as free alternatives to Splunk
• Easy to download, install, and integrate with our existing in-house log parser
• Developed a RegEx-based alert system
• Programmers can set up alerts on their log files
• Specify a list of files, machines, and alert rules
• Examples of alerts:
  • Send out a message if a specific pattern shows up 5 times within 3 minutes
  • Occurrences of patterns (events) can be aggregated across machines and alerted upon
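The "5 times within 3 minutes" rule above is a classic sliding-window count. The deck does not show the alert system's internals, so this is a minimal sketch of one plausible shape: a per-rule closure that is fed (timestamp, line) pairs and reports when the window threshold is crossed. All names here are hypothetical.

```python
import re
from collections import deque

def make_pattern_alerter(pattern, threshold=5, window_seconds=180):
    """Return a feed(timestamp, line) function implementing the rule from
    the slides: alert when `pattern` shows up `threshold` times within
    `window_seconds`. A sketch, not the actual in-house implementation."""
    compiled = re.compile(pattern)
    hits = deque()  # timestamps of lines that matched the pattern

    def feed(timestamp, line):
        if not compiled.search(line):
            return False
        hits.append(timestamp)
        # Expire hits that have fallen out of the sliding window.
        while hits and timestamp - hits[0] > window_seconds:
            hits.popleft()
        return len(hits) >= threshold

    return feed
```

Aggregating across machines, as the slide mentions, would amount to feeding one such alerter from several machines' shipped lines instead of one.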
• Shipper deployed to all production machines
• Dedicated production machines for central storage
• In-house-built security proxy to restrict who can view which log files on which machines
• In-house-built regular-expression-based field extraction mechanism
• To ship a set of log files, you create a "Ruleset"
• "Rulesets" are lists of files, grouped together by timestamp format
• We allow for client-side filtering via RegEx, or full indexing
• A set of machines needs to be specified
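To make the pieces of a "Ruleset" concrete, here is a hypothetical sketch of its shape as described on the slide: a name, a shared timestamp format, the files and machines it covers, and an optional client-side RegEx filter (no filter meaning full indexing). The field names and example values are assumptions, not the deck's actual schema.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Ruleset:
    """Hypothetical representation of a "Ruleset": files grouped by a
    common timestamp format, the machines to collect from, and an
    optional shipper-side RegEx filter (None = index everything)."""
    name: str
    timestamp_format: str       # e.g. a strftime pattern shared by all files
    files: List[str]
    machines: List[str]
    client_filter: Optional[str] = None  # RegEx applied on the shipper

# Example (all values invented for illustration):
trading_logs = Ruleset(
    name="trading-logs",
    timestamp_format="%Y-%m-%d %H:%M:%S",
    files=["/var/log/app/trade.log", "/var/log/app/risk.log"],
    machines=["prod-app-01", "prod-app-02"],
    client_filter=r"ERROR|WARN",  # ship only matching lines
)
```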
• Each log line is tagged with the name of the "Ruleset" it belongs to
• Each "Ruleset" has a list of "Authorized Users"
• The Python program "KlogProxy" resides between ElasticSearch and Kibana
• It makes sure the current user (authenticated via OpenID) is on the list of "Authorized Users" for the returned log lines
• Sends back credentials
• Sends back our session ID
• A one-line change (a config option in Kibana 3.0.1)
• A shared Redis instance holds session info, including a user identification number
• Query our own system to determine what access this user has
• Insert the list of authorized "Rulesets" as a filter into the request Kibana sent, before passing it on to ElasticSearch
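The last step above, rewriting Kibana's request so only authorized "Rulesets" can match, can be sketched as follows. This is not KlogProxy's actual code: the `ruleset` field name is an assumption, and the `filtered` query shape is simply the era-appropriate (ElasticSearch 1.x / Kibana 3) way to AND a filter onto an arbitrary query.

```python
import copy

def inject_ruleset_filter(es_query, authorized_rulesets):
    """Wrap the query Kibana built with a terms filter so that only log
    lines tagged with one of the user's authorized "Rulesets" can be
    returned. Returns a new request body; the input is left untouched."""
    out = copy.deepcopy(es_query)
    out["query"] = {
        "filtered": {
            # Keep whatever query the user typed in Kibana...
            "query": copy.deepcopy(es_query.get("query", {"match_all": {}})),
            # ...but restrict results to authorized rulesets.
            "filter": {"terms": {"ruleset": sorted(authorized_rulesets)}},
        }
    }
    return out
```

The proxy would apply this to every search request after looking the user's "Rulesets" up via the Redis session and the in-house access system.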
• Throttle on the client side: rate of lines shipped
  • 2MB / ruleset / minute / machine
  • Hard to balance
  • Might be a temporary spike due to errors
• Throttle on the cluster side
  • Need to determine a reasonable threshold
  • Configure for certain "heavy hitters"
• Separate clusters for heavy-load users
  • Configurable re-direction of the data
  • They scale separately from our general cluster
• Saving disk space by "expiring" the data
  • Configurable time limit on the data
  • Delete by tags
• Running on 4 machines
• Indexing 1.5B log lines per day
• Used by 10+ different departments at Bloomberg
• Around 100 users and growing
• 1.5TB * 4 machines = 6TB of history (about 4-5 days with 1 replica)
• 256GB RAM
• 800GB SSD
• 8TB SAN (for archive storage)
• These machines will house 2 data nodes each: 1 for recent data, 1 for archive data
• 4 existing machines will house a client node and a master node
• 64TB archive + 6.4TB recent-data storage in total