Building scalable logging solutions with ELK stack

by Anh Thi Nguyen

Slide 1

Slide 1 text

Building a scalable logging solution with Elastic Stack
Presenter: Nguyễn Đăng Anh Thi

Slide 2

Slide 2 text

Logs, logs, and more logs

Slide 3

Slide 3 text

Problems: Different, non-unified log formats

Slide 4

Slide 4 text

Problems: Every service uses a different timestamp format

Slide 5

Slide 5 text

Scenario: The boss asks, "Could you find me the logs of all services from 1/9 to 10/9?" The only tools at hand: sed, grep, awk.

Slide 6

Slide 6 text

Problems
- Different system paths to log files, e.g. /var/log/nginx and /var/log/mysql
- Logs of a clustered system are difficult to monitor
- We have to SSH into each server to check its logs

Slide 7

Slide 7 text

Solution: We need a centralized logging system

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

Elastic Stack

Slide 10

Slide 10 text

Elasticsearch
- Full-text search engine
- Built-in distributed operation
- Built on Apache Lucene and written in Java
- Exposes an HTTP RESTful API and exchanges data in JSON format
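To make the "HTTP RESTful API with JSON" point concrete, here is a minimal sketch that only builds the requests and does not send them. The address `http://localhost:9200`, the index name `app-logs`, and the field names are assumptions for illustration, not values from the talk.

```python
import json
import urllib.request

# Assumption: a local single-node cluster listening on the default port.
ES_URL = "http://localhost:9200"


def build_index_request(index_name, document):
    """Build (but do not send) a request that stores one JSON document."""
    return urllib.request.Request(
        url=f"{ES_URL}/{index_name}/_doc",
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_search_request(index_name, field, text):
    """Build (but do not send) a full-text match query on one field."""
    query = {"query": {"match": {field: text}}}
    return urllib.request.Request(
        url=f"{ES_URL}/{index_name}/_search",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical index name and document, used only for this example.
index_req = build_index_request(
    "app-logs", {"service": "nginx", "message": "GET /index.html 200"}
)
search_req = build_search_request("app-logs", "message", "index.html")
print(index_req.get_method(), index_req.full_url)
print(search_req.get_method(), search_req.full_url)
```

With a running cluster, passing either request to `urllib.request.urlopen` would return a JSON response body.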

Slide 11

Slide 11 text

Use cases: Uber, Netflix

Slide 12

Slide 12 text

Terminology
Document: A document is a JSON object stored in Elasticsearch. It is like a row in a table in a relational database.
Index: An index is a collection of documents. Each document is a collection of fields, which are the key-value pairs that contain your data.

Slide 13

Slide 13 text

Comparison to SQL relational databases
SQL   | Elasticsearch
Table | Index
Row   | Document

Slide 14

Slide 14 text

Storage mechanism: Inverted index
An inverted index is an index data structure that stores a mapping from content, such as words or numbers, to its locations in a document or a set of documents.
Documents:
D1: "This is a dog"
D2: "This is a cat"
D3: "Dog eats cat"
Inverted index (after tokenizing):
"this" => {D1, D2}
"is" => {D1, D2}
"a" => {D1, D2}
"dog" => {D1, D3}
"cat" => {D2, D3}
"eats" => {D3}
Suppose we need to find "this dog":
this {D1, D2} ⋂ dog {D1, D3} = {D1}
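The inverted index example can be reproduced with a few lines of Python. This is a minimal sketch: the tokenizer only lowercases and splits on whitespace, which is an assumption made to match the slide and far simpler than what a real analyzer does.

```python
from collections import defaultdict


def tokenize(text):
    # Simplified tokenizer: lowercase, then split on whitespace.
    return text.lower().split()


def build_inverted_index(docs):
    # Map every term to the set of document IDs that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    # AND semantics: intersect the posting sets of all query terms.
    postings = [index.get(term, set()) for term in tokenize(query)]
    if not postings:
        return set()
    return set.intersection(*postings)


docs = {
    "D1": "This is a dog",
    "D2": "This is a cat",
    "D3": "Dog eats cat",
}
index = build_inverted_index(docs)
print(sorted(search(index, "this dog")))  # prints ['D1']
```

Looking up a term is a dictionary access, so the cost of a query depends on the size of the posting sets rather than on scanning every document.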

Slide 15

Slide 15 text

Distributed storage mechanism
A 60GB index is split into three shards of 20GB each: Shard 1, Shard 2, Shard 3.
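How does a document end up in one particular shard? Roughly, the routing value (by default the document ID) is hashed and taken modulo the number of primary shards. The sketch below illustrates that idea only; `zlib.crc32` is a stand-in hash chosen for this example, not the hash Elasticsearch actually uses, and the document IDs are hypothetical.

```python
import zlib

NUM_PRIMARY_SHARDS = 3  # matches the three shards on the slide


def pick_shard(routing_key, num_primary_shards=NUM_PRIMARY_SHARDS):
    # Stand-in hash (assumption): crc32 keeps the example dependency-free.
    digest = zlib.crc32(routing_key.encode("utf-8"))
    return digest % num_primary_shards


# Hypothetical document IDs, used only for illustration.
doc_ids = [f"log-{n}" for n in range(9)]
placement = {doc_id: pick_shard(doc_id) for doc_id in doc_ids}
for doc_id, shard in placement.items():
    print(doc_id, "-> shard", shard + 1)
```

Because the result depends on the shard count, changing the number of primary shards would send existing documents to different shards, which is why that number is normally fixed when the index is created.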

Slide 16

Slide 16 text

Distributed storage mechanism
The three shards of the index (Shard 1, Shard 2, Shard 3) are spread across Node 1, Node 2, and Node 3.

Slide 17

Slide 17 text

Distributed storage mechanism
Each of the three shards has one primary copy and two replica copies, spread across Node 1, Node 2, and Node 3.
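One property of this layout is that a replica never shares a node with its own primary, so losing a single node never loses every copy of a shard. The round-robin placement below is a simplified sketch of that idea, not the allocator Elasticsearch really uses.

```python
def place_shards(nodes, num_shards, num_replicas):
    """Spread each shard's primary and replicas over distinct nodes."""
    if num_replicas + 1 > len(nodes):
        # A real cluster would leave the extra copies unassigned;
        # this sketch simply refuses (assumption for brevity).
        raise ValueError("not enough nodes to keep all copies apart")
    layout = {node: [] for node in nodes}
    for shard in range(1, num_shards + 1):
        for copy in range(num_replicas + 1):
            # Shift by one node per copy so copies never collide.
            node = nodes[(shard - 1 + copy) % len(nodes)]
            role = "primary" if copy == 0 else "replica"
            layout[node].append(f"{role} {shard}")
    return layout


layout = place_shards(["Node 1", "Node 2", "Node 3"], num_shards=3, num_replicas=2)
for node, copies in layout.items():
    print(node, copies)
```

With three nodes and two replicas per shard, every node ends up holding one copy of every shard.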

Slide 18

Slide 18 text

Cluster architecture

Slide 19

Slide 19 text

Definitions

Slide 20

Slide 20 text

Master node
❑ Manages indices and shards
❑ Adds nodes to and removes nodes from the cluster

Slide 21

Slide 21 text

Master-eligible node
❑ Participates in master election
❑ Can be promoted to master node if the current master node fails

Slide 22

Slide 22 text

Data node
❑ Stores data
❑ Returns query results

Slide 23

Slide 23 text

Client node
❑ Routing, load balancing

Slide 24

Slide 24 text

No content

Slide 25

Slide 25 text

Kibana is a web application used for analytics and visualization of data from Elasticsearch.
It shows performance data, metrics, and logs of applications and system services.

Slide 26

Slide 26 text

No content

Slide 27

Slide 27 text

Sales dashboard

Slide 28

Slide 28 text

Metrics dashboard

Slide 29

Slide 29 text

Machine learning

Slide 30

Slide 30 text

Application performance monitoring (APM)

Slide 31

Slide 31 text

Logstash is a log aggregator that collects data from various input sources, applies transformations and enrichments, and then ships the data to supported output destinations such as Elasticsearch, Kafka, and others.

Slide 32

Slide 32 text

Processing data requires a pipeline of 3 stages: Input, Filter, Output.
In every stage, we can use different plugins.
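The three-stage idea can be sketched in plain Python with generators, where each stage consumes the events produced by the previous one. The function names, the "LEVEL message" line format, and the in-memory sink are assumptions for illustration; real Logstash pipelines are declared in its own configuration format.

```python
import json


def input_stage(raw_lines):
    # Input: wrap every raw line in an event dictionary.
    for line in raw_lines:
        yield {"message": line.rstrip("\n")}


def filter_stage(events):
    # Filter: split "LEVEL rest of message" into structured fields.
    for event in events:
        level, _, text = event["message"].partition(" ")
        event["level"] = level
        event["text"] = text
        yield event


def output_stage(events, sink):
    # Output: serialize each event as JSON and hand it to the sink.
    for event in events:
        sink.append(json.dumps(event))


sink = []  # stand-in for Elasticsearch, Kafka, or another destination
raw = ["ERROR disk full\n", "INFO service started\n"]
output_stage(filter_stage(input_stage(raw)), sink)
for line in sink:
    print(line)
```

Swapping a plugin corresponds to replacing one of the three functions while the other two stay unchanged.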

Slide 33

Slide 33 text

Input plugins: beats, http, redis, kafka, rabbitmq, Amazon CloudWatch

Slide 34

Slide 34 text

Filter
- Parses and transforms data from and to different formats
- Enriches data

Slide 35

Slide 35 text

Filter plugins
- Grok: uses regex to parse data
- GeoIP: looks up the geographic location of an IP address
- Date: parses timestamps
- Mutate: adds and removes fields
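To show what the Date and Mutate filters do conceptually, the sketch below normalizes several timestamp formats to one ISO 8601 value and then adds and removes fields. The list of formats, the field names, and the choice to treat timestamps without a zone as UTC are assumptions made for this example.

```python
from datetime import datetime, timezone

# Assumption: three formats a mixed set of services might emit.
KNOWN_FORMATS = [
    "%d/%b/%Y:%H:%M:%S %z",   # e.g. 10/Sep/2019:13:55:36 +0700
    "%Y-%m-%d %H:%M:%S",      # e.g. 2019-09-10 13:55:36
    "%Y-%m-%dT%H:%M:%S%z",    # e.g. 2019-09-10T13:55:36+0000
]


def date_filter(event, field="timestamp"):
    # Try each known format and store a unified UTC timestamp.
    for fmt in KNOWN_FORMATS:
        try:
            parsed = datetime.strptime(event[field], fmt)
        except ValueError:
            continue
        if parsed.tzinfo is None:
            # Assumption: timestamps without a zone are already UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        event["@timestamp"] = parsed.astimezone(timezone.utc).isoformat()
        return event
    # Mark events whose timestamp could not be parsed.
    event.setdefault("tags", []).append("_dateparsefailure")
    return event


def mutate_filter(event, add=None, remove=()):
    # Add the given fields, then drop the unwanted ones.
    event.update(add or {})
    for name in remove:
        event.pop(name, None)
    return event


event = {"timestamp": "10/Sep/2019:13:55:36 +0700", "message": "GET /index.html"}
event = date_filter(event)
event = mutate_filter(event, add={"env": "production"}, remove=["timestamp"])
print(event)
```

Once every service's timestamp lands in the same field and format, a date-range query over all services becomes a single filter.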

Slide 36

Slide 36 text

Example: Grok plugin
client: 55.3.244.1
method: GET
request: /index.html
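Grok patterns are built on regular expressions with named captures, so the extraction above can be imitated with Python's `re` module. The raw input line `55.3.244.1 GET /index.html` is an assumption reconstructed from the extracted fields, and the pattern is a simplified stand-in rather than Grok's own pattern library.

```python
import re

# Simplified stand-in for a Grok pattern: one named group per field.
LOG_PATTERN = re.compile(
    r"(?P<client>\d{1,3}(?:\.\d{1,3}){3}) "  # IPv4 address
    r"(?P<method>[A-Z]+) "                   # HTTP method
    r"(?P<request>\S+)"                      # request path
)


def parse_line(line):
    # Return the named fields, or None when the line does not match.
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


fields = parse_line("55.3.244.1 GET /index.html")
print(fields)
# prints {'client': '55.3.244.1', 'method': 'GET', 'request': '/index.html'}
```

Lines that do not match return `None`, which is the point at which a pipeline would typically tag the event as a parse failure instead of dropping it silently.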

Slide 37

Slide 37 text

Output - Stash

Slide 38

Slide 38 text

Config sample

Slide 39

Slide 39 text

No content

Slide 40

Slide 40 text

Beats are a collection of lightweight (resource-efficient, no dependencies, small), open source log shippers that act as agents installed on the different servers in your infrastructure to collect logs or metrics.

Slide 41

Slide 41 text

No content

Slide 42

Slide 42 text

Filebeat
Filebeat is an agent that specializes in monitoring log files and sending log entries to Logstash or Elasticsearch using supported modules.
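The core job of a log-file shipper can be sketched as "remember how far the file was read, then ship only the complete new lines". The code below is a simplified illustration of that idea, not Filebeat's implementation; the file name and log lines are made up.

```python
import os
import tempfile


def read_new_lines(path, offset):
    """Return complete lines written after `offset`, plus the new offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        chunk = f.read()
    # Only ship complete lines; a half-written line waits for the next poll.
    end = chunk.rfind(b"\n") + 1
    lines = chunk[:end].decode("utf-8").splitlines()
    return lines, offset + end


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "app.log")  # hypothetical log file
    with open(log_path, "wb") as f:
        f.write(b"first entry\nsecond entry\nhalf-writ")
    first_batch, offset = read_new_lines(log_path, 0)

    # The application finishes the line it was in the middle of writing.
    with open(log_path, "ab") as f:
        f.write(b"ten entry\n")
    second_batch, offset = read_new_lines(log_path, offset)

print(first_batch)
print(second_batch)
```

Persisting the offset between runs is what lets an agent resume after a restart without re-sending or skipping entries.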

Slide 43

Slide 43 text

Metricbeat

Slide 44

Slide 44 text

Heartbeat

Slide 45

Slide 45 text

Filebeat supports modules that help parse logs into JSON format

Slide 46

Slide 46 text

Beats
- Beats can send data directly to Elasticsearch
- Usually, though, Beats are used together with Logstash to reduce the load on the Elasticsearch database

Slide 47

Slide 47 text

Elastic stack workflow

Slide 48

Slide 48 text

ELK in production

Slide 49

Slide 49 text

Use Redis or Kafka to buffer log messages
When there is a huge spike in traffic, using Redis or Kafka as a buffering layer reduces the stress on the system.
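The effect of a buffering layer can be illustrated with an in-memory queue: producers push a burst of events at once, while the consumer drains at a fixed batch size it can sustain. The `LogBuffer` class is a hypothetical stand-in for Redis or Kafka, and the numbers are made up for the example.

```python
from collections import deque


class LogBuffer:
    """In-memory stand-in for a Redis list or Kafka topic."""

    def __init__(self):
        self._queue = deque()

    def publish(self, event):
        # Producers (shippers) append as fast as they like.
        self._queue.append(event)

    def consume(self, max_batch):
        # The consumer takes at most `max_batch` events per poll.
        batch = []
        while self._queue and len(batch) < max_batch:
            batch.append(self._queue.popleft())
        return batch

    def __len__(self):
        return len(self._queue)


buffer = LogBuffer()
for n in range(10):            # traffic spike: 10 events arrive at once
    buffer.publish({"id": n})

batch_sizes = []
while len(buffer) > 0:         # downstream handles 4 events per tick
    batch_sizes.append(len(buffer.consume(max_batch=4)))
print(batch_sizes)  # prints [4, 4, 2]
```

The spike is absorbed by the queue and spread over several ticks, so the indexing side never sees more than its batch size at once.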

Slide 50

Slide 50 text

Compared to other search engine technologies

Slide 51

Slide 51 text

Deployment: Elastic Cloud

Slide 52

Slide 52 text

Hardware requirements for production cluster
Filter | Number of events | CPU utilization | RAM
Grok   | 8k/s             | 310%            | 327MB
JSON   | 8k/s             | 260%            | 322MB
Source: https://www.slideshare.net/sematext/tuning-elasticsearch-indexing-pipeline-for-logs

Slide 53

Slide 53 text

Hardware requirements for Elasticsearch production cluster
Assuming:
- Data throughput: 15GB/day
- Data storage: 10TB/year

Slide 54

Slide 54 text

Source: https://www.youtube.com/watch?v=nJeCmcUvtmE
- Master node (8GB)
- Master-eligible node (8GB)
- Master-eligible node (8GB)
- Data node (32GB)
- Data node (32GB)
- Client node (8GB)
We need at least 5 machines (96GB RAM, 20TB storage)
Each machine: 8 to 16 CPU cores
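The 96GB RAM figure is simply the sum of the per-node memory listed above, as this short calculation shows. The storage line is only one possible reading: it assumes the 20TB comes from keeping one replica of 10TB of data per year, which the slide does not state.

```python
# Per-node RAM in GB, copied from the node list above.
nodes = [
    ("Master node", 8),
    ("Master-eligible node", 8),
    ("Master-eligible node", 8),
    ("Data node", 32),
    ("Data node", 32),
    ("Client node", 8),
]
total_ram_gb = sum(ram for _, ram in nodes)
print("Total RAM:", total_ram_gb, "GB")  # prints Total RAM: 96 GB

# Assumption (not from the slide): one replica doubles the stored data.
data_per_year_tb = 10
copies = 2  # primary + one replica
storage_tb = data_per_year_tb * copies
print("Storage with one replica:", storage_tb, "TB")
```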

Slide 55

Slide 55 text

Q & A Thank you!