
Building scalable logging solutions with ELK stack

Anh Thi Nguyen
December 07, 2020


Transcript

  1. Scenario  The boss asks: could you find me the logs of
     all services from 1/9 to 10/9?  Our first tools: SED, GREP, AWK.
  2. Problems  Log files live at different paths on different systems
     (/var/log/nginx, /var/log/mysql).  It is difficult to monitor the logs of a cluster: we have to SSH into each server to check its logs.
  3. Elasticsearch  A full-text search engine with a built-in distributed architecture, built upon Apache Lucene
     and written in Java.  Clients communicate with the database over a RESTful HTTP API using JSON.
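As a sketch of that HTTP/JSON interface (the index name `logs` and field `message` below are hypothetical), a full-text search request looks like:

```
GET /logs/_search
{
  "query": {
    "match": { "message": "timeout" }
  }
}
```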
  4. Terminologies  Document: a JSON document
     stored in Elasticsearch; it is like a row in a table of a relational database.  Index: a collection of documents, where each document is a collection of fields, the key-value pairs that contain your data.
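For illustration (the index name, document ID, and fields below are made up), storing one such document over the REST API looks like:

```
PUT /logs/_doc/1
{
  "service": "nginx",
  "level": "error",
  "message": "upstream timed out",
  "@timestamp": "2020-09-01T12:00:00Z"
}
```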
  5. Storage mechanism - Inverted Index  An inverted index is
     an index data structure storing a mapping from content, such as words or numbers, to its locations in a document or a set of documents.
     Documents:
       D1: "This is a dog"
       D2: "This is a cat"
       D3: "Dog eats cat"
     Inverted index (after tokenizing):
       "this" => {D1, D2}
       "is"   => {D1, D2}
       "a"    => {D1, D2}
       "dog"  => {D1, D3}
       "cat"  => {D2, D3}
       "eats" => {D3}
     Supposing we need to find "this dog": this {D1, D2} ⋂ dog {D1, D3} = {D1}
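The index-and-intersect idea above can be sketched in a few lines of Python (a toy model for intuition, not how Lucene actually stores postings):

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercased token to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

def search(index, query):
    """AND-search: intersect the posting sets of every query token."""
    postings = [index.get(tok, set()) for tok in query.lower().split()]
    return set.intersection(*postings) if postings else set()

docs = {
    "D1": "This is a dog",
    "D2": "This is a cat",
    "D3": "Dog eats cat",
}
idx = build_inverted_index(docs)
print(search(idx, "this dog"))  # {'D1'}
```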
  6. Distributed storage mechanism  An index is split into shards
     (here shards 1-3) spread across nodes 1-3.  Each shard has one primary (primary shard 1, 2, 3) and two replicas (replica shards 1, 2, 3), with a shard's primary and its replicas placed on different nodes for redundancy.
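In Elasticsearch this layout is controlled per index through the `number_of_shards` and `number_of_replicas` settings; a sketch matching the picture above (3 primaries, 2 replicas each; the index name is illustrative):

```
PUT /logs
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 2
  }
}
```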
  7. Kibana  A web application used for analytics and visualization of
     data from Elasticsearch.  Shows performance, metrics, and logs of applications and system services.
  8. Logstash  Logstash is a log aggregator that collects data from various
     input sources, executes different transformations and enhancements, and then ships the data to various supported output destinations such as Elasticsearch, Kafka, …
  9. Processing data requires a pipeline of 3
     stages: Input, Filter, Output.  In every stage, we can use different plugins.
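A minimal Logstash pipeline config showing the three stages (the port and host values are placeholders):

```
input {
  beats { port => 5044 }
}
filter {
  # transform / enrich events here
}
output {
  elasticsearch { hosts => ["localhost:9200"] }
}
```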
  10. Filter plugins  Grok: uses regex to parse unstructured data.  GeoIP:
     adds geographical location from an IP address.  Date: parses timestamps.  Mutate: adds or removes fields.
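As a sketch, a filter block combining these plugins for Apache-style access logs (the field names assume the standard `COMBINEDAPACHELOG` grok pattern, which extracts `timestamp` and `clientip`):

```
filter {
  grok   { match => { "message" => "%{COMBINEDAPACHELOG}" } }
  date   { match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ] }
  geoip  { source => "clientip" }
  mutate { remove_field => [ "timestamp" ] }
}
```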
  11.  Beats are a collection of lightweight (resource-efficient, no
     dependencies, small), open-source log shippers that act as agents installed on the different servers in your infrastructure for collecting logs or metrics.
  12. Filebeat  Filebeat is an agent that specializes in monitoring
     log files and sending log entries to Logstash or Elasticsearch, using supported modules.
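A minimal `filebeat.yml` sketch shipping nginx logs to Logstash (the paths and host are placeholders):

```yaml
filebeat.inputs:
  - type: log
    paths:
      - /var/log/nginx/*.log

output.logstash:
  hosts: ["localhost:5044"]
```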
  13. Beats  Beats can send data directly into Elasticsearch.
     But usually, Beats are used together with Logstash to help reduce the load on the Elasticsearch cluster.
  14. Use Redis or Kafka to buffer log messages  When there is
     a huge spike in traffic, using Redis or Kafka as a buffering layer reduces the stress on the system.
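One common way to wire this up is a two-stage Logstash deployment with Kafka in between (the broker address and topic name are illustrative):

```
# Shipper pipeline: push incoming events to a Kafka topic
output {
  kafka { bootstrap_servers => "kafka:9092" topic_id => "logs" }
}

# Indexer pipeline: consume from Kafka at its own pace
input {
  kafka { bootstrap_servers => "kafka:9092" topics => ["logs"] }
}
output {
  elasticsearch { hosts => ["localhost:9200"] }
}
```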
  15. Hardware requirements for a production cluster

     Filter | Event rate | CPU utilization | RAM
     -------|------------|-----------------|-------
     Grok   | 8k/s       | 310%            | 327 MB
     JSON   | 8k/s       | 260%            | 322 MB

     Source: https://www.slideshare.net/sematext/tuning-elasticsearch-indexing-pipeline-for-logs
  16. Example cluster sizing (source: https://www.youtube.com/watch?v=nJeCmcUvtmE)
     Master node (8 GB), 2 master-eligible nodes (8 GB each), 2 data nodes (32 GB each), client node (8 GB).
     We need at least 5 machines (96 GB RAM, 20 TB storage in total).  Each machine: 8 - 16 CPU cores.