Slide 1

Slide 1 text

Haydar KULEKCI Elastic Stack

Slide 2

Slide 2 text

Have you ever used the Elastic Stack?

Slide 3

Slide 3 text

So, what is that?

Slide 4

Slide 4 text

So, what is that? https://www.elastic.co/guide/en/cloud/current/ec-getting-started-search-use-cases-beats-logstash.html

Slide 5

Slide 5 text

This is a journey!

Slide 6

Slide 6 text

Use Cases ● Logging ● Metrics ● Security Analytics ● Business Metrics ● Business Analytics ● Search ● Recommendation ● Similarity

Slide 7

Slide 7 text

Use Cases (Logging)

Slide 8

Slide 8 text

Use Cases (Metrics)

Slide 9

Slide 9 text

Use Cases (Security Analytics)

Slide 10

Slide 10 text

Use Cases (Business Metrics)
https://www.elastic.co/elasticon/conf/2017/sf/tinder-using-the-elastic-stack-to-make-connections-around-the-world
https://medium.com/paypal-tech/powering-transactions-search-with-elastic-learnings-from-the-field-aee78c5795d6
https://www.elastic.co/blog/why-elasticsearch-is-an-indispensable-component-of-the-adyen-stack

Slide 11

Slide 11 text

Use Cases (Search)

Slide 12

Slide 12 text

Use Cases (Geo Search)

Slide 13

Slide 13 text

Who uses Elasticsearch?
https://www.elastic.co/customers/github
https://www.elastic.co/blog/image-recognition-and-search-at-adobe-with-elasticsearch-and-sensei
https://www.elastic.co/videos/t-mobiles-new-mobile-app-is-powered-by-elasticsearch
https://www.elastic.co/elasticon/tour/2020/europe/from-inflexible-to-elastic-how-audi-business-innovation-leverages-the-full-power-of-elastic-cloud
https://www.elastic.co/customers/cisco
https://www.elastic.co/elasticon/tour/2017/new-york/elastic-vimeo-elasticsearch-for-search

Slide 14

Slide 14 text

What are we searching for? ● Files ● Text ● Logs ● Locations ● Vectors

Slide 15

Slide 15 text

Elastic Stack and Other Solutions

Slide 16

Slide 16 text

Basic Architecture

Slide 17

Slide 17 text

A Little Improvement on the Architecture

Slide 18

Slide 18 text

Architecture with Elasticsearch

Slide 19

Slide 19 text

Benefits of Combining Elasticsearch with an RDBMS
● Faster search performance: Elasticsearch can perform text-based searches much faster than an RDBMS.
● Improved scalability: Elasticsearch is designed to scale horizontally, meaning that you can add more nodes to your cluster as your data grows, without sacrificing performance.
● Better analytics capabilities: Elasticsearch offers a wide range of analytics features, including the ability to perform aggregations, generate histograms, and create geospatial queries.
● Full-text search capabilities: Elasticsearch is optimized for full-text search, which means that users can perform complex queries that take into account factors like proximity, synonymy, and fuzzy matching.
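To make the last point concrete, here is a minimal sketch of a full-text query with fuzzy matching, written for the Kibana Dev Tools console; the products index and title field are made-up examples, not part of the talk:

GET /products/_search
{
  "query": {
    "match": {
      "title": {
        "query": "wireles keybord",
        "fuzziness": "AUTO"
      }
    }
  }
}

Both misspelled terms can still match documents containing "wireless keyboard", because fuzziness tolerates a small edit distance per term; doing the same in a plain RDBMS typically needs LIKE patterns or extra extensions.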

Slide 20

Slide 20 text

Easy, right?

Slide 21

Slide 21 text

Adding two new services to an existing infrastructure is no easy task.

Slide 22

Slide 22 text

Elasticsearch Installation
● 1. Import the PGP key:
○ wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
● 2. Install the required packages:
○ sudo apt-get install apt-transport-https
● 3. Save the repository definition:
○ echo "deb https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list
● 4. Install Elasticsearch:
○ sudo apt-get update && sudo apt-get install elasticsearch

Slide 23

Slide 23 text

Elasticsearch Installation
● Import the PGP key:
○ wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
● Install the required packages:
○ sudo apt-get install apt-transport-https
● Save the repository definition:
○ echo "deb https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list
● Install Elasticsearch:
○ sudo apt-get update && sudo apt-get install elasticsearch
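As a quick sanity check (not part of the original slides), you can start the service and query the node; this sketch assumes a default 8.x Debian package install, where the generated CA certificate is placed under /etc/elasticsearch/certs and the elastic user's password is printed during installation:

sudo systemctl daemon-reload
sudo systemctl enable --now elasticsearch
curl --cacert /etc/elasticsearch/certs/http_ca.crt -u elastic https://localhost:9200

A successful response is a small JSON document with the node name, cluster name, and version.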

Slide 24

Slide 24 text

Elasticsearch Configuration
● You can use the /etc/elasticsearch/elasticsearch.yml file for the configuration.
● Here are some configs:
○ network.host : e.g. 127.0.0.1 or 0.0.0.0
○ http.port : HTTP API port
○ discovery.seed_hosts : provides a list of other nodes in the cluster
○ cluster.initial_master_nodes : when bootstrapping a cluster, list the initial master-eligible nodes here. You should not use this setting when restarting a cluster or adding a new node to an existing cluster.
○ gateway.recover_after_data_nodes : recover as long as this many data nodes have joined the cluster. You can use the _recovery endpoint to get the active recovery tasks for the shards.
○ action.destructive_requires_name : prevents delete requests that use wildcards, e.g. DELETE accounts-*
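As an illustration, a minimal elasticsearch.yml for a small cluster might look like the sketch below; the cluster name, node names, and IP addresses are made-up values:

# /etc/elasticsearch/elasticsearch.yml -- illustrative values only
cluster.name: demo-cluster
node.name: node-1
network.host: 0.0.0.0
http.port: 9200
discovery.seed_hosts: ["10.0.0.11", "10.0.0.12"]
cluster.initial_master_nodes: ["node-1", "node-2"]
action.destructive_requires_name: true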

Slide 25

Slide 25 text

Elasticsearch Configuration
● You can use the /etc/elasticsearch/elasticsearch.yml file for the configuration.
● Here are some configs:
○ bootstrap.memory_lock : try to lock the process address space into RAM on startup
○ path.data : path for the node's data
○ path.logs : path for the node's logs
○ cluster.name : name of your cluster, used to discover the nodes
○ node.name : name of your node, shown in the node list
○ node.attr.rack_id : if you want Elasticsearch to distribute shards across different racks, you might set an awareness attribute called rack_id in each node's configuration
○ discovery.type : set to single-node to run a single-node cluster for testing
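Continuing the same sketch, a single-node test setup that combines these keys could look like this; the paths match the Debian package defaults and the names are again illustrative:

# /etc/elasticsearch/elasticsearch.yml -- single-node test setup, illustrative
cluster.name: demo-cluster
node.name: node-1
node.attr.rack_id: rack_one
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
bootstrap.memory_lock: true
discovery.type: single-node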

Slide 26

Slide 26 text

Elasticsearch is built on Java
● Its performance is highly dependent on the JVM configuration.
● JVM configuration affects how Elasticsearch uses memory, CPU, and other system resources.
● Common JVM configuration parameters that can impact Elasticsearch performance include heap size, garbage collection settings, and thread stack size.

Slide 27

Slide 27 text

Let's Look at the Configuration

Slide 28

Slide 28 text

JVM Options
● You can use the /etc/elasticsearch/jvm.options file for the configuration.
● Here are some configs:
○ -Xms2g : the initial size of the total heap space
○ -Xmx2g : the maximum size of the total heap space
○ 14-:-XX:+UseG1GC : use G1GC as the garbage collector (on JDK 14 and later)
○ -XX:+UseConcMarkSweepGC : use Concurrent Mark Sweep (CMS) as the garbage collector
○ 8-13:-XX:CMSInitiatingOccupancyFraction=75 : sets the percentage of old-generation occupancy (0 to 100) at which a CMS collection cycle starts (on JDK 8-13)
○ -XX:+HeapDumpOnOutOfMemoryError : generate a heap dump when you get an OOM error
○ -XX:HeapDumpPath=/heap/dump/path : the output path for heap dumps
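Rather than editing jvm.options in place, newer Elasticsearch packages also read override files from /etc/elasticsearch/jvm.options.d/; a minimal sketch of such a file, with an assumed file name and dump path, might be:

# /etc/elasticsearch/jvm.options.d/heap.options -- assumed file name and dump path
-Xms2g
-Xmx2g
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/var/lib/elasticsearch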

Slide 29

Slide 29 text

JVM Options
● Set Xmx and Xms to no more than 50% of your physical RAM. Elasticsearch requires memory for purposes other than the JVM heap, and it is important to leave space for this.
● Set Xmx and Xms to no more than the threshold that the JVM uses for compressed object pointers (compressed oops); the exact threshold varies but is near 32 GB. Check the logs for this:
○ heap size [1.9gb], compressed ordinary object pointers [true]
● The exact threshold varies, but 26 GB is safe on most systems and can be as large as 30 GB on some systems.
● Larger heaps can cause longer garbage collection pauses.
https://www.elastic.co/guide/en/elasticsearch/reference/current/advanced-configuration.html
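One quick way to check the compressed-oops line is to grep the Elasticsearch log; the file name below assumes the default log path and a cluster named demo-cluster:

grep "compressed ordinary object pointers" /var/log/elasticsearch/demo-cluster.log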

Slide 30

Slide 30 text

JVM Options
● You can also use an environment variable to set some options:
○ ES_JAVA_OPTS="-Xms2g -Xmx2g" ./bin/elasticsearch
○ ES_JAVA_OPTS="-Xms4000m -Xmx4000m" ./bin/elasticsearch
https://www.elastic.co/guide/en/elasticsearch/reference/current/advanced-configuration.html

Slide 31

Slide 31 text

So, let’s Test

Slide 32

Slide 32 text

What could we use?
● Browser
● Terminal
● Postman
● ElasticVue
● Kibana
● … and so many other solutions

Slide 33

Slide 33 text

Let's continue with Kibana

Slide 34

Slide 34 text

Kibana Installation
● You just need to add the distribution repository to your Linux environment, as we did for Elasticsearch.
● After adding the repository, we can just run the command below:
○ sudo apt-get update && sudo apt-get install kibana
● After installation we can change some configuration related to X-Pack and logging. For Ubuntu/CentOS, the configuration file will be inside the /etc/kibana folder.
● There are several installation types:
○ You can install it from a zip file
○ You can use the Linux distribution releases (deb, rpm)
○ You can install it with Docker, and also on Kubernetes
○ There is an option for macOS with brew
https://www.elastic.co/guide/en/kibana/current/install.html
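As a small follow-up sketch (assuming a systemd-based host and the default port), you can enable the service and confirm it answers before touching the configuration:

sudo systemctl enable --now kibana
# Kibana listens on port 5601 by default
curl -I http://localhost:5601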

Slide 35

Slide 35 text

Kibana Configuration
● You can use the /etc/kibana/kibana.yml file for the configuration.
● Here are some of them:
○ server.host : This setting specifies the host of the backend server. To allow remote users to connect, set the value to the IP address or DNS name of the Kibana server.
○ server.port : Kibana is served by a backend server. This setting specifies the port to use. Default: 5601
○ server.maxPayloadBytes : The maximum payload size in bytes for incoming server requests. Default: 1048576
○ elasticsearch.hosts : The URLs of the Elasticsearch instances to use for all your queries.
○ server.name : A human-readable display name that identifies this Kibana instance. Default: "your-hostname"
https://www.elastic.co/guide/en/kibana/current/install.html

Slide 36

Slide 36 text

Kibana Configuration
● You can use the /etc/kibana/kibana.yml file for the configuration.
● Here are some of them:
○ kibana.index : Kibana uses an index in Elasticsearch to store saved searches, visualizations, and dashboards. Default: ".kibana"
○ logging.dest : Enables you to specify a file where Kibana stores log output. Default: stdout
○ logging.verbose : Set to true to log all events, including system usage information and all requests. Default: false
○ i18n.locale : Set this value to change the Kibana interface language.
https://www.elastic.co/guide/en/kibana/current/install.html
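Putting the keys from the last two slides together, a small kibana.yml might look like the sketch below; the server name, hosts, and log path are illustrative:

# /etc/kibana/kibana.yml -- illustrative values only
server.host: "0.0.0.0"
server.port: 5601
server.name: "demo-kibana"
elasticsearch.hosts: ["http://localhost:9200"]
logging.dest: /var/log/kibana/kibana.log
i18n.locale: "en"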

Slide 37

Slide 37 text

Kibana Interface

Slide 38

Slide 38 text

Logstash

Slide 39

Slide 39 text

What is Logstash?
● An open-source data collection engine and data shipper
● Works as an agent inside your servers, or as a standalone server
● Sends operational data to Elasticsearch or other outputs
● Has pipeline capabilities to enrich or filter data
https://www.elastic.co/guide/en/logstash/current/install.html

Slide 40

Slide 40 text

Logstash Installation
● To install Logstash, simply use the commands below:
○ apt-get install apt-transport-https
○ apt-get install logstash
● To test the installation, you can use the following:
○ bin/logstash -e 'input { stdin { } } output { stdout {} }'
○ This command creates an input from the terminal, and Logstash will treat whatever you type there as an event:
○ > hello world
2020-08-19T18:35:19.102+0000 0.0.0.0 hello world
> this is a log
2020-08-19T18:35:39.102+0000 0.0.0.0 this is a log
https://www.elastic.co/guide/en/logstash/current/install.html

Slide 41

Slide 41 text

Logstash Configuration
● Sample configuration to read data from a file and write it to stdout.
● We can see here that Logstash will tail the Nginx log files from the beginning and forward each event to stdout.
input {
  file {
    path => "/usr/local/var/log/nginx/*.log"
    type => "log"
    start_position => "beginning"
    sincedb_path => "/usr/local/Cellar/logstash/8.6.1/sincedb-access"
  }
}
output {
  stdout { }
}

Slide 42

Slide 42 text

Logstash Configuration
● We can put some filters into the pipeline.
● Here, Logstash listens on port 5044 for Beats, and the filters enrich our logs, for example by adding geo information for client IPs.
input {
  beats {
    port => "5044"
  }
}
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  geoip {
    source => "clientip"
  }
}
output {
  stdout { codec => rubydebug }
}

Slide 43

Slide 43 text

Logstash Configuration
● Grok parses the text and structures it.
● In our example, the %{COMBINEDAPACHELOG} part parses each line as an Apache combined log and tries to structure it.
● In fact, COMBINEDAPACHELOG is a shortcut for a grok pattern along the following lines:
○ %{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{QS:referrer} %{QS:agent}
https://www.javainuse.com/grok

Slide 44

Slide 44 text

Logstash Configuration https://www.javainuse.com/grok

Slide 45

Slide 45 text

Send the Logs to Elasticsearch

Slide 46

Slide 46 text

Logstash Configuration
● We just need to change the output if we want to save all this data to Elasticsearch.
input {
  beats {
    port => "5044"
  }
}
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  geoip {
    source => "clientip"
  }
}
output {
  elasticsearch {
    hosts => [ "localhost:9200" ]
  }
}

Slide 47

Slide 47 text

Let’s Look Closer

Slide 48

Slide 48 text

Logstash Inputs
● azure_event_hubs ● beats ● cloudwatch ● couchdb_changes ● dead_letter_queue ● elastic_agent ● elasticsearch ● exec ● file ● ganglia ● gelf ● generator ● github ● google_cloud_storage ● google_pubsub ● graphite ● heartbeat ● http ● http_poller ● imap ● irc ● java_generator ● java_stdin ● jdbc ● jms ● jmx ● kafka ● kinesis ● log4j ● lumberjack ● meetup ● pipe ● puppet_facter ● rabbitmq ● redis ● relp ● rss ● s3 ● s3-sns-sqs ● salesforce ● snmp ● sqlite ● sqs ● stdin ● stomp ● syslog ● tcp ● twitter ● udp ● unix ● varnishlog ● websocket ● xmpp

Slide 49

Slide 49 text

Logstash Outputs
● app_search ● boundary ● circonus ● cloudwatch ● csv ● datadog ● datadog_metrics ● dynatrace ● elastic_app_search ● elastic_workplace_search ● elasticsearch ● email ● exec ● file ● ganglia ● gelf ● google_bigquery ● google_cloud_storage ● google_pubsub ● graphite ● graphtastic ● http ● influxdb ● irc ● java_stdout ● juggernaut ● kafka ● librato ● loggly ● lumberjack ● metriccatcher ● mongodb ● nagios ● nagios_nsca ● opentsdb ● pagerduty ● pipe ● rabbitmq ● redis ● s3 ● sink ● solr_http ● statsd ● stdout ● syslog ● tcp ● timber ● udp ● webhdfs ● websocket ● workplace_search ● xmpp ● zabbix

Slide 50

Slide 50 text

Logstash Filters
● age ● aggregate ● alter ● bytes ● cidr ● cipher ● clone ● csv ● date ● de_dot ● dissect ● dns ● drop ● elapsed ● elasticsearch ● environment ● extractnumbers ● fingerprint ● geoip ● grok ● http ● i18n ● java_uuid ● jdbc_static ● jdbc_streaming ● json ● json_encode ● kv ● memcached ● metricize ● metrics ● mutate ● prune ● range ● ruby ● sleep ● split ● syslog_pri ● threats_classifier ● throttle ● tld ● translate ● truncate ● urldecode ● useragent ● uuid ● wurfl_device_detection ● xml
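If a plugin you need is not bundled with your Logstash distribution, the logstash-plugin tool can list and install plugins; the mongodb output below is just an example plugin name:

bin/logstash-plugin list
bin/logstash-plugin install logstash-output-mongodb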

Slide 51

Slide 51 text

Beats

Slide 52

Slide 52 text

What is a Beat?
● An open-source data shipper
● Works as an agent inside your servers
● Sends operational data to Elasticsearch
● There are several types of Beats:
○ Auditbeat: collects audit data about users and processes on your servers
○ Filebeat: collects data from your files
○ Functionbeat: can be deployed as a function on your serverless cloud platform to collect data from your cloud services
○ Heartbeat: periodically checks remote hosts and services to see whether they are alive
○ Metricbeat: collects data from your servers' operating system and your applications
○ Packetbeat: works by capturing the network traffic between your application servers and decoding the application-layer protocols
○ … and so on.

Slide 53

Slide 53 text

Metricbeat Installation
● To install Metricbeat:
○ curl -L -O https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-7.8.1-amd64.deb
● To configure it:
○ Use the /etc/metricbeat/metricbeat.yml file.
● There are lots of modules to collect data from services:
○ Apache, Nginx, ActiveMQ, HAProxy, Kafka, MySQL, Oracle, Redis, RabbitMQ, System (core, cpu, diskio, filesystem, memory, load, network), etc.
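The curl command above only downloads the package; a plausible follow-up on a systemd host, assuming the same 7.8.1 .deb, is to install it, enable a module, and start the service:

sudo dpkg -i metricbeat-7.8.1-amd64.deb
sudo metricbeat modules enable nginx
sudo metricbeat modules list
sudo systemctl enable --now metricbeat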

Slide 54

Slide 54 text

Beats Configuration
● Here are some config keys from Metricbeat:
○ output.elasticsearch.hosts :
○ output.elasticsearch.username :
○ output.elasticsearch.password :
○ output.logstash.hosts :
○ output.logstash.ssl.key :
○ processors[]
■ add_host_metadata : expands the host field
■ copy_fields : copies a field to another one
■ drop_fields : drops fields
○ monitoring.enabled : set to true to enable the monitoring reporter.
○ logging.level : sets the log level. The default log level is info. Available log levels are: error, warning, info, debug
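As a rough sketch of how these keys fit together in /etc/metricbeat/metricbeat.yml, with made-up credentials, hosts, and a sample dropped field:

# illustrative values only
output.elasticsearch:
  hosts: ["https://localhost:9200"]
  username: "elastic"
  password: "changeme"
processors:
  - add_host_metadata: ~
  - drop_fields:
      fields: ["agent.ephemeral_id"]
monitoring.enabled: true
logging.level: info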

Slide 55

Slide 55 text

Let’s Look Closer