Monitor your containers with the Elastic Stack

Slide 1

Slide 1 text

Monitor your containers with the Elastic Stack
Monica Sarbu

Slide 2

Slide 2 text

Monica Sarbu
Team lead, Beats team
Email: [email protected]
Twitter: @monicasarbu

Slide 3

Slide 3 text

Monitor your containers
Apache logs

Slide 4

Slide 4 text

Monitor your containers
memory %, CPU %, Apache logs

Slide 5

Slide 5 text

Monitor your containers
Apache metrics, memory %, CPU %, Apache logs

Slide 6

Slide 6 text

Monitor your containers
Apache metrics, memory %, CPU %, HTTP transactions, Apache logs

Slide 7

Slide 7 text

Multiple data types, one storage
Apache metrics, memory %, CPU %, HTTP transactions, Apache logs

Slide 8

Slide 8 text

Scalable from day 1

Slide 9

Slide 9 text

Beats are lightweight shippers that collect and ship all kinds of operational data to Elasticsearch

Slide 10

Slide 10 text

Elastic Stack
Kibana, Elasticsearch, Beats, Logstash

Slide 11

Slide 11 text

The Beats
30+ other community Beats shipping

Slide 12

Slide 12 text

Filebeat

Slide 13

Slide 13 text

tail -f

Slide 14

Slide 14 text

tail -f over the Network

Slide 15

Slide 15 text

tail -f over the Network with extra powers http://www.clipartpanda.com/clipart_images/witches-clip-art-6144127

Slide 16

Slide 16 text

Multiline, JSON logs, Filtering

Slide 17

Slide 17 text

Send raw log lines
{ message: “55.3.244.1 GET /index.html 15824 0.043” }

Slide 18

Slide 18 text

Parse log lines by defining grok patterns
INGEST or

Slide 19

Slide 19 text

Grok patterns
%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}
{ message: “55.3.244.1 GET /index.html 15824 0.043”, … }

Slide 20

Slide 20 text

After parsing
{ message: “55.3.244.1 GET /index.html 15824 0.043”, client: “55.3.244.1”, method: “GET”, request: “/index.html”, bytes: 15824, duration: 0.043, … }
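A minimal sketch of what the grok pattern from slide 19 does to the sample line, written as a plain Python regular expression. The regex fragments standing in for IP, WORD, URIPATHPARAM and NUMBER are simplified assumptions rather than the real grok definitions, `parse_line` is a hypothetical helper name, and the int/float conversions are added only to match the output shown on slide 20.

```python
import re

# Simplified stand-ins for the grok patterns; the real definitions are stricter.
LINE_RE = re.compile(
    r"(?P<client>\d{1,3}(?:\.\d{1,3}){3}) "  # %{IP:client}
    r"(?P<method>\w+) "                      # %{WORD:method}
    r"(?P<request>\S+) "                     # %{URIPATHPARAM:request}
    r"(?P<bytes>\d+) "                       # %{NUMBER:bytes}
    r"(?P<duration>\d+(?:\.\d+)?)"           # %{NUMBER:duration}
)


def parse_line(message):
    """Return the event with extracted fields, or only the raw message if nothing matches."""
    event = {"message": message}
    match = LINE_RE.match(message)
    if match:
        fields = match.groupdict()
        event.update(
            client=fields["client"],
            method=fields["method"],
            request=fields["request"],
            bytes=int(fields["bytes"]),
            duration=float(fields["duration"]),
        )
    return event


event = parse_line("55.3.244.1 GET /index.html 15824 0.043")
print(event)
```

A line that does not match keeps only its `message` field, which mirrors the "send raw log lines" case from slide 17.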

Slide 21

Slide 21 text

Handle back-pressure

Slide 22

Slide 22 text

Why is back-pressure key?

Slide 23

Slide 23 text

Synchronous sending
batch of messages ack stream of log lines read read acked registry file

Slide 24

Slide 24 text

Filebeat adapts its speed automatically to as much as the next stage can process

Slide 25

Slide 25 text

When the next stage is down …
• Filebeat patiently waits
• Log lines are not lost
• It doesn’t allocate memory
• It doesn’t buffer log lines on disk
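The synchronous sending idea from slides 23 to 25 can be sketched as a toy model. This is not Filebeat's implementation: `Shipper`, `send`, `state` and the in-memory `registry_offset` are hypothetical names, with a Python list standing in for the log file and an integer standing in for the registry file.

```python
class Shipper:
    """Toy synchronous shipper: the read position advances only after an ack."""

    def __init__(self, lines, send, batch_size=2):
        self.lines = lines          # stands in for the log file on disk
        self.send = send            # callable that returns True when the batch is acked
        self.batch_size = batch_size
        self.registry_offset = 0    # stands in for the registry file

    def step(self):
        """Try to ship one batch; return True if it was acked."""
        start = self.registry_offset
        batch = self.lines[start:start + self.batch_size]
        if not batch:
            return False            # nothing new to read
        if self.send(batch):
            self.registry_offset = start + len(batch)  # record the acked position
            return True
        return False                # next stage is down: keep the position, retry later


received = []
state = {"up": False}               # whether the next stage is reachable


def send(batch):
    if not state["up"]:
        return False
    received.extend(batch)
    return True


shipper = Shipper(["line 1", "line 2", "line 3"], send)
shipper.step()                      # next stage is down: nothing is acked
offset_while_down = shipper.registry_offset
state["up"] = True
while shipper.step():
    pass
print(offset_while_down, shipper.registry_offset, received)
```

While the next stage is down the shipper holds no queue of its own; the unread lines simply stay in the file, which is why nothing is lost and nothing is buffered.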

Slide 26

Slide 26 text

At-Least-Once delivery

Slide 27

Slide 27 text

#velo No such thing as “exactly once”

Slide 28

Slide 28 text

batch of messages, ack, batch of messages, same batch of messages, ack, duplicates!
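A small sketch of why at-least-once delivery can produce duplicates, as in the diagram above. The names `deliver`, `store` and `ack_lost_on_attempts` are hypothetical; the model assumes the receiver stores every batch it gets and that only the ack can be lost on the way back.

```python
def deliver(batch, store, ack_lost_on_attempts=(1,)):
    """Resend `batch` until an ack comes back; return the number of attempts."""
    attempt = 0
    while True:
        attempt += 1
        store.extend(batch)         # the receiver indexes the batch
        if attempt in ack_lost_on_attempts:
            continue                # ack lost: the sender cannot tell, so it resends
        return attempt


store = []
attempts = deliver(["a", "b"], store)
print(attempts, store)
```

The sender has no way to distinguish "batch lost" from "ack lost", so resending is the only safe choice, and the receiver ends up with the same batch twice.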

Slide 29

Slide 29 text

Filebeat
Collect container logs

Slide 30

Slide 30 text

Docker logging drivers
https://docs.docker.com/engine/admin/logging/overview/

Slide 31

Slide 31 text

001 Gelf driver + Logstash
Pros:
• Logs are sent directly to Logstash
Cons:
• UDP based, no delivery guarantees, no congestion control

Slide 32

Slide 32 text

010 json-file driver + Filebeat
Pros:
• Simple to set up as it’s the default driver
• Easy to add container metadata (name, labels, etc.)
• `docker logs` works
Cons:
• json-file driver can slow down the Docker container
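For orientation, the json-file driver writes one JSON object per log line with `log`, `stream` and `time` keys. The sketch below decodes one such line in Python; the sample line (including its timestamp) is made up, and `decode_json_file_line` and the output field names are hypothetical rather than Filebeat's own.

```python
import json


def decode_json_file_line(raw_line):
    """Turn one json-file entry into a flat event (output field names are illustrative)."""
    entry = json.loads(raw_line)
    return {
        "message": entry["log"].rstrip("\n"),  # the driver keeps the trailing newline
        "stream": entry["stream"],             # "stdout" or "stderr"
        "@timestamp": entry["time"],
    }


# Made-up sample entry in the json-file layout.
sample_line = (
    '{"log":"55.3.244.1 GET /index.html 15824 0.043\\n",'
    '"stream":"stdout","time":"2017-01-01T12:00:00.000000000Z"}'
)
decoded = decode_json_file_line(sample_line)
print(decoded)
```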

Slide 33

Slide 33 text

011 Syslog driver + Syslog server + Filebeat
Pros:
• Good control over the path where the files are written, rotation strategies, etc.
Cons:
• You need to manage the syslog server
• Metadata is serialized as a string and needs to be deserialized again
• Multiline is difficult because data from containers can be mixed

Slide 34

Slide 34 text

100 Journald driver + Filebeat
Pros:
• journald is often already available
• convenient support for container metadata (name, labels, etc.)
• `docker logs` works
Cons:
• Filebeat doesn’t yet support journald
• You can use the community Beat, Journalbeat

Slide 35

Slide 35 text

101 Shared volume + Filebeat
Pros:
• If your app can rotate its own logs, it’s very easy to set up
• Scales well
Cons:
• Difficult to pass container metadata (name, labels, etc.)

Slide 36

Slide 36 text

Conclusion
“At-least-once” guarantees and back-pressure handling:
• json-file driver + Filebeat
• Syslog driver + Filebeat
• Shared volume + Filebeat
• Journald driver + Filebeat (in the future)
No guarantees:
• Gelf driver + Logstash
• Fluentd + Logstash

Slide 37

Slide 37 text

Metricbeat
new in 5.0

Slide 38

Slide 38 text

One Metricbeat module for each service
+ Add your own

Slide 39

Slide 39 text

Metricbeat system module
CPU, Mem, diskIO, filesystem, processes, load, network, cores

Slide 40

Slide 40 text

Metricbeat
Collect container metrics

Slide 41

Slide 41 text

Querying the Docker API
• CPU and memory
• Docker container information
• network (in/out bytes, dropped)
• diskIO (reads/writes)
• status of containers (# of stopped, running, etc.)

Slide 42

Slide 42 text

Docker module
• Get container metrics by querying the Docker API
• Has access to container names and labels
• Easy to set up
available in 5.1.1

Slide 43

Slide 43 text

Reading cgroup data from /proc/
• Doesn’t require access to the Docker API (can be a security issue)
• Works for any container runtime (Docker, rkt, runC, LXD, etc.)
• Cannot get the container name and labels, only the container ID
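A rough sketch of how a container ID can be recovered from cgroup data alone. Each line of a process's cgroup file has the form `hierarchy-ID:controller-list:cgroup-path`; the assumption here is that the runtime embeds a 64-character hex container ID in that path, which is common for Docker but not guaranteed for every runtime. `parse_cgroup`, `container_id` and the sample text are hypothetical.

```python
import re

CONTAINER_ID_RE = re.compile(r"[0-9a-f]{64}")  # assumed ID shape (Docker style)


def parse_cgroup(text):
    """Map each controller to its cgroup path (line format: hierarchy-ID:controllers:path)."""
    paths = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        _hierarchy, controllers, path = line.split(":", 2)
        for controller in controllers.split(","):
            paths[controller] = path
    return paths


def container_id(paths):
    """Return the first 64-hex-character ID found in any cgroup path, or None."""
    for path in paths.values():
        match = CONTAINER_ID_RE.search(path)
        if match:
            return match.group(0)
    return None


fake_id = "ab" * 32  # made-up container ID
sample = f"5:cpu,cpuacct:/docker/{fake_id}\n4:memory:/docker/{fake_id}\n"
paths = parse_cgroup(sample)
print(paths["memory"], container_id(paths))
```

On a Linux host the same functions could be fed the contents of a process's cgroup file under /proc/. Note that only the ID comes back; names and labels still require the Docker API, as the slide says.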

Slide 44

Slide 44 text

System module + cgroup data
• If the cgroup option is enabled (it is disabled by default)
• Automatically enhances process data with cgroup information

Slide 45

Slide 45 text

Run as a container
App1, App2, App3, Host

Slide 46

Slide 46 text

Elasticsearch as time series DB

Slide 47

Slide 47 text

Elasticsearch BKD trees
• Added for Geo-points
• faster to index
• faster to query
• more disk-efficient
• more memory-efficient

Slide 48

Slide 48 text

Float values
• half floats
• scaled floats (using a scaling factor), great for things like percentage points
[Chart: on-disk usage in kb (points and doc_values) for float, half float, scaled float (factor = 4000), scaled float (factor = 100)]
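The idea behind scaled floats can be illustrated in a few lines: the value is multiplied by the scaling factor and stored as an integer, trading precision for space. This is a sketch of the principle only; `encode_scaled` and `decode_scaled` are hypothetical names, and Python's `round` may treat exact ties differently from Elasticsearch.

```python
def encode_scaled(value, scaling_factor):
    """Store the value as an integer number of 1/scaling_factor steps."""
    return round(value * scaling_factor)


def decode_scaled(stored, scaling_factor):
    """Recover the (rounded) value from its stored integer."""
    return stored / scaling_factor


cpu_pct = 0.4267                      # a made-up CPU usage ratio
stored = encode_scaled(cpu_pct, 100)  # factor = 100 keeps two decimal places
restored = decode_scaled(stored, 100)
print(stored, restored)
```

With a factor of 100 the error is at most half a step (0.005), which is usually acceptable for percentage-style metrics, and small integers compress better than arbitrary floats.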

Slide 49

Slide 49 text

Why Elasticsearch for time series
• Horizontal scalability. Mature and battle-tested cluster support.
• Flexible aggregations (incl. moving averages & Holt-Winters)
• One system for both logs and metrics
• Timelion UI, Grafana
• Great ecosystem: e.g. alerting tools

Slide 50

Slide 50 text

Packetbeat

Slide 51

Slide 51 text

How Packetbeat works
1. capture network traffic
2. decode network traffic
3. correlate request & response into transactions
4. send transactions to Elasticsearch
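Step 3 above, correlating requests and responses, can be sketched as follows. This is a toy model and not Packetbeat's code: it assumes messages are already decoded into dictionaries, that responses on a connection arrive in the same order as their requests, and all field names (`conn`, `kind`, `data`, `ts_ms`, `responsetime_ms`) are hypothetical.

```python
from collections import defaultdict, deque


def correlate(messages):
    """Pair each response with the oldest unanswered request on the same connection."""
    pending = defaultdict(deque)    # connection -> queue of waiting requests
    transactions = []
    for msg in messages:
        conn = msg["conn"]
        if msg["kind"] == "request":
            pending[conn].append(msg)
        elif pending[conn]:         # a response with a matching request
            request = pending[conn].popleft()
            transactions.append({
                "conn": conn,
                "request": request["data"],
                "response": msg["data"],
                "responsetime_ms": msg["ts_ms"] - request["ts_ms"],
            })
    return transactions


# Made-up decoded messages from two interleaved connections.
conn_a = "10.0.0.1:5000->10.0.0.2:80"
conn_b = "10.0.0.3:5001->10.0.0.2:80"
messages = [
    {"conn": conn_a, "kind": "request", "data": "GET /index.html", "ts_ms": 0},
    {"conn": conn_b, "kind": "request", "data": "GET /about", "ts_ms": 5},
    {"conn": conn_b, "kind": "response", "data": "404 Not Found", "ts_ms": 20},
    {"conn": conn_a, "kind": "response", "data": "200 OK", "ts_ms": 43},
]
transactions = correlate(messages)
print(transactions)
```

Each resulting transaction carries both sides of the exchange plus a response time, which is the unit that would then be sent on in step 4.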

Slide 52

Slide 52 text

Supported traffic decoders
HTTP, Thrift, DNS, ICMP, AMQP
+ Add your own

Slide 53

Slide 53 text

Unknown traffic, use flows
• Look into data for which we don’t understand the application layer protocol
  • TLS
  • Protocols we don’t yet support
• Get data about IP / TCP / UDP layers
  • number of packets & bytes
  • retransmissions
  • inter-arrival time
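A minimal sketch of flow-style accounting for traffic whose application protocol is unknown: packets are grouped by a flow key and only transport-level counters are kept. The packet dictionaries, the three-part flow key and the `max_gap_ms` field are illustrative assumptions, and retransmission tracking is left out to keep the sketch short.

```python
from collections import defaultdict


def new_flow():
    return {"packets": 0, "bytes": 0, "last_ts_ms": None, "max_gap_ms": 0}


def aggregate_flows(packets):
    """Count packets and bytes per flow and track the largest inter-arrival gap."""
    flows = defaultdict(new_flow)
    for pkt in packets:
        key = (pkt["proto"], pkt["src"], pkt["dst"])
        flow = flows[key]
        flow["packets"] += 1
        flow["bytes"] += pkt["size"]
        if flow["last_ts_ms"] is not None:
            gap = pkt["ts_ms"] - flow["last_ts_ms"]
            flow["max_gap_ms"] = max(flow["max_gap_ms"], gap)
        flow["last_ts_ms"] = pkt["ts_ms"]
    return dict(flows)


# Made-up packets: three on one TCP flow, one on a UDP flow.
packets = [
    {"proto": "tcp", "src": "10.0.0.1:5000", "dst": "10.0.0.2:443", "size": 100, "ts_ms": 0},
    {"proto": "tcp", "src": "10.0.0.1:5000", "dst": "10.0.0.2:443", "size": 1400, "ts_ms": 10},
    {"proto": "udp", "src": "10.0.0.1:5353", "dst": "10.0.0.9:53", "size": 60, "ts_ms": 12},
    {"proto": "tcp", "src": "10.0.0.1:5000", "dst": "10.0.0.2:443", "size": 500, "ts_ms": 40},
]
flows = aggregate_flows(packets)
tcp_flow = flows[("tcp", "10.0.0.1:5000", "10.0.0.2:443")]
print(tcp_flow)
```

Nothing here needs to understand the payload, which is why the same counters work for TLS or any protocol without a decoder.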

Slide 54

Slide 54 text

Monitor traffic exchanged by containers
App1, Host, App2, App3, Packetbeat
traffic exchanged between your containers

Slide 55

Slide 55 text

Demo: Metricbeat, Filebeat, Packetbeat
Multiple data types, one view in Kibana

Slide 56

Slide 56 text

Thank you
• github.com/elastic/beats
• discuss.elastic.co
• @elastic #elasticbeats
• #beats on freenode