Slide 1

From ELK to the Elastic Stack: modern logging and monitoring
Tudor Golubenco / @tudor_g

Slide 2

Dev working on Beats

Slide 3

The “ELK” stack: Elasticsearch, Logstash, Kibana

Slide 4

And then the Beats happened

Slide 5

Beats → Logstash

Slide 6

Logstash new features
• Core and main plugins rewritten in Java: performance improvements
• Persistent Queue (WIP): no drops when killed
• Monitoring APIs: no longer a black box

Slide 7

BELK? KELB? ELKB? ELK-Bee?

Slide 8

Say “Heya” to the Elastic Stack 5.0

Slide 9

It’s complicated: before 5.0, each project released on its own schedule.
• Elasticsearch (es): 1.4 (Nov 5, 2014), 1.5 (May 23, 2015), 1.6 (Jun 9, 2015), 1.7 (Jul 16, 2015)
• Kibana: 4.0 (Feb 19, 2015), 4.1 (Jun 10, 2015)
• Logstash (ls): 1.5 (May 14, 2015)
• Beats: 1.0 Beta 1 (May 27, 2015), 1.0 Beta 2 (Jul 13, 2015), 1.0 Beta 3 (Sep 4, 2015)

Slide 10

Working beautifully together: es, kibana, ls, and beats all align on version 5.0 (with 6.0 and 7.0 to follow)

Slide 11

The Beat(le)s: 30+ other community Beats shipping

Slide 12

Topbeat → Metricbeat

Slide 13

Demo: Metricbeat

Slide 14

Metricbeat: container monitoring

Slide 15

Reading cgroup data from /proc/
• Doesn’t require access to the Docker API (can be a security issue)
• Works for any container runtime (Docker, rkt, runC, LXD, etc.)
• Automatically enhances process data with cgroup information
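To make the /proc-based approach concrete, here is a minimal Python sketch of parsing a process's cgroup memberships from `/proc/<pid>/cgroup` (Metricbeat itself is written in Go; this only illustrates the data format, and the sample paths are hypothetical):

```python
def parse_proc_cgroup(text):
    """Parse the contents of /proc/<pid>/cgroup into {controller: cgroup_path}.
    Each cgroup-v1 line has the form 'hierarchy-id:controller-list:path'."""
    paths = {}
    for line in text.strip().splitlines():
        _, controllers, path = line.split(":", 2)
        for controller in controllers.split(","):
            if controller:  # skip the empty controller name of the v2 unified line
                paths[controller] = path
    return paths

# Example usage: read the current process's cgroup memberships.
# with open("/proc/self/cgroup") as f:
#     print(parse_proc_cgroup(f.read()))
```

A container-aware monitor can match the path (e.g. `/docker/<id>`) against known container IDs without ever talking to the container runtime.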

Slide 16

Querying the Docker API
• Dedicated Docker module
• Has access to container names and labels
• Somewhat easier to set up

Slide 17

Run as a container
[Diagram: App1, App2, App3 containers running on a Host]

Slide 18

Metricbeat: Elasticsearch as a time series DB

Slide 19

#velo
Elasticsearch BKD trees
• Added for geo-points
• Faster to index
• Faster to query
• More disk-efficient
• More memory-efficient

Slide 20

Float values
[Chart: on-disk usage in kb (points and doc_values) for float, half_float, scaled_float (factor = 4000), and scaled_float (factor = 100)]
• half floats
• scaled floats: great for things like percentage points
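A hedged sketch of how these types appear in an index mapping (the index layout and field names are made up for illustration; in Elasticsearch 5.x the properties sit under a mapping type, here called `doc`):

```python
# Hypothetical mapping using half_float and scaled_float.
metrics_mapping = {
    "mappings": {
        "doc": {
            "properties": {
                # half_float: 16-bit float, enough precision for coarse gauges
                "system.load.1": {"type": "half_float"},
                # scaled_float: stored internally as a long multiplied by a
                # fixed scaling factor; factor 100 keeps two decimal places,
                # which suits percentages
                "system.cpu.user.pct": {
                    "type": "scaled_float",
                    "scaling_factor": 100,
                },
            }
        }
    }
}
```

Because a scaled_float is stored as an integer, it compresses much better on disk than a full 32-bit float, which is what the chart above measures.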

Slide 21

Why Elasticsearch for time series
• Horizontal scalability; mature and battle-tested cluster support
• Flexible aggregations (incl. moving averages & Holt-Winters)
• One system for both logs and metrics
• Timelion UI, Grafana
• Great ecosystem: e.g. alerting tools

Slide 22

Filebeat

Slide 23

Filebeat
• Supports multiple file rotation strategies
• “At least once” guarantees, handles backpressure
• Extra powers:
  • Multiline
  • Filtering
  • JSON decoding
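The "extra powers" are configured per prospector. A hedged Filebeat 5.x config sketch (the log path and patterns are placeholders; in practice you would pick only the options that match your log format, e.g. multiline for plain-text stack traces or JSON decoding for structured logs, not usually both):

```yaml
filebeat.prospectors:
  - input_type: log
    paths:
      - /var/log/myapp/*.log        # hypothetical path
    # Multiline: lines starting with whitespace (e.g. stack trace
    # continuations) are appended to the previous line
    multiline.pattern: '^\s'
    multiline.negate: false
    multiline.match: after
    # Filtering: drop debug chatter at the source
    exclude_lines: ['^DEBUG']
    # JSON decoding: parse each line as JSON into top-level fields
    json.keys_under_root: true
    json.add_error_key: true
```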

Slide 24

Filebeat: backpressure handling

Slide 25

Why backpressure is key

Slide 26

Synchronous sending
[Diagram: a stream of log lines is read in batches; each batch of messages is sent and acked, and the acked read position is recorded in a registry file]

Slide 27

When things go wrong
[Diagram: a batch can be acknowledged partially: ack 0 (still alive), ack 50%, ack 100%; only acked lines are marked as read]

Slide 28

This means…
• Filebeat automatically adapts its speed to as much as the next stage can ingest
• Similar to the “pull” model
• But: be aware of this when benchmarking
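The send/ack loop from the diagrams can be modeled in a few lines. This is a toy Python sketch, not Filebeat's actual Go implementation: the reader only fetches the next batch after the previous one is fully acknowledged, so a slow output throttles the reader automatically:

```python
import time

class SlowOutput:
    """Stands in for Logstash/Elasticsearch: acks each batch after a delay."""
    def __init__(self, delay=0.0):
        self.delay = delay
        self.received = []

    def send(self, batch):
        time.sleep(self.delay)      # simulates a slow next stage
        self.received.extend(batch)
        return len(batch)           # ack: number of events accepted

def ship(lines, output, registry, batch_size=100):
    """Read a batch, send it, wait for the ack, persist progress, repeat.
    Because the next read happens only after the previous ack, the reader
    never outruns the output: that is the backpressure."""
    while registry["offset"] < len(lines):
        start = registry["offset"]
        batch = lines[start:start + batch_size]
        acked = output.send(batch)
        registry["offset"] = start + acked  # only acked lines are marked read
```

If the process dies mid-batch, the registry still points at the last acked line, so the unacked batch is simply re-read and re-sent on restart, which is exactly the "at least once" behavior discussed below.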

Slide 29

When the next stage is down…
• Filebeat patiently waits
• Log lines are not lost
• It doesn’t allocate memory, it doesn’t buffer things on disk

Slide 30

Using an intermediary queue

Slide 31

Filebeat: “at least once” guarantees

Slide 32

No such thing as “exactly once”

Slide 33

[Diagram: a batch of messages is sent and acked; when an ack is lost, the same batch is sent again and acked, producing duplicates!]

Slide 34

Potential strategy to reduce dupes
• Filebeat generates a UUID for each log line
• When indexing to Elasticsearch, use the create API
• Deduplication happens in Elasticsearch
• But:
  ✦ Duplicates can still happen on Filebeat crashes
  ✦ Performance penalty at index time
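A sketch of the UUID-plus-create idea, building an Elasticsearch `_bulk` request body in Python (index and type names are placeholders; note that for the dedupe to work across retries, the real strategy must generate the UUID once at read time and re-use it on re-sends, otherwise duplicates return):

```python
import json
import uuid

def bulk_create_body(lines, index="logs", doc_type="log"):
    """Build a newline-delimited _bulk body using the 'create' op with one
    UUID per line. Re-sending the same body makes Elasticsearch reject the
    duplicate _ids instead of indexing the documents a second time."""
    parts = []
    for line in lines:
        action = {"create": {"_index": index, "_type": doc_type,
                             "_id": str(uuid.uuid4())}}
        parts.append(json.dumps(action))            # action line
        parts.append(json.dumps({"message": line})) # document line
    return "\n".join(parts) + "\n"
```

The index-time penalty comes from `create` having to check whether each `_id` already exists before writing.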

Slide 35

Filebeat: JSON decoding + structured logging

Slide 36

Classic logging

    logging.debug("User '{}' (id: {}) successfully logged in. Session id: {}"
                  .format(user["name"], user["id"], session_id))

which results in:

    DEBUG:root:User 'arthur' (id: 42) successfully logged in. Session id: 91e5b9d

Slide 37

Structured logging
• Use a logging library that allows code like:

    log = log.bind(user='arthur', id=42, verified=False)
    log.msg('logged_in')

• Which creates log lines like:

    {"verified": false, "user": "arthur", "session_id": "91e5b9d", "id": 42, "event": "logged_in"}
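The bind/msg style above resembles the structlog library; as a self-contained illustration of the same idea (this is a minimal pure-Python sketch, not the real library's API), a bound logger just carries a context dict and emits one JSON object per event:

```python
import json

class BoundLogger:
    """Minimal structured logger sketch: bind() returns a new logger
    carrying extra key/value context; msg() emits one JSON line."""
    def __init__(self, context=None):
        self._context = dict(context or {})

    def bind(self, **kwargs):
        # Immutable-style: return a new logger with the merged context
        return BoundLogger({**self._context, **kwargs})

    def msg(self, event):
        line = json.dumps({"event": event, **self._context}, sort_keys=True)
        print(line)
        return line

log = BoundLogger().bind(user="arthur", id=42, verified=False)
log.msg("logged_in")
```

Each emitted line is already valid JSON, so Filebeat's JSON decoding can turn it straight into searchable fields with no grok-style parsing.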

Slide 38

Ask Me (Almost) Anything Tudor Golubenco / @tudor_g