Slide 1

Slide 1 text

Don’t Fly Blind
Logging and Metrics in Microservice Architectures
Tammo van Lessen
Alexander Heusingfeld
#microxchg #logging #metrics
www.innoQ.com

Slide 2

Slide 2 text

The Talk Today
> Motivation
> Distributed Logging
> Distributed Metrics
> Conclusions

Slide 3

Slide 3 text

Breaking the monolith

Slide 4

Slide 4 text

If you review a monolithic application … © innoQ/Roman Stranghöner

Slide 5

Slide 5 text

…and look into the black box… © innoQ/Roman Stranghöner

Slide 6

Slide 6 text

…you’ll find it consists of multiple Bounded Contexts. © innoQ/Roman Stranghöner

Slide 7

Slide 7 text

If you’re able to treat every Bounded Context as a separately deployable, independent component… © innoQ/Roman Stranghöner

Slide 8

Slide 8 text

… you’ll have a self-contained system, which can lead to a microservice architecture.
Introduction to self-contained systems: https://www.innoq.com/de/links/self-contained-systems-infodeck/

Slide 9

Slide 9 text

A Broken Monolith

Slide 10

Slide 10 text

Architectural Decisions
> Domain Architecture
> Macro Architecture
> Micro Architecture

Slide 11

Slide 11 text

Logging in a Distributed Environment

Slide 12

Slide 12 text

Requirements
> Apply a well-thought-out logging concept
> Aggregate logs in different formats from different systems
> Search & Correlate
> Visualize & Drill-down
> Alerting

Slide 13

Slide 13 text

Use Thread Contexts / MDCs

Code:
  ThreadContext.put("loginId", login);
  logger.error("Something bad happened!");
  ThreadContext.clear();

+ Layout:
  %-5p: [%X{loginId}] %m%n

Log:
  ERROR: [John Doe] Something bad happened!
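To make the mechanism behind a thread context more tangible, here is a minimal stdlib-only sketch. It is not Log4j 2's implementation: `SimpleThreadContext` and its `format` helper are hypothetical names that only mimic what `ThreadContext` and the `%X{loginId}` layout do on the slide.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a logging framework's thread context (MDC).
// Log4j 2 ships this as ThreadContext; this class only illustrates the idea.
final class SimpleThreadContext {
    // One map per thread, so concurrent requests never see each other's values.
    private static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    static void put(String key, String value) {
        CONTEXT.get().put(key, value);
    }

    // Missing keys render as an empty string.
    static String get(String key) {
        return CONTEXT.get().getOrDefault(key, "");
    }

    static void clear() {
        CONTEXT.get().clear();
    }

    // Renders a line roughly the way the layout "%-5p: [%X{loginId}] %m" would.
    static String format(String level, String message) {
        return String.format("%-5s: [%s] %s", level, get("loginId"), message);
    }

    public static void main(String[] args) {
        put("loginId", "John Doe");
        try {
            System.out.println(format("ERROR", "Something bad happened!"));
        } finally {
            clear(); // pooled threads are reused, so stale values must not leak
        }
    }
}
```

Clearing in a `finally` block matters in servers with thread pools: without it, the next request handled by the same thread would log under the previous user's `loginId`.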

Slide 14

Slide 14 text

Use Thread Contexts / MDCs

Code:
  ThreadContext.put("loginId", login);
  logger.error("Something bad happened!");
  ThreadContext.clear();

+ JSON Layout

Log:
  {
    "@version" => "1",
    "@timestamp" => "2014-04-29T14:21:14.988-07:00",
    "logger" => "com.example.LogStashExampleTest",
    "level" => "ERROR",
    "thread" => "Test worker",
    "message" => "Something bad happened!",
    "Properties" => {
      "loginId" => "John Doe"
    }
  }
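The point of a JSON layout is that context values become separate, searchable fields instead of being buried in the message text. The following sketch shows that idea with plain Java. `JsonLogLine` is a hypothetical helper, not the Log4j 2 JSON layout, and its escaping is deliberately minimal.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: renders a log event plus its context map as one JSON line.
final class JsonLogLine {
    // Minimal escaping (backslash and double quote only); a real encoder
    // must also handle control characters.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    static String render(String level, String message, Map<String, String> context) {
        StringBuilder props = new StringBuilder();
        for (Map.Entry<String, String> e : context.entrySet()) {
            if (props.length() > 0) {
                props.append(",");
            }
            props.append(quote(e.getKey())).append(":").append(quote(e.getValue()));
        }
        return "{" + quote("level") + ":" + quote(level) + ","
                + quote("message") + ":" + quote(message) + ","
                + quote("Properties") + ":{" + props + "}}";
    }

    public static void main(String[] args) {
        Map<String, String> context = new LinkedHashMap<>();
        context.put("loginId", "John Doe");
        System.out.println(render("ERROR", "Something bad happened!", context));
    }
}
```

Because `loginId` ends up as its own field, an aggregator can filter on it directly rather than parsing the message with a regular expression.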

Slide 15

Slide 15 text

Define QoS for Log Messages
> Log messages may have different QoS
> Use Markers and Filters to enable fine-grained routing of messages to dedicated appenders
> Use Filters and Lookups to dynamically configure logging
https://www.innoq.com/en/blog/per-request-debugging-with-log4j2/
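One way to picture marker-based routing with different QoS is the sketch below. It assumes two made-up markers: `AUDIT` messages must never be dropped, while `DIAGNOSTIC` messages go to a bounded buffer that discards the oldest entry when full. `QosLogRouter` is a hypothetical class; in Log4j 2 the same effect would be configured with markers, filters, and appenders rather than hand-written code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical router: the marker decides which "appender" receives a message.
final class QosLogRouter {
    enum Marker { AUDIT, DIAGNOSTIC } // assumed QoS classes, not from the talk

    private final List<String> auditAppender = new ArrayList<>();        // keeps everything
    private final Deque<String> diagnosticAppender = new ArrayDeque<>(); // bounded, lossy
    private final int diagnosticCapacity;

    QosLogRouter(int diagnosticCapacity) {
        if (diagnosticCapacity < 1) {
            throw new IllegalArgumentException("capacity must be at least 1");
        }
        this.diagnosticCapacity = diagnosticCapacity;
    }

    void log(Marker marker, String message) {
        if (marker == Marker.AUDIT) {
            auditAppender.add(message);
            return;
        }
        if (diagnosticAppender.size() == diagnosticCapacity) {
            diagnosticAppender.removeFirst(); // drop the oldest diagnostic message
        }
        diagnosticAppender.addLast(message);
    }

    List<String> audit() {
        return new ArrayList<>(auditAppender);
    }

    List<String> diagnostics() {
        return new ArrayList<>(diagnosticAppender);
    }
}
```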

Slide 16

Slide 16 text

Requirements
> Apply a well-thought-out logging concept
> Aggregate logs in different formats from different systems
> Search & Correlate
> Visualize & Drill-down
> Alerting

Slide 17

Slide 17 text

Logstash Architecture

Slide 18

Slide 18 text

Default ELK-Stack Setup
Shipper / Logstash Forwarder -> (Push) -> Storage & Search -> Visualize
https://www.elastic.co/products/logstash

Slide 19

Slide 19 text

Distributed Logstash Setup
Shipper / Logstash Forwarder -> (Push) -> Broker <- (Pull) <- Indexer -> Storage & Search -> Visualize
https://www.elastic.co/products/logstash
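The benefit of putting a broker between shipper and indexer is decoupling: shippers push and move on, while the indexer pulls at its own pace. The sketch below models this with an in-memory queue. `BrokeredPipeline` and its method names are hypothetical, and a real setup would use an external broker process instead of a `BlockingQueue`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical in-memory model of shipper -> broker -> indexer.
final class BrokeredPipeline {
    private final BlockingQueue<String> broker = new LinkedBlockingQueue<>();
    private final List<String> index = new ArrayList<>();

    // Shipper side: push the line to the broker and return immediately.
    void ship(String logLine) {
        broker.offer(logLine);
    }

    // Indexer side: pull at most batchSize lines, whenever the indexer is ready.
    int indexBatch(int batchSize) {
        List<String> batch = new ArrayList<>();
        broker.drainTo(batch, batchSize);
        index.addAll(batch);
        return batch.size();
    }

    // Lines buffered in the broker that the indexer has not pulled yet.
    int backlog() {
        return broker.size();
    }

    List<String> indexed() {
        return new ArrayList<>(index);
    }
}
```

A growing `backlog()` is itself a useful signal: it shows that the indexer cannot keep up, without any log line being lost on the shipper side.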

Slide 20

Slide 20 text

Requirements
> Apply a well-thought-out logging concept
> Aggregate logs in different formats from different systems
> Search & Correlate
> Visualize & Drill-down
> Alerting

Slide 21

Slide 21 text

No content

Slide 22

Slide 22 text

Requirements
> Apply a well-thought-out logging concept
> Aggregate logs in different formats from different systems
> Search & Correlate
> Visualize & Drill-down
> Alerting

Slide 23

Slide 23 text

Filter Log Stream For Alerts

input { … }

filter {
  if [message] =~ /.*(CRITICAL|FATAL|ERROR|EXCEPTION).*/ {
    mutate { add_tag => "alarm" }
  }
  if [message] =~ /.*(?i)ignoreme.*/ {
    mutate { remove_tag => "alarm" }
  }
}

output {
  if [type] == "production" {
    if "alarm" in [tags] {
      pagerduty {
        description => "%{host} - %{log_level}: %{log_message}"
        details => {
          "timestamp" => "%{@timestamp}"
          "host" => "%{host}"
          "log_level" => "%{log_level}"
          "message" => "%{log_message}"
          "path" => "%{path}"
        }
        …
      }
    }
  }
}
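The decision logic of this filter can be restated in Java to make the order of the two rules explicit: first tag, then untag, then page only for production events. `AlarmTagger` is a hypothetical class written for illustration and does not call Logstash or PagerDuty.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical restatement of the filter/output conditions above.
final class AlarmTagger {
    // Case-sensitive, like the first pattern in the config.
    private static final Pattern ALARM =
            Pattern.compile("CRITICAL|FATAL|ERROR|EXCEPTION");
    // Case-insensitive, like the (?i) pattern in the config.
    private static final Pattern IGNORE =
            Pattern.compile("ignoreme", Pattern.CASE_INSENSITIVE);

    static Set<String> tagsFor(String message) {
        Set<String> tags = new LinkedHashSet<>();
        if (ALARM.matcher(message).find()) {
            tags.add("alarm");
        }
        if (IGNORE.matcher(message).find()) {
            tags.remove("alarm"); // explicit opt-out wins over the alarm keywords
        }
        return tags;
    }

    // Mirrors the output section: page only for production events tagged "alarm".
    static boolean shouldPage(String type, String message) {
        return "production".equals(type) && tagsFor(message).contains("alarm");
    }
}
```

Note that keyword matching on the raw message is coarse: a message such as "no ERROR found" would still raise an alarm, which is one reason to match on a structured level field where one is available.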

Slide 24

Slide 24 text

Logging is cool… And I can use it to collect metrics as well, right? © http://www.flickr.com/photos/dkeats/3128150892/

Slide 25

Slide 25 text

Logging is cool… And I can use it to collect metrics as well, right? Watch out! © http://www.flickr.com/photos/dkeats/3128150892/

Slide 26

Slide 26 text

Metrics

Slide 27

Slide 27 text

Kinds of Metrics

Slide 28

Slide 28 text

Kinds of Metrics > Business Metrics

Slide 29

Slide 29 text

Kinds of Metrics > Business Metrics > Application Metrics

Slide 30

Slide 30 text

Kinds of Metrics > Business Metrics > Application Metrics > System Metrics

Slide 31

Slide 31 text

Why should a developer care?

Slide 32

Slide 32 text

No content

Slide 33

Slide 33 text

No content

Slide 34

Slide 34 text

Types of Metrics

Slide 35

Slide 35 text

Gauges
A gauge is an instrument that measures a value.
© https://secure.flickr.com/photos/profilerehab/4974589604/

Slide 36

Slide 36 text

Counters
A counter is a simple incrementing and decrementing integer.
© https://secure.flickr.com/photos/mwichary/2273099939/
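Gauges and counters are the two simplest metric types, and the difference is who owns the value. A gauge reads a value that lives elsewhere at the moment it is asked; a counter owns its value and is changed by the application. The class names below are hypothetical and only resemble what metrics libraries typically offer.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Hypothetical minimal versions of the two metric types.
final class GaugeAndCounter {
    // A counter owns its value; the application increments and decrements it.
    static final class Counter {
        private final AtomicLong count = new AtomicLong();

        void inc() { count.incrementAndGet(); }

        void dec() { count.decrementAndGet(); }

        long getCount() { return count.get(); }
    }

    // A gauge owns nothing; it samples some external value on demand.
    static final class Gauge<T> {
        private final Supplier<T> source;

        Gauge(Supplier<T> source) { this.source = source; }

        T getValue() { return source.get(); }
    }

    public static void main(String[] args) {
        Queue<String> jobs = new ArrayDeque<>();
        Gauge<Integer> queueDepth = new Gauge<>(jobs::size);
        jobs.add("job-1");
        System.out.println("queue depth: " + queueDepth.getValue());

        Counter activeRequests = new Counter();
        activeRequests.inc();
        System.out.println("active requests: " + activeRequests.getCount());
    }
}
```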

Slide 37

Slide 37 text

Meters
A meter measures the rate at which a set of events occurs.
© https://www.flickr.com/photos/springfieldhomer/1244320899
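A meter can be sketched as an event count divided by elapsed time. The version below computes only a mean rate since creation. Metrics libraries commonly add moving averages over recent time windows, which this sketch leaves out. `SimpleMeter` is a hypothetical name, and the clock is injected so that the behaviour is deterministic in tests.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical meter: counts events and reports the mean rate since creation.
final class SimpleMeter {
    private final AtomicLong count = new AtomicLong();
    private final LongSupplier nanoClock;
    private final long startNanos;

    // Pass System::nanoTime in production code, or a fake clock in tests.
    SimpleMeter(LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
        this.startNanos = nanoClock.getAsLong();
    }

    // Record that one event happened.
    void mark() {
        count.incrementAndGet();
    }

    long getCount() {
        return count.get();
    }

    // Events per second, averaged over the meter's whole lifetime.
    double meanRatePerSecond() {
        long elapsedNanos = nanoClock.getAsLong() - startNanos;
        if (elapsedNanos <= 0) {
            return 0.0;
        }
        return count.get() * 1_000_000_000.0 / elapsedNanos;
    }
}
```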

Slide 38

Slide 38 text

Histograms
A histogram measures the distribution of values.
© https://secure.flickr.com/photos/boulter/3998842325/
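What "distribution" buys you over an average is percentiles. The sketch below keeps every recorded value and answers percentile queries with the nearest-rank method. That is an assumption made for simplicity: production libraries usually sample values into a bounded reservoir instead of storing them all. `SimpleHistogram` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical histogram that stores all values (unbounded memory, sketch only).
final class SimpleHistogram {
    private final List<Long> values = new ArrayList<>();

    synchronized void update(long value) {
        values.add(value);
    }

    synchronized int getCount() {
        return values.size();
    }

    // Nearest-rank percentile; quantile must be in (0, 1], e.g. 0.5 for the median.
    synchronized long percentile(double quantile) {
        if (quantile <= 0.0 || quantile > 1.0) {
            throw new IllegalArgumentException("quantile must be in (0, 1]");
        }
        if (values.isEmpty()) {
            throw new IllegalStateException("no values recorded");
        }
        List<Long> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(quantile * sorted.size());
        return sorted.get(Math.max(rank, 1) - 1);
    }
}
```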

Slide 39

Slide 39 text

Timers
A timer is a histogram over a duration.
© https://secure.flickr.com/photos/psd/4686988937/
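A timer wraps a piece of work, measures how long it took, and feeds the duration into a histogram-like store. The sketch below records durations in a plain list and exposes count, mean, and maximum. `SimpleTimer` is a hypothetical name, and the injected clock is an assumption that keeps the example deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical timer: measures task durations and keeps them for later analysis.
final class SimpleTimer {
    private final LongSupplier nanoClock;
    private final List<Long> durationsNanos = new ArrayList<>();

    // Pass System::nanoTime in production code, or a fake clock in tests.
    SimpleTimer(LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
    }

    // Runs the task and records its duration, even if the task throws.
    void time(Runnable task) {
        long start = nanoClock.getAsLong();
        try {
            task.run();
        } finally {
            record(nanoClock.getAsLong() - start);
        }
    }

    private synchronized void record(long nanos) {
        durationsNanos.add(nanos);
    }

    synchronized int getCount() {
        return durationsNanos.size();
    }

    synchronized long maxNanos() {
        long max = 0L;
        for (long d : durationsNanos) {
            max = Math.max(max, d);
        }
        return max;
    }

    synchronized double meanNanos() {
        if (durationsNanos.isEmpty()) {
            return 0.0;
        }
        long sum = 0L;
        for (long d : durationsNanos) {
            sum += d;
        }
        return (double) sum / durationsNanos.size();
    }
}
```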

Slide 40

Slide 40 text

Distributed Metrics Architecture
Components: Measure, Collect & Sample, Store, Query & Graph, Anomaly Detection, Alerting, CEP, Dashboards

Slide 41

Slide 41 text

Grafana for Technicians © http://grafana.org/

Slide 42

Slide 42 text

Grafana for Technicians © http://grafana.org/

Slide 43

Slide 43 text

Dashing for Management Dashboards © https://shopify.github.io/dashing/

Slide 44

Slide 44 text

Push vs. Pull

Pull
+ producer unaware of target
+ multiple targets possible
+ flexible interval
- might miss short-lived services
- requires service-discovery

Push
+ event-based de-/registration
+ routable event stream
+ producer pushes when ready
- producer aware of target
- packet-loss might be missed
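The trade-off comes down to who knows whom, which the following sketch tries to make visible. In the pull model the collector holds a registry of producers (filled by service discovery) and samples them on its own schedule. In the push model each producer holds a reference to the collector and sends when it has something to report. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical collectors illustrating the two models.
final class PushVsPull {
    // Pull: the collector must know every producer, the producer knows nobody.
    static final class PullCollector {
        private final Map<String, LongSupplier> targets = new LinkedHashMap<>();

        // Stands in for service discovery.
        void register(String name, LongSupplier metric) {
            targets.put(name, metric);
        }

        // The collector decides when to sample; a service that started and
        // stopped between two scrapes is never seen.
        Map<String, Long> scrape() {
            Map<String, Long> sample = new LinkedHashMap<>();
            targets.forEach((name, metric) -> sample.put(name, metric.getAsLong()));
            return sample;
        }
    }

    // Push: the producer must know the collector and sends when ready.
    static final class PushCollector {
        private final List<String> received = new ArrayList<>();

        void accept(String name, long value) {
            received.add(name + "=" + value);
        }

        List<String> received() {
            return new ArrayList<>(received);
        }
    }
}
```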

Slide 45

Slide 45 text

Some Recommendations
> Think about what metrics are of importance for operating your application
> Consider retention policies
> Carefully design your dashboards
> Think about non-standard graph types
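A retention policy often means keeping recent data at full resolution and older data only as coarser aggregates. As a minimal sketch of that second step, the hypothetical `Downsampler` below averages consecutive raw samples into fixed-size buckets. The choice of the mean is an assumption; depending on the metric, max or a percentile may be the better aggregate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical roll-up step of a retention policy.
final class Downsampler {
    // Averages every bucketSize consecutive samples; a shorter final bucket
    // is averaged over the samples it actually contains.
    static List<Double> rollUp(List<Double> samples, int bucketSize) {
        if (bucketSize < 1) {
            throw new IllegalArgumentException("bucketSize must be at least 1");
        }
        List<Double> result = new ArrayList<>();
        for (int start = 0; start < samples.size(); start += bucketSize) {
            int end = Math.min(start + bucketSize, samples.size());
            double sum = 0.0;
            for (double value : samples.subList(start, end)) {
                sum += value;
            }
            result.add(sum / (end - start));
        }
        return result;
    }

    public static void main(String[] args) {
        // Six raw samples rolled up into buckets of three.
        System.out.println(rollUp(Arrays.asList(1.0, 2.0, 3.0, 4.0, 5.0, 6.0), 3));
    }
}
```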

Slide 46

Slide 46 text

Sample architecture

Slide 47

Slide 47 text

No content

Slide 48

Slide 48 text

Conclusions
> Create and document concepts for logging and metrics
> Collect & aggregate distributed logs and metrics
> Create dashboards tailored for your audience
> Correlate your data to make conscious decisions
> Don’t create your very own big data problem

Slide 49

Slide 49 text

Prevent the apocalypse! Logging shows events. Metrics show state. Don't fly blind! © http://www.flickr.com/photos/pasukaru76/5067879762

Slide 50

Slide 50 text

Thank you! Questions? Comments?

Tammo van Lessen | @taval
Alexander Heusingfeld | @goldstift

innoQ Deutschland GmbH, Krischerstr. 100, D-40789 Monheim am Rhein, Germany, Phone: +49 2173 3366-0
innoQ Schweiz GmbH, Gewerbestr. 11, CH-6330 Cham, Switzerland, Phone: +41 41 743 0116
Ohlauer Straße 43, D-10999 Berlin, Germany, Phone: +49 2173 3366-0
Ludwigstr. 180 E, D-63067 Offenbach, Germany, Phone: +49 2173 3366-0
Kreuzstr. 16, D-80331 München, Germany, Phone: +49 2173 3366-0

www.innoq.com
https://www.innoq.com/en/talks/