MicroXchg: Logging and Metrics in Microservice Architectures

Alexander Heusingfeld

February 05, 2016

Transcript

  1. Don’t Fly Blind
    Logging and Metrics in Microservice Architectures
    Tammo van Lessen | [email protected]
    Alexander Heusingfeld | [email protected]
    #microxchg #logging #metrics
    www.innoQ.com

  2. The Talk Today
    > Motivation
    > Distributed Logging
    > Distributed Metrics
    > Conclusions

  3. Breaking the monolith

  4. If you review a
    monolithic application …
    © innoQ/Roman Stranghöner

  5. …and look into the
    black box…
    © innoQ/Roman Stranghöner

  6. …you’ll find it consists
    of multiple Bounded
    Contexts.
    © innoQ/Roman Stranghöner

  7. If you’re able to treat every
    Bounded Context as a
    separately deployable,
    independent component…
    © innoQ/Roman Stranghöner

  8. … you’ll have a self-contained
    system - which can lead to a
    microservice architecture
    Introduction to self-contained systems: https://www.innoq.com/de/links/self-contained-systems-infodeck/

  9. A Broken Monolith

  10. Architectural Decisions
    > Domain Architecture
    > Macro Architecture
    > Micro Architecture

  11. Logging in a Distributed
    Environment

  12. Requirements
    > Apply a well-thought-out logging concept
    > Aggregate logs in different formats from
    different systems
    > Search & Correlate
    > Visualize & Drill-down
    > Alerting

  13. Use Thread Contexts / MDCs
    ThreadContext.put("loginId", login);
    logger.error("Something bad happened!");
    ThreadContext.clear();
    + Layout:
    %-5p: [%X{loginId}] %m%n
    Log:
    ERROR: [John Doe] Something bad happened!
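
How a thread context ends up in a log line can be sketched without any logging framework. The following is a minimal, stdlib-only illustration, not the Log4j 2 implementation: `MiniThreadContext` and its `format` method are hypothetical names, and the format string only approximates the `%-5p: [%X{loginId}] %m` layout from the slide.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a logging framework's thread context (MDC).
final class MiniThreadContext {
    // Each thread gets its own map, so values set while handling one
    // request are not visible to other threads.
    private static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    private MiniThreadContext() {}

    static void put(String key, String value) {
        CONTEXT.get().put(key, value);
    }

    static String get(String key) {
        return CONTEXT.get().getOrDefault(key, "");
    }

    static void clear() {
        CONTEXT.get().clear();
    }

    // Renders a line similar to the layout "%-5p: [%X{loginId}] %m":
    // level left-aligned and padded to 5 characters, then the context value.
    static String format(String level, String message) {
        return String.format("%-5s: [%s] %s", level, get("loginId"), message);
    }
}
```

In request-handling code, the `clear()` call usually belongs in a `finally` block, so that a pooled thread does not carry a stale `loginId` into the next request.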

  14. Use Thread Contexts / MDCs
    ThreadContext.put("loginId", login);
    logger.error("Something bad happened!");
    ThreadContext.clear();
    + JSON Layout
    Log:
    {
      "@version" => "1",
      "@timestamp" => "2014-04-29T14:21:14.988-07:00",
      "logger" => "com.example.LogStashExampleTest",
      "level" => "ERROR",
      "thread" => "Test worker",
      "message" => "Something bad happened!",
      "Properties" => {
        "loginId" => "John Doe"
      }
    }

  15. Define QoS for Log Messages
    > Log messages may have different QoS
    > Use Markers and Filters to enable fine-grained
    routing of messages to dedicated appenders
    > Use Filters and Lookups to dynamically
    configure logging
    https://www.innoq.com/en/blog/per-request-debugging-with-log4j2/
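
The idea of routing messages by marker can be illustrated with a small stdlib-only sketch. It is not the Log4j 2 API: `LogEvent`, `FilteredAppender`, `MarkerRouter` and the marker names are all hypothetical, chosen only to show how a filter per appender decides which messages it keeps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical log event carrying a marker that expresses its QoS class.
final class LogEvent {
    final String marker;   // e.g. "AUDIT"; marker names are assumptions
    final String message;

    LogEvent(String marker, String message) {
        this.marker = marker;
        this.message = message;
    }
}

// Hypothetical appender that only keeps events accepted by its filter.
final class FilteredAppender {
    final String name;
    final List<String> written = new ArrayList<>();
    private final Predicate<LogEvent> filter;

    FilteredAppender(String name, Predicate<LogEvent> filter) {
        this.name = name;
        this.filter = filter;
    }

    void append(LogEvent event) {
        if (filter.test(event)) {
            written.add(event.message);
        }
    }
}

// Offers every event to every registered appender; the filters do the routing.
final class MarkerRouter {
    private final List<FilteredAppender> appenders = new ArrayList<>();

    void register(FilteredAppender appender) {
        appenders.add(appender);
    }

    void log(LogEvent event) {
        for (FilteredAppender appender : appenders) {
            appender.append(event);
        }
    }
}
```

With this shape, an audit appender (for example one backed by a reliable transport) can be registered with a filter such as `event -> "AUDIT".equals(event.marker)`, while a best-effort appender takes everything else.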

  16. Requirements
    > Apply a well-thought-out logging concept
    > Aggregate logs in different formats from
    different systems
    > Search & Correlate
    > Visualize & Drill-down
    > Alerting

  17. Logstash Architecture

  18. Default ELK-Stack Setup
    Diagram stages: Shipper / Logstash Forwarder, Storage & Search, Visualize
    Diagram label: Push
    https://www.elastic.co/products/logstash

  19. Distributed Logstash Setup
    Diagram stages: Shipper / Logstash Forwarder, Broker, Indexer, Storage & Search, Visualize
    Diagram labels: Push, Pull
    https://www.elastic.co/products/logstash

  20. Requirements
    > Apply a well-thought-out logging concept
    > Aggregate logs in different formats from
    different systems
    > Search & Correlate
    > Visualize & Drill-down
    > Alerting

  21. Requirements
    > Apply a well-thought-out logging concept
    > Aggregate logs in different formats from
    different systems
    > Search & Correlate
    > Visualize & Drill-down
    > Alerting

  22. Filter Log Stream For Alerts
    input {

    }
    filter {
      if [message] =~ /.*(CRITICAL|FATAL|ERROR|EXCEPTION).*/ {
        mutate { add_tag => "alarm" }
      }
      if [message] =~ /.*(?i)ignoreme.*/ {
        mutate { remove_tag => "alarm" }
      }
    }
    output {
      if [type] == "production" {
        if "alarm" in [tags] {
          pagerduty {
            description => "%{host} - %{log_level}: %{log_message}"
            details => {
              "timestamp" => "%{@timestamp}"
              "host" => "%{host}"
              "log_level" => "%{log_level}"
              "message" => "%{log_message}"
              "path" => "%{path}"
            }
          }
        }
      }
    }
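
The tagging rules of the filter above can be traced in plain Java to see which messages would page someone. This is only a sketch of the decision logic, not of Logstash itself; `AlarmTagger`, `tagsFor` and `shouldPage` are hypothetical names.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical re-statement of the alarm rules from the Logstash filter.
final class AlarmTagger {
    // Same keywords as the filter; find() makes the surrounding ".*" unnecessary.
    private static final Pattern ALARM =
            Pattern.compile("CRITICAL|FATAL|ERROR|EXCEPTION");
    // Case-insensitive opt-out, mirroring the "(?i)ignoreme" rule.
    private static final Pattern IGNORE =
            Pattern.compile("ignoreme", Pattern.CASE_INSENSITIVE);

    private AlarmTagger() {}

    // First rule adds the "alarm" tag, second rule removes it again.
    static Set<String> tagsFor(String message) {
        Set<String> tags = new LinkedHashSet<>();
        if (ALARM.matcher(message).find()) {
            tags.add("alarm");
        }
        if (IGNORE.matcher(message).find()) {
            tags.remove("alarm");
        }
        return tags;
    }

    // Mirrors the output section: only tagged production events are paged.
    static boolean shouldPage(String type, String message) {
        return "production".equals(type) && tagsFor(message).contains("alarm");
    }
}
```

Note that the rule order matters: because the opt-out runs after the keyword match, a message containing both `ERROR` and `ignoreme` never raises an alert.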

  23. Logging is cool…
    And I can use it to collect metrics as well, right?
    © http://www.flickr.com/photos/dkeats/3128150892/

  24. Logging is cool…
    And I can use it to collect metrics as well, right?
    Watch out!
    © http://www.flickr.com/photos/dkeats/3128150892/

  25. Kinds of Metrics

  26. Kinds of Metrics
    > Business Metrics

  27. Kinds of Metrics
    > Business Metrics
    > Application Metrics

  28. Kinds of Metrics
    > Business Metrics
    > Application Metrics
    > System Metrics

  29. Why should a developer care?

  30. Types of Metrics

  31. Gauges
    A gauge is an instrument that measures
    a value.
    © https://secure.flickr.com/photos/profilerehab/4974589604/
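
A gauge can be sketched as a thin wrapper around a read function. The class below is a hypothetical, stdlib-only illustration (the name `SketchGauge` is an assumption, not a library type): it stores nothing itself and reads the current value each time it is asked.

```java
import java.util.function.Supplier;

// Hypothetical gauge: holds no state of its own and reads the
// measured value from its source on demand.
final class SketchGauge<T> {
    private final Supplier<T> reader;

    SketchGauge(Supplier<T> reader) {
        this.reader = reader;
    }

    // Returns whatever the source reports at the moment of the call.
    T value() {
        return reader.get();
    }
}
```

A typical use would be something like `new SketchGauge<>(queue::size)` to expose the current depth of a work queue.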

  32. Counters
    A counter is a simple incrementing and
    decrementing integer.
    © https://secure.flickr.com/photos/mwichary/2273099939/
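
As a minimal sketch of a counter, assuming nothing beyond the JDK: `SketchCounter` is a hypothetical name, and `LongAdder` is used here because it tolerates many threads incrementing at once.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical counter: an integer that can go up and down.
final class SketchCounter {
    private final LongAdder count = new LongAdder();

    void inc() {
        count.increment();
    }

    void dec() {
        count.decrement();
    }

    // Current value; under concurrent updates this is a snapshot.
    long count() {
        return count.sum();
    }
}
```

Calling `inc()` when a request starts and `dec()` when it finishes would, for example, track the number of requests currently in flight.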

  33. Meters
    A meter measures the rate at which a
    set of events occurs.
    © https://www.flickr.com/photos/springfieldhomer/1244320899
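
One simple way to turn event counts into a rate is to divide by the elapsed time. The sketch below computes only a mean rate since creation; `SketchMeter` is a hypothetical class, and the injectable clock is an assumption made so the behaviour can be checked deterministically.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.function.LongSupplier;

// Hypothetical meter: counts events and reports events per second.
final class SketchMeter {
    private final LongAdder events = new LongAdder();
    private final LongSupplier nanoClock;
    private final long startNanos;

    // The clock is injectable for testing; System::nanoTime fits in production.
    SketchMeter(LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
        this.startNanos = nanoClock.getAsLong();
    }

    // Records that one event happened.
    void mark() {
        events.increment();
    }

    // Mean rate in events per second since the meter was created.
    double meanRate() {
        long elapsedNanos = nanoClock.getAsLong() - startNanos;
        if (elapsedNanos <= 0) {
            return 0.0;
        }
        return events.sum() / (elapsedNanos / 1_000_000_000.0);
    }
}
```

A mean over the whole lifetime reacts slowly to recent changes, which is why a moving average over a recent window is often the more useful figure on a dashboard.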

  34. Histograms
    A histogram measures the distribution
    of values.
    © https://secure.flickr.com/photos/boulter/3998842325/
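
To make "distribution of values" concrete, here is a small sketch that keeps every recorded value and answers quantile queries with the nearest-rank method. `SketchHistogram` is a hypothetical class; keeping all values is an assumption that only suits small data sets, since long-running services would need some form of sampling or bucketing.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical histogram: stores all values and reports quantiles.
final class SketchHistogram {
    private final List<Long> values = new ArrayList<>();

    synchronized void update(long value) {
        values.add(value);
    }

    // Nearest-rank quantile: q = 0.5 is the median, q = 1.0 the maximum.
    synchronized long quantile(double q) {
        if (q <= 0.0 || q > 1.0) {
            throw new IllegalArgumentException("q must be in (0, 1]");
        }
        if (values.isEmpty()) {
            throw new IllegalStateException("no values recorded");
        }
        List<Long> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        // Rank is 1-based, so subtract 1 to get the list index.
        int rank = (int) Math.ceil(q * sorted.size());
        return sorted.get(Math.max(rank, 1) - 1);
    }
}
```

Quantiles such as the median or a high percentile tend to say more about user-visible behaviour than an average, because a few extreme values do not dominate them.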

  35. Timers
    A timer is a histogram over a duration.
    © https://secure.flickr.com/photos/psd/4686988937/
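
A timer can be sketched as a wrapper that measures how long a task takes and records that duration. `SketchTimer` is a hypothetical class; it keeps only a list of durations and a few summary methods, and the injectable clock is an assumption made for deterministic checks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical timer: records the duration of each timed task.
final class SketchTimer {
    private final List<Long> durationsNanos = new ArrayList<>();
    private final LongSupplier nanoClock;

    // The clock is injectable for testing; System::nanoTime fits in production.
    SketchTimer(LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
    }

    // Runs the task and records its duration, even if the task throws.
    void time(Runnable task) {
        long start = nanoClock.getAsLong();
        try {
            task.run();
        } finally {
            record(nanoClock.getAsLong() - start);
        }
    }

    synchronized void record(long nanos) {
        durationsNanos.add(nanos);
    }

    synchronized int count() {
        return durationsNanos.size();
    }

    synchronized long maxNanos() {
        return durationsNanos.stream().mapToLong(Long::longValue).max().orElse(0L);
    }

    synchronized double meanNanos() {
        return durationsNanos.stream().mapToLong(Long::longValue).average().orElse(0.0);
    }
}
```

Recording in a `finally` block means failed calls are timed as well, which matters when slow failures such as timeouts are exactly what needs to be seen.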

  36. Distributed Metrics Architecture
    Measure
    Collect & Sample
    Store
    Query & Graph
    Anomaly Detection
    Alerting
    CEP
    Dashboards

  37. Grafana for Technicians
    © http://grafana.org/

  38. Grafana for Technicians
    © http://grafana.org/

  39. Dashing for Management Dashboards
    © https://shopify.github.io/dashing/

  40. Push vs. Pull
    Push
    + event-based de-/registration
    + routable event stream
    + producer pushes when ready
    - producer aware of target
    - packet-loss might be missed
    Pull
    + producer unaware of target
    + multiple targets possible
    + flexible interval
    - might miss short-lived services
    - requires service-discovery
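
The difference between pushing and pulling metrics comes down to which side holds the reference to the other. The sketch below shows both directions with in-memory stand-ins; `PushingProducer` and `PullingCollector` are hypothetical names, and no real transport or service discovery is involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

// Push: the producer knows its target and sends a value whenever it is ready.
final class PushingProducer {
    private final Consumer<Long> target;

    PushingProducer(Consumer<Long> target) {
        this.target = target;
    }

    void report(long value) {
        target.accept(value);
    }
}

// Pull: the collector knows the producers and reads them on its own schedule.
final class PullingCollector {
    private final List<LongSupplier> producers = new ArrayList<>();
    final List<Long> samples = new ArrayList<>();

    // Stands in for service discovery: the collector must learn about producers.
    void discover(LongSupplier producer) {
        producers.add(producer);
    }

    // One scrape interval: read the current value of every known producer.
    void scrapeOnce() {
        for (LongSupplier producer : producers) {
            samples.add(producer.getAsLong());
        }
    }
}
```

In the pull variant, anything that happens between two `scrapeOnce()` calls is invisible to the collector, which is how short-lived services can be missed; in the push variant, the producer needs the target's address, and a value lost in transit may go unnoticed.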

  41. Some Recommendations
    > Think about which metrics matter
    for operating your application
    > Consider retention policies
    > Carefully design your dashboards
    > Think about non-standard graph types

  42. Sample architecture

  43. Conclusions
    > Create and document concepts for logging and metrics
    > Collect & aggregate distributed logs and metrics
    > Create dashboards tailored for your audience
    > Correlate your data to make conscious decisions
    > Don’t create your very own big data problem

  44. Prevent the apocalypse!
    Logging shows events.
    Metrics show state.
    Don't fly blind!
    © http://www.flickr.com/photos/pasukaru76/5067879762

  45. Tammo van Lessen | @taval
    [email protected]
    Alexander Heusingfeld | @goldstift
    [email protected]
    Thank you!
    Questions?
    Comments?
    innoQ Deutschland GmbH
    Krischerstr. 100
    D-40789 Monheim am Rhein
    Germany
    Phone: +49 2173 3366-0
    innoQ Schweiz GmbH
    Gewerbestr. 11
    CH-6330 Cham
    Switzerland
    Phone: +41 41 743 0116
    www.innoq.com
    Ohlauer Straße 43
    D-10999 Berlin
    Germany
    Phone: +49 2173 3366-0
    Ludwigstr. 180 E
    D-63067 Offenbach
    Germany
    Phone: +49 2173 3366-0
    Kreuzstr. 16
    D-80331 München
    Germany
    Phone: +49 2173 3366-0
    https://www.innoq.com/en/talks/
