Logging in the age of Microservices and the Cloud

Axel Fontaine
September 13, 2018

The days of the statically partitioned datacenter are over. Welcome to the modern world of microservices and auto-scaling in the cloud. Requests flow through multiple services, individual services are auto-scaled and machines are short-lived.

This is a brave new world and it is time to change the way we design and architect our software to better deal with it. In this talk we'll look at logging and we'll take a deep dive into the challenges involved in moving from the old "SSH and tail -f" world to a world of centralized and structured logs, consumable both by humans and machines.

This session is for developers and architects looking for battle-tested solutions to implement effective logging for microservices in an auto-scaling world.

Transcript

  1. Logging in the age of Microservices and the Cloud
    @axelfontaine

  2. Axel Fontaine
    @axelfontaine
    flywaydb.org
    boxfuse.com

  3. POLL:
    what type of infrastructure are you running on?
    • On Premise
    • Cloud

  4. The (good) old days of logging …

  5. LOG file
    ssh me@myserver
    tail -f server.log

  6. Looks great!

  7. Thanks!
    @axelfontaine
    boxfuse.com

  8. LOG file
    ssh me@myserver
    tail -f server.log

  9. Times have changed …

  10. The new reality
    Cloud
    Microservices

  11. But first, back to the fundamental question...

  12. Why are we logging?
    Postmortem analysis of user activity and programming errors
    Powerful debugging tool
    Should contain answers to important questions:
    What? Who? Where? When?

  13. What?
    Who?
    Where?
    When?

  14. What? Message, Code, Severity
    Who?
    Where?
    When?

  15. What? Message, Code, Severity
    Who? Account, User, Session, Request
    Where?
    When?

  16. What? Message, Code, Severity
    Who? Account, User, Session, Request
    Where? App, Module, Class
    When?

  17. What? Message, Code, Severity
    Who? Account, User, Session, Request
    Where? App, Module, Class
    When? Timestamp, Hostname, PID, Thread
    How can these questions be asked?
    How can all this information be captured?

  18. Capturing log info

  19. Logging framework architecture
    [Diagram: Your Code → Logger → Appender A → Storage A, and Logger → Appender B → Storage B]

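The architecture on this slide can be sketched in a few lines of Java. This is a minimal illustration, not the API of any real logging framework: `SketchLogger`, `SketchAppender`, `MemoryAppender` and `ConsoleAppender` are hypothetical names invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; real frameworks offer far richer APIs.
interface SketchAppender {
    void append(String level, String message);
}

// Appender A: keeps entries in memory (a stand-in for "Storage A").
class MemoryAppender implements SketchAppender {
    final List<String> storage = new ArrayList<>();

    @Override
    public void append(String level, String message) {
        storage.add(level + " " + message);
    }
}

// Appender B: writes to standard output (a stand-in for "Storage B").
class ConsoleAppender implements SketchAppender {
    @Override
    public void append(String level, String message) {
        System.out.println(level + " " + message);
    }
}

// Application code only talks to the logger, which fans each entry out to every appender.
class SketchLogger {
    private final List<SketchAppender> appenders = new ArrayList<>();

    void addAppender(SketchAppender appender) {
        appenders.add(appender);
    }

    void info(String message) {
        for (SketchAppender appender : appenders) {
            appender.append("INFO", message);
        }
    }
}
```

The point of the split is that the calling code never knows where entries end up; storage can change by swapping appenders.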

  20. logger.info("my log message");
    What? Message, Code, Severity
    Who? Account, User, Session, Request
    Where? App, Module, Class
    When? Timestamp, Hostname, PID, Thread

  21. logger.info("my log message");
    What? Message, Code, Severity
    Who? Account, User, Session, Request
    Where? App, Module, Class
    When? Timestamp, Hostname, PID, Thread

  22. logger.info("my log message");
    What? Message, Code, Severity
    Who? Account, User, Session, Request
    Where? App, Module, Class
    When? Timestamp, Hostname, PID, Thread

  23. [Diagram: Your Code → Logger → Appender A → Storage A, and Logger → Appender B → Storage B, now with an MDC]
    MDC: Mapped Diagnostic Context
    (Thread-local temporary key-value store)

  24. MDC.put("account", "company ABC");
    MDC.put("user", "user123");
    What? Message, Code, Severity
    Who? Account, User, Session, Request
    Where? App, Module, Class
    When? Timestamp, Hostname, PID, Thread

  25. MDC.put("account", "company ABC");
    MDC.put("user", "user123");

    logger.info("my log message");
    What? Message, Code, Severity
    Who? Account, User, Session, Request
    Where? App, Module, Class
    When? Timestamp, Hostname, PID, Thread

  26. MDC.put("account", "company ABC");
    MDC.put("user", "user123");
    Populate when:
    ✓ a request enters the application
    ✓ a message is received from a queue
    ✓ a cron task starts
    ✓ making an async call to another thread
    And don’t forget to clear when done!
    (Threadpools reuse threads!)

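The populate-then-clear rule can be sketched as follows. To keep the example self-contained it uses `SketchMdc`, a hypothetical thread-local stand-in for a real MDC, and a hypothetical `RequestHandler`; with an actual logging framework you would call its own MDC class instead.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a framework MDC: a thread-local key-value store.
class SketchMdc {
    private static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    static void put(String key, String value) { CONTEXT.get().put(key, value); }
    static String get(String key) { return CONTEXT.get().get(key); }
    static void clear() { CONTEXT.get().clear(); }
}

class RequestHandler {
    // Populate the context when the request enters and always clear it on the way out.
    static String handle(String account, String user) {
        SketchMdc.put("account", account);
        SketchMdc.put("user", user);
        try {
            return "[" + SketchMdc.get("account") + "/" + SketchMdc.get("user") + "] my log message";
        } finally {
            SketchMdc.clear(); // the pooled thread will serve someone else next
        }
    }

    // Runs a request on a single pooled thread, then checks what the next task on that thread sees.
    static String leftoverAfterRequest() {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            pool.submit(() -> handle("company ABC", "user123")).get();
            Future<String> leftover = pool.submit(() -> SketchMdc.get("user"));
            return leftover.get(); // null, because handle() cleared the context
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Without the `finally` block, the second task would still see `user123`, which is exactly the thread-pool reuse problem the slide warns about.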

  27. Querying the logs

  28. Truncation!
    Compression! Single line messages!
    No MDC info!

  29. Decoupling log storage from log representation
    [Diagram: Your Code + MDC → Logger → Appender (FORMAT) → Storage (formatted) → Log Viewer (READ)]

  30. Structured logging
    [Diagram: Your Code + MDC → Logger → Appender → Storage (raw) → Log Viewer (READ & FORMAT)]

  31. {
    "account": "axelfontaine",
    "image": "axelfontaine/xyz:543",
    "instance": "i-0d843d5af9b366a69",
    "level": "INFO",
    "logger": "com.myapp.task.TaskService",
    "message": "Successfully killed axelfontaine/demo in prod",
    "request": "crq-7R2CVPUMKREUFLMQUE3XB7JWCX",
    "session": "cli-CRFM2IPABRFUJD7KTDYVDVXABX",
    "thread": "Thread-18710",
    "timestamp": "2017-05-12T10:20:30.444"
    }
    Structured logging

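A record like the one above can be produced by serializing the log fields instead of formatting them into prose. The sketch below is one possible way to do it by hand, assuming flat, non-null string fields; `JsonLogLine` is a hypothetical name, and in practice a JSON library or a framework's JSON encoder would normally take this role.

```java
import java.util.Map;
import java.util.TreeMap;

class JsonLogLine {
    // Serializes flat string fields as one JSON object on a single line, keys sorted alphabetically.
    static String format(Map<String, String> fields) {
        StringBuilder json = new StringBuilder("{");
        for (Map.Entry<String, String> field : new TreeMap<>(fields).entrySet()) {
            if (json.length() > 1) {
                json.append(',');
            }
            json.append('"').append(escape(field.getKey())).append("\":\"")
                .append(escape(field.getValue())).append('"');
        }
        return json.append('}').toString();
    }

    // Minimal escaping: quotes, backslashes and control characters (so newlines never split a record).
    private static String escape(String text) {
        StringBuilder escaped = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c == '"' || c == '\\') {
                escaped.append('\\').append(c);
            } else if (c < 0x20) {
                escaped.append(String.format("\\u%04x", (int) c));
            } else {
                escaped.append(c);
            }
        }
        return escaped.toString();
    }
}
```

Because every record stays on one line, a multi-line message such as a stack trace no longer breaks line-oriented tooling.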

  32. Capacity Cost

  33. Spare Capacity (paying for something you don’t use) = Wasted Money
    https://www.flickr.com/photos/timothykrause/5677858694/

  34. Scaling = alarms + corrective actions (scaling in or out)

  35. Auto Scaling = automated alarms + automated corrective actions (scaling in or out)

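As a rough illustration of "automated alarms + automated corrective actions", the decision can be reduced to a threshold check on a metric. The class name `AutoScaler` and the 75% / 25% CPU thresholds are assumptions made for this sketch; real auto-scaling policies are configured in the cloud provider and are usually more elaborate.

```java
class AutoScaler {
    // Hypothetical alarm thresholds; sensible values depend on the workload.
    static final double SCALE_OUT_CPU = 0.75;
    static final double SCALE_IN_CPU = 0.25;

    // Returns the desired instance count for the observed average CPU load (0.0 to 1.0).
    static int desiredInstances(int current, double avgCpu, int min, int max) {
        if (avgCpu > SCALE_OUT_CPU) {
            return Math.min(current + 1, max); // alarm: too hot, scale out
        }
        if (avgCpu < SCALE_IN_CPU) {
            return Math.max(current - 1, min); // alarm: too idle, scale in
        }
        return current; // no alarm, no action
    }
}
```

Every scale-in decision terminates a machine, which is why logs kept only on local disk become a problem in the slides that follow.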

  36. Load Balancer
    ssh me@myserver1
    tail -f server.log
    ssh me@myserver2
    tail -f server.log
    ssh me@myserver3
    tail -f server.log
    LOG file (×3)
    CPU Load: Scale Out / Scale In

  37. Load Balancer
    ssh me@myserver1
    tail -f server.log
    ssh me@myserver2
    tail -f server.log
    ssh me@myserver3
    tail -f server.log
    ssh me@myserver4
    tail -f server.log
    LOG file (×4)
    CPU Load: Scale Out / Scale In

  38. Load Balancer
    ssh me@myserver1
    tail -f server.log
    ssh me@myserver2
    tail -f server.log
    ssh me@myserver3
    tail -f server.log
    ssh me@myserver4
    tail -f server.log
    LOG file (×4)
    CPU Load: Scale Out / Scale In

  39. Load Balancer
    ssh me@myserver1
    tail -f server.log
    ssh me@myserver3
    tail -f server.log
    ssh me@myserver4
    tail -f server.log
    LOG file (×4)
    CPU Load: Scale Out / Scale In
    ssh me@myserver2
    tail -f server.log

  40. Load Balancer
    ssh me@myserver1
    tail -f server.log
    DATA LOSS
    ssh me@myserver3
    tail -f server.log
    ssh me@myserver4
    tail -f server.log
    LOG file (×4)
    CPU Load: Scale Out / Scale In

  41. Load Balancer
    LOG file (×3)
    log server where logs can be
    ✓ aggregated
    ✓ stored and backed up
    ✓ indexed
    ✓ searched

  42. log server where logs can be
    ✓ aggregated
    ✓ stored and backed up
    ✓ indexed
    ✓ searched
    Many options:
    • Logstash (ELK)
    • AWS CloudWatch Logs
    • Loggly
    • Papertrail
    • …
    Build or Buy?
    Buy: almost always the better option, unless you have truly extreme requirements (you probably don't)

  43. Appender
    ✓ tightly integrated with logging framework
    ✓ in-process
    ✓ direct MDC access
    ✓ best for homogeneous environments
    or stdout
    ✓ universal
    ✓ separate process
    ✓ ingests serialized data with record separator
    ✓ best for heterogeneous environments

  44. Log Retention
    [Chart: Cost and Value over Time, with the Best Deal marked]

  45. Log Levels
    DEBUG INFO WARNING ERROR (Detail vs. Importance)
    You want both when an important failure occurs!
    What is missing: High water mark filtering!

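The slide does not spell out what high water mark filtering would look like, so the following is only one possible reading: buffer every entry for a unit of work, track the highest severity seen, and keep the detailed entries only if that mark reached ERROR. `HighWaterMarkLog` and its methods are hypothetical names, not a feature of any particular logging framework.

```java
import java.util.ArrayList;
import java.util.List;

class HighWaterMarkLog {
    enum Level { DEBUG, INFO, WARNING, ERROR }

    // One buffered entry; nothing is written until the unit of work ends.
    private static final class Entry {
        final Level level;
        final String message;

        Entry(Level level, String message) {
            this.level = level;
            this.message = message;
        }
    }

    private final List<Entry> buffer = new ArrayList<>();
    private Level highWaterMark = Level.DEBUG; // highest severity seen so far

    void log(Level level, String message) {
        buffer.add(new Entry(level, message));
        if (level.compareTo(highWaterMark) > 0) {
            highWaterMark = level;
        }
    }

    // End of the unit of work: if the mark reached ERROR keep every entry,
    // otherwise keep only entries at or above the normal threshold.
    List<String> flush(Level normalThreshold) {
        List<String> kept = new ArrayList<>();
        for (Entry entry : buffer) {
            if (highWaterMark == Level.ERROR || entry.level.compareTo(normalThreshold) >= 0) {
                kept.add(entry.level + " " + entry.message);
            }
        }
        buffer.clear();
        highWaterMark = Level.DEBUG;
        return kept;
    }
}
```

A request that succeeds leaves only its INFO-and-above entries behind, while a failing request keeps its DEBUG detail as well, which gives both importance and detail when it matters.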

  46. Microservices

  47. POLL:
    what type of architecture does your software have?
    • Integrated (Monolith)
    • Distributed (Microservices)

  48. Querying across systems

  49. Propagating MDC
    A: Create MDC (based on session) and assign unique request ID; copy MDC to HTTP(S) headers
    B: Read MDC HTTP(S) headers; copy MDC to HTTP(S) headers
    C: Read MDC HTTP(S) headers

  50. Propagating MDC
    A: Filter, Decorator
    B: Filter, Decorator
    C: Filter
    Two Implementation Options:
    • library (manual, precise control)
    • agent (automatic, risk of overreaching)

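The library option can be sketched as two small helpers: one a decorator would call before an outgoing request, one a filter would call on an incoming request. `MdcHeaders` and the `X-MDC-` header prefix are assumptions chosen for this example, and the sketch also assumes MDC keys are lowercase; plain maps stand in for the MDC and the HTTP headers.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.UUID;

class MdcHeaders {
    // Hypothetical header prefix; any prefix works as long as all services agree on it.
    static final String PREFIX = "X-MDC-";

    // Service A: create the context for a session and assign a unique request ID.
    static Map<String, String> createContext(String session) {
        Map<String, String> mdc = new HashMap<>();
        mdc.put("session", session);
        mdc.put("request", UUID.randomUUID().toString());
        return mdc;
    }

    // Outgoing call (decorator): copy every MDC entry into a prefixed header.
    static Map<String, String> toHeaders(Map<String, String> mdc) {
        Map<String, String> headers = new HashMap<>();
        for (Map.Entry<String, String> entry : mdc.entrySet()) {
            headers.put(PREFIX + entry.getKey(), entry.getValue());
        }
        return headers;
    }

    // Incoming call (filter): rebuild the MDC from prefixed headers and ignore all others.
    // Header names are matched case-insensitively and keys are lowercased.
    static Map<String, String> fromHeaders(Map<String, String> headers) {
        Map<String, String> mdc = new HashMap<>();
        for (Map.Entry<String, String> header : headers.entrySet()) {
            String name = header.getKey();
            if (name.regionMatches(true, 0, PREFIX, 0, PREFIX.length())) {
                mdc.put(name.substring(PREFIX.length()).toLowerCase(Locale.ROOT), header.getValue());
            }
        }
        return mdc;
    }
}
```

Service B would call `fromHeaders` on the way in and `toHeaders` again on its own outgoing calls, so the same request ID shows up in the logs of A, B and C.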

  51. Machine-readable logs

  52. Machine-queryable logs
    What? Message, Code, Severity
    Who? Account, User, Session, Request
    Where? App, Module, Class
    When? Timestamp, Hostname, PID, Thread
    Machine-readable logs

  53. AWS CloudWatch Logs

  54. { $.account = "axelfontaine" && $.request = "crq-12345678" }

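The filter on this slide selects records whose `account` and `request` fields both equal the given values. The sketch below shows the same idea over in-memory records; it only illustrates what such a query expresses and is not how CloudWatch Logs evaluates filters. `LogQuery` is a hypothetical name, and records are modelled as flat string maps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class LogQuery {
    // Keeps the records whose fields equal all expected values (a logical AND of equality terms).
    static List<Map<String, String>> matchAll(List<Map<String, String>> records,
                                              Map<String, String> expected) {
        List<Map<String, String>> matches = new ArrayList<>();
        for (Map<String, String> record : records) {
            if (record.entrySet().containsAll(expected.entrySet())) {
                matches.add(record);
            }
        }
        return matches;
    }
}
```

A query like this is only possible because the fields are stored as structured data rather than baked into a formatted message string.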

  55. {
    "account": "axelfontaine",
    "image": "axelfontaine/xyz:543",
    "instance": "i-0d843d5af9b366a69",
    "level": "INFO",
    "logger": "com.myapp.task.TaskService",
    "message": "Successfully killed axelfontaine/demo in prod",
    "request": "crq-7R2CVPUMKREUFLMQUE3XB7JWCX",
    "session": "cli-CRFM2IPABRFUJD7KTDYVDVXABX",
    "thread": "Thread-18710",
    "timestamp": "2017-05-12T10:20:30.444"
    }
    Standardized keys

  56. {
    "account": "axelfontaine",
    "image": "axelfontaine/xyz:543",
    "instance": "i-0d843d5af9b366a69",
    "level": "INFO",
    "logger": "com.myapp.task.TaskService",
    "message": "Successfully killed axelfontaine/demo in prod",
    "request": "crq-7R2CVPUMKREUFLMQUE3XB7JWCX",
    "session": "cli-CRFM2IPABRFUJD7KTDYVDVXABX",
    "thread": "Thread-18710",
    "timestamp": "2017-05-12T10:20:30.444"
    }
    Standardized keys

  57. {
    "account": "axelfontaine",
    "image": "axelfontaine/xyz:543",
    "instance": "i-0d843d5af9b366a69",
    "level": "INFO",
    "logger": "com.myapp.task.TaskService",
    "message": "Successfully killed axelfontaine/demo in prod",
    "request": "crq-7R2CVPUMKREUFLMQUE3XB7JWCX",
    "session": "cli-CRFM2IPABRFUJD7KTDYVDVXABX",
    "thread": "Thread-18710",
    "timestamp": "2017-05-12T10:20:30.444"
    }
    Standardized values

  58. {
    "account": "axelfontaine",
    "image": "axelfontaine/xyz:543",
    "instance": "i-0d843d5af9b366a69",
    "level": "INFO",
    "logger": "com.myapp.task.TaskService",
    "message": "Successfully killed axelfontaine/demo in prod",
    "request": "crq-7R2CVPUMKREUFLMQUE3XB7JWCX",
    "session": "cli-CRFM2IPABRFUJD7KTDYVDVXABX",
    "thread": "Thread-18710",
    "timestamp": "2017-05-12T10:20:30.444"
    }
    Standardized values

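One way to keep keys and values standardized across services is to check each record against a shared schema before it is shipped. The sketch below is an assumption about what such a check might cover: the required keys, the level names (taken from the Log Levels slide) and the timestamp format are choices made for the example, and `LogSchema` is a hypothetical name.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

class LogSchema {
    // Assumed conventions for this sketch, not a published standard.
    static final List<String> REQUIRED_KEYS = Arrays.asList("level", "logger", "message", "timestamp");
    static final List<String> LEVELS = Arrays.asList("DEBUG", "INFO", "WARNING", "ERROR");

    // Returns a description of every deviation from the shared conventions (empty if none).
    static List<String> violations(Map<String, String> record) {
        List<String> problems = new ArrayList<>();
        for (String key : REQUIRED_KEYS) {
            if (!record.containsKey(key)) {
                problems.add("missing key: " + key);
            }
        }
        String level = record.get("level");
        if (level != null && !LEVELS.contains(level)) {
            problems.add("unknown level: " + level);
        }
        String timestamp = record.get("timestamp");
        if (timestamp != null) {
            try {
                LocalDateTime.parse(timestamp); // ISO-8601 local date-time, e.g. 2017-05-12T10:20:30.444
            } catch (DateTimeParseException e) {
                problems.add("bad timestamp: " + timestamp);
            }
        }
        return problems;
    }
}
```

If one service writes `WARN` and another `WARNING`, a single query can no longer find both, which is the kind of drift a check like this is meant to catch.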

  59. Summary
    ✓ Send your logs to a centralized service
    ✓ Buy, don't build
    ✓ Ensure your logs are structured
    ✓ Standardize keys and values
    ✓ Query your logs to answer the what, who, where, when questions

  60. boxfuse.com
    Continuous Deployment as a Service for JVM, Node.js and Go apps on AWS
    ✓ Up and running in minutes
    ✓ Deploy with 1 command
    ✓ Focus on development
    ✓ Immutable Infrastructure as Code
    ✓ Minimal images
    ✓ Zero downtime blue/green deployments
    boxfuse run my-java-app.jar -env=prod

  61. flywaydb.org
    Evolve your relational database schemas reliably across all your environments for each of your modules and services with pleasure and plain SQL
    ✓ Supports all popular RDBMS
    ✓ Millions of users
    ✓ Designed for Continuous Delivery
    ✓ Open-source Community Edition and commercial Pro and Enterprise Editions
    ✓ Highly focused and very easy to get started

  62. Thanks!
    @axelfontaine boxfuse.com flywaydb.org