Logging in the age of Microservices and the Cloud

Axel Fontaine
September 13, 2018

The days of the statically partitioned datacenter are over. Welcome to the modern world of microservices and auto-scaling in the cloud. Requests flow through multiple services, individual services are auto-scaled and machines are short-lived.

This is a brave new world and it is time to change the way we design and architect our software to better deal with it. In this talk we'll look at logging and take a deep dive into the challenges involved in moving from the old "SSH and tail -f" world to a world of centralized and structured logs, consumable by both humans and machines.

This session is for developers and architects looking for battle-tested solutions to implement effective logging for microservices in an auto-scaling world.

Transcript

  1. Logging in the age of Microservices and the Cloud
    @axelfontaine

  2. Axel Fontaine
    @axelfontaine
    flywaydb.org
    boxfuse.com

  3. POLL:
    what type of infrastructure are you running on?
    • On Premise
    • Cloud

  4. The (good) old days of logging …

  5. LOG
    file
    ssh me@myserver
    tail -f server.log

  6. Looks great!

  7. Thanks !
    @axelfontaine
    boxfuse.com

  8. LOG
    file
    ssh me@myserver
    tail -f server.log

  9. Times have changed …

  10. The new reality
    Cloud
    Microservices

  11. But first, back to the fundamental question...

  12. Why are we logging?
    Postmortem analysis of
    user activity and programming errors
    Powerful debugging tool
    Should contain answers to
    important questions:
    What? Who? Where? When?

  13. What?
    Who?
    Where?
    When?

  14. What? Message, Code, Severity
    Who?
    Where?
    When?

  15. What? Message, Code, Severity
    Who? Account, User, Session, Request
    Where?
    When?

  16. What? Message, Code, Severity
    Who? Account, User, Session, Request
    Where? App, Module, Class
    When?

  17. What? Message, Code, Severity
    Who? Account, User, Session, Request
    Where? App, Module, Class
    When? Timestamp, Hostname, PID, Thread
    How can these
    questions be asked?
    How can all this
    information be captured?

  18. Capturing log info

  19. Logging framework architecture
    Your
    Code
    Logger
    Appender
    A
    Appender
    B
    Storage
    B
    Storage
    A
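
The logger-to-appender fan-out in this diagram can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the API of any real logging framework: `Appender`, `InMemoryAppender`, and `SketchLogger` are hypothetical names, and the in-memory list merely stands in for "Storage A" and "Storage B".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface: one method that receives every log event.
interface Appender {
    void append(String level, String message);
}

// Hypothetical appender that keeps events in memory, standing in for a storage backend.
final class InMemoryAppender implements Appender {
    final List<String> stored = new ArrayList<>();

    @Override
    public void append(String level, String message) {
        stored.add(level + " " + message);
    }
}

// Application code only talks to the logger; the logger fans out to all appenders.
final class SketchLogger {
    private final List<Appender> appenders = new ArrayList<>();

    void addAppender(Appender appender) {
        appenders.add(appender);
    }

    void info(String message) {
        for (Appender appender : appenders) {
            appender.append("INFO", message);
        }
    }
}

final class LoggerArchitectureDemo {
    public static void main(String[] args) {
        InMemoryAppender a = new InMemoryAppender();
        InMemoryAppender b = new InMemoryAppender();
        SketchLogger logger = new SketchLogger();
        logger.addAppender(a);
        logger.addAppender(b);
        logger.info("my log message");
        System.out.println(a.stored);
        System.out.println(b.stored);
    }
}
```

The point of the indirection is that storage can change (file, stdout, remote service) without touching the calling code.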

  20. logger.info("my log message");
    What? Message, Code, Severity
    Who? Account, User, Session, Request
    Where? App, Module, Class
    When? Timestamp, Hostname, PID, Thread

  21. logger.info("my log message");
    What? Message, Code, Severity
    Who? Account, User, Session, Request
    Where? App, Module, Class
    When? Timestamp, Hostname, PID, Thread

  22. logger.info("my log message");
    What? Message, Code, Severity
    Who? Account, User, Session, Request
    Where? App, Module, Class
    When? Timestamp, Hostname, PID, Thread

  23. Your
    Code
    Logger
    Appender
    A
    Appender
    B
    Storage
    B
    Storage
    A
    MDC
    Mapped Diagnostic Context
    (Thread-local temporary key-value store)

  24. MDC.put("account", "company ABC");
    MDC.put("user", "user123");
    What? Message, Code, Severity
    Who? Account, User, Session, Request
    Where? App, Module, Class
    When? Timestamp, Hostname, PID, Thread

  25. MDC.put("account", "company ABC");
    MDC.put("user", "user123");

    logger.info("my log message");
    What? Message, Code, Severity
    Who? Account, User, Session, Request
    Where? App, Module, Class
    When? Timestamp, Hostname, PID, Thread

  26. MDC.put("account", "company ABC");
    MDC.put("user", "user123");
    Populate when:
    ✓ a request enters the application
    ✓ a message is received from a queue
    ✓ a cron task starts
    ✓ making an async call to another thread
    And don’t forget to clear when done!
    (Threadpools reuse threads!)
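
The populate-then-clear discipline can be sketched with a `try`/`finally` block. To keep the example free of dependencies, `SketchMdc` below is a hypothetical stdlib-only stand-in for a logging framework's MDC (it is not SLF4J's class), and `handleRequest` is an assumed entry point. The second method reuses a single pooled thread to show that nothing leaks once the context is cleared.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical thread-local key-value store, standing in for a real MDC.
final class SketchMdc {
    private static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    static void put(String key, String value) { CONTEXT.get().put(key, value); }
    static String get(String key) { return CONTEXT.get().get(key); }
    static void clear() { CONTEXT.get().clear(); }
}

final class MdcLifecycleDemo {
    // Populate the context when work enters the thread, always clear it on the way out.
    static String handleRequest(String account, String user) {
        SketchMdc.put("account", account);
        SketchMdc.put("user", user);
        try {
            return "account=" + SketchMdc.get("account") + " user=" + SketchMdc.get("user");
        } finally {
            SketchMdc.clear(); // pooled threads are reused, so stale values must not leak
        }
    }

    // Runs one request, then asks the same pooled thread whether a "user" value is left over.
    static String leftoverAfterRequest() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            pool.submit(() -> handleRequest("company ABC", "user123")).get();
            return pool.submit(() -> SketchMdc.get("user")).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(handleRequest("company ABC", "user123"));
        System.out.println(leftoverAfterRequest());
    }
}
```

Without the `finally` block, the second task would see `user123` from the previous request, which is exactly the thread-pool hazard the slide warns about.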

  27. Querying the logs

  28. grep?

  29. Truncation!
    Compression! Single line messages!
    No MDC info!

  30. Your
    Code
    Logger
    Appender
    Storage
    (formatted)
    MDC
    Log
    Viewer
    FORMAT
    READ
    Decoupling log storage from log representation

  31. Your
    Code
    Logger
    Appender
    Storage
    (raw)
    MDC
    Log
    Viewer
    READ
    &
    FORMAT
    Structured logging

  32. {
    "account": "axelfontaine",
    "image": "axelfontaine/xyz:543",
    "instance": "i-0d843d5af9b366a69",
    "level": "INFO",
    "logger": "com.myapp.task.TaskService",
    "message": "Successfully killed axelfontaine/demo in prod",
    "request": "crq-7R2CVPUMKREUFLMQUE3XB7JWCX",
    "session": "cli-CRFM2IPABRFUJD7KTDYVDVXABX",
    "thread": "Thread-18710",
    "timestamp": "2017-05-12T10:20:30.444"
    }
    Structured logging
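
A structured event like the one above can be produced by serializing a key-value map as one JSON object per line. The following is a minimal stdlib-only sketch, assuming all values are strings; `StructuredLogSketch` and its methods are hypothetical names, and in practice a JSON library or the logging framework's JSON encoder would usually do this work.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class StructuredLogSketch {
    // Escapes the characters that must not appear raw inside a JSON string.
    static String escape(String value) {
        StringBuilder out = new StringBuilder();
        for (char c : value.toCharArray()) {
            switch (c) {
                case '"': out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) out.append(String.format("\\u%04x", (int) c));
                    else out.append(c);
            }
        }
        return out.toString();
    }

    // Serializes one event as a single-line JSON object, keys in insertion order.
    static String toJsonLine(Map<String, String> event) {
        StringBuilder json = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : event.entrySet()) {
            if (!first) json.append(',');
            json.append('"').append(escape(e.getKey())).append("\":\"")
                .append(escape(e.getValue())).append('"');
            first = false;
        }
        return json.append('}').toString();
    }

    public static void main(String[] args) {
        Map<String, String> event = new LinkedHashMap<>();
        event.put("account", "axelfontaine");
        event.put("level", "INFO");
        event.put("message", "Successfully killed axelfontaine/demo in prod");
        event.put("timestamp", "2017-05-12T10:20:30.444");
        System.out.println(toJsonLine(event));
    }
}
```

Because newlines inside values are escaped, each event stays on one line, which sidesteps the multi-line message problem mentioned for grep.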

  33. Cloud

  34. Capacity Cost

  35. Spare Capacity
    (paying for something
    you don’t use)
    =
    Wasted Money
    https://www.flickr.com/photos/timothykrause/5677858694/

  36. Scaling
    =
    alarms
    + corrective actions
    (scaling in or out)

  37. Auto Scaling
    =
    automated alarms
    + automated corrective actions
    (scaling in or out)
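
The "alarm plus corrective action" loop can be sketched as a single decision function. The thresholds, the one-instance step size, and the name `AutoScalingSketch` are assumptions made for illustration; they are not taken from the talk or from any cloud provider's policy engine.

```java
final class AutoScalingSketch {
    // Hypothetical alarm thresholds on average CPU load (0.0 to 1.0).
    static final double SCALE_OUT_CPU = 0.75;
    static final double SCALE_IN_CPU = 0.25;

    // Returns the desired instance count, staying within [min, max].
    static int desiredInstances(int current, double avgCpu, int min, int max) {
        if (avgCpu > SCALE_OUT_CPU) {
            return Math.min(max, current + 1); // alarm: too hot, add a machine
        }
        if (avgCpu < SCALE_IN_CPU) {
            return Math.max(min, current - 1); // alarm: spare capacity, remove a machine
        }
        return current; // no alarm, no action
    }

    public static void main(String[] args) {
        System.out.println(desiredInstances(3, 0.90, 1, 10));
        System.out.println(desiredInstances(4, 0.10, 1, 10));
        System.out.println(desiredInstances(3, 0.50, 1, 10));
    }
}
```

The scale-in branch is the one that matters for logging: the machine it removes takes its local log file with it.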

  38. Load
    Balancer
    ssh me@myserver1
    tail -f server.log
    ssh me@myserver2
    tail -f server.log
    ssh me@myserver3
    tail -f server.log
    LOG
    file
    LOG
    file
    LOG
    file
    CPU Load
    Scale Out
    Scale In

  39. Load
    Balancer
    ssh me@myserver1
    tail -f server.log
    ssh me@myserver2
    tail -f server.log
    ssh me@myserver3
    tail -f server.log
    ssh me@myserver4
    tail -f server.log
    LOG
    file
    LOG
    file
    LOG
    file
    LOG
    file
    Scale Out
    Scale In
    CPU Load

  40. Load
    Balancer
    ssh me@myserver1
    tail -f server.log
    ssh me@myserver2
    tail -f server.log
    ssh me@myserver3
    tail -f server.log
    ssh me@myserver4
    tail -f server.log
    LOG
    file
    LOG
    file
    LOG
    file
    LOG
    file
    Scale Out
    Scale In
    CPU Load

  41. Load
    Balancer
    ssh me@myserver1
    tail -f server.log
    ssh me@myserver3
    tail -f server.log
    ssh me@myserver4
    tail -f server.log
    LOG
    file
    LOG
    file
    LOG
    file
    LOG
    file
    Scale Out
    Scale In
    CPU Load
    ssh me@myserver2
    tail -f server.log

  42. Load
    Balancer
    ssh me@myserver1
    tail -f server.log
    DATA LOSS
    ssh me@myserver3
    tail -f server.log
    ssh me@myserver4
    tail -f server.log
    LOG
    file
    LOG
    file
    LOG
    file
    LOG
    file
    Scale Out
    Scale In
    CPU Load

  43. Load
    Balancer
    LOG
    file
    LOG
    file
    LOG
    file
    log server
    where logs can be
    ✓ aggregated
    ✓ stored and backed up
    ✓ indexed
    ✓ searched

  44. log server
    where logs can be
    ✓ aggregated
    ✓ stored and backed up
    ✓ indexed
    ✓ searched
    Many options:
    • Logstash (ELK)
    • AWS CloudWatch Logs
    • Loggly
    • Papertrail
    • …
    Build or Buy?
    Buying is almost always the better option,
    unless you have truly extreme requirements
    (you probably don't)

  45. Appender or stdout
    Appender
    ✓ tightly integrated with logging framework
    ✓ in-process
    ✓ direct MDC access
    ✓ best for homogeneous environments
    stdout
    ✓ universal
    ✓ separate process
    ✓ ingests serialized data with record separator
    ✓ best for heterogeneous environments

  46. Log Retention
    Time
    Cost
    Value
    Best Deal

  47. Log Levels
    DEBUG INFO WARNING ERROR
    Detail, Importance:
    you want both when an important failure occurs!
    What is missing:
    High water mark filtering!
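
The slide does not define "high water mark filtering". One possible reading, assumed here, is to buffer all events for a unit of work, track the highest severity seen, and keep the detailed DEBUG events only if that mark reaches a chosen threshold. The sketch below implements that reading; `HighWaterMarkSketch` and its methods are hypothetical and do not correspond to a feature of any particular framework.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class HighWaterMarkSketch {
    // Severity order taken from the slide: DEBUG < INFO < WARNING < ERROR.
    static final List<String> LEVELS = Arrays.asList("DEBUG", "INFO", "WARNING", "ERROR");

    private final List<String[]> buffer = new ArrayList<>();
    private final int threshold;   // severity index that counts as "important"
    private int highWaterMark = 0; // highest severity index seen in this unit of work

    HighWaterMarkSketch(String thresholdLevel) {
        this.threshold = LEVELS.indexOf(thresholdLevel);
    }

    // Buffers the event and raises the high water mark if needed.
    void log(String level, String message) {
        buffer.add(new String[] {level, message});
        highWaterMark = Math.max(highWaterMark, LEVELS.indexOf(level));
    }

    // Ends the unit of work: keeps DEBUG detail only if the mark reached the threshold.
    List<String> finish() {
        boolean keepDetail = highWaterMark >= threshold;
        List<String> emitted = new ArrayList<>();
        for (String[] event : buffer) {
            boolean isDetail = event[0].equals("DEBUG");
            if (keepDetail || !isDetail) {
                emitted.add(event[0] + " " + event[1]);
            }
        }
        buffer.clear();
        highWaterMark = 0;
        return emitted;
    }

    public static void main(String[] args) {
        HighWaterMarkSketch request = new HighWaterMarkSketch("ERROR");
        request.log("DEBUG", "loaded config");
        request.log("ERROR", "payment failed");
        System.out.println(request.finish());
    }
}
```

The trade-off in this reading is memory: events have to be held until the unit of work ends before it is known whether the detail is worth keeping.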

  48. Microservices

  49. POLL:
    what type of architecture does your software have?
    • Integrated (Monolith)
    • Distributed (Microservices)

  50. log server

  51. Querying across systems

  54. Propagating MDC
    A: Create MDC (based on session) and assign unique request ID
       Copy MDC to HTTP(S) headers
    B: Read MDC HTTP(S) headers
       Copy MDC to HTTP(S) headers
    C: Read MDC HTTP(S) headers
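
The copy and read steps can be sketched as two small functions over header maps. The `X-MDC-` prefix is an assumption (the slides do not name the headers used), and `MdcPropagationSketch` is a hypothetical helper; plain maps stand in for both the MDC and the HTTP headers so the example stays dependency-free.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class MdcPropagationSketch {
    // Hypothetical header prefix used to mark propagated MDC entries.
    static final String PREFIX = "X-MDC-";

    // Outgoing call (service A or B): copy every MDC entry into a request header.
    static Map<String, String> toHeaders(Map<String, String> mdc) {
        Map<String, String> headers = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : mdc.entrySet()) {
            headers.put(PREFIX + e.getKey(), e.getValue());
        }
        return headers;
    }

    // Incoming call (service B or C): rebuild the MDC from prefixed headers only.
    static Map<String, String> fromHeaders(Map<String, String> headers) {
        Map<String, String> mdc = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : headers.entrySet()) {
            // Header names are case-insensitive, so match the prefix ignoring case.
            if (e.getKey().regionMatches(true, 0, PREFIX, 0, PREFIX.length())) {
                mdc.put(e.getKey().substring(PREFIX.length()), e.getValue());
            }
        }
        return mdc;
    }

    public static void main(String[] args) {
        Map<String, String> mdc = new LinkedHashMap<>();
        mdc.put("account", "axelfontaine");
        mdc.put("request", "crq-12345678");
        Map<String, String> headers = toHeaders(mdc);
        headers.put("Content-Type", "application/json"); // unrelated header, ignored on read
        System.out.println(fromHeaders(headers));
    }
}
```

In a real service these functions would sit in the inbound filter and outbound client decorator. Since some HTTP stacks change header-name casing in transit, lower-case MDC keys are the safer choice with this scheme.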

  55. Propagating MDC
    A B C
    Filter Decorator Filter Filter
    Decorator
    Two Implementation Options:
    • library (manual, precise control)
    • agent (automatic, risk of overreaching)

  56. Machine-readable logs

  57. Machine-queryable logs
    What? Message, Code, Severity
    Who? Account, User, Session, Request
    Where? App, Module, Class
    When? Timestamp, Hostname, PID, Thread
    Machine-readable logs

  58. AWS CloudWatch Logs

  59. { $.account = "axelfontaine" && $.request = "crq-12345678" }
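
Conceptually, this filter selects events in which both fields match. The following in-memory sketch applies the same predicate to events already parsed into maps. It only illustrates the query's meaning, not how CloudWatch Logs evaluates filters; `LogQuerySketch` and its helper methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class LogQuerySketch {
    // Keeps only events whose "account" and "request" fields both equal the given values.
    static List<Map<String, String>> filter(List<Map<String, String>> events,
                                            String account, String request) {
        List<Map<String, String>> matches = new ArrayList<>();
        for (Map<String, String> event : events) {
            if (account.equals(event.get("account")) && request.equals(event.get("request"))) {
                matches.add(event);
            }
        }
        return matches;
    }

    // Small helper that builds a structured event for the demo.
    static Map<String, String> event(String account, String request, String message) {
        Map<String, String> e = new LinkedHashMap<>();
        e.put("account", account);
        e.put("request", request);
        e.put("message", message);
        return e;
    }

    public static void main(String[] args) {
        List<Map<String, String>> events = Arrays.asList(
                event("axelfontaine", "crq-12345678", "first"),
                event("axelfontaine", "crq-99999999", "second"),
                event("someoneelse", "crq-12345678", "third"));
        System.out.println(filter(events, "axelfontaine", "crq-12345678"));
    }
}
```

A query like this only works if every service writes the same key names, which is why the following slides insist on standardized keys and values.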

  60. {
    "account": "axelfontaine",
    "image": "axelfontaine/xyz:543",
    "instance": "i-0d843d5af9b366a69",
    "level": "INFO",
    "logger": "com.myapp.task.TaskService",
    "message": "Successfully killed axelfontaine/demo in prod",
    "request": "crq-7R2CVPUMKREUFLMQUE3XB7JWCX",
    "session": "cli-CRFM2IPABRFUJD7KTDYVDVXABX",
    "thread": "Thread-18710",
    "timestamp": "2017-05-12T10:20:30.444"
    }
    Standardized keys

  61. {
    "account": "axelfontaine",
    "image": "axelfontaine/xyz:543",
    "instance": "i-0d843d5af9b366a69",
    "level": "INFO",
    "logger": "com.myapp.task.TaskService",
    "message": "Successfully killed axelfontaine/demo in prod",
    "request": "crq-7R2CVPUMKREUFLMQUE3XB7JWCX",
    "session": "cli-CRFM2IPABRFUJD7KTDYVDVXABX",
    "thread": "Thread-18710",
    "timestamp": "2017-05-12T10:20:30.444"
    }
    Standardized keys

  62. {
    "account": "axelfontaine",
    "image": "axelfontaine/xyz:543",
    "instance": "i-0d843d5af9b366a69",
    "level": "INFO",
    "logger": "com.myapp.task.TaskService",
    "message": "Successfully killed axelfontaine/demo in prod",
    "request": "crq-7R2CVPUMKREUFLMQUE3XB7JWCX",
    "session": "cli-CRFM2IPABRFUJD7KTDYVDVXABX",
    "thread": "Thread-18710",
    "timestamp": "2017-05-12T10:20:30.444"
    }
    Standardized values

  63. {
    "account": "axelfontaine",
    "image": "axelfontaine/xyz:543",
    "instance": "i-0d843d5af9b366a69",
    "level": "INFO",
    "logger": "com.myapp.task.TaskService",
    "message": "Successfully killed axelfontaine/demo in prod",
    "request": "crq-7R2CVPUMKREUFLMQUE3XB7JWCX",
    "session": "cli-CRFM2IPABRFUJD7KTDYVDVXABX",
    "thread": "Thread-18710",
    "timestamp": "2017-05-12T10:20:30.444"
    }
    Standardized values

  64. Summary
    ✓ Send your logs to a centralized service
    ✓ Buy, don't build
    ✓ Ensure your logs are structured
    ✓ Standardize keys and values
    ✓ Query your logs to answer the
    what, who, where, when questions

  65. boxfuse.com
    Continuous Deployment as a Service
    for JVM, Node.js and Go apps
    on AWS
    ✓ Up and running in minutes
    ✓ Deploy with 1 command
    ✓ Focus on development
    ✓ Immutable Infrastructure as Code
    ✓ Minimal images
    ✓ Zero downtime blue/green deployments
    boxfuse run my-java-app.jar -env=prod

  66. flywaydb.org
    Evolve your relational database schemas
    reliably across all your environments
    for each of your modules and services
    with pleasure and plain SQL
    ✓ Supports all popular RDBMS
    ✓ Millions of users
    ✓ Designed for Continuous Delivery
    ✓ Open-source Community Edition
    and commercial Pro and Enterprise Editions
    ✓ Highly focused and very easy to get started

  67. Thanks !
    @axelfontaine boxfuse.com flywaydb.org
