Un-broken logging - the foundation of software operability - Operability.io - 2015 - Matthew Skelton

The way in which many (most?) software teams use logging needs a re-think as we move into a world of microservices and remote sensors. Instead of using logging merely to dump out stack traces, our logs become a continuous trace of application state, with unique-enough identifiers for every interesting point of execution. We also use transaction identifiers to trace calls across components, services, and queues, so that we can reconstruct distributed calls after the fact. Logging becomes a rich source of insight for developers and operations people alike, as we 'listen to the logs' and tighten feedback cycles to improve our software systems.

---

From a talk at Operability.io 2015 conference

Matthew Skelton

September 25, 2015

Transcript

  1. Un-Broken Logging: the foundation of software operability
     Operability.io conference #OIO15, Friday 25th September 2015
     Matthew Skelton, Skelton Thatcher Consulting, @matthewpskelton
  2. The way we use logging is (often) broken
     How to make our logging more awesome
     Why we should care
  3. Logging is often unloved
       1. Discontinuous
       2. Errors only, or arbitrary
       3. ‘Bolted on’
       4. No aggregation & search
       5. Specify severity up front
  4. How to make logging awesome
       1. Continuous event IDs
       2. Transaction tracing
       3. Log aggregation & search tools
       4. Design for logging
       5. Decoupled severity
  5. Logging is often unloved
       1. Discontinuous
       2. Errors only, or arbitrary
       3. ‘Bolted on’
       4. No aggregation & search
       5. Specify severity up front
  6. Logging assumed to be free ($0) to implement
     No budget for aggregating logs across machines
  7. How to make logging awesome
       1. Continuous event IDs
       2. Transaction tracing
       3. Log aggregation & search tools
       4. Design for logging
       5. Decoupled severity
  8. public enum EventID
     {
         // Badly-initialised logging data
         NotSet = 0,

         // An unrecognised event has occurred
         UnexpectedError = 10000,

         ApplicationStarted = 20000,
         ApplicationShutdownNoticeReceived = 20001,

         PageGenerationStarted = 30000,
         PageGenerationCompleted = 30001,

         MessageQueued = 40000,
         MessagePeeked = 40001,

         BasketItemAdded = 60001,
         BasketItemRemoved = 60002,

         CreditCardDetailsSubmitted = 70001,

         // ...
     }
  9. Technical / Domain
     public enum EventID
     {
         // Badly-initialised logging data
         NotSet = 0,

         // An unrecognised event has occurred
         UnexpectedError = 10000,

         ApplicationStarted = 20000,
         ApplicationShutdownNoticeReceived = 20001,

         PageGenerationStarted = 30000,
         PageGenerationCompleted = 30001,

         MessageQueued = 40000,
         MessagePeeked = 40001,

         BasketItemAdded = 60001,
         BasketItemRemoved = 60002,

         CreditCardDetailsSubmitted = 70001,

         // ...
     }
  10. APM gives us application insight, BUT:
      How much do we learn?
      Is APM available on the Dev box?
      It’s not just ‘an Ops problem’!
  11. {
        "eventmappings": {
          "events": {
            "event": [
              {
                "id": "CacheServiceStarted",
                "severity": { "level": "Information" }
              },
              {
                "id": "PageCachePurged",
                "severity": { "level": "Debug" },
                "state": { "enabled": false }
              },
              {
                "id": "DatabaseConnectionTimeOut",
                "severity": { "level": "Error" }
              }
            ]
          }
        }
      }
  12. Event tracing
      Use enumerations (or closest thing)
      Technical and Domain event types
      Distributed systems: debuggers less useful
      Trace calls with ‘unique-enough’ handles
      Tune log levels via config
  13. NTP

  14. Logging is often unloved
       1. Discontinuous
       2. Errors only, or arbitrary
       3. ‘Bolted on’
       4. No aggregation & search
       5. Specify severity up front
  15. How to make logging awesome
       1. Continuous event IDs
       2. Transaction tracing
       3. Log aggregation & search tools
       4. Design for logging
       5. Decoupled severity
  16. More: "Ditch the Debugger and Use Log Analysis Instead", Matthew Skelton
      https://blog.logentries.com/2015/07/ditch-the-debugger-and-use-log-analysis-instead/
  17. More: "Using Log Aggregation Across Dev & Ops: The Pricing Advantage", Rob Thatcher
      https://blog.logentries.com/2015/08/using-log-aggregation-across-dev-ops-the-pricing-advantage/
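
The ideas on slides 8 to 12 (enumerated event IDs, severity tuned via config rather than fixed in code, and 'unique-enough' transaction handles) can be combined in a few lines. The following is a minimal sketch in Python rather than the C# used on the slides, and it is not the speaker's implementation. The JSON shape and event names are taken from slide 11; the numeric event values, the `load_severities` and `log_event` helpers, the `LEVELS` table, and the use of a UUID as the transaction handle are all assumptions made for illustration.

```python
import enum
import json
import logging
import uuid

# Same shape as the JSON on slide 11. In practice this would probably be
# read from a config file so that severities can change without a redeploy.
EVENT_MAPPINGS = json.loads("""
{ "eventmappings": { "events": { "event": [
  { "id": "CacheServiceStarted", "severity": { "level": "Information" } },
  { "id": "PageCachePurged", "severity": { "level": "Debug" },
    "state": { "enabled": false } },
  { "id": "DatabaseConnectionTimeOut", "severity": { "level": "Error" } }
] } } }
""")


class EventID(enum.IntEnum):
    # Hypothetical numeric values: the slides give no numbers for these events.
    NotSet = 0
    CacheServiceStarted = 50000
    PageCachePurged = 50001
    DatabaseConnectionTimeOut = 50002


# Assumed mapping from the config's severity names to Python logging levels.
LEVELS = {
    "Debug": logging.DEBUG,
    "Information": logging.INFO,
    "Warning": logging.WARNING,
    "Error": logging.ERROR,
}


def load_severities(mappings):
    """Return {EventID: level}, with None marking an event disabled in config."""
    table = {}
    for entry in mappings["eventmappings"]["events"]["event"]:
        enabled = entry.get("state", {}).get("enabled", True)
        level = LEVELS[entry["severity"]["level"]]
        table[EventID[entry["id"]]] = level if enabled else None
    return table


def log_event(logger, severities, event, transaction_id, message):
    """Log one event at its configured severity; unmapped events use INFO."""
    level = severities.get(event, logging.INFO)
    if level is None:  # event switched off in config
        return
    logger.log(level, "event=%s id=%d txn=%s %s",
               event.name, event.value, transaction_id, message)


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG,
                        format="%(asctime)s %(levelname)s %(message)s")
    severities = load_severities(EVENT_MAPPINGS)
    txn = uuid.uuid4().hex  # one handle per request, passed to downstream calls
    log = logging.getLogger("cache")
    log_event(log, severities, EventID.CacheServiceStarted, txn, "cache warm")
    log_event(log, severities, EventID.PageCachePurged, txn, "not emitted")
    log_event(log, severities, EventID.DatabaseConnectionTimeOut, txn, "db timeout")
```

The call site names only the event and the transaction handle; the severity comes from the mapping, so an event can be promoted, demoted, or disabled by editing the config. Searching aggregated logs for a single `txn=` value would then pull together the events of one call across components.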