What uses for observing operations of Configuration Management?

Rudder
February 04, 2019

More and more services expose their state, internal details and metrics so that they can be observed, which improves overall quality of service.
But what about observing the infrastructure they are deployed, configured and maintained on?
What can we learn from that, and what do we need from configuration management to get these features and metrics?

Installation logs are a good start, but they need to be centralized and aggregated, and above all knowledge needs to be derived from them. We also need to observe these features over time, to trace changes and correlate them with monitoring.

Rudder was built around the premise that all actions of the configuration agent need to be traced, centralized and exposed in a meaningful way, with agents ensuring the continuous configuration of systems. This talk will show the rationale behind this premise, how we implemented this solution, and the benefits of this approach for the modern IT world.

Nicolas Charles
Configuration Management Camp 2019

Transcript

  1. What uses for observing operations of Configuration Management?

    rudder.io
    Nicolas CHARLES - nicolas@rudder.io - @nico_charles
  2. Are we really looking at logs?

    I’m sure everyone here does, but...
  3. No error nor change in logs means success?

    Aren’t we missing something?
  4. Getting and understanding the info is complex

    • Operators, Managers, Experts, APIs have different needs
    • Frustration if we need a third party to get data
    • We mistrust what we don’t understand
  5. Getting and understanding the info is complex

    Putting errors into perspective
    • Errors can be expected
    • Errors in production can have catastrophic consequences
    • Errors in a Vagrant VM are much less critical
  6. Getting and understanding the info is complex

    Strong reliance on Expert(s)
    • SPOF
    • Fatigue
  7. Knowing the exact infrastructure state

    Monitoring vs. observability
  8. Observability adoption

    Databases
    • Built-in facilities
    • Tooling ecosystem to extract knowledge
  9. Observability adoption

    Software
    • Legacy: embedding agent (often proprietary solutions)
    • New developments: best practices, open standards, architectural bricks
  10. These concepts are core to Rudder

    Everyone/thing can be an actor of configuration management
  11. These concepts are core to Rudder

    Technique
    • A set of operations & configurations to reach a state
    • With variables for configuration
    • Created by experts
  12. These concepts are core to Rudder

  13. These concepts are core to Rudder

    Directive
    • Technique + Parameters
    • Defines how services must be managed
    • Driven by business needs, managed by admins or APIs
  14. These concepts are core to Rudder

    Rule
    • The application of Directive(s) to Group(s)
    • Defines the targets of the Directive(s)
    • Higher-level approach to services, managed by admins or APIs
  15. Each can focus on what is relevant

    Operators Security Experts
  16. Each can focus on what is relevant

    Managers APIs

    "rules": [
      {
        "id": "32377fd7-02fd-43d0-aab7-28460a91347b",
        "name": "Security rules - baseline",
        "compliance": 100,
        "mode": "full-compliance",
        "complianceDetails": {
          "successAlreadyOK": 87.47,
          "successNotApplicable": 12.53
        },
        "directives": [
          {
            "id": "c16e3a90-b9d7-427d-83c1-d80e33124e4c",
            "name": "CIS Benchmark 2.1.6 - rsh",
            "compliance": 100.0,
            "complianceDetails": {
              "successAlreadyOK": 100.00
            }
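The compliance payload on slide 16 is cut off in the transcript. As a rough sketch of how a consumer such as a dashboard or another API client might use this kind of data, the Python below parses a hand-completed copy of the fragment and checks that each `complianceDetails` breakdown adds up to 100%. The closing brackets and the `summarize` helper are assumptions added for illustration; only the field names and values come from the slide.

```python
import json
import math

# Hand-completed copy of the truncated slide fragment: the closing
# brackets are an assumption, the fields and values are from the slide.
PAYLOAD = """
{"rules": [{
  "id": "32377fd7-02fd-43d0-aab7-28460a91347b",
  "name": "Security rules - baseline",
  "compliance": 100,
  "mode": "full-compliance",
  "complianceDetails": {"successAlreadyOK": 87.47, "successNotApplicable": 12.53},
  "directives": [{
    "id": "c16e3a90-b9d7-427d-83c1-d80e33124e4c",
    "name": "CIS Benchmark 2.1.6 - rsh",
    "compliance": 100.0,
    "complianceDetails": {"successAlreadyOK": 100.00}
  }]
}]}
"""


def summarize(payload: str) -> list:
    """Return (name, compliance, details_add_up) for each rule and directive."""
    rows = []
    for rule in json.loads(payload)["rules"]:
        # Treat the rule and its directives uniformly: both carry the
        # same "compliance" and "complianceDetails" fields on the slide.
        for item in [rule, *rule.get("directives", [])]:
            total = sum(item["complianceDetails"].values())
            rows.append((item["name"], float(item["compliance"]),
                         math.isclose(total, 100.0, abs_tol=0.01)))
    return rows


if __name__ == "__main__":
    for name, compliance, ok in summarize(PAYLOAD):
        print(f"{name}: {compliance:.1f}% (details add up: {ok})")
```

The tolerance in `math.isclose` is there because the percentages are floats; whether the real API guarantees that the details sum to exactly 100 is not stated in the talk.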
  17. What is this compliance?

    PARAM
    RULE • Id
    DIRECTIVE • Id • (Components)
    GROUP • Id
    RUDDER config (global) • Policy Mode • Schedule
    NODE • Properties • Policy Mode • Schedule
    Environmental context
    Node configuration (Files) • Id : . . . • Generated : . . .
    Change request
    Historization
    Event logs
  18. What is this compliance?

    RUDDER config (global) • Policy Mode • Schedule
    NODE • Properties • Policy Mode • Schedule
    Environmental context
    Node configuration (Files) • Id : . . . • Generated : . . .
    Change request
    Historization
    Event logs
    PARAM
    RULE • Id • Groups + Directives
    DIRECTIVE • Id • Components
    GROUP • Id
  22. What is this compliance?

    Node configuration (Files) • Id : . . . • Generated : . . .
    RUN • Reports • Reports • ... • ...
    METADATA • node id • config id • run timestamp
    RUN • Reports • Reports • ... • ...
    METADATA • node id • config id • run timestamp • Signature
    Get Policy
    Send configuration reports
    Expected reports (node id, config id, timestamp)
    Run reports
    Historization
    Compliance historized
    Store expected reports
    Metadata • Integrity • Signature
    Config • Id • For Rule R, Directive D1, Component C
  23. What is this compliance?

    Node configuration (Files) • Id : . . . • Generated : . . .
    Run reports
    RUN • Reports • Reports • ... • ...
    METADATA • node id • config id • run timestamp
    RUN • Reports • Reports • ... • ...
    METADATA • node id • config id • run timestamp • Signature
    Get Policy
    Send configuration reports
    Expected reports • node id • config id • timestamp • end of validity
    Historization
    Compliance historized
    Store expected reports
    Metadata • Integrity • Signature
    Config • Id • For Rule R, Directive D1, Component C
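One way to read slides 22 and 23: the server stores the reports it expects for a given (node id, config id), and compliance is derived by matching the reports of a run against them. The sketch below is an assumption about that matching, not Rudder's actual implementation. The `Key` and `Report` types, the status strings and the `compliance` function are all hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Key:
    """Identifies one expected report (Rule, Directive, Component as on slide 22)."""
    rule: str
    directive: str
    component: str


@dataclass(frozen=True)
class Report:
    key: Key
    status: str  # hypothetical statuses: "success", "repaired", "error"


def compliance(node_id, config_id, expected, run):
    """Percentage of expected reports answered successfully by one run.

    `expected` maps (node_id, config_id) to the set of expected Keys.
    `run` is a dict holding the run metadata and its list of Reports.
    """
    if (run["node_id"], run["config_id"]) != (node_id, config_id):
        # Assumption: a run made with another configuration matches nothing.
        return 0.0
    wanted = expected[(node_id, config_id)]
    if not wanted:
        return 100.0
    ok = {r.key for r in run["reports"]
          if r.key in wanted and r.status in ("success", "repaired")}
    # Missing reports count against compliance, the same as errors.
    return 100.0 * len(ok) / len(wanted)


if __name__ == "__main__":
    expected = {("node-1", "cfg-42"): {Key("R", "D1", "C"), Key("R", "D1", "C2")}}
    run = {"node_id": "node-1", "config_id": "cfg-42",
           "reports": [Report(Key("R", "D1", "C"), "success"),
                       Report(Key("R", "D1", "C2"), "error")]}
    print(compliance("node-1", "cfg-42", expected, run))
```

Keying the expected reports on the config id is what lets a report from an outdated configuration be told apart from a genuine failure, which seems to be the point of the "end of validity" field on slide 23.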
  27. Make information available

    A lot of information from inside Rudder, usable in Rudder context
    • Details of each run (timestamped info)
    • Policy generation details
    • Serialization of configurations
    • Inventories
    • ...
  28. Causality and dependencies of events

    Why would we need it?
    • We have logs
    • We have experts
  29. Causality and dependencies of events

  30. Causality and dependencies of events

    Diagnosis on infrastructures is hard
    • Many systems
    • Dependencies across systems
    • Many actors involved
    An issue on one component can impact hundreds of systems
    We need to separate the causes from the symptoms
  31. Causality and dependencies of events

    Monitoring can only correlate
    Causes and precedences help root cause analysis
  32. Causality and dependencies of events

    How can we do that ??!??
  33. Event sourcing & Tracing

    Events happen on the whole infrastructure
    • Describe and analyze over systems
    • Order events
    • Contextualize
  34. Event sourcing & Tracing

    Terminology (Dapper & OpenTracing)
    • Trace: Description of a “transaction” as it moves through systems
    • Span: Named and timed operation, piece of workflow (+ tags and logs)
    • Span context: Trace information that accompanies the transaction
  35. Event sourcing & Tracing

    What’s in a span?
    • Operation name
    • Start & end timestamps
    • Tags: Set of key:value
    • Logs: Set of key:value
    • SpanContext
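The fields listed on slide 35 map naturally onto a small data structure. The following is a minimal sketch in plain Python, not the OpenTracing API: the class names mirror the slide, while `parent_id`, `start_span`, `log` and `finish` are illustrative assumptions.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class SpanContext:
    """Trace information that accompanies the transaction."""
    trace_id: str
    span_id: str


@dataclass
class Span:
    operation_name: str
    context: SpanContext
    parent_id: Optional[str] = None          # assumption: link to the parent span
    start: float = field(default_factory=time.time)
    end: Optional[float] = None
    tags: dict = field(default_factory=dict)  # key:value pairs describing the span
    logs: list = field(default_factory=list)  # (timestamp, {key: value}) entries

    def log(self, **fields):
        """Attach a timestamped key:value record to the span."""
        self.logs.append((time.time(), fields))

    def finish(self):
        """Set the end timestamp."""
        self.end = time.time()


def start_span(name: str, parent: Optional[Span] = None, **tags) -> Span:
    """Start a root span, or a child span that stays in the parent's trace."""
    trace_id = parent.context.trace_id if parent else uuid.uuid4().hex
    context = SpanContext(trace_id, uuid.uuid4().hex)
    parent_id = parent.context.span_id if parent else None
    return Span(name, context, parent_id, tags=tags)


if __name__ == "__main__":
    run = start_span("agent run", node_id="node-1")
    check = start_span("configuration check", parent=run, component="ntp")
    check.log(event="already ok")
    check.finish()
    run.finish()
    print(check.context.trace_id == run.context.trace_id)
```

A child span gets its own span id but inherits the trace id, which is what later allows all operations of one "transaction" to be grouped together.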
  36. Event sourcing & Tracing

    Temporal relationships between Spans in a single Trace
    https://www.jaegertracing.io/docs/1.9/architecture/
  37. Event sourcing & Tracing

    What would be the traces?
    Defining the infrastructure state is a trace
    • Each change before validation is a span
    • Validating results in a change request closes the trace
    Computing the node configurations is a trace
    • Computing targets, overrides and generating files are spans
    • Closes with the serialization of the node configurations in the database
    Each run on a node is a trace
    • Each configuration check is a span
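To make the last mapping on slide 37 concrete (one agent run is a trace, each configuration check is a span), here is a small in-memory sketch. The `Tracer` class, the `agent_run` function and the tag names are hypothetical. It does not use a real tracing library, and a real agent would report richer statuses than success or error.

```python
import time
import uuid
from contextlib import contextmanager


class Tracer:
    """Tiny in-memory tracer (illustrative only, not the OpenTracing API)."""

    def __init__(self):
        self.finished = []  # finished spans, as plain dicts
        self._stack = []    # currently open spans, innermost last

    @contextmanager
    def span(self, name, **tags):
        parent = self._stack[-1] if self._stack else None
        record = {
            "trace_id": parent["trace_id"] if parent else uuid.uuid4().hex,
            "span_id": uuid.uuid4().hex,
            "parent_id": parent["span_id"] if parent else None,
            "name": name,
            "tags": tags,
            "start": time.time(),
        }
        self._stack.append(record)
        try:
            yield record
        finally:
            # Close the span even if the wrapped operation raised.
            record["end"] = time.time()
            self._stack.pop()
            self.finished.append(record)


def agent_run(tracer, node_id, checks):
    """One run on a node is a trace; each configuration check is a span."""
    with tracer.span("agent run", node_id=node_id) as run:
        for component, check in checks:
            with tracer.span("configuration check", component=component) as s:
                s["tags"]["status"] = "success" if check() else "error"
    return run["trace_id"]


if __name__ == "__main__":
    tracer = Tracer()
    agent_run(tracer, "node-1", [("ntp", lambda: True), ("rsh", lambda: False)])
    for s in tracer.finished:
        print(s["name"], s["tags"])
```

The other two traces on the slide (defining the infrastructure state, computing the node configurations) would follow the same shape, with a different root span and different child operations.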
  38. Event sourcing & Tracing

    RULE • Id
    DIRECTIVE • Id
    GROUP • Id
    Environmental context
    Node configuration (Files) • Id : . . . • Generated : . . . • Commit id
    Change request
    RUN • Reports • Reports • ... • ...
    METADATA • node id • config id • run timestamp
    RUN • Reports • Reports • ... • ...
    METADATA • node id • config id • run timestamp • Signature
    Get config
    Send configuration reports
    Expected reports (node id, config id, timestamp)
    Run reports
    Historization
    Store expected reports
    Metadata • Integrity • CommitId • Signature
    Config • For Rule R, Directive D1, Component C
    Events
    Commit Id
    Defining state
    Trace + Spans
    Trace
    Run: Trace
    Each step: span
    Message bus
  39. Event sourcing & Tracing

    Node configuration (Files) • Id : . . . • Generated : . . . • Commit id
    METADATA • node id • config id • run timestamp
    RUN
    METADATA
    Signature
    Get config
    Send configuration reports
    Expected reports (node id, config id, timestamp)
    Run reports
    Store expected reports
    Metadata • Integrity • CommitId • Signature
    Config • For Rule R, Directive D1, Component C
    Trace
    Message bus
    Run: Trace
    Each step: span
    Compliance
    CMDB
    Hooks
    Monitoring
  40. Event sourcing & Tracing

    Store Traces & Events:
    • Integrate with systems in place
    • Many tools are compatible with OpenTracing
    Correlate with non-observable systems
  41. Closing thoughts

    With Rudder, information is centralized and made available in a relevant way for all actors/things
  42. Closing thoughts

    How can you benefit more from your configuration management?
  43. Closing thoughts

    What can we do with these billions of events?
  44. Closing thoughts

    What can we do with these billions of events?
    Reactive approach
    • Query, search and analyze traces in case of problems
  45. Closing thoughts

    What can we do with these billions of events?
    Proactive approach
    Process mining: Machine Learning on these events
    • Detect unusual behaviours
    • Outliers
    • Inconsistencies across systems
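As a very small illustration of the "outliers" item on slide 45, the sketch below flags nodes whose run duration is far from the median, using a modified z-score based on the median absolute deviation. This is plain statistics swapped in for illustration, not the process mining or machine learning the slide refers to. The `outliers` function, the 3.5 threshold and the sample durations are all made up.

```python
from statistics import median


def outliers(durations, threshold=3.5):
    """Return the keys whose value is far from the median.

    Uses the modified z-score 0.6745 * |x - median| / MAD, where MAD is
    the median absolute deviation; 3.5 is a commonly used cut-off.
    """
    values = list(durations.values())
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # Degenerate case: at least half of the values equal the median.
        return [k for k, v in durations.items() if v != med]
    return [k for k, v in durations.items()
            if 0.6745 * abs(v - med) / mad > threshold]


if __name__ == "__main__":
    # Hypothetical agent run durations in seconds, keyed by node.
    runs = {"web1": 12.0, "web2": 11.5, "web3": 12.4, "web4": 11.9, "db1": 95.0}
    print(outliers(runs))
```

A median-based score is used rather than mean and standard deviation because a single extreme run would otherwise inflate the spread and hide itself.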
  46. Closing thoughts

    Mark Burgess, Founder of Configuration Management
    http://markburgess.org/anomalies.html
  47. Questions?

    rudder.io
    Nicolas CHARLES - nicolas@rudder.io - @nico_charles

  48. Security?

    Events, traces and logs hold critical data
    Within a single system, security can be built in
    • AuthN/AuthZ
    For distributed systems, it’s much harder
    • Who can see what?
    • Who defines and enforces the authorizations?
    • Tags on events for authorizations
  49. Security?

    Events, traces and logs hold critical data
    Cipher information vs partial visibility?
  50. What uses for observing operations of Configuration Management?

    rudder.io
    Nicolas CHARLES - nicolas@rudder.io - @nico_charles
  51. Event sourcing & Tracing

    Temporal relationships between Spans in a single Trace

    ––|–––––––|–––––––|–––––––|–––––––|–––––––|–––––––|–––––––|–> time

     [Span A···················································]
       [Span B··············································]
          [Span D··········································]
        [Span C········································]
             [Span E·······]        [Span F··] [Span G··] [Span H··]

    https://opentracing.io/specification/
  52. Event sourcing & Tracing

    Every component needs to know the context
    • Carry the Span Context along with each event
    Add some information to each event
    • Save on logging thanks to context
    Send these traces on the message bus
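The idea on slide 52 of carrying the span context along with each event can be sketched as follows. This is a minimal illustration under stated assumptions: `queue.Queue` stands in for a real message bus, and the event names, the message layout and the `emit`/`handle` helpers are hypothetical.

```python
import json
import queue
import uuid

bus = queue.Queue()  # stand-in for a real message bus


def new_context():
    """Start a new trace."""
    return {"trace_id": uuid.uuid4().hex, "span_id": uuid.uuid4().hex}


def child_context(parent):
    """Create a context for a follow-up operation in the same trace."""
    return {"trace_id": parent["trace_id"],
            "span_id": uuid.uuid4().hex,
            "parent_id": parent["span_id"]}


def emit(context, event, **fields):
    """Publish an event that carries its span context with it."""
    bus.put(json.dumps({"context": context, "event": event, **fields}))


def handle(message):
    """Consumer side: read the context and continue the same trace."""
    msg = json.loads(message)
    ctx = child_context(msg["context"])
    emit(ctx, "reports stored", node_id=msg["node_id"])
    return ctx


if __name__ == "__main__":
    root = new_context()
    emit(root, "run finished", node_id="node-1")
    handle(bus.get())
    print(json.loads(bus.get()))
```

Because every message carries the trace id, the consumer does not need to log the full history of the transaction again: the context is enough to join its events with those emitted upstream.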