Rudder
February 04, 2019

What uses for observing operations of Configuration Management?

More and more services expose their state, internal details and metrics in order to be observable and to improve the overall quality of service.
But what about observing the infrastructure they are deployed, configured and maintained on?
What can we learn from it, and what do we need from configuration management to get these features and metrics?

Logs from installation are a good start, but they need to be centralized and aggregated, and above all we need to derive knowledge from them. We also need to observe these features over time, to trace changes and correlate them with monitoring.

Rudder was built around the premise that all actions of the configuration agent need to be traced, centralized and exposed in a meaningful way, with agents ensuring the continuous configuration of systems. This talk shows the rationale behind this premise, how we implemented it, and the benefits of this approach for the modern IT world.

Nicolas Charles
Configuration Management Camp 2019

Transcript

  1. Getting and understanding the info is complex

    Operators, managers, experts and APIs have different needs. Having to go through a third party to get data is frustrating. We mistrust what we don't understand.
  2. Getting and understanding the info is complex: putting errors into perspective

    Errors can be expected. Errors in production can have catastrophic consequences; an error in a Vagrant VM is much less critical.
  3. Observability adoption

    Legacy software: embedding an agent (often a proprietary solution). New developments: best practices, open standards, architectural building blocks.
  4. These concepts are core to Rudder: Technique

    A set of operations and configurations to reach a state, with variables for configuration. Created by experts.
  5. These concepts are core to Rudder: Directive

    Technique + parameters. Defines how services must be managed. Driven by business needs, managed by admins or APIs.
  6. These concepts are core to Rudder: Rule

    The application of Directive(s) to Group(s). Defines the targets of the Directive(s). A higher-level view of services, managed by admins or APIs.
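The Technique → Directive → Rule layering above can be pictured as a small data model. The sketch below is illustrative only. The class names, the fields and the `directives_for` helper are hypothetical; they are not Rudder's internal types or API. A group is also reduced to a static set of node ids.

```python
from dataclasses import dataclass


@dataclass
class Technique:
    """Operations and configurations written by experts, with open variables."""
    name: str
    variables: list[str]


@dataclass
class Directive:
    """A Technique plus parameter values: how a service must be managed."""
    name: str
    technique: Technique
    parameters: dict[str, str]

    def __post_init__(self) -> None:
        # Every variable the technique declares must receive a value.
        missing = set(self.technique.variables) - set(self.parameters)
        if missing:
            raise ValueError(f"{self.name}: missing parameters {sorted(missing)}")


@dataclass
class Rule:
    """Applies Directive(s) to Group(s); here a group is just a set of node ids."""
    name: str
    directives: list[Directive]
    groups: list[set[str]]

    def targets(self) -> set[str]:
        # The nodes this rule applies to: the union of its groups.
        return set().union(*self.groups)


def directives_for(node_id: str, rules: list[Rule]) -> list[str]:
    """Names of the directives a node receives, across all rules (hypothetical helper)."""
    return [d.name for r in rules if node_id in r.targets() for d in r.directives]


# Usage with made-up names.
sshd = Technique("sshd configuration", ["port", "permit_root_login"])
hardened = Directive("SSH hardened", sshd, {"port": "22", "permit_root_login": "no"})
baseline = Rule("Security baseline", [hardened], [{"web1", "web2"}, {"db1"}])
print(directives_for("db1", [baseline]))  # ['SSH hardened']
```

Keeping parameters on the Directive, and not on the Technique, is what lets experts write a Technique once while admins reuse it with business-specific values.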
  7. Each can focus on what is relevant (managers, APIs)

    "rules": [
      {
        "id": "32377fd7-02fd-43d0-aab7-28460a91347b",
        "name": "Security rules - baseline",
        "compliance": 100,
        "mode": "full-compliance",
        "complianceDetails": { "successAlreadyOK": 87.47, "successNotApplicable": 12.53 },
        "directives": [
          {
            "id": "c16e3a90-b9d7-427d-83c1-d80e33124e4c",
            "name": "CIS Benchmark 2.1.6 - rsh",
            "compliance": 100.0,
            "complianceDetails": { "successAlreadyOK": 100.00 }
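The API excerpt above stops mid-document. Suppose the full response keeps the same shape: `rules`, each carrying `compliance`, `complianceDetails` and `directives`. An API consumer can then start from the rule-level number and drill down to directives. In this sketch the excerpt is closed just enough to parse. The helper names are hypothetical, and no particular endpoint or client library is assumed.

```python
import json

# The slide's excerpt, closed so that it parses; a real response has more fields.
RESPONSE = """
{"rules": [{"id": "32377fd7-02fd-43d0-aab7-28460a91347b",
            "name": "Security rules - baseline",
            "compliance": 100,
            "mode": "full-compliance",
            "complianceDetails": {"successAlreadyOK": 87.47,
                                  "successNotApplicable": 12.53},
            "directives": [{"id": "c16e3a90-b9d7-427d-83c1-d80e33124e4c",
                            "name": "CIS Benchmark 2.1.6 - rsh",
                            "compliance": 100.0,
                            "complianceDetails": {"successAlreadyOK": 100.00}}]}]}
"""


def directive_compliance(doc: dict) -> list[tuple[str, str, float]]:
    """Flatten rule -> directive compliance into (rule, directive, percent) rows."""
    return [
        (rule["name"], directive["name"], float(directive["compliance"]))
        for rule in doc["rules"]
        for directive in rule.get("directives", [])
    ]


def below(doc: dict, threshold: float) -> list[tuple[str, str, float]]:
    """A manager's question: which directives are under the target?"""
    return [row for row in directive_compliance(doc) if row[2] < threshold]


doc = json.loads(RESPONSE)
for rule in doc["rules"]:
    # Rule-level view: one number plus its breakdown by status.
    print(rule["name"], rule["compliance"], rule["complianceDetails"])
print(below(doc, 100.0))  # []: everything in this excerpt is compliant
```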
  8. What is this compliance?

    [Diagram: the elements behind compliance] Parameters; Rule (Id); Directive (Id, components); Group (Id); Rudder global configuration (policy mode, schedule); Node (properties, policy mode, schedule); environmental context (Id, generation date); node configuration files; change requests, historization and event logs.
  9. What is this compliance?

    [Diagram: the elements behind compliance; a Rule links Groups and Directives, and a Directive is made of components] Parameters; Rule (Id, Groups + Directives); Directive (Id, components); Group (Id); Rudder global configuration (policy mode, schedule); Node (properties, policy mode, schedule); environmental context (Id, generation date); node configuration files; change requests, historization and event logs.
  13. What is this compliance?

    [Diagram: from node configuration to compliance] The node configuration (Id, generation date) is written to files, together with metadata (integrity, signature) and its config (Id; for Rule R, Directive D1, Component C). Expected reports (node id, config id, timestamp) are stored. The node gets its policy, runs it and sends configuration reports: each run carries its reports, metadata (node id, config id, run timestamp) and a signature. Compliance is computed from run reports against expected reports, and historized.
  14. What is this compliance?

    [Diagram: same flow as slide 13, with expected reports now also carrying an end of validity: (node id, config id, timestamp, end of validity).]
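Here is a simplified model of the flow above. Expected reports are stored per node configuration, with a node id, config id, start timestamp and end of validity. Each run's reports are matched only against the configuration that was valid when the run happened. This is a sketch under assumptions. The types, the three status values and the percentage formula are hypothetical simplifications of Rudder's compliance computation, and signature verification is left out.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

Key = tuple[str, str, str]  # (rule, directive, component)


@dataclass(frozen=True)
class ExpectedReports:
    node_id: str
    config_id: str
    begin: datetime              # when this node configuration was generated
    end: Optional[datetime]      # end of validity; None while it is current
    components: frozenset[Key]


@dataclass(frozen=True)
class RunReport:
    node_id: str
    config_id: str
    run_timestamp: datetime
    key: Key
    status: str                  # simplified: "success", "repaired" or "error"


def compliance(expected: ExpectedReports, reports: list[RunReport]) -> dict:
    """Compare run reports with the expected reports of one node configuration."""
    # Keep reports for this node and config id, sent while the config was valid.
    valid = [
        r for r in reports
        if r.node_id == expected.node_id
        and r.config_id == expected.config_id
        and r.run_timestamp >= expected.begin
        and (expected.end is None or r.run_timestamp < expected.end)
    ]
    received = {r.key: r.status for r in valid}
    # Every expected component gets a status; silence counts as "missing".
    status = {k: received.get(k, "missing") for k in expected.components}
    unexpected = sorted(set(received) - expected.components)
    ok = sum(s in ("success", "repaired") for s in status.values())
    percent = 100.0 * ok / len(status) if status else 100.0
    return {"percent": percent, "status": status, "unexpected": unexpected}


t0 = datetime(2019, 2, 4, 10, 0)
expected = ExpectedReports("node1", "cfg-42", t0, None,
                           frozenset({("R", "D1", "C"), ("R", "D1", "C2")}))
runs = [RunReport("node1", "cfg-42", datetime(2019, 2, 4, 10, 5), ("R", "D1", "C"), "success")]
print(compliance(expected, runs)["percent"])  # 50.0: component C2 never reported
```

Starting from what is expected, and not from what was received, is what makes a silent component visible as "missing". The end of validity also keeps a late report from an old configuration from counting toward the new one.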
  18. Make information available

    A lot of information from inside Rudder, usable in the Rudder context: details of each run (timestamped info), policy generation details, serialization of configurations, inventories...
  19. Causality and dependencies of events

    Why would we need it? We have logs. We have experts.
  20. Causality and dependencies of events

    Diagnosing infrastructures is hard: many systems, dependencies across systems, many actors involved. An issue on one component can impact hundreds of systems. We need to separate the causes from the symptoms.
  21. Causality and dependencies of events

    Monitoring can only correlate. Knowing causes and precedence (what happened before what) helps root cause analysis.
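Once dependencies are known, separating causes from symptoms becomes mechanical. Here is a minimal sketch, assuming a hypothetical `depends_on` map between systems is available. A failing system is treated as a symptom when its dependencies, followed transitively, include another failing system. The remaining failing systems are candidate root causes.

```python
def root_causes(depends_on: dict[str, set[str]], failing: set[str]) -> set[str]:
    """Split failures into causes and symptoms using known dependencies.

    A failing system is a symptom if anything it (transitively) depends on
    is also failing; otherwise it is a candidate root cause.
    """
    def has_failing_dependency(system: str, seen: set[str]) -> bool:
        for dep in depends_on.get(system, set()):
            if dep in seen:
                continue  # dependency cycles: visit each system once
            seen.add(dep)
            if dep in failing or has_failing_dependency(dep, seen):
                return True
        return False

    return {s for s in failing if not has_failing_dependency(s, {s})}


# Made-up topology: four alerts, one cause.
depends_on = {"web": {"app"}, "app": {"db", "dns"}, "db": {"storage"}}
failing = {"web", "app", "db", "storage"}
print(root_causes(depends_on, failing))  # {'storage'}
```

Monitoring alone would show four correlated alerts. The dependency edges, which is the precedence information, are what single out `storage`.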
  22. Event sourcing & Tracing

    Events happen across the whole infrastructure. Describe and analyze them across systems, order them, contextualize them.
  23. Event sourcing & Tracing: terminology (Dapper & OpenTracing)

    Trace: description of a “transaction” as it moves through systems
    Span: named and timed operation, a piece of the workflow (+ tags and logs)
    Span context: trace information that accompanies the transaction
  24. Event sourcing & Tracing: what's in a span?

    Operation name; start and end timestamps; tags (set of key:value); logs (set of key:value); SpanContext.
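The span contents listed above can be written down as plain data structures. This sketch mirrors the slide's list: operation name, start and end timestamps, tags, logs and SpanContext. It is not the OpenTracing API. `start_span`, the id formats and the `parent_id` field are assumptions made for illustration.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class SpanContext:
    """Trace information that travels with the transaction."""
    trace_id: str
    span_id: str
    parent_id: Optional[str] = None  # None for the first span of a trace


@dataclass
class Span:
    operation_name: str
    context: SpanContext
    start: float = field(default_factory=time.time)
    end: Optional[float] = None
    tags: dict[str, str] = field(default_factory=dict)                      # key:value
    logs: list[tuple[float, dict[str, str]]] = field(default_factory=list)  # timestamped key:value

    def log(self, **fields: str) -> None:
        self.logs.append((time.time(), fields))

    def finish(self) -> None:
        self.end = time.time()


def start_span(name: str, parent: Optional[SpanContext] = None) -> Span:
    """Start a new trace, or a child span in the parent's trace."""
    trace_id = parent.trace_id if parent else uuid.uuid4().hex
    context = SpanContext(trace_id, uuid.uuid4().hex[:16], parent.span_id if parent else None)
    return Span(name, context)


run = start_span("agent run")
check = start_span("check sshd", parent=run.context)
check.tags["component"] = "sshd"
check.log(event="repaired", file="/etc/ssh/sshd_config")
check.finish()
run.finish()
print(check.context.trace_id == run.context.trace_id)  # True
```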
  25. Event sourcing & Tracing

    [Figure: temporal relationships between Spans in a single Trace, from https://www.jaegertracing.io/docs/1.9/architecture/]
  26. Event sourcing & Tracing: what would the traces be?

    Defining the infrastructure state is a trace: each change before validation is a span, and validating the result in a change request closes the trace.
    Computing the node configurations is a trace: computing targets and overrides and generating files are spans; it closes with the serialization of the node configurations in the database.
    Each run on a node is a trace: each configuration check is a span.
  27. Event sourcing & Tracing

    [Diagram: the configuration and compliance flow annotated with events and traces] Rules, Directives and Groups define the state; events carry a commit id, and defining the state is a trace made of spans. The node configuration (Id, generation date, commit id) is written to files whose metadata holds integrity, commit id and signature, with its config (for Rule R, Directive D1, Component C). Expected reports (node id, config id, timestamp) are stored. Each run (reports, metadata with node id, config id and run timestamp, signature) is a trace, and each step is a span. Events go to a message bus.
  28. Event sourcing & Tracing

    [Diagram: node configuration (Id, generation date, commit id) with metadata (integrity, commit id, signature) and config (for Rule R, Directive D1, Component C); expected reports (node id, config id, timestamp); each run (metadata: node id, config id, run timestamp; signature) is a trace, each step a span. Traces go to a message bus, connected to compliance, the CMDB, hooks and monitoring.]
  29. Event sourcing & Tracing: store traces & events

    Integrate with the systems in place: many tools are compatible with OpenTracing. Correlate with non-observable systems.
  30. Closing thoughts

    With Rudder, information is centralized and made available in a relevant way for all actors, people and systems alike.
  31. Closing thoughts: what can we do with these billions of events?

    Reactive approach: query, search and analyze traces when problems occur.
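In the reactive approach, the typical question is "show me the traces where something went wrong". A tracing backend would normally answer it. As an illustration only, here is the same query over an in-memory list of span records. The record layout, the `traces_with` helper and the sample spans are hypothetical.

```python
from collections import defaultdict


def traces_with(spans: list[dict], **wanted: str) -> dict[str, list[dict]]:
    """Group spans by trace; keep traces where some span has all the wanted tags."""
    by_trace: dict[str, list[dict]] = defaultdict(list)
    for s in spans:
        by_trace[s["trace_id"]].append(s)
    return {
        trace_id: sorted(group, key=lambda s: s["start"])  # chronological order
        for trace_id, group in by_trace.items()
        if any(all(s["tags"].get(k) == v for k, v in wanted.items()) for s in group)
    }


# Made-up spans from two agent runs.
spans = [
    {"trace_id": "t1", "operation": "agent run", "start": 0.0, "tags": {"node_id": "web1"}},
    {"trace_id": "t1", "operation": "check", "start": 0.1, "tags": {"component": "ntp", "status": "error"}},
    {"trace_id": "t2", "operation": "agent run", "start": 5.0, "tags": {"node_id": "db1"}},
    {"trace_id": "t2", "operation": "check", "start": 5.1, "tags": {"component": "ntp", "status": "success"}},
]
failed = traces_with(spans, status="error")
print(list(failed))  # ['t1']
for s in failed["t1"]:
    print(s["operation"], s["tags"])  # the whole run, not just the failing check
```

Returning whole traces, not single spans, is the point. The error comes back with its context, such as which node and which run.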
  32. Closing thoughts: what can we do with these billions of events?

    Proactive approach: process mining, i.e. machine learning on these events, to detect unusual behaviours, outliers and inconsistencies across systems.
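The slide proposes machine learning. This sketch swaps in a much simpler statistical technique for "detect unusual behaviours". It flags nodes whose agent run duration is far from the fleet's median, using the modified z-score (based on the median absolute deviation) with the commonly used 3.5 cut-off. The durations and node names are made up.

```python
from statistics import median


def outliers(durations: dict[str, float], threshold: float = 3.5) -> list[str]:
    """Flag nodes whose run duration is far from the fleet median (modified z-score)."""
    values = list(durations.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # More than half the fleet is identical: anything different stands out.
        return [n for n, v in durations.items() if v != med]
    # 0.6745 scales the MAD to be comparable with a standard deviation on normal data.
    return [n for n, v in durations.items() if abs(0.6745 * (v - med) / mad) > threshold]


durations = {"web1": 12.0, "web2": 11.5, "web3": 12.4, "db1": 13.0, "db2": 95.0}
print(outliers(durations))  # ['db2']
```

The median and MAD are used instead of the mean and standard deviation because a single extreme run would otherwise inflate the spread and hide itself.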
  33. Security?

    Events, traces and logs hold critical data. Within a single system, security can be built in (AuthN/AuthZ). For a distributed system it's much harder: who can see what? Who defines and enforces the authorizations? Tags on events for authorizations.
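One reading of "tags on events for authorizations" works as follows. Each event carries key/value tags, each reader holds grants per tag key, and an event is shown only if every one of its tags is granted. This is a hypothetical deny-by-default sketch. It does not answer who defines and enforces the grants, which the slide leaves open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    message: str
    tags: tuple[tuple[str, str], ...]  # e.g. (("env", "prod"), ("team", "web"))


def can_see(grants: dict[str, set[str]], event: Event) -> bool:
    """Deny by default: the event needs tags, and each tag must be granted."""
    if not event.tags:
        return False
    return all(value in grants.get(key, set()) for key, value in event.tags)


def visible(grants: dict[str, set[str]], events: list[Event]) -> list[str]:
    return [e.message for e in events if can_see(grants, e)]


events = [
    Event("sshd_config repaired", (("env", "prod"), ("team", "web"))),
    Event("database password rotated", (("env", "prod"), ("team", "dba"))),
    Event("ntp already OK", (("env", "dev"), ("team", "web"))),
]
web_admin = {"env": {"prod", "dev"}, "team": {"web"}}
print(visible(web_admin, events))  # ['sshd_config repaired', 'ntp already OK']
```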
  34. Event sourcing & Tracing

    [Figure: temporal relationships between Spans in a single Trace, shown as a timeline of Spans A to H, from https://opentracing.io/specification/]
  35. Event sourcing & Tracing

    Every component needs to know the context: carry the Span Context along with each event. Add some information to each event; the context saves on logging. Send these traces on a message bus.
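Carrying the span context along each event means a consumer needs nothing else to place an event in its trace. In this minimal sketch a standard-library `queue.Queue` stands in for the message bus. The message layout and function names are hypothetical, not a Rudder or OpenTracing wire format. The example traces policy generation, with its steps as spans.

```python
import json
import queue
import uuid
from typing import Optional

bus: "queue.Queue[str]" = queue.Queue()  # stand-in for a real message bus


def child_context(parent: Optional[dict] = None) -> dict:
    """A span context: same trace as the parent, a new span id, a link to the parent."""
    return {
        "trace_id": parent["trace_id"] if parent else uuid.uuid4().hex,
        "span_id": uuid.uuid4().hex[:16],
        "parent_id": parent["span_id"] if parent else None,
    }


def publish(parent: dict, event: str, **data: str) -> dict:
    """Send an event that carries its span context; return that context."""
    context = child_context(parent)
    # The context already says which trace and parent this belongs to,
    # so the payload only needs event-specific data.
    bus.put(json.dumps({"context": context, "event": event, "data": data}))
    return context


def drain() -> list[dict]:
    """Consumer side: read every pending message from the bus."""
    messages = []
    while not bus.empty():
        messages.append(json.loads(bus.get_nowait()))
    return messages


# Policy generation as one trace, its steps as spans (hypothetical event names).
generation = child_context()
publish(generation, "targets computed", rule="R")
publish(generation, "files generated", node_id="node1")
for m in drain():
    # Any consumer (compliance, CMDB, hooks, monitoring) can rebuild the tree.
    print(m["event"], m["context"]["parent_id"] == generation["span_id"])
```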