What uses for observing operations of Configuration Management?

rudder.io
What uses for observing operations of Configuration Management?
Nicolas CHARLES - @nico_charles

Are we really looking at logs?
I’m sure everyone here does, but...

No error and no change in the logs: does that mean success?
Aren’t we missing something?

Getting and understanding the info is complex
• Operators, Managers, Experts and APIs have different needs
• Frustration if we need a third party to get data
• We mistrust what we don’t understand

Getting and understanding the info is complex
Putting errors into perspective
• Errors can be expected
• Errors in production can have catastrophic consequences
• Errors in a Vagrant VM are much less critical

Getting and understanding the info is complex
Strong reliance on Expert(s)
• Single point of failure (SPOF)
• Fatigue

Knowing the exact infrastructure state
Monitoring vs. observability

Observability adoption
Databases
• Built-in facilities
• Tooling ecosystem to extract knowledge

Observability adoption
Software
• Legacy: embedding agent (often proprietary solutions)
• New developments:
  • Best practices
  • Open standards
  • Architectural building blocks

These concepts are core to Rudder
Everyone and everything can be an actor in configuration management

These concepts are core to Rudder
Technique
• A set of operations & configurations to reach a state
• With variables for configuration
• Created by experts


These concepts are core to Rudder
Directive
• Technique + Parameters
• Defines how services must be managed
• Driven by business needs, managed by admins or APIs

These concepts are core to Rudder
Rule
• The application of Directive(s) to Group(s)
• Defines the targets of the Directive(s)
• Higher-level view of services, managed by admins or APIs

Each can focus on what is relevant
• Operators
• Security Experts

Each can focus on what is relevant
• Managers
• APIs

"rules": [
  {
    "id": "32377fd7-02fd-43d0-aab7-28460a91347b",
    "name": "Security rules - baseline",
    "compliance": 100,
    "mode": "full-compliance",
    "complianceDetails": {
      "successAlreadyOK": 87.47,
      "successNotApplicable": 12.53
    },
    "directives": [
      {
        "id": "c16e3a90-b9d7-427d-83c1-d80e33124e4c",
        "name": "CIS Benchmark 2.1.6 - rsh",
        "compliance": 100.0,
        "complianceDetails": {
          "successAlreadyOK": 100.00
        }
        ...

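The compliance excerpt above can be consumed by a script as easily as by a manager's dashboard. The following is a minimal sketch, assuming the response has the shape shown on the slide; the closing brackets of `SAMPLE` and the helper name `summarize` are assumptions added for illustration, not part of the Rudder API.

```python
import json

# Sample payload shaped like the slide excerpt. The closing brackets are
# added here only so that the sample parses; the slide cuts the JSON short.
SAMPLE = """
{
  "rules": [
    {
      "id": "32377fd7-02fd-43d0-aab7-28460a91347b",
      "name": "Security rules - baseline",
      "compliance": 100,
      "mode": "full-compliance",
      "complianceDetails": {"successAlreadyOK": 87.47, "successNotApplicable": 12.53},
      "directives": [
        {
          "id": "c16e3a90-b9d7-427d-83c1-d80e33124e4c",
          "name": "CIS Benchmark 2.1.6 - rsh",
          "compliance": 100.0,
          "complianceDetails": {"successAlreadyOK": 100.00}
        }
      ]
    }
  ]
}
"""


def summarize(payload):
    """Return (rule name, compliance, names of directives below 100%) per rule."""
    summary = []
    for rule in payload.get("rules", []):
        below = [
            directive["name"]
            for directive in rule.get("directives", [])
            if directive.get("compliance", 0) < 100
        ]
        summary.append((rule["name"], float(rule["compliance"]), below))
    return summary


report = summarize(json.loads(SAMPLE))
for name, compliance, below in report:
    print(f"{name}: {compliance:.2f}% ({len(below)} directive(s) below 100%)")
```

The same structure serves several audiences: a manager reads the rule-level percentage, while an operator drills into the directives listed as below 100%.
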
What is this compliance?
[Diagram: what goes into a node configuration]
• PARAM
• RULE: Id, Groups + Directives
• DIRECTIVE: Id, Components
• GROUP: Id
• RUDDER config (global): Policy Mode, Schedule
• NODE: Properties, Policy Mode, Schedule
• Environmental context
• Node configuration files: Id : . . ., Generated : . . .
• Historization: Change request, Event logs

What is this compliance?
[Diagram: report flow between a node and the Rudder server]
• Node configuration files: Id : . . ., Generated : . . .
• Get Policy
• Store expected reports
• Expected reports: node id, config id, timestamp, end of validity
• Config: Id, for Rule R, Directive D1, Component C
• RUN: Reports, Reports, ...
• Run METADATA: node id, config id, run timestamp, Signature
• Send configuration reports (Metadata: Integrity, Signature)
• Run reports
• Historization: compliance historized
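
The diagram pairs stored expected reports (node id, config id, timestamp, end of validity) with the reports sent by each run. A minimal sketch of how a compliance figure could be derived from that pairing follows; the class names, the `"success"` and `"error"` status strings and the scoring rule are assumptions for illustration, not Rudder's actual implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class ExpectedReports:
    node_id: str
    config_id: str
    generated: datetime
    end_of_validity: datetime
    components: frozenset  # (rule, directive, component) triples


@dataclass(frozen=True)
class RunReports:
    node_id: str
    config_id: str
    run_timestamp: datetime
    statuses: dict  # (rule, directive, component) -> status string (hypothetical values)


def compliance(expected: ExpectedReports, run: RunReports) -> Optional[float]:
    """Percentage of expected components reported as "success", or None when
    the run does not match the expected configuration or its validity window."""
    same_config = (run.node_id, run.config_id) == (expected.node_id, expected.config_id)
    in_validity = expected.generated <= run.run_timestamp <= expected.end_of_validity
    if not (same_config and in_validity) or not expected.components:
        return None
    ok = sum(1 for key in expected.components if run.statuses.get(key) == "success")
    return 100.0 * ok / len(expected.components)


# Arbitrary sample data, not taken from the talk.
generated = datetime(2020, 1, 1, 10, 0)
expected = ExpectedReports(
    "node-1", "config-42", generated, generated + timedelta(hours=1),
    frozenset({("R", "D1", "C"), ("R", "D1", "C2"), ("R", "D2", "C"), ("R", "D2", "C2")}),
)
run = RunReports(
    "node-1", "config-42", generated + timedelta(minutes=5),
    {("R", "D1", "C"): "success", ("R", "D1", "C2"): "success",
     ("R", "D2", "C"): "success", ("R", "D2", "C2"): "error"},
)
score = compliance(expected, run)
stale = compliance(expected, RunReports("node-1", "config-41", run.run_timestamp, run.statuses))
print(score, stale)
```

Keying on the config id is what lets a run against an outdated configuration be told apart from a run that simply found errors.
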

Make information available
A lot of information from inside Rudder, usable in the Rudder context:
• Details of each run (timestamped info)
• Policy generation details
• Serialization of configurations
• Inventories
• ...

Causality and dependencies of events
Why would we need it?
• We have logs
• We have experts


Causality and dependencies of events
Diagnostics on infrastructures are hard
• Many systems
• Dependencies across systems
• Many actors involved
An issue on one component can impact hundreds of systems
We need to separate the causes from the symptoms

Causality and dependencies of events
• Monitoring can only correlate
• Causes and precedence help root cause analysis

Causality and dependencies of events
How can we do that?!

Event sourcing & Tracing
Events happen on the whole infrastructure
• Describe and analyze across systems
• Order events
• Contextualize

Event sourcing & Tracing
Terminology (Dapper & OpenTracing)
• Trace: description of a “transaction” as it moves through systems
• Span: named and timed operation, a piece of the workflow (+ tags and logs)
• Span context: trace information that accompanies the transaction

Event sourcing & Tracing
What’s in a span?
• Operation name
• Start & end timestamps
• Tags: set of key:value
• Logs: set of key:value
• SpanContext
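
The fields listed above map naturally onto a small data structure. This is a minimal sketch in plain Python, not the OpenTracing API itself; the field and method names are assumptions chosen to mirror the slide.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class SpanContext:
    """Trace information that accompanies the transaction across systems."""
    trace_id: str
    span_id: str


@dataclass
class Span:
    operation_name: str
    context: SpanContext
    start: float = field(default_factory=time.time)
    end: Optional[float] = None
    tags: dict = field(default_factory=dict)   # key:value pairs describing the span
    logs: list = field(default_factory=list)   # (timestamp, {key: value}) entries

    def log(self, **fields):
        """Attach a timestamped key:value record to the span."""
        self.logs.append((time.time(), fields))

    def finish(self):
        """Record the end timestamp, closing the span."""
        self.end = time.time()


context = SpanContext(trace_id=uuid.uuid4().hex, span_id=uuid.uuid4().hex)
span = Span("configuration-check", context, tags={"node_id": "node-1"})
span.log(event="file-compared", result="unchanged")
span.finish()
print(span.operation_name, span.tags, len(span.logs))
```

Tags describe the span as a whole, while logs are timestamped records of what happened inside it.
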

Event sourcing & Tracing
Temporal relationships between Spans in a single Trace
https://www.jaegertracing.io/docs/1.9/architecture/

Event sourcing & Tracing
What would be the traces?
• Defining the infrastructure state is a trace
  • Each change before validation is a span
  • Validation, which results in a change request, closes the trace
• Computing the node configurations is a trace
  • Computing targets, computing overrides and generating files are spans
  • The trace closes with the serialization of the node configurations in the database
• Each run on a node is a trace
  • Each configuration check is a span
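
The last case, one trace per run with one span per configuration check, can be sketched as follows. The `span` context manager, `run_agent` and the check names are hypothetical helpers written for this illustration; a real deployment would use an actual tracing client instead of an in-memory list.

```python
import time
import uuid
from contextlib import contextmanager

spans = []  # finished spans, appended in the order they close


@contextmanager
def span(name, trace_id, parent_id=None, **tags):
    """Record a named, timed operation belonging to the given trace."""
    record = {
        "trace_id": trace_id,
        "span_id": uuid.uuid4().hex,
        "parent_id": parent_id,
        "name": name,
        "tags": tags,
        "start": time.time(),
    }
    try:
        yield record
    finally:
        record["end"] = time.time()
        spans.append(record)


def run_agent(node_id, config_id, checks):
    """One run on a node is one trace; each configuration check is a child span."""
    trace_id = uuid.uuid4().hex
    with span("agent-run", trace_id, node_id=node_id, config_id=config_id) as root:
        for name, check in checks:
            with span(name, trace_id, parent_id=root["span_id"]) as child:
                child["tags"]["status"] = check()
    return trace_id


trace_id = run_agent(
    "node-1",
    "config-42",
    [("ntp-configured", lambda: "success"), ("rsh-absent", lambda: "repaired")],
)
for record in spans:
    print(record["name"], record["tags"])
```

The root span closes last, so it brackets every check performed during the run.
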

Event sourcing & Tracing
[Diagram: the report flow annotated with traces and spans]
• RULE: Id; DIRECTIVE: Id; GROUP: Id; Environmental context
• Change request, Events, Commit Id
• Defining state: Trace + Spans
• Node configuration files: Id : . . ., Generated : . . ., Commit id
• Trace
• Get config
• Store expected reports
• Expected reports (node id, config id, timestamp)
• Config: for Rule R, Directive D1, Component C
• RUN: Reports, Reports, ...
• Run METADATA: node id, config id, run timestamp, Signature
• Send configuration reports (Metadata: Integrity, CommitId, Signature)
• Run reports
• Run: Trace; each step: span
• Historisation
• Message bus

Event sourcing & Tracing
[Diagram: traces published on a message bus]
• Node configuration files: Id : . . ., Generated : . . ., Commit id
• Trace
• Get config
• Store expected reports
• Expected reports (node id, config id, timestamp)
• Config: for Rule R, Directive D1, Component C
• RUN, with METADATA: node id, config id, run timestamp, Signature
• Send configuration reports (Metadata: Integrity, CommitId, Signature)
• Run reports
• Run: Trace; each step: span
• Message bus: Compliance, CMDB, Hooks, Monitoring

Event sourcing & Tracing
Store Traces & Events:
• Integrate with systems in place
• Many tools are compatible with OpenTracing
Correlate with non-observable systems

Closing thoughts
With Rudder, information is centralized and made available in a relevant way for all actors/things

Closing thoughts
How can you get more out of your configuration management?

Closing thoughts
What can we do with these billions of events?

Closing thoughts
What can we do with these billions of events?
Reactive approach
• Query, search and analyze traces when problems occur

Closing thoughts
What can we do with these billions of events?
Proactive approach
• Process mining: Machine Learning on these events
• Detect unusual behaviours
  • Outliers
  • Inconsistencies across systems
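
As a simple stand-in for the outlier detection mentioned above, the sketch below flags nodes whose run duration deviates strongly from the rest of the fleet. It uses a median absolute deviation score rather than machine learning; the node names, durations and the 3.5 threshold are assumptions made for illustration.

```python
from statistics import median


def outliers(durations, threshold=3.5):
    """Return the names whose duration deviates strongly from the median,
    using a modified z-score based on the median absolute deviation (MAD)."""
    values = list(durations.values())
    med = median(values)
    mad = median(abs(value - med) for value in values)
    if mad == 0:
        # All typical values are identical: anything different stands out.
        return [name for name, value in durations.items() if value != med]
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    # when the data is roughly normally distributed.
    return [
        name
        for name, value in durations.items()
        if 0.6745 * abs(value - med) / mad > threshold
    ]


# Hypothetical run durations in seconds, one per node.
run_durations = {"web-1": 12.0, "web-2": 11.5, "web-3": 12.4, "web-4": 11.9, "db-1": 95.0}
unusual = outliers(run_durations)
print(unusual)
```

A median-based score is used because a single extreme run would distort a mean and standard deviation computed over the same data.
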

Closing thoughts
Mark Burgess, Founder of Configuration Management
http://markburgess.org/anomalies.html

rudder.io
Questions?
Nicolas CHARLES - @nico_charles

Security?
Events, traces and logs hold critical data
• Within a single system, security can be built in
  • AuthN/AuthZ
• For distributed systems, it’s much harder
  • Who can see what?
  • Who defines and enforces the authorizations?
  • Tags on events for authorizations
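
The idea of tags on events for authorizations can be sketched as a simple filter: a reader sees an event only if every tag on it has been granted to them. The tag names, event names and the all-tags-required rule are assumptions for illustration; the talk leaves open who defines and enforces such a policy.

```python
def visible_events(events, granted_tags):
    """Keep the events whose tags are all granted to the reader.
    Note: with this rule an event carrying no tags is visible to everyone."""
    granted = set(granted_tags)
    return [event for event in events if set(event["tags"]) <= granted]


# Hypothetical events, each labelled with the audiences allowed to read it.
EVENTS = [
    {"name": "ntp-check", "tags": ["ops"]},
    {"name": "password-policy-change", "tags": ["security"]},
    {"name": "firewall-rule-repaired", "tags": ["ops", "security"]},
]

for_operator = [event["name"] for event in visible_events(EVENTS, {"ops"})]
for_security = [event["name"] for event in visible_events(EVENTS, {"ops", "security"})]
print(for_operator)
print(for_security)
```

Requiring all tags to be granted is the conservative choice; an any-tag rule would widen visibility and is equally easy to express.
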

Security?
Events, traces and logs hold critical data
Encrypted information vs. partial visibility?


Event sourcing & Tracing
Temporal relationships between Spans in a single Trace

––|–––––––|–––––––|–––––––|–––––––|–––––––|–––––––|–––––––|–> time

 [Span A···················································]
   [Span B··············································]
      [Span D··········································]
    [Span C········································]
         [Span E·······]        [Span F··] [Span G··] [Span H··]

https://opentracing.io/specification/

Event sourcing & Tracing
Every component needs to know the context
• Carry the Span Context along with each event
Add some information to each event
• Save on logging thanks to the context
Send these traces on the message bus
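
Carrying the Span Context along with each event and publishing it on a message bus can be sketched as below. The standard-library `queue.Queue` stands in for a real message bus, and `new_context`, `emit` and the event names are hypothetical helpers for this illustration.

```python
import json
import queue
import uuid

bus = queue.Queue()  # stand-in for a real message bus


def new_context(parent=None):
    """Create a span context, inheriting the trace id from the parent if any."""
    return {
        "trace_id": parent["trace_id"] if parent else uuid.uuid4().hex,
        "span_id": uuid.uuid4().hex,
        "parent_id": parent["span_id"] if parent else None,
    }


def emit(context, event, **fields):
    """Publish an event together with the context it belongs to.
    Only event-specific fields are added; the context carries the rest."""
    bus.put(json.dumps({"context": context, "event": event, **fields}))


root = new_context()
emit(root, "policy-generation-started")
child = new_context(parent=root)
emit(child, "node-configuration-written", node_id="node-1")

messages = [json.loads(bus.get()) for _ in range(bus.qsize())]
for message in messages:
    print(message["event"], message["context"]["trace_id"])
```

Because each message repeats only the small context, a consumer can reassemble the trace from the shared trace id and the parent links without every component logging the full history.
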
