Rudder
June 14, 2019

What does observability bring to configuration management?

We speak of service observability when services expose internal states and metrics in order to improve overall availability.

What about the observability of the infrastructures on which they are deployed, configured and maintained?

The various logs (centralized, aggregated) are a good starting point for analysis, but systems also need to be observed continuously so that every change is traced and can be correlated with monitoring. Today, these IT configuration steps should be handled by configuration management tools, which become the gateway to operational observability.

We will show the value of this approach for modern IT management, with feedback from our experience of the challenges of implementing it in Rudder, our open source solution for continuous audit and configuration management.

Nicolas Charles
OSIS 2019

Transcript

  1. OSIS 2019 THE OPEN SOURCE INNOVATION SPRING 2019 @nico_charles [email protected]

    What does observability bring to configuration management?
  2. How are the systems?

    Does the absence of errors or changes in the logs mean success? Aren’t we missing something?
  3. Definition

    “Configuration management is a systems engineering process for establishing and maintaining consistency of a product [...] throughout its life.” (source: Configuration_management)
  4. Let's remember: what does configuration management do?

    [Diagram labels: target state; a repeated cycle of configuration and feedback]
  5. Main challenges faced nowadays

    DEV, QA, PRODUCTION, RECOVERY
    DEV, SEC, OPS, MGMT, EXTERN
    Multiple teams, diluted expertise, harder reporting
    Heterogeneous systems, reduced visibility, ease of use and understanding
  6. Getting and understanding the info is complex

    Operators, managers, experts and APIs have different needs
    Frustration when we need a third party to obtain relevant data
    We mistrust what we don’t understand
  7. Getting and understanding the info is complex

    Putting errors into perspective:
    • An error can be expected
    • An error in production can have catastrophic consequences
  8. Definition (again)

    “Observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs.” (source: Observability)
  9. Why do we need observability in configuration management?

    Causality, Agency, Perspective
    • Trust and prove configuration states
    • Provide insights relevant to different needs
    • Help teams find the best levers for their job
    [Diagram labels: A, B]
  10. Observability adoption

    Software
    Legacy: embedding an agent (often proprietary)
    New developments: best practices, open standards, architectural bricks
  11. These concepts are core to Rudder

    Everyone/thing can be an actor of configuration management
    "rules": [ { "id": "32377fd7-02fd-43d0-aab7-28460a91 "name": "Security rules - baseline", "compliance": 100, "mode": "full-compliance", "complianceDetails": { "successAlreadyOK": 87.47, "successNotApplicable": 12.53 },
  12. Compliance?

    PARAM
    RULE • Id
    DIRECTIVE • Id • (Components)
    GROUP • Id
    RUDDER config (global) • Policy Mode • Schedule
    NODE • Properties • Policy Mode • Schedule
    Environmental context
    Node configuration • Files • Id : . . . • Generated : . . .
    Change request
    Historization
    Event logs
  13. Compliance?

    RUDDER config (global) • Policy Mode • Schedule
    NODE • Properties • Policy Mode • Schedule
    Environmental context
    Node configuration • Files • Id : . . . • Generated : . . .
    Change request
    Historization
    Event logs
    PARAM
    RULE • Id • Groups + Directives
    DIRECTIVE • Id • Components
    GROUP • Id
  17. Compliance?

    Node configuration • Files • Id : . . . • Generated : . . .
    RUN • Reports • Reports • ... • ... METADATA • node id • config id • run timestamp
    RUN • Reports • Reports • ... • ... METADATA • node id • config id • run timestamp • Signature
    Get Policy
    Send configuration reports
    Expected reports (node id, config id, timestamp)
    Run reports
    Historization
    Compliance historized
    Store expected reports
    Metadata • Integrity • Signature
    Config • Id • For Rule R, Directive D1, Component C
  18. Compliance?

    Node configuration • Files • Id : . . . • Generated : . . .
    Run reports
    RUN • Reports • Reports • ... • ... METADATA • node id • config id • run timestamp
    RUN • Reports • Reports • ... • ... METADATA • node id • config id • run timestamp • Signature
    Get Policy
    Send configuration reports
    Expected reports (node id, config id, timestamp, end of validity)
    Historization
    Compliance historized
    Store expected reports
    Metadata • Integrity • Signature
    Config • Id • For Rule R, Directive D1, Component C
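The flow described on the Compliance slides (store the expected reports for a node and config id, then compare them with the reports a run sends back) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not Rudder's implementation: the `ExpectedReport` and `RunReport` types, the status strings, and the `compliance` function are all hypothetical names introduced here.

```python
from dataclasses import dataclass


# Hypothetical types for illustration only; not Rudder's actual data model.
@dataclass(frozen=True)
class ExpectedReport:
    node_id: str
    config_id: str
    rule: str
    directive: str
    component: str


@dataclass(frozen=True)
class RunReport:
    node_id: str
    config_id: str
    rule: str
    directive: str
    component: str
    status: str  # assumed values: "success", "repaired", "error"


def compliance(expected, reports):
    """Return the percentage of expected components reported as compliant.

    A run report only counts when it carries the same node id and config id
    as the expectation, so reports produced by an outdated configuration
    are treated as missing.
    """
    if not expected:
        raise ValueError("no expected reports to compare against")
    received = {
        (r.node_id, r.config_id, r.rule, r.directive, r.component): r.status
        for r in reports
    }
    ok = 0
    for e in expected:
        key = (e.node_id, e.config_id, e.rule, e.directive, e.component)
        if received.get(key) in ("success", "repaired"):
            ok += 1
    return round(100.0 * ok / len(expected), 2)


expected = [
    ExpectedReport("node1", "cfg-42", "R", "D1", "C"),
    ExpectedReport("node1", "cfg-42", "R", "D1", "C2"),
]
reports = [
    RunReport("node1", "cfg-42", "R", "D1", "C", "success"),
    # This report comes from an older config id, so it does not match.
    RunReport("node1", "cfg-41", "R", "D1", "C2", "success"),
]
print(compliance(expected, reports))  # 50.0
```

Keying the comparison on the config id is what lets a stale report be told apart from a real success, which is the point of shipping the config id in the run metadata.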
  22. Causality and dependencies of events

    Why would we need it?
    • We have logs
    • We have experts
  23. Causality and dependencies of events

    Diagnosis on infrastructures is hard
    • Many systems
    • Dependencies across systems
    • Many actors involved
    An issue on one component can impact hundreds of systems
    We need to separate the causes from the symptoms
  24. Causality and dependencies of events

    Monitoring can only correlate
    Events happen on the whole infrastructure
    Causes and precedence help root cause analysis
  25. Event sourcing & Tracing

    Terminology (Dapper & OpenTracing)
    Trace: description of a “transaction” as it moves through systems
    Span: named and timed operation, a piece of the workflow (+ tags and logs)
    Span context: trace information that accompanies the transaction
  26. Event sourcing & Tracing

    What’s in a span?
    • Operation name
    • Start & end timestamps
    • Tags: set of key:value
    • Logs: set of key:value
    • SpanContext
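The fields listed on this slide map naturally onto a small data structure. The sketch below is only an illustration of that shape, not the OpenTracing API: the class and method names are hypothetical, and storing each log entry with its own timestamp is an assumption made here.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class SpanContext:
    """Trace information that travels with the transaction."""
    trace_id: str
    span_id: str


@dataclass
class Span:
    """A named and timed operation, with tags and logs."""
    operation_name: str
    context: SpanContext
    start: float
    end: Optional[float] = None
    tags: dict = field(default_factory=dict)
    logs: list = field(default_factory=list)

    def log(self, **fields):
        # Assumption: each log entry is a timestamped set of key:value pairs.
        self.logs.append((time.time(), fields))

    def finish(self):
        self.end = time.time()

    @property
    def duration(self):
        return None if self.end is None else self.end - self.start


ctx = SpanContext(trace_id=uuid.uuid4().hex, span_id=uuid.uuid4().hex)
span = Span("check_package_installed", ctx, start=time.time(),
            tags={"node.id": "node1"})
span.log(event="package already present")
span.finish()
```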
  27. Event sourcing & Tracing

    Temporal relationships between Spans in a single Trace
    https://www.jaegertracing.io/docs/1.9/architecture/
  28. Event sourcing & Tracing

    Configuration Management: what would be the traces?
    Defining the infrastructure state is a trace
    • Each change before validation is a span
    • Validation, which results in a change request, closes the trace
    Computing the node configurations is a trace
    • Computing targets, overrides and generating files are spans
    • The trace closes with the serialization of the node configurations in the database
    Each run on a node is a trace
    • Each configuration check is a span
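As an illustration of the last mapping on this slide (each run on a node is a trace, each configuration check is a span), here is a minimal sketch. It is not how Rudder or any tracing library does it: `RunTrace`, its `check` method and the tag names are hypothetical, and spans are kept as plain dictionaries for brevity.

```python
import time
import uuid
from contextlib import contextmanager


class RunTrace:
    """One agent run on one node, recorded as a trace (hypothetical helper)."""

    def __init__(self, node_id, config_id):
        self.trace_id = uuid.uuid4().hex
        self.tags = {"node.id": node_id, "config.id": config_id}
        self.spans = []

    @contextmanager
    def check(self, name):
        """Record one configuration check as a span of this trace."""
        span = {
            "trace_id": self.trace_id,
            "operation": name,
            "start": time.time(),
            "end": None,
            "tags": dict(self.tags),
        }
        try:
            yield span
            # The check body may set its own status; default to success.
            span["tags"].setdefault("status", "success")
        except Exception as exc:
            span["tags"]["status"] = "error"
            span["tags"]["error"] = repr(exc)
            raise
        finally:
            span["end"] = time.time()
            self.spans.append(span)


run = RunTrace("node1", "cfg-42")
with run.check("ssh_config_file"):
    pass  # the real check would go here
with run.check("ntp_service") as span:
    span["tags"]["status"] = "repaired"

print([(s["operation"], s["tags"]["status"]) for s in run.spans])
# [('ssh_config_file', 'success'), ('ntp_service', 'repaired')]
```

Because every span carries the trace id together with the node id and config id, the spans of one run can later be grouped and related to the configuration that produced them.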
  29. Event sourcing & Tracing

    PARAM
    RULE • Id
    DIRECTIVE • Id • (Components)
    GROUP • Id
    Environmental context
    Node configuration • Files • Id : . . . • Generated : . . . • Commit Id
    RUN • Reports • Reports • ... • ... METADATA • node id • config id • run timestamp
    RUN • Reports • Reports • ... • ... METADATA • node id • config id • run timestamp • Signature
    Get config
    Send configuration reports
    Expected reports (node id, config id, timestamp)
    Run reports
    Historization
    Compliance historized
    Store expected reports
    Metadata • Integrity • CommitId • Signature
    Config • For Rule R, Directive D1, Component C
    Event logs
    Change request
    Defining state: Trace + Spans
    Trace
    Run: Trace • Each step: span
    Message bus
    Message bus
  30. Event sourcing & Tracing

    Store Traces & Events:
    • Integrate with systems in place
    • Many tools are compatible with OpenTracing
    Correlate with non-observable systems
  31. What to do with these billions of events?

    Reactive approach: query, search and analyze traces in case of problems
    Proactive approach: process mining (machine learning on these events)
    • Detect unusual behaviours
    • Outliers
    • Inconsistencies across systems
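To make the "outliers" idea concrete, here is a deliberately simple sketch. It is not process mining or machine learning: it swaps in a basic robust statistic (the modified z-score based on the median absolute deviation) to flag nodes whose run duration stands out. The function name, the input shape and the 3.5 threshold are assumptions chosen for illustration.

```python
from statistics import median


def duration_outliers(durations, threshold=3.5):
    """Return the keys whose duration deviates strongly from the median.

    Uses the modified z-score built on the median absolute deviation (MAD),
    which is less sensitive to the outliers themselves than a mean-based score.
    """
    values = list(durations.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half of the values equal the median: flag anything that differs.
        return [k for k, v in durations.items() if v != med]
    return [
        k for k, v in durations.items()
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Hypothetical run durations in seconds, keyed by node id.
run_durations = {
    "node1": 12.0,
    "node2": 11.5,
    "node3": 12.4,
    "node4": 11.9,
    "node5": 95.0,
}
print(duration_outliers(run_durations))  # ['node5']
```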
  32. Security?

    Events, traces and logs hold critical data
    Within a simple system, security can be built in (AuthN/AuthZ)
    For distributed systems, it’s much harder
    • Who can see what?
    • Who defines and enforces the authorizations?
    • Partial visibility of events/traces
    • Tags on events for authorizations
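One possible reading of "tags on events for authorizations" is a filter that only shows a viewer the events whose tags they have all been granted, which yields the partial visibility mentioned on the slide. The sketch below assumes that policy; the `auth_tags` key, the tag values and the function name are hypothetical, and it does not address who defines the grants.

```python
def visible_events(events, granted_tags):
    """Return the events a viewer may see, given the tags granted to them.

    Assumed policy: an event is visible only if every one of its
    authorization tags has been granted to the viewer.
    """
    granted = frozenset(granted_tags)
    visible = []
    for event in events:
        required = frozenset(event.get("auth_tags", ()))
        # Deny by default: an event without any tag is shown to nobody.
        if required and required <= granted:
            visible.append(event)
    return visible


events = [
    {"id": 1, "auth_tags": ["team:ops"]},
    {"id": 2, "auth_tags": ["team:ops", "env:production"]},
    {"id": 3, "auth_tags": ["team:security"]},
    {"id": 4},  # untagged
]
print([e["id"] for e in visible_events(events, {"team:ops"})])  # [1]
```

Denying untagged events by default is a conservative choice: a producer that forgets to tag an event hides it rather than exposing it to everyone.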