Observability for Modern Applications

AWS Dev Day Oslo, April 3rd, 2019

Danilo Poccia

April 03, 2019

Transcript

  1. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Oslo, 2019.04.03
    Observability for Modern Applications
    Danilo Poccia
    Principal Evangelist, Serverless
    @danilop
    MAD5

  2. Approaches to modern application development
    • Simplify environment management
    • Reduce the impact of code changes
    • Automate operations
    • Accelerate the delivery of new, high-quality services
    • Gain insight across resources and applications
    • Protect customers and the business

  3. Approaches to modern application development
    • Simplify environment management with serverless technologies
    • Reduce the impact of code changes with microservice architectures
    • Automate operations by modeling applications & infrastructure as code
    • Accelerate the delivery of new, high-quality services with CI/CD
    • Gain insight across resources and applications by enabling observability
    • Protect customers and the business with end-to-end security & compliance

  5. Microservices increase release agility
    Diagram: monolithic application vs. microservices

  6. Monolith

  7. Diagram: the application as twelve separate services

  8. Diagram: services written in different languages (Rust, Go, Node.js, Java), with their own databases

  9. Diagram: services running on different compute options (containers, λ, VMs, managed services), with their own databases

  10. Proactive operations help mitigate issues
    Chart axes: Latency, Time (ms); regions labeled "Degraded state" and "Outage"

  11. Observability in Control Theory
    Images: first pages of two papers by R. E. Kalman, "On the General Theory of Control Systems" and "Mathematical Description of Linear Dynamical Systems" (J.S.I.A.M. Control, Ser. A, Vol. 1, 1963), which introduce observability alongside controllability
    1961-62

  12. Observability
    In control theory, observability is a measure of
    how well internal states of a system
    can be inferred from knowledge
    of its external outputs.
    https://en.wikipedia.org/wiki/Observability
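For the linear systems Kalman studied, this definition has an exact test. The standard rank condition is stated below for a continuous-time, time-invariant system with n state variables (the slide itself gives only the informal definition), followed by a small worked example.

```latex
\dot{x}(t) = A\,x(t) + B\,u(t), \qquad y(t) = C\,x(t), \qquad x(t) \in \mathbb{R}^{n}

\mathcal{O} =
\begin{bmatrix} C \\ CA \\ \vdots \\ CA^{n-1} \end{bmatrix},
\qquad
(A, C)\ \text{observable} \iff \operatorname{rank} \mathcal{O} = n

\text{Example: } A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix},\;
C = \begin{bmatrix} 1 & 0 \end{bmatrix}
\;\Rightarrow\;
CA = \begin{bmatrix} 0 & 1 \end{bmatrix},\;
\mathcal{O} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},\;
\operatorname{rank} \mathcal{O} = 2 = n
```

In the example, measuring only position (the output) is enough to reconstruct velocity (a hidden state), which is the same idea the talk applies to software: infer internal state from external outputs.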

  13. Levels of Observability
    Machine (HW, OS)
    Application
    Network

  14. The Three Pillars of Observability
    Distributed Systems Observability by Cindy Sridharan

  15. The Three Pillars of Observability
    Event logs, metrics, tracing
    Distributed Systems Observability by Cindy Sridharan

  16. Using Observability
    Pillars: event logs, metrics, tracing
    Uses: log aggregation & analytics, visualizations, alerting
    Metric filter: derives metrics from event logs
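A metric filter turns matching log events into metric data points. The sketch below mimics that idea locally in plain JavaScript, as an illustration only: it is not the CloudWatch Logs filter-pattern syntax, and the `countMatches` and `toMetricDatum` helpers and the sample log lines are hypothetical.

```javascript
// Hypothetical sketch: derive a metric from log lines, the way a
// metric filter turns matching log events into a count.
function countMatches(logLines, term) {
  // Count events whose text contains the term (case-sensitive)
  return logLines.filter((line) => line.includes(term)).length;
}

// Wrap the count as a CloudWatch-style data point
function toMetricDatum(metricName, count, timestamp) {
  return { MetricName: metricName, Value: count, Unit: 'Count', Timestamp: timestamp };
}

// Made-up log lines for illustration
const logs = [
  '2019-04-03T10:00:01Z INFO order accepted',
  '2019-04-03T10:00:02Z ERROR payment declined',
  '2019-04-03T10:00:03Z ERROR inventory timeout',
];

const errors = countMatches(logs, 'ERROR');
const datum = toMetricDatum('ErrorCount', errors, new Date('2019-04-03T10:01:00Z'));
console.log(datum.MetricName, datum.Value); // prints "ErrorCount 2"
```

In CloudWatch the filtering runs server-side as logs arrive, so the application only has to write the log lines.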

  17. Using Observability on AWS
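    Event logs: CloudWatch Logs, analyzed with CloudWatch Logs Insights
    Metrics: CloudWatch Metrics, shown in CloudWatch Dashboards and used by CloudWatch Alarms
    Traces: AWS X-Ray, visualized in the X-Ray service graph
    Metric filter: from CloudWatch Logs to CloudWatch Metrics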

  19. Sending custom metrics to CloudWatch
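    // Requires the AWS SDK for JavaScript v2 (npm install aws-sdk)
    const AWS = require('aws-sdk');
    const cloudWatch = new AWS.CloudWatch();
    async function publishMetric() {
      // Publish one data point of a custom business metric
      const metricData = await cloudWatch.putMetricData({
        MetricData: [
          {
            MetricName: 'My Business Metric',
            Dimensions: [
              {
                Name: 'Location',
                Value: 'Paris'
              }
            ],
            Timestamp: new Date(),
            Value: 123.4
          }
        ],
        Namespace: 'My Namespace'
      }).promise();
      return metricData;
    }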
    • Metric name
    • Dimensions
    • Timestamp
    • Value
    • Namespace
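Because `MetricData` is an array, several data points can be sent in one `putMetricData` call. A minimal sketch, assuming business events shaped like `{ location, amount }` (a hypothetical shape) and a hypothetical `buildMetricData` helper, that sums the events per `Location` dimension before publishing:

```javascript
// Hypothetical helper: aggregate raw business events into one
// CloudWatch data point per Location dimension value.
function buildMetricData(events, metricName, timestamp) {
  const totals = new Map();
  for (const { location, amount } of events) {
    totals.set(location, (totals.get(location) || 0) + amount);
  }
  return Array.from(totals, ([location, total]) => ({
    MetricName: metricName,
    Dimensions: [{ Name: 'Location', Value: location }],
    Timestamp: timestamp,
    Value: total,
  }));
}

// Made-up events for illustration
const events = [
  { location: 'Paris', amount: 100 },
  { location: 'Oslo', amount: 40 },
  { location: 'Paris', amount: 23 },
];

const metricData = buildMetricData(events, 'My Business Metric', new Date());
// These params could then be passed to cloudWatch.putMetricData(params).promise()
const params = { MetricData: metricData, Namespace: 'My Namespace' };
```

Aggregating before publishing keeps the number of API calls independent of the event rate, at the cost of losing per-event detail in the metric.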

  20. Add correlation IDs to logs – CloudWatch Logs + Insights
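One common way to do this is to write structured (JSON) log lines that always carry the same correlation ID. A minimal sketch, assuming the ID arrives in a request header we call `x-correlation-id` (a hypothetical name) and otherwise starting a new UUID; `createLogger` is also hypothetical, and `crypto.randomUUID` needs Node.js 14.17 or later.

```javascript
const { randomUUID } = require('crypto');

// Hypothetical helper: a logger that stamps every line with a correlation ID
function createLogger(headers, write = (line) => console.log(line)) {
  // Reuse the caller's ID if present, otherwise start a new one
  const correlationId = headers['x-correlation-id'] || randomUUID();
  const log = (level, message, extra = {}) =>
    write(JSON.stringify({ level, message, correlationId, ...extra }));
  return {
    correlationId,
    info: (message, extra) => log('INFO', message, extra),
    error: (message, extra) => log('ERROR', message, extra),
  };
}

// Usage: bind a logger to the incoming request's headers
const logger = createLogger({ 'x-correlation-id': 'abc-123' });
logger.info('order received', { orderId: 42 });
// prints {"level":"INFO","message":"order received","correlationId":"abc-123","orderId":42}
```

CloudWatch Logs Insights discovers fields in JSON log events, so a query such as `fields @timestamp, @message | filter correlationId = "abc-123" | sort @timestamp asc` can collect one request's lines; forwarding the ID on outgoing calls extends this across services.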

  21. End-to-end tracing – AWS X-Ray Traces
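End-to-end tracing works because each hop forwards the trace context. X-Ray propagates it in the `X-Amzn-Trace-Id` HTTP header, for example `Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1`. A minimal sketch of reading that header; `parseTraceHeader` is a hypothetical helper, and in practice the X-Ray SDK parses and forwards the header for you.

```javascript
// Hypothetical helper: split an X-Amzn-Trace-Id header value into its fields
function parseTraceHeader(value) {
  const fields = {};
  for (const part of value.split(';')) {
    const [key, val] = part.split('=');
    if (key && val !== undefined) fields[key.trim()] = val.trim();
  }
  return {
    root: fields.Root,     // trace ID shared by every segment of the request
    parent: fields.Parent, // ID of the calling segment or subsegment, if any
    sampled: fields.Sampled === '1',
  };
}

const ctx = parseTraceHeader(
  'Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1'
);
console.log(ctx.root); // prints "1-5759e988-bd862e3fe1be46a994272793"
```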

  22. AWS X-Ray Key Concepts
    Segments
    Subsegments
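A segment records the work one service did for a request; subsegments break that work down, for example into downstream AWS calls. The sketch below uses a simplified segment document with fields modeled on X-Ray's (`name`, `id`, `trace_id`, `start_time`, `end_time`, `subsegments`, times in epoch seconds) and computes how the segment's time splits across its subsegments. It is illustrative only: real segments are created and sent by the X-Ray SDK, and the IDs and timings here are made up.

```javascript
// Simplified, hypothetical segment document (times in epoch seconds)
const segment = {
  name: 'orders-api',
  id: '70de5b6f19ff9a0a',
  trace_id: '1-5759e988-bd862e3fe1be46a994272793',
  start_time: 1554285600,
  end_time: 1554285600.5,
  subsegments: [
    { name: 'DynamoDB', start_time: 1554285600.0625, end_time: 1554285600.3125 },
    { name: 'SNS', start_time: 1554285600.3125, end_time: 1554285600.4375 },
  ],
};

// Duration in milliseconds, rounded to avoid floating-point noise
const durationMs = (s) => Math.round((s.end_time - s.start_time) * 1000);

function breakdown(seg) {
  const total = durationMs(seg);
  const parts = seg.subsegments.map((sub) => ({ name: sub.name, ms: durationMs(sub) }));
  const covered = parts.reduce((sum, p) => sum + p.ms, 0);
  // Time not covered by any subsegment was spent in the service's own code
  return { total, parts, selfMs: total - covered };
}

console.log(breakdown(segment));
```

With these timings the segment lasts 500 ms: 250 ms in DynamoDB, 125 ms in SNS, and 125 ms in the service itself.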

  23. End-to-end tracing – AWS X-Ray Service Map

  24. Enabling X-Ray tracing
    AWS Lambda console
    Amazon API Gateway console

  25. Enabling X-Ray tracing in your code
    // Before: the plain AWS SDK
    // const AWS = require('aws-sdk');
    // After: wrap the SDK so every AWS call is traced by X-Ray
    const AWSXRay = require('aws-xray-sdk');
    const AWS = AWSXRay.captureAWS(require('aws-sdk'));

  26. Enabling X-Ray tracing in your code
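    const express = require('express');
    const AWSXRay = require('aws-xray-sdk');
    const app = express();
    // Open an X-Ray segment for each incoming request
    app.use(AWSXRay.express.openSegment('my-segment'));
    app.get('/send', function (req, res) {
      res.setHeader('Content-Type', 'application/json');
      res.send('{"hello": "world"}');
    });
    // Close the segment after all routes have been registered
    app.use(AWSXRay.express.closeSegment());
    app.listen(3000); // port chosen for illustration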

  27. Understand performance…
    Systems Performance by Brendan Gregg

  28. Understand performance… and latency…
    Systems Performance by Brendan Gregg

  29. Understand performance… and latency… and percentiles!
    P100
    P99
    P90
    P50
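Averages hide the slow tail; percentiles show it. P50 is the median, P99 is the latency that 99% of requests stay at or under, and P100 is the maximum. A minimal sketch using the nearest-rank method, which is one of several percentile definitions (CloudWatch and other tools may interpolate differently); the sample latencies are made up.

```javascript
// Nearest-rank percentile: smallest sample with at least p% of samples <= it
function percentile(samples, p) {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

// Hypothetical request latencies in ms: mostly fast, with a slow tail
const latencies = [12, 15, 11, 14, 13, 16, 12, 250, 14, 900];
for (const p of [50, 90, 99, 100]) {
  console.log(`P${p}: ${percentile(latencies, p)} ms`);
}
```

With these samples P50 is 14 ms, P90 is 250 ms, and P99 and P100 are both 900 ms, while the mean is 125.7 ms, a latency no request actually experienced.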

  30. Choose the right integration patterns
    • Message queue: decouple and scale distributed systems (Amazon Simple Queue Service, SQS)
    • Pub/sub messaging: decouple producers from subscribers (Amazon Simple Notification Service, SNS)
    • Workflows: combine multiple tasks and manage distributed state (AWS Step Functions)

  31. Getting control from observability
    • Consistent communications management
    • Complete visibility
    • Failure isolation and protection
    • Fine-grained deployment controls

  32. Client-side traffic management
    Traffic shaping
    • Service discovery
    • Retries
    • Timeouts
    • Circuit breakers
    • Health checks
    Routing controls
    • Protocol support
    • Header-based
    • Cookie-based
    • Path-based
    • Host-based
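Timeouts and circuit breakers work together on the client side. The sketch below shows a timeout wrapper and one simple circuit breaker: after `failureThreshold` consecutive failures it opens and fails fast for `resetMs`, then lets a single trial call through (half-open). The `withTimeout` and `CircuitBreaker` names, parameters, and defaults are hypothetical; a service mesh proxy such as Envoy applies these policies outside the application code instead.

```javascript
// Reject if the promise does not settle within ms milliseconds
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical minimal circuit breaker around an async function
class CircuitBreaker {
  constructor(fn, { failureThreshold = 3, resetMs = 10000, now = Date.now } = {}) {
    this.fn = fn;
    this.failureThreshold = failureThreshold;
    this.resetMs = resetMs;
    this.now = now; // injectable clock, handy for tests
    this.failures = 0;
    this.openedAt = null; // null means the circuit is closed
  }

  async call(...args) {
    if (this.openedAt !== null && this.now() - this.openedAt < this.resetMs) {
      throw new Error('circuit open: failing fast');
    }
    // Closed, or half-open after the reset window: try the call
    try {
      const result = await this.fn(...args);
      this.failures = 0;
      this.openedAt = null; // success closes the circuit
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.openedAt !== null || this.failures >= this.failureThreshold) {
        this.openedAt = this.now(); // open, or re-open after a failed trial
      }
      throw err;
    }
  }
}
```

Wrapping the call as `new CircuitBreaker(() => withTimeout(callDownstream(), 500))`, where `callDownstream` is a hypothetical client call, makes slow responses count as failures too.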

  33. Instrumentation options
    Option 1: in-process (SDK), instrumentation inside the microservice's container

  34. Option 1: In-process SDK
    Java, Scala, Node.js, Python, C++, Django, .NET, Go

  35. Instrumentation options
    Option 1: in-process (SDK), instrumentation inside the microservice's container
    Option 2: out-of-process (sidecar proxy), a proxy that runs alongside the microservice's container

  36. Option 2: Sidecar proxy
    Diagram: the microservice's application code next to a proxy that handles monitoring, routing, discovery, and deployment

  37. Option 2: Sidecar proxy
    The proxy runs as a container next to the application code, in the same task (ECS) or pod (Kubernetes)
    External traffic flows through the proxy to the application code

  38. Why a service mesh proxy vs. libraries or app code
    Overall: migrate to microservices more safely and faster
    • Reduce work required by developers
    • Follow best practices
    • Use any language or platform
    • Simplify visibility, troubleshooting, and deployments

  39. App Mesh configures every proxy

  40. App Mesh uses Envoy proxy
    • OSS project
    • Wide community support, numerous integrations
    • Stable and production-proven
    • “Graduated Project” in Cloud Native Computing Foundation
    • Started at Lyft in 2016

  41. Application observability
    • Faster troubleshooting due to consistent data across services
    • Existing tools or dashboards (+ others) with a lot more metrics, logs, and traces
    • Distinguish between service and network issues

  42. Representing your app in App Mesh
    Diagram: Elastic Load Balancing in front of the microservices, represented in App Mesh as
    Mesh – [myapp]
    • Virtual Node A: service discovery, listener, backends
    • Virtual Node B: service discovery, listener, backends

  43. Virtual Node
    • Virtual node: logical representation of runtime services
    • Backends: set of destinations that this node will communicate with (hostnames)
    • Service discovery: describes how its callers locate this node (DNS hostname or AWS Cloud Map namespace, service name, and selectors)
    • Listeners: policies to handle incoming traffic
      Ex: port, health check*, circuit breaker*, retries*
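The same parts appear in a virtual node definition. A sketch of building parameters for `createVirtualNode` on the `AWS.AppMesh` client of the AWS SDK for JavaScript v2, using DNS service discovery; the `virtualNodeParams` helper and the mesh, node, hostname, and port values are hypothetical, and the field names follow the App Mesh API as we understand it, so check them against the current API reference before use.

```javascript
// Hypothetical helper: build createVirtualNode parameters for one service.
// The result could be passed to new AWS.AppMesh().createVirtualNode(params).promise()
function virtualNodeParams({ meshName, nodeName, hostname, port, backends }) {
  return {
    meshName,
    virtualNodeName: nodeName,
    spec: {
      // Listener: policy for incoming traffic (port and protocol)
      listeners: [{ portMapping: { port, protocol: 'http' } }],
      // Service discovery: how callers locate this node (DNS here)
      serviceDiscovery: { dns: { hostname } },
      // Backends: destinations this node will call
      backends: backends.map((name) => ({
        virtualService: { virtualServiceName: name },
      })),
    },
  };
}

const params = virtualNodeParams({
  meshName: 'myapp',
  nodeName: 'node-a',
  hostname: 'a.myapp.local',
  port: 8080,
  backends: ['b.myapp.local'],
});
console.log(JSON.stringify(params, null, 2));
```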

  44. Virtual routes
    A destination’s virtual router and routes
    Virtual router: B
    HTTP routes
    • Route B (Route Name: B1): match prefix /, action: targets B (virtual node destination + weight)
    • Route Name: B2, with its own match and action
    Other protocol routes

  45. Update routes
    Virtual router: B
    HTTP route, prefix: /
    Targets: B (Route B) and B’ (Route B’), each a virtual node destination + weight
    B’ is a new service or service version
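Shifting traffic from B to B’ comes down to changing the weights on the route's targets, for example 90/10 for a canary. The target objects below mirror the `weightedTargets` shape of an App Mesh route action (`virtualNode`, `weight`), but the selection logic is a hypothetical illustration of weighted routing, not how Envoy or App Mesh implement it; `pickTarget` and its injectable random source are assumptions made for testing.

```javascript
// Pick a target with probability proportional to its weight
function pickTarget(targets, random = Math.random) {
  const total = targets.reduce((sum, t) => sum + t.weight, 0);
  let r = random() * total; // uniform in [0, total)
  for (const t of targets) {
    if (r < t.weight) return t.virtualNode;
    r -= t.weight;
  }
  return targets[targets.length - 1].virtualNode; // guard against rounding
}

// Route for prefix "/": 90% to B, 10% to the new version B'
const weightedTargets = [
  { virtualNode: 'B', weight: 90 },
  { virtualNode: "B'", weight: 10 },
];

// Rough check over many simulated requests
const counts = { B: 0, "B'": 0 };
for (let i = 0; i < 10000; i++) counts[pickTarget(weightedTargets)] += 1;
console.log(counts); // roughly 9000 vs 1000, varies per run
```

Raising the weight of B’ step by step while watching its metrics and traces is what turns observability into deployment control.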

  46. Mesh – [myapp]
    • Virtual Node A: service discovery, listener, backend
    • Virtual router: domains; match: /; action: B, B’
    • Service B → Virtual Node B: service discovery, listener, backends
    • Service B’ → Virtual Node B’: service discovery, listener, backends

  47. Diagram: a mesh with Services A, B, C, and D, three virtual routers, and Virtual Node B1

  48. App Mesh concepts
    Mesh
    Virtual Service
    Virtual Router
    Virtual Node
    Route

  50. Demo – Store & Reply
    Architecture diagram: AWS Cloud, Region
    https://github.com/danilop/store-and-reply

  51. Takeaways
    1. Build the instrumentation you need to understand what is happening
    in your (distributed) application
    2. Use technical and business metrics together to get better insights
    3. Use correlation IDs in log and tracing frameworks to understand
    distributed architectures (such as microservices)
    4. Think at scale and plan for a service mesh control plane that gives you
    observability and control

  52. Thank you!
    Danilo Poccia
    @danilop